⮜ Blog

⮜ List of tags

Showing all posts tagged meta-

📝 Posted:
🚚 Summary of:
P0205, P0206
3259190...327730f, 327730f...454c105
💰 Funded by:
[Anonymous], Yanga
🏷 Tags:
rec98+ th01+ gameplay+ boss+ mima-th01+ danmaku-pattern+ rng+ performance+ palette+ glitch+ jank+ konngara+ meta-

Oh look, it's another rather short and straightforward boss with a rather small number of bugs and quirks. Yup, contrary to the character's popularity, Mima's premiere is really not all that special in terms of code, and continues the trend established with 📝 Kikuri and 📝 SinGyoku. I've already covered 📝 the initial sprite-related bugs last November, so this post focuses on the main code of the fight itself. The overview:

And there aren't even any weird hitboxes this time. What is maybe special about Mima, however, is how there's something to cover about all of her patterns. Since this is TH01, it's won't surprise anyone that the rotating square patterns are one giant copy-pasta of unblitting, updating, and rendering code. At least ZUN placed the core polar→Cartesian transformation in a separate function for creating regular polygons with an arbitrary number of sides, which might hint toward some more varied shapes having been planned at one point?
5 of the 6 patterns even follow the exact same steps during square update frames:

  1. Calculate square corner coordinates
  2. Unblit the square
  3. Update the square angle and radius
  4. Use the square corner coordinates for spawning pellets or missiles
  5. Recalculate square corner coordinates
  6. Render the square

Notice something? Bullets are spawned before the corner coordinates are updated. That's why their initial positions seem to be a bit off – they are spawned exactly in the corners of the square, it's just that it's the square from 8 frames ago. :tannedcirno:

Mima's first pattern on Normal difficulty.

Once ZUN reached the final laser pattern though, he must have noticed that there's something wrong there… or maybe he just wanted to fire those lasers independently from the square unblit/update/render timer for a change. Spending an additional 16 bytes of the data segment for conveniently remembering the square corner coordinates across frames was definitely a decent investment.

Mima's laser pattern on Lunatic difficulty, now with correct laser spawn positions. If this pattern reminds you of the game crashing immediately when defeating Mima, 📝 check out the Elis blog post for the details behind this bug, and grab the bugfix patch from there.

When Mima isn't shooting bullets from the corners of a square or hopping across the playfield, she's raising flame pillars from the bottom of the playfield within very specifically calculated random ranges… which are then rendered at byte-aligned VRAM positions, while collision detection still uses their actual pixel position. Since I don't want to sound like a broken record all too much, I'll just direct you to 📝 Kikuri, where we've seen the exact same issue with the teardrop ripple sprites. The conclusions are identical as well.

Mima's flame pillar pattern. This video was recorded on a particularly unlucky seed that resulted in great disparities between a pillar's internal X coordinate and its byte-aligned on-screen appearance, leading to lots of right-shifted hitboxes.
Also note how the change from the meteor animation to the three-arm 🚫 casting sprite doesn't unblit the meteor, and leaves that job to any sprite that happens to fly over those pixels.

However, I'd say that the saddest part about this pattern is how choppy it is, with the circle/pillar entities updating and rendering at a meager 7 FPS. Why go that low on purpose when you can just make the game render ✨ smoothly ✨ instead?

So smooth it's almost uncanny.

The reason quickly becomes obvious: With TH01's lack of optimization, going for the full 56.4 FPS would have significantly slowed down the game on its intended 33 MHz CPUs, requiring more than cheap surface-level ASM optimization for a stable frame rate. That might very well have been ZUN's reason for only ever rendering one circle per frame to VRAM, and designing the pattern with these time offsets in mind. It's always been typical for PC-98 developers to target the lowest-spec models that could possibly still run a game, and implementing dynamic frame rates into such an engine-less game is nothing I would wish on anybody. And it's not like TH01 is particularly unique in its choppiness anyway; low frame rates are actually a rather typical part of the PC-98 game aesthetic.

The final piece of weirdness in this fight can be found in phase 1's hop pattern, and specifically its palette manipulation. Just from looking at the pattern code itself, each of the 4 hops is supposed to darken the hardware palette by subtracting #444 from every color. At the last hop, every color should have therefore been reduced to a pitch-black #000, leaving the player completely blind to the movement of the chasing pellets for 30 frames and making the pattern quite ghostly indeed. However, that's not what we see in the actual game:

Looking at the frame counter, it appears that something outside the pattern resets the palette every 40 frames. The only known constant with a value of 40 would be the invincibility frames after hitting a boss with the Orb, but we're not hitting Mima here… :thonk:
But as it turns out, that's exactly where the palette reset comes from: The hop animation darkens the hardware palette directly, while the 📝 infamous 12-parameter boss collision handler function unconditionally resets the hardware palette to the "default boss palette" every 40 frames, regardless of whether the boss was hit or not. I'd classify this as a bug: That function has no business doing periodic hardware palette resets outside the invincibility flash effect, and it completely defies common sense that it does.

That explains one unexpected palette change, but could this function possibly also explain the other infamous one, namely, the temporary green discoloration in the Konngara fight? That glitch comes down to how the game actually uses two global "default" palettes: a default boss palette for undoing the invincibility flash effect, and a default stage palette for returning the colors back to normal at the end of the bomb animation or when leaving the Pause menu. And sure enough, the stage palette is the one with the green color, while the boss palette contains the intended colors used throughout the fight. Sending the latter palette to the graphics chip every 40 frames is what corrects the discoloration, which would otherwise be permanent.

The green color comes from BOSS7_D1.GRP, the scroll background, and that's what turns this into a clear bug: The stage palette is only set a single time in the entire fight, at the beginning of the entrance animation, to the palette of this image. Apart from consistency reasons, it doesn't even make sense to set the stage palette there, as you can't enter the Pause menu or bomb during a blocking animation function.
And just 3 lines of code later, ZUN loads BOSS8_A1.GRP, the main background image of the fight. Moving the stage palette assignment there would have easily prevented the discoloration.

But yeah, as you can tell, palette manipulation is complete jank in this game. Why differentiate between a stage and a boss palette to begin with? The blocking Pause menu function could have easily copied the original palette to a local variable before darkening it, and then restored it after closing the menu. It's not so easy for bombs as the intended palette could change between the start and end of the animation, but the code could have still been simplified a lot if there was just one global "default palette" variable instead of two. Heck, even the other bosses who manipulate their palettes correctly only do so because they manually synchronize the two after every change. The proper defense against bugs that result from wild mutation of global state is to get rid of global state, and not to put up safety nets hidden in the middle of existing effect code.

The easiest way of reproducing the green discoloration bug in the TH01 Konngara fight, timed to show the maximum amount of time the discoloration can possibly last.

In any case, that's Mima done! 7th PC-98 Touhou boss fully decompiled, 24 bosses remaining, and 59 functions left in all of TH01.

In other thrilling news, my call for secondary funding priorities in new TH01 contributions has given us three different priorities so far. This raises an interesting question though: Which of these contributions should I now put towards TH01 immediately, and which ones should I leave in the backlog for the time being? Since I've never liked deciding on priorities, let's turn this into a popularity contest instead: The contributions with the least popular secondary priorities will go towards TH01 first, giving the most popular priorities a higher chance to still be left over after TH01 is done. As of this delivery, we'd have the following popularity order:

  1. TH05 (1.67 pushes), from T0182
  2. Seihou (1 push), from T0184
  3. TH03 (0.67 pushes), from T0146

Which means that T0146 will be consumed for TH01 next, followed by T0184 and then T0182. I only assign transactions immediately before a delivery though, so you all still have the chance to change up these priorities before the next one.

Next up: The final boss of TH01 decompilation, YuugenMagan… if the current or newly incoming TH01 funds happen to be enough to cover the entire fight. If they don't turn out to be, I will have to pass the time with some Seihou work instead, missing the TH01 anniversary deadline as a result. Edit (2022-07-18): Thanks to Yanga for securing the funding for YuugenMagan after all! That fight will feature slightly more than half of all remaining code in TH01's REIIDEN.EXE and the single biggest function in all of PC-98 Touhou, let's go!

📝 Posted:
🚚 Summary of:
P0203, P0204
4568bf7...86cdf5f, 86cdf5f...0c682b5
💰 Funded by:
GhostRiderCog, [Anonymous], Yanga
🏷 Tags:
rec98+ th01+ meta- gameplay+ card-flipping+ player+ shot+ mod+ rng+ bomb+ waste+

Let's start right with the milestones:

So, how did this card-flipping stage obstacle delivery get so horribly delayed? With all the different layouts showcased in the 28 card-flipping stages, you'd expect this to be among the more stable and bug-free parts of the codebase. Heck, with all stage objects being placed on a 32×32-pixel grid, this is the first TH01-related blog post this year that doesn't have to describe an alignment-related unblitting glitch!

That alone doesn't mean that this code is free from quirky behavior though, and we have to look no further than the first few lines of the collision handling for round bumpers to already find a whole lot of that. Simplified, they do the following:

pixel_t delta_y_between_orb_and_bumper = (orb.top - bumper.top);
if(delta_y_between_orb_and_bumper <= 0) {
	orb.top = (bumper.top - 24);
} else {
	orb.top = (bumper.top + 24);

Immediately, you wonder why these assignments only exist for the Y coordinate. Sure, hitting a bumper from the left or right side should happen less often, but it's definitely possible. Is it really a good idea to warp the Orb to the top or bottom edge of a bumper regardless?
What's more important though: The fact that these immediate assignments exist at all. The game's regular Orb physics work by producing a Y velocity from the single force acting on the Orb and a gravity factor, and are completely independent of its current Y position. A bumper collision does also apply a new force onto the Orb further down in the code, but these assignments still bypass the physics system and are bound to have some knock-on effect on the Orb's movement.

To observe that effect, we just have to enter Stage 18 on the 地獄/Jigoku route, where it's particularly trivial to reproduce. At a 📝 horizontal velocity of ±4, these assignments are exactly what can cause the Orb to endlessly bounce between two bumpers. As rudimentary as the Orb's physics may be, just letting them do their work would have entirely prevented these loops:

The blue areas indicate the pixel-perfect* hitboxes of each bumper.

Now, you might be thinking that these Y assignments were just an attempt to prevent the Orb from colliding with the same bumper again on the next frame. After all, those 24 pixels exactly correspond to ⅓ of the height of a bumper's hitbox with an additional pixel added on top. However, the game already perfectly prevents repeated collisions by turning off collision testing with the same bumper for the next 7 frames after a collision. Thus, we can conclude that ZUN either explicitly coded bumper collision handling to facilitate these loops, or just didn't take out that code after inevitably discovering what it did. This is not janky code, it's not a glitch, it's not sarcasm from my end, and it's not the game's physics being bad.

But wait. Couldn't these assignments just be a remnant from a time in development before ZUN decided on the 7-frame delay on further collisions? Well, even that explanation stops holding water after the next few lines of code. Simplified, again:

pixel_t delta_x_between_orb_and_bumper = (orb.left - bumper.left);
if((orb.velocity.x == +4) && (delta_x_between_orb_and_bumper < 0)) {
	orb.velocity.x = -4;
} else if((orb.velocity.x == -4) && (delta_x_between_orb_and_bumper > 0)) {
	orb.velocity.x = +4;

What's important here is the part that's not in the code – namely, anything that handles X velocities of -8 or +8. In those cases, the Orb simply continues in the same horizontal direction. The manual Y assignment is the only part of the code that actually prevents a collision there, as the newly applied force is not guaranteed to be enough:

Forgetting to handle ⅖ of your discrete X velocity cases is simply not something you do by accident. So we might as well say that ZUN deliberately designed the game to behave exactly as it does in this regard.

Bumpers also come in vertical or horizontal bar shapes. Their collision handling also turns off further collision testing for the next 7 frames, and doesn't do any manual coordinate assignment. That's definitely a step up in cleanliness from round bumpers, but it doesn't seem to keep in mind that the player can fire a new shot every 4 frames when standing still. That makes it immediately obvious why this works:

The green numbers show the amount of frames since the last detected collision with the respective bumper bar, and indicate that collision testing with the bar below is currently disabled.

That's the most well-known case of reducing the Orb's horizontal velocity to 0 by exactly hitting it with shots in its center and then button-mashing it through a horizontal bar. This also works with vertical bars and yields even more interesting results there, but if we want to have any chance of understanding what happens there, we have to first go over some basics:

However, if that were everything the game did, kicking the Orb into a column of vertical bumper bars would lead them to behave more like a rope that the Orb can climb, as the initial collision with two hitboxes cancels out the intended sign change that reflects the Orb away from the bars:

This footage was recorded without the workaround I am about to describe. It does not reflect the behavior of the original game. You cannot do this in the original game.
While the visualization reveals small sections where three hitboxes overlap, the Orb can never actually collide with three of them at the same time, as those 3-hitbox regions are 2 pixels smaller than they would need to be to fit the Orb. That's exactly the difference between using < rather than <= in these hitbox comparisons.

While that would have been a fun gameplay mechanic on its own, it immediately breaks apart once you place two vertical bumper bars next to each other. Due to how these bumper bar hitboxes extend past their sprites, any two adjacent vertical bars will end up with the exact same hitbox in absolute screen coordinates. Stage 17 on the 魔界 /Makai route contains exactly such a layout:

The collision handlers of adjacent vertical bars always activate in the same frame, independently invert the Orb's X velocity, and therefore fully cancel out their intended effect on the Orb… if the game did not have the workaround I am about to describe. This cannot happen in the original game.

ZUN's workaround: Setting a "vertical bumper bar block flag" after any collision with such a bar, which simply disables any collision with any vertical bar for the next 7 frames. This quick hack made all vertical bars work as intended, and avoided the need for involving the Orb's X velocity in any kind of physics system. :zunpet:

Edit (2022-07-12): This flag only works around glitches that would be caused by simultaneously colliding with more than one vertical bar. The actual response to a bumper bar collision still remains unaffected, and is very naive:

These conditions are only correct if the Orb comes in at an angle roughly between 45° and 135° on either side of a bar. If it's anywhere close to 0° or 180°, this response will be incorrect, and send the Orb straight through the bar. Since the large hitboxes make this easily possible, you can still get the Orb to climb a vertical column, or glide along a horizontal row:

Here's the hitbox overlay for 地獄 /Jigoku Stage 19, and here's an updated version of the 📝 Orb physics debug mod that now also shows bumper bar collision frame numbers: 2022-07-10-TH01OrbPhysicsDebug.zip See the th01_orb_debug branch for the code. To use it, simply replace REIIDEN.EXE, and run the game in debug mode, via game d on the DOS prompt. If you encounter a gameplay situation that doesn't seem to be covered by this blog post, you can now verify it for yourself. Thanks to touhou-memories for bringing these issues to my attention! That definitely was a glaring omission from the initial version of this blog post.

With that clarified, we can now try mashing the Orb into these two vertical bars:

At first, that workaround doesn't seem to make a difference here. As we expect, the frame numbers now tell us that only one of the two bumper bars in a row activates, but we couldn't have told otherwise as the number of bars has no effect on newly applied Y velocity forces. On a closer look, the Orb's rise to the top of the playfield is in fact caused by that workaround though, combined with the unchanged top-to-bottom order of collision testing. As soon as any bumper bar completed its 7 collision delay frames, it resets the aforementioned flag, which already reactivates collision handling for any remaining vertical bumper bars during the same frame. Look out for frames with both a 7 and a 1: The 7 will always appear before the 1 in the row-major order. Whenever this happens, the current oscillation period is cut down from 7 to 6 frames – and because collision testing runs from top to bottom, this will always happen during the falling part. Depending on the Y velocity, the rising part may also be cut down to 6 frames from time to time, but that one at least has a chance to last for the full 7 frames. This difference adds those crucial extra frames of upward movement, which add up to send the Orb to the top. Without the flag, you'd always see the Orb oscillating between a fixed range of the bar column.
Finally, it's the "top of playfield" force that gradually slows down the Orb and makes sure it ultimately only moves at sub-pixel velocities, which have no visible effect. Because 📝 the regular effect of gravity is reset with each newly applied force, it's completely negated during most of the climb. This even holds true once the Orb reached the top: Since the Orb requires a negative force to repeatedly arrive up there and be bounced back, this force will stay active for the first 5 of the 7 collision frames and not move the Orb at all. Once gravity kicks in at the 5th frame and adds 1 to the Y velocity, it's already too late: The new velocity can't be larger than 0.5, and the Orb only has 1 or 2 frames before the flag reset causes it to be bounced back up to the top again.

Portals, on the other hand, turn out to be much simpler than the old description that ended up on Touhou Wiki in October 2005 might suggest. Everything about their teleportations is random: The destination portal, the exit force (as an integer between -9 and +9), as well as the exit X velocity, with each of the 📝 5 distinct horizontal velocities having an equal chance of being chosen. Of course, if the destination portal is next to the left or right edge of the playfield and it chooses to fire the Orb towards that edge, it immediately bounces off into the opposite direction, whereas the 0 velocity is always selected with a constant 20% probability.

The selection process for the destination portal involves a bit more than a single rand() call. The game bundles all obstacles in a single structure of dynamically allocated arrays, and only knows how many obstacles there are in total, not per type. Now, that alone wouldn't have much of an impact on random portal selection, as you could simply roll a random obstacle ID and try again if it's not a portal. But just to be extra cute, ZUN instead iterates over all obstacles, selects any non-entered portal with a chance of ¼, and just gives up if that dice roll wasn't successful after 16 loops over the whole array, defaulting to the entered portal in that case.
In all its silliness though, this works perfectly fine, and results in a chance of 0.7516(𝑛 - 1) for the Orb exiting out of the same portal it entered, with 𝑛 being the total number of portals in a stage. That's 1% for two portals, and 0.01% for three. Pretty decent for a random result you don't want to happen, but that hurts nobody if it does.

The one tiny ZUN bug with portals is technically not even part of the newly decompiled code here. If Reimu gets hit while the Orb is being sent through a portal, the Orb is immediately kicked out of the portal it entered, no matter whether it already shows up inside the sprite of the destination portal. Neither of the two portal sprites is reset when this happens, leading to "two Orbs" being visible simultaneously. :tannedcirno::onricdennat:
This makes very little sense no matter how you look at it. The Orb doesn't receive a new velocity or force when this happens, so it will simply re-enter the same portal once the gameplay resumes on Reimu's next life:

And that's it! At least the turrets don't have anything notable to say about them 📝 that I haven't said before.

That left another ½ of a push over at the end. Way too much time to finish FUUIN.exe, way too little time to start with Mima… but the bomb animation fit perfectly in there. No secrets or bugs there, just a bunch of sprite animation code wasting at least another 82 bytes in the data segment. The special effect after the kuji-in sprites uses the same single-bitplane 32×32 square inversion effect seen at the end of Kikuri's and Sariel's entrance animation, except that it's a 3-stack of 16-rings moving at 6, 7, and 8 pixels per frame respectively. At these comparatively slow speeds, the byte alignment of each square adds some further noise to the discoloration pattern… if you even notice it below all the shaking and seizure-inducing hardware palette manipulation.
And yes, due to the very destructive nature of the effect, the game does in fact rely on it only being applied to VRAM page 0. While that will cause every moving sprite to tear holes into the inverted squares along its trajectory, keeping a clean playfield on VRAM page 1 is what allows all that pixel damage to be easily undone at the end of this 89-frame animation.

Next up: Mima! Let's hope that stage obstacles already were the most complex part remaining in TH01…

📝 Posted:
🚚 Summary of:
P0170, P0171
(Website) 0c4ab41...4f04091, (Website) 4f04091...e12cf26
💰 Funded by:
🏷 Tags:
website+ meta-

The "bad" news first: Expanding to Stripe in order to support Google Pay requires bureaucratic effort that is not quite justified yet, and would only be worth it after the next price increase.

Visualizing technical debt has definitely been overdue for a while though. With 1 of these 2 pushes being focused on this topic, it makes sense to summarize once again what "technical debt" means in the context of ReC98, as this info was previously kind of scattered over multiple blog posts. Mainly, it encompasses

Technically (ha), it would also include all of master.lib, which has always been compiled into the binaries in this way, and which will require quite a bit of dedicated effort to be moved out into a properly linkable library, once it's feasible. But this code has never been part of any progress metric – in fact, 0% RE is defined as the total number of x86 instructions in the binary minus any library code. There is also no relation between instruction numbers and the time it will take to finalize master.lib code, let alone a precedent of how much it would cost.

If we now want to express technical debt as a percentage, it's clear where the 100% point would be: when all RE'd code is also compiled in from a translation unit outside the big .ASM one. But where would 0% be? Logically, it would be the point where no reverse-engineered code has ever been moved out of the big translation units yet, and nothing has ever been decompiled. With these boundary points, this is what we get:

Visualizing technical debt in terms of the total amount of instructions that could possibly be not finalized

Not too bad! So it's 6.22% of total RE that we will have to revisit at some point, concentrated mostly around TH04 and TH05 where it resulted from a focus on position independence. The prices also give an accurate impression of how much more work would be required there.

But is that really the best visualization? After all, it requires an understanding of our definition of technical debt, so it's maybe not the most useful measurement to have on a front page. But how about subtracting those 6.22% from the number shown on the RE% bars? Then, we get this:

Visualizing technical debt in terms of the absolute number of 'finalized' instructions per binary

Which is where we get to the good news: Twitter surprisingly helped me out in choosing one visualization over the other, voting 7:2 in favor of the Finalized version. While this one requires you to manually calculate € finalized - € RE'd to obtain the raw financial cost of technical debt, it clearly shows, for the first time, how far away we are from the main goal of fully decompiling all 5 games… at least to the extent it's possible.

Now that the parser is looking at these recursively included .ASM files for the first time, it needed a small number of improvements to correctly handle the more advanced directives used there, which no automatic disassembler would ever emit. Turns out I've been counting some directives as instructions that never should have been, which is where the additional 0.02% total RE came from.

One more overcounting issue remains though. Some of the RE'd assembly slices included by multiple games contain different if branches for each game, like this:

; An example assembly file included by both TH04's and TH05's MAIN.EXE:
if (GAME eq 5)
	; (Code for TH05)
	; (Code for TH04)

Currently, the parser simply ignores if, else, and endif, leading to the combined code of all branches being counted for every game that includes such a file. This also affects the calculated speed, and is the reason why finalization seems to be slightly faster than reverse-engineering, at currently 471 instructions per push compared to 463. However, it's not that bad of a signal to send: Most of the not yet finalized code is shared between TH04 and TH05, so finalizing it will roughly be twice as fast as regular reverse-engineering to begin with. (Unless the code then turns out to be twice as complex than average code… :tannedcirno:).

For completeness, finalization is now also shown as part of the per-commit metrics. Now it's clearly visible what I was doing in those very slow five months between P0131 and P0140, where the progress bar didn't move at all: Repaying 3.49% of previously accumulated technical debt across all games. 👌

As announced, I've also implemented a new caching system for this website, as the second main feature of these two pushes. By appending a hash string to the URLs of static resources, your browser should now both cache them forever and re-download them once they did change on the server. This avoids the unnecessary (and quite frankly, embarrassing) re-requests for all static resources that typically just return a 304 Not Modified response. As a result, the blog should now load a bit faster on repeated visits, especially on slower connections. That should allow me to deliberately not paginate it for another few years, without it getting all too slow – and should prepare us for the day when our first game reaches 100% and the server will get smashed. :onricdennat: However, I am open to changing the progress blog link in the navigation bar at the top to the list of tags, once people start complaining.

Apart frome some more invisible correctness and QoL improvements, I've also prepared some new funding goals, but I'll cover those once the store reopens, next year. Syntax highlighting for code snippets would have also been cool, but unfortunately didn't make it into those two pushes. It's still on the list though!

Next up: Back to RE with the TH03 score file format, and other code that surrounds it.

📝 Posted:
🏷 Tags:

Made it through almost three years without a price increase! It's been overdue for a while, though.

With the last months being full of rather research- and documentation-heavy pushes, I've been just about able to keep up with the existing subscriptions. By now, the amount of quality control and documentation I found myself putting into this project has far surpassed the raw reverse-engineering work. Back at the beginning of 2019 when I decided on the previous push price of 30 €, I didn't have this blog nor the current aspirations at code quality. Neither of these have ever been reflected in the price, and I still find it hard to put a number on them. On the other hand, I continue to dislike the typical Patreon model of no inherent defined obligations on my part, and no direct association of the resulting work with the person who funded it. You might have noticed that I don't use the word "donations" anywhere, and instead refer to them as "orders" or "purchases" – and that's precisely for this reason.
The result, however, has been a sold-out store for pretty much all of 2021. I can only begin to imagine how much potential revenue I've already lost from people who might have wanted to contribute at one point, but couldn't, and have already written off this project…

Raising prices is pretty much the only way to get the pending workload back to a more comfortable amount. I also thought about a two-tiered system: Have a documentation-less option for 30 €, and take 60 € for any push that should be accompanied by a blog post. However, skimping on documentation will compromise the quality of the code as well. Writing these blog posts presents another chance of improving it before release, which has made quite a difference on many occasions. And after all, this documentation is the one thing about ReC98 that people mainly interact with. As long as we haven't hit 100% RE, the actual code seems to be an afterthought, which is perfectly understandable: Why start work on a bigger mod or port now if the code is steadily improving in every aspect, and it all will be just a bit more maintainable in a few months?

But why go more commercial then, and especially now? If the recent attention to spaztron64's PC-98 Touhou collection package is any indication, ReC98 has a way bigger career potential than the dead-end RL job I found myself in. Demand for fixed translations and replay support is definitely there – and given that these haven't been done so far, it's very likely that I'll end up as the one to implement such mods, especially if that should happen before reaching 100% RE or PI. People also still seem to want* a port to IBM-compatible DOS, 📝 even though this makes no sense? But if this is something you all want to pay for, then sure, why not. :tannedcirno:
And even right now, working on ReC98 sure beats writing junk software using ill-suited technologies for highly corporate clients, or living close to a world where academic papers are valued higher than working and maintained code. I am in fact very happy whenever I'm done with that for the day, and get to work on ReC98! Who would have thought.

So let's try to grow this into an actual business and raise prices to match demand, going up to a nicely divisible 60 € per push. If you all still manage to regularly sell out the store at this level and I get to raise prices again, I should be able to reduce RL work further and therefore raise the cap as well. Now that I've also clarified a potential route towards self-employment, I'm going to react to these sell-out events more quickly, and with smaller raises. So, no further immediate doubling in the future.

Now, will this delay the currently highly awaited 100% completion of TH01 past August 2022, the 25th anniversary of its release? We'll see once we're back to an almost empty backlog, after I'm done with the TH01 Sariel fight. I'm hopeful that such a price increase will give a new voice to the goals and priorities of less wealthy potential patrons. This crowdfunding is very much designed to be hacked by "microtransactions" – small contributions with specific requests that require other, larger generic contributions to be fulfilled – and I'd like to see more of that. 😛 And even if 60 € per push is already more than the combined fandom wants to pay, that means I can get the 📝 16-bit build system done before the first big 100% release. (Trust me, you really want that!)

I will still deliver the entire current backlog at the value the contributions were originally purchased at. Due to the way the cap has to be calculated, these contributions now appear to have doubled in value. All existing subscriptions will then pay for half of their original pushes starting with their respective December 2021 transaction.

Next up: A bunch of smaller website features, including:

📝 Posted:
🏷 Tags:
rec98+ meta-

Last updated: 📝 2022-08-11

Secured a 30-hour RL workweek to leave plenty of time for this project, Touhou Patch Center's commissioned MediaWiki update work is also nearing completion, time to reopen the store! Since it's been a long time, here's an overview of where we currently are in each game and binary, and what the next logical step would be:

But as always, you can request pretty much any other part of any game. We're now at a pretty good place as far as arbitrary requests are concerned, as I simply can't decide myself where to put all the current pending contributions in the funding backlog. 😅 By spending only the missing amount of money to complete any of those, you can capture any of those "fractional" contributions towards a specific goal.

The next specific requests are going to set the priorities of this project for quite some time! The best strategy: Spend a low amount of money on something very specific, and watch as existing generic contributions will necessarily have to be put towards making that specific goal happen 😛

📝 Posted:
🚚 Summary of:
💰 Funded by:
🏷 Tags:
rec98+ th02+ th03+ th04+ th05+ build-process+ meta- contribution-ideas+ mod+ tasm+ tcc+

Whoops, the build was broken again? Since P0127 from mid-November 2020, on TASM32 version 5.3, which also happens to be the one in the DevKit… That version changed the alignment for the default segments of certain memory models when requesting .386 support. And since redefining segment alignment apparently is highly illegal and absolutely has to be a build error, some of the stand-alone .ASM translation units didn't assemble anymore on this version. I've only spotted this on my own because I casually compiled ReC98 somewhere else – on my development system, I happened to have TASM32 version 5.0 in the PATH during all this time.
At least this was a good occasion to get rid of some weird segment alignment workarounds from 2015, and replace them with the superior convention of using the USE16 modifier for the .MODEL directive.

ReC98 would highly benefit from a build server – both in order to immediately spot issues like this one, and as a service for modders. Even more so than the usual open-source project of its size, I would say. But that might be exactly because it doesn't seem like something you can trivially outsource to one of the big CI providers for open-source projects, and quickly set it up with a few lines of YAML.
That might still work in the beginning, and we might get by with a regular 64-bit Windows 10 and DOSBox running the exact build tools from the DevKit. Ideally, though, such a server should really run the optimal configuration of a 32-bit Windows 10, allowing both the 32-bit and the 16-bit build step to run natively, which already is something that no popular CI service out there offers. Then, we'd optimally expand to Linux, every other Windows version down to 95, emulated PC-98 systems, other TASM versions… yeah, it'd be a lot. An experimental project all on its own, with additional hosting costs and probably diminishing returns, the more it expands…
I've added it as a category to the order form, let's see how much interest there is once the store reopens (which will be at the beginning of May, at the latest). That aside, it would 📝 also be a great project for outside contributors!

So, technical debt, part 8… and right away, we're faced with TH03's low-level input function, which 📝 once 📝 again 📝 insists on being word-aligned in a way we can't fake without duplicating translation units. Being undecompilable isn't exactly the best property for a function that has been interesting to modders in the past: In 2018, spaztron64 created an ASM-level mod that hardcoded more ergonomic key bindings for human-vs-human multiplayer mode: 2021-04-04-TH03-WASD-2player.zip However, this remapping attempt remained quite limited, since we hadn't (and still haven't) reached full position independence for TH03 yet. There's quite some potential for size optimizations in this function, which would allow more BIOS key groups to already be used right now, but it's not all that obvious to modders who aren't intimately familiar with x86 ASM. Therefore, I really wouldn't want to keep such a long and important function in ASM if we don't absolutely have to…

… and apparently, that's all the motivation I needed? So I took the risk, and spent the first half of this push on reverse-engineering TCC.EXE, to hopefully find a way to get word-aligned code segments out of Turbo C++ after all.

And there is! The -WX option, used for creating DPMI applications, messes up all sorts of code generation aspects in weird ways, but does in fact mark the code segment as word-aligned. We can consider ourselves quite lucky that we get to use Turbo C++ 4.0, because this feature isn't available in any previous version of Borland's C++ compilers.
That allowed us to restore all the decompilations I previously threw away… well, two of the three, that lookup table generator was too much of a mess in C. :tannedcirno: But what an abuse this is. The subtly different code generation has basically required one creative workaround per usage of -WX. For example, enabling that option causes the regular PUSH BP and POP BP prolog and epilog instructions to be wrapped with INC BP and DEC BP, for some reason:

a_function_compiled_with_wx proc
	inc 	bp    	; ???
	push	bp
	mov 	bp, sp
	    	      	; [… function code …]
	pop 	bp
	dec 	bp    	; ???
a_function_compiled_with_wx endp

Luckily again, all the functions that currently require -WX don't set up a stack frame and don't take any parameters.
While this hasn't directly been an issue so far, it's been pretty close: snd_se_reset(void) is one of the functions that require word alignment. Previously, it shared a translation unit with the immediately following snd_se_play(int new_se), which does take a parameter, and therefore would have had its prolog and epilog code messed up by -WX. Since the latter function has a consistent (and thus, fakeable) alignment, I simply split that code segment into two, with a new -WX translation unit for just snd_se_reset(void). Problem solved – after all, two C++ translation units are still better than one ASM translation unit. :onricdennat: Especially with all the previous #include improvements.

The rest was more of the usual, getting us 74% done with repaying the technical debt in the SHARED segment. A lot of the remaining 26% is TH04 needing to catch up with TH03 and TH05, which takes comparatively little time. With some good luck, we might get this done within the next push… that is, if we aren't confronted with all too many more disgusting decompilations, like the two functions that ended this push. If we are, we might be needing 10 pushes to complete this after all, but that piece of research was definitely worth the delay. Next up: One more of these.

📝 Posted:
🚚 Summary of:
P0126, P0127
6c22af7...8b01657, 8b01657...dc65b59
💰 Funded by:
Blue Bolt, [Anonymous]
🏷 Tags:
rec98+ th03+ th04+ th05+ pc98+ micro-optimization+ tcc+ tasm+ meta-

Alright, back to continuing the master.hpp transition started in P0124, and repaying technical debt. The last blog post already announced some ridiculous decompilations… and in fact, not a single one of the functions in these two pushes was decompilable into idiomatic C/C++ code.

As usual, that didn't keep me from trying though. The TH04 and TH05 version of the infamous 16-pixel-aligned, EGC-accelerated rectangle blitting function from page 1 to page 0 was fairly average as far as unreasonable decompilations are concerned.
The big blocker in TH03's MAIN.EXE, however, turned out to be the .MRS functions, used to render the gauge attack portraits and bomb backgrounds. The blitting code there uses the additional FS and GS segment registers provided by the Intel 386… which

  1. are not supported by Turbo C++'s inline assembler, and
  2. can't be turned into pointers, due to a compiler bug in Turbo C++ that generates wrong segment prefix opcodes for the _FS and _GS pseudo-registers.

Apparently I'm the first one to even try doing that with this compiler? I haven't found any other mention of this bug…
Compiling via assembly (#pragma inline) would work around this bug and generate the correct instructions. But that would incur yet another dependency on a 16-bit TASM, for something honestly quite insignificant.

What we can always do, however, is using __emit__() to simply output x86 opcodes anywhere in a function. Unlike spelled-out inline assembly, that can even be used in helper functions that are supposed to inline… which does in fact allow us to fully abstract away this compiler bug. Regular if() comparisons with pseudo-registers wouldn't inline, but "converting" them into C++ template function specializations does. All that's left is some C preprocessor abuse to turn the pseudo-registers into types, and then we do retain a normal-looking poke() call in the blitting functions in the end. 🤯

Yeah… the result is batshit insane. I may have gone too far in a few places…

One might certainly argue that all these ridiculous decompilations actually hurt the preservation angle of this project. "Clearly, ZUN couldn't have possibly written such unreasonable C++ code. So why pretend he did, and not just keep it all in its more natural ASM form?" Well, there are several reasons:

Unfortunately, these pushes also demonstrated a second disadvantage in trying to decompile everything possible: Since Turbo C++ lacks TASM's fine-grained ability to enforce code alignment on certain multiples of bytes, it might actually be unfeasible to link in a C-compiled object file at its intended original position in some of the .EXE files it's used in. Which… you're only going to notice once you encounter such a case. Due to the slightly jumbled order of functions in the 📝 second, shared code segment, that might be long after you decompiled and successfully linked in the function everywhere else.

And then you'll have to throw away that decompilation after all 😕 Oh well. In this specific case (the lookup table generator for horizontally flipping images), that decompilation was a mess anyway, and probably helped nobody. I could have added a dummy .OBJ that does nothing but enforce the needed 2-byte alignment before the function if I really insisted on keeping the C version, but it really wasn't worth it.

Now that I've also described yet another meta-issue, maybe there'll really be nothing to say about the next technical debt pushes? :onricdennat: Next up though: Back to actual progress again, with TH01. Which maybe even ends up pushing that game over the 50% RE mark?

📝 Posted:
🚚 Summary of:
P0105, P0106, P0107, P0108
3622eb6...11b776b, 11b776b...1f1829d, 1f1829d...1650241, 1650241...dcf4e2c
💰 Funded by:
🏷 Tags:
rec98+ th01+ meta- file-format+ animation+ blitting+ boss+ singyoku+ yuugenmagan+ elis+ kikuri+ konngara+ waste+

And indeed, I got to end my vacation with a lot of image format and blitting code, covering the final two formats, .GRC and .BOS. .GRC was nothing noteworthy – one function for loading, one function for byte-aligned blitting, and one function for freeing memory. That's it – not even a unblitting function for this one. .BOS, on the other hand…

…has no generic (read: single/sane) implementation, and is only implemented as methods of some boss entity class. And then again for Sariel's dress and wand animations, and then again for Reimu's animations, both of which weren't even part of these 4 pushes. Looking forward to decompiling essentially the same algorithms all over again… And that's how TH01 became the largest and most bloated PC-98 Touhou game. So yeah, still not done with image formats, even at 44% RE.

This means I also had to reverse-engineer that "boss entity" class… yeah, what else to call something a boss can have multiple of, that may or may not be part of a larger boss sprite, may or may not be animated, and that may or may not have an orb hitbox?
All bosses except for Kikuri share the same 5 global instances of this class. Since renaming all these variables in ASM land is tedious anyway, I went the extra mile and directly defined separate, meaningful names for the entities of all bosses. These also now document the natural order in which the bosses will ultimately be decompiled. So, unless a backer requests anything else, this order will be:

  1. Konngara
  2. Sariel
  3. Elis
  4. Kikuri
  5. SinGyoku
  6. (code for regular card-flipping stages)
  7. Mima
  8. YuugenMagan

As everyone kind of expects from TH01 by now, this class reveals yet another… um, unique and quirky piece of code architecture. In addition to the position and hitbox members you'd expect from a class like this, the game also stores the .BOS metadata – width, height, animation frame count, and 📝 bitplane pointer slot number – inside the same class. But if each of those still corresponds to one individual on-screen sprite, how can YuugenMagan have 5 eye sprites, or Kikuri have more than one soul and tear sprite? By duplicating that metadata, of course! And copying it from one entity to another :onricdennat:
At this point, I feel like I even have to congratulate the game for not actually loading YuugenMagan's eye sprites 5 times. But then again, 53,760 bytes of waste would have definitely been noticeable in the DOS days. Makes much more sense to waste that amount of space on an unused C++ exception handler, and a bunch of redundant, unoptimized blitting functions :tannedcirno:

(Thinking about it, YuugenMagan fits this entire system perfectly. And together with its position in the game's code – last to be decompiled means first on the linker command line – we might speculate that YuugenMagan was the first boss to be programmed for TH01?)

So if a boss wants to use sprites with different sizes, there's no way around using another entity. And that's why Girl-Elis and Bat-Elis are two distinct entities internally, and have to manually sync their position. Except that there's also a third one for Attacking-Girl-Elis, because Girl-Elis has 9 frames of animation in total, and the global .BOS bitplane pointers are divided into 4 slots of only 8 images each. :zunpet:
Same for SinGyoku, who is split into a sphere entity, a person entity, and a… white flash entity for all three forms, all at the same resolution. Or Konngara's facial expressions, which also require two entities just for themselves.

And once you decompile all this code, you notice just how much of it the game didn't even use. 13 of the 50 bytes of the boss entity class are outright unused, and 10 bytes are used for a movement clamping and lock system that would have been nice if ZUN also used it outside of Kikuri's soul sprites. Instead, all other bosses ignore this system completely, and just party on the X/Y coordinates of the boss entities directly.

As for the rendering functions, 5 out of 10 are unused. And while those definitely make up less than half of the code, I still must have spent at least 1 of those 4 pushes on effectively unused functionality.
Only one of these functions lends itself to some speculation. For Elis' entrance animation, the class provides functions for wavy blitting and unblitting, which use a separate X coordinate for every line of the sprite. But there's also an unused and sort of broken one for unblitting two overlapping wavy sprites, located at the same Y coordinate. This might indicate that Elis could originally split herself into two sprites, similar to TH04 Stage 6 Yuuka? Or it might just have been some other kind of animation effect, who knows.

After over 3 months of TH01 progress though, it's finally time to look at other games, to cover the rest of the crowdfunding backlog. Next up: Going back to TH05, and getting rid of those last PI false positives. And since I can potentially spend the next 7 weeks on almost full-time ReC98 work, I've also re-opened the store until October!

📝 Posted:
🏷 Tags:
rec98+ meta- build-process+ pipeline+

TH01 pellets are coming up next, and for the first time, we'll have the chance to move hardcoded sprite data from ASM land to C land. As it would turn out, bad luck with the 2-byte alignment at the end of REIIDEN.EXE's data segment pretty much forces us to declare TH01's pellet sprites in C if we want to decompile the final few pellet functions without ugly workarounds for the float literals there. And while I could have just converted them into a C array and called it a day, it did raise the question of when we are going to do this The Right And Moddable Way, by auto-converting actual image files into ASM or C arrays during the build process. These arrays are even more annoying to edit in C, after all – unlike TASM, the old C++ we have to work with doesn't support binary number literals, only hexadecimal or, gasp, octal.
Without the explicit funding for such a converter, I reached out to GitHub, asking backers and outside contributors whether they'd be in favor of it. As something that requires no RE skills and collides with nothing else, it would be a perfect task for C/C++ coders who want to support ReC98 with something other than money.

And surprisingly, those still exist! Jonathan Campbell, of DOSBox-X fame, went ahead and implemented all the required functionality, within just a few days. Thanks again! The result is probably a lot more portable than it would have been if I had written it. Which is pretty relevant for future port authors – any additional tooling we write ourselves should not add to the list of problems they'll have to worry about.

Right now, all of the sprites are #included from the big ASM dump files, which means that they have to be converted before those files are assembled during the 32-bit build part. We could have introduced a third distinct build step there, perhaps even a 16-bit one so that we can use Turbo C++ 4.0J to also compile the converter… However, the more reasonable option was to do this at the beginning of the 32-bit build step, and add a 32-bit Windows C++ compiler to the list of tools required for ReC98's build process.
And the best choice for ReC98 is, in fact… 🥁… the 20-year-old Borland C++ 5.5 freeware release. See the README for a lengthy justification, as well as download links.

So yes, all sprites mentioned in the GitHub issue can now be modded by simply editing .BMP files, using an image editor of your choice. 🖌
And now that that's dealt with, it's finally time for more actual progress! TH01 pellets coming tomorrow.

📝 Posted:
🏷 Tags:
rec98+ meta-

Did WindowsTiger just cover 2% over all games on his own? While not all of that passed my review, +1.59% RE and +1.66% PI over all 5 games is still pretty noteworthy, and comfortably pushes TH05 over the 25% mark in RE, and the 60% mark in PI.


While I definitely do appreciate such contributions, reviewing and adapting these to my current code organization standards also takes more time than I'd like it to take. And taken to this level, it does kind of undermine this crowdfunding project, causing both a literal denial of service and exactly the stress that this crowdfunding was designed to avoid. Most of the time, I can't merge all of that as-is without knowingly creating annoyances down the line. But I don't want to just ignore it either, or reject every non-perfect commit…
That's also why I let it slide this time, due to some of the RE work in there being genuinely amazing. In the future though, be aware that your chance of having your work merged diminishes the further you move ahead of my current master branch. In extreme cases like this one, I'll then just be waiting until enough generic reverse-engineering pushes have accrued, and treat the merge as regular work.

But now, time to continue with the regular programming… I am kind of exhausted from all of this, so no bullets for the next two Touhou Patch Center pushes, still… Good thing there's still plenty of simpler things with big percentage gains to be done:

📝 Posted:
🚚 Summary of:
💰 Funded by:
[Anonymous], -Tom-
🏷 Tags:
rec98+ th04+ th05+ pc98+ position-independence+ uth05win+ meta-

With no feedback to 📝 last week's blog post, I assume you all are fine with how things are going? Alright then, another one towards position independence, with the same approach as before…

Since -Tom- wanted to learn something about how the PC-98 EGC is used in TH04 and TH05, I took a look at master.lib's egc_shift_*() functions. These simply do a hardware-accelerated memmove() of any VRAM region, and are used for screen shaking effects. Hover over the image below for the raw effect:

Demonstration of an egc_shift_left() call

Then, I finally wanted to take a look at the bullet structures, but it required way too much reverse-engineering to even start within ¾ of a position independence push. Even with the help of uth05win – bullet handling was changed quite a bit from TH04 to TH05.

What I ultimately settled on was more raw, "boring" PI work based around an already known set of functions. For this one, I looked at vector construction… and this time, that actually made the games a little bit more position-independent, and wasn't just all about removing false positives from the calculation. This was one of the few sets of functions that would also apply to TH01, and it revealed just how chaotically that game was coded. This one commit shows three ways how ZUN stored regular 2D points in TH01:

… yeah. But in more productive news, this did actually lay the groundwork for TH04 and TH05 bullet structures. Which might even be coming up within the next big, 5-push order from Touhou Patch Center? These are the priorities I got from them, let's see how close I can get!

📝 Posted:
🚚 Summary of:
P0057, P0058
1cb9731...ac7540d, ac7540d...fef0299
💰 Funded by:
[Anonymous], -Tom-
🏷 Tags:
rec98+ th04+ th05+ gameplay+ item+ animation+ uth05win+ meta-

So, here we have the first two pushes with an explicit focus on position independence… and they start out looking barely different from regular reverse-engineering? They even already deduplicate a bunch of item-related code, which was simple enough that it required little additional work? Because the actual work, once again, was in comparing uth05win's interpretations and naming choices with the original PC-98 code? So that we only ended up removing a handful of memory references there?

(Oh well, you can mod item drops now!)

So, continuing to interpret PI as a mere by-product of reverse-engineering might ultimately drive up the total PI cost quite a bit. But alright then, let's systematically clear out some false positives by looking at master.lib function calls instead… and suddenly we get the PI progress we were looking for, nicely spread out over all games since TH02. That kinda makes it sound like useless work, only done because it's dictated by some counting algorithm on a website. But decompilation will want to convert all of these values to decimal anyway. We're merely doing that right now, across all games.

Then again, it doesn't actually make any game more position-independent, and only proves how position-independent it already was. So I'm really wondering right now whether I should just rush actual position independence by simply identifying structures and their sizes, and not bother with members or false positives until that's done. That would certainly get the job done for TH04 and TH05 in just a few more pushes, but then leave all the proving work (and the road to 100% PI on the front page) to reverse-engineering.

I don't know. Would it be worth it to have a game that's „maybe fully position-independent“, only for there to maybe be rare edge cases where it isn't?

Or maybe, continuing to strike a balance between identifying false positives (fast) and reverse-engineering structures (slow) will continue to work out like it did now, and make us end up close to the current estimate, which was attractive enough to sell out the crowdfunding for the first time… 🤔

Please give feedback! If possible, by Friday evening UTC+1, before I start working on the next PI push, this time with a focus on TH04.