⮜ Blog

📝 Posted:
🚚 Summary of:
P0160, P0161
e491cd7...42ba4a5, 42ba4a5...81dd96e
💰 Funded by:
Yanga, [Anonymous]
🏷 Tags:
rec98 th01 gameplay resident bullet boss mima-th01 animation waste glitch tcc

Nothing really noteworthy in TH01's stage timer code, just yet another HUD element that is needlessly drawn into VRAM. Sure, ZUN applies his custom boldfacing effect on top of the glyphs retrieved from font ROM, but he could have easily installed those modified glyphs as gaiji.
Well, OK, halfwidth gaiji aren't exactly well documented, and sometimes not even correctly emulated 📝 due to the same PC-98 hardware oddity I was researching last month. I've reserved two of the pending anonymous "anything" pushes for the conclusion of this research, just in case you were wondering why the outstanding workload is now lower after the two delivered here.

And since it doesn't seem to be clearly documented elsewhere: Every 2 ticks on the stage timer correspond to 4 frames.

So, TH01 rank pellet speed. The resident pellet speed value is a factor ranging from a minimum of -0.375 up to a maximum of 0.5 (pixels per frame), multiplied with the difficulty-adjusted base speed for each pellet and added on top of that same speed. This multiplier is modified

Apparently, ZUN noted that these deltas couldn't be losslessly stored in an IEEE 754 floating-point variable, and therefore didn't store the pellet speed factor exactly in a way that would correspond to its gameplay effect. Instead, it's stored similar to Q12.4 subpixels: as a simple integer, pre-multiplied by 40. This results in a raw range of -15 to 20, which is what the undecompiled ASM calls still use. When spawning a new pellet, its base speed is first multiplied by that factor, and then divided by 40 again. This is actually quite smart: The calculation doesn't need to be aware of either Q12.4 or the 40× format, as ((Q12.4 * factor×40) / factor×40) still comes out as a Q12.4 subpixel even if all numbers are integers. The only limiting issue here would be the potential overflow of the 16-bit multiplication at unadjusted base speeds of more than 50 pixels per frame, but that'd be seriously unplayable.
So yeah, pellet speed modifications are indeed gradual, and don't just fall into the coarse three "high, normal, and low" categories.

That's ⅝ of P0160 done, and the continue and pause menus would make good candidates to fill up the remaining ⅜… except that it seemed impossible to figure out the correct compiler options for this code?
The issues centered around the two effects of Turbo C++ 4.0J's -O switch:

  1. Optimizing jump instructions: merging duplicate successive jumps into a single one, and merging duplicated instructions at the end of conditional branches into a single place under a single branch, which the other branches then jump to
  2. Compressing ADD SP and POP CX stack-clearing instructions after multiple successive CALLs to __cdecl functions into a single ADD SP with the combined parameter stack size of all function calls

But how can the ASM for these functions exhibit #1 but not #2? How can it be seemingly optimized and unoptimized at the same time? The only option that gets somewhat close would be -O- -y, which emits line number information into the .OBJ files for debugging. This combination provides its own kind of #1, but these functions clearly need the real deal.

The research into this issue ended up consuming a full push on its own. In the end, this solution turned out to be completely unrelated to compiler options, and instead came from the effects of a compiler bug in a totally different place. Initializing a local structure instance or array like

const uint4_t flash_colors[3] = { 3, 4, 5 };

always emits the { 3, 4, 5 } array into the program's data segment, and then generates a call to the internal SCOPY@ function which copies this data array to the local variable on the stack. And as soon as this SCOPY@ call is emitted, the -O optimization #1 is disabled for the entire rest of the translation unit?!
So, any code segment with an SCOPY@ call followed by __cdecl functions must strictly be decompiled from top to bottom, mirroring the original layout of translation units. That means no TH01 continue and pause menus before we haven't decompiled the bomb animation, which contains such an SCOPY@ call. 😕
Luckily, TH01 is the only game where this bug leads to significant restrictions in decompilation order, as later games predominantly use the pascal calling convention, in which each function itself clears its stack as part of its RET instruction.

What now, then? With 51% of REIIDEN.EXE decompiled, we're slowly running out of small features that can be decompiled within ⅜ of a push. Good that I haven't been looking a lot into OP.EXE and FUUIN.EXE, which pretty much only got easy pieces of code left to do. Maybe I'll end up finishing their decompilations entirely within these smaller gaps?
I still ended up finding one more small piece in REIIDEN.EXE though: The particle system, seen in the Mima fight.

I like how everything about this animation is contained within a single function that is called once per frame, but ZUN could have really consolidated the spawning code for new particles a bit. In Mima's fight, particles are only spawned from the top and right edges of the screen, but the function in fact contains unused code for all other 7 possible directions, written in quite a bloated manner. This wouldn't feel quite as unused if ZUN had used an angle parameter instead… :thonk: Also, why unnecessarily waste another 40 bytes of the BSS segment?

But wait, what's going on with the very first spawned particle that just stops near the bottom edge of the screen in the video above? Well, even in such a simple and self-contained function, ZUN managed to include an off-by-one error. This one then results in an out-of-bounds array access on the 80th frame, where the code attempts to spawn a 41st particle. If the first particle was unlucky to be both slow enough and spawned away far enough from the bottom and right edges, the spawning code will then kill it off before its unblitting code gets to run, leaving its pixel on the screen until something else overlaps it and causes it to be unblitted.
Which, during regular gameplay, will quickly happen with the Orb, all the pellets flying around, and your own player movement. Also, the RNG can easily spawn this particle at a position and velocity that causes it to leave the screen more quickly. Kind of impressive how ZUN laid out the structure of arrays in a way that ensured practically no effect of this bug on the game; this glitch could have easily happened every 80 frames instead. He almost got close to all bugs canceling out each other here! :godzun:

Next up: The player control functions, including the second-biggest function in all of PC-98 Touhou.