- 📝 Posted:
- 🚚 Summary of:
- P0160, P0161
- ⌨ Commits:
e491cd7...42ba4a5
,42ba4a5...81dd96e
- 💰 Funded by:
- Yanga, [Anonymous]
- 🏷 Tags:
Nothing really noteworthy in TH01's stage timer code, just yet another HUD
element that is needlessly drawn into VRAM. Sure, ZUN applies his custom
boldfacing effect on top of the glyphs retrieved from font ROM, but he could
have easily installed those modified glyphs as gaiji.
Well, OK, halfwidth gaiji aren't exactly well documented, and sometimes not
even correctly emulated
📝 due to the same PC-98 hardware oddity I was researching last month.
I've reserved two of the pending anonymous "anything" pushes for the
conclusion of this research, just in case you were wondering why the
outstanding workload is now lower after the two delivered here.
And since it doesn't seem to be clearly documented elsewhere: Every 2 ticks on the stage timer correspond to 4 frames.
So, TH01 rank pellet speed. The resident pellet speed value is a
factor ranging from a minimum of -0.375 up to a maximum of 0.5 (pixels per
frame), multiplied with the difficulty-adjusted base speed for each pellet
and added on top of that same speed. This multiplier is modified
- every time the stage timer reaches 0 and HARRY UP is shown (+0.05)
- for every score-based extra life granted below the maximum number of lives (+0.025)
- every time a bomb is used (+0.025)
- on every frame in which the
rand
value (shown in debug mode) is evenly divisible by(1800 - (lives × 200) - (bombs × 50))
(+0.025) - every time Reimu got hit (set to 0 if higher, then -0.05)
- when using a continue (set to -0.05 if higher, then -0.125)
Apparently, ZUN noted that these deltas couldn't be losslessly stored in an
IEEE 754 floating-point variable, and therefore didn't store the pellet
speed factor exactly in a way that would correspond to its gameplay effect.
Instead, it's stored similar to Q12.4 subpixels: as a simple integer,
pre-multiplied by 40. This results in a raw range of -15 to 20, which is
what the undecompiled ASM calls still use. When spawning a new pellet, its
base speed is first multiplied by that factor, and then divided by 40 again.
This is actually quite smart: The calculation doesn't need to be aware of
either Q12.4 or the 40× format, as
((Q12.4 * factor×40) / factor×40)
still comes out as a
Q12.4 subpixel even if all numbers are integers. The only limiting issue
here would be the potential overflow of the 16-bit multiplication at
unadjusted base speeds of more than 50 pixels per frame, but that'd be
seriously unplayable.
So yeah, pellet speed modifications are indeed gradual, and don't just fall
into the coarse three "high, normal, and low" categories.
That's ⅝ of P0160 done, and the continue and pause menus would make good
candidates to fill up the remaining ⅜… except that it seemed impossible to
figure out the correct compiler options for this code?
The issues centered around the two effects of Turbo C++ 4.0J's
-O
switch:
- Optimizing jump instructions: merging duplicate successive jumps into a single one, and merging duplicated instructions at the end of conditional branches into a single place under a single branch, which the other branches then jump to
- Compressing
ADD SP
andPOP CX
stack-clearing instructions after multiple successiveCALL
s to__cdecl
functions into a singleADD SP
with the combined parameter stack size of all function calls
But how can the ASM for these functions exhibit #1 but not #2? How
can it be seemingly optimized and unoptimized at the same time? The
only option that gets somewhat close would be -O- -y
, which
emits line number information into the .OBJ files for debugging. This
combination provides its own kind of #1, but these functions clearly need
the real deal.
The research into this issue ended up consuming a full push on its own. In the end, this solution turned out to be completely unrelated to compiler options, and instead came from the effects of a compiler bug in a totally different place. Initializing a local structure instance or array like
const uint4_t flash_colors[3] = { 3, 4, 5 };
always emits the { 3, 4, 5 }
array into the program's data
segment, and then generates a call to the internal SCOPY@
function which copies this data array to the local variable on the stack.
And as soon as this SCOPY@
call is emitted, the -O
optimization #1 is disabled for the entire rest of the translation
unit?!
So, any code segment with an SCOPY@
call followed by
__cdecl
functions must strictly be decompiled from top to
bottom, mirroring the original layout of translation units. That means no
TH01 continue and pause menus before we haven't decompiled the bomb
animation, which contains such an SCOPY@
call. 😕
Luckily, TH01 is the only game where this bug leads to significant
restrictions in decompilation order, as later games predominantly use the
pascal
calling convention, in which each function itself clears
its stack as part of its RET
instruction.
What now, then? With 51% of REIIDEN.EXE
decompiled, we're
slowly running out of small features that can be decompiled within ⅜ of a
push. Good that I haven't been looking a lot into OP.EXE
and
FUUIN.EXE
, which pretty much only got easy pieces of
code left to do. Maybe I'll end up finishing their decompilations entirely
within these smaller gaps?
I still ended up finding one more small
piece in REIIDEN.EXE
though: The particle system, seen in the
Mima fight.
I like how everything about this animation is contained within a single function that is called once per frame, but ZUN could have really consolidated the spawning code for new particles a bit. In Mima's fight, particles are only spawned from the top and right edges of the screen, but the function in fact contains unused code for all other 7 possible directions, written in quite a bloated manner. This wouldn't feel quite as unused if ZUN had used an angle parameter instead… Also, why unnecessarily waste another 40 bytes of the BSS segment?
But wait, what's going on with the very first spawned particle that just
stops near the bottom edge of the screen in the video above? Well, even in
such a simple and self-contained function, ZUN managed to include an
off-by-one error. This one then results in an out-of-bounds array access on
the 80th frame, where the code attempts to spawn a 41st
particle. If the first particle was unlucky to be both slow enough and
spawned away far enough from the bottom and right edges, the spawning code
will then kill it off before its unblitting code gets to run, leaving its
pixel on the screen until something else overlaps it and causes it to be
unblitted.
Which, during regular gameplay, will quickly happen with the Orb, all the
pellets flying around, and your own player movement. Also, the RNG can
easily spawn this particle at a position and velocity that causes it to
leave the screen more quickly. Kind of impressive how ZUN laid out the
structure
of arrays in a way that ensured practically no effect of this bug on the
game; this glitch could have easily happened every 80 frames instead.
He almost got close to all bugs canceling out each other here!
Next up: The player control functions, including the second-biggest function in all of PC-98 Touhou.