Last blog post before the 100% completion of TH01! The final parts of
REIIDEN.EXE would feel rather out of place in a celebratory
blog post, after all. They provided quite a neat summary of the typical
technical details that are wrong with this game, and that I now get to
mention for one final time:
The Orb's animation cycle is maybe two frames shorter than it should
have been, showing its last sprite for just 1 frame rather than 3:
The text in the Pause and Continue menus is not quite correctly
centered.
The memory info screen hides quite a bit of information about the .PTN
buffers, and obscures even the info that it does show behind
misleading labels. The most vital information would have been that ZUN could
have easily saved 20% of the memory by using a structure without the
unneeded alpha plane… Oh, and the REWIRTE option
mapped to the ⬇️ down arrow key simply redraws the info screen. Might be
useful after a NODE CHEAK, which replaces the output
with its own, but stays within the same input loop.
But hey, there's an error message if you start REIIDEN.EXE
without a resident MDRV2 or a correctly prepared resident structure! And
even a good, user-friendly one, asking the user to launch the batch file
instead. For some reason, this convenience went out of fashion in the later
games.
The Game Over animation (how fitting) gives us TH01's final piece of weird
sprite blitting code, which seriously manages to include 2 bugs and 3 quirks
in under 50 lines of code. In debug mode, you can trigger this effect by
pressing the ⬇️ down arrow key, which certainly explains why I encountered
seemingly random Game Over events during all the tests I did with this
game…
The animation appears to have changed quite a bit during development, to the
point that probably even ZUN himself didn't know what he wanted it to look
like in the end:
The original version unblits a 32×32 rectangle around Reimu that only
grows on the X axis… for the first 5 frames. The unblitting call is
only run if the corresponding sprite wasn't clipped at the edges of the
playfield in the frame before, and ZUN uses the animation's frame
number rather than the sprite loop variable to index the per-sprite
clip flag array. The resulting out-of-bounds access then reads the
sprite coordinates instead, which are never 0, thus interpreting
all 5 sprites as clipped.
This variant would interpret the declared 5 effect coordinates as
distinct sprites and unblit them correctly every frame. The end result
is rather wimpy though… hardly appropriate for a Game Over, especially
with the original animation in mind.
This variant would not unblit anything, and is probably closest to what
the final animation should have been.
Finally, we get to the big main() function, serving as the duct
tape that holds this game together. It may read rather disorganized with all
the (actually necessary) assignments and function calls, but the only
actual minor issue I've seen there is that you're robbed of any
pellet destroy bonus collected on the final frame of the final boss. There
is a certain charm in directly nesting the infinite main gameplay loop
within the infinite per-life loop within the infinite stage loop. But come
on, why is there no fourth scene loop? Instead, the
game just starts a new REIIDEN.EXE process before and after a
boss fight. With all the wildly mutated global state, that was probably a
much saner choice.
The final secrets can be found in the debug mode stage selection. ZUN
implemented the prompts using the C standard library's scanf()
function, which is the natural choice for quick-and-dirty testing features
like this one. However, the C standard library is also complete and utter
trash, and so it's not surprising that both of the scanf()
calls do… well, probably not what ZUN intended. The guaranteed out-of-bounds
memory access in the select_flag route prompt thankfully has no
real effect on the game, but it gets really interesting with the 面数 stage prompt.
Back in 2020, I already wrote about
📝 stages 21-24, and how they're loaded from actual data that ZUN shipped with the game.
As it now turns out, the code that maps stage IDs to STAGE?.DAT
scene numbers contains an explicit branch that maps any (1-based) stage
number ≥21 to scene 7. Does this mean that an Extra Stage was indeed planned
at some point? That branch seems way too specific to just be meant as a
fallback. Maybe
Asprey was on to something after all…
However, since ZUN passed the stage ID as a signed integer to
scanf(), you can also enter negative numbers. The only place
that kind of accidentally checks for them is the aforementioned stage
ID → scene mapping, which ensures that (1-based) stages < 5 use
the shrine's background image and BGM. With no checks anywhere else, we get
a new set of "glitch stages":
Stage -1Stage -2Stage -3Stage -4Stage -5
The scene loading function takes the entered 0-based stage ID value modulo
5, so these 4 are the only ones that "exist", and lower stage numbers will
simply loop around to them. When loading these stages, the function accesses
the data in REIIDEN.EXE that lies before the statically
allocated 5-element stages-of-scene array, which happens to encompass
Borland C++'s locale and exception handling data, as well as a small bit of
ZUN's global variables. In particular, the obstacle/card HP on the tile I
highlighted in green corresponds to the
lowest byte of the 32-bit RNG seed. If it weren't for that and the fact that
the obstacles/card HP on the few tiles before are similarly controlled by
the x86 segment values of certain initialization function addresses, these
glitch stages would be completely deterministic across PC-98 systems, and
technically canon…
Stage -4 is the only playable one here as it's the only stage to end up
below the
📝 heap corruption limit of 102 stage objects.
Completing it loads Stage -3, which crashes with a Divide Error
just like it does if it's directly selected. Unsurprisingly, this happens
because all 50 card bytes at that memory location are 0, so one division (or
in this case, modulo operation) by the number of cards is enough to crash
the game.
Stage -5 is modulo'd to 0 and thus loads the first regular stage. The only
apparent broken element there is the timer, which is handled by a completely
different function that still operates with a (0-based) stage ID value of
-5. Completing the stage loads Stage -4, which also crashes, but only
because its 61 cards naturally cause the
📝 stack overflow in the flip-in animation for any stage with more than 50 cards.
And that's REIIDEN.EXE, the biggest and most bloated PC-98
Touhou executable, fully decompiled! Next up: Finishing this game with the
main menu, and hoping I'll actually pull it off within 24 hours. (If I do,
we might all have to thank 32th
System, who independently decompiled half of the remaining 14
functions…)
Wow, it's been 3 days and I'm already back with an unexpectedly long post
about TH01's bonus point screens? 3 days used to take much longer in my
previous projects…
Before I talk about graphics for the rest of this post, let's start with the
exact calculations for both bonuses. Touhou Wiki already got these right,
but it still makes sense to provide them here, in a format that allows you
to cross-reference them with the source code more easily. For the
card-flipping stage bonus:
Time
min((Stage timer * 3), 6553)
Continuous
min((Highest card combo * 100), 6553)
Bomb&Player
min(((Lives * 200) + (Bombs * 100)), 6553)
STAGE
min(((Stage number - 1) * 200), 6553)
BONUS Point
Sum of all above values * 10
The boss stage bonus is calculated from the exact same metrics, despite half
of them being labeled differently. The only actual differences are in the
higher multipliers and in the cap for the stage number bonus. Why remove it
if raising it high enough also effectively disables it?
Time
min((Stage timer * 5), 6553)
Continuous
min((Highest card combo * 200), 6553)
MIKOsan
min(((Lives * 500) + (Bombs * 200)), 6553)
Clear
min((Stage number * 1000), 65530)
TOTLE
Sum of all above values * 10
The transition between the gameplay and TOTLE screens is one of the more
impressive effects showcased in this game, especially due to how wavy it
often tends to look. Aside from the palette interpolation (which is, by the
way, the first time ZUN wrote a correct interpolation algorithm between two
4-bit palettes), the core of the effect is quite simple. With the TOTLE
image blitted to VRAM page 1:
Shift the contents of a line on VRAM page 0 by 32 pixels, alternating
the shift direction between right edge → left edge (even Y
values) and the other way round (odd Y values)
Keep a cursor for the destination pixels on VRAM page 1 for every line,
starting at the respective opposite edge
Blit the 32 pixels at the VRAM page 1 cursor to the newly freed 32
pixels on VRAM page 0, and advance the cursor towards the other edge
Successive line shifts will then include these newly blitted 32 pixels
as well
Repeat (640 / 32) = 20 times, after which all new pixels
will be in their intended place
So it's really more like two interlaced shift effects with opposite
directions, starting on different scanlines. No trigonometry involved at
all.
Horizontally scrolling pixels on a single VRAM page remains one of the few
📝 appropriate uses of the EGC in a fullscreen 640×400 PC-98 game,
regardless of the copied block size. The few inter-page copies in this
effect are also reasonable: With 8 new lines starting on each effect frame,
up to (8 × 20) = 160 lines are transferred at any given time, resulting
in a maximum of (160 × 2 × 2) = 640 VRAM page switches per frame for the newly
transferred pixels. Not that frame rate matters in this situation to begin
with though, as the game is doing nothing else while playing this effect.
What does sort of matter: Why 32 pixels every 2 frames, instead of 16
pixels on every frame? There's no performance difference between doing one
half of the work in one frame, or two halves of the work in two frames. It's
not like the overhead of another loop has a serious impact here,
especially with the PC-98 VRAM being said to have rather high
latencies. 32 pixels over 2 frames is also harder to code, so ZUN
must have done it on purpose. Guess he really wanted to go for that 📽
cinematic 30 FPS look 📽 here…
Removing the palette interpolation and transitioning from a black screen
to CLEAR3.GRP makes it a lot clearer how the effect works.
Once all the metrics have been calculated, ZUN animates each value with a
rather fancy left-to-right typing effect. As 16×16 images that use a single
bright-red color, these numbers would be
perfect candidates for gaiji… except that ZUN wanted to render them at the
more natural Y positions of the labels inside CLEAR3.GRP that
are far from aligned to the 8×16 text RAM grid. Not having been in the mood
for hardcoding another set of monochrome sprites as C arrays that day, ZUN
made the still reasonable choice of storing the image data for these numbers
in the single-color .GRC form– yeah, no, of course he once again
chose the .PTN hammer, and its
📝 16×16 "quarter" wrapper functions around nominal 32×32 sprites.
The three 32×32 TOTLE metric digit sprites inside
NUMB.PTN.
Why do I bring up such a detail? What's actually going on there is that ZUN
loops through and blits each digit from 0 to 9, and then continues the loop
with "digit" numbers from 10 to 19, stopping before the number whose ones
digit equals the one that should stay on screen. No problem with that in
theory, and the .PTN sprite selection is correct… but the .PTN
quarter selection isn't, as ZUN wrote (digit % 4)
instead of the correct ((digit % 10) % 4).
Since .PTN quarters are indexed in a row-major
way, the 10-19 part of the loop thus ends up blitting
2 →
3 →
0 →
1 →
6 →
7 →
4 →
5 →
(nothing):
This footage was slowed down to show one sprite blitting operation per
frame. The actual game waits a hardcoded 4 milliseconds between each
sprite, so even theoretically, you would only see roughly every
4th digit. And yes, we can also observe the empty quarter
here, only blitted if one of the digits is a 9.
Seriously though? If the deadline is looming and you've got to rush
some part of your game, a standalone screen that doesn't affect
anything is the best place to pick. At 4 milliseconds per digit, the
animation goes by so fast that this quirk might even add to its
perceived fanciness. It's exactly the reason why I've always been rather
careful with labeling such quirks as "bugs". And in the end, the code does
perform one more blitting call after the loop to make sure that the correct
digit remains on screen.
The remaining ¾ of the second push went towards transferring the final data
definitions from ASM to C land. Most of the details there paint a rather
depressing picture about ZUN's original code layout and the bloat that came
with it, but it did end on a real highlight. There was some unused data
between ZUN's non-master.lib VSync and text RAM code that I just moved away
in September 2015 without taking a closer look at it. Those bytes kind of
look like another hardcoded 1bpp image though… wait, what?!
Lovely! With no mouse-related code left in the game otherwise, this cursor
sprite provides some great fuel for wild fan theories about TH01's
development history:
Could ZUN have 📝 stolen the basic PC-98
VSync or text RAM function code from a source that also implemented mouse
support?
Or was this game actually meant to have mouse-controllable portions at
some point during development? Even if it would have just been the
menus.
… Actually, you know what, with all shared data moved to C land, I might as
well finish FUUIN.EXE right now. The last secret hidden in its
main() function: Just like GAME.BAT supports
launching the game in a debug mode from the DOS command line,
FUUIN.EXE can directly launch one of the game's endings. As
long as the MDRV2 driver is installed, you can enter
fuuin t1 for the 魔界/Makai Good Ending, or
fuuin t for 地獄/Jigoku Good Ending.
Unfortunately, the command-line parameter can only control the route.
Choosing between a Good or Bad Ending is still done exclusively through
TH01's resident structure, and the continues_per_scene array in
particular. But if you pre-allocate that structure somehow and set one of
the members to a nonzero value, it would work. Trainers, anyone?
Alright, gotta get back to the code if I want to have any chance of
finishing this game before the 15th… Next up: The final 17
functions in REIIDEN.EXE that tie everything together and add
some more debug features on top.
With Elis, we've not only reached the midway point in TH01's boss code, but
also a bunch of other milestones: Both REIIDEN.EXE and TH01 as
a whole have crossed the 75% RE mark, and overall position independence has
also finally cracked 80%!
And it got done in 4 pushes again? Yup, we're back to
📝 Konngara levels of redundancy and
copy-pasta. This time, it didn't even stop at the big copy-pasted code
blocks for the rift sprite and 256-pixel circle animations, with the words
"redundant" and "unnecessary" ending up a total of 18 times in my source
code comments.
But damn is this fight broken. As usual with TH01 bosses, let's start with a
high-level overview:
The Elis fight consists of 5 phases (excluding the entrance animation),
which must be completed in order.
In all odd-numbered phases, Elis uses a random one-shot danmaku pattern
from an exclusive per-phase pool before teleporting to a random
position.
There are 3 exclusive girl-form patterns per phase, plus 4
additional bat-form patterns in phase 5, for a total of 13.
Due to a quirk in the selection algorithm in phases 1 and 3, there
is a 25% chance of Elis skipping an attack cycle and just teleporting
again.
In contrast to Konngara, Elis can freely select the same pattern
multiple times in a row. There's nothing in the code to prevent that
from happening.
This pattern+teleport cycle is repeated until Elis' HP reach a certain
threshold value. The odd-numbered phases correspond to the white (phase 1),
red-white (phase 3), and red (phase 5) sections of the health bar. However,
the next phase can only start at the end of each cycle, after a
teleport.
Phase 2 simply teleports Elis back to her starting screen position of
(320, 144) and then advances to phase 3.
Phase 4 does the same as phase 2, but adds the initial bat form
transformation before advancing to phase 5.
Phase 5 replaces the teleport with a transformation to the bat form.
Rather than teleporting instantly to the target position, the bat gradually
flies there, firing a randomly selected looping pattern from the 4-pattern
bat pool on the way, before transforming back to the girl form.
This puts the earliest possible end of the fight at the first frame of phase
5. However, nothing prevents Elis' HP from reaching 0 before that point. You
can nicely see this in 📝 debug mode: Wait
until the HP bar has filled up to avoid heap corruption, hold ↵ Return
to reduce her HP to 0, and watch how Elis still goes through a total of
two patterns* and four
teleport animations before accepting defeat.
But wait, heap corruption? Yup, there's a bug in the HP bar that already
affected Konngara as well, and it isn't even just about the graphical
glitches generated by negative HP:
The initial fill-up animation is drawn to both VRAM pages at a rate of 1
HP per frame… by passing the current frame number as the
current_hp number.
The target_hp is indicated by simply passing the current
HP…
… which, however, can be reduced in debug mode at an equal rate of up to
1 HP per frame.
The completion condition only checks if
((target_hp - 1) == current_hp). With the
right timing, both numbers can run past each other though.
In that case, the function is repeatedly called on every frame, backing
up the original VRAM contents for the current HP point before blitting
it…
… until frame ((96 / 2) + 1), where the
.PTN slot pointer overflows the heap buffer and overwrites whatever comes
after. 📝 Sounds familiar, right?
Since Elis starts with 14 HP, which is an even number, this corruption is
trivial to cause: Simply hold ↵ Return from the beginning of the
fight, and the completion condition will never be true, as the
HP and frame numbers run past the off-by-one meeting point.
Regular gameplay, however, entirely prevents this due to the fixed start
positions of Reimu and the Orb, the Orb's fixed initial trajectory, and the
50 frames of delay until a bomb deals damage to a boss. These aspects make
it impossible to hit Elis within the first 14 frames of phase 1, and ensure
that her HP bar is always filled up completely. So ultimately, this bug ends
up comparable in seriousness to the
📝 recursion / stack overflow bug in the memory info screen.
These wavy teleport animations point to a quite frustrating architectural
issue in this fight. It's not even the fact that unblitting the yellow star
sprites rips temporary holes into Elis' sprite; that's almost expected from
TH01 at this point. Instead, it's all because of this unused frame of the
animation:
With this sprite still being part of BOSS5.BOS, Girl-Elis has a
total of 9 animation frames, 1 more than the
📝 8 per-entity sprites allowed by ZUN's architecture.
The quick and easy solution would have been to simply bump the sprite array
size by 1, but… nah, this would have added another 20 bytes to all 6 of the
.BOS image slots. Instead, ZUN wrote the manual
position synchronization code I mentioned in that 2020 blog post.
Ironically, he then copy-pasted this snippet of code often enough that it
ended up taking up more than 120 bytes in the Elis fight alone – with, you
guessed it, some of those copies being redundant. Not to mention that just
going from 8 to 9 sprites would have allowed ZUN to go down from 6 .BOS
image slots to 3. That would have actually saved 420 bytes in
addition to the manual synchronization trouble. Looking forward to SinGyoku,
that's going to be fun again…
As for the fight itself, it doesn't take long until we reach its most janky
danmaku pattern, right in phase 1:
The "pellets along circle" pattern on Lunatic, in its original version
and with fanfiction fixes for everything that can potentially be
interpreted as a bug.
For whatever reason, the lower-right quarter of the circle isn't
animated? This animation works by only drawing the new dots added with every
subsequent animation frame, expressed as a tiny arc of a dotted circle. This
arc starts at the animation's current 8-bit angle and ends on the sum of
that angle and a hardcoded constant. In every other (copy-pasted, and
correct) instance of this animation, ZUN uses 0x02 as the
constant, but this one uses… 0.05 for the lower-right quarter?
As in, a 64-bit double constant that truncates to 0 when added
to an 8-bit integer, thus leading to the start and end angles being
identical and the game not drawing anything.
On Easy and Normal, the pattern then spawns 32 bullets along the outline
of the circle, no problem there. On Lunatic though, every one of these
bullets is instead turned into a narrow-angled 5-spread, resulting in 160
pellets… in a game with a pellet cap of 100.
Now, if Elis teleported herself to a position near the top of the playfield,
most of the capped pellets would have been clipped at that top edge anyway,
since the bullets are spawned in clockwise order starting at Elis' right
side with an angle of 0x00. On lower positions though, you can
definitely see a difference if the cap were high enough to allow all coded
pellets to actually be spawned.
The Hard version gets dangerously close to the cap by spawning a total of 96
pellets. Since this is the only pattern in phase 1 that fires pellets
though, you are guaranteed to see all of the unclipped ones.
The pellets also aren't spawned exactly on the telegraphed circle, but 4 pixels to the left.
Then again, it might very well be that all of this was intended, or, most
likely, just left in the game as a happy accident. The latter interpretation
would explain why ZUN didn't just delete the rendering calls for the
lower-right quarter of the circle, because seriously, how would you not spot
that? The phase 3 patterns continue with more minor graphical glitches that
aren't even worth talking about anymore.
And then Elis transforms into her bat form at the beginning of Phase 5,
which displays some rather unique hitboxes. The one against the Orb is fine,
but the one against player shots…
… uses the bat's X coordinate for both X and Y dimensions.
In regular gameplay, it's not too bad as most
of the bat patterns fire aimed pellets which typically don't allow you to
move below her sprite to begin with. But if you ever tried destroying these
pellets while standing near the middle of the playfield, now you know why
that didn't work. This video also nicely points out how the bat, like any
boss sprite, is only ever blitted at positions on the 8×1-pixel VRAM byte
grid, while collision detection uses the actual pixel position.
The bat form patterns are all relatively simple, with little variation
depending on the difficulty level, except for the "slow pellet spreads"
pattern. This one is almost easiest to dodge on Lunatic, where the 5-spreads
are not only always fired downwards, but also at the hardcoded narrow delta
angle, leaving plenty of room for the player to move out of the way:
The "slow pellet spreads" pattern of Elis' bat form, on every
difficulty. Which version do you think is the easiest one?
Finally, we've got another potential timesave in the girl form's "safety
circle" pattern:
After the circle spawned completely, you lose a life by moving outside it,
but doing that immediately advances the pattern past the circle part. This
part takes 200 frames, but the defeat animation only takes 82 frames, so
you can save up to 118 frames there.
Final funny tidbit: As with all dynamic entities, this circle is only
blitted to VRAM page 0 to allow easy unblitting. However, it's also kind of
static, and there needs to be some way to keep the Orb, the player shots,
and the pellets from ripping holes into it. So, ZUN just re-blits the circle
every… 4 frames?! 🤪 The same is true for the Star of David and its
surrounding circle, but there you at least get a flash animation to justify
it. All the overlap is actually quite a good reason for not even attempting
to 📝 mess with the hardware color palette instead.
Reproducing the crash was the whole challenge here. Even after moving Elis
and Reimu to the exact positions seen in Pearl's video and setting Elis' HP
to 0 on the exact same frame, everything ran fine for me. It's definitely no
division by 0 this time, the function perfectly guards against that
possibility. The line specified in the function's parameters is always
clipped to the VRAM region as well, so we can also rule out illegal memory
accesses here…
… or can we? Stepping through it all reminded me of how this function brings
unblitting sloppiness to the next level: For each VRAM byte touched, ZUN
actually unblits the 4 surrounding bytes, adding one byte to the left
and two bytes to the right, and using a single 32-bit read and write per
bitplane. So what happens if the function tries to unblit the topmost byte
of VRAM, covering the pixel positions from (0, 0) to (7, 0)
inclusive? The VRAM offset of 0x0000 is decremented to
0xFFFF to cover the one byte to the left, 4 bytes are written
to this address, the CPU's internal offset overflows… and as it turns out,
that is illegal even in Real Mode as of the 80286, and will raise a General Protection
Fault. Which is… ignored by DOSBox-X,
every Neko Project II version in common use, the CSCP
emulators, SL9821, and T98-Next. Only Anex86 accurately emulates the
behavior of real hardware here.
OK, but no laser fired by Elis ever reaches the top-left corner of the
screen. How can such a fault even happen in practice? That's where the
broken laser reset+unblit function comes in: Not only does it just flat out pass the wrong
parameters to the line unblitting function – describing the line
already traveled by the laser and stopping where the laser begins –
but it also passes them
wrongly, in the form of raw 32-bit fixed-point Q24.8 values, with no
conversion other than a truncation to the signed 16-bit pixels expected by
the function. What then follows is an attempt at interpolation and clipping
to find a line segment between those garbage coordinates that actually falls
within the boundaries of VRAM:
right/bottom correspond to a laser's origin position, and
left/top to the leftmost pixel of its moved-out top line. The
bug therefore only occurs with lasers that stopped growing and have started
moving.
Moreover, it will only happen if either (left % 256) or
(right % 256) is ≤ 127 and the other one of the two is ≥ 128.
The typecast to signed 16-bit integers then turns the former into a large
positive value and the latter into a large negative value, triggering the
function's clipping code.
The function then follows Bresenham's
algorithm: left is ensured to be smaller than right
by swapping the two values if necessary. If that happened, top
and bottom are also swapped, regardless of their value – the
algorithm does not care about their order.
The slope in the X dimension is calculated using an integer division of
((bottom - top) /
(right - left)). Both subtractions are done on signed
16-bit integers, and overflow accordingly.
(-left × slope_x) is added to top,
and left is set to 0.
If both top and bottom are < 0 or
≥ 640, there's nothing to be unblitted. Otherwise, the final
coordinates are clipped to the VRAM range of [(0, 0),
(639, 399)].
If the function got this far, the line to be unblitted is now very
likely to reach from
the top-left to the bottom-right corner, starting out at
(0, 0) right away, or
from the bottom-left corner to the top-right corner. In this case,
you'd expect unblitting to end at (639, 0), but thanks to an
off-by-one error,
it actually ends at (640, -1), which is equivalent to
(0, 0). Why add clipping to VRAM offset calculations when
everything else is clipped already, right?
Possible laser states that will cause the fault, with some debug
output to help understand the cause, and any pellets removed for better
readability. This can happen for all bosses that can potentially have
shootout lasers on screen when being defeated, so it also applies to Mima.
Fixing this is easier than understanding why it happens, but since y'all
love reading this stuff…
tl;dr: TH01 has a high chance of freezing at a boss defeat sequence if there
are diagonally moving lasers on screen, and if your PC-98 system
raises a General Protection Fault on a 4-byte write to offset
0xFFFF, and if you don't run a TSR with an INT
0Dh handler that might handle this fault differently.
The easiest fix option would be to just remove the attempted laser
unblitting entirely, but that would also have an impact on this game's…
distinctive visual glitches, in addition to touching a whole lot of
code bytes. If I ever get funded to work on a hypothetical TH01 Anniversary
Edition that completely rearchitects the game to fix all these glitches, it
would be appropriate there, but not for something that purports to be the
original game.
(Sidenote to further hype up this Anniversary Edition idea for PC-98
hardware owners: With the amount of performance left on the table at every
corner of this game, I'm pretty confident that we can get it to work
decently on PC-98 models with just an 80286 CPU.)
Since we're in critical infrastructure territory once again, I went for the
most conservative fix with the least impact on the binary: Simply changing
any VRAM offsets >= 0xFFFD to 0x0000 to avoid
the GPF, and leaving all other bugs in place. Sure, it's rather lazy and
"incorrect"; the function still unblits a 32-pixel block there, but adding a
special case for blitting 24 pixels would add way too much code. And
seriously, it's not like anything happens in the 8 pixels between
(24, 0) and (31, 0) inclusive during gameplay to begin with.
To balance out the additional per-row if() branch, I inlined
the VRAM page change I/O, saving two function calls and one memory write per
unblitted row.
That means it's time for a new community_choice_fixes
build, containing the new definitive bugfixed versions of these games:
2022-05-31-community-choice-fixes.zip
Check the th01_critical_fixes
branch for the modified TH01 code. It also contains a fix for the HP bar
heap corruption in debug mode – simply changing the ==
comparison to <= is enough to avoid it, and negative HP will
still create aesthetic glitch art.
Once again, I then was left with ½ of a push, which I finally filled with
some FUUIN.EXE code, specifically the verdict screen. The most
interesting part here is the player title calculation, which is quite
sneaky: There are only 6 skill levels, but three groups of
titles for each level, and the title you'll see is picked from a random
group. It looks like this is the first time anyone has documented the
calculation?
As for the levels, ZUN definitely didn't expect players to do particularly
well. With a 1cc being the standard goal for completing a Touhou game, it's
especially funny how TH01 expects you to continue a lot: The code has
branches for up to 21 continues, and the on-screen table explicitly leaves
room for 3 digits worth of continues per 5-stage scene. Heck, these
counts are even stored in 32-bit long variables.
Next up: 📝 Finally finishing the long
overdue Touhou Patch Center MediaWiki update work, while continuing with
Kikuri in the meantime. Originally I wasn't sure about what to do between
Elis and Seihou,
but with Ember2528's surprise
contribution last week, y'all have
demonstrated more than enough interest in the idea of getting TH01 done
sooner rather than later. And I agree – after all, we've got the 25th
anniversary of its first public release coming up on August 15, and I might
still manage to completely decompile this game by that point…
Did you know that moving on top of a boss sprite doesn't kill the player in
TH04, only in TH05?
Yup, Reimu is not getting hit… yet.
That's the first of only three interesting discoveries in these 3 pushes,
all of which concern TH04. But yeah, 3 for something as seemingly simple as
these shared boss functions… that's still not quite the speed-up I had hoped
for. While most of this can be blamed, again, on TH04 and all of its
hardcoded complexities, there still was a lot of work to be done on the
maintenance front as well. These functions reference a bunch of code I RE'd
years ago and that still had to be brought up to current standards, with the
dependencies reaching from 📝 boss explosions
over 📝 text RAM overlay functionality up to
in-game dialog loading.
The latter provides a good opportunity to talk a bit about x86 memory
segmentation. Many aspiring PC-98 developers these days are very scared
of it, with some even going as far as to rather mess with Protected Mode and
DOS extenders just so that they don't have to deal with it. I wonder where
that fear comes from… Could it be because every modern programming language
I know of assumes memory to be flat, and lacks any standard language-level
features to even express something like segments and offsets? That's why
compilers have a hard time targeting 16-bit x86 these days: Doing anything
interesting on the architecture requires giving the programmer full
control over segmentation, which always comes down to adding the
typical non-standard language extensions of compilers from back in the day.
And as soon as DOS stopped being used, these extensions no longer made sense
and were subsequently removed from newer tools. A good example for this can
be found in an old version of the
NASM manual: The project started as an attempt to make x86 assemblers
simple again by throwing out most of the segmentation features from
MASM-style assemblers, which made complete sense in 1996 when 16-bit DOS and
Windows were already on their way out. But there was a point to all
those features, and that's why ReC98 still has to use the supposedly
inferior TASM.
Not that this fear of segmentation is completely unfounded: All the
segmentation-related keywords, directives, and #pragmas
provided by Borland C++ and TASM absolutely can be the cause of many
weird runtime bugs. Even if the compiler or linker catches them, you are
often left with confusing error messages that aged just as poorly as memory
segmentation itself.
However, embracing the concept does provide quite the opportunity for
optimizations. While it definitely was a very crazy idea, there is a small
bit of brilliance to be gained from making proper use of all these
segmentation features. Case in point: The buffer for the in-game dialog
scripts in TH04 and TH05.
// Thanks to the semantics of `far` pointers, we only need a single 32-bit
// pointer variable for the following code.
extern unsigned char far *dialog_p;
// This master.lib function returns a `void __seg *`, which is a 16-bit
// segment-only pointer. Converting to a `far *` yields a full segment:offset
// pointer to offset 0000h of that segment.
dialog_p = (unsigned char far *)hmem_allocbyte(/* … */);
// Running the dialog script involves pointer arithmetic. On a far pointer,
// this only affects the 16-bit offset part, complete with overflow at 64 KiB,
// from FFFFh back to 0000h.
dialog_p += /* … */;
dialog_p += /* … */;
dialog_p += /* … */;
// Since the segment part of the pointer is still identical to the one we
// allocated above, we can later correctly free the buffer by pulling the
// segment back out of the pointer.
hmem_free((void __seg *)dialog_p);
If dialog_p was a huge pointer, any pointer
arithmetic would have also adjusted the segment part, requiring a second
pointer to store the base address for the hmem_free call. Doing
that will also be necessary for any port to a flat memory model. Depending
on how you look at it, this compression of two logical pointers into a
single variable is either quite nice, or really, really dumb in its
reliance on the precise memory model of one single architecture.
Why look at dialog loading though, wasn't this supposed to be all about
shared boss functions? Well, TH04 unnecessarily puts certain stage-specific
code into the boss defeat function, such as loading the alternate Stage 5
Yuuka defeat dialog before a Bad Ending, or initializing Gengetsu after
Mugetsu's defeat in the Extra Stage.
That's TH04's second core function with an explicit conditional branch for
Gengetsu, after the
📝 dialog exit code we found last year during EMS research.
And I've heard people say that Shinki was the most hardcoded fight in PC-98
Touhou… Really, Shinki is a perfectly regular boss, who makes proper use of
all internal mechanics in the way they were intended, and doesn't blast
holes into the architecture of the game. Even within TH05, it's Mai and Yuki
who rely on hacks and duplicated code, not Shinki.
The worst part about this though? How the function distinguishes Mugetsu
from Gengetsu. Once again, it uses its own global variable to track whether
it is called the first or the second time within TH04's Extra Stage,
unrelated to the same variable used in the dialog exit function. But this
time, it's not just any newly created, single-use variable, oh no. In a
misguided attempt to micro-optimize away a few bytes of conventional memory,
TH04 reserves 16 bytes of "generic boss state", which can (and are) freely
used for anything a boss doesn't want to store in a more dedicated
variable.
It might have been worth it if the bosses actually used most of these
16 bytes, but the majority just use (the same) two, with only Stage 4 Reimu
using a whopping seven different ones. To reverse-engineer the various uses
of these variables, I pretty much had to map out which of the undecompiled
danmaku-pattern functions corresponds to which boss
fight. In the end, I assigned 29 different variable names for each of the
semantically different use cases, which made up another full push on its
own.
Now, 16 bytes of wildly shared state, isn't that the perfect recipe for
bugs? At least during this cursory look, I haven't found any obvious ones
yet. If they do exist, it's more likely that they involve reused state from
earlier bosses – just how the Shinki death glitch in
TH05 is caused by reusing cheeto data from way back in Stage 4 – and
hence require much more boss-specific progress.
And yes, it might have been way too early to look into all these tiny
details of specific boss scripts… but then, this happened:
Looks similar to another
screenshot of a crash in the same fight that was reported in December,
doesn't it? I was too much in a hurry to figure it out exactly, but notice
how both crashes happen right as the last of Marisa's four bits is destroyed.
KirbyComment has suspected
this to be the cause for a while, and now I can pretty much confirm it
to be an unguarded division by the number of on-screen bits in
Marisa-specific pattern code. But what's the cause for Kurumi then?
As for fixing it, I can go for either a fast or a slow option:
Superficially fixing only this crash will probably just take a fraction
of a push.
But I could also go for a deeper understanding by looking at TH04's
version of the 📝 custom entity structure. It
not only stores the data of Marisa's bits, but is also very likely to be
involved in Kurumi's crash, and would get TH04 a lot closer to 100%
PI. Taking that look will probably need at least 2 pushes, and might require
another 3-4 to completely decompile Marisa's fight, and 2-3 to decompile
Kurumi's.
OK, now that that's out of the way, time to finish the boss defeat function…
but not without stumbling over the third of TH04's quirks, relating to the
Clear Bonus for the main game or the Extra Stage:
To achieve the incremental addition effect for the in-game score display
in the HUD, all new points are first added to a score_delta
variable, which is then added to the actual score at a maximum rate of
61,110 points per frame.
There are a fixed 416 frames between showing the score tally and
launching into MAINE.EXE.
As a result, TH04's Clear Bonus is effectively limited to
(416 × 61,110) = 25,421,760 points.
Only TH05 makes sure to commit the entirety of the
score_delta to the actual score before switching binaries,
which fixes this issue.
And after another few collision-related functions, we're now truly,
finally ready to decompile bosses in both TH04 and TH05! Just as the
anything funds were running out… The
remaining ¼ of the third push then went to Shinki's 32×32 ball bullets,
rounding out this delivery with a small self-contained piece of the first
TH05 boss we're probably going to look at.
Next up, though: I'm not sure, actually. Both Shinki and Elis seem just a
little bit larger than the 2¼ or 4 pushes purchased so far, respectively.
Now that there's a bunch of room left in the cap again, I'll just let the
next contribution decide – with a preference for Shinki in case of a tie.
And if it will take longer than usual for the store to sell out again this
time (heh), there's still the
📝 PC-98 text RAM JIS trail word rendering research
waiting to be documented.
TH03 finally passed 20% RE, and the newly decompiled code contains no
serious ZUN bugs! What a nice way to end the year.
There's only a single unlockable feature in TH03: Chiyuri and Yumemi as
playable characters, unlocked after a 1CC on any difficulty. Just like the
Extra Stages in TH04 and TH05, YUME.NEM contains a single
designated variable for this unlocked feature, making it trivial to craft a
fully unlocked score file without recording any high scores that others
would have to compete against. So, we can now put together a complete set
for all PC-98 Touhou games: 2021-12-27-Fully-unlocked-clean-score-files.zip
It would have been cool to set the randomly generated encryption keys in
these files to a fixed value so that they cancel out and end up not actually
encrypting the file. Too bad that TH03 also started feeding each encrypted
byte back into its stream cipher, which makes this impossible.
The main loading and saving code turned out to be the second-cleanest
implementation of a score file format in PC-98 Touhou, just behind TH02.
Only two of the YUME.NEM functions come with nonsensical
differences between OP.EXE and MAINL.EXE, rather
than 📝 all of them, as in TH01 or
📝 too many of them, as in TH04 and TH05. As
for the rest of the per-difficulty structure though… well, it quickly
becomes clear why this was the final score file format to be RE'd. The name,
score, and stage fields are directly stored in terms of the internal
REGI*.BFT sprite IDs used on the high score screen. TH03 also
stores 10 score digits for each place rather than the 9 possible ones, keeps
any leading 0 digits, and stores the letters of entered names in reverse
order… yeah, let's decompile the high score screen as well, for a full
understanding of why ZUN might have done all that. (Answer: For no reason at
all. )
And wow, what a breath of fresh air. It's surely not
good-code: The overlapping shadows resulting from using
a 24-pixel letterspacing with 32-pixel glyphs in the name column led ZUN to
do quite a lot of unnecessary and slightly confusing rendering work when
moving the cursor back and forth, and he even forgot about the EGC there.
But it's nowhere close to the level of jank we saw in
📝 TH01's high score menu last year. Good to
see that ZUN had learned a thing or two by his third game – especially when
it comes to storing the character map cursor in terms of a character ID,
and improving the layout of the character map:
That's almost a nicely regular grid there. With the question mark and the
double-wide SP, BS, and END options, the cursor
movement code only comes with a reasonable two exceptions, which are easily
handled. And while I didn't get this screen completely decompiled,
one additional push was enough to cover all important code there.
The only potential glitch on this screen is a result of ZUN's continued use
of binary-coded
decimal digits without any bounds check or cap. Like the in-game HUD
score display in TH04 and TH05, TH03's high score screen simply uses the
next glyph in the character set for the most significant digit of any score
above 1,000,000,000 points – in this case, the period. Still, it only
really gets bad at 8,000,000,000 points: Once the glyphs are
exhausted, the blitting function ends up accessing garbage data and filling
the entire screen with garbage pixels. For comparison though, the current world record
is 133,650,710 points, so good luck getting 8 billion in the first
place.
Next up: Starting 2022 with the long-awaited decompilation of TH01's Sariel
fight! Due to the 📝 recent price increase,
we now got a window in the cap that
is going to remain open until tomorrow, providing an early opportunity to
set a new priority after Sariel is done.
Of course, Sariel's potentially bloated and copy-pasted code is blocked by
even more definitely bloated and copy-pasted code. It's TH01, what did you
expect?
But even then, TH01's item code is on a new level of software architecture
ridiculousness. First, ZUN uses distinct arrays for both types of items,
with their own caps of 4 for bomb items, and 10 for point items. Since that
obviously makes any type-related switch statement redundant,
he also used distinct functions for both types, with copy-pasted
boilerplate code. The main per-item update and render function is
shared though… and takes every single accessed member of the item
structure as its own reference parameter. Like, why, you have a
structure, right there?! That's one way to really practice the C++ language
concept of passing arbitrary structure fields by mutable reference…
To complete the unwarranted grand generic design of this function, it calls
back into per-type collision detection, drop, and collect functions with
another three reference parameters. Yeah, why use C++ virtual methods when
you can also implement the effectively same polymorphism functionality by
hand? Oh, and the coordinate clamping code in one of these callbacks could
only possibly have come from nested min() and
max() preprocessor macros. And that's how you extend such
dead-simple functionality to 1¼ pushes…
Amidst all this jank, we've at least got a sensible item↔player hitbox this
time, with 24 pixels around Reimu's center point to the left and right, and
extending from 24 pixels above Reimu down to the bottom of the playfield.
It absolutely didn't look like that from the initial naive decompilation
though. Changing entity coordinates from left/top to center was one of the
better lessons from TH01 that ZUN implemented in later games, it really
makes collision detection code much more intuitive to grasp.
The card flip code is where we find out some slightly more interesting
aspects about item drops in this game, and how they're controlled by a
hidden cycle variable:
At the beginning of every 5-stage scene, this variable is set to a
random value in the [0..59] range
Point items are dropped at every multiple of 10
Every card flip adds 1 to its value after this mod 10
check
At a value of 140, the point item is replaced with a bomb item, but only
if no damaging bomb is active. In any case, its value is then reset to
1.
Then again, score players largely ignore point items anyway, as card
combos simply have a much bigger effect on the score. With this, I should
have RE'd all information necessary to construct a tool-assisted score run,
though? Edit: Turns out that 1) point items are becoming
increasingly important in score runs, and 2) Pearl already did a TAS some
months ago. Thanks to
spaztron64 for the info!
The Orb↔card hitbox also makes perfect sense, with 24 pixels around
the center point of a card in every direction.
The rest of the code confirms the
card
flip score formula documented on Touhou Wiki, as well as the way cards
are flipped by bombs: During every of the 90 "damaging" frames of the
140-frame bomb animation, there is a 75% chance to flip the card at the
[bomb_frame % total_card_count_in_stage] array index. Since
stages can only have up to 50 cards
📝 thanks to a bug, even a 75% chance is high
enough to typically flip most cards during a bomb. Each of these flips
still only removes a single card HP, just like after a regular collision
with the Orb.
Also, why are the card score popups rendered before the cards
themselves? That's two needless frames of flicker during that 25-frame
animation. Not all too noticeable, but still.
And that's over 50% of REIIDEN.EXE decompiled as well! Next
up: More HUD update and rendering code… with a direct dependency on
rank pellet speed modifications?
…or maybe not that soon, as it would have only wasted time to
untangle the bullet update commits from the rest of the progress. So,
here's all the bullet spawning code in TH04 and TH05 instead. I hope
you're ready for this, there's a lot to talk about!
(For the sake of readability, "bullets" in this blog post refers to the
white 8×8 pellets
and all 16×16 bullets loaded from MIKO16.BFT, nothing else.)
But first, what was going on📝 in 2020? Spent 4 pushes on the basic types
and constants back then, still ended up confusing a couple of things, and
even getting some wrong. Like how TH05's "bullet slowdown" flag actually
always prevents slowdown and fires bullets at a constant speed
instead. Or how "random spread" is not the
best term to describe that unused bullet group type in TH04.
Or that there are two distinct ways of clearing all bullets on screen,
which deserve different names:
Mechanic #1: Clearing bullets for a custom amount of
time, awarding 1000 points for all bullets alive on the first frame,
and 100 points for all bullets spawned during the clear time.
Mechanic #2: Zapping bullets for a fixed 16 frames,
awarding a semi-exponential and loudly announced Bonus!! for all
bullets alive on the first frame, and preventing new bullets from being
spawned during those 16 frames. In TH04 at least; thanks to a ZUN bug,
zapping got reduced to 1 frame and no animation in TH05…
Bullets are zapped at the end of most midboss and boss phases, and
cleared everywhere else – most notably, during bombs, when losing a
life, or as rewards for extends or a maximized Dream bonus. The
Bonus!! points awarded for zapping bullets are calculated iteratively,
so it's not trivial to give an exact formula for these. For a small number
𝑛 of bullets, it would exactly be 5𝑛³ - 10𝑛² + 15𝑛
points – or, using uth05win's (correct) recursive definition,
Bonus(𝑛) = Bonus(𝑛-1) + 15𝑛² - 5𝑛 + 10.
However, one of the internal step variables is capped at a different number
of points for each difficulty (and game), after which the points only
increase linearly. Hence, "semi-exponential".
On to TH04's bullet spawn code then, because that one can at least be
decompiled. And immediately, we have to deal with a pointless distinction
between regular bullets, with either a decelerating or constant
velocity, and special bullets, with preset velocity changes during
their lifetime. That preset has to be set somewhere, so why have
separate functions? In TH04, this separation continues even down to the
lowest level of functions, where values are written into the global bullet
array. TH05 merges those two functions into one, but then goes too far and
uses self-modifying code to save a grand total of two local variables…
Luckily, the rest of its actual code is identical to TH04.
Most of the complexity in bullet spawning comes from the (thankfully
shared) helper function that calculates the velocities of the individual
bullets within a group. Both games handle each group type via a large
switch statement, which is where TH04 shows off another Turbo
C++ 4.0 optimization: If the range of case values is too
sparse to be meaningfully expressed in a jump table, it usually generates a
linear search through a second value table. But with the -G
command-line option, it instead generates branching code for a binary
search through the set of cases. 𝑂(log 𝑛) as the worst case for a
switch statement in a C++ compiler from 1994… that's so cool.
But still, why are the values in TH04's group type enum all
over the place to begin with?
Unfortunately, this optimization is pretty rare in PC-98 Touhou. It only
shows up here and in a few places in TH02, compared to at least 50
switch value tables.
In all of its micro-optimized pointlessness, TH05's undecompilable version
at least fixes some of TH04's redundancy. While it's still not even
optimal, it's at least a decently written piece of ASM…
if you take the time to understand what's going on there, because it
certainly took quite a bit of that to verify that all of the things which
looked like bugs or quirks were in fact correct. And that's how the code
for this function ended up with 35% comments and blank lines before I could
confidently call it "reverse-engineered"…
Oh well, at least it finally fixes a correctness issue from TH01 and TH04,
where an invalid bullet group type would fill all remaining slots in the
bullet array with identical versions of the first bullet.
Something that both games also share in these functions is an over-reliance
on globals for return values or other local state. The most ridiculous
example here: Tuning the speed of a bullet based on rank actually mutates
the global bullet template… which ZUN then works around by adding a wrapper
function around both regular and special bullet spawning, which saves the
base speed before executing that function, and restores it afterward.
Add another set of wrappers to bypass that exact
tuning, and you've expanded your nice 1-function interface to 4 functions.
Oh, and did I mention that TH04 pointlessly duplicates the first set of
wrapper functions for 3 of the 4 difficulties, which can't even be
explained with "debugging reasons"? That's 10 functions then… and probably
explains why I've procrastinated this feature for so long.
At this point, I also finally stopped decompiling ZUN's original ASM just
for the sake of it. All these small TH05 functions would look horribly
unidiomatic, are identical to their decompiled TH04 counterparts anyway,
except for some unique constant… and, in the case of TH05's rank-based
speed tuning function, actually become undecompilable as soon as we
want to return a C++ class to preserve the semantic meaning of the return
value. Mainly, this is because Turbo C++ does not allow register
pseudo-variables like _AX or _AL to be cast into
class types, even if their size matches. Decompiling that function would
have therefore lowered the quality of the rest of the decompiled code, in
exchange for the additional maintenance and compile-time cost of another
translation unit. Not worth it – and for a TH05 port, you'd already have to
decompile all the rest of the bullet spawning code anyway!
The only thing in there that was still somewhat worth being
decompiled was the pre-spawn clipping and collision detection function. Due
to what's probably a micro-optimization mistake, the TH05 version continues
to spawn a bullet even if it was spawned on top of the player. This might
sound like it has a different effect on gameplay… until you realize that
the player got hit in this case and will either lose a life or deathbomb,
both of which will cause all on-screen bullets to be cleared anyway.
So it's at most a visual glitch.
But while we're at it, can we please stop talking about hitboxes? At least
in the context of TH04 and TH05 bullets. The actual collision detection is
described way better as a kill delta of 8×8 pixels between the
center points of the player and a bullet. You can distribute these pixels
to any combination of bullet and player "hitboxes" that make up 8×8. 4×4
around both the player and bullets? 1×1 for bullets, and 8×8 for the
player? All equally valid… or perhaps none of them, once you keep in mind
that other entity types might have different kill deltas. With that in
mind, the concept of a "hitbox" turns into just a confusing abstraction.
The same is true for the 36×44 graze box delta. For some reason,
this one is not exactly around the center of a bullet, but shifted to the
right by 2 pixels. So, a bullet can be grazed up to 20 pixels right of the
player, but only up to 16 pixels left of the player. uth05win also spotted
this… and rotated the deltas clockwise by 90°?!
Which brings us to the bullet updates… for which I still had to
research a decompilation workaround, because
📝 P0148 turned out to not help at all?
Instead, the solution was to lie to the compiler about the true segment
distance of the popup function and declare its signature far
rather than near. This allowed ZUN to save that ridiculous overhead of 1 additional far function
call/return per frame, and those precious 2 bytes in the BSS segment
that he didn't have to spend on a segment value.
📝 Another function that didn't have just a
single declaration in a common header file… really,
📝 how were these games even built???
The function itself is among the longer ones in both games. It especially
stands out in the indentation department, with 7 levels at its most
indented point – and that's the minimum of what's possible without
goto. Only two more notable discoveries there:
Bullets are the only entity affected by Slow Mode. If the number of
bullets on screen is ≥ (24 + (difficulty * 8) + rank) in TH04,
or (42 + (difficulty * 8)) in TH05, Slow Mode reduces the frame
rate by 33%, by waiting for one additional VSync event every two frames.
The code also reveals a second tier, with 50% slowdown for a slightly
higher number of bullets, but that conditional branch can never be executed
Bullets must have been grazed in a previous frame before they can
be collided with. (Note how this does not apply to bullets that spawned
on top of the player, as explained earlier!)
Whew… When did ReC98 turn into a full-on code review?! 😅 And after all
this, we're still not done with TH04 and TH05 bullets, with all the
special movement types still missing. That should be less than one push
though, once we get to it. Next up: Back to TH01 and Konngara! Now have fun
rewriting the Touhou Wiki Gameplay pages 😛
Three pushes to decompile the TH01 high score menu… because it's
completely terrible, and needlessly complicated in pretty much every
aspect:
Another, final set of differences between the REIIDEN.EXE
and FUUIN.EXE versions of the code. Which are so
insignificant that it must mean that ZUN kept this code in two
separate, manually and imperfectly synced files. The REIIDEN.EXE
version, only shown when game-overing, automatically jumps to the
enter/終 button after the 8th character was entered,
and also has a completely invisible timeout that force-enters a high score
name after 1000… key presses? Not frames? Why. Like, how do you
even realistically such a number. (Best guess: It's a hidden easter egg to
amuse players who place drinking glasses on cursor keys. Or beer bottles.)
That's all the differences that are maybe visible if you squint
hard enough. On top of that though, we got a bunch of further, minor code
organization differences that serve no purpose other than to waste
decompilation time, and certainly did their part in stretching this out to
3 pushes instead of 2.
Entered names are restricted to a set of 16-bit, full-width Shift-JIS
codepoints, yet are still accessed as 8-bit byte arrays everywhere. This
bloats both the C++ and generated ASM code with needless byte splits,
swaps, and bit shifts. Same for the route kanji. You have this 16-, heck,
even 32-bit CPU, why not use it?! (Fun fact: FUUIN.EXE is
explicitly compiled for a 80186, for the most part – unlike
REIIDEN.EXE, which does use Turbo C++'s 80386 mode.)
The sensible way of storing the current position of the alphabet
cursor would simply be two variables, indicating the logical row and
column inside the character map. When rendering, you'd then transform
these into screen space. This can keep the on-screen position constants in
a single place of code.
TH01 does the opposite: The selected character is stored directly in terms
of its on-screen position, which is then mapped back to a character
index for every processed input and the subsequent screen update. There's
no notion of a logical row or column anywhere, and consequently, the
position constants are vomited all over the code.
Which might not be as bad if the character map had a uniform
grid structure, with no gaps. But the one in TH01 looks like this:
And with no sense of abstraction anywhere, both input handling and
rendering end up with a separate if branch for at least 4 of
the 6 rows.
In the end, I just gave up with my usual redundancy reduction efforts for
this one. Anyone wanting to change TH01's high score name entering code
would be better off just rewriting the entire thing properly.
And that's all of the shared code in TH01! Both OP.EXE and
FUUIN.EXE are now only missing the actual main menu and
ending code, respectively. Next up, though: The long awaited TH01 PI push.
Which will not only deliver 100% PI for OP.EXE and
FUUIN.EXE, but also probably quite some gains in
REIIDEN.EXE. With now over 30% of the game decompiled, it's about
time we get to look at some gameplay code!
Final TH01 RE push for the time being, and as expected, we've got the
superficially final piece of shared code between the TH01 executables.
However, just having a single implementation for loading and recreating
the REYHI*.DAT score files would have been way above ZUN's
standards of consistency. So ZUN had the unique idea to mix up the file
I/O APIs, using master.lib functions in REIIDEN.EXE, and
POSIX functions (along with error messages and disabled interrupts) in
FUUIN.EXE… Could have been worse
though, as it was possible to abstract that away quite nicely.
That code wasn't quite in the natural way of decompilation either. As it
turns out though, 📝 segment splitting isn't
so painful after all if one of the new segments only has a few functions.
Definitely going to do that more often from now on, since it allows a much
larger number of functions to be immediately decompiled. Which is always
superior to somehow transforming a function's ASM into a form that I can
confidently call "reverse-engineered", only to revisit it again later for
its decompilation.
And while I unfortunately missed 25% of total RE by a bit, this push
reached two other and perhaps even more significant milestones:
After (finally) compressing all unknown parts of the BSS segments
using arrays, the number of remaining lines in the
REIIDEN.EXE ASM dump has fallen below TASM's limit of 65,535. Which
means that we no longer need that annoying th01_reiiden_2.inc
file that everyone has forgotten about at least once.
As noted in 📝 P0061, TH03 gameplay RE is
indeed going to progress very slowly in the beginning. A lot of the
initial progress won't even be reflected in the RE% – there are just so
many features in this game that are intertwined into each other, and I
only consider functions to be "reverse-engineered" once we understand
every involved piece of code and data, and labeled every absolute
memory reference in it. (Yes, that means that the percentages on the front
page are actually underselling ReC98's progress quite a bit, and reflect a
pretty low bound of our actual understanding of the games.)
So, when I get asked to look directly at gameplay code right now,
it's quite the struggle to find a place that can be covered within a push
or two and that would immediately benefit
scoreplayers. The basics of score and combo handling themselves
managed to fit in pretty well, though:
Just like TH04 and TH05, TH03 stores the current score as 8
binary-coded
decimal digits. Since the last constant 0 is not included, the maximum
score displayable without glitches therefore is 999,999,990 points, but
the game will happily store up to 24,699,999,990 points before the score
wraps back to 0.
There are (surprisingly?) only 6 places where the game actually
adds points to the score. Not quite sure about all of them yet, but they
(of course) include ending a combo, killing enemies, and the bonus at the
end of a round.
Combos can be continued for 80 frames after a 2-hit. The hit counter
can only be increased in the first 48, and effectively resets to 0 for the
last 32, when the Spell Point value starts blinking.
TH03 can track a total of 16 independent "hit combo sources" per
player, simultaneously. These are not related to the number of
actual explosions; rather, each explosion is assigned to one of the 16
slots when it spawns, and all consecutive explosions spawned from that one
will then add to the hit combo in that slot. The hit number displayed in
the top left is simply the largest one among all these.
Oh well, at least we still got a bit of PI% out of this one. From this
point though, the next push (or two) should be enough to cover the big
128-byte player structure – which by itself might not be immediately
interesting to scoreplayers, but surely is quite a blocker for everything
else.
Just like most of the time, it was more sensible to cover
GENSOU.SCR, the last structure missing in TH05's
OP.EXE,
everywhere it's used, rather than just rushing out OP.EXE
position independence. I did have to look into all of the functions to
fully RE it after all, and to find out whether the unused fields actually
are unused. The only thing that kept this push from yielding even
more above-average progress was the sheer inconsistency in how the games
implemented the operations on this PC-98 equivalent of score*.dat:
OP.EXE declares two structure instances, for simultaneous
access to both Reimu and Marisa scores. TH05 with its 4 playable
characters instead uses a single one, and overwrites it successively for
each character when drawing the high score menu – meaning, you'd only see
Yuuka's scores when looking at the structure inside the rendered high
score menu. However, it still declares the TH04 "Marisa" structure as a
leftover… and also decodes it and verifies its checksum, despite
nothing being ever loaded into it
MAIN.EXE uses a separate ASM implementation of the decoding
and encoding functions
TH05's MAIN.EXE also reimplements the basic loading
functions
in ASM – without the code to regenerate GENSOU.SCR with
default data if the file is missing or corrupted. That actually makes
sense, since any regeneration is already done in OP.EXE, which
always has to load that file anyway to check how much has been cleared
However, there is a regeneration function in TH05's
MAINE.EXE… which actually generates different default
data: OP.EXE consistently sets Extra Stage records to Stage 1,
while MAINE.EXE uses the same place-based stage numbering that
both versions use for the regular ranks
Technically though, TH05's OP.EXEis
position-independent now, and the rest are (should be?
) merely false positives. However, TH04's is
still missing another structure, in addition to its false
positives. So, let's wait with the big announcement until the next push…
which will also come with a demo video of what will be possible then.
The glacial pace continues, with TH05's unnecessarily, inappropriately
micro-optimized, and hence, un-decompilable code for rendering the current
and high score, as well as the enemy health / dream / power bars. While
the latter might still pass as well-written ASM, the former goes to such
ridiculous levels that it ends up being technically buggy. If you
enjoy quality ZUN code, it's
definitely worth a read.
In TH05, this all still is at the end of code segment #1, but in TH04,
the same code lies all over the same segment. And since I really
wanted to move that code into its final form now, I finally did the
research into decompiling from anywhere else in a segment.
Turns out we actually can! It's kinda annoying, though: After splitting
the segment after the function we want to decompile, we then need to group
the two new segments back together into one "virtual segment" matching the
original one. But since all ASM in ReC98 heavily relies on being
assembled in MASM mode, we then start to suffer from MASM's group
addressing quirk. Which then forces us to manually prefix every single
function call
from inside the group
to anywhere else within the newly created segment
with the group name. It's stupidly boring busywork, because of all the
function calls you mustn't prefix. Special tooling might make this
easier, but I don't have it, and I'm not getting crowdfunded for it.
So while you now definitely can request any specific thing in any
of the 5 games to be decompiled right now, it will take slightly
longer, and cost slightly more.
(Except for that one big segment in TH04, of course.)
Only one function away from the TH05 shot type control functions now!
What do you do if the TH06 text image feature for thcrap should have been done 3 days™ ago, but keeps getting more and more complex, and you have a ton of other pushes to deliver anyway? Get some distraction with some light ReC98 reverse-engineering work. This is where it becomes very obvious how much uth05win helps us with all the games, not just TH05.
5a5c347 is the most important one in there, this was the missing substructure that now makes every other sprite-like structure trivial to figure out.