Last blog post before the 100% completion of TH01! The final parts of
REIIDEN.EXE would feel rather out of place in a celebratory
blog post, after all. They provided quite a neat summary of the typical
technical details that are wrong with this game, and that I now get to
mention for one final time:
The Orb's animation cycle is maybe two frames shorter than it should
have been, showing its last sprite for just 1 frame rather than 3:
The text in the Pause and Continue menus is not quite correctly
centered.
The memory info screen hides quite a bit of information about the .PTN
buffers, and obscures even the info that it does show behind
misleading labels. The most vital information would have been that ZUN could
have easily saved 20% of the memory by using a structure without the
unneeded alpha plane… Oh, and the REWIRTE option
mapped to the ⬇️ down arrow key simply redraws the info screen. Might be
useful after a NODE CHEAK, which replaces the output
with its own, but stays within the same input loop.
But hey, there's an error message if you start REIIDEN.EXE
without a resident MDRV2 or a correctly prepared resident structure! And
even a good, user-friendly one, asking the user to launch the batch file
instead. For some reason, this convenience went out of fashion in the later
games.
The Game Over animation (how fitting) gives us TH01's final piece of weird
sprite blitting code, which seriously manages to include 2 bugs and 3 quirks
in under 50 lines of code. In test mode (game t or game
d), you can trigger this effect by pressing the ⬇️ down arrow key,
which certainly explains why I encountered seemingly random Game Over events
during all the tests I did with this game…
The animation appears to have changed quite a bit during development, to the
point that probably even ZUN himself didn't know what he wanted it to look
like in the end:
Finally, we get to the big main() function, serving as the duct
tape that holds this game together. It may read rather disorganized with all
the (actually necessary) assignments and function calls, but the only
actual minor issue I've seen there is that you're robbed of any
pellet destroy bonus collected on the final frame of the final boss. There
is a certain charm in directly nesting the infinite main gameplay loop
within the infinite per-life loop within the infinite stage loop. But come
on, why is there no fourth scene loop? Instead, the
game just starts a new REIIDEN.EXE process before and after a
boss fight. With all the wildly mutated global state, that was probably a
much saner choice.
The final secrets can be found in the debug stage selection. ZUN
implemented the prompts using the C standard library's scanf()
function, which is the natural choice for quick-and-dirty testing features
like this one. However, the C standard library is also complete and utter
trash, and so it's not surprising that both of the scanf()
calls do… well, probably not what ZUN intended. The guaranteed out-of-bounds
memory access in the select_flag route prompt thankfully has no
real effect on the game, but it gets really interesting with the 面数 stage prompt.
Back in 2020, I already wrote about
📝 stages 21-24, and how they're loaded from actual data that ZUN shipped with the game.
As it now turns out, the code that maps stage IDs to STAGE?.DAT
scene numbers contains an explicit branch that maps any (1-based) stage
number ≥21 to scene 7. Does this mean that an Extra Stage was indeed planned
at some point? That branch seems way too specific to just be meant as a
fallback. Maybe
Asprey was on to something after all…
However, since ZUN passed the stage ID as a signed integer to
scanf(), you can also enter negative numbers. The only place
that kind of accidentally checks for them is the aforementioned stage
ID → scene mapping, which ensures that (1-based) stages < 5 use
the shrine's background image and BGM. With no checks anywhere else, we get
a new set of "glitch stages":
The scene loading function takes the entered 0-based stage ID value modulo
5, so these 4 are the only ones that "exist", and lower stage numbers will
simply loop around to them. When loading these stages, the function accesses
the data in REIIDEN.EXE that lies before the statically
allocated 5-element stages-of-scene array, which happens to encompass
Borland C++'s locale and exception handling data, as well as a small bit of
ZUN's global variables. In particular, the obstacle/card HP on the tile I
highlighted in green corresponds to the
lowest byte of the 32-bit RNG seed. If it weren't for that and the fact that
the obstacles/card HP on the few tiles before are similarly controlled by
the x86 segment values of certain initialization function addresses, these
glitch stages would be completely deterministic across PC-98 systems, and
technically canon…
Stage -4 is the only playable one here as it's the only stage to end up
below the
📝 heap corruption limit of 102 stage objects.
Completing it loads Stage -3, which crashes with a Divide Error
just like it does if it's directly selected. Unsurprisingly, this happens
because all 50 card bytes at that memory location are 0, so one division (or
in this case, modulo operation) by the number of cards is enough to crash
the game.
Stage -5 is modulo'd to 0 and thus loads the first regular stage. The only
apparent broken element there is the timer, which is handled by a completely
different function that still operates with a (0-based) stage ID value of
-5. Completing the stage loads Stage -4, which also crashes, but only
because its 61 cards naturally cause the
📝 stack overflow in the flip-in animation for any stage with more than 50 cards.
And that's REIIDEN.EXE, the biggest and most bloated PC-98
Touhou executable, fully decompiled! Next up: Finishing this game with the
main menu, and hoping I'll actually pull it off within 24 hours. (If I do,
we might all have to thank 32th
System, who independently decompiled half of the remaining 14
functions…)
With Elis, we've not only reached the midway point in TH01's boss code, but
also a bunch of other milestones: Both REIIDEN.EXE and TH01 as
a whole have crossed the 75% RE mark, and overall position independence has
also finally cracked 80%!
And it got done in 4 pushes again? Yup, we're back to
📝 Konngara levels of redundancy and
copy-pasta. This time, it didn't even stop at the big copy-pasted code
blocks for the rift sprite and 256-pixel circle animations, with the words
"redundant" and "unnecessary" ending up a total of 18 times in my source
code comments.
But damn is this fight broken. As usual with TH01 bosses, let's start with a
high-level overview:
The Elis fight consists of 5 phases (excluding the entrance animation),
which must be completed in order.
In all odd-numbered phases, Elis uses a random one-shot danmaku pattern
from an exclusive per-phase pool before teleporting to a random
position.
There are 3 exclusive girl-form patterns per phase, plus 4
additional bat-form patterns in phase 5, for a total of 13.
Due to a quirk in the selection algorithm in phases 1 and 3, there
is a 25% chance of Elis skipping an attack cycle and just teleporting
again.
In contrast to Konngara, Elis can freely select the same pattern
multiple times in a row. There's nothing in the code to prevent that
from happening.
This pattern+teleport cycle is repeated until Elis' HP reach a certain
threshold value. The odd-numbered phases correspond to the white (phase 1),
red-white (phase 3), and red (phase 5) sections of the health bar. However,
the next phase can only start at the end of each cycle, after a
teleport.
Phase 2 simply teleports Elis back to her starting screen position of
(320, 144) and then advances to phase 3.
Phase 4 does the same as phase 2, but adds the initial bat form
transformation before advancing to phase 5.
Phase 5 replaces the teleport with a transformation to the bat form.
Rather than teleporting instantly to the target position, the bat gradually
flies there, firing a randomly selected looping pattern from the 4-pattern
bat pool on the way, before transforming back to the girl form.
This puts the earliest possible end of the fight at the first frame of phase
5. However, nothing prevents Elis' HP from reaching 0 before that point. You
can nicely see this in 📝 debug mode: Wait
until the HP bar has filled up to avoid heap corruption, hold ↵ Return
to reduce her HP to 0, and watch how Elis still goes through a total of
two patterns* and four
teleport animations before accepting defeat.
But wait, heap corruption? Yup, there's a bug in the HP bar that already
affected Konngara as well, and it isn't even just about the graphical
glitches generated by negative HP:
The initial fill-up animation is drawn to both VRAM pages at a rate of 1
HP per frame… by passing the current frame number as the
current_hp number.
The target_hp is indicated by simply passing the current
HP…
… which, however, can be reduced in debug mode at an equal rate of up to
1 HP per frame.
The completion condition only checks if
((target_hp - 1) == current_hp). With the
right timing, both numbers can therefore run past each other.
In that case, the function is repeatedly called on every frame, backing
up the original VRAM contents for the current HP point before blitting
it…
… until frame ((96 / 2) + 1), where the
.PTN slot pointer overflows the heap buffer and overwrites whatever comes
after. 📝 Sounds familiar, right?
Since Elis starts with 14 HP, which is an even number, this corruption is
trivial to cause: Simply hold ↵ Return from the beginning of the
fight, and the completion condition will never be true, as the
HP and frame numbers run past the off-by-one meeting point.
Edit (2023-07-21): Pressing ↵ Return to reduce HP
also works in test mode (game t). There, the game doesn't
even check the heap, and consequently won't report any corruption,
allowing the HP bar to be glitched even further.
Regular gameplay, however, entirely prevents this due to the fixed start
positions of Reimu and the Orb, the Orb's fixed initial trajectory, and the
50 frames of delay until a bomb deals damage to a boss. These aspects make
it impossible to hit Elis within the first 14 frames of phase 1, and ensure
that her HP bar is always filled up completely. So ultimately, this bug ends
up comparable in seriousness to the
📝 recursion / stack overflow bug in the memory info screen.
These wavy teleport animations point to a quite frustrating architectural
issue in this fight. It's not even the fact that unblitting the yellow star
sprites rips temporary holes into Elis' sprite; that's almost expected from
TH01 at this point. Instead, it's all because of this unused frame of the
animation:
With this sprite still being part of BOSS5.BOS, Girl-Elis has a
total of 9 animation frames, 1 more than the
📝 8 per-entity sprites allowed by ZUN's architecture.
The quick and easy solution would have been to simply bump the sprite array
size by 1, but… nah, this would have added another 20 bytes to all 6 of the
.BOS image slots. Instead, ZUN wrote the manual
position synchronization code I mentioned in that 2020 blog post.
Ironically, he then copy-pasted this snippet of code often enough that it
ended up taking up more than 120 bytes in the Elis fight alone – with, you
guessed it, some of those copies being redundant. Not to mention that just
going from 8 to 9 sprites would have allowed ZUN to go down from 6 .BOS
image slots to 3. That would have actually saved 420 bytes in
addition to the manual synchronization trouble. Looking forward to SinGyoku,
that's going to be fun again…
As for the fight itself, it doesn't take long until we reach its most janky
danmaku pattern, right in phase 1:
The "pellets along circle" pattern on Lunatic, in its original version
and with fanfiction fixes for everything that can potentially be
interpreted as a bug.
For whatever reason, the lower-right quarter of the circle isn't
animated? This animation works by only drawing the new dots added with every
subsequent animation frame, expressed as a tiny arc of a dotted circle. This
arc starts at the animation's current 8-bit angle and ends on the sum of
that angle and a hardcoded constant. In every other (copy-pasted, and
correct) instance of this animation, ZUN uses 0x02 as the
constant, but this one uses… 0.05 for the lower-right quarter?
As in, a 64-bit double constant that truncates to 0 when added
to an 8-bit integer, thus leading to the start and end angles being
identical and the game not drawing anything.
On Easy and Normal, the pattern then spawns 32 bullets along the outline
of the circle, no problem there. On Lunatic though, every one of these
bullets is instead turned into a narrow-angled 5-spread, resulting in 160
pellets… in a game with a pellet cap of 100.
Now, if Elis teleported herself to a position near the top of the playfield,
most of the capped pellets would have been clipped at that top edge anyway,
since the bullets are spawned in clockwise order starting at Elis' right
side with an angle of 0x00. On lower positions though, you can
definitely see a difference if the cap were high enough to allow all coded
pellets to actually be spawned.
The Hard version gets dangerously close to the cap by spawning a total of 96
pellets. Since this is the only pattern in phase 1 that fires pellets
though, you are guaranteed to see all of the unclipped ones.
The pellets also aren't spawned exactly on the telegraphed circle, but 4 pixels to the left.
Then again, it might very well be that all of this was intended, or, most
likely, just left in the game as a happy accident. The latter interpretation
would explain why ZUN didn't just delete the rendering calls for the
lower-right quarter of the circle, because seriously, how would you not spot
that? The phase 3 patterns continue with more minor graphical glitches that
aren't even worth talking about anymore.
And then Elis transforms into her bat form at the beginning of Phase 5,
which displays some rather unique hitboxes. The one against the Orb is fine,
but the one against player shots…
… uses the bat's X coordinate for both X and Y dimensions.
In regular gameplay, it's not too bad as most
of the bat patterns fire aimed pellets which typically don't allow you to
move below her sprite to begin with. But if you ever tried destroying these
pellets while standing near the middle of the playfield, now you know why
that didn't work. This video also nicely points out how the bat, like any
boss sprite, is only ever blitted at positions on the 8×1-pixel VRAM byte
grid, while collision detection uses the actual pixel position.
The bat form patterns are all relatively simple, with little variation
depending on the difficulty level, except for the "slow pellet spreads"
pattern. This one is almost easiest to dodge on Lunatic, where the 5-spreads
are not only always fired downwards, but also at the hardcoded narrow delta
angle, leaving plenty of room for the player to move out of the way:
The "slow pellet spreads" pattern of Elis' bat form, on every
difficulty. Which version do you think is the easiest one?
Finally, we've got another potential timesave in the girl form's "safety
circle" pattern:
After the circle spawned completely, you lose a life by moving outside it,
but doing that immediately advances the pattern past the circle part. This
part takes 200 frames, but the defeat animation only takes 82 frames, so
you can save up to 118 frames there.
Final funny tidbit: As with all dynamic entities, this circle is only
blitted to VRAM page 0 to allow easy unblitting. However, it's also kind of
static, and there needs to be some way to keep the Orb, the player shots,
and the pellets from ripping holes into it. So, ZUN just re-blits the circle
every… 4 frames?! 🤪 The same is true for the Star of David and its
surrounding circle, but there you at least get a flash animation to justify
it. All the overlap is actually quite a good reason for not even attempting
to 📝 mess with the hardware color palette instead.
Reproducing the crash was the whole challenge here. Even after moving Elis
and Reimu to the exact positions seen in Pearl's video and setting Elis' HP
to 0 on the exact same frame, everything ran fine for me. It's definitely no
division by 0 this time, the function perfectly guards against that
possibility. The line specified in the function's parameters is always
clipped to the VRAM region as well, so we can also rule out illegal memory
accesses here…
… or can we? Stepping through it all reminded me of how this function brings
unblitting sloppiness to the next level: For each VRAM byte touched, ZUN
actually unblits the 4 surrounding bytes, adding one byte to the left
and two bytes to the right, and using a single 32-bit read and write per
bitplane. So what happens if the function tries to unblit the topmost byte
of VRAM, covering the pixel positions from (0, 0) to (7, 0)
inclusive? The VRAM offset of 0x0000 is decremented to
0xFFFF to cover the one byte to the left, 4 bytes are written
to this address, the CPU's internal offset overflows… and as it turns out,
that is illegal even in Real Mode as of the 80286, and will raise a General Protection
Fault. Which is… ignored by DOSBox-X,
every Neko Project II version in common use, the CSCP
emulators, SL9821, and T98-Next. Only Anex86 accurately emulates the
behavior of real hardware here.
OK, but no laser fired by Elis ever reaches the top-left corner of the
screen. How can such a fault even happen in practice? That's where the
broken laser reset+unblit function comes in: Not only does it just flat out pass the wrong
parameters to the line unblitting function – describing the line
already traveled by the laser and stopping where the laser begins –
but it also passes them
wrongly, in the form of raw 32-bit fixed-point Q24.8 values, with no
conversion other than a truncation to the signed 16-bit pixels expected by
the function. What then follows is an attempt at interpolation and clipping
to find a line segment between those garbage coordinates that actually falls
within the boundaries of VRAM:
right/bottom correspond to a laser's origin position, and
left/top to the leftmost pixel of its moved-out top line. The
bug therefore only occurs with lasers that stopped growing and have started
moving.
Moreover, it will only happen if either (left % 256) or
(right % 256) is ≤ 127 and the other one of the two is ≥ 128.
The typecast to signed 16-bit integers then turns the former into a large
positive value and the latter into a large negative value, triggering the
function's clipping code.
The function then follows Bresenham's
algorithm: left is ensured to be smaller than right
by swapping the two values if necessary. If that happened, top
and bottom are also swapped, regardless of their value – the
algorithm does not care about their order.
The slope in the X dimension is calculated using an integer division of
((bottom - top) /
(right - left)). Both subtractions are done on signed
16-bit integers, and overflow accordingly.
(-left × slope_x) is added to top,
and left is set to 0.
If both top and bottom are < 0 or
≥ 640, there's nothing to be unblitted. Otherwise, the final
coordinates are clipped to the VRAM range of [(0, 0),
(639, 399)].
If the function got this far, the line to be unblitted is now very
likely to reach from
the top-left to the bottom-right corner, starting out at
(0, 0) right away, or
from the bottom-left corner to the top-right corner. In this case,
you'd expect unblitting to end at (639, 0), but thanks to an
off-by-one error,
it actually ends at (640, -1), which is equivalent to
(0, 0). Why add clipping to VRAM offset calculations when
everything else is clipped already, right?
Possible laser states that will cause the fault, with some debug
output to help understand the cause, and any pellets removed for better
readability. This can happen for all bosses that can potentially have
shootout lasers on screen when being defeated, so it also applies to Mima.
Fixing this is easier than understanding why it happens, but since y'all
love reading this stuff…
tl;dr: TH01 has a high chance of freezing at a boss defeat sequence if there
are diagonally moving lasers on screen, and if your PC-98 system
raises a General Protection Fault on a 4-byte write to offset
0xFFFF, and if you don't run a TSR with an INT
0Dh handler that might handle this fault differently.
The easiest fix option would be to just remove the attempted laser
unblitting entirely, but that would also have an impact on this game's…
distinctive visual glitches, in addition to touching a whole lot of
code bytes. If I ever get funded to work on a hypothetical TH01 Anniversary
Edition that completely rearchitects the game to fix all these glitches, it
would be appropriate there, but not for something that purports to be the
original game.
(Sidenote to further hype up this Anniversary Edition idea for PC-98
hardware owners: With the amount of performance left on the table at every
corner of this game, I'm pretty confident that we can get it to work
decently on PC-98 models with just an 80286 CPU.)
Since we're in critical infrastructure territory once again, I went for the
most conservative fix with the least impact on the binary: Simply changing
any VRAM offsets >= 0xFFFD to 0x0000 to avoid
the GPF, and leaving all other bugs in place. Sure, it's rather lazy and
"incorrect"; the function still unblits a 32-pixel block there, but adding a
special case for blitting 24 pixels would add way too much code. And
seriously, it's not like anything happens in the 8 pixels between
(24, 0) and (31, 0) inclusive during gameplay to begin with.
To balance out the additional per-row if() branch, I inlined
the VRAM page change I/O, saving two function calls and one memory write per
unblitted row.
That means it's time for a new community_choice_fixes
build, containing the new definitive bugfixed versions of these games:
2022-05-31-community-choice-fixes.zip
Check the th01_critical_fixes
branch for the modified TH01 code. It also contains a fix for the HP bar
heap corruption in test or debug mode – simply changing the ==
comparison to <= is enough to avoid it, and negative HP will
still create aesthetic glitch art.
Once again, I then was left with ½ of a push, which I finally filled with
some FUUIN.EXE code, specifically the verdict screen. The most
interesting part here is the player title calculation, which is quite
sneaky: There are only 6 skill levels, but three groups of
titles for each level, and the title you'll see is picked from a random
group. It looks like this is the first time anyone has documented the
calculation?
As for the levels, ZUN definitely didn't expect players to do particularly
well. With a 1cc being the standard goal for completing a Touhou game, it's
especially funny how TH01 expects you to continue a lot: The code has
branches for up to 21 continues, and the on-screen table explicitly leaves
room for 3 digits worth of continues per 5-stage scene. Heck, these
counts are even stored in 32-bit long variables.
Next up: 📝 Finally finishing the long
overdue Touhou Patch Center MediaWiki update work, while continuing with
Kikuri in the meantime. Originally I wasn't sure about what to do between
Elis and Seihou,
but with Ember2528's surprise
contribution last week, y'all have
demonstrated more than enough interest in the idea of getting TH01 done
sooner rather than later. And I agree – after all, we've got the 25th
anniversary of its first public release coming up on August 15, and I might
still manage to completely decompile this game by that point…
TH03 finally passed 20% RE, and the newly decompiled code contains no
serious ZUN bugs! What a nice way to end the year.
There's only a single unlockable feature in TH03: Chiyuri and Yumemi as
playable characters, unlocked after a 1CC on any difficulty. Just like the
Extra Stages in TH04 and TH05, YUME.NEM contains a single
designated variable for this unlocked feature, making it trivial to craft a
fully unlocked score file without recording any high scores that others
would have to compete against. So, we can now put together a complete set
for all PC-98 Touhou games: 2021-12-27-Fully-unlocked-clean-score-files.zip
It would have been cool to set the randomly generated encryption keys in
these files to a fixed value so that they cancel out and end up not actually
encrypting the file. Too bad that TH03 also started feeding each encrypted
byte back into its stream cipher, which makes this impossible.
The main loading and saving code turned out to be the second-cleanest
implementation of a score file format in PC-98 Touhou, just behind TH02.
Only two of the YUME.NEM functions come with nonsensical
differences between OP.EXE and MAINL.EXE, rather
than 📝 all of them, as in TH01 or
📝 too many of them, as in TH04 and TH05. As
for the rest of the per-difficulty structure though… well, it quickly
becomes clear why this was the final score file format to be RE'd. The name,
score, and stage fields are directly stored in terms of the internal
REGI*.BFT sprite IDs used on the high score screen. TH03 also
stores 10 score digits for each place rather than the 9 possible ones, keeps
any leading 0 digits, and stores the letters of entered names in reverse
order… yeah, let's decompile the high score screen as well, for a full
understanding of why ZUN might have done all that. (Answer: For no reason at
all. )
And wow, what a breath of fresh air. It's surely not
good-code: The overlapping shadows resulting from using
a 24-pixel letterspacing with 32-pixel glyphs in the name column led ZUN to
do quite a lot of unnecessary and slightly confusing rendering work when
moving the cursor back and forth, and he even forgot about the EGC there.
But it's nowhere close to the level of jank we saw in
📝 TH01's high score menu last year. Good to
see that ZUN had learned a thing or two by his third game – especially when
it comes to storing the character map cursor in terms of a character ID,
and improving the layout of the character map:
That's almost a nicely regular grid there. With the question mark and the
double-wide SP, BS, and END options, the cursor
movement code only comes with a reasonable two exceptions, which are easily
handled. And while I didn't get this screen completely decompiled,
one additional push was enough to cover all important code there.
The only potential glitch on this screen is a result of ZUN's continued use
of binary-coded
decimal digits without any bounds check or cap. Like the in-game HUD
score display in TH04 and TH05, TH03's high score screen simply uses the
next glyph in the character set for the most significant digit of any score
above 1,000,000,000 points – in this case, the period. Still, it only
really gets bad at 8,000,000,000 points: Once the glyphs are
exhausted, the blitting function ends up accessing garbage data and filling
the entire screen with garbage pixels. For comparison though, the current world record
is 133,650,710 points, so good luck getting 8 billion in the first
place.
Next up: Starting 2022 with the long-awaited decompilation of TH01's Sariel
fight! Due to the 📝 recent price increase,
we now got a window in the cap that
is going to remain open until tomorrow, providing an early opportunity to
set a new priority after Sariel is done.