🎉 After almost 3 years, TH04 finally caught up to TH05 and is now 100%
position-independent as well! 🎉
For a refresher on what this means and does not mean, check the
announcements from back in 2019 and 2020 when we chased the goal for TH05's
📝 OP.EXE and
📝 the rest of the game. These also feature
some demo videos that show off the kind of mods you were able to efficiently
code back then. With the occasional reverse-engineering attention it
received over the years, TH04's code should now be slightly easier to work
with than TH05's was back in the day. Although not by much – TH04 has
remained relatively unpopular among backers, and only received more than the
funded attention because it shares most of its core code with the more
popular TH05. Which, coincidentally, ended up becoming
📝 the reason for getting this done now.
Not that it matters a lot. Ever since we reached 100% PI for TH05, community
and backer interest in position independence has dropped to near zero. We
just didn't end up seeing the expected large amount of community-made mods
that PI was meant to facilitate, and even the
📝 100% decompilation of TH01 changed nothing
about that. But that's OK; after all, I do appreciate the business of
continually getting commissioned for all the
📝 large-scale mods. Not focusing on PI is
also the correct choice for everyone who likes reading these blog posts, as
it often means that I can't go that much into detail due to cutting corners
and piling up technical debt left and right.
Surprisingly, this only took 1.25 pushes, almost twice as fast as expected.
As that's closer to 1 push than it is to 2, I'm OK with releasing it like
this – especially since it was originally meant to come out three days ago.
🍋 Unfortunately, it was delayed thanks to surprising
website bugs and a certain piece of code that was way more difficult to
document than it was to decompile… The next push will have slightly less
content in exchange, though.
📝 P0240 and P0241 already covered the final
remaining structures, so I only needed to do some superficial RE to prove
the remaining numeric literals as either constants or memory addresses. For
example, I initially thought I'd have to decompile the dissolve animations
in the staff roll, but I only needed to identify a single function pointer
type to prove all false positives as screen coordinates there. Now, the TH04
staff roll would be another fast and cheap decompilation, similar to the
custom entity types of TH04. (And TH05 as well!)
The one piece of code I did have to decompile was Stage 4's carpet
lighting animation, thanks to hex literals that were way too complicated to
leave in ASM. And this one probably takes the crown for TH04's worst set of
landmines and bloat that still somehow results in no observable bugs or
quirks.
This animation starts at frame 1664, roughly 29.5 seconds into the stage,
and quickly turns the stage background into a repeated row of dark-red plaid
carpet tiles by moving out from the center of the playfield towards the
edges. Afterward, the animation repeats with a brighter set of tiles that is
then used for the rest of the stage. As I explained
📝 a while ago in the context of TH02, the
stage tile and map formats in PC-98 Touhou can't express animations, so all
of this needed to be hardcoded in the binary.
And ZUN did start out making the right decision by only using fully-lit
carpet tiles for all tile sections defined in ST03.MAP. This
way, the animation can simply disable itself after it completed, letting the
rest of the stage render normally and use new tile sections that are only
defined for the final light level. This means that the "initial" dark
version of the carpet is as much a result of hardcoded tile manipulation as
the animation itself.
But then, ZUN proceeded to implement it all by directly manipulating the
ring buffer of on-screen tiles. This is the lowest level before the tiles
are rendered, and rather detached from the defined content of the
📝 .MAP tile sections. Which leads to a whole
lot of problems:
If you decide to do this kind of tile ring modification, it should ideally
happen at a very specific point: after scrolling in new tiles into
the ring buffer, but before blitting any scrolled or invalidated
tiles to VRAM based on the ring buffer. Which is not where ZUN chose to put
it, as he placed the call to the stage-specific render function after both
of those operations. By the time the function is
called, the tile renderer has already blitted a few lines of the fully-lit
carpet tiles from the defined .MAP tile section, matching the scroll speed.
Fortunately, these are hidden behind the black TRAM cells above and below
the playfield…
Still, the code needs to get rid of them before they would become visible.
ZUN uses the regular tile invalidation function for this, which will only
cause actual redraws on the next frame. Again, the tile rendering call has
already happened by the time the Stage 4-specific rendering function gets
called.
But wait, this game also flips VRAM pages between frames to provide a
tear-free gameplay experience. This means that the intended redraw of the
new tiles actually hits the wrong VRAM page.
And sure, the code does attempt to invalidate these newly blitted lines
every frame – but only relative to the current VRAM Y coordinate that
represents the top of the hardware-scrolled screen. Once we're back on the
original VRAM page on the next frame, the lines we initially set out to
remove could have already scrolled past that point, making it impossible to
ever catch up with them in this way.
The only real "solution": Defining the height of the tile invalidation
rectangle at 3× the scroll speed, which ensures that each invalidation call
covers 3 frames worth of newly scrolled-in lines. This is not intuitive at
all, and requires an understanding of everything I have just written to even
arrive at this conclusion. Needless to say that ZUN didn't comprehend it
either, and just hardcoded an invalidation height that happened to be enough
for the small scroll speeds defined in ST03.STD for the first
30 seconds of the stage.
The effect must consistently modify the tile ring buffer to "fix" any new
tiles, overriding them with the intended light level. During the animation,
the code not only needs to set the old light level for any tiles that are
still waiting to be replaced, but also the new light level for any tiles
that were replaced – and ZUN forgot the second part. As a result, newly scrolled-in tiles within the already animated
area will "remain" untouched at light level 2 if the scroll speed is fast
enough during the transition from light level 0 to 1.
All that means that we only have to raise the scroll speed for the effect to
fall apart. Let's try, say, 4 pixels per frame rather than the original
0.25:
All of this could have been so much simpler and actually stable if ZUN
applied the tile changes directly onto the .MAP. This is a much more
intuitive way of expressing what is supposed to happen to the map, and would
have reduced the code to the actually necessary tile changes for the first
frame and each individual frame of the animation. It would have still
required a way to force these changes into the tile ring buffer, but ZUN
could have just used his existing full-playfield redraw functions for that.
In any case, there would have been no need for any per-frame tile
fixing and redrawing. The CPU cycles saved this way could have then maybe
been put towards writing the tile-replacing part of the animation in C++
rather than ASM…
Wow, that was an unreasonable amount of research into a feature that
superficially works fine, just because its decompiled code didn't make
sense. To end on a more positive note, here are
some minor new discoveries that might actually matter to someone:
The laser part of Marisa's Illusion Laser shot type always does 3
points of damage per frame, regardless of the player's power level. Its
hitbox also remains identical on all power levels, no matter how wide the
laser appears on screen. The strength difference between the levels purely
comes from the number of frames the laser stays active before a fixed
non-damaging 32-frame cooldown time:
Power level
Frames per cycle (including 32-frame cooldown)
2
64
3
72
4
88
5
104
6
128
7
144
8
168
9
192
The decay animation for player shots is faster in TH05 (12 frames) than in
TH04 (16 frames).
In the first phase of her Stage 6 fight, Yuuka moves along one of two
randomly chosen hardcoded paths, defined as a set of 5 movement angles.
After reaching the final point and firing a danmaku pattern, she teleports
back to her initial position to repeat the path one more time before the
phase times out.
Similarly, TH04's Stage 3 midboss also goes through 12 fixed movement angles
before flying off the playfield.
The formulas for calculating the skill rating on both TH04's and TH05's
final verdict screen are going to be very long and complicated.
Next up: ¾ of a push filled with random boilerplate, finalization, and TH01
code cleanup work, while I finish the preparations for Shuusou Gyoku's
OpenGL backend. This month, everything should finally work out as intended:
I'll complete both tasks in parallel, ship the former to free up the cap,
and then ship the latter once its 5th push is fully funded.
It only took a record-breaking 1½ pushes to get SinGyoku done!
No 📝 entity synchronization code after
all! Since all of SinGyoku's sprites are 96×96 pixels, ZUN made the rather
smart decision of just using the sphere entity's position to render the
📝 flash and person entities – and their only
appearance is encapsulated in a single sphere→person→sphere transformation
function.
Just like Kikuri, SinGyoku's code as a whole is not a complete
disaster.
The negative:
It's still exactly as buggy as Kikuri, with both of the ZUN bugs being
rendering glitches in a single function once again.
It also happens to come with a weird hitbox, …
… and some minor questionable and weird pieces of code.
The overview:
SinGyoku's fight consists of 2 phases, with the first one corresponding
to the white part from 8 to 6 HP, and the second one to the rest of the HP
bar. The distinction between the red-white and red parts is purely visual,
and doesn't reflect anything about the boss script.
Both phases cycle between a pellet pattern and SinGyoku's sphere form
slamming itself into the player, followed by it slightly overshooting its
intended base Y position on its way back up.
Phase 1 only consists of the sphere form's half-circle spray pattern.
Technically, the phase can only end during that pattern, but adding
that one additional condition to allow it to end during the slam+return
"pattern" wouldn't have made a difference anyway. The code doesn't rule out
negative HP during the slam (have fun in test or debug mode), but the sum of
invincibility frames alone makes it impossible to hit SinGyoku 7 times
during a single slam in regular gameplay.
Phase 2 features two patterns for both the female and male forms
respectively, which are selected randomly.
This time, we're back to the Orb hitbox being a logical 49×49 pixels in
SinGyoku's center, and the shot hitbox being the weird one. What happens if
you want the shot hitbox to be both offset to the left a bit
and stretch the entire width of SinGyoku's sprite? You get a hitbox
that ends in mid-air, far away from the right edge of the sprite:
Since the female and male forms also use the sphere entity's coordinates,
they share the same hitbox.
Onto the rendering glitches then, which can – you guessed it – all be found
in the sphere form's slam movement:
ZUN unblits the delta area between the sphere's previous and current
position on every frame, but reblits the sphere itself on… only every second
frame?
For negative X velocities, ZUN made a typo and subtracted the Y velocity
from the right edge of the area to be unblitted, rather than adding the X
velocity. On a cursory look, this shouldn't affect the game all too
much due to the unblitting function's word alignment. Except when it does:
If the Y velocity is much smaller than the X one, the left edge of the
unblitted area can, on certain frames, easily align to a word address past
the previous right edge of the sphere. As a result, not a single sphere
pixel will actually be unblitted, and a small stripe of the sphere will be
left in VRAM for one frame, until the alignment has caught up with the
sphere's movement in the next one.
Due to the low contrast of the sphere against the background, you typically
don't notice these glitches, but the white invincibility flashing after a
hit really does draw attention to them. This time, all of these glitches
aren't even directly caused by ZUN having never learned about the
EGC's bit length register – if he just wrote correct code for SinGyoku, none
of this would have been an issue. Sigh… I wonder how many more glitches will
be caused by improper use of this one function in the last 18% of
REIIDEN.EXE.
There's even another bug here, with ZUN hardcoding a horizontal delta of 8
pixels rather than just passing the actual X velocity. Luckily, the maximum
movement speed is 6 pixels on Lunatic, and this would have only turned into
an additional observable glitch if the X velocity were to exceed 24 pixels.
But that just means it's the kind of bug that still drains RE attention to
prove that you can't actually observe it in-game under some
circumstances.
The 5 pellet patterns are all pretty straightforward, with nothing to talk
about. The code architecture during phase 2 does hint towards ZUN having had
more creative patterns in mind – especially for the male form, which uses
the transformation function's three pattern callback slots for three
repetitions of the same pellet group.
There is one more oddity to be found at the very end of the fight:
Right before the defeat white-out animation, the sphere form is explicitly
reblitted for no reason, on top of the form that was blitted to VRAM in the
previous frame, and regardless of which form is currently active. If
SinGyoku was meant to immediately transform back to the sphere form before
being defeated, why isn't the person form unblitted before then? Therefore,
the visibility of both forms is undeniably canon, and there is some
lore meaning to be found here…
In any case, that's SinGyoku done! 6th PC-98 Touhou boss fully
decompiled, 25 remaining.
No FUUIN.EXE code rounding out the last push for a change, as
the 📝 remaining missile code has been
waiting in front of SinGyoku for a while. It already looked bad in November,
but the angle-based sprite selection function definitely takes the cake when
it comes to unnecessary and decadent floating-point abuse in this game.
The algorithm itself is very trivial: Even with
📝 .PTN requiring an additional quarter parameter to access 16×16 sprites,
it's essentially just one bit shift, one addition, and one binary
AND. For whatever reason though, ZUN casts the 8-bit missile
angle into a 64-bit double, which turns the following explicit
comparisons (!) against all possible 4 + 16 boundary angles (!!)
into FPU operations. Even with naive and readable
division and modulo operations, and the whole existence of this function not
playing well with Turbo C++ 4.0J's terrible code generation at all, this
could have been 3 lines of code and 35 un-inlined constant-time
instructions. Instead, we've got this 207-instruction monster… but hey, at
least it works. 🤷
The remaining time then went to YuugenMagan's initialization code, which
allowed me to immediately remove more declarations from ASM land, but more
on that once we get to the rest of that boss fight.
That leaves 76 functions until we're done with TH01! Next up: Card-flipping
stage obstacles.
Slight change of plans, because we got instructions for
reliably reproducing the TH04 Kurumi Divide Error crash! Major thanks to
Colin Douglas Howell. With those, it also made sense to immediately look at
the crash in the Stage 4 Marisa fight as well. This way, I could release
both of the obligatory bugfix mods at the same time.
Especially since it turned out that I was wrong: Both crashes are entirely
unrelated to the custom entity structure that would have required PI-centric
progress. They are completely specific to Kurumi's and Marisa's
danmaku-pattern code, and really are two separate bugs
with no connection to each other. All of the necessary research nicely fit
into Arandui's 0.5 pushes, with no further deep understanding
required here.
But why were there still three weeks between Colin's message and this blog
post? DMCA distractions aside: There are no easy fixes this time, unlike
📝 back when I looked at the Stage 5 Yuuka crash.
Just like how division by zero is undefined in mathematics, it's also,
literally, undefined what should happen instead of these two
Divide error crashes. This means that any possible "fix" can
only ever be a fanfiction interpretation of the intentions behind ZUN's
code. The gameplay community should be aware of this, and
might decide to handle these cases differently. And if we
have to go into fanfiction territory to work around crashes in the
canon games, we'd better document what exactly we're fixing here and how, as
comprehensible as possible.
With that out of the way, let's look at Kurumi's crash first, since it's way
easier to grasp. This one is known to primarily happen to new players, and
it's easy to see why:
In one of the patterns in her third phase, Kurumi fires a series of 3
aimed rings from both edges of the playfield. By default (that is, on Normal
and with regular rank), these are 6-way rings.
6 happens to be quite a peculiar number here, due to how rings are
(manually) tuned based on the current "rank" value (playperf)
before being fired. The code, abbreviated for clarity:
Let's look at the range of possible playperf values per
difficulty level:
Easy
Normal
Hard
Lunatic
Extra
playperf_min
4
11
20
22
16
playperf_max
16
24
32
34
20
Edit (2022-05-24): This blog post initially had
26 instead of 16 for playperf_min for the Extra Stage. Thanks
to Popfan for pointing out that typo!
Reducing rank to its minimum on Easy mode will therefore result in a
0-ring after tuning.
To calculate the individual angles of each bullet in a ring, ZUN divides
360° (or, more correctly,
📝 0x100) by the total number of
bullets…
Boom, division by zero.
So, what should the workaround look like? Obviously, we want to modify
neither the default number of ring bullets nor the tuning algorithm – that
would change all other non-crashing variations of this pattern on other
difficulties and ranks, creating a fork of the original gameplay. Instead, I
came up with four possible workarounds that all seemed somewhat logical to
me:
Firing no bullet, i.e., interpreting 0-ring literally. This would
create the only constellation in which a call to the bullet group spawn
functions would not spawn at least one new bullet.
Firing a "1-ring", i.e., a single bullet. This would be consistent with
how the bullet spawn functions behave for "0-way" stack and spread
groups.
Firing a "∞-ring", i.e., 200 bullets, which is as much as the game's cap
on 16×16 bullets would allow. This would poke fun at the whole "division by
zero" idea… but given that we're still talking about Easy Mode (and
especially new players) here, it might be a tad too cruel. Certainly the
most trollish interpretation.
Triggering an immediate Game Over, exchanging the hard crash for a
softer and more controlled shutdown. Certainly the option that would be
closest to the behavior of the original games, and perhaps the only one to
be accepted in Serious, High-Level Play™.
As I was writing this post, it felt increasingly wrong for me to make this
decision. So I once again went to Twitter, where 56.3%
voted in favor of the 1-bullet option. Good that I asked! I myself was
more leaning towards the 0-bullet interpretation, which only got 28.7% of
the vote. Also interesting are the 2.3% in favor of the Game Over option but
I get it, low-rank Easy Mode isn't exactly the most competitive mode of
playing TH04.
There are reports of Kurumi crashing on higher difficulties as well, but I
could verify none of them. If they aren't fixed by this workaround, they're
caused by an entirely different bug that we have yet to discover.
Onto the Stage 4 Marisa crash then, which does in fact apply to all
difficulty levels. I was also wrong on this one – it's a hell of a lot more
intricate than being just a division by the number of on-screen bits.
Without having decompiled the entire fight, I can't give a completely
accurate picture of what happens there yet, but here's the rough idea:
Marisa uses different patterns, depending on whether at least one of her
bits is still alive, or all of them have been destroyed.
Destroying the last bit will immediately switch to the bit-less
counterpart of the current pattern.
The bits won't respawn before the pattern ended, which ensures that the
bit-less version is always shown in its entirety after being started or
switched into.
In two of the bit-less patterns, Marisa gradually moves to the point
reflection of her position at the start of the pattern across the playfield
coordinate of (192, 112), or (224, 128) on screen.
The velocity of this movement is determined by both her distance to that
point and the total amount of frames that this instance of the bit-less
pattern will last.
Since this frame amount is directly tied to the frame the player
destroyed the last bit on, it becomes a user-controlled variable. I think
you can see where this is going…
The last 12 frames of this duration, however, are always reserved for a
"braking phase", where Marisa's velocity is halved on each frame.
This part of the code only runs every 4 frames though. This expands the
time window for this crash to 4 frames, rather than just the two frames you
would expect from looking at the division itself.
Both of the broken patterns run for a maximum of 160 frames. Therefore,
the crash will occur when Marisa's last bit is destroyed between frame 152
and 155 inclusive. On these frames, the
last_frame_with_bits_alive variable is set to 148, which is the
crucial 12 duration frames away from the maximum of 160.
Interestingly enough, the calculated velocity is also only
applied every 4 frames, with Marisa actually staying still for the 3 frames
inbetween. As a result, she either moves
too slowly to ever actually reach the yellow point if the last bit
was destroyed early in the pattern (see destruction frames 68 or
112),
or way too quickly, and almost in a jerky, teleporting way (see
destruction frames 144 or 148).
Finally, as you may have already gathered from the formula: Destroying
the last bit between frame 156 and 160 inclusive results in
duration values of 8 or 4. These actually push Marisa
away from the intended point, as the divisor becomes negative.
tl;dr: "Game crashes if last bit destroyed within 4-frame window near end of
two patterns". For an informed decision on a new movement behavior for these
last 8 frames, we definitely need to know all the details behind the crash
though. Here's what I would interpret into the code:
Not moving at all, i.e., interpreting 0 as the middle ground between
positive and negative movement. This would also make sense because a
12-frame duration implies 100% of the movement to consist of
the braking phase – and Marisa wasn't moving before, after all.
Move at maximum speed, i.e., dividing by 1 rather than 0. Since the
movement duration is still 12 in this case, Marisa will immediately start
braking. In total, she will move exactly ¾ of the way from her initial
position to (192, 112) within the 8 frames before the pattern
ends.
Directly warping to (192, 112) on frame 0, and to the
point-reflected target on 4, respectively. This "emulates" the division by
zero by moving Marisa at infinite speed to the exact two points indicated by
the velocity formula. It also fits nicely into the 8 frames we have to fill
here. Sure, Marisa can't reach these points at any other duration, but why
shouldn't she be able to, with infinite speed? Then again, if Marisa
is far away enough from (192, 112), this workaround would warp her
across the entire playfield. Can Marisa teleport according to lore? I
have no idea…
Triggering an immediate Game O– hell no, this is the Stage 4 boss,
people already hate losing runs to this bug!
Asking Twitter worked great for the Kurumi workaround, so let's do it again!
Gotta attach a screenshot of an earlier draft of this blog post though,
since this stuff is impossible to explain in tweets…
…and it went
through the roof, becoming the most successful ReC98 tweet so far?!
Apparently, y'all really like to just look at descriptions of overly complex
bugs that I'd consider way beyond the typical attention span that can be
expected from Twitter. Unfortunately, all those tweet impressions didn't
quite translate into poll turnout. The results
were pretty evenly split between 1) and 2), with option 1) just coming out
slightly ahead at 49.1%, compared to 41.5% of option 2).
(And yes, I only noticed after creating the poll that warping to both the
green and yellow points made more sense than warping to just one of the two.
Let's hope that this additional variant wouldn't have shifted the results
too much. Both warp options only got 9.4% of the vote after all, and no one
else came up with the idea either. In the end,
you can always merge together your preferred combination of workarounds from
the Git branches linked below.)
So here you go: The new definitive version of TH04, containing not only the
community-chosen Kurumi and Stage 4 Marisa workaround variant, but also the
📝 No-EMS bugfix from last year.
Edit (2022-05-31): This package is outdated, 📝 the current version is here!2022-04-18-community-choice-fixes.zip
Oh, and let's also add spaztron64's TH03 GDC clock fix
from 2019 because why not. This binary was built from the community_choice_fixes
branch, and you can find the code for all the individual workarounds on
these branches:
Again, because it can't be stated often enough: These fixes are
fanfiction. The gameplay community should be aware of
this, and might decide to handle these cases differently.
With all of that taking way more time to evaluate and document, this
research really had to become part of a proper push, instead of just being
covered in the quick non-push blog post I initially intended. With ½ of a
push left at the end, TH05's Stage 1-5 boss background rendering functions
fit in perfectly there. If you wonder how these static backdrop images even
need any boss-specific code to begin with, you're right – it's basically the
same function copy-pasted 4 times, differing only in the backdrop image
coordinates and some other inconsequential details.
Only Sara receives a nice variation of the typical
📝 blocky entrance animation: The usually
opaque bitmap data from ST00.BB is instead used as a transition
mask from stage tiles to the backdrop image, by making clever use of the
tile invalidation system:
TH04 uses the same effect a bit more frequently, for its first three bosses.
Next up: Shinki, for real this time! I've already managed to decompile 10 of
her 11 danmaku patterns within a little more than one push – and yes,
that one is included in there. Looks like I've slightly
overestimated the amount of work required for TH04's and TH05's bosses…
OK, TH01 missile bullets. Can we maybe have a well-behaved entity type,
without any weirdness? Just once?
Ehh, kinda. Apart from another 150 bytes wasted on unused structure members,
this code is indeed more on the low end in terms of overall jank. It does
become very obvious why dodging these missiles in the YuugenMagan, Mima, and
Elis fights feels so awful though: An unfair 46×46 pixel hitbox around
Reimu's center pixel, combined with the comeback of
📝 interlaced rendering, this time in every
stage. ZUN probably did this because missiles are the only 16×16 sprite in
TH01 that is blitted to unaligned X positions, which effectively ends up
touching a 32×16 area of VRAM per sprite.
But even if we assume VRAM writes to be the bottleneck here, it would
have been totally possible to render every missile in every frame at roughly
the same amount of CPU time that the original game uses for interlaced
rendering:
Note that all missile sprites only use two colors, white and green.
Instead of naively going with the usual four bitplanes, extract the
pixels drawn in each of the two used colors into their own bitplanes.
master.lib calls this the "tiny format".
Use the GRCG to draw these two bitplanes in the intended white and green
colors, halving the amount of VRAM writes compared to the original
function.
(Not using the .PTN format would have also avoided the inconsistency of
storing the missile sprites in boss-specific sprite slots.)
That's an optimization that would have significantly benefitted the game, in
contrast to all of the fake ones
introduced in later games. Then again, this optimization is
actually something that the later games do, and it might have in fact been
necessary to achieve their higher bullet counts without significant
slowdown.
After some effectively unused Mima sprite effect code that is so broken that
it's impossible to make sense out of it, we get to the final feature I
wanted to cover for all bosses in parallel before returning to Sariel: The
separate sprite background storage for moving or animated boss sprites in
the Mima, Elis, and Sariel fights. But, uh… why is this necessary to begin
with? Doesn't TH01 already reserve the other VRAM page for backgrounds?
Well, these sprites are quite big, and ZUN didn't want to blit them from
main memory on every frame. After all, TH01 and TH02 had a minimum required
clock speed of 33 MHz, half of the speed required for the later three games.
So, he simply blitted these boss sprites to both VRAM pages, leading
the usual unblitting calls to only remove the other sprites on top of the
boss. However, these bosses themselves want to move across the screen…
and this makes it necessary to save the stage background behind them
in some other way.
Enter .PTN, and its functions to capture a 16×16 or 32×32 square from VRAM
into a sprite slot. No problem with that approach in theory, as the size of
all these bigger sprites is a multiple of 32×32; splitting a larger sprite
into these smaller 32×32 chunks makes the code look just a little bit clumsy
(and, of course, slower).
But somewhere during the development of Mima's fight, ZUN apparently forgot
that those sprite backgrounds existed. And once Mima's 🚫 casting sprite is
blitted on top of her regular sprite, using just regular sprite
transparency, she ends up with her infamous third arm:
Ironically, there's an unused code path in Mima's unblit function where ZUN
assumes a height of 48 pixels for Mima's animation sprites rather than the
actual 64. This leads to even clumsier .PTN function calls for the bottom
128×16 pixels… Failing to unblit the bottom 16 pixels would have also
yielded that third arm, although it wouldn't have looked as natural. Still
wouldn't say that it was intentional; maybe this casting sprite was just
added pretty late in the game's development?
So, mission accomplished, Sariel unblocked… at 2¼ pushes. That's quite some time left for some smaller stage initialization
code, which bundles a bunch of random function calls in places where they
logically really don't belong. The stage opening animation then adds a bunch
of VRAM inter-page copies that are not only redundant but can't even be
understood without knowing the hidden internal state of the last VRAM page
accessed by previous ZUN code…
In better news though: Turbo C++ 4.0 really doesn't seem to have any
complexity limit on inlining arithmetic expressions, as long as they only
operate on compile-time constants. That's how we get macro-free,
compile-time Shift-JIS to JIS X 0208 conversion of the individual code
points in the 東方★靈異伝 string, in a compiler from 1994. As long as you
don't store any intermediate results in variables, that is…
But wait, there's more! With still ¼ of a push left, I also went for the
boss defeat animation, which includes the route selection after the SinGyoku
fight.
As in all other instances, the 2× scaled font is accomplished by first
rendering the text at regular 1× resolution to the other, invisible VRAM
page, and then scaled from there to the visible one. However, the route
selection is unique in that its scaled text is both drawn transparently on
top of the stage background (not onto a black one), and can also change
colors depending on the selection. It would have been no problem to unblit
and reblit the text by rendering the 1× version to a position on the
invisible VRAM page that isn't covered by the 2× version on the visible one,
but ZUN (needlessly) clears the invisible page before rendering any text.
Instead, he assigned a separate VRAM color for both
the 魔界 and 地獄 options, and only changed the palette value for
these colors to white or gray, depending on the correct selection. This is
another one of the
📝 rare cases where TH01 demonstrates good use of PC-98 hardware,
as the 魔界へ and 地獄へ strings don't need to be reblitted during the selection process, only the Orb "cursor" does.
Then, why does this still not count as good-code? When
changing palette colors, you kinda need to be aware of everything
else that can possibly be on screen, which colors are used there, and which
aren't and can therefore be used for such an effect without affecting other
sprites. In this case, well… hover over the image below, and notice how
Reimu's hair and the bomb sprites in the HUD light up when Makai is
selected:
This push did end on a high note though, with the generic, non-SinGyoku
version of the defeat animation being an easily parametrizable copy. And
that's how you decompile another 2.58% of TH01 in just slightly over three
pushes.
Now, we're not only ready to decompile Sariel, but also Kikuri, Elis, and
SinGyoku without needing any more detours into non-boss code. Thanks to the
current TH01 funding subscriptions, I can plan to cover most, if not all, of
Sariel in a single push series, but the currently 3 pending pushes probably
won't suffice for Sariel's 8.10% of all remaining code in TH01. We've got
quite a lot of not specifically TH01-related funds in the backlog to pass
the time though.
Due to recent developments, it actually makes quite a lot of sense to take a
break from TH01: spaztron64 has
managed what every Touhou download site so far has failed to do: Bundling
all 5 game onto a single .HDI together with pre-configured PC-98
emulators and a nice boot menu, and hosting the resulting package on a
proper website. While this first release is already quite good (and much
better than my attempt from 2014), there is still a bit of room for
improvement to be gained from specific ReC98 research. Next up,
therefore:
Researching how TH04 and TH05 use EMS memory, together with the cause
behind TH04's crash in Stage 5 when playing as Reimu without an EMS driver
loaded, and
reverse-engineering TH03's score data file format
(YUME.NEM), which hopefully also comes with a way of building a
file that unlocks all characters without any high scores.
No technical obstacles for once! Just pure overcomplicated ZUN code. Unlike
📝 Konngara's main function, the main TH01
player function was every bit as difficult to decompile as you would expect
from its size.
With TH01 using both separate left- and right-facing sprites for all of
Reimu's moves and separate classes for Reimu's 32×32 and 48×*
sprites, we're already off to a bad start. Sure, sprite mirroring is
minimally more involved on PC-98, as the planar
nature of VRAM requires the bits within an 8-pixel byte to also be
mirrored, in addition to writing the sprite bytes from right to left. TH03
uses a 256-byte lookup table for this, generated at runtime by an infamous
micro-optimized and undecompilable ASM algorithm. With TH01's existing
architecture, ZUN would have then needed to write 3 additional blitting
functions. But instead, he chose to waste a total of 26,112 bytes of memory
on pre-mirrored sprites…
Alright, but surely selecting those sprites from code is no big deal? Just
store the direction Reimu is facing in, and then add some branches to the
rendering code. And there is in fact a variable for Reimu's direction…
during regular arrow-key movement, and another one while shooting and
sliding, and a third as part of the special attack types,
launched out of a slide.
Well, OK, technically, the last two are the same variable. But that's even
worse, because it means that ZUN stores two distinct enums at
the same place in memory: Shooting and sliding uses 1 for left,
2 for right, and 3 for the "invalid" direction of
holding both, while the special attack types indicate the direction in their
lowest bit, with 0 for right and 1 for left. I
decompiled the latter as bitflags, but in ZUN's code, each of the 8
permutations is handled as a distinct type, with copy-pasted and adapted
code… The interpretation of this
two-enum "sub-mode" union variable is controlled
by yet another "mode" variable… and unsurprisingly, two of the bugs in this
function relate to the sub-mode variable being interpreted incorrectly.
Also, "rendering code"? This one big function basically consists of separate
unblit→update→render code snippets for every state and direction Reimu can
be in (moving, shooting, swinging, sliding, special-attacking, and bombing),
pasted together into a tangled mess of nested if(…) statements.
While a lot of the code is copy-pasted, there are still a number of
inconsistencies that defeat the point of my usual refactoring treatment.
After all, with a total of 85 conditional branches, anything more than I did
would have just obscured the control flow too badly, making it even harder
to understand what's going on.
In the end, I spotted a total of 8 bugs in this function, all of which leave
Reimu invisible for one or more frames:
2 frames after all special attacks
2 frames after swing attacks, and
4 frames before swing attacks
Thanks to the last one, Reimu's first swing animation frame is never
actually rendered. So whenever someone complains about TH01 sprite
flickering on an emulator: That emulator is accurate, it's the game that's
poorly written.
And guess what, this function doesn't even contain everything you'd
associate with per-frame player behavior. While it does
handle Yin-Yang Orb repulsion as part of slides and special attacks, it does
not handle the actual player/Orb collision that results in lives being lost.
The funny thing about this: These two things are done in the same function…
Therefore, the life loss animation is also part of another function. This is
where we find the final glitch in this 3-push series: Before the 16-frame
shake, this function only unblits a 32×32 area around Reimu's center point,
even though it's possible to lose a life during the non-deflecting part of a
48×48-pixel animation. In that case, the extra pixels will just stay on
screen during the shake. They are unblitted afterwards though, which
suggests that ZUN was at least somewhat aware of the issue?
Finally, the chance to see the alternate life loss sprite is exactly ⅛.
As for any new insights into game mechanics… you know what? I'm just not
going to write anything, and leave you with this flowchart instead. Here's
the definitive guide on how to control Reimu in TH01 we've been waiting for
24 years:
Pellets are deflected during all gray
states. Not shown is the obvious "double-tap Z and X" transition from
all non-(#1) states to the Bomb state, but that would have made this
diagram even more unwieldy than it turned out. And yes, you can shoot
twice as fast while moving left or right.
While I'm at it, here are two more animations from MIKO.PTN
which aren't referenced by any code:
With that monster of a function taken care of, we've only got boss sprite animation as the final blocker of uninterrupted Sariel progress. Due to some unfavorable code layout in the Mima segment though, I'll need to spend a bit more time with some of the features used there. Next up: The missile bullets used in the Mima and YuugenMagan fights.
Nothing really noteworthy in TH01's stage timer code, just yet another HUD
element that is needlessly drawn into VRAM. Sure, ZUN applies his custom
boldfacing effect on top of the glyphs retrieved from font ROM, but he could
have easily installed those modified glyphs as gaiji.
Well, OK, halfwidth gaiji aren't exactly well documented, and sometimes not
even correctly emulated
📝 due to the same PC-98 hardware oddity I was researching last month.
I've reserved two of the pending anonymous "anything" pushes for the
conclusion of this research, just in case you were wondering why the
outstanding workload is now lower after the two delivered here.
And since it doesn't seem to be clearly documented elsewhere: Every 2 ticks
on the stage timer correspond to 4 frames.
So, TH01 rank pellet speed. The resident pellet speed value is a
factor ranging from a minimum of -0.375 up to a maximum of 0.5 (pixels per
frame), multiplied with the difficulty-adjusted base speed for each pellet
and added on top of that same speed. This multiplier is modified
every time the stage timer reaches 0 and
HARRY UP is shown (+0.05)
for every score-based extra life granted below the maximum number of
lives (+0.025)
every time a bomb is used (+0.025)
on every frame in which the rand value (shown in debug
mode) is evenly divisible by
(1800 - (lives × 200) - (bombs × 50)) (+0.025)
every time Reimu got hit (set to 0 if higher, then -0.05)
when using a continue (set to -0.05 if higher, then -0.125)
Apparently, ZUN noted that these deltas couldn't be losslessly stored in an
IEEE 754 floating-point variable, and therefore didn't store the pellet
speed factor exactly in a way that would correspond to its gameplay effect.
Instead, it's stored similar to Q12.4 subpixels: as a simple integer,
pre-multiplied by 40. This results in a raw range of -15 to 20, which is
what the undecompiled ASM calls still use. When spawning a new pellet, its
base speed is first multiplied by that factor, and then divided by 40 again.
This is actually quite smart: The calculation doesn't need to be aware of
either Q12.4 or the 40× format, as
((Q12.4 * factor×40) / factor×40) still comes out as a
Q12.4 subpixel even if all numbers are integers. The only limiting issue
here would be the potential overflow of the 16-bit multiplication at
unadjusted base speeds of more than 50 pixels per frame, but that'd be
seriously unplayable.
So yeah, pellet speed modifications are indeed gradual, and don't just fall
into the coarse three "high, normal, and low" categories.
That's ⅝ of P0160 done, and the continue and pause menus would make good
candidates to fill up the remaining ⅜… except that it seemed impossible to
figure out the correct compiler options for this code?
The issues centered around the two effects of Turbo C++ 4.0J's
-O switch:
Optimizing jump instructions: merging duplicate successive jumps into a
single one, and merging duplicated instructions at the end of conditional
branches into a single place under a single branch, which the other branches
then jump to
Compressing ADD SP and POP CX
stack-clearing instructions after multiple successive CALLs to
__cdecl functions into a single ADD SP with the
combined parameter stack size of all function calls
But how can the ASM for these functions exhibit #1 but not #2? How
can it be seemingly optimized and unoptimized at the same time? The
only option that gets somewhat close would be -O- -y, which
emits line number information into the .OBJ files for debugging. This
combination provides its own kind of #1, but these functions clearly need
the real deal.
The research into this issue ended up consuming a full push on its own.
In the end, this solution turned out to be completely unrelated to compiler
options, and instead came from the effects of a compiler bug in a totally
different place. Initializing a local structure instance or array like
const uint4_t flash_colors[3] = { 3, 4, 5 };
always emits the { 3, 4, 5 } array into the program's data
segment, and then generates a call to the internal SCOPY@
function which copies this data array to the local variable on the stack.
And as soon as this SCOPY@ call is emitted, the -O
optimization #1 is disabled for the entire rest of the translation
unit?!
So, any code segment with an SCOPY@ call followed by
__cdecl functions must strictly be decompiled from top to
bottom, mirroring the original layout of translation units. That means no
TH01 continue and pause menus before we haven't decompiled the bomb
animation, which contains such an SCOPY@ call. 😕
Luckily, TH01 is the only game where this bug leads to significant
restrictions in decompilation order, as later games predominantly use the
pascal calling convention, in which each function itself clears
its stack as part of its RET instruction.
What now, then? With 51% of REIIDEN.EXE decompiled, we're
slowly running out of small features that can be decompiled within ⅜ of a
push. Good that I haven't been looking a lot into OP.EXE and
FUUIN.EXE, which pretty much only got easy pieces of
code left to do. Maybe I'll end up finishing their decompilations entirely
within these smaller gaps? I still ended up finding one more small
piece in REIIDEN.EXE though: The particle system, seen in the
Mima fight.
I like how everything about this animation is contained within a single
function that is called once per frame, but ZUN could have really
consolidated the spawning code for new particles a bit. In Mima's fight,
particles are only spawned from the top and right edges of the screen, but
the function in fact contains unused code for all other 7 possible
directions, written in quite a bloated manner. This wouldn't feel quite as
unused if ZUN had used an angle parameter instead…
Also, why unnecessarily waste another 40 bytes of
the BSS segment?
But wait, what's going on with the very first spawned particle that just
stops near the bottom edge of the screen in the video above? Well, even in
such a simple and self-contained function, ZUN managed to include an
off-by-one error. This one then results in an out-of-bounds array access on
the 80th frame, where the code attempts to spawn a 41st
particle. If the first particle was unlucky to be both slow enough and
spawned away far enough from the bottom and right edges, the spawning code
will then kill it off before its unblitting code gets to run, leaving its
pixel on the screen until something else overlaps it and causes it to be
unblitted.
Which, during regular gameplay, will quickly happen with the Orb, all the
pellets flying around, and your own player movement. Also, the RNG can
easily spawn this particle at a position and velocity that causes it to
leave the screen more quickly. Kind of impressive how ZUN laid out the
structure
of arrays in a way that ensured practically no effect of this bug on the
game; this glitch could have easily happened every 80 frames instead.
He almost got close to all bugs canceling out each other here!
Next up: The player control functions, including the second-biggest function
in all of PC-98 Touhou.
Of course, Sariel's potentially bloated and copy-pasted code is blocked by
even more definitely bloated and copy-pasted code. It's TH01, what did you
expect?
But even then, TH01's item code is on a new level of software architecture
ridiculousness. First, ZUN uses distinct arrays for both types of items,
with their own caps of 4 for bomb items, and 10 for point items. Since that
obviously makes any type-related switch statement redundant,
he also used distinct functions for both types, with copy-pasted
boilerplate code. The main per-item update and render function is
shared though… and takes every single accessed member of the item
structure as its own reference parameter. Like, why, you have a
structure, right there?! That's one way to really practice the C++ language
concept of passing arbitrary structure fields by mutable reference…
To complete the unwarranted grand generic design of this function, it calls
back into per-type collision detection, drop, and collect functions with
another three reference parameters. Yeah, why use C++ virtual methods when
you can also implement the effectively same polymorphism functionality by
hand? Oh, and the coordinate clamping code in one of these callbacks could
only possibly have come from nested min() and
max() preprocessor macros. And that's how you extend such
dead-simple functionality to 1¼ pushes…
Amidst all this jank, we've at least got a sensible item↔player hitbox this
time, with 24 pixels around Reimu's center point to the left and right, and
extending from 24 pixels above Reimu down to the bottom of the playfield.
It absolutely didn't look like that from the initial naive decompilation
though. Changing entity coordinates from left/top to center was one of the
better lessons from TH01 that ZUN implemented in later games, it really
makes collision detection code much more intuitive to grasp.
The card flip code is where we find out some slightly more interesting
aspects about item drops in this game, and how they're controlled by a
hidden cycle variable:
At the beginning of every 5-stage scene, this variable is set to a
random value in the [0..59] range
Point items are dropped at every multiple of 10
Every card flip adds 1 to its value after this mod 10
check
At a value of 140, the point item is replaced with a bomb item, but only
if no damaging bomb is active. In any case, its value is then reset to
1.
Then again, score players largely ignore point items anyway, as card
combos simply have a much bigger effect on the score. With this, I should
have RE'd all information necessary to construct a tool-assisted score run,
though? Edit: Turns out that 1) point items are becoming
increasingly important in score runs, and 2) Pearl already did a TAS some
months ago. Thanks to
spaztron64 for the info!
The Orb↔card hitbox also makes perfect sense, with 24 pixels around
the center point of a card in every direction.
The rest of the code confirms the
card
flip score formula documented on Touhou Wiki, as well as the way cards
are flipped by bombs: During every of the 90 "damaging" frames of the
140-frame bomb animation, there is a 75% chance to flip the card at the
[bomb_frame % total_card_count_in_stage] array index. Since
stages can only have up to 50 cards
📝 thanks to a bug, even a 75% chance is high
enough to typically flip most cards during a bomb. Each of these flips
still only removes a single card HP, just like after a regular collision
with the Orb.
Also, why are the card score popups rendered before the cards
themselves? That's two needless frames of flicker during that 25-frame
animation. Not all too noticeable, but still.
And that's over 50% of REIIDEN.EXE decompiled as well! Next
up: More HUD update and rendering code… with a direct dependency on
rank pellet speed modifications?
…or maybe not that soon, as it would have only wasted time to
untangle the bullet update commits from the rest of the progress. So,
here's all the bullet spawning code in TH04 and TH05 instead. I hope
you're ready for this, there's a lot to talk about!
(For the sake of readability, "bullets" in this blog post refers to the
white 8×8 pellets
and all 16×16 bullets loaded from MIKO16.BFT, nothing else.)
But first, what was going on📝 in 2020? Spent 4 pushes on the basic types
and constants back then, still ended up confusing a couple of things, and
even getting some wrong. Like how TH05's "bullet slowdown" flag actually
always prevents slowdown and fires bullets at a constant speed
instead. Or how "random spread" is not the
best term to describe that unused bullet group type in TH04.
Or that there are two distinct ways of clearing all bullets on screen,
which deserve different names:
Bullets are zapped at the end of most midboss and boss phases, and
cleared everywhere else – most notably, during bombs, when losing a
life, or as rewards for extends or a maximized Dream bonus. The
Bonus!! points awarded for zapping bullets are calculated iteratively,
so it's not trivial to give an exact formula for these. For a small number
𝑛 of bullets, it would exactly be 5𝑛³ - 10𝑛² + 15𝑛
points – or, using uth05win's (correct) recursive definition,
Bonus(𝑛) = Bonus(𝑛-1) + 15𝑛² - 5𝑛 + 10.
However, one of the internal step variables is capped at a different number
of points for each difficulty (and game), after which the points only
increase linearly. Hence, "semi-exponential".
On to TH04's bullet spawn code then, because that one can at least be
decompiled. And immediately, we have to deal with a pointless distinction
between regular bullets, with either a decelerating or constant
velocity, and special bullets, with preset velocity changes during
their lifetime. That preset has to be set somewhere, so why have
separate functions? In TH04, this separation continues even down to the
lowest level of functions, where values are written into the global bullet
array. TH05 merges those two functions into one, but then goes too far and
uses self-modifying code to save a grand total of two local variables…
Luckily, the rest of its actual code is identical to TH04.
Most of the complexity in bullet spawning comes from the (thankfully
shared) helper function that calculates the velocities of the individual
bullets within a group. Both games handle each group type via a large
switch statement, which is where TH04 shows off another Turbo
C++ 4.0 optimization: If the range of case values is too
sparse to be meaningfully expressed in a jump table, it usually generates a
linear search through a second value table. But with the -G
command-line option, it instead generates branching code for a binary
search through the set of cases. 𝑂(log 𝑛) as the worst case for a
switch statement in a C++ compiler from 1994… that's so cool.
But still, why are the values in TH04's group type enum all
over the place to begin with?
Unfortunately, this optimization is pretty rare in PC-98 Touhou. It only
shows up here and in a few places in TH02, compared to at least 50
switch value tables.
In all of its micro-optimized pointlessness, TH05's undecompilable version
at least fixes some of TH04's redundancy. While it's still not even
optimal, it's at least a decently written piece of ASM…
if you take the time to understand what's going on there, because it
certainly took quite a bit of that to verify that all of the things which
looked like bugs or quirks were in fact correct. And that's how the code
for this function ended up with 35% comments and blank lines before I could
confidently call it "reverse-engineered"…
Oh well, at least it finally fixes a correctness issue from TH01 and TH04,
where an invalid bullet group type would fill all remaining slots in the
bullet array with identical versions of the first bullet.
Something that both games also share in these functions is an over-reliance
on globals for return values or other local state. The most ridiculous
example here: Tuning the speed of a bullet based on rank actually mutates
the global bullet template… which ZUN then works around by adding a wrapper
function around both regular and special bullet spawning, which saves the
base speed before executing that function, and restores it afterward.
Add another set of wrappers to bypass that exact
tuning, and you've expanded your nice 1-function interface to 4 functions.
Oh, and did I mention that TH04 pointlessly duplicates the first set of
wrapper functions for 3 of the 4 difficulties, which can't even be
explained with "debugging reasons"? That's 10 functions then… and probably
explains why I've procrastinated this feature for so long.
At this point, I also finally stopped decompiling ZUN's original ASM just
for the sake of it. All these small TH05 functions would look horribly
unidiomatic, are identical to their decompiled TH04 counterparts anyway,
except for some unique constant… and, in the case of TH05's rank-based
speed tuning function, actually become undecompilable as soon as we
want to return a C++ class to preserve the semantic meaning of the return
value. Mainly, this is because Turbo C++ does not allow register
pseudo-variables like _AX or _AL to be cast into
class types, even if their size matches. Decompiling that function would
have therefore lowered the quality of the rest of the decompiled code, in
exchange for the additional maintenance and compile-time cost of another
translation unit. Not worth it – and for a TH05 port, you'd already have to
decompile all the rest of the bullet spawning code anyway!
The only thing in there that was still somewhat worth being
decompiled was the pre-spawn clipping and collision detection function. Due
to what's probably a micro-optimization mistake, the TH05 version continues
to spawn a bullet even if it was spawned on top of the player. This might
sound like it has a different effect on gameplay… until you realize that
the player got hit in this case and will either lose a life or deathbomb,
both of which will cause all on-screen bullets to be cleared anyway.
So it's at most a visual glitch.
But while we're at it, can we please stop talking about hitboxes? At least
in the context of TH04 and TH05 bullets. The actual collision detection is
described way better as a kill delta of 8×8 pixels between the
center points of the player and a bullet. You can distribute these pixels
to any combination of bullet and player "hitboxes" that make up 8×8. 4×4
around both the player and bullets? 1×1 for bullets, and 8×8 for the
player? All equally valid… or perhaps none of them, once you keep in mind
that other entity types might have different kill deltas. With that in
mind, the concept of a "hitbox" turns into just a confusing abstraction.
The same is true for the 36×44 graze box delta. For some reason,
this one is not exactly around the center of a bullet, but shifted to the
right by 2 pixels. So, a bullet can be grazed up to 20 pixels right of the
player, but only up to 16 pixels left of the player. uth05win also spotted
this… and rotated the deltas clockwise by 90°?!
Which brings us to the bullet updates… for which I still had to
research a decompilation workaround, because
📝 P0148 turned out to not help at all?
Instead, the solution was to lie to the compiler about the true segment
distance of the popup function and declare its signature far
rather than near. This allowed ZUN to save that ridiculous overhead of 1 additional far function
call/return per frame, and those precious 2 bytes in the BSS segment
that he didn't have to spend on a segment value.
📝 Another function that didn't have just a
single declaration in a common header file… really,
📝 how were these games even built???
The function itself is among the longer ones in both games. It especially
stands out in the indentation department, with 7 levels at its most
indented point – and that's the minimum of what's possible without
goto. Only two more notable discoveries there:
Bullets are the only entity affected by Slow Mode. If the number of
bullets on screen is ≥ (24 + (difficulty * 8) + rank) in TH04,
or (42 + (difficulty * 8)) in TH05, Slow Mode reduces the frame
rate by 33%, by waiting for one additional VSync event every two frames.
The code also reveals a second tier, with 50% slowdown for a slightly
higher number of bullets, but that conditional branch can never be executed
Bullets must have been grazed in a previous frame before they can
be collided with. (Note how this does not apply to bullets that spawned
on top of the player, as explained earlier!)
Whew… When did ReC98 turn into a full-on code review?! 😅 And after all
this, we're still not done with TH04 and TH05 bullets, with all the
special movement types still missing. That should be less than one push
though, once we get to it. Next up: Back to TH01 and Konngara! Now have fun
rewriting the Touhou Wiki Gameplay pages 😛
Only one newly ordered push since I've reopened the store? Great, that's
all the justification I needed for the extended maintenance delay that was
part of these two pushes 😛
Having to write comments to explain whether coordinates are relative to
the top-left corner of the screen or the top-left corner of the playfield
has finally become old. So, I introduced
distinct
types for all the coordinate systems we typically encounter, applying
them to all code decompiled so far. Note how the planar nature of PC-98
VRAM meant that X and Y coordinates also had to be different from each
other. On the X side, there's mainly the distinction between the
[0; 640] screen space and the corresponding [0; 80] VRAM byte
space. On the Y side, we also have the [0; 400] screen space, but
the visible area of VRAM might be limited to [0; 200] when running in
the PC-98's line-doubled 640×200 mode. A VRAM Y coordinate also always
implies an added offset for vertical scrolling.
During all of the code reconstruction, these types can only have a
documenting purpose. Turning them into anything more than just
typedefs to int, in order to define conversion
operators between them, simply won't recompile into identical binaries.
Modding and porting projects, however, now have a nice foundation for
doing just that, and can entirely lift coordinate system transformations
into the type system, without having to proofread all the meaningless
int declarations themselves.
So, what was left in terms of memory references? EX-Alice's fire waves
were our final unknown entity that can collide with the player. Decently
implemented, with little to say about them.
That left the bomb animation structures as the one big remaining PI
blocker. They started out nice and simple in TH04, with a small 6-byte
star animation structure used for both Reimu and Marisa. TH05, however,
gave each character her own animation… and what the hell is going
on with Reimu's blue stars there? Nope, not going to figure this out on
ASM level.
A decompilation first required some more bomb-related variables to be
named though. Since this was part of a generic RE push, it made sense to
do this in all 5 games… which then led to nice PI gains in anything
but TH05. Most notably, we now got the
"pulling all items to player" flag in TH04 and TH05, which is
actually separate from bombing. The obvious cheat mod is left as an
exercise to the reader.
So, TH05 bomb animations. Just like the
📝 custom entity types of this game, all 4
characters share the same memory, with the superficially same 10-byte
structure.
But let's just look at the very first field. Seen from a low level, it's a
simple struct { int x, y; } pos, storing the current position
of the character-specific bomb animation entity. But all 4 characters use
this field differently:
For Reimu's blue stars, it's the top-left position of each star, in the
12.4 fixed-point format. But unlike the vast majority of these values in
TH04 and TH05, it's relative to the top-left corner of the
screen, not the playfield. Much better represented as
struct { Subpixel screen_x, screen_y; } topleft.
For Marisa's lasers, it's the center of each circle, as a regular 12.4
fixed-point coordinate, relative to the top-left corner of the playfield.
Much better represented as
struct { Subpixel x, y; } center.
For Mima's shrinking circles, it's the center of each circle in regular
pixel coordinates. Much better represented as
struct { screen_x_t x; screen_y_t y; } center.
For Yuuka's spinning heart, it's the top-left corner in regular pixel
coordinates. Much better represented as
struct { screen_x_t x; screen_y_t y; } topleft.
And yes, singular. The game is actually smart enough to only store a single
heart, and then create the rest of the circle on the fly. (If it were even
smarter, it wouldn't even use this structure member, but oh well.)
Therefore, I decompiled it as 4 separate structures once again, bundled
into an union of arrays.
As for Reimu… yup, that's some pointer arithmetic straight out of
Jigoku* for setting and updating the positions of the falling star
trails. While that certainly required several
comments to wrap my head around the current array positions, the one "bug"
in all this arithmetic luckily has no effect on the game.
There is a small glitch with the growing circles, though. They are
spawned at the end of the loop, with their position taken from the star
pointer… but after that pointer has already been incremented. On
the last loop iteration, this leads to an out-of-bounds structure access,
with the position taken from some unknown EX-Alice data, which is 0 during
most of the game. If you look at the animation, you can easily spot these
bugged circles, consistently growing from the top-left corner (0, 0)
of the playfield:
After all that, there was barely enough remaining time to filter out and
label the final few memory references. But now, TH05's
MAIN.EXE is technically position-independent! 🎉
-Tom- is going to work on a pretty extensive demo of this
unprecedented level of efficient Touhou game modding. For a more impactful
effect of both the 100% PI mark and that demo, I'll be delaying the push
covering the remaining false positives in that binary until that demo is
done. I've accumulated a pretty huge backlog of minor maintenance issues
by now…
Next up though: The first part of the long-awaited build system
improvements. I've finally come up with a way of sanely accelerating the
32-bit build part on most setups you could possibly want to build ReC98
on, without making the building experience worse for the other few setups.
As expected, we've now got the TH04 and TH05 stage enemy structure,
finishing position independence for all big entity types. This one was
quite straightfoward, as the .STD scripting system is pretty simple.
Its most interesting aspect can be found in the way timing is handled. In
Windows Touhou, all .ECL script instructions come with a frame field that
defines when they are executed. In TH04's and TH05's .STD scripts, on the
other hand, it's up to each individual instruction to add a frame time
parameter, anywhere in its parameter list. This frame time defines for how
long this instruction should be repeatedly executed, before it manually
advances the instruction pointer to the next one. From what I've seen so
far, these instruction typically apply their effect on the first frame
they run on, and then do nothing for the remaining frames.
Oh, and you can't nest the LOOP instruction, since the enemy
structure only stores one single counter for the current loop iteration.
Just from the structure, the only innovation introduced by TH05 seems to
have been enemy subtypes. These can be used to parametrize scripts via
conditional jumps based on this value, as a first attempt at cutting down
the need to duplicate entire scripts for similar enemy behavior. And
thanks to TH05's favorable segment layout, this game's version of the
.STD enemy script interpreter is even immediately ready for decompilation,
in one single future push.
As far as I can tell, that now only leaves
.MPN file loading
player bomb animations
some structures specific to the Shinki and EX-Alice battles
plus some smaller things I've missed over the years
until TH05's MAIN.EXE is completely position-independent.
Which, however, won't be all it needs for that 100% PI rating on the front
page. And with that many false positives, it's quite easy to get lost with
immediately reverse-engineering everything around them. This time, the
rendering of the text dissolve circles, used for the stage and BGM title
popups, caught my eye… and since the high-level code to handle all of
that was near the end of a segment in both TH04 and TH05, I just decided
to immediately decompile it all. Like, how hard could it possibly be?
Sure, it needed another segment split, which was a bit harder due
to all the existing ASM referencing code in that segment, but certainly
not impossible…
Oh wait, this code depends on 9 other sets of identifiers that haven't
been declared in C land before, some of which require vast reorganizations
to bring them up to current consistency standards. Whoops! Good thing that
this is the part of the project I'm still offering for free…
Among the referenced functions was tiles_invalidate_around(),
which marks the stage background tiles within a rectangular area to be
redrawn this frame. And this one must have had the hardest function
signature to figure out in all of PC-98 Touhou, because it actually
seems impossible. Looking at all the ways the game passes the center
coordinate to this function, we have
X and Y as 16-bit integer literals, merged into a single
PUSH of a 32-bit immediate
X and Y calculated and pushed independently from each other
by-value copies of entire Point instances
Any single declaration would only lead to at most two of the three cases
generating the original instructions. No way around separately declaring
the function in every translation unit then, with the correct parameter
list for the respective calls. That's how ZUN must have also written it.
Oh well, we would have needed to do all of this some time. At least
there were quite a bit of insights to be gained from the actual
decompilation, where using const references actually made it
possible to turn quite a number of potentially ugly macros into wholesome
inline functions.
But still, TH04 and TH05 will come out of ReC98's decompilation as one big
mess. A lot of further manual decompilation and refactoring, beyond the
limits of the original binary, would be needed to make these games
portable to any non-PC-98, non-x86 architecture.
And yes, that includes IBM-compatible DOS – which, for some reason, a
number of people see as the obvious choice for a first system to port
PC-98 Touhou to. This will barely be easier. Sure, you'll save the effort
of decompiling all the remaining original ASM. But even with
master.lib's MASTER_DOSV setting, these games still very much
rely on PC-98 hardware, with corresponding assumptions all over ZUN's
code. You will need to provide abstractions for the PC-98's
superimposed text mode, the gaiji, and planar 4-bit color access in
general, exchanging the use of the PC-98's GRCG and EGC blitter chips with
something else. At that point, you might as well port the game to one
generic 640×400 framebuffer and away from the constraints of DOS,
resulting in that Doom source code-like situation which made that
game easily portable to every architecture to begin with. But ZUN just
wasn't a John Carmack, sorry.
Or what do I know. I've never programmed for IBM-compatible DOS, but maybe
ReC98's audience does include someone who is intimately familiar
with IBM-compatible DOS so that the constraints aren't much of an issue
for them? But even then, 16-bit Windows would make much more sense
as a first porting target if you don't want to bother with that
undecompilable ASM.
At least I won't have to look at TH04 and TH05 for quite a while now.
The delivery delays have made it obvious that
my life has become pretty busy again, probably until September. With a
total of 9 TH01 pushes from monthly subscriptions now waiting in the
backlog, the shop will stay closed until I've caught up with most of
these. Which I'm quite hyped for!
Alright, the score popup numbers shown when collecting items or defeating
(mid)bosses. The second-to-last remaining big entity type in TH05… with
quite some PI false positives in the memory range occupied by its data.
Good thing I still got some outstanding generic RE pushes that haven't
been claimed for anything more specific in over a month! These
conveniently allowed me to RE most of these functions right away, the
right way.
Most of the false positives were boss HP values, passed to a "boss phase
end" function which sets the HP value at which the next phase should end.
Stage 6 Yuuka, Mugetsu, and EX-Alice have their own copies of this
function, in which they also reset certain boss-specific global variables.
Since I always like to cover all varieties of such duplicated functions at
once, it made sense to reverse-engineer all the involved variables while I
was at it… and that's why this was exactly the right time to cover the
implementation details of Stage 6 Yuuka's parasol and vanishing animations
in TH04.
With still a bit of time left in that RE push afterwards, I could also
start looking into some of the smaller functions that didn't quite fit
into other pushes. The most notable one there was a simple function that
aims from any point to the current player position. Which actually only
became a separate function in TH05, probably since it's called 27 times in
total. That's 27 places no longer being blocked from further RE progress.
WindowsTiger already
did most of the work for the score popup numbers in January, which meant
that I only had to review it and bring it up to ReC98's current coding
styles and standards. This one turned out to be one of those rare features
whose TH05 implementation is significantly less insane than the
TH04 one. Both games lazily redraw only the tiles of the stage background
that were drawn over in the previous frame, and try their best to minimize
the amount of tiles to be redrawn in this way. For these popup numbers,
this involves calculating the on-screen width, based on the exact number
of digits in the point value. TH04 calculates this width every frame
during the rendering function, and even resorts to setting that field
through the digit iteration pointer via self-modifying code… yup. TH05, on
the other hand, simply calculates the width once when spawning a new popup
number, during the conversion of the point value to
binary-coded
decimal. The "×2" multiplier suffix being removed in TH05 certainly
also helped in simplifying that feature in this game.
And that's ⅓ of TH05 reverse-engineered! Next up, one more TH05 PI push,
in which the stage enemies hopefully finish all the big entity types.
Maybe it will also be accompanied by another RE push? In any case, that
will be the last piece of TH05 progress for quite some time. The next TH01
stretch will consist of 6 pushes at the very least, and I currently have
no idea of how much time I can spend on ReC98 a month from now…
To finish this TH05 stretch, we've got a feature that's exclusive to TH05
for once! As the final memory management innovation in PC-98 Touhou, TH05
provides a single static (64 * 26)-byte array for storing up to 64
entities of a custom type, specific to a stage or boss portion.
(Edit (2023-05-29): This system actually debuted in
📝 TH04, where it was used for much simpler
entities.)
TH05 uses this array for
the Stage 2 star particles,
Alice's puppets,
the tip of curve ("jello") bullets,
Mai's snowballs and Yuki's fireballs,
Yumeko's swords,
and Shinki's 32×32 bullets,
which makes sense, given that only one of those will be active at any
given time.
On the surface, they all appear to share the same 26-byte structure, with
consistently sized fields, merely using its 5 generic fields for different
purposes. Looking closer though, there actually are differences in
the signedness of certain fields across the six types. uth05win chose to
declare them as entirely separate structures, and given all the semantic
differences (pixels vs. subpixels, regular vs. tiny master.lib sprites,
…), it made sense to do the same in ReC98. It quickly turned out to be the
only solution to meet my own standards of code readability.
Which blew this one up to two pushes once again… But now, modders can
trivially resize any of those structures without affecting the other types
within the original (64 * 26)-byte boundary, even without full position
independence. While you'd still have to reduce the type-specific
number of distinct entities if you made any structure larger, you
could also have more entities with fewer structure members.
As for the types themselves, they're full of redundancy once again – as
you might have already expected from seeing #4, #5, and #6 listed as
unrelated to each other. Those could have indeed been merged into a single
32×32 bullet type, supporting all the unique properties of #4
(destructible, with optional revenge bullets), #5 (optional number of
twirl animation frames before they begin to move) and #6 (delay clouds).
The *_add(), *_update(), and *_render()
functions of #5 and #6 could even already be completely
reverse-engineered from just applying the structure onto the ASM, with the
ones of #3 and #4 only needing one more RE push.
But perhaps the most interesting discovery here is in the curve bullets:
TH05 only renders every second one of the 17 nodes in a curve
bullet, yet hit-tests every single one of them. In practice, this is an
acceptable optimization though – you only start to notice jagged edges and
gaps between the fragments once their speed exceeds roughly 11 pixels per
second:
And that brings us to the last 20% of TH05 position independence! But
first, we'll have more cheap and fast TH01 progress.
Big gains, as expected, but not much to say about this one. With TH05 Reimu
being way too easy to decompile after
📝 the shot control groundwork done in October,
there was enough time to give the comprehensive PI false-positive
treatment to two other sets of functions present in TH04's and TH05's
OP.EXE. One of them, master.lib's super_*()
functions, was used a lot in TH02, more than in any other game… I
wonder how much more that game will progress without even focusing on it
in particular.
Alright then! 100% PI for TH04's and TH05's OP.EXE upcoming…
(Edit: Already got funding to cover this!)
… nope, with a game whose MAIN.EXE is still just 5%
reverse-engineered and which naturally makes heavy use of
structures, there's still a lot more PI groundwork to be done before RE
progress can speed up to the levels that we've now reached with TH05. The
good news is that this game is (now) way easier to understand: In contrast
to TH04 and TH05, where we needed to work towards player shots over a
two-digit number of pushes, TH03 only needed two for SPRITE16, and a half
one for the playfield shaking mechanism. After that, I could even already
decompile the per-frame shot update and render functions, thanks to TH03's
high number of code segments. Now, even the big 128-byte player structure
doesn't seem all too far off.
Then again, as TH03 shares no code with any other game, this actually was
a completely average PI push. For the remaining three, we'll return to
TH04 and TH05 though, which should more than make up for the slight drop
in RE speed after this one.
In other news, we've now also reached peak C++, with the introduction of
templates! TH03 stores movement speeds in a 4.4 fixed-point
format, which is an 8-bit spin on the usual 16-bit, 12.4 fixed-point
format.
So, here we have the first two pushes with an explicit focus on position
independence… and they start out looking barely different from regular
reverse-engineering? They even already deduplicate a bunch of item-related
code, which was simple enough that it required little additional work?
Because the actual work, once again, was in comparing uth05win's
interpretations and naming choices with the original PC-98 code? So that
we only ended up removing a handful of memory references there?
(Oh well, you can mod item drops now!)
So, continuing to interpret PI as a mere by-product of reverse-engineering
might ultimately drive up the total PI cost quite a bit. But alright then,
let's systematically clear out some false positives by looking at
master.lib function calls instead… and suddenly we get the PI progress we
were looking for, nicely spread out over all games since TH02. That kinda
makes it sound like useless work, only done because it's dictated by some
counting algorithm on a website. But decompilation will want to convert
all of these values to decimal anyway. We're merely doing that right now,
across all games.
Then again, it doesn't actually make any game more
position-independent, and only proves how position-independent it already
was. So I'm really wondering right now whether I should just rush
actual position independence by simply identifying structures and
their sizes, and not bother with members or false positives until that's
done. That would certainly get the job done for TH04 and TH05 in just a
few more pushes, but then leave all the proving work (and the road
to 100% PI on the front page) to reverse-engineering.
I don't know. Would it be worth it to have a game that's "maybe
fully position-independent", only for there to maybe be rare edge
cases where it isn't?
Or maybe, continuing to strike a balance between identifying false
positives (fast) and reverse-engineering structures (slow) will continue
to work out like it did now, and make us end up close to the current
estimate, which was attractive enough to sell out the crowdfunding for the
first time… 🤔
Please give feedback! If possible, by Friday evening UTC+1, before I start
working on the next PI push, this time with a focus on TH04.
Deathbombs confirmed, in both TH04 and TH05! On the surface, it's the same
8-frame window as in
most Windows games, but due to the slightly lower PC-98 frame rate of
56.4 Hz, it's actually slightly more lenient in TH04 and TH05.
The last function in front of the TH05 shot type control functions marks
the player's previous position in VRAM to be redrawn. But as it turns out,
"player" not only means "the player's option satellites on shot levels ≥
2", but also "the explosion animation if you lose a life", which required
reverse-engineering both things, ultimately leading to the confirmation of
deathbombs.
It actually was kind of surprising that we then had reverse-engineered
everything related to rendering all three things mentioned above,
and could also cover the player rendering function right now. Luckily,
TH05 didn't decide to also micro-optimize that function into
un-decompilability; in fact, it wasn't changed at all from TH04. Unlike
the one invalidation function whose decompilation would have
actually been the goal here…
But now, we've finally gotten to where we wanted to… and only got 2
outstanding decompilation pushes left. Time to get the website ready for
hosting an actual crowdfunding campaign, I'd say – It'll make a better
impression if people can still see things being delivered after the big
announcement.
So, let's continue with player shots! …eh, or maybe not directly, since they involve two other structure types in TH05, which we'd have to cover first. One of them is a different sort of sprite, and since I like me some context in my reverse-engineering, let's disable every other sprite type first to figure out what it is.
One of those other sprite types were the little sparks flying away from killed stage enemies, midbosses, and grazed bullets; easy enough to also RE right now. Turns out they use the same 8 hardcoded 8×8 sprites in TH02, TH04, and TH05. Except that it's actually 64 16×8 sprites, because ZUN wanted to pre-shift them for all 8 possible start pixels within a planar VRAM byte (rather than, like, just writing a few instructions to shift them programmatically), leading to them taking up 1,024 bytes rather than just 64.
Oh, and the thing I wanted to RE *actually* was the decay animation whenever a shot hits something. Not too complex either, especially since it's exclusive to TH05.
And since there was some time left and I actually have to pick some of the next RE places strategically to best prepare for the upcoming 17 decompilation pushes, here's two more function pointers for good measure.