Turns out I was not quite done with the TH01 Anniversary Edition yet.
You might have noticed some white streaks at the beginning of Sariel's
second form, which are in fact a bug that I accidentally added to the
initial release.
These can be traced back to a quirk
I wasn't aware of, and hadn't documented so far. When defeating Sariel's
first form during a pattern that spawns pellets, it's likely for the second
form to start with additional pellets that resemble the previous pattern,
but come out of seemingly nowhere. This shouldn't really happen if you look
at the code: Nothing outside the typical pattern code spawns new pellets,
and all existing ones are reset before the form transition…
Except if they're currently showing the 10-frame delay cloud
animation , activated for all pellets during the symmetrical radial 2-ring
pattern in Phase 2 and left activated for the rest of the fight. These
pellets will continue their animation after the transition to the second
form, and turn into regular pellets you have to dodge once their animation
completed.
By itself, this is just one more quirk to keep in mind during refactoring.
It only turned into a bug in the Anniversary Edition because the game tracks
the number of living pellets in a separate counter variable. After resetting
all pellets, this counter is simply set to 0, regardless of any delay cloud
pellets that may still be alive, and it's merely incremented or decremented
when pellets are spawned or leave the playfield.
In the original game, this counter is only used as an optimization to skip
spawning new pellets once the cap is reached. But with batched
EGC-accelerated unblitting, it also makes sense to skip the rather costly
setup and shutdown of the EGC if no pellets are active anyway. Except if the
counter you use to check for that case can be 0 even if there are
pellets alive, which consequently don't get unblitted…
There is an optimal fix though: Instead of unconditionally resetting the
living pellet counter to 0, we decrement it for every pellet that
does get reset. This preserves the quirk and gives us a
consistently correct counter, allowing us to still skip every unnecessary
loop over the pellet array.
Ultimately, this was a harmless bug that didn't affect gameplay, but it's
still something that players would have probably reported a few more times.
So here's a free bugfix:
Here we go, TH01 Sariel! This is the single biggest boss fight in all of
PC-98 Touhou: If we include all custom effect code we previously decompiled,
it amounts to a total of 10.31% of all code in TH01 (and 3.14%
overall). These 8 pushes cover the final 8.10% (or 2.47% overall),
and are likely to be the single biggest delivery this project will ever see.
Considering that I only managed to decompile 6.00% across all games in 2021,
2022 is already off to a much better start!
So, how can Sariel's code be that large? Well, we've got:
16 danmaku patterns; including the one snowflake detonating into a giant
94×32 hitbox
Gratuitous usage of floating-point variables, bloating the binary thanks
to Turbo C++ 4.0J's particularly horrid code generation
The hatching birds that shoot pellets
3 separate particle systems, sharing the general idea, overall code
structure, and blitting algorithm, but differing in every little detail
The "gust of wind" background transition animation
5 sets of custom monochrome sprite animations, loaded from
BOSS6GR?.GRC
A further 3 hardcoded monochrome 8×8 sprites for the "swaying leaves"
pattern during the second form
In total, it's just under 3,000 lines of C++ code, containing a total of 8
definite ZUN bugs, 3 of them being subpixel/pixel confusions. That might not
look all too bad if you compare it to the
📝 player control function's 8 bugs in 900 lines of code,
but given that Konngara had 0… (Edit (2022-07-17):
Konngara contains two bugs after all: A
📝 possible heap corruption in test or debug mode,
and the infamous
📝 temporary green discoloration.)
And no, the code doesn't make it obvious whether ZUN coded Konngara or
Sariel first; there's just as much evidence for either.
Some terminology before we start: Sariel's first form is separated
into four phases, indicated by different background images, that
cycle until Sariel's HP reach 0 and the second, single-phase form
starts. The danmaku patterns within each phase are also on a cycle,
and the game picks a random but limited number of patterns per phase before
transitioning to the next one. The fight always starts at pattern 1 of phase
1 (the random purple lasers), and each new phase also starts at its
respective first pattern.
Sariel's bugs already start at the graphics asset level, before any code
gets to run. Some of the patterns include a wand raise animation, which is
stored in BOSS6_2.BOS:
The "lowered wand" sprite is missing in this file simply because it's
captured from the regular background image in VRAM, at the beginning of the
fight and after every background transition. What I previously thought to be
📝 background storage code has therefore a
different meaning in Sariel's case. Since this captured sprite is fully
opaque, it will reset the entire 128×128 wand area… wait, 128×128, rather
than 96×96? Yup, this lowered sprite is larger than necessary, wasting 1,967
bytes of conventional memory. That still doesn't quite explain the
second sprite in BOSS6_2.BOS though. Turns out that the black
part is indeed meant to unblit the purple reflection (?) in the first
sprite. But… that's not how you would correctly unblit that?
The first sprite already eats up part of the red HUD line, and the second
one additionally fails to recover the seal pixels underneath, leaving a nice
little black hole and some stray purple pixels until the next background
transition. Quite ironic given that both
sprites do include the right part of the seal, which isn't even part of the
animation.
Just like Konngara, Sariel continues the approach of using a single function
per danmaku pattern or custom entity. While I appreciate that this allows
all pattern- and entity-specific state to be scoped locally to that one
function, it quickly gets ugly as soon as such a function has to do more than one thing.
The "bird function" is particularly awful here: It's just one if(…)
{…} else if(…) {…} else if(…) {…} chain with different
branches for the subfunction parameter, with zero shared code between any of
these branches. It also uses 64-bit floating-point double as
its subpixel type… and since it also takes four of those as parameters
(y'know, just in case the "spawn new bird" subfunction is called), every
call site has to also push four double values onto the stack.
Thanks to Turbo C++ even using the FPU for pushing a 0.0 constant, we
have already reached maximum floating-point decadence before even having
seen a single danmaku pattern. Why decadence? Every possible spawn position
and velocity in both bird patterns just uses pixel resolution, with no
fractional component in sight. And there goes another 720 bytes of
conventional memory.
Speaking about bird patterns, the red-bird one is where we find the first
code-level ZUN bug: The spawn cross circle sprite suddenly disappears after
it finished spawning all the bird eggs. How can we tell it's a bug? Because
there is code to smoothly fly this sprite off the playfield, that
code just suddenly forgets that the sprite's position is stored in Q12.4
subpixels, and treats it as raw screen pixels instead.
As a result, the well-intentioned 640×400
screen-space clipping rectangle effectively shrinks to 38×23 pixels in the
top-left corner of the screen. Which the sprite is always outside of, and
thus never rendered again.
The intended animation is easily restored though:
Also, did you know that birds actually have a quite unfair 14×38-pixel
hitbox? Not that you'd ever collide with them in any of the patterns…
Another 3 of the 8 bugs can be found in the symmetric, interlaced spawn rays
used in three of the patterns, and the 32×32 debris "sprites" shown at their endpoint, at
the edge of the screen. You kinda have to commend ZUN's attention to detail
here, and how he wrote a lot of code for those few rapidly animated pixels
that you most likely don't
even notice, especially with all the other wrong pixels
resulting from rendering glitches. One of the bugs in the very final pattern
of phase 4 even turns them into the vortex sprites from the second pattern
in phase 1 during the first 5 frames of
the first time the pattern is active, and I had to single-step the blitting
calls to verify it.
It certainly was annoying how much time I spent making sense of these bugs,
and all weird blitting offsets, for just a few pixels… Let's look at
something more wholesome, shall we?
So far, we've only seen the PC-98 GRCG being used in RMW (read-modify-write)
mode, which I previously
📝 explained in the context of TH01's red-white HP pattern.
The second of its three modes, TCR (Tile Compare Read), affects VRAM reads
rather than writes, and performs "color extraction" across all 4 bitplanes:
Instead of returning raw 1bpp data from one plane, a VRAM read will instead
return a bitmask, with a 1 bit at every pixel whose full 4-bit color exactly
matches the color at that offset in the GRCG's tile register, and 0
everywhere else. Sariel uses this mode to make sure that the 2×2 particles
and the wind effect are only blitted on top of "air color" pixels, with
other parts of the background behaving like a mask. The algorithm:
Set the GRCG to TCR mode, and all 8 tile register dots to the air
color
Read N bits from the target VRAM position to obtain an N-bit mask where
all 1 bits indicate air color pixels at the respective position
AND that mask with the alpha plane of the sprite to be drawn, shifted to
the correct start bit within the 8-pixel VRAM byte
Set the GRCG to RMW mode, and all 8 tile register dots to the color that
should be drawn
Write the previously obtained bitmask to the same position in VRAM
Quite clever how the extracted colors double as a secondary alpha plane,
making for another well-earned good-code tag. The wind effect really doesn't deserve it, though:
ZUN calculates every intermediate result inside this function
over and over and over again… Together with some ugly
pointer arithmetic, this function turned into one of the most tedious
decompilations in a long while.
This gradual effect is blitted exclusively to the front page of VRAM,
since parts of it need to be unblitted to create the illusion of a gust of
wind. Then again, anything that moves on top of air-colored background –
most likely the Orb – will also unblit whatever it covered of the effect…
As far as I can tell, ZUN didn't use TCR mode anywhere else in PC-98 Touhou.
Tune in again later during a TH04 or TH05 push to learn about TDW, the final
GRCG mode!
Speaking about the 2×2 particle systems, why do we need three of them? Their
only observable difference lies in the way they move their particles:
Up or down in a straight line (used in phases 4 and 2,
respectively)
Left or right in a straight line (used in the second form)
Left and right in a sinusoidal motion (used in phase 3, the "dark
orange" one)
Out of all possible formats ZUN could have used for storing the positions
and velocities of individual particles, he chose a) 64-bit /
double-precision floating-point, and b) raw screen pixels. Want to take a
guess at which data type is used for which particle system?
If you picked double for 1) and 2), and raw screen pixels for
3), you are of course correct! Not that I'm implying
that it should have been the other way round – screen pixels would have
perfectly fit all three systems use cases, as all 16-bit coordinates
are extended to 32 bits for trigonometric calculations anyway. That's what,
another 1.080 bytes of wasted conventional memory? And that's even
calculated while keeping the current architecture, which allocates
space for 3×30 particles as part of the game's global data, although only
one of the three particle systems is active at any given time.
That's it for the first form, time to put on "Civilization
of Magic"! Or "死なばもろとも"? Or "Theme of 地獄めくり"? Or whatever SYUGEN is
supposed to mean…
… and the code of these final patterns comes out roughly as exciting as
their in-game impact. With the big exception of the very final "swaying
leaves" pattern: After 📝 Q4.4,
📝 Q28.4,
📝 Q24.8, and double variables,
this pattern uses… decimal subpixels? Like, multiplying the number by
10, and using the decimal one's digit to represent the fractional part?
Well, sure, if you really insist on moving the leaves in cleanly
represented integer multiples of ⅒, which is infamously impossible in IEEE
754. Aside from aesthetic reasons, it only really combines less precision
(10 possible fractions rather than the usual 16) with the inferior
performance of having to use integer divisions and multiplications rather
than simple bit shifts. And it's surely not because the leaf sprites needed
an extended integer value range of [-3276, +3276], compared to
Q12.4's [-2047, +2048]: They are clipped to 640×400 screen space
anyway, and are removed as soon as they leave this area.
This pattern also contains the second bug in the "subpixel/pixel confusion
hiding an entire animation" category, causing all of
BOSS6GR4.GRC to effectively become unused:
At least their hitboxes are what you would expect, exactly covering the
30×30 pixels of Reimu's sprite. Both animation fixes are available on the th01_sariel_fixes
branch.
After all that, Sariel's main function turned out fairly unspectacular, just
putting everything together and adding some shake, transition, and color
pulse effects with a bunch of unnecessary hardware palette changes. There is
one reference to a missing BOSS6.GRP file during the
first→second form transition, suggesting that Sariel originally had a
separate "first form defeat" graphic, before it was replaced with just the
shaking effect in the final game.
Speaking about the transition code, it is kind of funny how the… um,
imperative and concrete nature of TH01 leads to these 2×24
lines of straight-line code. They kind of look like ZUN rattling off a
laundry list of subsystems and raw variables to be reinitialized, making
damn sure to not forget anything.
Whew! Second PC-98 Touhou boss completely decompiled, 29 to go, and they'll
only get easier from here! 🎉 The next one in line, Elis, is somewhere
between Konngara and Sariel as far as x86 instruction count is concerned, so
that'll need to wait for some additional funding. Next up, therefore:
Looking at a thing in TH03's main game code – really, I have little
idea what it will be!
Now that the store is open again, also check out the
📝 updated RE progress overview I've posted
together with this one. In addition to more RE, you can now also directly
order a variety of mods; all of these are further explained in the order
form itself.
Well, make that three days. Trying to figure out all the details behind
the sprite flickering was absolutely dreadful…
It started out easy enough, though. Unsurprisingly, TH01 had a quite
limited pellet system compared to TH04 and TH05:
The cap is 100, rather than 240 in TH04 or 180 in TH05.
Only 6 special motion functions (with one of them broken and unused)
instead of 10. This is where you find the code that generates SinGyoku's
chase pellets, Kikuri's small spinning multi-pellet circles, and
Konngara's rain pellets that bounce down from the top of the playfield.
A tiny selection of preconfigured multi-pellet groups. Rather than
TH04's and TH05's freely configurable n-way spreads, stacks, and rings,
TH01 only provides abstractions for 2-, 3-, 4-, and 5- way spreads (yup,
no 6-way or beyond), with a fixed narrow or wide angle between the
individual pellets. The resulting pellets are also hardcoded to linear
motion, and can't use the special motion functions. Maybe not the best
code, but still kind of cute, since the generated groups do follow a
clear logic.
As expected from TH01, the code comes with its fair share of smaller,
insignificant ZUN bugs and oversights. As you would also expect
though, the sprite flickering points to the biggest and most consequential
flaw in all of this.
Apparently, it started with ZUN getting the impression that it's only
possible to use the PC-98 EGC for fast blitting of all 4 bitplanes in one
CPU instruction if you blit 16 horizontal pixels (= 2 bytes) at a time.
Consequently, he only wrote one function for EGC-accelerated sprite
unblitting, which can only operate on a "grid" of 16×1 tiles in VRAM. But
wait, pellets are not only just 8×8, but can also be placed at any
unaligned X position…
… yet the game still insists on using this 16-dot-aligned function to
unblit pellets, forcing itself into using a super sloppy 16×8 rectangle
for the job. 🤦 ZUN then tried to mitigate the resulting flickering in two
hilarious ways that just make it worse:
An… "interlaced rendering" mode? This one's activated for all Stage 15
and 20 fights, and separates pellets into two halves that are rendered on
alternating frames. Collision detection with the Yin-Yang Orb and the
player is only done for the visible half, but collision detection with
player shots is still done for all pellets every frame, as are
motion updates – so that pellets don't end up moving half as fast as they
should.
So yeah, your eyes weren't deceiving you. The game does effectively
drop its perceived frame rate in the Elis, Kikuri, Sariel, and Konngara
fights, and it does so deliberately.
📝 Just like player shots, pellets
are also unblitted, moved, and rendered in a single function.
Thanks to the 16×8 rectangle, there's now the (completely unnecessary)
possibility of accidentally unblitting parts of a sprite that was
previously drawn into the 8 pixels right of a pellet. And this
is where ZUN went full and went "oh, I
know, let's test the entire 16 pixels, and in case we got an entity
there, we simply make the pellet invisible for this frame! Then
we don't even have to unblit it later!"
Except that this is only done for the first 3 elements of the player
shot array…?! Which don't even necessarily have to contain the 3 shots
fired last. It's not done for the player sprite, the Orb, or, heck,
other pellets that come earlier in the pellet array. (At least
we avoided going 𝑂(𝑛²) there?)
Actually, and I'm only realizing this now as I type this blog post:
This test is done even if the shots at those array elements aren't
active. So, pellets tend to be made invisible based on comparisons
with garbage data.
And then you notice that the player shot
unblit/move/render function is actually only ever called from the
pellet unblit/move/render function on the one global instance
of the player shot manager class, after pellets were unblitted. So, we
end up with a sequence of
which means that we can't ever unblit a previously rendered shot
with a pellet. Sure, as terrible as this one function call is from
a software architecture perspective, it was enough to fix this issue.
Yet we don't even get the intended positive effect, and walk away with
pellets that are made temporarily invisible for no reason at all. So,
uh, maybe it all just was an attempt at increasing the
ramerate on lower spec PC-98 models?
Yup, that's it, we've found the most stupid piece of code in this game,
period. It'll be hard to top this.
I'm confident that it's possible to turn TH01 into a well-written, fluid
PC-98 game, with no flickering, and no perceived lag, once it's
position-independent. With some more in-depth knowledge and documentation
on the EGC (remember, there's still
📝 this one TH03 push waiting to be funded),
you might even be able to continue using that piece of blitter hardware.
And no, you certainly won't need ASM micro-optimizations – just a bit of
knowledge about which optimizations Turbo C++ does on its own, and what
you'd have to improve in your own code. It'd be very hard to write
worse code than what you find in TH01 itself.
(Godbolt for Turbo C++ 4.0J when?
Seriously though, that would 📝 also be a
great project for outside contributors!)
Oh well. In contrast to TH04 and TH05, where 4 pushes only covered all the
involved data types, they were enough to completely cover all of
the pellet code in TH01. Everything's already decompiled, and we never
have to look at it again. 😌 And with that, TH01 has also gone from by far
the least RE'd to the most RE'd game within ReC98, in just half a year! 🎉
Still, that was enough TH01 game logic for a while.
Next up: Making up for the delay with some
more relaxing and easy pieces of TH01 code, that hopefully make just a
bit more sense than all this garbage. More image formats, mainly.