So, only one card-flipping function missing, and then we can start
decompiling TH01's two final bosses? Unfortunately, that had to be the one
big function that initializes and renders all gameplay objects. #17 on the
list of longest functions in all of PC-98 Touhou, requiring two pushes to
fully understand what's going on there… and then it immediately returns
for all "boss" stages whose number is divisible by 5, yet is still called
during Sariel's and Konngara's initialization 🤦
Oh well. This also involved the final file format we hadn't looked at
yet – the STAGE?.DAT files that describe the layout for all
stages within a single 5-stage scene. Which, for a change is a very
well-designed form– no, of course it's completely weird, what did you
expect? Development must have looked somewhat like this:
Weirdness #1: "Hm, the stage format should
include the file names for the background graphics and music… or should
it?" And so, the 22-byte header still references some music and
background files that aren't part of the final game. The game doesn't use
anything from there, and instead derives those file names from the
That's probably nothing new to anyone who has ever looked at TH01's data
files. In a slightly more interesting discovery though, seeing the+
📝 .GRF extension, in some of the file names
that are short enough to not cut it off, confirms that .GRF was initially
used for background images. Probably before ZUN learned about .PI, and how
it achieves better compression than his own per-bitplane RLE approach?
Weirdness #2: "Hm, I might want to put
obstacles on top of cards?" You'd probably expect this format to
contain one single array for every stage, describing which object to place
on every 32×32 tile, if any. Well, the real format uses two arrays:
One for the cards, and a combined one for all "obstacles" – bumpers, bumper
bars, turrets, and portals. However, none of the card-flipping stages in
the final game come with any such overlaps. That's quite unfortunate, as it
would have made for some quite interesting level designs:
As you can see, the final version of the blitting code was not written
with such overlaps in mind either, blitting the cards on top of all
the obstacles, and not the other way round.
Weirdness #3: "In contrast to obstacles, of
which there are multiple types, cards only really need 1 bit. Time for some
bit twiddling!" Not the worst idea, given that the 640×336 playfield
can fit 20×10 cards, which would fit exactly into 25 bytes if you use a
single bit to indicate card or no card. But for whatever
reason, ZUN only stored 4 card bits per byte, leaving the other 4 bits
unused, and needlessly blowing up that array to 50 bytes. 🤷
Oh, and did I mention that the contents of the STAGE?.DAT files are
loaded into the main data segment, even though the game immediately parses
them into something more conveniently accessible? That's another 1250 bytes
of memory wasted for no reason…
Weirdness #4: "Hm, how about requiring the
player to flip some of the cards multiple times? But I've already written
all this bit twiddling code to store 4 cards in 1 byte. And if cards should
need anywhere from 1 to 4 flips, that would need at least 2 more bits,
which won't fit into the unused 4 bits either…" This feature
must have come later, because the final game uses 3 "obstacle" type
IDs to act as a flip count modifier for a card at the same relative array
position. Complete with lookup code to find the actual card index these
modifiers belong to, and ridiculous switch statements to not include
those non-obstacles in the game's internal obstacle array.
With all that, it's almost not worth mentioning how there are 12 turret
types, which only differ in which hardcoded pellet group they fire at a
hardcoded interval of either 100 or 200 frames, and that they're all
explicitly spelled out in every single switch statement. Or
how the layout of the internal card and obstacle SoA classes is quite
disjointed. So here's the new ZUN bugs you've probably already been
Cards and obstacles are blitted to both VRAM pages. This way, any other
entities moving on top of them can simply be unblitted by restoring pixels
from VRAM page 1, without requiring the stationary objects to be redrawn
from main memory. Obviously, the backgrounds behind the cards have to be
stored somewhere, since the player can remove them. For faster transitions
between stages of a scene, ZUN chose to store the backgrounds behind
obstacles as well. This way, the background image really only needs to be
blitted for the first stage in a scene.
All that memory for the object backgrounds adds up quite a bit though. ZUN
actually made the correct choice here and picked a memory allocation
function that can return more than the 64 KiB of a single x86 Real Mode
segment. He then accesses the individual backgrounds via regular array
subscripts… and that's where the bug lies, because he stores the returned
address in a regular far pointer rather than a
huge one. This way, the game still can only display a
total of 102 objects (i. e., cards and obstacles combined) per stage,
without any unblitting glitches.
What a shame, that limit could have been 127 if ZUN didn't needlessly
allocate memory for alpha planes when backing up VRAM content.
And since array subscripts on far pointers wrap around after
64 KiB, trying to save the background of the 103rd object is guaranteed to
corrupt the memory block header at the beginning of the returned segment.
. When TH01 runs in test mode, it
correctly reports a corrupted heap in this case.
Finally, some unused content! Upon discovering TH01's
debug mode, probably everyone tried to access Stage 21,
just to see what happens, and indeed landed in an actual stage, with a
black background and a weird color palette. Turns out that ZUN did
ship an unused scene in SCENE7.DAT, which is exactly was
Unfortunately, it's easy to believe that this is just garbage data (as I
initially did): At the beginning of "Stage 22", the game seems to enter an
infinite loop somewhere during the flip-in animation.
Well, we've had a heap overflow above, and the cause here is nothing but a
stack buffer overflow – a perhaps more modern kind of classic C bug,
given its prevalence in the Windows Touhou games. Explained in a few lines
int card_animation_frames; // even though there can be up to 200?!
int total_frames = 0;
(code that would end up resetting total_frames if it ever tried to reset
The number of cards in "Stage 22"? 76. There you have it.
But of course, it's trivial to disable this animation and fix these stage
transitions. So here they are, Stages 21 to 24, as shipped with the game
Wow, what a mess. All that was just a bit too much to be covered in two
pushes… Next up, assuming the current subscriptions: Taking a vacation with
one smaller TH01 push, covering some smaller functions here and there to
ensure some uninterrupted Konngara progress later on.
Well, make that three days. Trying to figure out all the details behind
the sprite flickering was absolutely dreadful…
It started out easy enough, though. Unsurprisingly, TH01 had a quite
limited pellet system compared to TH04 and TH05:
The cap is 100, rather than 240 in TH04 or 180 in TH05.
Only 6 special motion functions (with one of them broken and unused)
instead of 10. This is where you find the code that generates SinGyoku's
chase pellets, Kikuri's small spinning multi-pellet circles, and
Konngara's rain pellets that bounce down from the top of the playfield.
A tiny selection of preconfigured multi-pellet groups. Rather than
TH04's and TH05's freely configurable n-way spreads, stacks, and rings,
TH01 only provides abstractions for 2-, 3-, 4-, and 5- way spreads (yup,
no 6-way or beyond), with a fixed narrow or wide angle between the
individual pellets. The resulting pellets are also hardcoded to linear
motion, and can't use the special motion functions. Maybe not the best
code, but still kind of cute, since the generated groups do follow a
As expected from TH01, the code comes with its fair share of smaller,
insignificant ZUN bugs and oversights. As you would also expect
though, the sprite flickering points to the biggest and most consequential
flaw in all of this.
Apparently, it started with ZUN getting the impression that it's only
possible to use the PC-98 EGC for fast blitting of all 4 bitplanes in one
CPU instruction if you blit 16 horizontal pixels (= 2 bytes) at a time.
Consequently, he only wrote one function for EGC-accelerated sprite
unblitting, which can only operate on a "grid" of 16×1 tiles in VRAM. But
wait, pellets are not only just 8×8, but can also be placed at any
unaligned X position…
… yet the game still insists on using this 16-dot-aligned function to
unblit pellets, forcing itself into using a super sloppy 16×8 rectangle
for the job. 🤦 ZUN then tried to mitigate the resulting flickering in two
hilarious ways that just make it worse:
An… "interlaced rendering" mode? This one's activated for all Stage 15
and 20 fights, and separates pellets into two halves that are rendered on
alternating frames. Collision detection with the Yin-Yang Orb and the
player is only done for the visible half, but collision detection with
player shots is still done for all pellets every frame, as are
motion updates – so that pellets don't end up moving half as fast as they
So yeah, your eyes weren't deceiving you. The game does effectively
drop its perceived frame rate in the Elis, Kikuri, Sariel, and Konngara
fights, and it does so deliberately.
📝 Just like player shots, pellets
are also unblitted, moved, and rendered in a single function.
Thanks to the 16×8 rectangle, there's now the (completely unnecessary)
possibility of accidentally unblitting parts of a sprite that was
previously drawn into the 8 pixels right of a pellet. And this
is where ZUN went full and went "oh, I
know, let's test the entire 16 pixels, and in case we got an entity
there, we simply make the pellet invisible for this frame! Then
we don't even have to unblit it later!"
Except that this is only done for the first 3 elements of the player
shot array…?! Which don't even necessarily have to contain the 3 shots
fired last. It's not done for the player sprite, the Orb, or, heck,
other pellets that come earlier in the pellet array. (At least
we avoided going 𝑂(𝑛²) there?)
Actually, and I'm only realizing this now as I type this blog post:
This test is done even if the shots at those array elements aren't
active. So, pellets tend to be made invisible based on comparisons
with garbage data.
And then you notice that the player shot
unblit/move/render function is actually only ever called from the
pellet unblit/move/render function on the one global instance
of the player shot manager class, after pellets were unblitted. So, we
end up with a sequence of
which means that we can't ever unblit a previously rendered shot
with a pellet. Sure, as terrible as this one function call is from
a software architecture perspective, it was enough to fix this issue.
Yet we don't even get the intended positive effect, and walk away with
pellets that are made temporarily invisible for no reason at all. So,
uh, maybe it all just was an attempt at increasing the
ramerate on lower spec PC-98 models?
Yup, that's it, we've found the most stupid piece of code in this game,
period. It'll be hard to top this.
I'm confident that it's possible to turn TH01 into a well-written, fluid
PC-98 game, with no flickering, and no perceived lag, once it's
position-independent. With some more in-depth knowledge and documentation
on the EGC (remember, there's still
📝 this one TH03 push waiting to be funded),
you might even be able to continue using that piece of blitter hardware.
And no, you certainly won't need ASM micro-optimizations – just a bit of
knowledge about which optimizations Turbo C++ does on its own, and what
you'd have to improve in your own code. It'd be very hard to write
worse code than what you find in TH01 itself.
(Godbolt for Turbo C++ 4.0J when?
Seriously though, that would 📝 also be a
great project for outside contributors!)
Oh well. In contrast to TH04 and TH05, where 4 pushes only covered all the
involved data types, they were enough to completely cover all of
the pellet code in TH01. Everything's already decompiled, and we never
have to look at it again. 😌 And with that, TH01 has also gone from by far
the least RE'd to the most RE'd game within ReC98, in just half a year! 🎉
Still, that was enough TH01 game logic for a while.
Next up: Making up for the delay with some
more relaxing and easy pieces of TH01 code, that hopefully make just a
bit more sense than all this garbage. More image formats, mainly.
Back to TH01, and its high score menu… oh, wait, that one will eventually
involve keyboard input. And thanks to the generous TH01 funding situation,
there's really no reason not to cover that right now. After all,
TH01 is the last game where input still hadn't been RE'd.
But first, let's also cover that one unused blitting function, together
with REIIDEN.CFG loading and saving, which are in front of
the input function in OP.EXE… (By now, we all know about
the hidden start bomb configuration, right?)
Unsurprisingly, the earliest game also implements input in the messiest
way, with a different function for each of the three executables. "Because
they all react differently to keyboard inputs ",
apparently? OP.EXE even has two functions for it, one for the
START / CONTINUE / OPTION / QUIT main
menu, and one for both Option and Music Test menus, both of which directly
perform the ring arithmetic on the menu cursor variable. A consistent
separation of keyboard polling from input processing apparently wasn't all
too obvious of a thought, since it's only truly done from TH02 on.
Which means that this screen is actually re-entered for every 3 frames
that the PgUp key is being held. And yes, you can, of course, also
crash the system via a stack overflow this way by holding down PgUp for a
few seconds, if that's your thing. Edit (2020-09-17): Here's a video from
spaztron64, showing off this
exact stack overflow crash while running under the
memory manager, which displays additional information about these
sorts of crashes:
What makes this even funnier is that the code actually tracks the last
state of every polled key, to prevent exactly that sort of bug. But the
copy-pasted assignment of the last input state is only done aftertest_mem() already returned, making it effectively pointless
for PgUp. It does work as intended for PgDown… and that's why you
have to actually press and release this key once for every call to
test_mem() in order to actually get back into the game. Even
though a single call to PgDown will already show the game screen
In maybe more relevant news though, this function also came with what can
be considered the first piece of actual gameplay logic! Bombing via
double-tapping the Z and X keys is also handled here, and now we know that
both keys simply have to be tapped twice within a window of 20 frames.
They are tracked independently from each other, so you don't necessarily
have to press them simultaneously.
In debug mode, the bomb count tracks precisely this window of
time. That's why it only resets back to 0 when pressing Z or X if it's
Sure, TH01's code is expectedly terrible and messy. But compared to the
micro-optimizations of TH04 and TH05, it's an absolute joy to work on, and
opening all these ZUN bug loot boxes is just the icing on the cake.
Looking forward to more of the high score menu in the next pushes!