TH03 finally passed 20% RE, and the newly decompiled code contains no
serious ZUN bugs! What a nice way to end the year.
There's only a single unlockable feature in TH03: Chiyuri and Yumemi as
playable characters, unlocked after a 1CC on any difficulty. Just like the
Extra Stages in TH04 and TH05, YUME.NEM contains a single
designated variable for this unlocked feature, making it trivial to craft a
fully unlocked score file without recording any high scores that others
would have to compete against. So, we can now put together a complete set
for all PC-98 Touhou games: 2021-12-27-Fully-unlocked-clean-score-files.zip
It would have been cool to set the randomly generated encryption keys in
these files to a fixed value so that they cancel out and end up not actually
encrypting the file. Too bad that TH03 also started feeding each encrypted
byte back into its stream cipher, which makes this impossible.
The main loading and saving code turned out to be the second-cleanest
implementation of a score file format in PC-98 Touhou, just behind TH02.
Only two of the YUME.NEM functions come with nonsensical
differences between OP.EXE and MAINL.EXE, rather
than 📝 all of them, as in TH01 or
📝 too many of them, as in TH04 and TH05. As
for the rest of the per-difficulty structure though… well, it quickly
becomes clear why this was the final score file format to be RE'd. The name,
score, and stage fields are directly stored in terms of the internal
REGI*.BFT sprite IDs used on the high score screen. TH03 also
stores 10 score digits for each place rather than the 9 possible ones, keeps
any leading 0 digits, and stores the letters of entered names in reverse
order… yeah, let's decompile the high score screen as well, for a full
understanding of why ZUN might have done all that. (Answer: For no reason at
And wow, what a breath of fresh air. It's surely not
good-code: The overlapping shadows resulting from using
a 24-pixel letterspacing with 32-pixel glyphs in the name column led ZUN to
do quite a lot of unnecessary and slightly confusing rendering work when
moving the cursor back and forth, and he even forgot about the EGC there.
But it's nowhere close to the level of jank we saw in
📝 TH01's high score menu last year. Good to
see that ZUN had learned a thing or two by his third game – especially when
it comes to storing the character map cursor in terms of a character ID,
and improving the layout of the character map:
That's almost a nicely regular grid there. With the question mark and the
double-wide SP, BS, and END options, the cursor
movement code only comes with a reasonable two exceptions, which are easily
handled. And while I didn't get this screen completely decompiled,
one additional push was enough to cover all important code there.
The only potential glitch on this screen is a result of ZUN's continued use
decimal digits without any bounds check or cap. Like the in-game HUD
score display in TH04 and TH05, TH03's high score screen simply uses the
next glyph in the character set for the most significant digit of any score
above 1,000,000,000 points – in this case, the period. Still, it only
really gets bad at 8,000,000,000 points: Once the glyphs are
exhausted, the blitting function ends up accessing garbage data and filling
the entire screen with garbage pixels. For comparison though, the current world record
is 133,650,710 points, so good luck getting 8 billion in the first
Next up: Starting 2022 with the long-awaited decompilation of TH01's Sariel
fight! Due to the 📝 recent price increase,
we now got a window in the cap that
is going to remain open until tomorrow, providing an early opportunity to
set a new priority after Sariel is done.
EMS memory! The
infamous stopgap measure between the 640 KiB ("ought to be enough for
everyone") of conventional
memory offered by DOS from the very beginning, and the later XMS standard for
accessing all the rest of memory up to 4 GiB in the x86 Protected Mode. With
an optionally active EMS driver, TH04 and TH05 will make use of EMS memory
to preload a bunch of situational .CDG images at the beginning of
The "eye catch" game title image, shown while stages are loaded
The character-specific background image, shown while bombing
The player character dialog portraits
TH05 additionally stores the boss portraits there, preloading them
at the beginning of each stage. (TH04 instead keeps them in conventional
memory during the entire stage.)
Once these images are needed, they can then be copied into conventional
memory and accessed as usual.
Uh… wait, copied? It certainly would have been possible to map EMS
memory to a regular 16-bit Real Mode segment for direct access,
bank-switching out rarely used system or peripheral memory in exchange for
the EMS data. However, master.lib doesn't expose this functionality, and
only provides functions for copying data from EMS to regular memory and vice
But even that still makes EMS an excellent fit for the large image files
it's used for, as it's possible to directly copy their pixel data from EMS
to VRAM. (Yes, I tried!) Well… would, because ZUN doesn't do
that either, and always naively copies the images to newly allocated
conventional memory first. In essence, this dumbs down EMS into just another
layer of the memory hierarchy, inserted between conventional memory and
disk: Not quite as slow as disk, but still requiring that
memcpy() to retrieve the data. Most importantly though: Using
EMS in this way does not increase the total amount of memory
simultaneously accessible to the game. After all, some other data will have
to be freed from conventional memory to make room for the newly loaded data.
The most idiomatic way to define the game-specific layout of the EMS area
would be either a struct or an enum.
Unfortunately, the total size of all these images exceeds the range of a
16-bit value, and Turbo C++ 4.0J supports neither 32-bit enums
(which are silently degraded to 16-bit) nor 32-bit structs
(which simply don't compile). That still leaves raw compile-time constants
though, you only have to manually define the offset to each image in terms
of the size of its predecessor. But instead of doing that, ZUN just placed
each image at a nice round decimal offset, each slightly larger than the
actual memory required by the previous image, just to make sure that
everything fits. This results not only in quite
a bit of unnecessary padding, but also in technically the single
biggest amount of "wasted" memory in PC-98 Touhou: Out of the 180,000 (TH04)
and 320,000 (TH05) EMS bytes requested, the game only uses 135,552 (TH04)
and 175,904 (TH05) bytes. But hey, it's EMS, so who cares, right? Out of all
the opportunities to take shortcuts during development, this is among the
most acceptable ones. Any actual PC-98 model that could run these two games
comes with plenty of memory for this to not turn into an actual issue.
On to the EMS-using functions themselves, which are the definition of
"cross-cutting concerns". Most of these have a fallback path for the non-EMS
case, and keep the loaded .CDG images in memory if they are immediately
needed. Which totally makes sense, but also makes it difficult to find names
that reflect all the global state changed by these functions. Every one of
these is also just called from a single place, so inlining
them would have saved me a lot of naming and documentation trouble
The TH04 version of the EMS allocation code was actually displayed on ZUN's monitor in the
2010 MAG・ネット documentary; WindowsTiger already transcribed the low-quality video image
in 2019. By 2015 ReC98 standards, I would have just run with that, but
the current project goal is to write better code than ZUN, so I didn't. 😛
We sure ain't going to use magic numbers for EMS offsets.
The dialog init and exit code then is completely different in both games,
yet equally cross-cutting. TH05 goes even further in saving conventional
memory, loading each individual player or boss portrait into a single .CDG
slot immediately before blitting it to VRAM and freeing the pixel data
again. People who play TH05 without an active EMS driver are surely going to
enjoy the hard drive access lag between each portrait change…
TH04, on the other hand, also abuses the dialog
exit function to preload the Mugetsu defeat / Gengetsu entrance and
Gengetsu defeat portraits, using a static variable to track how often the
function has been called during the Extra Stage… who needs function
parameters anyway, right?
This is also the function in which TH04 infamously crashes after the Stage 5
pre-boss dialog when playing with Reimu and without any active EMS driver.
That crash is what motivated this look into the games' EMS usage… but the
code looks perfectly fine? Oh well, guess the crash is not related to EMS
then. Next u–
OK, of course I can't leave it like that. Everyone is expecting a fix now,
and I still got half of a push left over after decompiling the regular EMS
code. Also, I've now RE'd every function that could possibly be involved in
the crash, and this is very likely to be the last time I'll be looking at
Turns out that the bug has little to do with EMS, and everything to do with
ZUN limiting the amount of conventional RAM that TH04's
MAIN.EXE is allowed to use, and then slightly miscalculating
this upper limit. Playing Stage 5 with Reimu is the most asset-intensive
configuration in this game, due to the combination of
6 player portraits (Marisa has only 5), at 128×128 pixels each
a 288×256 background for the boss fight, tied in size only with the
ones in the Extra Stage
the additional 96×80 image for the vertically scrolling stars during
the stage, wastefully stored as 4 bitplanes rather than a single one.
This image is never freed, not even at the end of the stage.
The star image used in TH04's Stage 5.
Remove any single one of the above points, and this crash would have never
occurred. But with all of them combined, the total amount of memory consumed
by TH04's MAIN.EXE just barely exceeds ZUN's limit of 320,000
bytes, by no more than 3,840 bytes, the size of the star image.
But wait: As we established earlier, EMS does nothing to reduce the amount
of conventional memory used by the game. In fact, if you disabled TH04's EMS
handling, you'd still get this crash even if you are running an EMS
driver and loaded DOS into the High Memory Area to free up as much
conventional RAM as possible. How can EMS then prevent this crash in the
The answer: It's only because ZUN's usage of EMS bypasses the need to load
the cached images back out of the XOR-encrypted 東方幻想.郷
packfile. Leaving aside the general
stupidity of any game data file encryption*, master.lib's decryption
implementation is also quite wasteful: It uses a separate buffer that
receives fixed-size chunks of the file, before decrypting every individual
byte and copying it to its intended destination buffer. That really
resembles the typical slowness of a C fread() implementation
more than it does the highly optimized ASM code that master.lib purports to
be… And how large is this well-hidden decryption buffer? 4 KiB.
So, looking back at the game, here is what happens once the Stage 5
pre-battle dialog ends:
Reimu's bomb background image, which was previously freed to make space
for her dialog portraits, has to be loaded back into conventional memory
BB0.CDG is found inside the 東方幻想.郷
file_ropen() ends up allocating a 4 KiB buffer for the
encrypted packfile data, getting us the decisive ~4 KiB closer to the memory
The .CDG loader tries to allocate 52 608 contiguous bytes for the
pixel data of Reimu's bomb image
This would exceed the memory limit, so hmem_allocbyte()
fails and returns a nullptr
ZUN doesn't check for this case (as usual)
The pixel data is loaded to address 0000:0000,
overwriting the Interrupt Vector Table and whatever comes after
The game crashes
The final frame rendered by a crashing TH04.
The 4 KiB encryption buffer would only be freed by the corresponding
file_close() call, which of course never happens because the
game crashes before it gets there. At one point, I really did suspect the
cause to be some kind of memory leak or fragmentation inside master.lib,
which would have been quite delightful to fix.
Instead, the most straightforward fix here is to bump up that memory limit
by at least 4 KiB. Certainly easier than squeezing in a
cdg_free() call for the star image before the pre-boss dialog
without breaking position dependence.
Or, even better, let's nuke all these memory limits from orbit
because they make little sense to begin with, and fix every other potential
out-of-memory crash that modders would encounter when adding enough data to
any of the 4 games that impose such limits on themselves. Unless you want to
launch other binaries (which need to do their own memory allocations) after
launching the game, there's really no reason to restrict the amount of
memory available to a DOS process. Heck, whenever DOS creates a new one, it
assigns all remaining free memory by default anyway.
Removing the memory limits also removes one of ZUN's few error checks, which
end up quitting the game if there isn't at least a given maximum amount of
conventional RAM available. While it might be tempting to reserve enough
memory at the beginning of execution and then never check any allocation for
a potential failure, that's exactly where something like TH04's crash
This game is also still running on DOS, where such an initial allocation
failure is very unlikely to happen – no one fills close to half of
conventional RAM with TSRs and then tries running one of these games. It
might have been useful to detect systems with less than 640 KiB of
actual, physical RAM, but none of the PC-98 models with that little amount
of memory are fast enough to run these games to begin with. How ironic… a
place where ZUN actually added an error check, and then it's mostly
OK, TH01 missile bullets. Can we maybe have a well-behaved entity type,
without any weirdness? Just once?
Ehh, kinda. Apart from another 150 bytes wasted on unused structure members,
this code is indeed more on the low end in terms of overall jank. It does
become very obvious why dodging these missiles in the YuugenMagan, Mima, and
Elis fights feels so awful though: An unfair 46×46 pixel hitbox around
Reimu's center pixel, combined with the comeback of
📝 interlaced rendering, this time in every
stage. ZUN probably did this because missiles are the only 16×16 sprite in
TH01 that is blitted to unaligned X positions, which effectively ends up
touching a 32×16 area of VRAM per sprite.
But even if we assume VRAM writes to be the bottleneck here, it would
have been totally possible to render every missile in every frame at roughly
the same amount of CPU time that the original game uses for interlaced
Note that all missile sprites only use two colors, white and green.
Instead of naively going with the usual four bitplanes, extract the
pixels drawn in each of the two used colors into their own bitplanes.
master.lib calls this the "tiny format".
Use the GRCG to draw these two bitplanes in the intended white and green
colors, halving the amount of VRAM writes compared to the original
(Not using the .PTN format would have also avoided the inconsistency of
storing the missile sprites in boss-specific sprite slots.)
That's an optimization that would have significantly benefitted the game, in
contrast to all of the fake ones
introduced in later games. Then again, this optimization is
actually something that the later games do, and it might have in fact been
necessary to achieve their higher bullet counts without significant
After some effectively unused Mima sprite effect code that is so broken that
it's impossible to make sense out of it, we get to the final feature I
wanted to cover for all bosses in parallel before returning to Sariel: The
separate sprite background storage for moving or animated boss sprites in
the Mima, Elis, and Sariel fights. But, uh… why is this necessary to begin
with? Doesn't TH01 already reserve the other VRAM page for backgrounds?
Well, these sprites are quite big, and ZUN didn't want to blit them from
main memory on every frame. After all, TH01 and TH02 had a minimum required
clock speed of 33 MHz, half of the speed required for the later three games.
So, he simply blitted these boss sprites to both VRAM pages, leading
the usual unblitting calls to only remove the other sprites on top of the
boss. However, these bosses themselves want to move across the screen…
and this makes it necessary to save the stage background behind them
in some other way.
Enter .PTN, and its functions to capture a 16×16 or 32×32 square from VRAM
into a sprite slot. No problem with that approach in theory, as the size of
all these bigger sprites is a multiple of 32×32; splitting a larger sprite
into these smaller 32×32 chunks makes the code look just a little bit clumsy
(and, of course, slower).
But somewhere during the development of Mima's fight, ZUN apparently forgot
that those sprite backgrounds existed. And once Mima's 🚫 casting sprite is
blitted on top of her regular sprite, using just regular sprite
transparency, she ends up with her infamous third arm:
Ironically, there's an unused code path in Mima's unblit function where ZUN
assumes a height of 48 pixels for Mima's animation sprites rather than the
actual 64. This leads to even clumsier .PTN function calls for the bottom
128×16 pixels… Failing to unblit the bottom 16 pixels would have also
yielded that third arm, although it wouldn't have looked as natural. Still
wouldn't say that it was intentional; maybe this casting sprite was just
added pretty late in the game's development?
So, mission accomplished, Sariel unblocked… at 2¼ pushes. That's quite some time left for some smaller stage initialization
code, which bundles a bunch of random function calls in places where they
logically really don't belong. The stage opening animation then adds a bunch
of VRAM inter-page copies that are not only redundant but can't even be
understood without knowing the hidden internal state of the last VRAM page
accessed by previous ZUN code…
In better news though: Turbo C++ 4.0 really doesn't seem to have any
complexity limit on inlining arithmetic expressions, as long as they only
operate on compile-time constants. That's how we get macro-free,
compile-time Shift-JIS to JIS X 0208 conversion of the individual code
points in the 東方★靈異伝 string, in a compiler from 1994. As long as you
don't store any intermediate results in variables, that is…
But wait, there's more! With still ¼ of a push left, I also went for the
boss defeat animation, which includes the route selection after the SinGyoku
As in all other instances, the 2× scaled font is accomplished by first
rendering the text at regular 1× resolution to the other, invisible VRAM
page, and then scaled from there to the visible one. However, the route
selection is unique in that its scaled text is both drawn transparently on
top of the stage background (not onto a black one), and can also change
colors depending on the selection. It would have been no problem to unblit
and reblit the text by rendering the 1× version to a position on the
invisible VRAM page that isn't covered by the 2× version on the visible one,
but ZUN (needlessly) clears the invisible page before rendering any text.
Instead, he assigned a separate VRAM color for both
the 魔界 and 地獄 options, and only changed the palette value for
these colors to white or gray, depending on the correct selection. This is
another one of the
📝 rare cases where TH01 demonstrates good use of PC-98 hardware,
as the 魔界へ and 地獄へ strings don't need to be reblitted during the selection process, only the Orb "cursor" does.
Then, why does this still not count as good-code? When
changing palette colors, you kinda need to be aware of everything
else that can possibly be on screen, which colors are used there, and which
aren't and can therefore be used for such an effect without affecting other
sprites. In this case, well… hover over the image below, and notice how
Reimu's hair and the bomb sprites in the HUD light up when Makai is
This push did end on a high note though, with the generic, non-SinGyoku
version of the defeat animation being an easily parametrizable copy. And
that's how you decompile another 2.58% of TH01 in just slightly over three
Now, we're not only ready to decompile Sariel, but also Kikuri, Elis, and
SinGyoku without needing any more detours into non-boss code. Thanks to the
current TH01 funding subscriptions, I can plan to cover most, if not all, of
Sariel in a single push series, but the currently 3 pending pushes probably
won't suffice for Sariel's 8.10% of all remaining code in TH01. We've got
quite a lot of not specifically TH01-related funds in the backlog to pass
the time though.
Due to recent developments, it actually makes quite a lot of sense to take a
break from TH01: spaztron64 has
managed what every Touhou download site so far has failed to do: Bundling
all 5 game onto a single .HDI together with pre-configured PC-98
emulators and a nice boot menu, and hosting the resulting package on a
proper website. While this first release is already quite good (and much
better than my attempt from 2014), there is still a bit of room for
improvement to be gained from specific ReC98 research. Next up,
Researching how TH04 and TH05 use EMS memory, together with the cause
behind TH04's crash in Stage 5 when playing as Reimu without an EMS driver
reverse-engineering TH03's score data file format
(YUME.NEM), which hopefully also comes with a way of building a
file that unlocks all characters without any high scores.
Technical debt, part 5… and we only got TH05's stupidly optimized
.PI functions this time?
As far as actual progress is concerned, that is. In maintenance news
though, I was really hyped for the #include improvements I've
mentioned in 📝 the last post. The result: A
new x86real.h file, bundling all the declarations specific to
the 16-bit x86 Real Mode in a smaller file than Turbo C++'s own
DOS.H. After all, DOS is something else than the underlying
CPU. And while it didn't speed up build times quite as much as I had hoped,
it now clearly indicates the x86-specific parts of PC-98 Touhou code to
future port authors.
After another couple of improvements to parameter declaration in ASM land,
we get to TH05's .PI functions… and really, why did ZUN write all of
them in ASM? Why (re)declare all the necessary structures and data in
ASM land, when all these functions are merely one layer of abstraction
above master.lib, which does all the actual work?
I get that ZUN might have wanted masked blitting to be faster, which is
used for the fade-in effect seen during TH05's main menu animation and the
ending artwork. But, uh… he knew how to modify master.lib. In fact, he
did already modify the graph_pack_put_8() function
used for rendering a single .PI image row, to ignore master.lib's VRAM
clipping region. For this effect though, he first blits each row regularly
to the invisible 400th row of VRAM, and then does an EGC-accelerated
VRAM-to-VRAM blit of that row to its actual target position with the mask
enabled. It would have been way more efficient to add another version of
this function that takes a mask pattern. No amount of REP
MOVSW is going to change the fact that two VRAM writes per line are
slower than a single one. Not to mention that it doesn't justify writing
every other .PI function in ASM to go along with it…
This is where we also find the most hilarious aspect about this: For most
of ZUN's pointless micro-optimizations, you could have maybe made the
argument that they do save some CPU cycles here and there, and
therefore did something positive to the final, PC-98-exclusive result. But
some of the hand-written ASM here doesn't even constitute a
micro-optimization, because it's worse than what you would have got
out of even Turbo C++ 4.0J with its 80386 optimization flags!
At least it was possible to "decompile" 6 out of the 10 functions
here, making them easy to clean up for future modders and port authors.
Could have been 7 functions if I also decided to "decompile"
pi_free(), but all the C++ code is already surrounded by ASM,
resulting in 2 ASM translation units and 2 C++ translation units.
pi_free() would have needed a single translation unit by
itself, which wasn't worth it, given that I would have had to spell out
every single ASM instruction anyway.
There you go. What about this needed to be written in ASM?!?
The function calls between these small translation units even seemed to
glitch out TASM and the linker in the end, leading to one CALL
offset being weirdly shifted by 32 bytes. Usually, TLINK reports a fixup
overflow error when this happens, but this time it didn't, for some reason?
Mirroring the segment grouping in the affected translation unit did solve
the problem, and I already knew this, but only thought of it after spending
quite some RTFM time… during which I discovered the -lE
switch, which enables TLINK to use the expanded dictionaries in
Borland's .OBJ and .LIB files to speed up linking. That shaved off roughly
another second from the build time of the complete ReC98 repository. The
more you know… Binary blobs compiled with non-Borland tools would be the
only reason not to use this flag.
So, even more slowdown with this 5th dedicated push, since we've still only
repaid 41% of the technical debt in the SHARED segment so far.
Next up: Part 6, which hopefully manages to decompile the FM and SSG
channel animations in TH05's Music Room, and hopefully ends up being the
final one of the slow ones.
50% hype! 🎉 But as usual for TH01, even that final set of functions
shared between all bosses had to consume two pushes rather than one…
First up, in the ongoing series "Things that TH01 draws to the PC-98
graphics layer that really should have been drawn to the text layer
instead": The boss HP bar. Oh well, using the graphics layer at least made
it possible to have this half-red, half-white pattern
for the middle section.
This one pattern is drawn by making surprisingly good use of the GRCG. So
far, we've only seen it used for fast monochrome drawing:
// Setting up fast drawing using color #9 (1001 in binary)
outportb(0x7E, 0xFF); // Plane 0: (B): (********)
outportb(0x7E, 0x00); // Plane 1: (R): ( )
outportb(0x7E, 0x00); // Plane 2: (G): ( )
outportb(0x7E, 0xFF); // Plane 3: (E): (********)
// Write a checkerboard pattern (* * * * ) in color #9 to the top-left corner,
// with transparent blanks. Requires only 1 VRAM write to a single bitplane:
// The GRCG automatically writes to the correct bitplanes, as specified above
*(uint8_t *)(MK_FP(0xA800, 0)) = 0xAA;
But since this is actually an 8-pixel tile register, we can set any
8-pixel pattern for any bitplane. This way, we can get different colors
for every one of the 8 pixels, with still just a single VRAM write of the
alpha mask to a single bitplane:
And I thought TH01 only suffered the drawbacks of PC-98 hardware, making
so little use of its actual features that it's perhaps not fair to even
call it "a PC-98 game"… Still, I'd say that "bad PC-98 port of an idea"
describes it best.
However, after that tiny flash of brilliance, the surrounding HP rendering
code goes right back to being the typical sort of confusing TH01 jank.
There's only a single function for the three distinct jobs of
incrementing HP during the boss entrance animation,
decrementing HP if hit by the Orb, and
redrawing the entire bar, because it's still all in VRAM, and Sariel
wants different backgrounds,
with magic numbers to select between all of these.
VRAM of course also means that the backgrounds behind the individual hit
points have to be stored, so that they can be unblitted later as the boss
is losing HP. That's no big deal though, right? Just allocate some memory,
copy what's initially in VRAM, then blit it back later using your
foundational set of blitting funct– oh, wait, TH01 doesn't have this sort
of thing, right The closest thing,
📝 once again, are the .PTN functions. And
so, the game ends up handling these 8×16 background sprites with 16×16
wrappers around functions for 32×32 sprites.
That's quite the recipe for confusion, especially since ZUN
preferred copy-pasting the necessary ridiculous arithmetic expressions for
calculating positions, .PTN sprite IDs, and the ID of the 16×16 quarter
inside the 32×32 sprite, instead of just writing simple helper functions.
He did manage to make the result mostly bug-free this time
around, though! There's one minor hit point discoloration bug if the
red-white or white sections start at an odd number of hit points, but
that's never the case for any of the original 7 bosses.
The remaining sloppiness is ultimately inconsequential as well: The game
always backs up twice the number of hit point backgrounds, and thus
uses twice the amount of memory actually required. Also, this
self-restriction of only unblitting 16×16 pixels at a time requires any
remaining odd hit point at the last position to, of course, be rendered
After stumbling over the weakest imaginable random number
generator, we finally arrive at the shared boss↔orb collision
handling function, the final blocker among the final blockers. This
function takes a whopping 12 parameters, 3 of them being references to
int values, some of which are duplicated for every one of the
7 bosses, with no generic boss struct anywhere.
📝 Previously, I speculated that YuugenMagan might have been the first boss to be programmed for TH01.
With all these variables though, there is some new evidence that SinGyoku
might have been the first one after all: It's the only boss to use its own
HP and phase frame variables, with the other bosses sharing the same two
While this function only handles the response to a boss↔orb
collision, it still does way too much to describe it briefly. Took me
quite a while to frame it in terms of invincibility (which is the
main impact of all of this that can be observed in gameplay code). That
made at least some sort of sense, considering the other usages of
the variables passed as references to that function. Turns out that
YuugenMagan, Kikuri, and Elis abuse what's meant to be the "invincibility
frame" variable as a frame counter for some of their animations 🙄
Oh well, the game at least doesn't call the collision handling function
during those, so "invincibility frame" is technically still a
correct variable name there.
And that's it! We're finally ready to start with Konngara, in 2021. I've
been waiting quite a while for this, as all this high-level boss code is
very likely to speed up TH01 progress quite a bit. Next up though: Closing
out 2020 with more of the technical debt in the other games.
Done with the .BOS format, at last! While there's still quite a bunch of
undecompiled non-format blitting code left, this was in fact the final
piece of graphics format loading code in TH01.
📝 Continuing the trend from three pushes ago,
we've got yet another class, this time for the 48×48 and 48×32 sprites
used in Reimu's gohei, slide, and kick animations. The only reason these
had to use the .BOS format at all is simply because Reimu's regular
sprites are 32×32, and are therefore loaded from
📝 .PTN files.
Yes, this makes no sense, because why would you split animations for
the same character across two file formats and two APIs, just because
of a sprite size difference?
This necessity for switching blitting APIs might also explain why Reimu
vanishes for a few frames at the beginning and the end of the gohei swing
animation, but more on that once we get to the high-level rendering code.
Now that we've decompiled all the .BOS implementations in TH01, here's an
overview of all of them, together with .PTN to show that there really was
no reason for not using the .BOS API for all of Reimu's sprites:
Single-line and wave only
Per-file sprite limit
Pixels blitted at once
And even that last property could simply be handled by branching based on
the sprite width, and wouldn't be a reason for switching formats. But
well, it just wouldn't be TH01 without all that redundant bloat though,
The basic loading, freeing, and blitting code was yet another variation
on the other .BOS code we've seen before. So this should have caused just
as little trouble as the CBossAnim code… except that
CPlayerAnimdid add one slightly difficult function to
the mix, which led to it requiring almost a full push after all.
Similar to 📝 the unblitting code for moving lasers we've seen in the last push,
ZUN tries to minimize the amount of VRAM writes when unblitting Reimu's
slide animations. Technically, it's only necessary to restore the pixels
that Reimu traveled by, plus the ones that wouldn't be redrawn by
the new animation frame at the new X position.
The theoretically arbitrary distance between the two sprites is, of
course, modeled by a fixed-size buffer on the stack
, coming with the further assumption that the
sprite surely hasn't moved by more than 1 horizontal VRAM byte compared to
the last frame. Which, of course, results in glitches if that's not the
case, leaving little Reimu parts in VRAM if the slide speed ever exceeded
8 pixels per frame. (Which it never does,
being hardcoded to 6 pixels, but still.). As it also turns out, all those
bit masking operations easily lead to incredibly sloppy C code.
Which compiles into incredibly terrible ASM, which in turn might end up
wasting way more CPU time than the final VRAM write optimization would
have gained? Then again, in-depth profiling is way beyond the scope of
this project at this point.
Next up: The TH04 main menu, and some more technical debt.
This time around, laser is 📝 actually not
difficult, with TH01's shootout laser class being simple enough to nicely
fit into a single push. All other stationary lasers (as used by
YuugenMagan, for example) don't even use a class, and are simply treated
as regular lines with collision detection.
But of course, the shootout lasers also come with the typical share of
TH01 jank we've all come to expect by now. This time, it already starts
with the hardcoded sprite data:
A shootout laser can have a width from 1 to 8 pixels, so ZUN stored a
separate 16×1 sprite with a line for each possible width (left-to-right).
Then, he shifted all of these sprites 1 pixel to the right for all of the
8 possible start positions within a planar VRAM byte (top-to-bottom).
Because… doing that bit shift programmatically is way too
expensive, so let's pre-shift at compile time, and use 16× the memory per
Since a bunch of other sprite sheets need to be pre-shifted as well (this
is the 5th one we've found so far), our sprite converter has a feature to
automatically generate those pre-shifted variations. This way, we can
abstract away that implementation detail and leave modders with .BMP files
that still only contain a single version of each sprite. But, uh…, wait,
in this sprite sheet, the second row for 1-pixel lasers is accidentally
shifted right by one more pixel that it should have been?! Which means
we can't use the auto-preshift feature here, and have to store this
weird-looking (and quite frankly, completely unnecessary) sprite sheet in
ZUN did, at least during TH01's development, not have a sprite
converter, and directly hardcoded these dot patterns in the C++ code
The waste continues with the class itself. 69 bytes, with 22 bytes
outright unused, and 11 not really necessary. As for actual innovations
though, we've got
📝 another 32-bit fixed-point type, this
time actually using 8 bits for the fractional part. Therefore, the
ray position is tracked to the 1/256th of a pixel, using the full
precision of master.lib's 8-bit sin() and cos() lookup
Unblitting is also remarkably efficient: It's only done once the laser
stopped extending and started moving, and only for the exact pixels at the
start of the ray that the laser traveled by in a single frame. If only the
ray part was also rendered as efficiently – it's fully blitted every frame,
right next to the collision detection for each row of the ray.
With a public interface of two functions (spawn, and update / collide /
unblit / render), that's superficially all there is to lasers in this
game. There's another (apparently inlined) function though, to both reset
and, uh, "fully unblit" all lasers at the end of every boss fight… except
that it fails hilariously at doing the latter, and ends up effectively
unblitting random 32-pixel line segments, due to ZUN confusing both the
coordinates and the parameter types for the line unblitting function.
A while ago, I was asked about
this crash that tends to
happen when defeating Elis. And while you can clearly see the random
unblitted line segments that are missing from the sprites, I don't
quite think we've found the cause for the crash, since the
📝 line unblitting function used theredoes clip its coordinates to the VRAM range.
Next up: The final piece of image format code in TH01, covering Reimu's
Back to TH01, and its boss sprite format… with a separate class for
storing animations that only differs minutely from the
📝 regular boss entity class I covered last time?
Decompiling this class was almost free, and the main reason why the first
of these pushes ended up looking pretty huge.
Next up were the remaining shape drawing functions from the code segment
that started with the .GRC functions. P0105 already started these with the
(surprisingly sanely implemented) 8×8 diamond, star, and… uh, snowflake
prominently seen in the Konngara, Elis, and Sariel fights, respectively.
Now, we've also got:
ellipse arcs with a customizable angle distance between the individual
dots – mostly just used for drawing full circles, though
line loops – which are only used for the rotating white squares around
Mima, meaning that the white star in the YuugenMagan fight got a completely
and the surprisingly weirdest one, drawing the red invincibility
The weirdness becomes obvious with just a single screenshot:
First, we've got the obvious issue of the sprites not being clipped at the
right edge of VRAM, with the rightmost pixels in each row of the sprite
extending to the beginning of the next row. Well, that's just what you get
if you insist on writing unique low-level blitting code for the majority
of the individual sprites in the game… 🤷
More importantly though, the sprite sheet looks like this:
So how do we even get these fully filled red diamonds?
Well, turns out that the sprites are never consistently unblitted during
their 8 frames of animation. There is a function that looks
like it unblits the sprite… except that it starts with by enabling the
GRCG and… reading from the first bitplane on the background page?
If this was the EGC, such a read would fill some internal registers with
the contents of all 4 bitplanes, which can then subsequently be blitted to
all 4 bitplanes of any VRAM page with a single memory write. But with the
GRCG in RMW mode, reads do nothing special, and simply copy the memory
contents of one bitplane to the read destination. Maybe ZUN thought
that setting the RMW color to red
also sets some internal 4-plane mask register to match that color?
Instead, the rather random pixels read from the first bitplane are then
used as a mask for a second blit of the same red sprite.
Effectively, this only really "unblits" the invincibility pixels that are
drawn on top of Reimu's sprite. Since Reimu is drawn first, the
invincibility sprites are overwritten anyway. But due to the palette color
layout of Reimu's sprite, its pixels end up fully masking away any
invincibility sprite pixels in that second blit, leaving VRAM untouched as
a result. Anywhere else though, this animation quickly turns into the
union of all animation frames.
Then again, if that 16-dot-aligned rectangular unblitting function is all
you know about the EGC, and you can't be bothered to write a perfect
unblitter for 8×8 sprites, it becomes obvious why you wouldn't want to use
Because Reimu would barely be visible under all that flicker. In
comparison, those fully filled diamonds actually look pretty good.
After all that, the remaining time wouldn't have been enough for the next
few essential classes, so I closed out the push with three more VRAM
Single-bitplane pixel inversion inside a 32×32 square – the main effect
behind the discoloration seen in the bomb animation, as well as the
exploding squares at the end of Kikuri's and Sariel's entrance
EGC-accelerated VRAM row copies – the second half of smooth and fully
hardware-accelerated scrolling for backgrounds that are twice the size of
And finally, the VRAM page content transition function using meshed 8×8
squares, used for the blocky transition to Sariel's first and second phases.
Which is quite ridiculous in just how needlessly bloated it is. I'm positive
that this sort of thing could have also been accelerated using the PC-98's
EGC… although simply writing better C would have already gone a long way.
The function also comes with three unused mesh patterns.
And with that, ReC98, as a whole, is not only ⅓ done, but I've also fully
caught up with the feature backlog for the first time in the history of
this crowdfunding! Time to go into maintenance mode then, while we wait
for the next pushes to be funded. Got a huge backlog of tiny maintenance
issues to address at a leisurely pace, and of course there's also the
📝 16-bit build system waiting to be
You can now freely add or remove both data and code anywhere in TH05, by
editing the ReC98 codebase, writing your mod in ASM or C/C++, and
recompiling the code. Since all absolute memory addresses have now been
converted to labels, this will work without causing any instability. See
the position independence section in the FAQ
for a more thorough explanation about why this was a problem.
By extension, this also means that it's now theoretically possible
to use a different compiler on the source code. But:
What does this not mean?
The original ZUN code hasn't been completely reverse-engineered yet, let
alone decompiled. As the final PC-98 Touhou game, TH05 also happens to
have the largest amount of actual ZUN-written ASM that can't ever
be decompiled within ReC98's constraints of a legit source code
reconstruction. But a lot of the originally-in-C code is also still in
ASM, which might make modding a bit inconvenient right now. And while I
have decompiled a bunch of functions, I selected them largely
because they would help with PI (as requested by the backers), and not
because they are particularly relevant to typical modding interests.
As a result, the code might also be a bit confusingly organized. There's
quite a conflict between various goals there: On the one hand, I'd like to
only have a single instance of every function shared with earlier games,
as well as reduce ZUN's code duplication within a single game. On the
other hand, this leads to quite a lot of code being scattered all over the
place and then #include-pasted back together, except for the
📝 this doesn't work, and you'd have to use multiple translation units anyway…
I'm only beginning to figure out the best structure here, and some more
reverse-engineering attention surely won't hurt.
Also, keep in mind that the code still targets x86 Real Mode. To work
effectively in this codebase, you'd need some familiarity with
segmentation, and how to express it all in code. This tends to make
even regular C++ development about an order of magnitude harder,
especially once you want to interface with the remaining ASM code. That
part made -Tom- struggle quite a bit with implementing his
custom scripting language for the demo above. For now, he built that demo
on quite a limited foundation – which is why he also chose to release
neither the build nor the source publically for the time being.
So yeah, you're definitely going to need the TASM and Borland C++ manuals
tl;dr: We now know everything about this game's data, but not quite
as much about this game's code.
So, how long until source ports become a realistic project?
You probably want to wait for 100% RE, which is when everything
that can be decompiled has been decompiled.
Unless your target system is 16-bit Windows, in which case you could
theoretically start right away. 📝 Again,
this would be the ideal first system to port PC-98 Touhou to: It would
require all the generic portability work to remove the dependency on PC-98
hardware, thus paving the way for a subsequent port to modern systems,
yet you could still just drop in any undecompiled ASM.
Porting to IBM-compatible DOS would only be a harder and less universally
useful version of that. You'd then simply exchange one architecture, with
its idiosyncrasies and limits, for another, with its own set of
idiosyncrasies and limits. (Unless, of course, you already happen to be
intimately familiar with that architecture.) The fact that master.lib
provides DOS/V support would have only mattered if ZUN consistently used
it to abstract away PC-98 hardware at every single place in the code,
which is definitely not the case.
The list of actually interesting findings in this push is,
📝 again, very short. Probably the most
notable discovery: The low-level part of the code that renders Marisa's
laser from her TH04 Illusion Laser shot type is still present in
TH05. Insert wild mass guessing about potential beta version shot types…
Oh, and did you know that the order of background images in the Extra
Stage staff roll differs by character?
Next up: Finally driving up the RE% bar again, by decompiling some TH05
main menu code.
Finally, after a long while, we've got two pushes with barely anything to
talk about! Continuing the road towards 100% PI for TH05, these were
exactly the two pushes that TH05 MAINE.EXE PI was estimated
to additionally cost, relative to TH04's. Consequently, they mostly went
to TH05's unique data structures in the ending cutscenes, the score name
registration menu, and the
A unique feature in there is TH05's support for automatic text color
changes in its ending scripts, based on the first full-width Shift-JIS
codepoint in a line. The \c=codepoint,color
commands at the top of the _ED??.TXT set up exactly this
codepoint→color mapping. As far as I can tell, TH05 is the only Touhou
game with a feature like this – even the Windows Touhou games went back to
manually spelling out each color change.
The orb particles in TH05's staff roll also try to be a bit unique by
using 32-bit X and Y subpixel variables for their current position. With
still just 4 fractional bits, I can't really tell yet whether the extended
range was actually necessary. Maybe due to how the "camera scrolling"
through "space" was implemented? All other entities were pretty much the
usual fare, though.
12.4, 4.4, and now a 28.4 fixed-point format… yup,
📝 C++ templates were
definitely the right choice.
At the end of its staff roll, TH05 not only displays
the usual performance
verdict, but then scrolls in the scores at the end of each stage
before switching to the high score menu. The simplest way to smoothly
scroll between two full screens on a PC-98 involves a separate bitmap…
which is exactly what TH05 does here, reserving 28,160 bytes of its global
data segment for just one overly large monochrome 320×704 bitmap where
both the screens are rendered to. That's… one benefit of splitting your
game into multiple executables, I guess?
Not sure if it's common knowledge that you can actually scroll back and
forth between the two screens with the Up and Down keys before moving to
the score menu. I surely didn't know that before. But it makes sense –
might as well get the most out of that memory.
The necessary groundwork for all of this may have actually made
TH04's (yes, TH04's) MAINE.EXE technically
position-independent. Didn't quite reach the same goal for TH05's – but
what we did reach is ⅔ of all PC-98 Touhou code now being
position-independent! Next up: Celebrating even more milestones, as
-Tom- is about to finish development on his TH05
MAIN.EXE PI demo…
Alright, tooling and technical debt. Shouldn't be really much to talk
about… oh, wait, this is still ReC98
For the tooling part, I finished up the remaining ergonomics and error
handling for the
📝 sprite converter that Jonathan Campbell contributed two months ago.
While I familiarized myself with the tool, I've actually ran into some
unreported errors myself, so this was sort of important to me. Still got
no command-line help in there, but the error messages can now do that job
probably even better, since we would have had to write them anyway.
So, what's up with the technical debt then? Well, by now we've accumulated
quite a number of 📝 ASM code slices that
need to be either decompiled or clearly marked as undecompilable. Since we
define those slices as "already reverse-engineered", that decision won't
affect the numbers on the front page at all. But for a complete
decompilation, we'd still have to do this someday. So, rather than
incorporating this work into pushes that were purchased with the
expectation of measurable progress in a certain area, let's take the
"anything goes" pushes, and focus entirely on that during them.
The second code segment seemed like the best place to start with this,
since it affects the largest number of games simultaneously. Starting with
TH02, this segment contains a set of random "core" functions needed by the
binary. Image formats, sounds, input, math, it's all there in some
capacity. You could maybe call it all "libzun" or something like
that? But for the time being, I simply went with the obvious name,
seg2. Maybe I'll come up with something more convincing in
Oh, but wait, why were we assembling all the previous undecompilable ASM
translation units in the 16-bit build part? By moving those to the 32-bit
part, we don't even need a 16-bit TASM in our list of dependencies, as
long as our build process is not fully 16-bit.
And with that, ReC98 now also builds on Windows 95, and thus, every 32-bit
Windows version. 🎉 Which is certainly the most user-visible improvement
in all of these two pushes.
Back in 2015, I already decompiled all of TH02's seg2
functions. As suggested by the Borland compiler, I tried to follow a "one
translation unit per segment" layout, bundling the binary-specific
contents via #include. In the end, it required two
translation units – and that was even after manually inserting the
original padding bytes via #pragma codestring… yuck. But it
worked, compiled, and kept the linker's job (and, by extension,
segmentation worries) to a minimum. And as long as it all matched the
original binaries, it still counted as a valid reconstruction of ZUN's
However, that idea ultimately falls apart once TH03 starts mixing
undecompilable ASM code inbetween C functions. Now, we officially have no
choice but to use multiple C and ASM translation units, with maybe only
just one or two #includes in them…
…or we finally start reconstructing the actual seg2 library,
turning every sequence of related functions into its own translation unit.
This way, we can simply reuse the once-compiled .OBJ files for all the
binaries those functions appear in, without requiring that additional
layer of translation units mirroring the original segmentation.
The best example for this is
almost undecompilable function that generates a lookup table for
horizontally flipping 8 1bpp pixels. It's part of every binary since
TH03, but only used in that game. With the previous approach, we would
have had to add 9 C translation units, which would all have just
#included that one file. Now, we simply put the .OBJ file
into the correct place on the linker command line, as soon as we can.
💡 And suddenly, the linker just inserts the correct padding bytes itself.
The most immediate gains there also happened to come from TH03. Which is
also where we did get some tiny RE% and PI% gains out of this after
all, by reverse-engineering some of its sprite blitting setup code. Sure,
I should have done even more RE here, to also cover those 5 functions at
the end of code segment #2 in TH03's MAIN.EXE that were in
front of a number of library functions I already covered in this push. But
let's leave that to an actual RE push 😛
All in all though, I was just getting started with this; the real
gains in terms of removed ASM files are still to come. But in the
meantime, the funding situation has become even better in terms of
allowing me to focus on things nobody asked for. 🙂 So here's a slightly
better idea: Instead of spending two more pushes on this, let's shoot for
TH05 MAINE.EXE position independence next. If I manage to get
it done, we'll have a 100% position-independent TH05 by the time
-Tom- finishes his MAIN.EXE PI demo, rather
than the 94% we'd get from just MAIN.EXE. That's bound to
make a much better impression on all the people who will then
(re-)discover the project.
And indeed, I got to end my vacation with a lot of image format and
blitting code, covering the final two formats, .GRC and .BOS. .GRC was
nothing noteworthy – one function for loading, one function for
byte-aligned blitting, and one function for freeing memory. That's it –
not even a unblitting function for this one. .BOS, on the other hand…
…has no generic (read: single/sane) implementation, and is only
implemented as methods of some boss entity class. And then again for
Sariel's dress and wand animations, and then again for Reimu's
animations, both of which weren't even part of these 4 pushes. Looking
forward to decompiling essentially the same algorithms all over again… And
that's how TH01 became the largest and most bloated PC-98 Touhou game. So
yeah, still not done with image formats, even at 44% RE.
This means I also had to reverse-engineer that "boss entity" class… yeah,
what else to call something a boss can have multiple of, that may or may
not be part of a larger boss sprite, may or may not be animated, and that
may or may not have an orb hitbox?
All bosses except for Kikuri share the same 5 global instances of this
class. Since renaming all these variables in ASM land is tedious anyway, I
went the extra mile and directly defined separate, meaningful names for
the entities of all bosses. These also now document the natural order in
which the bosses will ultimately be decompiled. So, unless a backer
requests anything else, this order will be:
(code for regular card-flipping stages)
As everyone kind of expects from TH01 by now, this class reveals yet
another… um, unique and quirky piece of code architecture. In
addition to the position and hitbox members you'd expect from a class like
this, the game also stores the .BOS metadata – width, height, animation
frame count, and 📝 bitplane pointer slot
number – inside the same class. But if each of those still corresponds to
one individual on-screen sprite, how can YuugenMagan have 5 eye sprites,
or Kikuri have more than one soul and tear sprite? By duplicating that
metadata, of course! And copying it from one entity to another
At this point, I feel like I even have to congratulate the game for not
actually loading YuugenMagan's eye sprites 5 times. But then again, 53,760
bytes of waste would have definitely been noticeable in the DOS days.
Makes much more sense to waste that amount of space on an unused C++
exception handler, and a bunch of redundant, unoptimized blitting
(Thinking about it, YuugenMagan fits this entire system perfectly. And
together with its position in the game's code – last to be decompiled
means first on the linker command line – we might speculate that
YuugenMagan was the first boss to be programmed for TH01?)
So if a boss wants to use sprites with different sizes, there's no way
around using another entity. And that's why Girl-Elis and Bat-Elis are two
distinct entities internally, and have to manually sync their position.
Except that there's also a third one for Attacking-Girl-Elis,
because Girl-Elis has 9 frames of animation in total, and the global .BOS
bitplane pointers are divided into 4 slots of only 8 images each.
Same for SinGyoku, who is split into a sphere entity, a
person entity, and a… white flash entity for all three forms,
all at the same resolution. Or Konngara's facial expressions, which also
require two entities just for themselves.
And once you decompile all this code, you notice just how much of it the
game didn't even use. 13 of the 50 bytes of the boss entity class are
outright unused, and 10 bytes are used for a movement clamping and lock
system that would have been nice if ZUN also used it outside of
Kikuri's soul sprites. Instead, all other bosses ignore this system
completely, and just
the X/Y coordinates of the boss entities directly.
As for the rendering functions, 5 out of 10 are unused. And while those
definitely make up less than half of the code, I still must have
spent at least 1 of those 4 pushes on effectively unused functionality.
Only one of these functions lends itself to some speculation. For Elis'
entrance animation, the class provides functions for wavy blitting and
unblitting, which use a separate X coordinate for every line of the
sprite. But there's also an unused and sort of broken one for unblitting
two overlapping wavy sprites, located at the same Y coordinate. This might
indicate that Elis could originally split herself into two sprites,
similar to TH04 Stage 6 Yuuka? Or it might just have been some other kind
of animation effect, who knows.
After over 3 months of TH01 progress though, it's finally time to look at
other games, to cover the rest of the crowdfunding backlog. Next up: Going
back to TH05, and getting rid of those last PI false positives. And since
I can potentially spend the next 7 weeks on almost full-time ReC98 work,
I've also re-opened the store until October!
So, let's finally look at some TH01 gameplay structures! The obvious
choices here are player shots and pellets, which are conveniently located
in the last code segment. Covering these would therefore also help in
transferring some first bits of data in REIIDEN.EXE from ASM
land to C land. (Splitting the data segment would still be quite
annoying.) Player shots are immediately at the beginning…
…but wait, these are drawn as transparent sprites loaded from .PTN files.
Guess we first have to spend a push on
📝 Part 2 of this format.
Hm, 4 functions for alpha-masked blitting and unblitting of both 16×16 and
32×32 .PTN sprites that align the X coordinate to a multiple of 8
(remember, the PC-98 uses a
VRAM memory layout, where 8 pixels correspond to a byte), but only one
function that supports unaligned blitting to any X coordinate, and only
for 16×16 sprites? Which is only called twice? And doesn't come with a
corresponding unblitting function?
Yeah, "unblitting". TH01 isn't
and uses the PC-98's second VRAM page exclusively to store a stage's
background and static sprites. Since the PC-98 has no hardware sprites,
all you can do is write pixels into VRAM, and any animated sprite needs to
be manually removed from VRAM at the beginning of each frame. Not using
double-buffering theoretically allows TH01 to simply copy back all 128 KB
of VRAM once per frame to do this. But that
would be pretty wasteful, so TH01 just looks at all animated sprites, and
selectively copies only their occupied pixels from the second to the first
Alright, player shot class methods… oh, wait, the collision functions
directly act on the Yin-Yang Orb, so we first have to spend a push on
that one. And that's where the impression we got from the .PTN
functions is confirmed: The orb is, in fact, only ever displayed at
byte-aligned X coordinates, divisible by 8. It's only thanks to the
constant spinning that its movement appears at least somewhat
This is purely a rendering issue; internally, its position is
tracked at pixel precision. Sadly, smooth orb rendering at any unaligned X
coordinate wouldn't be that trivial of a mod, because well, the
necessary functions for unaligned blitting and unblitting of 32×32 sprites
don't exist in TH01's code. Then again, there's so much potential for
optimization in this code, so it might be very possible to squeeze those
additional two functions into the same C++ translation unit, even without
More importantly though, this was the right time to decompile the core
functions controlling the orb physics – probably the highlight in these
three pushes for most people.
Well, "physics". The X velocity is restricted to the 5 discrete states of
-8, -4, 0, 4, and 8, and gravity is applied by simply adding 1 to the Y
velocity every 5 frames No wonder that this can
easily lead to situations in which the orb infinitely bounces from the
At least fangame authors now have
reference of how ZUN did it originally, because really, this bad
approximation of physics had to have been written that way on purpose. But
hey, it uses 64-bit floating-point variables!
…sometimes at least, and quite randomly. This was also where I had to
learn about Turbo C++'s floating-point code generation, and how rigorously
it defines the order of instructions when mixing double and
float variables in arithmetic or conditional expressions.
This meant that I could only get ZUN's original instruction order by using
literal constants instead of variables, which is impossible right now
without somehow splitting the data segment. In the end, I had to resort to
spelling out ⅔ of one function, and one conditional branch of another, in
inline ASM. 😕 If ZUN had just written 16.0 instead of
16.0f there, I would have saved quite some hours of my life
trying to decompile this correctly…
To sort of make up for the slowdown in progress, here's the TH01 orb
physics debug mod I made to properly understand them:
To use it, simply replace REIIDEN.EXE, and run the game
in debug mode, via game d on the DOS prompt.
Its code might also serve as an example of how to achieve this sort of
thing without position independence.
Alright, now it's time for player shots though. Yeah, sure, they
don't move horizontally, so it's not too bad that those are also
always rendered at byte-aligned positions. But, uh… why does this code
only use the 16×16 alpha-masked unblitting function for decaying shots,
and just sloppily unblits an entire 16×16 square everywhere else?
The worst part though: Unblitting, moving, and rendering player shots
is done in a single function, in that order. And that's exactly where
TH01's sprite flickering comes from. Since different types of sprites are
free to overlap each other, you'd have to first unblit all types, then
move all types, and then render all types, as done in later
PC-98 Touhou games. If you do these three steps per-type instead, you
will unblit sprites of other types that have been rendered before… and
therefore end up with flicker.
Oh, and finally, ZUN also added an additional sloppy 16×16 square unblit
call if a shot collides with a pellet or a boss, for some
guaranteed flicker. Sigh.
And that's ⅓ of all ZUN code in TH01 decompiled! Next up: Pellets!
Alright, the score popup numbers shown when collecting items or defeating
(mid)bosses. The second-to-last remaining big entity type in TH05… with
quite some PI false positives in the memory range occupied by its data.
Good thing I still got some outstanding generic RE pushes that haven't
been claimed for anything more specific in over a month! These
conveniently allowed me to RE most of these functions right away, the
Most of the false positives were boss HP values, passed to a "boss phase
end" function which sets the HP value at which the next phase should end.
Stage 6 Yuuka, Mugetsu, and EX-Alice have their own copies of this
function, in which they also reset certain boss-specific global variables.
Since I always like to cover all varieties of such duplicated functions at
once, it made sense to reverse-engineer all the involved variables while I
was at it… and that's why this was exactly the right time to cover the
implementation details of Stage 6 Yuuka's parasol and vanishing animations
With still a bit of time left in that RE push afterwards, I could also
start looking into some of the smaller functions that didn't quite fit
into other pushes. The most notable one there was a simple function that
aims from any point to the current player position. Which actually only
became a separate function in TH05, probably since it's called 27 times in
total. That's 27 places no longer being blocked from further RE progress.
did most of the work for the score popup numbers in January, which meant
that I only had to review it and bring it up to ReC98's current coding
styles and standards. This one turned out to be one of those rare features
whose TH05 implementation is significantly less insane than the
TH04 one. Both games lazily redraw only the tiles of the stage background
that were drawn over in the previous frame, and try their best to minimize
the amount of tiles to be redrawn in this way. For these popup numbers,
this involves calculating the on-screen width, based on the exact number
of digits in the point value. TH04 calculates this width every frame
during the rendering function, and even resorts to setting that field
through the digit iteration pointer via self-modifying code… yup. TH05, on
the other hand, simply calculates the width once when spawning a new popup
number, during the conversion of the point value to
decimal. The "×2" multiplier suffix being removed in TH05 certainly
also helped in simplifying that feature in this game.
And that's ⅓ of TH05 reverse-engineered! Next up, one more TH05 PI push,
in which the stage enemies hopefully finish all the big entity types.
Maybe it will also be accompanied by another RE push? In any case, that
will be the last piece of TH05 progress for quite some time. The next TH01
stretch will consist of 6 pushes at the very least, and I currently have
no idea of how much time I can spend on ReC98 a month from now…
Wait, PI for FUUIN.EXE is mainly blocked by the high score
menu? That one should really be properly decompiled in a separate
RE push, since it's also present in largely identical form in
REIIDEN.EXE… but I currently lack the explicit funding to do
And as it turns out, I shouldn't really capture any of the existing generic
RE contributions for it either. Back in 2018 when I ran the crowdfunding
on the Touhou Patch Center Discord server, I said that generic RE
contributions would never go towards TH01. No one was interested in that
game back then, and as it's significantly different from all the other
games, it made sense to only cover it if explicitly requested.
As Touhou Patch Center still remains one of the biggest supporters and
advertisers for ReC98, someone recently believed that this rule was still
in effect, despite not being mentioned anywhere on this website.
Fast forward to today, and TH01 has become the single most supported game
lately, with plenty of incomplete pushes still open to be completed.
Reverse-engineering it has proven to be quite efficient, yielding lots of
completion percentage points per push. This, I suppose, is exactly what
backers that don't give any specific priorities are mainly interested in.
Therefore, I will allocate future partial
contributions to TH01, whenever it makes sense.
So, instead of rushing TH01 PI, let's wait for Ember2528's
April subscription, and get the 25% total RE milestone with some TH05 PI
progress instead. This one primarily focused on the gather circles
(spirals…?), the third-last missing entity type in TH05. These are
rendered using the same 8×8 pellet sprite introduced in TH02… except that
the actual pellets received a darkened bottom part in TH04
Which, in turn, is actually rendered quite efficiently – the games first
render the top white part of all pellets, followed by the bottom gray part
of all pellets. The PC-98 GRCG is used throughout the process, doing its
typical job of accelerating monochrome blitting, and by arranging the
rendering like this, only two GRCG color changes are required to draw any
number of pellets. I guess that makes it quite a worthwhile
optimization? Don't ask me for specific performance numbers or even saved
Sadly, we've already reached the end of fast triple-speed TH01 progress
with 📝 the last push, which decompiled the
last segment shared by all three of TH01's executables. There's still a
bit of double-speed progress left though, with a small number of code
segments that are shared between just two of the three executables.
At the end of the first one of these, we've got all the code for the .GRZ
format – which is yet another run-length encoded image format, but this
time storing up to 16 full 640×400 16-color images with an alpha bit. This
one is exclusively used to wastefully store Konngara's sword slash and
animations. Due to… suboptimal code organization, the code for the format
is also present in OP.EXE, despite not being used there. But
hey, that brings TH01 to over 20% in RE!
Decoupling the RLE command stream from the pixel data sounds like a nice
idea at first, allowing the format to efficiently encode a variety of
animation frames displayed all over the screen… if ZUN actually made
use of it. The RLE stream also has quite some ridiculous overhead,
starting with 1 byte to store the 1-bit command (putting a single 8×1
pixel block, or entering a run of N such blocks). Run commands then store
another 1-byte run length, which has to be followed by another
command byte to identify the run as putting N blocks, or skipping N blocks.
And the pixel data is just a sequence of these blocks for all 4 bitplanes,
in uncompressed form…
Also, have some rips of all the images this format is used for:
To make these, I just wrote a small viewer, calling the same decompiled
TH01 code: 2020-03-07-grzview.zip
Obviously, this means that it not only must to be run on a PC-98, but also
discards the alpha information.
If any backers are really interested in having a proper converter
to and from PNG, I can implement that in an upcoming push… although that
would be the perfect thing for outside contributors to do.
Next up, we got some code for the PI format… oh, wait, the actual files
are called "GRP" in TH01.
Long time no see! And this is exactly why I've been procrastinating
bullets while there was still meaningful progress to be had in other parts
of TH04 and TH05: There was bound to be quite some complexity in this most
central piece of game logic, and so I couldn't possibly get to a
satisfying understanding in just one push.
Or in two, because their rendering involves another bunch of
micro-optimized functions adapted from master.lib.
Or in three, because we'd like to actually name all the bullet sprites,
since there are a number of sprite ID-related conditional branches. And
so, I was refining things I supposedly RE'd in the the commits from the
first push until the very end of the fourth.
When we talk about "bullets" in TH04 and TH05, we mean just two things:
the white 8×8 pellets, with a cap of 240 in TH04 and 180 in TH05, and any
16×16 sprites from MIKO16.BFT, with a cap of 200 in TH04 and
220 in TH05. These are by far the most common types of… err, "things the
player can collide with", and so ZUN provides a whole bunch of pre-made
motion, animation, and
n-way spread / ring / stack group options for those, which can be
selected by simply setting a few fields in the bullet template. All the
other "non-bullets" have to be fired and controlled individually.
Which is nothing new, since uth05win covered this part pretty accurately –
I don't think anyone could just make up these structure member
overloads. The interesting insights here all come from applying this
research to TH04, and figuring out its differences compared to TH05. The
most notable one there is in the default groups: TH05 allows you to add
to any single bullet, n-way spread or ring, but TH04 only lets you create
stacks separately from n-way spreads and rings, and thus gets by with
fewer fields in its bullet template structure. On the other hand, TH04 has
a separate "n-way spread with random angles, yet still aimed at the
player" group? Which seems to be unused, at least as far as
midbosses and bosses are concerned; can't say anything about stage enemies
In fact, TH05's larger bullet template structure illustrates that these
distinct group types actually are a rather redundant piece of
over-engineering. You can perfectly indicate any permutation of the basic
groups through just the stack bullet count (1 = no stack), spread bullet
count (1 = no spread), and spread delta angle (0 = ring instead of
spread). Add a 4-flag bitfield to cover the rest (aim to player, randomize
angle, randomize speed, force single bullet regardless of difficulty or
rank), and the result would be less redundant and even slightly
Even those 4 pushes didn't quite finish all of the bullet-related types,
stopping just shy of the most trivial and consistent enum that defines
special movement. This also left us in a
📝 TH03-like situation, in which we're still
a bit away from actually converting all this research into actual RE%. Oh
well, at least this got us way past 50% in overall position independence.
On to the second half! 🎉
For the next push though, we'll first have a quick detour to the remaining
C code of all the ZUN.COM binaries. Now that the
📝 TH04 and TH05 resident structures no
longer block those, -Tom- has requested TH05's
RES_KSO.COM to be covered in one of his outstanding pushes.
And since 32th System
recently RE'd TH03's resident structure, it makes sense to also review and
merge that, before decompiling all three remaining RES_*.COM
binaries in hopefully a single push. It might even get done faster than
that, in which case I'll then review and merge some more of
Now that's more like the speed I was expecting! After a few more
unused functions for palette fading and rectangle blitting, we've reached
the big line drawing functions. And the biggest one among them,
drawing a straight line at any angle between two points using
Bresenham's algorithm, actually happens to be the single longest
function present in more than one binary in all of PC-98 Touhou, and #23
on the list of individual longest functions.
And it technically has a ZUN bug! If you pass a point outside the
(0, 0) - (639, 399) screen range, the function will calculate a new point
at the edge of the screen, so that the resulting line will retain the
angle intended by the points given. Except that it does so by calculating
the line slope using an integer division rather than a floating-point one
Doesn't seem like it actually causes any weirdly
skewed lines to be drawn in-game, though; that case is only hit in the
Mima boss fight, which draws a few lines with a bottom coordinate of
400 rather than the maximum of 399. It might also cause the wrong
background pixels to be restored during parts of the YuugenMagan fight,
leading to flickering sprites, but seriously, that's an issue everywhere
you look in this game.
Together with the rendering-text-to-VRAM function we've mostly already
known from TH02, this pushed the total RE percentage well over 20%, and
almost doubled the TH01 RE percentage, all within three pushes. And
comparatively, it went really smoothly, to the point (ha) where I
even had enough time left to also include the single-point functions that
come next in that code segment. Since about half of the remaining
functions in OP.EXE are present in more than just itself,
I'll be able to at least keep up this speed until OP.EXE hits
the 70% RE mark. That is, as long as the backers' priorities continue to
be generic RE or "giving some love to TH01"… we don't have a precedent for
TH01's actual game code yet.
And that's all the TH01 progress funded for January! Next up, we actually
do have a focus on TH03's game and scoring mechanics… or at least
the foundation for that.
So, where to start? Well, TH04 bullets are hard, so let's
procrastinate start with TH03 instead
The 📝 sprite display functions are the
obvious blocker for any structure describing a sprite, and therefore most
meaningful PI gains in that game… and I actually did manage to fit a
decompilation of those three functions into exactly the amount of time
that the Touhou Patch Center community votes alloted to TH03
And a pretty amazing one at that. The original code was so obviously
written in ASM and was just barely decompilable by exclusively using
register pseudovariables and a bit of goto, but I was able to
abstract most of that away, not least thanks to a few helpful optimization
properties of Turbo C++… seriously, I can't stop marveling at this ancient
compiler. The end result is both readable, clear, and dare I say
portable?! To anyone interested in porting TH03,
take a look. How painful would it be to port that away from 16-bit
However, this push is also a typical example that the RE/PI priorities can
only control what I look at, and the outcome can actually differ
greatly. Even though the priorities were 65% RE and 35% PI, the progress
outcome was +0.13% RE and +1.35% PI. But hey, we've got one more push with
a focus on TH03 PI, so maybe that one will include more RE than
PI, and then everything will end up just as ordered?
Here we go, new C code! …eh, it will still take a bit to really get
decompilation going at the speeds I was hoping for. Especially with the
sheer amount of stuff that is set in the first few significant
functions we actually can decompile, which now all has to be
correctly declared in the C world. Turns out I spent the last 2 years
screwing up the case of exported functions, and even some of their names,
so that it didn't actually reflect their calling convention… yup. That's
just the stuff you tend to forget while it doesn't matter.
To make up for that, I decided to research whether we can make use of some
C++ features to improve code readability after all. Previously, it seemed
that TH01 was the only game that included any C++ code, whereas TH02 and
later seemed to be 100% C and ASM. However, during the development of the
soon to be released new build system, I noticed that even this old
compiler from the mid-90's, infamous for prioritizing compile speeds over
all but the most trivial optimizations, was capable of quite surprising
levels of automatic inlining with class methods…
…leading the research to culminate in the mindblow that is
9d121c7 – yes, we can use C++ class methods
and operator overloading to make the code more readable, while still
generating the same code than if we had just used C and preprocessor
Looks like there's now the potential for a few pull requests from outside
devs that apply C++ features to improve the legibility of previously
decompiled and terribly macro-ridden code. So, if anyone wants to help
without spending money…
Turns out I had only been about half done with the drawing routines. The rest was all related to redrawing the scrolling stage backgrounds after other sprites were drawn on top. Since the PC-98 does have hardware-accelerated scrolling, but no hardware-accelerated sprites, everything that draws animated sprites into a scrolling VRAM must then also make sure that the background tiles covered by the sprite are redrawn in the next frame, which required a bit of ZUN code. And that are the functions that have been in the way of the expected rapid reverse-engineering progress that uth05win was supposed to bring. So, looks like everything's going to go really fast now?
… yeah, no, we won't get very far without figuring out these drawing routines.
Which process data that comes from the .STD files.
Which has various arrays related to the background… including one to specify the scrolling speed. And wait, setting that to 0 actually is what starts a boss battle?
So, have a TH05 Boss Rush patch: 2018-12-26-TH05BossRush.zip
Theoretically, this should have also worked for TH04, but for some reason,
the Stage 3 boss gets stuck on the first phase if we do this?