Yup, there still are features that can be fully covered in a single push
and don't lead to sprawling blog posts. The giant
STAGE number and
HARRY UP messages, as well as the
flashing transparent 東方★靈異伝 at the beginning of each scene are drawn
by retrieving the glyphs for each letter from font ROM, and then "blitting"
them to text RAM by placing a colored fullwidth 16×16 square at every pixel
that is set in the font bitmap.
And 📝 once again, ZUN's code there matches
the mediocre example code for the related hardware interrupt from the
PC-9801 Programmers' Bible. It's not 100% copied this time, but
definitely inspired by the code on page 121. Therefore, we can conclude
that these letters are probably only displayed as these 16× scaled glyphs
because that book had code on how to achieve this effect.
ZUN "improved" on the example code by implementing a write-only cursor over
the entire text RAM that fills every 16×16 cell with a differently colored
space character, fully clearing the text RAM as a side effect. For once, he
even removed some redundancy here by using helper functions! It's all still
far from good-code though. For example, there's a
function for filling 5 rows worth of cells, which he uses for both the top
and bottom margin of these letters. But since the bottom margin starts at
the 22nd line, the code writes past the 25th line and into the second TRAM
page. Good that this page is not used by either the hardware or the game.
These cursor functions can actually write any fullwidth JIS code point to
text RAM… and seem to do that in a rather simplified way, because shouldn't
you set the most significant bit to indicate the right half of a fullwidth
character? That's what's written in the same book that ZUN copied all
functions out of, after all. 🤔 Researching this led me down quite the
rabbit hole, where I found an oddity in PC-98 text RAM rendering that no
single one of the widely-used PC-98 emulators gets completely right. I'm
almost done with the 2-push research into this issue, which will
include fixes for DOSBox-X and Neko Project II. The only thing I'm missing
to get these fully accurate is a screenshot of the output created by this binary, on any PC-98 model made by EPSON:
That's the reason why this push was rather delayed. Thanks in advance to
anyone who'd like to help with this!
In maybe more disappointing news: Sariel is going to be delayed for a while
longer. 😕 The player- and HUD-related functions, which previously delayed
further progress there, turned out to call a lot of not yet RE'd functions
themselves. Seems as if we're doing most of the
card-flipping code second, after all? Next up: Point and
bomb items, which at least are a significant step in terms of position
Didn't quite get to cover background rendering for TH05's Stage 1-5
bosses in this one, as I had to reverse-engineer two more fundamental parts
involved in boss background rendering before.
First, we got the those blocky transitions from stage tiles to bomb and
boss backgrounds, loaded from BB*.BB and ST*.BB,
respectively. These files store up to 8 frames of animation, with every bit
corresponding to a 16×16 tile on the playfield. With 384×368 pixels to be
covered, that would require 69 bytes per frame. But since that's a very odd
number to work with in micro-optimized ASM, ZUN instead stores 512×512
pixels worth of bits, ending up with a frame size of 128 bytes, and a
per-frame waste of 59 bytes. At least it was
possible to decompile the core blitting function as __fastcall
But wait, TH05 comes with, and loads, a bomb .BB file for every character,
not just for the Reimu and Yuuka bomb transitions you see in-game… 🤔
Restoring those unused stage tile → bomb image transition
animations for Mima and Marisa isn't that trivial without having decompiled
their actual bomb animation functions before, so stay tuned!
Interestingly though, the code leaves out what would look like the most
obvious optimization: All stage tiles are unconditionally redrawn
each frame before they're erased again with the 16×16 blocks, no matter if
they weren't covered by such a block in the previous frame, or are
going to be covered by such a block in this frame. The same is true
for the static bomb and boss background images, where ZUN simply didn't
write a .CDG blitting function that takes the dirty tile array into
account. If VRAM writes on PC-98 really were as slow as the games'
README.TXT files claim them to be, shouldn't all the
optimization work have gone towards minimizing them?
Oh well, it's not like I have any idea what I'm talking about here. I'd
better stop talking about anything relating to VRAM performance on PC-98…
Second, it finally was time to solve the long-standing confusion about all
those callbacks that are supposed to render the playfield background. Given
the aforementioned static bomb background images, ZUN chose to make this
needlessly complicated. And so, we have two callback function
pointers: One during bomb animations, one outside of bomb
animations, and each boss update function is responsible for keeping the
former in sync with the latter.
Other than that, this was one of the smoothest pushes we've had in a while;
the hardest parts of boss background rendering all were part of
📝 the last push. Once you figured out that
ZUN does indeed dynamically change hardware color #0 based on the current
boss phase, the remaining one function for Shinki, and all of EX-Alice's
background rendering becomes very straightforward and understandable.
Meanwhile, -Tom- told me about his plans to publicly
release 📝 his TH05 scripting toolkit once
TH05's MAIN.EXE would hit around 50% RE! That pretty much
defines what the next bunch of generic TH05 pushes will go towards:
bullets, shared boss code, and one
full, concrete boss script to demonstrate how it's all combined. Next up,
therefore: TH04's bullet firing code…? Yes, TH04's. I want to see what I'm
doing before I tackle the undecompilable mess that is TH05's bullet firing
code, and you all probably want readable code for that feature as
well. Turns out it's also the perfect place for Blue Bolt's
Alright, onto Konngara! Let's quickly move the escape sequences used later
in the battle to C land, and then we can immediately decompile the loading
and entrance animation function together with its filenames. Might as well
reverse-engineer those escape sequences while I'm at it, though – even if
they aren't implemented in DOSBox-X, they're well documented in all those
Japanese PDFs, so this should be no big deal…
…wait, ESC )3 switches to "graph mode"? As opposed to the
default "kanji mode", which can be re-entered via ESC )0?
Let's look up graph mode in the PC-9801 Programmers' Bible then…
> Kanji cannot be handled in this mode.
…and that's apparently all it has to say. Why have it then, on a platform
whose main selling point is a kanji ROM, and where Shift-JIS (and, well,
7-bit ASCII) are the only native encodings? No support for graph mode in
DOSBox-X either… yeah, let's take a deep dive into NEC's
IO.SYS, and get to the bottom of this.
And yes, graph mode pretty much just disables Shift-JIS decoding for
characters written via INT 29h, the lowest-level way of "just
printing a char" on DOS, which every printf()
will ultimately end up calling. Turns out there is a use for it though,
which we can spot by looking at the 8×16 half-width section of font ROM:
The half-width glyphs marked in red
correspond to the byte ranges from 0x80-0x9F and 0xE0-0xFF… which Shift-JIS
defines as lead bytes for two-byte, full-width characters. But if we turn
off Shift-JIS decoding…
(Yes, that g in the function row is how NEC DOS
indicates that graph mode is active. Try it yourself by pressing
Jackpot, we get those half-width characters when printing their
corresponding bytes. I've
re-implemented all my findings into DOSBox-X, which will include graph
mode in the upcoming 0.83.14 release. If P0140 looks a bit empty as a
result, that's why – most of the immediate feature work went into
DOSBox-X, not into ReC98. That's the beauty of "anything" pushes.
So, after switching to graph mode, TH01 does… one of the slowest possible
memset()s over all of text RAM – one printf(" ")
call for every single one of its 80×25 half-width cells – before switching
back to kanji mode. What a waste of RE time…? Oh well, at least we've now
got plenty of proof that these weird escape sequences actually do
nothing of interest.
As for the Konngara code itself… well, it's script-like code, what can you
say. Maybe minimally sloppy in some places, but ultimately harmless.
One small thing that might not be widely known though: The large,
blue-green Siddhaṃ seed syllables are supposed to show up immediately, with
no delay between them? Good to know. Clocking your emulator too low tends
to roll them down from the top of the screen, and will certainly add a
noticeable delay between the four individual images.
… Wait, but this means that ZUN could have intended this "effect".
Why else would he not only put those syllables into four individual images
(and therefore add at least the latency of disk I/O between them), but also
show them on the foreground VRAM page, rather than on the "back buffer"?
Meanwhile, in 📝 another instance of "maybe
having gone too far in a few places":
Expressing distances on the playfield as fractions of its width
and height, just to avoid absolute numbers? Raw numbers are bad because
they're in screen space in this game. But we've already been throwing
PLAYFIELD_ constants into the mix as a way of explicitly
communicating screen space, and keeping raw number literals for the actual
playfield coordinates is looking increasingly sloppy… I don't know,
fractions really seemed like the most sensible thing to do with what we're
given here. 😐
So, 2 pushes in, and we've got the loading code, the entrance animation,
facial expression rendering, and the first one out of Konngara's 12
danmaku patterns. Might not sound like much, but since that first pattern
blue-green diamond sprites and therefore is one of the more complicated
ones, it all amounts to roughly 21.6% of Konngara's code. That's 7 more
pushes to get Konngara done, then? Next up though: Two pushes of website
50% hype! 🎉 But as usual for TH01, even that final set of functions
shared between all bosses had to consume two pushes rather than one…
First up, in the ongoing series "Things that TH01 draws to the PC-98
graphics layer that really should have been drawn to the text layer
instead": The boss HP bar. Oh well, using the graphics layer at least made
it possible to have this half-red, half-white pattern
for the middle section.
This one pattern is drawn by making surprisingly good use of the GRCG. So
far, we've only seen it used for fast monochrome drawing:
// Setting up fast drawing using color #9 (1001 in binary)
outportb(0x7E, 0xFF); // Plane 0: (B): (********)
outportb(0x7E, 0x00); // Plane 1: (R): ( )
outportb(0x7E, 0x00); // Plane 2: (G): ( )
outportb(0x7E, 0xFF); // Plane 3: (E): (********)
// Write a checkerboard pattern (* * * * ) in color #9 to the top-left corner,
// with transparent blanks. Requires only 1 VRAM write to a single bitplane:
// The GRCG automatically writes to the correct bitplanes, as specified above
*(uint8_t *)(MK_FP(0xA800, 0)) = 0xAA;
But since this is actually an 8-pixel tile register, we can set any
8-pixel pattern for any bitplane. This way, we can get different colors
for every one of the 8 pixels, with still just a single VRAM write of the
alpha mask to a single bitplane:
And I thought TH01 only suffered the drawbacks of PC-98 hardware, making
so little use of its actual features that it's perhaps not fair to even
call it "a PC-98 game"… Still, I'd say that "bad PC-98 port of an idea"
describes it best.
However, after that tiny flash of brilliance, the surrounding HP rendering
code goes right back to being the typical sort of confusing TH01 jank.
There's only a single function for the three distinct jobs of
incrementing HP during the boss entrance animation,
decrementing HP if hit by the Orb, and
redrawing the entire bar, because it's still all in VRAM, and Sariel
wants different backgrounds,
with magic numbers to select between all of these.
VRAM of course also means that the backgrounds behind the individual hit
points have to be stored, so that they can be unblitted later as the boss
is losing HP. That's no big deal though, right? Just allocate some memory,
copy what's initially in VRAM, then blit it back later using your
foundational set of blitting funct– oh, wait, TH01 doesn't have this sort
of thing, right The closest thing,
📝 once again, are the .PTN functions. And
so, the game ends up handling these 8×16 background sprites with 16×16
wrappers around functions for 32×32 sprites.
That's quite the recipe for confusion, especially since ZUN
preferred copy-pasting the necessary ridiculous arithmetic expressions for
calculating positions, .PTN sprite IDs, and the ID of the 16×16 quarter
inside the 32×32 sprite, instead of just writing simple helper functions.
He did manage to make the result mostly bug-free this time
around, though! There's one minor hit point discoloration bug if the
red-white or white sections start at an odd number of hit points, but
that's never the case for any of the original 7 bosses.
The remaining sloppiness is ultimately inconsequential as well: The game
always backs up twice the number of hit point backgrounds, and thus
uses twice the amount of memory actually required. Also, this
self-restriction of only unblitting 16×16 pixels at a time requires any
remaining odd hit point at the last position to, of course, be rendered
After stumbling over the weakest imaginable random number
generator, we finally arrive at the shared boss↔orb collision
handling function, the final blocker among the final blockers. This
function takes a whopping 12 parameters, 3 of them being references to
int values, some of which are duplicated for every one of the
7 bosses, with no generic boss struct anywhere.
📝 Previously, I speculated that YuugenMagan might have been the first boss to be programmed for TH01.
With all these variables though, there is some new evidence that SinGyoku
might have been the first one after all: It's the only boss to use its own
HP and phase frame variables, with the other bosses sharing the same two
While this function only handles the response to a boss↔orb
collision, it still does way too much to describe it briefly. Took me
quite a while to frame it in terms of invincibility (which is the
main impact of all of this that can be observed in gameplay code). That
made at least some sort of sense, considering the other usages of
the variables passed as references to that function. Turns out that
YuugenMagan, Kikuri, and Elis abuse what's meant to be the "invincibility
frame" variable as a frame counter for some of their animations 🙄
Oh well, the game at least doesn't call the collision handling function
during those, so "invincibility frame" is technically still a
correct variable name there.
And that's it! We're finally ready to start with Konngara, in 2021. I've
been waiting quite a while for this, as all this high-level boss code is
very likely to speed up TH01 progress quite a bit. Next up though: Closing
out 2020 with more of the technical debt in the other games.
Alright, back to continuing the master.hpp transition started
in P0124, and repaying technical debt. The last blog post already
announced some ridiculous decompilations… and in fact, not a single
one of the functions in these two pushes was decompilable into
idiomatic C/C++ code.
As usual, that didn't keep me from trying though. The TH04 and TH05
version of the infamous 16-pixel-aligned, EGC-accelerated rectangle
blitting function from page 1 to page 0 was fairly average as far as
unreasonable decompilations are concerned.
The big blocker in TH03's MAIN.EXE, however, turned out to be
the .MRS functions, used to render the gauge attack portraits and bomb
backgrounds. The blitting code there uses the additional FS and GS segment
registers provided by the Intel 386… which
are not supported by Turbo C++'s inline assembler, and
can't be turned into pointers, due to a compiler bug in Turbo C++ that
generates wrong segment prefix opcodes for the _FS and
Apparently I'm the first one to even try doing that with this compiler? I
haven't found any other mention of this bug…
Compiling via assembly (#pragma inline) would work around
this bug and generate the correct instructions. But that would incur yet
another dependency on a 16-bit TASM, for something honestly quite
What we can always do, however, is using __emit__() to simply
output x86 opcodes anywhere in a function. Unlike spelled-out inline
assembly, that can even be used in helper functions that are supposed to
inline… which does in fact allow us to fully abstract away this compiler
bug. Regular if() comparisons with pseudo-registers
wouldn't inline, but "converting" them into C++ template function
specializations does. All that's left is some C preprocessor abuse
to turn the pseudo-registers into types, and then we do retain a
normal-looking poke() call in the blitting functions in the
Yeah… the result is
I may have gone too far in a few places…
One might certainly argue that all these ridiculous decompilations
actually hurt the preservation angle of this project. "Clearly, ZUN
couldn't have possibly written such unreasonable C++ code.
So why pretend he did, and not just keep it all in its more natural ASM
form?" Well, there are several reasons:
Future port authors will merely have to translate all the
pseudo-registers and inline assembly to C++. For the former, this is
typically as easy as replacing them with newly declared local variables. No
need to bother with function prolog and epilog code, calling conventions, or
the build system.
No duplication of constants and structures in ASM land.
As a more expressive language, C++ can document the code much better.
Meticulous documentation seems to have become the main attraction of ReC98
these days – I've seen it appreciated quite a number of times, and the
continued financial support of all the backers speaks volumes. Mods, on the
other hand, are still a rather rare sight.
Having as few .ASM files in the source tree as possible looks better to
casual visitors who just look at GitHub's repo language breakdown. This way,
ReC98 will also turn from an "Assembly project" to its rightful state
of "C++ project" much sooner.
And finally, it's not like the ASM versions are
gone – they're still part of the Git history.
Unfortunately, these pushes also demonstrated a second disadvantage in
trying to decompile everything possible: Since Turbo C++ lacks TASM's
fine-grained ability to enforce code alignment on certain multiples of
bytes, it might actually be unfeasible to link in a C-compiled object file
at its intended original position in some of the .EXE files it's used in.
Which… you're only going to notice once you encounter such a case. Due to
the slightly jumbled order of functions in the
📝 second, shared code segment, that might
be long after you decompiled and successfully linked in the function
And then you'll have to throw away that decompilation after all 😕 Oh
well. In this specific case (the lookup table generator for horizontally
flipping images), that decompilation was a mess anyway, and probably
helped nobody. I could have added a dummy .OBJ that does nothing but
enforce the needed 2-byte alignment before the function if I
really insisted on keeping the C version, but it really wasn't
Now that I've also described yet another meta-issue, maybe there'll
really be nothing to say about the next technical debt pushes?
Next up though: Back to actual progress
again, with TH01. Which maybe even ends up pushing that game over the 50%
Turns out that TH04's player selection menu is exactly three times as
complicated as TH05's. Two screens for character and shot type rather than
one, and a way more intricate implementation for saving and restoring the
background behind the raised top and left edges of a character picture
when moving the cursor between Reimu and Marisa. TH04 decides to backup
precisely only the two 256×8 (top) and 8×244 (left) strips behind the
edges, indicated in red in the picture
These take up just 4 KB of heap memory… but require custom blitting
functions, and expanding this explicitly hardcoded approach to TH05's 4
characters would have been pretty annoying. So, rather than, uh, not
explicitly hardcoding it all, ZUN decided to just be lazy with the backup
area in TH05, saving the entire 640×400 screen, and thus spending 128 KB
of heap memory on this rather simple selection shadow effect.
So, this really wasn't something to quickly get done during the first half
of a push, even after already having done TH05's equivalent of this menu.
But since life is very busy right now, I also used the occasion to start
addressing another code organization annoyance: master.lib's single master.h header file.
Now that ReC98 is trying to develop (or at least mimic) a more
type-safe C++ foundation to model the PC-98 hardware, a pure C header
(with counter-productive C++ extensions) is becoming increasingly
unidiomatic. By moving some of the original assumptions about function
parameters into the type system, we can also reduce the reliance on its
Japanese-only documentation without having to translate it
It's quite bloated, with at least 2800 lines of code that
currently are #included into the vast majority of files, not
counting master.h's recursively included C standard library
headers. PC-98 Touhou only makes direct use of a rather small fraction of
And finally, all the DOS/V compatibility definitions are especially
useless in the context of ReC98. As I've noted
📝 time and
📝 time again, porting PC-98 Touhou to
IBM-compatible DOS won't be easy, and MASTER_DOSV won't be
helping much. Therefore, my upstream version of ReC98 will never include
all of master.lib. There's no point in lengthening compile times for
everyone by default, and those will be getting quite noticeable
after moving to a full 16-bit build process.
(Actually, what retro system ports should rather be doing: Get rid
of master.lib's original ASM code, replace it with
C++, and then simply convert the optimized assembly output of modern
compilers to your ISA of choice. Improving the landscape of such
assembly or object file converters would benefit everyone!)
So, time to start a new master.hpp header that would contain
just the declarations from master.h that PC-98 Touhou
actually needs, plus some semantic (yes, semantic) sugar. Comparing just
the old master.h to just the new master.hpp
after roughly 60% of the transition has been completed, we get median
build times of 319 ms for master.h, and 144 ms for
master.hpp on my (admittedly rather slow) DOSBox setup.
As of this push, ReC98 consists of 107 translation units that have to be
compiled with Turbo C++ 4.0J. Fully rebuilding all of these currently
takes roughly 37.5 seconds in DOSBox. After the transition to
master.hpp is done, we could therefore shave some 10 to 15
seconds off this time, simply by switching header files. And that's just
the beginning, as this will also pave the way for further
#include optimizations. Life in this codebase will be great!
Unfortunately, there wasn't enough time to repay some of the actual
technical debt I was looking forward to, after all of this. Oh well, at
least we now also have nice identifiers for the three different boldface
options that are used when rendering text to VRAM, after procrastinating
that issue for almost 11 months. Next up, assuming the existing
subscriptions: More ridiculous decompilations of things that definitely
weren't originally written in C, and a big blocker in TH03's
Back to TH01, and its boss sprite format… with a separate class for
storing animations that only differs minutely from the
📝 regular boss entity class I covered last time?
Decompiling this class was almost free, and the main reason why the first
of these pushes ended up looking pretty huge.
Next up were the remaining shape drawing functions from the code segment
that started with the .GRC functions. P0105 already started these with the
(surprisingly sanely implemented) 8×8 diamond, star, and… uh, snowflake
prominently seen in the Konngara, Elis, and Sariel fights, respectively.
Now, we've also got:
ellipse arcs with a customizable angle distance between the individual
dots – mostly just used for drawing full circles, though
line loops – which are only used for the rotating white squares around
Mima, meaning that the white star in the YuugenMagan fight got a completely
and the surprisingly weirdest one, drawing the red invincibility
The weirdness becomes obvious with just a single screenshot:
First, we've got the obvious issue of the sprites not being clipped at the
right edge of VRAM, with the rightmost pixels in each row of the sprite
extending to the beginning of the next row. Well, that's just what you get
if you insist on writing unique low-level blitting code for the majority
of the individual sprites in the game… 🤷
More importantly though, the sprite sheet looks like this:
So how do we even get these fully filled red diamonds?
Well, turns out that the sprites are never consistently unblitted during
their 8 frames of animation. There is a function that looks
like it unblits the sprite… except that it starts with by enabling the
GRCG and… reading from the first bitplane on the background page?
If this was the EGC, such a read would fill some internal registers with
the contents of all 4 bitplanes, which can then subsequently be blitted to
all 4 bitplanes of any VRAM page with a single memory write. But with the
GRCG in RMW mode, reads do nothing special, and simply copy the memory
contents of one bitplane to the read destination. Maybe ZUN thought
that setting the RMW color to red
also sets some internal 4-plane mask register to match that color?
Instead, the rather random pixels read from the first bitplane are then
used as a mask for a second blit of the same red sprite.
Effectively, this only really "unblits" the invincibility pixels that are
drawn on top of Reimu's sprite. Since Reimu is drawn first, the
invincibility sprites are overwritten anyway. But due to the palette color
layout of Reimu's sprite, its pixels end up fully masking away any
invincibility sprite pixels in that second blit, leaving VRAM untouched as
a result. Anywhere else though, this animation quickly turns into the
union of all animation frames.
Then again, if that 16-dot-aligned rectangular unblitting function is all
you know about the EGC, and you can't be bothered to write a perfect
unblitter for 8×8 sprites, it becomes obvious why you wouldn't want to use
Because Reimu would barely be visible under all that flicker. In
comparison, those fully filled diamonds actually look pretty good.
After all that, the remaining time wouldn't have been enough for the next
few essential classes, so I closed out the push with three more VRAM
Single-bitplane pixel inversion inside a 32×32 square – the main effect
behind the discoloration seen in the bomb animation, as well as the
exploding squares at the end of Kikuri's and Sariel's entrance
EGC-accelerated VRAM row copies – the second half of smooth and fully
hardware-accelerated scrolling for backgrounds that are twice the size of
And finally, the VRAM page content transition function using meshed 8×8
squares, used for the blocky transition to Sariel's first and second phases.
Which is quite ridiculous in just how needlessly bloated it is. I'm positive
that this sort of thing could have also been accelerated using the PC-98's
EGC… although simply writing better C would have already gone a long way.
The function also comes with three unused mesh patterns.
And with that, ReC98, as a whole, is not only ⅓ done, but I've also fully
caught up with the feature backlog for the first time in the history of
this crowdfunding! Time to go into maintenance mode then, while we wait
for the next pushes to be funded. Got a huge backlog of tiny maintenance
issues to address at a leisurely pace, and of course there's also the
📝 16-bit build system waiting to be
So, let's finally look at some TH01 gameplay structures! The obvious
choices here are player shots and pellets, which are conveniently located
in the last code segment. Covering these would therefore also help in
transferring some first bits of data in REIIDEN.EXE from ASM
land to C land. (Splitting the data segment would still be quite
annoying.) Player shots are immediately at the beginning…
…but wait, these are drawn as transparent sprites loaded from .PTN files.
Guess we first have to spend a push on
📝 Part 2 of this format.
Hm, 4 functions for alpha-masked blitting and unblitting of both 16×16 and
32×32 .PTN sprites that align the X coordinate to a multiple of 8
(remember, the PC-98 uses a
VRAM memory layout, where 8 pixels correspond to a byte), but only one
function that supports unaligned blitting to any X coordinate, and only
for 16×16 sprites? Which is only called twice? And doesn't come with a
corresponding unblitting function?
Yeah, "unblitting". TH01 isn't
and uses the PC-98's second VRAM page exclusively to store a stage's
background and static sprites. Since the PC-98 has no hardware sprites,
all you can do is write pixels into VRAM, and any animated sprite needs to
be manually removed from VRAM at the beginning of each frame. Not using
double-buffering theoretically allows TH01 to simply copy back all 128 KB
of VRAM once per frame to do this. But that
would be pretty wasteful, so TH01 just looks at all animated sprites, and
selectively copies only their occupied pixels from the second to the first
Alright, player shot class methods… oh, wait, the collision functions
directly act on the Yin-Yang Orb, so we first have to spend a push on
that one. And that's where the impression we got from the .PTN
functions is confirmed: The orb is, in fact, only ever displayed at
byte-aligned X coordinates, divisible by 8. It's only thanks to the
constant spinning that its movement appears at least somewhat
This is purely a rendering issue; internally, its position is
tracked at pixel precision. Sadly, smooth orb rendering at any unaligned X
coordinate wouldn't be that trivial of a mod, because well, the
necessary functions for unaligned blitting and unblitting of 32×32 sprites
don't exist in TH01's code. Then again, there's so much potential for
optimization in this code, so it might be very possible to squeeze those
additional two functions into the same C++ translation unit, even without
More importantly though, this was the right time to decompile the core
functions controlling the orb physics – probably the highlight in these
three pushes for most people.
Well, "physics". The X velocity is restricted to the 5 discrete states of
-8, -4, 0, 4, and 8, and gravity is applied by simply adding 1 to the Y
velocity every 5 frames No wonder that this can
easily lead to situations in which the orb infinitely bounces from the
At least fangame authors now have
reference of how ZUN did it originally, because really, this bad
approximation of physics had to have been written that way on purpose. But
hey, it uses 64-bit floating-point variables!
…sometimes at least, and quite randomly. This was also where I had to
learn about Turbo C++'s floating-point code generation, and how rigorously
it defines the order of instructions when mixing double and
float variables in arithmetic or conditional expressions.
This meant that I could only get ZUN's original instruction order by using
literal constants instead of variables, which is impossible right now
without somehow splitting the data segment. In the end, I had to resort to
spelling out ⅔ of one function, and one conditional branch of another, in
inline ASM. 😕 If ZUN had just written 16.0 instead of
16.0f there, I would have saved quite some hours of my life
trying to decompile this correctly…
To sort of make up for the slowdown in progress, here's the TH01 orb
physics debug mod I made to properly understand them:
To use it, simply replace REIIDEN.EXE, and run the game
in debug mode, via game d on the DOS prompt.
Its code might also serve as an example of how to achieve this sort of
thing without position independence.
Alright, now it's time for player shots though. Yeah, sure, they
don't move horizontally, so it's not too bad that those are also
always rendered at byte-aligned positions. But, uh… why does this code
only use the 16×16 alpha-masked unblitting function for decaying shots,
and just sloppily unblits an entire 16×16 square everywhere else?
The worst part though: Unblitting, moving, and rendering player shots
is done in a single function, in that order. And that's exactly where
TH01's sprite flickering comes from. Since different types of sprites are
free to overlap each other, you'd have to first unblit all types, then
move all types, and then render all types, as done in later
PC-98 Touhou games. If you do these three steps per-type instead, you
will unblit sprites of other types that have been rendered before… and
therefore end up with flicker.
Oh, and finally, ZUN also added an additional sloppy 16×16 square unblit
call if a shot collides with a pellet or a boss, for some
guaranteed flicker. Sigh.
And that's ⅓ of all ZUN code in TH01 decompiled! Next up: Pellets!
The original ZUN code hasn't been completely decompiled yet. The final
high-level parts of both the main menu and the cutscenes are still ASM,
which might make modding a bit inconvenient right now.
It's not that much more code though, and could quickly be covered in a few
pushes if requested. Due to the plentiful monthly subscriptions, the shop
will stay closed for regular orders until the end of June, but backers
with outstanding contributions could request that now if they want
to – simply drop me a mail. Otherwise, the "generic TH01 RE" money will
continue to go towards the main game. That way, we'll have more substance
to show once we do decide to decompile the rest of
OP.EXE and FUUIN.EXE, and likely get some press
coverage as a result.
Then again, we've been building up to this point over the last few pushes,
and it only really needed a quick look over the remaining false positives.
The majority of the time therefore went towards more PI in
REIIDEN.EXE, where the bitplane pointers for .BOS files yielded
some quite big gains. Couldn't really find any obvious reason why ZUN used
two slighly different variations on loading and blitting those files,
As the final function in this rather random push, we got TH01's
hardware-powered scrolling function, used for screen shaking effects and
the scrolling backgrounds at the start of the Final Boss stages. And while
I tried to document all these I/O writes… it turned out that ZUN actually
copied the entire function straight from the PC-9801 Programmers'
Bible, with no changes. It's the
setgsta() example function on page 150. Which is terribly
suboptimal and bloated – all those integer divisions are really
not how you'd write such code for a 16-bit compiler from the 90's…
And that gives us 60% PI overall, and 50% PI over all of TH01! Next up:
More structures… and classes, even?
Last part of TH01's main graphics function segment, and we've got even
more code that alternates between being boring and being slightly weird.
But at least, "boring" also meant "consistent" for once. And
so progress continued to be as fast as expected from the last TH01 pushes,
yielding 3.3% in TH01 RE%, and 1% in overall RE%, within a single day.
There even was enough time to decompile another full code segment, which
bundles all the hardware initialization and cleanup calls into single
functions to be run when starting and exiting the game. Which might be
interesting for at least one person, I guess
But seriously, trying to access page 2 on a system with only page 0 and 1?
Had to get out my real PC-98 to double-check that I wasn't missing
anything here, since every emulator only looks at the bottom bit of the
page number. But real hardware seems to do the same, and there really is
nothing special to it semantically, being equivalent to page 0. 🤷
Next up in TH01, we'll have some file format code!
Now that's more like the speed I was expecting! After a few more
unused functions for palette fading and rectangle blitting, we've reached
the big line drawing functions. And the biggest one among them,
drawing a straight line at any angle between two points using
Bresenham's algorithm, actually happens to be the single longest
function present in more than one binary in all of PC-98 Touhou, and #23
on the list of individual longest functions.
And it technically has a ZUN bug! If you pass a point outside the
(0, 0) - (639, 399) screen range, the function will calculate a new point
at the edge of the screen, so that the resulting line will retain the
angle intended by the points given. Except that it does so by calculating
the line slope using an integer division rather than a floating-point one
Doesn't seem like it actually causes any weirdly
skewed lines to be drawn in-game, though; that case is only hit in the
Mima boss fight, which draws a few lines with a bottom coordinate of
400 rather than the maximum of 399. It might also cause the wrong
background pixels to be restored during parts of the YuugenMagan fight,
leading to flickering sprites, but seriously, that's an issue everywhere
you look in this game.
Together with the rendering-text-to-VRAM function we've mostly already
known from TH02, this pushed the total RE percentage well over 20%, and
almost doubled the TH01 RE percentage, all within three pushes. And
comparatively, it went really smoothly, to the point (ha) where I
even had enough time left to also include the single-point functions that
come next in that code segment. Since about half of the remaining
functions in OP.EXE are present in more than just itself,
I'll be able to at least keep up this speed until OP.EXE hits
the 70% RE mark. That is, as long as the backers' priorities continue to
be generic RE or "giving some love to TH01"… we don't have a precedent for
TH01's actual game code yet.
And that's all the TH01 progress funded for January! Next up, we actually
do have a focus on TH03's game and scoring mechanics… or at least
the foundation for that.
With no feedback to 📝 last week's blog post,
I assume you all are fine with how things are going? Alright then, another
one towards position independence, with the same approach as before…
Since -Tom- wanted to learn something about how the PC-98
EGC is used in TH04 and TH05, I took a look at master.lib's
egc_shift_*() functions. These simply do a hardware-accelerated
memmove() of any VRAM region, and are used for screen shaking
effects. Hover over the image below for the raw effect:
Then, I finally wanted to take a look at the bullet structures, but it
required way too much reverse-engineering to even start within ¾ of
a position independence push. Even with the help of uth05win –
bullet handling was changed quite a bit from TH04 to TH05.
What I ultimately settled on was more raw, "boring" PI work based around
an already known set of functions. For this one, I looked at vector
construction… and this time, that actually made the games a little
bit more position-independent, and wasn't just all about removing
false positives from the calculation. This was one of the few sets of
functions that would also apply to TH01, and it revealed just how
chaotically that game was coded. This one commit shows three ways how ZUN
stored regular 2D points in TH01:
"regularly", like in master.lib's Point structure (X
first, Y second)
reversed, (Y first and X second), then obviously with two distinct
variables declared next to each other
… yeah. But in more productive news, this did actually lay the
groundwork for TH04 and TH05 bullet structures. Which might even be coming
up within the next big, 5-push order from Touhou Patch Center? These are
the priorities I got from them, let's see how close I can get!
Turns out I had only been about half done with the drawing routines. The rest was all related to redrawing the scrolling stage backgrounds after other sprites were drawn on top. Since the PC-98 does have hardware-accelerated scrolling, but no hardware-accelerated sprites, everything that draws animated sprites into a scrolling VRAM must then also make sure that the background tiles covered by the sprite are redrawn in the next frame, which required a bit of ZUN code. And that are the functions that have been in the way of the expected rapid reverse-engineering progress that uth05win was supposed to bring. So, looks like everything's going to go really fast now?
… yeah, no, we won't get very far without figuring out these drawing routines.
Which process data that comes from the .STD files.
Which has various arrays related to the background… including one to specify the scrolling speed. And wait, setting that to 0 actually is what starts a boss battle?
So, have a TH05 Boss Rush patch: 2018-12-26-TH05BossRush.zip
Theoretically, this should have also worked for TH04, but for some reason,
the Stage 3 boss gets stuck on the first phase if we do this?
Let's start this stretch with a pretty simple entity type, the growing and shrinking circles shown during bomb animations and around bosses in TH04 and TH05. Which can be drawn in varying colors… wait, what's all this inlined and duplicated GRCG mode and color setting code? Let's move that out into macros too, it takes up too much space when grepping for constants, and will raise a "wait, what was that I/O port doing again" question for most people reading the code again after a few months.
🎉 With this push, we've also hit a milestone! Less than 200,000 unknown x86 instructions remain until we've completely reverse-engineered all of PC-98 Touhou.
> OK, let's do a quick ReC98 update before going back to thcrap, shouldn't take long
> Hm, all that input code is kind of in the way, would be nice to cover that first to ease comparisons with uth05win's source code
> What the hell, why does ZUN do this? Need to do more research
> OK, research done, wait, what are those other functions doing?
> Wha, everything about this is just ever so slightly awkward
Which ended up turning this one update into 2/10, 3/10, 4/10 and 5/10 of zorg's reverse-engineering commits. But at least we now got all shared input functions of TH02-TH05 covered and well understood.