Yet another small interruption before we get to Shuusou Gyoku, but only
because I've got a big announcement to make! Touhou Patch Center has just
commissioned the basic feature set that would allow PC-98 Touhou to be
translated into non-ASCII languages. 💸 And we're in fact doing it on PC-98,
and don't wait for the games to be ported to other systems first.
How is this going to work?
This project will start sometime after I've completed the current big
project of porting Shuusou
Gyoku to Linux, so probably in the last few months of 2023. Similar to
the previous MediaWiki update, this will bypass the ReC98 push and cap
model: Touhou Patch Center is going to guarantee a minimum budget out of
their Open Collective funds, which can be increased with further donations
from the community, and I'm going to send an invoice once I'm done. In
addition, I'm also going to keep in contact with all interested translators
and backers via a Discord room throughout the process for additional
technical quality control.
Since ReC98 hasn't really focused on text so far, the amount of moddable
text-related code would be fairly limited if I started working on it right
away. Currently, it only includes the following:
All of TH01
All of TH02's OP.EXE, i.e., the main menu and the Music
TH04's and TH05's endings
Therefore, I'll put any general and unconstrained reverse-engineering,
position independence, or anything contributions that come in during
the next few months towards covering everything that's still missing there.
I've scheduled the in-game dialog system of TH04 and TH05 for mid-August,
after the next two Shuusou Gyoku-related deliveries, and here's what we'd
still be missing afterward:
TH02's dialog system has seen no RE done yet and is completely different
from TH04's and TH05's, so we'd need another 2-3 pushes there. Technically,
we'd also need MAIN.EXE to be 100% position-independent for any
translation-related code modifications, but reaching that goal before I get
to work on translation support is probably unrealistic. If I tackle this
game last, I might figure out something to bypass that requirement.
The Music Rooms of TH03, TH04, and TH05 need roughly 2 pushes for their
combined RE and finalization. Most of their code is shared with TH02's Music
Room which I decompiled back in 2015.
TH02's MAINE.EXE needs to be fully RE'd since all of its
screens contain translatable text. Due to its highly copy-pasted nature, it
should be much cheaper than currently estimated on the main page. 3 pushes
should be enough there, though.
The main menu help strings in TH04's and TH05's OP.EXE are
very conveniently located, and could be RE'd within 0.25 pushes at
Similarly, TH04's and TH05's MAINE.EXE contains some not
yet RE'd text in their verdict screens. 1 push there, since it's
unfortunately contained in the same function that also performs the highly
complex skill value calculation.
TH03's MAINL.EXE needs 100% PI to enable translations of
the Stage 8/9 cutscenes and endings, as well as a bit of RE for the
end-of-stage text. Let's go with 1.75 pushes there just to be safe, rounding
up the 0.25 above.
TH03's MAIN.EXE has the ＷＩＮＮＥＲ
ＢＯＮＵＳ popup as its only translatable text. Like TH02's
MAIN.EXE, the entire binary should be position-independent
before doing translations. Let's also do that last.
In total, that's the next 11 general pushes that will go towards ensuring
translatability of most of PC-98 Touhou, if possible. If you'd like your
contribution (or existing subscription) to go to gameplay
code instead, be sure to tell me!
What's the minimum guaranteed set of features?
The main feature will be a custom renderer for a subsetted, monospaced
Unicode bitmap font, and its integration into any translatable part of the
game. For the script files, this means UTF-8 support with Shift-JIS
fallback. For the glyphs, I'll use GNU Unifont by default, but we
could also use any other freely licensed bitmap font with 8×16 or 16×16
glyphs for alphabets of certain languages. Everything about this will be the
real deal: The system will potentially support all of Unicode without font
ROM hacks so that the translations will work on real hardware, and there
will be no shortcuts for just a few Latin characters. And if someone wants
to translate this game into a language with more complex
shaping rules, I'll make sure that they look pretty as well if there's
some budget left.
This will allow translation teams to build static translation patches into
any language by editing the original script files, and using
tools for any images. Modifications of hardcoded strings would still
require recompiling the binary, and each group would have to distribute and
advertise the result on their own.
Which languages are we getting?
As of 2023-07-28, the following translators and teams have expressed
Spanish, Latin American: Xziled, DarkeyeSide, Mr. Tremolo
How much better could it be?
The most important feature: We could finally move away from the concept
of translation patches, integrate all translations as part of the
ReC98 repo, and ship them directly as part of new ReC98 builds. Languages
could then be switched at runtime, through a new setting in the Option
And why stop there? How about binding a keyboard key to a new
language selection window that can be opened at any point during the
game, and even switches out any text that is currently shown on
I could finally translate a
canon Touhou game via a gettext-like dictionary
system. This would allow modded source text to override translations, and
even make it possible to translate mods as well.
Ideally, all translators I get to work with are highly motivated and
finish translating each game they start, so that we don't even have to think
stacking, but maybe we still should.
We could use proportional fonts instead of aligning every glyph to the
8×16 text RAM grid. Unlike the Windows Touhou games where proportional fonts
are crucial because adding more text space would desync replays, they are
not that important in PC-98 Touhou. With no replays to be desynced,
we can arbitrarily add new boxes without worrying about the font.
However, supporting proportional fonts would make it possible to
lift some of the text sprites into the custom font system, allowing
their glyphs to be shared more easily across languages:
On the topic of new text boxes: Automatic line and box breaks at word
boundaries would completely remove the need for in-game proofreading.
In-game TL notes… nah, probably not. Where would we even put them in the
original screen layout?
Due to the continued interest in TH01's Anniversary Edition, any code
modifications would be exclusive to a respective game's bugfixed
anniversary branch – i.e., any translated builds will be
bundled with a growing number of fixes for issues in the original games that
fall under my
current definition of bugs. This avoids a combinatorial explosion
of the number of branches, merges, and releases I'd have to do. For a small
amount of extra money though, I could merge them back to the ZUN
bug-preserving debloated branch. And for a lot of extra
money, I could reimplement everything on master while
preserving the original memory layout of ZUN's original binaries. This would
allow the translation-supporting binaries to be easily diffed against the
original ones, and retain compatibility with existing hacks or cheat tables.
The latter was already something that
📝 the Shuusou Gyoku community previously expected my recompiled builds to have.
I could top off the project with some smaller, more intricate
localizations that translators might request for certain languages. Most
notably, this category would include any localization of TH01's
STAGE #, and
HARRY UP popups that goes beyond just
fixing the Engrish.
The previous static English patches from 2014 introduced quite a few
fanfiction changes that have been interpreted as canon in the years since. I
could write a blog post to highlight these, and also compare the translation
as a whole with the more literal English translation we're likely to get
this time around.
Finally, if you all really want to, I could move all translatable
content to the Touhou Patch Center interface, which would truly turn that
site into the one central translation source for all canon Touhou games.
Automatic updates won't be feasible before porting away the games from PC-98
hardware, so the thpatch server would have to communicate with the ReC98
repo via a GitHub webhook. This will be rather expensive though, as I'd also
have to set up some kind of build/release CI for ReC98 first.
These features are mostly independent of each other, and it will be up to
Touhou Patch Center to pick a priority order. That's also where all of you
could come in and influence this order with your donations. So it's closer
to a traditional crowdfunding campaign with stretch goals, where the sky is
the limit, than it is to the usual ReC98 model. And while there can be no
fixed prices for any of the goals, you can be sure that anything you invest
will improve the quality of the final product.
From now on, this will be the only way of funding any translation-related
goals; I've removed the respective options from the ReC98 order form.
Looking forward to how many of these additional ideas I get to implement –
but, as always, please invest responsibly.
Well, well. My original plan was to ship the first step of Shuusou Gyoku
OpenGL support on the next day after this delivery. But unfortunately, the
complications just kept piling up, to a point where the required solutions
definitely blow the current budget for that goal. I'm currently sitting on
over 70 commits that would take at least 5 pushes to deliver as a meaningful
release, and all of that is just rearchitecting work, preparing the
game for a not too Windows-specific OpenGL backend in the first place. I
haven't even written a single line of OpenGL yet… 🥲
This shifts the intended Big Release Month™ to June after all. Now I know
that the next round of Shuusou Gyoku features should better start with the
SC-88Pro recordings, which are much more likely to get done within their
current budget. At least I've already completed the configuration versioning
system required for that goal, which leaves only the actual audio part.
So, TH04 position independence. Thanks to a bit of funding for stage
dialogue RE, non-ASCII translations will soon become viable, which finally
presents a reason to push TH04 to 100% position independence after
📝 TH05 had been there for almost 3 years. I
haven't heard back from Touhou Patch Center about how much they want to be
involved in funding this goal, if at all, but maybe other backers are
interested as well.
And sure, it would be entirely possible to implement non-ASCII translations
in a way that retains the layout of the original binaries and can be easily
compared at a binary level, in case we consider translations to be a
critical piece of infrastructure. This wouldn't even just be an exercise in
needless perfectionism, and we only have to look to Shuusou Gyoku to realize
why: Players expected
that my builds were compatible with existing SpoilerAL SSG files, which
was something I hadn't even considered the need for. I mean, the game is
open-source 📝 and I made it easy to build.
You can just fork the code, implement all the practice features you want in
a much more efficient way, and I'd probably even merge your code into my
But I get it – recompiling the game yields just yet another build that can't
be easily compared to the original release. A cheat table is much more
trustworthy in giving players the confidence that they're still practicing
the same original game. And given the current priorities of my backers,
it'll still take a while for me to implement proof by replay validation,
which will ultimately free every part of the community from depending on the
original builds of both Seihou and PC-98 Touhou.
However, such an implementation within the original binary layout would
significantly drive up the budget of non-ASCII translations, and I sure
don't want to constantly maintain this layout during development. So, let's
chase TH04 position independence like it's 2020, and quickly cover a larger
amount of PI-relevant structures and functions at a shallow level. The only
parts I decompiled for now contain calculations whose intent can't be
clearly communicated in ASM. Hitbox visualizations or other more in-depth
research would have to wait until I get to the proper decompilation of these
But even this shallow work left us with a large amount of TH04-exclusive
code that had its worst parts RE'd and could be decompiled fairly quickly.
If you want to see big TH04 finalization% gains, general TH04 progress would
be a very good investment.
The first push went to the often-mentioned stage-specific custom entities
that share a single statically allocated buffer. Back in 2020, I
📝 wrongly claimed that these were a TH05 innovation,
but the system actually originated in TH04. Both games use a 26-byte
structure, but TH04 only allocates a 32-element array rather than TH05's
64-element one. The conclusions from back then still apply, but I also kept
wondering why these games used a static array for these entities to begin
with. You know what they call an area of memory that you can cleanly
repurpose for things? That's right, a heap!
And absolutely no one would mind one additional heap allocation at the start
of a stage, next to the ones for all the sprites and portraits.
However, we are still running in Real Mode with segmented memory. Accessing
anything outside a common data segment involves modifying segment registers,
which has a nonzero CPU cycle cost, and Turbo C++ 4.0J is terrible at
optimizing away the respective instructions. Does this matter? Probably not,
but you don't take "risks" like these if you're in a permanent
The 4 📝 bits used in Marisa's Stage 4 boss
fight. Coincidentally also related to the rare Divide Error
crash in that fight.
Stage 4 Reimu's spinning orbs. Note how the game uses two different sets
of sprites just to have two different outline colors. This was probably
better than messing with the palette, which can easily cause unintended
effects if you only have 16 colors to work with. Heck, I have an entire blog post tag just to highlight
these cases. Capped to the full 32 entities.
The chasing cross bullets, seen in Phase 14 of the same Stage 6 Yuuka
fight. Featuring some smart sprite work, making use of point symmetry to
achieve a fluid animation in just 4 frames. This is
good-code in sprite form. Capped to 31 entities, because
the 32nd custom entity during this fight is defined to be…
The single purple pulsating and shrinking safety circle, seen in Phase 4 of
the same fight. The most interesting aspect here is actually still related
to the cross bullets, whose spawn function is wrongly limited to 32 entities
and could theoretically overwrite this circle. This
is strictly landmine territory though:
Yuuka never uses these bullets and the safety circle
She never spawns more than 24 cross bullets
All cross bullets are fast enough to have left the screen by the
time Yuuka restarts the corresponding subpattern
The cross bullets spawn at Yuuka's center position, and assign its
Q12.4 coordinates to structure fields that the safety circle interprets
as raw pixels. The game does try to render the circle afterward, but
since Yuuka's static position during this phase is nowhere near a valid
pixel coordinate, it is immediately clipped.
The flashing lines seen in Phase 5 of the Gengetsu fight,
telegraphing the slightly random bullet columns.
These structures only took 1 push to reverse-engineer rather than the 2 I
needed for their TH05 counterparts because they are much simpler in this
game. The "structure" for Gengetsu's lines literally uses just a single X
position, with the remaining 24 bytes being basically padding. The only
minor bug I found on this shallow level concerns Marisa's bits, which are
clipped at the right and bottom edges of the playfield 16 pixels earlier
than you would expect:
The remaining push went to a bunch of smaller structures and functions:
The structure for the up to 2 "thick" (a.k.a. "Master Spark") lasers. Much
saner than the
📝 madness of TH05's laser system while being
equally customizable in width and duration.
The structure for the various monochrome 16×16 shapes in the background of
the Stage 6 Yuuka fight, drawn on top of the checkerboard.
The rendering code for the three falling stars in the background of Stage 5.
The effect here is entirely palette-related: After blitting the stage tiles,
the 📝 1bpp star image is ORed
into only the 4th VRAM plane, which is equivalent to setting the
highest bit in the palette color index of every pixel within the star-shaped
region. This of course raises the question of how the stage would look like
if it was fully illuminated:
Most code that modifies a stage's tile map, and directly specifies tiles via
their top-left offset in VRAM.
Thanks to code alignment reasons, this forced a much longer detour into the
.STD format loader. Nothing all too noteworthy there since we're still
missing the enemy script and spawn structures before we can call .STD
"reverse-engineered", but maybe still helpful if you're looking for an
overview of the format. Also features a buffer overflow landmine if a .STD
file happens to contain more than 32 enemy scripts… you know, the usual
To top off the second push, we've got the vertically scrolling checkerboard
background during the Stage 6 Yuuka fight, made up of 32×32 squares. This
one deserves a special highlight just because of its needless complexity.
You'd think that even a performant implementation would be pretty simple:
Set the GRCG to TDW mode
Set the GRCG tile to one of the two square colors
Start with Y as the current scroll offset, and X
as some indicator of which color is currently shown at the start of each row
Iterate over all lines of the playfield, filling in all pixels that
should be displayed in the current color, skipping over the other ones
Count down Y for each line drawn
If Y reaches 0, reset it to 32 and flip X
At the bottom of the playfield, change the GRCG tile to the other color,
and repeat with the initial value of X flipped
The most important aspect of this algorithm is how it reduces GRCG state
changes to a minimum, avoiding the costly port I/O that we've identified
time and time again as one of the main bottlenecks in TH01. With just 2
state variables and 3 loops, the resulting code isn't that complex either. A
naive implementation that just drew the squares from top to bottom in a
single pass would barely be simpler, but much slower: By changing the GRCG
tile on every color, such an implementation would burn a low 5-digit number
of CPU cycles per frame for the 12×11.5-square checkerboard used in the
And indeed, ZUN retained all important aspects of this algorithm… but still
implemented it all in ASM, with a ridiculous layer of x86 segment arithmetic
on top? Which blows up the complexity to 4 state
variables, 5 nested loops, and a bunch of constants in unusual units. I'm
not sure what this code is supposed to optimize for, especially with that
rather questionable register allocation that nevertheless leaves one of the
general-purpose registers unused. Fortunately,
the function was still decompilable without too many code generation hacks,
and retains the 5 nested loops in all their goto-connected
glory. If you want to add a checkerboard to your next PC-98
demo, just stick to the algorithm I gave above.
(Using a single XOR for flipping the starting X offset between 32 and 64
pixels is pretty nice though, I have to give him that.)
This makes for a good occasion to talk about the third and final GRCG mode,
completing the series I started with my previous coverage of the
📝 RMW and
📝 TCR modes. The TDW (Tile Data Write) mode
is the simplest of the three and just writes the 8×1 GRCG tile into VRAM
as-is, without applying any alpha bitmask. This makes it perfect for
clearing rectangular areas of pixels – or even all of VRAM by doing a single
// Set up the GRCG in TDW mode.
// Fill the tile register with color #7 (0111 in binary).
outportb(0x7E, 0xFF); // Plane 0: (B): (********)
outportb(0x7E, 0xFF); // Plane 1: (R): (********)
outportb(0x7E, 0xFF); // Plane 2: (G): (********)
outportb(0x7E, 0x00); // Plane 3: (E): ( )
// Set the 32 pixels at the top-left corner of VRAM to the exact contents of
// the tile register, effectively repeating the tile 4 times. In TDW mode, the
// GRCG ignores the CPU-supplied operand, so we might as well just pass the
// contents of a register with the intended width. This eliminates useless load
// instructions in the compiled assembly, and even sort of signals to readers
// of this code that we do not care about the source value.
*reinterpret_cast<uint32_t far *>(MK_FP(0xA800, 0)) = _EAX;
// Fill the entirety of VRAM with the GRCG tile. A simple C one-liner that will
// probably compile into a single `REP STOS` instruction. Unfortunately, Turbo
// C++ 4.0J only ever generates the 16-bit `REP STOSW` here, even when using
// the `__memset__` intrinsic and when compiling in 386 mode. When targeting
// that CPU and above, you'd ideally want `REP STOSD` for twice the speed.
memset(MK_FP(0xA800, 0), _AL, ((640 / 8) * 400));
However, this might make you wonder why TDW mode is even necessary. If it's
functionally equivalent to RMW mode with a CPU-supplied bitmask made up
entirely of 1 bits (i.e., 0xFF, 0xFFFF, or
0xFFFFFFFF), what's the point? The difference lies in the
hardware implementation: If all you need to do is write tile data to
VRAM, you don't need the read and modify parts of RMW mode
which require additional processing time. The PC-9801 Programmers'
Bible claims a speedup of almost 2× when using TDW mode over equivalent
operations in RMW mode.
And that's the only performance claim I found, because none of these old
PC-98 hardware and programming books did any benchmarks. Then again, it's
not too interesting of a question to benchmark either, as the byte-aligned
nature of TDW blitting severely limits its use in a game engine anyway.
Sure, maybe it makes sense to temporarily switch from RMW to TDW mode
if you've identified a large rectangular and byte-aligned section within a
sprite that could be blitted without a bitmask? But the necessary
identification work likely nullifies the performance gained from TDW mode,
I'd say. In any case, that's pretty deep
micro-optimization territory. Just use TDW mode for the
few cases it's good at, and stick to RMW mode for the rest.
So is this all that can be said about the GRCG? Not quite, because there are
4 bits I haven't talked about yet…
And now we're just 5.37% away from 100% position independence for TH04! From
this point, another 2 pushes should be enough to reach this goal. It might
not look like we're that close based on the current estimate, but a
big chunk of the remaining numbers are false positives from the player shot
control functions. Since we've got a very special deadline to hit, I'm going
to cobble these two pushes together from the two current general
subscriptions and the rest of the backlog. But you can, of course, still
invest in this goal to allow the existing contributions to go to something
… Well, if the store was actually open. So I'd better
continue with a quick task to free up some capacity sooner rather than
later. Next up, therefore: Back to TH02, and its item and player systems.
Shouldn't take that long, I'm not expecting any surprises there. (Yeah, I
know, famous last words…)
Stripe is now
properly integrated into this website as an alternative to PayPal! Now, you
can also financially support the project if PayPal doesn't work for you, or
if you prefer using a
provider out of Stripe's greater variety. It's unfortunate that I had to
ship this integration while the store is still sold out, but the Shuusou
Gyoku OpenGL backend has turned out way too complicated to be finished next
to these two pushes within a month. It will take quite a while until the
store reopens and you all can start using Stripe, so I'll just link back to
this blog post when it happens.
Integrating Stripe wasn't the simplest task in the world either. At first,
the Checkout API
seems pretty friendly to developers: The entire payment flow is handled on
the backend, in the server language of your choice, and requires no frontend
backend API endpoint initiates the Stripe Checkout session, answers with a
redirect to Stripe, and Stripe then sends a redirect back to your server if
the customer completed the payment. Superficially, this server-based
approach seems much more GDPR-friendly than PayPal, because there are no
remote scripts to obtain consent for. In reality though, Stripe shares
much more potential personal data about your credit card or bank
account with a merchant, compared to PayPal's almost bare minimum of
It's also rather annoying how the backend has to persist the order form
information throughout the entire Checkout session, because it would
otherwise be lost if the server restarts while a customer is still busy
entering data into Stripe's Checkout form. Compare that to the PayPal
only makes the integration harder: If you trigger the initial payment
to improvise a bit to avoid the CORS error when redirecting away to a
But sure, it's all not too bad… for regular orders at least. With
subscriptions, however, things get much worse. Unlike PayPal, Stripe
kind of wants to stay out of the way of the payment process as much as
possible, and just be a wrapper around its supported payment methods. So if
customers aren't really meant to register with Stripe, how would they cancel
the… merchant? Which I quite dislike in principle, because why should
you have to trust me to actually cancel your subscription after you
requested it? It also means that I probably should add some sort of UI for
self-canceling a Stripe subscription, ideally without adding full-blown user
accounts. Not that this solves the underlying trust issue, but it's more
convenient than contacting me via email or, worse, going through your bank
somehow. Here is how my solution works:
When setting up a Stripe subscription, the server will generate a random
ID for authentication. This ID is then used as a salt for a hash
of the Stripe subscription ID, linking the two without storing the latter on
The thank you page, which is parameterized with the Stripe
Checkout session ID, will use that ID to retrieve the subscription
ID via an API call to Stripe, and display it together with the above
salt. This works indefinitely – contrary to what the expiry field in the
Checkout session object suggests, Stripe sessions are indeed stored
forever. After all, Stripe also displays this session information in a
merchant's transaction log with an excessive amount of detail. It might have
been better to add my own expiration system to these pages, but this had
been taking long enough already. For now, be aware that sharing the link to
a Stripe thank you page is equivalent to sharing your subscription
The salt is then used as the key for a subscription management page. To
cancel, you visit this page and enter the Stripe subscription ID to confirm.
The server then checks whether the salt and subscription ID pair belong to
each other, and sends the actual cancellation
request back to Stripe if they do.
I might have gone a bit overboard with the crypto there, but I liked the
idea of not storing any of the Stripe session IDs in the server database.
It's not like that makes the system more complex anyway, and it's nice to
have a separate confirmation step before canceling a subscription.
But even that wasn't everything I had to keep in mind here. Once you
switch from test to production mode for the final tests, you'll notice that
payment providers take their sweet time to process and activate new
subscriptions. The Checkout session object even informs you about that, by
including a payment status field. Which initially seems just like
another field that could indicate hacking attempts, but treating it as such
and rejecting any unpaid session can also reject perfectly valid
subscriptions. I don't want all this control… 🥲
Instead, all I can do in this case is to tell you about it. In my test, the
Stripe dashboard said that it might take days or even weeks for the initial
subscription transaction to be confirmed. In such a case, the respective
fraction of the cap will unfortunately need to remain red for that entire time.
And that was 1½ pushes just to replicate the basic functionality of a simple
PayPal integration with the simplest type of Stripe integration. On the
architectural site, all the necessary refactoring work made me finally
upgrade my frontend code to TypeScript at least, using the amazing esbuild to handle transpilation inside
the server binary. Let's see how long it will now take for me to upgrade to
With the new payment options, it makes sense to go for another slight price
increase, from up to per push.
The amount of taxes I have to pay on this income is slowly becoming
significant, and the store has been selling out almost immediately for the
last few months anyway. If demand remains at the current level or even
increases, I plan to gradually go up to by the end
of the year. 📝 As📝 usual,
I'm going to deliver existing orders in the backlog at the value they were
originally purchased at. Due to the way the cap has to be calculated, these
contributions now appear to have increased in value by a rather awkward
This left ½ of a push for some more work on the TH01 Anniversary Edition.
Unfortunately, this was too little time for the grand issue of removing
byte-aligned rendering of bigger sprites, which will need some additional
blitting performance research. Instead, I went for a bunch of smaller
ANNIV.EXE now launches ZUNSOFT.COM if
MDRV98 wasn't resident before. In hindsight, it's completely obvious
why this is the right thing to do: Either you start
ANNIV.EXE directly, in which case there's no resident
MDRV98 and you haven't seen the ZUN Soft logo, or you have
made a single-line edit to GAME.BAT and replaced
op with anniv, in which case MDRV98 is
resident and you have seen the logo. These are the two
reasonable cases to support out of the box. If you are doing
anything else, it shouldn't be that hard to adjust though?
You might be wondering why I didn't just include all code of
ZUNSOFT.COM inside ANNIV.EXE together with
the rest of the game. The reason: ZUNSOFT.COM has
almost nothing in common with regular TH01 code. While the rest of
TH01 uses the custom image formats and bad rendering code I
documented again and again during its RE process,
ZUNSOFT.COM fully relies on master.lib for everything
about the bouncing-ball logo animation. Its code is much closer to
TH02 in that respect, which suggests that ZUN did in fact write this
animation for TH02, and just included the binary in TH01 for
consistency when he first sold both games together at Comiket 52.
Unlike the 📝 various bad reasons for splitting the PC-98 Touhou games into three main executables,
it's still a good idea to split off animations that use a completely
different set of rendering and file format functions. Combined with
all the BFNT and shape rendering code, ZUNSOFT.COM
actually contains even more unique code than OP.EXE,
and only slightly less than FUUIN.EXE.
The optional AUTOEXEC.BAT is now correctly encoded in
Shift-JIS instead of accidentally being UTF-8, fixing the previous
mojibake in its final ECHO line.
The command-line option that just adds a stage selection without
other debug features (anniv s) now works reliably.
This one's quite interesting because it only ever worked
because of a ZUN bug. From a superficial look at the code, it
shouldn't: While the presence of an 's' branch proves
that ZUN had such a mode during development, he nevertheless forgot
to initialize the debug flag inside the resident structure within
this branch. This mode only ever worked because master.lib's
resdata_create() function doesn't clear the resident
structure after allocation. If anything on the system previously
happened to write something other than 0x00,
0x01, or 0x03 to the specific byte that
then gets repurposed as the debug mode flag, this lack of
initialization does in fact result in a distinct non-test and
non-debug stage selection mode.
This is what happens on a certain widely circulated .HDI copy of
TH01 that boots MS-DOS 3.30C. On this system, the memory that
master.lib will allocate to the TH01 resident structure was
previously used by DOS as stack for its kernel, which left the
future resident debug flag byte at address 9FF6:0012 at
a value of 0x12. This might be the entire reason why
game s is even widely documented to trigger a stage
selection to begin with – on the widely circulated TH04 .HDI that
boots MS-DOS 6.20, or on DOSBox-X, the s parameter
doesn't work because both DOS systems leave the resident debug flag
byte at 0x00. And since ANNIV.EXE pushes
MDRV98 into that area of conventional DOS RAM, anniv s
previously didn't work even on MS-DOS 3.30C.
Both bugs in the
📝 1×1 particle system during the Mima fight
have been fixed. These include the off-by-one error that killed off the
very first particle on the 80th
frame and left it in VRAM, and, just like every other entity type, a
replacement of ZUN's EGC unblitter with the new pixel-perfect and fast
one. Until I've rearchitected unblitting as a whole, the particles will
now merely rip barely visible 1×1 holes into the sprites they overlap.
The bomb value shown in the lowest line of the in-game
debug mode output is now right-aligned together with the rest of the
values. This ensures that the game always writes a consistent number
of characters to TRAM, regardless of the magnitude of the
bomb value, preventing the seemingly wrong
timer values that appeared in the original game
whenever the value of the bomb variable changed to a
lower number of digits:
Finally, I've streamlined VRAM page access changes, which allowed me to
consistently replace ZUN's expensive function call with the optimal two
inlined x86 instructions. Interestingly, this change alone removed
2 KiB from the binary size, which is almost all of the difference
between 📝 the P0234-1 release and this
one. Let's see how much longer we can make each new release of
ANNIV.EXE smaller than the previous one.
The final point, however, raised the question of what we're now going to do
📝 a certain issue in the 地獄/Jigoku Bad Ending.
ZUN's original expensive way of switching the accessed VRAM page was the
main reason behind the lag frames on slower PC-98 systems, and
search-replacing the respective function calls would immediately get us to
the optimized version shown in that blog post. But is this something we
actually want? If we wanted to retain the lag, we could surely preserve that
function just for this one instance… The discovery of this issue
predates the clear distinction between bloat, quirks, and bugs, so it makes
sense to first classify what this issue even is. The distinction comes all
down to observability, which I defined as changes to rendered frames
between explicitly defined frame boundaries. That alone would be enough to
categorize any cause behind lag frames as bloat, but it can't hurt to be
more explicit here.
Therefore, I now officially judge observability in terms of an infinitely
fast PC-98 that can instantly render everything between two explicitly
defined frames, and will never add additional lag frames. If we plan to port
the games to faster architectures that aren't bottlenecked by disappointing
blitter chips, this is the only reasonable assumption to make, in my
opinion: The minimum system requirements in the games' README files are
minimums, after all, not recommendations. Chasing the exact frame
drop behavior that ZUN must have experienced during the time he developed
these games can only be a guessing game at best, because how can we know
which PC-98 model ZUN actually developed the games on? There might even be
more than one model, especially when it comes to TH01 which had been in
development for at least two years before ZUN first sold it. It's also not
like any current PC-98 emulator even claims to emulate the specific timing
of any existing model, and I sure hope that nobody expects me to import a
bunch of bulky obsolete hardware just to count dropped frames.
That leaves the tearing, where it's much more obvious how it's a bug. On an
infinitely fast PC-98, the ドカーン
frame would never be visible, and thus falls into the same category as the
📝 two unused animations in the Sariel fight.
With only a single unconditional 2-frame delay inside the animation loop, it
becomes clear that ZUN intended both frames of the animation to be displayed
for 2 frames each:
Next up: Taking the oldest still undelivered push and working towards TH04
position independence in preparation for multilingual translations. The
Shuusou Gyoku OpenGL backend shouldn't take that much longer either,
so I should have lots of stuff coming up in May afterward.
And so, the year unfortunately ended with yet another slow month. During the
MediaWiki upgrade, I was slowly decompiling the TH05 Sara fight on the side,
but stumbled over one interesting but high-maintenance detail there that
would really enhance her blog post. TH02 would need a lot of attention for
the basic rendering calls as well…
…so let's end the year with Shuusou Gyoku instead, looking at its most
critical issue in particular. As if that were the easy option here…
The game does not run properly on modern Windows systems due to its usage of
the ancient DirectDraw APIs, with issues ranging from unbearable slowdown to
glitched colors to the game not even starting at all. Thankfully, Shuusou
Gyoku is not the only ancient Windows game affected by these issues, and
people have developed a variety of generic DirectDraw wrappers and patches
for playing such games on modern systems. Out of all these, DDrawCompat is one of the
simpler solutions for Shuusou Gyoku in particular: Just drop its
ddraw proxy DLL into the game directory, and the game will run
as it's supposed to.
So let's just bundle that DLL with all my future Shuusou Gyoku releases
then? That would have been the quick and dirty option, coming with
Linux users might be annoyed by the potential need to configure a native
DLL override for ddraw.dll. It's not too much of an issue as we
could simply rename the DLL and replace the import with the new name.
However, doing that reproducibly would already involve changes to either the
DDrawCompat or Shuusou Gyoku build process.
Win32 API hooking is another potential point of failure in general,
requiring continual maintenance for new Windows versions. This is not even a
hypothetical concern: DDrawCompat does rely on particularly volatile Win32
API details, to the point that the recent Windows 11 22H2 update completely
broke it, causing a hang at startup that required a workaround.
But sure, it's still just a single third-party component. Keeping it up to
date doesn't sound too bad by itself…
…if DDrawCompat weren't evolving way beyond what we need to keep Shuusou
Gyoku running. Being a typical DirectDraw wrapper, it has always aimed to
solve all sorts of issues in old DirectDraw games. However, the latest
version, 0.4.0, has gone above and beyond in this regard, adding lots of
configuration options with default settings that actually
break Shuusou Gyoku.
To get a glimpse of how this is likely to play out, we only have to look at
the more mature DxWnd
project. In its expert mode, DxWnd features three rows of tabs, each packed
with checkboxes that toggle individual hacks, and most of these are
related to something that Shuusou Gyoku could be affected by. Imagine
checking a precise permutation of a three-digit number of checkboxes just to
keep an old game running at full speed on modern systems…
Finally, aesthetic and bloat considerations. If
📝 C++ fstreams were already too embarrassing
with the ~100 KB of bloat they add to the binary, a 565 KiB DLL is
even worse. And that's the old version 0.3.2 – version 0.4.0 comes in
at 2.43 MiB.
Fortunately, I had the budget to dig a bit deeper and figure out what
exactly DDrawCompat does to make Shuusou Gyoku work properly. Turns
out that among all the hooks and patches, the game only needs the most
central one: Enforcing a 32-bit display mode regardless of whatever lower
bit depth the game requests natively, combined with converting the game's
pixel buffer to 32-bit on the fly.
So does this mean that adding 32-bit to the game's list of supported bit
depths is everything we have to do?
Well, almost everything. Initially, this surprised me as well: With
all the if statements checking for precise bit depths, you
would think that supporting one more bit depth would be way harder in this
code base. As it turned out though, these conditional branches are not
really about 8-bit or 16-bit color for the most part, but instead
differentiate between two very distinct rendering approaches:
"8-bit" is a pure 2D mode with palettized colors,
while "16-bit" is a hybrid 2D/3D mode that uses Direct3D 2 on top of DirectDraw, with
3-channel RGB colors.
Consequently, most of these branches deal with differences between these two
approaches that couldn't be nicely abstracted away in pbg's renderer
interface: Specific palette changes that are exclusive to "8-bit" mode, or
certain entities and effects whose Direct3D draw calls in "16-bit" mode
require tailor-made approximations for the "8-bit" mode. Since our new
32-bit mode is equivalent to the 16-bit mode in all of these branches, I
only needed to replace the raw number comparisons with more meaningful
That only left a very small number of 2D raster effects that directly write
to or read from DirectDraw surface memory, and therefore do need to know the
bit size of each pixel. Thanks to std::variant and
std::visit(), adding 32-bit support becomes trivial here: By
rewriting the code in a generic manner that derives all offsets from the
template type, you only have to say hey,
I'd like to have 32-bit as well, and C++ will automatically
instantiate correct 32-bit variants of all bit depth-dependent code
There are only three features in the entire game that access pixel buffers
this way: a color key retrieval function, the lens ball animation on the
logo screen, and… the ending staff roll? Sure, the text sprites fade in and
out, but so does the picture next to it, using Direct3D alpha blending or
palette color ramping depending on the current rendering mode. Instead, the
only reason why these sprites directly access their pixel buffer is… an
unused and pretty wild spiral effect. 😮 It's still part of the code, and
only doesn't show up because the
parameters that control its timing were commented out before release:
Alright, 32-bit mode complete, let's set it as the default if possible… and
break compatibility to the original 秋霜CFG.DAT format in the
process? When validating this file, the original game only allows the
originally supported 8-bit or 16-bit modes. Setting the
BitDepth field to any other value causes the entire file
to be reset to its defaults, re-locking the Extra Stage in the process.
Introducing a backward-compatible version
system for 秋霜CFG.DAT was beyond the scope of this push.
Changing the validation to a per-field approach was a good small first step
to take though. The new build no longer validates the BitDepth
field against a fixed list, but against the actually supported bit depths on
your system, picking a different supported one if necessary. With the
original approach, this would have caused your entire configuration to fail
the validation check. Instead, you can now safely update to the new build
without losing your option settings, or your previously unlocked access to
the Extra Stage.
Side note: The validation limit for starting bombs is off by one, and the
one for starting lives check is off by two. By modifying
秋霜CFG.DAT, you could theoretically get new games to start with
7 lives and 3 bombs… if you then calculate a correct checksum for your
hacked config file, that is. 🧑💻
Interestingly, DirectDraw doesn't even indicate support for 8-bit or 16-bit
color on systems that are affected by the initially mentioned issues.
Therefore, these issues are not the fault of DirectDraw, but of
Shuusou Gyoku, as the original release requested a bit depth that it has
even verified to be unsupported. Unfortunately, Windows sides with
Sim City Shuusou Gyoku here: If you previously experimented with the
Windows app compatibility settings, you might have ended up with the
DWM8And16BitMitigation flag assigned to the full file path of
your Shuusou Gyoku executable in either
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\AppCompatFlags\Layers, or
As the term mitigation suggests, these modes are (poorly) emulated,
which is exactly what causes the issues with this game in the first place.
Sure, this might be the lesser evil from the point of view of an operating
system: If you don't have the budget for a full-blown DDrawCompat-style
DirectDraw wrapper, you might consider it better for users to have the game
run poorly than have it fail at startup due to incorrect API usage.
Controlling this with a flag that sticks around for future runs of a binary
is definitely suboptimal though, especially given how hard it
is to programmatically remove this flag within the binary itself. It
only adds additional complexity to the ideal clean upgrade path.
So, make sure to check your registry and manually remove these flags for the
time being. Without them, the new Config → Graphic menu will
correctly prevent you from selecting anything else but 32-bit on modern
After all that, there was just enough time left in this push to implement
basic locale independence, as requested by the Seihou development
Discord group, without looking into automatic fixes for previous mojibake
filenames yet. Combining std::filesystem::path with the native
Win32 API should be straightforward and bloat-free, especially with all the
abstractions I've been building, right?
Well, turns out that std::filesystem::path does not
actually meet my expectations. At least as long as it's not
constexpr-enabled, because you still get the unfortunate
conversion from narrow to wide encoding at runtime, even for globals with
static storage duration. That brings us back to writing our path abstraction
in terms of the regular std::string and
std::wstring containers, which at least allow us to enforce the
respective encoding at compile time. Even std::string_view only
adds to the complexity here, as its strings are never inherently
null-terminated, which is required by both the POSIX and Win32 APIs. Not to
mention dynamic filenames: C++20's std::format() would be the
obvious idiomatic choice here, but using it almost doubles the size
of the compiled binary… 🤮
In the end, the most bloat-free way of implementing C++ file I/O in 2023 is
still the same as it was 30 years ago: Call system APIs, roll a custom
abstraction that conditionally uses the L prefix, and pass
around raw pointers. And if you need a dynamic filename, just write the
dynamic characters into arrays at fixed positions. Just as PC-98 Touhou used
Oh, and the game's window also uses a Unicode title bar now.
And that's it for this push! Make sure to rename your configuration
(秋霜CFG.DAT), score (秋霜SC.DAT), and replay
(秋霜りぷ*.DAT) filenames if you were previously running the
game on a non-Japanese locale, and then grab the new build:
Next up: Starting the new year with all my plans hopefully working out for
once. TH05 Sara very soon, ZMBV code review afterward, low-hanging fruit of
the TH01 Anniversary Edition after that, and then kicking off TH02 with a
bunch of low-level blitting code.
More than three months without any reverse-engineering progress! It's been
way too long. Coincidentally, we're at least back with a surprising 1.25% of
overall RE, achieved within just 3 pushes. The ending script system is not
only more or less the same in TH04 and TH05, but actually originated in
TH03, where it's also used for the cutscenes before stages 8 and 9. This
means that it was one of the final pieces of code shared between three of
the four remaining games, which I got to decompile at roughly 3× the usual
speed, or ⅓ of the price.
The only other bargains of this nature remain in OP.EXE. The
Music Room is largely equivalent in all three remaining games as well, and
the sound device selection, ZUN Soft logo screens, and main/option menus are
the same in TH04 and TH05. A lot of that code is in the "technically RE'd
but not yet decompiled" ASM form though, so it would shift Finalized% more
significantly than RE%. Therefore, make sure to order the new
Finalization option rather than Reverse-engineering if you
want to make number go up.
So, cutscenes. On the surface, the .TXT files look simple enough: You
directly write the text that should appear on the screen into the file
without any special markup, and add commands to define visuals, music, and
other effects at any place within the script. Let's start with the basics of
how text is rendered, which are the same in all three games:
First off, the text area has a size of 480×64 pixels. This means that it
does not correspond to the tiled area painted into TH05's
Since the font weight can be customized, all text is rendered to VRAM.
This also includes gaiji, despite them ignoring the font weight
The system supports automatic line breaks on a per-glyph basis, which
move the text cursor to the beginning of the red text area. This might seem like a piece of long-forgotten
ancient wisdom at first, considering the absence of automatic line breaks in
Windows Touhou. However, ZUN probably implemented it more out of pure
necessity: Text in VRAM needs to be unblitted when starting a new box, which
is way more straightforward and performant if you only need to worry
about a fixed area.
The system also automatically starts a new (key press-separated) text
box after the end of the 4th line. However, the text cursor is
also unconditionally moved to the top-left corner of the yellow name
area when this happens, which is almost certainly not what you expect, given
that automatic line breaks stay within the red area. A script author might
as well add the necessary text box change commands manually, if you're
forced to anticipate the automatic ones anyway…
Due to ZUN forgetting an unblitting call during the TH05 refactoring of the
box background buffer, this feature is even completely broken in that game,
as any new text will simply be blitted on top of the old one:
Overall, the system is geared toward exclusively full-width text. As
exemplified by the 2014 static English patches and the screenshots in this
blog post, half-width text is possible, but comes with a lot of
Each loop of the script interpreter starts by looking at the next
byte to distinguish commands from text. However, this step also skips
over every ASCII space and control character, i.e., every byte
≤ 32. If you only intend to display full-width glyphs anyway, this
sort of makes sense: You gain complete freedom when it comes to the
physical layout of these script files, and it especially allows commands
to be freely separated with spaces and line breaks for improved
readability. Still, enforcing commands to be separated exclusively by
line breaks might have been even better for readability, and would have
freed up ASCII spaces for regular text…
Non-command text is blindly processed and rendered two bytes at a
time. The rendering function interprets these bytes as a Shift-JIS
string, so you can use half-width characters here. While the
second byte can even be an ASCII 0x20 space due to the
parser's blindness, all half-width characters must still occur in pairs
that can't be interrupted by commands:
As a workaround for at least the ASCII space issue, you can replace
them with any of the unassigned
Shift-JIS lead bytes – 0x80, 0xA0, or
anything between 0xF0 and 0xFF inclusive.
That's what you see in all screenshots of this post that display
Finally, did you know that you can hold ESC to fast-forward
through these cutscenes, which skips most frame delays and reduces the rest?
Due to the blocking nature of all commands, the ESC key state is
only updated between commands or 2-byte text groups though, so it can't
interrupt an ongoing delay.
Superficially, the list of game-specific differences doesn't look too long,
and can be summarized in a rather short table:
It's when you get into the implementation that the combined three systems
reveal themselves as a giant mess, with more like 56 differences between the
games. Every single new weird line of code opened up
another can of worms, which ultimately made all of this end up with 24
pieces of bloat and 14 bugs. The worst of these should be quite interesting
for the general PC-98 homebrew developers among my audience:
The final official 0.23 release of master.lib has a bug in
graph_gaiji_put*(). To calculate the JIS X 0208 code point for
a gaiji, it is enough to ADD 5680h onto the gaiji ID. However,
these functions accidentally use ADC instead, which incorrectly
adds the x86 carry flag on top, causing weird off-by-one errors based on the
previous program state. ZUN did fix this bug directly inside master.lib for
TH04 and TH05, but still needed to work around it in TH03 by subtracting 1
from the intended gaiji ID. Anyone up for maintaining a bug-fixed master.lib
The worst piece of bloat comes from TH03 and TH04 needlessly
switching the visibility of VRAM pages while blitting a new 320×200 picture.
This makes it much harder to understand the code, as the mere existence of
these page switches is enough to suggest a more complex interplay between
the two VRAM pages which doesn't actually exist. Outside this visibility
switch, page 0 is always supposed to be shown, and page 1 is always used
for temporarily storing pixels that are later crossfaded onto page 0. This
is also the only reason why TH03 has to render text and gaiji onto both VRAM
pages to begin with… and because TH04 doesn't, changing the picture in the
middle of a string of text is technically bugged in that game, even though
you only get to temporarily see the new text on very underclocked PC-98
These performance implications made me wonder why cutscenes even bother with
writing to the second VRAM page anyway, before copying each crossfade step
to the visible one.
📝 We learned in June how costly EGC-"accelerated" inter-page copies are;
shouldn't it be faster to just blit the image once rather than twice?
Well, master.lib decodes .PI images into a packed-pixel format, and
unpacking such a representation into bitplanes on the fly is just about the
worst way of blitting you could possibly imagine on a PC-98. EGC inter-page
copies are already fairly disappointing at 42 cycles for every 16 pixels, if
we look at the i486 and ignore VRAM latencies. But under the same
conditions, packed-pixel unpacking comes in at 81 cycles for every 8
pixels, or almost 4× slower. On lower-end systems, that can easily sum up to
more than one frame for a 320×200 image. While I'd argue that the resulting
tearing could have been an acceptable part of the transition between two
images, it's understandable why you'd want to avoid it in favor of the
pure effect on a slower framerate.
Really makes me wonder why master.lib didn't just directly decode .PI images
into bitplanes. The performance impact on load times should have been
negligible? It's such a good format for
the often dithered 16-color artwork you typically see on PC-98, and
deserves better than master.lib's implementation which is both slow to
decode and slow to blit.
That brings us to the individual script commands… and yes, I'm going to
document every single one of them. Some of their interactions and edge cases
are not clear at all from just looking at the code.
Almost all commands are preceded by… well, a 0x5C lead byte.
Which raises the question of whether we should
document it as an ASCII-encoded \ backslash, or a Shift-JIS-encoded
¥ yen sign. From a gaijin perspective, it seems obvious that it's a
backslash, as it's consistently displayed as one in most of the editors you
would actually use nowadays. But interestingly, iconv
-f shift-jis -t utf-8 does convert any 0x5C
lead bytes to actual ¥ U+00A5 YEN SIGN code points
Ultimately, the distinction comes down to the font. There are fonts
that still render 0x5C as ¥, but mainly do so out
of an obvious concern about backward compatibility to JIS X 0201, where this
mapping originated. Unsurprisingly, this group includes MS Gothic/Mincho,
the old Japanese fonts from Windows 3.1, but even Meiryo and Yu
Gothic/Mincho, Microsoft's modern Japanese fonts. Meanwhile, pretty much
every other modern font, and freely licensed ones in particular, render this
code point as \, even if you set your editor to Shift-JIS. And
while ZUN most definitely saw it as a ¥, documenting this code
point as \ is less ambiguous in the long run. It can only
possibly correspond to one specific code point in either Shift-JIS or UTF-8,
and will remain correct even if we later mod the cutscene system to support
Now we've only got to clarify the parameter syntax, and then we can look at
the big table of commands:
Numeric parameters are read as sequences of up to 3 ASCII digits. This
limits them to a range from 0 to 999 inclusive, with 000 and
0 being equivalent. Because there's no further sentinel
character, any further digit from the 4th one onwards is
interpreted as regular text.
Filename parameters must be terminated with a space or newline and are
limited to 12 characters, which translates to 8.3 basenames without any
directory component. Any further characters are ignored and displayed as
text as well.
Each .PI image can contain up to four 320×200 pictures ("quarters") for
the cutscene picture area. In the script commands, they are numbered like
Clears both VRAM pages by filling them with VRAM color 0. 🐞
In TH03 and TH04, this command does not update the internal text area
background used for unblitting. This bug effectively restricts usage of
this command to either the beginning of a script (before the first
background image is shown) or its end (after no more new text boxes are
started). See the image below for an
example of using it anywhere else.
Sets the font weight to a value between 0 (raw font ROM glyphs) to 3
(very thicc). Specifying any other value has no effect.
🐞 In TH04 and TH05, \b3 leads to glitched pixels when
rendering half-width glyphs due to a bug in the newly micro-optimized
ASM version of
📝 graph_putsa_fx(); see the image below for an example.
In these games, the parameter also directly corresponds to the
graph_putsa_fx() effect function, removing the sanity check
that was present in TH03. In exchange, you can also access the four
dissolve masks for the bold font (\b2) by specifying a
parameter between 4 (fewest pixels) to 7 (most
pixels). Demo video below.
Changes the text color to VRAM color 15.
Adds a color map entry: If 字 is the first code point
inside the name area on a new line, the text color is automatically set
to 15. Up to 8 such entries can be registered
before overflowing the statically allocated buffer.
🐞 The comma is assumed to be present even if the color parameter is omitted.
Plays the sound effect with the given ID.
Calls master.lib's palette_black_in() or
palette_black_out() to play a hardware palette fade
animation from or to black, spending roughly 1 frame on each of the 16 fade steps.
Fades out BGM volume via PMD's AH=02h interrupt call,
in a non-blocking way. The fade speed can range from 1 (slowest) to 127 (fastest).
Values from 128 to 255 technically correspond to
AH=02h's fade-in feature, which can't be used from cutscene
scripts because it requires BGM volume to first be lowered via
AH=19h, and there is no command to do that.
Plays a blocking 8-frame screen shake
Shows the gaiji with the given ID from 0 to 255
at the current cursor position. Even in TH03, gaiji always ignore the
text delay interval configured with \v.
TH05's replacement for the \ga command from TH03 and
TH04. The default ID of 3 corresponds to the
gaiji. Not to be confused with \@, which starts with a backslash,
unlike this command.
Shows the gaiji.
Shows the gaiji.
Shows the gaiji.
Shows the gaiji.
Shows the gaiji.
Shows the gaiji.
Waits 0 frames (0 = forever) for an advance key to be pressed before
continuing script execution. Before waiting, TH05 crossfades in any new
text that was previously rendered to the invisible VRAM page…
🐞 …but TH04 doesn't, leaving the text invisible during the wait time.
As a workaround, \vp1 can be
used before \k to immediately display that text without a
Stops the currently playing BGM.
Restarts playback of the currently loaded BGM from the
Stops the currently playing BGM, loads a new one from the given
file, and starts playback.
Starts a new line at the leftmost X coordinate of the box, i.e., the
start of the name area. This is how scripts can "change" the name of the
currently speaking character, or use the entire 480×64 pixels without
being restricted to the non-name area.
Note that automatic line breaks already move the cursor into a new line.
Using this command at the "end" of a line with the maximum number of 30
full-width glyphs would therefore start a second new line and leave the
previously started line empty.
If this command moved the cursor into the 5th line of a box,
\s is executed afterward, with
any of \n's parameters passed to \s.
Deallocates the loaded .PI image.
Loads the .PI image with the given file into the single .PI slot
available to cutscenes. TH04 and TH05 automatically deallocate any
previous image, 🐞 TH03 would leak memory without a manual prior call to
Sets the hardware palette to the one of the loaded .PI image.
Sets the loaded .PI image as the full-screen 640×400 background
image and overwrites both VRAM pages with its pixels, retaining the
current hardware palette.
Runs \pp followed by \p@.
Ends a text box and starts a new one. Fades in any text rendered to
the invisible VRAM page, then waits 0 frames
(0 = forever) for an advance key to be
pressed. Afterward, the new text box is started with the cursor moved to
the top-left corner of the name area. \s- skips the wait time and starts the new box
Sets palette brightness via master.lib's
palette_settone() to any value from 0 (fully black) to 200
(fully white). 100 corresponds to the palette's original colors.
Preceded by a 1-frame delay unless ESC is held.
Sets the number of frames to wait between every 2 bytes of rendered
Sets the number of frames to spend on each of the 4 fade
steps when crossfading between old and new text. The game-specific
default value is also used before the first use of this command.
Shows VRAM page 0. Completely useless in
TH03 (this game always synchronizes both VRAM pages at a command
boundary), only of dubious use in TH04 (for working around a bug in \k), and the games always return to
their intended shown page before every blitting operation anyway. A
debloated mod of this game would just remove this command, as it exposes
an implementation detail that script authors should not need to worry
about. None of the original scripts use it anyway.
\w and \wk wait for the given number
\wm and \wmk wait until PMD has played
back the current BGM for the total number of measures, including
loops, given in the first parameter, and fall back on calling
\w and \wk with the second parameter as
the frame number if BGM is disabled.
🐞 Neither PMD nor MMD reset the internal measure when stopping
playback. If no BGM is playing and the previous BGM hasn't been
played back for at least the given number of measures, this command
Since both TH04 and TH05 fade in any new text from the invisible VRAM
page, these commands can be used to simulate TH03's typing effect in
those games. Demo video below.
Contrary to \k and \s, specifying 0 frames would
simply remove any frame delay instead of waiting forever.
The TH03-exclusive k variants allow the delay to be
interrupted if ⏎ Return or Shot are held down.
TH04 and TH05 recognize the k as well, but removed its
All of these commands have no effect if ESC is held.
Calls master.lib's palette_white_in() or
palette_white_out() to play a hardware palette fade
animation from or to white, spending roughly 1 frame on each of the 16 fade steps.
Immediately displays the given quarter of the loaded .PI image in
the picture area, with no fade effect. Any value ≥ 4 resets the picture area to black.
Crossfades the picture area between its current content and quarter
#4 of the loaded .PI image, spending 1 frame on each of the 4 fade steps unless
ESC is held. Any value ≥ 4 is
replaced with quarter #0.
Stops script execution. Must be called at the end of each file;
otherwise, execution continues into whatever lies after the script
buffer in memory.
TH05 automatically deallocates the loaded .PI image, TH03 and TH04
require a separate manual call to \p- to not leak its memory.
Bold values signify the default if the parameter
is omitted; \c is therefore
equivalent to \c15.
So yeah, that's the cutscene system. I'm dreading the moment I will have to
deal with the other command interpreter in these games, i.e., the
stage enemy system. Luckily, that one is completely disconnected from any
other system, so I won't have to deal with it until we're close to finishing
MAIN.EXE… that is, unless someone requests it before. And it
won't involve text encodings or unblitting…
The cutscene system got me thinking in greater detail about how I would
implement translations, being one of the main dependencies behind them. This
goal has been on the order form for a while and could soon be implemented
for these cutscenes, with 100% PI being right around the corner for the TH03
and TH04 cutscene executables.
Once we're there, the "Virgin" old-school way of static translation patching
for Latin-script languages could be implemented fairly quickly:
Establish basic UTF-8 parsing for less painful manual editing of the
Procedurally generate glyphs for the few required additional letters
based on existing font ROM glyphs. For example, we'd generate ä
by painting two short lines on top of the font ROM's a glyph,
or generate ¿ by vertically flipping the question mark. This
way, the text retains a consistent look regardless of whether the translated
game is run with an NEC or EPSON font ROM, or the that Neko Project II auto-generates if you
don't provide either.
(Optional) Change automatic line breaks to work on a per-word
basis, rather than per-glyph
That's it – script editing and distribution would be handled by your local
translation group. It might seem as if this would also work for Greek and
Cyrillic scripts due to their presence in the PC-98 font ROM, but I'm not
sure if I want to attempt procedurally shrinking these glyphs from 16×16 to
8×16… For any more thorough solution, we'd need to go for a more "Chad" kind
of full-blown translation support:
Implement text subdivisions at a sensible granularity while retaining
automatic line and box breaks
Compile translatable text into a Japanese→target language dictionary
(I'm too old to develop any further translation systems that would overwrite
modded source text with translations of the original text)
Implement a custom Unicode font system (glyphs would be taken from GNU
Unifont unless translators provide a different 8×16 font for their
Combine the text compiler with the font compiler to only store needed
glyphs as part of the translation's font file (dealing with a multi-MB font
file would be rather ugly in a Real Mode game)
Write a simple install/update/patch stacking tool that supports both
.HDI and raw-file DOSBox-X scenarios (it's different enough from thcrap to
warrant a separate tool – each patch stack would be statically compiled into
a single package file in the game's directory)
Add a nice language selection option to the main menu
(Optional) Support proportional fonts
Which sounds more like a separate project to be commissioned from
Touhou Patch Center's Open Collective funds, separate from the ReC98 cap.
This way, we can make sure that the feature is completely implemented, and I
can talk with every interested translator to make sure that their language
It's still cheaper overall to do this on PC-98 than to first port the games
to a modern system and then translate them. On the other hand, most
of the tasks in the Chad variant (3, 4, 5, and half of 2) purely deal with
the difficulty of getting arbitrary Unicode characters to work natively in a
PC-98 DOS game at all, and would be either unnecessary or trivial if we had
already ported the game. Depending on where the patrons' interests lie, it
may not be worth it. So let's see what all of you think about which
way we should go, or whether it's worth doing at all. (Edit
(2022-12-01): With Splashman's
order towards the stage dialogue system, we've pretty much confirmed that it
is.) Maybe we want to meet in the middle – using e.g. procedural glyph
generation for dynamic translations to keep text rendering consistent with
the rest of the PC-98 system, and just not support non-Latin-script
languages in the beginning? In any case, I've added both options to the
order form. Edit (2023-07-28):Touhou Patch Center has agreed to fund
a basic feature set somewhere between the Virgin and Chad level. Check the
📝 dedicated announcement blog post for more
details and ideas, and to find out how you can support this goal!
Surprisingly, there was still a bit of RE work left in the third push after
all of this, which I filled with some small rendering boilerplate. Since I
also wanted to include TH02's playfield overlay functions,
1/15 of that last push went towards getting a
TH02-exclusive function out of the way, which also ended up including that
game in this delivery.
The other small function pointed out how TH05's Stage 5 midboss pops into
the playfield quite suddenly, since its clipping test thinks it's only 32
pixels tall rather than 64:
Next up: Staying with TH05 and looking at more of the pattern code of its
boss fights. Given the remaining TH05 budget, it makes the most sense to
continue in in-game order, with Sara and the Stage 2 midboss. If more money
comes in towards this goal, I could alternatively go for the Mai & Yuki
fight and immediately develop a pretty fix for the cheeto storage
glitch. Also, there's a rather intricate
pull request for direct ZMBV decoding on the website that I've still got
On August 15, 1997, at Comiket 52, an unknown doujin developer going by the
name of ZUN released his first game, 東方靈異伝 ～
The Highly Responsive to Prayers, marking the start of the
Touhou Project game series that keeps running to this day. Today, exactly 25
years later, the C++ source code to version 1.10 of that game has been
completely and perfectly reconstructed, reviewed, and documented.
And with that, a warm welcome to all game journalists who have
(re-)discovered this project through these news! Here's a summary for
everyone who doesn't want to go through 3 years worth of blog posts:
What does this mean?
All code that ZUN wrote as part of a TH01 installation has now been
decompiled to C++ code. The only parts left in assembly are two third-party
libraries (master.lib and PiLoad), which were originally written in
assembly, and are built from their respective official source code.
You can clone the ReC98
repository, set up the build environment, and get a binary with an
identical program image. The hashes of the resulting executables won't match
those of ZUN's original release, but all differences there stem from details
in the .EXE header that don't influence program execution, such as the
on-disk order of the conceptually unordered set of x86 memory segment
relocations. If you're interested in that level of correctness, you can
order Easier verification against original binaries from the store.
For now though, use mzdiff for
verifying the builds against ZUN's binaries.
Ever since this crowdfunding has started 3 years ago, the goal of this
project has shifted more and more towards a full-on code review rather than
being just a mechanical decompilation:
Hardcoded constants were derived from as few truly hardcoded values
as possible, which uncovered their intended meaning and highlighted any
Code was deduplicated to a perhaps obsessive level (I'm still trying
to find a balance)
Tons of comments everywhere to put everything into context
And, of course, 2½ years worth of blog
posts summarizing any highlights, glitches, and secrets. (There
might still be some left to be discovered!)
As a result, modding the games and porting them away from the PC-98
platform is now a lot easier.
What does this not mean?
This is not a piracy release. ReC98 only provides the code that the
game's .EXE and .COM files are built out of. Without the rest of the
original data files, supplied from a pre-existing game copy, the code won't
do very much.
Even apart from ZUN's own code quality, the ReC98 repository is not as
polished and consistent as it could be, having seen multiple code structure
evolutions over the 8 years of its existence.
TH01 hasn't magically reached Doom levels of easy portability now. As a
decompilation of the exact code that ZUN wrote for the PC-98 platform, it is
very PC-98-native, and wildly mixes game logic with hardware
accesses. As ZUN's first foray into game development, he understandably
didn't see the need for writing an engine or hardware abstraction layer
So while this milestone opened the floodgates to PC-98-native mods, I
wouldn't advise trying to attempt a port away from PC-98 right now. But then
again, I have a financial interest in being a part of the porting process,
and who knows, maybe you can just merge in a PC-98 emulator core and
get started with something halfway decent in a short amount of time. After
all, TH01 is by far the easiest PC-98 Touhou game to port to other systems,
as it makes the least use of hardware features. (Edit
(2023-03-30): 📝 Turns out that this
crown actually goes to TH02. It features the least amount of ZUN-written
PC-98-specific rendering code out of all the 5 games, with most of it
being decently abstracted via master.lib.)
However, this game in particular raises the question of what exactly
one would even want to port. TH01 is a broken flicker-fest that
overwhelmingly suffers the drawbacks of PC-98 hardware rather than using it
to its advantage. Out of the 78 bugs that I ended up labeling as such, the
majority are sprite blitting issues,
while you can count the instances of
good hardware use on one hand.
And even at the level of game logic, this game features a lot of
weird, inconsistent behavior. Less rigorous projects such as uth05win would probably
promptly identify these issues as bugs and fix them. On the one hand, this
shows that there is a part of the community that wants sane versions of
these games which behave as expected. In other parts of the community
though, such projects quickly gain the reputation of being too inaccurate to
bother about them.
Some terminology might help here. If you look over the ReC98 codebase,
you'll find that I classified any weird code into three categories.
Edit (2023-03-05): These have been overhauled with a new
landmine category for invisible issues. Check CONTRIBUTING.md
for the complete and current current definition of all weird code
🔗 ZUN bugs: Broken
code that results from logic errors or incorrect programming
language/API/hardware use, with enough evidence in the code to indicate that
ZUN did not intend the bug. Fixing these issues must not affect hypothetical
replay compatibility, and any resulting visual changes must match ZUN's
🔗 ZUN quirks:
Weird code that looks incorrect in context. Fixing these issues would change
gameplay enough to desync a hypothetical replay recorded on the original
version, or affect the visuals so much that the result is no longer faithful
to ZUN's original release. It might very well be called a fangame at that
🔗 ZUN bloat:
Code that wastes memory, CPU cycles, or even just the mental capacity of
anyone trying to read and understand the code. If you want to write a
particularly resource-intensive mod, these are the places you could claim
some of those resources from.
All crashes are bugs
All blitting issues related to inappropriate VRAM byte alignment are
The idea of splitting TH01 across three executables is its biggest
source of bloat. It wastes disk space, the game doesn't even make use of the
memory gained from unloading unneeded code and data, it complicates the
build process and code structure with inconsistencies between the individual
binaries, and the required inter-process communication via shared memory
adds another piece of global state mutation headache.
Since I'm not in the business of writing fanfiction, I won't offer any
option that fixes quirks. That's where all of you can come in, and
use ReC98 as a base for remasters and remakes. As for bloat and bugs though,
there are many ways we could go from here:
If you want to ultimately try porting the game yourself, but still
support ReC98 somehow, I can recommend the ZUN code cleanup goal.
This is the most conservative option that leaves all bugs and quirks in
place and only removes bloat, rearchitecting the codebase so that
it's easier to work with.
For an improved gameplay experience on PC-98, choose the TH01
Anniversary Edition goal. In addition to the above code cleanup, this
goal fixes every bug with the game, most notably all the sprite
flickering by implementing a completely new renderer, while maintaining
hypothetical replay compatibility to ZUN's original release.
If you're mainly interested in seeing any variety of TH01 ported away
from PC-98 to any system, choose the Portability to non-PC-98 systems
goal. In this one, I'm going to develop the abstraction layers that would
ultimately bring this game to the aforementioned Doom level of portability,
while still keeping it running with better than original performance on
Replay support is also something you could order…
… as is Multilingual translation support (on PC-98), for those
sweet non-ASCII characters if that's your thing.
Then again, with all these choices in mind, maybe we should just let TH01 be
what it is: ZUN's first game, evidence for the truth that no programmer
writes good code the first time around, and more of a historical curiosity
than anything you'd want to maintain and modernize. The idea of moving on to
the next game and decompiling all 5 PC-98 Touhou games in order has
certainly shown to be
popular among the backers who funded this 100% goal.
Since the beginning of the year, I've been dramatically raising the level of
quality and care I've been putting into this project, leading to 9 of the 10
longest blog posts having been written in the past 8 months. The community
reception has been even more supportive as well, with all of you still
regularly selling out the store in return. To match the level of quality
with the community demand, I'm raising push prices from
to per push, as of this blog
post. 📝 As usual, I'm going to deliver any
existing orders in the backlog at the value they were originally purchased
at. Due to the way the cap has to be calculated, these contributions now
appear to have increased in value by 25%.
However, I do realize that this might make regular pushes prohibitively
expensive for some. This could especially prevent all these exciting modding
goals from ever getting off the ground. Thinking about it though, the push
system is only really necessary for the core reverse-engineering business,
where longer, concentrated stretches of work allow me to study a new piece
of code in a larger context and improve the quality of the final result. In
contrast, modding-related goals could theoretically be segmented into
arbitrarily small portions of work, as I have a clear idea of where I want
to go and how to get there.
Thus, I'm introducing microtransactions, now available for all
modding-related goals. These allow you to order fractional pieces of work
for as low as 1 €, which I will immediately deliver without requiring others
to fund a full push first. Edit (2022-08-16): And then the
store still sold out with a single regular contribution by
nrook towards more reverse-engineering. Guess that this
experiment will have to wait a little while longer, then… 😅
Next up: Taking a break and recovering from crunch time by improving video
playback on this blog and working on Shuusou Gyoku,
before returning to Touhou in September.
Oh look, it's another rather short and straightforward boss with a rather
small number of bugs and quirks. Yup, contrary to the character's
popularity, Mima's premiere is really not all that special in terms of code,
and continues the trend established with
📝 Kikuri and
📝 SinGyoku. I've already covered
📝 the initial sprite-related bugs last November,
so this post focuses on the main code of the fight itself. The overview:
The TH01 Mima fight consists of 3 phases, with phases 1 and 3 each
corresponding to one half of the 12-HP bar.
📝 Just like with SinGyoku, the distinction
between the red-white and red parts is purely visual once again, and doesn't
reflect anything about the boss script. As usual, all of the phases have to
be completed in order.
Phases 1 and 3 cycle through 4 danmaku patterns each, for a total of 8.
The cycles always start on a fixed pattern.
3 of the patterns in each phase feature rotating white squares, thus
introducing a new sprite in need of being unblitted.
Phase 1 additionally features the "hop pattern" as the last one in its
cycle. This is the only pattern where Mima leaves the seal in the center of
the playfield to hop from one edge of the playfield towards the other, while
also moving slightly higher up on the Y axis, and staying on the final
position for the next pattern cycle. For the first time, Mima selects a
random starting edge, which is then alternated on successive cycles.
Since the square entities are local to the respective pattern function,
Phase 1 can only end once the current pattern is done, even if Mima's HP are
already below 6. This makes Mima susceptible to the
📝 test/debug mode HP bar heap corruption bug.
Phase 2 simply consists of a spread-in teleport back to Mima's initial
position in the center of the playfield. This would only have been strictly
necessary if phase 1 ended on the hop pattern, but is done regardless of the
previous pattern, and does provide a nice visual separation between the two
That's it – nothing special in Phase 3.
And there aren't even any weird hitboxes this time. What is maybe
special about Mima, however, is how there's something to cover about all of
her patterns. Since this is TH01, it's won't surprise anyone that the
rotating square patterns are one giant copy-pasta of unblitting, updating,
and rendering code. At least ZUN placed the core polar→Cartesian
transformation in a separate function for creating regular polygons
with an arbitrary number of sides, which might hint toward some more varied
shapes having been planned at one point?
5 of the 6 patterns even follow the exact same steps during square update
Calculate square corner coordinates
Unblit the square
Update the square angle and radius
Use the square corner coordinates for spawning pellets or missiles
Recalculate square corner coordinates
Render the square
Notice something? Bullets are spawned before the corner coordinates
are updated. That's why their initial positions seem to be a bit off – they
are spawned exactly in the corners of the square, it's just that it's
the square from 8 frames ago.
Once ZUN reached the final laser pattern though, he must have noticed that
there's something wrong there… or maybe he just wanted to fire those
lasers independently from the square unblit/update/render timer for a
change. Spending an additional 16 bytes of the data segment for conveniently
remembering the square corner coordinates across frames was definitely a
When Mima isn't shooting bullets from the corners of a square or hopping
across the playfield, she's raising flame pillars from the bottom of the playfield within very specifically calculated
random ranges… which are then rendered at byte-aligned VRAM positions, while
collision detection still uses their actual pixel position. Since I don't
want to sound like a broken record all too much, I'll just direct you to
📝 Kikuri, where we've seen the exact same issue with the teardrop ripple sprites.
The conclusions are identical as well.
However, I'd say that the saddest part about this pattern is how choppy it
is, with the circle/pillar entities updating and rendering at a meager 7
FPS. Why go that low on purpose when you can just make the game render ✨
smoothly ✨ instead?
The reason quickly becomes obvious: With TH01's lack of optimization, going
for the full 56.4 FPS would have significantly slowed down the game on its
intended 33 MHz CPUs, requiring more than cheap surface-level ASM
optimization for a stable frame rate. That might very well have been ZUN's
reason for only ever rendering one circle per frame to VRAM, and designing
the pattern with these time offsets in mind. It's always been typical for
PC-98 developers to target the lowest-spec models that could possibly still
run a game, and implementing dynamic frame rates into such an engine-less
game is nothing I would wish on anybody. And it's not like TH01 is
particularly unique in its choppiness anyway; low frame rates are actually a
rather typical part of the PC-98 game aesthetic.
The final piece of weirdness in this fight can be found in phase 1's hop
pattern, and specifically its palette manipulation. Just from looking at the
pattern code itself, each of the 4 hops is supposed to darken the hardware
palette by subtracting #444 from every color. At the last hop,
every color should have therefore been reduced to a pitch-black
#000, leaving the player completely blind to the movement of
the chasing pellets for 30 frames and making the pattern quite ghostly
indeed. However, that's not what we see in the actual game:
Looking at the frame counter, it appears that something outside the
pattern resets the palette every 40 frames. The only known constant with a
value of 40 would be the invincibility frames after hitting a boss with the
Orb, but we're not hitting Mima here…
But as it turns out, that's exactly where the palette reset comes from: The
hop animation darkens the hardware palette directly, while the
📝 infamous 12-parameter boss collision handler function
unconditionally resets the hardware palette to the "default boss palette"
every 40 frames, regardless of whether the boss was hit or not. I'd classify
this as a bug: That function has no business doing periodic hardware palette
resets outside the invincibility flash effect, and it completely defies
common sense that it does.
That explains one unexpected palette change, but could this function
possibly also explain the other infamous one, namely, the temporary green
discoloration in the Konngara fight? That glitch comes down to how the game
actually uses two global "default" palettes: a default boss
palette for undoing the invincibility flash effect, and a default
stage palette for returning the colors back to normal at the end of
the bomb animation or when leaving the Pause menu. And sure enough, the
stage palette is the one with the green color, while the boss
palette contains the intended colors used throughout the fight. Sending the
latter palette to the graphics chip every 40 frames is what corrects
the discoloration, which would otherwise be permanent.
The green color comes from BOSS7_D1.GRP, the scrolling
background of the entrance animation. That's what turns this into a clear
bug: The stage palette is only set a single time in the entire fight,
at the beginning of the entrance animation, to the palette of this image.
Apart from consistency reasons, it doesn't even make sense to set the stage
palette there, as you can't enter the Pause menu or bomb during a blocking
And just 3 lines of code later, ZUN loads BOSS8_A1.GRP, the
main background image of the fight. Moving the stage palette assignment
there would have easily prevented the discoloration.
But yeah, as you can tell, palette manipulation is complete jank in this
game. Why differentiate between a stage and a boss palette to begin with?
The blocking Pause menu function could have easily copied the original
palette to a local variable before darkening it, and then restored it after
closing the menu. It's not so easy for bombs as the intended palette could
change between the start and end of the animation, but the code could have
still been simplified a lot if there was just one global "default palette"
variable instead of two. Heck, even the other bosses who manipulate their
palettes correctly only do so because they manually synchronize the two
after every change. The proper defense against bugs that result from wild
mutation of global state is to get rid of global state, and not to put up
safety nets hidden in the middle of existing effect code.
In any case, that's Mima done! 7th PC-98 Touhou boss fully
decompiled, 24 bosses remaining, and 59 functions left in all of TH01.
In other thrilling news, my call for secondary funding priorities in new
TH01 contributions has given us three different priorities so far. This
raises an interesting question though: Which of these contributions should I
now put towards TH01 immediately, and which ones should I leave in the
backlog for the time being? Since I've never liked deciding on priorities,
let's turn this into a popularity contest instead: The contributions with
the least popular secondary priorities will go towards TH01 first, giving
the most popular priorities a higher chance to still be left over after TH01
is done. As of this delivery, we'd have the following popularity order:
TH05 (1.67 pushes), from T0182
Seihou (1 push), from T0184
TH03 (0.67 pushes), from T0146
Which means that T0146 will be consumed for TH01 next, followed by T0184 and
then T0182. I only assign transactions immediately before a delivery though,
so you all still have the chance to change up these priorities before the
Next up: The final boss of TH01 decompilation, YuugenMagan… if the current
or newly incoming TH01 funds happen to be enough to cover the entire fight.
If they don't turn out to be, I will have to pass the time with some Seihou
work instead, missing the TH01 anniversary deadline as a result.Edit (2022-07-18): Thanks to Yanga for
securing the funding for YuugenMagan after all! That fight will feature
slightly more than half of all remaining code in TH01's
REIIDEN.EXE and the single biggest function in all of PC-98
Touhou, let's go!
More than 50% of all PC-98 Touhou game code has now been
reverse-engineered! 🎉 While this number isn't equally distributed among the
games, we've got one game very close to 100% and reverse-engineered most of
the core features of two others. During the last 32 months of continuous
funding, I've averaged an overall speed of 1.11% total RE per month. That
looks like a decent prediction of how much more time it will take for 100%
across all games – unless, of course, I'd get to work towards some of the
non-RE goals in the meantime.
70 functions left in TH01, with less than 10,000 ASM instructions
remaining! Due to immense hype, I've temporarily raised the cap by 50% until
August 15. With the last TH01 pushes delivering at roughly 1.5× of the
currently calculated average speed, that should be more than enough to get
TH01 done – especially since I expect YuugenMagan to come with lots of
redundant code. Therefore, please also request a secondary priority for
these final TH01 RE contributions.
So, how did this card-flipping stage obstacle delivery get so horribly
delayed? With all the different layouts showcased in the 28 card-flipping
stages, you'd expect this to be among the more stable and bug-free parts of
the codebase. Heck, with all stage objects being placed on a 32×32-pixel
grid, this is the first TH01-related blog post this year that doesn't have
to describe an alignment-related unblitting glitch!
That alone doesn't mean that this code is free from quirky behavior though,
and we have to look no further than the first few lines of the collision
handling for round bumpers to already find a whole lot of that. Simplified,
they do the following:
Immediately, you wonder why these assignments only exist for the Y
coordinate. Sure, hitting a bumper from the left or right side should happen
less often, but it's definitely possible. Is it really a good idea to warp
the Orb to the top or bottom edge of a bumper regardless?
What's more important though: The fact that these immediate assignments
exist at all. The game's regular Orb physics work by producing a Y velocity
from the single force acting on the Orb and a gravity factor, and are
completely independent of its current Y position. A bumper collision does
also apply a new force onto the Orb further down in the code, but these
assignments still bypass the physics system and are bound to have
some knock-on effect on the Orb's movement.
To observe that effect, we just have to enter Stage 18 on the 地獄/Jigoku route, where it's particularly trivial to
reproduce. At a 📝 horizontal velocity of ±4,
these assignments are exactly what can cause the Orb to endlessly
bounce between two bumpers. As rudimentary as the Orb's physics may be, just
letting them do their work would have entirely prevented these loops:
Now, you might be thinking that these Y assignments were just an attempt to
prevent the Orb from colliding with the same bumper again on the next frame.
After all, those 24 pixels exactly correspond to ⅓ of the height of a
bumper's hitbox with an additional pixel added on top. However, the game
already perfectly prevents repeated collisions by turning off collision
testing with the same bumper for the next 7 frames after a collision. Thus,
we can conclude that ZUN either explicitly coded bumper collision handling
to facilitate these loops, or just didn't take out that code after
inevitably discovering what it did. This is not janky code, it's not a
glitch, it's not sarcasm from my end, and it's not the game's physics being
But wait. Couldn't these assignments just be a remnant from a time in
development before ZUN decided on the 7-frame delay on further
collisions? Well, even that explanation stops holding water after the next
few lines of code. Simplified, again:
What's important here is the part that's not in the code – namely,
anything that handles X velocities of -8 or +8. In those cases, the Orb
simply continues in the same horizontal direction. The manual Y assignment
is the only part of the code that actually prevents a collision there, as
the newly applied force is not guaranteed to be enough:
Forgetting to handle ⅖ of your discrete X velocity cases is simply not
something you do by accident. So we might as well say that ZUN deliberately
designed the game to behave exactly as it does in this regard.
Bumpers also come in vertical or horizontal bar shapes. Their collision
handling also turns off further collision testing for the next 7 frames, and
doesn't do any manual coordinate assignment. That's definitely a step up in
cleanliness from round bumpers, but it doesn't seem to keep in mind that the
player can fire a new shot every 4 frames when standing still. That makes it
immediately obvious why this works:
That's the most well-known case of reducing the Orb's horizontal velocity to
0 by exactly hitting it with shots in its center and then button-mashing it
through a horizontal bar. This also works with vertical bars and yields even
more interesting results there, but if we want to have any chance of
understanding what happens there, we have to first go over some basics:
Collision detection for all stage obstacles is done in row-major
order from the top-left to the bottom-right corner of the
All obstacles are collision-tested independently from each other, with
the collision response code immediately following the test.
The hitboxes for bumper bars extend far past their 32×32 sprites to make
sure that the Orb can collide with them from any side. They are a
pixel-perfect* 87×56 pixels for horizontal bars, and 57×87 pixels for
vertical ones. Yes, that's no typo, they really do differ in one pixel.
Changing the Y velocity during such a collision just involves applying a
new force with the magnitude of the negated current Y velocity, which can be
done multiple times during a frame without changing the result. This
explains why the force is correctly inverted in the clip above, despite the
Orb colliding with two bumpers simultaneously.
Lacking a similar force system, the X coordinate is simply directly
However, if that were everything the game did, kicking the Orb into a column
of vertical bumper bars would lead them to behave more like a rope that the
Orb can climb, as the initial collision with two hitboxes cancels out the
intended sign change that reflects the Orb away from the bars:
While that would have been a fun gameplay mechanic on its own, it
immediately breaks apart once you place two vertical bumper bars next to
each other. Due to how these bumper bar hitboxes extend past their sprites,
any two adjacent vertical bars will end up with the exact same hitbox in
absolute screen coordinates. Stage 17 on the
魔界/Makai route contains exactly such a layout:
ZUN's workaround: Setting a "vertical bumper bar block flag" after any
collision with such a bar, which simply disables any collision with
any vertical bar for the next 7 frames. This quick hack made all
vertical bars work as intended, and avoided the need for involving the Orb's
X velocity in any kind of physics system.
Edit (2022-07-12): This flag only works around glitches
that would be caused by simultaneously colliding with more than one vertical
bar. The actual response to a bumper bar collision still remains unaffected,
and is very naive:
Horizontal bars always invert the Orb's Y velocity
Vertical bars invert either the Y or X velocity depending on whether
the Orb's current X velocity is 0 (Y) or not (X)
These conditions are only correct if the Orb comes in at an angle roughly
between 45° and 135° on either side of a bar. If it's anywhere close to 0°
or 180°, this response will be incorrect, and send the Orb straight
through the bar. Since the large hitboxes make this easily possible, you can
still get the Orb to climb a vertical column, or glide along a horizontal
With that clarified, we can now try mashing the Orb into these two vertical
At first, that workaround doesn't seem to make a difference here. As we
expect, the frame numbers now tell us that only one of the two bumper bars
in a row activates, but we couldn't have told otherwise as the number of
bars has no effect on newly applied Y velocity forces. On a closer look, the
Orb's rise to the top of the playfield is in fact caused by that
workaround though, combined with the unchanged top-to-bottom order of
collision testing. As soon as any bumper bar completed its 7
collision delay frames, it resets the aforementioned flag, which already
reactivates collision handling for any remaining vertical bumper bars during
the same frame. Look out for frames with both a 7 and a 1, like the one marked in the video above:
The 7 will always appear before
the 1 in the row-major order. Whenever
this happens, the current oscillation period is cut down from 7 to 6
frames – and because collision testing runs from top to bottom, this will
always happen during the falling part. Depending on the Y velocity, the
rising part may also be cut down to 6 frames from time to time, but that one
at least has a chance to last for the full 7 frames. This difference
adds those crucial extra frames of upward movement, which add up to send the
Orb to the top. Without the flag, you'd always see the Orb oscillating
between a fixed range of the bar column.
Finally, it's the "top of playfield" force that gradually slows down the Orb
and makes sure it ultimately only moves at sub-pixel velocities, which have
no visible effect. Because
📝 the regular effect of gravity is reset with
each newly applied force, it's completely negated during most of the climb.
This even holds true once the Orb reached the top: Since the Orb requires a
negative force to repeatedly arrive up there and be bounced back, this force
will stay active for the first 5 of the 7 collision frames and not move the
Orb at all. Once gravity kicks in at the 5th frame and adds 1 to
the Y velocity, it's already too late: The new velocity can't be larger than
0.5, and the Orb only has 1 or 2 frames before the flag reset causes it to
be bounced back up to the top again.
Portals, on the other hand, turn out to be much simpler than the old
description that ended up on Touhou Wiki in October 2005 might suggest.
Everything about their teleportations is random: The destination portal, the
exit force (as an integer between -9 and +9), as well as the exit X
velocity, with each of the
📝 5 distinct horizontal velocities having an
equal chance of being chosen. Of course, if the destination portal is next
to the left or right edge of the playfield and it chooses to fire the Orb
towards that edge, it immediately bounces off into the opposite direction,
whereas the 0 velocity is always selected with a constant 20% probability.
The selection process for the destination portal involves a bit more than a
single rand() call. The game bundles all obstacles in a single
structure of dynamically allocated arrays, and only knows how many obstacles
there are in total, not per type. Now, that alone wouldn't have much
of an impact on random portal selection, as you could simply roll a random
obstacle ID and try again if it's not a portal. But just to be extra cute,
ZUN instead iterates over all obstacles, selects any non-entered portal with
a chance of ¼, and just gives up if that dice roll wasn't successful after
16 loops over the whole array, defaulting to the entered portal in that
In all its silliness though, this works perfectly fine, and results in a
chance of 0.7516(𝑛 - 1) for the Orb exiting out of the
same portal it entered, with 𝑛 being the total number of portals in a
stage. That's 1% for two portals, and 0.01% for three. Pretty decent for a
random result you don't want to happen, but that hurts nobody if it does.
The one tiny ZUN bug with portals is technically not even part of the newly
decompiled code here. If Reimu gets hit while the Orb is being sent through
a portal, the Orb is immediately kicked out of the portal it entered, no
matter whether it already shows up inside the sprite of the destination
portal. Neither of the two portal sprites is reset when this happens,
leading to "two Orbs" being visible simultaneously.
This makes very little sense no matter how you look at it. The Orb doesn't
receive a new velocity or force when this happens, so it will simply
re-enter the same portal once the gameplay resumes on Reimu's next life:
That left another ½ of a push over at the end. Way too much time to finish
FUUIN.exe, way too little time to start with Mima… but the bomb
animation fit perfectly in there. No secrets or bugs there, just a bunch of
sprite animation code wasting at least another 82 bytes in the data segment.
The special effect after the kuji-in sprites uses the same single-bitplane
32×32 square inversion effect seen at the end of Kikuri's and Sariel's
entrance animation, except that it's a 3-stack of 16-rings moving at 6, 7,
and 8 pixels per frame respectively. At these comparatively slow speeds, the
byte alignment of each square adds some further noise to the discoloration
pattern… if you even notice it below all the shaking and seizure-inducing
hardware palette manipulation.
And yes, due to the very destructive nature of the effect, the game does in
fact rely on it only being applied to VRAM page 0. While that will cause
every moving sprite to tear holes into the inverted squares along its
trajectory, keeping a clean playfield on VRAM page 1 is what allows all that
pixel damage to be easily undone at the end of this 89-frame animation.
Next up: Mima! Let's hope that stage obstacles already were the most complex
part remaining in TH01…
The "bad" news first: Expanding to Stripe in order to support Google Pay
requires bureaucratic effort that is not quite justified yet, and would only
be worth it after the next price increase.
Visualizing technical debt has definitely been overdue for a while though.
With 1 of these 2 pushes being focused on this topic, it makes sense to
summarize once again what "technical debt"
means in the context of ReC98, as this info was previously kind of scattered
over multiple blog posts. Mainly, it encompasses
any ZUN-written code
that we did name and reverse-engineer,
but which we simply moved out into dedicated files that are then
#included back into the big .ASM translation units,
without worrying about decompilation or proving undecompilability for
Technically (ha), it would also include all of master.lib, which has
always been compiled into the binaries in this way, and which will require
quite a bit of dedicated effort to be moved out into a properly linkable
library, once it's feasible. But this code has never been part of any
progress metric – in fact, 0% RE is
defined as the total number of x86 instructions in the binary minus
any library code. There is also no relation between instruction numbers and
the time it will take to finalize master.lib code, let alone a precedent of
how much it would cost.
If we now want to express technical debt as a percentage, it's clear where
the 100% point would be: when all RE'd code is also compiled in from a
translation unit outside the big .ASM one. But where would 0% be? Logically,
it would be the point where no reverse-engineered code has ever been moved
out of the big translation units yet, and nothing has ever been decompiled.
With these boundary points, this is what we get:
Not too bad! So it's 6.22% of total RE that we will have to revisit at some
point, concentrated mostly around TH04 and TH05 where it resulted from a
focus on position independence. The prices also give an accurate impression
of how much more work would be required there.
But is that really the best visualization? After all, it requires an
understanding of our definition of technical debt, so it's maybe not the
most useful measurement to have on a front page. But how about subtracting
those 6.22% from the number shown on the RE% bars? Then, we get this:
Which is where we get to the good news: Twitter surprisingly helped me out
in choosing one visualization over the other, voting
7:2 in favor of the Finalized version. While this one requires
you to manually calculate € finalized - € RE'd to
obtain the raw financial cost of technical debt, it clearly shows, for the
first time, how far away we are from the main goal of fully decompiling all
5 games… at least to the extent it's possible.
Now that the parser is looking at these recursively included .ASM files for
the first time, it needed a small number of improvements to correctly handle
the more advanced directives used there, which no automatic disassembler
would ever emit. Turns out I've been counting some directives as
instructions that never should have been, which is where the additional
0.02% total RE came from.
One more overcounting issue remains though. Some of the RE'd assembly slices
included by multiple games contain different if branches for
each game, like this:
; An example assembly file included by both TH04's and TH05's MAIN.EXE:
if (GAME eq 5)
; (Code for TH05)
; (Code for TH04)
Currently, the parser simply ignores if, else, and
endif, leading to the combined code of all branches being
counted for every game that includes such a file. This also affects the
calculated speed, and is the reason why finalization seems to be slightly
faster than reverse-engineering, at currently 471 instructions per push
compared to 463. However, it's not that bad of a signal to send: Most of the
not yet finalized code is shared between TH04 and TH05, so finalizing it
will roughly be twice as fast as regular reverse-engineering to begin with.
(Unless the code then turns out to be twice as complex than average code…
For completeness, finalization is now also shown as part of the per-commit metrics. Now it's clearly visible what I was
doing in those very slow five months between P0131 and P0140, where
the progress bar didn't move at all: Repaying 3.49% of previously
accumulated technical debt across all games. 👌
As announced, I've also implemented a new caching system for this website,
as the second main feature of these two pushes. By appending a hash string
to the URLs of static resources, your browser should now both cache them
forever and re-download them once they did change on the server. This
avoids the unnecessary (and quite frankly, embarrassing) re-requests for all
static resources that typically just return a 304 Not Modified
response. As a result, the blog should now load a bit faster on repeated
visits, especially on slower connections. That should allow me to
deliberately not paginate it for another few years, without it getting all
too slow – and should prepare us for the day when our first game
reaches 100% and the server will get smashed.
However, I am open to changing the progress blog link in the
navigation bar at the top to the list of tags, once
people start complaining.
Apart frome some more invisible correctness and QoL improvements, I've also
prepared some new funding goals, but I'll cover those once the store
reopens, next year. Syntax highlighting for code snippets would have also
been cool, but unfortunately didn't make it into those two pushes. It's
still on the list though!
Next up: Back to RE with the TH03 score file format, and other code that
Made it through almost three years without a price increase! It's been
overdue for a while, though.
With the last months being full of rather research- and documentation-heavy
pushes, I've been just about able to keep up with the existing
subscriptions. By now, the amount of quality control and documentation I
found myself putting into this project has far surpassed the raw
reverse-engineering work. Back at the beginning of 2019 when I decided on
the previous push price of 30 €, I didn't have this blog nor the current
aspirations at code quality. Neither of these have ever been reflected in
the price, and I still find it hard to put a number on them. On the other
hand, I continue to dislike the typical Patreon model of no inherent defined
obligations on my part, and no direct association of the resulting work with
the person who funded it. You might have noticed that I don't use the word
"donations" anywhere, and instead refer to them as "orders" or "purchases" –
and that's precisely for this reason.
The result, however, has been a sold-out store for pretty much all of 2021.
I can only begin to imagine how much potential revenue I've already lost
from people who might have wanted to contribute at one point, but couldn't,
and have already written off this project…
Raising prices is pretty much the only way to get the pending workload back
to a more comfortable amount. I also thought about a two-tiered system: Have
a documentation-less option for 30 €, and take 60 € for any push that should
be accompanied by a blog post. However, skimping on documentation will
compromise the quality of the code as well. Writing these blog posts
presents another chance of improving it before release, which has made quite
a difference on many occasions. And after all, this documentation is the one
thing about ReC98 that people mainly interact with. As long as we haven't
hit 100% RE, the actual code seems to be an afterthought, which is perfectly
understandable: Why start work on a bigger mod or port now if the code is
steadily improving in every aspect, and it all will be just a bit more
maintainable in a few months?
But why go more commercial then, and especially now? If the recent attention
to spaztron64's PC-98 Touhou
collection package is any indication, ReC98 has a way bigger career
potential than the dead-end RL job I found myself in. Demand for fixed
translations and replay
support is definitely there – and given that these haven't been done so
far, it's very likely that I'll end up as the one to implement such mods,
especially if that should happen before reaching 100% RE or PI. People also
still seem to want* a port to IBM-compatible DOS,
📝 even though this makes no sense? But if
this is something you all want to pay for, then sure, why not.
And even right now, working on ReC98 sure beats writing junk software using
ill-suited technologies for highly corporate clients, or living close to a
world where academic papers are valued higher than working and maintained
code. I am in fact very happy whenever I'm done with that for the day, and
get to work on ReC98! Who would have thought.
So let's try to grow this into an actual business and raise prices to match
demand, going up to a nicely divisible 60 € per push. If you all still
manage to regularly sell out the store at this level and I get to
raise prices again, I should be able to reduce RL work further and therefore
raise the cap as well. Now that I've also clarified a potential route
towards self-employment, I'm going to react to these sell-out events more
quickly, and with smaller raises. So, no further immediate doubling in the
Now, will this delay the currently highly awaited 100% completion of TH01
past August 2022, the 25th anniversary of its release? We'll see once we're
back to an almost empty backlog, after I'm done with the TH01 Sariel fight.
I'm hopeful that such a price increase will give a new voice to the goals
and priorities of less wealthy potential patrons. This crowdfunding is very
much designed to be hacked by "microtransactions" – small contributions with
specific requests that require other, larger generic contributions to be
fulfilled – and I'd like to see more of that. 😛 And even if 60 € per push
is already more than the combined fandom wants to pay, that means I can get
the 📝 16-bit build system done before the
first big 100% release. (Trust me, you really want that!)
I will still deliver the entire current backlog at the value the
contributions were originally purchased at. Due to the way the cap has to be
calculated, these contributions now appear to have doubled in value. All
existing subscriptions will then pay for half of their original pushes
starting with their respective December 2021 transaction.
Next up: A bunch of smaller website features, including:
a caching strategy for static content,
a third set of percentage bars to visualize the remaining technical
and, hopefully, some new payment methods that expand the number of
countries I can accept money from. (Yes, this was requested at one point
earlier this year!)
Secured a 30-hour RL workweek to leave plenty of time for this project,
Touhou Patch Center's commissioned MediaWiki update work is also nearing
completion, time to reopen the store! Since it's been a long time, here's
an overview of where we currently are in each game and binary, and what
the next logical step would be:
MAIN.EXE: The final PC-98-specific low-level rendering functions are blocked by a single inconsistent and thus undecompilable assembly instruction. Rather than going for 📝 code generation and turning the rest of the function into a mess, I'd like to introduce a new build step between compilation and linking to patch "mistakes" like these. This would be an investment of 1-2 pushes into more readable code. Otherwise, I could continue with
player character movement and rendering code, followed by the code for player shots and the individual shot types,
bomb rendering, or
the stage/game clear screens and their score calculations.
MAINE.EXE: The ending sequences. Required for the multilingual translations that Touhou Patch Center might commission once they become viable.
OP.EXE: Story Mode initialization, or the title screen.
MAIN.EXE: The enemy structure? Or the bullet structure, maybe? Either way, we're still missing a lot of essential gameplay structures.
MAINL.EXE: The stage end / VS dialogue followed by the stage start screen. Required for the multilingual translations that Touhou Patch Center might commission once they become viable.
OP.EXE: The game title slide-in animation, followed by the High Score menu and the Music Room. The latter will be done in parallel with the TH05 version.
There's a lot of partially reverse-engineered but not yet decompiled code, including all of this game's custom entity types. Covering that would significantly boost finalization% for very little money. Otherwise, we could go for either
Gengetsu's boss script
HUD rendering (in parallel with TH05)
in-game dialog (in parallel with TH05)
player update code (in parallel with TH05), required for all midbosses in this game
player shot control functions
the end-of-stage bonus calculation
MAINE.EXE: High score name entry.
OP.EXE: High Score menu, followed by the Music Room. Will be done in parallel with TH04's version.
MAIN.EXE: Got quite a lot of segment splits where we could immediately continue:
midboss and boss script code, either continuing with the in-game order and the Stage 2 midboss fight, or going directly for
Mai & Yuki,
the Extra Stage midboss,
stage tile rendering
in-game dialog (in parallel with TH04)
player update code (in parallel with TH04)
items and extends
the single-color areas of boss backgrounds, which use the GRCG's TDW mode
HUD rendering (in parallel with TH05)
boss sprite rendering boilerplate
player shot collision detection and rendering
MAINE.EXE: The staff roll animation.
But as always, you can request pretty much any other part of any game.
We're now at a pretty good place as far as arbitrary requests are
concerned, as I simply can't decide myself where to put all the current
pending contributions in the funding backlog. 😅 By
spending only the missing amount of money to complete any of those, you can
capture any of those "fractional" contributions towards a specific goal.
The next specific requests are going to set the priorities of this project
for quite some time! The best strategy: Spend a low amount of money on
something very specific, and watch as existing generic contributions will
necessarily have to be put towards making that specific goal happen 😛
ReC98 would highly benefit from a build server – both in order to
immediately spot issues like this one, and as a service for modders.
Even more so than the usual open-source project of its size, I would say.
But that might be exactly
because it doesn't seem like something you can trivially outsource
to one of the big CI providers for open-source projects, and quickly set
it up with a few lines of YAML.
That might still work in the beginning, and we might get by with a regular
64-bit Windows 10 and DOSBox running the exact build tools from the DevKit.
Ideally, though, such a server should really run the optimal configuration
of a 32-bit Windows 10, allowing both the 32-bit and the 16-bit build step
to run natively, which already is something that no popular CI service out
there offers. Then, we'd optimally expand to Linux, every other Windows
version down to 95, emulated PC-98 systems, other TASM versions… yeah, it'd
be a lot. An experimental project all on its own, with additional hosting
costs and probably diminishing returns, the more it expands…
I've added it as a category to the order form, let's see how much interest
there is once the store reopens (which will be at the beginning of May, at
the latest). That aside, it would 📝 also be
a great project for outside contributors!
So, technical debt, part 8… and right away, we're faced with TH03's
low-level input function, which
📝 once📝 again📝 insists on being word-aligned in a way we
can't fake without duplicating translation units.
Being undecompilable isn't exactly the best property for a function that
has been interesting to modders in the past: In 2018,
spaztron64 created an
ASM-level mod that hardcoded more ergonomic key bindings for human-vs-human
multiplayer mode: 2021-04-04-TH03-WASD-2player.zip
However, this remapping attempt remained quite limited, since we hadn't
(and still haven't) reached full position independence for TH03 yet.
There's quite some potential for size optimizations in this function, which
would allow more BIOS key groups to already be used right now, but it's not
all that obvious to modders who aren't intimately familiar with x86 ASM.
Therefore, I really wouldn't want to keep such a long and important
function in ASM if we don't absolutely have to…
… and apparently, that's all the motivation I needed? So I took the risk,
and spent the first half of this push on reverse-engineering
TCC.EXE, to hopefully find a way to get word-aligned code
segments out of Turbo C++ after all.
And there is! The -WX option, used for creating
applications, messes up all sorts of code generation aspects in weird
ways, but does in fact mark the code segment as word-aligned. We can
consider ourselves quite lucky that we get to use Turbo C++ 4.0, because
this feature isn't available in any previous version of Borland's C++
That allowed us to restore all the decompilations I previously threw away…
well, two of the three, that lookup table generator was too much of a mess
in C. But what an abuse this is. The
subtly different code generation has basically required one creative
workaround per usage of -WX. For example, enabling that option
causes the regular PUSH BP and POP BP prolog and
epilog instructions to be wrapped with INC BP and
DEC BP, for some reason:
inc bp ; ???
mov bp, sp
; [… function code …]
dec bp ; ???
Luckily again, all the functions that currently require -WX
don't set up a stack frame and don't take any parameters.
While this hasn't directly been an issue so far, it's been pretty
close: snd_se_reset(void) is one of the functions that require
word alignment. Previously, it shared a translation unit with the
immediately following snd_se_play(int new_se), which does take
a parameter, and therefore would have had its prolog and epilog code messed
up by -WX.
Since the latter function has a consistent (and thus, fakeable) alignment,
I simply split that code segment into two, with a new -WX
translation unit for just snd_se_reset(void). Problem solved –
after all, two C++ translation units are still better than one ASM
translation unit. Especially with all the
previous #include improvements.
The rest was more of the usual, getting us 74% done with repaying the
technical debt in the SHARED segment. A lot of the remaining
26% is TH04 needing to catch up with TH03 and TH05, which takes
comparatively little time. With some good luck, we might get this
done within the next push… that is, if we aren't confronted with all too
many more disgusting decompilations, like the two functions that ended this
If we are, we might be needing 10 pushes to complete this after all, but
that piece of research was definitely worth the delay. Next up: One more of
Alright, back to continuing the master.hpp transition started
in P0124, and repaying technical debt. The last blog post already
announced some ridiculous decompilations… and in fact, not a single
one of the functions in these two pushes was decompilable into
idiomatic C/C++ code.
As usual, that didn't keep me from trying though. The TH04 and TH05
version of the infamous 16-pixel-aligned, EGC-accelerated rectangle
blitting function from page 1 to page 0 was fairly average as far as
unreasonable decompilations are concerned.
The big blocker in TH03's MAIN.EXE, however, turned out to be
the .MRS functions, used to render the gauge attack portraits and bomb
backgrounds. The blitting code there uses the additional FS and GS segment
registers provided by the Intel 386… which
are not supported by Turbo C++'s inline assembler, and
can't be turned into pointers, due to a compiler bug in Turbo C++ that
generates wrong segment prefix opcodes for the _FS and
Apparently I'm the first one to even try doing that with this compiler? I
haven't found any other mention of this bug…
Compiling via assembly (#pragma inline) would work around
this bug and generate the correct instructions. But that would incur yet
another dependency on a 16-bit TASM, for something honestly quite
What we can always do, however, is using __emit__() to simply
output x86 opcodes anywhere in a function. Unlike spelled-out inline
assembly, that can even be used in helper functions that are supposed to
inline… which does in fact allow us to fully abstract away this compiler
bug. Regular if() comparisons with pseudo-registers
wouldn't inline, but "converting" them into C++ template function
specializations does. All that's left is some C preprocessor abuse
to turn the pseudo-registers into types, and then we do retain a
normal-looking poke() call in the blitting functions in the
Yeah… the result is
I may have gone too far in a few places…
One might certainly argue that all these ridiculous decompilations
actually hurt the preservation angle of this project. "Clearly, ZUN
couldn't have possibly written such unreasonable C++ code.
So why pretend he did, and not just keep it all in its more natural ASM
form?" Well, there are several reasons:
Future port authors will merely have to translate all the
pseudo-registers and inline assembly to C++. For the former, this is
typically as easy as replacing them with newly declared local variables. No
need to bother with function prolog and epilog code, calling conventions, or
the build system.
No duplication of constants and structures in ASM land.
As a more expressive language, C++ can document the code much better.
Meticulous documentation seems to have become the main attraction of ReC98
these days – I've seen it appreciated quite a number of times, and the
continued financial support of all the backers speaks volumes. Mods, on the
other hand, are still a rather rare sight.
Having as few .ASM files in the source tree as possible looks better to
casual visitors who just look at GitHub's repo language breakdown. This way,
ReC98 will also turn from an "Assembly project" to its rightful state
of "C++ project" much sooner.
And finally, it's not like the ASM versions are
gone – they're still part of the Git history.
Unfortunately, these pushes also demonstrated a second disadvantage in
trying to decompile everything possible: Since Turbo C++ lacks TASM's
fine-grained ability to enforce code alignment on certain multiples of
bytes, it might actually be unfeasible to link in a C-compiled object file
at its intended original position in some of the .EXE files it's used in.
Which… you're only going to notice once you encounter such a case. Due to
the slightly jumbled order of functions in the
📝 second, shared code segment, that might
be long after you decompiled and successfully linked in the function
And then you'll have to throw away that decompilation after all 😕 Oh
well. In this specific case (the lookup table generator for horizontally
flipping images), that decompilation was a mess anyway, and probably
helped nobody. I could have added a dummy .OBJ that does nothing but
enforce the needed 2-byte alignment before the function if I
really insisted on keeping the C version, but it really wasn't
Now that I've also described yet another meta-issue, maybe there'll
really be nothing to say about the next technical debt pushes?
Next up though: Back to actual progress
again, with TH01. Which maybe even ends up pushing that game over the 50%
And indeed, I got to end my vacation with a lot of image format and
blitting code, covering the final two formats, .GRC and .BOS. .GRC was
nothing noteworthy – one function for loading, one function for
byte-aligned blitting, and one function for freeing memory. That's it –
not even a unblitting function for this one. .BOS, on the other hand…
…has no generic (read: single/sane) implementation, and is only
implemented as methods of some boss entity class. And then again for
Sariel's dress and wand animations, and then again for Reimu's
animations, both of which weren't even part of these 4 pushes. Looking
forward to decompiling essentially the same algorithms all over again… And
that's how TH01 became the largest and most bloated PC-98 Touhou game. So
yeah, still not done with image formats, even at 44% RE.
This means I also had to reverse-engineer that "boss entity" class… yeah,
what else to call something a boss can have multiple of, that may or may
not be part of a larger boss sprite, may or may not be animated, and that
may or may not have an orb hitbox?
All bosses except for Kikuri share the same 5 global instances of this
class. Since renaming all these variables in ASM land is tedious anyway, I
went the extra mile and directly defined separate, meaningful names for
the entities of all bosses. These also now document the natural order in
which the bosses will ultimately be decompiled. So, unless a backer
requests anything else, this order will be:
(code for regular card-flipping stages)
As everyone kind of expects from TH01 by now, this class reveals yet
another… um, unique and quirky piece of code architecture. In
addition to the position and hitbox members you'd expect from a class like
this, the game also stores the .BOS metadata – width, height, animation
frame count, and 📝 bitplane pointer slot
number – inside the same class. But if each of those still corresponds to
one individual on-screen sprite, how can YuugenMagan have 5 eye sprites,
or Kikuri have more than one soul and tear sprite? By duplicating that
metadata, of course! And copying it from one entity to another
At this point, I feel like I even have to congratulate the game for not
actually loading YuugenMagan's eye sprites 5 times. But then again, 53,760
bytes of waste would have definitely been noticeable in the DOS days.
Makes much more sense to waste that amount of space on an unused C++
exception handler, and a bunch of redundant, unoptimized blitting
(Thinking about it, YuugenMagan fits this entire system perfectly. And
together with its position in the game's code – last to be decompiled
means first on the linker command line – we might speculate that
YuugenMagan was the first boss to be programmed for TH01?)
So if a boss wants to use sprites with different sizes, there's no way
around using another entity. And that's why Girl-Elis and Bat-Elis are two
distinct entities internally, and have to manually sync their position.
Except that there's also a third one for Attacking-Girl-Elis,
because Girl-Elis has 9 frames of animation in total, and the global .BOS
bitplane pointers are divided into 4 slots of only 8 images each.
Same for SinGyoku, who is split into a sphere entity, a
person entity, and a… white flash entity for all three forms,
all at the same resolution. Or Konngara's facial expressions, which also
require two entities just for themselves.
And once you decompile all this code, you notice just how much of it the
game didn't even use. 13 of the 50 bytes of the boss entity class are
outright unused, and 10 bytes are used for a movement clamping and lock
system that would have been nice if ZUN also used it outside of
Kikuri's soul sprites. Instead, all other bosses ignore this system
completely, and just
the X/Y coordinates of the boss entities directly.
As for the rendering functions, 5 out of 10 are unused. And while those
definitely make up less than half of the code, I still must have
spent at least 1 of those 4 pushes on effectively unused functionality.
Only one of these functions lends itself to some speculation. For Elis'
entrance animation, the class provides functions for wavy blitting and
unblitting, which use a separate X coordinate for every line of the
sprite. But there's also an unused and sort of broken one for unblitting
two overlapping wavy sprites, located at the same Y coordinate. This might
indicate that Elis could originally split herself into two sprites,
similar to TH04 Stage 6 Yuuka? Or it might just have been some other kind
of animation effect, who knows.
After over 3 months of TH01 progress though, it's finally time to look at
other games, to cover the rest of the crowdfunding backlog. Next up: Going
back to TH05, and getting rid of those last PI false positives. And since
I can potentially spend the next 7 weeks on almost full-time ReC98 work,
I've also re-opened the store until October!
TH01 pellets are coming up next, and for the first time, we'll have the
chance to move hardcoded sprite data from ASM land to C land. As it would
turn out, bad luck with the 2-byte alignment at the end of
REIIDEN.EXE's data segment pretty much forces us to declare
TH01's pellet sprites in C if we want to decompile the final few pellet
functions without ugly workarounds for the float literals there. And while
I could have just converted them into a C array and called it a day, it
did raise the question of when we are going to do this The Right And
Moddable Way, by auto-converting actual image files into ASM or C arrays
during the build process. These arrays are even more annoying to edit in
C, after all – unlike TASM, the old C++ we have to work with doesn't
support binary number literals, only hexadecimal or, gasp, octal.
Without the explicit funding for such a converter,
I reached out to
GitHub, asking backers and outside contributors whether they'd be in
favor of it. As something that requires no RE skills and collides with
nothing else, it would be a perfect task for C/C++ coders who want to
support ReC98 with something other than money.
And surprisingly, those still exist!
Jonathan Campbell, of
went ahead and implemented all the required functionality, within just a
few days. Thanks again! The result is probably a lot more portable than it
would have been if I had written it. Which is pretty relevant for future
port authors – any additional tooling we write ourselves should not
add to the list of problems they'll have to worry about.
Right now, all of the sprites are #included from the big ASM
dump files, which means that they have to be converted before those files
are assembled during the 32-bit build part. We could have introduced a
third distinct build step there, perhaps even a 16-bit one so that we can
use Turbo C++ 4.0J to also compile the converter… However, the more
reasonable option was to do this at the beginning of the 32-bit build
step, and add a 32-bit Windows C++ compiler to the list of tools required
for ReC98's build process.
And the best choice for ReC98 is, in fact… 🥁… the 20-year-old Borland C++
5.5 freeware release.
See the README for a lengthy justification, as well as
So yes, all sprites mentioned in the GitHub issue can now be modded by
simply editing .BMP files, using an image editor of your choice. 🖌
And now that that's dealt with, it's finally time for more actual
progress! TH01 pellets coming tomorrow.
Did WindowsTiger just cover
2% over all games on his
own? While not all of that passed my review, +1.59% RE and +1.66% PI
over all 5 games is still pretty noteworthy, and comfortably pushes TH05
over the 25% mark in RE, and the 60% mark in PI.
While I definitely do appreciate such contributions, reviewing and
adapting these to my current code organization standards also takes more
time than I'd like it to take. And taken to this level, it does
kind of undermine this crowdfunding project, causing both a literal
denial of service and exactly the stress that this crowdfunding was
designed to avoid. Most of the time, I can't merge all of that as-is
without knowingly creating annoyances down the line. But I don't want to
just ignore it either, or reject every non-perfect commit…
That's also why I let it slide this time, due to some of the RE work in
there being genuinely amazing. In the future though, be aware that your
chance of having your work merged diminishes the further you move ahead of
my current master branch. In extreme cases like this one, I'll
then just be waiting until enough generic reverse-engineering pushes have
accrued, and treat the merge as regular work.
But now, time to continue with the regular programming… I am kind
of exhausted from all of this, so no bullets for the next two
Touhou Patch Center pushes, still… Good thing there's still plenty of
simpler things with big percentage gains to be done:
WindowsTiger mostly focused on OP.EXE which I tended to
neglect, as the big MAIN executables seemed to be more
interesting to my backers. (It's not like anyone ever requestedOP to be done either – like, who even cares about boring menu
source code, right?) Good that I therefore sort of left it as low-hanging
fruit to be grabbed by outside contributors – because now, TH04's and
TH05's OP.EXE are close to 100% position-independence. The
GENSOU.SCR format is pretty much the only thing missing there
right now, so let's finally go all the way there, I'd say.
And in TH05, there's still Reimu's shot type functions left to be
With no feedback to 📝 last week's blog post,
I assume you all are fine with how things are going? Alright then, another
one towards position independence, with the same approach as before…
Since -Tom- wanted to learn something about how the PC-98
EGC is used in TH04 and TH05, I took a look at master.lib's
egc_shift_*() functions. These simply do a hardware-accelerated
memmove() of any VRAM region, and are used for screen shaking
effects. Hover over the image below for the raw effect:
Then, I finally wanted to take a look at the bullet structures, but it
required way too much reverse-engineering to even start within ¾ of
a position independence push. Even with the help of uth05win –
bullet handling was changed quite a bit from TH04 to TH05.
What I ultimately settled on was more raw, "boring" PI work based around
an already known set of functions. For this one, I looked at vector
construction… and this time, that actually made the games a little
bit more position-independent, and wasn't just all about removing
false positives from the calculation. This was one of the few sets of
functions that would also apply to TH01, and it revealed just how
chaotically that game was coded. This one commit shows three ways how ZUN
stored regular 2D points in TH01:
"regularly", like in master.lib's Point structure (X
first, Y second)
reversed, (Y first and X second), then obviously with two distinct
variables declared next to each other
… yeah. But in more productive news, this did actually lay the
groundwork for TH04 and TH05 bullet structures. Which might even be coming
up within the next big, 5-push order from Touhou Patch Center? These are
the priorities I got from them, let's see how close I can get!
So, here we have the first two pushes with an explicit focus on position
independence… and they start out looking barely different from regular
reverse-engineering? They even already deduplicate a bunch of item-related
code, which was simple enough that it required little additional work?
Because the actual work, once again, was in comparing uth05win's
interpretations and naming choices with the original PC-98 code? So that
we only ended up removing a handful of memory references there?
(Oh well, you can mod item drops now!)
So, continuing to interpret PI as a mere by-product of reverse-engineering
might ultimately drive up the total PI cost quite a bit. But alright then,
let's systematically clear out some false positives by looking at
master.lib function calls instead… and suddenly we get the PI progress we
were looking for, nicely spread out over all games since TH02. That kinda
makes it sound like useless work, only done because it's dictated by some
counting algorithm on a website. But decompilation will want to convert
all of these values to decimal anyway. We're merely doing that right now,
across all games.
Then again, it doesn't actually make any game more
position-independent, and only proves how position-independent it already
was. So I'm really wondering right now whether I should just rush
actual position independence by simply identifying structures and
their sizes, and not bother with members or false positives until that's
done. That would certainly get the job done for TH04 and TH05 in just a
few more pushes, but then leave all the proving work (and the road
to 100% PI on the front page) to reverse-engineering.
I don't know. Would it be worth it to have a game that's "maybe
fully position-independent", only for there to maybe be rare edge
cases where it isn't?
Or maybe, continuing to strike a balance between identifying false
positives (fast) and reverse-engineering structures (slow) will continue
to work out like it did now, and make us end up close to the current
estimate, which was attractive enough to sell out the crowdfunding for the
first time… 🤔
Please give feedback! If possible, by Friday evening UTC+1, before I start
working on the next PI push, this time with a focus on TH04.