And then, the supposed boilerplate code revealed yet another confusing issue
that quickly forced me back to serial work, leading to no parallel progress
made with Shuusou Gyoku after all. 🥲 The list of functions I put together
for the first ½ of this push seemed so boring at first, and I was so sure
that there was almost nothing I could possibly talk about:
TH02's gaiji animations at the start and end of each stage, resembling
opening and closing window blind slats. ZUN should have maybe not defined
the regular whitespace gaiji as what's technically the last frame of the
closing animation, but that's a minor nitpick. Nothing special there
The remaining spawn functions for TH04's and TH05's gather circles. The
only dumb antic there is the way ZUN initializes the template for bullets
fired at the end of the animation, featuring ASM instructions that are
equivalent to what Turbo C++ 4.0J generates for the __memcpy__
intrinsic, but show up in a different order. Which means that they must have
been handwritten. I already figured that out in 2022
though, so this was just more of the same.
EX-Alice's override for the game's main 16×16 sprite sheet, loaded
during her dialog script. More of a naming and consistency challenge, if
The rendering function for the big 48×48 explosion sprite, which also
features the same clipping quirk?
That's three instances of ZUN removing sprites way earlier than you'd want
to, intentionally deciding against those sprites flying smoothly in and out
of the playfield. Clearly, there has to be a system and a reason behind it.
Turns out that it can be almost completely blamed on master.lib. None of the
super_*() sprite blitting functions can clip the rendered
sprite to the edges of VRAM, and much less to the custom playfield rectangle
we would actually want here. This is exactly the wrong choice to make for a
game engine: Not only is the game developer now stuck with either rendering
the sprite in full or not at all, but they're also left with the burden of
manually calculating when not to display a sprite.
However, strictly limiting the top-left screen-space coordinate to
(0, 0) and the bottom-right one to (640, 400) would actually
stop rendering some of the sprites much earlier than the clipping conditions
we encounter in these games. So what's going on there?
The answer is a combination of playfield borders, hardware scrolling, and
master.lib needing to provide at least some help to support the
latter. Hardware scrolling on PC-98 works by dividing VRAM into two vertical
partitions along the Y-axis and telling the GDC to display one of them at
the top of the screen and the other one below. The contents of VRAM remain
unmodified throughout, which raises the interesting question of how to deal
with sprites that reach the vertical edges of VRAM. If the top VRAM row that
starts at offset 0x0000 ends up being displayed below
the bottom row of VRAM that starts at offset 0x7CB0 for 399 of
the 400 possible scrolling positions, wouldn't we then need to vertically
wrap most of the rendered sprites?
For this reason, master.lib provides the super_roll_*()
functions, which unconditionally perform exactly this vertical wrapping. But
this creates a new problem: If these functions still can't clip, and don't
even know which VRAM rows currently correspond to the top and bottom row of
the screen (since master.lib's graph_scrollup() function
doesn't retain this information), won't we also see sprites wrapping around
the actual edges of the screen? That's something we certainly
wouldn't want in a vertically scrolling game…
The answer is yes, and master.lib offers no solution for this issue. But
this is where the playfield borders come in, and helpfully cover 16 pixels
at the top and 16 pixels at the bottom of the screen. As a result, they can
hide up to 32 rows of potentially wrapped sprite pixels below them:
And that's how the lowest possible top Y coordinate for sprites blitted
using the master.lib super_roll_*() functions during the
scrolling portions of TH02, TH04, and TH05 is not 0, but -16. Any lower, and
you would actually see some of the sprite's upper pixels at the
bottom of the playfield, as there are no more opaque black text cells to
cover them. Theoretically, you could lower this number for
some animation frames that start with multiple rows of transparent
pixels, but I thankfully haven't found any instance of ZUN using such a
hack. So far, at least…
Visualized like that, it all looks quite simple and logical, but for days, I
did not realize that these sprites were rendered to a scrolling VRAM.
This led to a much more complicated initial explanation involving the
invisible extra space of VRAM between offsets 0x7D00 and
0x7FFF that effectively grant a hidden additional 9.6 lines
below the playfield. Or even above, since PC-98 hardware ignores the highest
bit of any offset into a VRAM bitplane segment
(& 0x7FFF), which prevents blitting operations from
accidentally reaching into a different bitplane. Together with the
aforementioned rows of transparent pixels at the top of these midboss
sprites, the math would have almost worked out exactly.
The need for manual clipping also applies to the X-axis. Due to the lack of
scrolling in this dimension, the boundaries there are much more
straightforward though. The minimum left coordinate of a sprite can't fall
below 0 because any smaller coordinate would wrap around into the
📝 tile source area and overwrite some of the
pixels there, which we obviously don't want to re-blit every frame.
Similarly, the right coordinate must not extend into the HUD, which starts
at 448 pixels.
The last part might be surprising if you aren't familiar with the PC-98 text
chip. Contrary to the CGA and VGA text modes of IBM-compatibles, PC-98 text
cells can only use a single color for either their foreground or
background, with the other pixels being transparent and always revealing the
pixels in VRAM below. If you look closely at the HUD in the images above,
you can see how the background of cells with gaiji glyphs is slightly
brighter (◼ #100) than the opaque black
cells (◼ #000) surrounding them. This
rather custom color clearly implies that those pixels must have been
rendered by the graphics GDC. If any other sprite was rendered below the
HUD, you would equally see it below the glyphs.
So in the end, I did find the clear and logical system I was looking for,
and managed to reduce the new clipping conditions down to a
set of basic rules for each edge. Unfortunately, we also need a second
macro for each edge to differentiate between sprites that are smaller or
larger than the playfield border, which is treated as either 32×32 (for
super_roll_*()) or 32×16 (for non-"rolling"
super_*() functions). Since smaller sprites can be fully
contained within this border, the games can stop rendering them as soon as
their bottom-right coordinate is no longer seen within the playfield, by
comparing against the clipping boundaries with <= and
>=. For example, a 16×16 sprite would be completely
invisible once it reaches (16, 0), so it would still be rendered at
(17, 1). A larger sprite during the scrolling part of a stage, like,
say, the 64×64 midbosses, would still be rendered if their top-left
coordinate was (0, -16), so ZUN used < and
> comparisons to at least get an additional pixel before
having to stop rendering such a sprite. Turbo C++ 4.0J sadly can't
constant-fold away such a difference in comparison operators.
And for the most part, ZUN did follow this system consistently. Except for,
of course, the typical mistakes you make when faced with such manual
decisions, like how he treated TH04's Stage 4 midboss as a "small" sprite
below 32×32 pixels (it's 64×64), losing that precious one extra pixel. Or
how the entire rendering code for the 48×48 boss explosion sprite pretends
that it's actually 64×64 pixels large, which causes even the initial
transformation into screen space to be misaligned from the get-go.
But these are additional bugs on top of the single
one that led to all this research.
Because that's what this is, a bug. 🐞 Every resulting pixel boundary is a
systematic result of master.lib's unfortunate lack of clipping. It's as much
of a bug as TH01's byte-aligned rendering of entities whose internal
position is not byte-aligned. In both cases, the entities are alive,
simulated, and partake in collision detection, but their rendered appearance
doesn't accurately reflect their internal position.
Initially, I classified
📝 the sudden pop-in of TH05's Stage 5 midboss
as a quirk because we had no conclusive evidence that this wasn't
intentional, but now we do. There have been multiple explanations for why
ZUN put borders around the playfield, but master.lib's lack of sprite
clipping might be the biggest reason.
And just like byte-aligned rendering, the clipping conditions can easily be
removed when porting the game away from PC-98 hardware. That's also what
uth05win chose to do: By using OpenGL and not having to rely on hardware
scrolling, it can simply place every sprite as a textured quad at its exact
position in screen space, and then draw the black playfield borders on top
in the end to clip everything in a single draw call. This way, the Stage 5
midboss can smoothly fly into the playfield, just as defined by its movement
Meanwhile, I designed the interface of the 📝 generic blitter used in the TH01 Anniversary Edition entirely around
clipping the blitted sprite at any explicit combination of VRAM edges. This
was nothing I tacked on in the end, but a core aspect that informed the
architecture of the code from the very beginning. You really want to
have one and only one place where sprite clipping is done right – and
only once per sprite, regardless of how many bitplanes you want to write to.
Which brings us to the goal that the final ¼ of this push went toward. I
thought I was going to start cleaning up the
📝 player movement and rendering code, but
that turned out too complicated for that amount of time – especially if you
want to start with just cleanup, preserving all original bugs for the
Fixing and smoothening player and Orb movement would be the next big task in
Anniversary Edition development, needing about 3 pushes. It would start with
more performance research into runtime-shifting of larger sprites, followed
by extending my generic blitter according to the results, writing new
optimized loaders for the original image formats, and finally rewriting all
rendering code accordingly. With that code in place, we can then start
cleaning up and fixing the unique code for each boss, one by one.
Until that's funded, the code still contains a few smaller and easier pieces
of code that are equally related to rendering bugs, but could be dealt with
in a more incremental way. Line rendering is one of those, and first needs
some refactoring of every call site, including
📝 the rotating squares around Mima and
📝 YuugenMagan's pentagram. So far, I managed
to remove another 1,360 bytes from the binary within this final ¼ of a push,
but there's still quite a bit to do in that regard.
This is the perfect kind of feature for smaller (micro-)transactions. Which
means that we've now got meaningful TH01 code cleanup and Anniversary
Edition subtasks at every price range, no matter whether you want to invest
a lot or just a little into this goal.
If you can, because Ember2528 revealed the plan behind
his Shuusou Gyoku contributions: A full-on Linux port of the game, which
will be receiving all the funding it needs to happen. 🐧 Next up, therefore:
Turning this into my main project within ReC98 for the next couple of
months, and getting started by shipping the long-awaited first step towards
I've raised the cap to avoid the potential of rounding errors, which might
prevent the last needed Shuusou Gyoku push from being correctly funded. I
already had to pick the larger one of the two pending TH02 transactions for
this push, because we would have mathematically ended up
1/25500 short of a full push with the smaller
transaction. And if I'm already at it, I might
as well free up enough capacity to potentially ship the complete OpenGL
backend in a single delivery, which is currently estimated to cost 7 pushes
More than three months without any reverse-engineering progress! It's been
way too long. Coincidentally, we're at least back with a surprising 1.25% of
overall RE, achieved within just 3 pushes. The ending script system is not
only more or less the same in TH04 and TH05, but actually originated in
TH03, where it's also used for the cutscenes before stages 8 and 9. This
means that it was one of the final pieces of code shared between three of
the four remaining games, which I got to decompile at roughly 3× the usual
speed, or ⅓ of the price.
The only other bargains of this nature remain in OP.EXE. The
Music Room is largely equivalent in all three remaining games as well, and
the sound device selection, ZUN Soft logo screens, and main/option menus are
the same in TH04 and TH05. A lot of that code is in the "technically RE'd
but not yet decompiled" ASM form though, so it would shift Finalized% more
significantly than RE%. Therefore, make sure to order the new
Finalization option rather than Reverse-engineering if you
want to make number go up.
So, cutscenes. On the surface, the .TXT files look simple enough: You
directly write the text that should appear on the screen into the file
without any special markup, and add commands to define visuals, music, and
other effects at any place within the script. Let's start with the basics of
how text is rendered, which are the same in all three games:
First off, the text area has a size of 480×64 pixels. This means that it
does not correspond to the tiled area painted into TH05's
Since the font weight can be customized, all text is rendered to VRAM.
This also includes gaiji, despite them ignoring the font weight
The system supports automatic line breaks on a per-glyph basis, which
move the text cursor to the beginning of the red text area. This might seem like a piece of long-forgotten
ancient wisdom at first, considering the absence of automatic line breaks in
Windows Touhou. However, ZUN probably implemented it more out of pure
necessity: Text in VRAM needs to be unblitted when starting a new box, which
is way more straightforward and performant if you only need to worry
about a fixed area.
The system also automatically starts a new (key press-separated) text
box after the end of the 4th line. However, the text cursor is
also unconditionally moved to the top-left corner of the yellow name
area when this happens, which is almost certainly not what you expect, given
that automatic line breaks stay within the red area. A script author might
as well add the necessary text box change commands manually, if you're
forced to anticipate the automatic ones anyway…
Due to ZUN forgetting an unblitting call during the TH05 refactoring of the
box background buffer, this feature is even completely broken in that game,
as any new text will simply be blitted on top of the old one:
Overall, the system is geared toward exclusively full-width text. As
exemplified by the 2014 static English patches and the screenshots in this
blog post, half-width text is possible, but comes with a lot of
Each loop of the script interpreter starts by looking at the next
byte to distinguish commands from text. However, this step also skips
over every ASCII space and control character, i.e., every byte
≤ 32. If you only intend to display full-width glyphs anyway, this
sort of makes sense: You gain complete freedom when it comes to the
physical layout of these script files, and it especially allows commands
to be freely separated with spaces and line breaks for improved
readability. Still, enforcing commands to be separated exclusively by
line breaks might have been even better for readability, and would have
freed up ASCII spaces for regular text…
Non-command text is blindly processed and rendered two bytes at a
time. The rendering function interprets these bytes as a Shift-JIS
string, so you can use half-width characters here. While the
second byte can even be an ASCII 0x20 space due to the
parser's blindness, all half-width characters must still occur in pairs
that can't be interrupted by commands:
As a workaround for at least the ASCII space issue, you can replace
them with any of the unassigned
Shift-JIS lead bytes – 0x80, 0xA0, or
anything between 0xF0 and 0xFF inclusive.
That's what you see in all screenshots of this post that display
Finally, did you know that you can hold ESC to fast-forward
through these cutscenes, which skips most frame delays and reduces the rest?
Due to the blocking nature of all commands, the ESC key state is
only updated between commands or 2-byte text groups though, so it can't
interrupt an ongoing delay.
Superficially, the list of game-specific differences doesn't look too long,
and can be summarized in a rather short table:
It's when you get into the implementation that the combined three systems
reveal themselves as a giant mess, with more like 56 differences between the
games. Every single new weird line of code opened up
another can of worms, which ultimately made all of this end up with 24
pieces of bloat and 14 bugs. The worst of these should be quite interesting
for the general PC-98 homebrew developers among my audience:
The final official 0.23 release of master.lib has a bug in
graph_gaiji_put*(). To calculate the JIS X 0208 code point for
a gaiji, it is enough to ADD 5680h onto the gaiji ID. However,
these functions accidentally use ADC instead, which incorrectly
adds the x86 carry flag on top, causing weird off-by-one errors based on the
previous program state. ZUN did fix this bug directly inside master.lib for
TH04 and TH05, but still needed to work around it in TH03 by subtracting 1
from the intended gaiji ID. Anyone up for maintaining a bug-fixed master.lib
The worst piece of bloat comes from TH03 and TH04 needlessly
switching the visibility of VRAM pages while blitting a new 320×200 picture.
This makes it much harder to understand the code, as the mere existence of
these page switches is enough to suggest a more complex interplay between
the two VRAM pages which doesn't actually exist. Outside this visibility
switch, page 0 is always supposed to be shown, and page 1 is always used
for temporarily storing pixels that are later crossfaded onto page 0. This
is also the only reason why TH03 has to render text and gaiji onto both VRAM
pages to begin with… and because TH04 doesn't, changing the picture in the
middle of a string of text is technically bugged in that game, even though
you only get to temporarily see the new text on very underclocked PC-98
These performance implications made me wonder why cutscenes even bother with
writing to the second VRAM page anyway, before copying each crossfade step
to the visible one.
📝 We learned in June how costly EGC-"accelerated" inter-page copies are;
shouldn't it be faster to just blit the image once rather than twice?
Well, master.lib decodes .PI images into a packed-pixel format, and
unpacking such a representation into bitplanes on the fly is just about the
worst way of blitting you could possibly imagine on a PC-98. EGC inter-page
copies are already fairly disappointing at 42 cycles for every 16 pixels, if
we look at the i486 and ignore VRAM latencies. But under the same
conditions, packed-pixel unpacking comes in at 81 cycles for every 8
pixels, or almost 4× slower. On lower-end systems, that can easily sum up to
more than one frame for a 320×200 image. While I'd argue that the resulting
tearing could have been an acceptable part of the transition between two
images, it's understandable why you'd want to avoid it in favor of the
pure effect on a slower framerate.
Really makes me wonder why master.lib didn't just directly decode .PI images
into bitplanes. The performance impact on load times should have been
negligible? It's such a good format for
the often dithered 16-color artwork you typically see on PC-98, and
deserves better than master.lib's implementation which is both slow to
decode and slow to blit.
That brings us to the individual script commands… and yes, I'm going to
document every single one of them. Some of their interactions and edge cases
are not clear at all from just looking at the code.
Almost all commands are preceded by… well, a 0x5C lead byte.
Which raises the question of whether we should
document it as an ASCII-encoded \ backslash, or a Shift-JIS-encoded
¥ yen sign. From a gaijin perspective, it seems obvious that it's a
backslash, as it's consistently displayed as one in most of the editors you
would actually use nowadays. But interestingly, iconv
-f shift-jis -t utf-8 does convert any 0x5C
lead bytes to actual ¥ U+00A5 YEN SIGN code points
Ultimately, the distinction comes down to the font. There are fonts
that still render 0x5C as ¥, but mainly do so out
of an obvious concern about backward compatibility to JIS X 0201, where this
mapping originated. Unsurprisingly, this group includes MS Gothic/Mincho,
the old Japanese fonts from Windows 3.1, but even Meiryo and Yu
Gothic/Mincho, Microsoft's modern Japanese fonts. Meanwhile, pretty much
every other modern font, and freely licensed ones in particular, render this
code point as \, even if you set your editor to Shift-JIS. And
while ZUN most definitely saw it as a ¥, documenting this code
point as \ is less ambiguous in the long run. It can only
possibly correspond to one specific code point in either Shift-JIS or UTF-8,
and will remain correct even if we later mod the cutscene system to support
Now we've only got to clarify the parameter syntax, and then we can look at
the big table of commands:
Numeric parameters are read as sequences of up to 3 ASCII digits. This
limits them to a range from 0 to 999 inclusive, with 000 and
0 being equivalent. Because there's no further sentinel
character, any further digit from the 4th one onwards is
interpreted as regular text.
Filename parameters must be terminated with a space or newline and are
limited to 12 characters, which translates to 8.3 basenames without any
directory component. Any further characters are ignored and displayed as
text as well.
Each .PI image can contain up to four 320×200 pictures ("quarters") for
the cutscene picture area. In the script commands, they are numbered like
Clears both VRAM pages by filling them with VRAM color 0. 🐞
In TH03 and TH04, this command does not update the internal text area
background used for unblitting. This bug effectively restricts usage of
this command to either the beginning of a script (before the first
background image is shown) or its end (after no more new text boxes are
started). See the image below for an
example of using it anywhere else.
Sets the font weight to a value between 0 (raw font ROM glyphs) to 3
(very thicc). Specifying any other value has no effect.
🐞 In TH04 and TH05, \b3 leads to glitched pixels when
rendering half-width glyphs due to a bug in the newly micro-optimized
ASM version of
📝 graph_putsa_fx(); see the image below for an example.
In these games, the parameter also directly corresponds to the
graph_putsa_fx() effect function, removing the sanity check
that was present in TH03. In exchange, you can also access the four
dissolve masks for the bold font (\b2) by specifying a
parameter between 4 (fewest pixels) to 7 (most
pixels). Demo video below.
Changes the text color to VRAM color 15.
Adds a color map entry: If 字 is the first code point
inside the name area on a new line, the text color is automatically set
to 15. Up to 8 such entries can be registered
before overflowing the statically allocated buffer.
🐞 The comma is assumed to be present even if the color parameter is omitted.
Plays the sound effect with the given ID.
Calls master.lib's palette_black_in() or
palette_black_out() to play a hardware palette fade
animation from or to black, spending roughly 1 frame on each of the 16 fade steps.
Fades out BGM volume via PMD's AH=02h interrupt call,
in a non-blocking way. The fade speed can range from 1 (slowest) to 127 (fastest).
Values from 128 to 255 technically correspond to
AH=02h's fade-in feature, which can't be used from cutscene
scripts because it requires BGM volume to first be lowered via
AH=19h, and there is no command to do that.
Plays a blocking 8-frame screen shake
Shows the gaiji with the given ID from 0 to 255
at the current cursor position. Even in TH03, gaiji always ignore the
text delay interval configured with \v.
TH05's replacement for the \ga command from TH03 and
TH04. The default ID of 3 corresponds to the
gaiji. Not to be confused with \@, which starts with a backslash,
unlike this command.
Shows the gaiji.
Shows the gaiji.
Shows the gaiji.
Shows the gaiji.
Shows the gaiji.
Shows the gaiji.
Waits 0 frames (0 = forever) for an advance key to be pressed before
continuing script execution. Before waiting, TH05 crossfades in any new
text that was previously rendered to the invisible VRAM page…
🐞 …but TH04 doesn't, leaving the text invisible during the wait time.
As a workaround, \vp1 can be
used before \k to immediately display that text without a
Stops the currently playing BGM.
Restarts playback of the currently loaded BGM from the
Stops the currently playing BGM, loads a new one from the given
file, and starts playback.
Starts a new line at the leftmost X coordinate of the box, i.e., the
start of the name area. This is how scripts can "change" the name of the
currently speaking character, or use the entire 480×64 pixels without
being restricted to the non-name area.
Note that automatic line breaks already move the cursor into a new line.
Using this command at the "end" of a line with the maximum number of 30
full-width glyphs would therefore start a second new line and leave the
previously started line empty.
If this command moved the cursor into the 5th line of a box,
\s is executed afterward, with
any of \n's parameters passed to \s.
Deallocates the loaded .PI image.
Loads the .PI image with the given file into the single .PI slot
available to cutscenes. TH04 and TH05 automatically deallocate any
previous image, 🐞 TH03 would leak memory without a manual prior call to
Sets the hardware palette to the one of the loaded .PI image.
Sets the loaded .PI image as the full-screen 640×400 background
image and overwrites both VRAM pages with its pixels, retaining the
current hardware palette.
Runs \pp followed by \p@.
Ends a text box and starts a new one. Fades in any text rendered to
the invisible VRAM page, then waits 0 frames
(0 = forever) for an advance key to be
pressed. Afterward, the new text box is started with the cursor moved to
the top-left corner of the name area. \s- skips the wait time and starts the new box
Sets palette brightness via master.lib's
palette_settone() to any value from 0 (fully black) to 200
(fully white). 100 corresponds to the palette's original colors.
Preceded by a 1-frame delay unless ESC is held.
Sets the number of frames to wait between every 2 bytes of rendered
Sets the number of frames to spend on each of the 4 fade
steps when crossfading between old and new text. The game-specific
default value is also used before the first use of this command.
Shows VRAM page 0. Completely useless in
TH03 (this game always synchronizes both VRAM pages at a command
boundary), only of dubious use in TH04 (for working around a bug in \k), and the games always return to
their intended shown page before every blitting operation anyway. A
debloated mod of this game would just remove this command, as it exposes
an implementation detail that script authors should not need to worry
about. None of the original scripts use it anyway.
\w and \wk wait for the given number
\wm and \wmk wait until PMD has played
back the current BGM for the total number of measures, including
loops, given in the first parameter, and fall back on calling
\w and \wk with the second parameter as
the frame number if BGM is disabled.
🐞 Neither PMD nor MMD reset the internal measure when stopping
playback. If no BGM is playing and the previous BGM hasn't been
played back for at least the given number of measures, this command
Since both TH04 and TH05 fade in any new text from the invisible VRAM
page, these commands can be used to simulate TH03's typing effect in
those games. Demo video below.
Contrary to \k and \s, specifying 0 frames would
simply remove any frame delay instead of waiting forever.
The TH03-exclusive k variants allow the delay to be
interrupted if ⏎ Return or Shot are held down.
TH04 and TH05 recognize the k as well, but removed its
All of these commands have no effect if ESC is held.
Calls master.lib's palette_white_in() or
palette_white_out() to play a hardware palette fade
animation from or to white, spending roughly 1 frame on each of the 16 fade steps.
Immediately displays the given quarter of the loaded .PI image in
the picture area, with no fade effect. Any value ≥ 4 resets the picture area to black.
Crossfades the picture area between its current content and quarter
#4 of the loaded .PI image, spending 1 frame on each of the 4 fade steps unless
ESC is held. Any value ≥ 4 is
replaced with quarter #0.
Stops script execution. Must be called at the end of each file;
otherwise, execution continues into whatever lies after the script
buffer in memory.
TH05 automatically deallocates the loaded .PI image, TH03 and TH04
require a separate manual call to \p- to not leak its memory.
Bold values signify the default if the parameter
is omitted; \c is therefore
equivalent to \c15.
So yeah, that's the cutscene system. I'm dreading the moment I will have to
deal with the other command interpreter in these games, i.e., the
stage enemy system. Luckily, that one is completely disconnected from any
other system, so I won't have to deal with it until we're close to finishing
MAIN.EXE… that is, unless someone requests it before. And it
won't involve text encodings or unblitting…
The cutscene system got me thinking in greater detail about how I would
implement translations, being one of the main dependencies behind them. This
goal has been on the order form for a while and could soon be implemented
for these cutscenes, with 100% PI being right around the corner for the TH03
and TH04 cutscene executables.
Once we're there, the "Virgin" old-school way of static translation patching
for Latin-script languages could be implemented fairly quickly:
Establish basic UTF-8 parsing for less painful manual editing of the
Procedurally generate glyphs for the few required additional letters
based on existing font ROM glyphs. For example, we'd generate ä
by painting two short lines on top of the font ROM's a glyph,
or generate ¿ by vertically flipping the question mark. This
way, the text retains a consistent look regardless of whether the translated
game is run with an NEC or EPSON font ROM, or the that Neko Project II auto-generates if you
don't provide either.
(Optional) Change automatic line breaks to work on a per-word
basis, rather than per-glyph
That's it – script editing and distribution would be handled by your local
translation group. It might seem as if this would also work for Greek and
Cyrillic scripts due to their presence in the PC-98 font ROM, but I'm not
sure if I want to attempt procedurally shrinking these glyphs from 16×16 to
8×16… For any more thorough solution, we'd need to go for a more "Chad" kind
of full-blown translation support:
Implement text subdivisions at a sensible granularity while retaining
automatic line and box breaks
Compile translatable text into a Japanese→target language dictionary
(I'm too old to develop any further translation systems that would overwrite
modded source text with translations of the original text)
Implement a custom Unicode font system (glyphs would be taken from GNU
Unifont unless translators provide a different 8×16 font for their
Combine the text compiler with the font compiler to only store needed
glyphs as part of the translation's font file (dealing with a multi-MB font
file would be rather ugly in a Real Mode game)
Write a simple install/update/patch stacking tool that supports both
.HDI and raw-file DOSBox-X scenarios (it's different enough from thcrap to
warrant a separate tool – each patch stack would be statically compiled into
a single package file in the game's directory)
Add a nice language selection option to the main menu
(Optional) Support proportional fonts
Which sounds more like a separate project to be commissioned from
Touhou Patch Center's Open Collective funds, separate from the ReC98 cap.
This way, we can make sure that the feature is completely implemented, and I
can talk with every interested translator to make sure that their language
It's still cheaper overall to do this on PC-98 than to first port the games
to a modern system and then translate them. On the other hand, most
of the tasks in the Chad variant (3, 4, 5, and half of 2) purely deal with
the difficulty of getting arbitrary Unicode characters to work natively in a
PC-98 DOS game at all, and would be either unnecessary or trivial if we had
already ported the game. Depending on where the patrons' interests lie, it
may not be worth it. So let's see what all of you think about which
way we should go, or whether it's worth doing at all. (Edit
(2022-12-01): With Splashman's
order towards the stage dialogue system, we've pretty much confirmed that it
is.) Maybe we want to meet in the middle – using e.g. procedural glyph
generation for dynamic translations to keep text rendering consistent with
the rest of the PC-98 system, and just not support non-Latin-script
languages in the beginning? In any case, I've added both options to the
order form. Edit (2023-07-28):Touhou Patch Center has agreed to fund
a basic feature set somewhere between the Virgin and Chad level. Check the
📝 dedicated announcement blog post for more
details and ideas, and to find out how you can support this goal!
Surprisingly, there was still a bit of RE work left in the third push after
all of this, which I filled with some small rendering boilerplate. Since I
also wanted to include TH02's playfield overlay functions,
1/15 of that last push went towards getting a
TH02-exclusive function out of the way, which also ended up including that
game in this delivery.
The other small function pointed out how TH05's Stage 5 midboss pops into
the playfield quite suddenly, since its clipping test thinks it's only 32
pixels tall rather than 64:
Next up: Staying with TH05 and looking at more of the pattern code of its
boss fights. Given the remaining TH05 budget, it makes the most sense to
continue in in-game order, with Sara and the Stage 2 midboss. If more money
comes in towards this goal, I could alternatively go for the Mai & Yuki
fight and immediately develop a pretty fix for the cheeto storage
glitch. Also, there's a rather intricate
pull request for direct ZMBV decoding on the website that I've still got
TH05 has passed the 50% RE mark, with both MAIN.EXE and the
game as a whole! With that, we've also reached what -Tom-
wanted out of the project, so he's suspending his discount offer for a
Curve bullets are now officially called cheetos! 76.7% of
fans prefer this term, and it fits into the 8.3 DOS filename scheme much
better than homing lasers (as they're called in
OMAKE.TXT) or Taito
lasers (which would indeed have made sense as well).
…oh, and I managed to decompile Shinki within 2 pushes after all. That
left enough budget to also add the Stage 1 midboss on top.
So, Shinki! As far as final boss code is concerned, she's surprisingly
economical, with 📝 her background animations
making up more than ⅓ of her entire code. Going straight from TH01's
📝 final📝 bosses
to TH05's final boss definitely showed how much ZUN had streamlined
danmaku pattern code by the end of PC-98 Touhou. Don't get me wrong, there
is still room for improvement: TH05 not only
📝 reuses the same 16 bytes of generic boss state we saw in TH04 last month,
but also uses them 4× as often, and even for midbosses. Most importantly
though, defining danmaku patterns using a single global instance of the
group template structure is just bad no matter how you look at it:
The script code ends up rather bloated, with a single MOV
instruction for setting one of the fields taking up 5 bytes. By comparison,
the entire structure for regular bullets is 14 bytes large, while the
template structure for Shinki's 32×32 ball bullets could have easily been
reduced to 8 bytes.
Since it's also one piece of global state, you can easily forget to set
one of the required fields for a group type. The resulting danmaku group
then reuses these values from the last time they were set… which might have
been as far back as another boss fight from a previous stage.
And of course, I wouldn't point this out if it
didn't actually happen in Shinki's pattern code. Twice.
Declaring a separate structure instance with the static data for every
pattern would be both safer and more space-efficient, and there's
more than enough space left for that in the game's data segment.
But all in all, the pattern functions are short, sweet, and easy to follow.
patternis significantly more complex than the others, but still
far from TH01's final bosses at their worst. I especially like the clear
architectural separation between "one-shot pattern" functions that return
true once they're done, and "looping pattern" functions that
run as long as they're being called from a boss's main function. Not many
all too interesting things in these pattern functions for the most part,
except for two pieces of evidence that Shinki was coded after Yumeko:
The gather animation function in the first two phases contains a bullet
group configuration that looks like it's part of an unused danmaku
pattern. It quickly turns out to just be copy-pasted from a similar function
in Yumeko's fight though, where it is turned into actual
As one of the two places where ZUN forgot to set a template field, the
lasers at the end of the white wing preparation pattern reuse the 6-pixel
width of Yumeko's final laser pattern. This actually has an effect on
gameplay: Since these lasers are active for the first 8 frames after
Shinki's wings appear on screen, the player can get hit by them in the last
2 frames after they grew to their final width.
Speaking about that wing sprite: If you look at ST05.BB2 (or
any other file with a large sprite, for that matter), you notice a rather
weird file layout:
And it's not a limitation of the sprite width field in the BFNT+ header
either. Instead, it's master.lib's BFNT functions which are limited to
sprite widths up to 64 pixels… or at least that's what
MASTER.MAN claims. Whatever the restriction was, it seems to be
completely nonexistent as of master.lib version 0.23, and none of the
master.lib functions used by the games have any issues with larger
Since ZUN stuck to the supposed 64-pixel width limit though, it's now the
game that expects Shinki's winged form to consist of 4 physical
sprites, not just 1. Any conversion from another, more logical sprite sheet
layout back into BFNT+ must therefore replicate the original number of
sprites. Otherwise, the sequential IDs ("patnums") assigned to every newly
loaded sprite no longer match ZUN's hardcoded IDs, causing the game to
crash. This is exactly what used to happen with -Tom-'s
MysticTK automation scripts,
which combined these exact sprites into a single large one. This issue has
now been fixed – just in case there are some underground modders out there
who used these scripts and wonder why their game crashed as soon as the
Shinki fight started.
And then the code quality takes a nosedive with Shinki's main function.
Even in TH05, these boss and midboss update
functions are still very imperative:
The origin point of all bullet types used by a boss must be manually set
to the current boss/midboss position; there is no concept of a bullet type
tracking a certain entity.
The same is true for the target point of a player's homing shots…
… and updating the HP bar. At least the initial fill animation is
abstracted away rather decently.
Incrementing the phase frame variable also must be done manually. TH05
even "innovates" here by giving the boss update function exclusive ownership
of that variable, in contrast to TH04 where that ownership is given out to
the player shot collision detection (?!) and boss defeat helper
Speaking about collision detection: That is done by calling different
functions depending on whether the boss is supposed to be invincible or
Timeout conditions? No standard way either, and all done with manual
if statements. In combination with the regular phase end
condition of lowering (mid)boss HP to a certain value, this leads to quite a
convoluted control flow.
The manual calls to the score bonus functions for cleared phases at least provide some sense of orientation.
One potentially nice aspect of all this imperative freedom is that
phases can end outside of HP boundaries… by manually incrementing the
phase variable and resetting the phase frame variable to 0.
The biggest WTF in there, however, goes to using one of the 16 state bytes
as a "relative phase" variable for differentiating between boss phases that
share the same branch within the switch(boss.phase)
statement. While it's commendable that ZUN tried to reduce code duplication
for once, he could have just branched depending on the actual
boss.phase variable? The same state byte is then reused in the
"devil" pattern to track the activity state of the big jerky lasers in the
second half of the pattern. If you somehow managed to end the phase after
the first few bullets of the pattern, but before these lasers are up,
Shinki's update function would think that you're still in the phase
before the "devil" pattern. The main function then sequence-breaks
right to the defeat phase, skipping the final pattern with the burning Makai
background. Luckily, the HP boundaries are far away enough to make this
impossible in practice.
The takeaway here: If you want to use the state bytes for your custom
boss script mods, alias them to your own 16-byte structure, and limit each
of the bytes to a clearly defined meaning across your entire boss script.
One final discovery that doesn't seem to be documented anywhere yet: Shinki
actually has a hidden bomb shield during her two purple-wing phases.
uth05win got this part slightly wrong though: It's not a complete
shield, and hitting Shinki will still deal 1 point of chip damage per
frame. For comparison, the first phase lasts for 3,000 HP, and the "devil"
pattern phase lasts for 5,800 HP.
And there we go, 3rd PC-98 Touhou boss
script* decompiled, 28 to go! 🎉 In case you were expecting a fix for
the Shinki death glitch: That one
is more appropriately fixed as part of the Mai & Yuki script. It also
requires new code, should ideally look a bit prettier than just removing
cheetos between one frame and the next, and I'd still like it to fit within
the original position-dependent code layout… Let's do that some other
Not much to say about the Stage 1 midboss, or midbosses in general even,
except that their update functions have to imperatively handle even more
subsystems, due to the relative lack of helper functions.
The remaining ¾ of the third push went to a bunch of smaller RE and
finalization work that would have hardly got any attention otherwise, to
help secure that 50% RE mark. The nicest piece of code in there shows off
what looks like the optimal way of setting up the
📝 GRCG tile register for monochrome blitting
in a variable color:
mov ah, palette_index ; Any other non-AL 8-bit register works too.
; (x86 only supports AL as the source operand for OUTs.)
rept 4 ; For all 4 bitplanes…
shr ah, 1 ; Shift the next color bit into the x86 carry flag
sbb al, al ; Extend the carry flag to a full byte
; (CF=0 → 0x00, CF=1 → 0xFF)
out 7Eh, al ; Write AL to the GRCG tile register
Thanks to Turbo C++'s inlining capabilities, the loop body even decompiles
into a surprisingly nice one-liner. What a beautiful micro-optimization, at
a place where micro-optimization doesn't hurt and is almost expected.
Unfortunately, the micro-optimizations went all downhill from there,
becoming increasingly dumb and undecompilable. Was it really necessary to
save 4 x86 instructions in the highly unlikely case of a new spark sprite
being spawned outside the playfield? That one 2D polar→Cartesian
conversion function then pointed out Turbo C++ 4.0J's woefully limited
support for 32-bit micro-optimizations. The code generation for 32-bit
📝 pseudo-registers is so bad that they almost
aren't worth using for arithmetic operations, and the inline assembler just
flat out doesn't support anything 32-bit. No use in decompiling a function
that you'd have to entirely spell out in machine code, especially if the
same function already exists in multiple other, more idiomatic C++
Rounding out the third push, we got the TH04/TH05 DEMO?.REC
replay file reading code, which should finally prove that nothing about the
game's original replay system could serve as even just the foundation for
community-usable replays. Just in case anyone was still thinking that.
Next up: Back to TH01, with the Elis fight! Got a bit of room left in the
cap again, and there are a lot of things that would make a lot of
TH04 would really enjoy a large number of dedicated pushes to catch up
with TH05. This would greatly support the finalization of both games.
Continuing with TH05's bosses and midbosses has shown to be good value
for your money. Shinki would have taken even less than 2 pushes if she
hadn't been the first boss I looked at.
Oh, and I also added Seihou as a selectable goal, for the two people out
there who genuinely like it. If I ever want to quit my day job, I need to
branch out into safer territory that isn't threatened by takedowns, after
Here we go, new C code! …eh, it will still take a bit to really get
decompilation going at the speeds I was hoping for. Especially with the
sheer amount of stuff that is set in the first few significant
functions we actually can decompile, which now all has to be
correctly declared in the C world. Turns out I spent the last 2 years
screwing up the case of exported functions, and even some of their names,
so that it didn't actually reflect their calling convention… yup. That's
just the stuff you tend to forget while it doesn't matter.
To make up for that, I decided to research whether we can make use of some
C++ features to improve code readability after all. Previously, it seemed
that TH01 was the only game that included any C++ code, whereas TH02 and
later seemed to be 100% C and ASM. However, during the development of the
soon to be released new build system, I noticed that even this old
compiler from the mid-90's, infamous for prioritizing compile speeds over
all but the most trivial optimizations, was capable of quite surprising
levels of automatic inlining with class methods…
…leading the research to culminate in the mindblow that is
9d121c7 – yes, we can use C++ class methods
and operator overloading to make the code more readable, while still
generating the same code than if we had just used C and preprocessor
Looks like there's now the potential for a few pull requests from outside
devs that apply C++ features to improve the legibility of previously
decompiled and terribly macro-ridden code. So, if anyone wants to help
without spending money…
Back to actual development! Starting off this stretch with something
fairly mechanical, the few remaining generic boss and midboss state
variables. And once we start converting the constant numbers used for and
around those variables into decimal, the estimated position independence
probability immediately jumped by 5.31% for TH04's MAIN.EXE,
and 4.49% for TH05's – despite not having made the game any more position-
independent than it was before. Yup… lots of false positives in there, but
who can really know for sure without having put in the work.
But now, we've RE'd enough to finally decompile something again next,
4 years after the last decompilation of anything!
So, after introducing instruction number statistics… let's go for over 2,000 lines that won't show up there immediately That being (mid-)boss HP, position, and sprite ID variables for TH04/TH05. Doesn't sound like much, but it kind of is if you insist on decimal numbers for easier comparison with uth05win's source code.