And we're back to PC-98 Touhou for a brief interruption of the ongoing Shuusou Gyoku Linux port.
Let's clear some of the Touhou-related progress from the backlog, and use
the unconstrained nature of these contributions to prepare the
📝 upcoming non-ASCII translations commissioned by Touhou Patch Center.
The current budget won't cover all of my ambitions, but it would at least be
nice if all text in these games was feasibly translatable by the time I
officially start working on that project.
At a little over 3 pushes, it might be surprising to see that this took
longer than the
📝 TH03/TH04/TH05 cutscene system. It's
obvious that TH02 started out with a different system for in-game dialog,
but while TH04 and TH05 look identical on the surface, they only
actually share 30% of their dialog code. So this felt more like decompiling
2.4 distinct systems, as opposed to one identical base with tons of
game-specific differences on top.
The table of contents was pretty popular last time around, so let's have
another one:
Let's start with the ones from TH04 and TH05, since they are not that
broken. For TH04, ZUN started out by copy-pasting the cutscene system,
causing the result to inherit many of the caveats I already described in the
cutscene blog post:
It's still a plaintext format geared exclusively toward full-width
Japanese text.
The parser still ignores all whitespace, forcing ASCII text into hacks
with unassigned Shift-JIS lead bytes outside the second byte of a 2-byte
chunk.
Commands are still preceded by a 0x5C byte, which renders
as either a \ or a ¥ depending on your font and
interpretation of Shift-JIS.
Command parameters are parsed in exactly the same way, with all the same
limits.
A lot of the same script commands are identical, including 7 of them
that were not used in TH04's original dialog scripts.
Then, however, he greatly simplified the system. Mainly, this was done by
moving text rendering from the PC-98 graphics chip to the text chip, which
avoids the need for any text-related unblitting code, but ZUN also added a
bunch of smaller changes:
The player must advance through every dialog box by releasing any held
keys and then pressing any key mapped to a game action. There are no
timeouts.
The delay for every 2 bytes of text was doubled to 2 frames, and can't
be overridden.
Instead of holding ESC to fast-forward, pressing any key
will immediately print the entire rest of a text box.
Dialogs run in their own single-buffered frame loop, interrupting the
rest of the game. The other VRAM page keeps the background pixels required
for unblitting the face images.
All script commands that affect the graphics layer are preceded by a
1-frame delay. ZUN most likely did this because of the single-buffered
nature, as it prevents tearing on the first frame by waiting for the CRT
beam to return to the top-left corner before changing any pixels.
Both boxes are intended to contain up to 30 half-width characters on
each of their up to 3 lines, but nothing in the code enforces these limits.
There is no support for automatic line breaks or starting new boxes.
TH05 then moved from TH04's plaintext scripts to the binary
.TX2 format while removing all the unused commands copy-pasted
from the cutscene system. Except for a
single additional command intended to clear a text box, TH05's dialog
system only supports a strict subset of the features of TH04's system.
This change also introduced the following differences compared to TH04:
The game now stores the dialog of all 4 playable characters in the same
file, with a (4 + 1)-word header that indicates the byte offset
and length of each character's script. This way, it can load only the one
script for the currently played character.
Since there is no need for whitespace in a binary format, you can now
use ASCII 0x20 spaces even as the first byte of a 2-byte text
chunk! 🥳
All command parameters are now mandatory.
Filenames are now passed directly by pointer to the respective game
function. Therefore, they now need to be null-terminated, but can in turn be
as long as
📝 the number of remaining bytes in the allocated dialog segment.
In practice though, the game still runs on DOS and shares its restriction of
8.3 filenames…
When starting a new dialog box, any existing text in the other box is
now colored blue.
Thanks to ZUN messing up the return values of the command-interpreting
switch function, you can effectively use only line break and gaiji commands in the middle of text. All other
commands do execute, but the interpreter then also treats their command byte
as a Shift-JIS lead byte and places it in text RAM together with whatever
other byte follows in the script.
This is why TH04 can and does put its \= commandsinto the boxes
started with the 0 or 1 commands, but TH05 has to
put its 0x02 commands before the equivalent 0x0D.
For modding these files, you probably want to use TXDEF from
-Tom-'s MysticTK. It decodes these
files into a text representation, and its encoder then takes care of the
character-specific byte offsets in the 10-byte header. This text
representation simplifies the format a lot by avoiding all corner cases and
landmines you'd experience during hex-editing – most notably by interpreting
the box-starting 0x0D as a
command to show text that takes a string parameter, avoiding the broken
calls to script commands in the middle of text. However, you'd still have to
manually ensure an even number of bytes on every line of text.
In the entry function of TH05's dialog loop, we also encounter the hack that
is responsible for properly handling
📝 ZUN's hidden Extra Stage replay. Since the
dialog loop doesn't access the replay inputs but still requires key presses
to advance through the boxes, ZUN chose to just skip the dialog altogether in the
specific case of the Extra Stage replay being active, and replicated all
sprite management commands from the dialog script by just hardcoding
them.
And you know what? Not only do I not mind this hack, but I would have
preferred it over the actual dialog system! The aforementioned sprite
management commands effectively boil down to manual memory management,
deallocating all stage enemy and midboss sprites and thus ensuring that the
boss sprites end up at specific master.lib sprite IDs (patnums). The
hardcoded boss rendering function then expects these sprites to be available
at these exact IDs… which means that the otherwise hardcoded bosses can't
render properly without the dialog script running before them.
There is absolutely no excuse for the game to burden dialog scripts with
this functionality. Sure, delayed deallocation would allow them to blit
stage-specific sprites, but the original games don't do that; probably
because none of the two games feature an unblitting command. And even if
they did, it would have still been cleaner to expose the boss-specific
sprite setup as a single script command that can then also be called from
game code if the script didn't do so. Commands like these just are a recipe
for crashes, especially with parsers that expect fullwidth Shift-JIS
text and where misaligned ASCII text can easily cause these commands to be
skipped.
But then again, it does make for funny screenshot material if you
accidentally the deallocation and then see bosses being turned into stage
enemies:
With all the general details out of the way, here's the command reference:
0 1
0x00 0x01
Selects either the player character (0) or the boss (1) as the
currently speaking character, and moves the cursor to the beginning of
the text box. In TH04, this command also directly starts the new dialog
box, which is probably why it's not prefixed with a \ as it
only makes sense outside of text. TH05 requires a separate 0x0D command to do the
same.
\=1
0x02 0x!!
Replaces the face portrait of the currently active speaking
character with image #1 within her .CD2
file.
\=255
0x02 0xFF
Removes the face portrait from the currently active text box.
\l,filename
0x03 filename 0x00
Calls master.lib's super_entry_bfnt() function, which
loads sprites from a BFNT file to consecutive IDs starting at the
current patnum write cursor.
\c
0x04
Deallocates all stage-specific BFNT sprites (i.e., stage enemies and
midbosses), freeing up conventional RAM for the boss sprites and
ensuring that master.lib's patnum write cursor ends up at
128 /
180.
In TH05's Extra Stage, this command also replaces
📝 the sprites loaded from MIKO16.BFT with the ones from ST06_16.BFT.
\d
Deallocates all face portrait images.
The game automatically does this at the end of each dialog sequence.
However, ZUN wanted to load Stage 6 Yuuka's 76 KiB of additional
animations inside the script via \l, and would have once again
run up against the master.lib heap size limit without that extra free
memory.
\m,filename
0x05 filename 0x00
Stops the currently playing BGM, loads a new one from the given
file, and starts playback.
\m$
0x05 $ 0x00
Stops the currently playing BGM.
Note that TH05 interprets $ as a null-terminated filename as
well.
\m*
Restarts playback of the currently loaded BGM from the
beginning.
\b0,0,0
0x06 0x!!!!0x!!!!0x!!
Blits the master.lib patnum with the ID indicated by the third
parameter to the current VRAM page at the top-left screen position
indicated by the first two parameters.
\e0
Plays the sound effect with the given ID.
\t100
Sets palette brightness via master.lib's
palette_settone() to any value from 0 (fully black) to 200
(fully white). 100 corresponds to the palette's original colors.
\fo1
\fi1
Calls master.lib's palette_black_out() or
palette_black_in() to play a hardware palette fade
animation from or to black, spending roughly 1 frame on each of the 16 fade steps.
\wo1
\wi1
0x09 0x!!
0x0A 0x!!
Calls master.lib's palette_white_out() or
palette_white_in() to play a hardware palette fade
animation from or to white, spending roughly 1 frame on each of the 16 fade steps. The
TH05 version of 0x09 also clears the text in both boxes
before the animation.
\n
0x0B
Starts a new line by resetting the X coordinate of the TRAM cursor
to the left edge of the text area and incrementing the Y coordinate.
The new line will always be the next one below the last one that was
properly started, regardless of whether the text previously wrapped to
the next TRAM row at the edge of the screen.
\g8
Plays a blocking 8-frame screen shake
animation. Copy-pasted from the cutscene parser, but actually used right
at the end of the dialog shown before TH04's Bad Ending.
\ga0
0x0C 0x!!
Shows the gaiji with the given ID from 0 to 255
at the current cursor position, ignoring the per-glyph delay.
\k0
Waits 0 frames (0 = forever) for any key
to be pressed before continuing script execution.
Takes the current dialog cursor as the top-left corner of a
240×48-pixel rectangle, and replaces all text RAM characters within that
rectangle with whitespace.
This is only used to clear the player character's text box before
Shinki's final いくよ‼ box. Shinki has two
consecutive text boxes in all 4 scripts here, and ZUN probably wanted to
clear the otherwise blue text to imply a dramatic pause before Shinki's
final sentence. Nice touch.
(You could, however, also use it after a
box-ending 0xFF command to mess with text RAM in
general.)
\#
Quits the currently running loop. This returns from either the text
loop to the command loop, or it ends the dialog sequence by returning
from the command loop back to gameplay. If this stage of the game later
starts another dialog sequence, it will start at the next script
byte.
\$
Like \#, but first waits for any key to be
pressed.
0xFF
Behaves like TH04's \$ in the text loop, and like
\# in the command loop. Hence, it's not possible in TH05 to
automatically end a text box and advance to the next one without waiting
for a key press.
Unused commands are in gray.
At the end of the day, you might criticize the system for how its landmines
make it annoying to mod in ASCII text, but it all works and does what it's
supposed to. ZUN could have written the cleanest single and central
Shift-JIS iterator that properly chunks a byte buffer into halfwidth and
fullwidth codepoints, and I'd still be throwing it out for the upcoming
non-ASCII translations in favor of something that either also supports UTF-8
or performs dictionary lookups with a full box of text.
The only actual bug can be found in the input detection, which once
again doesn't correctly handle the infamous key
up/key down scancode quirk of PC-98 keyboards. All it takes
is one wrongly placed input polling call, and suddenly you have to think
about how the update cycle behind the PC-98 keyboard state bytes
might cause the game to run the regular 2-frame delay for a single
2-byte chunk of text before it shows the full text of a box after
all… But even this bug is highly theoretical and could probably only be
observed very, very rarely, and exclusively on real hardware.
The same can't be said about TH02 though, but more on that later. Let's
first take a look at its data, which started out much simpler in that game.
The STAGE?.TXT files contain just raw Shift-JIS text with no
trace of commands or structure. Turning on the whitespace display feature in
your editor reveals how the dialog system even assumes a fixed byte
length for each box: 36 bytes per line which will appear on screen, followed
by 4 bytes of padding, which the original files conveniently use to visually
split the lines via a CR/LF newline sequence. Make sure to disable trimming
of trailing whitespace in your editor to not ruin the file when modding the
text…
Consequently, everything else is hardcoded – every effect shown between text
boxes, the face portrait shown for each box, and even how many boxes are
part of each dialog sequence. Which means that the source code now contains
a
long hardcoded list of face IDs for most of the text boxes in the game,
with the rest being part of the
dedicated hardcoded dialog scripts for 2/3 of the
game's stages.
Without the restriction to a fixed set of scripting commands, TH02 naturally
gravitated to having the most varied dialog sequences of all PC-98 Touhou
games. This flexibility certainly facilitated Mima's grand entrance
animation in Stage 4, or the different lines in Stage 4 and 5 depending on
whether you already used a continue or not. Marisa's post-boss dialog even
inserts the number of continues into the text itself – by, you guessed it,
writing to hardcoded byte offsets inside the dialog text before printing it
to the screen. But once again, I have nothing to
criticize here – not even the fact that the alternate dialog scripts have to
mutate the "box cursor" to jump to the intended boxes within the file. I
know that some people in my audience like VMs, but I would have considered
it more bloated if ZUN had implemented a full-blown scripting
language just to handle all these special cases.
Another unique aspect of TH02 is the way it stores its face portraits, which
are infamous for how hard they are to find in the original data files. These
sprites are actually map tiles, stored in MIKO_K.MPN,
and drawn using the same functions used to blit the regular map tiles to the
📝 tile source area in VRAM. We can only guess
why ZUN chose this one out of the three graphics formats he used in TH02:
BFNT supports transparency, but sacrifices one of the 16 colors to do
so. ZUN only used 15 colors for the face portraits, but might have wanted to
keep open the option to use that 16th color. The detailed
backgrounds also suggest that these images were never supposed to be
transparent to begin with.
PI is used for all bigger and non-transparent images, but ZUN would have
had to write a separate small function to blit a 48×48 subsection of such an
image. That certainly wouldn't have stopped him in the TH01 days, but he
probably was already past that point by this game.
That only leaves .MPN. Sure, he did have to slice each face into 9
separate 16×16 "map" tiles to use this format, but that's a small price to
pay in exchange for not having to write any new low-level blitting code,
especially since he must have already had an asset pipeline to generate
these files.
And since you're certainly wondering about all these black tiles at the
edges: Yes, these are not only part of the file and pad it from the required
240×192 pixels to 256×256, but also kept in memory during a stage, wasting
9.5 KiB of conventional RAM. That's 172 seconds of potential input
replay data, just for those people who might still think that we need EMS
for replays.
Alright, we've got the text, we've got the faces, let's slide in the box and
display it all on screen. Apparently though, we also have to blit the player
and option sprites using raw, low-level master.lib function calls in the
process? This can't be right, especially because ZUN
always blits the option sprite associated with the Reimu-A shot type,
regardless of which one the player actually selected. And if you keep moving
above the box area before the dialog starts, you get to see exactly how
wrong this is:
Let's look closer at Reimu's sprite during the slide-in animation, and in
the two frames before:
This one image shows off no less than 4 bugs:
ZUN blits the stationary player sprite here, regardless of whether the
player was previously moving left or right. This is a nice way of indicating
that Reimu stops moving once the dialog starts, but maybe ZUN should
have unblitted the old sprite so that the new one wouldn't have appeared on
top. The game only unblits the 384×64 pixels covered by the dialog box on
every frame of the slide-in animation, so Reimu would only appear correctly
if her sprite happened to be entirely located within that area.
All sprites are shifted up by 1 pixel in frame 2️⃣. This one is not a
bug in the dialog system, but in the main game loop. The game runs the
relevant actions in the following order:
Invalidate any map tiles covered by entities
Redraw invalidated tiles
Decrement the Y coordinate at the top of VRAM according to the
scroll speed
Update and render all game entities
Scroll in new tiles as necessary according to the scroll speed, and
report whether the game has scrolled one pixel past the end of the
map
If that happened, pretend it didn't by incrementing the value
calculated in #3 for all further frames and skipping to
#8.
Issue a GDC SCROLL command to reflect the line
calculated in #3 on the display
Wait for VSync
Flip VRAM pages
Start boss if we're past the end of the map
The problem here: Once the dialog starts, the game has already rendered
an entire new frame, with all sprites being offset by a new Y scroll
offset, without adjusting the graphics GDC's scroll registers to
compensate. Hence, the Y position in 3️⃣ is the correct one, and the
whole existence of frame 2️⃣ is a bug in itself. (Well… OK, probably a
quirk because speedrunning exists, and it would be pretty annoying to
synchronize any video regression tests of the future TH02 Anniversary
Edition if it renders one fewer frame in the middle of a stage.)
ZUN blits the option sprites to their position from frame 1️⃣. This
brings us back to
📝 TH02's special way of retaining the previous and current position in a two-element array, indexed with a VRAM page ID.
Normally, this would be equivalent to using dedicated prev and
cur structure fields and you'd just index it with the back page
for every rendering call. But if you then decide to go single-buffered for
dialogs and render them onto the front page instead…
Note that fixing bug #2 would not cancel out this one – the sprites would
then simply be rendered to their position in the frame before 1️⃣.
And of course, the fixed option sprite ID also counts as a bug.
As for the boxes themselves, it's yet another loop that prints 2-byte chunks
of Shift-JIS text at an even slower fixed interval of 3 frames. In an
interesting quirk though, ZUN assumes that every box starts with the name of
the speaking character in its first two fullwidth Shift-JIS characters,
followed by a fullwidth colon. These 6 bytes are displayed immediately at
the start of every box, without the usual delay. The resulting alignment
looks rather janky with Genjii, whose single right-padded 亀
kanji looks quite awkward with the fullwidth space between the name
and the colon. Kind of makes you wonder why ZUN just didn't spell out his
proper name, 玄爺, instead, but I get the stylistic
difference.
In Stage 4, the two-kanji assumption then breaks with Marisa's three-kanji
name, which causes the full-width colon to be printed as the first delayed
character in each of her boxes:
That's all the issues and quirks in the system itself. The scripts
themselves don't leave much room for bugs as they basically just loop over
the hardcoded face ID array at this level… until we reach the end of the
game. Previously, the slide-in animation could simply use the tile
invalidation and re-rendering system to unblit the box on each frame, which
also explained why Reimu had to be separately rendered on top. But this no
longer works with a custom-rendered boss background, and so the game just
chooses to flood-fill the area with graphics chip color #0:
For Mima's final defeat dialog though, ZUN chose to not even show the box.
He might have realized the issue by that point, or simply preferred the more
dramatic effect this had on the lines. The resulting issues, however, might
even have ramifications for such un-technical things as lore and
character dynamics. As it turns out, the code
for this dialog sequence does in fact render Mima's smiling face for all
boxes?! You only don't see it in the original game because it's rendered to
the other VRAM page that remains invisible during the dialog sequence:
Here's how I interpret the situation:
The function that launches into the final part of the dialog script
starts with dedicated
code to re-render Mima to the back page, on top of the previously
rendered planet background. Since the entire script runs on the front
page (and thus, on top of the previous frame) and the game launches into
the ending immediately after, you don't ever get to see this new partial
frame in the original game.
Showing this partial frame would also ensure that you can actually
read the dialog text without a surrounding box. Then, the white
letters won't ever be put on top of any white bullets – or, worse, be completely invisible if the
dialog is triggered in the middle of Reimu-B's bomb animation, which
fills VRAM with lots of white pixels.
Hence, we've got enough evidence to classify not showing the back page
as a ZUN
bug. 🐞
However, Mima's smiling face jars with the words she says here. Adding
the face would deviate more significantly from the original game than
removing the player shot, item, bullet, or spark sprites would. It's
imaginable that ZUN just forgot about the dedicated code that
re-rendered just Mima to the back page, but the faces add
something to the dialog, and ZUN would have clearly noticed and
fixed it if their absence wasn't intended. Heck, ZUN might have just put
something related to Mima into the code because TH02's dialog system has
no way of not drawing a face for a dialog box. Filling the face
area with graphics chip color #0, as seen in the first and third boxes
of the Extra Stage pre-boss dialog, would have been an alternative, but
that would have been equally wrong with regard to the background.
Hence, the invisible face portrait from the original game is a ZUN
quirk. 🎺
So, the future TH02 Anniversary Edition will fix the bug by showing
the back page, but retain the quirk by rewriting the dialog code to
not blit the face.
And with that, we've secured all in-game dialog for the upcoming non-ASCII
translations! The remaining 2/3 of the last push made
for a good occasion to also decompile the small amount of code related to
TH03's win messages, stored in the @0?TX.TXT files. Similar to
TH02's dialog format, these files are also split into fixed-size blocks of
3×60 bytes. But this time, TH03 loads all 60 bytes of a line, including the
CR/LF line breaking codepoints in the original files, into the statically
allocated buffer that it renders from. These control characters are then
only filtered to whitespace by ZUN's graph_putsa_fx() function.
If you remove the line breaks, you get to use the full 60 bytes on every
line.
The final commits went to the MIKO.CFG loading and saving
functions used in TH04's and TH05's OP.EXE, as well as TH04's
game startup code to finally catch up with
📝 TH05's counterpart from over 3 years ago.
This brought us right in front of the main menu rendering code in both TH04
and TH05, which is identical in both games and will be tackled in the next
PC-98 Touhou delivery.
Next up, though: Returning to Shuusou Gyoku, and adding support for SC-88Pro
recordings as BGM. Which may or may not come with a slight controversy…
Here we go, TH01 Sariel! This is the single biggest boss fight in all of
PC-98 Touhou: If we include all custom effect code we previously decompiled,
it amounts to a total of 10.31% of all code in TH01 (and 3.14%
overall). These 8 pushes cover the final 8.10% (or 2.47% overall),
and are likely to be the single biggest delivery this project will ever see.
Considering that I only managed to decompile 6.00% across all games in 2021,
2022 is already off to a much better start!
So, how can Sariel's code be that large? Well, we've got:
16 danmaku patterns; including the one snowflake detonating into a giant
94×32 hitbox
Gratuitous usage of floating-point variables, bloating the binary thanks
to Turbo C++ 4.0J's particularly horrid code generation
The hatching birds that shoot pellets
3 separate particle systems, sharing the general idea, overall code
structure, and blitting algorithm, but differing in every little detail
The "gust of wind" background transition animation
5 sets of custom monochrome sprite animations, loaded from
BOSS6GR?.GRC
A further 3 hardcoded monochrome 8×8 sprites for the "swaying leaves"
pattern during the second form
In total, it's just under 3,000 lines of C++ code, containing a total of 8
definite ZUN bugs, 3 of them being subpixel/pixel confusions. That might not
look all too bad if you compare it to the
📝 player control function's 8 bugs in 900 lines of code,
but given that Konngara had 0… (Edit (2022-07-17):
Konngara contains two bugs after all: A
📝 possible heap corruption in test or debug mode,
and the infamous
📝 temporary green discoloration.)
And no, the code doesn't make it obvious whether ZUN coded Konngara or
Sariel first; there's just as much evidence for either.
Some terminology before we start: Sariel's first form is separated
into four phases, indicated by different background images, that
cycle until Sariel's HP reach 0 and the second, single-phase form
starts. The danmaku patterns within each phase are also on a cycle,
and the game picks a random but limited number of patterns per phase before
transitioning to the next one. The fight always starts at pattern 1 of phase
1 (the random purple lasers), and each new phase also starts at its
respective first pattern.
Sariel's bugs already start at the graphics asset level, before any code
gets to run. Some of the patterns include a wand raise animation, which is
stored in BOSS6_2.BOS:
The "lowered wand" sprite is missing in this file simply because it's
captured from the regular background image in VRAM, at the beginning of the
fight and after every background transition. What I previously thought to be
📝 background storage code has therefore a
different meaning in Sariel's case. Since this captured sprite is fully
opaque, it will reset the entire 128×128 wand area… wait, 128×128, rather
than 96×96? Yup, this lowered sprite is larger than necessary, wasting 1,967
bytes of conventional memory. That still doesn't quite explain the
second sprite in BOSS6_2.BOS though. Turns out that the black
part is indeed meant to unblit the purple reflection (?) in the first
sprite. But… that's not how you would correctly unblit that?
The first sprite already eats up part of the red HUD line, and the second
one additionally fails to recover the seal pixels underneath, leaving a nice
little black hole and some stray purple pixels until the next background
transition. Quite ironic given that both
sprites do include the right part of the seal, which isn't even part of the
animation.
Just like Konngara, Sariel continues the approach of using a single function
per danmaku pattern or custom entity. While I appreciate that this allows
all pattern- and entity-specific state to be scoped locally to that one
function, it quickly gets ugly as soon as such a function has to do more than one thing.
The "bird function" is particularly awful here: It's just one if(…)
{…} else if(…) {…} else if(…) {…} chain with different
branches for the subfunction parameter, with zero shared code between any of
these branches. It also uses 64-bit floating-point double as
its subpixel type… and since it also takes four of those as parameters
(y'know, just in case the "spawn new bird" subfunction is called), every
call site has to also push four double values onto the stack.
Thanks to Turbo C++ even using the FPU for pushing a 0.0 constant, we
have already reached maximum floating-point decadence before even having
seen a single danmaku pattern. Why decadence? Every possible spawn position
and velocity in both bird patterns just uses pixel resolution, with no
fractional component in sight. And there goes another 720 bytes of
conventional memory.
Speaking about bird patterns, the red-bird one is where we find the first
code-level ZUN bug: The spawn cross circle sprite suddenly disappears after
it finished spawning all the bird eggs. How can we tell it's a bug? Because
there is code to smoothly fly this sprite off the playfield, that
code just suddenly forgets that the sprite's position is stored in Q12.4
subpixels, and treats it as raw screen pixels instead.
As a result, the well-intentioned 640×400
screen-space clipping rectangle effectively shrinks to 38×23 pixels in the
top-left corner of the screen. Which the sprite is always outside of, and
thus never rendered again.
The intended animation is easily restored though:
Also, did you know that birds actually have a quite unfair 14×38-pixel
hitbox? Not that you'd ever collide with them in any of the patterns…
Another 3 of the 8 bugs can be found in the symmetric, interlaced spawn rays
used in three of the patterns, and the 32×32 debris "sprites" shown at their endpoint, at
the edge of the screen. You kinda have to commend ZUN's attention to detail
here, and how he wrote a lot of code for those few rapidly animated pixels
that you most likely don't
even notice, especially with all the other wrong pixels
resulting from rendering glitches. One of the bugs in the very final pattern
of phase 4 even turns them into the vortex sprites from the second pattern
in phase 1 during the first 5 frames of
the first time the pattern is active, and I had to single-step the blitting
calls to verify it.
It certainly was annoying how much time I spent making sense of these bugs,
and all weird blitting offsets, for just a few pixels… Let's look at
something more wholesome, shall we?
So far, we've only seen the PC-98 GRCG being used in RMW (read-modify-write)
mode, which I previously
📝 explained in the context of TH01's red-white HP pattern.
The second of its three modes, TCR (Tile Compare Read), affects VRAM reads
rather than writes, and performs "color extraction" across all 4 bitplanes:
Instead of returning raw 1bpp data from one plane, a VRAM read will instead
return a bitmask, with a 1 bit at every pixel whose full 4-bit color exactly
matches the color at that offset in the GRCG's tile register, and 0
everywhere else. Sariel uses this mode to make sure that the 2×2 particles
and the wind effect are only blitted on top of "air color" pixels, with
other parts of the background behaving like a mask. The algorithm:
Set the GRCG to TCR mode, and all 8 tile register dots to the air
color
Read N bits from the target VRAM position to obtain an N-bit mask where
all 1 bits indicate air color pixels at the respective position
AND that mask with the alpha plane of the sprite to be drawn, shifted to
the correct start bit within the 8-pixel VRAM byte
Set the GRCG to RMW mode, and all 8 tile register dots to the color that
should be drawn
Write the previously obtained bitmask to the same position in VRAM
Quite clever how the extracted colors double as a secondary alpha plane,
making for another well-earned good-code tag. The wind effect really doesn't deserve it, though:
ZUN calculates every intermediate result inside this function
over and over and over again… Together with some ugly
pointer arithmetic, this function turned into one of the most tedious
decompilations in a long while.
This gradual effect is blitted exclusively to the front page of VRAM,
since parts of it need to be unblitted to create the illusion of a gust of
wind. Then again, anything that moves on top of air-colored background –
most likely the Orb – will also unblit whatever it covered of the effect…
As far as I can tell, ZUN didn't use TCR mode anywhere else in PC-98 Touhou.
Tune in again later during a TH04 or TH05 push to learn about TDW, the final
GRCG mode!
Speaking about the 2×2 particle systems, why do we need three of them? Their
only observable difference lies in the way they move their particles:
Up or down in a straight line (used in phases 4 and 2,
respectively)
Left or right in a straight line (used in the second form)
Left and right in a sinusoidal motion (used in phase 3, the "dark
orange" one)
Out of all possible formats ZUN could have used for storing the positions
and velocities of individual particles, he chose a) 64-bit /
double-precision floating-point, and b) raw screen pixels. Want to take a
guess at which data type is used for which particle system?
If you picked double for 1) and 2), and raw screen pixels for
3), you are of course correct! Not that I'm implying
that it should have been the other way round – screen pixels would have
perfectly fit all three systems use cases, as all 16-bit coordinates
are extended to 32 bits for trigonometric calculations anyway. That's what,
another 1.080 bytes of wasted conventional memory? And that's even
calculated while keeping the current architecture, which allocates
space for 3×30 particles as part of the game's global data, although only
one of the three particle systems is active at any given time.
That's it for the first form, time to put on "Civilization
of Magic"! Or "死なばもろとも"? Or "Theme of 地獄めくり"? Or whatever SYUGEN is
supposed to mean…
… and the code of these final patterns comes out roughly as exciting as
their in-game impact. With the big exception of the very final "swaying
leaves" pattern: After 📝 Q4.4,
📝 Q28.4,
📝 Q24.8, and double variables,
this pattern uses… decimal subpixels? Like, multiplying the number by
10, and using the decimal one's digit to represent the fractional part?
Well, sure, if you really insist on moving the leaves in cleanly
represented integer multiples of ⅒, which is infamously impossible in IEEE
754. Aside from aesthetic reasons, it only really combines less precision
(10 possible fractions rather than the usual 16) with the inferior
performance of having to use integer divisions and multiplications rather
than simple bit shifts. And it's surely not because the leaf sprites needed
an extended integer value range of [-3276, +3276], compared to
Q12.4's [-2047, +2048]: They are clipped to 640×400 screen space
anyway, and are removed as soon as they leave this area.
This pattern also contains the second bug in the "subpixel/pixel confusion
hiding an entire animation" category, causing all of
BOSS6GR4.GRC to effectively become unused:
At least their hitboxes are what you would expect, exactly covering the
30×30 pixels of Reimu's sprite. Both animation fixes are available on the th01_sariel_fixes
branch.
After all that, Sariel's main function turned out fairly unspectacular, just
putting everything together and adding some shake, transition, and color
pulse effects with a bunch of unnecessary hardware palette changes. There is
one reference to a missing BOSS6.GRP file during the
first→second form transition, suggesting that Sariel originally had a
separate "first form defeat" graphic, before it was replaced with just the
shaking effect in the final game.
Speaking about the transition code, it is kind of funny how the… um,
imperative and concrete nature of TH01 leads to these 2×24
lines of straight-line code. They kind of look like ZUN rattling off a
laundry list of subsystems and raw variables to be reinitialized, making
damn sure to not forget anything.
Whew! Second PC-98 Touhou boss completely decompiled, 29 to go, and they'll
only get easier from here! 🎉 The next one in line, Elis, is somewhere
between Konngara and Sariel as far as x86 instruction count is concerned, so
that'll need to wait for some additional funding. Next up, therefore:
Looking at a thing in TH03's main game code – really, I have little
idea what it will be!
Now that the store is open again, also check out the
📝 updated RE progress overview I've posted
together with this one. In addition to more RE, you can now also directly
order a variety of mods; all of these are further explained in the order
form itself.
EMS memory! The
infamous stopgap measure between the 640 KiB ("ought to be enough for
everyone") of conventional
memory offered by DOS from the very beginning, and the later XMS standard for
accessing all the rest of memory up to 4 GiB in the x86 Protected Mode. With
an optionally active EMS driver, TH04 and TH05 will make use of EMS memory
to preload a bunch of situational .CDG images at the beginning of
MAIN.EXE:
The "eye catch" game title image, shown while stages are loaded
The character-specific background image, shown while bombing
The player character dialog portraits
TH05 additionally stores the boss portraits there, preloading them
at the beginning of each stage. (TH04 instead keeps them in conventional
memory during the entire stage.)
Once these images are needed, they can then be copied into conventional
memory and accessed as usual.
Uh… wait, copied? It certainly would have been possible to map EMS
memory to a regular 16-bit Real Mode segment for direct access,
bank-switching out rarely used system or peripheral memory in exchange for
the EMS data. However, master.lib doesn't expose this functionality, and
only provides functions for copying data from EMS to regular memory and vice
versa.
But even that still makes EMS an excellent fit for the large image files
it's used for, as it's possible to directly copy their pixel data from EMS
to VRAM. (Yes, I tried!) Well… would, because ZUN doesn't do
that either, and always naively copies the images to newly allocated
conventional memory first. In essence, this dumbs down EMS into just another
layer of the memory hierarchy, inserted between conventional memory and
disk: Not quite as slow as disk, but still requiring that
memcpy() to retrieve the data. Most importantly though: Using
EMS in this way does not increase the total amount of memory
simultaneously accessible to the game. After all, some other data will have
to be freed from conventional memory to make room for the newly loaded data.
The most idiomatic way to define the game-specific layout of the EMS area
would be either a struct or an enum.
Unfortunately, the total size of all these images exceeds the range of a
16-bit value, and Turbo C++ 4.0J supports neither 32-bit enums
(which are silently degraded to 16-bit) nor 32-bit structs
(which simply don't compile). That still leaves raw compile-time constants
though, you only have to manually define the offset to each image in terms
of the size of its predecessor. But instead of doing that, ZUN just placed
each image at a nice round decimal offset, each slightly larger than the
actual memory required by the previous image, just to make sure that
everything fits. This results not only in quite
a bit of unnecessary padding, but also in technically the single
biggest amount of "wasted" memory in PC-98 Touhou: Out of the 180,000 (TH04)
and 320,000 (TH05) EMS bytes requested, the game only uses 135,552 (TH04)
and 175,904 (TH05) bytes. But hey, it's EMS, so who cares, right? Out of all
the opportunities to take shortcuts during development, this is among the
most acceptable ones. Any actual PC-98 model that could run these two games
comes with plenty of memory for this to not turn into an actual issue.
On to the EMS-using functions themselves, which are the definition of
"cross-cutting concerns". Most of these have a fallback path for the non-EMS
case, and keep the loaded .CDG images in memory if they are immediately
needed. Which totally makes sense, but also makes it difficult to find names
that reflect all the global state changed by these functions. Every one of
these is also just called from a single place, so inlining
them would have saved me a lot of naming and documentation trouble
there.
The TH04 version of the EMS allocation code was actually displayed on ZUN's monitor in the
2010 MAG・ネット documentary; WindowsTiger already transcribed the low-quality video image
in 2019. By 2015 ReC98 standards, I would have just run with that, but
the current project goal is to write better code than ZUN, so I didn't. 😛
We sure ain't going to use magic numbers for EMS offsets.
The dialog init and exit code then is completely different in both games,
yet equally cross-cutting. TH05 goes even further in saving conventional
memory, loading each individual player or boss portrait into a single .CDG
slot immediately before blitting it to VRAM and freeing the pixel data
again. People who play TH05 without an active EMS driver are surely going to
enjoy the hard drive access lag between each portrait change…
TH04, on the other hand, also abuses the dialog
exit function to preload the Mugetsu defeat / Gengetsu entrance and
Gengetsu defeat portraits, using a static variable to track how often the
function has been called during the Extra Stage… who needs function
parameters anyway, right?
This is also the function in which TH04 infamously crashes after the Stage 5
pre-boss dialog when playing with Reimu and without any active EMS driver.
That crash is what motivated this look into the games' EMS usage… but the
code looks perfectly fine? Oh well, guess the crash is not related to EMS
then. Next u–
OK, of course I can't leave it like that. Everyone is expecting a fix now,
and I still got half of a push left over after decompiling the regular EMS
code. Also, I've now RE'd every function that could possibly be involved in
the crash, and this is very likely to be the last time I'll be looking at
them.
Turns out that the bug has little to do with EMS, and everything to do with
ZUN limiting the amount of conventional RAM that TH04's
MAIN.EXE is allowed to use, and then slightly miscalculating
this upper limit. Playing Stage 5 with Reimu is the most asset-intensive
configuration in this game, due to the combination of
6 player portraits (Marisa has only 5), at 128×128 pixels each
a 288×256 background for the boss fight, tied in size only with the
ones in the Extra Stage
the additional 96×80 image for the vertically scrolling stars during
the stage, wastefully stored as 4 bitplanes rather than a single one.
This image is never freed, not even at the end of the stage.
Remove any single one of the above points, and this crash would have never
occurred. But with all of them combined, the total amount of memory consumed
by TH04's MAIN.EXE just barely exceeds ZUN's limit of 320,000
bytes, by no more than 3,840 bytes, the size of the star image.
But wait: As we established earlier, EMS does nothing to reduce the amount
of conventional memory used by the game. In fact, if you disabled TH04's EMS
handling, you'd still get this crash even if you are running an EMS
driver and loaded DOS into the High Memory Area to free up as much
conventional RAM as possible. How can EMS then prevent this crash in the
first place?
The answer: It's only because ZUN's usage of EMS bypasses the need to load
the cached images back out of the XOR-encrypted 東方幻想.郷
packfile. Leaving aside the general
stupidity of any game data file encryption*, master.lib's decryption
implementation is also quite wasteful: It uses a separate buffer that
receives fixed-size chunks of the file, before decrypting every individual
byte and copying it to its intended destination buffer. That really
resembles the typical slowness of a C fread() implementation
more than it does the highly optimized ASM code that master.lib purports to
be… And how large is this well-hidden decryption buffer? 4 KiB.
So, looking back at the game, here is what happens once the Stage 5
pre-battle dialog ends:
Reimu's bomb background image, which was previously freed to make space
for her dialog portraits, has to be loaded back into conventional memory
from disk
BB0.CDG is found inside the 東方幻想.郷
packfile
file_ropen() ends up allocating a 4 KiB buffer for the
encrypted packfile data, getting us the decisive ~4 KiB closer to the memory
limit
The .CDG loader tries to allocate 52 608 contiguous bytes for the
pixel data of Reimu's bomb image
This would exceed the memory limit, so hmem_allocbyte()
fails and returns a nullptr
ZUN doesn't check for this case (as usual)
The pixel data is loaded to address 0000:0000,
overwriting the Interrupt Vector Table and whatever comes after
The game crashes
The 4 KiB encryption buffer would only be freed by the corresponding
file_close() call, which of course never happens because the
game crashes before it gets there. At one point, I really did suspect the
cause to be some kind of memory leak or fragmentation inside master.lib,
which would have been quite delightful to fix.
Instead, the most straightforward fix here is to bump up that memory limit
by at least 4 KiB. Certainly easier than squeezing in a
cdg_free() call for the star image before the pre-boss dialog
without breaking position dependence.
Or, even better, let's nuke all these memory limits from orbit
because they make little sense to begin with, and fix every other potential
out-of-memory crash that modders would encounter when adding enough data to
any of the 4 games that impose such limits on themselves. Unless you want to
launch other binaries (which need to do their own memory allocations) after
launching the game, there's really no reason to restrict the amount of
memory available to a DOS process. Heck, whenever DOS creates a new one, it
assigns all remaining free memory by default anyway.
Removing the memory limits also removes one of ZUN's few error checks, which
end up quitting the game if there isn't at least a given maximum amount of
conventional RAM available. While it might be tempting to reserve enough
memory at the beginning of execution and then never check any allocation for
a potential failure, that's exactly where something like TH04's crash
comes from.
This game is also still running on DOS, where such an initial allocation
failure is very unlikely to happen – no one fills close to half of
conventional RAM with TSRs and then tries running one of these games. It
might have been useful to detect systems with less than 640 KiB of
actual, physical RAM, but none of the PC-98 models with that little amount
of memory are fast enough to run these games to begin with. How ironic… a
place where ZUN actually added an error check, and then it's mostly
pointless.
Here's an archive that contains both fix variants, just in case. These were
compiled from the th04_noems_crash_fix
and mem_assign_all
branches, and contain as little code changes as possible. Edit (2022-04-18): For TH04, you probably want to download
the 📝 community choice fix package instead,
which contains this fix along with other workarounds for the Divide
error crashes.
2021-11-29-Memory-limit-fixes.zip
So yeah, quite a complex bug, leaving no time for the TH03 scorefile format
research after all. Next up: Raising prices.
OK, TH01 missile bullets. Can we maybe have a well-behaved entity type,
without any weirdness? Just once?
Ehh, kinda. Apart from another 150 bytes wasted on unused structure members,
this code is indeed more on the low end in terms of overall jank. It does
become very obvious why dodging these missiles in the YuugenMagan, Mima, and
Elis fights feels so awful though: An unfair 46×46 pixel hitbox around
Reimu's center pixel, combined with the comeback of
📝 interlaced rendering, this time in every
stage. ZUN probably did this because missiles are the only 16×16 sprite in
TH01 that is blitted to unaligned X positions, which effectively ends up
touching a 32×16 area of VRAM per sprite.
But even if we assume VRAM writes to be the bottleneck here, it would
have been totally possible to render every missile in every frame at roughly
the same amount of CPU time that the original game uses for interlaced
rendering:
Note that all missile sprites only use two colors, white and green.
Instead of naively going with the usual four bitplanes, extract the
pixels drawn in each of the two used colors into their own bitplanes.
master.lib calls this the "tiny format".
Use the GRCG to draw these two bitplanes in the intended white and green
colors, halving the amount of VRAM writes compared to the original
function.
(Not using the .PTN format would have also avoided the inconsistency of
storing the missile sprites in boss-specific sprite slots.)
That's an optimization that would have significantly benefitted the game, in
contrast to all of the fake ones
introduced in later games. Then again, this optimization is
actually something that the later games do, and it might have in fact been
necessary to achieve their higher bullet counts without significant
slowdown.
After some effectively unused Mima sprite effect code that is so broken that
it's impossible to make sense out of it, we get to the final feature I
wanted to cover for all bosses in parallel before returning to Sariel: The
separate sprite background storage for moving or animated boss sprites in
the Mima, Elis, and Sariel fights. But, uh… why is this necessary to begin
with? Doesn't TH01 already reserve the other VRAM page for backgrounds?
Well, these sprites are quite big, and ZUN didn't want to blit them from
main memory on every frame. After all, TH01 and TH02 had a minimum required
clock speed of 33 MHz, half of the speed required for the later three games.
So, he simply blitted these boss sprites to both VRAM pages, leading
the usual unblitting calls to only remove the other sprites on top of the
boss. However, these bosses themselves want to move across the screen…
and this makes it necessary to save the stage background behind them
in some other way.
Enter .PTN, and its functions to capture a 16×16 or 32×32 square from VRAM
into a sprite slot. No problem with that approach in theory, as the size of
all these bigger sprites is a multiple of 32×32; splitting a larger sprite
into these smaller 32×32 chunks makes the code look just a little bit clumsy
(and, of course, slower).
But somewhere during the development of Mima's fight, ZUN apparently forgot
that those sprite backgrounds existed. And once Mima's 🚫 casting sprite is
blitted on top of her regular sprite, using just regular sprite
transparency, she ends up with her infamous third arm:
Ironically, there's an unused code path in Mima's unblit function where ZUN
assumes a height of 48 pixels for Mima's animation sprites rather than the
actual 64. This leads to even clumsier .PTN function calls for the bottom
128×16 pixels… Failing to unblit the bottom 16 pixels would have also
yielded that third arm, although it wouldn't have looked as natural. Still
wouldn't say that it was intentional; maybe this casting sprite was just
added pretty late in the game's development?
So, mission accomplished, Sariel unblocked… at 2¼ pushes. That's quite some time left for some smaller stage initialization
code, which bundles a bunch of random function calls in places where they
logically really don't belong. The stage opening animation then adds a bunch
of VRAM inter-page copies that are not only redundant but can't even be
understood without knowing the hidden internal state of the last VRAM page
accessed by previous ZUN code…
In better news though: Turbo C++ 4.0 really doesn't seem to have any
complexity limit on inlining arithmetic expressions, as long as they only
operate on compile-time constants. That's how we get macro-free,
compile-time Shift-JIS to JIS X 0208 conversion of the individual code
points in the 東方★靈異伝 string, in a compiler from 1994. As long as you
don't store any intermediate results in variables, that is…
But wait, there's more! With still ¼ of a push left, I also went for the
boss defeat animation, which includes the route selection after the SinGyoku
fight.
As in all other instances, the 2× scaled font is accomplished by first
rendering the text at regular 1× resolution to the other, invisible VRAM
page, and then scaled from there to the visible one. However, the route
selection is unique in that its scaled text is both drawn transparently on
top of the stage background (not onto a black one), and can also change
colors depending on the selection. It would have been no problem to unblit
and reblit the text by rendering the 1× version to a position on the
invisible VRAM page that isn't covered by the 2× version on the visible one,
but ZUN (needlessly) clears the invisible page before rendering any text.
Instead, he assigned a separate VRAM color for both
the 魔界 and 地獄 options, and only changed the palette value for
these colors to white or gray, depending on the correct selection. This is
another one of the
📝 rare cases where TH01 demonstrates good use of PC-98 hardware,
as the 魔界へ and 地獄へ strings don't need to be reblitted during the selection process, only the Orb "cursor" does.
Then, why does this still not count as good-code? When
changing palette colors, you kinda need to be aware of everything
else that can possibly be on screen, which colors are used there, and which
aren't and can therefore be used for such an effect without affecting other
sprites. In this case, well… hover over the image below, and notice how
Reimu's hair and the bomb sprites in the HUD light up when Makai is
selected:
This push did end on a high note though, with the generic, non-SinGyoku
version of the defeat animation being an easily parametrizable copy. And
that's how you decompile another 2.58% of TH01 in just slightly over three
pushes.
Now, we're not only ready to decompile Sariel, but also Kikuri, Elis, and
SinGyoku without needing any more detours into non-boss code. Thanks to the
current TH01 funding subscriptions, I can plan to cover most, if not all, of
Sariel in a single push series, but the currently 3 pending pushes probably
won't suffice for Sariel's 8.10% of all remaining code in TH01. We've got
quite a lot of not specifically TH01-related funds in the backlog to pass
the time though.
Due to recent developments, it actually makes quite a lot of sense to take a
break from TH01: spaztron64 has
managed what every Touhou download site so far has failed to do: Bundling
all 5 game onto a single .HDI together with pre-configured PC-98
emulators and a nice boot menu, and hosting the resulting package on a
proper website. While this first release is already quite good (and much
better than my attempt from 2014), there is still a bit of room for
improvement to be gained from specific ReC98 research. Next up,
therefore:
Researching how TH04 and TH05 use EMS memory, together with the cause
behind TH04's crash in Stage 5 when playing as Reimu without an EMS driver
loaded, and
reverse-engineering TH03's score data file format
(YUME.NEM), which hopefully also comes with a way of building a
file that unlocks all characters without any high scores.
Done with the .BOS format, at last! While there's still quite a bunch of
undecompiled non-format blitting code left, this was in fact the final
piece of graphics format loading code in TH01.
📝 Continuing the trend from three pushes ago,
we've got yet another class, this time for the 48×48 and 48×32 sprites
used in Reimu's gohei, slide, and kick animations. The only reason these
had to use the .BOS format at all is simply because Reimu's regular
sprites are 32×32, and are therefore loaded from
📝 .PTN files.
Yes, this makes no sense, because why would you split animations for
the same character across two file formats and two APIs, just because
of a sprite size difference?
This necessity for switching blitting APIs might also explain why Reimu
vanishes for a few frames at the beginning and the end of the gohei swing
animation, but more on that once we get to the high-level rendering code.
Now that we've decompiled all the .BOS implementations in TH01, here's an
overview of all of them, together with .PTN to show that there really was
no reason for not using the .BOS API for all of Reimu's sprites:
CBossEntity
CBossAnim
CPlayerAnim
ptn_* (32×32)
Format
.BOS
.BOS
.BOS
.PTN
Hitbox
✔
✘
✘
✘
Byte-aligned blitting
✔
✔
✔
✔
Byte-aligned unblitting
✔
✘
✔
✔
Unaligned blitting
Single-line and wave only
✘
✘
✘
Precise unblitting
✔
✘
✔
✔
Per-file sprite limit
8
8
32
64
Pixels blitted at once
16
16
8
32
And even that last property could simply be handled by branching based on
the sprite width, and wouldn't be a reason for switching formats. But
well, it just wouldn't be TH01 without all that redundant bloat though,
would it?
The basic loading, freeing, and blitting code was yet another variation
on the other .BOS code we've seen before. So this should have caused just
as little trouble as the CBossAnim code… except that
CPlayerAnimdid add one slightly difficult function to
the mix, which led to it requiring almost a full push after all.
Similar to 📝 the unblitting code for moving lasers we've seen in the last push,
ZUN tries to minimize the amount of VRAM writes when unblitting Reimu's
slide animations. Technically, it's only necessary to restore the pixels
that Reimu traveled by, plus the ones that wouldn't be redrawn by
the new animation frame at the new X position.
The theoretically arbitrary distance between the two sprites is, of
course, modeled by a fixed-size buffer on the stack
, coming with the further assumption that the
sprite surely hasn't moved by more than 1 horizontal VRAM byte compared to
the last frame. Which, of course, results in glitches if that's not the
case, leaving little Reimu parts in VRAM if the slide speed ever exceeded
8 pixels per frame. (Which it never does,
being hardcoded to 6 pixels, but still.). As it also turns out, all those
bit masking operations easily lead to incredibly sloppy C code.
Which compiles into incredibly terrible ASM, which in turn might end up
wasting way more CPU time than the final VRAM write optimization would
have gained? Then again, in-depth profiling is way beyond the scope of
this project at this point.
Next up: The TH04 main menu, and some more technical debt.
This time around, laser is 📝 actually not
difficult, with TH01's shootout laser class being simple enough to nicely
fit into a single push. All other stationary lasers (as used by
YuugenMagan, for example) don't even use a class, and are simply treated
as regular lines with collision detection.
But of course, the shootout lasers also come with the typical share of
TH01 jank we've all come to expect by now. This time, it already starts
with the hardcoded sprite data:
A shootout laser can have a width from 1 to 8 pixels, so ZUN stored a
separate 16×1 sprite with a line for each possible width (left-to-right).
Then, he shifted all of these sprites 1 pixel to the right for all of the
8 possible start positions within a planar VRAM byte (top-to-bottom).
Because… doing that bit shift programmatically is way too
expensive, so let's pre-shift at compile time, and use 16× the memory per
sprite?
Since a bunch of other sprite sheets need to be pre-shifted as well (this
is the 5th one we've found so far), our sprite converter has a feature to
automatically generate those pre-shifted variations. This way, we can
abstract away that implementation detail and leave modders with .BMP files
that still only contain a single version of each sprite. But, uh…, wait,
in this sprite sheet, the second row for 1-pixel lasers is accidentally
shifted right by one more pixel that it should have been?! Which means
that
we can't use the auto-preshift feature here, and have to store this
weird-looking (and quite frankly, completely unnecessary) sprite sheet in
its entirety
ZUN did, at least during TH01's development, not have a sprite
converter, and directly hardcoded these dot patterns in the C++ code
The waste continues with the class itself. 69 bytes, with 22 bytes
outright unused, and 11 not really necessary. As for actual innovations
though, we've got
📝 another 32-bit fixed-point type, this
time actually using 8 bits for the fractional part. Therefore, the
ray position is tracked to the 1/256th of a pixel, using the full
precision of master.lib's 8-bit sin() and cos() lookup
tables.
Unblitting is also remarkably efficient: It's only done once the laser
stopped extending and started moving, and only for the exact pixels at the
start of the ray that the laser traveled by in a single frame. If only the
ray part was also rendered as efficiently – it's fully blitted every frame,
right next to the collision detection for each row of the ray.
With a public interface of two functions (spawn, and update / collide /
unblit / render), that's superficially all there is to lasers in this
game. There's another (apparently inlined) function though, to both reset
and, uh, "fully unblit" all lasers at the end of every boss fight… except
that it fails hilariously at doing the latter, and ends up effectively
unblitting random 32-pixel line segments, due to ZUN confusing both the
coordinates and the parameter types for the line unblitting function.
A while ago, I was asked about
this crash that tends to
happen when defeating Elis. And while you can clearly see the random
unblitted line segments that are missing from the sprites, I don't
quite think we've found the cause for the crash, since the
📝 line unblitting function used theredoes clip its coordinates to the VRAM range.
Next up: The final piece of image format code in TH01, covering Reimu's
sprites!
Back to TH01, and its boss sprite format… with a separate class for
storing animations that only differs minutely from the
📝 regular boss entity class I covered last time?
Decompiling this class was almost free, and the main reason why the first
of these pushes ended up looking pretty huge.
Next up were the remaining shape drawing functions from the code segment
that started with the .GRC functions. P0105 already started these with the
(surprisingly sanely implemented) 8×8 diamond, star, and… uh, snowflake
(?) sprites
,
prominently seen in the Konngara, Elis, and Sariel fights, respectively.
Now, we've also got:
ellipse arcs with a customizable angle distance between the individual
dots – mostly just used for drawing full circles, though
line loops – which are only used for the rotating white squares around
Mima, meaning that the white star in the YuugenMagan fight got a completely
redundant reimplementation
and the surprisingly weirdest one, drawing the red invincibility
sprites.
The weirdness becomes obvious with just a single screenshot:
First, we've got the obvious issue of the sprites not being clipped at the
right edge of VRAM, with the rightmost pixels in each row of the sprite
extending to the beginning of the next row. Well, that's just what you get
if you insist on writing unique low-level blitting code for the majority
of the individual sprites in the game… 🤷
More importantly though, the sprite sheet looks like this:
So how do we even get these fully filled red diamonds?
Well, turns out that the sprites are never consistently unblitted during
their 8 frames of animation. There is a function that looks
like it unblits the sprite… except that it starts with by enabling the
GRCG and… reading from the first bitplane on the background page?
If this was the EGC, such a read would fill some internal registers with
the contents of all 4 bitplanes, which can then subsequently be blitted to
all 4 bitplanes of any VRAM page with a single memory write. But with the
GRCG in RMW mode, reads do nothing special, and simply copy the memory
contents of one bitplane to the read destination. Maybe ZUN thought
that setting the RMW color to red
also sets some internal 4-plane mask register to match that color?
Instead, the rather random pixels read from the first bitplane are then
used as a mask for a second blit of the same red sprite.
Effectively, this only really "unblits" the invincibility pixels that are
drawn on top of Reimu's sprite. Since Reimu is drawn first, the
invincibility sprites are overwritten anyway. But due to the palette color
layout of Reimu's sprite, its pixels end up fully masking away any
invincibility sprite pixels in that second blit, leaving VRAM untouched as
a result. Anywhere else though, this animation quickly turns into the
union of all animation frames.
Then again, if that 16-dot-aligned rectangular unblitting function is all
you know about the EGC, and you can't be bothered to write a perfect
unblitter for 8×8 sprites, it becomes obvious why you wouldn't want to use
it:
Because Reimu would barely be visible under all that flicker. In
comparison, those fully filled diamonds actually look pretty good.
After all that, the remaining time wouldn't have been enough for the next
few essential classes, so I closed out the push with three more VRAM
effects instead:
Single-bitplane pixel inversion inside a 32×32 square – the main effect
behind the discoloration seen in the bomb animation, as well as the
expanding squares at the end of Kikuri's and Sariel's entrance
animation
EGC-accelerated VRAM row copies – the second half of smooth and fully
hardware-accelerated scrolling for backgrounds that are twice the size of
VRAM
And finally, the VRAM page content transition function using meshed 8×8
squares, used for the blocky transition to Sariel's first and second phases.
Which is quite ridiculous in just how needlessly bloated it is. I'm positive
that this sort of thing could have also been accelerated using the PC-98's
EGC… although simply writing better C would have already gone a long way.
The function also comes with three unused mesh patterns.
And with that, ReC98, as a whole, is not only ⅓ done, but I've also fully
caught up with the feature backlog for the first time in the history of
this crowdfunding! Time to go into maintenance mode then, while we wait
for the next pushes to be funded. Got a huge backlog of tiny maintenance
issues to address at a leisurely pace, and of course there's also the
📝 16-bit build system waiting to be
finished.
And indeed, I got to end my vacation with a lot of image format and
blitting code, covering the final two formats, .GRC and .BOS. .GRC was
nothing noteworthy – one function for loading, one function for
byte-aligned blitting, and one function for freeing memory. That's it –
not even a unblitting function for this one. .BOS, on the other hand…
…has no generic (read: single/sane) implementation, and is only
implemented as methods of some boss entity class. And then again for
Sariel's dress and wand animations, and then again for Reimu's
animations, both of which weren't even part of these 4 pushes. Looking
forward to decompiling essentially the same algorithms all over again… And
that's how TH01 became the largest and most bloated PC-98 Touhou game. So
yeah, still not done with image formats, even at 44% RE.
This means I also had to reverse-engineer that "boss entity" class… yeah,
what else to call something a boss can have multiple of, that may or may
not be part of a larger boss sprite, may or may not be animated, and that
may or may not have an orb hitbox?
All bosses except for Kikuri share the same 5 global instances of this
class. Since renaming all these variables in ASM land is tedious anyway, I
went the extra mile and directly defined separate, meaningful names for
the entities of all bosses. These also now document the natural order in
which the bosses will ultimately be decompiled. So, unless a backer
requests anything else, this order will be:
Konngara
Sariel
Elis
Kikuri
SinGyoku
(code for regular card-flipping stages)
Mima
YuugenMagan
As everyone kind of expects from TH01 by now, this class reveals yet
another… um, unique and quirky piece of code architecture. In
addition to the position and hitbox members you'd expect from a class like
this, the game also stores the .BOS metadata – width, height, animation
frame count, and 📝 bitplane pointer slot
number – inside the same class. But if each of those still corresponds to
one individual on-screen sprite, how can YuugenMagan have 5 eye sprites,
or Kikuri have more than one soul and tear sprite? By duplicating that
metadata, of course! And copying it from one entity to another
At this point, I feel like I even have to congratulate the game for not
actually loading YuugenMagan's eye sprites 5 times. But then again, 53,760
bytes of waste would have definitely been noticeable in the DOS days.
Makes much more sense to waste that amount of space on an unused C++
exception handler, and a bunch of redundant, unoptimized blitting
functions
(Thinking about it, YuugenMagan fits this entire system perfectly. And
together with its position in the game's code – last to be decompiled
means first on the linker command line – we might speculate that
YuugenMagan was the first boss to be programmed for TH01?)
So if a boss wants to use sprites with different sizes, there's no way
around using another entity. And that's why Girl-Elis and Bat-Elis are two
distinct entities internally, and have to manually sync their position.
Except that there's also a third one for Attacking-Girl-Elis,
because Girl-Elis has 9 frames of animation in total, and the global .BOS
bitplane pointers are divided into 4 slots of only 8 images each.
Same for SinGyoku, who is split into a sphere entity, a
person entity, and a… white flash entity for all three forms,
all at the same resolution. Or Konngara's facial expressions, which also
require two entities just for themselves.
And once you decompile all this code, you notice just how much of it the
game didn't even use. 13 of the 50 bytes of the boss entity class are
outright unused, and 10 bytes are used for a movement clamping and lock
system that would have been nice if ZUN also used it outside of
Kikuri's soul sprites. Instead, all other bosses ignore this system
completely, and just
party on
the X/Y coordinates of the boss entities directly.
As for the rendering functions, 5 out of 10 are unused. And while those
definitely make up less than half of the code, I still must have
spent at least 1 of those 4 pushes on effectively unused functionality.
Only one of these functions lends itself to some speculation. For Elis'
entrance animation, the class provides functions for wavy blitting and
unblitting, which use a separate X coordinate for every line of the
sprite. But there's also an unused and sort of broken one for unblitting
two overlapping wavy sprites, located at the same Y coordinate. This might
indicate that Elis could originally split herself into two sprites,
similar to TH04 Stage 6 Yuuka? Or it might just have been some other kind
of animation effect, who knows.
After over 3 months of TH01 progress though, it's finally time to look at
other games, to cover the rest of the crowdfunding backlog. Next up: Going
back to TH05, and getting rid of those last PI false positives. And since
I can potentially spend the next 7 weeks on almost full-time ReC98 work,
I've also re-opened the store until October!
Wait, PI for FUUIN.EXE is mainly blocked by the high score
menu? That one should really be properly decompiled in a separate
RE push, since it's also present in largely identical form in
REIIDEN.EXE… but I currently lack the explicit funding to do
that.
And as it turns out, I shouldn't really capture any of the existing generic
RE contributions for it either. Back in 2018 when I ran the crowdfunding
on the Touhou Patch Center Discord server, I said that generic RE
contributions would never go towards TH01. No one was interested in that
game back then, and as it's significantly different from all the other
games, it made sense to only cover it if explicitly requested.
As Touhou Patch Center still remains one of the biggest supporters and
advertisers for ReC98, someone recently believed that this rule was still
in effect, despite not being mentioned anywhere on this website.
Fast forward to today, and TH01 has become the single most supported game
lately, with plenty of incomplete pushes still open to be completed.
Reverse-engineering it has proven to be quite efficient, yielding lots of
completion percentage points per push. This, I suppose, is exactly what
backers that don't give any specific priorities are mainly interested in.
Therefore, I will allocate future partial
contributions to TH01, whenever it makes sense.
So, instead of rushing TH01 PI, let's wait for Ember2528's
April subscription, and get the 25% total RE milestone with some TH05 PI
progress instead. This one primarily focused on the gather circles
(spirals…?), the third-last missing entity type in TH05. These are
rendered using the same 8×8 pellet sprite introduced in TH02… except that
the actual pellets received a darkened bottom part in TH04
.
Which, in turn, is actually rendered quite efficiently – the games first
render the top white part of all pellets, followed by the bottom gray part
of all pellets. The PC-98 GRCG is used throughout the process, doing its
typical job of accelerating monochrome blitting, and by arranging the
rendering like this, only two GRCG color changes are required to draw any
number of pellets. I guess that makes it quite a worthwhile
optimization? Don't ask me for specific performance numbers or even saved
cycles, though
Sadly, we've already reached the end of fast triple-speed TH01 progress
with 📝 the last push, which decompiled the
last segment shared by all three of TH01's executables. There's still a
bit of double-speed progress left though, with a small number of code
segments that are shared between just two of the three executables.
At the end of the first one of these, we've got all the code for the .GRZ
format – which is yet another run-length encoded image format, but this
time storing up to 16 full 640×400 16-color images with an alpha bit. This
one is exclusively used to wastefully store Konngara's sword slash and
kuji-in kill
animations. Due to… suboptimal code organization, the code for the format
is also present in OP.EXE, despite not being used there. But
hey, that brings TH01 to over 20% in RE!
Decoupling the RLE command stream from the pixel data sounds like a nice
idea at first, allowing the format to efficiently encode a variety of
animation frames displayed all over the screen… if ZUN actually made
use of it. The RLE stream also has quite some ridiculous overhead,
starting with 1 byte to store the 1-bit command (putting a single 8×1
pixel block, or entering a run of N such blocks). Run commands then store
another 1-byte run length, which has to be followed by another
command byte to identify the run as putting N blocks, or skipping N blocks.
And the pixel data is just a sequence of these blocks for all 4 bitplanes,
in uncompressed form…
Also, have some rips of all the images this format is used for:
To make these, I just wrote a small viewer, calling the same decompiled
TH01 code: 2020-03-07-grzview.zip
Obviously, this means that it not only must to be run on a PC-98, but also
discards the alpha information.
If any backers are really interested in having a proper converter
to and from PNG, I can implement that in an upcoming push… although that
would be the perfect thing for outside contributors to do.
Next up, we got some code for the PI format… oh, wait, the actual files
are called "GRP" in TH01.
So, where to start? Well, TH04 bullets are hard, so let's
procrastinate start with TH03 instead
The 📝 sprite display functions are the
obvious blocker for any structure describing a sprite, and therefore most
meaningful PI gains in that game… and I actually did manage to fit a
decompilation of those three functions into exactly the amount of time
that the Touhou Patch Center community votes alloted to TH03
reverse-engineering!
And a pretty amazing one at that. The original code was so obviously
written in ASM and was just barely decompilable by exclusively using
register pseudovariables and a bit of goto, but I was able to
abstract most of that away, not least thanks to a few helpful optimization
properties of Turbo C++… seriously, I can't stop marveling at this ancient
compiler. The end result is both readable, clear, and dare I say
portable?! To anyone interested in porting TH03,
take a look. How painful would it be to port that away from 16-bit
x86?
However, this push is also a typical example that the RE/PI priorities can
only control what I look at, and the outcome can actually differ
greatly. Even though the priorities were 65% RE and 35% PI, the progress
outcome was +0.13% RE and +1.35% PI. But hey, we've got one more push with
a focus on TH03 PI, so maybe that one will include more RE than
PI, and then everything will end up just as ordered?