And we're back to PC-98 Touhou for a brief interruption of the ongoing Shuusou Gyoku Linux port.
Let's clear some of the Touhou-related progress from the backlog, and use
the unconstrained nature of these contributions to prepare the
📝 upcoming non-ASCII translations commissioned by Touhou Patch Center.
The current budget won't cover all of my ambitions, but it would at least be
nice if all text in these games was feasibly translatable by the time I
officially start working on that project.
At a little over 3 pushes, it might be surprising to see that this took
longer than the
📝 TH03/TH04/TH05 cutscene system. It's
obvious that TH02 started out with a different system for in-game dialog,
but while TH04 and TH05 look identical on the surface, they only
actually share 30% of their dialog code. So this felt more like decompiling
2.4 distinct systems, as opposed to one identical base with tons of
game-specific differences on top.
The table of contents was pretty popular last time around, so let's have
another one:
Let's start with the ones from TH04 and TH05, since they are not that
broken. For TH04, ZUN started out by copy-pasting the cutscene system,
causing the result to inherit many of the caveats I already described in the
cutscene blog post:
It's still a plaintext format geared exclusively toward full-width
Japanese text.
The parser still ignores all whitespace, forcing ASCII text into hacks
with unassigned Shift-JIS lead bytes outside the second byte of a 2-byte
chunk.
Commands are still preceded by a 0x5C byte, which renders
as either a \ or a ¥ depending on your font and
interpretation of Shift-JIS.
Command parameters are parsed in exactly the same way, with all the same
limits.
A lot of the same script commands are identical, including 7 of them
that were not used in TH04's original dialog scripts.
Then, however, he greatly simplified the system. Mainly, this was done by
moving text rendering from the PC-98 graphics chip to the text chip, which
avoids the need for any text-related unblitting code, but ZUN also added a
bunch of smaller changes:
The player must advance through every dialog box by releasing any held
keys and then pressing any key mapped to a game action. There are no
timeouts.
The delay for every 2 bytes of text was doubled to 2 frames, and can't
be overridden.
Instead of holding ESC to fast-forward, pressing any key
will immediately print the entire rest of a text box.
Dialogs run in their own single-buffered frame loop, interrupting the
rest of the game. The other VRAM page keeps the background pixels required
for unblitting the face images.
All script commands that affect the graphics layer are preceded by a
1-frame delay. ZUN most likely did this because of the single-buffered
nature, as it prevents tearing on the first frame by waiting for the CRT
beam to return to the top-left corner before changing any pixels.
Both boxes are intended to contain up to 30 half-width characters on
each of their up to 3 lines, but nothing in the code enforces these limits.
There is no support for automatic line breaks or starting new boxes.
TH05 then moved from TH04's plaintext scripts to the binary
.TX2 format while removing all the unused commands copy-pasted
from the cutscene system. Except for a
single additional command intended to clear a text box, TH05's dialog
system only supports a strict subset of the features of TH04's system.
This change also introduced the following differences compared to TH04:
The game now stores the dialog of all 4 playable characters in the same
file, with a (4 + 1)-word header that indicates the byte offset
and length of each character's script. This way, it can load only the one
script for the currently played character.
Since there is no need for whitespace in a binary format, you can now
use ASCII 0x20 spaces even as the first byte of a 2-byte text
chunk! 🥳
All command parameters are now mandatory.
Filenames are now passed directly by pointer to the respective game
function. Therefore, they now need to be null-terminated, but can in turn be
as long as
📝 the number of remaining bytes in the allocated dialog segment.
In practice though, the game still runs on DOS and shares its restriction of
8.3 filenames…
When starting a new dialog box, any existing text in the other box is
now colored blue.
Thanks to ZUN messing up the return values of the command-interpreting
switch function, you can effectively use only line break and gaiji commands in the middle of text. All other
commands do execute, but the interpreter then also treats their command byte
as a Shift-JIS lead byte and places it in text RAM together with whatever
other byte follows in the script.
This is why TH04 can and does put its \= commandsinto the boxes
started with the 0 or 1 commands, but TH05 has to
put its 0x02 commands before the equivalent 0x0D.
For modding these files, you probably want to use TXDEF from
-Tom-'s MysticTK. It decodes these
files into a text representation, and its encoder then takes care of the
character-specific byte offsets in the 10-byte header. This text
representation simplifies the format a lot by avoiding all corner cases and
landmines you'd experience during hex-editing – most notably by interpreting
the box-starting 0x0D as a
command to show text that takes a string parameter, avoiding the broken
calls to script commands in the middle of text. However, you'd still have to
manually ensure an even number of bytes on every line of text.
In the entry function of TH05's dialog loop, we also encounter the hack that
is responsible for properly handling
📝 ZUN's hidden Extra Stage replay. Since the
dialog loop doesn't access the replay inputs but still requires key presses
to advance through the boxes, ZUN chose to just skip the dialog altogether in the
specific case of the Extra Stage replay being active, and replicated all
sprite management commands from the dialog script by just hardcoding
them.
And you know what? Not only do I not mind this hack, but I would have
preferred it over the actual dialog system! The aforementioned sprite
management commands effectively boil down to manual memory management,
deallocating all stage enemy and midboss sprites and thus ensuring that the
boss sprites end up at specific master.lib sprite IDs (patnums). The
hardcoded boss rendering function then expects these sprites to be available
at these exact IDs… which means that the otherwise hardcoded bosses can't
render properly without the dialog script running before them.
There is absolutely no excuse for the game to burden dialog scripts with
this functionality. Sure, delayed deallocation would allow them to blit
stage-specific sprites, but the original games don't do that; probably
because none of the two games feature an unblitting command. And even if
they did, it would have still been cleaner to expose the boss-specific
sprite setup as a single script command that can then also be called from
game code if the script didn't do so. Commands like these just are a recipe
for crashes, especially with parsers that expect fullwidth Shift-JIS
text and where misaligned ASCII text can easily cause these commands to be
skipped.
But then again, it does make for funny screenshot material if you
accidentally the deallocation and then see bosses being turned into stage
enemies:
With all the general details out of the way, here's the command reference:
0 1
0x00 0x01
Selects either the player character (0) or the boss (1) as the
currently speaking character, and moves the cursor to the beginning of
the text box. In TH04, this command also directly starts the new dialog
box, which is probably why it's not prefixed with a \ as it
only makes sense outside of text. TH05 requires a separate 0x0D command to do the
same.
\=1
0x02 0x!!
Replaces the face portrait of the currently active speaking
character with image #1 within her .CD2
file.
\=255
0x02 0xFF
Removes the face portrait from the currently active text box.
\l,filename
0x03 filename 0x00
Calls master.lib's super_entry_bfnt() function, which
loads sprites from a BFNT file to consecutive IDs starting at the
current patnum write cursor.
\c
0x04
Deallocates all stage-specific BFNT sprites (i.e., stage enemies and
midbosses), freeing up conventional RAM for the boss sprites and
ensuring that master.lib's patnum write cursor ends up at
128 /
180.
In TH05's Extra Stage, this command also replaces
📝 the sprites loaded from MIKO16.BFT with the ones from ST06_16.BFT.
\d
Deallocates all face portrait images.
The game automatically does this at the end of each dialog sequence.
However, ZUN wanted to load Stage 6 Yuuka's 76 KiB of additional
animations inside the script via \l, and would have once again
run up against the master.lib heap size limit without that extra free
memory.
\m,filename
0x05 filename 0x00
Stops the currently playing BGM, loads a new one from the given
file, and starts playback.
\m$
0x05 $ 0x00
Stops the currently playing BGM.
Note that TH05 interprets $ as a null-terminated filename as
well.
\m*
Restarts playback of the currently loaded BGM from the
beginning.
\b0,0,0
0x06 0x!!!!0x!!!!0x!!
Blits the master.lib patnum with the ID indicated by the third
parameter to the current VRAM page at the top-left screen position
indicated by the first two parameters.
\e0
Plays the sound effect with the given ID.
\t100
Sets palette brightness via master.lib's
palette_settone() to any value from 0 (fully black) to 200
(fully white). 100 corresponds to the palette's original colors.
\fo1
\fi1
Calls master.lib's palette_black_out() or
palette_black_in() to play a hardware palette fade
animation from or to black, spending roughly 1 frame on each of the 16 fade steps.
\wo1
\wi1
0x09 0x!!
0x0A 0x!!
Calls master.lib's palette_white_out() or
palette_white_in() to play a hardware palette fade
animation from or to white, spending roughly 1 frame on each of the 16 fade steps. The
TH05 version of 0x09 also clears the text in both boxes
before the animation.
\n
0x0B
Starts a new line by resetting the X coordinate of the TRAM cursor
to the left edge of the text area and incrementing the Y coordinate.
The new line will always be the next one below the last one that was
properly started, regardless of whether the text previously wrapped to
the next TRAM row at the edge of the screen.
\g8
Plays a blocking 8-frame screen shake
animation. Copy-pasted from the cutscene parser, but actually used right
at the end of the dialog shown before TH04's Bad Ending.
\ga0
0x0C 0x!!
Shows the gaiji with the given ID from 0 to 255
at the current cursor position, ignoring the per-glyph delay.
\k0
Waits 0 frames (0 = forever) for any key
to be pressed before continuing script execution.
Takes the current dialog cursor as the top-left corner of a
240×48-pixel rectangle, and replaces all text RAM characters within that
rectangle with whitespace.
This is only used to clear the player character's text box before
Shinki's final いくよ‼ box. Shinki has two
consecutive text boxes in all 4 scripts here, and ZUN probably wanted to
clear the otherwise blue text to imply a dramatic pause before Shinki's
final sentence. Nice touch.
(You could, however, also use it after a
box-ending 0xFF command to mess with text RAM in
general.)
\#
Quits the currently running loop. This returns from either the text
loop to the command loop, or it ends the dialog sequence by returning
from the command loop back to gameplay. If this stage of the game later
starts another dialog sequence, it will start at the next script
byte.
\$
Like \#, but first waits for any key to be
pressed.
0xFF
Behaves like TH04's \$ in the text loop, and like
\# in the command loop. Hence, it's not possible in TH05 to
automatically end a text box and advance to the next one without waiting
for a key press.
Unused commands are in gray.
At the end of the day, you might criticize the system for how its landmines
make it annoying to mod in ASCII text, but it all works and does what it's
supposed to. ZUN could have written the cleanest single and central
Shift-JIS iterator that properly chunks a byte buffer into halfwidth and
fullwidth codepoints, and I'd still be throwing it out for the upcoming
non-ASCII translations in favor of something that either also supports UTF-8
or performs dictionary lookups with a full box of text.
The only actual bug can be found in the input detection, which once
again doesn't correctly handle the infamous key
up/key down scancode quirk of PC-98 keyboards. All it takes
is one wrongly placed input polling call, and suddenly you have to think
about how the update cycle behind the PC-98 keyboard state bytes
might cause the game to run the regular 2-frame delay for a single
2-byte chunk of text before it shows the full text of a box after
all… But even this bug is highly theoretical and could probably only be
observed very, very rarely, and exclusively on real hardware.
The same can't be said about TH02 though, but more on that later. Let's
first take a look at its data, which started out much simpler in that game.
The STAGE?.TXT files contain just raw Shift-JIS text with no
trace of commands or structure. Turning on the whitespace display feature in
your editor reveals how the dialog system even assumes a fixed byte
length for each box: 36 bytes per line which will appear on screen, followed
by 4 bytes of padding, which the original files conveniently use to visually
split the lines via a CR/LF newline sequence. Make sure to disable trimming
of trailing whitespace in your editor to not ruin the file when modding the
text…
Consequently, everything else is hardcoded – every effect shown between text
boxes, the face portrait shown for each box, and even how many boxes are
part of each dialog sequence. Which means that the source code now contains
a
long hardcoded list of face IDs for most of the text boxes in the game,
with the rest being part of the
dedicated hardcoded dialog scripts for 2/3 of the
game's stages.
Without the restriction to a fixed set of scripting commands, TH02 naturally
gravitated to having the most varied dialog sequences of all PC-98 Touhou
games. This flexibility certainly facilitated Mima's grand entrance
animation in Stage 4, or the different lines in Stage 4 and 5 depending on
whether you already used a continue or not. Marisa's post-boss dialog even
inserts the number of continues into the text itself – by, you guessed it,
writing to hardcoded byte offsets inside the dialog text before printing it
to the screen. But once again, I have nothing to
criticize here – not even the fact that the alternate dialog scripts have to
mutate the "box cursor" to jump to the intended boxes within the file. I
know that some people in my audience like VMs, but I would have considered
it more bloated if ZUN had implemented a full-blown scripting
language just to handle all these special cases.
Another unique aspect of TH02 is the way it stores its face portraits, which
are infamous for how hard they are to find in the original data files. These
sprites are actually map tiles, stored in MIKO_K.MPN,
and drawn using the same functions used to blit the regular map tiles to the
📝 tile source area in VRAM. We can only guess
why ZUN chose this one out of the three graphics formats he used in TH02:
BFNT supports transparency, but sacrifices one of the 16 colors to do
so. ZUN only used 15 colors for the face portraits, but might have wanted to
keep open the option to use that 16th color. The detailed
backgrounds also suggest that these images were never supposed to be
transparent to begin with.
PI is used for all bigger and non-transparent images, but ZUN would have
had to write a separate small function to blit a 48×48 subsection of such an
image. That certainly wouldn't have stopped him in the TH01 days, but he
probably was already past that point by this game.
That only leaves .MPN. Sure, he did have to slice each face into 9
separate 16×16 "map" tiles to use this format, but that's a small price to
pay in exchange for not having to write any new low-level blitting code,
especially since he must have already had an asset pipeline to generate
these files.
And since you're certainly wondering about all these black tiles at the
edges: Yes, these are not only part of the file and pad it from the required
240×192 pixels to 256×256, but also kept in memory during a stage, wasting
9.5 KiB of conventional RAM. That's 172 seconds of potential input
replay data, just for those people who might still think that we need EMS
for replays.
Alright, we've got the text, we've got the faces, let's slide in the box and
display it all on screen. Apparently though, we also have to blit the player
and option sprites using raw, low-level master.lib function calls in the
process? This can't be right, especially because ZUN
always blits the option sprite associated with the Reimu-A shot type,
regardless of which one the player actually selected. And if you keep moving
above the box area before the dialog starts, you get to see exactly how
wrong this is:
Let's look closer at Reimu's sprite during the slide-in animation, and in
the two frames before:
This one image shows off no less than 4 bugs:
ZUN blits the stationary player sprite here, regardless of whether the
player was previously moving left or right. This is a nice way of indicating
that Reimu stops moving once the dialog starts, but maybe ZUN should
have unblitted the old sprite so that the new one wouldn't have appeared on
top. The game only unblits the 384×64 pixels covered by the dialog box on
every frame of the slide-in animation, so Reimu would only appear correctly
if her sprite happened to be entirely located within that area.
All sprites are shifted up by 1 pixel in frame 2️⃣. This one is not a
bug in the dialog system, but in the main game loop. The game runs the
relevant actions in the following order:
Invalidate any map tiles covered by entities
Redraw invalidated tiles
Decrement the Y coordinate at the top of VRAM according to the
scroll speed
Update and render all game entities
Scroll in new tiles as necessary according to the scroll speed, and
report whether the game has scrolled one pixel past the end of the
map
If that happened, pretend it didn't by incrementing the value
calculated in #3 for all further frames and skipping to
#8.
Issue a GDC SCROLL command to reflect the line
calculated in #3 on the display
Wait for VSync
Flip VRAM pages
Start boss if we're past the end of the map
The problem here: Once the dialog starts, the game has already rendered
an entire new frame, with all sprites being offset by a new Y scroll
offset, without adjusting the graphics GDC's scroll registers to
compensate. Hence, the Y position in 3️⃣ is the correct one, and the
whole existence of frame 2️⃣ is a bug in itself. (Well… OK, probably a
quirk because speedrunning exists, and it would be pretty annoying to
synchronize any video regression tests of the future TH02 Anniversary
Edition if it renders one fewer frame in the middle of a stage.)
ZUN blits the option sprites to their position from frame 1️⃣. This
brings us back to
📝 TH02's special way of retaining the previous and current position in a two-element array, indexed with a VRAM page ID.
Normally, this would be equivalent to using dedicated prev and
cur structure fields and you'd just index it with the back page
for every rendering call. But if you then decide to go single-buffered for
dialogs and render them onto the front page instead…
Note that fixing bug #2 would not cancel out this one – the sprites would
then simply be rendered to their position in the frame before 1️⃣.
And of course, the fixed option sprite ID also counts as a bug.
As for the boxes themselves, it's yet another loop that prints 2-byte chunks
of Shift-JIS text at an even slower fixed interval of 3 frames. In an
interesting quirk though, ZUN assumes that every box starts with the name of
the speaking character in its first two fullwidth Shift-JIS characters,
followed by a fullwidth colon. These 6 bytes are displayed immediately at
the start of every box, without the usual delay. The resulting alignment
looks rather janky with Genjii, whose single right-padded 亀
kanji looks quite awkward with the fullwidth space between the name
and the colon. Kind of makes you wonder why ZUN just didn't spell out his
proper name, 玄爺, instead, but I get the stylistic
difference.
In Stage 4, the two-kanji assumption then breaks with Marisa's three-kanji
name, which causes the full-width colon to be printed as the first delayed
character in each of her boxes:
That's all the issues and quirks in the system itself. The scripts
themselves don't leave much room for bugs as they basically just loop over
the hardcoded face ID array at this level… until we reach the end of the
game. Previously, the slide-in animation could simply use the tile
invalidation and re-rendering system to unblit the box on each frame, which
also explained why Reimu had to be separately rendered on top. But this no
longer works with a custom-rendered boss background, and so the game just
chooses to flood-fill the area with graphics chip color #0:
For Mima's final defeat dialog though, ZUN chose to not even show the box.
He might have realized the issue by that point, or simply preferred the more
dramatic effect this had on the lines. The resulting issues, however, might
even have ramifications for such un-technical things as lore and
character dynamics. As it turns out, the code
for this dialog sequence does in fact render Mima's smiling face for all
boxes?! You only don't see it in the original game because it's rendered to
the other VRAM page that remains invisible during the dialog sequence:
Here's how I interpret the situation:
The function that launches into the final part of the dialog script
starts with dedicated
code to re-render Mima to the back page, on top of the previously
rendered planet background. Since the entire script runs on the front
page (and thus, on top of the previous frame) and the game launches into
the ending immediately after, you don't ever get to see this new partial
frame in the original game.
Showing this partial frame would also ensure that you can actually
read the dialog text without a surrounding box. Then, the white
letters won't ever be put on top of any white bullets – or, worse, be completely invisible if the
dialog is triggered in the middle of Reimu-B's bomb animation, which
fills VRAM with lots of white pixels.
Hence, we've got enough evidence to classify not showing the back page
as a ZUN
bug. 🐞
However, Mima's smiling face jars with the words she says here. Adding
the face would deviate more significantly from the original game than
removing the player shot, item, bullet, or spark sprites would. It's
imaginable that ZUN just forgot about the dedicated code that
re-rendered just Mima to the back page, but the faces add
something to the dialog, and ZUN would have clearly noticed and
fixed it if their absence wasn't intended. Heck, ZUN might have just put
something related to Mima into the code because TH02's dialog system has
no way of not drawing a face for a dialog box. Filling the face
area with graphics chip color #0, as seen in the first and third boxes
of the Extra Stage pre-boss dialog, would have been an alternative, but
that would have been equally wrong with regard to the background.
Hence, the invisible face portrait from the original game is a ZUN
quirk. 🎺
So, the future TH02 Anniversary Edition will fix the bug by showing
the back page, but retain the quirk by rewriting the dialog code to
not blit the face.
And with that, we've secured all in-game dialog for the upcoming non-ASCII
translations! The remaining 2/3 of the last push made
for a good occasion to also decompile the small amount of code related to
TH03's win messages, stored in the @0?TX.TXT files. Similar to
TH02's dialog format, these files are also split into fixed-size blocks of
3×60 bytes. But this time, TH03 loads all 60 bytes of a line, including the
CR/LF line breaking codepoints in the original files, into the statically
allocated buffer that it renders from. These control characters are then
only filtered to whitespace by ZUN's graph_putsa_fx() function.
If you remove the line breaks, you get to use the full 60 bytes on every
line.
The final commits went to the MIKO.CFG loading and saving
functions used in TH04's and TH05's OP.EXE, as well as TH04's
game startup code to finally catch up with
📝 TH05's counterpart from over 3 years ago.
This brought us right in front of the main menu rendering code in both TH04
and TH05, which is identical in both games and will be tackled in the next
PC-98 Touhou delivery.
Next up, though: Returning to Shuusou Gyoku, and adding support for SC-88Pro
recordings as BGM. Which may or may not come with a slight controversy…
P0227
TH05 decompilation (Sara) / Research (Relativity of near references)
P0228
TH05 finalization (Lasers)
💰 Funded by:
nrook, [Anonymous]
🏷️ Tags:
Starting the year with a delivery that wasn't delayed until the last
day of the month for once, nice! Still, very soon and
high-maintenance did not go well together…
It definitely wasn't Sara's fault though. As you would expect from a Stage 1
Boss, her code was no challenge at all. Most of the TH02, TH04, and TH05
bosses follow the same overall structure, so let's introduce a new table to
replace most of the boilerplate overview text:
Phase #
Patterns
HP boundary
Timeout condition
(Entrance)
4,650
288 frames
2
4
2,550
2,568 frames
(= 32 patterns)
3
4
450
5,296 frames
(= 24 patterns)
4
1
0
1,300 frames
Total
9
9,452 frames
In Phases 2 and 3, Sara cycles between waiting, moving randomly for a
fixed 28 frames, and firing a random pattern among the 4 phase-specific
ones. The pattern selection makes sure to never
pick any pattern twice in a row. Both phases contain spiral patterns that
only differ in the clockwise or counterclockwise turning direction of the
spawner; these directions are treated as individual unrelated patterns, so
it's possible for the "same" pattern to be fired multiple times in a row
with a flipped direction.
The two phases also differ in the wait and pattern durations:
In Phase 2, the wait time starts at 64 frames and decreases by 12
frames after the first 5 patterns each, ending on a minimum of 4 frames.
In Phase 3, it's a constant 16 frames instead.
All Phase 2 patterns are fired for 28 frames, after a 16-frame
gather animation. The Phase 3 pattern time starts at 80 frames and
increases by 24 frames for the first 6 patterns, ending at 200 frames
for all later ones.
Phase 4 consists of the single laser corridor pattern with additional
random bullets every 16 frames.
And that's all the gameplay-relevant detail that ZUN put into Sara's code. It doesn't even make sense to describe the remaining
patterns in depth, as their groups can significantly change between
difficulties and rank values. The
📝 general code structure of TH05 bosses
won't ever make for good-code, but Sara's code is just a
lesser example of what I already documented for Shinki.
So, no bugs, no unused content, only inconsequential bloat to be found here,
and less than 1 push to get it done… That makes 9 PC-98 Touhou bosses
decompiled, with 22 to go, and gets us over the sweet 50% overall
finalization mark! 🎉 And sure, it might be possible to pass through the
lasers in Sara's final pattern, but the boss script just controls the
origin, angle, and activity of lasers, so any quirk there would be part of
the laser code… wait, you can do what?!?
TH05 expands TH04's one-off code for Yuuka's Master and Double Sparks into a
more featureful laser system, and Sara is the first boss to show it off.
Thus, it made sense to look at it again in more detail and finalize the code
I had purportedly
📝 reverse-engineered over 4 years ago.
That very short delivery notice already hinted at a very time-consuming
future finalization of this code, and that prediction certainly came true.
On the surface, all of the low-level laser ray rendering and
collision detection code is undecompilable: It uses the SI and
DI registers without Turbo C++'s safety backups on the stack,
and its helper functions take their input and output parameters from
convenient registers, completely ignoring common calling conventions. And
just to raise the confusion even further, the code doesn't just set
these registers for the helper function calls and then restores their
original values, but permanently shifts them via additions and
subtractions. Unfortunately, these convenient registers also include the
BP base pointer to the stack frame of a function… and shifting
that register throws any intuition behind accessed local variables right out
of the window for a good part of the function, requiring a correctly shifted
view of the stack frame just to make sense of it again.
How could such code even have been written?! This
goes well beyond the already wrong assumption that using more stack space is
somehow bad, and straight into the territory of self-inflicted pain.
So while it's not a lot of instructions, it's quite dense and really hard to
follow. This code would really benefit from a decompilation that
anchors all this madness as much as possible in existing C++ structures… so
let's decompile it anyway?
Doing so would involve emitting lots of raw machine code bytes to hide the
SI and DI registers from the compiler, but I
already had a certain
📝 batshit insane compiler bug workaround abstraction
lying around that could make such code more readable. Hilariously, it only
took this one additional use case for that abstraction to reveal itself as
premature and way too complicated. Expanding
the core idea into a full-on x86 instruction generator ended up simplifying
the code structure a lot. All we really want there is a way to set all
potential parameters to e.g. a specific form of the MOV
instruction, which can all be expressed as the parameters to a force-inlined
__emit__() function. Type safety can help by providing
overloads for different operand widths here, but there really is no need for
classes, templates, or explicit specialization of templates based on
classes. We only need a couple of enums with opcode, register,
and prefix constants from the x86 reference documentation, and a set of
associated macros that token-paste pseudoregisters onto the prefixes of
these enum constants.
And that's how you get a custom compile-time assembler in a 1994 C++
compiler and expand the limits of decompilability even further. What's even
truly left now? Self-modifying code, layout tricks that can't be replicated
with regularly structured control flow… and that's it. That leaves quite a
few functions I previously considered undecompilable to be revisited once I
get to work on making this game more portable.
With that, we've turned the low-level laser code into the expected horrible
monstrosity that exposes all the hidden complexity in those few ASM
instructions. The high-level part should be no big deal now… except that
we're immediately bombarded with Fixup overflow errors at link
time? Oh well, time to finally learn the true way of fixing this highly
annoying issue in a second new piece of decompilation tech – and one
that might actually be useful for other x86 Real Mode retro developers at
that.
Earlier in the RE history of TH04 and TH05, I often wrote about the need to
split the two original code segments into multiple segments within two
groups, which makes it possible to slot in code from different
translation units at arbitrary places within the original segment. If we
don't want to define a unique segment name for each of these slotted-in
translation units, we need a way to set custom segment and group names in C
land. Turbo C++ offers two #pragmas for that:
#pragma option -zCsegment -zPgroup – preferred in most
cases as it's equivalent to setting the default segment and group via the
command line, but can only be used at the beginning of a translation unit,
before the first non-preprocessor and non-comment C language token
#pragma codeseg segment <group> – necessary if a
translation unit needs to emit code into two or more segments
For the most part, these #pragmas work well, but they seemed to
not help much when it came to calling near functions declared
in different segments within the same group. It took a bit of trial and
error to figure out what was actually going on in that case, but there
is a clear logic to it:
Symbols are allocated to the segment and group that's active during
their first appearance, no matter whether that appearance is a declaration
or definition. Any later appearance of the function in a different segment
is ignored.
The linker calculates the 16-bit offsets of such references relative to
the symbol's declared segment, not its actual one. Turbo C++ does
not show an error or warning if the declared and actual segments are
different, as referencing the same symbol from multiple segments is a valid
use case. The linker merely throws the Fixup overflow error if
the calculated distance exceeds 64 KiB and thus couldn't possibly fit
within a near reference. With a wrong segment declaration
though, your code can be incorrect long before a fixup hits that limit.
Summarized in code:
#pragma option -zCfoo_TEXT -zPfoo
void bar(void);
void near qux(void); // defined somewhere else, maybe in a different segment
#pragma codeseg baz_TEXT baz
// Despite the segment change in the line above, this function will still be
// put into `foo_TEXT`, the active segment during the first appearance of the
// function name.
void bar(void) {
}
// This function hasn't been declared yet, so it will go into `baz_TEXT` as
// expected.
void baz(void) {
// This `near` function pointer will be calculated by subtracting the
// flat/linear address of qux() inside the binary from the base address
// of qux()'s declared segment, i.e., `foo_TEXT`.
void (near *ptr_to_qux)(void) = qux;
}
So yeah, you might have to put #pragma codeseg into your
headers to tell the linker about the correct segment of a
near function in advance. 🤯 This is an important insight for
everyone using this compiler, and I'm shocked that none of the Borland C++
books documented the interaction of code segment definitions and
near references at least at this level of clarity. The TASM
manuals did have a few pages on the topic of groups, but that syntax
obviously doesn't apply to a C compiler. Fixup overflows in particular are
such a common error and really deserved better than the unhelpful 🤷
of an explanation that ended up in the User's Guide. Maybe this whole
technique of custom code segment names was considered arcane even by 1993,
judging from the mere three sentences that #pragma codeseg was
documented with? Still, it must have been common knowledge among Amusement
Makers, because they couldn't have built these exact binaries without
knowing about these details. This is the true solution to
📝 any issues involving references to near functions,
and I'm glad to see that ZUN did not in fact lie to the compiler. 👍
OK, but now the remaining laser code compiles, and we get to write
C++ code to draw some hitboxes during the two collision-detected states of
each laser. These confirm what the low-level code from earlier already
uncovered: Collision detection against lasers is done by testing a
12×12-pixel box at every 16 pixels along the length of a laser, which leaves
obvious 4-pixel gaps at regular intervals that the player can just pass
through. This adds
📝 yet📝 another📝 quirk to the growing list of quirks that
were either intentional or must have been deliberately left in the game
after their initial discovery. This is what constants were invented for, and
there really is no excuse for not using them – especially during
intoxicated coding, and/or if you don't have a compile-time abstraction for
Q12.4 literals.
Using subpixel coordinates in collision detection also introduces a slight
inaccuracy into any hitbox visualization recorded in-engine on a 16-color
PC-98. Since we have to render discrete pixels, we cannot exactly place a
Q12.4 coordinate in the 93.75% of cases where the fractional part is
non-zero. This is why pretty much every laser segment hitbox in the video
above shows up as 7×7 rather than 6×6: The actual W×H area of each box is 13
pixels smaller, but since the hitbox lies between these pixels, we
cannot indicate where it lies exactly, and have to err on the
side of caution. It's also why Reimu's box slightly changes size as she
moves: Her non-diagonal movement speed is 3.5 pixels per frame, and the
constant focused movement in the video above halves that to 1.75 pixels,
making her end up on an exact pixel every 4 frames. Looking forward to the
glorious future of displays that will allow us to scale up the playfield to
16× its original pixel size, thus rendering the game at its exact internal
resolution of 6144×5888 pixels. Such a port would definitely add a lot of
value to the game…
The remaining high-level laser code is rather unremarkable for the most
part, but raises one final interesting question: With no explicitly defined
limit, how wide can a laser be? Looking at the laser structure's 1-byte
width field and the unsigned comparisons all throughout the update and
rendering code, the answer seems to be an obvious 255 pixels. However, the
laser system also contains an automated shrinking state, which can be most
notably seen in Mai's wheel pattern. This state shrinks a laser by 2 pixels
every 2 frames until it reached a width of 0. This presents a problem with
odd widths, which would fall below 0 and overflow back to 255 due to the
unsigned nature of this variable. So rather than, I don't know, treating
width values of 0 as invalid and stopping at a width of 1, or even adding a
condition for that specific case, the code just performs a signed
comparison, effectively limiting the width of a shrinkable laser to a
maximum of 127 pixels. This small signedness
inconsistency now forces the distinction between shrinkable and
non-shrinkable lasers onto every single piece of code that uses lasers. Yet
another instance where
📝 aiming for a cinematic 30 FPS look
made the resulting code much more complicated than if ZUN had just evenly
spread out the subtraction across 2 frames. 🤷
Oh well, it's not as if any of the fixed lasers in the original scripts came
close to any of these limits. Moving lasers are much more streamlined and
limited to begin with: Since they're hardcoded to 6 pixels, the game can
safely assume that they're always thinner than the 28 pixels they get
gradually widened to during their decay animation.
Finally, in case you were missing a mention of hitboxes in the previous
paragraph: Yes, the game always uses the aforementioned 12×12 boxes,
regardless of a laser's width.
That was what, 50% of this blog post just being about complications that
made laser difficult for no reason? Next up: The first TH01 Anniversary
Edition build, where I finally get to reap the rewards of having a 100%
decompiled game and write some good code for once.
More than three months without any reverse-engineering progress! It's been
way too long. Coincidentally, we're at least back with a surprising 1.25% of
overall RE, achieved within just 3 pushes. The ending script system is not
only more or less the same in TH04 and TH05, but actually originated in
TH03, where it's also used for the cutscenes before stages 8 and 9. This
means that it was one of the final pieces of code shared between three of
the four remaining games, which I got to decompile at roughly 3× the usual
speed, or ⅓ of the price.
The only other bargains of this nature remain in OP.EXE. The
Music Room is largely equivalent in all three remaining games as well, and
the sound device selection, ZUN Soft logo screens, and main/option menus are
the same in TH04 and TH05. A lot of that code is in the "technically RE'd
but not yet decompiled" ASM form though, so it would shift Finalized% more
significantly than RE%. Therefore, make sure to order the new
Finalization option rather than Reverse-engineering if you
want to make number go up.
So, cutscenes. On the surface, the .TXT files look simple enough: You
directly write the text that should appear on the screen into the file
without any special markup, and add commands to define visuals, music, and
other effects at any place within the script. Let's start with the basics of
how text is rendered, which are the same in all three games:
First off, the text area has a size of 480×64 pixels. This means that it
does not correspond to the tiled area painted into TH05's
EDBK?.PI images:
Since the font weight can be customized, all text is rendered to VRAM.
This also includes gaiji, despite them ignoring the font weight
setting.
The system supports automatic line breaks on a per-glyph basis, which
move the text cursor to the beginning of the red text area. This might seem like a piece of long-forgotten
ancient wisdom at first, considering the absence of automatic line breaks in
Windows Touhou. However, ZUN probably implemented it more out of pure
necessity: Text in VRAM needs to be unblitted when starting a new box, which
is way more straightforward and performant if you only need to worry
about a fixed area.
The system also automatically starts a new (key press-separated) text
box after the end of the 4th line. However, the text cursor is
also unconditionally moved to the top-left corner of the yellow name
area when this happens, which is almost certainly not what you expect, given
that automatic line breaks stay within the red area. A script author might
as well add the necessary text box change commands manually, if you're
forced to anticipate the automatic ones anyway…
Due to ZUN forgetting an unblitting call during the TH05 refactoring of the
box background buffer, this feature is even completely broken in that game,
as any new text will simply be blitted on top of the old one:
Overall, the system is geared toward exclusively full-width text. As
exemplified by the 2014 static English patches and the screenshots in this
blog post, half-width text is possible, but comes with a lot of
asterisks attached:
Each loop of the script interpreter starts by looking at the next
byte to distinguish commands from text. However, this step also skips
over every ASCII space and control character, i.e., every byte
≤ 32. If you only intend to display full-width glyphs anyway, this
sort of makes sense: You gain complete freedom when it comes to the
physical layout of these script files, and it especially allows commands
to be freely separated with spaces and line breaks for improved
readability. Still, enforcing commands to be separated exclusively by
line breaks might have been even better for readability, and would have
freed up ASCII spaces for regular text…
Non-command text is blindly processed and rendered two bytes at a
time. The rendering function interprets these bytes as a Shift-JIS
string, so you can use half-width characters here. While the
second byte can even be an ASCII 0x20 space due to the
parser's blindness, all half-width characters must still occur in pairs
that can't be interrupted by commands:
As a workaround for at least the ASCII space issue, you can replace
them with any of the unassigned
Shift-JIS lead bytes – 0x80, 0xA0, or
anything between 0xF0 and 0xFF inclusive.
That's what you see in all screenshots of this post that display
half-width spaces.
Finally, did you know that you can hold ESC to fast-forward
through these cutscenes, which skips most frame delays and reduces the rest?
Due to the blocking nature of all commands, the ESC key state is
only updated between commands or 2-byte text groups though, so it can't
interrupt an ongoing delay.
Superficially, the list of game-specific differences doesn't look too long,
and can be summarized in a rather short table:
It's when you get into the implementation that the combined three systems
reveal themselves as a giant mess, with more like 56 differences between the
games. Every single new weird line of code opened up
another can of worms, which ultimately made all of this end up with 24
pieces of bloat and 14 bugs. The worst of these should be quite interesting
for the general PC-98 homebrew developers among my audience:
The final official 0.23 release of master.lib has a bug in
graph_gaiji_put*(). To calculate the JIS X 0208 code point for
a gaiji, it is enough to ADD 5680h onto the gaiji ID. However,
these functions accidentally use ADC instead, which incorrectly
adds the x86 carry flag on top, causing weird off-by-one errors based on the
previous program state. ZUN did fix this bug directly inside master.lib for
TH04 and TH05, but still needed to work around it in TH03 by subtracting 1
from the intended gaiji ID. Anyone up for maintaining a bug-fixed master.lib
repository?
The worst piece of bloat comes from TH03 and TH04 needlessly
switching the visibility of VRAM pages while blitting a new 320×200 picture.
This makes it much harder to understand the code, as the mere existence of
these page switches is enough to suggest a more complex interplay between
the two VRAM pages which doesn't actually exist. Outside this visibility
switch, page 0 is always supposed to be shown, and page 1 is always used
for temporarily storing pixels that are later crossfaded onto page 0. This
is also the only reason why TH03 has to render text and gaiji onto both VRAM
pages to begin with… and because TH04 doesn't, changing the picture in the
middle of a string of text is technically bugged in that game, even though
you only get to temporarily see the new text on very underclocked PC-98
systems.
These performance implications made me wonder why cutscenes even bother with
writing to the second VRAM page anyway, before copying each crossfade step
to the visible one.
📝 We learned in June how costly EGC-"accelerated" inter-page copies are;
shouldn't it be faster to just blit the image once rather than twice?
Well, master.lib decodes .PI images into a packed-pixel format, and
unpacking such a representation into bitplanes on the fly is just about the
worst way of blitting you could possibly imagine on a PC-98. EGC inter-page
copies are already fairly disappointing at 42 cycles for every 16 pixels, if
we look at the i486 and ignore VRAM latencies. But under the same
conditions, packed-pixel unpacking comes in at 81 cycles for every 8
pixels, or almost 4× slower. On lower-end systems, that can easily sum up to
more than one frame for a 320×200 image. While I'd argue that the resulting
tearing could have been an acceptable part of the transition between two
images, it's understandable why you'd want to avoid it in favor of the
pure effect on a slower framerate.
Really makes me wonder why master.lib didn't just directly decode .PI images
into bitplanes. The performance impact on load times should have been
negligible? It's such a good format for
the often dithered 16-color artwork you typically see on PC-98, and
deserves better than master.lib's implementation which is both slow to
decode and slow to blit.
That brings us to the individual script commands… and yes, I'm going to
document every single one of them. Some of their interactions and edge cases
are not clear at all from just looking at the code.
Almost all commands are preceded by… well, a 0x5C lead byte.
Which raises the question of whether we should
document it as an ASCII-encoded \ backslash, or a Shift-JIS-encoded
¥ yen sign. From a gaijin perspective, it seems obvious that it's a
backslash, as it's consistently displayed as one in most of the editors you
would actually use nowadays. But interestingly, iconv
-f shift-jis -t utf-8 does convert any 0x5C
lead bytes to actual ¥ U+00A5 YEN SIGN code points
.
Ultimately, the distinction comes down to the font. There are fonts
that still render 0x5C as ¥, but mainly do so out
of an obvious concern about backward compatibility to JIS X 0201, where this
mapping originated. Unsurprisingly, this group includes MS Gothic/Mincho,
the old Japanese fonts from Windows 3.1, but even Meiryo and Yu
Gothic/Mincho, Microsoft's modern Japanese fonts. Meanwhile, pretty much
every other modern font, and freely licensed ones in particular, render this
code point as \, even if you set your editor to Shift-JIS. And
while ZUN most definitely saw it as a ¥, documenting this code
point as \ is less ambiguous in the long run. It can only
possibly correspond to one specific code point in either Shift-JIS or UTF-8,
and will remain correct even if we later mod the cutscene system to support
full-blown Unicode.
Now we've only got to clarify the parameter syntax, and then we can look at
the big table of commands:
Numeric parameters are read as sequences of up to 3 ASCII digits. This
limits them to a range from 0 to 999 inclusive, with 000 and
0 being equivalent. Because there's no further sentinel
character, any further digit from the 4th one onwards is
interpreted as regular text.
Filename parameters must be terminated with a space or newline and are
limited to 12 characters, which translates to 8.3 basenames without any
directory component. Any further characters are ignored and displayed as
text as well.
Each .PI image can contain up to four 320×200 pictures ("quarters") for
the cutscene picture area. In the script commands, they are numbered like
this:
0
1
2
3
\@
Clears both VRAM pages by filling them with VRAM color 0. 🐞
In TH03 and TH04, this command does not update the internal text area
background used for unblitting. This bug effectively restricts usage of
this command to either the beginning of a script (before the first
background image is shown) or its end (after no more new text boxes are
started). See the image below for an
example of using it anywhere else.
\b2
Sets the font weight to a value between 0 (raw font ROM glyphs) to 3
(very thicc). Specifying any other value has no effect.
🐞 In TH04 and TH05, \b3 leads to glitched pixels when
rendering half-width glyphs due to a bug in the newly micro-optimized
ASM version of
📝 graph_putsa_fx(); see the image below for an example.
In these games, the parameter also directly corresponds to the
graph_putsa_fx() effect function, removing the sanity check
that was present in TH03. In exchange, you can also access the four
dissolve masks for the bold font (\b2) by specifying a
parameter between 4 (fewest pixels) to 7 (most
pixels). Demo video below.
\c15
Changes the text color to VRAM color 15.
\c=字,15
Adds a color map entry: If 字 is the first code point
inside the name area on a new line, the text color is automatically set
to 15. Up to 8 such entries can be registered
before overflowing the statically allocated buffer.
🐞 The comma is assumed to be present even if the color parameter is omitted.
\e0
Plays the sound effect with the given ID.
\f
(no-op)
\fi1
\fo1
Calls master.lib's palette_black_in() or
palette_black_out() to play a hardware palette fade
animation from or to black, spending roughly 1 frame on each of the 16 fade steps.
\fm1
Fades out BGM volume via PMD's AH=02h interrupt call,
in a non-blocking way. The fade speed can range from 1 (slowest) to 127 (fastest).
Values from 128 to 255 technically correspond to
AH=02h's fade-in feature, which can't be used from cutscene
scripts because it requires BGM volume to first be lowered via
AH=19h, and there is no command to do that.
\g8
Plays a blocking 8-frame screen shake
animation.
\ga0
Shows the gaiji with the given ID from 0 to 255
at the current cursor position. Even in TH03, gaiji always ignore the
text delay interval configured with \v.
@3
TH05's replacement for the \ga command from TH03 and
TH04. The default ID of 3 corresponds to the
gaiji. Not to be confused with \@, which starts with a backslash,
unlike this command.
@h
Shows the gaiji.
@t
Shows the gaiji.
@!
Shows the gaiji.
@?
Shows the gaiji.
@!!
Shows the gaiji.
@!?
Shows the gaiji.
\k0
Waits 0 frames (0 = forever) for an advance key to be pressed before
continuing script execution. Before waiting, TH05 crossfades in any new
text that was previously rendered to the invisible VRAM page…
🐞 …but TH04 doesn't, leaving the text invisible during the wait time.
As a workaround, \vp1 can be
used before \k to immediately display that text without a
fade-in animation.
\m$
Stops the currently playing BGM.
\m*
Restarts playback of the currently loaded BGM from the
beginning.
\m,filename
Stops the currently playing BGM, loads a new one from the given
file, and starts playback.
\n
Starts a new line at the leftmost X coordinate of the box, i.e., the
start of the name area. This is how scripts can "change" the name of the
currently speaking character, or use the entire 480×64 pixels without
being restricted to the non-name area.
Note that automatic line breaks already move the cursor into a new line.
Using this command at the "end" of a line with the maximum number of 30
full-width glyphs would therefore start a second new line and leave the
previously started line empty.
If this command moved the cursor into the 5th line of a box,
\s is executed afterward, with
any of \n's parameters passed to \s.
\p
(no-op)
\p-
Deallocates the loaded .PI image.
\p,filename
Loads the .PI image with the given file into the single .PI slot
available to cutscenes. TH04 and TH05 automatically deallocate any
previous image, 🐞 TH03 would leak memory without a manual prior call to
\p-.
\pp
Sets the hardware palette to the one of the loaded .PI image.
\p@
Sets the loaded .PI image as the full-screen 640×400 background
image and overwrites both VRAM pages with its pixels, retaining the
current hardware palette.
\p=
Runs \pp followed by \p@.
\s0
\s-
Ends a text box and starts a new one. Fades in any text rendered to
the invisible VRAM page, then waits 0 frames
(0 = forever) for an advance key to be
pressed. Afterward, the new text box is started with the cursor moved to
the top-left corner of the name area. \s- skips the wait time and starts the new box
immediately.
\t100
Sets palette brightness via master.lib's
palette_settone() to any value from 0 (fully black) to 200
(fully white). 100 corresponds to the palette's original colors.
Preceded by a 1-frame delay unless ESC is held.
\v1
Sets the number of frames to wait between every 2 bytes of rendered
text.
Sets the number of frames to spend on each of the 4 fade
steps when crossfading between old and new text. The game-specific
default value is also used before the first use of this command.
\v2
\vp0
Shows VRAM page 0. Completely useless in
TH03 (this game always synchronizes both VRAM pages at a command
boundary), only of dubious use in TH04 (for working around a bug in \k), and the games always return to
their intended shown page before every blitting operation anyway. A
debloated mod of this game would just remove this command, as it exposes
an implementation detail that script authors should not need to worry
about. None of the original scripts use it anyway.
\w64
\w and \wk wait for the given number
of frames
\wm and \wmk wait until PMD has played
back the current BGM for the total number of measures, including
loops, given in the first parameter, and fall back on calling
\w and \wk with the second parameter as
the frame number if BGM is disabled.
🐞 Neither PMD nor MMD reset the internal measure when stopping
playback. If no BGM is playing and the previous BGM hasn't been
played back for at least the given number of measures, this command
will deadlock.
Since both TH04 and TH05 fade in any new text from the invisible VRAM
page, these commands can be used to simulate TH03's typing effect in
those games. Demo video below.
Contrary to \k and \s, specifying 0 frames would
simply remove any frame delay instead of waiting forever.
The TH03-exclusive k variants allow the delay to be
interrupted if ⏎ Return or Shot are held down.
TH04 and TH05 recognize the k as well, but removed its
functionality.
All of these commands have no effect if ESC is held.
\wm64,64
\wk64
\wmk64,64
\wi1
\wo1
Calls master.lib's palette_white_in() or
palette_white_out() to play a hardware palette fade
animation from or to white, spending roughly 1 frame on each of the 16 fade steps.
\=4
Immediately displays the given quarter of the loaded .PI image in
the picture area, with no fade effect. Any value ≥ 4 resets the picture area to black.
\==4,1
Crossfades the picture area between its current content and quarter
#4 of the loaded .PI image, spending 1 frame on each of the 4 fade steps unless
ESC is held. Any value ≥ 4 is
replaced with quarter #0.
\$
Stops script execution. Must be called at the end of each file;
otherwise, execution continues into whatever lies after the script
buffer in memory.
TH05 automatically deallocates the loaded .PI image, TH03 and TH04
require a separate manual call to \p- to not leak its memory.
Bold values signify the default if the parameter
is omitted; \c is therefore
equivalent to \c15.
So yeah, that's the cutscene system. I'm dreading the moment I will have to
deal with the other command interpreter in these games, i.e., the
stage enemy system. Luckily, that one is completely disconnected from any
other system, so I won't have to deal with it until we're close to finishing
MAIN.EXE… that is, unless someone requests it before. And it
won't involve text encodings or unblitting…
The cutscene system got me thinking in greater detail about how I would
implement translations, being one of the main dependencies behind them. This
goal has been on the order form for a while and could soon be implemented
for these cutscenes, with 100% PI being right around the corner for the TH03
and TH04 cutscene executables.
Once we're there, the "Virgin" old-school way of static translation patching
for Latin-script languages could be implemented fairly quickly:
Establish basic UTF-8 parsing for less painful manual editing of the
source files
Procedurally generate glyphs for the few required additional letters
based on existing font ROM glyphs. For example, we'd generate ä
by painting two short lines on top of the font ROM's a glyph,
or generate ¿ by vertically flipping the question mark. This
way, the text retains a consistent look regardless of whether the translated
game is run with an NEC or EPSON font ROM, or the that Neko Project II auto-generates if you
don't provide either.
(Optional) Change automatic line breaks to work on a per-word
basis, rather than per-glyph
That's it – script editing and distribution would be handled by your local
translation group. It might seem as if this would also work for Greek and
Cyrillic scripts due to their presence in the PC-98 font ROM, but I'm not
sure if I want to attempt procedurally shrinking these glyphs from 16×16 to
8×16… For any more thorough solution, we'd need to go for a more "Chad" kind
of full-blown translation support:
Implement text subdivisions at a sensible granularity while retaining
automatic line and box breaks
Compile translatable text into a Japanese→target language dictionary
(I'm too old to develop any further translation systems that would overwrite
modded source text with translations of the original text)
Implement a custom Unicode font system (glyphs would be taken from GNU
Unifont unless translators provide a different 8×16 font for their
language)
Combine the text compiler with the font compiler to only store needed
glyphs as part of the translation's font file (dealing with a multi-MB font
file would be rather ugly in a Real Mode game)
Write a simple install/update/patch stacking tool that supports both
.HDI and raw-file DOSBox-X scenarios (it's different enough from thcrap to
warrant a separate tool – each patch stack would be statically compiled into
a single package file in the game's directory)
Add a nice language selection option to the main menu
(Optional) Support proportional fonts
Which sounds more like a separate project to be commissioned from
Touhou Patch Center's Open Collective funds, separate from the ReC98 cap.
This way, we can make sure that the feature is completely implemented, and I
can talk with every interested translator to make sure that their language
works.
It's still cheaper overall to do this on PC-98 than to first port the games
to a modern system and then translate them. On the other hand, most
of the tasks in the Chad variant (3, 4, 5, and half of 2) purely deal with
the difficulty of getting arbitrary Unicode characters to work natively in a
PC-98 DOS game at all, and would be either unnecessary or trivial if we had
already ported the game. Depending on where the patrons' interests lie, it
may not be worth it. So let's see what all of you think about which
way we should go, or whether it's worth doing at all. (Edit
(2022-12-01): With Splashman's
order towards the stage dialogue system, we've pretty much confirmed that it
is.) Maybe we want to meet in the middle – using e.g. procedural glyph
generation for dynamic translations to keep text rendering consistent with
the rest of the PC-98 system, and just not support non-Latin-script
languages in the beginning? In any case, I've added both options to the
order form. Edit (2023-07-28):Touhou Patch Center has agreed to fund
a basic feature set somewhere between the Virgin and Chad level. Check the
📝 dedicated announcement blog post for more
details and ideas, and to find out how you can support this goal!
Surprisingly, there was still a bit of RE work left in the third push after
all of this, which I filled with some small rendering boilerplate. Since I
also wanted to include TH02's playfield overlay functions,
1/15 of that last push went towards getting a
TH02-exclusive function out of the way, which also ended up including that
game in this delivery.
The other small function pointed out how TH05's Stage 5 midboss pops into
the playfield quite suddenly, since its clipping test thinks it's only 32
pixels tall rather than 64:
Next up: Staying with TH05 and looking at more of the pattern code of its
boss fights. Given the remaining TH05 budget, it makes the most sense to
continue in in-game order, with Sara and the Stage 2 midboss. If more money
comes in towards this goal, I could alternatively go for the Mai & Yuki
fight and immediately develop a pretty fix for the cheeto storage
glitch. Also, there's a rather intricate
pull request for direct ZMBV decoding on the website that I've still got
to review…
…or maybe not that soon, as it would have only wasted time to
untangle the bullet update commits from the rest of the progress. So,
here's all the bullet spawning code in TH04 and TH05 instead. I hope
you're ready for this, there's a lot to talk about!
(For the sake of readability, "bullets" in this blog post refers to the
white 8×8 pellets
and all 16×16 bullets loaded from MIKO16.BFT, nothing else.)
But first, what was going on📝 in 2020? Spent 4 pushes on the basic types
and constants back then, still ended up confusing a couple of things, and
even getting some wrong. Like how TH05's "bullet slowdown" flag actually
always prevents slowdown and fires bullets at a constant speed
instead. Or how "random spread" is not the
best term to describe that unused bullet group type in TH04.
Or that there are two distinct ways of clearing all bullets on screen,
which deserve different names:
Bullets are zapped at the end of most midboss and boss phases, and
cleared everywhere else – most notably, during bombs, when losing a
life, or as rewards for extends or a maximized Dream bonus. The
Bonus!! points awarded for zapping bullets are calculated iteratively,
so it's not trivial to give an exact formula for these. For a small number
𝑛 of bullets, it would exactly be 5𝑛³ - 10𝑛² + 15𝑛
points – or, using uth05win's (correct) recursive definition,
Bonus(𝑛) = Bonus(𝑛-1) + 15𝑛² - 5𝑛 + 10.
However, one of the internal step variables is capped at a different number
of points for each difficulty (and game), after which the points only
increase linearly. Hence, "semi-exponential".
On to TH04's bullet spawn code then, because that one can at least be
decompiled. And immediately, we have to deal with a pointless distinction
between regular bullets, with either a decelerating or constant
velocity, and special bullets, with preset velocity changes during
their lifetime. That preset has to be set somewhere, so why have
separate functions? In TH04, this separation continues even down to the
lowest level of functions, where values are written into the global bullet
array. TH05 merges those two functions into one, but then goes too far and
uses self-modifying code to save a grand total of two local variables…
Luckily, the rest of its actual code is identical to TH04.
Most of the complexity in bullet spawning comes from the (thankfully
shared) helper function that calculates the velocities of the individual
bullets within a group. Both games handle each group type via a large
switch statement, which is where TH04 shows off another Turbo
C++ 4.0 optimization: If the range of case values is too
sparse to be meaningfully expressed in a jump table, it usually generates a
linear search through a second value table. But with the -G
command-line option, it instead generates branching code for a binary
search through the set of cases. 𝑂(log 𝑛) as the worst case for a
switch statement in a C++ compiler from 1994… that's so cool.
But still, why are the values in TH04's group type enum all
over the place to begin with?
Unfortunately, this optimization is pretty rare in PC-98 Touhou. It only
shows up here and in a few places in TH02, compared to at least 50
switch value tables.
In all of its micro-optimized pointlessness, TH05's undecompilable version
at least fixes some of TH04's redundancy. While it's still not even
optimal, it's at least a decently written piece of ASM…
if you take the time to understand what's going on there, because it
certainly took quite a bit of that to verify that all of the things which
looked like bugs or quirks were in fact correct. And that's how the code
for this function ended up with 35% comments and blank lines before I could
confidently call it "reverse-engineered"…
Oh well, at least it finally fixes a correctness issue from TH01 and TH04,
where an invalid bullet group type would fill all remaining slots in the
bullet array with identical versions of the first bullet.
Something that both games also share in these functions is an over-reliance
on globals for return values or other local state. The most ridiculous
example here: Tuning the speed of a bullet based on rank actually mutates
the global bullet template… which ZUN then works around by adding a wrapper
function around both regular and special bullet spawning, which saves the
base speed before executing that function, and restores it afterward.
Add another set of wrappers to bypass that exact
tuning, and you've expanded your nice 1-function interface to 4 functions.
Oh, and did I mention that TH04 pointlessly duplicates the first set of
wrapper functions for 3 of the 4 difficulties, which can't even be
explained with "debugging reasons"? That's 10 functions then… and probably
explains why I've procrastinated this feature for so long.
At this point, I also finally stopped decompiling ZUN's original ASM just
for the sake of it. All these small TH05 functions would look horribly
unidiomatic, are identical to their decompiled TH04 counterparts anyway,
except for some unique constant… and, in the case of TH05's rank-based
speed tuning function, actually become undecompilable as soon as we
want to return a C++ class to preserve the semantic meaning of the return
value. Mainly, this is because Turbo C++ does not allow register
pseudo-variables like _AX or _AL to be cast into
class types, even if their size matches. Decompiling that function would
have therefore lowered the quality of the rest of the decompiled code, in
exchange for the additional maintenance and compile-time cost of another
translation unit. Not worth it – and for a TH05 port, you'd already have to
decompile all the rest of the bullet spawning code anyway!
The only thing in there that was still somewhat worth being
decompiled was the pre-spawn clipping and collision detection function. Due
to what's probably a micro-optimization mistake, the TH05 version continues
to spawn a bullet even if it was spawned on top of the player. This might
sound like it has a different effect on gameplay… until you realize that
the player got hit in this case and will either lose a life or deathbomb,
both of which will cause all on-screen bullets to be cleared anyway.
So it's at most a visual glitch.
But while we're at it, can we please stop talking about hitboxes? At least
in the context of TH04 and TH05 bullets. The actual collision detection is
described way better as a kill delta of 8×8 pixels between the
center points of the player and a bullet. You can distribute these pixels
to any combination of bullet and player "hitboxes" that make up 8×8. 4×4
around both the player and bullets? 1×1 for bullets, and 8×8 for the
player? All equally valid… or perhaps none of them, once you keep in mind
that other entity types might have different kill deltas. With that in
mind, the concept of a "hitbox" turns into just a confusing abstraction.
The same is true for the 36×44 graze box delta. For some reason,
this one is not exactly around the center of a bullet, but shifted to the
right by 2 pixels. So, a bullet can be grazed up to 20 pixels right of the
player, but only up to 16 pixels left of the player. uth05win also spotted
this… and rotated the deltas clockwise by 90°?!
Which brings us to the bullet updates… for which I still had to
research a decompilation workaround, because
📝 P0148 turned out to not help at all?
Instead, the solution was to lie to the compiler about the true segment
distance of the popup function and declare its signature far
rather than near. This allowed ZUN to save that ridiculous overhead of 1 additional far function
call/return per frame, and those precious 2 bytes in the BSS segment
that he didn't have to spend on a segment value.
📝 Another function that didn't have just a
single declaration in a common header file… really,
📝 how were these games even built???
The function itself is among the longer ones in both games. It especially
stands out in the indentation department, with 7 levels at its most
indented point – and that's the minimum of what's possible without
goto. Only two more notable discoveries there:
Bullets are the only entity affected by Slow Mode. If the number of
bullets on screen is ≥ (24 + (difficulty * 8) + rank) in TH04,
or (42 + (difficulty * 8)) in TH05, Slow Mode reduces the frame
rate by 33%, by waiting for one additional VSync event every two frames.
The code also reveals a second tier, with 50% slowdown for a slightly
higher number of bullets, but that conditional branch can never be executed
Bullets must have been grazed in a previous frame before they can
be collided with. (Note how this does not apply to bullets that spawned
on top of the player, as explained earlier!)
Whew… When did ReC98 turn into a full-on code review?! 😅 And after all
this, we're still not done with TH04 and TH05 bullets, with all the
special movement types still missing. That should be less than one push
though, once we get to it. Next up: Back to TH01 and Konngara! Now have fun
rewriting the Touhou Wiki Gameplay pages 😛
P0111
TH05 RE (Code around the final MAIN.EXE data references, part 1/2)
P0112
TH05 RE (Code around the final MAIN.EXE data references, part 2/2)
💰 Funded by:
[Anonymous], Blue Bolt
🏷️ Tags:
Only one newly ordered push since I've reopened the store? Great, that's
all the justification I needed for the extended maintenance delay that was
part of these two pushes 😛
Having to write comments to explain whether coordinates are relative to
the top-left corner of the screen or the top-left corner of the playfield
has finally become old. So, I introduced
distinct
types for all the coordinate systems we typically encounter, applying
them to all code decompiled so far. Note how the planar nature of PC-98
VRAM meant that X and Y coordinates also had to be different from each
other. On the X side, there's mainly the distinction between the
[0; 640] screen space and the corresponding [0; 80] VRAM byte
space. On the Y side, we also have the [0; 400] screen space, but
the visible area of VRAM might be limited to [0; 200] when running in
the PC-98's line-doubled 640×200 mode. A VRAM Y coordinate also always
implies an added offset for vertical scrolling.
During all of the code reconstruction, these types can only have a
documenting purpose. Turning them into anything more than just
typedefs to int, in order to define conversion
operators between them, simply won't recompile into identical binaries.
Modding and porting projects, however, now have a nice foundation for
doing just that, and can entirely lift coordinate system transformations
into the type system, without having to proofread all the meaningless
int declarations themselves.
So, what was left in terms of memory references? EX-Alice's fire waves
were our final unknown entity that can collide with the player. Decently
implemented, with little to say about them.
That left the bomb animation structures as the one big remaining PI
blocker. They started out nice and simple in TH04, with a small 6-byte
star animation structure used for both Reimu and Marisa. TH05, however,
gave each character her own animation… and what the hell is going
on with Reimu's blue stars there? Nope, not going to figure this out on
ASM level.
A decompilation first required some more bomb-related variables to be
named though. Since this was part of a generic RE push, it made sense to
do this in all 5 games… which then led to nice PI gains in anything
but TH05. Most notably, we now got the
"pulling all items to player" flag in TH04 and TH05, which is
actually separate from bombing. The obvious cheat mod is left as an
exercise to the reader.
So, TH05 bomb animations. Just like the
📝 custom entity types of this game, all 4
characters share the same memory, with the superficially same 10-byte
structure.
But let's just look at the very first field. Seen from a low level, it's a
simple struct { int x, y; } pos, storing the current position
of the character-specific bomb animation entity. But all 4 characters use
this field differently:
For Reimu's blue stars, it's the top-left position of each star, in the
12.4 fixed-point format. But unlike the vast majority of these values in
TH04 and TH05, it's relative to the top-left corner of the
screen, not the playfield. Much better represented as
struct { Subpixel screen_x, screen_y; } topleft.
For Marisa's lasers, it's the center of each circle, as a regular 12.4
fixed-point coordinate, relative to the top-left corner of the playfield.
Much better represented as
struct { Subpixel x, y; } center.
For Mima's shrinking circles, it's the center of each circle in regular
pixel coordinates. Much better represented as
struct { screen_x_t x; screen_y_t y; } center.
For Yuuka's spinning heart, it's the top-left corner in regular pixel
coordinates. Much better represented as
struct { screen_x_t x; screen_y_t y; } topleft.
And yes, singular. The game is actually smart enough to only store a single
heart, and then create the rest of the circle on the fly. (If it were even
smarter, it wouldn't even use this structure member, but oh well.)
Therefore, I decompiled it as 4 separate structures once again, bundled
into an union of arrays.
As for Reimu… yup, that's some pointer arithmetic straight out of
Jigoku* for setting and updating the positions of the falling star
trails. While that certainly required several
comments to wrap my head around the current array positions, the one "bug"
in all this arithmetic luckily has no effect on the game.
There is a small glitch with the growing circles, though. They are
spawned at the end of the loop, with their position taken from the star
pointer… but after that pointer has already been incremented. On
the last loop iteration, this leads to an out-of-bounds structure access,
with the position taken from some unknown EX-Alice data, which is 0 during
most of the game. If you look at the animation, you can easily spot these
bugged circles, consistently growing from the top-left corner (0, 0)
of the playfield:
After all that, there was barely enough remaining time to filter out and
label the final few memory references. But now, TH05's
MAIN.EXE is technically position-independent! 🎉
-Tom- is going to work on a pretty extensive demo of this
unprecedented level of efficient Touhou game modding. For a more impactful
effect of both the 100% PI mark and that demo, I'll be delaying the push
covering the remaining false positives in that binary until that demo is
done. I've accumulated a pretty huge backlog of minor maintenance issues
by now…
Next up though: The first part of the long-awaited build system
improvements. I've finally come up with a way of sanely accelerating the
32-bit build part on most setups you could possibly want to build ReC98
on, without making the building experience worse for the other few setups.
P0109
TH04/TH05 decompilation (Boss movement / Bullet group tuning)
💰 Funded by:
[Anonymous], Blue Bolt
🏷️ Tags:
Back to TH05! Thanks to the good funding situation, I can strike a nice
balance between getting TH05 position-independent as quickly as possible,
and properly reverse-engineering some missing important parts of the game.
Once 100% PI will get the attention of modders, the code will then be in
better shape, and a bit more usable than if I just rushed that goal.
By now, I'm apparently also pretty spoiled by TH01's immediate
decompilability, after having worked on that game for so long.
Reverse-engineering in ASM land is pretty annoying, after all,
since it basically boils down to meticulously editing a piece of ASM into
something I can confidently call "reverse-engineered". Most of the
time, simply decompiling that piece of code would take just a little bit
longer, but be massively more useful. So, I immediately tried decompiling
with TH05… and it just worked, at every place I tried!? Whatever the issue
was that made 📝 segment splitting so
annoying at my first attempt, I seem to have completely solved it in the
meantime. 🤷 So yeah, backers can now request pretty much any part of TH04
and TH05 to be decompiled immediately, with no additional segment
splitting cost.
(Protip for everyone interested in starting their own ReC project: Just
declare one segment per function, right from the start, then group them
together to restore the original code segmentation…)
Except that TH05 then just throws more of its infamous micro-optimized and
undecompilable ASM at you. 🙄 This push covered the function that adjusts
the bullet group template based on rank and the selected difficulty,
called every time such a group is configured. Which, just like pretty
much all of TH05's bullet spawning code, is one of those undecompilable
functions. If C allowed labels of other functions as goto
targets, it might have been decompilable into something useful to
modders… maybe. But like this, there's no point in even trying.
This is such a terrible idea from a software architecture point of view, I
can't even. Because now, you suddenly have to mirror your C++
declarations in ASM land, and keep them in sync with each other. I'm
always happy when I get to delete an ASM declaration from the codebase
once I've decompiled all the instances where it was referenced. But for
TH05, we now have to keep those declarations around forever. 😕 And all
that for a performance increase you probably couldn't even measure. Oh
well, pulling off Galaxy Brain-level ASM optimizations is kind of
fun if you don't have portability plans… I guess?
If I started a full fangame mod of a PC-98 Touhou game, I'd base it on
TH04 rather than TH05, and backport selected features from TH05 as
needed. Just because it was released later doesn't make it better, and
this is by far not the only one of ZUN's micro-optimizations that just
went way too far.
Dropping down to ASM also makes it easier to introduce weird quirks.
Decompiled, one of TH05's tuning conditions for
stack
groups on Easy Mode would look something like:
case BP_STACK:
// […]
if(spread_angle_delta >= 2) {
stack_bullet_count--;
}
The fields of the bullet group template aren't typically reset when
setting up a new group. So, spread_angle_delta in the context
of a stack group effectively refers to "the delta angle of the last
spread group that was fired before this stack – whenever that was".
uth05win also spotted this quirk, considered it a bug, and wrote
fanfiction by changing spread_angle_delta to
stack_bullet_count.
As usual for functions that occur in more than one game, I also decompiled
the TH04 bullet group tuning function, and it's perfectly sane, with no
such quirks.
In the more PI-focused parts of this push, we got the TH05-exclusive
smooth boss movement functions, for flying randomly or towards a given
point. Pretty unspectacular for the most part, but we've got yet another
uth05win inconsistency in the latter one. Once the Y coordinate gets close
enough to the target point, it actually speeds up twice as much as the
X coordinate would, whereas uth05win used the same speedup factors for
both. This might make uth05win a couple of frames slower in all boss
fights from Stage 3 on. Hard to measure though – and boss movement partly
depends on RNG anyway.
Next up: Shinki's background animations – which are actually the single
biggest source of position dependence left in TH05.