- 📝 Posted:
- 🚚 Summary of:
- P0223, P0224, P0225
- ⌨ Commits:
- 💰 Funded by:
- rosenrose, Blue Bolt, Splashman, -Tom-, Yanga, Enderwolf, 32th System
- 🏷 Tags:
- rec98 th02 th03 th04 th05 cutscene blitting micro-optimization glitch master.lib pc98 performance kaja unused meta midboss animation
More than three months without any reverse-engineering progress! It's been
way too long. Coincidentally, we're at least back with a surprising 1.25% of
overall RE, achieved within just 3 pushes. The ending script system is not
only more or less the same in TH04 and TH05, but actually originated in
TH03, where it's also used for the cutscenes before stages 8 and 9. This
means that it was one of the final pieces of code shared between three of
the four remaining games, which I got to decompile at roughly 3× the usual
speed, or ⅓ of the price.
The only other bargains of this nature remain in
Music Room is largely equivalent in all three remaining games as well, and
the sound device selection, ZUN Soft logo screens, and main/option menus are
the same in TH04 and TH05. A lot of that code is in the "technically RE'd
but not yet decompiled" ASM form though, so it would shift Finalized% more
significantly than RE%. Therefore, make sure to order the new
Finalization option rather than Reverse-engineering if you
want to make number go up.
So, cutscenes. On the surface, the .TXT files look simple enough: You directly write the text that should appear on the screen into the file without any special markup, and add commands to define visuals, music, and other effects at any place within the script. Let's start with the basics of how text is rendered, which are the same in all three games:
- First off, the text area has a size of 480×64 pixels. This means that it
does not correspond to the tiled area painted into TH05's
- Since the font weight can be customized, all text is rendered to VRAM. This also includes gaiji, despite them ignoring the font weight setting.
- The system supports automatic line breaks on a per-glyph basis, which move the text cursor to the beginning of the red text area. This might seem like a piece of long-forgotten ancient wisdom at first, considering the absence of automatic line breaks in Windows Touhou. However, ZUN probably implemented it more out of pure necessity: Text in VRAM needs to be unblitted when starting a new box, which is way more straightforward and performant if you only need to worry about a fixed area.
- The system also automatically starts a new (key press-separated) text
box after the end of the 4th line. However, the text cursor is
also unconditionally moved to the top-left corner of the yellow name
area when this happens, which is almost certainly not what you expect, given
that automatic line breaks stay within the red area. A script author might
as well add the necessary text box change commands manually, if they're
forced to anticipate the automatic ones anyway…
Due to ZUN forgetting an unblitting call during the TH05 refactoring of the box background buffer, this feature is even completely broken in that game, as any new text will simply be blitted on top of the old one:
- Overall, the system is geared toward exclusively full-width text. As
exemplified by the 2014 static English patches and the screenshots in this
blog post, half-width text is possible, but comes with a lot of
- Each loop of the script interpreter starts by looking at the next byte to distinguish commands from text. However, this step also skips over every ASCII space and control character, i.e., every byte ≤ 32. If you only intend to display full-width glyphs anyway, this sort of makes sense: You gain complete freedom when it comes to the physical layout of these script files, and it especially allows commands to be freely separated with spaces and line breaks for improved readability. Still, enforcing commands to be separated exclusively by line breaks might have been even better for readability, and would have freed up ASCII spaces for regular text…
- Non-command text is blindly processed and rendered two bytes at a
time. The rendering function interprets these bytes as a Shift-JIS
string, so you can use half-width characters here. While the
second byte can even be an ASCII
0x20 space due to the parser's blindness, all half-width characters must still occur in pairs that can't be interrupted by commands:
- As a workaround for at least the ASCII space issue, you can replace
them with any of the unassigned
Shift-JIS lead bytes –
0xA0, or anything between
0xFF inclusive. That's what you see in all screenshots of this post that display half-width spaces.
- Finally, did you know that you can hold ESC to fast-forward through these cutscenes, which skips most frame delays and reduces the rest? Due to the blocking nature of all commands, the ESC key state is only updated between commands or 2-byte text groups though, so it can't interrupt an ongoing delay.
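The parsing rules from the list above can be condensed into a few lines. This is only a rough sketch and not the games' actual code: `tokenize` and `COMMAND_LEAD` are made-up names, and every command is simplified to a lead byte plus a single letter, without any of the parameter parsing the real commands need.

```python
COMMAND_LEAD = 0x5C  # lead byte of most commands


def tokenize(script: bytes):
    """Splits a script into ("command", letter) and ("text", pair) tokens."""
    tokens = []
    i = 0
    while i < len(script):
        byte = script[i]
        if byte <= 0x20:
            # ASCII spaces and control characters are skipped at the
            # start of each interpreter loop.
            i += 1
            continue
        if byte == COMMAND_LEAD:
            # Simplification: one letter per command, no parameters.
            tokens.append(("command", script[i + 1:i + 2]))
            i += 2
            continue
        # Text is consumed blindly, two bytes at a time. The second
        # byte is not inspected at all.
        tokens.append(("text", script[i:i + 2]))
        i += 2
    return tokens
```

With this model, `tokenize(b"A B ")` yields the two pairs `b"A "` and `b"B "`, because the space is swallowed as the unchecked second byte. `tokenize(b"A\\nB")`, on the other hand, yields the pairs `b"A\\"` and `b"nB"`: the lone half-width character eats the command's lead byte, which is why half-width characters have to come in pairs.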
Superficially, the list of game-specific differences doesn't look too long, and can be summarized in a rather short table:
|Script size limit||65536 bytes (heap-allocated)||8192 bytes (statically allocated)|
|Delay between every 2 bytes of text||1 frame by default, customizable via
|Text delay when holding ESC||Varying speed-up factor||None|
|Visibility of new text||Immediately typed onto the screen||Rendered onto invisible VRAM page, faded in on wait commands|
|Visibility of old text||Unblitted when starting a new box||Left on screen until crossfaded out with new text|
|Key binding for advancing the script||Any key||⏎ Return, Shot, or ESC|
|Animation while waiting for an advance key||None||, past right edge of current row|
|Inexplicable delays||None||1 frame before changing pictures and after rendering new text boxes|
|Additional delay per interpreter loop||614.4 µs||None||614.4 µs|
It's when you get into the implementation that the combined three systems reveal themselves as a giant mess, with more like 56 differences between the games. Every single new weird line of code opened up another can of worms, which ultimately made all of this end up with 24 pieces of bloat and 14 bugs. The worst of these should be quite interesting for the general PC-98 homebrew developers among my audience:
- The final official 0.23 release of master.lib has a bug in
graph_gaiji_put*(). To calculate the JIS X 0208 code point for a gaiji, it is enough to
ADD 5680h onto the gaiji ID. However, these functions accidentally use
ADC instead, which incorrectly adds the x86 carry flag on top, causing weird off-by-one errors based on the previous program state. ZUN did fix this bug directly inside master.lib for TH04 and TH05, but still needed to work around it in TH03 by subtracting 1 from the intended gaiji ID. Anyone up for maintaining a bug-fixed master.lib repository?
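To make the difference between the two instructions concrete, here is a minimal model of the calculation. The function names are made up for illustration, and the carry flag is passed in as an explicit argument, since its real value depends on whatever the program did before the call.

```python
GAIJI_BASE = 0x5680  # offset from gaiji ID to JIS X 0208 code point


def code_point_add(gaiji_id: int) -> int:
    """Intended calculation: a plain 16-bit ADD."""
    return (GAIJI_BASE + gaiji_id) & 0xFFFF


def code_point_adc(gaiji_id: int, carry: int) -> int:
    """Buggy calculation: ADC adds the carry flag (0 or 1) on top."""
    return (GAIJI_BASE + gaiji_id + carry) & 0xFFFF
```

As long as the carry flag happens to be clear, both functions agree. With a set carry flag, `code_point_adc(gaiji_id - 1, 1)` equals `code_point_add(gaiji_id)`, which matches the subtract-by-1 workaround described for TH03. That workaround would only be correct if the carry flag is reliably set at the respective call sites, which this sketch simply assumes.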
The worst piece of bloat comes from TH03 and TH04 needlessly switching the visibility of VRAM pages while blitting a new 320×200 picture. This makes it much harder to understand the code, as the mere existence of these page switches is enough to suggest a more complex interplay between the two VRAM pages which doesn't actually exist. Outside this visibility switch, page 0 is always supposed to be shown, and page 1 is always used for temporarily storing pixels that are later crossfaded onto page 0. This is also the only reason why TH03 has to render text and gaiji onto both VRAM pages to begin with… and because TH04 doesn't, changing the picture in the middle of a string of text is technically bugged in that game, even though you only get to temporarily see the new text on very underclocked PC-98 systems.
These performance implications made me wonder why cutscenes even bother with writing to the second VRAM page anyway, before copying each crossfade step to the visible one. 📝 We learned in June how costly EGC-"accelerated" inter-page copies are; shouldn't it be faster to just blit the image once rather than twice?
Well, master.lib decodes .PI images into a packed-pixel format, and unpacking such a representation into bitplanes on the fly is just about the worst way of blitting you could possibly imagine on a PC-98. EGC inter-page copies are already fairly disappointing at 42 cycles for every 16 pixels, if we look at the i486 and ignore VRAM latencies. But under the same conditions, packed-pixel unpacking comes in at 81 cycles for every 8 pixels, or almost 4× slower. On lower-end systems, that can easily sum up to more than one frame for a 320×200 image. While I'd argue that the resulting tearing could have been an acceptable part of the transition between two images, it's understandable why you'd want to avoid it in favor of the pure effect on a slower framerate.
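The numbers above can be checked with some quick arithmetic. The per-group cycle counts are the ones quoted for the i486. The 33 MHz and 66 MHz clock speeds are example values picked for illustration, and the 56.4 Hz refresh rate is an assumption about the video mode. Like the figures themselves, this ignores VRAM latencies.

```python
PIXELS = 320 * 200  # one cutscene picture


def blit_cycles(pixels: int, cycles_per_group: int, group_size: int) -> int:
    """Total cycles for blitting `pixels` in groups of `group_size`."""
    return (pixels // group_size) * cycles_per_group


EGC_CYCLES = blit_cycles(PIXELS, 42, 16)    # EGC inter-page copy
PACKED_CYCLES = blit_cycles(PIXELS, 81, 8)  # packed-pixel unpacking


def frames_needed(cycles: int, clock_hz: float, refresh_hz: float = 56.4) -> float:
    """Cycle count expressed in display frames at the given CPU clock."""
    return cycles * refresh_hz / clock_hz


print(EGC_CYCLES, PACKED_CYCLES, PACKED_CYCLES / EGC_CYCLES)
print(frames_needed(PACKED_CYCLES, 33_000_000))
print(frames_needed(PACKED_CYCLES, 66_000_000))
```

This works out to 168,000 cycles for the EGC copy and 648,000 cycles for unpacking, a ratio of roughly 3.86. Under these assumptions, unpacking a single picture would take slightly more than one frame at 33 MHz, and a bit more than half a frame at 66 MHz.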
Really makes me wonder why master.lib didn't just directly decode .PI images into bitplanes. The performance impact on load times should have been negligible? It's such a good format for the often dithered 16-color artwork you typically see on PC-98, and deserves better than master.lib's implementation which is both slow to decode and slow to blit.
That brings us to the individual script commands… and yes, I'm going to document every single one of them. Some of their interactions and edge cases are not clear at all from just looking at the code.
Almost all commands are preceded by… well, a
0x5C lead byte.
Which raises the question of whether we should
document it as an ASCII-encoded \ backslash, or a Shift-JIS-encoded
¥ yen sign. From a gaijin perspective, it seems obvious that it's a
backslash, as it's consistently displayed as one in most of the editors you
would actually use nowadays. But interestingly, a conversion with
-f shift-jis -t utf-8 does convert any such
lead bytes to actual
¥ U+00A5 YEN SIGN code points.
Ultimately, the distinction comes down to the font. There are fonts that still render
this byte as ¥, but mainly do so out
of an obvious concern about backward compatibility to JIS X 0201, where this
mapping originated. Unsurprisingly, this group includes MS Gothic/Mincho,
the old Japanese fonts from Windows 3.1, but also Meiryo and Yu
Gothic/Mincho, Microsoft's modern Japanese fonts. Meanwhile, pretty much
every other modern font, and freely licensed ones in particular, render this
code point as
\, even if you set your editor to Shift-JIS. And
while ZUN most definitely saw it as a
¥, documenting this code point as
\ is less ambiguous in the long run. It can only
possibly correspond to one specific code point in either Shift-JIS or UTF-8,
and will remain correct even if we later mod the cutscene system to support
Now we've only got to clarify the parameter syntax, and then we can look at the big table of commands:
- Numeric parameters are read as sequences of up to 3 ASCII digits. This
limits them to a range from 0 to 999 inclusive, with
0 being equivalent. Because there's no further sentinel character, any further digit from the 4th one onwards is interpreted as regular text.
- Filename parameters must be terminated with a space or newline and are limited to 12 characters, which translates to 8.3 basenames without any directory component. Any further characters are ignored and displayed as text as well.
- Each .PI image can contain up to four 320×200 pictures ("quarters") for
the cutscene picture area. In the script commands, they are numbered like
0 1 2 3
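The first two parameter rules can be sketched as follows. Both function names are hypothetical, the `default` argument for omitted numbers is an assumption of this sketch rather than documented behavior, and a carriage return is counted as part of a newline here.

```python
def read_number(script: bytes, pos: int, default: int = 0):
    """Reads up to 3 ASCII digits. Returns (value, position after them)."""
    end = pos
    while end < len(script) and end - pos < 3 and 0x30 <= script[end] <= 0x39:
        end += 1
    if end == pos:
        # No digit at all: fall back to the caller's default (assumption).
        return default, pos
    return int(script[pos:end].decode("ascii")), end


def read_filename(script: bytes, pos: int):
    """Reads up to 12 bytes, stopping early at a space or line break."""
    end = pos
    while end < len(script) and end - pos < 12 and script[end] not in b" \r\n":
        end += 1
    return script[pos:end], end
```

For example, `read_number(b"1234", 0)` returns `(123, 3)`, so the interpreter would continue at the `4` and treat it as regular text. Similarly, `read_filename()` stops after 12 bytes even without a terminator, leaving anything beyond that to the caller.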
|\@||Clears both VRAM pages by filling them with VRAM color 0.|
🐞 In TH03 and TH04, this command does not update the internal text area background used for unblitting. This bug effectively restricts usage of this command to either the beginning of a script (before the first background image is shown) or its end (after no more new text boxes are started). See the image below for an example of using it anywhere else.
|\b2||Sets the font weight to a value from 0 (raw font ROM glyphs) to 3 (very thicc). Specifying any other value has no effect.|
|🐞 In TH04 and TH05,
In these games, the parameter also directly corresponds to the
|\c15||Changes the text color to VRAM color 15.|
|\c=字,15||Adds a color map entry: If 字 is the first code point
inside the name area on a new line, the text color is automatically set
to 15. Up to 8 such entries can be registered
before overflowing the statically allocated buffer.
🐞 The comma is assumed to be present even if the color parameter is omitted.
|\e0||Plays the sound effect with the given ID.|
|\fm1||Fades out BGM volume via PMD's
Values from 128 to 255 technically correspond to
|\g8||Plays a blocking 8-frame screen shake animation.|
|\ga0||Shows the gaiji with the given ID from 0 to 255
at the current cursor position. Even in TH03, gaiji always ignore the
text delay interval configured with
|@3||TH05's replacement for the
|@h||Shows the gaiji.|
|@t||Shows the gaiji.|
|@!||Shows the gaiji.|
|@?||Shows the gaiji.|
|@!!||Shows the gaiji.|
|@!?||Shows the gaiji.|
|\k0||Waits 0 frames (0 = forever) for an advance key to be pressed before
continuing script execution. Before waiting, TH05 crossfades in any new
text that was previously rendered to the invisible VRAM page…
🐞 …but TH04 doesn't, leaving the text invisible during the wait time. As a workaround,
|\m$||Stops the currently playing BGM.|
|\m*||Restarts playback of the currently loaded BGM from the beginning.|
|\m,filename||Stops the currently playing BGM, loads a new one from the given file, and starts playback.|
|\n||Starts a new line at the leftmost X coordinate of the box, i.e., the
start of the name area. This is how scripts can "change" the name of the
currently speaking character, or use the entire 480×64 pixels without
being restricted to the non-name area.
Note that automatic line breaks already move the cursor into a new line. Using this command at the "end" of a line with the maximum number of 30 full-width glyphs would therefore start a second new line and leave the previously started line empty.
If this command moved the cursor into the 5th line of a box,
|\p-||Deallocates the loaded .PI image.|
|\p,filename||Loads the .PI image with the given file into the single .PI slot
available to cutscenes. TH04 and TH05 automatically deallocate any
previous image, 🐞 TH03 would leak memory without a manual prior call to
|\pp||Sets the hardware palette to the one of the loaded .PI image.|
|\p@||Sets the loaded .PI image as the full-screen 640×400 background image and overwrites both VRAM pages with its pixels, retaining the current hardware palette.|
|Ends a text box and starts a new one. Fades in any text rendered to
the invisible VRAM page, then waits 0 frames
(0 = forever) for an advance key to be
pressed. Afterward, the new text box is started with the cursor moved to
the top-left corner of the name area.
|\t100||Sets palette brightness via master.lib's
|\v1||Sets the number of frames to wait between every 2 bytes of rendered text.|
|Sets the number of frames to spend on each of the 4 fade steps when crossfading between old and new text. The game-specific default value is also used before the first use of this command.|
|\vp0||Shows VRAM page 0. Completely useless in
TH03 (this game always synchronizes both VRAM pages at a command
boundary), only of dubious use in TH04 (for working around a bug in
All of these commands have no effect if ESC is held.
|\=4||Immediately displays the given quarter of the loaded .PI image in the picture area, with no fade effect. Any value ≥ 4 resets the picture area to black.|
|\==4,1||Crossfades the picture area between its current content and quarter #4 of the loaded .PI image, spending 1 frame on each of the 4 fade steps unless ESC is held. Any value ≥ 4 is replaced with quarter #0.|
|\$||Stops script execution. Must be called at the end of each file;
otherwise, execution continues into whatever lies after the script
buffer in memory.
TH05 automatically deallocates the loaded .PI image, TH03 and TH04 require a separate manual call to
\c is therefore equivalent to
So yeah, that's the cutscene system. I'm dreading the moment I will have to
deal with the other command interpreter in these games, i.e., the
stage enemy system. Luckily, that one is completely disconnected from any
other system, so I won't have to deal with it until we're close to finishing
MAIN.EXE… that is, unless someone requests it before. And it
won't involve text encodings or unblitting…
The cutscene system got me thinking in greater detail about how I would
implement translations, being one of the main dependencies behind them. This
goal has been on the order form for a while and could soon be implemented
for these cutscenes, with 100% PI being right around the corner for the TH03
and TH04 cutscene executables.
Once we're there, the "Virgin" old-school way of static translation patching for Latin-script languages could be implemented fairly quickly:
- Establish basic UTF-8 parsing for less painful manual editing of the source files
- Procedurally generate glyphs for the few required additional letters
based on existing font ROM glyphs. For example, we'd generate
ä by painting two short lines on top of the font ROM's
a glyph, or generate
¿ by vertically flipping the question mark. This way, the text retains a consistent look regardless of whether the translated game is run with an NEC or EPSON font ROM, or the one that Neko Project II auto-generates if you don't provide either.
- (Optional) Change automatic line breaks to work on a per-word basis, rather than per-glyph
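As a rough idea of what the procedural glyph generation could look like, here is a sketch that treats an 8×16 half-width glyph as 16 row bytes. The two bitmaps are made-up placeholders rather than actual font ROM data, and the function names, the diaeresis row, and its bit pattern are all arbitrary choices for illustration.

```python
# Placeholder 8×16 bitmaps, one byte per row, most significant bit = left.
QUESTION_MARK = [
    0x00, 0x00, 0x3C, 0x66, 0x66, 0x06, 0x0C, 0x18,
    0x18, 0x18, 0x00, 0x18, 0x18, 0x00, 0x00, 0x00,
]
LETTER_A = [0x00] * 5 + [0x3C, 0x06, 0x3E, 0x66, 0x66, 0x3B] + [0x00] * 5


def flip_vertically(glyph):
    """Returns the glyph with its rows in reverse order."""
    return glyph[::-1]


def add_diaeresis(glyph, row=3, pattern=0b01100110):
    """Paints two short horizontal lines into the given row."""
    out = list(glyph)
    out[row] |= pattern
    return out


INVERTED_QUESTION_MARK = flip_vertically(QUESTION_MARK)
A_DIAERESIS = add_diaeresis(LETTER_A)
```

Since both operations only read the existing glyph, they would work the same regardless of which font ROM the base glyphs come from. A real implementation would probably have to pick the diaeresis row per glyph to avoid collisions with tall letters.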
That's it – script editing and distribution would be handled by your local translation group. It might seem as if this would also work for Greek and Cyrillic scripts due to their presence in the PC-98 font ROM, but I'm not sure if I want to attempt procedurally shrinking these glyphs from 16×16 to 8×16… For any more thorough solution, we'd need to go for a more "Chad" kind of full-blown translation support:
- Implement text subdivisions at a sensible granularity while retaining automatic line and box breaks
- Compile translatable text into a Japanese→target language dictionary (I'm too old to develop any further translation systems that would overwrite modded source text with translations of the original text)
- Implement a custom Unicode font system (glyphs would be taken from GNU Unifont unless translators provide a different 8×16 font for their language)
- Combine the text compiler with the font compiler to only store needed glyphs as part of the translation's font file (dealing with a multi-MB font file would be rather ugly in a Real Mode game)
- Write a simple install/update/patch stacking tool that supports both .HDI and raw-file DOSBox-X scenarios (it's different enough from thcrap to warrant a separate tool – each patch stack would be statically compiled into a single package file in the game's directory)
- Add a nice language selection option to the main menu
- (Optional) Support proportional fonts
Which sounds more like a separate project to be commissioned from
Touhou Patch Center's Open Collective funds, separate from the ReC98 cap.
This way, we can make sure that the feature is completely implemented, and I
can talk with every interested translator to make sure that their language
It's still cheaper overall to do this on PC-98 than to first port the games to a modern system and then translate them. On the other hand, most of the tasks in the Chad variant (3, 4, 5, and half of 2) purely deal with the difficulty of getting arbitrary Unicode characters to work natively in a PC-98 DOS game at all, and would be either unnecessary or trivial if we had already ported the game. Depending on where the patrons' interests lie, it may not be worth it. So let's see what all of you think about which way we should go, or whether it's worth doing at all. (Edit
(2022-12-01): With Splashman's
order towards the stage dialogue system, we've pretty much confirmed that it
is.) Maybe we want to meet in the middle – using e.g. procedural glyph
generation for dynamic translations to keep text rendering consistent with
the rest of the PC-98 system, and just not support non-Latin-script
languages in the beginning? In any case, I've added both options to the
Edit (2023-07-28): Touhou Patch Center has agreed to fund a basic feature set somewhere between the Virgin and Chad level. Check the 📝 dedicated announcement blog post for more details and ideas, and to find out how you can support this goal!
Surprisingly, there was still a bit of RE work left in the third push after
all of this, which I filled with some small rendering boilerplate. Since I
also wanted to include TH02's playfield overlay functions,
1/15 of that last push went towards getting a
TH02-exclusive function out of the way, which also ended up including that
game in this delivery.
The other small function pointed out how TH05's Stage 5 midboss pops into the playfield quite suddenly, since its clipping test thinks it's only 32 pixels tall rather than 64:
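The effect of such a height mismatch can be illustrated with a simplified clipping test. This is not the game's actual code: the function name is made up, and the sketch assumes a center-based Y coordinate, an entity entering from the top edge, and a 368-pixel-tall playfield.

```python
PLAYFIELD_H = 368  # assumed playfield height in pixels


def is_visible(center_y: int, assumed_h: int) -> bool:
    """True if a sprite of the assumed height overlaps the playfield."""
    top = center_y - assumed_h // 2
    bottom = center_y + assumed_h // 2
    return bottom > 0 and top < PLAYFIELD_H


# A 64-pixel-tall sprite centered at Y = -20 already has its lower
# 12 rows inside the playfield…
print(is_visible(-20, 64))
# …but a test that assumes 32 pixels still considers it fully clipped.
print(is_visible(-20, 32))
```

In this model, the 32-pixel test only starts passing once the center reaches Y = -15, at which point 17 rows of the 64-pixel sprite would appear all at once.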
Next up: Staying with TH05 and looking at more of the pattern code of its boss fights. Given the remaining TH05 budget, it makes the most sense to continue in in-game order, with Sara and the Stage 2 midboss. If more money comes in towards this goal, I could alternatively go for the Mai & Yuki fight and immediately develop a pretty fix for the cheeto storage glitch. Also, there's a rather intricate pull request for direct ZMBV decoding on the website that I've still got to review…