That was quick: In a surprising turn of events, Romantique Tp themselves came in just one day after the last blog post went up, updated me with their current and much more positive opinion on Sound Canvas VA, and confirmed that real SC-88Pro hardware clamps invalid Reverb Macro values to the specified range. I promised to release a new Sound Canvas VA BGM pack for free once I knew the exact behavior of real hardware, so let's go right back to Seihou and also integrate the necessary SysEx patches into the game's MIDI player behind a toggle. This would also be a great occasion to quickly incorporate some long overdue code maintenance and build system improvements, and a migration to C++ modules in particular. When I started the Shuusou Gyoku Linux port a year ago, the combination of modules and <windows.h> threw lots of weird errors and even crashed the Visual Studio compiler. But nowadays, Microsoft even uses modules in the Office code base. This must mean that these issues are fixed by now, right?
Well, there's still a bug that causes the modularized C++ standard library to be basically unusable in combination with the static analyzer, and somehow, I was the first one to report it. So it's 3½ years after C++20 was finalized, and somehow, modules are still a bleeding-edge feature and a second-class citizen in even the compiler that supports them the best. I want fast compile times already! 😕
Thankfully, Microsoft agrees that this is a bug, and will work on it at some point. While we're waiting, let's return to the original plan of decompiling the endings of the one PC-98 Touhou game that still needed them decompiled.
After the textless slideshows of TH01, TH02 was the first Touhou game to feature lore text in its endings. Given that this game stores its 📝 in-game dialog text in fixed-size plaintext files, you wouldn't expect anything more fancy for the endings either, so it's not surprising to see that the END?.TXT files use the same concept, with 44 visible bytes per line followed by two bytes of padding for the CR/LF newline sequence. Each of these lines is typed to the screen in full, with all whitespace and a fixed time for each 2-byte chunk.
As a result, everything surrounding the text is just as hardcoded as TH01's endings were, which once again opens up the possibility of freely integrating all sorts of creative animations without the overhead of an interpreter. Sadly, TH02 only makes use of this freedom in a mere two cases: the picture scrolling effect from Reimu's head to Marisa's head in the Bad Endings, and a single hardware palette change in the Good Endings.
Hardcoding also still made sense for this game because of how the ending text is structured. The Good and Bad Endings for the individual shot types respectively share 55% and 77% of their text, and both only diverge after the first 27 lines. In straight-line procedural code, this translates to one branch for each shot type at a single point, neatly matching the high-level structure of these endings.
But that's the end of the positive or neutral aspects I can find in these scripts. The worst part, by far, is ZUN's approach to displaying the text in alternating colors, and how it impacts the entire structure of the code.
The simplest solution would have involved a hardcoded array with the color of each line, just like how the in-game dialogs store the face IDs for each text box. But for whatever reason, ZUN did not apply this piece of wisdom to the endings and instead hardcoded these color changes by… mutating a global variable before calling the text typing function for every individual line. This approach ruins any possibility of compressing the script code into loops. While ZUN did use loops, all of them are very short because they can only last until the next color change. In the end, the code contains 90 explicitly spelled-out calls to the 5-parameter line typing function that only vary in the pointer to each line and in the slower speed used for the one or two final lines of each ending. As usual, I've deduplicated the code in the ReC98 repository down to a sensible level, but here's the full inlined and macro-expanded horror:
It's highly likely that this is what ZUN hacked into his PC-98 and was staring at back in 1997.
All this redundancy bloats the two script functions for the 6 endings to a whopping 3,344 bytes inside TH02's MAINE.EXE. In particular, the single function that covers the three Good Endings ends up with a total of 631 x86 ASM instructions, making it the single largest function in TH02 and the 7th longest function in all of PC-98 Touhou. If the 📝 single-executable build for TH02's debloated and anniversary branches ends up needing a few more KB to reduce its size below the original MAIN.EXE, there are lots of opportunities to compress it all.
The ending text can also be fast-forwarded by holding any key. As we've come to expect for this sort of ZUN code, the text typing function runs its own rendering loop with VSync delays and input detection, which means that we 📝 once📝 again have to talk about the infamous quirk of the PC-98 keyboard controller in relation to held keys. We've still got 54 not yet decompiled calls to input detection functions left in this codebase, are you excited yet?!
Holding any key speeds up the text of all ending lines before the last one by displaying two kana/kanji instead of one per rendered frame and reducing the delay between the rendered frames to 1/3 of its regular length. In pseudocode:
for(i = 0; i < number_of_2_byte_chunks_on_displayed_line; i++) {
input = convert_current_pc98_bios_input_state_to_game_specific_bitflags();
add_chunk_to_internal_text_buffer(i);
blit_internal_text_buffer_from_the_beginning();
if(input == INPUT_NONE) {
// Basic case, no key pressed
frame_delay(frames_per_chunk);
} else if((i % 2) == 1) {
// Key pressed, chunk number is odd.
frame_delay(frames_per_chunk / 3);
} else {
// Key pressed, chunk number is even.
// No delay; next iteration adds to the same frame.
}
}
This is exactly the kind of code you would write if you wanted to deliberately maximize the impact of this hardware quirk. If the game happens to read the current input state right after a key up scancode for the last previously held and game-relevant key, it will then wrongly take the branch that uninterruptibly waits for the regular, non-divided amount of VSync interrupts. In my tests, this broke the rhythm of the fast-forwarded text about once per line. Note how this branch can also be taken on an even chunk: Rendering glyphs straight from font ROM to VRAM is not exactly cheap, and if each iteration (needlessly) blits one more full-width glyph than the last one, the probability of a key up scancode arriving in the middle of a frame only increases.
The fact that TH02 allows any of the supported input keys to be held points to another detail of this quirk I haven't mentioned so far. If you press multiple keys at once, the PC-98's keyboard controller only sends the periodic key up scancodes as long as you are holding the last key you pressed. Because the controller only remembers this last key, pressing and releasing any other key would get rid of these scancodes for all keys you are still holding.
As usual, this ZUN bug only occurs on real hardware and with DOSBox-X's correct emulation of the PC-98 keyboard controller.
After the ending, we get to witness the most seamless transition between ending and Staff Roll in any Touhou game as the BGM immediately changes to the Staff Roll theme, and the ending picture is shifted into the same place where the Staff Roll pictures will appear. Except that the code misses the exact position by four pixels, and cuts off another four pixels at the right edge of the picture:
Also, note the green 1-pixel line at the right edge of this specific picture. This is a bug in the .PI file where the picture is indeed shifted one pixel to the left.
What follows is a comparatively large amount of unused content for a single scene. It starts right at the end of this underappreciated 11-frame animation loaded from ENDFT.BFT:
Wastefully using the 4bpp BFNT format. The single frame at the end of the animation is unused; while it might look identical to the ZUN glyphs later on in the Staff Roll, that's only because both are independently rendered boldfaced versions of the same font ROM glyphs. Then again, it does prove that ZUN created this animation on a PC-98 model made by NEC, as the Epson clones used a font ROM with a distinctly different look.
TH02's Staff Roll is also unique for the pre-made screenshots of all 5 stages that get shown together with a fancy rotating rectangle animation while the Staff Roll progresses in sync with the BGM. The first interesting detail shows up immediately after the first image, where the code jumps over one of the 320×200 quarters in ED06.PI, leaving the screenshot of the Stage 2 midboss unused.
All of the cutscenes in PC-98 Touhou store their pictures as 320×200 quarters within a single 640×400 .PI file. Anywhere else, all four quarters are supposed to be displayed with the same palette specified in the .PI header, but TH02's Staff Roll screenshots are also unique in how all quarters beyond the top-left one require palettes loaded from external .RGB files to look right. Consequently, the game doesn't clearly specify the intended palette of this unused screenshot, and leaves two possibilities:
The unused second 320×200 quarter of TH02's ED06.PI, displayed in the Stage 2 color palette used in-game.
The unused second 320×200 quarter of TH02's ED06.PI, displayed in the palette specified in the .PI header. These are the colors you'd see when looking at the file in a .PI viewer, when converting it into another format with the usual tools, or in sprite rips that don't take TH02's hardcoded palette changes into account. These colors are only intended for the Stage 1 screenshot in the top-left quarter of the file.
The unused second 320×200 quarter of TH02's ED06.PI, displayed in the palette from ED06B.RGB, which the game uses for the following screenshot of the Meira fight. As it's from the same stage, it almost matches the in-game colors seen in 1️⃣, and only differs in the white color (#FFF) being slightly red-tinted (#FCC).
It might seem obvious that the Stage 2 palette in 1️⃣ is the correct one, but ZUN indeed uses ED06B.RGB with the red-tinted white color for the following screenshot of the Meira fight. Not only does this palette not match Meira's in-game appearance, but it also discolors the rectangle animation and the surrounding Staff Roll text:
Also, that tearing on frame #1 is not a recording artifact, but the expected result of yet another VSync-related landmine. 💣 This time, it's caused by the combination of 1) the entire sequence from the ending to the verdict screen being single-buffered, and 2) this animation always running immediately after an expensive operation (640×400 .PI image loading and blitting to VRAM, 320×200 VRAM inter-page copy, or hardware palette loading from a packed file), without waiting for the VSync interrupt. This makes it highly likely for the first frame of this animation to start rendering at a point where the (real or emulated) electron beam has already traveled over a significant portion of the screen.
But when I went into Stage 2 to compare these colors to the in-game palette, I found something even more curious. ZUN obviously made this screenshot with the Reimu-C shot type, but one of the shot sprites looks slightly different from how it does in-game. These screenshots must have been made earlier in development when the sprite didn't yet feature the second ring at the top. The same applies to the Stage 4 screenshot later on:
Finally, the rotating rectangle animation delivers one more minor rendering bug. Each of the 20 frames removes the largest and outermost rectangle from VRAM by redrawing it in the same black color of the background before drawing the remaining rectangles on top. The corners of these rectangles are placed on a shrinking circle that starts with a radius of 256 pixels and is centered at (192, 200), which results in a maximum possible X coordinate of 448 for the rightmost corner of the rectangle. However, the Staff Roll text starts at an X coordinate of 416, causing the first two full-width glyphs to still fall within the area of the circle. Each line of text is also only rendered once before the animation. So if any of the rectangles then happens to be placed at an angle that causes its edges to overlap the text, its removal will cut small holes of black pixels into the glyphs:
The green dotted circle corresponds to the newest/smallest rectangle. Note how ZUN only happened to avoid the holes for the two final animations by choosing an initial angle and angular velocity that causes the resulting rectangles to just barely avoid touching the TEST PLAYER glyphs.
At least the following verdict screen manages to have no bugs aside from the slightly imperfect centering of its table values, and only comes with a small amount of additional bloat. Let's get right to the mapping from skill points to the 12 title strings from END3.TXT, because one of them is not like the others:
Skill
Title
≥100
神を超えた巫女!!
90 - 99
もはや神の領域!!
80 - 99
A級シューター!!
78 - 79
うきうきゲーマー!
77
バニラはーもにー!
70 - 76
うきうきゲーマー!
60 - 69
どきどきゲーマー!
50 - 59
要練習ゲーマー
40 - 49
非ゲーマー級
30 - 39
ちょっとだめ
20 - 29
非人間級
10 - 19
人間でない何か
≤9
死んでいいよ、いやいやまじで
Looks like I'm the first one to document the required skill points as well? Everyoneelse just copy-pastes END3.TXT without providing context.
So how would you get exactly 77 and achieve vanilla harmony? Here's the formula:
* Ranges from 0 (Easy) to 3 (Lunatic). † Across all 5 stages.
With Easy Mode capping out at 85, this is possible on every difficulty, although it requires increasingly perfect play the lower you go. Reaching 77 on purpose, however, pretty much demands a careful route through the entire game, as every collected and missed item will influence the item_skill in some way. This almost feels it's like the ultimate challenge that this game has to offer. Looking forward to the first Vanilla Harmony% run!
And with that, TH02's MAINE.EXE is both fully position-independent and ready for translation. There's a tiny bit of undecompiled bit of code left in the binary, but I'll leave that for rounding up a future TH02 decompilation push.
With one of the game's skill-based formulas decompiled, it's fitting to round out the second push with the other two. The in-game bonus tables at the end of a stage also have labels that we'd eventually like to translate, after all.
The bonus formula for the 4 regular stages is also the first place where we encounter TH02's rank value, as well as the only instance in PC-98 Touhou where the game actually displays a rank-derived value to the player. KirbyComment and Colin Douglas Howell accurately documented the rank mechanics over at Touhou Wiki two years ago, which helped quite a bit as rank would have been slightly out of scope for these two pushes. 📝 Similar to TH01, TH02's rank value only affects bullet speed, but the exact details of how rank is factored in will have to wait until RE progress arrives at this game's bullet system.
These bonuses are calculated by taking a sum of various gameplay metrics and multiplying it with the amount of point items collected during the stage. In the 4 regular stages, the sum consists of:
難易度
Difficulty level* × 2,000
ステージ
(Rank + 16) × 200
ボム
max((2,500 - (Bombs used* × 500)), 0)
ミス
max((3,000 - (Lives lost* × 1,000)), 0)
靈撃初期数
(4 - Starting bombs) × 800
靈夢初期数
(5 - Starting lives) × 1,000
* Within this stage, across all continues.
Yup, 封魔録.TXT does indeed document this correctly.
As rank can range from -6 to +4 on Easy and +16 on the other difficulties, this sum can range between:
Easy
Normal
Hard
Lunatic
Minimum
2,800
4,800
6,800
8,800
Maximum
16,700
21,100
23,100
25,100
The sum for the Extra Stage is not documented in 封魔録.TXT:
クリア
10,000
ミス回数
max((20,000 - (Lives lost × 4,000)), 0)
ボム回数
max((20,000 - (Bombs used × 4,000)), 0)
クリアタイム
⌊max((20,000 - Boss fight frames*), 0) ÷ 10⌋ × 10
* Amount of frames spent fighting Evil Eye Σ, counted from the end of the pre-boss dialog until the start of the defeat animation.
And that's two pushes packed full of the most bloated and copy-pasted code that's unique to TH02! So bloated, in fact, that TH02 RE as a whole jumped by almost 7%, which in turn finally pushed overall RE% over the 60% mark. 🎉 It's been a while since we hit a similar milestone; 50% overall RE happened almost 2 years ago during 📝 P0204, a month before I completed the TH01 decompilation.
Next up: Continuing to wait for Microsoft to fix the static analyzer bug until May at the latest, and working towards the newly popular dreams of TH03 netplay by looking at some of its foundational gameplay code.
> "OK, TH03/TH04/TH05 cutscenes done, let's quickly finish the Touhou Patch Center MediaWiki upgrade. Just some scripting and verification left, it will be done so quickly that I don't even have to mention it on this blog"
> Still not done after 3 weeks
> Blocked by one final critical bug that really should be fixed upstream
> Code reviewers are probably on vacation
And so, the year unfortunately ended with yet another slow month. During the
MediaWiki upgrade, I was slowly decompiling the TH05 Sara fight on the side,
but stumbled over one interesting but high-maintenance detail there that
would really enhance her blog post. TH02 would need a lot of attention for
the basic rendering calls as well…
…so let's end the year with Shuusou Gyoku instead, looking at its most
critical issue in particular. As if that were the easy option here…
The game does not run properly on modern Windows systems due to its usage of
the ancient DirectDraw APIs, with issues ranging from unbearable slowdown to
glitched colors to the game not even starting at all. Thankfully, Shuusou
Gyoku is not the only ancient Windows game affected by these issues, and
people have developed a variety of generic DirectDraw wrappers and patches
for playing such games on modern systems. Out of all these, DDrawCompat is one of the
simpler solutions for Shuusou Gyoku in particular: Just drop its
ddraw proxy DLL into the game directory, and the game will run
as it's supposed to.
So let's just bundle that DLL with all my future Shuusou Gyoku releases
then? That would have been the quick and dirty option, coming with
several drawbacks:
Linux users might be annoyed by the potential need to configure a native
DLL override for ddraw.dll. It's not too much of an issue as we
could simply rename the DLL and replace the import with the new name.
However, doing that reproducibly would already involve changes to either the
DDrawCompat or Shuusou Gyoku build process.
Win32 API hooking is another potential point of failure in general,
requiring continual maintenance for new Windows versions. This is not even a
hypothetical concern: DDrawCompat does rely on particularly volatile Win32
API details, to the point that the recent Windows 11 22H2 update completely
broke it, causing a hang at startup that required a workaround.
But sure, it's still just a single third-party component. Keeping it up to
date doesn't sound too bad by itself…
…if DDrawCompat weren't evolving way beyond what we need to keep Shuusou
Gyoku running. Being a typical DirectDraw wrapper, it has always aimed to
solve all sorts of issues in old DirectDraw games. However, the latest
version, 0.4.0, has gone above and beyond in this regard, adding lots of
configuration options with default settings that actually
break Shuusou Gyoku.
To get a glimpse of how this is likely to play out, we only have to look at
the more mature DxWnd
project. In its expert mode, DxWnd features three rows of tabs, each packed
with checkboxes that toggle individual hacks, and most of these are
related to something that Shuusou Gyoku could be affected by. Imagine
checking a precise permutation of a three-digit number of checkboxes just to
keep an old game running at full speed on modern systems…
Finally, aesthetic and bloat considerations. If
📝 C++ fstreams were already too embarrassing
with the ~100 KB of bloat they add to the binary, a 565 KiB DLL is
even worse. And that's the old version 0.3.2 – version 0.4.0 comes in
at 2.43 MiB.
Fortunately, I had the budget to dig a bit deeper and figure out what
exactly DDrawCompat does to make Shuusou Gyoku work properly. Turns
out that among all the hooks and patches, the game only needs the most
central one: Enforcing a 32-bit display mode regardless of whatever lower
bit depth the game requests natively, combined with converting the game's
pixel buffer to 32-bit on the fly.
So does this mean that adding 32-bit to the game's list of supported bit
depths is everything we have to do?
Interestingly, Shuusou Gyoku already saved the DirectDraw enumeration flag
that indicates support for 32-bit display modes. The official version just
did nothing with it.
Well, almost everything. Initially, this surprised me as well: With
all the if statements checking for precise bit depths, you
would think that supporting one more bit depth would be way harder in this
code base. As it turned out though, these conditional branches are not
really about 8-bit or 16-bit color for the most part, but instead
differentiate between two very distinct rendering approaches:
"8-bit" is a pure 2D mode with palettized colors,
while "16-bit" is a hybrid 2D/3D mode that uses Direct3D 2 on top of DirectDraw, with
3-channel RGB colors.
Consequently, most of these branches deal with differences between these two
approaches that couldn't be nicely abstracted away in pbg's renderer
interface: Specific palette changes that are exclusive to "8-bit" mode, or
certain entities and effects whose Direct3D draw calls in "16-bit" mode
require tailor-made approximations for the "8-bit" mode. Since our new
32-bit mode is equivalent to the 16-bit mode in all of these branches, I
only needed to replace the raw number comparisons with more meaningful
method calls.
That only left a very small number of 2D raster effects that directly write
to or read from DirectDraw surface memory, and therefore do need to know the
bit size of each pixel. Thanks to std::variant and
std::visit(), adding 32-bit support becomes trivial here: By
rewriting the code in a generic manner that derives all offsets from the
template type, you only have to say hey,
I'd like to have 32-bit as well, and C++ will automatically
instantiate correct 32-bit variants of all bit depth-dependent code
snippets.
There are only three features in the entire game that access pixel buffers
this way: a color key retrieval function, the lens ball animation on the
logo screen, and… the ending staff roll? Sure, the text sprites fade in and
out, but so does the picture next to it, using Direct3D alpha blending or
palette color ramping depending on the current rendering mode. Instead, the
only reason why these sprites directly access their pixel buffer is… an
unused and pretty wild spiral effect. 😮 It's still part of the code, and
only doesn't show up because the
parameters that control its timing were commented out before release:
They probably considered it too wild for the mood of this
ending.
The main ending text was the only remaining issue of mojibake present in my
previous Shuusou Gyoku builds, and is now fixed as well. Windows can
render Shift-JIS text via GDI even outside Japanese locale, but only when
explicitly selecting a font that supports the SHIFTJIS_CHARSET,
and the game simply didn't select any font for rendering this text.
Thus, GDI fell back onto its default font, which obviously is only
guaranteed to support the SHIFTJIS_CHARSET if your system
locale is set to Japanese. This is why the font in the original game might
lookdifferent between systems.
For my build, I chose the font that would appear on a clean Windows
installation – a basic 400-weighted MS Gothic at font size 16, which is
already used all throughout the game.
Alright, 32-bit mode complete, let's set it as the default if possible… and
break compatibility to the original 秋霜CFG.DAT format in the
process? When validating this file, the original game only allows the
originally supported 8-bit or 16-bit modes. Setting the
BitDepth field to any other value causes the entire file
to be reset to its defaults, re-locking the Extra Stage in the process.
Introducing a backward-compatible version
system for 秋霜CFG.DAT was beyond the scope of this push.
Changing the validation to a per-field approach was a good small first step
to take though. The new build no longer validates the BitDepth
field against a fixed list, but against the actually supported bit depths on
your system, picking a different supported one if necessary. With the
original approach, this would have caused your entire configuration to fail
the validation check. Instead, you can now safely update to the new build
without losing your option settings, or your previously unlocked access to
the Extra Stage.
Side note: The validation limit for starting bombs is off by one, and the
one for starting lives check is off by two. By modifying
秋霜CFG.DAT, you could theoretically get new games to start with
7 lives and 3 bombs… if you then calculate a correct checksum for your
hacked config file, that is. 🧑💻
Interestingly, DirectDraw doesn't even indicate support for 8-bit or 16-bit
color on systems that are affected by the initially mentioned issues.
Therefore, these issues are not the fault of DirectDraw, but of
Shuusou Gyoku, as the original release requested a bit depth that it has
even verified to be unsupported. Unfortunately, Windows sides with
Sim City Shuusou Gyoku here: If you previously experimented with the
Windows app compatibility settings, you might have ended up with the
DWM8And16BitMitigation flag assigned to the full file path of
your Shuusou Gyoku executable in either
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\AppCompatFlags\Layers, or
As the term mitigation suggests, these modes are (poorly) emulated,
which is exactly what causes the issues with this game in the first place.
Sure, this might be the lesser evil from the point of view of an operating
system: If you don't have the budget for a full-blown DDrawCompat-style
DirectDraw wrapper, you might consider it better for users to have the game
run poorly than have it fail at startup due to incorrect API usage.
Controlling this with a flag that sticks around for future runs of a binary
is definitely suboptimal though, especially given how hard it
is to programmatically remove this flag within the binary itself. It
only adds additional complexity to the ideal clean upgrade path.
So, make sure to check your registry and manually remove these flags for the
time being. Without them, the new Config → Graphic menu will
correctly prevent you from selecting anything else but 32-bit on modern
Windows.
After all that, there was just enough time left in this push to implement
basic locale independence, as requested by the Seihou development
Discord group, without looking into automatic fixes for previous mojibake
filenames yet. Combining std::filesystem::path with the native
Win32 API should be straightforward and bloat-free, especially with all the
abstractions I've been building, right?
Well, turns out that std::filesystem::path does not
actually meet my expectations. At least as long as it's not
constexpr-enabled, because you still get the unfortunate
conversion from narrow to wide encoding at runtime, even for globals with
static storage duration. That brings us back to writing our path abstraction
in terms of the regular std::string and
std::wstring containers, which at least allow us to enforce the
respective encoding at compile time. Even std::string_view only
adds to the complexity here, as its strings are never inherently
null-terminated, which is required by both the POSIX and Win32 APIs. Not to
mention dynamic filenames: C++20's std::format() would be the
obvious idiomatic choice here, but using it almost doubles the size
of the compiled binary… 🤮
In the end, the most bloat-free way of implementing C++ file I/O in 2023 is
still the same as it was 30 years ago: Call system APIs, roll a custom
abstraction that conditionally uses the L prefix, and pass
around raw pointers. And if you need a dynamic filename, just write the
dynamic characters into arrays at fixed positions. Just as PC-98 Touhou used
to do…
Oh, and the game's window also uses a Unicode title bar now.
And that's it for this push! Make sure to rename your configuration
(秋霜CFG.DAT), score (秋霜SC.DAT), and replay
(秋霜りぷ*.DAT) filenames if you were previously running the
game on a non-Japanese locale, and then grab the new build:
Next up: Starting the new year with all my plans hopefully working out for
once. TH05 Sara very soon, ZMBV code review afterward, low-hanging fruit of
the TH01 Anniversary Edition after that, and then kicking off TH02 with a
bunch of low-level blitting code.