Blog

Showing all posts tagged

📝 Posted:
💰 Funded by:
32th System, [Anonymous], iruleatgames, Blue Bolt
🏷️ Tags:

Surprise! The last missing main menu in PC-98 Touhou was, in fact, not that hard. Finishing the rest of TH03's OP.EXE took slightly shorter than the expected 2 pushes, which left enough room to uncover an unexpected mystery and take important leaps in position independence…

  1. TH03's main/option menu
  2. Picking opponents for Story Mode
  3. Demo Play difficulty?
  4. More variables from TH02's Stage 3
  5. Technical position independence for TH03's MAINL.EXE

For TH03, ZUN stepped up the visual quality of the main menu items by exchanging TH02's monospaced font with fixed, pre-composited strings of proportional text. While TH04 would later place its menu text in VRAM, TH03 still wanted to stay with TH02's approach of using gaiji to display the menu items on the PC-98 text layer. Since gaiji have a fixed size of 16×16 pixels, this requires the pre-composited bitmaps to be cut into blocks of that size and padded with blank pixels as necessary:

The proportional text section from TH03's MIKOFT.BFT, with the 16×16 gaiji grid overlaid

If your combined amount of text is short enough to fit into the PC-98's 256 gaiji slots, this is a nice way of using hardware features to replace the need for a proportional text renderer. It especially simplifies transitions between menus – simply wiping the entire TRAM is both cheap and certainly less error-prone than (un)blitting pixels in VRAM, which 📝 ZUN was always kind of sloppy at.
However, all this text still needs to be composited and cut into gaiji somewhere. If you do that manually, it's easy to lose sight of how the text is supposed to appear on screen, especially if you decide to horizontally center it. Then, you're in for some awkward coordinate fiddling as you try to place these 16-pixel bricks into the 8-pixel text grid to somehow make it all appear centered:

TH03's main menu box as it appears in the original gameTH03's main menu box with opaque gaiji to highlight their exact locations in TRAMTH03's main menu box with correct centering
TH03's Option menu box as it appears in the original gameTH03's Option menu box with opaque gaiji to highlight their exact locations in TRAMTH03's Option menu box with correct centering
The VS Start menu actually is correctly centered.

Then again, did ZUN actually want to center the Option menu like this? :thonk: Even the main menu looks kind of uncanny with perfect centering, probably because I'm so used to the original. Imperfect centering usually counts as a bug, but this case is quirky enough to leave it as is. We might want to perfectly center any future translations, but that would definitely cost a bit as I'd then actually need to write that proportional text renderer.
Apart from that, we're left with only a very short list of actual bugs and landmines:

While the rest of the code is not free of the usual nitpicks, those don't matter in the grand scheme of things. The code for the sliding 東方時空 animation is even better: it makes decent use of the EGC and page flipping, and places the 📝 loading calls for the character selection portraits at sensible points where the animation naturally wants to have a delay anyway. We're definitely ending the main menus of PC-98 Touhou on a high note here.


You might have already spotted some unfamiliar text in the gaiji above, and indeed, we've got three pieces of unused text in these two menus! Starting from the top, the Watch label is entirely unused as none of its gaiji IDs are referenced anywhere in the final code. The label's placement within the gaiji IDs would imply that this option was once part of the main menu, but nothing in the game suggests that the main menu ever had a bigger box that could fit a 7th element. On the contrary, every piece of menu code assumes that the box sprites loaded from OPWIN.BFT are exactly 128 pixels high:

TH03's OPWIN.BFT
Fun fact: The code doesn't even use the 16 pixels in the middle, and instead just assumes that the pixels between the X coordinates of [8; 16[ and [32; 40[ are identical.

The unused MIDI music option has already been widely documented elsewhere. Changing the first byte in YUME.CFG to 02 has no functional effect because ZUN removed most MIDI-related code before release. He did forget a few instances though, and the surviving dedicated switch case in the Option menu is now the entire reason why you can reveal this text without modifying the binary. Changing the option will always flip its value back to either off or FM(86).
Last but not least, we have the Type label and its associated numbers. These are the most interesting ones in my eyes; nobody talks about them, even though we have definite proof that they were used for the KeyConfig options at some earlier point in development:

But how exactly can we prove this? The code does string together the respective gaiji IDs and defines the resulting arrays right next to the final KeyConfig types, but doesn't use these arrays anywhere. By itself, this only means that these labels were associated with some option that may have existed at one point in development. The proof must therefore come from outside the code – and in this case, it comes from both 夢時空.TXT and 時空_1.TXT, which still refer to the KeyConfig options as numbered types:

 ■6.操作方法 […]    FM音源のジョイスティックが無い場合は、TYPE1にしてください。    ○TYPE1 Key vs Key […]    ○TYPE2 Joy vs Key […]    ○TYPE3 Key vs Joy

That's all I've got about the menus, so let's talk characters and gameplay! When playing Story Mode, OP.EXE picks the opponents for all stages immediately after the 📝 Select screen has faded out. Each character fights a fixed and hardcoded opponent in Stage 7's Decisive Match:

PlayerStage 7 opponent
Reimu ReimuMima Mima
Mima MimaReimu Reimu
Marisa MarisaReimu Reimu
Ellen EllenMarisa Marisa
Kotohime KotohimeReimu Reimu
Kana KanaEllen Ellen
Rikako RikakoKana Kana
Chiyuri ChiyuriKotohime Kotohime
Yumemi YumemiRikako Rikako

The opponents for the first 6 stages, however, are indeed completely random, and picked by master.lib's reimplementation of the Borland RNG. The game only needs to ensure that no character is picked twice, which it does like this:

const int stage_7_opponent = HARDCODED_STAGE_7_OPPONENT_FOR[playchar];
bool opponent_seen[7] = { false };

for(int stage = 0; stage < 6; stage++) {
	int candidate;
	do {
		// Pick a random character between Reimu and Rikako
		candidate = (irand() % 7);
	} while(opponent_seen[candidate] || (stage_7_opponent == candidate));
	opponent_seen[candidate] = true;
	story_opponent[stage] = candidate;
}
Characters are numbered from 0 (Reimu Reimu) to 8 (Yumemi Yumemi), following the order in the Stage 7 table above.

Yup. For every stage, ZUN re-rolls until the RNG returns a character who hasn't yet been seen in a previous stage – even in Stage 6 where there's only one possible character left. :zunpet: Since each successive stage makes it harder for the inner loop to find a valid answer, you start to wonder if there is some unlucky combination of seed and player character that causes the game to just hang forever.
So I tested all possible 232 seed values for all 9 player characters and… nope, Borland's RNG is good enough to eventually always return the only possible answer. The inner loop for Stage 6 does occasionally run for a disproportionate number of iterations, with the worst case being 134 re-rolls when playing Rikako Rikako's Story Mode with a seed value of 0x099BDA86. But even that is many orders of magnitude away from manifesting as any kind of noticeable delay. And on average, it just takes 17.15 iterations to determine all 6 random opponents.


The attract demos are another intriguing aspect that I initially didn't even have on my radar for the main menu. touhou-memories raises an interesting question: The demos start at Gauge and Boss Attack level 9, which would imply Lunatic difficulty, but the enemy formations don't match what you'd normally get on Lunatic. So, which difficulty were they recorded on?
Our already RE'd code clears up the first part of that question. TH03's demos are not recordings, but simply regular VS rounds in CPU vs. CPU mode that automatically quit back to the title screen after 7,000 frames. They can only possibly appear pre-recorded because the game cycles through a mere four hardcoded character pairings with fixed RNG seeds:

Demo #P1P2Seed
1Mima MimaReimu Reimu600
2Marisa MarisaRikako Rikako1000
3Ellen EllenKana Kana3200
4Kotohime KotohimeMarisa Marisa500
Certainly an odd choice if your game already had the feature to let arbitrary CPU-controlled characters fight each other. That would have even naturally worked for the trial version, which doesn't contain demos at all.

Then again, even a "random" character selection would have appeared deterministic to an outside observer. As usual for PC-98 Touhou, the RNG seed is initialized to 0 at startup and then simply increments after every frame you spend on the title screen and inside the top-level main, Option, and character selection menus – and yes, it does stay constant inside the VS Start menu. But since these demos always start after waiting exactly 520 frames on the title screen without pressing any key to enter the main menu, there's no actual source of randomness anywhere. ZUN could have classically initialized the RNG with the current system time, which is what we used to do back in the day before operating systems had easily accessible APIs for true randomness, but he chose not to, for whatever reason.

The difficulty question, however, is not so easy to answer. The demo startup code in the main menu doesn't override the configured difficulty, and neither does any other of the binaries depending on the demo ID. This seems to suggest that the demos simply run at the difficulty you last configured in the Option menu, just like regular VS matches. But then, you'd expect them to run differently depending on that difficulty, which they demonstrably don't. They always start on Gauge and Boss Attack level 9, and their last frame before the exit animation is always identical, right down to the score, reinforcing the pre-recorded impression:

Screenshot of the last frame (#7,000) of TH03's first demo (Mima vs. Reimu).Screenshot of the last frame (#7,000) of TH03's second demo (Marisa vs. Rikako).Screenshot of the last frame (#7,000) of TH03's third demo (Ellen vs. Kana).Screenshot of the last frame (#7,000) of TH03's fourth demo (Kotohime vs. Marisa).
Note that it takes much longer than the expected 2:04 minutes for the game to reach this end state. Each WARNING!! You are forced to evade / Your life is in peril popup freezes gameplay for 26 frames which don't count toward the demo frame counter. That's why these popups will provide such a great 📝 resynchronization opportunity for netplay. It's almost as if Versus Touhou was designed from the start with rollback netcode in mind! :godzun:

With quite a bit of time left over in the second push, it made sense to look at a bit of code around the Gauge and Boss Attack levels to hopefully get a better idea of what's going on there. The Gauge Attack levels are very straightforward – they can range from 1 to 16 inclusive, which matches the range that the game can depict with its gaiji, and all parts of the game agree about how they're interpreted:

The 16 proportional digit gaiji from TH03's GAMEFT.BFT
Stored in GAMEFT.BFT.

The same can't be said about the Boss Attack level though, as the gauge and the WARNING!! popup interpret the same internal variable as two different levels?

Darkened screenshot of a TH03 Boss Attack fired off near the beginning of a match at Lunatic difficulty, highlighting the discrepancy between the Boss Attack level shown in the gauge at the bottom of each playfield (10) and the one shown in the WARNING!! popup (9)

This apparent inconsistency raises quite a few questions. After all, these gaiji have to be addressed by adding an offset from 0 to 15 to the ID of the 1 gaiji, but the levels are supposed to range from 1 to 16. Does this mean that one of these two displays has an off-by-one error? You can't fire a Level 0 Boss Attack because the level always increments before every attack, but would 0 still be a technically valid Boss Attack level?
Decompiling the static HUD code debunks at least the first question as ZUN resolves the apparent off-by-one error by explicitly capping the displayed level to 16. And indeed, if a round lasts until the maximum Boss Attack level, the two numbers end up matching:

Darkened screenshot of a TH03 Boss Attack fired off near the end of a match, highlighting how both the gauge and the WARNING!! popup agree on the level once it reaches 16

This suggests that the popup indicates the level of the incoming attack while the gauge indicates the level of the next one to be fired by any player. That said, this theory not only needs tons of comments to explain it within the code, but also contradicts 夢時空.TXT, which explicitly describes the level next to the gauge as the 現在のBOSSアタックのレベル. Still, it remains our best bet until we've decompiled a few of the Boss Attacks and saw how they actually use this single variable.


So, what does this tell us about the demo difficulty? Now that we can search the code for these variables, we quickly come across the dedicated demo-specific branch that initializes these levels to the observable fixed values, along with two other variables I haven't researched so far. This confirms that demos run at a custom difficulty, as the two other variables receive slightly different values in regular gameplay.

However, it's still a good idea to check the code for any other potential effects of the difficulty setting. Maybe they're just hard to spot in demos? Doesn't difficulty typically affect a whole lot of other things in Touhou game code? Well, not in TH03 – MAIN.EXE only ever looks at the configured difficulty in three places, and all of them are part of the code that initializes a new round.
This reveals the true nature of difficulty in TH03: It's exclusively specified in terms of these five variables, and the Easy/Normal/Hard/Lunatic/"Demo" settings can be thought of as simply being presets for them. Story Mode adds 📝 the AI's number of safety frames to the list of variables and factors the current stage number into their values, but the concept stays the same. In this regard, TH03's design is unusually clean, making it perhaps the only Touhou game with not even a single "if difficulty is this, then do that" branch in script code. It's certainly the only PC-98 Touhou game with this property.

But it gets even better if we consider what this means for netplay. We now know that the configured difficulty is part of the match-defining parameters that must be synced between both players, just like the selected characters and the RNG seed. But why stop there? How about letting players not just choose between the presets, but allowing them to customize each of the five variables independently? Boom, we've just skyrocketed the replay value of netplay. 🚀 It's discoveries like these that justify my decision to start the road toward netplay by decompiling all of OP.EXE: In-engine menus are the cleanest and most friendly way of allowing players to configure all these variables, and now they're also the easiest and most natural choice from a technical point of view.


But wait, there's still some time left in that second push! The remaining fraction of the OP.EXE reverse-engineering contribution had repeating decimals, so let's do some quick TH02 PI work to remove the matching instance of repeating decimals from the backlog. This was very much a continuation of 📝 last year's light PI work; while the regular TH02 decompilation progress has focused and will continue to focus on the big features, it still left plenty of low-hanging PI fruit in boss code.
The first animation frame of TH02's Stage 3 midboss, taken from STAGE2.BMT Back then, we left with the positions of the Five Magic Stones, where ZUN's choice of storing them in arrays was almost revolutionary compared to what we saw in TH01. The same now applies to the state flags and total damage amount of not just the boss of Stage 3, but also the two independently damageable entities of the stage's midboss. In total, all of the newly identified arrays made up 3.36% of all memory references in TH02, and we're not even done with Stage 3. The first animation frame of TH02's Stage 3 midboss, taken from STAGE2.BMT


Actually, you know what, let's round out that second push with even more low-hanging PI fruit and ensure 📝 technical position independence for TH03's MAINL.EXE. This was very helpful considering that I'm going to build netplay into the anniversary branch, whose debloated foundation 📝 aims to merge every game into as few executables as possible. Due to TH03's overall lower level of bloat and the dedicated SPRITE16-based rendering code in MAIN.EXE, it might not make as much sense to merge all three of TH03's .EXE binaries as it did for TH01, and MAIN.EXE's lack of position independence currently prevents this anyway. However, merging just OP.EXE and MAINL.EXE makes tremendous sense not just for TH03, but for the other three games as well. These binaries have a much smaller ratio of ZUN code to library code, and use the same file formats and subsystems.
But that's not even the best part! Once we've factored out all the invisible inconsistencies between the games, we get to share all of this code across all of the four games. Hence, technical position independence for TH03's MAINL.EXE also was the final obstacle in the way of a single consistent and ultimately portable version of all of this code. 🙌

So, how do we go from here to 📝 the short-term half-PC-98/half-modern netplay option that Ember2528 is now funding? Most of the netcode will be unrelated to TH03 in particular, but we'd obviously still want to reverse-engineer more of MAIN.EXE to ensure a high-quality integration. So how about alternating the upcoming deliveries between pure RE work and any new or modded code? Next up, therefore, I'll go for the latter and debloat OP.EXE so that I can later add the netplay features without pulling my hair out. At that point, it also makes sense to take the first steps into portability; I've got some initial ideas I'm excited to implement, and Congrio's tiny bit of funding just begs to be removed from the backlog. :tannedcirno:
(And I'm definitely going to defuse all the tearing landmines because my goodness are they infuriating when slowing down the game or working with screen recordings.)

📝 Posted:
💰 Funded by:
Blue Bolt, [Anonymous], Splashman
🏷️ Tags:

TH05's OP.EXE? It's not one of the 📝 main blockers for multilingual translation support, but fine, let's push it to 100% RE. This didn't go all too quickly after all, though – sure, we were only missing the High Score viewer, but that's technically a menu. By now, we all know the level of code quality we can reasonably expect from ZUN's menu code, especially if we simultaneously look at how it's implemented in TH04 as well. But how much could I possibly say about even a static screen?

  1. Generating the initial High Score lists
  2. TH04/TH05's High Score viewer
  3. TH05-exclusive palette bugs when switching between the main menu and the High Score viewer
  4. Please don't fund ZUN Soft logo finalization just yet 🙇

Then again, with half of the funding for this push not being constrained to RE, OP.EXE wasn't the worst choice. In both TH04 and TH05, the High Score viewer's code is preceded by all the functions needed to handle the GENSOU.SCR scorefile format, which I already RE'd 📝 in late 2019. Back then, it turned out to be one of the most needlessly inconsistent pieces of code in all of PC-98 Touhou, with a slightly different implementation in each of the 6 binaries that was waiting for its equally messy decompilation ever since.
Most of these inconsistencies just add bloat, but TH05's different stage number defaults for the Extra Stage do have the tiniest visible impact on the game. Since 2019 was before we had our current system of classifying weird code, let's take a quick look at this again:

Screenshot of TH05's High Score viewer, showing the default GENSOU.SCR data for the Extra Stage as generated by OP.EXE Screenshot of TH05's High Score viewer, showing the default GENSOU.SCR data for the Extra Stage as generated by MAINE.EXE with the same decreasing stage numbers as for the regular stages
In the end, this is a landmine, albeit a slightly unusual one. OP.EXE always needs to load GENSOU.SCR to determine whether the Extra Stage is unlocked and can be selected in the main menu. If that file is corrupted or doesn't exist yet, OP.EXE will always recreate it. Therefore, MAINE.EXE's recreation code would only ever run if GENSOU.SCR got deleted or corrupted while playing the game. This can only happen through code that runs outside the game or as the result of failing hardware, and thus goes beyond our criteria for observability.

On to the actual High Score screen then! The OP.EXE code I decompiled here only covers the viewer, the actual score registration is part of MAINE.EXE and is a completely different beast that only shares a few code snippets at best. This means that I'll have to do this all over again at some point down the line, which will result in another few pushes that look very similar to this one. 🥲
By now, it's no surprise that even this static screen has more or less the same density of bugs, landmines, and bloat as ZUN's more dynamic and animated menus. This time however, the worst source of bloat lies more on the meta level: TH04's version explicitly spells out every single loading and rendering call for both of that game's playable characters, rather than covering them with loops like TH05 does for its four characters. As a result, the two games only share 3¼ out of the 7 functions in even this simple viewer screen. It definitely didn't have to be this way.

On the bright side, the code starts off with a feature that probably only scoreplayers and their followers have been consciously aware of: The High Score screens can display 9-digit scores without glitches, unlike the in-game HUD's infamous overflow that turns the 8th digit into a letter once the score exceeds 100 million points.
To understand why this is such a surprise, we have to look at how scores are tracked in-game where the glitch does happen. This brings us back to the binary-coded decimal format that the final three PC-98 Touhou games use for their scores, which we didn't have to deal with 📝 for almost three years. On paper, the fixed-size array of 8 digits used by the three games would leave no room for a 9th one, so why don't we get a counterstop at 99,999,999 points, similar to what happens in modern Touhou? Let's look at the concrete example of adding, say, 200,000 points to a score of 99,899,990 points, and step through the algorithm for the most significant four digits:

score BCD delta
09 09 08 09 09 09 09 00 + 00 00 02 00 00 00 00 00
= 09 09 08 09 09 09 09 00 + 00 00 02 00 00 00 00 00
= 09 0A 00 09 09 09 09 00 + 00 00 02 00 00 00 00 00
= 0A 00 00 09 09 09 09 00 + 00 00 02 00 00 00 00 00
= 0A 00 00 09 09 09 09 00
It sure is neat how ZUN arranged the gaiji font in such a way that the HUD's rendering is an exact visual representation of the bytes in memory… at least for scores between 100,000,000 (A0000000) and 159,999,999 (F9999999) inclusive.
Formatted as big-endian for easier reading. Here's the relevant undecompilable ASM code, featuring the venerable AAA instruction.

In other words: The carry of each addition is regularly added to the next digit as if it were binary, and then the next iteration has to adjust that value as necessary and pass along any carry to the digit after that. But once we've reached the most significant digit, there is no way for its carry to go. So it just stays there, leaving the last digit with a value greater than 9 and effectively turning it from a BCD digit into a regular old 8-bit binary value. This leaves us with a maximum representable score of 2,559,999,999 points (FF 09 09 09 09 09 09 09) – and with the scores achieved by current TAS runs being far below that limit in both games, it's definitely not worth it to bother about rendering that 10th score digit anywhere.
In the High Score screens, ZUN also zero-padded each score to 8 digits, but only blitted the 9th digit into the padding between name and score if it's nonzero. From this code detail alone, we can tell that ZUN was fully aware of ≥100 million points being possible, but probably considered such high scores unlikely enough to not bother rearranging the in-game HUD to support 9 digits. After all, it only looks like there's plenty of unused space next to the HUD, but in reality, it's tightly surrounded by important VRAM regions on both sides: The 32 pixels to the left provide the much-needed sprite garbage area to support 📝 visually clipped sprites despite master.lib's lack of sprite clipping, and the 64 pixels to the right are home to the 📝 tile source area:

Constructed screenshot of TH04's in-game layout during the Elly boss fight, showing off how a score of 99,999,999 points would be reduced to the 8-digit 🎝9999999 string, with the PC-98's text layer hidden to reveal the sprite garbage areas and tile source areas that tightly surround the HUD Constructed screenshot of TH04's in-game layout during the Elly boss fight, showing off how a score of 99,999,999 points would be reduced to the 8-digit 🎝9999999 string; with the PC-98's text layer visible, it might seem as if it was no big deal to expand the HUD into render 9-digit scores
It sure wouldn't have been impossible. You could either sacrifice the two tiles that would cover the 9th digit in both the HiScore and Score row, or – even better – move these tiles under the existing padding space within the HUD. 📝 The tile sections of TH04 and TH05 already address their images using raw VRAM addresses, so this wouldn't have even required an additional tile index→VRAM address lookup table.

And sure enough, ZUN confirms this awareness in TH04's OMAKE.TXT:

9999万でもカンストしませんが、ゲーム中の得点表示、1千万の位が英字 表記となります。つまり、100000000点はA0000000点と表示 されます。(ネームレジスト画面は普通に表示されます) でも、1億点以上出せる人いるのかな。

(And indeed, the first documented legitimate run that crossed 100 million only happened 11 years after the game's release, with Reimu A on Normal difficulty.)

However, the highest score that the High Score screens of both games can display without visual glitches is not 999,999,999, as you would expect from 9 digits, but rather…

Screenshot of TH04's High Score viewer showing a score of 959,999,999 points for both Reimu and Marisa, which is the highest score that this screen can display without glitches Screenshot of TH05's High Score viewer showing a score of 959,999,999 points for all characters, which is the highest score that this screen can display without glitches
959 million?
(Also, this 9th digit nicely highlights a slight asymmetry in TH04's screen, where Marisa gets 4 fewer pixels of padding between names and scores.)

What a weird limit. Regardless of whether GENSOU.SCR saves its scores in a sane unsigned 32-bit format or a silly 8-digit BCD one, this limit makes no sense in either representation. In fact, GENSOU.SCR goes even further than BCD values, and instead uses… the ID of the corresponding gaiji in the 📝 bold font? :zunpet:
How cute. No matter how you look at it, storing digits with an added offset of 160 makes no sense:

It does start to explain the 959 million limit, though. Since each digit in GENSOU.SCR takes up 1 byte as well, they are indeed limited to a maximum value of (255 - 160) = 95 before they wrap back to 0.
But wait. If the game simply subtracts 160 from the gaiji index to get the digit value, shouldn't this subtraction also wrap back around from 0 to 255 and recover higher values without issue? The answer is, 📝 again, C's integer promotion: Splitting the binary value into two digits involves a division by 10, the C standard mandates that a regular untyped 10 is always of type int, the uint8_t digit operand gets promoted to match, and the result is actually negative and thus doesn't even get recognized as a 9th digit because no negative value is ≥10.

So what would happen if we were to enter a score that exceeds this limit? The registration screen in MAINE.EXE doesn't display the 9th digit and the 8th one wraps around. But it still sorts the score correctly, so at least the internal processing seems to work without any problem…

Screenshot of registering a score of 999,999,999 points in TH04, showing off how such a score would wrap around to 39,999,999 points, despite being sorted correctly. Screenshot of registering a score of 999,999,999 points in TH05, showing off how such a score would wrap around to 39,999,999 points, despite being sorted correctly.
(160 + 99) = 259, which wraps around to 3, so this makes perfect sense. We'll figure out the exact logic behind the differently colored sprite once RE progress reaches this screen.

But once you try viewing this score, you're instead greeted with VRAM corruption resulting from master.lib's super_put() function not bounds-checking the negative sprite IDs passed by the viewer:

Screenshot showing how trying to view a score above 959 million points in TH04's High Score viewer leads to VRAM corruption due to master.lib trying to blit a sprite with a negative ID Screenshot showing how trying to view a score above 959 million points in TH05's High Score viewer leads to VRAM corruption due to master.lib trying to blit a sprite with a negative ID

In a rare case for PC-98 Touhou, the High Score viewer also hides two interesting details regarding its BGM. Just like for the graphics, ZUN also coded a fade-in call for the music. In abbreviated ASM code:

mov ax, 0000h ; PMD AH=00H (start music playback)
int 60h
mov ax, 0280h ; PMD AH=02H (fade in/out)
int 60h
PMD API documentation is here.

However, the AH=02H fade-in call has no effect because AH=00h resets the music volume and would need to be followed by a volume-lowering AH=19h call. But even if there was such a call, the fade-in would sound terrible. 80h corresponds to the fastest possible fade-in speed of -128, which is almost but not quite instant. As such, the fade-in would leave the initial note on each channel muted while the rest of the track fades in very abruptly, which clashes badly with the bass and chord notes you'd expect to hear in the name registration themes of the two games:

At least the first issue could have been avoided if PMD's AH=00h call took optional parameters that describe the initial playback state instead of relying on these mutating calls later on. After all, it might be entirely possible for a bunch of interrupts to fire between AH=00h and these further calls, and if those interrupts take a while, the FM chip might have already played a few samples at PMD's default volume. Sure, Real Mode doesn't stop you from wrapping this sequence in CLI and STI instructions to work around this issue, but why rely on even more CPU state mutation when there would have been plenty of free x86 registers for passing more initial state to AH=00h?

The second detail is the complete opposite: It's a fade-out when leaving the menu, it uses PMD's slowest fade speed, and it does work and sound good. However, the speed is so slow that you typically barely notice the feature before the main menu theme starts playing again. But ZUN hid a small easter egg in the code: After the title screen background faded back in, the games wait for all inputs to be released before moving back into the main menu and playing the title screen theme. By holding any key when leaving the High Score viewer, you can therefore listen to the fade-out for as long as you want.
Although when I said that it works, this does not include TH04. 📝 As 📝 usual, this game's menus do not address the PC-98's keyboard scancode quirk with regard to held keys, causing the loop to break even while the player is still holding a key. There are 21 not yet RE'd input polling calls in TH02 and TH04 that will most certainly reveal similar inconsistencies, are you excited yet? :tannedcirno:
But in TH05, holding a key indeed reveals the hidden-content of a 37-second fade-out:

I'm holding Esc here, but this works with any key, even the ⬅️ left and ➡️ right arrow keys that don't quit out of the menu.

As you can already tell by the markers, the final bugs in TH05's (and only TH05's) OP.EXE are palette-related and revealed by switching between these two screens:

  1. Why does the title screen initially use an ever so slightly darker palette than it does when returning from the menu?
  2. What's with the sudden palette change between frames 1 and 2? Why are the colors suddenly much brighter?

1) is easily traced and attributed to an off-by-one error in the animation's palette fade code, but 2) is slightly more complex. This palette glitch only happens if the High Score viewer is the first palette-changing submenu you enter after the 📝 title animation. Just like 📝 TH03's character portraits, both TH04 and TH05 load the sprites for the High Score screen's digits (SCNUM.BFT) and rank indicator (HI_M.BFT) as soon as the title animation has finished. Since these are regular BFNT sprite sheets, ZUN loads them using master.lib's super_entry_bfnt(), and that's where the issue hides: master.lib's blocking palette fade functions operate on master.lib's main 8-bit palette, and super_entry_bfnt() overwrites this palette with the one in the BFNT header. Synchronizing the hardware palette with this newly loaded one would have immediately revealed this possibly unintended state mutation, but I get why master.lib might not have wanted to do that – after all, 📝 palette uploads aren't exactly cheap and would be very noticeable when loading multiple sprite sheets in a row.
In any case, this is no problem in TH04 as that game's HI_M.BFT and OP1.PI have identical palettes. But in TH05, HI_M.BFT has a significantly brighter palette:

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
OP1.PI
HI01.PI / HI_M.BFT

And that's 100% RE for TH05's OP.EXE! 🎉 TH04's counterpart is not far behind either now, and only misses its title screen animation to reach the same mark.
As for 100% finalization, there's still the not yet decompiled TH04/TH05 version of the ZUN Soft logo that separates both OP.EXE binaries from this goal. But as I've mentioned 📝 time and time again, the most fitting moment for decompiling that animation would be right before reaching 100% on the entirety of either game. Really – as long as we aren't there, your funding is better invested into literally anything else. The ZUN Soft logo does not interact with or block work on any other part of the game, and any potential modding should be easy enough on the ASM level.
But thankfully, nobody actually scrolls down to the Finalized section. So I can rest assured that no one will take that moment away from me! :onricdennat:

Next up: I'd kinda like to stay with PC-98 Touhou for a little longer, but the current backlog is pulling into too many different directions and doesn't convincingly point toward one goal over any other. TH02 is close, but with an active subscription, it makes more sense to accumulate 3 pushes of funding and then go for that game's bullet system in January. This is why I'm OK with subscriptions exceeding the cap every once in a while, because they do allow me to plan ahead in the long term.
So, let's wait a few days for all of you to capture the open towards something more specific. But if the backlog stays as indecisive as it is now, I'll instead go for finishing the Shuusou Gyoku Linux port, hopefully in time for the holiday season.
As for prices, indeed seems to be the point where my supply meets the community's demand for this project and the store no longer sells out immediately. So for the time being, we're going to stay at that push price and I won't increase it any further upon hitting the cap.

📝 Posted:
💰 Funded by:
[Anonymous], 32th System
🏷️ Tags:

Remember when ReC98 was about researching the PC-98 Touhou games? After over half a year, we're finally back with some actual RE and decompilation work. The 📝 build system improvement break was definitely worth it though, the new system is a pure joy to use and injected some newfound excitement into day-to-day development.
And what game would be better suited for this occasion than TH03, which currently has the highest number of individual backers interested in it. Funding the full decompilation of TH03's OP.EXE is the clearest signal you can send me that 📝 you want your future TH03 netplay to be as seamlessly integrated and user-friendly as possible. We're just two menu screens away from reaching that goal anyway, and the character selection screen fits nicely into a single push.

  1. TH03's character selection screen
  2. Improved blog navigability

The code of a menu typically starts with loading all its graphics, and TH03's character selection already stands out in that regard due to the sheer amount of image data it involves. Each of the game's 9 selectable characters comes with

  1. a 192×192-pixel portrait (??SL.CD2),
  2. a 32×44-pixel pictogram describing her Extra Attack (in SLEX.CD2), and
  3. a 128×16-pixel image of her name (in CHNAME.BFT). While this image just consists of regular boldfaced versions of font ROM glyphs that the game could just render procedurally, pre-rendering these names and keeping them around in memory does make sense for performance reasons, as we're soon going to see. What doesn't make sense, though, is the fact that this is a 16-color BFNT image instead of a monochrome one, wasting both memory and rendering time.

Luckily, ZUN was sane enough to draw each character's stats programmatically. If you've ever looked through this game's data, you might have wondered where the game stores the sprite for an individual stat star. There's SLWIN.CDG, but that file just contains a full stat window with five stars in all three rows. And sure enough, ZUN renders each character's stats not by blitting sprites, but by painting (5 - value) yellow rectangles over the existing stars in that image. :tannedcirno:

TH03's SLWIN.CDG, showing off how ZUN baked all 15 possible stat stars into the image
The only stat-related image you will find as part of the game files. The number of stat stars per character is hardcoded and not based on any other internal constant we know about.
Together with the EXTRA🎔 window and the question mark portrait for Story Mode, all of this sums up to 255,216 bytes of image data across 14 files. You could remove the unnecessary alpha plane from SLEX.CD2 (-1,584 bytes) or store CHNAME.BFT in a 1-bit format (-6,912 bytes), but using 3.3% less memory barely makes a difference in the grand scheme of things.
From the code, we can assume that loading such an amount of data all at once would have led to a noticeable pause on the game's target PC-98 models. The obvious alternative would be to just start out with the initially visible images and lazy-load the data for other characters as the cursors move through the menu, but the resulting mini-latencies would have been bound to cause minor frame drops as well. Instead, ZUN opted for a rather creative solution: By segmenting the loading process into four parts and moving three of these parts ahead into the main menu, we instead get four smaller latencies in places where they don't stick out as much, if at all:

  1. The loading process starts at the logo animation, with Ellen's, Kotohime's, and Kana's portraits getting loaded after the 東方時空 letters finished sliding in. Why ZUN chose to start with characters #3, #4, and #5 is anyone's guess. :zunpet:
  2. Reimu's, Mima's, and Marisa's portraits as well as all 9 EXTRA🎔 attack pictograms are loaded at the end of the flash animation once the full title image is shown on screen and before the game is waiting for the player to press a key.
  3. The stat and EXTRA🎔 windows are loaded at the end of the main menu's slide-in animation… together with the question mark portrait for Story Mode, even though the player might not actually want to play Story Mode.
  4. Finally, the game loads Rikako's, Chiyuri's, and Yumemi's portraits after it cleared VRAM upon entering the Select screen, regardless of whether the latter two are even unlocked.

I don't like how ZUN implemented this split by using three separately named standalone functions with their own copy-pasted character loop, and the load calls for specific files could have also been arranged in a more optimal order. But otherwise, this has all the ingredients of good-code. As usual, though, ZUN then definitively ruins it all by counteracting the intended latency hiding with… deliberately added latency frames:

Sure, maybe loading the fourth part's 69,120 bytes from a highly fragmented hard drive might have even taken longer than 30 frames on a period-correct PC-98, but the point still stands that these delays don't solve the problem they are supposed to solve.


But the unquestionable main attraction of this menu is its fancy background animation. Mathematically, it consists of Lissajous curves with a twist: Instead of calculating each point as x = sin((fx·t)+ẟx) y = sin((fy·t)+ẟy) , TH03 effectively calculates its points as x = cos(fx·((t+ẟx) % 0xFF)) y = sin(fy·((t+ẟy) % 0xFF)) , due to t and being 📝 8-bit angles. Since the result of the addition remains 8-bit as well, it can and will regularly overflow before the frequency scaling factors fx and fy are applied, thus leading to sudden jumps between both ends of the 8-bit value range. The combination of this overflow and the gradual changes to fx and fy create all these interesting splits along the 360° of the curve:

At a high level, there really is just one big curve and one small curve, plus an array of trailing curves that approximate motion blur by subtracting from ẟx and ẟy.

In a rather unusual display of mathematical purity, ZUN fully re-calculates all variables and every point on every frame from just the single byte of state that indicates the current time within the animation's 128-frame cycle. However, that beauty is quickly tarnished by the sheer cost of fully recalculating these curves every frame:

This is decidedly more than the 1.17 million cycles we have between each VSync on the game's target 66 MHz CPUs. So it's not surprising that this effect is not rendered at 56.4 FPS, but instead drops the frame rate of the entire menu by targeting a hardcoded 1 frame per 3 VSync interrupts, or 18.8 FPS. Accordingly, I reduced the frame rate of the video above to represent the actual animation cycle as cleanly as possible.
Apparently, ZUN also tested the game on the 33 MHz PC-98 model that he targeted with TH01 and TH02, and realized that 4,096 points were way too much even at 18.8 FPS. So he also added a mechanism that decrements the number of trailing curves if the last frame took ≥5 VSync interrupts, down to a minimum of only a single extra curve. You can see this in action by underclocking the CPU in your Neko Project fork of choice.

But were any of these measures really necessary? Couldn't ZUN just have allocated a 12 KiB ring buffer to keep the coordinates of previous curves, thus reducing per-frame calculations to just 512 points? Well, he could have, but we now can't use such a buffer to optimize the original animation. The 8-bit main angle offset/animation cycle variable advances by 0x02 every frame, but some of the trailing curves subtract odd numbers from this variable and thus fall between two frames of the main curves.
So let's shelve the idea of high-level algorithmic optimizations. In this particular case though, even micro-optimizations can have massive benefits. The sheer number of points magnifies the performance impact of every suboptimal code generation decision within the inner point loop:

Multiplied by the number of points, even these low-hanging fruit already save a whopping ≥753,664 cycles per frame on an i486, without writing a single line of ASM! On Pentium CPUs such as the one in the PC-9821Xa7 that ZUN supposedly developed this game on, the savings are slightly smaller because far calls are much faster, but still come in at a hefty ≥491,520 cycles. Thus, this animation easily beats 📝 TH01's sprite blitting and unblitting code, which just barely hit the 6-digit mark of wasted cycles, and snatches the crown of being the single most unoptimized code in all of PC-98 Touhou.
The incredible irony here is that TH03 is the point where ZUN 📝 really 📝 started 📝 going 📝 overboard with useless ASM micro-optimizations, yet he didn't even begin to optimize the one thing that would have actually benefitted from it. Maybe he 📝 once again went for the 📽️ cinematic look 📽️ on purpose?

Unlike TH01's sprites though, all this wasted performance doesn't really matter much in the end. Sure, optimizing the animation would give us more trailing curves on slower PC-98 models, but any attempt to increase the frame rate by interpolating angles would send us straight into fanfiction territory. Due to the 0x02/2.8125° increment per cycle, tripling the frame rate of this animation would require a change to a very awkward (log2384) = 8.58-bit angle format, complete with a new 384-entry sine/cosine lookup table. And honestly, the effect does look quite impressive even at 18.8 FPS.


There are three more bugs and quirks in this animation that are unrelated to performance:

Now with the full 18 curves, a direction change of the smaller trailing curves at the end of the loop that only looks slightly odd, and a reversed and more natural plotting order.

If you want to play with the math in a more user-friendly and high-res way, here's a Desmos graph of the full animation, converted to 360° angles and with toggles for the discontinuity and trail count fixes.


Now that we fully understand how the curve animation works, there's one more issue left to investigate. Let's actually try holding the Z key to auto-select Reimu on the very first frame of the Story Mode Select screen:

The confirmation flash even happens before the menu's first page flip.

Stepping through the individual frames of the video above reveals quite a bit of tearing, particularly when VRAM is cleared in frame 1 and during the menu's first page flip in frame 49. This might remind you of 📝 the tearing issues in the Music Rooms – and indeed, this tearing is once again the expected result of ZUN landmines in the code, not an emulation bug. In fact, quite the contrary: Scanline-based rendering is a mark of quality in an emulator, as it always requires more coding effort and processing power than not doing it. Everyone's favorite two PC-98 emulators from 20 years ago might look nicer on a per-frame basis, but only because they effectively hide ZUN's frequent confusion around VRAM page flips.
To understand these tearing issues, we need to consider two more code details:

  1. If a frame took longer than 3 VSync interrupts to render, ZUN flips the VRAM pages immediately without waiting for the next VSync interrupt.
  2. The hardware palette fade-out is the last thing done at the end of the per-frame rendering loop, but before busy-waiting for the VSync interrupt.

The combination of 1) and the aforementioned 30-frame delay quirk explains Frame 49. There, the page flip happens within the second frame of the three-frame chunk while the electron beam is drawing row #156. DOSBox-X doesn't try to be cycle-accurate to specific CPUs, but 1 menu frame taking 1.39 real-time frames at 56.4 FPS is roughly in line with the cycle counting we did earlier.
Frame 97 is the much more intriguing one, though. While it's mildly amusing to see the palette actually go brighter for a single frame before it fades out, the interesting aspect here is that 2) practically guarantees its palette changes to happen mid-frame. And since the CRT's electron beam might be anywhere at that point… yup, that's how you'd get more than 16 colors out of the PC-98's 16-color graphics mode. 🎨
Let's exaggerate the brightness difference a bit in case the original difference doesn't come across too clearly on your display:

Frame 97 of the video above, with a brighter initial palette to highlight the mid-frame palette change
Probably not too much of a reason for demosceners to get excited; generic PC-98 code that doesn't try to target specific CPUs would still need a way of reliably timing such mid-frame palette changes. Bit 6 (0x40) of I/O port 0xA0 indicates HBlank, and the usual documentation suggests that you could just busy-wait for that bit to flip, but an HBlank interrupt would be much nicer.

This reproduces on both DOSBox-X and Neko Project 21/W, although the latter needs the Screen → Real palettes option enabled to actually emulate a CRT electron beam. Unfortunately, I couldn't confirm it on real hardware because my PC-9821Nw133's screen vinegar'd at the beginning of the year. But just as with the image loading times, TH03's remaining code sorts of indicate that mid-frame palette changes were noticeable on real hardware, by means of this little flag I RE'd way back in March 2019. Sure, palette_show() takes >2,850 cycles on a 486 to downconvert master.lib's 8-bit palette to the GDC's 4-bit format and send it over, and that might add up with more than one palette-changing effect per frame. But tearing is a way more likely explanation for deferring all palette updates until after VSync and to the next frame.

And that completes another menu, placing us a very likely 2 pushes away from completing TH03's OP.EXE! Not many of those left now…


To balance out this heavy research into a comparatively small amount of code, I slotted in 2024's Part 2 of my usual bi-annual website improvements. This time, they went toward future-proofing the blog and making it a lot more navigable. You've probably already noticed the changes, but here's the full changelog:

Speaking of microblogging platforms, I've now also followed a good chunk of the Touhou community to Bluesky! The algorithms there seem to treat my posts much more favorably than Twitter has been doing lately, despite me having less than 1/10 of mostly automatically migrated followers there. For now, I'm going to cross-post new stuff to both platforms, but I might eventually spend a push to migrate my entire tweet history over to a self-hosted PDS to own the primary source of this data.

Next up: Staying with main menus, but jumping forward to TH04 and TH05 and finalizing some code there. Should be a quick one.

📝 Posted:
💰 Funded by:
Ember2528, [Anonymous]
🏷️ Tags:

📝 Over two years since the previous largest delivery, we've now got a new record in every regard: 12 pushes across 5 repos, 215 commits, and a blog post with over 14,000 words and 48 pieces of media. 😱 Who would have thought that the superficially simple task of putting SC-88Pro recordings into Shuusou Gyoku would actually mainly focus on deep research into the underlying MIDI files? I don't typically cover much music-related content because it's a non-issue as far as PC-98 Touhou code is concerned, so it's quite fitting how extensive this one turned out. So here we go, the result of virtually unlimited funding and patience:

  1. The SC-88Pro recording controversy
  2. Undefined SysEx behavior
  3. Resolving the controversy, and making a choice (contains personal opinion)
  4. A Unix-style command-line MIDI filter (in Rust BTW)
  5. Visualizing MIDI files (for science, and not for playing them on a keyboard)
  6. Shuusou Gyoku's individual loop quirks 🎺
  7. Rewriting pbg's MIDI code
  8. Putting together the BGM packs
  9. Outgrowing miniaudio (and raging about single-file C libraries for a while)
  10. Remaining implementation details
  11. Pricing changes (and no, not everything's getting more expensive)

So where's the controversy? Romantique Tp obviously made the best and most careful real-hardware SC-88Pro recordings of all of ZUN's old MIDIs, including the original (OST) and arranged (AST) soundtrack of Shuusou Gyoku, right? Surely all I have to do now is to cut them into seamless loops to save a bit of disk space, and then put them into the game? Let's start at the end of the track list with the name registration theme, since it's light on instruments and has an obvious loop point that will be easy to spot in the waveform. But, um… wait a moment, that very first drum note comes a bit late, doesn't it?

This can also be heard in Romantique Tp's YouTube upload.
At a notated tempo of 96 BPM, these first four beats should take exactly 2.5 seconds, which they do in this seamlessly looping softsynth rendering.

That's… not quite the accuracy and perfection I was expecting. :thonk: But I think I know what we're seeing and hearing there. Let's look at the first few MIDI events across all channels:

Delta	Pulse	 Beat	Channel	Event
 +540	   960	  2:000	      1	Controller { CC   0, value   0 }
   +0	   960	  2:000	      1	Controller { CC  32, value   0 }
   +0	   960	  2:000	      1	ProgramChange {  37 }
[…]
   +0	   960	  2:000	      2	Controller { CC   0, value   0 }
   +0	   960	  2:000	      2	Controller { CC  32, value   0 }
   +0	   960	  2:000	      2	ProgramChange {  19 }
[…]
   +0	   960	  2:000	      3	Controller { CC   0, value   0 }
   +0	   960	  2:000	      3	Controller { CC  32, value   0 }
   +0	   960	  2:000	      3	ProgramChange {   6 }
[…]
   +0	   960	  2:000	      4	Controller { CC   0, value   0 }
   +0	   960	  2:000	      4	Controller { CC  32, value   0 }
   +0	   960	  2:000	      4	ProgramChange {   2 }
[…]
Delta	Pulse	 Beat	Channel	Event
   +0	  960	2:000	     10	Controller { CC   0, value   0 }
   +0	  960	2:000	     10	Controller { CC  32, value   0 }
   +0	  960	2:000	     10	ProgramChange {  25 }
   +0	  960	2:000	     10	Controller { CC   7, value 127 }
   +0	  960	2:000	     10	Controller { CC  11, value 127 }
   +0	  960	2:000	     10	Controller { CC  10, value  64 }
   +0	  960	2:000	     10	Controller { CC  91, value  80 }
   +0	  960	2:000	     10	Controller { CC  93, value  40 }
   +0	  960	2:000	     10	NoteOn { Key  42, Vel.  94 }
   +0	  960	2:000	     10	NoteOn { Key  36, Vel. 110 }
   +1	  961	2:001	     10	NoteOn { Key  42, Vel.   0 }
   +0	  961	2:001	     10	NoteOn { Key  36, Vel.   0 }
 +119	 1080	2:120	     10	NoteOn { Key  42, Vel.  34 }
   +1	 1081	2:121	     10	NoteOn { Key  42, Vel.   0 }
 +119	 1200	2:240	     10	NoteOn { Key  42, Vel.  64 }
   +0	 1200	2:240	     10	NoteOn { Key  36, Vel.  64 }
Also, the fact that GS doesn't put its drums on a non-general voice bank and instead relies on external channel configuration to differentiate drums from pitched instruments is making this Yamaha kid uncontrollably furious. 🤬

Yup. That's the sound of a vintage hardware synth being slow and taking a two-digit number of milliseconds to process a barrage of simultaneous Program Change messages, playing a MIDI file that doesn't take this reality into account and expects program changes to happen instantly.
I can only speak from my own experience of writing MIDIs for hardware synths here, but having the first note displaced by 50 ms is very much not the way a composer would have intended the music to be heard if the note is clearly notated to occur on the beat. If you had told me about such an issue when playing one of my MIDIs on a certain synth, I would have thanked you for the bug report! And I would have promptly released a fixed version of the MIDI with the Program Change events moved back by a beat or two. In the case of Shuusou Gyoku's MIDIs, this wouldn't even have added any additional delay in-game, as all of these files already start with at least one beat of leading silence to make room for setting Roland-specific synth parameters.

OK, but that's just a single isolated bass drum hit. If we wanted to, we could even fix this issue ourselves by splicing the same note from around the loop end point. Maybe this is just an isolated case and the rest of Romantique Tp's recordings are fine? Well…

Again, check Romantique Tp's YouTube upload for proof.
By the way, this seamless audio player is what consumed most of the two website pushes this time. The rest went to the slightly redesigned main page, whose progress bars now use the cap bar style and the GitHub badge colors.

This one is even worse. Here, the delay is so long relative to the tempo of the piece that the intended five drum hits pretty much turn into four.

This type of issue doesn't even have to be isolated to the very beginning of a piece. A few of the tracks in both the OST and AST start with an anacrusis on just one or two channels and leave the Program Change event barrage at the beginning of the first full measure. In 幻想科学 ~ Doll's Phantom for example, this creates a flam-like glitch where the bass on channel 2 is pretty much on time, but the crash hit on channel 10 only follows 50 ms later, after the SC-88Pro took its sweet time to process all the Program Change events on the channels between:

This is from the arranged soundtrack for a change. In that one, ZUN at least fixed the issue in the final three MIDIs (シルクロードアリス, 魔女達の舞踏会, and 二色蓮花蝶 ~ Ancients) that closed out this rearranging project in May 2001, which spread out their per-channel setup events over at least a single measure before playing any note.

Let's listen to that at half speed:

Romantique Tp's YouTube upload.
Still on point.

Sure, all of this is barely noticeable in casual listening, but very noticeable if you're the one who now has to cut these recordings into seamless loops. And these are just the most obvious timing issues that can be easily pinpointed and documented – the actual worst aspects are all the minor tempo and timing fluctuations throughout most of the pieces. With recordings that deviate ever so slightly from the tempo defined in the MIDI files, you can no longer rely on mathematically exact sample positions when cutting loops. Even if those positions do work out from time to time, there'd pretty much always be a discontinuity in the waveform at both ends of the loop, manifesting as a clearly audible click. In the end, the only way of finding good loop points in existing recordings involves straining your ears and listening very, very closely to avoid any audible glitches. 😩

But if you've taken a look at the second tabs in the clips above, you will have noticed that we don't necessarily have to be stuck with recordings from real hardware. In late 2015, Roland released Sound Canvas VA, a VST plugin that emulates the classic core of Roland's old Sound Canvas lineup, including the SC-88Pro. As long as we run such a software synthesizer through a quality VST host, a purely software-based solution should be way superior for recording looped BGM:

Any drawbacks? For our use case, all of them are found in the abysmal software quality of everything around the synth engine. As it's typical for the VST industry, Sound Canvas VA is excessively DRM'd – it takes multiple seconds to start up, and even then only allows a single process to run at any given time, immediately quitting every process beyond the first one with a misleading Parameter File1 Read Error message box. I totally believe anyone who claims that this makes SCVA more annoying than real hardware when composing new music. Retro gamers also dislike how Roland themselves no longer sells the 32-bit builds they used to offer for the first few versions. These old versions are now exclusively available through resellers, or on the seven seas.
But as far as the SC-88Pro emulation is concerned, there don't seem to be any technical reasons against it. There is a long thread over at VOGONS discussing all sorts of issues, but you have to dig quite deep to find any clear descriptions of bugs in SCVA's synth engine. Everything I found either only applies to the SC-55 emulation and not the SC-88Pro, was fixed by Roland in the meantime, or turned out to be a fixable bug in a MIDI file.

Nevertheless, Romantique Tp has a very negative opinion about SCVA, getting quite angry and defensive in this instance where someone favorably compared SCVA to their recordings. Edit (2024-03-10): These days, Romantique Tp has a much more favorable opinion on SCVA as well.
8 years after their release, however, the community unanimously accepts the Romantique Tp recordings as the intended way to listen to ZUN's old MIDIs, so choosing Sound Canvas VA for our Shuusou Gyoku builds might be a bad idea purely for PR reasons. At best, people would slightly wonder why I intentionally went with the opposite of the accepted reference recordings, but at worst, this entire project could face a violent backlash…


But wait, we've already heard one obvious difference between the real SC-88Pro and Sound Canvas VA. Let's listen to the very first clip again:

Ha! You can clearly hear a panning echo in the real-hardware recording that is missing from the Sound Canvas VA rendering. That's an obvious case of a core system effect not being reproduced correctly. If even that's undeniably broken, who knows which other subtle bugs SCVA suffers from, right? Case closed, Romantique Tp was right all along, SCVA is trash, real hardware reigns supreme :godzun:

Actually, let's look closer into this one. Panning delay effects like this are typically reverb-related, but General MIDI only specifies a single controller to specify the per-channel reverb level from 0 to 127. Any specific characteristics of the reverb therefore have to be configured using vendor-specific system-exclusive messages, or SysEx for short.
So it's down to one of the four SysEx messages at the beginning of the MIDI file:

Delta	Pulse	 Beat	Event
   +0	    0	0:000	SysEx(41 10 42 12 40 00 7F 00 41 F7)
 +240	  240	0:240	SysEx(41 10 42 12 40 01 30 14 7B F7)
 +120	  360	0:360	SysEx(41 10 42 12 40 01 33 0F 7D F7)
  +60	  420	0:420	SysEx(41 10 42 12 40 01 34 30 5B F7)

Since these byte strings represent Roland-specific instructions, we can't learn anything from a raw MIDI event dump alone here. No problem though, let's just load these files into some old MIDI sequencer that targeted Roland synths, open its MIDI event list, and then they will be automatically decoded into a human-readable representation…
…or at least that's what I expected. In Yamaha land, XGworks has done that for Yamaha's own XG SysEx messages ever since 1997:

Screenshot of the MIDI Event Viewer in Yamaha's XGworks, showing off its automatic XG SysEx decoding feature.
No configuration required. You can even edit the textual Value1 representation and XGworks parses it back into the closest supported value!

But for Roland synths, there's… nothing similar? Seriously? 😶 Roland fanboys, how do you even live?! I mean, they are quick to recommend the typical bloated and sluggish big-name DAWs that take up multiple gigabytes of disk space, but none of the ones I tried seemed to have this feature. They can't have possibly been flinging around raw byte strings for the past 33 years?!
But once you look more into today's MIDI community, it becomes clear that this is exactly what they've been doing. Why else would so many people use the word complicated to describe Roland SysEx, or call it an old school/cryptic communication protocol in hexadecimal format? The latter is particularly hilarious because if you removed the word cryptic, this might as well describe all of MIDI, not just SysEx. :tannedcirno: Everything about this is a tooling issue, and Yamaha showed how easily it could have been solved. Instead, we get Sound Canvas experts, who should know more about the ecosystem than I do, making the incredible mental leap from "my DAW doesn't decode or easily generate SysEx" to "SysEx is antiquated" to "please just lift up these settings to the VST level and into my proprietary DAW's proprietary project format, that would be so much better"

Thankfully that's not entirely true. After some more digging and configuration, I found a somewhat workable solution involving a comparatively modern sequencer called Domino:

  1. Download either Domino's original Japanese version or the partial English translation. The .zip file on the release page contains a full standalone build.
  2. Open the File → Preferences menu and associate your MIDI output device with a module map. This makes sense for SysEx encoding/generation since it can limit the options in the UI to what's actually available on your target hardware, but is also required for selecting the respective SysEx map into Domino's SysEx decoder. There is no technical reason for this because SC-88Pro SysEx messages can be uniquely identified by the three vendor, device, and model ID bytes that every message starts with, but would be too easy and user-friendly. The perception of SysEx being a black art must be upheld at all costs.
    Screenshot of Domino's MIDI-OUT window, complete with garbled text
    I've kept the garbled text of the partial translation to emphasize the sheer amount of jank involved in this entire process.
  3. Load a MIDI file and let Domino "analyze" it:
    Screenshot of Domino's analysis message box
  4. Strangely enough, this will take quite a while – on my system, this analysis step runs at a speed of roughly 4.25 KB/s of MIDI data. Yes, kilobytes.
  5. Unfortunately, "control change macro restoration" also seems to mean that you don't get to see any raw bytes when selecting the respective MIDI track in the UI, but at least we get what we were looking for:
    Screenshot of the four SysEx messages of タイトルドメイド, Shuusou Gyoku's name registration theme, as decoded by Domino
    …for the most part?
    Pulse	Event
        0	SysEx(41 10 42 12 40 00 7F 00 41 F7)
      240	SysEx(41 10 42 12 40 01 30 14 7B F7)
      360	SysEx(41 10 42 12 40 01 33 0F 7D F7)
      420	SysEx(41 10 42 12 40 01 34 30 5B F7)

Alright, that's something we can work with. The GS Reset message is something that every Roland GS MIDI should start with, but it's immediately followed by a message that Domino failed to decode? The two subsequent reverb parameters make sense, but panning delays typically have more parameters than just a reverb level and time.
That unknown SysEx message shares much of the same bytes with the decoded ones though. So let's do what we maybe should have done all along, return to caveman, and check the SC-88Pro manual:

The relevant section from page 194. We can see how the address and value correspond to bytes 5-7 and 8 in the SysEx messages. Byte 9 is a checksum and byte 10 signals the end of the message.

And that's where we find what this particular issue boils down to. The missing SysEx message is clearly intended to be a Reverb Macro command, whose value can range from 0 to 7 inclusive on the SC-88Pro, but ZUN tries to specify Reverb Macro #14h, or 20 in decimal. The SC-88Pro manual does not specify what happens if a SysEx message wants to write an invalid value to a valid address, which means that we've firmly entered the territory of undefined behavior.
Edit (2024-03-10): Romantique Tp confirmed that the real SC-88Pro clamps these Reverb Macro IDs to the supported range of 0-7. Therefore, the appropriate course of action for guaranteeing the same sound on other Roland synths would be to fix the MIDI file and specify Reverb Macro #7 instead. But since this behavior remains technically undefined, we can still argue about ZUN's intention behind specifying the Reverb Macro like this:

In fact, 32 out of the 39 MIDIs across both of Shuusou Gyoku's soundtrack use this invalid Reverb Macro. The only ones that don't are


And that's where this quest seemed to end, until Romantique Tp themselves came in and suggested that I take a closer look at the GS Advanced Editor, or GSAE for short.

The splash screen of GSAE version 4.01e.
Make sure to connect a MIDI input device before starting GSAE, or it will silently crash immediately after this splash screen. At least it accepts any controller, so this might just be a bug instead of the typical user-hostile kind of hardware dongle DRM that is pervasive in today's synth industry. 1999 would seem a bit too early for that, thankfully.

I was aware of this tool, but hadn't initially considered it because it's always described as just a SysEx generator/encoder. In fact, the very existence of such a tool made no sense to me at first, and seemed to prove my point that the usability of GS SysEx was wholly inferior to what I was used to in Yamaha land. Like, why not build at least a tiny and stripped-down MIDI sequencer around this functionality that would allow you to insert SC-88Pro-specific messages at any point within a sequence, and not just the beginning? I can see the need for such a tool in today's world of closed-source DAWs where hardware MIDI modules are niche and retro and are only kept alive by a small community of enthusiasts. But why would its developers guarantee that MIDI composers would have to hop between programs even back in 1997? I can only imagine that they saw how every just slightly advanced MIDI sequencer or DAW back then already used its own project format instead of raw Standard MIDI Files, and assumed that composers would therefore be program-hopping anyway?
However, GSAE does support the import of settings from a MIDI file and features a SysEx history window that decodes every newly processed Roland SysEx byte string, which is all I was looking for. So let's throw in that same MIDI and…

Screenshot of GSAE's SysEx history window,showing the results of sending a GS Reverb Macro #20 message
That's the result of sending just the single F0 41 10 42 12 40 01 30 14 7B F7 message at the top.

Now that's some wild numbers. An equally invalid Reverb Character, and Reverb Level and Time values that even exceed their defined range of 0-127? Could it be that GSAE emulates the real-hardware response to invalid Reverb Macros here, and gives us the exact reverb setting we can hear in Romantique Tp's recording? This could even be the reason why GSAE is still used and recommended within today's Roland MIDI sequencing scene, and hasn't been supplanted by some more modern open-source tool written by the community.

In any case, these values have to come from somewhere, so let's reverse-engineer GSAE and figure out the logic behind them. Shoutout to IDR for being a great help with its automatic generation of IDC debug symbols for the Delphi standard library, and even including a few names of application-level widget class methods by reading Delphi-specific type information from the binary. This little sub-project made me also come around to appreciating Ghidra, whose decompiler and data type manager helped a lot and allowed me to find the relevant code section within just a few hours.
A~nd it turns out that the values all come from out-of-bounds accesses into arrays on the stack. :onricdennat: If we combine 25, 235, and 132 back into a 32-bit value, we get 0x19EB84, which is the virtual address of the relevant function's stack frame base pointer.
But it gets even more hilarious: If you enable debug text output via Option → Other Options → SMF → Insert text events to setup measures and export these imported settings back into a MIDI file, GSAE not only retains these invalid Reverb Macro IDs, but stringifies them via a simple lookup into a hardcoded string pointer array, again without any bounds checks. The effects of this are roughly what you would expect:

In the end, we have Domino not decoding the Reverb Macro message, and GSAE, the premier SysEx tool for Roland synths, responding to it in even more undefined and clearly bugged ways than real hardware apparently does. That's two programs confirming that whatever ZUN intended was never supposed to work reliably. And while we still don't know exactly what these reverb parameters are supposed to be, these observations solve the mystery as far as I'm concerned, and solidify my personal opinion on the matter.


So what do we do now, and which version do we go with? Optimally, I'd offer both versions and turn this controversy into a personal choice so that everybody wins… and Ember2528 agreed and generously provided all the funding to make it happen. 💸
If you haven't picked your favorite yet, here are some final arguments:

The Romantique Tp recordings certainly have something going for them with their provenance of coming from real hardware, and the care that Romantique Tp put into manually recording every single track, warts and all. I wholeheartedly agree that preserving the raw sound of playing the MIDI files into the hardware without thinking about bugs or quirks is an important angle to take when it comes to preservation. It's good that these recordings exist – after all, you wouldn't know which musical elements you'd possibly be missing in an emulation if you have nothing to compare it to. Even the muffled sound in the half-speed clip above can be an argument in their favor, as the SC-88Pro's DAC operates at 32 kHz and you wouldn't expect any meaningful frequency content between 16,000 and 22,050 Hz to begin with. Any frequency content in that range that does remain in Romantique Tp's recording is simply 📝 rolled-off imaging noise added during the ADC's resampling process.
All this is why they are a definite improvement over kaorin's 2007 recordings of only the AST, which used to be the previous reference recordings within the community. Those had all of the same timing issues and more, in addition to being so excessively volume-boosted that 0.15% of the samples across the entire soundtrack ended up clipped. That's 6.25 seconds out of 68:39m being lost to pure digital noise.

Most importantly though: ZUN himself said that only the real SC-88Pro will play back these files as he intended them to sound. This quote is likely where the tagline of Romantique Tp's entire recording project came from in the first place:

> 全てのデエタはSC-88ProもしくはSC-8850(ロオランド社)にて最適に聴けるように調整してあります > それ以外の音源でも、作者の意図した音ではない場合があります。 — ZUN on 東方幻想的音楽, his old MIDI page

However. ZUN is not exactly known for accurately and carefully preserving the legacy of his series, or really doing anything beyond parading his old games as unobtainable showpieces at conventions. With all the issues we've seen, preferring real hardware is ultimately just that: an angle, and a preference. This is why I disagree with the heavy and uncritical advertising that is mainly responsible for elevating the Romantique Tp recordings to their current reference status within the community, especially if at least half of the alleged superiority of real hardware is founded on undefined behavior that can easily be fixed in the MIDI files themselves if people only bothered to look.

Here's where I stand: MIDI files are digital sheet music first and foremost, not an inferior version of tracker modules where the samples are sold separately. As such, the specific synth a MIDI file was written for is merely a secondary property of the composition – and even more so if the MIDI file contains little to nothing in terms of sound design and mostly restricts itself to the basic feature set of General MIDI. In turn, synth quirks and bugs are not a defined part of the composition either, unless they are clearly annotated and documented in the file itself. And most importantly: If the MIDI file specifies a certain timing and a recording fails to reproduce that timing, then that recording is not an accurate representation of the MIDI file.
In that regard, Sound Canvas VA is not only the closest alternative to the real thing, as a few people in the MIDI and retrogaming scene do have to admit, but superior to the real thing. I'll gladly take clarity and perfect timing accuracy in exchange for minor differences in effects, especially if the MIDI file does not explicitly and correctly define said effects to begin with. If I want a panning delay as part of the reverb, I add the respective and correct SysEx message to define one – and if I don't, I do not care about the reverb. You might still get a panning delay on a certain synth, and you might even prefer how it sounds, but it's ultimately a rendering artifact and not a consciously intended part of the composition. In that way, it's similar to the individual flavor a musician adds to a performance of a piece of classical music.
And as far as the differences in frequency response and resonant filters are concerned: In Yamaha land, these are exactly the main distinguishing factors between vintage WF-192XG sound cards (resembling the real SC-88Pro in these characteristics) and the S-YXG50 softsynth (resembling SCVA). Once I found out about that softsynth and how much clearer it sounded in comparison, I sold that old PCI sound card soon after.

In the interest of preservation though, there's still one more unexplored solution that could be the ideal middle ground between the two approaches:

  1. Play the MIDIs through a real-hardware SC-88Pro again
  2. Capture the actually observed system-exclusive settings that fall within the synth's supported and documented ranges
  3. Insert them back into the MIDI file, creating a new bugfixed version
  4. Re-record that bugfixed version through Sound Canvas VA

Edit (2024-03-10): And since Romantique Tp has confirmed what exactly happens on real hardware, I'm going to do exactly that. These bugfixed Sound Canvas VA renderings will be a free bonus of the single next Shuusou Gyoku push, and will add another angle to the preservation of these soundtracks. In the meantime though, the Sound Canvas VA packs will sound like they do in the preview videos above.

Or, you know… Maybe none of this actually matters. Here's beatMARIO streaming some Shuusou Gyoku gameplay using what looks like a real-hardware SC-8850, which plays these MIDIs with occasionally noticeably different instrument patches and no panning delay in the name registration theme, and he still enjoyed every second of it. Imagine undefined SysEx behavior not even being consistent within the same family of Roland synths… nah, I'm done arguing, let's get back to the actual work and cut some loops.


Just to be clear: I'm not suggesting that Romantique Tp should have been the one to cut their recordings into loops, or even just the one who defined where the loop points are supposed to be. On the surface, this seems to be a non-issue, and you'd just pick a point wherever each track appears to loop, right? But with 39 MIDIs to cut and all the financial support from Ember2528, it made sense to also solve this problem more thoroughly, and algorithmically detect provably correct loop points for all of these files. Who knows, maybe we even find some surprises that make it all worth it?
This is the algorithm I came up with:

Of course, this algorithm isn't perfect and won't work for every MIDI file out there. It doesn't consider things like differently ordered events within the same MIDI pulse, (non-)registered parameter numbers, or the effect that SysEx messages can have on the state of individual channels. The latter would require the general SysEx decoding logic that I would have liked to have for the research above… actually, let's add an issue and add the project to the order form. I'd really like to see a comprehensive open-source cross-vendor SysEx decoder library in my lifetime.

As for the implementation, I was happy to write some Rust again for a change, as it's a great fit for these standalone greenfield command-line tools that don't have to directly interact with the legacy C++ code bases that this project usually deals with. It's even better if the foundational functionality is not just available in a crate, but in four, with the community already having gone through multiple iterations to arrive at a tried and tested winner. Who knows, maybe I even get to rewrite this website in it one day? Just for the sheer meme value of doing so, of course.
I also enjoyed this a lot from a technical point of view:

This algorithm works well for the long MIDI files of Shuusou Gyoku's OST that all contain multiple duplicates of their loop section, but it quickly reaches its limit with the AST. Following the classic two-loop + fade-out format, that soundtrack was meant to be played back in generic MIDI players, and not to actually be put back into the game in looped form. Since the loop algorithm did, in fact, find inconsistencies even in the OST, two copies of the apparent loop are sometimes not enough to prove cases where the actual loop ends much later than you think it does. In a few cases, it would be enough to simply remove all volume change events from the fade-out to prove the actual loop, but in others, the algorithm would need MIDI event data far past the end of the fade-out.

However, just giving up and not looping any of these tracks would be equally unfortunate. So how about shifting the question, from what's the best loop in this MIDI file to what's the best loop if the MIDI didn't fade out and instead repeated its apparent second loop a third time? As long as the detected loop in such a pre-processed file ends before the repeated range, it's still a valid loop in terms of the unmodified original.
Ideally, we want to do this pre-processing programmatically with the same Rust library instead of manually editing the MIDI. Many sequencers (and especially XGworks) apply significant changes to a MIDI file's internal structure when saving its internal representation back to a MIDI file, which might even mess with our loop algorithm. So it would be very nice to have a more trustworthy tool that applies only the edit we actually want, and perfectly retains the rest of the MIDI.

And that's how this sub-project turned into a small suite of command-line MIDI operations in the classic Unix filter/pipeline style: Each command reads a MIDI file from stdin, transforms it, and outputs text or the resulting MIDI file on stdout. This way, we gain maximum transparency and reproducibility as I can document the unique pre-processing steps for each AST track by simply providing the command lines. And sure, we're re-encoding and re-decoding the full MIDI sequence at every step along such a pipeline, but computers are fast, Rust and the midly library in particular are ⚡ blazingly fast ⚡, and the usability benefits of this pipeline model far outweigh any theoretical performance drops.
Here's the full list of commands that made it into the resulting mly tool:

This feature set should strike a good balance between not spending too much of the Shuusou Gyoku budget on tangential problems, but still offering a decent solution for the problem at hand. As a counterexample, the obvious killer feature – deserializing a dump back into a Standard MIDI File – would have gone way past the budget. While there are crates that free you from the need to write manual parsing code for basic data structures, they would instead require a lot of attribute boilerplate – and if the library that provided the structures doesn't already come with these attributes, you now have to duplicate all the structures, and convert back and forth between the original structures and your copies. Not to mention that we'd still have to write code for the high-level structure of the dump output…

If we put it all together, this is what we can do:

$ <ssg_02.mid mly loop-find
Best loop in note space: 4 events (between event #[117, 121[ and [121, 125[)
First note: event    71 / pulse    960 / beat   2:000 / 0:00:800m
Loop start: event   117 / pulse   1680 / beat   3:240 / 0:01:400m
  Loop end: event   121 / pulse   1920 / beat   4:000 / 0:01:600m

$ <ssg_02.mid mly cut 466: | mly loop-unfold 240: | mly -r 44100 loop-find
Track #0: Removing events #[16439, 19881[
Track #0: Repeating events #[8344, 16439[ at the end of the sequence
Best loop in note space: 8095 events (between event #[5625, 13720[ and [13720, 21815[)
First note: event    71 / pulse    960 / beat   2:000 / 0:00:800m
Loop start: event  5625 / pulse  75361 / beat 157:001 / 1:03:531m
  Loop end: event 13720 / pulse 183841 / beat 383:001 / 2:34:726m

Best loop in recording space:  8095 events (between event #[5709, 13804[ and [13804, 21899[)
First note: event    71 / pulse    960 / beat   2:000 / 0:00:800m / sample    35280.00
Loop start: event  5709 / pulse  77280 / beat 161:000 / 1:05:163m / sample  2873667.66
  Loop end: event 13804 / pulse 185760 / beat 387:000 / 2:36:358m / sample  6895375.27

Translation:


So, where are these loop quirks that justify why some of these audio files are longer than you'd think they should be? Just listing them as text wouldn't really communicate just how minor these are. It would be much nicer to visualize them in a way that highlights the exact inconsistencies within a fixed range of MIDI measures. Screenshots of MIDI sequencer or DAW windows won't capture these aspects all too well because these programs are geared toward fine-grained editing of single tracks, not visualization of details across all channels.

Screenshot of the first 8 measures of Shuusou Gyoku's Stage 1 theme (フォルスストロベリー) in its OST version, as visualized by REAPER's piano roll
REAPER's piano roll nicely snaps to a certain range, but good luck picking out the individual lines from the single volume lane at the bottom of the screen, or spotting a 7-point difference. Not to mention that CC #11 (Expression) makes up an equal part of a channel's final perceived volume, which is the metric we'd actually want to visualize.

Typical MIDI visualizers, however, are on the complete opposite end of the spectrum. In recent years, MIDI visualization has become synonymous with the typical Synthesia style of YouTube videos with a big keyboard at the bottom, note bars flying in from the top, and optional fancy effects once those notes hit the top of the keyboard. The Black MIDI community has been churning out tons of identically looking MIDI visualizers in recent years that mainly seem to differ in the programming language they're written in, and in how well they can cope with the blackest of black MIDIs.
Thankfully, most of these visualizers are open-source and have small and manageable codebases. The project with the most GitHub stars and the most generic name seemed to be the best starting point for hacking in the missing features, despite using GLSL shaders which I had no prior experience with. It was long overdue that I did something with GLSL though – it added a nice educational aspect to these hacks, and it still was easier than deciphering whatever the fastest and hyper-optimized Rust visualizer is doing.
Still, this visualizer needed a total of 18 small features and bugfixes to be actually usable for demonstrating Shuusou Gyoku's loop quirks. As such, these hacks turned into yet another tangential sub-project that could have easily consumed another two pushes if I cleaned up the code and published the result. But that would have really gone way past the budget for something that people might not even care about. So here's what we're going to do:


Alright then! Here's how to read the visualizations:



Before we package up these looped soundtracks, let's take a quick look at how they would be shown off in the Music Room. The Seihou Music Rooms carry over the per-channel keyboards from TH05, add the current per-channel volume, expression, and pan pot values, and top it off with a fake spectrum analyzer. All of these visualizations rely on MIDI data, and the Music Room would feel very dull and boring without them. Just look at Kioh Gyoku, whose Music Room basically turns into a still image in WAVE mode.
Retaining these visualizations even when playing waveform BGM was very important for me, and not just because it would make for a unique high-quality feature that would break new ground. It can also double as proof that the waveform versions are, in fact, in perfect sync with both the MIDIs they are based on, and, by extension, the respective stage scripts.
However, this would require the game to process the MIDIs and update the internal visualization state without simultaneously playing them back through the WinMM / MME / midiOut*() API. And just like graphics and text rendering, Shuusou Gyoku's original code came with zero architectural separation between platform-independent processing logic and platform-specific playback…

So I accidentally rewrote almost the entire MIDI code to achieve said separation. :tannedcirno: This also provided a great occasion to modernize this code and add some much-needed robustness for potential MIDI mods, while retaining the original code's approach of iterating over raw SMF byte streams. It might all have been very excessive for a delivery that was supposed to be just about waveform BGM support, but on the plus side, MIDI output is now portable to any other system's MIDI API as well.

Surprisingly though, it was Shuusou Gyoku's original MIDI timing that quickly turned out to be rather inaccurate, and not the waveforms. The exact numbers vary depending on the piece, but the game played back every MIDI about 1% slower than notated, adding about 2 or 3 seconds to their total playback time after 5 minutes. Tempo changes in particular were the biggest causes of desynchronizations with the waveforms… :thonk:
To understand how this can happen to begin with, we have to look closer at how you're supposed to use the midiOut*() API. This API is as low-level as it gets, only covering the transmission of a single MIDI message to the selected output device right now. There is no concept of note timing at this low level, so it's completely up to the program to parse delta times and tempo change events out of the MIDI file and correctly time the calls to this API for each MIDI message. With all the code that runs between the API and the actual renderer of the synth for every single message, the resulting timing can only ever be an approximation of the MIDI file. This doesn't really matter for the timescales and polyphony levels of typical music because, again, computers are fast, but such an API is fundamentally unsuitable for accurately playing back even just a moderately complex million-note Black MIDI. :onricdennat:

Shuusou Gyoku handles this required manual timing in the simplest possible way: It runs a MIDI processing function (Mid_Proc() in the code) at an interval of 10 ms, which processes and instantly sends out all MIDI events that have occurred at any point within the last 10 ms, maintaining merely their order. This explains not only why the original game incremented its MIDI TIMER by multiples of 10, but also the infamous missing drums when playing the soundtrack through the Microsoft GS Wavetable Synth:

But while sending MIDI events in such quantized chunks might not be perfect, it can't be the cause behind multi-second playback slowdowns. Instead, this issue has to boil down to the way Shuusou Gyoku times each individual message, and specifically how it converts between MIDI pulse units and real-time (milli)seconds. pbg's original MIDI code chose to do this in an equally confusing and inaccurate way: it kept two counters that tracked the current MIDI pulse before and after the latest tempo change, used the value of the latter counter to decide which events to process, and only added the pulse equivalent of 10 ms to this counter at the end of Mid_Proc() in the then current tempo. The commit message for my rewritten algorithm details the problems with this approach using nice ASCII art in case you're interested, but in short, the main problem lies in how the single final addition can only consider a single tempo change within each call to Mid_Proc(). If a MIDI file contains tempo ramps with less than 10 ms between each different tempo, the original game would only use the last of these tempo values as the basis for converting the entire 10 ms back into MIDI pulses. Not to mention that maybe MIDI pulses aren't the best unit in a game that still 📝 treats the FPU as lava and doesn't use any fixed-point means of increasing the resolution of the 10 ms→pulse division either…

On the contrary, it's much more accurate to immediately convert every encountered MIDI delta time to a real-time quantity and use that unit for event timing, especially if we want to restrict ourselves to integer math. Signed 64-bit integers are enough to fit the product of the slowest possible MIDI tempo ((224 - 1) µs per quarter note) and the highest possible MIDI delta time (228 - 1) at nanosecond precision (103), with one bit to spare. Then, we arrive at a much simpler timing algorithm:

The additive nature of this timer not only naturally allows more than one event to happen within a single Mid_Proc() call, but also averages out any minor timing inconsistencies across the length of a track.

This new algorithm did improve the overall timing accuracy, but only barely, shaving off just ≈100 ms of the total duration. Turns out that the main source behind the slowness was hiding somewhere else entirely, in the single line that deserializes tempo values from MIDI's big-endian representation into the native integer format:

assert(length_of_tempo_message == 3);
uint32_t tempo = 0;
for(int i = 0; i < length_of_tempo_message; i++) {
-	tempo += ((tempo << 8) + (*track_data++));
+	tempo  = ((tempo << 8) + (*track_data++));
}

Yup – the original code performed two additions per byte, which incorrectly added the interim value at every byte to the final result, and yielded a tempo that is ≈0.8% / ≈1 BPM slower than notated in the MIDI file, matching the number we were looking for. That's why the |/OR operator is the safer one to use in such a bit-twiddling context…
But now I'm curious. This is such a tiny bug that is bound to remain unnoticed until someone compares the game's MIDI output to another renderer. It must have certainly made it into other games whose MIDI code is based on Shuusou Gyoku's, or that pbg was involved with. And sure enough, not only did this bug survive Kioh Gyoku's OOP refactoring, but it even traveled into Windows Touhou, where it remained in every single game that supported MIDI playback. Now we know for a fact that pbg's Program Support role in the TH06 credits involved sharing ready-made, finished code with ZUN:

Disassembly of the Shuusou Gyoku MIDI tempo deserialization bug in TH06Disassembly of the Shuusou Gyoku MIDI tempo deserialization bug in TH07Disassembly of the Shuusou Gyoku MIDI tempo deserialization bug in TH08Disassembly of the Shuusou Gyoku MIDI tempo deserialization bug in TH09Disassembly of the Shuusou Gyoku MIDI tempo deserialization bug in TH10
The broken tempo deserialization in the respective latest full versions of TH06 through TH10. And yes, that's TH10 – even though TH09's trial version was the last game to ship MIDI versions of its soundtrack, TH10 still contained all of pbg's MIDI code that originated back in Shuusou Gyoku, before TH11 finally removed it.
Amusingly, ZUN's compiler even started optimizing the combination of left-shifting and addition to a multiplication with 257 for TH09, which even sort of highlights this bug if you're used to reading x86 ASM.

That leaves support for MIDI loop points as the only missing feature for syncing MIDI data with a looping waveform track. While it didn't require all too much code, pbg's original zero-copy approach of iterating over raw MIDI data definitely injected a lot of complexity into the required branches. Multi-track/SMF Type 1 files require quite a bit of extra thought to correctly calculate delta times across loop boundaries that reach past the end of the respective track, while still allowing the real-time delta values to be resynchronized at tempo changes within the loop – and yes, 3 of ZUN's 19 arranged MIDI files actually do use more than one track, so this wasn't just about maximizing MIDI compatibility for mods. I stuck to the original approach mostly as a challenge and to prove that it's possible without first parsing the entire MIDI sequence into a friendlier internal representation, but I absolutely do not recommend this to anyone else. :tannedcirno:

After hardcoding the loop points detected by mly into the binary, we only need to call Mid_Proc() once per frame in the Music Room and pass the frame delta time instead of the 10 ms constant. And then, we get this:

The MIDI TIMER now shows off the arguably more interesting current MIDI pulse value rather than just formatting the PASSED TIME in milliseconds. Ironically, displaying this value in a constantly counting way takes more effort now – the new nanosecond-based timing code doesn't use any measure of total MIDI pulses anymore, and they don't naturally fall out of the algorithm either. Instead, the code remembers the total pulse value of the last event it processed and adds the real-time duration that has passed since, similar to the original timing algorithm.
This naturally causes the timer to jump from the loop end pulse to the loop start pulse, proving that Mid_Proc() is in fact looping the sequence.

Alright, now we know what to package:

Unfortunately, we still haven't reached the end of the complications and weird issues that haunt Shuusou Gyoku's music:

  1. The original game reads the in-game track title directly out of the first Sequence Name event of the playing MIDI file. The waveform equivalent would be the Vorbis comment TITLE tag, which therefore should exactly match the original track's title, down to the exact placement of whitespace. As usual, if I emphasize minor things like this, it's not without reason: 幻想科学 ~ Doll's Phantom inconsistently uses halfwidth spaces at both sides of the , and wouldn't fit into the Music Room's limited space otherwise.

  2. However, the AST MIDI files jam a bunch of other metadata into their Sequence Names, roughly following the format
    【 $title 】 from 秋霜玉  for sc88Pro comp.ZUN
    The track titles should definitely not appear in this format in-game, but how do we get rid of this format without hardcoding either the names or the magic to parse the names out of this format? :thonk:
  3. The absolute state of GS SysEx tooling rears its ugly head one final time in three of the AST MIDIs, which for some reason are missing the Roland vendor prefix byte in all of their SysEx messages and are therefore undeniably bugged. There even seemed to be another SysEx-related bug which Romantique Tp explained away, but not this one:

    ssg_04.mid

    0:000	SysEx(   10 42 12 40 00 7F 00 41 F7)
    0:240	SysEx(   10 42 12 40 01 30 14 7B F7)
    0:360	SysEx(   10 42 12 40 01 33 14 78 F7)
    0:420	SysEx(   10 42 12 40 01 34 50 3B F7)

    ssg_05.mid

    0:000	SysEx(   10 42 12 40 00 7F 00 41 F7)
    0:240	SysEx(   10 42 12 40 01 30 14 7B F7)
    0:360	SysEx(   10 42 12 40 01 33 00 0C F7)
    0:420	SysEx(   10 42 12 40 01 34 14 77 F7)

    ssg_10.mid

    0:000	SysEx(   10 42 12 40 00 7F 00 41 F7)
    0:240	SysEx(   10 42 12 40 01 30 14 7B F7)
    0:360	SysEx(   10 42 12 40 01 33 00 0C F7)
    0:420	SysEx(   10 42 12 40 01 34 60 2B F7)

    ssg_04.mid

    0:000	SysEx(41 10 42 12 40 00 7F 00 41 F7)	GS Reset
    0:240	SysEx(41 10 42 12 40 01 30 14 7B F7)	Reverb Macro #20
    0:360	SysEx(41 10 42 12 40 01 33 14 78 F7)	Reverb Level 20
    0:420	SysEx(41 10 42 12 40 01 34 50 3B F7)	Reverb Time 80

    ssg_05.mid

    0:000	SysEx(41 10 42 12 40 00 7F 00 41 F7)	GS Reset
    0:240	SysEx(41 10 42 12 40 01 30 14 7B F7)	Reverb Macro #20
    0:360	SysEx(41 10 42 12 40 01 33 00 0C F7)	Reverb Level 0
    0:420	SysEx(41 10 42 12 40 01 34 14 77 F7)	Reverb Time 20

    ssg_10.mid

    0:000	SysEx(41 10 42 12 40 00 7F 00 41 F7)	GS Reset
    0:240	SysEx(41 10 42 12 40 01 30 14 7B F7)	Reverb Macro #20
    0:360	SysEx(41 10 42 12 40 01 33 00 0C F7)	Reverb Level 0
    0:420	SysEx(41 10 42 12 40 01 34 60 2B F7)	Reverb Time 96
    The irony of using invalid Reverb Macros within already invalid SysEx messages is not lost on me.

    This is something we should fix even before running these files through Sound Canvas VA in order to render these with the reverb settings that ZUN clearly (and, for once, unironically) intended.

  4. For perfect preservation of the original BGM/gameplay synchronicity, it makes sense for the waveform versions to retain the leading 1 or 2 beats of silence that the original MIDI files use for their SysEx setup. While some of the AST tracks use a slightly different tempo compared to their OST counterparts, they would still be largely in sync as ZUN didn't rearrange the layout of their setup area… except for, once again, the three tracks used in the Extra Stage. :zunpet: Marisa's and Reimu's boss themes aren't too bad with their 4 beats of setup, but シルクロードアリス takes the cake with a whopping 12 beats of leading silence. That's 5 seconds from the start of the Extra Stage to the first note you'd hear. 🐌

2) and 4) could theoretically be worked around in Shuusou Gyoku's MIDI code, but there's no way around editing the MIDI files themselves as far as 3) is concerned. Thus, it makes sense to apply all of the workarounds to the AST MIDIs as part of the BGM build process – parsing the titles out of the 【brackets】, inserting the Roland vendor prefix byte where necessary, and compressing the setup bars in the Extra Stage themes to match their OST counterparts. Adding any hidden magic to the MIDI code would only have needlessly increased complexity and/or annoyed some modder in the future who would then have to work around it.
Ideally, these edits would involve taking the mly dump output, performing the necessary replacements at a plaintext level, and rebuilding the result back into a MIDI file, bu~t we're unfortunately missing the latter feature. Luckily, someone else had the same idea 13 years ago and wrote a tool in C that does exactly what we need. Getting it to compile in 2024 only required fixing a typical C thing… why are students and boomers defending this antique of a language again? 🙄

The single most glaring issue, however, is the drastic difference in volume between the individual tracks in both soundtracks. While Romantique Tp had to normalize each track to the maximum possible volume individually as a consequence of the recording process, the Sound Canvas VA renderings reveal just how inconsistent the volume levels of these MIDI files really are:

The peak amplitudes of every track in both soundtracks, as rendered by Sound Canvas VA at maximum volume. Looking at these, you might think that kaorin's 2007 recordings were purposely trying to preserve the clipping that would come out of an SC-88Pro if you don't manually adjust the volume knob for each song, but those recordings are still much louder than even these numbers.

So how do we interpret this? Is this a bug, because no one in their right mind would want their music to clip on purpose, and that in turn means that everything about these volume levels is arbitrary and unintentional? Or is this a quirk, and ZUN deliberately chose these volume levels for compositional reasons? It certainly would make sense for the name registration theme.
Once again, the AST version of シルクロードアリス is the worst offender in this regard as well, but it might also provide some evidence for the quirk interpretation. The fact that almost all of its MIDI channels blast away at full volume might have been an accident that could have gone unnoticed if the volume knob of ZUN's SC-88Pro was turned rather low during the time he arranged this piece, but the excessive left-panning must have been deliberate. Even Romantique Tp agrees:

Stereo waveform of the Sound Canvas VA rendering of Shuusou Gyoku's Extra Stage theme (シルクロードアリス), highlighting the excessive left-panningStereo waveform of Romantique Tp's recording of Shuusou Gyoku's Extra Stage theme (シルクロードアリス), highlighting the excessive left-panning
It might have even made compositional sense if Silk Road Alice was supposed to be a "Western-style piece", but it's not. :zunpet:

And that's with the volume already normalized. Because this one channel of this one track is almost twice as loud as anything else in the AST, we would consequently have to bring down the volume of every other arranged track and the right channel of the same track by almost 50% if we wanted to maintain the volume differences between the individual tracks of the AST. In the process, we lose almost one entire bit of dynamic range. At this rate, you might even consider remixing and remastering the entire thing, but that would involve so many creative decisions to definitely fall into fanfiction territory…

However, normalizing each track to a peak level of 0 dBFS makes much more sense for in-game playback if you consider how loud Shuusou Gyoku's sound effects are. Once again, the best solution would involve offering both versions, but should we really add two more SCVA BGM packs just to cover volume differences? :thonk:
ReplayGain solves this exact problem for regular music listening in a non-destructive way by writing the per-track and per-album gain levels into an audio file's metadata. Since we need metadata support for titles anyway, we can do something similar, albeit not exactly the same for two reasons:

And so, we hard-apply the album-level gain during the conversion from 32-bit float to FLAC to preserve the volume differences between the tracks, calculate the track-level GAIN FACTOR based on the resulting peak levels, add a volume normalization toggle to the Sound / Config menu, enable it by default, and thus make everyone happy. ✅

The final interesting tidbit in building these packages can be found in the way the Sound Canvas VA recordings are looped. When manually cutting loops, you always have to consider that the intro might end with unique notes that aren't present at the end of the loop, which will still be fading out at the calculated loop start point. This necessitates shifting the loop start point by a few bars until these notes are no longer audible – or you could simply ignore the issue because ZUN's compositions are so frantic that no one would ever notice. :onricdennat:
With the separate intro and loop files generated by mly, on the other hand, the reverb/release trails are immediately visible and, after trimming trailing silence, exactly define the number of samples that the calculated loop start point needs to be shifted by. The .loop file then remains always exactly as long, in samples, as the duration of the loop reported by mly. If a piece happens to have a constant tempo whose beat duration corresponds to an integer number of samples, we get some very satisfying, round loop durations out of this process. ☺️


So let's play it all back in-game… and immediately run into two unexpected miniaudio limitations, what the…?!

  1. miniaudio uses a fixed linear function for its fade-out envelope, and doesn't offer anything else? We might not even want a logarithmic one this time because symmetry with MIDI's simple quadratic curve would be neat, but we sure don't want a linear function – those stay near the original volume for too long, and then turn quiet way too quickly.
  2. There is no way to access FLAC metadata from miniaudio's public API, even though the library bundles the author's own FLAC library which has this feature?

📝 Back when I evaluated miniaudio, I alluded that I consider single-file C libraries to be massively overrated, and this is exactly why: Once they grow as massive as miniaudio (how ironic), they can quickly lead to their authors treating their dependencies as implementation details and melting down the interfaces that would naturally arise. In a regular library, dr_flac would be a separate, proper dependency, and the API would have a way to initialize a stream from an externally loaded drflac object. But since the C community collectively pretends that multi-file libraries are a burden on other developers, miniaudio ended up with dr_flac copy-pasted into its giant single file, with a silly ma_ namespacing prefix added to all its functions. And why? Did we have to move so far in the other direction just because CMake doesn't support globbing? That's a symptom of CMake not actually solving any problem, not a valid architectural decision that libraries should bend around. 🙄
So unless we fork and hack around in miniaudio, there's now no way around depending on a second, regular copy of dr_flac. Which has now led to the same project organization bloat that single-file libraries originally set out to prevent…

Sigh. At this rate, it makes more sense to just copy-paste and adapt the old BGM streaming code I wrote for thcrap in late 2018, which used dr_flac directly, and extend it with metadata support. With the streaming code moved out of the platform layer and into game logic, it also makes much more sense to implement the squared fade-out curve at that same level instead of copy-pasting and adjusting an unhealthy amount of miniaudio's verbose C code.
While I'm doing the same for the old Vorbis streaming code, it would also make sense to rewrite that one to use stb_vorbis instead of the old libogg+libvorbis reference libraries. There's no need to add two more dependencies if miniaudio already comes with stb_vorbis.c, and that library is widely acclaimed. So, integration should be a breeze, right?
Well, surprise, rarely have I seen a C library so actively hostile toward being integrated. Both of its API variants are completely unreasonable:

What happened to the tried-and-true idea of providing a structure with read, tell, and seek callbacks, and then providing an optional variant for C FILE* handles if you absolutely must? Sure, the whole point of Vorbis is to be small and nobody these days would care about spending a few MB on keeping an entire Vorbis file in memory, but come on. If pulldata made the deliberate and opinionated choice to only support buffers of complete Vorbis streams and argued in the name of simplicity that hand-coded disk streaming isn't worth it in this day and age, I might have even been convinced. And this is from the guy who popularized the concept of single-file C libraries in the first place? :thonk:

Oh well, tupblocks go brrr. libvorbis definitely shows its age with all the old command-line tools in the lib/ directory that they never moved away and that we now have to remove from our glob. But even that just adds a single line to the Tupfile, and then we get to enjoy its much friendlier API. That sure beats the almost 800 lines of code that miniaudio had to write to integrate stb_vorbis… which I can't even link because the file is too big for GitHub. 🤷
At this point, it would have even made sense to upgrade from a 24-year-old lossy codec to an 11-year-old lossy codec and use Opus instead, since the enforced 48,000 Hz sampling rate is a non-issue when you control the entire audio pipeline. But let's keep compatibility with existing thcrap mods for now.

The last time I added dependencies, 📝 I wondered whether just downloading and extracting official Windows binary builds might be superior to pasting batch script duct tape over the usability issues of Git submodules. However, I still wanted to try out Git's sparse checkout feature before, in an attempt to remove all the unneeded bloat… and as it turned out, this might just be the idealistic and perfect nirvana of vendoring libraries in C++ projects. I particularly like how the limitations of its default mode (always checking out all files within each directory level that shows up in a filter) can be turned into a guideline about how to structure a repository: All non-essential stuff that consumers of your code might not need – tests, high-level documentation, or optional features – should go into a subdirectory where it can be easily filtered.
And that's how the size of our libs/ directory went down from 82.7 MiB in the P0256 build to 30.4 MiB in the P0275 build, despite adding 4 more libraries in the latter. Now if only this didn't require even more duct tape to actually set up shallow clones correctly

In the end, the Windows build ended up using only a single one of the miniaudio features that DirectSound doesn't have, and that's the ability to use the more modern WASAPI instead of DirectSound. We're still going to use miniaudio for the Linux port, but as far as Windows is concerned, it would be quite nice to backport BGM streaming to the game's original DirectSound backend. The P0275 build is pushing 1 MiB of binary size for a game that originally came in a 220 KiB binary, so it would remove a noticeable amount of bloat from GIAN07.EXE, but it would also allow waveform BGM to work in the Windows 98-compatible i586 build. If that sounds cool to you, this is the issue you want to fund.


That only left some logic and UI busywork to put it all together, which means that we've almost reached the end of things to talk about! Here's what it all looks like:



After half a year of being bought out way past the cap, I've finally got some small room left for new orders again. If it weren't for this blog post and the required research and web development work, this delivery would have probably come out in early January, taking half the time it ended up taking. So I really have to start factoring the blog posts into the push prices in a better and fairer way.
Meanwhile, the hate toward my day job only keeps growing, but there's little point in looking for a new one as long as ReC98 remains this motivating and complex. It leaves pretty much no cognitive room for any similarly demanding job. Thus, I want 2024 to be the year where ReC98 either becomes profitable enough to be my only full-time job, or where we conclusively find out that it can't, I go look for a better day job, and ReC98 shifts to a slower pace. Here's the plan:

With the new price of per push, this means that there's now a small window in which you can get a full push worth of functionality for , until the current cap is filled up again.

Next up: Probably TH02's endings to relax a bit. Maybe we're also getting some new Touhou-related contributions?

📝 Posted:
💰 Funded by:
Blue Bolt, [Anonymous], iruleatgames
🏷️ Tags:

Oh, it's 2024 already and I didn't even have a delivery for December or January? Yeah… I can only repeat what I said at the end of November, although the finish line is actually in sight now. With 10 pushes across 4 repositories and a blog post that has already reached a word count of 9,240, the Shuusou Gyoku SC-88Pro BGM release is going to break 📝 both the push record set by TH01 Sariel two years ago, and 📝 the blog post length record set by the last Shuusou Gyoku delivery. Until that's done though, let's clear some more PC-98 Touhou pushes out of the backlog, and continue the preparation work for the non-ASCII translation project starting later this year.

But first, we got another free bugfix according to my policy! 📝 Back in April 2022 when I researched the Divide Error crash that can occur in TH04's Stage 4 Marisa fight, I proposed and implemented four possible workarounds and let the community pick one of them for the generally recommended small bugfix mod. I still pushed the others onto individual branches in case the gameplay community ever wants to look more closely into them and maybe pick a different one… except that I accidentally pushed the wrong code for the warp workaround, probably because I got confused with the second warp variant I developed later on.
Fortunately, I still had the intended code for both variants lying around, and used the occasion to merge the current master branch into all of these mod branches. Thanks to wyatt8740 for spotting and reporting this oversight!

  1. The Music Room background masking effect
  2. The GRCG's plane disabling flags
  3. Text color restrictions
  4. The entire messy rest of the Music Room code
  5. TH04's partially consistent congratulation picture on Easy Mode
  6. TH02's boss position and damage variables

As the final piece of code shared in largely identical form between 4 of the 5 games, the Music Rooms were the biggest remaining piece of low-hanging fruit that guaranteed big finalization% gains for comparatively little effort. They seemed to be especially easy because I already decompiled TH02's Music Room together with the rest of that game's OP.EXE back in early 2015, when this project focused on just raw decompilation with little to no research. 9 years of increased standards later though, it turns out that I missed a lot of details, and ended up renaming most variables and functions. Combined with larger-than-expected changes in later games and the usual quality level of ZUN's menu code, this ended up taking noticeably longer than the single push I expected.

The undoubtedly most interesting part about this screen is the animation in the background, with the spinning and falling polygons cutting into a single-color background to reveal a spacey image below. However, the only background image loaded in the Music Room is OP3.PI (TH02/TH03) or MUSIC3.PI (TH04/TH05), which looks like this in a .PI viewer or when converted into another image format with the usual tools:

TH02's Music Room background in its on-disk state TH03's Music Room background in its on-disk state TH04's Music Room background in its on-disk state TH05's Music Room background in its on-disk state
Let's call this "the blank image".

That is definitely the color that appears on top of the polygons, but where is the spacey background? If there is no other .PI file where it could come from, it has to be somewhere in that same file, right? :thonk:
And indeed: This effect is another bitplane/color palette trick, exactly like the 📝 three falling stars in the background of TH04's Stage 5. If we set every bit on the first bitplane and thus change any of the resulting even hardware palette color indices to odd ones, we reveal a full second 8-color sub-image hiding in the same .PI file:

TH02's Music Room background, with all bits in the first bitplane set to reveal the spacey background image, and the full color palette at the bottom TH03's Music Room background, with all bits in the first bitplane set to reveal the spacey background image, and the full color palette at the bottom TH04's Music Room background, with all bits in the first bitplane set to reveal the spacey background image, and the full color palette at the bottom TH05's Music Room background, with all bits in the first bitplane set to reveal the spacey background image, and the full color palette at the bottom
The spacey sub-image. Never before seen!1!! …OK, touhou-memories beat me by a month. Let's add each image's full 16-color palette to deliver some additional value.

On a high level, the first bitplane therefore acts as a stencil buffer that selects between the blank and spacey sub-image for every pixel. The important part here, however, is that the first bitplane of the blank sub-images does not consist entirely of 0 bits, but does have 1 bits at the pixels that represent the caption that's supposed to be overlaid on top of the animation. Since there now are some pixels that should always be taken from the spacey sub-image regardless of whether they're covered by a polygon, the game can no longer just clear the first bitplane at the start of every frame. Instead, it has to keep a separate copy of the first bitplane's original state (called nopoly_B in the code), captured right after it blitted the .PI image to VRAM. Turns out that this copy also comes in quite handy with the text, but more on that later.


Then, the game simply draws polygons onto only the reblitted first bitplane to conditionally set the respective bits. ZUN used master.lib's grcg_polygon_c() function for this, which means that we can entirely thank the uncredited master.lib developers for this iconic animation – if they hadn't included such a function, the Music Rooms would most certainly look completely different.
This is where we get to complete the series on the PC-98 GRCG chip with the last remaining four bits of its mode register. So far, we only needed the highest bit (0x80) to either activate or deactivate it, and the bit below (0x40) to choose between the 📝 RMW and 📝 TCR/📝 TDW modes. But you can also use the lowest four bits to restrict the GRCG's operations to any subset of the four bitplanes, leaving the other ones untouched:

// Enable the GRCG (0x80) in regular RMW mode (0x40). All bitplanes are
// enabled and written according to the contents of the tile register.
outportb(0x7C, 0xC0);

// The same, but limiting writes to the first bitplane by disabling the
// second (0x02), third (0x04), and fourth (0x08) one, as done in the
// PC-98 Touhou Music Rooms.
outportb(0x7C, 0xCE);

// Regular GRCG blitting code to any VRAM segment…
pokeb(0xA8000, offset, …);

// We're done, turn off the GRCG.
outportb(0x7C, 0x00);

This could be used for some unusual effects when writing to two or three of the four planes, but it seems rather pointless for this specific case at first. If we only want to write to a single plane, why not just do so directly, without the GRCG? Using that chip only involves more hardware and is therefore slower by definition, and the blitting code would be the same, right?
This is another one of these questions that would be interesting to benchmark one day, but in this case, the reason is purely practical: All of master.lib's polygon drawing functions expect the GRCG to be running in RMW mode. They write their pixels as bitmasks where 1 and 0 represent pixels that should or should not change, and leave it to the GRCG to combine these masks with its tile register and OR the result into the bitplanes instead of doing so themselves. Since GRCG writes are done via MOV instructions, not using the GRCG would turn these bitmasks into actual dot patterns, overwriting any previous contents of each VRAM byte that gets modified.
Technically, you'd only have to replace a few MOV instructions with OR to build a non-GRCG version of such a function, but why would you do that if you haven't measured polygon drawing to be an actual bottleneck.

Three overlapping Music Room polygons rendered using master.lib's grcg_polygon_c() function with a disabled GRCGThree overlapping Music Room polygons rendered as in the original game, with the GRCG enabled
An example with three polygons drawn from top to bottom. Without the GRCG, edges of later polygons overwrite any previously drawn pixels within the same VRAM byte. Note how treating bitmasks as dot patterns corrupts even those areas where the background image had nonzero bits in its first bitplane.

As far as complexity is concerned though, the worst part is the implicit logic that allows all this text to show up on top of the polygons in the first place. If every single piece of text is only rendered a single time, how can it appear on top of the polygons if those are drawn every frame?
Depending on the game (because of course it's game-specific), the answer involves either the individual bits of the text color index or the actual contents of the palette:

The contents of nopoly_B with each game's first track selected.

Finally, here's a list of all the smaller details that turn the Music Rooms into such a mess:

And that's all the Music Rooms! The OP.EXE binaries of TH04 and especially TH05 are now very close to being 100% RE'd, with only the respective High Score menus and TH04's title animation still missing. As for actual completion though, the finalization% metric is more relevant as it also includes the ZUN Soft logo, which I RE'd on paper but haven't decompiled. I'm 📝 still hoping that this will be the final piece of code I decompile for these two games, and that no one pays to get it done earlier… :onricdennat:


For the rest of the second push, there was a specific goal I wanted to reach for the remaining anything budget, which was blocked by a few functions at the beginning of TH04's and TH05's MAINE.EXE. In another anticlimactic development, this involved yet another way too early decompilation of a main() function…
Generally, this main() function just calls the top-level functions of all other ending-related screens in sequence, but it also handles the TH04-exclusive congratulating All Clear images within itself. After a 1CC, these are an additional reward on top of the Good Ending, showing the player character wearing a different outfit depending on the selected difficulty. On Easy Mode, however, the Good Ending is unattainable because the game always ends after Stage 5 with a Bad Ending, but ZUN still chose to show the EASY ALL CLEAR!! image in this case, regardless of how many continues you used.
While this might seem inconsistent with the other difficulties, it is consistent within Easy Mode itself, as the enforced Bad Ending after Stage 5 also doesn't distinguish between the number of continues. Also, Try to Normal Rank!! could very well be ZUN's roundabout way of implying "because this is how you avoid the Bad Ending".

With that out of the way, I was finally able to separate the VRAM text renderer of TH04 and TH05 into its own assembly unit, 📝 finishing the technical debt repayment project that I couldn't complete in 2021 due to assembly-time code segment label arithmetic in the data segment. This now allows me to translate this undecompilable self-modifying mess of ASM into C++ for the non-ASCII translation project, and thus unify the text renderers of all games and enhance them with support for Unicode characters loaded from a bitmap font. As the final finalized function in the SHARED segment, it also allowed me to remove 143 lines of particularly ugly segmentation workarounds 🙌


The remaining 1/6th of the second push provided the perfect occasion for some light TH02 PI work. The global boss position and damage variables represented some equally low-hanging fruit, being easily identified global variables that aren't part of a larger structure in this game. In an interesting twist, TH02 is the only game that uses an increasing damage value to track boss health rather than decreasing HP, and also doesn't internally distinguish between bosses and midbosses as far as these variables are concerned. Obviously, there's quite a bit of state left to be RE'd, not least because Marisa is doing her own thing with a bunch of redundant copies of her position, but that was too complex to figure out right now.

Also doing their own thing are the Five Magic Stones, which need five positions rather than a single one. Since they don't move, the game doesn't have to keep 📝 separate position variables for both VRAM pages, and can handle their positions in a much simpler way that made for a nice final commit.
And for the first time in a long while, I quite like what ZUN did there! Not only are their positions stored in an array that is indexed with a consistent ID for every stone, but these IDs also follow the order you fight the stones in: The two inner ones use 0 and 1, the two outer ones use 2 and 3, and the one in the center uses 4. This might look like an odd choice at first because it doesn't match their horizontal order on the playfield. But then you notice that ZUN uses this property in the respective phase control functions to iterate over only the subrange of active stones, and you realize how brilliant it actually is.

Screenshot of TH02's Five Magic Stones, with the first two (both internally and in the order you fight them in) alive and activated Screenshot of TH02's Five Magic Stones, with the second two (both internally and in the order you fight them in) alive and activated Screenshot of TH02's Five Magic Stones, with the last one (both internally and in the order you fight them in) alive and activated

This seems like a really basic thing to get excited about, especially since the rest of their data layout sure isn't perfect. Splitting each piece of state and even the individual X and Y coordinates into separate 5-element arrays is still counter-productive because the game ends up paying more memory and CPU cycles to recalculate the element offsets over and over again than this would have ever saved in cache misses on a 486. But that's a minor issue that could be fixed with a few regex replacements, not a misdesigned architecture that would require a full rewrite to clean it up. Compared to the hardcoded and bloated mess that was 📝 YuugenMagan's five eyes, this is definitely an improvement worthy of the good-code tag. The first actual one in two years, and a welcome change after the Music Room!

These three pieces of data alone yielded a whopping 5% of overall TH02 PI in just 1/6th of a push, bringing that game comfortably over the 60% PI mark. MAINE.EXE is guaranteed to reach 100% PI before I start working on the non-ASCII translations, but at this rate, it might even be realistic to go for 100% PI on MAIN.EXE as well? Or at least technical position independence, without the false positives.

Next up: Shuusou Gyoku SC-88Pro BGM. It's going to be wild.

📝 Posted:
💰 Funded by:
Blue Bolt, [Anonymous]
🏷️ Tags:

And once again, the Shuusou Gyoku task was too complex to be satisfyingly solved within a single month. Even just finding provably correct loop sections in both the original and arranged MIDI files required some rather involved detection algorithms. I could have just defined what sounded like correct loops, but the results of these algorithms were quite surprising indeed. Turns out that not even Seihou is safe from ZUN quirks, and some tracks technically loop much later than you'd think they do, or don't loop at all. And since I then wanted to put these MIDI loops back into the game to ensure perfect synchronization between the recordings and MIDI versions, I ended up rewriting basically all the MIDI code in a cross-platform way. This rewrite also uncovered a pbg bug that has traveled from Shuusou Gyoku into Windows Touhou, where it survived until ZUN ultimately removed all MIDI code in TH11 (!)

Fortunately, the backlog still had enough general PC-98 Touhou funds that I could spend on picking some soon-important low-hanging fruit, giving me something to deliver for the end of the month after all. TH04 and TH05 use almost identical code for their main/option menus, so decompiling it would make number go up quite significantly and the associated blog post won't be that long…

Wait, what's this, a bug report from touhou-memories concerning the website?

  1. Tab switchers tended to break on certain Firefox versions, and
  2. video playback didn't work on Microsoft Edge at all?

Those are definitely some high-priority bugs that demand immediate attention.

  1. Microsoft Edge's anti-support of AV1
  2. TH04/TH05's main/option menu
  3. TH04/TH05's first-launch sound setup menu
  4. TH05's title animation ☯️

The tab switcher issue was easily fixed by replacing the previous z-index trickery with a more robust solution involving the hidden attribute. The second one, however, is much more aggravating, because video playback on Edge has been broken ever since I 📝 switched the preferred video codec to AV1.
This goes so far beyond not supporting a specific codec. Usually, unsupported codecs aren't supposed to be an issue: As soon as you start using the HTML <video> tag, you'll learn that not every browser supports all codecs. And so you set up an encoding pipeline to serve each video in a mix of new and ancient formats, put the <source> tag of the most preferred codec first, and rest assured that browsers will fall back on the best-supported option as necessary. Except that Edge doesn't even try, and insists on staying on a non-playing AV1 video. 🙄

The codecs parameter for the <source> type attribute was the first potential solution I came across. Specifying the video codec down to the finest encoding details right in the HTML markup sounds like a good idea, similar to specifying sizes of images and videos to prevent layout reflows on long pages during the initial page load. So why was this the first time I heard of this feature? The fact that there isn't a simple ffprobe -show_html_codecs_string command to retrieve this string might already give a clue about how useful it is in practice. Instead, you have to manually piece the string together by grepping your way through all of a video's metadata
…and then it still doesn't change anything about Edge's behavior, even when also specifying the string for the VP9 and VP8 sources. Calling the infamously ridiculous HTMLMediaElement.canPlayType() method with a representative parameter of "video/webm; codecs=av01.1.04M.08.0.000.01.13.00.0" explains why: Both the AV1-supporting Chrome and Edge return "probably", but only the former can actually play this format. 🤦

But wait, there is an AV1 video extension in the Microsoft Store that would add support to any unspecified favorite video app. Except that it stopped working inside Edge as of version 116. And even if it did: If you can't query the presence of this extension via JavaScript, it might as well not exist at all.
Not to mention that the favorite video app part is obviously a lie as a lot of widely preferred Windows video apps are bundled with their own codecs, and have probably long supported AV1.

In the end, there's no way around the utter desperation move of removing the AV1 <source> for Edge users. Serving each video in two other formats means that we can at least do something here – try visiting the GitHub release page of the P0234-1 TH01 Anniversary Edition build in Edge and you also don't get to see anything, because that video uses AV1 and GitHub understandably doesn't re-encode every uploaded video into a variety of old formats.
Just for comparison, I tried both that page and the ReC98 blog on an old Android 6 phone from 2014, and even that phone picked and played the AV1 videos with the latest available Chrome and Firefox versions. This was the phone whose available Firefox version didn't support VP9 in 2019, which was my initial reason for adding the VP8 versions. Looks like it's finally time to drop those… 🤔 Maybe in the far future once I start running out of space on this server.

Removing the <source> tags can be done in one of two places:

  1. server-side, detecting Edge via the User-Agent header, or
  2. client-side, using navigator.userAgentData.brands.

I went with 2) because more dynamic server-side code would only move us further away from static site generation, which would make a lot of sense as the next evolutionary step in the architecture of this website. The client-side solution is much simpler too, and we can defer the deletion until a user actually hovers over a specific video.
And while we're at it, let's also add a popup complaining about this whole state of affairs. Edge is heavily marketed inside Windows as "the modern browser recommended by Microsoft", and you sure wouldn't expect low-quality chroma-subsampled VP9 from such a tagline. With such a level of anti-support for AV1, Edge users deserve to know exactly what's going on, especially since this post also explains what they will encounter on other websites.

A popup on top of a ReC98 blog video, showing the caption "⚠️ Edge does not support AV1, falling back on low-quality video…"
That's the polite way of putting it.

Alright, where was I? For TH01, the main menu was the last thing I decompiled before the 100% finalization mark, so it's rather anticlimactic to already cover the TH04/TH05 one now, with both of the games still being very far away from 100%, just because people will soon want to translate the description text in the bottom-right corner of the screen. But then again, the ZUN Soft logo animation would make for an even nicer final piece of decompiled code, especially since the bouncing-ball logo from TH01, TH02, and TH03 was the very first decompilation I did, all the way back in 2015.

The code quality of ZUN's VRAM-based menus has barely increased between TH01 and TH05. Both the top-level and option menu still need to know the bounding rectangle of the other one to unblit the right pixels when switching between the two. And since ZUN sure loved hardcoded and copy-pasted numbers in the PC-98 days, the coordinates both tend to be excessively large, and excessively wrong. :zunpet: Luckily, each menu item comes with its own correct unblitting rectangle, which avoids any graphical glitches that would otherwise occur.
As for actual observable quirks and bugs, these menus only contain one of each, and both are exclusive to TH04:

And yes, these videos do have a frame rate of 2 FPS.

Now that 100% finalization of their OP.EXE binaries is within reach, all this bloat made me think about the viability of a 📝 single-executable build for TH04's and TH05's debloated and anniversary versions. It would be really nice to have such a build ready before I start working on the non-ASCII translations – not just because they will be based on the anniversary branch by default, but also because it would significantly help their development if there are 4 fewer executables to worry about.
However, it's not as simple for these games as it was for TH01. The unique code in their OP.EXE and MAINE.EXE binaries is much larger than Borland's easily removed C++ exception handler, so I'd have to remove a lot more bloat to keep the resulting single binary at or below the size of the original MAIN.EXE. But I'm sure going to try.


Speaking of code that can be debloated for great effect: The second push of this delivery focused on the first-launch sound setup menu, whose BGM and sound effect submenus are almost complete code duplicates of each other. The debloated branch could easily remove more than half of the code in there, yielding another ≈800 bytes in case we need them.
If hex-editing MIKO.CFG is more convenient for you than deleting that file, you can set its first byte to FF to re-trigger this menu. Decompiling this screen was not only relevant now because it contains text rendered with font ROM glyphs and it would help dig our way towards more important strings in the data segment, but also because of its visual style. I can imagine many potential mods that might want to use the same backgrounds and box graphics for their menus.

TH04's first-launch sound setup menu, showing the BGM mode selectionTH05's first-launch sound setup menu, showing the sound effect mode selection
How about an initial language selection menu in the same style?

With the two submenus being shown in a fixed sequence, there's not a lot of room for the code to do anything wrong, and it's even more identical between the two games than the main menu already was. Thankfully, ZUN just reblits the respective options in the new color when moving the cursor, with no 📝 palette tricks. TH04's background image only uses 7 colors, so he could have easily reserved 3 colors for that. In exchange, the TH05 image gets to use the full 16 colors with no change to the code.


Rounding out this delivery, we also got TH05's rolling Yin-Yang Orb animation before the title screen… and it's just more bloat and landmines on a smaller scale that might be noticeable on slower PC-98 models. In total, there are three unnecessary inter-page copies of the entire VRAM that can easily insert lag frames, and two minor page-switching landmines that can potentially lead to tearing on the first frame of the roll or fade animation. Clearly, ZUN did not have smoothness or code quality in mind there, as evidenced by the fact that this animation simply displays 8 .PI files in sequence. But hey, a short animation like this is 📝 another perfectly appropriate place for a quick-and-dirty solution if you develop with a deadline.
And that's 1.30% of all PC-98 Touhou code finalized in two pushes! We're slowly running out of these big shared pieces of ASM code…

I've been neglecting TH03's OP.EXE quite a bit since it simply doesn't contain any translatable plaintext outside the Music Room. All menu labels are gaiji, and even the character selection menu displays its monochrome character names using the 4-plane sprites from CHNAME.BFT. Splitting off half of its data into a separate .ASM file was more akin to getting out a jackhammer to free up the room in front of the third remaining Music Room, but now we're there, and I can decompile all three of them in a natural way, with all referenced data.
Next up, therefore: Doing just that, securing another important piece of text for the upcoming non-ASCII translations and delivering another big piece of easily finalized code. I'm going to work full-time on ReC98 for almost all of December, and delivering that and the Shuusou Gyoku SC-88Pro recording BGM back-to-back should free up about half of the slightly higher cap for this month.

📝 Posted:
💰 Funded by:
Blue Bolt, [Anonymous]
🏷️ Tags:

TH03 finally passed 20% RE, and the newly decompiled code contains no serious ZUN bugs! What a nice way to end the year.

There's only a single unlockable feature in TH03: Chiyuri and Yumemi as playable characters, unlocked after a 1CC on any difficulty. Just like the Extra Stages in TH04 and TH05, YUME.NEM contains a single designated variable for this unlocked feature, making it trivial to craft a fully unlocked score file without recording any high scores that others would have to compete against. So, we can now put together a complete set for all PC-98 Touhou games: 2021-12-27-Fully-unlocked-clean-score-files.zip It would have been cool to set the randomly generated encryption keys in these files to a fixed value so that they cancel out and end up not actually encrypting the file. Too bad that TH03 also started feeding each encrypted byte back into its stream cipher, which makes this impossible.

The main loading and saving code turned out to be the second-cleanest implementation of a score file format in PC-98 Touhou, just behind TH02. Only two of the YUME.NEM functions come with nonsensical differences between OP.EXE and MAINL.EXE, rather than 📝 all of them, as in TH01 or 📝 too many of them, as in TH04 and TH05. As for the rest of the per-difficulty structure though… well, it quickly becomes clear why this was the final score file format to be RE'd. The name, score, and stage fields are directly stored in terms of the internal REGI*.BFT sprite IDs used on the high score screen. TH03 also stores 10 score digits for each place rather than the 9 possible ones, keeps any leading 0 digits, and stores the letters of entered names in reverse order… yeah, let's decompile the high score screen as well, for a full understanding of why ZUN might have done all that. (Answer: For no reason at all. :zunpet:)


And wow, what a breath of fresh air. It's surely not good-code: The overlapping shadows resulting from using a 24-pixel letterspacing with 32-pixel glyphs in the name column led ZUN to do quite a lot of unnecessary and slightly confusing rendering work when moving the cursor back and forth, and he even forgot about the EGC there. But it's nowhere close to the level of jank we saw in 📝 TH01's high score menu last year. Good to see that ZUN had learned a thing or two by his third game – especially when it comes to storing the character map cursor in terms of a character ID, and improving the layout of the character map:

The alphabet available for TH03 high score names.

That's almost a nicely regular grid there. With the question mark and the double-wide SP, BS, and END options, the cursor movement code only comes with a reasonable two exceptions, which are easily handled. And while I didn't get this screen completely decompiled, one additional push was enough to cover all important code there.

The only potential glitch on this screen is a result of ZUN's continued use of binary-coded decimal digits without any bounds check or cap. Like the in-game HUD score display in TH04 and TH05, TH03's high score screen simply uses the next glyph in the character set for the most significant digit of any score above 1,000,000,000 points – in this case, the period. Still, it only really gets bad at 8,000,000,000 points: Once the glyphs are exhausted, the blitting function ends up accessing garbage data and filling the entire screen with garbage pixels. For comparison though, the current world record is 133,650,710 points, so good luck getting 8 billion in the first place.

Next up: Starting 2022 with the long-awaited decompilation of TH01's Sariel fight! Due to the 📝 recent price increase, we now got a window in the cap that is going to remain open until tomorrow, providing an early opportunity to set a new priority after Sariel is done.

📝 Posted:
💰 Funded by:
Ember2528
🏷️ Tags:

OK, TH01 missile bullets. Can we maybe have a well-behaved entity type, without any weirdness? Just once?

Ehh, kinda. Apart from another 150 bytes wasted on unused structure members, this code is indeed more on the low end in terms of overall jank. It does become very obvious why dodging these missiles in the YuugenMagan, Mima, and Elis fights feels so awful though: An unfair 46×46 pixel hitbox around Reimu's center pixel, combined with the comeback of 📝 interlaced rendering, this time in every stage. ZUN probably did this because missiles are the only 16×16 sprite in TH01 that is blitted to unaligned X positions, which effectively ends up touching a 32×16 area of VRAM per sprite.
But even if we assume VRAM writes to be the bottleneck here, it would have been totally possible to render every missile in every frame at roughly the same amount of CPU time that the original game uses for interlaced rendering:

That's an optimization that would have significantly benefitted the game, in contrast to all of the fake ones introduced in later games. Then again, this optimization is actually something that the later games do, and it might have in fact been necessary to achieve their higher bullet counts without significant slowdown.

Unfortunately, it was only worth decompiling half of the missile code right now, thanks to gratuitous FPU usage in the other half, where 📝 double variables are compared to float literals. That one will have to wait 📝 until after SinGyoku.


After some effectively unused Mima sprite effect code that is so broken that it's impossible to make sense out of it, we get to the final feature I wanted to cover for all bosses in parallel before returning to Sariel: The separate sprite background storage for moving or animated boss sprites in the Mima, Elis, and Sariel fights. But, uh… why is this necessary to begin with? Doesn't TH01 already reserve the other VRAM page for backgrounds?
Well, these sprites are quite big, and ZUN didn't want to blit them from main memory on every frame. After all, TH01 and TH02 had a minimum required clock speed of 33 MHz, half of the speed required for the later three games. So, he simply blitted these boss sprites to both VRAM pages, leading the usual unblitting calls to only remove the other sprites on top of the boss. However, these bosses themselves want to move across the screen… and this makes it necessary to save the stage background behind them in some other way.

Enter .PTN, and its functions to capture a 16×16 or 32×32 square from VRAM into a sprite slot. No problem with that approach in theory, as the size of all these bigger sprites is a multiple of 32×32; splitting a larger sprite into these smaller 32×32 chunks makes the code look just a little bit clumsy (and, of course, slower).
But somewhere during the development of Mima's fight, ZUN apparently forgot that those sprite backgrounds existed. And once Mima's 🚫 casting sprite is blitted on top of her regular sprite, using just regular sprite transparency, she ends up with her infamous third arm:

TH01 Mima's third arm

Ironically, there's an unused code path in Mima's unblit function where ZUN assumes a height of 48 pixels for Mima's animation sprites rather than the actual 64. This leads to even clumsier .PTN function calls for the bottom 128×16 pixels… Failing to unblit the bottom 16 pixels would have also yielded that third arm, although it wouldn't have looked as natural. Still wouldn't say that it was intentional; maybe this casting sprite was just added pretty late in the game's development?


So, mission accomplished, Sariel unblocked… at 2¼ pushes. :thonk: That's quite some time left for some smaller stage initialization code, which bundles a bunch of random function calls in places where they logically really don't belong. The stage opening animation then adds a bunch of VRAM inter-page copies that are not only redundant but can't even be understood without knowing the hidden internal state of the last VRAM page accessed by previous ZUN code…
In better news though: Turbo C++ 4.0 really doesn't seem to have any complexity limit on inlining arithmetic expressions, as long as they only operate on compile-time constants. That's how we get macro-free, compile-time Shift-JIS to JIS X 0208 conversion of the individual code points in the 東方★靈異伝 string, in a compiler from 1994. As long as you don't store any intermediate results in variables, that is… :tannedcirno:

But wait, there's more! With still ¼ of a push left, I also went for the boss defeat animation, which includes the route selection after the SinGyoku fight.
As in all other instances, the 2× scaled font is accomplished by first rendering the text at regular 1× resolution to the other, invisible VRAM page, and then scaled from there to the visible one. However, the route selection is unique in that its scaled text is both drawn transparently on top of the stage background (not onto a black one), and can also change colors depending on the selection. It would have been no problem to unblit and reblit the text by rendering the 1× version to a position on the invisible VRAM page that isn't covered by the 2× version on the visible one, but ZUN (needlessly) clears the invisible page before rendering any text. :zunpet: Instead, he assigned a separate VRAM color for both the 魔界 and 地獄 options, and only changed the palette value for these colors to white or gray, depending on the correct selection. This is another one of the 📝 rare cases where TH01 demonstrates good use of PC-98 hardware, as the 魔界へ and 地獄へ strings don't need to be reblitted during the selection process, only the Orb "cursor" does.

Then, why does this still not count as good-code? When changing palette colors, you kinda need to be aware of everything else that can possibly be on screen, which colors are used there, and which aren't and can therefore be used for such an effect without affecting other sprites. In this case, well… hover over the image below, and notice how Reimu's hair and the bomb sprites in the HUD light up when Makai is selected:

Demonstration of palette changes in TH01's route selection

This push did end on a high note though, with the generic, non-SinGyoku version of the defeat animation being an easily parametrizable copy. And that's how you decompile another 2.58% of TH01 in just slightly over three pushes.


Now, we're not only ready to decompile Sariel, but also Kikuri, Elis, and SinGyoku without needing any more detours into non-boss code. Thanks to the current TH01 funding subscriptions, I can plan to cover most, if not all, of Sariel in a single push series, but the currently 3 pending pushes probably won't suffice for Sariel's 8.10% of all remaining code in TH01. We've got quite a lot of not specifically TH01-related funds in the backlog to pass the time though.

Due to recent developments, it actually makes quite a lot of sense to take a break from TH01: spaztron64 has managed what every Touhou download site so far has failed to do: Bundling all 5 game onto a single .HDI together with pre-configured PC-98 emulators and a nice boot menu, and hosting the resulting package on a proper website. While this first release is already quite good (and much better than my attempt from 2014), there is still a bit of room for improvement to be gained from specific ReC98 research. Next up, therefore:

📝 Posted:
💰 Funded by:
[Anonymous]
🏷️ Tags:

Alright, no more big code maintenance tasks that absolutely need to be done right now. Time to really focus on parts 6 and 7 of repaying technical debt, right? Except that we don't get to speed up just yet, as TH05's barely decompilable PMD file loading function is rather… complicated.
Fun fact: Whenever I see an unusual sequence of x86 instructions in PC-98 Touhou, I first consult the disassembly of Wolfenstein 3D. That game was originally compiled with the quite similar Borland C++ 3.0, so it's quite helpful to compare its ASM to the officially released source code. If I find the instructions in question, they mostly come from that game's ASM code, leading to the amusing realization that "even John Carmack was unable to get these instructions out of this compiler" :onricdennat: This time though, Wolfenstein 3D did point me to Borland's intrinsics for common C functions like memcpy() and strchr(), available via #pragma intrinsic. Bu~t those unfortunately still generate worse code than what ZUN micro-optimized here. Commenting how these sequences of instructions should look in C is unfortunately all I could do here.
The conditional branches in this function did compile quite nicely though, clarifying the control flow, and clearly exposing a ZUN bug: TH05's snd_load() will hang in an infinite loop when trying to load a non-existing -86 BGM file (with a .M2 extension) if the corresponding -26 BGM file (with a .M extension) doesn't exist either.

Unsurprisingly, the PMD channel monitoring code in TH05's Music Room remains undecompilable outside the two most "high-level" initialization and rendering functions. And it's not because there's data in the middle of the code segment – that would have actually been possible with some #pragmas to ensure that the data and code segments have the same name. As soon as the SI and DI registers are referenced anywhere, Turbo C++ insists on emitting prolog code to save these on the stack at the beginning of the function, and epilog code to restore them from there before returning. Found that out in September 2019, and confirmed that there's no way around it. All the small helper functions here are quite simply too optimized, throwing away any concern for such safety measures. 🤷
Oh well, the two functions that were decompilable at least indicate that I do try.


Within that same 6th push though, we've finally reached the one function in TH05 that was blocking further progress in TH04, allowing that game to finally catch up with the others in terms of separated translation units. Feels good to finally delete more of those .ASM files we've decompiled a while ago… finally!

But since that was just getting started, the most satisfying development in both of these pushes actually came from some more experiments with macros and inline functions for near-ASM code. By adding "unused" dummy parameters for all relevant registers, the exact input registers are made more explicit, which might help future port authors who then maybe wouldn't have to look them up in an x86 instruction reference quite as often. At its best, this even allows us to declare certain functions with the __fastcall convention and express their parameter lists as regular C, with no additional pseudo-registers or macros required.
As for output registers, Turbo C++'s code generation turns out to be even more amazing than previously thought when it comes to returning pseudo-registers from inline functions. A nice example for how this can improve readability can be found in this piece of TH02 code for polling the PC-98 keyboard state using a BIOS interrupt:

inline uint8_t keygroup_sense(uint8_t group) {
	_AL = group;
	_AH = 0x04;
	geninterrupt(0x18);
	// This turns the output register of this BIOS call into the return value
	// of this function. Surprisingly enough, this does *not* naively generate
	// the `MOV AL, AH` instruction you might expect here!
	return _AH;
}

void input_sense(void)
{
	// As a result, this assignment becomes `_AH = _AH`, which Turbo C++
	// never emits as such, giving us only the three instructions we need.
	_AH = keygroup_sense(8);

	// Whereas this one gives us the one additional `MOV BH, AH` instruction
	// we'd expect, and nothing more.
	_BH = keygroup_sense(7);

	// And now it's obvious what both of these registers contain, from just
	// the assignments above.
	if(_BH & K7_ARROW_UP || _AH & K8_NUM_8) {
		key_det |= INPUT_UP;
	}
	// […]
}

I love it. No inline assembly, as close to idiomatic C code as something like this is going to get, yet still compiling into the minimum possible number of x86 instructions on even a 1994 compiler. This is how I keep this project interesting for myself during chores like these. :tannedcirno: We might have even reached peak inline already?

And that's 65% of technical debt in the SHARED segment repaid so far. Next up: Two more of these, which might already complete that segment? Finally!

📝 Posted:
💰 Funded by:
Blue Bolt, [Anonymous]
🏷️ Tags:

Turns out that TH04's player selection menu is exactly three times as complicated as TH05's. Two screens for character and shot type rather than one, and a way more intricate implementation for saving and restoring the background behind the raised top and left edges of a character picture when moving the cursor between Reimu and Marisa. TH04 decides to backup precisely only the two 256×8 (top) and 8×244 (left) strips behind the edges, indicated in red in the picture below.

Backed-up VRAM area in TH04's player character selection

These take up just 4 KB of heap memory… but require custom blitting functions, and expanding this explicitly hardcoded approach to TH05's 4 characters would have been pretty annoying. So, rather than, uh, not explicitly hardcoding it all, ZUN decided to just be lazy with the backup area in TH05, saving the entire 640×400 screen, and thus spending 128 KB of heap memory on this rather simple selection shadow effect. :zunpet:


So, this really wasn't something to quickly get done during the first half of a push, even after already having done TH05's equivalent of this menu. But since life is very busy right now, I also used the occasion to start addressing another code organization annoyance: master.lib's single master.h header file.

So, time to start a new master.hpp header that would contain just the declarations from master.h that PC-98 Touhou actually needs, plus some semantic (yes, semantic) sugar. Comparing just the old master.h to just the new master.hpp after roughly 60% of the transition has been completed, we get median build times of 319 ms for master.h, and 144 ms for master.hpp on my (admittedly rather slow) DOSBox setup. Nice!
As of this push, ReC98 consists of 107 translation units that have to be compiled with Turbo C++ 4.0J. Fully rebuilding all of these currently takes roughly 37.5 seconds in DOSBox. After the transition to master.hpp is done, we could therefore shave some 10 to 15 seconds off this time, simply by switching header files. And that's just the beginning, as this will also pave the way for further #include optimizations. Life in this codebase will be great!


Unfortunately, there wasn't enough time to repay some of the actual technical debt I was looking forward to, after all of this. Oh well, at least we now also have nice identifiers for the three different boldface options that are used when rendering text to VRAM, after procrastinating that issue for almost 11 months. Next up, assuming the existing subscriptions: More ridiculous decompilations of things that definitely weren't originally written in C, and a big blocker in TH03's MAIN.EXE.

📝 Posted:
💰 Funded by:
[Anonymous], -Tom-
🏷️ Tags:

So, TH05 OP.EXE. The first half of this push started out nicely, with an easy decompilation of the entire player character selection menu. Typical ZUN quality, with not much to say about it. While the overall function structure is identical to its TH04 counterpart, the two games only really share small snippets inside these functions, and do need to be RE'd separately.

The high score viewing (not registration) menu would have been next. Unfortunately, it calls one of the GENSOU.SCR loading functions… which are all a complete mess that still needed to be sorted out first. 5 distinct functions in 6 binaries, and of course TH05 also micro-optimized its MAIN.EXE version to directly use the DOS INT 21h file loading API instead of master.lib's wrappers. Could have all been avoided with a single method on the score data structure, taking a player character ID and a difficulty level as parameters…

So, no score menu in this push then. Looking at the other end of the ASM code though, we find the starting functions for the main game, the Extra Stage, and the demo replays, which did fit perfectly to round out this push.

Which is where we find an easter egg! 🥚 If you've ever looked into 怪綺談2.DAT, you might have noticed 6 .REC files with replays for the Demo Play mode. However, the game only ever seems to cycle between 4 replays. So what's in the other two, and why are they 40 KB instead of just 10 KB like the others? Turns out that they combine into a full Extra Stage Clear replay with Mima, with 3 bombs and 1 death, obviously recorded by ZUN himself. The split into two files for the stage (DEMO4.REC) and boss (DEMO5.REC) portion is merely an attempt to limit the amount of simultaneously allocated heap memory.
To watch this replay without modding the game, unlock the Extra Stage with all 4 characters, then hold both the ⬅️ left and ➡️ right arrow keys in the main menu while waiting for the usual demo replay. I can't possibly be the first one to discover this, but I couldn't find any other mention of it.
Edit (2021-03-15): ZUN did in fact document this replay in Section 6 of TH05's OMAKE.TXT, along with the exact method to view it. Thanks to Popfan for the discovery!

Here's a recording of the whole replay:

Note how the boss dialogue is skipped. MAIN.EXE actually contains no less than 6 if() branches just to distinguish this overly long replay from the regular ones.


I'd really like to do the TH04 and TH05 main menus in parallel, since we can expect a bit more shared code after all the initial differences. Therefore, I'm going to put the next "anything" push towards covering the TH04 version of those functions. Next up though, it's back to TH01, with more redundant image format code…

📝 Posted:
💰 Funded by:
Lmocinemod, Blue Bolt, [Anonymous]
🏷️ Tags:

Finally, after a long while, we've got two pushes with barely anything to talk about! Continuing the road towards 100% PI for TH05, these were exactly the two pushes that TH05 MAINE.EXE PI was estimated to additionally cost, relative to TH04's. Consequently, they mostly went to TH05's unique data structures in the ending cutscenes, the score name registration menu, and the staff roll.

A unique feature in there is TH05's support for automatic text color changes in its ending scripts, based on the first full-width Shift-JIS codepoint in a line. The \c=codepoint,color commands at the top of the _ED??.TXT set up exactly this codepoint→color mapping. As far as I can tell, TH05 is the only Touhou game with a feature like this – even the Windows Touhou games went back to manually spelling out each color change.

The orb particles in TH05's staff roll also try to be a bit unique by using 32-bit X and Y subpixel variables for their current position. With still just 4 fractional bits, I can't really tell yet whether the extended range was actually necessary. Maybe due to how the "camera scrolling" through "space" was implemented? All other entities were pretty much the usual fare, though.
12.4, 4.4, and now a 28.4 fixed-point format… yup, 📝 C++ templates were definitely the right choice.

At the end of its staff roll, TH05 not only displays the usual performance verdict, but then scrolls in the scores at the end of each stage before switching to the high score menu. The simplest way to smoothly scroll between two full screens on a PC-98 involves a separate bitmap… which is exactly what TH05 does here, reserving 28,160 bytes of its global data segment for just one overly large monochrome 320×704 bitmap where both the screens are rendered to. That's… one benefit of splitting your game into multiple executables, I guess? :tannedcirno:
Not sure if it's common knowledge that you can actually scroll back and forth between the two screens with the Up and Down keys before moving to the score menu. I surely didn't know that before. But it makes sense – might as well get the most out of that memory.


The necessary groundwork for all of this may have actually made TH04's (yes, TH04's) MAINE.EXE technically position-independent. Didn't quite reach the same goal for TH05's – but what we did reach is ⅔ of all PC-98 Touhou code now being position-independent! Next up: Celebrating even more milestones, as -Tom- is about to finish development on his TH05 MAIN.EXE PI demo…

📝 Posted:
💰 Funded by:
Yanga, Ember2528
🏷️ Tags:

Three pushes to decompile the TH01 high score menu… because it's completely terrible, and needlessly complicated in pretty much every aspect:

In the end, I just gave up with my usual redundancy reduction efforts for this one. Anyone wanting to change TH01's high score name entering code would be better off just rewriting the entire thing properly.

And that's all of the shared code in TH01! Both OP.EXE and FUUIN.EXE are now only missing the actual main menu and ending code, respectively. Next up, though: The long awaited TH01 PI push. Which will not only deliver 100% PI for OP.EXE and FUUIN.EXE, but also probably quite some gains in REIIDEN.EXE. With now over 30% of the game decompiled, it's about time we get to look at some gameplay code!

📝 Posted:
💰 Funded by:
Yanga, Ember2528
🏷️ Tags:

Back to TH01, and its high score menu… oh, wait, that one will eventually involve keyboard input. And thanks to the generous TH01 funding situation, there's really no reason not to cover that right now. After all, TH01 is the last game where input still hadn't been RE'd.
But first, let's also cover that one unused blitting function, together with REIIDEN.CFG loading and saving, which are in front of the input function in OP.EXE… (By now, we all know about the hidden start bomb configuration, right?)

Unsurprisingly, the earliest game also implements input in the messiest way, with a different function for each of the three executables. "Because they all react differently to keyboard inputs :zunpet:", apparently? OP.EXE even has two functions for it, one for the START / CONTINUE / OPTION / QUIT main menu, and one for both Option and Music Test menus, both of which directly perform the ring arithmetic on the menu cursor variable. A consistent separation of keyboard polling from input processing apparently wasn't all too obvious of a thought, since it's only truly done from TH02 on.

This lack of proper architecture becomes actually hilarious once you notice that it did in fact facilitate a recursion bug! :godzun: In case you've been living under a rock for the past 8 years, TH01 shipped with debugging features, which you can enter by running the game via game d from the DOS prompt. These features include a memory info screen, shown when pressing PgUp, implemented as one blocking function (test_mem()) called directly in response to the pressed key inside the polling function. test_mem() only returns once that screen is left by pressing PgDown. And in order to poll input… it directly calls back into the same polling function that called it in the first place, after a 3-frame delay.

Which means that this screen is actually re-entered for every 3 frames that the PgUp key is being held. And yes, you can, of course, also crash the system via a stack overflow this way by holding down PgUp for a few seconds, if that's your thing.
Edit (2020-09-17): Here's a video from spaztron64, showing off this exact stack overflow crash while running under the VEM486 memory manager, which displays additional information about these sorts of crashes:

What makes this even funnier is that the code actually tracks the last state of every polled key, to prevent exactly that sort of bug. But the copy-pasted assignment of the last input state is only done after test_mem() already returned, making it effectively pointless for PgUp. It does work as intended for PgDown… and that's why you have to actually press and release this key once for every call to test_mem() in order to actually get back into the game. Even though a single call to PgDown will already show the game screen again.

In maybe more relevant news though, this function also came with what can be considered the first piece of actual gameplay logic! Bombing via double-tapping the Z and X keys is also handled here, and now we know that both keys simply have to be tapped twice within a window of 20 frames. They are tracked independently from each other, so you don't necessarily have to press them simultaneously.
In debug mode, the bomb count tracks precisely this window of time. That's why it only resets back to 0 when pressing Z or X if it's ≥20.

Sure, TH01's code is expectedly terrible and messy. But compared to the micro-optimizations of TH04 and TH05, it's an absolute joy to work on, and opening all these ZUN bug loot boxes is just the icing on the cake. Looking forward to more of the high score menu in the next pushes!

📝 Posted:
💰 Funded by:
Touhou Patch Center
🏷️ Tags:

🎉 TH04's and TH05's OP.EXE are now fully position-independent! 🎉

What does this mean?

You can now add any data or code to the main menus of the two games, by simply editing the ReC98 source, writing your mod in ASM or C/C++, and recompiling the code. Since all absolute memory addresses have now been converted to labels, this will work without causing any instability. See the position independence section in the FAQ for a more thorough explanation about why this was a problem.

What does this not mean?

The original ZUN code hasn't been completely reverse-engineered yet, let alone decompiled. Pretty much all of that is still ASM, which might make modding a bit inconvenient right now.

Since this push was otherwise pretty unremarkable, I made a video demonstrating a few basic things you can do with this:

Now, what to do for the last outstanding Touhou Patch Center push? Bullets, or resident structures? :thonk: