ReC98

⮜ Blog

⮜ List of tags

Showing all posts tagged

📝 Posted:: 2024-03-09 15:23 UTC
🚚 Summary of:: P0266, P0267, P0268, P0269, P0270, P0271, P0272, P0273, P0274, P0275, P0276, P0277
⌨ Commits:: (mly) eb2b0c8...b356884, (mly) b356884...1c70db0, (mly) 1c70db0...db0c195, (BGM packs) 2f9bce5...45087c2, (Seihou) P0256...a9ca081, (Seihou) a9ca081...8db918f, (Seihou) 8db918f...3de48ab, (Seihou) 3de48ab...9467705, (Seihou) 9467705...241a6c9, (Seihou) 241a6c9...P0275, (Seihou) dbc369f...883ac40, (Seihou) 883ac40...6ac72f3
💰 Funded by:: Ember2528, [Anonymous]
🏷 Tags:

📝 Over two years since the previous largest delivery, we've now got a new record in every regard: 12 pushes across 5 repos, 215 commits, and a blog post with over 14,000 words and 48 pieces of media. 😱 Who would have thought that the superficially simple task of putting SC-88Pro recordings into Shuusou Gyoku would actually mainly focus on deep research into the underlying MIDI files? I don't typically cover much music-related content because it's a non-issue as far as PC-98 Touhou code is concerned, so it's quite fitting how extensive this one turned out. So here we go, the result of virtually unlimited funding and patience:

The SC-88Pro recording controversy
Undefined SysEx behavior
Resolving the controversy, and making a choice (contains personal opinion)
A Unix-style command-line MIDI filter (in Rust BTW)
Visualizing MIDI files (for science, and not for playing them on a keyboard)
Shuusou Gyoku's individual loop quirks 🎺
Rewriting pbg's MIDI code
Putting together the BGM packs
Outgrowing miniaudio (and raging about single-file C libraries for a while)
Remaining implementation details
Pricing changes (and no, not everything's getting more expensive)

So where's the controversy? Romantique Tp obviously made the best and most careful real-hardware SC-88Pro recordings of all of ZUN's old MIDIs, including the original (OST) and arranged (AST) soundtrack of Shuusou Gyoku, right? Surely all I have to do now is to cut them into seamless loops to save a bit of disk space, and then put them into the game? Let's start at the end of the track list with the name registration theme, since it's light on instruments and has an obvious loop point that will be easy to spot in the waveform. But, um… wait a moment, that very first drum note comes a bit late, doesn't it?

This can also be heard in Romantique Tp's YouTube upload.

At a notated tempo of 96 BPM, these first four beats should take exactly 2.5 seconds, which they do in this seamlessly looping softsynth rendering.

That's… not quite the accuracy and perfection I was expecting. But I think I know what we're seeing and hearing there. Let's look at the first few MIDI events on the drum channel:

Delta	Pulse	 Beat	Channel	Event
 +540	   960	  2:000	      1	Controller { CC   0, value   0 }
   +0	   960	  2:000	      1	Controller { CC  32, value   0 }
   +0	   960	  2:000	      1	ProgramChange {  37 }
[…]
   +0	   960	  2:000	      2	Controller { CC   0, value   0 }
   +0	   960	  2:000	      2	Controller { CC  32, value   0 }
   +0	   960	  2:000	      2	ProgramChange {  19 }
[…]
   +0	   960	  2:000	      3	Controller { CC   0, value   0 }
   +0	   960	  2:000	      3	Controller { CC  32, value   0 }
   +0	   960	  2:000	      3	ProgramChange {   6 }
[…]
   +0	   960	  2:000	      4	Controller { CC   0, value   0 }
   +0	   960	  2:000	      4	Controller { CC  32, value   0 }
   +0	   960	  2:000	      4	ProgramChange {   2 }
[…]

Delta	Pulse	 Beat	Channel	Event
   +0	  960	2:000	     10	Controller { CC   0, value   0 }
   +0	  960	2:000	     10	Controller { CC  32, value   0 }
   +0	  960	2:000	     10	ProgramChange {  25 }
   +0	  960	2:000	     10	Controller { CC   7, value 127 }
   +0	  960	2:000	     10	Controller { CC  11, value 127 }
   +0	  960	2:000	     10	Controller { CC  10, value  64 }
   +0	  960	2:000	     10	Controller { CC  91, value  80 }
   +0	  960	2:000	     10	Controller { CC  93, value  40 }
   +0	  960	2:000	     10	NoteOn { Key  42, Vel.  94 }
   +0	  960	2:000	     10	NoteOn { Key  36, Vel. 110 }
   +1	  961	2:001	     10	NoteOn { Key  42, Vel.   0 }
   +0	  961	2:001	     10	NoteOn { Key  36, Vel.   0 }
 +119	 1080	2:120	     10	NoteOn { Key  42, Vel.  34 }
   +1	 1081	2:121	     10	NoteOn { Key  42, Vel.   0 }
 +119	 1200	2:240	     10	NoteOn { Key  42, Vel.  64 }
   +0	 1200	2:240	     10	NoteOn { Key  36, Vel.  64 }

Also, the fact that GS doesn't put its drums on a non-general voice bank and instead relies on external channel configuration to differentiate drums from pitched instruments is making this Yamaha kid uncontrollably furious. 🤬

Yup. That's the sound of a vintage hardware synth being slow and taking a two-digit number of milliseconds to process a barrage of simultaneous Program Change messages, playing a MIDI file that doesn't take this reality into account and expects program changes to happen instantly.
I can only speak from my own experience of writing MIDIs for hardware synths here, but having the first note displaced by 50 ms is very much not the way a composer would have intended the music to be heard if the note is clearly notated to occur on the beat. If you had told me about such an issue when playing one of my MIDIs on a certain synth, I would have thanked you for the bug report! And I would have promptly released a fixed version of the MIDI with the Program Change events moved back by a beat or two. In the case of Shuusou Gyoku's MIDIs, this wouldn't even have added any additional delay in-game, as all of these files already start with at least one beat of leading silence to make room for setting Roland-specific synth parameters.

OK, but that's just a single isolated bass drum hit. If we wanted to, we could even fix this issue ourselves by splicing the same note from around the loop end point. Maybe this is just an isolated case and the rest of Romantique Tp's recordings are fine? Well…

Again, check Romantique Tp's YouTube upload for proof.

By the way, this seamless audio player is what consumed most of the two website pushes this time. The rest went to the slightly redesigned main page, whose progress bars now use the cap bar style and the GitHub badge colors.

This one is even worse. Here, the delay is so long relative to the tempo of the piece that the intended five drum hits pretty much turn into four.

This type of issue doesn't even have to be isolated to the very beginning of a piece. A few of the tracks in both the OST and AST start with an anacrusis on just one or two channels and leave the Program Change event barrage at the beginning of the first full measure. In 幻想科学～ Doll's Phantom for example, this creates a flam-like glitch where the bass on channel 2 is pretty much on time, but the crash hit on channel 10 only follows 50 ms later, after the SC-88Pro took its sweet time to process all the Program Change events on the channels between:

This is from the arranged soundtrack for a change. In that one, ZUN at least fixed the issue in the final three MIDIs (シルクロードアリス, 魔女達の舞踏会, and 二色蓮花蝶　～ Ancients) that closed out this rearranging project in May 2001, which spread out their per-channel setup events over at least a single measure before playing any note.

Let's listen to that at half speed:

Romantique Tp's YouTube upload.

Still on point.

Sure, all of this is barely noticeable in casual listening, but very noticeable if you're the one who now has to cut these recordings into seamless loops. And these are just the most obvious timing issues that can be easily pinpointed and documented – the actual worst aspects are all the minor tempo and timing fluctuations throughout most of the pieces. With recordings that deviate ever so slightly from the tempo defined in the MIDI files, you can no longer rely on mathematically exact sample positions when cutting loops. Even if those positions do work out from time to time, there'd pretty much always be a discontinuity in the waveform at both ends of the loop, manifesting as a clearly audible click. In the end, the only way of finding good loop points in existing recordings involves straining your ears and listening very, very closely to avoid any audible glitches. 😩

But if you've taken a look at the second tabs in the clips above, you will have noticed that we don't necessarily have to be stuck with recordings from real hardware. In late 2015, Roland released Sound Canvas VA, a VST plugin that emulates the classic core of Roland's old Sound Canvas lineup, including the SC-88Pro. As long as we run such a software synthesizer through a quality VST host, a purely software-based solution should be way superior for recording looped BGM:

By moving from real-time recording to an offline rendering paradigm, we get perfectly accurate note timing, as it no longer matters how long the synth takes to produce each output sample.
We stay entirely in the digital realm instead of going from digital (SC-88Pro) to analog (RCA cable) to digital (line-in recording) again, removing any chance for noise or distortion to ruin audio quality.
We get to directly render at 44,100 Hz instead of being limited to the 32,000 Hz signal coming out of the SC-88Pro's DAC. This can be easily noticed in the half-speed video above, whose SCVA version retains significantly more sibilant high-frequency content compared to the more muffled sound of Romantique Tp's recording.
Good recordings from real hardware involve careful observation of the output volume. You ideally want to set the synth's master volume to a level that's simultaneously quiet enough to avoid clipping and loud enough to ensure the highest possible signal-to-noise ratio, which requires lots of trial-and-error attempts for each individual track. With a softsynth, this becomes a non-issue: 📝 As we've seen during the last look at Shuusou Gyoku's sound, rendering to 32-bit floating-point .WAV files would preserve all waveform information above 0 dBFS. So we could simply render everything at Sound Canvas VA's default maximum volume and later algorithmically scale the volume of the rendered files to fit within 0 dBFS, without having to touch SCVA's sluggish interface.
- Doing that also makes it feasible to preserve loudness differences between the pieces of a soundtrack instead of eradicating them by normalizing the volume of each individual track to the digital maximum.
Finally, it's much more time-efficient. We simply hit foobar2000's Convert button and get all MIDIs rendered within a few seconds each, instead of having to wait the entire length of a piece.

Any drawbacks? For our use case, all of them are found in the abysmal software quality of everything around the synth engine. As it's typical for the VST industry, Sound Canvas VA is excessively DRM'd – it takes multiple seconds to start up, and even then only allows a single process to run at any given time, immediately quitting every process beyond the first one with a misleading Parameter File1 Read Error message box. I totally believe anyone who claims that this makes SCVA more annoying than real hardware when composing new music. Retro gamers also dislike how Roland themselves no longer sells the 32-bit builds they used to offer for the first few versions. These old versions are now exclusively available through resellers, or on the seven seas.
But as far as the SC-88Pro emulation is concerned, there don't seem to be any technical reasons against it. There is a long thread over at VOGONS discussing all sorts of issues, but you have to dig quite deep to find any clear descriptions of bugs in SCVA's synth engine. Everything I found either only applies to the SC-55 emulation and not the SC-88Pro, was fixed by Roland in the meantime, or turned out to be a fixable bug in a MIDI file.

~~Nevertheless, Romantique Tp has a very negative opinion about SCVA, getting quite angry and defensive in this instance where someone favorably compared SCVA to their recordings.~~ Edit (2024-03-10): These days, Romantique Tp has a much more favorable opinion on SCVA as well.
8 years after their release, however, the community unanimously accepts the Romantique Tp recordings as the intended way to listen to ZUN's old MIDIs, so choosing Sound Canvas VA for our Shuusou Gyoku builds might be a bad idea purely for PR reasons. At best, people would slightly wonder why I intentionally went with the opposite of the accepted reference recordings, but at worst, this entire project could face a violent backlash…

But wait, we've already heard one obvious difference between the real SC-88Pro and Sound Canvas VA. Let's listen to the very first clip again:

Ha! You can clearly hear a panning echo in the real-hardware recording that is missing from the Sound Canvas VA rendering. That's an obvious case of a core system effect not being reproduced correctly. If even that's undeniably broken, who knows which other subtle bugs SCVA suffers from, right? Case closed, Romantique Tp was right all along, SCVA is trash, real hardware reigns supreme

Actually, let's look closer into this one. Panning delay effects like this are typically reverb-related, but General MIDI only specifies a single controller to specify the per-channel reverb level from 0 to 127. Any specific characteristics of the reverb therefore have to be configured using vendor-specific system-exclusive messages, or SysEx for short.
So it's down to one of the four SysEx messages at the beginning of the MIDI file:

Delta	Pulse	 Beat	Event
   +0	    0	0:000	SysEx(41 10 42 12 40 00 7F 00 41 F7)
 +240	  240	0:240	SysEx(41 10 42 12 40 01 30 14 7B F7)
 +120	  360	0:360	SysEx(41 10 42 12 40 01 33 0F 7D F7)
  +60	  420	0:420	SysEx(41 10 42 12 40 01 34 30 5B F7)

Since these byte strings represent Roland-specific instructions, we can't learn anything from a raw MIDI event dump alone here. No problem though, let's just load these files into some old MIDI sequencer that targeted Roland synths, open its MIDI event list, and then they will be automatically decoded into a human-readable representation…
…or at least that's what I expected. In Yamaha land, XGworks has done that for Yamaha's own XG SysEx messages ever since 1997:

Screenshot of the MIDI Event Viewer in Yamaha's XGworks, showing off its automatic XG SysEx decoding feature. — No configuration required. You can even edit the textual `Value1` representation and XGworks parses it back into the closest supported value!

But for Roland synths, there's… nothing similar? Seriously? 😶 Roland fanboys, how do you even live?! I mean, they are quick to recommend the typical bloated and sluggish big-name DAWs that take up multiple gigabytes of disk space, but none of the ones I tried seemed to have this feature. They can't have possibly been flinging around raw byte strings for the past 33 years?!
But once you look more into today's MIDI community, it becomes clear that this is exactly what they've been doing. Why else would so many people use the word complicated to describe Roland SysEx, or call it an old school/cryptic communication protocol in hexadecimal format? The latter is particularly hilarious because if you removed the word cryptic, this might as well describe all of MIDI, not just SysEx. Everything about this is a tooling issue, and Yamaha showed how easily it could have been solved. Instead, we get Sound Canvas experts, who should know more about the ecosystem than I do, making the incredible mental leap from "my DAW doesn't decode or easily generate SysEx" to "SysEx is antiquated" to "please just lift up these settings to the VST level and into my proprietary DAW's proprietary project format, that would be so much better"…

Thankfully that's not entirely true. After some more digging and configuration, I found a somewhat workable solution involving a comparatively modern sequencer called Domino:

Download either Domino's original Japanese version or the partial English translation. The .zip file on the release page contains a full standalone build.
Open the File → Preferences menu and associate your MIDI output device with a module map. This makes sense for SysEx encoding/generation since it can limit the options in the UI to what's actually available on your target hardware, but is also required for selecting the respective SysEx map into Domino's SysEx decoder. There is no technical reason for this because SC-88Pro SysEx messages can be uniquely identified by the three vendor, device, and model ID bytes that every message starts with, but would be too easy and user-friendly. The perception of SysEx being a black art must be upheld at all costs.

I've kept the garbled text of the partial translation to emphasize the sheer amount of jank involved in this entire process.
Load a MIDI file and let Domino "analyze" it:
Strangely enough, this will take quite a while – on my system, this analysis step runs at a speed of roughly 4.25 KB/s of MIDI data. Yes, kilobytes.
Unfortunately, "control change macro restoration" also seems to mean that you don't get to see any raw bytes when selecting the respective MIDI track in the UI, but at least we get what we were looking for:
…for the most part?
```
Pulse	Event
    0	SysEx(41 10 42 12 40 00 7F 00 41 F7)
  240	SysEx(41 10 42 12 40 01 30 14 7B F7)
  360	SysEx(41 10 42 12 40 01 33 0F 7D F7)
  420	SysEx(41 10 42 12 40 01 34 30 5B F7)
```

Alright, that's something we can work with. The GS Reset message is something that every Roland GS MIDI should start with, but it's immediately followed by a message that Domino failed to decode? The two subsequent reverb parameters make sense, but panning delays typically have more parameters than just a reverb level and time.
That unknown SysEx message shares much of the same bytes with the decoded ones though. So let's do what we maybe should have done all along, return to caveman, and check the SC-88Pro manual:

The relevant section from page 194. We can see how the address and value correspond to bytes 5-7 and 8 in the SysEx messages. Byte 9 is a checksum and byte 10 signals the end of the message.

And that's where we find what this particular issue boils down to. The missing SysEx message is clearly intended to be a Reverb Macro command, whose value can range from 0 to 7 inclusive on the SC-88Pro, but ZUN tries to specify Reverb Macro #14h, or 20 in decimal. The SC-88Pro manual does not specify what happens if a SysEx message wants to write an invalid value to a valid address, which means that we've firmly entered the territory of undefined behavior.
Edit (2024-03-10): Romantique Tp confirmed that the real SC-88Pro clamps these Reverb Macro IDs to the supported range of 0-7. Therefore, the appropriate course of action for guaranteeing the same sound on other Roland synths would be to fix the MIDI file and specify Reverb Macro #7 instead. But since this behavior remains technically undefined, we can still argue about ZUN's intention behind specifying the Reverb Macro like this:

Clearly, ZUN did want to specify a valid Reverb Macro, but made a typo when manually entering the SysEx byte string, as he was forced to do thanks to terrible tooling. He clearly liked the resulting sound though, so the track should still be preserved with the panning reverb intact.
Clearly, the typical behavior for MIDI synths is to ignore invalid and unsupported SysEx messages, because validating user input is an important characteristic of quality software. This is what SCVA does, and what we hear in its rendering is the default hall reverb with ZUN's level and time adjustments. Therefore, SCVA is right, and the fact that we get a panning delay on the real SC-88Pro is a bug in real hardware.
Clearly, ZUN did not care enough about the reverb to specify a valid Reverb Macro. Whether we get the default reverb or a panning delay is an irrelevant performance detail, and does intentionally not matter when it comes to the intended sound of this track – especially since these four SysEx messages are the full extent of Roland GS-specific sound design in this piece, and the rest of it only uses standard MIDI features.

In fact, 32 out of the 39 MIDIs across both of Shuusou Gyoku's soundtrack use this invalid Reverb Macro. The only ones that don't are

both versions of Gates' theme (天空アーミー), which use the equally invalid Reverb Macro #11,
both versions of Milia's theme (プリムローズシヴァ), which use Reverb Macro #0 (Room 1),
and, again, the three arranged MIDIs that ZUN released last (シルクロードアリス, 魔女達の舞踏会, and 二色蓮花蝶　～ Ancients), which feature a more detailed effect setup with custom chorus and EQ settings. In the case of Reimu's theme, these settings are even commented within the MIDI file.

And that's where this quest seemed to end, until Romantique Tp themselves came in and suggested that I take a closer look at the GS Advanced Editor, or GSAE for short.

The splash screen of GSAE version 4.01e. — Make sure to connect a MIDI input device before starting GSAE, or it will silently crash immediately after this splash screen. At least it accepts *any* controller, so this might just be a bug instead of the typical user-hostile kind of hardware dongle DRM that is pervasive in today's synth industry. 1999 would seem a bit too early for that, thankfully.

I was aware of this tool, but hadn't initially considered it because it's always described as just a SysEx generator/encoder. In fact, the very existence of such a tool made no sense to me at first, and seemed to prove my point that the usability of GS SysEx was wholly inferior to what I was used to in Yamaha land. Like, why not build at least a tiny and stripped-down MIDI sequencer around this functionality that would allow you to insert SC-88Pro-specific messages at any point within a sequence, and not just the beginning? I can see the need for such a tool in today's world of closed-source DAWs where hardware MIDI modules are niche and retro and are only kept alive by a small community of enthusiasts. But why would its developers guarantee that MIDI composers would have to hop between programs even back in 1997? I can only imagine that they saw how every just slightly advanced MIDI sequencer or DAW back then already used its own project format instead of raw Standard MIDI Files, and assumed that composers would therefore be program-hopping anyway?
However, GSAE does support the import of settings from a MIDI file and features a SysEx history window that decodes every newly processed Roland SysEx byte string, which is all I was looking for. So let's throw in that same MIDI and…

Screenshot of GSAE's SysEx history window,showing the results of sending a GS Reverb Macro #20 message — That's the result of sending just the single `F0 41 10 42 12 40 01 30 14 7B F7` message at the top.

Now that's some wild numbers. An equally invalid Reverb Character, and Reverb Level and Time values that even exceed their defined range of 0-127? Could it be that GSAE emulates the real-hardware response to invalid Reverb Macros here, and gives us the exact reverb setting we can hear in Romantique Tp's recording? This could even be the reason why GSAE is still used and recommended within today's Roland MIDI sequencing scene, and hasn't been supplanted by some more modern open-source tool written by the community.

In any case, these values have to come from somewhere, so let's reverse-engineer GSAE and figure out the logic behind them. Shoutout to IDR for being a great help with its automatic generation of IDC debug symbols for the Delphi standard library, and even including a few names of application-level widget class methods by reading Delphi-specific type information from the binary. This little sub-project made me also come around to appreciating Ghidra, whose decompiler and data type manager helped a lot and allowed me to find the relevant code section within just a few hours.
A~nd it turns out that the values all come from out-of-bounds accesses into arrays on the stack. If we combine 25, 235, and 132 back into a 32-bit value, we get 0x19EB84, which is the virtual address of the relevant function's stack frame base pointer.
But it gets even more hilarious: If you enable debug text output via Option → Other Options → SMF → Insert text events to setup measures and export these imported settings back into a MIDI file, GSAE not only retains these invalid Reverb Macro IDs, but stringifies them via a simple lookup into a hardcoded string pointer array, again without any bounds checks. The effects of this are roughly what you would expect:

Reverb Macro IDs between 8 and 27 simply insert wrong strings from adjacent string pointer arrays
Reverb Macro 28 crashes GSAE
Reverb Macro 64 causes GSAE to vomit 65,512 bytes of garbage into the MIDI file

In the end, we have Domino not decoding the Reverb Macro message, and GSAE, the premier SysEx tool for Roland synths, responding to it in even more undefined and clearly bugged ways than real hardware apparently does. That's two programs confirming that whatever ZUN intended was never supposed to work reliably. And while we still don't know exactly what these reverb parameters are supposed to be, these observations solve the mystery as far as I'm concerned, and solidify my personal opinion on the matter.

So what do we do now, and which version do we go with? Optimally, I'd offer both versions and turn this controversy into a personal choice so that everybody wins… and Ember2528 agreed and generously provided all the funding to make it happen. 💸
If you haven't picked your favorite yet, here are some final arguments:

The Romantique Tp recordings certainly have something going for them with their provenance of coming from real hardware, and the care that Romantique Tp put into manually recording every single track, warts and all. I wholeheartedly agree that preserving the raw sound of playing the MIDI files into the hardware without thinking about bugs or quirks is an important angle to take when it comes to preservation. It's good that these recordings exist – after all, you wouldn't know which musical elements you'd possibly be missing in an emulation if you have nothing to compare it to. Even the muffled sound in the half-speed clip above can be an argument in their favor, as the SC-88Pro's DAC operates at 32 kHz and you wouldn't expect any meaningful frequency content between 16,000 and 22,050 Hz to begin with. Any frequency content in that range that does remain in Romantique Tp's recording is simply 📝 rolled-off imaging noise added during the ADC's resampling process.
All this is why they are a definite improvement over kaorin's 2007 recordings of only the AST, which used to be the previous reference recordings within the community. Those had all of the same timing issues and more, in addition to being so excessively volume-boosted that 0.15% of the samples across the entire soundtrack ended up clipped. That's 6.25 seconds out of 68:39m being lost to pure digital noise.

Most importantly though: ZUN himself said that only the real SC-88Pro will play back these files as he intended them to sound. This quote is likely where the tagline of Romantique Tp's entire recording project came from in the first place:

> 全てのデエタはSC-88ProもしくはSC-8850（ロオランド社）にて最適に聴けるように調整してあります > それ以外の音源でも、作者の意図した音ではない場合があります。 — ZUN on 東方幻想的音楽, his old MIDI page

However. ZUN is not exactly known for accurately and carefully preserving the legacy of his series, or really doing anything beyond parading his old games as unobtainable showpieces at conventions. With all the issues we've seen, preferring real hardware is ultimately just that: an angle, and a preference. This is why I disagree with the heavy and uncritical advertising that is mainly responsible for elevating the Romantique Tp recordings to their current reference status within the community, especially if at least half of the alleged superiority of real hardware is founded on undefined behavior that can easily be fixed in the MIDI files themselves if people only bothered to look.

Here's where I stand: MIDI files are digital sheet music first and foremost, not an inferior version of tracker modules where the samples are sold separately. As such, the specific synth a MIDI file was written for is merely a secondary property of the composition – and even more so if the MIDI file contains little to nothing in terms of sound design and mostly restricts itself to the basic feature set of General MIDI. In turn, synth quirks and bugs are not a defined part of the composition either, unless they are clearly annotated and documented in the file itself. And most importantly: If the MIDI file specifies a certain timing and a recording fails to reproduce that timing, then that recording is not an accurate representation of the MIDI file.
In that regard, Sound Canvas VA is not only the closest alternative to the real thing, as a few people in the MIDI and retrogaming scene do have to admit, but superior to the real thing. I'll gladly take clarity and perfect timing accuracy in exchange for minor differences in effects, especially if the MIDI file does not explicitly and correctly define said effects to begin with. If I want a panning delay as part of the reverb, I add the respective and correct SysEx message to define one – and if I don't, I do not care about the reverb. You might still get a panning delay on a certain synth, and you might even prefer how it sounds, but it's ultimately a rendering artifact and not a consciously intended part of the composition. In that way, it's similar to the individual flavor a musician adds to a performance of a piece of classical music.
And as far as the differences in frequency response and resonant filters are concerned: In Yamaha land, these are exactly the main distinguishing factors between vintage WF-192XG sound cards (resembling the real SC-88Pro in these characteristics) and the S-YXG50 softsynth (resembling SCVA). Once I found out about that softsynth and how much clearer it sounded in comparison, I sold that old PCI sound card soon after.

In the interest of preservation though, there's still one more unexplored solution that could be the ideal middle ground between the two approaches:

Play the MIDIs through a real-hardware SC-88Pro again
Capture the actually observed system-exclusive settings that fall within the synth's supported and documented ranges
Insert them back into the MIDI file, creating a new bugfixed version
Re-record that bugfixed version through Sound Canvas VA

Edit (2024-03-10): And since Romantique Tp has confirmed what exactly happens on real hardware, I'm going to do exactly that. These bugfixed Sound Canvas VA renderings will be a free bonus of the single next Shuusou Gyoku push, and will add another angle to the preservation of these soundtracks. In the meantime though, the Sound Canvas VA packs will sound like they do in the preview videos above.

Or, you know… Maybe none of this actually matters. Here's beatMARIO streaming some Shuusou Gyoku gameplay using what looks like a real-hardware SC-8850, which plays these MIDIs with occasionally noticeably different instrument patches and no panning delay in the name registration theme, and he still enjoyed every second of it. Imagine undefined SysEx behavior not even being consistent within the same family of Roland synths… nah, I'm done arguing, let's get back to the actual work and cut some loops.

Just to be clear: I'm not suggesting that Romantique Tp should have been the one to cut their recordings into loops, or even just the one who defined where the loop points are supposed to be. On the surface, this seems to be a non-issue, and you'd just pick a point wherever each track appears to loop, right? But with 39 MIDIs to cut and all the financial support from Ember2528, it made sense to also solve this problem more thoroughly, and algorithmically detect provably correct loop points for all of these files. Who knows, maybe we even find some surprises that make it all worth it?
This is the algorithm I came up with:

At a basic level, we loop over the list of MIDI events and return the earliest and longest subrange that is immediately followed by an identical copy.
MIDI players, however, need loop point definitions that use MIDI pulse units rather than event list indices. This is especially necessary for multi-track/SMF Type 1 sequences, which would otherwise require one loop start/end index pair per track, and then it still wouldn't work because some of the tracks might not even have an event at the loop start/end point. This requires the detection algorithm and the player to agree on how to map event indices to time points and back, and simply going for the first event of each pulse (i.e., any event with a nonzero delta time) makes the most sense here. In turn, we can skip any potential start or end events that have a delta time of 0, speeding up the algorithm significantly for typical compositions with a high degree of polyphony.
Naively considering just the raw MIDI events works for MIDI playback. But as soon as we want to cut a recording based on the detected loop points, we need to account for the fact that MIDI playback is inherently stateful. Each of the 16 channels at the protocol level features at least the 128 continuous controllers (CCs) with a 7-bit state, the 14-bit pitch bend controller, and the 7-bit instrument program value, in addition to the global tempo of the piece. As a result, two ranges of events might look identical, but can still sound differently if the events before the first range changed one piece of state which is then only touched again near the end of that range. This requires us to track the full MIDI state at both the start and end of a loop, and reject any potential loop that differs in these states:
```
Delta	Pulse	  Beat	Event
   +0	 1200	 2:240	Controller { CC 7, value  50 }

+240	 1440	 3:000	NoteOn { Key 60 }
+240	 1680	 3:240	NoteOn { Key 65 }
+240	 1920	 4:000	NoteOn { Key 67 }
+240	 2160	 4:240	NoteOn { Key 70 }
+240	 2400	 5:000	Controller { CC 7, value 100 }
  +0	 2400	 5:000	NoteOn { Key 67 }
+240	 2640	 5:240	NoteOn { Key 65 }

+240	 2880	 6:000	NoteOn { Key 60 }
+240	 3120	 6:240	NoteOn { Key 65 }
+240	 3360	 7:000	NoteOn { Key 67 }
+240	 3600	 7:240	NoteOn { Key 70 }
+240	 3840	 8:000	Controller { CC 7, value 100 }
  +0	 3840	 8:000	NoteOn { Key 67 }
+240	 4080	 8:240	NoteOn { Key 65 }

+240	 4320	 9:000	NoteOn { Key 60 }
+240	 4560	 9:240	NoteOn { Key 65 }
+240	 4800	10:000	NoteOn { Key 67 }
+240	 5040	10:240	NoteOn { Key 70 }
+240	 5280	11:000	Controller { CC 7, value 100 }
  +0	 5280	11:000	NoteOn { Key 67 }
+240	 5520	11:240	NoteOn { Key 65 }
```
In this example, a naive event-level scan would detect a loop between beats 3 and 6 as the same events are immediately repeated between beats 6 and 9. However, the piece starts with the first four notes at a channel volume of 50, which is only set to its later value of 100 on beat 5. Therefore, the actual loop ranges from beat 5 to 8. In turn, the piece needed to be at least 11 beats long to include the full second copy of the looped events and prove the loop as such.
This check can be a bit too strict in some cases, though. A channel might start with one of its CCs at a specific value but then change the same CC to a different value at a later point before playing the first note. In such a case, the detected loop would be delayed to the second CC change even though the initial CC value has no impact on the sound. By filtering these redundant CC changes, we get to move the loop start point of a few tracks (original 夢機械　～ Innocent Power and arranged 魔法少女十字軍) back by a few seconds, to the position you'd expect.
Finally, we reject any overlong loops that themselves fully consist of multiple successive copies of the first N events.
Shuusou Gyoku's original MIDI files hide the original game's lack of MIDI looping by simply duplicating the looping sections enough times so that a typical player won't notice. The algorithm we have so far, however, would return a much longer loop if a MIDI file contains more than three successive copies of a looping section. The original version of ハーセルヴズ in particular repeats its 8 looping bars a total of 15 times before the MIDI ends, and this condition is necessary to detect the actual 8-bar loop instead of a 56-bar one.

Of course, this algorithm isn't perfect and won't work for every MIDI file out there. It doesn't consider things like differently ordered events within the same MIDI pulse, (non-)registered parameter numbers, or the effect that SysEx messages can have on the state of individual channels. The latter would require the general SysEx decoding logic that I would have liked to have for the research above… actually, let's add an issue and add the project to the order form. I'd really like to see a comprehensive open-source cross-vendor SysEx decoder library in my lifetime.

As for the implementation, I was happy to write some Rust again for a change, as it's a great fit for these standalone greenfield command-line tools that don't have to directly interact with the legacy C++ code bases that this project usually deals with. It's even better if the foundational functionality is not just available in a crate, but in four, with the community already having gone through multiple iterations to arrive at a tried and tested winner. Who knows, maybe I even get to rewrite this website in it one day? Just for the sheer meme value of doing so, of course.
I also enjoyed this a lot from a technical point of view:

You might think that Rust's typical safety guarantees don't matter for the problem at hand. But then you accidentally write -= instead of += for a u32 that starts out at 0, and Rust immediately panics instead of silently underflowing to u32::MAX. This must have saved me at least 5 minutes of debugging the resulting logic error.
As it turns out, my loop detection algorithm is embarrassingly parallel. You might initially think about it in a sequential way because we always want the earliest occurrence of the longest repeating section of MIDI events, which means that each new loop candidate further into the track has to be longer than the previous one. But since we always iterate over the entire MIDI, it makes perfect sense to divide and conquer the problem. Let's split the list of possible loop end points into equal chunks, scan them all in parallel for the earliest and longest loop within that chunk, and then pick the earliest and longest loop among those intermediate results as the final one. In Rust, you don't even have to think much about the chunks, as all of that can be easily done by replacing the iteration with Rayon's parallel fold and adding a reduce() with the same condition for the final step. This sped up the algorithm by exactly the number of cores in my system.

This algorithm works well for the long MIDI files of Shuusou Gyoku's OST that all contain multiple duplicates of their loop section, but it quickly reaches its limit with the AST. Following the classic two-loop + fade-out format, that soundtrack was meant to be played back in generic MIDI players, and not to actually be put back into the game in looped form. Since the loop algorithm did, in fact, find inconsistencies even in the OST, two copies of the apparent loop are sometimes not enough to prove cases where the actual loop ends much later than you think it does. In a few cases, it would be enough to simply remove all volume change events from the fade-out to prove the actual loop, but in others, the algorithm would need MIDI event data far past the end of the fade-out.

However, just giving up and not looping any of these tracks would be equally unfortunate. So how about shifting the question, from what's the best loop in this MIDI file to what's the best loop if the MIDI didn't fade out and instead repeated its apparent second loop a third time? As long as the detected loop in such a pre-processed file ends before the repeated range, it's still a valid loop in terms of the unmodified original.
Ideally, we want to do this pre-processing programmatically with the same Rust library instead of manually editing the MIDI. Many sequencers (and especially XGworks) apply significant changes to a MIDI file's internal structure when saving its internal representation back to a MIDI file, which might even mess with our loop algorithm. So it would be very nice to have a more trustworthy tool that applies only the edit we actually want, and perfectly retains the rest of the MIDI.

And that's how this sub-project turned into a small suite of command-line MIDI operations in the classic Unix filter/pipeline style: Each command reads a MIDI file from stdin, transforms it, and outputs text or the resulting MIDI file on stdout. This way, we gain maximum transparency and reproducibility as I can document the unique pre-processing steps for each AST track by simply providing the command lines. And sure, we're re-encoding and re-decoding the full MIDI sequence at every step along such a pipeline, but computers are fast, Rust and the midly library in particular are ⚡ blazingly fast ⚡, and the usability benefits of this pipeline model far outweigh any theoretical performance drops.
Here's the full list of commands that made it into the resulting mly tool:

cut: Extremely basic removal of MIDI events within a certain range.
dump: Dumps all MIDI events into a textual table. All event lists in this blog post are based on this output.
duration: Shows the duration of a MIDI file in pulses, beats, seconds, and PCM samples.
filter-note: Removes all Note On events within a certain range, retaining all other events. This allows us to generate separate intro and loop MIDIs, whose renderings we can then splice back into a single loopable waveform with no discontinuities, which is not guaranteed when rendering a single MIDI file. This provides the last missing piece needed for rendering perfect, sample-accurate loops through Sound Canvas VA.
loop-find: The loop detection algorithm described above.
loop-unfold: Duplicates MIDI events from a given point to the end of the track. A budget solution for the problem of creating synthetic loops – arbitrary copying of arbitrary subranges to arbitrary destinations would have been undeniably nicer, but also much more complex, and I didn't need that full flexibility for the task at hand.
smf0: Flattening multi-track/SMF Type 1 MIDI sequences into single-track/SMF Type 0 ones. Having this conversion as a distinct operation in our toolset allows other operations to exclusively support SMF Type 0 if a Type 1 implementation would either take significant additional effort or just duplicate the Type 0 flattening algorithm. This group of operations includes loop-find, cut, and even the real-time output for duration because tempo events can theoretically occur on any track.

This feature set should strike a good balance between not spending too much of the Shuusou Gyoku budget on tangential problems, but still offering a decent solution for the problem at hand. As a counterexample, the obvious killer feature – deserializing a dump back into a Standard MIDI File – would have gone way past the budget. While there are crates that free you from the need to write manual parsing code for basic data structures, they would instead require a lot of attribute boilerplate – and if the library that provided the structures doesn't already come with these attributes, you now have to duplicate all the structures, and convert back and forth between the original structures and your copies. Not to mention that we'd still have to write code for the high-level structure of the dump output…

If we put it all together, this is what we can do:

$ <ssg_02.mid mly loop-find
Best loop in note space: 4 events (between event #[117, 121[ and [121, 125[)
First note: event    71 / pulse    960 / beat   2:000 / 0:00:800m
Loop start: event   117 / pulse   1680 / beat   3:240 / 0:01:400m
  Loop end: event   121 / pulse   1920 / beat   4:000 / 0:01:600m

$ <ssg_02.mid mly cut 466: | mly loop-unfold 240: | mly -r 44100 loop-find
Track #0: Removing events #[16439, 19881[
Track #0: Repeating events #[8344, 16439[ at the end of the sequence
Best loop in note space: 8095 events (between event #[5625, 13720[ and [13720, 21815[)
First note: event    71 / pulse    960 / beat   2:000 / 0:00:800m
Loop start: event  5625 / pulse  75361 / beat 157:001 / 1:03:531m
  Loop end: event 13720 / pulse 183841 / beat 383:001 / 2:34:726m

Best loop in recording space:  8095 events (between event #[5709, 13804[ and [13804, 21899[)
First note: event    71 / pulse    960 / beat   2:000 / 0:00:800m / sample    35280.00
Loop start: event  5709 / pulse  77280 / beat 161:000 / 1:05:163m / sample  2873667.66
  Loop end: event 13804 / pulse 185760 / beat 387:000 / 2:36:358m / sample  6895375.27

Translation:

The best loop found in the raw MIDI file spans 4 events and 200 milliseconds. Clearly, this is not the loop we're looking for.
Let's cut off all events from the start of the fade-out to the end, do a loop-unfold copy of all events from the position during the apparent second loop that corresponds to where the fade-out started, and try looking for a loop in that modified MIDI.
The resulting loop is 1:31m long, which is exactly what we were hoping to find.
- The note space loop represents the earliest possible event range with equivalent per-channel controller and pitch bend state at both ends. This loop is only appropriate for MIDI players, as its bounds can fall into the middle of notes that are played with a different channel state at the start and end of the loop. This is why it doesn't show any sample positions.
- The recording space loop ensures that this doesn't happen. It's also always placed on a Note On event with non-zero velocity, which eases the splicing of separate filter-note recordings. This way, it's enough to remove leading silence from the loop part and mix it exactly at the indicated sample position.
- The detected loop is also nowhere close to the cut point at beat 466, matching our condition for validity. All events within the loop came from ZUN's original composition, and the cut/loop-unfold combo merely provided the remaining 63% of events necessary to prove this loop as such.

So, where are these loop quirks that justify why some of these audio files are longer than you'd think they should be? Just listing them as text wouldn't really communicate just how minor these are. It would be much nicer to visualize them in a way that highlights the exact inconsistencies within a fixed range of MIDI measures. Screenshots of MIDI sequencer or DAW windows won't capture these aspects all too well because these programs are geared toward fine-grained editing of single tracks, not visualization of details across all channels.

Screenshot of the first 8 measures of Shuusou Gyoku's Stage 1 theme (フォルスストロベリー) in its OST version, as visualized by REAPER's piano roll — REAPER's piano roll nicely snaps to a certain range, but good luck picking out the individual lines from the single volume lane at the bottom of the screen, or spotting a 7-point difference. Not to mention that CC #11 (Expression) makes up an equal part of a channel's final perceived volume, which is the metric we'd *actually* want to visualize.

Typical MIDI visualizers, however, are on the complete opposite end of the spectrum. In recent years, MIDI visualization has become synonymous with the typical Synthesia style of YouTube videos with a big keyboard at the bottom, note bars flying in from the top, and optional fancy effects once those notes hit the top of the keyboard. The Black MIDI community has been churning out tons of identically looking MIDI visualizers in recent years that mainly seem to differ in the programming language they're written in, and in how well they can cope with the blackest of black MIDIs.
Thankfully, most of these visualizers are open-source and have small and manageable codebases. The project with the most GitHub stars and the most generic name seemed to be the best starting point for hacking in the missing features, despite using GLSL shaders which I had no prior experience with. It was long overdue that I did something with GLSL though – it added a nice educational aspect to these hacks, and it still was easier than deciphering whatever the fastest and hyper-optimized Rust visualizer is doing.
Still, this visualizer needed a total of 18 small features and bugfixes to be actually usable for demonstrating Shuusou Gyoku's loop quirks. As such, these hacks turned into yet another tangential sub-project that could have easily consumed another two pushes if I cleaned up the code and published the result. But that would have really gone way past the budget for something that people might not even care about. So here's what we're going to do:

I've added this MIDI visualizer as a new goal to the order form. This goal is eligible for microtransactions, so you don't have to fund a full push to see the first changes committed and released.
The upstream project seems to have been abandoned recently, which is the perfect excuse for not even trying to merge in my sweeping changes with a series of pull requests. The code sure needs a lot of cleanup and deduplication, and especially a more build system-friendly way of embedding its shader source code.
Every backer who supports this goal with at least 0.1 pushes or microtransactions will get a Windows binary with my current hacked-in changes as a preview, immediately after the purchase. Shoutout to the MIT license for letting me do this 😛
As usual, once the code is done, the final cleaned-up version will be available for free for everyone, in both source code and binary release form.

Alright then! Here's how to read the visualizations:

The transparency of each note represents its velocity multiplied by the channel volume and expression. To spot volume inconsistencies, you'd compare the opacity of equivalent notes in the two ranges.
The X-axis of these visualizations uses linear/real time, so the width of each measure represents the exact time it takes to be played relative to the other measures in the visualized range. To spot tempo inconsistencies, you'd compare the distance between the bar lines.
Notes that are duplicated on two or more channels may be colored differently in the loop start and end views. These are rendering order inconsistencies and don't communicate anything about the MIDI.

Stage 1 theme (フォルスストロベリー), original and arranged version: The string and harmonica channels are slightly louder on the apparent first loop than on the others.

Apparent loop:
0:01m – 1:31m

Actual loop:
1:04m – 2:34m
Mei and Mai's theme (ディザストラスジェミニ), arranged version: The one and only quirk that's caused by different notes – the first loop has an E♭ on the slap bass channel in measure 32, but the second loop has a G♭ in the corresponding measure 72.

Apparent loop:
0:01m – 1:02m

Actual loop:
0:50m – 1:51m
Stage 3 theme (華の幻想紅夢の宙), original and arranged version: The trumpet channel starts out panned to the center of the stereo field (64), before being left-panned by 25% (48) at 1:04m, where it stays for the rest of the track.

Apparent loop:
0:01m – 1:29m

Actual loop:
1:04m – 2:32m

I didn't come up with a good way of visualizing panning in a 2D plane, so you have to trust your ears with this one.
Marie's theme (機械サーカス　～ Reverie), arranged version: Every apparent loop modulates up by a semitone 16 measures before it ends, and remains in that new key at the start of the next loop, so the piece technically doesn't loop at all. The original stays in G♯m throughout.
Stage 5 theme (カナベラルの夢幻少女), original version: The ritardando near the supposed end of the first loop drops from 145 BPM to 118 BPM, but only to 129 BPM in all further loops.

Apparent loop:
0:01m – 1:39m

Actual loop:
1:33m – 3:11m

Yup, that means that the intro part technically makes almost up the entire apparent loop. ZUN replaced the ritardando with instant tempo changes in the arranged version, which moves the loop to its expected place at the start of the track.

The loop start and end points are in the respective next measure past this range.
Stage 6 theme (アンティークテラー), arranged version: The string channel starts out with the maximum expression of 127, but then only goes up to 120 after some fading notes later in the piece, where it stays for the beginning of the second loop.

Apparent loop:
0:01m – 1:53m

Actual loop:
0:13m – 2:05m

Same here.
VIVIT-captured-'s first theme (夢機械　～ Innocent Power), arranged version: Has a unique ending section that starts in Gm and then modulates through Em and Fm before it fades out on F♯m.
VIVIT-captured-'s second theme (幻想科学～ Doll's Phantom), original and arranged version: Another fade-related 127 vs. 120 expression inconsistency, this time on the orange square channel.

Apparent loop:
0:01m – 1:32m

Actual loop:
1:03m – 2:34m
VIVIT-captured-'s third theme (少女神性　～ Pandora's Box), original and arranged version: Another tempo inconsistency: A slightly differently shaped ritardando before the bell tree hit in the supposed first loop.

Apparent loop:

0:40m – 1:57m (original), 0:41m – 1:59m (arranged)

Actual loop:

1:17m – 2:34m (original), 1:18m – 2:37m (arranged)
Marisa's theme (魔女達の舞踏会), arranged version: Has a unique 8-bar ending section that is first played in Cm and then loops in C♯m while fading out.
Ending theme (ハーセルヴズ), arranged version: Probably the best-known one out of these, and I'm talking of course about the beautiful ending section. I'm making the executive decision to not loop this track in-game, and letting it fade to silence instead.

Before we package up these looped soundtracks, let's take a quick look at how they would be shown off in the Music Room. The Seihou Music Rooms carry over the per-channel keyboards from TH05, add the current per-channel volume, expression, and pan pot values, and top it off with a fake spectrum analyzer. All of these visualizations rely on MIDI data, and the Music Room would feel very dull and boring without them. Just look at Kioh Gyoku, whose Music Room basically turns into a still image in WAVE mode.
Retaining these visualizations even when playing waveform BGM was very important for me, and not just because it would make for a unique high-quality feature that would break new ground. It can also double as proof that the waveform versions are, in fact, in perfect sync with both the MIDIs they are based on, and, by extension, the respective stage scripts.
However, this would require the game to process the MIDIs and update the internal visualization state without simultaneously playing them back through the WinMM / MME / midiOut*() API. And just like graphics and text rendering, Shuusou Gyoku's original code came with zero architectural separation between platform-independent processing logic and platform-specific playback…

So I accidentally rewrote almost the entire MIDI code to achieve said separation. This also provided a great occasion to modernize this code and add some much-needed robustness for potential MIDI mods, while retaining the original code's approach of iterating over raw SMF byte streams. It might all have been very excessive for a delivery that was supposed to be just about waveform BGM support, but on the plus side, MIDI output is now portable to any other system's MIDI API as well.

Surprisingly though, it was Shuusou Gyoku's original MIDI timing that quickly turned out to be rather inaccurate, and not the waveforms. The exact numbers vary depending on the piece, but the game played back every MIDI about 1% slower than notated, adding about 2 or 3 seconds to their total playback time after 5 minutes. Tempo changes in particular were the biggest causes of desynchronizations with the waveforms…
To understand how this can happen to begin with, we have to look closer at how you're supposed to use the midiOut*() API. This API is as low-level as it gets, only covering the transmission of a single MIDI message to the selected output device right now. There is no concept of note timing at this low level, so it's completely up to the program to parse delta times and tempo change events out of the MIDI file and correctly time the calls to this API for each MIDI message. With all the code that runs between the API and the actual renderer of the synth for every single message, the resulting timing can only ever be an approximation of the MIDI file. This doesn't really matter for the timescales and polyphony levels of typical music because, again, computers are fast, but such an API is fundamentally unsuitable for accurately playing back even just a moderately complex million-note Black MIDI.

Shuusou Gyoku handles this required manual timing in the simplest possible way: It runs a MIDI processing function (Mid_Proc() in the code) at an interval of 10 ms, which processes and instantly sends out all MIDI events that have occurred at any point within the last 10 ms, maintaining merely their order. This explains not only why the original game incremented its MIDI TIMER by multiples of 10, but also the infamous missing drums when playing the soundtrack through the Microsoft GS Wavetable Synth:

ZUN reduced all drum notes to the minimum possible length allowed by the 480 PPQN pulse resolution of these MIDI files.
In regular music notation, this corresponds to ¹/₁₉₂₀th notes.
While the exact real-time length in purely mathematical terms depends on the tempo of a piece, it only has to be ≥13 BPM for a ¹/₁₉₂₀th note to be shorter than 10 ms.
Therefore, the higher the BPM, the higher the chance that both a drum note's Note On and Note Off messages are sent within the same call to Mid_Proc(), with the respective two midiOut*() API calls only being at best a two-digit number of microseconds apart.
So it only makes sense why cheap MIDI synths that don't even respond to reverb or release time messages completely drop any note with such a short length. After all, at a sampling rate of 44,100 Hz, a note would have to be at least 22.7 µs long to be represented by even a single PCM sample.
This also extends to the visualizations above, and was the reason why I chose to render all drum notes as fixed-size diamonds. Otherwise, they would barely be visible.

But while sending MIDI events in such quantized chunks might not be perfect, it can't be the cause behind multi-second playback slowdowns. Instead, this issue has to boil down to the way Shuusou Gyoku times each individual message, and specifically how it converts between MIDI pulse units and real-time (milli)seconds. pbg's original MIDI code chose to do this in an equally confusing and inaccurate way: it kept two counters that tracked the current MIDI pulse before and after the latest tempo change, used the value of the latter counter to decide which events to process, and only added the pulse equivalent of 10 ms to this counter at the end of Mid_Proc() in the then current tempo. The commit message for my rewritten algorithm details the problems with this approach using nice ASCII art in case you're interested, but in short, the main problem lies in how the single final addition can only consider a single tempo change within each call to Mid_Proc(). If a MIDI file contains tempo ramps with less than 10 ms between each different tempo, the original game would only use the last of these tempo values as the basis for converting the entire 10 ms back into MIDI pulses. Not to mention that maybe MIDI pulses aren't the best unit in a game that still 📝 treats the FPU as lava and doesn't use any fixed-point means of increasing the resolution of the 10 ms→pulse division either…

On the contrary, it's much more accurate to immediately convert every encountered MIDI delta time to a real-time quantity and use that unit for event timing, especially if we want to restrict ourselves to integer math. Signed 64-bit integers are enough to fit the product of the slowest possible MIDI tempo ((2²⁴ - 1) µs per quarter note) and the highest possible MIDI delta time (2²⁸ - 1) at nanosecond precision (10³), with one bit to spare. Then, we arrive at a much simpler timing algorithm:

Each simultaneously playing track gets a next event timer, starting out at 0
When looking at the next event, add the converted nanosecond value of its delta time to this timer
Subtract the equivalent of 10 ms from each track's timer at the beginning of the processing function
As long as the timer is ≤0, process and send the next message

The additive nature of this timer not only naturally allows more than one event to happen within a single Mid_Proc() call, but also averages out any minor timing inconsistencies across the length of a track.

This new algorithm did improve the overall timing accuracy, but only barely, shaving off just ≈100 ms of the total duration. Turns out that the main source behind the slowness was hiding somewhere else entirely, in the single line that deserializes tempo values from MIDI's big-endian representation into the native integer format:

assert(length_of_tempo_message == 3);
uint32_t tempo = 0;
for(int i = 0; i < length_of_tempo_message; i++) {
-	tempo += ((tempo << 8) + (*track_data++));
+	tempo  = ((tempo << 8) + (*track_data++));
}

Yup – the original code performed two additions per byte, which incorrectly added the interim value at every byte to the final result, and yielded a tempo that is ≈0.8% / ≈1 BPM slower than notated in the MIDI file, matching the number we were looking for. That's why the |/OR operator is the safer one to use in such a bit-twiddling context…
But now I'm curious. This is such a tiny bug that is bound to remain unnoticed until someone compares the game's MIDI output to another renderer. It must have certainly made it into other games whose MIDI code is based on Shuusou Gyoku's, or that pbg was involved with. And sure enough, not only did this bug survive Kioh Gyoku's OOP refactoring, but it even traveled into Windows Touhou, where it remained in every single game that supported MIDI playback. Now we know for a fact that pbg's Program Support role in the TH06 credits involved sharing ready-made, finished code with ZUN:

Disassembly of the Shuusou Gyoku MIDI tempo deserialization bug in TH06 — The broken tempo deserialization in the respective latest full versions of TH06 through TH10. And yes, that's TH10 – even though TH09's trial version was the last game to ship MIDI versions of its soundtrack, TH10 still contained all of pbg's MIDI code that originated back in Shuusou Gyoku, before TH11 finally removed it.
Amusingly, ZUN's compiler even started optimizing the combination of left-shifting and addition to a multiplication with 257 for TH09, which even sort of highlights this bug if you're used to reading x86 ASM.

Disassembly of the Shuusou Gyoku MIDI tempo deserialization bug in TH07 — The broken tempo deserialization in the respective latest full versions of TH06 through TH10. And yes, that's TH10 – even though TH09's trial version was the last game to ship MIDI versions of its soundtrack, TH10 still contained all of pbg's MIDI code that originated back in Shuusou Gyoku, before TH11 finally removed it.
Amusingly, ZUN's compiler even started optimizing the combination of left-shifting and addition to a multiplication with 257 for TH09, which even sort of highlights this bug if you're used to reading x86 ASM.

That leaves support for MIDI loop points as the only missing feature for syncing MIDI data with a looping waveform track. While it didn't require all too much code, pbg's original zero-copy approach of iterating over raw MIDI data definitely injected a lot of complexity into the required branches. Multi-track/SMF Type 1 files require quite a bit of extra thought to correctly calculate delta times across loop boundaries that reach past the end of the respective track, while still allowing the real-time delta values to be resynchronized at tempo changes within the loop – and yes, 3 of ZUN's 19 arranged MIDI files actually do use more than one track, so this wasn't just about maximizing MIDI compatibility for mods. I stuck to the original approach mostly as a challenge and to prove that it's possible without first parsing the entire MIDI sequence into a friendlier internal representation, but I absolutely do not recommend this to anyone else.

After hardcoding the loop points detected by mly into the binary, we only need to call Mid_Proc() once per frame in the Music Room and pass the frame delta time instead of the 10 ms constant. And then, we get this:

The MIDI TIMER now shows off the arguably more interesting current MIDI pulse value rather than just formatting the PASSED TIME in milliseconds. Ironically, displaying this value in a constantly counting way takes more effort now – the new nanosecond-based timing code doesn't use any measure of total MIDI pulses anymore, and they don't naturally fall out of the algorithm either. Instead, the code remembers the total pulse value of the last event it processed and adds the real-time duration that has passed since, similar to the original timing algorithm.
This naturally causes the timer to jump from the loop end pulse to the loop start pulse, proving that Mid_Proc() is in fact looping the sequence.

Alright, now we know what to package:

We're going to have 8 BGM packs for each permutation of soundtrack (OST / AST), sound source (Romantique Tp / Sound Canvas VA), and codec (FLAC / Vorbis), making up 1.15 GiB of music data in total.
When looking at the package names, you will notice that I don't particularly highlight the FLAC versions as lossless. And for good reason – the Romantique Tp recordings had dithering and noise shaping applied to them, and the Sound Canvas VA versions will necessarily have to be volume-normalized and quantized to 16-bit during the conversion to FLAC. If we wanted a BGM pack with the actual raw Sound Canvas VA output, we'd have to implement WavPack support, which is the only lossless codec that supports 32-bit float – and even that codec could only compress these files down to 14 MiB per minute of music, or 508 MB for the entire original soundtrack. That's 1.4× the size of an equivalent thbgm.dat!
The whole packaging process will be complex enough to warrant a build system. I'd also like to generate an extensive README file for each package, not least to describe the Sound Canvas VA rendering and loop-cutting process in complete detail.
The AST packs need to bundle the MIDI files from ZUN's site for Music Room visualization. We might as well add a 9^th MIDI-only AST pack then, as it will naturally fall out of the packaging pipeline anyway. Some people sure love their MIDI synths, after all.
The OST packs can fall back on the original game's MIDI files from MUSIC.DAT for their Music Room visualization, so there's no need to bundle those and infringe copyright. Ironically, the game will still require a MUSIC.DAT even if you use a BGM pack, if only for the one number in that file that says that Shuusou Gyoku's soundtrack consists of 20 tracks in total.
ZUN didn't arrange タイトルドメイド, so we need to copy the OST version recorded with the respective sound source into the AST pack.

Unfortunately, we still haven't reached the end of the complications and weird issues that haunt Shuusou Gyoku's music:

The original game reads the in-game track title directly out of the first Sequence Name event of the playing MIDI file. The waveform equivalent would be the Vorbis comment TITLE tag, which therefore should exactly match the original track's title, down to the exact placement of whitespace. As usual, if I emphasize minor things like this, it's not without reason: 幻想科学～ Doll's Phantom inconsistently uses halfwidth spaces at both sides of the ～, and wouldn't fit into the Music Room's limited space otherwise.
However, the AST MIDI files jam a bunch of other metadata into their Sequence Names, roughly following the format
```
【 $title 】 from 秋霜玉  for sc88Pro comp.ZUN
```
The track titles should definitely not appear in this format in-game, but how do we get rid of this format without hardcoding either the names or the magic to parse the names out of this format?

The absolute state of GS SysEx tooling rears its ugly head one final time in three of the AST MIDIs, which for some reason are missing the Roland vendor prefix byte in all of their SysEx messages and are therefore undeniably bugged. There even seemed to be another SysEx-related bug which Romantique Tp explained away, but not this one:

`ssg_04.mid`

0:000	SysEx(   10 42 12 40 00 7F 00 41 F7)
0:240	SysEx(   10 42 12 40 01 30 14 7B F7)
0:360	SysEx(   10 42 12 40 01 33 14 78 F7)
0:420	SysEx(   10 42 12 40 01 34 50 3B F7)

`ssg_05.mid`

0:000	SysEx(   10 42 12 40 00 7F 00 41 F7)
0:240	SysEx(   10 42 12 40 01 30 14 7B F7)
0:360	SysEx(   10 42 12 40 01 33 00 0C F7)
0:420	SysEx(   10 42 12 40 01 34 14 77 F7)

`ssg_10.mid`

0:000	SysEx(   10 42 12 40 00 7F 00 41 F7)
0:240	SysEx(   10 42 12 40 01 30 14 7B F7)
0:360	SysEx(   10 42 12 40 01 33 00 0C F7)
0:420	SysEx(   10 42 12 40 01 34 60 2B F7)

`ssg_04.mid`

0:000	SysEx(41 10 42 12 40 00 7F 00 41 F7)	GS Reset
0:240	SysEx(41 10 42 12 40 01 30 14 7B F7)	Reverb Macro #20
0:360	SysEx(41 10 42 12 40 01 33 14 78 F7)	Reverb Level 20
0:420	SysEx(41 10 42 12 40 01 34 50 3B F7)	Reverb Time 80

`ssg_05.mid`

0:000	SysEx(41 10 42 12 40 00 7F 00 41 F7)	GS Reset
0:240	SysEx(41 10 42 12 40 01 30 14 7B F7)	Reverb Macro #20
0:360	SysEx(41 10 42 12 40 01 33 00 0C F7)	Reverb Level 0
0:420	SysEx(41 10 42 12 40 01 34 14 77 F7)	Reverb Time 20

`ssg_10.mid`

0:000	SysEx(41 10 42 12 40 00 7F 00 41 F7)	GS Reset
0:240	SysEx(41 10 42 12 40 01 30 14 7B F7)	Reverb Macro #20
0:360	SysEx(41 10 42 12 40 01 33 00 0C F7)	Reverb Level 0
0:420	SysEx(41 10 42 12 40 01 34 60 2B F7)	Reverb Time 96

The irony of using invalid Reverb Macros within already invalid SysEx messages is not lost on me.

This is something we should fix even before running these files through Sound Canvas VA in order to render these with the reverb settings that ZUN clearly (and, for once, unironically) intended.

For perfect preservation of the original BGM/gameplay synchronicity, it makes sense for the waveform versions to retain the leading 1 or 2 beats of silence that the original MIDI files use for their SysEx setup. While some of the AST tracks use a slightly different tempo compared to their OST counterparts, they would still be largely in sync as ZUN didn't rearrange the layout of their setup area… except for, once again, the three tracks used in the Extra Stage. Marisa's and Reimu's boss themes aren't too bad with their 4 beats of setup, but シルクロードアリス takes the cake with a whopping 12 beats of leading silence. That's 5 seconds from the start of the Extra Stage to the first note you'd hear. 🐌

2) and 4) could theoretically be worked around in Shuusou Gyoku's MIDI code, but there's no way around editing the MIDI files themselves as far as 3) is concerned. Thus, it makes sense to apply all of the workarounds to the AST MIDIs as part of the BGM build process – parsing the titles out of the 【brackets】, inserting the Roland vendor prefix byte where necessary, and compressing the setup bars in the Extra Stage themes to match their OST counterparts. Adding any hidden magic to the MIDI code would only have needlessly increased complexity and/or annoyed some modder in the future who would then have to work around it.
Ideally, these edits would involve taking the mly dump output, performing the necessary replacements at a plaintext level, and rebuilding the result back into a MIDI file, bu~t we're unfortunately missing the latter feature. Luckily, someone else had the same idea 13 years ago and wrote a tool in C that does exactly what we need. Getting it to compile in 2024 only required fixing a typical C thing… why are students and boomers defending this antique of a language again? 🙄

The single most glaring issue, however, is the drastic difference in volume between the individual tracks in both soundtracks. While Romantique Tp had to normalize each track to the maximum possible volume individually as a consequence of the recording process, the Sound Canvas VA renderings reveal just how inconsistent the volume levels of these MIDI files really are:

The peak amplitudes of every track in both soundtracks, as rendered by Sound Canvas VA at maximum volume. Looking at these, you might think that kaorin's 2007 recordings were purposely trying to preserve the clipping that would come out of an SC-88Pro if you don't manually adjust the volume knob for each song, but those recordings are still much louder than even these numbers.

So how do we interpret this? Is this a bug, because no one in their right mind would want their music to clip on purpose, and that in turn means that everything about these volume levels is arbitrary and unintentional? Or is this a quirk, and ZUN deliberately chose these volume levels for compositional reasons? It certainly would make sense for the name registration theme.
Once again, the AST version of シルクロードアリス is the worst offender in this regard as well, but it might also provide some evidence for the quirk interpretation. The fact that almost all of its MIDI channels blast away at full volume might have been an accident that could have gone unnoticed if the volume knob of ZUN's SC-88Pro was turned rather low during the time he arranged this piece, but the excessive left-panning must have been deliberate. Even Romantique Tp agrees:

Stereo waveform of the Sound Canvas VA rendering of Shuusou Gyoku's Extra Stage theme (シルクロードアリス), highlighting the excessive left-panning — It might have even made compositional sense if Silk Road Alice was supposed to be a "Western-style piece", but it's not.

Stereo waveform of Romantique Tp's recording of Shuusou Gyoku's Extra Stage theme (シルクロードアリス), highlighting the excessive left-panning — It might have even made compositional sense if Silk Road Alice was supposed to be a "Western-style piece", but it's not.

And that's with the volume already normalized. Because this one channel of this one track is almost twice as loud as anything else in the AST, we would consequently have to bring down the volume of every other arranged track and the right channel of the same track by almost 50% if we wanted to maintain the volume differences between the individual tracks of the AST. In the process, we lose almost one entire bit of dynamic range. At this rate, you might even consider remixing and remastering the entire thing, but that would involve so many creative decisions to definitely fall into fanfiction territory…

However, normalizing each track to a peak level of 0 dBFS makes much more sense for in-game playback if you consider how loud Shuusou Gyoku's sound effects are. Once again, the best solution would involve offering both versions, but should we really add two more SCVA BGM packs just to cover volume differences?
ReplayGain solves this exact problem for regular music listening in a non-destructive way by writing the per-track and per-album gain levels into an audio file's metadata. Since we need metadata support for titles anyway, we can do something similar, albeit not exactly the same for two reasons:

ReplayGain is specified to target an average volume of −17 dBFS, whereas we'd like to target a peak volume of 0 dBFS in order to always use the entire available digital scale. We've got some loud sound effects to compete with, after all.
ReplayGain expresses its gain values in dB, which is cumbersome to work with. In the realm of PCM, volume changes don't need to involve more than a simple multiplication, so let's go with a simple scalar GAIN FACTOR.

And so, we hard-apply the volume-level gain during the conversion from 32-bit float to FLAC to preserve the volume differences between the tracks, calculate the track-level GAIN FACTOR based on the resulting peak levels, add a volume normalization toggle to the Sound / Config menu, enable it by default, and thus make everyone happy. ✅

The final interesting tidbit in building these packages can be found in the way the Sound Canvas VA recordings are looped. When manually cutting loops, you always have to consider that the intro might end with unique notes that aren't present at the end of the loop, which will still be fading out at the calculated loop start point. This necessitates shifting the loop start point by a few bars until these notes are no longer audible – or you could simply ignore the issue because ZUN's compositions are so frantic that no one would ever notice.
With the separate intro and loop files generated by mly, on the other hand, the reverb/release trails are immediately visible and, after trimming trailing silence, exactly define the number of samples that the calculated loop start point needs to be shifted by. The .loop file then remains always exactly as long, in samples, as the duration of the loop reported by mly. If a piece happens to have a constant tempo whose beat duration corresponds to an integer number of samples, we get some very satisfying, round loop durations out of this process. ☺️

So let's play it all back in-game… and immediately run into two unexpected miniaudio limitations, what the…?!

miniaudio uses a fixed linear function for its fade-out envelope, and doesn't offer anything else? We might not even want a logarithmic one this time because symmetry with MIDI's simple quadratic curve would be neat, but we sure don't want a linear function – those stay near the original volume for too long, and then turn quiet way too quickly.
There is no way to access FLAC metadata from miniaudio's public API, even though the library bundles the author's own FLAC library which has this feature?

📝 Back when I evaluated miniaudio, I alluded that I consider single-file C libraries to be massively overrated, and this is exactly why: Once they grow as massive as miniaudio (how ironic), they can quickly lead to their authors treating their dependencies as implementation details and melting down the interfaces that would naturally arise. In a regular library, dr_flac would be a separate, proper dependency, and the API would have a way to initialize a stream from an externally loaded drflac object. But since the C community collectively pretends that multi-file libraries are a burden on other developers, miniaudio ended up with dr_flac copy-pasted into its giant single file, with a silly ma_ namespacing prefix added to all its functions. And why? Did we have to move so far in the other direction just because CMake doesn't support globbing? That's a symptom of CMake not actually solving any problem, not a valid architectural decision that libraries should bend around. 🙄
So unless we fork and hack around in miniaudio, there's now no way around depending on a second, regular copy of dr_flac. Which has now led to the same project organization bloat that single-file libraries originally set out to prevent…

Sigh. At this rate, it makes more sense to just copy-paste and adapt the old BGM streaming code I wrote for thcrap in late 2018, which used dr_flac directly, and extend it with metadata support. With the streaming code moved out of the platform layer and into game logic, it also makes much more sense to implement the squared fade-out curve at that same level instead of copy-pasting and adjusting an unhealthy amount of miniaudio's verbose C code.
While I'm doing the same for the old Vorbis streaming code, it would also make sense to rewrite that one to use stb_vorbis instead of the old libogg+libvorbis reference libraries. There's no need to add two more dependencies if miniaudio already comes with stb_vorbis.c, and that library is widely acclaimed. So, integration should be a breeze, right?
Well, surprise, rarely have I seen a C library so actively hostile toward being integrated. Both of its API variants are completely unreasonable:

The pulldata API pulls Vorbis data as needed from either a memory buffer containing the entire Vorbis file, or a C FILE* handle.
Effectively, this forces either you to give up disk streaming completely, or your program into C's terrible I/O API with all its buffering slowness and Unicode issues on Windows. The documentation even goes on to suggest just modifying the code if you need anything else, which might be acceptable in the strange world of game development this library originates from, but it sure isn't in the kind of open-source development I do.
The pushdata API expects the caller to gradually feed chunks of Vorbis data. How large do these chunks have to be? Nobody knows – and, even worse, the API doesn't retain any of the data already pushed in. If the buffer you passed is too small, which you don't get to know in advance, you have to pass the same data plus more in the next call. I get that you might want an API like this to avoid dynamic memory allocations, but not only does this API perform plenty of allocations itself, it actively forces its caller to realloc() over and over again. 🙄 The lack of seeking support reveals that this API is geared towards live-streamed audio, and it might very well be acceptable in such a case, but it's nothing we could use for BGM.

What happened to the tried-and-true idea of providing a structure with read, tell, and seek callbacks, and then providing an optional variant for C FILE* handles if you absolutely must? Sure, the whole point of Vorbis is to be small and nobody these days would care about spending a few MB on keeping an entire Vorbis file in memory, but come on. If pulldata made the deliberate and opinionated choice to only support buffers of complete Vorbis streams and argued in the name of simplicity that hand-coded disk streaming isn't worth it in this day and age, I might have even been convinced. And this is from the guy who popularized the concept of single-file C libraries in the first place?

Oh well, tupblocks go brrr. libvorbis definitely shows its age with all the old command-line tools in the lib/ directory that they never moved away and that we now have to remove from our glob. But even that just adds a single line to the Tupfile, and then we get to enjoy its much friendlier API. That sure beats the almost 800 lines of code that miniaudio had to write to integrate stb_vorbis… which I can't even link because the file is too big for GitHub. 🤷
At this point, it would have even made sense to upgrade from a 24-year-old lossy codec to an 11-year-old lossy codec and use Opus instead, since the enforced 48,000 Hz sampling rate is a non-issue when you control the entire audio pipeline. But let's keep compatibility with existing thcrap mods for now.

The last time I added dependencies, 📝 I wondered whether just downloading and extracting official Windows binary builds might be superior to pasting batch script duct tape over the usability issues of Git submodules. However, I still wanted to try out Git's sparse checkout feature before, in an attempt to remove all the unneeded bloat… and as it turned out, this might just be the idealistic and perfect nirvana of vendoring libraries in C++ projects. I particularly like how the limitations of its default mode (always checking out all files within each directory level that shows up in a filter) can be turned into a guideline about how to structure a repository: All non-essential stuff that consumers of your code might not need – tests, high-level documentation, or optional features – should go into a subdirectory where it can be easily filtered.
And that's how the size of our libs/ directory went down from 82.7 MiB in the P0256 build to 30.4 MiB in the P0275 build, despite adding 4 more libraries in the latter. Now if only this didn't require even more duct tape to actually set up shallow clones correctly…

In the end, the Windows build ended up using only a single one of the miniaudio features that DirectSound doesn't have, and that's the ability to use the more modern WASAPI instead of DirectSound. We're still going to use miniaudio for the Linux port, but as far as Windows is concerned, it would be quite nice to backport BGM streaming to the game's original DirectSound backend. The P0275 build is pushing 1 MiB of binary size for a game that originally came in a 220 KiB binary, so it would remove a noticeable amount of bloat from GIAN07.EXE, but it would also allow waveform BGM to work in the Windows 98-compatible i586 build. If that sounds cool to you, this is the issue you want to fund.

That only left some logic and UI busywork to put it all together, which means that we've almost reached the end of things to talk about! Here's what it all looks like:

BGM pack selection is done in-game through a new submenu. The <Download> option will open the BGM pack release page in the system's preferred browser:

This window presented a great occasion for already implementing the generic boilerplate for vertically scrolling windows with an unlimited number of items. That will come in quite handy once we introduce better replay support… 👀
Even with per-track BGM volume normalization, Shuusou Gyoku's sound effects are still a bit too loud in comparison, especially when mixed on top of that excessively and unfixably left-panned AST version of the Extra Stage theme. Adding separate volume controls for BGM and sound effects really was the only sustainable solution here, and conveniently checks an important quality-of-life box the original game lacked. So important that it was the very first issue I added to the GitHub tracker of my fork:
I really wanted to have Japanese help text in these menus, as it makes them look just so much more consistent and polished. Many thanks to Elfin, who responded to my bounty offer, and will most likely also provide localizations for future features.
In-game music titles are now consistently right-aligned. Leading whitespace in 4 of the original MIDI Sequence Names suggests that pbg might have intended these titles to be centered within the 216 maximum pixels that the original code designated for music titles, but none of those 4 had the correct amount of spaces that would have been required for exact centering:
```
#04 |・・・幻想帝都・・・・
#07 |・・天空アーミー・・・
#11 |・魔法少女十字軍・・・
#16 |・・シルクロードアリス
```
Right-aligned text matches the one certain intention I can read out of the code, and allows us to consistently trim whitespace from both the original MIDI Sequence Names and the TITLE tags in the BGM packs… at the cost of significantly changing the animation. 🤔

Maybe, all this whitespace had the explicit purpose of making the animation look the way it did originally? But hard-padding the title tags in the BGM packs would be so dumb… 😩 Let's keep it like this for now and fix the animation later.
At startup, the game now shows a new screen if any of the game's .DAT files are missing, displaying their expected absolute path. This is bound to be very important on Linux because each distribution might have its own idea of where these files are supposed to be stored. But even on Windows, this allows GIAN07.EXE to at least run and show something if one or more of these files are not present, instead of crashing at the first attempt of loading anything from them.

The ¥ instead of \ is, 📝 once again, a font issue. Good luck finding a font not named MS Gothic that looks good when rendered in this game…
On a more unfortunate note, I dropped the i586 build from this release. Visual Studio 2022's CRT implements the new filesystem and threading code using Win32 API functions that are only available on Vista or later and are not covered by the one ready-made KernelEx package I was able to find, so I couldn't easily test such a build on Windows 98 anymore. Resurrecting the i586 build would therefore involve additional platform abstraction layers that we wouldn't need otherwise. Writing them wouldn't be too expensive, but it only makes sense if there's actual demand. Backporting waveform BGM to DirectSound to restore feature parity would also be a good idea here, as it would avoid the need to litter the current code with #ifdefs at any place that references anything related to BGM packs.

After half a year of being bought out way past the cap, I've finally got some small room left for new orders again. If it weren't for this blog post and the required research and web development work, this delivery would have probably come out in early January, taking half the time it ended up taking. So I really have to start factoring the blog posts into the push prices in a better and fairer way.
Meanwhile, the hate toward my day job only keeps growing, but there's little point in looking for a new one as long as ReC98 remains this motivating and complex. It leaves pretty much no cognitive room for any similarly demanding job. Thus, I want 2024 to be the year where ReC98 either becomes profitable enough to be my only full-time job, or where we conclusively find out that it can't, I go look for a better day job, and ReC98 shifts to a slower pace. Here's the plan:

From now on, I will immediately increase the push price whenever we reach 100% of the cap, either directly through new orders or indirectly through existing subscriptions. The price increase will be relative to how long it took to reach that point since the last re-opening.
If the store continues selling out, I will aim for per push by the end of the year.
In exchange, microtransactions (i.e., deliveries containing just code and no blog posts) will now be half the price of regular pushes for the same amount of delivered code. Or in other words: If you want to fund a goal that's eligible for microtransactions, you can now decide whether your fixed amount of money goes to 2× coding work and 0× blogging, or 1× coding work and 1× blogging.
I'll permanently increase the default level of the cap from 8 to 10 pushes. The past 12 months were full of mod releases that raised the bar, and 2024 shows no signs of stopping that trend.
If we ever reach per push, I plan to hire people for some of the contribution-ideas or anything else that might improve this project. (Well-produced YouTube videos about the findings of this project might be a nice idea!) At that point, I will have reached my goal of living decently off this project alone, and it's time for others to make money in this space as well.

With the new price of per push, this means that there's now a small window in which you can get a full push worth of functionality for , until the current cap is filled up again.

Next up: Probably TH02's endings to relax a bit. Maybe we're also getting some new Touhou-related contributions?

📝 Posted:: 2024-02-03 08:03 UTC
🚚 Summary of:: P0264, P0265
⌨ Commits:: 46cd6e7...78728f6, 78728f6...ff19bed
💰 Funded by:: Blue Bolt, [Anonymous], iruleatgames
🏷 Tags:

Oh, it's 2024 already and I didn't even have a delivery for December or January? Yeah… I can only repeat what I said at the end of November, although the finish line is actually in sight now. With 10 pushes across 4 repositories and a blog post that has already reached a word count of 9,240, the Shuusou Gyoku SC-88Pro BGM release is going to break 📝 both the push record set by TH01 Sariel two years ago, and 📝 the blog post length record set by the last Shuusou Gyoku delivery. Until that's done though, let's clear some more PC-98 Touhou pushes out of the backlog, and continue the preparation work for the non-ASCII translation project starting later this year.

But first, we got another free bugfix according to my policy! 📝 Back in April 2022 when I researched the Divide Error crash that can occur in TH04's Stage 4 Marisa fight, I proposed and implemented four possible workarounds and let the community pick one of them for the generally recommended small bugfix mod. I still pushed the others onto individual branches in case the gameplay community ever wants to look more closely into them and maybe pick a different one… except that I accidentally pushed the wrong code for the warp workaround, probably because I got confused with the second warp variant I developed later on.
Fortunately, I still had the intended code for both variants lying around, and used the occasion to merge the current master branch into all of these mod branches. Thanks to wyatt8740 for spotting and reporting this oversight!

The Music Room background masking effect
The GRCG's plane disabling flags
Text color restrictions
The entire messy rest of the Music Room code
TH04's partially consistent congratulation picture on Easy Mode
TH02's boss position and damage variables

As the final piece of code shared in largely identical form between 4 of the 5 games, the Music Rooms were the biggest remaining piece of low-hanging fruit that guaranteed big finalization% gains for comparatively little effort. They seemed to be especially easy because I already decompiled TH02's Music Room together with the rest of that game's OP.EXE back in early 2015, when this project focused on just raw decompilation with little to no research. 9 years of increased standards later though, it turns out that I missed a lot of details, and ended up renaming most variables and functions. Combined with larger-than-expected changes in later games and the usual quality level of ZUN's menu code, this ended up taking noticeably longer than the single push I expected.

The undoubtedly most interesting part about this screen is the animation in the background, with the spinning and falling polygons cutting into a single-color background to reveal a spacey image below. However, the only background image loaded in the Music Room is OP3.PI (TH02/TH03) or MUSIC3.PI (TH04/TH05), which looks like this in a .PI viewer or when converted into another image format with the usual tools:

TH02's Music Room background in its on-disk state — Let's call this "the blank image".

TH03's Music Room background in its on-disk state — Let's call this "the blank image".

That is definitely the color that appears on top of the polygons, but where is the spacey background? If there is no other .PI file where it could come from, it has to be somewhere in that same file, right?
And indeed: This effect is another bitplane/color palette trick, exactly like the 📝 three falling stars in the background of TH04's Stage 5. If we set every bit on the first bitplane and thus change any of the resulting even hardware palette color indices to odd ones, we reveal a full second 8-color sub-image hiding in the same .PI file:

TH02's Music Room background, with all bits in the first bitplane set to reveal the spacey background image, and the full color palette at the bottom — The spacey sub-image. Never before seen!1!! …OK, touhou-memories beat me by a month. Let's add each image's full 16-color palette to deliver some additional value.

TH03's Music Room background, with all bits in the first bitplane set to reveal the spacey background image, and the full color palette at the bottom — The spacey sub-image. Never before seen!1!! …OK, touhou-memories beat me by a month. Let's add each image's full 16-color palette to deliver some additional value.

On a high level, the first bitplane therefore acts as a stencil buffer that selects between the blank and spacey sub-image for every pixel. The important part here, however, is that the first bitplane of the blank sub-images does not consist entirely of 0 bits, but does have 1 bits at the pixels that represent the caption that's supposed to be overlaid on top of the animation. Since there now are some pixels that should always be taken from the spacey sub-image regardless of whether they're covered by a polygon, the game can no longer just clear the first bitplane at the start of every frame. Instead, it has to keep a separate copy of the first bitplane's original state (called nopoly_B in the code), captured right after it blitted the .PI image to VRAM. Turns out that this copy also comes in quite handy with the text, but more on that later.

Then, the game simply draws polygons onto only the reblitted first bitplane to conditionally set the respective bits. ZUN used master.lib's grcg_polygon_c() function for this, which means that we can entirely thank the uncredited master.lib developers for this iconic animation – if they hadn't included such a function, the Music Rooms would most certainly look completely different.
This is where we get to complete the series on the PC-98 GRCG chip with the last remaining four bits of its mode register. So far, we only needed the highest bit (0x80) to either activate or deactivate it, and the bit below (0x40) to choose between the 📝 RMW and 📝 TCR/📝 TDW modes. But you can also use the lowest four bits to restrict the GRCG's operations to any subset of the four bitplanes, leaving the other ones untouched:

// Enable the GRCG (0x80) in regular RMW mode (0x40). All bitplanes are
// enabled and written according to the contents of the tile register.
outportb(0x7C, 0xC0);

// The same, but limiting writes to the first bitplane by disabling the
// second (0x02), third (0x04), and fourth (0x08) one, as done in the
// PC-98 Touhou Music Rooms.
outportb(0x7C, 0xCE);

// Regular GRCG blitting code to any VRAM segment…
pokeb(0xA8000, offset, …);

// We're done, turn off the GRCG.
outportb(0x7C, 0x00);

This could be used for some unusual effects when writing to two or three of the four planes, but it seems rather pointless for this specific case at first. If we only want to write to a single plane, why not just do so directly, without the GRCG? Using that chip only involves more hardware and is therefore slower by definition, and the blitting code would be the same, right?
This is another one of these questions that would be interesting to benchmark one day, but in this case, the reason is purely practical: All of master.lib's polygon drawing functions expect the GRCG to be running in RMW mode. They write their pixels as bitmasks where 1 and 0 represent pixels that should or should not change, and leave it to the GRCG to combine these masks with its tile register and OR the result into the bitplanes instead of doing so themselves. Since GRCG writes are done via MOV instructions, not using the GRCG would turn these bitmasks into actual dot patterns, overwriting any previous contents of each VRAM byte that gets modified.
Technically, you'd only have to replace a few MOV instructions with OR to build a non-GRCG version of such a function, but why would you do that if you haven't measured polygon drawing to be an actual bottleneck.

Three overlapping Music Room polygons rendered using master.lib's grcg_polygon_c() function with a disabled GRCG — An example with three polygons drawn from top to bottom. Without the GRCG, edges of later polygons overwrite any previously drawn pixels within the same VRAM byte. Note how treating bitmasks as dot patterns corrupts even those areas where the background image had nonzero bits in its first bitplane.

Three overlapping Music Room polygons rendered as in the original game, with the GRCG enabled — An example with three polygons drawn from top to bottom. Without the GRCG, edges of later polygons overwrite any previously drawn pixels within the same VRAM byte. Note how treating bitmasks as dot patterns corrupts even those areas where the background image had nonzero bits in its first bitplane.

As far as complexity is concerned though, the worst part is the implicit logic that allows all this text to show up on top of the polygons in the first place. If every single piece of text is only rendered a single time, how can it appear on top of the polygons if those are drawn every frame?
Depending on the game (because of course it's game-specific), the answer involves either the individual bits of the text color index or the actual contents of the palette:

Colors 0 or 1 can't be used, because those don't include any of the bits that can stay constant between frames.
If the lowest bit of a palette color index has no effect on the displayed color, text drawn in either of the two colors won't be visually affected by the polygon animation and will always appear on top. TH04 and TH05 rely on this property with their colors 2/3, 4/5, and 6/7 being identical, but this would work in TH02 and TH03 as well.
But this doesn't apply to TH02 and TH03's palettes, so how do they do it? The secret: They simply include all text pixels in nopoly_B. This allows text to use any color with an odd palette index – the lowest bit then won't be affected by the polygons ORed into the first bitplane, and the other bitplanes remain unchanged.
TH04 is a curious case. Ostensibly, it seems to remove support for odd text colors, probably because the new 10-frame fade-in animation on the comment text would require at least the comment area in VRAM to be captured into nopoly_B on every one of the 10 frames. However, the initial pixels of the tracklist are still included in nopoly_B, which would allow those to still use any odd color in this game. ZUN only removed those from nopoly_B in TH05, where it had to be changed because that game lets you scroll and browse through multiple tracklists.

The contents of `nopoly_B` with each game's first track selected.

Finally, here's a list of all the smaller details that turn the Music Rooms into such a mess:

Due to the polygon animation, the Music Room is one of the few double-buffered menus in PC-98 Touhou, rendering to both VRAM pages on alternate frames instead of using the other page to store a background image. Unfortunately though, this doesn't actually translate to tearing-free rendering because ZUN's initial implementation for TH02 mixed up the order of the required operations. You're supposed to first wait for the GDC's VSync interrupt and then, within the display's vertical blanking interval, write to the relevant I/O ports to flip the accessed and shown pages. Doing it the other way around and flipping as soon as you're finished with the last draw call of a frame means that you'll very likely hit a point where the (real or emulated) electron beam is still traveling across the screen. This ensures that there will be a tearing line somewhere on the screen on all but the fastest PC-98 models that can render an entire frame of the Music Room completely within the vertical blanking interval, causing the very issue that double-buffering was supposed to prevent.
ZUN only fixed this landmine in TH05.
The polygons have a fixed vertex count and radius depending on their index, everything else is randomized. They are also never reinitialized while OP.EXE is running – if you leave the Music Room and reenter it, they will continue animating from the same position.
Except for TH05's repeatable ⬆️ up and ⬇️ down inputs, the games force you to release any pressed key before they will handle any new input. But since the game is running on a PC-98, we 📝 once again have to mention the infamous quirk of the keyboard controller with regard to held keys. Funnily enough, the games go back and forth in how they address it, in a way that matches the 📝 additional delay of the cutscene interpreter loop:
- TH02 and TH04 don't handle it at all, causing held keys to be processed again after about a second.
- TH03 and TH05 correctly work around the quirk, at the usual cost of a 614.4 µs delay per frame. Except that the delay is actually twice as long in frames in which a previously held key is released, because this code is a mess.
But even in 2024, DOSBox-X is the only emulator that actually replicates this detail of real hardware. On anything else, keyboard input will behave as ZUN intended it to. At least I've now mentioned this once for every game, and can just link back to this blog post for the other menus we still have to go through, in case their game-specific behavior matches this one.
TH02 is the only game that
1. separately lists the stage and boss themes of the main game, rather than following the in-game order of appearance,
2. continues playing the selected track when leaving the Music Room,
3. always loads both MIDI and PMD versions, regardless of the currently selected mode, and
4. does not stop the currently playing track before loading the new one into the PMD and MMD drivers.
The combination of 2) and 3) allows you to leave the Music Room and change the music mode in the Option menu to listen to the same track in the other version, without the game changing back to the title screen theme. 4), however, might cause the PMD and MMD drivers to play garbage for a short while if the music data is loaded from a slow storage device that takes longer than a single period of the OPN timer to fill the driver's song buffer. Probably not worth mentioning anymore though, now that people no longer try fitting PC-98 Touhou games on floppy disks.
The comment text files use another fixed-size plaintext format, just like 📝 TH02's in-game dialog system or 📝 TH03's win messages:
- Exactly 40 (TH02/TH03) / 38 (TH04/TH05) visible bytes per line,
- padded with 2 bytes that can hold a CR/LF newline sequence for easier editing.
- Every track starts with a title line that mostly just duplicates the names from the hardcoded tracklist,
- followed by a fixed 19 (TH02/TH03/TH04) / 9 (TH05) comment lines.
In TH04 and TH05, lines can start with a semicolon (;) to prevent them from being rendered. This is purely a performance hint, and is visually equivalent to filling the line with spaces.
All in all, the quality of the code is even slightly below the already poor standard for PC-98 Touhou: More VRAM page copies than necessary, conditional logic that is nested way too deeply, a distinct avoidance of state in favor of loops within loops, and – of course – a couple of gotos to jump around as needed.
In TH05, this gets so bad with the scrolling and game-changing tracklist that it all gives birth to a wonderfully obscure inconsistency: When pressing both ⬆️/⬇️ and ⬅️/➡️ at the same time, the game first processes the vertical input and then the horizontal one in the next frame, making it appear as if the latter takes precedence. Except when the cursor is highlighting the first (⬆️ ) or 12^th (⬇️ ) element of the list, and said list element is not the first track (⬆️ ) or the quit option (⬇️ ), in which case the horizontal input is ignored.

And that's all the Music Rooms! The OP.EXE binaries of TH04 and especially TH05 are now very close to being 100% RE'd, with only the respective High Score menus and TH04's title animation still missing. As for actual completion though, the finalization% metric is more relevant as it also includes the ZUN Soft logo, which I RE'd on paper but haven't decompiled. I'm 📝 still hoping that this will be the final piece of code I decompile for these two games, and that no one pays to get it done earlier…

For the rest of the second push, there was a specific goal I wanted to reach for the remaining anything budget, which was blocked by a few functions at the beginning of TH04's and TH05's MAINE.EXE. In another anticlimactic development, this involved yet another way too early decompilation of a main() function…
Generally, this main() function just calls the top-level functions of all other ending-related screens in sequence, but it also handles the TH04-exclusive congratulating All Clear images within itself. After a 1CC, these are an additional reward on top of the Good Ending, showing the player character wearing a different outfit depending on the selected difficulty. On Easy Mode, however, the Good Ending is unattainable because the game always ends after Stage 5 with a Bad Ending, but ZUN still chose to show the EASY ALL CLEAR!! image in this case, regardless of how many continues you used.
While this might seem inconsistent with the other difficulties, it is consistent within Easy Mode itself, as the enforced Bad Ending after Stage 5 also doesn't distinguish between the number of continues. Also, Try to Normal Rank!! could very well be ZUN's roundabout way of implying "because this is how you avoid the Bad Ending".

With that out of the way, I was finally able to separate the VRAM text renderer of TH04 and TH05 into its own assembly unit, 📝 finishing the technical debt repayment project that I couldn't complete in 2021 due to assembly-time code segment label arithmetic in the data segment. This now allows me to translate this undecompilable self-modifying mess of ASM into C++ for the non-ASCII translation project, and thus unify the text renderers of all games and enhance them with support for Unicode characters loaded from a bitmap font. As the final finalized function in the SHARED segment, it also allowed me to remove 143 lines of particularly ugly segmentation workarounds 🙌

The remaining ¹/₆th of the second push provided the perfect occasion for some light TH02 PI work. The global boss position and damage variables represented some equally low-hanging fruit, being easily identified global variables that aren't part of a larger structure in this game. In an interesting twist, TH02 is the only game that uses an increasing damage value to track boss health rather than decreasing HP, and also doesn't internally distinguish between bosses and midbosses as far as these variables are concerned. Obviously, there's quite a bit of state left to be RE'd, not least because Marisa is doing her own thing with a bunch of redundant copies of her position, but that was too complex to figure out right now.

Also doing their own thing are the Five Magic Stones, which need five positions rather than a single one. Since they don't move, the game doesn't have to keep 📝 separate position variables for both VRAM pages, and can handle their positions in a much simpler way that made for a nice final commit.
And for the first time in a long while, I quite like what ZUN did there! Not only are their positions stored in an array that is indexed with a consistent ID for every stone, but these IDs also follow the order you fight the stones in: The two inner ones use 0 and 1, the two outer ones use 2 and 3, and the one in the center uses 4. This might look like an odd choice at first because it doesn't match their horizontal order on the playfield. But then you notice that ZUN uses this property in the respective phase control functions to iterate over only the subrange of active stones, and you realize how brilliant it actually is.

Screenshot of TH02's Five Magic Stones, with the first two (both internally and in the order you fight them in) alive and activated

Screenshot of TH02's Five Magic Stones, with the second two (both internally and in the order you fight them in) alive and activated

This seems like a really basic thing to get excited about, especially since the rest of their data layout sure isn't perfect. Splitting each piece of state and even the individual X and Y coordinates into separate 5-element arrays is still counter-productive because the game ends up paying more memory and CPU cycles to recalculate the element offsets over and over again than this would have ever saved in cache misses on a 486. But that's a minor issue that could be fixed with a few regex replacements, not a misdesigned architecture that would require a full rewrite to clean it up. Compared to the hardcoded and bloated mess that was 📝 YuugenMagan's five eyes, this is definitely an improvement worthy of the good-code tag. The first actual one in two years, and a welcome change after the Music Room!

These three pieces of data alone yielded a whopping 5% of overall TH02 PI in just ¹/₆th of a push, bringing that game comfortably over the 60% PI mark. MAINE.EXE is guaranteed to reach 100% PI before I start working on the non-ASCII translations, but at this rate, it might even be realistic to go for 100% PI on MAIN.EXE as well? Or at least technical position independence, without the false positives.

Next up: Shuusou Gyoku SC-88Pro BGM. It's going to be wild.

📝 Posted:: 2023-11-30 23:56 UTC
🚚 Summary of:: P0262, P0263
⌨ Commits:: ae2fc28...741d889, 741d889...46cd6e7
💰 Funded by:: Blue Bolt, [Anonymous]
🏷 Tags:

And once again, the Shuusou Gyoku task was too complex to be satisfyingly solved within a single month. Even just finding provably correct loop sections in both the original and arranged MIDI files required some rather involved detection algorithms. I could have just defined what sounded like correct loops, but the results of these algorithms were quite surprising indeed. Turns out that not even Seihou is safe from ZUN quirks, and some tracks technically loop much later than you'd think they do, or don't loop at all. And since I then wanted to put these MIDI loops back into the game to ensure perfect synchronization between the recordings and MIDI versions, I ended up rewriting basically all the MIDI code in a cross-platform way. This rewrite also uncovered a pbg bug that has traveled from Shuusou Gyoku into Windows Touhou, where it survived until ZUN ultimately removed all MIDI code in TH11 (!)…

Fortunately, the backlog still had enough general PC-98 Touhou funds that I could spend on picking some soon-important low-hanging fruit, giving me something to deliver for the end of the month after all. TH04 and TH05 use almost identical code for their main/option menus, so decompiling it would make number go up quite significantly and the associated blog post won't be that long…

Wait, what's this, a bug report from touhou-memories concerning the website?

Tab switchers tended to break on certain Firefox versions, and
video playback didn't work on Microsoft Edge at all?

Those are definitely some high-priority bugs that demand immediate attention.

Microsoft Edge's anti-support of AV1
TH04/TH05's main/option menu
TH04/TH05's first-launch sound setup menu
TH05's title animation ☯️

The tab switcher issue was easily fixed by replacing the previous z-index trickery with a more robust solution involving the hidden attribute. The second one, however, is much more aggravating, because video playback on Edge has been broken ever since I 📝 switched the preferred video codec to AV1.
This goes so far beyond not supporting a specific codec. Usually, unsupported codecs aren't supposed to be an issue: As soon as you start using the HTML <video> tag, you'll learn that not every browser supports all codecs. And so you set up an encoding pipeline to serve each video in a mix of new and ancient formats, put the <source> tag of the most preferred codec first, and rest assured that browsers will fall back on the best-supported option as necessary. Except that Edge doesn't even try, and insists on staying on a non-playing AV1 video. 🙄

The codecs parameter for the <source> type attribute was the first potential solution I came across. Specifying the video codec down to the finest encoding details right in the HTML markup sounds like a good idea, similar to specifying sizes of images and videos to prevent layout reflows on long pages during the initial page load. So why was this the first time I heard of this feature? The fact that there isn't a simple ffprobe -show_html_codecs_string command to retrieve this string might already give a clue about how useful it is in practice. Instead, you have to manually piece the string together by grepping your way through all of a video's metadata…
…and then it still doesn't change anything about Edge's behavior, even when also specifying the string for the VP9 and VP8 sources. Calling the infamously ridiculous HTMLMediaElement.canPlayType() method with a representative parameter of "video/webm; codecs=av01.1.04M.08.0.000.01.13.00.0" explains why: Both the AV1-supporting Chrome and Edge return "probably", but only the former can actually play this format. 🤦

But wait, there is an AV1 video extension in the Microsoft Store that would add support to any unspecified favorite video app. Except that it stopped working inside Edge as of version 116. And even if it did: If you can't query the presence of this extension via JavaScript, it might as well not exist at all.
Not to mention that the favorite video app part is obviously a lie as a lot of widely preferred Windows video apps are bundled with their own codecs, and have probably long supported AV1.

In the end, there's no way around the utter desperation move of removing the AV1 <source> for Edge users. Serving each video in two other formats means that we can at least do something here – try visiting the GitHub release page of the P0234-1 TH01 Anniversary Edition build in Edge and you also don't get to see anything, because that video uses AV1 and GitHub understandably doesn't re-encode every uploaded video into a variety of old formats.
Just for comparison, I tried both that page and the ReC98 blog on an old Android 6 phone from 2014, and even that phone picked and played the AV1 videos with the latest available Chrome and Firefox versions. This was the phone whose available Firefox version didn't support VP9 in 2019, which was my initial reason for adding the VP8 versions. Looks like it's finally time to drop those… 🤔 Maybe in the far future once I start running out of space on this server.

Removing the <source> tags can be done in one of two places:

server-side, detecting Edge via the User-Agent header, or
client-side, using navigator.userAgentData.brands.

I went with 2) because more dynamic server-side code would only move us further away from static site generation, which would make a lot of sense as the next evolutionary step in the architecture of this website. The client-side solution is much simpler too, and we can defer the deletion until a user actually hovers over a specific video.
And while we're at it, let's also add a popup complaining about this whole state of affairs. Edge is heavily marketed inside Windows as "the modern browser recommended by Microsoft", and you sure wouldn't expect low-quality chroma-subsampled VP9 from such a tagline. With such a level of anti-support for AV1, Edge users deserve to know exactly what's going on, especially since this post also explains what they will encounter on other websites.

Alright, where was I? For TH01, the main menu was the last thing I decompiled before the 100% finalization mark, so it's rather anticlimactic to already cover the TH04/TH05 one now, with both of the games still being very far away from 100%, just because people will soon want to translate the description text in the bottom-right corner of the screen. But then again, the ZUN Soft logo animation would make for an even nicer final piece of decompiled code, especially since the bouncing-ball logo from TH01, TH02, and TH03 was the very first decompilation I did, all the way back in 2015.

The code quality of ZUN's VRAM-based menus has barely increased between TH01 and TH05. Both the top-level and option menu still need to know the bounding rectangle of the other one to unblit the right pixels when switching between the two. And since ZUN sure loved hardcoded and copy-pasted numbers in the PC-98 days, the coordinates both tend to be excessively large, and excessively wrong. Luckily, each menu item comes with its own correct unblitting rectangle, which avoids any graphical glitches that would otherwise occur.
As for actual observable quirks and bugs, these menus only contain one of each, and both are exclusive to TH04:

Quitting out of the Music Room moves the cursor to the Start option. In TH05, it stays on Music Room.
Changing the S.E. mode seems to do nothing within TH04's menus, and would only take effect if you also change the Music mode afterward, or launch into the game.

And yes, these videos do have a frame rate of 2 FPS.

Now that 100% finalization of their OP.EXE binaries is within reach, all this bloat made me think about the viability of a 📝 single-executable build for TH04's and TH05's debloated and anniversary versions. It would be really nice to have such a build ready before I start working on the non-ASCII translations – not just because they will be based on the anniversary branch by default, but also because it would significantly help their development if there are 4 fewer executables to worry about.
However, it's not as simple for these games as it was for TH01. The unique code in their OP.EXE and MAINE.EXE binaries is much larger than Borland's easily removed C++ exception handler, so I'd have to remove a lot more bloat to keep the resulting single binary at or below the size of the original MAIN.EXE. But I'm sure going to try.

Speaking of code that can be debloated for great effect: The second push of this delivery focused on the first-launch sound setup menu, whose BGM and sound effect submenus are almost complete code duplicates of each other. The debloated branch could easily remove more than half of the code in there, yielding another ≈800 bytes in case we need them.
If hex-editing MIKO.CFG is more convenient for you than deleting that file, you can set its first byte to FF to re-trigger this menu. Decompiling this screen was not only relevant now because it contains text rendered with font ROM glyphs and it would help dig our way towards more important strings in the data segment, but also because of its visual style. I can imagine many potential mods that might want to use the same backgrounds and box graphics for their menus.

TH04's first-launch sound setup menu, showing the BGM mode selection — How about an initial language selection menu in the same style?

TH05's first-launch sound setup menu, showing the sound effect mode selection — How about an initial language selection menu in the same style?

With the two submenus being shown in a fixed sequence, there's not a lot of room for the code to do anything wrong, and it's even more identical between the two games than the main menu already was. Thankfully, ZUN just reblits the respective options in the new color when moving the cursor, with no 📝 palette tricks. TH04's background image only uses 7 colors, so he could have easily reserved 3 colors for that. In exchange, the TH05 image gets to use the full 16 colors with no change to the code.

Rounding out this delivery, we also got TH05's rolling Yin-Yang Orb animation before the title screen… and it's just more bloat and landmines on a smaller scale that might be noticeable on slower PC-98 models. In total, there are three unnecessary inter-page copies of the entire VRAM that can easily insert lag frames, and two minor page-switching landmines that can potentially lead to tearing on the first frame of the roll or fade animation. Clearly, ZUN did not have smoothness or code quality in mind there, as evidenced by the fact that this animation simply displays 8 .PI files in sequence. But hey, a short animation like this is 📝 another perfectly appropriate place for a quick-and-dirty solution if you develop with a deadline.
And that's 1.30% of all PC-98 Touhou code finalized in two pushes! We're slowly running out of these big shared pieces of ASM code…

I've been neglecting TH03's OP.EXE quite a bit since it simply doesn't contain any translatable plaintext outside the Music Room. All menu labels are gaiji, and even the character selection menu displays its monochrome character names using the 4-plane sprites from CHNAME.BFT. Splitting off half of its data into a separate .ASM file was more akin to getting out a jackhammer to free up the room in front of the third remaining Music Room, but now we're there, and I can decompile all three of them in a natural way, with all referenced data.
Next up, therefore: Doing just that, securing another important piece of text for the upcoming non-ASCII translations and delivering another big piece of easily finalized code. I'm going to work full-time on ReC98 for almost all of December, and delivering that and the Shuusou Gyoku SC-88Pro recording BGM back-to-back should free up about half of the slightly higher cap for this month.

📝 Posted:: 2021-12-27 18:19 UTC
🚚 Summary of:: P0172, P0173
⌨ Commits:: 49e6789...2d5491e, 2d5491e...27f901c
💰 Funded by:: Blue Bolt, [Anonymous]
🏷 Tags:

TH03 finally passed 20% RE, and the newly decompiled code contains no serious ZUN bugs! What a nice way to end the year.

There's only a single unlockable feature in TH03: Chiyuri and Yumemi as playable characters, unlocked after a 1CC on any difficulty. Just like the Extra Stages in TH04 and TH05, YUME.NEM contains a single designated variable for this unlocked feature, making it trivial to craft a fully unlocked score file without recording any high scores that others would have to compete against. So, we can now put together a complete set for all PC-98 Touhou games: 2021-12-27-Fully-unlocked-clean-score-files.zip It would have been cool to set the randomly generated encryption keys in these files to a fixed value so that they cancel out and end up not actually encrypting the file. Too bad that TH03 also started feeding each encrypted byte back into its stream cipher, which makes this impossible.

The main loading and saving code turned out to be the second-cleanest implementation of a score file format in PC-98 Touhou, just behind TH02. Only two of the YUME.NEM functions come with nonsensical differences between OP.EXE and MAINL.EXE, rather than 📝 all of them, as in TH01 or 📝 too many of them, as in TH04 and TH05. As for the rest of the per-difficulty structure though… well, it quickly becomes clear why this was the final score file format to be RE'd. The name, score, and stage fields are directly stored in terms of the internal REGI*.BFT sprite IDs used on the high score screen. TH03 also stores 10 score digits for each place rather than the 9 possible ones, keeps any leading 0 digits, and stores the letters of entered names in reverse order… yeah, let's decompile the high score screen as well, for a full understanding of why ZUN might have done all that. (Answer: For no reason at all. )

And wow, what a breath of fresh air. It's surely not good-code: The overlapping shadows resulting from using a 24-pixel letterspacing with 32-pixel glyphs in the name column led ZUN to do quite a lot of unnecessary and slightly confusing rendering work when moving the cursor back and forth, and he even forgot about the EGC there. But it's nowhere close to the level of jank we saw in 📝 TH01's high score menu last year. Good to see that ZUN had learned a thing or two by his third game – especially when it comes to storing the character map cursor in terms of a character ID, and improving the layout of the character map:

The alphabet available for TH03 high score names.

That's almost a nicely regular grid there. With the question mark and the double-wide SP, BS, and END options, the cursor movement code only comes with a reasonable two exceptions, which are easily handled. And while I didn't get this screen completely decompiled, one additional push was enough to cover all important code there.

The only potential glitch on this screen is a result of ZUN's continued use of binary-coded decimal digits without any bounds check or cap. Like the in-game HUD score display in TH04 and TH05, TH03's high score screen simply uses the next glyph in the character set for the most significant digit of any score above 1,000,000,000 points – in this case, the period. Still, it only really gets bad at 8,000,000,000 points: Once the glyphs are exhausted, the blitting function ends up accessing garbage data and filling the entire screen with garbage pixels. For comparison though, the current world record is 133,650,710 points, so good luck getting 8 billion in the first place.

Next up: Starting 2022 with the long-awaited decompilation of TH01's Sariel fight! Due to the 📝 recent price increase, we now got a window in the cap that is going to remain open until tomorrow, providing an early opportunity to set a new priority after Sariel is done.

📝 Posted:: 2021-11-08 10:59 UTC
🚚 Summary of:: P0165, P0166, P0167
⌨ Commits:: 7a0e5d8...f2bca01, f2bca01...e697907, e697907...c2de6ab
💰 Funded by:: Ember2528
🏷 Tags:

OK, TH01 missile bullets. Can we maybe have a well-behaved entity type, without any weirdness? Just once?

Ehh, kinda. Apart from another 150 bytes wasted on unused structure members, this code is indeed more on the low end in terms of overall jank. It does become very obvious why dodging these missiles in the YuugenMagan, Mima, and Elis fights feels so awful though: An unfair 46×46 pixel hitbox around Reimu's center pixel, combined with the comeback of 📝 interlaced rendering, this time in every stage. ZUN probably did this because missiles are the only 16×16 sprite in TH01 that is blitted to unaligned X positions, which effectively ends up touching a 32×16 area of VRAM per sprite.
But even if we assume VRAM writes to be the bottleneck here, it would have been totally possible to render every missile in every frame at roughly the same amount of CPU time that the original game uses for interlaced rendering:

Note that all missile sprites only use two colors, white and green.
Instead of naively going with the usual four bitplanes, extract the pixels drawn in each of the two used colors into their own bitplanes. master.lib calls this the "tiny format".
Use the GRCG to draw these two bitplanes in the intended white and green colors, halving the amount of VRAM writes compared to the original function.
(Not using the .PTN format would have also avoided the inconsistency of storing the missile sprites in boss-specific sprite slots.)

That's an optimization that would have significantly benefitted the game, in contrast to all of the fake ones introduced in later games. Then again, this optimization is actually something that the later games do, and it might have in fact been necessary to achieve their higher bullet counts without significant slowdown.

Unfortunately, it was only worth decompiling half of the missile code right now, thanks to gratuitous FPU usage in the other half, where 📝 double variables are compared to float literals. That one will have to wait 📝 until after SinGyoku.

After some effectively unused Mima sprite effect code that is so broken that it's impossible to make sense out of it, we get to the final feature I wanted to cover for all bosses in parallel before returning to Sariel: The separate sprite background storage for moving or animated boss sprites in the Mima, Elis, and Sariel fights. But, uh… why is this necessary to begin with? Doesn't TH01 already reserve the other VRAM page for backgrounds?
Well, these sprites are quite big, and ZUN didn't want to blit them from main memory on every frame. After all, TH01 and TH02 had a minimum required clock speed of 33 MHz, half of the speed required for the later three games. So, he simply blitted these boss sprites to both VRAM pages, leading the usual unblitting calls to only remove the other sprites on top of the boss. However, these bosses themselves want to move across the screen… and this makes it necessary to save the stage background behind them in some other way.

Enter .PTN, and its functions to capture a 16×16 or 32×32 square from VRAM into a sprite slot. No problem with that approach in theory, as the size of all these bigger sprites is a multiple of 32×32; splitting a larger sprite into these smaller 32×32 chunks makes the code look just a little bit clumsy (and, of course, slower).
But somewhere during the development of Mima's fight, ZUN apparently forgot that those sprite backgrounds existed. And once Mima's 🚫 casting sprite is blitted on top of her regular sprite, using just regular sprite transparency, she ends up with her infamous third arm:

Ironically, there's an unused code path in Mima's unblit function where ZUN assumes a height of 48 pixels for Mima's animation sprites rather than the actual 64. This leads to even clumsier .PTN function calls for the bottom 128×16 pixels… Failing to unblit the bottom 16 pixels would have also yielded that third arm, although it wouldn't have looked as natural. Still wouldn't say that it was intentional; maybe this casting sprite was just added pretty late in the game's development?

So, mission accomplished, Sariel unblocked… at 2¼ pushes. That's quite some time left for some smaller stage initialization code, which bundles a bunch of random function calls in places where they logically really don't belong. The stage opening animation then adds a bunch of VRAM inter-page copies that are not only redundant but can't even be understood without knowing the hidden internal state of the last VRAM page accessed by previous ZUN code…
In better news though: Turbo C++ 4.0 really doesn't seem to have any complexity limit on inlining arithmetic expressions, as long as they only operate on compile-time constants. That's how we get macro-free, compile-time Shift-JIS to JIS X 0208 conversion of the individual code points in the 東方★靈異伝 string, in a compiler from 1994. As long as you don't store any intermediate results in variables, that is…

But wait, there's more! With still ¼ of a push left, I also went for the boss defeat animation, which includes the route selection after the SinGyoku fight.
As in all other instances, the 2× scaled font is accomplished by first rendering the text at regular 1× resolution to the other, invisible VRAM page, and then scaled from there to the visible one. However, the route selection is unique in that its scaled text is both drawn transparently on top of the stage background (not onto a black one), and can also change colors depending on the selection. It would have been no problem to unblit and reblit the text by rendering the 1× version to a position on the invisible VRAM page that isn't covered by the 2× version on the visible one, but ZUN (needlessly) clears the invisible page before rendering any text. Instead, he assigned a separate VRAM color for both the 魔界 and 地獄 options, and only changed the palette value for these colors to white or gray, depending on the correct selection. This is another one of the 📝 rare cases where TH01 demonstrates good use of PC-98 hardware, as the 魔界へ and 地獄へ strings don't need to be reblitted during the selection process, only the Orb "cursor" does.

Then, why does this still not count as good-code? When changing palette colors, you kinda need to be aware of everything else that can possibly be on screen, which colors are used there, and which aren't and can therefore be used for such an effect without affecting other sprites. In this case, well… hover over the image below, and notice how Reimu's hair and the bomb sprites in the HUD light up when Makai is selected:

Demonstration of palette changes in TH01's route selection

This push did end on a high note though, with the generic, non-SinGyoku version of the defeat animation being an easily parametrizable copy. And that's how you decompile another 2.58% of TH01 in just slightly over three pushes.

Now, we're not only ready to decompile Sariel, but also Kikuri, Elis, and SinGyoku without needing any more detours into non-boss code. Thanks to the current TH01 funding subscriptions, I can plan to cover most, if not all, of Sariel in a single push series, but the currently 3 pending pushes probably won't suffice for Sariel's 8.10% of all remaining code in TH01. We've got quite a lot of not specifically TH01-related funds in the backlog to pass the time though.

Due to recent developments, it actually makes quite a lot of sense to take a break from TH01: spaztron64 has managed what every Touhou download site so far has failed to do: Bundling all 5 game onto a single .HDI together with pre-configured PC-98 emulators and a nice boot menu, and hosting the resulting package on a proper website. While this first release is already quite good (and much better than my attempt from 2014), there is still a bit of room for improvement to be gained from specific ReC98 research. Next up, therefore:

Researching how TH04 and TH05 use EMS memory, together with the cause behind TH04's crash in Stage 5 when playing as Reimu without an EMS driver loaded, and
reverse-engineering TH03's score data file format (YUME.NEM), which hopefully also comes with a way of building a file that unlocks all characters without any high scores.

📝 Posted:: 2021-03-20 20:19 UTC
🚚 Summary of:: P0135, P0136
⌨ Commits:: a6eed55...252c13d, 252c13d...07bfcf2
💰 Funded by:: [Anonymous]
🏷 Tags:

Alright, no more big code maintenance tasks that absolutely need to be done right now. Time to really focus on parts 6 and 7 of repaying technical debt, right? Except that we don't get to speed up just yet, as TH05's barely decompilable PMD file loading function is rather… complicated.
Fun fact: Whenever I see an unusual sequence of x86 instructions in PC-98 Touhou, I first consult the disassembly of Wolfenstein 3D. That game was originally compiled with the quite similar Borland C++ 3.0, so it's quite helpful to compare its ASM to the officially released source code. If I find the instructions in question, they mostly come from that game's ASM code, leading to the amusing realization that "even John Carmack was unable to get these instructions out of this compiler" This time though, Wolfenstein 3D did point me to Borland's intrinsics for common C functions like memcpy() and strchr(), available via #pragma intrinsic. Bu~t those unfortunately still generate worse code than what ZUN micro-optimized here. Commenting how these sequences of instructions should look in C is unfortunately all I could do here.
The conditional branches in this function did compile quite nicely though, clarifying the control flow, and clearly exposing a ZUN bug: TH05's snd_load() will hang in an infinite loop when trying to load a non-existing -86 BGM file (with a .M2 extension) if the corresponding -26 BGM file (with a .M extension) doesn't exist either.

Unsurprisingly, the PMD channel monitoring code in TH05's Music Room remains undecompilable outside the two most "high-level" initialization and rendering functions. And it's not because there's data in the middle of the code segment – that would have actually been possible with some #pragmas to ensure that the data and code segments have the same name. As soon as the SI and DI registers are referenced anywhere, Turbo C++ insists on emitting prolog code to save these on the stack at the beginning of the function, and epilog code to restore them from there before returning. Found that out in September 2019, and confirmed that there's no way around it. All the small helper functions here are quite simply too optimized, throwing away any concern for such safety measures. 🤷
Oh well, the two functions that were decompilable at least indicate that I do try.

Within that same 6th push though, we've finally reached the one function in TH05 that was blocking further progress in TH04, allowing that game to finally catch up with the others in terms of separated translation units. Feels good to finally delete more of those .ASM files we've decompiled a while ago… finally!

But since that was just getting started, the most satisfying development in both of these pushes actually came from some more experiments with macros and inline functions for near-ASM code. By adding "unused" dummy parameters for all relevant registers, the exact input registers are made more explicit, which might help future port authors who then maybe wouldn't have to look them up in an x86 instruction reference quite as often. At its best, this even allows us to declare certain functions with the __fastcall convention and express their parameter lists as regular C, with no additional pseudo-registers or macros required.
As for output registers, Turbo C++'s code generation turns out to be even more amazing than previously thought when it comes to returning pseudo-registers from inline functions. A nice example for how this can improve readability can be found in this piece of TH02 code for polling the PC-98 keyboard state using a BIOS interrupt:

inline uint8_t keygroup_sense(uint8_t group) {
	_AL = group;
	_AH = 0x04;
	geninterrupt(0x18);
	// This turns the output register of this BIOS call into the return value
	// of this function. Surprisingly enough, this does *not* naively generate
	// the `MOV AL, AH` instruction you might expect here!
	return _AH;
}

void input_sense(void)
{
	// As a result, this assignment becomes `_AH = _AH`, which Turbo C++
	// never emits as such, giving us only the three instructions we need.
	_AH = keygroup_sense(8);

	// Whereas this one gives us the one additional `MOV BH, AH` instruction
	// we'd expect, and nothing more.
	_BH = keygroup_sense(7);

	// And now it's obvious what both of these registers contain, from just
	// the assignments above.
	if(_BH & K7_ARROW_UP || _AH & K8_NUM_8) {
		key_det |= INPUT_UP;
	}
	// […]
}

I love it. No inline assembly, as close to idiomatic C code as something like this is going to get, yet still compiling into the minimum possible number of x86 instructions on even a 1994 compiler. This is how I keep this project interesting for myself during chores like these. We might have even reached peak inline already?

And that's 65% of technical debt in the SHARED segment repaid so far. Next up: Two more of these, which might already complete that segment? Finally!

📝 Posted:: 2020-11-03 01:44 UTC
🚚 Summary of:: P0124, P0125
⌨ Commits:: 72dfa09...056b1c7, 056b1c7...f6a3246
💰 Funded by:: Blue Bolt, [Anonymous]
🏷 Tags:

Turns out that TH04's player selection menu is exactly three times as complicated as TH05's. Two screens for character and shot type rather than one, and a way more intricate implementation for saving and restoring the background behind the raised top and left edges of a character picture when moving the cursor between Reimu and Marisa. TH04 decides to backup precisely only the two 256×8 (top) and 8×244 (left) strips behind the edges, indicated in red in the picture below.

Backed-up VRAM area in TH04's player character selection

These take up just 4 KB of heap memory… but require custom blitting functions, and expanding this explicitly hardcoded approach to TH05's 4 characters would have been pretty annoying. So, rather than, uh, not explicitly hardcoding it all, ZUN decided to just be lazy with the backup area in TH05, saving the entire 640×400 screen, and thus spending 128 KB of heap memory on this rather simple selection shadow effect.

So, this really wasn't something to quickly get done during the first half of a push, even after already having done TH05's equivalent of this menu. But since life is very busy right now, I also used the occasion to start addressing another code organization annoyance: master.lib's single master.h header file.

Now that ReC98 is trying to develop (or at least mimic) a more type-safe C++ foundation to model the PC-98 hardware, a pure C header (with counter-productive C++ extensions) is becoming increasingly unidiomatic. By moving some of the original assumptions about function parameters into the type system, we can also reduce the reliance on its Japanese-only documentation without having to translate it
It's far from complete with regards to providing compile-time PC-98 hardware constants and helpful types. In fact, we started to add these in our own header files a long time ago.
It's quite bloated, with at least 2800 lines of code that currently are #included into the vast majority of files, not counting master.h's recursively included C standard library headers. PC-98 Touhou only makes direct use of a rather small fraction of its contents.
And finally, all the DOS/V compatibility definitions are especially useless in the context of ReC98. As I've noted 📝 time and 📝 time again, porting PC-98 Touhou to IBM-compatible DOS won't be easy, and MASTER_DOSV won't be helping much. Therefore, my upstream version of ReC98 will never include all of master.lib. There's no point in lengthening compile times for everyone by default, and those will be getting quite noticeable after moving to a full 16-bit build process.
(Actually, what retro system ports should rather be doing: Get rid of master.lib's original ASM code, replace it with readable, modern C++, and then simply convert the optimized assembly output of modern compilers to your ISA of choice. Improving the landscape of such assembly or object file converters would benefit everyone!)

So, time to start a new master.hpp header that would contain just the declarations from master.h that PC-98 Touhou actually needs, plus some semantic (yes, semantic) sugar. Comparing just the old master.h to just the new master.hpp after roughly 60% of the transition has been completed, we get median build times of 319 ms for master.h, and 144 ms for master.hpp on my (admittedly rather slow) DOSBox setup. Nice!
As of this push, ReC98 consists of 107 translation units that have to be compiled with Turbo C++ 4.0J. Fully rebuilding all of these currently takes roughly 37.5 seconds in DOSBox. After the transition to master.hpp is done, we could therefore shave some 10 to 15 seconds off this time, simply by switching header files. And that's just the beginning, as this will also pave the way for further #include optimizations. Life in this codebase will be great!

Unfortunately, there wasn't enough time to repay some of the actual technical debt I was looking forward to, after all of this. Oh well, at least we now also have nice identifiers for the three different boldface options that are used when rendering text to VRAM, after procrastinating that issue for almost 11 months. Next up, assuming the existing subscriptions: More ridiculous decompilations of things that definitely weren't originally written in C, and a big blocker in TH03's MAIN.EXE.

📝 Posted:: 2020-09-21 13:01 UTC
🚚 Summary of:: P0119
⌨ Commits:: cbf14eb...453dd3c
💰 Funded by:: [Anonymous], -Tom-
🏷 Tags:

So, TH05 OP.EXE. The first half of this push started out nicely, with an easy decompilation of the entire player character selection menu. Typical ZUN quality, with not much to say about it. While the overall function structure is identical to its TH04 counterpart, the two games only really share small snippets inside these functions, and do need to be RE'd separately.

The high score viewing (not registration) menu would have been next. Unfortunately, it calls one of the GENSOU.SCR loading functions… which are all a complete mess that still needed to be sorted out first. 5 distinct functions in 6 binaries, and of course TH05 also micro-optimized its MAIN.EXE version to directly use the DOS INT 21h file loading API instead of master.lib's wrappers. Could have all been avoided with a single method on the score data structure, taking a player character ID and a difficulty level as parameters…

So, no score menu in this push then. Looking at the other end of the ASM code though, we find the starting functions for the main game, the Extra Stage, and the demo replays, which did fit perfectly to round out this push.

Which is where we find an easter egg! 🥚 If you've ever looked into 怪綺談2.DAT, you might have noticed 6 .REC files with replays for the Demo Play mode. However, the game only ever seems to cycle between 4 replays. So what's in the other two, and why are they 40 KB instead of just 10 KB like the others? Turns out that they combine into a full Extra Stage Clear replay with Mima, with 3 bombs and 1 death, obviously recorded by ZUN himself. The split into two files for the stage (DEMO4.REC) and boss (DEMO5.REC) portion is merely an attempt to limit the amount of simultaneously allocated heap memory.
To watch this replay without modding the game, unlock the Extra Stage with all 4 characters, then hold both the ⬅️ left and ➡️ right arrow keys in the main menu while waiting for the usual demo replay. I can't possibly be the first one to discover this, but I couldn't find any other mention of it.
Edit (2021-03-15): ZUN did in fact document this replay in Section 6 of TH05's OMAKE.TXT, along with the exact method to view it. Thanks to Popfan for the discovery!

Here's a recording of the whole replay:

Note how the boss dialogue is skipped. MAIN.EXE actually contains no less than 6 if() branches just to distinguish this overly long replay from the regular ones.

I'd really like to do the TH04 and TH05 main menus in parallel, since we can expect a bit more shared code after all the initial differences. Therefore, I'm going to put the next "anything" push towards covering the TH04 version of those functions. Next up though, it's back to TH01, with more redundant image format code…

📝 Posted:: 2020-09-12 10:09 UTC
🚚 Summary of:: P0115, P0116
⌨ Commits:: 967bb8b...e5328a3, e5328a3...03048c3
💰 Funded by:: Lmocinemod, Blue Bolt, [Anonymous]
🏷 Tags:

Finally, after a long while, we've got two pushes with barely anything to talk about! Continuing the road towards 100% PI for TH05, these were exactly the two pushes that TH05 MAINE.EXE PI was estimated to additionally cost, relative to TH04's. Consequently, they mostly went to TH05's unique data structures in the ending cutscenes, the score name registration menu, and the staff roll.

A unique feature in there is TH05's support for automatic text color changes in its ending scripts, based on the first full-width Shift-JIS codepoint in a line. The \c=codepoint,color commands at the top of the _ED??.TXT set up exactly this codepoint→color mapping. As far as I can tell, TH05 is the only Touhou game with a feature like this – even the Windows Touhou games went back to manually spelling out each color change.

The orb particles in TH05's staff roll also try to be a bit unique by using 32-bit X and Y subpixel variables for their current position. With still just 4 fractional bits, I can't really tell yet whether the extended range was actually necessary. Maybe due to how the "camera scrolling" through "space" was implemented? All other entities were pretty much the usual fare, though.
12.4, 4.4, and now a 28.4 fixed-point format… yup, 📝 C++ templates were definitely the right choice.

At the end of its staff roll, TH05 not only displays the usual performance verdict, but then scrolls in the scores at the end of each stage before switching to the high score menu. The simplest way to smoothly scroll between two full screens on a PC-98 involves a separate bitmap… which is exactly what TH05 does here, reserving 28,160 bytes of its global data segment for just one overly large monochrome 320×704 bitmap where both the screens are rendered to. That's… one benefit of splitting your game into multiple executables, I guess?
Not sure if it's common knowledge that you can actually scroll back and forth between the two screens with the Up and Down keys before moving to the score menu. I surely didn't know that before. But it makes sense – might as well get the most out of that memory.

The necessary groundwork for all of this may have actually made TH04's (yes, TH04's) MAINE.EXE technically position-independent. Didn't quite reach the same goal for TH05's – but what we did reach is ⅔ of all PC-98 Touhou code now being position-independent! Next up: Celebrating even more milestones, as -Tom- is about to finish development on his TH05 MAIN.EXE PI demo…

📝 Posted:: 2020-05-25 14:05 UTC
🚚 Summary of:: P0092, P0093, P0094
⌨ Commits:: 29c5a73...4403308, 4403308...0e73029, 0e73029...57a8487
💰 Funded by:: Yanga, Ember2528
🏷 Tags:

Three pushes to decompile the TH01 high score menu… because it's completely terrible, and needlessly complicated in pretty much every aspect:

Another, final set of differences between the REIIDEN.EXE and FUUIN.EXE versions of the code. Which are so insignificant that it must mean that ZUN kept this code in two separate, manually and imperfectly synced files. The REIIDEN.EXE version, only shown when game-overing, automatically jumps to the enter/終 button after the 8th character was entered, and also has a completely invisible timeout that force-enters a high score name after 1000… key presses? Not frames? Why. Like, how do you even realistically such a number. (Best guess: It's a hidden easter egg to amuse players who place drinking glasses on cursor keys. Or beer bottles.)
That's all the differences that are maybe visible if you squint hard enough. On top of that though, we got a bunch of further, minor code organization differences that serve no purpose other than to waste decompilation time, and certainly did their part in stretching this out to 3 pushes instead of 2.

Entered names are restricted to a set of 16-bit, full-width Shift-JIS codepoints, yet are still accessed as 8-bit byte arrays everywhere. This bloats both the C++ and generated ASM code with needless byte splits, swaps, and bit shifts. Same for the route kanji. You have this 16-, heck, even 32-bit CPU, why not use it?! (Fun fact: FUUIN.EXE is explicitly compiled for a 80186, for the most part – unlike REIIDEN.EXE, which does use Turbo C++'s 80386 mode.)

The sensible way of storing the current position of the alphabet cursor would simply be two variables, indicating the logical row and column inside the character map. When rendering, you'd then transform these into screen space. This can keep the on-screen position constants in a single place of code.
TH01 does the opposite: The selected character is stored directly in terms of its on-screen position, which is then mapped back to a character index for every processed input and the subsequent screen update. There's no notion of a logical row or column anywhere, and consequently, the position constants are vomited all over the code.

Which might not be as bad if the character map had a uniform grid structure, with no gaps. But the one in TH01 looks like this:
And with no sense of abstraction anywhere, both input handling and rendering end up with a separate if branch for at least 4 of the 6 rows.

In the end, I just gave up with my usual redundancy reduction efforts for this one. Anyone wanting to change TH01's high score name entering code would be better off just rewriting the entire thing properly.

And that's all of the shared code in TH01! Both OP.EXE and FUUIN.EXE are now only missing the actual main menu and ending code, respectively. Next up, though: The long awaited TH01 PI push. Which will not only deliver 100% PI for OP.EXE and FUUIN.EXE, but also probably quite some gains in REIIDEN.EXE. With now over 30% of the game decompiled, it's about time we get to look at some gameplay code!

📝 Posted:: 2020-05-12 14:07 UTC
🚚 Summary of:: P0090, P0091
⌨ Commits:: 90252cc...07dab29, 07dab29...29c5a73
💰 Funded by:: Yanga, Ember2528
🏷 Tags:

Back to TH01, and its high score menu… oh, wait, that one will eventually involve keyboard input. And thanks to the generous TH01 funding situation, there's really no reason not to cover that right now. After all, TH01 is the last game where input still hadn't been RE'd.
But first, let's also cover that one unused blitting function, together with REIIDEN.CFG loading and saving, which are in front of the input function in OP.EXE… (By now, we all know about the hidden start bomb configuration, right?)

Unsurprisingly, the earliest game also implements input in the messiest way, with a different function for each of the three executables. "Because they all react differently to keyboard inputs ", apparently? OP.EXE even has two functions for it, one for the START / CONTINUE / OPTION / QUIT main menu, and one for both Option and Music Test menus, both of which directly perform the ring arithmetic on the menu cursor variable. A consistent separation of keyboard polling from input processing apparently wasn't all too obvious of a thought, since it's only truly done from TH02 on.

This lack of proper architecture becomes actually hilarious once you notice that it did in fact facilitate a recursion bug! In case you've been living under a rock for the past 8 years, TH01 shipped with debugging features, which you can enter by running the game via game d from the DOS prompt. These features include a memory info screen, shown when pressing PgUp, implemented as one blocking function (test_mem()) called directly in response to the pressed key inside the polling function. test_mem() only returns once that screen is left by pressing PgDown. And in order to poll input… it directly calls back into the same polling function that called it in the first place, after a 3-frame delay.

Which means that this screen is actually re-entered for every 3 frames that the PgUp key is being held. And yes, you can, of course, also crash the system via a stack overflow this way by holding down PgUp for a few seconds, if that's your thing.
Edit (2020-09-17): Here's a video from spaztron64, showing off this exact stack overflow crash while running under the VEM486 memory manager, which displays additional information about these sorts of crashes:

What makes this even funnier is that the code actually tracks the last state of every polled key, to prevent exactly that sort of bug. But the copy-pasted assignment of the last input state is only done after test_mem() already returned, making it effectively pointless for PgUp. It does work as intended for PgDown… and that's why you have to actually press and release this key once for every call to test_mem() in order to actually get back into the game. Even though a single call to PgDown will already show the game screen again.

In maybe more relevant news though, this function also came with what can be considered the first piece of actual gameplay logic! Bombing via double-tapping the Z and X keys is also handled here, and now we know that both keys simply have to be tapped twice within a window of 20 frames. They are tracked independently from each other, so you don't necessarily have to press them simultaneously.
In debug mode, the bomb count tracks precisely this window of time. That's why it only resets back to 0 when pressing Z or X if it's ≥20.

Sure, TH01's code is expectedly terrible and messy. But compared to the micro-optimizations of TH04 and TH05, it's an absolute joy to work on, and opening all these ZUN bug loot boxes is just the icing on the cake. Looking forward to more of the high score menu in the next pushes!

📝 Posted:: 2019-12-29 20:23 UTC
🚚 Summary of:: P0064
⌨ Commits:: 80cec5b...9faa29a
💰 Funded by:: Touhou Patch Center
🏷 Tags:

🎉 TH04's and TH05's OP.EXE are now fully position-independent! 🎉

What does this mean?

You can now add any data or code to the main menus of the two games, by simply editing the ReC98 source, writing your mod in ASM or C/C++, and recompiling the code. Since all absolute memory addresses have now been converted to labels, this will work without causing any instability. See the position independence section in the FAQ for a more thorough explanation about why this was a problem.

What does this not mean?

The original ZUN code hasn't been completely reverse-engineered yet, let alone decompiled. Pretty much all of that is still ASM, which might make modding a bit inconvenient right now.

Since this push was otherwise pretty unremarkable, I made a video demonstrating a few basic things you can do with this:

Now, what to do for the last outstanding Touhou Patch Center push? Bullets, or resident structures?

⮜ Blog

⮜ List of tags

Showing all posts tagged menu-

ssg_04.mid

ssg_05.mid

ssg_10.mid

ssg_04.mid

ssg_05.mid

ssg_10.mid

What does this mean?

What does this not mean?

Showing all posts tagged

`ssg_04.mid`

`ssg_05.mid`

`ssg_10.mid`

`ssg_04.mid`

`ssg_05.mid`

`ssg_10.mid`