🚚 Summary of: P0266, P0267, P0268, P0269, P0270, P0271, P0272, P0273, P0274, P0275, P0276, P0277
Commits: (mly) eb2b0c8...b356884, (mly) b356884...1c70db0, (mly) 1c70db0...db0c195, (BGM packs) 2f9bce5...45087c2, (Seihou) P0256...a9ca081, (Seihou) a9ca081...8db918f, (Seihou) 8db918f...3de48ab, (Seihou) 3de48ab...9467705, (Seihou) 9467705...241a6c9, (Seihou) 241a6c9...P0275, (Seihou) dbc369f...883ac40, (Seihou) 883ac40...6ac72f3
💰 Funded by: Ember2528, [Anonymous]
📝 Over two years since the previous largest delivery, we've now got a new record in every regard: 12 pushes across 5 repos, 215 commits, and a blog post with over 14,000 words and 48 pieces of media. 😱 Who would have thought that the superficially simple task of putting SC-88Pro recordings into Shuusou Gyoku would actually mainly focus on deep research into the underlying MIDI files? I don't typically cover much music-related content because it's a non-issue as far as PC-98 Touhou code is concerned, so it's quite fitting how extensive this one turned out. So here we go, the result of virtually unlimited funding and patience:

  1. The SC-88Pro recording controversy
  2. Undefined SysEx behavior
  3. Resolving the controversy, and making a choice (contains personal opinion)
  4. A Unix-style command-line MIDI filter (in Rust BTW)
  5. Visualizing MIDI files (for science, and not for playing them on a keyboard)
  6. Shuusou Gyoku's individual loop quirks 🎺
  7. Rewriting pbg's MIDI code
  8. Putting together the BGM packs
  9. Outgrowing miniaudio (and raging about single-file C libraries for a while)
  10. Remaining implementation details
  11. Pricing changes (and no, not everything's getting more expensive)

So where's the controversy? Romantique Tp obviously made the best and most careful real-hardware SC-88Pro recordings of all of ZUN's old MIDIs, including the original (OST) and arranged (AST) soundtrack of Shuusou Gyoku, right? Surely all I have to do now is to cut them into seamless loops to save a bit of disk space, and then put them into the game? Let's start at the end of the track list with the name registration theme, since it's light on instruments and has an obvious loop point that will be easy to spot in the waveform. But, um… wait a moment, that very first drum note comes a bit late, doesn't it?

This can also be heard in Romantique Tp's YouTube upload.
At a notated tempo of 96 BPM, these first four beats should take exactly 2.5 seconds, which they do in this seamlessly looping softsynth rendering.

That's… not quite the accuracy and perfection I was expecting. :thonk: But I think I know what we're seeing and hearing there. Let's look at the first few MIDI events on the drum channel:

Delta	Pulse	 Beat	Channel	Event
 +540	   960	  2:000	      1	Controller { CC   0, value   0 }
   +0	   960	  2:000	      1	Controller { CC  32, value   0 }
   +0	   960	  2:000	      1	ProgramChange {  37 }
[…]
   +0	   960	  2:000	      2	Controller { CC   0, value   0 }
   +0	   960	  2:000	      2	Controller { CC  32, value   0 }
   +0	   960	  2:000	      2	ProgramChange {  19 }
[…]
   +0	   960	  2:000	      3	Controller { CC   0, value   0 }
   +0	   960	  2:000	      3	Controller { CC  32, value   0 }
   +0	   960	  2:000	      3	ProgramChange {   6 }
[…]
   +0	   960	  2:000	      4	Controller { CC   0, value   0 }
   +0	   960	  2:000	      4	Controller { CC  32, value   0 }
   +0	   960	  2:000	      4	ProgramChange {   2 }
[…]
Delta	Pulse	 Beat	Channel	Event
   +0	  960	2:000	     10	Controller { CC   0, value   0 }
   +0	  960	2:000	     10	Controller { CC  32, value   0 }
   +0	  960	2:000	     10	ProgramChange {  25 }
   +0	  960	2:000	     10	Controller { CC   7, value 127 }
   +0	  960	2:000	     10	Controller { CC  11, value 127 }
   +0	  960	2:000	     10	Controller { CC  10, value  64 }
   +0	  960	2:000	     10	Controller { CC  91, value  80 }
   +0	  960	2:000	     10	Controller { CC  93, value  40 }
   +0	  960	2:000	     10	NoteOn { Key  42, Vel.  94 }
   +0	  960	2:000	     10	NoteOn { Key  36, Vel. 110 }
   +1	  961	2:001	     10	NoteOn { Key  42, Vel.   0 }
   +0	  961	2:001	     10	NoteOn { Key  36, Vel.   0 }
 +119	 1080	2:120	     10	NoteOn { Key  42, Vel.  34 }
   +1	 1081	2:121	     10	NoteOn { Key  42, Vel.   0 }
 +119	 1200	2:240	     10	NoteOn { Key  42, Vel.  64 }
   +0	 1200	2:240	     10	NoteOn { Key  36, Vel.  64 }
Also, the fact that GS doesn't put its drums on a non-general voice bank and instead relies on external channel configuration to differentiate drums from pitched instruments is making this Yamaha kid uncontrollably furious. 🤬

Yup. That's the sound of a vintage hardware synth being slow and taking a two-digit number of milliseconds to process a barrage of simultaneous Program Change messages, playing a MIDI file that doesn't take this reality into account and expects program changes to happen instantly.
I can only speak from my own experience of writing MIDIs for hardware synths here, but having the first note displaced by 50 ms is very much not the way a composer would have intended the music to be heard if the note is clearly notated to occur on the beat. If you had told me about such an issue when playing one of my MIDIs on a certain synth, I would have thanked you for the bug report! And I would have promptly released a fixed version of the MIDI with the Program Change events moved back by a beat or two. In the case of Shuusou Gyoku's MIDIs, this wouldn't even have added any additional delay in-game, as all of these files already start with at least one beat of leading silence to make room for setting Roland-specific synth parameters.

OK, but that's just a single isolated bass drum hit. If we wanted to, we could even fix this issue ourselves by splicing the same note from around the loop end point. Maybe this is just an isolated case and the rest of Romantique Tp's recordings are fine? Well…

Again, check Romantique Tp's YouTube upload for proof.
By the way, this seamless audio player is what consumed most of the two website pushes this time. The rest went to the slightly redesigned main page, whose progress bars now use the cap bar style and the GitHub badge colors.

This one is even worse. Here, the delay is so long relative to the tempo of the piece that the intended five drum hits pretty much turn into four.

This type of issue doesn't even have to be isolated to the very beginning of a piece. A few of the tracks in both the OST and AST start with an anacrusis on just one or two channels and leave the Program Change event barrage at the beginning of the first full measure. In 幻想科学 ~ Doll's Phantom for example, this creates a flam-like glitch where the bass on channel 2 is pretty much on time, but the crash hit on channel 10 only follows 50 ms later, after the SC-88Pro took its sweet time to process all the Program Change events on the channels between:

This is from the arranged soundtrack for a change. In that one, ZUN at least fixed the issue in the final three MIDIs (シルクロードアリス, 魔女達の舞踏会, and 二色蓮花蝶 ~ Ancients) that closed out this rearranging project in May 2001, which spread out their per-channel setup events over at least a single measure before playing any note.

Let's listen to that at half speed:

Romantique Tp's YouTube upload.
Still on point.

Sure, all of this is barely noticeable in casual listening, but very noticeable if you're the one who now has to cut these recordings into seamless loops. And these are just the most obvious timing issues that can be easily pinpointed and documented – the actual worst aspects are all the minor tempo and timing fluctuations throughout most of the pieces. With recordings that deviate ever so slightly from the tempo defined in the MIDI files, you can no longer rely on mathematically exact sample positions when cutting loops. Even if those positions do work out from time to time, there'd pretty much always be a discontinuity in the waveform at both ends of the loop, manifesting as a clearly audible click. In the end, the only way of finding good loop points in existing recordings involves straining your ears and listening very, very closely to avoid any audible glitches. 😩
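
To make the idea of "mathematically exact sample positions" more concrete, here is a minimal Rust sketch of the pulse-to-sample conversion that a calculated cut would rely on. `TempoChange`, `pulse_to_seconds`, and `pulse_to_sample` are names made up for this illustration. The 480 pulses per quarter note are only inferred from the event dump above, where pulse 960 lands on beat 2:000, and the tempo map is assumed to be sorted by pulse.

```rust
/// A tempo change as found in a Set Tempo meta event (hypothetical helper type).
pub struct TempoChange {
    /// Absolute position of the change, in MIDI pulses.
    pub pulse: u64,
    /// Length of a quarter note from this point on, in microseconds.
    pub us_per_quarter: u32,
}

/// Converts an absolute pulse position to seconds.
/// Assumes `tempo_map` is sorted by pulse. Before the first tempo event,
/// the Standard MIDI File default of 120 BPM (500,000 µs per quarter) applies.
pub fn pulse_to_seconds(pulse: u64, ppqn: u32, tempo_map: &[TempoChange]) -> f64 {
    let mut seconds = 0.0;
    let mut prev_pulse = 0u64;
    let mut us_per_quarter = 500_000.0;
    for change in tempo_map {
        if change.pulse >= pulse {
            break;
        }
        // Add up the time spent in the tempo segment that ends at this change.
        seconds += (change.pulse - prev_pulse) as f64 * us_per_quarter / f64::from(ppqn) / 1e6;
        prev_pulse = change.pulse;
        us_per_quarter = f64::from(change.us_per_quarter);
    }
    // Remaining pulses within the final tempo segment.
    seconds + (pulse - prev_pulse) as f64 * us_per_quarter / f64::from(ppqn) / 1e6
}

/// Converts an absolute pulse position to a fractional sample position.
pub fn pulse_to_sample(pulse: u64, ppqn: u32, tempo_map: &[TempoChange], sample_rate: u32) -> f64 {
    pulse_to_seconds(pulse, ppqn, tempo_map) * f64::from(sample_rate)
}
```

With a single tempo of 96 BPM (625,000 µs per quarter note), four beats or 1,920 pulses come out at 2.5 seconds, which would be sample 110,250 at an assumed rate of 44,100 Hz. A recording that drifts away from such a calculated position is exactly what makes purely calculated cuts unreliable.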

But if you've taken a look at the second tabs in the clips above, you will have noticed that we don't necessarily have to be stuck with recordings from real hardware. In late 2015, Roland released Sound Canvas VA, a VST plugin that emulates the classic core of Roland's old Sound Canvas lineup, including the SC-88Pro. As long as we run such a software synthesizer through a quality VST host, a purely software-based solution should be way superior for recording looped BGM:

Any drawbacks? For our use case, all of them are found in the abysmal software quality of everything around the synth engine. As is typical for the VST industry, Sound Canvas VA is excessively DRM'd – it takes multiple seconds to start up, and even then only allows a single process to run at any given time, immediately quitting every process beyond the first one with a misleading Parameter File1 Read Error message box. I totally believe anyone who claims that this makes SCVA more annoying than real hardware when composing new music. Retro gamers also dislike how Roland themselves no longer sell the 32-bit builds they used to offer for the first few versions. These old versions are now exclusively available through resellers, or on the seven seas.
But as far as the SC-88Pro emulation is concerned, there don't seem to be any technical reasons against it. There is a long thread over at VOGONS discussing all sorts of issues, but you have to dig quite deep to find any clear descriptions of bugs in SCVA's synth engine. Everything I found either only applies to the SC-55 emulation and not the SC-88Pro, was fixed by Roland in the meantime, or turned out to be a fixable bug in a MIDI file.

Nevertheless, Romantique Tp has a very negative opinion about SCVA, getting quite angry and defensive in this instance where someone favorably compared SCVA to their recordings. Edit (2024-03-10): These days, Romantique Tp has a much more favorable opinion on SCVA as well.
8 years after their release, however, the community unanimously accepts the Romantique Tp recordings as the intended way to listen to ZUN's old MIDIs, so choosing Sound Canvas VA for our Shuusou Gyoku builds might be a bad idea purely for PR reasons. At best, people would slightly wonder why I intentionally went with the opposite of the accepted reference recordings, but at worst, this entire project could face a violent backlash…


But wait, we've already heard one obvious difference between the real SC-88Pro and Sound Canvas VA. Let's listen to the very first clip again:

Ha! You can clearly hear a panning echo in the real-hardware recording that is missing from the Sound Canvas VA rendering. That's an obvious case of a core system effect not being reproduced correctly. If even that's undeniably broken, who knows which other subtle bugs SCVA suffers from, right? Case closed, Romantique Tp was right all along, SCVA is trash, real hardware reigns supreme :godzun:

Actually, let's look closer into this one. Panning delay effects like this are typically reverb-related, but General MIDI only defines a single controller for the per-channel reverb level, ranging from 0 to 127. Any specific characteristics of the reverb therefore have to be configured using vendor-specific system-exclusive messages, or SysEx for short.
So it's down to one of the four SysEx messages at the beginning of the MIDI file:

Delta	Pulse	 Beat	Event
   +0	    0	0:000	SysEx(41 10 42 12 40 00 7F 00 41 F7)
 +240	  240	0:240	SysEx(41 10 42 12 40 01 30 14 7B F7)
 +120	  360	0:360	SysEx(41 10 42 12 40 01 33 0F 7D F7)
  +60	  420	0:420	SysEx(41 10 42 12 40 01 34 30 5B F7)

Since these byte strings represent Roland-specific instructions, we can't learn anything from a raw MIDI event dump alone here. No problem though, let's just load these files into some old MIDI sequencer that targeted Roland synths, open its MIDI event list, and then they will be automatically decoded into a human-readable representation…
…or at least that's what I expected. In Yamaha land, XGworks has done that for Yamaha's own XG SysEx messages ever since 1997:

Screenshot of the MIDI Event Viewer in Yamaha's XGworks, showing off its automatic XG SysEx decoding feature.
No configuration required. You can even edit the textual Value1 representation and XGworks parses it back into the closest supported value!

But for Roland synths, there's… nothing similar? Seriously? 😶 Roland fanboys, how do you even live?! I mean, they are quick to recommend the typical bloated and sluggish big-name DAWs that take up multiple gigabytes of disk space, but none of the ones I tried seemed to have this feature. They can't have possibly been flinging around raw byte strings for the past 33 years?!
But once you look more into today's MIDI community, it becomes clear that this is exactly what they've been doing. Why else would so many people use the word complicated to describe Roland SysEx, or call it an old school/cryptic communication protocol in hexadecimal format? The latter is particularly hilarious because if you removed the word cryptic, this might as well describe all of MIDI, not just SysEx. :tannedcirno: Everything about this is a tooling issue, and Yamaha showed how easily it could have been solved. Instead, we get Sound Canvas experts, who should know more about the ecosystem than I do, making the incredible mental leap from "my DAW doesn't decode or easily generate SysEx" to "SysEx is antiquated" to "please just lift up these settings to the VST level and into my proprietary DAW's proprietary project format, that would be so much better"

Thankfully that's not entirely true. After some more digging and configuration, I found a somewhat workable solution involving a comparatively modern sequencer called Domino:

  1. Download either Domino's original Japanese version or the partial English translation. The .zip file on the release page contains a full standalone build.
  2. Open the File → Preferences menu and associate your MIDI output device with a module map. This makes sense for SysEx encoding/generation since it can limit the options in the UI to what's actually available on your target hardware, but is also required for loading the respective SysEx map into Domino's SysEx decoder. There is no technical reason for this because SC-88Pro SysEx messages can be uniquely identified by the three vendor, device, and model ID bytes that every message starts with, but that would be too easy and user-friendly. The perception of SysEx being a black art must be upheld at all costs.
    Screenshot of Domino's MIDI-OUT window, complete with garbled text
    I've kept the garbled text of the partial translation to emphasize the sheer amount of jank involved in this entire process.
  3. Load a MIDI file and let Domino "analyze" it:
    Screenshot of Domino's analysis message box
  4. Strangely enough, this will take quite a while – on my system, this analysis step runs at a speed of roughly 4.25 KB/s of MIDI data. Yes, kilobytes.
  5. Unfortunately, "control change macro restoration" also seems to mean that you don't get to see any raw bytes when selecting the respective MIDI track in the UI, but at least we get what we were looking for:
    Screenshot of the four SysEx messages of タイトルドメイド, Shuusou Gyoku's name registration theme, as decoded by Domino
    …for the most part?
    Pulse	Event
        0	SysEx(41 10 42 12 40 00 7F 00 41 F7)
      240	SysEx(41 10 42 12 40 01 30 14 7B F7)
      360	SysEx(41 10 42 12 40 01 33 0F 7D F7)
      420	SysEx(41 10 42 12 40 01 34 30 5B F7)
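
As a rough illustration of the point from step 2, the leading bytes alone are enough to recognize these messages as Roland GS data, with no module map involved. The sketch below assumes the notation of the dumps above, where the leading `F0` status byte is omitted and the trailing `F7` is included. `GsDataSet` and `parse_gs_data_set` are hypothetical names, and only the "Data Set 1" command (`12`) that all four messages use is handled.

```rust
/// A decoded Roland GS "Data Set 1" (DT1) message (hypothetical helper type).
#[derive(Debug, PartialEq, Eq)]
pub struct GsDataSet<'a> {
    pub device_id: u8,
    pub address: [u8; 3],
    pub data: &'a [u8],
    pub checksum: u8,
}

/// Parses a message body as printed in the event dumps:
/// vendor, device, model, command, 3 address bytes, data, checksum, F7.
/// Returns `None` for anything that is not a Roland GS DT1 message.
pub fn parse_gs_data_set(body: &[u8]) -> Option<GsDataSet<'_>> {
    const VENDOR_ROLAND: u8 = 0x41;
    const MODEL_GS: u8 = 0x42;
    const COMMAND_DT1: u8 = 0x12;
    const END_OF_SYSEX: u8 = 0xF7;

    // 4 header bytes + 3 address bytes + at least 1 data byte + checksum + F7.
    if body.len() < 10 || body[body.len() - 1] != END_OF_SYSEX {
        return None;
    }
    if body[0] != VENDOR_ROLAND || body[2] != MODEL_GS || body[3] != COMMAND_DT1 {
        return None;
    }
    let payload = &body[4..body.len() - 2];
    Some(GsDataSet {
        device_id: body[1],
        address: [payload[0], payload[1], payload[2]],
        data: &payload[3..],
        checksum: body[body.len() - 2],
    })
}
```

Feeding the second message from the table into `parse_gs_data_set` yields the address `40 01 30` and the single data byte `14`, which is all a decoder needs to look up the parameter in a table taken from the manual.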

Alright, that's something we can work with. The GS Reset message is something that every Roland GS MIDI should start with, but it's immediately followed by a message that Domino failed to decode? The two subsequent reverb parameters make sense, but panning delays typically have more parameters than just a reverb level and time.
That unknown SysEx message shares most of its bytes with the decoded ones though. So let's do what we maybe should have done all along, return to caveman, and check the SC-88Pro manual:

The relevant section from page 194. We can see how the address and value correspond to bytes 5-7 and 8 in the SysEx messages. Byte 9 is a checksum and byte 10 signals the end of the message.

And that's where we find what this particular issue boils down to. The missing SysEx message is clearly intended to be a Reverb Macro command, whose value can range from 0 to 7 inclusive on the SC-88Pro, but ZUN tries to specify Reverb Macro #14h, or 20 in decimal. The SC-88Pro manual does not specify what happens if a SysEx message wants to write an invalid value to a valid address, which means that we've firmly entered the territory of undefined behavior.
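
The byte layout described above can be checked by hand. The following Rust sketch computes the standard Roland checksum over the address and value bytes and flags a Reverb Macro value outside the range of 0 to 7 given above. The function and constant names are made up for this example, and the check only looks at the value range, not at how any synth reacts to it.

```rust
/// Roland checksum: the sum of all address and data bytes plus the checksum
/// must be a multiple of 128.
pub fn roland_checksum(address_and_data: &[u8]) -> u8 {
    let sum: u32 = address_and_data.iter().map(|&b| u32::from(b)).sum();
    ((128 - sum % 128) % 128) as u8
}

/// Address of the Reverb Macro parameter, as seen in the second message.
pub const REVERB_MACRO_ADDRESS: [u8; 3] = [0x40, 0x01, 0x30];

/// Highest Reverb Macro value within the range of 0 to 7.
pub const REVERB_MACRO_MAX: u8 = 7;

/// Returns `true` if the message writes an out-of-range value to the
/// Reverb Macro address.
pub fn is_invalid_reverb_macro(address: [u8; 3], value: u8) -> bool {
    address == REVERB_MACRO_ADDRESS && value > REVERB_MACRO_MAX
}
```

For the message in question, `roland_checksum(&[0x40, 0x01, 0x30, 0x14])` reproduces the `7B` from the dump, so the message itself is well-formed. It is only the value `14` that falls outside the documented range.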
Edit (2024-03-10): Romantique Tp confirmed that the real SC-88Pro clamps these Reverb Macro IDs to the supported range of 0-7. Therefore, the appropriate course of action for guaranteeing the same sound on other Roland synths would be to fix the MIDI file and specify Reverb Macro #7 instead. But since this behavior remains technically undefined, we can still argue about ZUN's intention behind specifying the Reverb Macro like this:

In fact, 32 out of the 39 MIDIs across both of Shuusou Gyoku's soundtracks use this invalid Reverb Macro. The only ones that don't are


And that's where this quest seemed to end, until Romantique Tp themselves came in and suggested that I take a closer look at the GS Advanced Editor, or GSAE for short.

The splash screen of GSAE version 4.01e.
Make sure to connect a MIDI input device before starting GSAE, or it will silently crash immediately after this splash screen. At least it accepts any controller, so this might just be a bug instead of the typical user-hostile kind of hardware dongle DRM that is pervasive in today's synth industry. 1999 would seem a bit too early for that, thankfully.

I was aware of this tool, but hadn't initially considered it because it's always described as just a SysEx generator/encoder. In fact, the very existence of such a tool made no sense to me at first, and seemed to prove my point that the usability of GS SysEx was wholly inferior to what I was used to in Yamaha land. Like, why not build at least a tiny and stripped-down MIDI sequencer around this functionality that would allow you to insert SC-88Pro-specific messages at any point within a sequence, and not just the beginning? I can see the need for such a tool in today's world of closed-source DAWs where hardware MIDI modules are niche and retro and are only kept alive by a small community of enthusiasts. But why would its developers guarantee that MIDI composers would have to hop between programs even back in 1997? I can only imagine that they saw how every just slightly advanced MIDI sequencer or DAW back then already used its own project format instead of raw Standard MIDI Files, and assumed that composers would therefore be program-hopping anyway?
However, GSAE does support the import of settings from a MIDI file and features a SysEx history window that decodes every newly processed Roland SysEx byte string, which is all I was looking for. So let's throw in that same MIDI and…

Screenshot of GSAE's SysEx history window, showing the results of sending a GS Reverb Macro #20 message
That's the result of sending just the single F0 41 10 42 12 40 01 30 14 7B F7 message at the top.

Now that's some wild numbers. An equally invalid Reverb Character, and Reverb Level and Time values that even exceed their defined range of 0-127? Could it be that GSAE emulates the real-hardware response to invalid Reverb Macros here, and gives us the exact reverb setting we can hear in Romantique Tp's recording? This could even be the reason why GSAE is still used and recommended within today's Roland MIDI sequencing scene, and hasn't been supplanted by some more modern open-source tool written by the community.

In any case, these values have to come from somewhere, so let's reverse-engineer GSAE and figure out the logic behind them. Shoutout to IDR for being a great help with its automatic generation of IDC debug symbols for the Delphi standard library, and even including a few names of application-level widget class methods by reading Delphi-specific type information from the binary. This little sub-project made me also come around to appreciating Ghidra, whose decompiler and data type manager helped a lot and allowed me to find the relevant code section within just a few hours.
A~nd it turns out that the values all come from out-of-bounds accesses into arrays on the stack. :onricdennat: If we combine 25, 235, and 132 back into a 32-bit value, we get 0x19EB84, which is the virtual address of the relevant function's stack frame base pointer.
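
The byte arithmetic is easy to verify. The sketch below recombines the three values in the order they are listed above, and contrasts the unchecked array access with a bounds-checked lookup as it would look in Rust. Both function names are hypothetical, and the eight-entry table in the usage note is just a stand-in, since the actual contents of GSAE's arrays are not reproduced here.

```rust
/// Combines three bytes into one value, most significant byte first.
pub fn combine_be(bytes: [u8; 3]) -> u32 {
    (u32::from(bytes[0]) << 16) | (u32::from(bytes[1]) << 8) | u32::from(bytes[2])
}

/// Bounds-checked table lookup: an index past the end of the table yields
/// `None` instead of whatever happens to sit behind the array in memory.
pub fn checked_lookup<T: Copy>(table: &[T], index: usize) -> Option<T> {
    table.get(index).copied()
}
```

`combine_be([25, 235, 132])` evaluates to `0x19EB84`, matching the address above. With a table of eight entries for macros 0 to 7, `checked_lookup(&table, 20)` simply returns `None`, which a tool could then report as an invalid macro instead of displaying stack contents.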
But it gets even more hilarious: If you enable debug text output via Option → Other Options → SMF → Insert text events to setup measures and export these imported settings back into a MIDI file, GSAE not only retains these invalid Reverb Macro IDs, but stringifies them via a simple lookup into a hardcoded string pointer array, again without any bounds checks. The effects of this are roughly what you would expect:

In the end, we have Domino not decoding the Reverb Macro message, and GSAE, the premier SysEx tool for Roland synths, responding to it in even more undefined and clearly bugged ways than real hardware apparently does. That's two programs confirming that whatever ZUN intended was never supposed to work reliably. And while we still don't know exactly what these reverb parameters are supposed to be, these observations solve the mystery as far as I'm concerned, and solidify my personal opinion on the matter.


So what do we do now, and which version do we go with? Optimally, I'd offer both versions and turn this controversy into a personal choice so that everybody wins… and Ember2528 agreed and generously provided all the funding to make it happen. 💸
If you haven't picked your favorite yet, here are some final arguments:

The Romantique Tp recordings certainly have something going for them with their provenance of coming from real hardware, and the care that Romantique Tp put into manually recording every single track, warts and all. I wholeheartedly agree that preserving the raw sound of playing the MIDI files into the hardware without thinking about bugs or quirks is an important angle to take when it comes to preservation. It's good that these recordings exist – after all, you wouldn't know which musical elements you'd possibly be missing in an emulation if you have nothing to compare it to. Even the muffled sound in the half-speed clip above can be an argument in their favor, as the SC-88Pro's DAC operates at 32 kHz and you wouldn't expect any meaningful frequency content between 16,000 and 22,050 Hz to begin with. Any frequency content in that range that does remain in Romantique Tp's recording is simply 📝 rolled-off imaging noise added during the ADC's resampling process.
All this is why they are a definite improvement over kaorin's 2007 recordings of only the AST, which used to be the previous reference recordings within the community. Those had all of the same timing issues and more, in addition to being so excessively volume-boosted that 0.15% of the samples across the entire soundtrack ended up clipped. That's 6.25 seconds out of 68:39m being lost to pure digital noise.

Most importantly though: ZUN himself said that only the real SC-88Pro will play back these files as he intended them to sound. This quote is likely where the tagline of Romantique Tp's entire recording project came from in the first place:

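> 全てのデエタはSC-88ProもしくはSC-8850(ロオランド社)にて最適に聴けるように調整してあります
> それ以外の音源でも、作者の意図した音ではない場合があります。
> (All of the data has been adjusted to sound best on the SC-88Pro or SC-8850 (Roland). With any other sound module, it may not sound the way the author intended.)
> – ZUN on 東方幻想的音楽, his old MIDI page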

However. ZUN is not exactly known for accurately and carefully preserving the legacy of his series, or really doing anything beyond parading his old games as unobtainable showpieces at conventions. With all the issues we've seen, preferring real hardware is ultimately just that: an angle, and a preference. This is why I disagree with the heavy and uncritical advertising that is mainly responsible for elevating the Romantique Tp recordings to their current reference status within the community, especially if at least half of the alleged superiority of real hardware is founded on undefined behavior that can easily be fixed in the MIDI files themselves if people only bothered to look.

Here's where I stand: MIDI files are digital sheet music first and foremost, not an inferior version of tracker modules where the samples are sold separately. As such, the specific synth a MIDI file was written for is merely a secondary property of the composition – and even more so if the MIDI file contains little to nothing in terms of sound design and mostly restricts itself to the basic feature set of General MIDI. In turn, synth quirks and bugs are not a defined part of the composition either, unless they are clearly annotated and documented in the file itself. And most importantly: If the MIDI file specifies a certain timing and a recording fails to reproduce that timing, then that recording is not an accurate representation of the MIDI file.
In that regard, Sound Canvas VA is not only the closest alternative to the real thing, as a few people in the MIDI and retrogaming scene do have to admit, but superior to the real thing. I'll gladly take clarity and perfect timing accuracy in exchange for minor differences in effects, especially if the MIDI file does not explicitly and correctly define said effects to begin with. If I want a panning delay as part of the reverb, I add the respective and correct SysEx message to define one – and if I don't, I do not care about the reverb. You might still get a panning delay on a certain synth, and you might even prefer how it sounds, but it's ultimately a rendering artifact and not a consciously intended part of the composition. In that way, it's similar to the individual flavor a musician adds to a performance of a piece of classical music.
And as far as the differences in frequency response and resonant filters are concerned: In Yamaha land, these are exactly the main distinguishing factors between vintage WF-192XG sound cards (resembling the real SC-88Pro in these characteristics) and the S-YXG50 softsynth (resembling SCVA). Once I found out about that softsynth and how much clearer it sounded in comparison, I sold that old PCI sound card soon after.

In the interest of preservation though, there's still one more unexplored solution that could be the ideal middle ground between the two approaches:

  1. Play the MIDIs through a real-hardware SC-88Pro again
  2. Capture the actually observed system-exclusive settings that fall within the synth's supported and documented ranges
  3. Insert them back into the MIDI file, creating a new bugfixed version
  4. Re-record that bugfixed version through Sound Canvas VA

Edit (2024-03-10): And since Romantique Tp has confirmed what exactly happens on real hardware, I'm going to do exactly that. These bugfixed Sound Canvas VA renderings will be a free bonus of the single next Shuusou Gyoku push, and will add another angle to the preservation of these soundtracks. In the meantime though, the Sound Canvas VA packs will sound like they do in the preview videos above.

Or, you know… Maybe none of this actually matters. Here's beatMARIO streaming some Shuusou Gyoku gameplay using what looks like a real-hardware SC-8850, which plays these MIDIs with occasionally noticeably different instrument patches and no panning delay in the name registration theme, and he still enjoyed every second of it. Imagine undefined SysEx behavior not even being consistent within the same family of Roland synths… nah, I'm done arguing, let's get back to the actual work and cut some loops.


Just to be clear: I'm not suggesting that Romantique Tp should have been the one to cut their recordings into loops, or even just the one who defined where the loop points are supposed to be. On the surface, this seems to be a non-issue, and you'd just pick a point wherever each track appears to loop, right? But with 39 MIDIs to cut and all the financial support from Ember2528, it made sense to also solve this problem more thoroughly, and algorithmically detect provably correct loop points for all of these files. Who knows, maybe we even find some surprises that make it all worth it?
This is the algorithm I came up with:

Of course, this algorithm isn't perfect and won't work for every MIDI file out there. It doesn't consider things like differently ordered events within the same MIDI pulse, (non-)registered parameter numbers, or the effect that SysEx messages can have on the state of individual channels. The latter would require the general SysEx decoding logic that I would have liked to have for the research above… actually, let's add an issue and add the project to the order form. I'd really like to see a comprehensive open-source cross-vendor SysEx decoder library in my lifetime.
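
For readers who want something to experiment with, here is a deliberately naive Rust sketch of the general idea of loop detection on an event list. It is not the algorithm used by mly, and it ignores everything mentioned in the caveats above. It only looks for the longest run of elements that is immediately followed by an identical copy of itself, using plain equality. `find_adjacent_repeat` is a hypothetical name, and the cubic worst-case running time makes it suitable for illustration only.

```rust
/// Finds the longest run that is directly followed by an identical copy of
/// itself, i.e. `events[start..start + len] == events[start + len..start + 2 * len]`.
/// Returns `(start, len)` of the earliest such run with maximum length.
pub fn find_adjacent_repeat<T: PartialEq>(events: &[T]) -> Option<(usize, usize)> {
    let n = events.len();
    // Try the longest candidate length first, so the first hit is the best one.
    for len in (1..=n / 2).rev() {
        for start in 0..=(n - 2 * len) {
            let first = &events[start..start + len];
            let second = &events[start + len..start + 2 * len];
            if first == second {
                return Some((start, len));
            }
        }
    }
    None
}
```

For the toy sequence `[9, 1, 2, 3, 1, 2, 3, 7]`, this returns `Some((1, 3))`: a three-element loop starting at index 1. A real implementation would compare events in terms of their effect on the playback state rather than by raw equality.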

As for the implementation, I was happy to write some Rust again for a change, as it's a great fit for these standalone greenfield command-line tools that don't have to directly interact with the legacy C++ code bases that this project usually deals with. It's even better if the foundational functionality is not just available in a crate, but in four, with the community already having gone through multiple iterations to arrive at a tried and tested winner. Who knows, maybe I even get to rewrite this website in it one day? Just for the sheer meme value of doing so, of course.
I also enjoyed this a lot from a technical point of view:

This algorithm works well for the long MIDI files of Shuusou Gyoku's OST that all contain multiple duplicates of their loop section, but it quickly reaches its limit with the AST. Following the classic two-loop + fade-out format, that soundtrack was meant to be played back in generic MIDI players, and not to actually be put back into the game in looped form. Since the loop algorithm did, in fact, find inconsistencies even in the OST, two copies of the apparent loop are sometimes not enough to prove cases where the actual loop ends much later than you think it does. In a few cases, it would be enough to simply remove all volume change events from the fade-out to prove the actual loop, but in others, the algorithm would need MIDI event data far past the end of the fade-out.

However, just giving up and not looping any of these tracks would be equally unfortunate. So how about shifting the question, from what's the best loop in this MIDI file to what's the best loop if the MIDI didn't fade out and instead repeated its apparent second loop a third time? As long as the detected loop in such a pre-processed file ends before the repeated range, it's still a valid loop in terms of the unmodified original.
Ideally, we want to do this pre-processing programmatically with the same Rust library instead of manually editing the MIDI. Many sequencers (and especially XGworks) apply significant changes to a MIDI file's internal structure when saving its internal representation back to a MIDI file, which might even mess with our loop algorithm. So it would be very nice to have a more trustworthy tool that applies only the edit we actually want, and perfectly retains the rest of the MIDI.

And that's how this sub-project turned into a small suite of command-line MIDI operations in the classic Unix filter/pipeline style: Each command reads a MIDI file from stdin, transforms it, and outputs text or the resulting MIDI file on stdout. This way, we gain maximum transparency and reproducibility as I can document the unique pre-processing steps for each AST track by simply providing the command lines. And sure, we're re-encoding and re-decoding the full MIDI sequence at every step along such a pipeline, but computers are fast, Rust and the midly library in particular are ⚡ blazingly fast ⚡, and the usability benefits of this pipeline model far outweigh any theoretical performance drops.
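
The filter model itself needs very little code. The following sketch shows the stdin-to-stdout skeleton using only the Rust standard library. It deliberately avoids the midly crate so that it stays self-contained, and it is not taken from mly. `SmfHeader`, `parse_smf_header`, and `run_filter` are hypothetical names, and the only validation done is a look at the 14-byte `MThd` header chunk that every Standard MIDI File starts with.

```rust
use std::io::{self, Read, Write};

/// The fields of a Standard MIDI File header chunk.
#[derive(Debug, PartialEq, Eq)]
pub struct SmfHeader {
    pub format: u16,
    pub tracks: u16,
    /// Pulses per quarter note, if the top bit is clear.
    pub division: u16,
}

/// Parses the `MThd` chunk: 4 magic bytes, a big-endian 32-bit length of
/// at least 6, then three big-endian 16-bit fields.
pub fn parse_smf_header(data: &[u8]) -> Option<SmfHeader> {
    if data.len() < 14 || !data.starts_with(b"MThd") {
        return None;
    }
    let len = u32::from_be_bytes([data[4], data[5], data[6], data[7]]);
    if len < 6 {
        return None;
    }
    Some(SmfHeader {
        format: u16::from_be_bytes([data[8], data[9]]),
        tracks: u16::from_be_bytes([data[10], data[11]]),
        division: u16::from_be_bytes([data[12], data[13]]),
    })
}

/// Reads a complete MIDI file from `input`, applies `transform` to its
/// bytes, and writes the result to `output`.
pub fn run_filter<R: Read, W: Write>(
    mut input: R,
    mut output: W,
    transform: impl Fn(Vec<u8>) -> Vec<u8>,
) -> io::Result<()> {
    let mut data = Vec::new();
    input.read_to_end(&mut data)?;
    if parse_smf_header(&data).is_none() {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "not a Standard MIDI File"));
    }
    output.write_all(&transform(data))
}
```

A command in such a suite would call `run_filter(io::stdin().lock(), io::stdout().lock(), ...)` from its `main` function. Because every stage takes and produces a complete file, the stages can be chained with regular shell pipes.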
Here's the full list of commands that made it into the resulting mly tool:

This feature set should strike a good balance between not spending too much of the Shuusou Gyoku budget on tangential problems and still offering a decent solution for the problem at hand. As a counterexample, the obvious killer feature – deserializing a dump back into a Standard MIDI File – would have gone way past the budget. While there are crates that free you from the need to write manual parsing code for basic data structures, they would instead require a lot of attribute boilerplate – and if the library that provided the structures doesn't already come with these attributes, you now have to duplicate all the structures, and convert back and forth between the original structures and your copies. Not to mention that we'd still have to write code for the high-level structure of the dump output…

If we put it all together, this is what we can do:

$ <ssg_02.mid mly loop-find
Best loop in note space: 4 events (between event #[117, 121[ and [121, 125[)
First note: event    71 / pulse    960 / beat   2:000 / 0:00:800m
Loop start: event   117 / pulse   1680 / beat   3:240 / 0:01:400m
  Loop end: event   121 / pulse   1920 / beat   4:000 / 0:01:600m

$ <ssg_02.mid mly cut 466: | mly loop-unfold 240: | mly -r 44100 loop-find
Track #0: Removing events #[16439, 19881[
Track #0: Repeating events #[8344, 16439[ at the end of the sequence
Best loop in note space: 8095 events (between event #[5625, 13720[ and [13720, 21815[)
First note: event    71 / pulse    960 / beat   2:000 / 0:00:800m
Loop start: event  5625 / pulse  75361 / beat 157:001 / 1:03:531m
  Loop end: event 13720 / pulse 183841 / beat 383:001 / 2:34:726m

Best loop in recording space:  8095 events (between event #[5709, 13804[ and [13804, 21899[)
First note: event    71 / pulse    960 / beat   2:000 / 0:00:800m / sample    35280.00
Loop start: event  5709 / pulse  77280 / beat 161:000 / 1:05:163m / sample  2873667.66
  Loop end: event 13804 / pulse 185760 / beat 387:000 / 2:36:358m / sample  6895375.27

Translation:


So, where are these loop quirks that justify why some of these audio files are longer than you'd think they should be? Just listing them as text wouldn't really communicate just how minor these are. It would be much nicer to visualize them in a way that highlights the exact inconsistencies within a fixed range of MIDI measures. Screenshots of MIDI sequencer or DAW windows won't capture these aspects all too well because these programs are geared toward fine-grained editing of single tracks, not visualization of details across all channels.

[Screenshot: the first 8 measures of Shuusou Gyoku's Stage 1 theme (フォルスストロベリー) in its OST version, as visualized by REAPER's piano roll]
REAPER's piano roll nicely snaps to a certain range, but good luck picking out the individual lines from the single volume lane at the bottom of the screen, or spotting a 7-point difference. Not to mention that CC #11 (Expression) makes up an equal part of a channel's final perceived volume, which is the metric we'd actually want to visualize.
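As a rough illustration of why both controllers matter equally, here is a small sketch that folds CC #7 and CC #11 into a single amplitude factor. The squared response curve is an assumption based on the General MIDI recommendation; a given synth may respond differently, and the function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: combines CC #7 (Volume) and CC #11 (Expression) into
// a linear amplitude factor between 0.0 and 1.0. Both controllers enter the
// product symmetrically, so a drop in either one has the same effect.
inline double channel_amplitude(std::uint8_t volume, std::uint8_t expression)
{
	assert((volume <= 127) && (expression <= 127));
	const double linear = ((volume / 127.0) * (expression / 127.0));
	return (linear * linear); // Assumed squared (40·log10) response curve
}
```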

Typical MIDI visualizers, however, are on the complete opposite end of the spectrum. In recent years, MIDI visualization has become synonymous with the typical Synthesia style of YouTube videos with a big keyboard at the bottom, note bars flying in from the top, and optional fancy effects once those notes hit the top of the keyboard. The Black MIDI community has been churning out tons of identical-looking MIDI visualizers that mainly seem to differ in the programming language they're written in, and in how well they can cope with the blackest of black MIDIs.
Thankfully, most of these visualizers are open-source and have small and manageable codebases. The project with the most GitHub stars and the most generic name seemed to be the best starting point for hacking in the missing features, despite using GLSL shaders which I had no prior experience with. It was long overdue that I did something with GLSL though – it added a nice educational aspect to these hacks, and it still was easier than deciphering whatever the fastest and hyper-optimized Rust visualizer is doing.
Still, this visualizer needed a total of 18 small features and bugfixes to be actually usable for demonstrating Shuusou Gyoku's loop quirks. As such, these hacks turned into yet another tangential sub-project that could have easily consumed another two pushes if I cleaned up the code and published the result. But that would have really gone way past the budget for something that people might not even care about. So here's what we're going to do:


Alright then! Here's how to read the visualizations:



Before we package up these looped soundtracks, let's take a quick look at how they would be shown off in the Music Room. The Seihou Music Rooms carry over the per-channel keyboards from TH05, add the current per-channel volume, expression, and pan pot values, and top it off with a fake spectrum analyzer. All of these visualizations rely on MIDI data, and the Music Room would feel very dull and boring without them. Just look at Kioh Gyoku, whose Music Room basically turns into a still image in WAVE mode.
Retaining these visualizations even when playing waveform BGM was very important for me, and not just because it would make for a unique high-quality feature that would break new ground. It can also double as proof that the waveform versions are, in fact, in perfect sync with both the MIDIs they are based on, and, by extension, the respective stage scripts.
However, this would require the game to process the MIDIs and update the internal visualization state without simultaneously playing them back through the WinMM / MME / midiOut*() API. And just like graphics and text rendering, Shuusou Gyoku's original code came with zero architectural separation between platform-independent processing logic and platform-specific playback…

So I accidentally rewrote almost the entire MIDI code to achieve said separation. :tannedcirno: This also provided a great occasion to modernize this code and add some much-needed robustness for potential MIDI mods, while retaining the original code's approach of iterating over raw SMF byte streams. It might all have been very excessive for a delivery that was supposed to be just about waveform BGM support, but on the plus side, MIDI output is now portable to any other system's MIDI API as well.

Surprisingly though, it was Shuusou Gyoku's original MIDI timing that quickly turned out to be rather inaccurate, and not the waveforms. The exact numbers vary depending on the piece, but the game played back every MIDI about 1% slower than notated, adding about 2 or 3 seconds to their total playback time after 5 minutes. Tempo changes in particular were the biggest causes of desynchronizations with the waveforms… :thonk:
To understand how this can happen to begin with, we have to look closer at how you're supposed to use the midiOut*() API. This API is as low-level as it gets, only covering the transmission of a single MIDI message to the selected output device right now. There is no concept of note timing at this low level, so it's completely up to the program to parse delta times and tempo change events out of the MIDI file and correctly time the calls to this API for each MIDI message. With all the code that runs between the API and the actual renderer of the synth for every single message, the resulting timing can only ever be an approximation of the MIDI file. This doesn't really matter for the timescales and polyphony levels of typical music because, again, computers are fast, but such an API is fundamentally unsuitable for accurately playing back even just a moderately complex million-note Black MIDI. :onricdennat:

Shuusou Gyoku handles this required manual timing in the simplest possible way: It runs a MIDI processing function (Mid_Proc() in the code) at an interval of 10 ms, which processes and instantly sends out all MIDI events that have occurred at any point within the last 10 ms, maintaining merely their order. This explains not only why the original game incremented its MIDI TIMER by multiples of 10, but also the infamous missing drums when playing the soundtrack through the Microsoft GS Wavetable Synth:

But while sending MIDI events in such quantized chunks might not be perfect, it can't be the cause behind multi-second playback slowdowns. Instead, this issue has to boil down to the way Shuusou Gyoku times each individual message, and specifically how it converts between MIDI pulse units and real-time (milli)seconds. pbg's original MIDI code chose to do this in an equally confusing and inaccurate way: it kept two counters that tracked the current MIDI pulse before and after the latest tempo change, used the value of the latter counter to decide which events to process, and only added the pulse equivalent of 10 ms to this counter at the end of Mid_Proc() in the then current tempo. The commit message for my rewritten algorithm details the problems with this approach using nice ASCII art in case you're interested, but in short, the main problem lies in how the single final addition can only consider a single tempo change within each call to Mid_Proc(). If a MIDI file contains tempo ramps with less than 10 ms between each different tempo, the original game would only use the last of these tempo values as the basis for converting the entire 10 ms back into MIDI pulses. Not to mention that maybe MIDI pulses aren't the best unit in a game that still 📝 treats the FPU as lava and doesn't use any fixed-point means of increasing the resolution of the 10 ms→pulse division either…

On the contrary, it's much more accurate to immediately convert every encountered MIDI delta time to a real-time quantity and use that unit for event timing, especially if we want to restrict ourselves to integer math. Signed 64-bit integers are enough to fit the product of the slowest possible MIDI tempo ((2²⁴ - 1) µs per quarter note) and the highest possible MIDI delta time (2²⁸ - 1) at nanosecond precision (10³), with one bit to spare. Then, we arrive at a much simpler timing algorithm:

The additive nature of this timer not only naturally allows more than one event to happen within a single Mid_Proc() call, but also averages out any minor timing inconsistencies across the length of a track.
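A minimal sketch of such an additive, nanosecond-based timer could look as follows. The `Event` and `Sequencer` types and all member names are hypothetical simplifications, not the game's actual code, which iterates over raw SMF bytes instead of a parsed event list.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified stand-in for a parsed MIDI event (hypothetical).
struct Event {
	std::uint32_t delta_pulses; // Delta time relative to the previous event
	std::uint32_t new_tempo_us; // Nonzero for tempo changes (µs per quarter)
};

constexpr std::int64_t delta_to_ns(
	std::uint32_t delta, std::uint32_t tempo_us, std::uint16_t ppqn
)
{
	return ((static_cast<std::int64_t>(delta) * tempo_us * 1000) / ppqn);
}

// Evaluated at compile time, where signed overflow would be an error: the
// worst case from the text does fit into a signed 64-bit integer.
static_assert(
	(delta_to_ns(((1u << 28) - 1), ((1u << 24) - 1), 1) > 0),
	"maximum delta time × slowest tempo must fit into 64 bits"
);

struct Sequencer {
	std::vector<Event> events;
	std::uint16_t ppqn = 480;
	std::uint32_t tempo_us = 500000; // SMF default tempo, 120 BPM
	std::size_t next = 0;
	std::int64_t timer_ns = 0; // Real time accumulated since the last event

	// Advances playback by `elapsed_ns` and returns how many events became due.
	std::size_t process(std::int64_t elapsed_ns)
	{
		std::size_t fired = 0;
		timer_ns += elapsed_ns;
		while(next < events.size()) {
			// Convert with the tempo that is current *at this event*.
			const auto wait_ns = delta_to_ns(
				events[next].delta_pulses, tempo_us, ppqn
			);
			if(timer_ns < wait_ns) {
				break;
			}
			timer_ns -= wait_ns; // Keep the remainder instead of resetting
			if(events[next].new_tempo_us != 0) {
				tempo_us = events[next].new_tempo_us;
			}
			next++;
			fired++;
		}
		return fired;
	}
};
```

Because every delta time is converted with whatever tempo is active at that exact event, any number of tempo changes within a single call are handled correctly, and the leftover time carries over into the next call.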

This new algorithm did improve the overall timing accuracy, but only barely, shaving off just ≈100 ms of the total duration. Turns out that the main source behind the slowness was hiding somewhere else entirely, in the single line that deserializes tempo values from MIDI's big-endian representation into the native integer format:

assert(length_of_tempo_message == 3);
uint32_t tempo = 0;
for(int i = 0; i < length_of_tempo_message; i++) {
-	tempo += ((tempo << 8) + (*track_data++));
+	tempo  = ((tempo << 8) + (*track_data++));
}

Yup – the original code performed two additions per byte, which incorrectly added the interim value at every byte to the final result, and yielded a tempo that is ≈0.8% / ≈1 BPM slower than notated in the MIDI file, matching the number we were looking for. That's why the |/OR operator is the safer one to use in such a bit-twiddling context…
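To see the size of the error, here are both variants as standalone functions; the function names are made up for this comparison, and the loop bodies mirror the two lines of the diff above, with `|` substituted in the fixed version.

```cpp
#include <cstdint>

// Mirrors the original loop: `+=` adds the shifted interim value on top of
// the existing one, turning every step into `(tempo * 257) + byte`.
inline std::uint32_t tempo_original(const std::uint8_t bytes[3])
{
	std::uint32_t tempo = 0;
	for(int i = 0; i < 3; i++) {
		tempo += ((tempo << 8) + bytes[i]);
	}
	return tempo;
}

// Regular big-endian deserialization. Using `|` instead of `+` makes an
// accidental double accumulation like the one above harder to write.
inline std::uint32_t tempo_fixed(const std::uint8_t bytes[3])
{
	std::uint32_t tempo = 0;
	for(int i = 0; i < 3; i++) {
		tempo = ((tempo << 8) | bytes[i]);
	}
	return tempo;
}
```

For the common 120 BPM tempo of 500,000 µs per quarter note (bytes `07 A1 20`), the original loop yields 503,752 µs, or roughly 119.1 BPM.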
But now I'm curious. This is such a tiny bug that is bound to remain unnoticed until someone compares the game's MIDI output to another renderer. It must have certainly made it into other games whose MIDI code is based on Shuusou Gyoku's, or that pbg was involved with. And sure enough, not only did this bug survive Kioh Gyoku's OOP refactoring, but it even traveled into Windows Touhou, where it remained in every single game that supported MIDI playback. Now we know for a fact that pbg's Program Support role in the TH06 credits involved sharing ready-made, finished code with ZUN:

[Screenshots: disassembly of the Shuusou Gyoku MIDI tempo deserialization bug in TH06, TH07, TH08, TH09, and TH10]
The broken tempo deserialization in the respective latest full versions of TH06 through TH10. And yes, that's TH10 – even though TH09's trial version was the last game to ship MIDI versions of its soundtrack, TH10 still contained all of pbg's MIDI code that originated back in Shuusou Gyoku, before TH11 finally removed it.
Amusingly, ZUN's compiler even started optimizing the combination of left-shifting and addition to a multiplication with 257 for TH09, which even sort of highlights this bug if you're used to reading x86 ASM.

That leaves support for MIDI loop points as the only missing feature for syncing MIDI data with a looping waveform track. While it didn't require all too much code, pbg's original zero-copy approach of iterating over raw MIDI data definitely injected a lot of complexity into the required branches. Multi-track/SMF Type 1 files require quite a bit of extra thought to correctly calculate delta times across loop boundaries that reach past the end of the respective track, while still allowing the real-time delta values to be resynchronized at tempo changes within the loop – and yes, 3 of ZUN's 19 arranged MIDI files actually do use more than one track, so this wasn't just about maximizing MIDI compatibility for mods. I stuck to the original approach mostly as a challenge and to prove that it's possible without first parsing the entire MIDI sequence into a friendlier internal representation, but I absolutely do not recommend this to anyone else. :tannedcirno:

After hardcoding the loop points detected by mly into the binary, we only need to call Mid_Proc() once per frame in the Music Room and pass the frame delta time instead of the 10 ms constant. And then, we get this:

The MIDI TIMER now shows off the arguably more interesting current MIDI pulse value rather than just formatting the PASSED TIME in milliseconds. Ironically, displaying this value in a constantly counting way takes more effort now – the new nanosecond-based timing code doesn't use any measure of total MIDI pulses anymore, and they don't naturally fall out of the algorithm either. Instead, the code remembers the total pulse value of the last event it processed and adds the real-time duration that has passed since, similar to the original timing algorithm.
This naturally causes the timer to jump from the loop end pulse to the loop start pulse, proving that Mid_Proc() is in fact looping the sequence.

Alright, now we know what to package:

Unfortunately, we still haven't reached the end of the complications and weird issues that haunt Shuusou Gyoku's music:

  1. The original game reads the in-game track title directly out of the first Sequence Name event of the playing MIDI file. The waveform equivalent would be the Vorbis comment TITLE tag, which therefore should exactly match the original track's title, down to the exact placement of whitespace. As usual, if I emphasize minor things like this, it's not without reason: 幻想科学 ~ Doll's Phantom inconsistently uses halfwidth spaces at both sides of the ~, and wouldn't fit into the Music Room's limited space otherwise.

  2. However, the AST MIDI files jam a bunch of other metadata into their Sequence Names, roughly following the format
    【 $title 】 from 秋霜玉  for sc88Pro comp.ZUN
    The track titles should definitely not appear in this format in-game, but how do we get rid of this format without hardcoding either the names or the magic to parse the names out of this format? :thonk:
  3. The absolute state of GS SysEx tooling rears its ugly head one final time in three of the AST MIDIs, which for some reason are missing the Roland vendor prefix byte in all of their SysEx messages and are therefore undeniably bugged. There even seemed to be another SysEx-related bug which Romantique Tp explained away, but not this one:

    Original messages, without the vendor prefix byte:

    ssg_04.mid

    0:000	SysEx(   10 42 12 40 00 7F 00 41 F7)
    0:240	SysEx(   10 42 12 40 01 30 14 7B F7)
    0:360	SysEx(   10 42 12 40 01 33 14 78 F7)
    0:420	SysEx(   10 42 12 40 01 34 50 3B F7)

    ssg_05.mid

    0:000	SysEx(   10 42 12 40 00 7F 00 41 F7)
    0:240	SysEx(   10 42 12 40 01 30 14 7B F7)
    0:360	SysEx(   10 42 12 40 01 33 00 0C F7)
    0:420	SysEx(   10 42 12 40 01 34 14 77 F7)

    ssg_10.mid

    0:000	SysEx(   10 42 12 40 00 7F 00 41 F7)
    0:240	SysEx(   10 42 12 40 01 30 14 7B F7)
    0:360	SysEx(   10 42 12 40 01 33 00 0C F7)
    0:420	SysEx(   10 42 12 40 01 34 60 2B F7)

    The same messages with the Roland vendor prefix byte (41) in place:

    ssg_04.mid

    0:000	SysEx(41 10 42 12 40 00 7F 00 41 F7)	GS Reset
    0:240	SysEx(41 10 42 12 40 01 30 14 7B F7)	Reverb Macro #20
    0:360	SysEx(41 10 42 12 40 01 33 14 78 F7)	Reverb Level 20
    0:420	SysEx(41 10 42 12 40 01 34 50 3B F7)	Reverb Time 80

    ssg_05.mid

    0:000	SysEx(41 10 42 12 40 00 7F 00 41 F7)	GS Reset
    0:240	SysEx(41 10 42 12 40 01 30 14 7B F7)	Reverb Macro #20
    0:360	SysEx(41 10 42 12 40 01 33 00 0C F7)	Reverb Level 0
    0:420	SysEx(41 10 42 12 40 01 34 14 77 F7)	Reverb Time 20

    ssg_10.mid

    0:000	SysEx(41 10 42 12 40 00 7F 00 41 F7)	GS Reset
    0:240	SysEx(41 10 42 12 40 01 30 14 7B F7)	Reverb Macro #20
    0:360	SysEx(41 10 42 12 40 01 33 00 0C F7)	Reverb Level 0
    0:420	SysEx(41 10 42 12 40 01 34 60 2B F7)	Reverb Time 96
    The irony of using invalid Reverb Macros within already invalid SysEx messages is not lost on me.

    This is something we should fix even before running these files through Sound Canvas VA in order to render these with the reverb settings that ZUN clearly (and, for once, unironically) intended.

  4. For perfect preservation of the original BGM/gameplay synchronicity, it makes sense for the waveform versions to retain the leading 1 or 2 beats of silence that the original MIDI files use for their SysEx setup. While some of the AST tracks use a slightly different tempo compared to their OST counterparts, they would still be largely in sync as ZUN didn't rearrange the layout of their setup area… except for, once again, the three tracks used in the Extra Stage. :zunpet: Marisa's and Reimu's boss themes aren't too bad with their 4 beats of setup, but シルクロードアリス takes the cake with a whopping 12 beats of leading silence. That's 5 seconds from the start of the Extra Stage to the first note you'd hear. 🐌
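The vendor prefix repair from 3) is simple enough to sketch. The helper below is hypothetical and only recognizes the exact broken shape shown in the listings: a message body that starts right at the device ID and carries a valid Roland checksum over its address and data bytes.

```cpp
#include <cstdint>
#include <vector>

constexpr std::uint8_t ROLAND_ID = 0x41;

// Roland checksum over the address and data bytes: the low 7 bits of the
// sum and the checksum must add up to a multiple of 128.
inline std::uint8_t roland_checksum(
	const std::uint8_t* first, const std::uint8_t* last
)
{
	unsigned int sum = 0;
	for(auto p = first; p != last; p++) {
		sum += *p;
	}
	return static_cast<std::uint8_t>((128 - (sum % 128)) % 128);
}

// Hypothetical repair step. `body` is the SysEx payload as printed in the
// listings: without the leading F0, but including the trailing F7.
// Returns whether the vendor byte was inserted.
inline bool add_missing_roland_id(std::vector<std::uint8_t>& body)
{
	// Expected broken shape: 10 42 12 <address and data> <checksum> F7
	// (device ID 10, GS model ID 42, DT1 command 12)
	if(
		(body.size() < 6) ||
		(body[0] != 0x10) || (body[1] != 0x42) || (body[2] != 0x12) ||
		(body.back() != 0xF7)
	) {
		return false;
	}
	const auto* data = body.data();
	const auto checksum_pos = (body.size() - 2);
	if(roland_checksum((data + 3), (data + checksum_pos)) != body[checksum_pos]) {
		return false;
	}
	body.insert(body.begin(), ROLAND_ID);
	return true;
}
```

Checking the checksum first guards against touching unrelated SysEx messages that merely happen to start with the same three bytes.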

2) and 4) could theoretically be worked around in Shuusou Gyoku's MIDI code, but there's no way around editing the MIDI files themselves as far as 3) is concerned. Thus, it makes sense to apply all of the workarounds to the AST MIDIs as part of the BGM build process – parsing the titles out of the 【brackets】, inserting the Roland vendor prefix byte where necessary, and compressing the setup bars in the Extra Stage themes to match their OST counterparts. Adding any hidden magic to the MIDI code would only have needlessly increased complexity and/or annoyed some modder in the future who would then have to work around it.
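Parsing the title out of the 【brackets】 could look roughly like the following sketch. It assumes that the Sequence Name has already been converted to UTF-8 and that the title is only padded with ASCII spaces; the function name is hypothetical.

```cpp
#include <string>

// Hypothetical helper: returns the text between the first 【 and the
// following 】, with surrounding ASCII spaces trimmed. Names that don't
// follow this format are returned unchanged.
inline std::string title_from_sequence_name(const std::string& name)
{
	static const std::string OPEN = "\xE3\x80\x90";  // 【 (U+3010) in UTF-8
	static const std::string CLOSE = "\xE3\x80\x91"; // 】 (U+3011) in UTF-8

	const auto open = name.find(OPEN);
	if(open == std::string::npos) {
		return name;
	}
	const auto begin = (open + OPEN.size());
	const auto close = name.find(CLOSE, begin);
	if(close == std::string::npos) {
		return name;
	}
	const auto title = name.substr(begin, (close - begin));
	const auto first = title.find_first_not_of(' ');
	if(first == std::string::npos) {
		return "";
	}
	const auto last = title.find_last_not_of(' ');
	return title.substr(first, (last - first + 1));
}
```

Only the padding right inside the brackets is trimmed, so any whitespace within the title itself stays exactly as it was.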
Ideally, these edits would involve taking the mly dump output, performing the necessary replacements at a plaintext level, and rebuilding the result back into a MIDI file, bu~t we're unfortunately missing the latter feature. Luckily, someone else had the same idea 13 years ago and wrote a tool in C that does exactly what we need. Getting it to compile in 2024 only required fixing a typical C thing… why are students and boomers defending this antique of a language again? 🙄

The single most glaring issue, however, is the drastic difference in volume between the individual tracks in both soundtracks. While Romantique Tp had to normalize each track to the maximum possible volume individually as a consequence of the recording process, the Sound Canvas VA renderings reveal just how inconsistent the volume levels of these MIDI files really are:

The peak amplitudes of every track in both soundtracks, as rendered by Sound Canvas VA at maximum volume. Looking at these, you might think that kaorin's 2007 recordings were purposely trying to preserve the clipping that would come out of an SC-88Pro if you don't manually adjust the volume knob for each song, but those recordings are still much louder than even these numbers.

So how do we interpret this? Is this a bug, because no one in their right mind would want their music to clip on purpose, and that in turn means that everything about these volume levels is arbitrary and unintentional? Or is this a quirk, and ZUN deliberately chose these volume levels for compositional reasons? It certainly would make sense for the name registration theme.
Once again, the AST version of シルクロードアリス is the worst offender in this regard as well, but it might also provide some evidence for the quirk interpretation. The fact that almost all of its MIDI channels blast away at full volume might have been an accident that could have gone unnoticed if the volume knob of ZUN's SC-88Pro was turned rather low during the time he arranged this piece, but the excessive left-panning must have been deliberate. Even Romantique Tp agrees:

[Two stereo waveforms of Shuusou Gyoku's Extra Stage theme (シルクロードアリス), highlighting the excessive left-panning: the Sound Canvas VA rendering, and Romantique Tp's recording]
It might have even made compositional sense if Silk Road Alice was supposed to be a "Western-style piece", but it's not. :zunpet:

And that's with the volume already normalized. Because this one channel of this one track is almost twice as loud as anything else in the AST, we would consequently have to bring down the volume of every other arranged track and the right channel of the same track by almost 50% if we wanted to maintain the volume differences between the individual tracks of the AST. In the process, we lose almost one entire bit of dynamic range. At this rate, you might even consider remixing and remastering the entire thing, but that would involve so many creative decisions that it would definitely fall into fanfiction territory…

However, normalizing each track to a peak level of 0 dBFS makes much more sense for in-game playback if you consider how loud Shuusou Gyoku's sound effects are. Once again, the best solution would involve offering both versions, but should we really add two more SCVA BGM packs just to cover volume differences? :thonk:
ReplayGain solves this exact problem for regular music listening in a non-destructive way by writing the per-track and per-album gain levels into an audio file's metadata. Since we need metadata support for titles anyway, we can do something similar, albeit not exactly the same for two reasons:

And so, we hard-apply the volume-level gain during the conversion from 32-bit float to FLAC to preserve the volume differences between the tracks, calculate the track-level GAIN FACTOR based on the resulting peak levels, add a volume normalization toggle to the Sound / Config menu, enable it by default, and thus make everyone happy. ✅
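As a rough sketch of the track-level part: assuming that the gain factor is simply the multiplier that brings a track's peak to 0 dBFS, it can be derived from the samples like this. The function names are hypothetical, and the exact definition used for the BGM packs may differ.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Largest absolute sample value of a track, with 1.0 being 0 dBFS.
inline float peak_amplitude(const std::vector<float>& samples)
{
	float peak = 0.0f;
	for(const float s : samples) {
		peak = std::max(peak, std::fabs(s));
	}
	return peak;
}

// Assumed definition: the factor that scales the track's peak to exactly
// 0 dBFS. Silent or empty tracks are left untouched.
inline float gain_factor_for(const std::vector<float>& samples)
{
	const float peak = peak_amplitude(samples);
	return ((peak > 0.0f) ? (1.0f / peak) : 1.0f);
}
```

Multiplying by the factor at playback time leaves the stored audio untouched, so turning the toggle off simply means using a factor of 1.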

The final interesting tidbit in building these packages can be found in the way the Sound Canvas VA recordings are looped. When manually cutting loops, you always have to consider that the intro might end with unique notes that aren't present at the end of the loop, which will still be fading out at the calculated loop start point. This necessitates shifting the loop start point by a few bars until these notes are no longer audible – or you could simply ignore the issue because ZUN's compositions are so frantic that no one would ever notice. :onricdennat:
With the separate intro and loop files generated by mly, on the other hand, the reverb/release trails are immediately visible and, after trimming trailing silence, exactly define the number of samples that the calculated loop start point needs to be shifted by. The .loop file then remains always exactly as long, in samples, as the duration of the loop reported by mly. If a piece happens to have a constant tempo whose beat duration corresponds to an integer number of samples, we get some very satisfying, round loop durations out of this process. ☺️
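The round numbers fall out of the pulse-to-sample conversion, sketched below for a constant tempo. The function name is hypothetical, and the example values of 480 pulses per quarter note and 400,000 µs per quarter note are inferred from the `loop-find` output shown earlier rather than stated anywhere.

```cpp
#include <cassert>
#include <cstdint>

// Converts a MIDI pulse position to a (fractional) sample position,
// assuming a single constant tempo from the start of the sequence.
inline double pulse_to_sample(
	std::uint32_t pulse,
	std::uint32_t tempo_us,
	std::uint16_t ppqn,
	std::uint32_t sample_rate
)
{
	assert(ppqn > 0);
	return (
		(static_cast<double>(pulse) * tempo_us * sample_rate) /
		(ppqn * 1000000.0)
	);
}
```

With these inferred values, pulse 960 at 44,100 Hz lands on sample 35,280, matching the `First note` line of that output, and a single beat spans exactly 17,640 samples.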


So let's play it all back in-game… and immediately run into two unexpected miniaudio limitations, what the…?!

  1. miniaudio uses a fixed linear function for its fade-out envelope, and doesn't offer anything else? We might not even want a logarithmic one this time because symmetry with MIDI's simple quadratic curve would be neat, but we sure don't want a linear function – those stay near the original volume for too long, and then turn quiet way too quickly.
  2. There is no way to access FLAC metadata from miniaudio's public API, even though the library bundles the author's own FLAC library which has this feature?

📝 Back when I evaluated miniaudio, I alluded that I consider single-file C libraries to be massively overrated, and this is exactly why: Once they grow as massive as miniaudio (how ironic), they can quickly lead to their authors treating their dependencies as implementation details and melting down the interfaces that would naturally arise. In a regular library, dr_flac would be a separate, proper dependency, and the API would have a way to initialize a stream from an externally loaded drflac object. But since the C community collectively pretends that multi-file libraries are a burden on other developers, miniaudio ended up with dr_flac copy-pasted into its giant single file, with a silly ma_ namespacing prefix added to all its functions. And why? Did we have to move so far in the other direction just because CMake doesn't support globbing? That's a symptom of CMake not actually solving any problem, not a valid architectural decision that libraries should bend around. 🙄
So unless we fork and hack around in miniaudio, there's now no way around depending on a second, regular copy of dr_flac. Which has now led to the same project organization bloat that single-file libraries originally set out to prevent…

Sigh. At this rate, it makes more sense to just copy-paste and adapt the old BGM streaming code I wrote for thcrap in late 2018, which used dr_flac directly, and extend it with metadata support. With the streaming code moved out of the platform layer and into game logic, it also makes much more sense to implement the squared fade-out curve at that same level instead of copy-pasting and adjusting an unhealthy amount of miniaudio's verbose C code.
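For comparison, here are the linear and the squared envelope as a minimal sketch; the function names are hypothetical, and `progress` runs from 0.0 at the start of the fade to 1.0 at its end.

```cpp
#include <algorithm>

// Remaining portion of the fade, clamped to [0.0, 1.0].
inline float fade_remaining(float progress)
{
	return (1.0f - std::min(std::max(progress, 0.0f), 1.0f));
}

// Linear envelope: stays loud for long, then turns quiet very quickly.
inline float fade_volume_linear(float progress)
{
	return fade_remaining(progress);
}

// Squared envelope: drops faster at the start and tapers off toward the end.
inline float fade_volume_squared(float progress)
{
	const float remaining = fade_remaining(progress);
	return (remaining * remaining);
}
```

Halfway through the fade, the linear envelope is still at 50% of the original volume, while the squared one has already dropped to 25%.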
While I'm doing the same for the old Vorbis streaming code, it would also make sense to rewrite that one to use stb_vorbis instead of the old libogg+libvorbis reference libraries. There's no need to add two more dependencies if miniaudio already comes with stb_vorbis.c, and that library is widely acclaimed. So, integration should be a breeze, right?
Well, surprise, rarely have I seen a C library so actively hostile toward being integrated. Both of its API variants are completely unreasonable:

What happened to the tried-and-true idea of providing a structure with read, tell, and seek callbacks, and then providing an optional variant for C FILE* handles if you absolutely must? Sure, the whole point of Vorbis is to be small and nobody these days would care about spending a few MB on keeping an entire Vorbis file in memory, but come on. If pulldata made the deliberate and opinionated choice to only support buffers of complete Vorbis streams and argued in the name of simplicity that hand-coded disk streaming isn't worth it in this day and age, I might have even been convinced. And this is from the guy who popularized the concept of single-file C libraries in the first place? :thonk:

Oh well, tupblocks go brrr. libvorbis definitely shows its age with all the old command-line tools in the lib/ directory that they never moved away and that we now have to remove from our glob. But even that just adds a single line to the Tupfile, and then we get to enjoy its much friendlier API. That sure beats the almost 800 lines of code that miniaudio had to write to integrate stb_vorbis… which I can't even link because the file is too big for GitHub. 🤷
At this point, it would have even made sense to upgrade from a 24-year-old lossy codec to an 11-year-old lossy codec and use Opus instead, since the enforced 48,000 Hz sampling rate is a non-issue when you control the entire audio pipeline. But let's keep compatibility with existing thcrap mods for now.

The last time I added dependencies, 📝 I wondered whether just downloading and extracting official Windows binary builds might be superior to pasting batch script duct tape over the usability issues of Git submodules. However, I still wanted to try out Git's sparse checkout feature first, in an attempt to remove all the unneeded bloat… and as it turned out, this might just be the idealistic and perfect nirvana of vendoring libraries in C++ projects. I particularly like how the limitations of its default mode (always checking out all files within each directory level that shows up in a filter) can be turned into a guideline about how to structure a repository: All non-essential stuff that consumers of your code might not need – tests, high-level documentation, or optional features – should go into a subdirectory where it can be easily filtered.
And that's how the size of our libs/ directory went down from 82.7 MiB in the P0256 build to 30.4 MiB in the P0275 build, despite adding 4 more libraries in the latter. Now if only this didn't require even more duct tape to actually set up shallow clones correctly…

In the end, the Windows build ended up using only a single one of the miniaudio features that DirectSound doesn't have, and that's the ability to use the more modern WASAPI instead of DirectSound. We're still going to use miniaudio for the Linux port, but as far as Windows is concerned, it would be quite nice to backport BGM streaming to the game's original DirectSound backend. The P0275 build is pushing 1 MiB of binary size for a game that originally came in a 220 KiB binary, so it would remove a noticeable amount of bloat from GIAN07.EXE, but it would also allow waveform BGM to work in the Windows 98-compatible i586 build. If that sounds cool to you, this is the issue you want to fund.


That only left some logic and UI busywork to put it all together, which means that we've almost reached the end of things to talk about! Here's what it all looks like:



After half a year of being bought out way past the cap, I've finally got some small room left for new orders again. If it weren't for this blog post and the required research and web development work, this delivery would have probably come out in early January, taking half the time it ended up taking. So I really have to start factoring the blog posts into the push prices in a better and fairer way.
Meanwhile, the hate toward my day job only keeps growing, but there's little point in looking for a new one as long as ReC98 remains this motivating and complex. It leaves pretty much no cognitive room for any similarly demanding job. Thus, I want 2024 to be the year where ReC98 either becomes profitable enough to be my only full-time job, or where we conclusively find out that it can't, I go look for a better day job, and ReC98 shifts to a slower pace. Here's the plan:

With the new price of per push, this means that there's now a small window in which you can get a full push worth of functionality for , until the current cap is filled up again.

Next up: Probably TH02's endings to relax a bit. Maybe we're also getting some new Touhou-related contributions?

📝 Posted:
🚚 Summary of:
P0240, P0241
Commits:
be69ab6...40c900f, 40c900f...08352a5
💰 Funded by:
JonathKane, Blue Bolt, [Anonymous]
🏷 Tags:

Well, well. My original plan was to ship the first step of Shuusou Gyoku OpenGL support on the next day after this delivery. But unfortunately, the complications just kept piling up, to a point where the required solutions definitely blow the current budget for that goal. I'm currently sitting on over 70 commits that would take at least 5 pushes to deliver as a meaningful release, and all of that is just rearchitecting work, preparing the game for a not too Windows-specific OpenGL backend in the first place. I haven't even written a single line of OpenGL yet… 🥲
This shifts the intended Big Release Month™ to June after all. Now I know that the next round of Shuusou Gyoku features had better start with the SC-88Pro recordings, which are much more likely to get done within their current budget. At least I've already completed the configuration versioning system required for that goal, which leaves only the actual audio part.

So, TH04 position independence. Thanks to a bit of funding for stage dialogue RE, non-ASCII translations will soon become viable, which finally presents a reason to push TH04 to 100% position independence after 📝 TH05 had been there for almost 3 years. I haven't heard back from Touhou Patch Center about how much they want to be involved in funding this goal, if at all, but maybe other backers are interested as well.
And sure, it would be entirely possible to implement non-ASCII translations in a way that retains the layout of the original binaries and can be easily compared at a binary level, in case we consider translations to be a critical piece of infrastructure. This wouldn't even just be an exercise in needless perfectionism, and we only have to look to Shuusou Gyoku to realize why: Players expected that my builds were compatible with existing SpoilerAL SSG files, which was something I hadn't even considered the need for. I mean, the game is open-source 📝 and I made it easy to build. You can just fork the code, implement all the practice features you want in a much more efficient way, and I'd probably even merge your code into my builds then?
But I get it – recompiling the game yields just yet another build that can't be easily compared to the original release. A cheat table is much more trustworthy in giving players the confidence that they're still practicing the same original game. And given the current priorities of my backers, it'll still take a while for me to implement proof by replay validation, which will ultimately free every part of the community from depending on the original builds of both Seihou and PC-98 Touhou.

However, such an implementation within the original binary layout would significantly drive up the budget of non-ASCII translations, and I sure don't want to constantly maintain this layout during development. So, let's chase TH04 position independence like it's 2020, and quickly cover a larger amount of PI-relevant structures and functions at a shallow level. The only parts I decompiled for now contain calculations whose intent can't be clearly communicated in ASM. Hitbox visualizations or other more in-depth research would have to wait until I get to the proper decompilation of these features.
But even this shallow work left us with a large amount of TH04-exclusive code that had its worst parts RE'd and could be decompiled fairly quickly. If you want to see big TH04 finalization% gains, general TH04 progress would be a very good investment.


The first push went to the often-mentioned stage-specific custom entities that share a single statically allocated buffer. Back in 2020, I 📝 wrongly claimed that these were a TH05 innovation, but the system actually originated in TH04. Both games use a 26-byte structure, but TH04 only allocates a 32-element array rather than TH05's 64-element one. The conclusions from back then still apply, but I also kept wondering why these games used a static array for these entities to begin with. You know what they call an area of memory that you can cleanly repurpose for things? That's right, a heap! :tannedcirno: And absolutely no one would mind one additional heap allocation at the start of a stage, next to the ones for all the sprites and portraits.
However, we are still running in Real Mode with segmented memory. Accessing anything outside a common data segment involves modifying segment registers, which has a nonzero CPU cycle cost, and Turbo C++ 4.0J is terrible at optimizing away the respective instructions. Does this matter? Probably not, but you don't take "risks" like these if you're in a permanent micro-optimization mindset… :godzun:

In TH04, this system is used for:

  1. Kurumi's symmetric bullet spawn rays, fired from her hands towards the left and right edges of the playfield. These are rather infamous for being the last thing you see before 📝 the Divide Error crash that can happen in ZUN's original build. Capped to 6 entities.

  2. The 4 📝 bits used in Marisa's Stage 4 boss fight. Coincidentally also related to the rare Divide Error crash in that fight.

  3. Stage 4 Reimu's spinning orbs. Note how the game uses two different sets of sprites just to have two different outline colors. This was probably better than messing with the palette, which can easily cause unintended effects if you only have 16 colors to work with. Heck, I have an entire blog post tag just to highlight these cases. Capped to the full 32 entities.

  4. The chasing cross bullets, seen in Phase 14 of the Stage 6 Yuuka fight. Featuring some smart sprite work, making use of point symmetry to achieve a fluid animation in just 4 frames. This is good-code in sprite form. Capped to 31 entities, because the 32nd custom entity during this fight is defined to be…

  5. The single purple pulsating and shrinking safety circle, seen in Phase 4 of the same fight. The most interesting aspect here is actually still related to the cross bullets, whose spawn function is wrongly limited to 32 entities and could theoretically overwrite this circle. :zunpet: This is strictly landmine territory though:

    • Yuuka never uses these bullets and the safety circle simultaneously
    • She never spawns more than 24 cross bullets
    • All cross bullets are fast enough to have left the screen by the time Yuuka restarts the corresponding subpattern
    • The cross bullets spawn at Yuuka's center position, and assign its Q12.4 coordinates to structure fields that the safety circle interprets as raw pixels. The game does try to render the circle afterward, but since Yuuka's static position during this phase is nowhere near a valid pixel coordinate, it is immediately clipped.

  6. The flashing lines seen in Phase 5 of the Gengetsu fight, telegraphing the slightly random bullet columns.

    The spawn column lines in the TH04 Gengetsu fight, in the first of their two flashing colors.
    The spawn column lines in the TH04 Gengetsu fight, in the second of their two flashing colors.
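To make the shared-buffer idea from the list above more concrete, here is a minimal portable sketch of a statically allocated array of 26-byte slots that different entity types reinterpret. Only the slot size, the 32-element count, and the fact that the safety circle occupies the 32nd slot come from the text; all type names, field layouts, and helper functions are hypothetical. It also shows why Q12.4 coordinates written by one entity type end up far off-screen when another type reads the same bytes as raw pixels.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

constexpr std::size_t CUSTOM_SIZE = 26;  // Bytes per custom entity slot
constexpr std::size_t CUSTOM_COUNT = 32; // TH04's array size; TH05 uses 64

// The statically allocated buffer shared by all stage-specific entity types.
using custom_slot_t = std::array<uint8_t, CUSTOM_SIZE>;
static custom_slot_t custom_entities[CUSTOM_COUNT];

// Hypothetical field layouts, purely for illustration.
struct cross_bullet_t {
    int16_t center_x; // Q12.4 subpixels
    int16_t center_y; // Q12.4 subpixels
    uint8_t rest[22];
};
struct safety_circle_t {
    int16_t center_x; // Raw pixels
    int16_t center_y; // Raw pixels
    uint8_t rest[22];
};
struct column_line_t {
    int16_t x;
    uint8_t padding[24];
};

static_assert(sizeof(cross_bullet_t) == CUSTOM_SIZE, "must fill one slot");
static_assert(sizeof(safety_circle_t) == CUSTOM_SIZE, "must fill one slot");
static_assert(sizeof(column_line_t) == CUSTOM_SIZE, "must fill one slot");

// Copies a typed entity into a slot. memcpy() avoids the undefined behavior
// that reading an inactive union member would cause in standard C++.
template <class T> void custom_store(std::size_t slot, const T& entity)
{
    static_assert(std::is_trivially_copyable<T>::value, "POD types only");
    static_assert(sizeof(T) == CUSTOM_SIZE, "must fill one slot");
    assert(slot < CUSTOM_COUNT);
    std::memcpy(custom_entities[slot].data(), &entity, sizeof(T));
}

// Reinterprets the bytes of a slot as the given entity type.
template <class T> T custom_load(std::size_t slot)
{
    static_assert(std::is_trivially_copyable<T>::value, "POD types only");
    static_assert(sizeof(T) == CUSTOM_SIZE, "must fill one slot");
    assert(slot < CUSTOM_COUNT);
    T ret;
    std::memcpy(&ret, custom_entities[slot].data(), sizeof(T));
    return ret;
}

// Q12.4 fixed-point: 12 integer bits, 4 fractional bits.
constexpr int16_t to_q12_4(int pixels)
{
    return static_cast<int16_t>(pixels * 16);
}
```

Storing a `cross_bullet_t` centered at, say, pixel (192, 96) into the last slot and loading that slot as a `safety_circle_t` yields a "pixel" position of (3072, 1536), which is well outside a 640×400 screen and would therefore be clipped.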

These structures only took 1 push to reverse-engineer rather than the 2 I needed for their TH05 counterparts because they are much simpler in this game. The "structure" for Gengetsu's lines literally uses just a single X position, with the remaining 24 bytes being basically padding. The only minor bug I found on this shallow level concerns Marisa's bits, which are clipped at the right and bottom edges of the playfield 16 pixels earlier than you would expect:


The remaining push went to a bunch of smaller structures and functions:


To top off the second push, we've got the vertically scrolling checkerboard background during the Stage 6 Yuuka fight, made up of 32×32 squares. This one deserves a special highlight just because of its needless complexity. You'd think that even a performant implementation would be pretty simple:

  1. Set the GRCG to TDW mode
  2. Set the GRCG tile to one of the two square colors
  3. Start with Y as the current scroll offset, and X as some indicator of which color is currently shown at the start of each row of squares
  4. Iterate over all lines of the playfield, filling in all pixels that should be displayed in the current color, skipping over the other ones
  5. Count down Y for each line drawn
  6. If Y reaches 0, reset it to 32 and flip X
  7. At the bottom of the playfield, change the GRCG tile to the other color, and repeat with the initial value of X flipped

The most important aspect of this algorithm is how it reduces GRCG state changes to a minimum, avoiding the costly port I/O that we've identified time and time again as one of the main bottlenecks in TH01. With just 2 state variables and 3 loops, the resulting code isn't that complex either. A naive implementation that just drew the squares from top to bottom in a single pass would barely be simpler, but much slower: By changing the GRCG tile on every color, such an implementation would burn a low 5-digit number of CPU cycles per frame for the 12×11.5-square checkerboard used in the game.
And indeed, ZUN retained all important aspects of this algorithm… but still implemented it all in ASM, with a ridiculous layer of x86 segment arithmetic on top? :zunpet: Which blows up the complexity to 4 state variables, 5 nested loops, and a bunch of constants in unusual units. I'm not sure what this code is supposed to optimize for, especially with that rather questionable register allocation that nevertheless leaves one of the general-purpose registers unused. :onricdennat: Fortunately, the function was still decompilable without too many code generation hacks, and retains the 5 nested loops in all their goto-connected glory. If you want to add a checkerboard to your next PC-98 demo, just stick to the algorithm I gave above.
(Using a single XOR for flipping the starting X offset between 32 and 64 pixels is pretty nice though, I have to give him that.)
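For reference, the simple two-pass algorithm from the numbered list above could be sketched in portable C++ as follows. This is not ZUN's code and not the decompiled function: `FakeGRCG` is a hypothetical stand-in that stores one color index per VRAM byte and counts tile changes, and the playfield size of 384×368 is an assumption that matches the 12×11.5 squares mentioned above. The sketch borrows the XOR trick for flipping the starting X offset, here between 0 and 4 bytes.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Assumed dimensions: 32×32 squares on a 384×368 playfield.
constexpr int SQUARE_H = 32;
constexpr int SQUARE_BYTES = (32 / 8); // One VRAM byte covers 8 pixels
constexpr int ROW_BYTES = (384 / 8);
constexpr int PLAYFIELD_H = 368;

// Hypothetical model of the GRCG in TDW mode: Every write stores the current
// tile color, and every tile change is counted as one expensive state change.
struct FakeGRCG {
    std::array<uint8_t, (ROW_BYTES * PLAYFIELD_H)> vram{};
    uint8_t tile_color = 0;
    int tile_changes = 0;

    void set_tile(uint8_t color) {
        tile_color = color;
        tile_changes++;
    }
    void put(int y, int byte_x) {
        vram[(y * ROW_BYTES) + byte_x] = tile_color;
    }
};

// Steps 2 to 6: Draws all squares of a single color.
// [scroll] is the number of lines left in the topmost row of squares (1..32).
// [x_start] is the byte offset of the first square in the topmost row.
void checkerboard_pass(FakeGRCG& grcg, uint8_t color, int scroll, int x_start)
{
    assert((scroll >= 1) && (scroll <= SQUARE_H));
    assert((x_start == 0) || (x_start == SQUARE_BYTES));

    grcg.set_tile(color);
    int y_left = scroll;
    int x = x_start;
    for(int y = 0; y < PLAYFIELD_H; y++) {
        // Fill every second square on this line, skipping the other color.
        for(int sq = x; sq < ROW_BYTES; sq += (SQUARE_BYTES * 2)) {
            for(int i = 0; i < SQUARE_BYTES; i++) {
                grcg.put(y, (sq + i));
            }
        }
        if(--y_left == 0) {
            y_left = SQUARE_H;
            x ^= SQUARE_BYTES; // Flip between 0 and SQUARE_BYTES
        }
    }
}

// Step 7: Second pass with the other color and the flipped starting offset.
void checkerboard_render(
    FakeGRCG& grcg, uint8_t color_a, uint8_t color_b, int scroll
)
{
    checkerboard_pass(grcg, color_a, scroll, 0);
    checkerboard_pass(grcg, color_b, scroll, SQUARE_BYTES);
}
```

Calling `checkerboard_render()` once per frame fills every byte of the modeled playfield while changing the tile exactly twice, which is the property that makes this approach cheaper than switching colors for every square.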


This makes for a good occasion to talk about the third and final GRCG mode, completing the series I started with my previous coverage of the 📝 RMW and 📝 TCR modes. The TDW (Tile Data Write) mode is the simplest of the three and just writes the 8×1 GRCG tile into VRAM as-is, without applying any alpha bitmask. This makes it perfect for clearing rectangular areas of pixels – or even all of VRAM by doing a single memset():

// Set up the GRCG in TDW mode.
outportb(0x7C, 0x80);

// Fill the tile register with color #7 (0111 in binary).
outportb(0x7E, 0xFF); // Plane 0: (B): (********)
outportb(0x7E, 0xFF); // Plane 1: (R): (********)
outportb(0x7E, 0xFF); // Plane 2: (G): (********)
outportb(0x7E, 0x00); // Plane 3: (E): (        )

// Set the 32 pixels at the top-left corner of VRAM to the exact contents of
// the tile register, effectively repeating the tile 4 times. In TDW mode, the
// GRCG ignores the CPU-supplied operand, so we might as well just pass the
// contents of a register with the intended width. This eliminates useless load
// instructions in the compiled assembly, and even sort of signals to readers
// of this code that we do not care about the source value.
*reinterpret_cast<uint32_t far *>(MK_FP(0xA800, 0)) = _EAX;

// Fill the entirety of VRAM with the GRCG tile. A simple C one-liner that will
// probably compile into a single `REP STOS` instruction. Unfortunately, Turbo
// C++ 4.0J only ever generates the 16-bit `REP STOSW` here, even when using
// the `__memset__` intrinsic and when compiling in 386 mode. When targeting
// that CPU and above, you'd ideally want `REP STOSD` for twice the speed.
memset(MK_FP(0xA800, 0), _AL, ((640 / 8) * 400));

However, this might make you wonder why TDW mode is even necessary. If it's functionally equivalent to RMW mode with a CPU-supplied bitmask made up entirely of 1 bits (i.e., 0xFF, 0xFFFF, or 0xFFFFFFFF), what's the point? The difference lies in the hardware implementation: If all you need to do is write tile data to VRAM, you don't need the read and modify parts of RMW mode which require additional processing time. The PC-9801 Programmers' Bible claims a speedup of almost 2× when using TDW mode over equivalent operations in RMW mode.
And that's the only performance claim I found, because none of these old PC-98 hardware and programming books did any benchmarks. Then again, it's not too interesting of a question to benchmark either, as the byte-aligned nature of TDW blitting severely limits its use in a game engine anyway. Sure, maybe it makes sense to temporarily switch from RMW to TDW mode if you've identified a large rectangular and byte-aligned section within a sprite that could be blitted without a bitmask? But the necessary identification work likely nullifies the performance gained from TDW mode, I'd say. In any case, that's pretty deep micro-optimization territory. Just use TDW mode for the few cases it's good at, and stick to RMW mode for the rest.
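The functional relationship between the two write modes can be modeled in a few lines of portable C++. This is a simplified software model and not hardware-accurate: `VRAMByte`, `grcg_write_tdw()`, and `grcg_write_rmw()` are hypothetical names, and the model only covers a single byte position across the 4 bitplanes. It says nothing about timing, which is the actual reason for TDW mode to exist.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// The 8×1 GRCG tile: one byte for each of the 4 bitplanes (B, R, G, E).
using Tile = std::array<uint8_t, 4>;

// One byte-aligned group of 8 pixels in VRAM, across all 4 bitplanes.
struct VRAMByte {
    Tile planes{};
};

// TDW: The tile is written as-is, and the CPU-supplied operand is ignored.
inline void grcg_write_tdw(VRAMByte& dst, const Tile& tile, uint8_t)
{
    dst.planes = tile;
}

// RMW: The CPU-supplied operand selects which of the 8 pixels receive the
// tile color. All other pixels keep their previous value.
inline void grcg_write_rmw(VRAMByte& dst, const Tile& tile, uint8_t mask)
{
    for(std::size_t p = 0; p < dst.planes.size(); p++) {
        dst.planes[p] = static_cast<uint8_t>(
            (dst.planes[p] & ~mask) | (tile[p] & mask)
        );
    }
}

// Checks that RMW with an all-1 mask produces the same result as TDW for the
// given tile and previous VRAM contents.
inline void check_equivalence(const Tile& tile, const VRAMByte& previous)
{
    VRAMByte via_tdw = previous;
    VRAMByte via_rmw = previous;
    grcg_write_tdw(via_tdw, tile, 0x00);
    grcg_write_rmw(via_rmw, tile, 0xFF);
    assert(via_tdw.planes == via_rmw.planes);
}
```

With the color #7 tile from the listing above (`{ 0xFF, 0xFF, 0xFF, 0x00 }`), `check_equivalence()` passes for any previous VRAM contents, while an RMW write with a partial mask such as `0xF0` only recolors the left 4 pixels.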

So is this all that can be said about the GRCG? Not quite, because there are 4 bits I haven't talked about yet…


And now we're just 5.37% away from 100% position independence for TH04! From this point, another 2 pushes should be enough to reach this goal. It might not look like we're that close based on the current estimate, but a big chunk of the remaining numbers are false positives from the player shot control functions. Since we've got a very special deadline to hit, I'm going to cobble these two pushes together from the two current general subscriptions and the rest of the backlog. But you can, of course, still invest in this goal to allow the existing contributions to go to something else.
… Well, if the store was actually open. :thonk: So I'd better continue with a quick task to free up some capacity sooner rather than later. Next up, therefore: Back to TH02, and its item and player systems. Shouldn't take that long, I'm not expecting any surprises there. (Yeah, I know, famous last words…)

📝 Posted:
🚚 Summary of:
P0226
Commits:
(Seihou) M0002...P0226
💰 Funded by:
Arandui, alp-bib
🏷 Tags:
> "OK, TH03/TH04/TH05 cutscenes done, let's quickly finish the Touhou Patch Center MediaWiki upgrade. Just some scripting and verification left, it will be done so quickly that I don't even have to mention it on this blog"
> Still not done after 3 weeks
> Blocked by one final critical bug that really should be fixed upstream
> Code reviewers are probably on vacation

And so, the year unfortunately ended with yet another slow month. During the MediaWiki upgrade, I was slowly decompiling the TH05 Sara fight on the side, but stumbled over one interesting but high-maintenance detail there that would really enhance her blog post. TH02 would need a lot of attention for the basic rendering calls as well…

…so let's end the year with Shuusou Gyoku instead, looking at its most critical issue in particular. As if that were the easy option here… :tannedcirno:
The game does not run properly on modern Windows systems due to its usage of the ancient DirectDraw APIs, with issues ranging from unbearable slowdown to glitched colors to the game not even starting at all. Thankfully, Shuusou Gyoku is not the only ancient Windows game affected by these issues, and people have developed a variety of generic DirectDraw wrappers and patches for playing such games on modern systems. Out of all these, DDrawCompat is one of the simpler solutions for Shuusou Gyoku in particular: Just drop its ddraw proxy DLL into the game directory, and the game will run as it's supposed to.
So let's just bundle that DLL with all my future Shuusou Gyoku releases then? That would have been the quick and dirty option, coming with several drawbacks:

Fortunately, I had the budget to dig a bit deeper and figure out what exactly DDrawCompat does to make Shuusou Gyoku work properly. Turns out that among all the hooks and patches, the game only needs the most central one: Enforcing a 32-bit display mode regardless of whatever lower bit depth the game requests natively, combined with converting the game's pixel buffer to 32-bit on the fly.
So does this mean that adding 32-bit to the game's list of supported bit depths is everything we have to do?

The new 32-bit rendering option in the Shuusou Gyoku P0226 build.
Interestingly, Shuusou Gyoku already saved the DirectDraw enumeration flag that indicates support for 32-bit display modes. The official version just did nothing with it.

Well, almost everything. Initially, this surprised me as well: With all the if statements checking for precise bit depths, you would think that supporting one more bit depth would be way harder in this code base. As it turned out though, these conditional branches are not really about 8-bit or 16-bit color for the most part, but instead differentiate between two very distinct rendering approaches:

Consequently, most of these branches deal with differences between these two approaches that couldn't be nicely abstracted away in pbg's renderer interface: Specific palette changes that are exclusive to "8-bit" mode, or certain entities and effects whose Direct3D draw calls in "16-bit" mode require tailor-made approximations for the "8-bit" mode. Since our new 32-bit mode is equivalent to the 16-bit mode in all of these branches, I only needed to replace the raw number comparisons with more meaningful method calls.
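As a rough idea of what replacing raw number comparisons with method calls can look like, consider the following sketch. The class and method names are hypothetical and not taken from the actual code base. The only facts taken from the text are that palette changes are exclusive to the "8-bit" approach, that the "16-bit" approach relies on Direct3D draw calls, and that 32-bit behaves like 16-bit in these branches.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical wrapper around the bits-per-pixel value of the display mode.
class BitDepth {
    uint8_t bpp;

public:
    explicit BitDepth(uint8_t bpp) : bpp(bpp) {
        assert((bpp == 8) || (bpp == 16) || (bpp == 32));
    }

    uint8_t value() const {
        return bpp;
    }

    // "8-bit" approach: effects are realized through palette changes.
    bool has_palette() const {
        return (bpp == 8);
    }

    // "16-bit" approach: effects are realized through Direct3D draw calls.
    // Any non-palettized mode takes this path, including the new 32-bit one.
    bool uses_direct3d() const {
        return !has_palette();
    }
};
```

A branch that used to compare the bit depth against the literal 16 would then ask `depth.uses_direct3d()` instead, which stays correct if further non-palettized depths are ever added.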

That only left a very small number of 2D raster effects that directly write to or read from DirectDraw surface memory, and therefore do need to know the bit size of each pixel. Thanks to std::variant and std::visit(), adding 32-bit support becomes trivial here: By rewriting the code in a generic manner that derives all offsets from the template type, you only have to say hey, I'd like to have 32-bit as well, and C++ will automatically instantiate correct 32-bit variants of all bit depth-dependent code snippets.
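The general technique can be illustrated with a small, self-contained C++17 sketch. It does not reproduce the game's code: `Surface`, `AnySurface`, and all functions below are hypothetical, and a `std::vector` stands in for locked DirectDraw surface memory. The point is that the generic lambdas passed to `std::visit()` are written once, and the compiler instantiates them for every pixel type listed in the variant.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <variant>
#include <vector>

// Hypothetical pixel buffer, parametrized on the integer type of one pixel.
template <class Pixel> struct Surface {
    using pixel_type = Pixel;

    std::vector<Pixel> pixels;
    std::size_t width;
    std::size_t height;

    Surface(std::size_t w, std::size_t h) :
        pixels(w * h), width(w), height(h) {
    }
};

// Adding another bit depth only requires adding another alternative here.
using AnySurface = std::variant<
    Surface<uint8_t>, Surface<uint16_t>, Surface<uint32_t>
>;

inline AnySurface make_surface(unsigned int bpp, std::size_t w, std::size_t h)
{
    switch(bpp) {
    case 8:  return Surface<uint8_t>(w, h);
    case 16: return Surface<uint16_t>(w, h);
    default: return Surface<uint32_t>(w, h);
    }
}

// Derives the pixel size from the template type instead of a runtime value.
inline std::size_t bytes_per_pixel(const AnySurface& surface)
{
    return std::visit([](const auto& s) -> std::size_t {
        using P = typename std::decay_t<decltype(s)>::pixel_type;
        return sizeof(P);
    }, surface);
}

// Writes a pixel, truncating the value to the width of the pixel type.
inline void set_pixel(
    AnySurface& surface, std::size_t x, std::size_t y, uint32_t value
)
{
    std::visit([&](auto& s) {
        using P = typename std::decay_t<decltype(s)>::pixel_type;
        assert((x < s.width) && (y < s.height));
        s.pixels[(y * s.width) + x] = static_cast<P>(value);
    }, surface);
}

// Reads a pixel and widens it to 32 bits.
inline uint32_t get_pixel(
    const AnySurface& surface, std::size_t x, std::size_t y
)
{
    return std::visit([&](const auto& s) -> uint32_t {
        assert((x < s.width) && (y < s.height));
        return s.pixels[(y * s.width) + x];
    }, surface);
}
```

None of the three functions mentions a concrete bit depth, so callers only pick one when creating the surface via `make_surface()`.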
There are only three features in the entire game that access pixel buffers this way: a color key retrieval function, the lens ball animation on the logo screen, and… the ending staff roll? Sure, the text sprites fade in and out, but so does the picture next to it, using Direct3D alpha blending or palette color ramping depending on the current rendering mode. Instead, the only reason why these sprites directly access their pixel buffer is… an unused and pretty wild spiral effect. 😮 It's still part of the code, and only doesn't show up because the parameters that control its timing were commented out before release:

They probably considered it too wild for the mood of this ending.
The main ending text was the only remaining issue of mojibake present in my previous Shuusou Gyoku builds, and is now fixed as well. Windows can render Shift-JIS text via GDI even outside a Japanese locale, but only when explicitly selecting a font that supports the SHIFTJIS_CHARSET, and the game simply didn't select any font for rendering this text. Thus, GDI fell back onto its default font, which obviously is only guaranteed to support the SHIFTJIS_CHARSET if your system locale is set to Japanese. This is why the font in the original game might look different between systems. For my build, I chose the font that would appear on a clean Windows installation – a basic 400-weighted MS Gothic at font size 16, which is already used all throughout the game.

Alright, 32-bit mode complete, let's set it as the default if possible… and break compatibility with the original 秋霜CFG.DAT format in the process? When validating this file, the original game only allows the originally supported 8-bit or 16-bit modes. Setting the BitDepth field to any other value causes the entire file to be reset to its defaults, re-locking the Extra Stage in the process. :onricdennat:
Introducing a backward-compatible version system for 秋霜CFG.DAT was beyond the scope of this push. Changing the validation to a per-field approach was a good small first step to take though. The new build no longer validates the BitDepth field against a fixed list, but against the actually supported bit depths on your system, picking a different supported one if necessary. With the original approach, this would have caused your entire configuration to fail the validation check. Instead, you can now safely update to the new build without losing your option settings, or your previously unlocked access to the Extra Stage.
Side note: The validation limit for starting bombs is off by one, and the one for starting lives is off by two. By modifying 秋霜CFG.DAT, you could theoretically get new games to start with 7 lives and 3 bombs… if you then calculate a correct checksum for your hacked config file, that is. 🧑‍💻
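The difference between whole-file and per-field validation can be sketched as follows. This is not the actual 秋霜CFG.DAT structure: the `Config` type only models three fields, and the default values and upper limits are assumptions made up for the example. Only the behavior is taken from the text, namely that the original approach resets everything on a single invalid field, while the new approach replaces an unsupported bit depth with a supported one and leaves the rest alone.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical subset of the configuration file.
struct Config {
    uint8_t bit_depth;
    uint8_t lives;
    uint8_t bombs;
};

// Assumed defaults and limits, for illustration only.
const Config CONFIG_DEFAULT = { 8, 3, 2 };
constexpr uint8_t LIVES_MAX = 5;
constexpr uint8_t BOMBS_MAX = 2;

inline bool contains(const std::vector<uint8_t>& list, uint8_t value)
{
    return (std::find(list.begin(), list.end(), value) != list.end());
}

// Original-style validation: A single invalid field resets the whole file.
inline Config validate_whole_file(
    const Config& c, const std::vector<uint8_t>& allowed_bit_depths
)
{
    const bool valid = (
        contains(allowed_bit_depths, c.bit_depth) &&
        (c.lives <= LIVES_MAX) &&
        (c.bombs <= BOMBS_MAX)
    );
    return (valid ? c : CONFIG_DEFAULT);
}

// Per-field validation: Only the invalid fields are replaced. The bit depth
// is checked against what the current system reports as supported.
inline Config validate_per_field(
    Config c, const std::vector<uint8_t>& supported_bit_depths
)
{
    assert(!supported_bit_depths.empty());
    if(!contains(supported_bit_depths, c.bit_depth)) {
        c.bit_depth = supported_bit_depths.front();
    }
    if(c.lives > LIVES_MAX) {
        c.lives = CONFIG_DEFAULT.lives;
    }
    if(c.bombs > BOMBS_MAX) {
        c.bombs = CONFIG_DEFAULT.bombs;
    }
    return c;
}
```

In this model, a file saved with a bit depth of 32 loses all of its settings under `validate_whole_file()` with an allowed list of 8 and 16, but passes through `validate_per_field()` unchanged on a system that supports 32-bit.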

Interestingly, DirectDraw doesn't even indicate support for 8-bit or 16-bit color on systems that are affected by the initially mentioned issues. Therefore, these issues are not the fault of DirectDraw, but of Shuusou Gyoku, as the original release requested a bit depth that it has even verified to be unsupported. Unfortunately, Windows sides with Shuusou Gyoku here, much like it once did with Sim City: If you previously experimented with the Windows app compatibility settings, you might have ended up with the DWM8And16BitMitigation flag assigned to the full file path of your Shuusou Gyoku executable in either

As the term mitigation suggests, these modes are (poorly) emulated, which is exactly what causes the issues with this game in the first place. Sure, this might be the lesser evil from the point of view of an operating system: If you don't have the budget for a full-blown DDrawCompat-style DirectDraw wrapper, you might consider it better for users to have the game run poorly than have it fail at startup due to incorrect API usage. Controlling this with a flag that sticks around for future runs of a binary is definitely suboptimal though, especially given how hard it is to programmatically remove this flag within the binary itself. It only adds additional complexity to the ideal clean upgrade path.
So, make sure to check your registry and manually remove these flags for the time being. Without them, the new Config → Graphic menu will correctly prevent you from selecting anything else but 32-bit on modern Windows.


After all that, there was just enough time left in this push to implement basic locale independence, as requested by the Seihou development Discord group, without looking into automatic fixes for previous mojibake filenames yet. Combining std::filesystem::path with the native Win32 API should be straightforward and bloat-free, especially with all the abstractions I've been building, right?
Well, turns out that std::filesystem::path does not actually meet my expectations. At least as long as it's not constexpr-enabled, because you still get the unfortunate conversion from narrow to wide encoding at runtime, even for globals with static storage duration. That brings us back to writing our path abstraction in terms of the regular std::string and std::wstring containers, which at least allow us to enforce the respective encoding at compile time. Even std::string_view only adds to the complexity here, as its strings are not guaranteed to be null-terminated, which is required by both the POSIX and Win32 APIs. Not to mention dynamic filenames: C++20's std::format() would be the obvious idiomatic choice here, but using it almost doubles the size of the compiled binary… 🤮
In the end, the most bloat-free way of implementing C++ file I/O in 2023 is still the same as it was 30 years ago: Call system APIs, roll a custom abstraction that conditionally uses the L prefix, and pass around raw pointers. And if you need a dynamic filename, just write the dynamic characters into arrays at fixed positions. Just as PC-98 Touhou used to do… :zunpet:
Oh, and the game's window also uses a Unicode title bar now.
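As an illustration of the low-tech approach to paths described above, the following sketch combines a conditional `L` prefix with a filename template whose dynamic characters are written into fixed array positions. All names are hypothetical, and the ASCII filename `REPLAY00.DAT` is a placeholder rather than one of the game's actual filenames.

```cpp
#include <cassert>
#include <cstddef>

// Conditionally selects the native character type and literal prefix.
#ifdef _WIN32
    using pchar = wchar_t;
    #define PATH_LITERAL(literal) L##literal
#else
    using pchar = char;
    #define PATH_LITERAL(literal) literal
#endif

// Position of the two digits inside the filename template below.
constexpr std::size_t REPLAY_DIGIT_POS = 6;

// Hypothetical replay filename with a two-digit number at a fixed position.
// The array includes the terminating null character, so the raw pointer can
// be passed directly to system APIs.
struct ReplayFilename {
    pchar buf[13] = PATH_LITERAL("REPLAY00.DAT");

    const pchar* with_number(unsigned int n) {
        assert(n < 100);
        buf[REPLAY_DIGIT_POS + 0] = static_cast<pchar>(
            PATH_LITERAL('0') + (n / 10)
        );
        buf[REPLAY_DIGIT_POS + 1] = static_cast<pchar>(
            PATH_LITERAL('0') + (n % 10)
        );
        return buf;
    }
};
```

The pointer returned from `with_number()` stays valid for as long as the `ReplayFilename` object lives, and no heap allocation or runtime encoding conversion is involved.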

And that's it for this push! Make sure to rename your configuration (秋霜CFG.DAT), score (秋霜SC.DAT), and replay (秋霜りぷ*.DAT) filenames if you were previously running the game on a non-Japanese locale, and then grab the new build:

:sh01: Shuusou Gyoku P0226

With that, we've got the most critical bugs out of the way, but the number of potential fixes and features in Shuusou Gyoku has only increased. Looking forward to what's next in this apparent Seihou revolution, later in 2023!

Next up: Starting the new year with all my plans hopefully working out for once. TH05 Sara very soon, ZMBV code review afterward, low-hanging fruit of the TH01 Anniversary Edition after that, and then kicking off TH02 with a bunch of low-level blitting code.