⮜ Blog

⮜ List of tags

Showing all posts tagged kaja-

📝 Posted:
🚚 Summary of:
P0139
Commits:
864e864...d985811
💰 Funded by:
[Anonymous]
🏷 Tags:
rec98+ th02+ th03+ th04+ th05+ kaja- jank+ bug+ contribution-ideas+

Technical debt, part 10… in which two of the PMD-related functions came with such complex ramifications that they required one full push after all, leaving no room for the additional decompilations I wanted to do. At least, this did end up being the final one, completing all SHARED segments for the time being.


The first one of these functions determines the BGM and sound effect modes, combining the resident type of the PMD driver with the Option menu setting. The TH04 and TH05 version is apparently coded quite smartly, as PC-98 Touhou only needs to distinguish "OPN- / PC-9801-26K-compatible sound sources handled by PMD.COM" from "everything else", since all other PMD varieties are OPNA- / PC-9801-86-compatible.
Therefore, I only documented those two results returned from PMD's AH=09h function. I'll leave a comprehensive, fully documented enum to interested contributors, since that would involve research into basically the entire history of the PC-9800 series, and even the clearly out-of-scope PC-88VA. After all, distinguishing between more versions of the PMD driver in the Option menu (and adding new sprites for them!) is strictly mod territory.


The honor of being the final decompiled function in any SHARED segment went to TH04's snd_load(). TH04 contains by far the sanest version of this function: Readable C code, no new ZUN bugs (and still missing file I/O error handling, of course)… but wait, what about that actual file read syscall, using the INT 21h, AH=3Fh DOS file read API? Reading up to a hardcoded number of bytes into PMD's or MMD's song or sound effect buffer, 20 KiB in TH02-TH04, 64 KiB in TH05… that's kind of weird. About time we looked closer into this. :thonk:

Turns out that no, KAJA's driver doesn't give you the full 64 KiB of one memory segment for these, as especially TH05's code might suggest to anyone unfamiliar with these drivers. :zunpet: Instead, you can customize the size of these buffers on its command line. In GAME.BAT, ZUN allocates 8 KiB for FM songs, 2 KiB for sound effects, and 12 KiB for MMD files in TH02… which means that the hardcoded sizes in snd_load() are completely wrong, no matter how you look at them. :onricdennat: Consequently, this read syscall will overflow PMD's or MMD's song or sound effect buffer if the given file is larger than the respective buffer size.
Now, ZUN could have simply hardcoded the sizes from GAME.BAT instead, and it would have been fine. As it also turns out though, PMD has an API function (AH=22h) to retrieve the actual buffer sizes, provided for exactly that purpose. There is little excuse not to use it, as it also gives you PMD's default sizes if you don't specify any yourself.
(Unless your build process enumerates all PMD files that are part of the game, and bakes the largest size into both snd_load() and GAME.BAT. That would even work with MMD, which doesn't have an equivalent for AH=22h.)

What'd be the consequence of loading a larger file then? Well, since we don't get a full segment, let's look at the theoretical limit first.
PMD prefers to keep both its driver code and the data buffers in a single memory segment. As a result, the limit for the combined size of the song, instrument, and sound effect buffer is determined by the amount of code in the driver itself. In PMD86 version 4.8o (bundled with TH04 and TH05) for example, the remaining size for these buffers is exactly 45,555 bytes. Being an actually good programmer who doesn't blindly trust user input, KAJA thankfully validates the sizes given via the /M, /V, and /E command-line options before letting the driver reside in memory, and shuts down with an error message if they exceed 40 KiB. Would have been even better if he calculated the exact size – even in the current PMD version 4.8s from January 2020, it's still a hardcoded value (see line 8581).
Either way: If the file is larger than this maximum, the concrete effect is down to the INT 21h, AH=3Fh implementation in the underlying DOS version. DOS 3.3 treats the destination address as linear and reads past the end of the segment, DOS 5.0 and DOSBox-X truncate the number of bytes to not exceed the remaining space in the segment, and maybe there's even a DOS that wraps around and ends up overwriting the PMD driver code. In any case: You will overwrite what's after the driver in memory – typically, the game .EXE and its master.lib functions.

It almost feels like a happy accident that this doesn't cause issues in the original games. The largest PMD file in any of the 4 games, the -86 version of 幽夢 ~ Inanimate Dream, takes up 8,099 bytes, just under the 8,192 byte limit for BGM. For modders, I'd really recommend implementing this properly, with PMD's AH=22h function and error handling, once position independence has been reached.

Whew, didn't think I'd be doing more research into KAJA's drivers during regular ReC98 development! That's probably been the final time though, as all involved functions are now decompiled, and I'm unlikely to iterate over them again.


And that's it! Repaid the biggest chunk of technical debt, time for some actual progress again. Next up: Reopening the store tomorrow, and waiting for new priorities. If we got nothing by Sunday, I'm going to put the pending [Anonymous] pushes towards some work on the website.

📝 Posted:
🚚 Summary of:
P0135, P0136
Commits:
a6eed55...252c13d, 252c13d...07bfcf2
💰 Funded by:
[Anonymous]
🏷 Tags:
rec98+ th02+ th03+ th04+ th05+ kaja- menu+ micro-optimization+ bug+ tcc+

Alright, no more big code maintenance tasks that absolutely need to be done right now. Time to really focus on parts 6 and 7 of repaying technical debt, right? Except that we don't get to speed up just yet, as TH05's barely decompilable PMD file loading function is rather… complicated.
Fun fact: Whenever I see an unusual sequence of x86 instructions in PC-98 Touhou, I first consult the disassembly of Wolfenstein 3D. That game was originally compiled with the quite similar Borland C++ 3.0, so it's quite helpful to compare its ASM to the officially released source code. If I find the instructions in question, they mostly come from that game's ASM code, leading to the amusing realization that "even John Carmack was unable to get these instructions out of this compiler" :onricdennat: This time though, Wolfenstein 3D did point me to Borland's intrinsics for common C functions like memcpy() and strchr(), available via #pragma intrinsic. Bu~t those unfortunately still generate worse code than what ZUN micro-optimized here. Commenting how these sequences of instructions should look in C is unfortunately all I could do here.
The conditional branches in this function did compile quite nicely though, clarifying the control flow, and clearly exposing a ZUN bug: TH05's snd_load() will hang in an infinite loop when trying to load a non-existing -86 BGM file (with a .M2 extension) if the corresponding -26 BGM file (with a .M extension) doesn't exist either.

Unsurprisingly, the PMD channel monitoring code in TH05's Music Room remains undecompilable outside the two most "high-level" initialization and rendering functions. And it's not because there's data in the middle of the code segment – that would have actually been possible with some #pragmas to ensure that the data and code segments have the same name. As soon as the SI and DI registers are referenced anywhere, Turbo C++ insists on emitting prolog code to save these on the stack at the beginning of the function, and epilog code to restore them from there before returning. Found that out in September 2019, and confirmed that there's no way around it. All the small helper functions here are quite simply too optimized, throwing away any concern for such safety measures. 🤷
Oh well, the two functions that were decompilable at least indicate that I do try.


Within that same 6th push though, we've finally reached the one function in TH05 that was blocking further progress in TH04, allowing that game to finally catch up with the others in terms of separated translation units. Feels good to finally delete more of those .ASM files we've decompiled a while ago… finally!

But since that was just getting started, the most satisfying development in both of these pushes actually came from some more experiments with macros and inline functions for near-ASM code. By adding "unused" dummy parameters for all relevant registers, the exact input registers are made more explicit, which might help future port authors who then maybe wouldn't have to look them up in an x86 instruction reference quite as often. At its best, this even allows us to declare certain functions with the __fastcall convention and express their parameter lists as regular C, with no additional pseudo-registers or macros required.
As for output registers, Turbo C++'s code generation turns out to be even more amazing than previously thought when it comes to returning pseudo-registers from inline functions. A nice example for how this can improve readability can be found in this piece of TH02 code for polling the PC-98 keyboard state using a BIOS interrupt:

inline uint8_t keygroup_sense(uint8_t group) {
	_AL = group;
	_AH = 0x04;
	geninterrupt(0x18);
	// This turns the output register of this BIOS call into the return value
	// of this function. Surprisingly enough, this does *not* naively generate
	// the `MOV AL, AH` instruction you might expect here!
	return _AH;
}

void input_sense(void)
{
	// As a result, this assignment becomes `_AH = _AH`, which Turbo C++
	// never emits as such, giving us only the three instructions we need.
	_AH = keygroup_sense(8);

	// Whereas this one gives us the one additional `MOV BH, AH` instruction
	// we'd expect, and nothing more.
	_BH = keygroup_sense(7);

	// And now it's obvious what both of these registers contain, from just
	// the assignments above.
	if(_BH & K7_ARROW_UP || _AH & K8_NUM_8) {
		key_det |= INPUT_UP;
	}
	// […]
}

I love it. No inline assembly, as close to idiomatic C code as something like this is going to get, yet still compiling into the minimum possible number of x86 instructions on even a 1994 compiler. This is how I keep this project interesting for myself during chores like these. :tannedcirno: We might have even reached peak inline already?

And that's 65% of technical debt in the SHARED segment repaid so far. Next up: Two more of these, which might already complete that segment? Finally!