⮜ Blog

⮜ List of tags

Showing all posts tagged th01-

📝 Posted:
🚚 Summary of:
P0140, P0141, P0142
d985811...d856f7d, d856f7d...5afee78, 5afee78...08bc188
💰 Funded by:
[Anonymous], rosenrose, Yanga
🏷 Tags:
rec98+ th01- pc98+ dosbox-x+ gameplay+ boss+ konngara+ danmaku-pattern+

Alright, onto Konngara! Let's quickly move the escape sequences used later in the battle to C land, and then we can immediately decompile the loading and entrance animation function together with its filenames. Might as well reverse-engineer those escape sequences while I'm at it, though – even if they aren't implemented in DOSBox-X, they're well documented in all those Japanese PDFs, so this should be no big deal…

…wait, ESC )3 switches to "graph mode"? As opposed to the default "kanji mode", which can be re-entered via ESC )0? Let's look up graph mode in the PC-9801 Programmers' Bible then…

> Kanji cannot be handled in this mode.
…and that's apparently all it has to say. Why have it then, on a platform whose main selling point is a kanji ROM, and where Shift-JIS (and, well, 7-bit ASCII) are the only native encodings? No support for graph mode in DOSBox-X either… yeah, let's take a deep dive into NEC's IO.SYS, and get to the bottom of this.

And yes, graph mode pretty much just disables Shift-JIS decoding for characters written via INT 29h, the lowest-level way of "just printing a char" on DOS, which every printf() will ultimately end up calling. Turns out there is a use for it though, which we can spot by looking at the 8×16 half-width section of font ROM:

The half-width glyphs marked in red correspond to the byte ranges from 0x80-0x9F and 0xE0-0xFF… which Shift-JIS defines as lead bytes for two-byte, full-width characters. But if we turn off Shift-JIS decoding…

Jackpot, we get those half-width characters when printing their corresponding bytes.
I've re-implemented all my findings into DOSBox-X, which will include graph mode in the upcoming 0.83.14 release. If P0140 looks a bit empty as a result, that's why – most of the immediate feature work went into DOSBox-X, not into ReC98. That's the beauty of "anything" pushes. :tannedcirno:

So, after switching to graph mode, TH01 does… one of the slowest possible memset()s over all of text RAM – one printf(" ") call for every single one of its 80×25 half-width cells – before switching back to kanji mode. What a waste of RE time…? Oh well, at least we've now got plenty of proof that these weird escape sequences actually do nothing of interest.

As for the Konngara code itself… well, it's script-like code, what can you say. Maybe minimally sloppy in some places, but ultimately harmless.
One small thing that might not be widely known though: The large, blue-green Siddhaṃ seed syllables are supposed to show up immediately, with no delay between them? Good to know. Clocking your emulator too low tends to roll them down from the top of the screen, and will certainly add a noticeable delay between the four individual images.

… Wait, but this means that ZUN could have intended this "effect". Why else would he not only put those syllables into four individual images (and therefore add at least the latency of disk I/O between them), but also show them on the foreground VRAM page, rather than on the "back buffer"?

Meanwhile, in 📝 another instance of "maybe having gone too far in a few places": Expressing distances on the playfield as fractions of its width and height, just to avoid absolute numbers? Raw numbers are bad because they're in screen space in this game. But we've already been throwing PLAYFIELD_ constants into the mix as a way of explicitly communicating screen space, and keeping raw number literals for the actual playfield coordinates is looking increasingly sloppy… I don't know, fractions really seemed like the most sensible thing to do with what we're given here. 😐

So, 2 pushes in, and we've got the loading code, the entrance animation, facial expression rendering, and the first one out of Konngara's 12 danmaku patterns. Might not sound like much, but since that first pattern involves those ◆ blue-green diamond sprites and therefore is one of the more complicated ones, it all amounts to roughly 21.6% of Konngara's code. That's 7 more pushes to get Konngara done, then? Next up though: Two pushes of website improvements.

📝 Posted:
🚚 Summary of:
💰 Funded by:
[Anonymous], Blue Bolt
🏷 Tags:
rec98+ th01- th02+ th03+ th04+ micro-optimization+ file-format+ waste+

Technical debt, part 9… and as it turns out, it's highly impractical to repay 100% of it at this point in development. 😕

The reason: graph_putsa_fx(), ZUN's function for rendering optionally boldfaced text to VRAM using the font ROM glyphs, in its ridiculously micro-optimized TH04 and TH05 version. This one sets the "callback function" for applying the boldface effect by self-modifying the target of two CALL rel16 instructions… because there really wasn't any free register left for an indirect CALL, eh? The necessary distance, from the call site to the function itself, has to be calculated at assembly time, by subtracting the target function label from the call site label.
This usually wouldn't be a problem… if ZUN didn't store the resulting lookup tables in the .DATA segment. With code segments, we can easily split them at pretty much any point between functions because there are multiple of them. But there's only a single .DATA segment, with all ZUN and master.lib data sandwiched between Borland C++'s crt0 at the top, and Borland C++'s library functions at the bottom of the segment. Adding another split point would require all data after that point to be moved to its own translation unit, which in turn requires EXTERN references in the big .ASM file to all that moved data… in short, it would turn the codebase into an even greater mess.
Declaring the labels as EXTERN wouldn't work either, since the linker can't do fancy arithmetic and is limited to simply replacing address placeholders with one single address. So, we're now stuck with this function at the bottom of the SHARED segment, for the foreseeable future.

We can still continue to separate functions off the top of that segment, though. Pretty much the only thing noteworthy there, so far: TH04's code for loading stage tile images from .MPN files, which we hadn't reverse-engineered so far, and which nicely fit into one of Blue Bolt's pending ⅓ RE contributions. Yup, we finally moved the RE% bars again! If only for a tiny bit. :tannedcirno:
Both TH02 and TH05 simply store one pointer to one dynamically allocated memory block for all tile images, as well as the number of images, in the data segment. TH04, on the other hand, reserves memory for 8 .MPN slots, complete with their color palettes, even though it only ever uses the first one of these. There goes another 458 bytes of conventional RAM… I should start summing up all the waste we've seen so far. Let's put the next website contribution towards a tagging system for these blog posts.

At 86% of technical debt in the SHARED segment repaid, we aren't quite done yet, but the rest is mostly just TH04 needing to catch up with functions we've already separated. Next up: Getting to that practical 98.5% point. Since this is very likely to not require a full push, I'll also decompile some more actual TH04 and TH05 game code I previously reverse-engineered – and after that, reopen the store!

📝 Posted:
🚚 Summary of:
💰 Funded by:
🏷 Tags:
rec98+ th01- th02+ th03+ th04+ th05+ micro-optimization+ master.lib+ tcc+

Wow, 31 commits in a single push? Well, what the last push had in progress, this one had in maintenance. The 📝 master.lib header transition absolutely had to be completed in this one, for my own sanity. And indeed, it reduced the build time for the entirety of ReC98 to about 27 seconds on my system, just as expected in the original announcement. Looking forward to even faster build times with the upcoming #include improvements I've got up my sleeve! The port authors of the future are going to appreciate those quite a bit.

As for the new translation units, the funniest one is probably TH05's function for blitting the 1-color .CDG images used for the main menu options. Which is so optimized that it becomes decompilable again, by ditching the self-modifying code of its TH04 counterpart in favor of simply making better use of CPU registers. The resulting C code is still a mess, but what can you do. :tannedcirno:
This was followed by even more TH05 functions that clearly weren't compiled from C, as evidenced by their padding bytes. It's about time I've documented my lack of ideas of how to get those out of Turbo C++. :onricdennat:

And just like in the previous push, I also had to 📝 throw away a decompiled TH02 function purely due to alignment issues. Couldn't have been a better one though, no one's going to miss a residency check for the MMD driver that is largely identical to the corresponding (and indeed decompilable) function for the PMD driver. Both of those should have been merged into a single function anyway, given how they also mutate the game's sound configuration flags…

In the end, I've slightly slowed down with this one, with only 37% of technical debt done after this 4th dedicated push. Next up: One more of these, centered around TH05's stupidly optimized .PI functions. Maybe also with some more reverse-engineering, after not having done any for 1½ months?

📝 Posted:
🚚 Summary of:
P0130, P0131
6d69ea8...576def5, 576def5...dc9e3ee
💰 Funded by:
🏷 Tags:
rec98+ th01- pc98+ blitting+ good-code+ jank+ gameplay+ boss+ rng+

50% hype! 🎉 But as usual for TH01, even that final set of functions shared between all bosses had to consume two pushes rather than one…

First up, in the ongoing series "Things that TH01 draws to the PC-98 graphics layer that really should have been drawn to the text layer instead": The boss HP bar. Oh well, using the graphics layer at least made it possible to have this half-red, half-white pattern for the middle section.
This one pattern is drawn by making surprisingly good use of the GRCG. So far, we've only seen it used for fast monochrome drawing:

// Setting up fast drawing using color #9 (1001 in binary)
outportb(0x7E, 0xFF); // Plane 0: (B): (********)
outportb(0x7E, 0x00); // Plane 1: (R): (        )
outportb(0x7E, 0x00); // Plane 2: (G): (        )
outportb(0x7E, 0xFF); // Plane 3: (E): (********)

// Write a checkerboard pattern (* * * * ) in color #9 to the top-left corner,
// with transparent blanks. Requires only 1 VRAM write to a single bitplane:
// The GRCG automatically writes to the correct bitplanes, as specified above
*(uint8_t *)(MK_FP(0xA800, 0)) = 0xAA;
But since this is actually an 8-pixel tile register, we can set any 8-pixel pattern for any bitplane. This way, we can get different colors for every one of the 8 pixels, with still just a single VRAM write of the alpha mask to a single bitplane:
grcg_setmode(GC_RMW); //  Final color: (A7A7A7A7)
outportb(0x7E, 0x55); // Plane 0: (B): ( * * * *)
outportb(0x7E, 0xFF); // Plane 1: (R): (********)
outportb(0x7E, 0x55); // Plane 2: (G): ( * * * *)
outportb(0x7E, 0xAA); // Plane 3: (E): (* * * * )
And I thought TH01 only suffered the drawbacks of PC-98 hardware, making so little use of its actual features that it's perhaps not fair to even call it "a PC-98 game"… Still, I'd say that "bad PC-98 port of an idea" describes it best.

However, after that tiny flash of brilliance, the surrounding HP rendering code goes right back to being the typical sort of confusing TH01 jank. There's only a single function for the three distinct jobs of

  • incrementing HP during the boss entrance animation,
  • decrementing HP if hit by the Orb, and
  • redrawing the entire bar, because it's still all in VRAM, and Sariel wants different backgrounds,
with magic numbers to select between all of these.

VRAM of course also means that the backgrounds behind the individual hit points have to be stored, so that they can be unblitted later as the boss is losing HP. That's no big deal though, right? Just allocate some memory, copy what's initially in VRAM, then blit it back later using your foundational set of blitting funct– oh, wait, TH01 doesn't have this sort of thing, right :tannedcirno: The closest thing, 📝 once again, are the .PTN functions. And so, the game ends up handling these 8×16 background sprites with 16×16 wrappers around functions for 32×32 sprites. :zunpet: That's quite the recipe for confusion, especially since ZUN preferred copy-pasting the necessary ridiculous arithmetic expressions for calculating positions, .PTN sprite IDs, and the ID of the 16×16 quarter inside the 32×32 sprite, instead of just writing simple helper functions. He did manage to make the result mostly bug-free this time around, though! There's one minor hit point discoloration bug if the red-white or white sections start at an odd number of hit points, but that's never the case for any of the original 7 bosses.
The remaining sloppiness is ultimately inconsequential as well: The game always backs up twice the number of hit point backgrounds, and thus uses twice the amount of memory actually required. Also, this self-restriction of only unblitting 16×16 pixels at a time requires any remaining odd hit point at the last position to, of course, be rendered again :onricdennat:

After stumbling over the weakest imaginable random number generator, we finally arrive at the shared boss↔orb collision handling function, the final blocker among the final blockers. This function takes a whopping 12 parameters, 3 of them being references to int values, some of which are duplicated for every one of the 7 bosses, with no generic boss struct anywhere. 📝 Previously, I speculated that YuugenMagan might have been the first boss to be programmed for TH01. With all these variables though, there is some new evidence that SinGyoku might have been the first one after all: It's the only boss to use its own HP and phase frame variables, with the other bosses sharing the same two globals.

While this function only handles the response to a boss↔orb collision, it still does way too much to describe it briefly. Took me quite a while to frame it in terms of invincibility (which is the main impact of all of this that can be observed in gameplay code). That made at least some sort of sense, considering the other usages of the variables passed as references to that function. Turns out that YuugenMagan, Kikuri, and Elis abuse what's meant to be the "invincibility frame" variable as a frame counter for some of their animations 🙄
Oh well, the game at least doesn't call the collision handling function during those, so "invincibility frame" is technically still a correct variable name there.

And that's it! We're finally ready to start with Konngara, in 2021. I've been waiting quite a while for this, as all this high-level boss code is very likely to speed up TH01 progress quite a bit. Next up though: Closing out 2020 with more of the technical debt in the other games.

📝 Posted:
🚚 Summary of:
P0128, P0129
dc65b59...dde36f7, dde36f7...f4c2e45
💰 Funded by:
🏷 Tags:
rec98+ th01- file-format+ gameplay+ card-flipping+ waste+ hidden-content+ bug+

So, only one card-flipping function missing, and then we can start decompiling TH01's two final bosses? Unfortunately, that had to be the one big function that initializes and renders all gameplay objects. #17 on the list of longest functions in all of PC-98 Touhou, requiring two pushes to fully understand what's going on there… and then it immediately returns for all "boss" stages whose number is divisible by 5, yet is still called during Sariel's and Konngara's initialization 🤦

Oh well. This also involved the final file format we hadn't looked at yet – the STAGE?.DAT files that describe the layout for all stages within a single 5-stage scene. Which, for a change is a very well-designed form– no, of course it's completely weird, what did you expect? Development must have looked somewhat like this:

  • Weirdness #1: :zunpet: "Hm, the stage format should include the file names for the background graphics and music… or should it?" And so, the 22-byte header still references some music and background files that aren't part of the final game. The game doesn't use anything from there, and instead derives those file names from the scene ID.
    That's probably nothing new to anyone who has ever looked at TH01's data files. In a slightly more interesting discovery though, seeing the+ 📝 .GRF extension, in some of the file names that are short enough to not cut it off, confirms that .GRF was initially used for background images. Probably before ZUN learned about .PI, and how it achieves better compression than his own per-bitplane RLE approach?
  • Weirdness #2: :zunpet: "Hm, I might want to put obstacles on top of cards?" You'd probably expect this format to contain one single array for every stage, describing which object to place on every 32×32 tile, if any. Well, the real format uses two arrays: One for the cards, and a combined one for all "obstacles" – bumpers, bumper bars, turrets, and portals. However, none of the card-flipping stages in the final game come with any such overlaps. That's quite unfortunate, as it would have made for some quite interesting level designs:

    As you can see, the final version of the blitting code was not written with such overlaps in mind either, blitting the cards on top of all the obstacles, and not the other way round.

  • Weirdness #3: :zunpet: "In contrast to obstacles, of which there are multiple types, cards only really need 1 bit. Time for some bit twiddling!" Not the worst idea, given that the 640×336 playfield can fit 20×10 cards, which would fit exactly into 25 bytes if you use a single bit to indicate card or no card. But for whatever reason, ZUN only stored 4 card bits per byte, leaving the other 4 bits unused, and needlessly blowing up that array to 50 bytes. 🤷

    Oh, and did I mention that the contents of the STAGE?.DAT files are loaded into the main data segment, even though the game immediately parses them into something more conveniently accessible? That's another 1250 bytes of memory wasted for no reason…

  • Weirdness #4: :zunpet: "Hm, how about requiring the player to flip some of the cards multiple times? But I've already written all this bit twiddling code to store 4 cards in 1 byte. And if cards should need anywhere from 1 to 4 flips, that would need at least 2 more bits, which won't fit into the unused 4 bits either…" This feature must have come later, because the final game uses 3 "obstacle" type IDs to act as a flip count modifier for a card at the same relative array position. Complete with lookup code to find the actual card index these modifiers belong to, and ridiculous switch statements to not include those non-obstacles in the game's internal obstacle array. :tannedcirno:

With all that, it's almost not worth mentioning how there are 12 turret types, which only differ in which hardcoded pellet group they fire at a hardcoded interval of either 100 or 200 frames, and that they're all explicitly spelled out in every single switch statement. Or how the layout of the internal card and obstacle SoA classes is quite disjointed. So here's the new ZUN bugs you've probably already been expecting!

Cards and obstacles are blitted to both VRAM pages. This way, any other entities moving on top of them can simply be unblitted by restoring pixels from VRAM page 1, without requiring the stationary objects to be redrawn from main memory. Obviously, the backgrounds behind the cards have to be stored somewhere, since the player can remove them. For faster transitions between stages of a scene, ZUN chose to store the backgrounds behind obstacles as well. This way, the background image really only needs to be blitted for the first stage in a scene.

All that memory for the object backgrounds adds up quite a bit though. ZUN actually made the correct choice here and picked a memory allocation function that can return more than the 64 KiB of a single x86 Real Mode segment. He then accesses the individual backgrounds via regular array subscripts… and that's where the bug lies, because he stores the returned address in a regular far pointer rather than a huge one. This way, the game still can only display a total of 102 objects (i. e., cards and obstacles combined) per stage, without any unblitting glitches.
What a shame, that limit could have been 127 if ZUN didn't needlessly allocate memory for alpha planes when backing up VRAM content. :onricdennat:

And since array subscripts on far pointers wrap around after 64 KiB, trying to save the background of the 103rd object is guaranteed to corrupt the memory block header at the beginning of the returned segment. :zunpet:. When TH01 runs in test mode, it correctly reports a corrupted heap in this case.

Finally, some unused content! Upon discovering TH01's debug mode, probably everyone tried to access Stage 21, just to see what happens, and indeed landed in an actual stage, with a black background and a weird color palette. Turns out that ZUN did ship an unused scene in SCENE7.DAT, which is exactly was loaded there.
Unfortunately, it's easy to believe that this is just garbage data (as I initially did): At the beginning of "Stage 22", the game seems to enter an infinite loop somewhere during the flip-in animation.

Well, we've had a heap overflow above, and the cause here is nothing but a stack buffer overflow – a perhaps more modern kind of classic C bug, given its prevalence in the Windows Touhou games. Explained in a few lines of code:

void stageobjs_init_and_render()
	int card_animation_frames[50]; // even though there can be up to 200?!
	int total_frames = 0;

	(code that would end up resetting total_frames if it ever tried to reset
The number of cards in "Stage 22"? 76. There you have it.

But of course, it's trivial to disable this animation and fix these stage transitions. So here they are, Stages 21 to 24, as shipped with the game in STAGE7.DAT:

Wow, what a mess. All that was just a bit too much to be covered in two pushes… Next up, assuming the current subscriptions: Taking a vacation with one smaller TH01 push, covering some smaller functions here and there to ensure some uninterrupted Konngara progress later on.

📝 Posted:
🚚 Summary of:
💰 Funded by:
🏷 Tags:
rec98+ th01- file-format+ blitting+ waste+ jank+

Done with the .BOS format, at last! While there's still quite a bunch of undecompiled non-format blitting code left, this was in fact the final piece of graphics format loading code in TH01.

📝 Continuing the trend from three pushes ago, we've got yet another class, this time for the 48×48 and 48×32 sprites used in Reimu's gohei, slide, and kick animations. The only reason these had to use the .BOS format at all is simply because Reimu's regular sprites are 32×32, and are therefore loaded from 📝 .PTN files.
Yes, this makes no sense, because why would you split animations for the same character across two file formats and two APIs, just because of a sprite size difference? This necessity for switching blitting APIs might also explain why Reimu vanishes for a few frames at the beginning and the end of the gohei swing animation, but more on that once we get to the high-level rendering code.

Now that we've decompiled all the .BOS implementations in TH01, here's an overview of all of them, together with .PTN to show that there really was no reason for not using the .BOS API for all of Reimu's sprites:

CBossEntity CBossAnim CPlayerAnim ptn_* (32×32)
Format .BOS .BOS .BOS .PTN
Byte-aligned blitting
Byte-aligned unblitting
Unaligned blitting Single-line and wave only
Precise unblitting
Per-file sprite limit 8 8 32 64
Pixels blitted at once 16 16 8 32

And even that last property could simply be handled by branching based on the sprite width, and wouldn't be a reason for switching formats. But well, it just wouldn't be TH01 without all that redundant bloat though, would it?

The basic loading, freeing, and blitting code was yet another variation on the other .BOS code we've seen before. So this should have caused just as little trouble as the CBossAnim code… except that CPlayerAnim did add one slightly difficult function to the mix, which led to it requiring almost a full push after all. Similar to 📝 the unblitting code for moving lasers we've seen in the last push, ZUN tries to minimize the amount of VRAM writes when unblitting Reimu's slide animations. Technically, it's only necessary to restore the pixels that Reimu traveled by, plus the ones that wouldn't be redrawn by the new animation frame at the new X position.
The theoretically arbitrary distance between the two sprites is, of course, modeled by a fixed-size buffer on the stack :onricdennat:, coming with the further assumption that the sprite surely hasn't moved by more than 1 horizontal VRAM byte compared to the last frame. Which, of course, results in glitches if that's not the case, leaving little Reimu parts in VRAM if the slide speed ever exceeded 8 pixels per frame. :tannedcirno: (Which it never does, being hardcoded to 6 pixels, but still.). As it also turns out, all those bit masking operations easily lead to incredibly sloppy C code. Which compiles into incredibly terrible ASM, which in turn might end up wasting way more CPU time than the final VRAM write optimization would have gained? Then again, in-depth profiling is way beyond the scope of this project at this point.

Next up: The TH04 main menu, and some more technical debt.

📝 Posted:
🚚 Summary of:
💰 Funded by:
🏷 Tags:
rec98+ th01- blitting+ waste+ jank+ gameplay+ laser+

This time around, laser is 📝 actually not difficult, with TH01's shootout laser class being simple enough to nicely fit into a single push. All other stationary lasers (as used by YuugenMagan, for example) don't even use a class, and are simply treated as regular lines with collision detection.

But of course, the shootout lasers also come with the typical share of TH01 jank we've all come to expect by now. This time, it already starts with the hardcoded sprite data:

A shootout laser can have a width from 1 to 8 pixels, so ZUN stored a separate 16×1 sprite with a line for each possible width (left-to-right). Then, he shifted all of these sprites 1 pixel to the right for all of the 8 possible start positions within a planar VRAM byte (top-to-bottom). Because… doing that bit shift programmatically is way too expensive, so let's pre-shift at compile time, and use 16× the memory per sprite? :tannedcirno:

Since a bunch of other sprite sheets need to be pre-shifted as well (this is the 5th one we've found so far), our sprite converter has a feature to automatically generate those pre-shifted variations. This way, we can abstract away that implementation detail and leave modders with .BMP files that still only contain a single version of each sprite. But, uh…, wait, in this sprite sheet, the second row for 1-pixel lasers is accidentally shifted right by one more pixel that it should have been?! Which means that

  1. we can't use the auto-preshift feature here, and have to store this weird-looking (and quite frankly, completely unnecessary) sprite sheet in its entirety
  2. ZUN did, at least during TH01's development, not have a sprite converter, and directly hardcoded these dot patterns in the C++ code :zunpet:

The waste continues with the class itself. 69 bytes, with 22 bytes outright unused, and 11 not really necessary. As for actual innovations though, we've got 📝 another 32-bit fixed-point type, this time actually using 8 bits for the fractional part. Therefore, the ray position is tracked to the 1/256th of a pixel, using the full precision of master.lib's 8-bit sin() and cos() lookup tables.
Unblitting is also remarkably efficient: It's only done once the laser stopped extending and started moving, and only for the exact pixels at the start of the ray that the laser traveled by in a single frame. If only the ray part was also rendered as efficiently – it's fully blitted every frame, right next to the collision detection for each row of the ray.

With a public interface of two functions (spawn, and update / collide / unblit / render), that's superficially all there is to lasers in this game. There's another (apparently inlined) function though, to both reset and, uh, "fully unblit" all lasers at the end of every boss fight… except that it fails hilariously at doing the latter, and ends up effectively unblitting random 32-pixel line segments, due to ZUN confusing both the coordinates and the parameter types for the line unblitting function. :zunpet:
A while ago, I was asked about this crash that tends to happen when defeating Elis. And while you can clearly see the random unblitted line segments that are missing from the sprites, I don't quite think we've found the cause for the crash, since the 📝 line unblitting function used there does clip its coordinates to the VRAM range.

Next up: The final piece of image format code in TH01, covering Reimu's sprites!

📝 Posted:
🚚 Summary of:
P0120, P0121
453dd3c...3c008b6, 3c008b6...5c42fcd
💰 Funded by:
🏷 Tags:
rec98+ th01- pc98+ blitting+ waste+ jank+ boss+ mima-th01+

Back to TH01, and its boss sprite format… with a separate class for storing animations that only differs minutely from the 📝 regular boss entity class I covered last time? Decompiling this class was almost free, and the main reason why the first of these pushes ended up looking pretty huge.

Next up were the remaining shape drawing functions from the code segment that started with the .GRC functions. P0105 already started these with the (surprisingly sanely implemented) 8×8 diamond, star, and… uh, snowflake (?) sprites , prominently seen in the Konngara, Elis, and Sariel fights, respectively. Now, we've also got:

  • ellipse arcs with a customizable angle distance between the individual dots – mostly just used for drawing full circles, though
  • line loops – which are only used for the rotating white squares around Mima, meaning that the white star in the YuugenMagan fight got a completely redundant reimplementation
  • and the surprisingly weirdest one, drawing the red invincibility sprites.
The weirdness becomes obvious with just a single screenshot:

First, we've got the obvious issue of the sprites not being clipped at the right edge of VRAM, with the rightmost pixels in each row of the sprite extending to the beginning of the next row. Well, that's just what you get if you insist on writing unique low-level blitting code for the majority of the individual sprites in the game… 🤷
More importantly though, the sprite sheet looks like this: So how do we even get these fully filled red diamonds?

Well, turns out that the sprites are never consistently unblitted during their 8 frames of animation. There is a function that looks like it unblits the sprite… except that it starts with by enabling the GRCG and… reading from the first bitplane on the background page? If this was the EGC, such a read would fill some internal registers with the contents of all 4 bitplanes, which can then subsequently be blitted to all 4 bitplanes of any VRAM page with a single memory write. But with the GRCG in RMW mode, reads do nothing special, and simply copy the memory contents of one bitplane to the read destination. Maybe ZUN thought that setting the RMW color to red also sets some internal 4-plane mask register to match that color? :zunpet:
Instead, the rather random pixels read from the first bitplane are then used as a mask for a second blit of the same red sprite. Effectively, this only really "unblits" the invincibility pixels that are drawn on top of Reimu's sprite. Since Reimu is drawn first, the invincibility sprites are overwritten anyway. But due to the palette color layout of Reimu's sprite, its pixels end up fully masking away any invincibility sprite pixels in that second blit, leaving VRAM untouched as a result. Anywhere else though, this animation quickly turns into the union of all animation frames.

Then again, if that 16-dot-aligned rectangular unblitting function is all you know about the EGC, and you can't be bothered to write a perfect unblitter for 8×8 sprites, it becomes obvious why you wouldn't want to use it:

Because Reimu would barely be visible under all that flicker. In comparison, those fully filled diamonds actually look pretty good.

After all that, the remaining time wouldn't have been enough for the next few essential classes, so I closed out the push with three more VRAM effects instead:

  • Single-bitplane pixel inversion inside a 32×32 square – the main effect behind the discoloration seen in the bomb animation, as well as the exploding squares at the end of Kikuri's and Sariel's entrance animation
  • EGC-accelerated VRAM row copies – the second half of smooth and fully hardware-accelerated scrolling for backgrounds that are twice the size of VRAM
  • And finally, the VRAM page content transition function using meshed 8×8 squares, used for the blocky transition to Sariel's first and second phases. Which is quite ridiculous in just how needlessly bloated it is. I'm positive that this sort of thing could have also been accelerated using the PC-98's EGC… although simply writing better C would have already gone a long way. The function also comes with three unused mesh patterns.

And with that, ReC98, as a whole, is not only ⅓ done, but I've also fully caught up with the feature backlog for the first time in the history of this crowdfunding! Time to go into maintenance mode then, while we wait for the next pushes to be funded. Got a huge backlog of tiny maintenance issues to address at a leisurely pace, and of course there's also the 📝 16-bit build system waiting to be finished.

📝 Posted:
🚚 Summary of:
💰 Funded by:
-Tom-, Ember2528
🏷 Tags:
rec98+ th01- th02+ th04+ th05+ position-independence+ hud+ blitting+ unused+

🎉 TH05 is finally fully position-independent! 🎉 To celebrate this milestone, -Tom- coded a little demo, which we recorded on both an emulator and on real PC-98 hardware:

For all the new people who are unfamiliar with PC-98 Touhou internals: Boss behavior is hardcoded into MAIN.EXE, rather than being scriptable via separate .ECL files like in Windows Touhou. That's what makes this kind of a big deal.

What does this mean?

You can now freely add or remove both data and code anywhere in TH05, by editing the ReC98 codebase, writing your mod in ASM or C/C++, and recompiling the code. Since all absolute memory addresses have now been converted to labels, this will work without causing any instability. See the position independence section in the FAQ for a more thorough explanation about why this was a problem.

By extension, this also means that it's now theoretically possible to use a different compiler on the source code. But:

What does this not mean?

The original ZUN code hasn't been completely reverse-engineered yet, let alone decompiled. As the final PC-98 Touhou game, TH05 also happens to have the largest amount of actual ZUN-written ASM that can't ever be decompiled within ReC98's constraints of a legit source code reconstruction. But a lot of the originally-in-C code is also still in ASM, which might make modding a bit inconvenient right now. And while I have decompiled a bunch of functions, I selected them largely because they would help with PI (as requested by the backers), and not because they are particularly relevant to typical modding interests.

As a result, the code might also be a bit confusingly organized. There's quite a conflict between various goals there: On the one hand, I'd like to only have a single instance of every function shared with earlier games, as well as reduce ZUN's code duplication within a single game. On the other hand, this leads to quite a lot of code being scattered all over the place and then #include-pasted back together, except for the places where 📝 this doesn't work, and you'd have to use multiple translation units anyway… I'm only beginning to figure out the best structure here, and some more reverse-engineering attention surely won't hurt.

Also, keep in mind that the code still targets x86 Real Mode. To work effectively in this codebase, you'd need some familiarity with memory segmentation, and how to express it all in code. This tends to make even regular C++ development about an order of magnitude harder, especially once you want to interface with the remaining ASM code. That part made -Tom- struggle quite a bit with implementing his custom scripting language for the demo above. For now, he built that demo on quite a limited foundation – which is why he also chose to release neither the build nor the source publically for the time being.
So yeah, you're definitely going to need the TASM and Borland C++ manuals there.

tl;dr: We now know everything about this game's data, but not quite as much about this game's code.

So, how long until source ports become a realistic project?

You probably want to wait for 100% RE, which is when everything that can be decompiled has been decompiled.

Unless your target system is 16-bit Windows, in which case you could theoretically start right away. 📝 Again, this would be the ideal first system to port PC-98 Touhou to: It would require all the generic portability work to remove the dependency on PC-98 hardware, thus paving the way for a subsequent port to modern systems, yet you could still just drop in any undecompiled ASM.

Porting to IBM-compatible DOS would only be a harder and less universally useful version of that. You'd then simply exchange one architecture, with its idiosyncrasies and limits, for another, with its own set of idiosyncrasies and limits. (Unless, of course, you already happen to be intimately familiar with that architecture.) The fact that master.lib provides DOS/V support would have only mattered if ZUN consistently used it to abstract away PC-98 hardware at every single place in the code, which is definitely not the case.

The list of actually interesting findings in this push is, 📝 again, very short. Probably the most notable discovery: The low-level part of the code that renders Marisa's laser from her TH04 Illusion Laser shot type is still present in TH05. Insert wild mass guessing about potential beta version shot types… Oh, and did you know that the order of background images in the Extra Stage staff roll differs by character?

Next up: Finally driving up the RE% bar again, by decompiling some TH05 main menu code.

📝 Posted:
🚚 Summary of:
P0105, P0106, P0107, P0108
3622eb6...11b776b, 11b776b...1f1829d, 1f1829d...1650241, 1650241...dcf4e2c
💰 Funded by:
🏷 Tags:
rec98+ th01- meta+ file-format+ animation+ blitting+ boss+ singyoku+ yuugenmagan+ elis+ kikuri+ konngara+ waste+

And indeed, I got to end my vacation with a lot of image format and blitting code, covering the final two formats, .GRC and .BOS. .GRC was nothing noteworthy – one function for loading, one function for byte-aligned blitting, and one function for freeing memory. That's it – not even a unblitting function for this one. .BOS, on the other hand…

…has no generic (read: single/sane) implementation, and is only implemented as methods of some boss entity class. And then again for Sariel's dress and wand animations, and then again for Reimu's animations, both of which weren't even part of these 4 pushes. Looking forward to decompiling essentially the same algorithms all over again… And that's how TH01 became the largest and most bloated PC-98 Touhou game. So yeah, still not done with image formats, even at 44% RE.

This means I also had to reverse-engineer that "boss entity" class… yeah, what else to call something a boss can have multiple of, that may or may not be part of a larger boss sprite, may or may not be animated, and that may or may not have an orb hitbox?
All bosses except for Kikuri share the same 5 global instances of this class. Since renaming all these variables in ASM land is tedious anyway, I went the extra mile and directly defined separate, meaningful names for the entities of all bosses. These also now document the natural order in which the bosses will ultimately be decompiled. So, unless a backer requests anything else, this order will be:

  1. Konngara
  2. Sariel
  3. Elis
  4. Kikuri
  5. SinGyoku
  6. (code for regular card-flipping stages)
  7. Mima
  8. YuugenMagan

As everyone kind of expects from TH01 by now, this class reveals yet another… um, unique and quirky piece of code architecture. In addition to the position and hitbox members you'd expect from a class like this, the game also stores the .BOS metadata – width, height, animation frame count, and 📝 bitplane pointer slot number – inside the same class. But if each of those still corresponds to one individual on-screen sprite, how can YuugenMagan have 5 eye sprites, or Kikuri have more than one soul and tear sprite? By duplicating that metadata, of course! And copying it from one entity to another :onricdennat:
At this point, I feel like I even have to congratulate the game for not actually loading YuugenMagan's eye sprites 5 times. But then again, 53,760 bytes of waste would have definitely been noticeable in the DOS days. Makes much more sense to waste that amount of space on an unused C++ exception handler, and a bunch of redundant, unoptimized blitting functions :tannedcirno:

(Thinking about it, YuugenMagan fits this entire system perfectly. And together with its position in the game's code – last to be decompiled means first on the linker command line – we might speculate that YuugenMagan was the first boss to be programmed for TH01?)

So if a boss wants to use sprites with different sizes, there's no way around using another entity. And that's why Girl-Elis and Bat-Elis are two distinct entities internally, and have to manually sync their position. Except that there's also a third one for Attacking-Girl-Elis, because Girl-Elis has 9 frames of animation in total, and the global .BOS bitplane pointers are divided into 4 slots of only 8 images each. :zunpet:
Same for SinGyoku, who is split into a sphere entity, a person entity, and a… white flash entity for all three forms, all at the same resolution. Or Konngara's facial expressions, which also require two entities just for themselves.

And once you decompile all this code, you notice just how much of it the game didn't even use. 13 of the 50 bytes of the boss entity class are outright unused, and 10 bytes are used for a movement clamping and lock system that would have been nice if ZUN also used it outside of Kikuri's soul sprites. Instead, all other bosses ignore this system completely, and just party on the X/Y coordinates of the boss entities directly.

As for the rendering functions, 5 out of 10 are unused. And while those definitely make up less than half of the code, I still must have spent at least 1 of those 4 pushes on effectively unused functionality.
Only one of these functions lends itself to some speculation. For Elis' entrance animation, the class provides functions for wavy blitting and unblitting, which use a separate X coordinate for every line of the sprite. But there's also an unused and sort of broken one for unblitting two overlapping wavy sprites, located at the same Y coordinate. This might indicate that Elis could originally split herself into two sprites, similar to TH04 Stage 6 Yuuka? Or it might just have been some other kind of animation effect, who knows.

After over 3 months of TH01 progress though, it's finally time to look at other games, to cover the rest of the crowdfunding backlog. Next up: Going back to TH05, and getting rid of those last PI false positives. And since I can potentially spend the next 7 weeks on almost full-time ReC98 work, I've also re-opened the store until October!

📝 Posted:
🚚 Summary of:
P0103, P0104
b60f38d...05c0028, 05c0028...3622eb6
💰 Funded by:
🏷 Tags:
rec98+ th01- hud+ file-format+ jank+ waste+

It's vacation time! Which, for ReC98, means "relaxing by looking at something boring and uninteresting that we'll ultimately have to cover anyway"… like the TH01 HUD.

📝 As noted earlier, all the score, card combo, stage, and time numbers are drawn into VRAM. Which turns TH01's HUD rendering from the trivial, gaiji-assisted text RAM writes we see in later games to something that, once again, requires blitting and unblitting steps. For some reason though, everything on there is blitted to both VRAM pages? And that's why the HUD chose to allocate a bunch of .PTN sprite slots to store the background behind all "animated" elements at the beginning of a 4-stage scene or boss battle… separately for every affected 16×16 area. (Looking forward to the completely unnecessary code in the Sariel fight that updates these slots after the backgrounds were animated!) And without any separation into helper functions, we end up with the same blitting calls separately copy-pasted for every single HUD element. That's why something as seemingly trivial as this isn't even done after 2 pushes, as we're still missing the stage timer.

Thankfully, the .PTN function signatures come with none of ZUN's little inconsistencies, so I was able to mostly reduce this copy-pasta to a bunch of small inline functions and macros. Those interfaces still remain a bit annoying, though. As a 32×32 format, .PTN merely supports 16×16 sprites with a separate bunch of functions that take an additional quarter parameter from 0 to 3, to select one of the 4 16×16 quarters in a such a sprite…

For life and bomb counts, there was no way around VRAM though, since ZUN wanted to use more than a single color for those. This is where we find at least somewhat of a mildly interesting quirk in all of this: Any life counts greater than the intended 6 will wrap into new rows, with the bombs in the second row overlapping those excess lives. With the way the rest of the HUD rendering works, that wrapping code code had to be explicitly written… which means that ZUN did in fact accomodate (his own?) cheating there.

Now, I promised image formats, and in the middle of this copy-pasta, we did get one… sort of. MASK.GRF, the red HUD background, is entirely handled with two small bespoke functions… and that's all the code we have for this format. Basically, it's a variation on the 📝 .GRZ format we've seen earlier. It uses the exact same RLE algorithm, but only has a single byte stream for both RLE commands and pixel data… as you would expect from an RLE format.

.GRF actually stores 4 separately encoded RLE streams, which suggests that it was intended for full 16-color images. Unfortunately, MASK.GRF only contains 4 copies of the same HUD background :zunpet:, so no unused beta data for us there. The only thing we could derive from 4 identical bitplanes would be that the background was originally meant to be drawn using color #15, rather than the red seen in the final game. Color #15 is a stage-specific background color that would have made the HUD blend in quite nicely – in the YuugenMagan fight, it's the changing color of the in the background, for example. But really, with no generic implementation of this format, that's all just speculation.

Oh, and in case you were looking for a rip of that image:

So yeah, more of the usual TH01 code, with the usual small quirks, but nothing all too horrible – as expected. Next up: The image formats that didn't make it into this push.

📝 Posted:
🚚 Summary of:
P0099, P0100, P0101, P0102
1799d67...1b25830, 1b25830...ceb81db, ceb81db...c11a956, c11a956...b60f38d
💰 Funded by:
Ember2528, Yanga
🏷 Tags:
rec98+ th01- gameplay+ bullet+ jank+ contribution-ideas+ bug+ boss+ elis+ kikuri+ sariel+ konngara+

Well, make that three days. Trying to figure out all the details behind the sprite flickering was absolutely dreadful…
It started out easy enough, though. Unsurprisingly, TH01 had a quite limited pellet system compared to TH04 and TH05:

  • The cap is 100, rather than 240 in TH04 or 180 in TH05.
  • Only 6 special motion functions (with one of them broken and unused) instead of 10. This is where you find the code that generates SinGyoku's chase pellets, Kikuri's small spinning multi-pellet circles, and Konngara's rain pellets that bounce down from the top of the playfield.
  • A tiny selection of preconfigured multi-pellet groups. Rather than TH04's and TH05's freely configurable n-way spreads, stacks, and rings, TH01 only provides abstractions for 2-, 3-, 4-, and 5- way spreads (yup, no 6-way or beyond), with a fixed narrow or wide angle between the individual pellets. The resulting pellets are also hardcoded to linear motion, and can't use the special motion functions. Maybe not the best code, but still kind of cute, since the generated groups do follow a clear logic.

As expected from TH01, the code comes with its fair share of smaller, insignificant ZUN bugs and oversights. As you would also expect though, the sprite flickering points to the biggest and most consequential flaw in all of this.

Apparently, it started with ZUN getting the impression that it's only possible to use the PC-98 EGC for fast blitting of all 4 bitplanes in one CPU instruction if you blit 16 horizontal pixels (= 2 bytes) at a time. Consequently, he only wrote one function for EGC-accelerated sprite unblitting, which can only operate on a "grid" of 16×1 tiles in VRAM. But wait, pellets are not only just 8×8, but can also be placed at any unaligned X position…

… yet the game still insists on using this 16-dot-aligned function to unblit pellets, forcing itself into using a super sloppy 16×8 rectangle for the job. 🤦 ZUN then tried to mitigate the resulting flickering in two hilarious ways that just make it worse:

  1. An… "interlaced rendering" mode? This one's activated for all Stage 15 and 20 fights, and separates pellets into two halves that are rendered on alternating frames. Collision detection with the Yin-Yang Orb and the player is only done for the visible half, but collision detection with player shots is still done for all pellets every frame, as are motion updates – so that pellets don't end up moving half as fast as they should.
    So yeah, your eyes weren't deceiving you. The game does effectively drop its perceived frame rate in the Elis, Kikuri, Sariel, and Konngara fights, and it does so deliberately.
  2. 📝 Just like player shots, pellets are also unblitted, moved, and rendered in a single function. Thanks to the 16×8 rectangle, there's now the (completely unnecessary) possibility of accidentally unblitting parts of a sprite that was previously drawn into the 8 pixels right of a pellet. And this is where ZUN went full :tannedcirno: and went "oh, I know, let's test the entire 16 pixels, and in case we got an entity there, we simply make the pellet invisible for this frame! Then we don't even have to unblit it later!" :zunpet:

    Except that this is only done for the first 3 elements of the player shot array…?! Which don't even necessarily have to contain the 3 shots fired last. It's not done for the player sprite, the Orb, or, heck, other pellets that come earlier in the pellet array. (At least we avoided going 𝑂(𝑛²) there?)

    Actually, and I'm only realizing this now as I type this blog post: This test is done even if the shots at those array elements aren't active. So, pellets tend to be made invisible based on comparisons with garbage data. :onricdennat:

    And then you notice that the player shot unblit​/​move​/​render function is actually only ever called from the pellet unblit​/​move​/​render function on the one global instance of the player shot manager class, after pellets were unblitted. So, we end up with a sequence of

    Pellet unblit → Pellet move → Shot unblit → Shot move → Shot render → Pellet render

    which means that we can't ever unblit a previously rendered shot with a pellet. Sure, as terrible as this one function call is from a software architecture perspective, it was enough to fix this issue. Yet we don't even get the intended positive effect, and walk away with pellets that are made temporarily invisible for no reason at all. So, uh, maybe it all just was an attempt at increasing the ramerate on lower spec PC-98 models?

Yup, that's it, we've found the most stupid piece of code in this game, period. It'll be hard to top this.

I'm confident that it's possible to turn TH01 into a well-written, fluid PC-98 game, with no flickering, and no perceived lag, once it's position-independent. With some more in-depth knowledge and documentation on the EGC (remember, there's still 📝 this one TH03 push waiting to be funded), you might even be able to continue using that piece of blitter hardware. And no, you certainly won't need ASM micro-optimizations – just a bit of knowledge about which optimizations Turbo C++ does on its own, and what you'd have to improve in your own code. It'd be very hard to write worse code than what you find in TH01 itself.

(Godbolt for Turbo C++ 4.0J when? Seriously though, that would 📝 also be a great project for outside contributors!)

Oh well. In contrast to TH04 and TH05, where 4 pushes only covered all the involved data types, they were enough to completely cover all of the pellet code in TH01. Everything's already decompiled, and we never have to look at it again. 😌 And with that, TH01 has also gone from by far the least RE'd to the most RE'd game within ReC98, in just half a year! 🎉
Still, that was enough TH01 game logic for a while. :tannedcirno: Next up: Making up for the delay with some more relaxing and easy pieces of TH01 code, that hopefully make just a bit more sense than all this garbage. More image formats, mainly.

📝 Posted:
🚚 Summary of:
P0096, P0097, P0098
8ddb778...8283c5e, 8283c5e...600f036, 600f036...ad06748
💰 Funded by:
Ember2528, Yanga
🏷 Tags:
rec98+ th01- file-format+ pc98+ blitting+ gameplay+ player+ shot+ jank+ mod+ tcc+

So, let's finally look at some TH01 gameplay structures! The obvious choices here are player shots and pellets, which are conveniently located in the last code segment. Covering these would therefore also help in transferring some first bits of data in REIIDEN.EXE from ASM land to C land. (Splitting the data segment would still be quite annoying.) Player shots are immediately at the beginning…

…but wait, these are drawn as transparent sprites loaded from .PTN files. Guess we first have to spend a push on 📝 Part 2 of this format.
Hm, 4 functions for alpha-masked blitting and unblitting of both 16×16 and 32×32 .PTN sprites that align the X coordinate to a multiple of 8 (remember, the PC-98 uses a planar VRAM memory layout, where 8 pixels correspond to a byte), but only one function that supports unaligned blitting to any X coordinate, and only for 16×16 sprites? Which is only called twice? And doesn't come with a corresponding unblitting function? :thonk:

Yeah, "unblitting". TH01 isn't double-buffered, and uses the PC-98's second VRAM page exclusively to store a stage's background and static sprites. Since the PC-98 has no hardware sprites, all you can do is write pixels into VRAM, and any animated sprite needs to be manually removed from VRAM at the beginning of each frame. Not using double-buffering theoretically allows TH01 to simply copy back all 128 KB of VRAM once per frame to do this. :tannedcirno: But that would be pretty wasteful, so TH01 just looks at all animated sprites, and selectively copies only their occupied pixels from the second to the first VRAM page.

Alright, player shot class methods… oh, wait, the collision functions directly act on the Yin-Yang Orb, so we first have to spend a push on that one. And that's where the impression we got from the .PTN functions is confirmed: The orb is, in fact, only ever displayed at byte-aligned X coordinates, divisible by 8. It's only thanks to the constant spinning that its movement appears at least somewhat smooth.
This is purely a rendering issue; internally, its position is tracked at pixel precision. Sadly, smooth orb rendering at any unaligned X coordinate wouldn't be that trivial of a mod, because well, the necessary functions for unaligned blitting and unblitting of 32×32 sprites don't exist in TH01's code. Then again, there's so much potential for optimization in this code, so it might be very possible to squeeze those additional two functions into the same C++ translation unit, even without position independence…

More importantly though, this was the right time to decompile the core functions controlling the orb physics – probably the highlight in these three pushes for most people.
Well, "physics". The X velocity is restricted to the 5 discrete states of -8, -4, 0, 4, and 8, and gravity is applied by simply adding 1 to the Y velocity every 5 frames :zunpet: No wonder that this can easily lead to situations in which the orb infinitely bounces from the ground.
At least fangame authors now have a reference of how ZUN did it originally, because really, this bad approximation of physics had to have been written that way on purpose. But hey, it uses 64-bit floating-point variables! :onricdennat:

…sometimes at least, and quite randomly. This was also where I had to learn about Turbo C++'s floating-point code generation, and how rigorously it defines the order of instructions when mixing double and float variables in arithmetic or conditional expressions. This meant that I could only get ZUN's original instruction order by using literal constants instead of variables, which is impossible right now without somehow splitting the data segment. In the end, I had to resort to spelling out ⅔ of one function, and one conditional branch of another, in inline ASM. 😕 If ZUN had just written 16.0 instead of 16.0f there, I would have saved quite some hours of my life trying to decompile this correctly…

To sort of make up for the slowdown in progress, here's the TH01 orb physics debug mod I made to properly understand them: 2020-06-13-TH01OrbPhysicsDebug.zip To use it, simply replace REIIDEN.EXE, and run the game in debug mode, via game d on the DOS prompt.
Its code might also serve as an example of how to achieve this sort of thing without position independence.

Alright, now it's time for player shots though. Yeah, sure, they don't move horizontally, so it's not too bad that those are also always rendered at byte-aligned positions. But, uh… why does this code only use the 16×16 alpha-masked unblitting function for decaying shots, and just sloppily unblits an entire 16×16 square everywhere else?

The worst part though: Unblitting, moving, and rendering player shots is done in a single function, in that order. And that's exactly where TH01's sprite flickering comes from. Since different types of sprites are free to overlap each other, you'd have to first unblit all types, then move all types, and then render all types, as done in later PC-98 Touhou games. If you do these three steps per-type instead, you will unblit sprites of other types that have been rendered before… and therefore end up with flicker.
Oh, and finally, ZUN also added an additional sloppy 16×16 square unblit call if a shot collides with a pellet or a boss, for some guaranteed flicker. Sigh.

And that's ⅓ of all ZUN code in TH01 decompiled! Next up: Pellets!

📝 Posted:
🚚 Summary of:
💰 Funded by:
🏷 Tags:
rec98+ th01- position-independence+ pc98+ thief+

🎉 TH01's OP.EXE and FUUIN.EXE are now fully position-independent! 🎉

What does this mean?

You can now add any data or code to TH01's main menu or ending cutscenes, by simply editing the ReC98 source, writing your mod in ASM or C++, and recompiling the code. Since all absolute memory addresses in OP and FUUIN have now been converted to labels, this will work without causing any instability. See the position independence section in the FAQ for a more thorough explanation about why this was a problem.
As an example, the most popular TH01 mod idea, replacing MDRV2 with PMD, could now at least be prototyped and tested in OP.EXE, without having to worry about x86 instruction lengths.
📝 Check the video I made for the TH04/TH05 OP.EXE PI announcement for a basic overview of how to do that.

What does this not mean?

The original ZUN code hasn't been completely decompiled yet. The final high-level parts of both the main menu and the cutscenes are still ASM, which might make modding a bit inconvenient right now.
It's not that much more code though, and could quickly be covered in a few pushes if requested. Due to the plentiful monthly subscriptions, the shop will stay closed for regular orders until the end of June, but backers with outstanding contributions could request that now if they want to – simply drop me a mail. Otherwise, the "generic TH01 RE" money will continue to go towards the main game. That way, we'll have more substance to show once we do decide to decompile the rest of OP.EXE and FUUIN.EXE, and likely get some press coverage as a result.

Then again, we've been building up to this point over the last few pushes, and it only really needed a quick look over the remaining false positives. The majority of the time therefore went towards more PI in REIIDEN.EXE, where the bitplane pointers for .BOS files yielded some quite big gains. Couldn't really find any obvious reason why ZUN used two slighly different variations on loading and blitting those files, though… :onricdennat:

As the final function in this rather random push, we got TH01's hardware-powered scrolling function, used for screen shaking effects and the scrolling backgrounds at the start of the Final Boss stages. And while I tried to document all these I/O writes… it turned out that ZUN actually copied the entire function straight from the PC-9801 Programmers' Bible, with no changes. :zunpet: It's the setgsta() example function on page 150. Which is terribly suboptimal and bloated – all those integer divisions are really not how you'd write such code for a 16-bit compiler from the 90's…

And that gives us 60% PI overall, and 50% PI over all of TH01! Next up: More structures… and classes, even?

📝 Posted:
🚚 Summary of:
P0092, P0093, P0094
29c5a73...4403308, 4403308...0e73029, 0e73029...57a8487
💰 Funded by:
Yanga, Ember2528
🏷 Tags:
rec98+ th01- file-format+ menu+ score+ jank+ waste+

Three pushes to decompile the TH01 high score menu… because it's completely terrible, and needlessly complicated in pretty much every aspect:

  • Another, final set of differences between the REIIDEN.EXE and FUUIN.EXE versions of the code. Which are so insignificant that it must mean that ZUN kept this code in two separate, manually and imperfectly synced files. The REIIDEN.EXE version, only shown when game-overing, automatically jumps to the enter/ button after the 8th character was entered, and also has a completely invisible timeout that force-enters a high score name after 1000… key presses? Not frames? Why. Like, how do you even realistically such a number. (Best guess: It's a hidden easter egg to amuse players who place drinking glasses on cursor keys. Or beer bottles.)
    That's all the differences that are maybe visible if you squint hard enough. On top of that though, we got a bunch of further, minor code organization differences that serve no purpose other than to waste decompilation time, and certainly did their part in stretching this out to 3 pushes instead of 2.
  • Entered names are restricted to a set of 16-bit, full-width Shift-JIS codepoints, yet are still accessed as 8-bit byte arrays everywhere. This bloats both the C++ and generated ASM code with needless byte splits, swaps, and bit shifts. Same for the route kanji. You have this 16-, heck, even 32-bit CPU, why not use it?! (Fun fact: FUUIN.EXE is explicitly compiled for a 80186, for the most part – unlike REIIDEN.EXE, which does use Turbo C++'s 80386 mode.)
  • The sensible way of storing the current position of the alphabet cursor would simply be two variables, indicating the logical row and column inside the character map. When rendering, you'd then transform these into screen space. This can keep the on-screen position constants in a single place of code.
    TH01 does the opposite: The selected character is stored directly in terms of its on-screen position, which is then mapped back to a character index for every processed input and the subsequent screen update. There's no notion of a logical row or column anywhere, and consequently, the position constants are vomited all over the code.
  • Which might not be as bad if the character map had a uniform grid structure, with no gaps. But the one in TH01 looks like this: And with no sense of abstraction anywhere, both input handling and rendering end up with a separate if branch for at least 4 of the 6 rows.

In the end, I just gave up with my usual redundancy reduction efforts for this one. Anyone wanting to change TH01's high score name entering code would be better off just rewriting the entire thing properly.

And that's all of the shared code in TH01! Both OP.EXE and FUUIN.EXE are now only missing the actual main menu and ending code, respectively. Next up, though: The long awaited TH01 PI push. Which will not only deliver 100% PI for OP.EXE and FUUIN.EXE, but also probably quite some gains in REIIDEN.EXE. With now over 30% of the game decompiled, it's about time we get to look at some gameplay code!

📝 Posted:
🚚 Summary of:
P0090, P0091
90252cc...07dab29, 07dab29...29c5a73
💰 Funded by:
Yanga, Ember2528
🏷 Tags:
rec98+ th01- file-format+ input+ menu+ bug+

Back to TH01, and its high score menu… oh, wait, that one will eventually involve keyboard input. And thanks to the generous TH01 funding situation, there's really no reason not to cover that right now. After all, TH01 is the last game where input still hadn't been RE'd.
But first, let's also cover that one unused blitting function, together with REIIDEN.CFG loading and saving, which are in front of the input function in OP.EXE… (By now, we all know about the hidden start bomb configuration, right?)

Unsurprisingly, the earliest game also implements input in the messiest way, with a different function for each of the three executables. "Because they all react differently to keyboard inputs :zunpet:", apparently? OP.EXE even has two functions for it, one for the START / CONTINUE / OPTION / QUIT main menu, and one for both Option and Music Test menus, both of which directly perform the ring arithmetic on the menu cursor variable. A consistent separation of keyboard polling from input processing apparently wasn't all too obvious of a thought, since it's only truly done from TH02 on.

This lack of proper architecture becomes actually hilarious once you notice that it did in fact facilitate a recursion bug! :godzun: In case you've been living under a rock for the past 8 years, TH01 shipped with debugging features, which you can enter by running the game via game d from the DOS prompt. These features include a memory info screen, shown when pressing PgUp, implemented as one blocking function (test_mem()) called directly in response to the pressed key inside the polling function. test_mem() only returns once that screen is left by pressing PgDown. And in order to poll input… it directly calls back into the same polling function that called it in the first place, after a 3-frame delay.

Which means that this screen is actually re-entered for every 3 frames that the PgUp key is being held. And yes, you can, of course, also crash the system via a stack overflow this way by holding down PgUp for a few seconds, if that's your thing.
Edit (2020-09-17): Here's a video from spaztron64, showing off this exact stack overflow crash while running under the VEM486 memory manager, which displays additional information about these sorts of crashes:

What makes this even funnier is that the code actually tracks the last state of every polled key, to prevent exactly that sort of bug. But the copy-pasted assignment of the last input state is only done after test_mem() already returned, making it effectively pointless for PgUp. It does work as intended for PgDown… and that's why you have to actually press and release this key once for every call to test_mem() in order to actually get back into the game. Even though a single call to PgDown will already show the game screen again.

In maybe more relevant news though, this function also came with what can be considered the first piece of actual gameplay logic! Bombing via double-tapping the Z and X keys is also handled here, and now we know that both keys simply have to be tapped twice within a window of 20 frames. They are tracked independently from each other, so you don't necessarily have to press them simultaneously.
In debug mode, the bomb count tracks precisely this window of time. That's why it only resets back to 0 when pressing Z or X if it's ≥20.

Sure, TH01's code is expectedly terrible and messy. But compared to the micro-optimizations of TH04 and TH05, it's an absolute joy to work on, and opening all these ZUN bug loot boxes is just the icing on the cake. Looking forward to more of the high score menu in the next pushes!

📝 Posted:
🚚 Summary of:
💰 Funded by:
🏷 Tags:
rec98+ th01- file-format+ score+ tasm+

Final TH01 RE push for the time being, and as expected, we've got the superficially final piece of shared code between the TH01 executables. However, just having a single implementation for loading and recreating the REYHI*.DAT score files would have been way above ZUN's standards of consistency. So ZUN had the unique idea to mix up the file I/O APIs, using master.lib functions in REIIDEN.EXE, and POSIX functions (along with error messages and disabled interrupts) in FUUIN.EXE:zunpet: Could have been worse though, as it was possible to abstract that away quite nicely.

That code wasn't quite in the natural way of decompilation either. As it turns out though, 📝 segment splitting isn't so painful after all if one of the new segments only has a few functions. Definitely going to do that more often from now on, since it allows a much larger number of functions to be immediately decompiled. Which is always superior to somehow transforming a function's ASM into a form that I can confidently call "reverse-engineered", only to revisit it again later for its decompilation.

And while I unfortunately missed 25% of total RE by a bit, this push reached two other and perhaps even more significant milestones:

  • 📝 In a little over 6 months, we've doubled the amount of reverse-engineered PC-98 Touhou game code! 📈
  • After (finally) compressing all unknown parts of the BSS segments using arrays, the number of remaining lines in the REIIDEN.EXE ASM dump has fallen below TASM's limit of 65,535. Which means that we no longer need that annoying th01_reiiden_2.inc file that everyone has forgotten about at least once.

Next up, PI milestones!

📝 Posted:
🚚 Summary of:
💰 Funded by:
🏷 Tags:
rec98+ th01- file-format+ waste+ good-code+

Nope, RL has given me plenty of things to do from home after all, so the current cap still remains an accurate representation of my free time. 😕

For now though, we've got one more TH01 file format push, covering the core functions for loading and displaying the 32×32 and 16×16 sprites from the .PTN files, as announced – and probably one of the last ones for quite a while to yield both RE and PI progress way above average. But what is this, error return values in a ZUN game?! And actually good code for deriving the alpha channel from the 16th color in the hardware palette?! Sure, the rest of the code could still be improved a lot, but that was quite a surprise, especially after the spaghetti code of 📝 the last push. That makes up for two of the .PTN structure fields (one of them always 0, and one of them always 1) remaining unused, and therefore unknown.

ZUN also uses the .PTN image slots to store the background of frequently updated VRAM sections, in order to be able to repeatedly draw on top of them – like for example the HUD area where the score and time numbers are drawn. Future games would simply use the text RAM and gaiji for those numbers. This would have worked just fine for TH01 too – especially since all the functions decompiled so far align the VRAM X coordinate to the 8-pixel byte grid, which is the simplest way of accessing VRAM given the PC-98's planar memory layout. Looks as if ZUN simply wasn't aware of gaiji during the development of TH01.

This won't be the last time I cover the .PTN format, since all the blitting functions that actually use alpha are exclusive to REIIDEN.EXE, and currently out of decompilation reach. But after some more long overdue cleaning work, TH01 has now passed both TH02 and even TH04 to become the second-most reverse-engineered game in all of ReC98, in terms of absolute numbers! 🎉

Also, PI for TH01's OP.EXE is imminent. Next up though, we've first got the probably final double-speed push for TH01, covering the last set of duplicated functions between the three binaries – quite fitting for the currently last fully funded, outstanding TH01 RE push. Then, we also might get FUUIN.EXE PI within the same push afterwards? After that, TH01 progress will be slowing down, since I'd then have to cover either the main menu or in-game code or the cutscenes, depending on what the backers request. (By default, it's going to be in-game code, of course.)

📝 Posted:
🚚 Summary of:
💰 Funded by:
🏷 Tags:
rec98+ th01- file-format+ thief+

Last of the 3 weeks of almost full-time ReC98 work, supposedly the least stressful one, and then things still get delayed thanks to illness 😕 In better news though, it looks like I'll be able to extend these 3 weeks to 8, as my RL is shutting down for coronavirus reasons. I'm going to wait a bit for the dust to settle before raising the crowdfunding cap though, since RL might give me more to do from home after all. I may or may not also get commissioned for a non-Touhou translation patch project to be worked on in that time…

The .GRP file functions turned out to, of course, also be present in FUUIN.EXE. In fact, that binary had the largest share of progress in this push, since it's the only one to include another reimplementation of master.lib-style hardware palette fading. As a typical little ZUN inconsistency, the FUUIN.EXE version of one .GRP palette function directly calls one of these functions.

As for the functions themselves, they basically wrap the single-function Pi load and display library by 電脳科学研究所/BERO in a bowl of global state spaghetti. 🍝 At least the function names now clearly encode important side effects like, y'know, a changed hardware palette. The reason ZUN used this separate library over master.lib's PI loading functions was probably its support for defining a color as transparent. This feature is used for the red box in the main menu, and the large cyan Siddhaṃ seed syllables in (again) the Konngara fight.

Next up, we've got the .PTN format!

📝 Posted:
🚚 Summary of:
💰 Funded by:
🏷 Tags:
rec98+ th01- file-format+ blitting+ waste+ contribution-ideas+

Sadly, we've already reached the end of fast triple-speed TH01 progress with 📝 the last push, which decompiled the last segment shared by all three of TH01's executables. There's still a bit of double-speed progress left though, with a small number of code segments that are shared between just two of the three executables.

At the end of the first one of these, we've got all the code for the .GRZ format – which is yet another run-length encoded image format, but this time storing up to 16 full 640×400 16-color images with an alpha bit. This one is exclusively used to wastefully store Konngara's sword slash and kuji-in kill animations. Due to… suboptimal code organization, the code for the format is also present in OP.EXE, despite not being used there. But hey, that brings TH01 to over 20% in RE!

Decoupling the RLE command stream from the pixel data sounds like a nice idea at first, allowing the format to efficiently encode a variety of animation frames displayed all over the screen… if ZUN actually made use of it. The RLE stream also has quite some ridiculous overhead, starting with 1 byte to store the 1-bit command (putting a single 8×1 pixel block, or entering a run of N such blocks). Run commands then store another 1-byte run length, which has to be followed by another command byte to identify the run as putting N blocks, or skipping N blocks. And the pixel data is just a sequence of these blocks for all 4 bitplanes, in uncompressed form…

Also, have some rips of all the images this format is used for:

To make these, I just wrote a small viewer, calling the same decompiled TH01 code: 2020-03-07-grzview.zip Obviously, this means that it not only must to be run on a PC-98, but also discards the alpha information. If any backers are really interested in having a proper converter to and from PNG, I can implement that in an upcoming push… although that would be the perfect thing for outside contributors to do.

Next up, we got some code for the PI format… oh, wait, the actual files are called "GRP" in TH01.

📝 Posted:
🚚 Summary of:
💰 Funded by:
Splashman, Ember2528
🏷 Tags:
rec98+ th01- pc98+ palette+

Last part of TH01's main graphics function segment, and we've got even more code that alternates between being boring and being slightly weird. But at least, "boring" also meant "consistent" for once. And so progress continued to be as fast as expected from the last TH01 pushes, yielding 3.3% in TH01 RE%, and 1% in overall RE%, within a single day. There even was enough time to decompile another full code segment, which bundles all the hardware initialization and cleanup calls into single functions to be run when starting and exiting the game. Which might be interesting for at least one person, I guess :tannedcirno:

But seriously, trying to access page 2 on a system with only page 0 and 1? Had to get out my real PC-98 to double-check that I wasn't missing anything here, since every emulator only looks at the bottom bit of the page number. But real hardware seems to do the same, and there really is nothing special to it semantically, being equivalent to page 0. 🤷

Next up in TH01, we'll have some file format code!

📝 Posted:
🚚 Summary of:
P0067, P0068, P0069
e55a48b...ebb30ce, ebb30ce...2ac00d4, e0d0dcd...0f18dbc
💰 Funded by:
Splashman, Yanga, [Anonymous]
🏷 Tags:
rec98+ th01- pc98+ blitting+ glitch+ boss+ mima-th01+ yuugenmagan+

Now that's more like the speed I was expecting! After a few more unused functions for palette fading and rectangle blitting, we've reached the big line drawing functions. And the biggest one among them, drawing a straight line at any angle between two points using Bresenham's algorithm, actually happens to be the single longest function present in more than one binary in all of PC-98 Touhou, and #23 on the list of individual longest functions.

And it technically has a ZUN bug! If you pass a point outside the (0, 0) - (639, 399) screen range, the function will calculate a new point at the edge of the screen, so that the resulting line will retain the angle intended by the points given. Except that it does so by calculating the line slope using an integer division rather than a floating-point one :zunpet: Doesn't seem like it actually causes any weirdly skewed lines to be drawn in-game, though; that case is only hit in the Mima boss fight, which draws a few lines with a bottom coordinate of 400 rather than the maximum of 399. It might also cause the wrong background pixels to be restored during parts of the YuugenMagan fight, leading to flickering sprites, but seriously, that's an issue everywhere you look in this game.

Together with the rendering-text-to-VRAM function we've mostly already known from TH02, this pushed the total RE percentage well over 20%, and almost doubled the TH01 RE percentage, all within three pushes. And comparatively, it went really smoothly, to the point (ha) where I even had enough time left to also include the single-point functions that come next in that code segment. Since about half of the remaining functions in OP.EXE are present in more than just itself, I'll be able to at least keep up this speed until OP.EXE hits the 70% RE mark. That is, as long as the backers' priorities continue to be generic RE or "giving some love to TH01"… we don't have a precedent for TH01's actual game code yet.

And that's all the TH01 progress funded for January! Next up, we actually do have a focus on TH03's game and scoring mechanics… or at least the foundation for that.

📝 Posted:
🚚 Summary of:
💰 Funded by:
Yanga, Splashman
🏷 Tags:
rec98+ th01- palette+ tcc+ waste+

So, the thing that made me so excited about TH01 were all those bulky C reimplementations of master.lib functions. Identical copies in all three executables, trivial to figure out and decompile, removing tons of instructions, and providing a foundation for large parts of the game later. The first set of functions near the end of that shared code segment deals with color palette handling, and master.lib's resident palette structure in particular. (No relation to the game's resident structure.) Which directly starts us out with pretty much all the decompilation difficulties imaginable:

  • iteration over internal DOS structures via segment pointers – Turbo C++ doesn't support a lot of arithmetic on those, requiring tons of casts to make it work
  • calls to a far function near the beginning of a segment from a function near the end of a segment – these are undecompilable until we've decompiled both functions (and thus, the majority of the segment), and need to be spelled out in ASM for the time being. And if the caller then stores some of the involved variables in registers, there's no way around the ugliest of workarounds, spelling out opcode bytes
  • surprising color format inconsistencies – apparently, GRB (rather than RGB) is some sort of wider standard in PC-98 inter-process communication, because it matches the order of the hardware's palette register ports? (0AAh = green, 0ACh = red, 0AEh = blue)? Yet the game's actual palette still uses RGB…

And as it turns out, the game doesn't even use the resident palette feature. Which adds yet another set of functions to the, uh, learning experience that ZUN must have chosen this game to be. I wouldn't be surprised if we manage to uncover actual scrapped beta game content later on, among all the unused code that's bound to still be in there.

At least decompilation should get easier for the next few TH01 pushes now… right?

📝 Posted:
🚚 Summary of:
P0034, P0035
6cdd229...6f1f367, 6f1f367...a533b5d
💰 Funded by:
🏷 Tags:
rec98+ th01- th02+ th04+ th05+ animation+ gameplay+ player+ bomb+

Deathbombs confirmed, in both TH04 and TH05! On the surface, it's the same 8-frame window as in most Windows games, but due to the slightly lower PC-98 frame rate of 56.4 Hz, it's actually slightly more lenient in TH04 and TH05.

The last function in front of the TH05 shot type control functions marks the player's previous position in VRAM to be redrawn. But as it turns out, "player" not only means "the player's option satellites on shot levels ≥ 2", but also "the explosion animation if you lose a life", which required reverse-engineering both things, ultimately leading to the confirmation of deathbombs.

It actually was kind of surprising that we then had reverse-engineered everything related to rendering all three things mentioned above, and could also cover the player rendering function right now. Luckily, TH05 didn't decide to also micro-optimize that function into un-decompilability; in fact, it wasn't changed at all from TH04. Unlike the one invalidation function whose decompilation would have actually been the goal here…

But now, we've finally gotten to where we wanted to… and only got 2 outstanding decompilation pushes left. Time to get the website ready for hosting an actual crowdfunding campaign, I'd say – It'll make a better impression if people can still see things being delivered after the big announcement.

📝 Posted:
🚚 Summary of:
P0023, P0024
💰 Funded by:
🏷 Tags:
rec98+ th01- th02+ th04+ th05+ gameplay+ laser+ micro-optimization+

Actually, I lied, and lasers ended up coming with everything that makes reverse-engineering ZUN code so difficult: weirdly reused variables, unexpected structures within structures, and those TH05-specific nasty, premature ASM micro-optimizations that will waste a lot of time during decompilation, since the majority of the code actually was C, except for where it wasn't.