⮜ Blog

⮜ List of tags

Showing all posts tagged
and

📝 Posted:
🚚 Summary of:
P0193, P0194, P0195, P0196, P0197
Commits:
e1f3f9f...183d7a2, 183d7a2...5d93a50, 5d93a50...e18c53d, e18c53d...57c9ac5, 57c9ac5...48db0b7
💰 Funded by:
Ember2528, Yanga
🏷 Tags:

With Elis, we've not only reached the midway point in TH01's boss code, but also a bunch of other milestones: Both REIIDEN.EXE and TH01 as a whole have crossed the 75% RE mark, and overall position independence has also finally cracked 80%!

And it got done in 4 pushes again? Yup, we're back to 📝 Konngara levels of redundancy and copy-pasta. This time, it didn't even stop at the big copy-pasted code blocks for the rift sprite and 256-pixel circle animations, with the words "redundant" and "unnecessary" ending up a total of 18 times in my source code comments.
But damn is this fight broken. As usual with TH01 bosses, let's start with a high-level overview:

This puts the earliest possible end of the fight at the first frame of phase 5. However, nothing prevents Elis' HP from reaching 0 before that point. You can nicely see this in 📝 debug mode: Wait until the HP bar has filled up to avoid heap corruption, hold ↵ Return to reduce her HP to 0, and watch how Elis still goes through a total of two patterns* and four teleport animations before accepting defeat.

But wait, heap corruption? Yup, there's a bug in the HP bar that already affected Konngara as well, and it isn't even just about the graphical glitches generated by negative HP:

Since Elis starts with 14 HP, which is an even number, this corruption is trivial to cause: Simply hold ↵ Return from the beginning of the fight, and the completion condition will never be true, as the HP and frame numbers run past the off-by-one meeting point.

Edit (2023-07-21): Pressing ↵ Return to reduce HP also works in test mode (game t). There, the game doesn't even check the heap, and consequently won't report any corruption, allowing the HP bar to be glitched even further.

Regular gameplay, however, entirely prevents this due to the fixed start positions of Reimu and the Orb, the Orb's fixed initial trajectory, and the 50 frames of delay until a bomb deals damage to a boss. These aspects make it impossible to hit Elis within the first 14 frames of phase 1, and ensure that her HP bar is always filled up completely. So ultimately, this bug ends up comparable in seriousness to the 📝 recursion / stack overflow bug in the memory info screen.


These wavy teleport animations point to a quite frustrating architectural issue in this fight. It's not even the fact that unblitting the yellow star sprites rips temporary holes into Elis' sprite; that's almost expected from TH01 at this point. Instead, it's all because of this unused frame of the animation:

An unused wave animation frame from TH01's BOSS5.BOS

With this sprite still being part of BOSS5.BOS, Girl-Elis has a total of 9 animation frames, 1 more than the 📝 8 per-entity sprites allowed by ZUN's architecture. The quick and easy solution would have been to simply bump the sprite array size by 1, but… nah, this would have added another 20 bytes to all 6 of the .BOS image slots. :zunpet: Instead, ZUN wrote the manual position synchronization code I mentioned in that 2020 blog post. Ironically, he then copy-pasted this snippet of code often enough that it ended up taking up more than 120 bytes in the Elis fight alone – with, you guessed it, some of those copies being redundant. Not to mention that just going from 8 to 9 sprites would have allowed ZUN to go down from 6 .BOS image slots to 3. That would have actually saved 420 bytes in addition to the manual synchronization trouble. Looking forward to SinGyoku, that's going to be fun again…


As for the fight itself, it doesn't take long until we reach its most janky danmaku pattern, right in phase 1:

The "pellets along circle" pattern on Lunatic, in its original version and with fanfiction fixes for everything that can potentially be interpreted as a bug.

Then again, it might very well be that all of this was intended, or, most likely, just left in the game as a happy accident. The latter interpretation would explain why ZUN didn't just delete the rendering calls for the lower-right quarter of the circle, because seriously, how would you not spot that? The phase 3 patterns continue with more minor graphical glitches that aren't even worth talking about anymore.


And then Elis transforms into her bat form at the beginning of Phase 5, which displays some rather unique hitboxes. The one against the Orb is fine, but the one against player shots…

… uses the bat's X coordinate for both X and Y dimensions. :zunpet: In regular gameplay, it's not too bad as most of the bat patterns fire aimed pellets which typically don't allow you to move below her sprite to begin with. But if you ever tried destroying these pellets while standing near the middle of the playfield, now you know why that didn't work. This video also nicely points out how the bat, like any boss sprite, is only ever blitted at positions on the 8×1-pixel VRAM byte grid, while collision detection uses the actual pixel position.

The bat form patterns are all relatively simple, with little variation depending on the difficulty level, except for the "slow pellet spreads" pattern. This one is almost easiest to dodge on Lunatic, where the 5-spreads are not only always fired downwards, but also at the hardcoded narrow delta angle, leaving plenty of room for the player to move out of the way:

The "slow pellet spreads" pattern of Elis' bat form, on every difficulty. Which version do you think is the easiest one?

Finally, we've got another potential timesave in the girl form's "safety circle" pattern:

After the circle spawned completely, you lose a life by moving outside it, but doing that immediately advances the pattern past the circle part. This part takes 200 frames, but the defeat animation only takes 82 frames, so you can save up to 118 frames there.

Final funny tidbit: As with all dynamic entities, this circle is only blitted to VRAM page 0 to allow easy unblitting. However, it's also kind of static, and there needs to be some way to keep the Orb, the player shots, and the pellets from ripping holes into it. So, ZUN just re-blits the circle every… 4 frames?! 🤪 The same is true for the Star of David and its surrounding circle, but there you at least get a flash animation to justify it. All the overlap is actually quite a good reason for not even attempting to 📝 mess with the hardware color palette instead.


And that's the 4th PC-98 Touhou boss decompiled, 27 to go… but wait, all these quirks, and I still got nothing about the one actual crash that can appear in regular gameplay? There has even been a recent video about it. The cause has to be in Elis' main function, after entering the defeat branch and before the blocking white-out animation. It can't be anywhere else other than in the 📝 central line blitting and unblitting function, called from 📝 that one broken laser reset+unblit function, because everything else in that branch looks fine… and I think we can rule out a crash in MDRV2's non-blocking fade-out call. That's going to need some extra research, and a 5th push added on top of this delivery.

Reproducing the crash was the whole challenge here. Even after moving Elis and Reimu to the exact positions seen in Pearl's video and setting Elis' HP to 0 on the exact same frame, everything ran fine for me. It's definitely no division by 0 this time, the function perfectly guards against that possibility. The line specified in the function's parameters is always clipped to the VRAM region as well, so we can also rule out illegal memory accesses here…

… or can we? Stepping through it all reminded me of how this function brings unblitting sloppiness to the next level: For each VRAM byte touched, ZUN actually unblits the 4 surrounding bytes, adding one byte to the left and two bytes to the right, and using a single 32-bit read and write per bitplane. So what happens if the function tries to unblit the topmost byte of VRAM, covering the pixel positions from (0, 0) to (7, 0) inclusive? The VRAM offset of 0x0000 is decremented to 0xFFFF to cover the one byte to the left, 4 bytes are written to this address, the CPU's internal offset overflows… and as it turns out, that is illegal even in Real Mode as of the 80286, and will raise a General Protection Fault. Which is… ignored by DOSBox-X, every Neko Project II version in common use, the CSCP emulators, SL9821, and T98-Next. Only Anex86 accurately emulates the behavior of real hardware here.

OK, but no laser fired by Elis ever reaches the top-left corner of the screen. How can such a fault even happen in practice? That's where the broken laser reset+unblit function comes in: Not only does it just flat out pass the wrong parameters to the line unblitting function – describing the line already traveled by the laser and stopping where the laser begins – but it also passes them wrongly, in the form of raw 32-bit fixed-point Q24.8 values, with no conversion other than a truncation to the signed 16-bit pixels expected by the function. What then follows is an attempt at interpolation and clipping to find a line segment between those garbage coordinates that actually falls within the boundaries of VRAM:

  1. right/bottom correspond to a laser's origin position, and left/top to the leftmost pixel of its moved-out top line. The bug therefore only occurs with lasers that stopped growing and have started moving.
  2. Moreover, it will only happen if either (left % 256) or (right % 256) is ≤ 127 and the other one of the two is ≥ 128. The typecast to signed 16-bit integers then turns the former into a large positive value and the latter into a large negative value, triggering the function's clipping code.
  3. The function then follows Bresenham's algorithm: left is ensured to be smaller than right by swapping the two values if necessary. If that happened, top and bottom are also swapped, regardless of their value – the algorithm does not care about their order.
  4. The slope in the X dimension is calculated using an integer division of ((bottom - top) / (right - left)). Both subtractions are done on signed 16-bit integers, and overflow accordingly.
  5. (-left × slope_x) is added to top, and left is set to 0.
  6. If both top and bottom are < 0 or ≥ 640, there's nothing to be unblitted. Otherwise, the final coordinates are clipped to the VRAM range of [(0, 0), (639, 399)].
  7. If the function got this far, the line to be unblitted is now very likely to reach from
    1. the top-left to the bottom-right corner, starting out at (0, 0) right away, or
    2. from the bottom-left corner to the top-right corner. In this case, you'd expect unblitting to end at (639, 0), but thanks to an off-by-one error, it actually ends at (640, -1), which is equivalent to (0, 0). Why add clipping to VRAM offset calculations when everything else is clipped already, right? :godzun:
Possible laser states that will cause the fault, with some debug output to help understand the cause, and any pellets removed for better readability. This can happen for all bosses that can potentially have shootout lasers on screen when being defeated, so it also applies to Mima. Fixing this is easier than understanding why it happens, but since y'all love reading this stuff…

tl;dr: TH01 has a high chance of freezing at a boss defeat sequence if there are diagonally moving lasers on screen, and if your PC-98 system raises a General Protection Fault on a 4-byte write to offset 0xFFFF, and if you don't run a TSR with an INT 0Dh handler that might handle this fault differently.

The easiest fix option would be to just remove the attempted laser unblitting entirely, but that would also have an impact on this game's… distinctive visual glitches, in addition to touching a whole lot of code bytes. If I ever get funded to work on a hypothetical TH01 Anniversary Edition that completely rearchitects the game to fix all these glitches, it would be appropriate there, but not for something that purports to be the original game.

(Sidenote to further hype up this Anniversary Edition idea for PC-98 hardware owners: With the amount of performance left on the table at every corner of this game, I'm pretty confident that we can get it to work decently on PC-98 models with just an 80286 CPU.)

Since we're in critical infrastructure territory once again, I went for the most conservative fix with the least impact on the binary: Simply changing any VRAM offsets >= 0xFFFD to 0x0000 to avoid the GPF, and leaving all other bugs in place. Sure, it's rather lazy and "incorrect"; the function still unblits a 32-pixel block there, but adding a special case for blitting 24 pixels would add way too much code. And seriously, it's not like anything happens in the 8 pixels between (24, 0) and (31, 0) inclusive during gameplay to begin with. To balance out the additional per-row if() branch, I inlined the VRAM page change I/O, saving two function calls and one memory write per unblitted row.

That means it's time for a new community_choice_fixes build, containing the new definitive bugfixed versions of these games: 2022-05-31-community-choice-fixes.zip Check the th01_critical_fixes branch for the modified TH01 code. It also contains a fix for the HP bar heap corruption in test or debug mode – simply changing the == comparison to <= is enough to avoid it, and negative HP will still create aesthetic glitch art.


Once again, I then was left with ½ of a push, which I finally filled with some FUUIN.EXE code, specifically the verdict screen. The most interesting part here is the player title calculation, which is quite sneaky: There are only 6 skill levels, but three groups of titles for each level, and the title you'll see is picked from a random group. It looks like this is the first time anyone has documented the calculation?
As for the levels, ZUN definitely didn't expect players to do particularly well. With a 1cc being the standard goal for completing a Touhou game, it's especially funny how TH01 expects you to continue a lot: The code has branches for up to 21 continues, and the on-screen table explicitly leaves room for 3 digits worth of continues per 5-stage scene. Heck, these counts are even stored in 32-bit long variables.

Next up: 📝 Finally finishing the long overdue Touhou Patch Center MediaWiki update work, while continuing with Kikuri in the meantime. Originally I wasn't sure about what to do between Elis and Seihou, but with Ember2528's surprise contribution last week, y'all have demonstrated more than enough interest in the idea of getting TH01 done sooner rather than later. And I agree – after all, we've got the 25th anniversary of its first public release coming up on August 15, and I might still manage to completely decompile this game by that point…

📝 Posted:
🚚 Summary of:
P0189
Commits:
22abdd1...b4876b6
💰 Funded by:
Arandui, Lmocinemod
🏷 Tags:

(Before we start: Make sure you've read the current version of the FAQ section on a potential takedown of this project, updated in light of the recent DMCA claims against PC-98 Touhou game downloads.)


Slight change of plans, because we got instructions for reliably reproducing the TH04 Kurumi Divide Error crash! Major thanks to Colin Douglas Howell. With those, it also made sense to immediately look at the crash in the Stage 4 Marisa fight as well. This way, I could release both of the obligatory bugfix mods at the same time.
Especially since it turned out that I was wrong: Both crashes are entirely unrelated to the custom entity structure that would have required PI-centric progress. They are completely specific to Kurumi's and Marisa's danmaku-pattern code, and really are two separate bugs with no connection to each other. All of the necessary research nicely fit into Arandui's 0.5 pushes, with no further deep understanding required here.

But why were there still three weeks between Colin's message and this blog post? DMCA distractions aside: There are no easy fixes this time, unlike 📝 back when I looked at the Stage 5 Yuuka crash. Just like how division by zero is undefined in mathematics, it's also, literally, undefined what should happen instead of these two Divide error crashes. This means that any possible "fix" can only ever be a fanfiction interpretation of the intentions behind ZUN's code. The gameplay community should be aware of this, and might decide to handle these cases differently. And if we have to go into fanfiction territory to work around crashes in the canon games, we'd better document what exactly we're fixing here and how, as comprehensible as possible.

  1. Kurumi's crash
  2. Marisa's crash

With that out of the way, let's look at Kurumi's crash first, since it's way easier to grasp. This one is known to primarily happen to new players, and it's easy to see why:

The pattern that causes the crash in Kurumi's fight. Also demonstrates how the number of bullets in a ring is always halved on Easy Mode after the rank-based tuning, leading to just a 3-ring on playperf = 16.

So, what should the workaround look like? Obviously, we want to modify neither the default number of ring bullets nor the tuning algorithm – that would change all other non-crashing variations of this pattern on other difficulties and ranks, creating a fork of the original gameplay. Instead, I came up with four possible workarounds that all seemed somewhat logical to me:

  1. Firing no bullet, i.e., interpreting 0-ring literally. This would create the only constellation in which a call to the bullet group spawn functions would not spawn at least one new bullet.
  2. Firing a "1-ring", i.e., a single bullet. This would be consistent with how the bullet spawn functions behave for "0-way" stack and spread groups.
  3. Firing a "∞-ring", i.e., 200 bullets, which is as much as the game's cap on 16×16 bullets would allow. This would poke fun at the whole "division by zero" idea… but given that we're still talking about Easy Mode (and especially new players) here, it might be a tad too cruel. Certainly the most trollish interpretation.
  4. Triggering an immediate Game Over, exchanging the hard crash for a softer and more controlled shutdown. Certainly the option that would be closest to the behavior of the original games, and perhaps the only one to be accepted in Serious, High-Level Play™.

As I was writing this post, it felt increasingly wrong for me to make this decision. So I once again went to Twitter, where 56.3% voted in favor of the 1-bullet option. Good that I asked! I myself was more leaning towards the 0-bullet interpretation, which only got 28.7% of the vote. Also interesting are the 2.3% in favor of the Game Over option but I get it, low-rank Easy Mode isn't exactly the most competitive mode of playing TH04.
There are reports of Kurumi crashing on higher difficulties as well, but I could verify none of them. If they aren't fixed by this workaround, they're caused by an entirely different bug that we have yet to discover.


Onto the Stage 4 Marisa crash then, which does in fact apply to all difficulty levels. I was also wrong on this one – it's a hell of a lot more intricate than being just a division by the number of on-screen bits. Without having decompiled the entire fight, I can't give a completely accurate picture of what happens there yet, but here's the rough idea:

Reference points for Marisa's point-reflected movement. Cyan: Marisa's position, green: (192, 112), yellow: the intended end point.
One of the two patterns in TH04's Stage 4 Marisa boss fight that feature frame number-dependent point-reflected movement. The bits were hacked to self-destruct on the respective frame.

tl;dr: "Game crashes if last bit destroyed within 4-frame window near end of two patterns". For an informed decision on a new movement behavior for these last 8 frames, we definitely need to know all the details behind the crash though. Here's what I would interpret into the code:

  1. Not moving at all, i.e., interpreting 0 as the middle ground between positive and negative movement. This would also make sense because a 12-frame duration implies 100% of the movement to consist of the braking phase – and Marisa wasn't moving before, after all.
  2. Move at maximum speed, i.e., dividing by 1 rather than 0. Since the movement duration is still 12 in this case, Marisa will immediately start braking. In total, she will move exactly ¾ of the way from her initial position to (192, 112) within the 8 frames before the pattern ends.
  3. Directly warping to (192, 112) on frame 0, and to the point-reflected target on 4, respectively. This "emulates" the division by zero by moving Marisa at infinite speed to the exact two points indicated by the velocity formula. It also fits nicely into the 8 frames we have to fill here. Sure, Marisa can't reach these points at any other duration, but why shouldn't she be able to, with infinite speed? Then again, if Marisa is far away enough from (192, 112), this workaround would warp her across the entire playfield. Can Marisa teleport according to lore? I have no idea… :tannedcirno:
  4. Triggering an immediate Game O– hell no, this is the Stage 4 boss, people already hate losing runs to this bug!

Asking Twitter worked great for the Kurumi workaround, so let's do it again! Gotta attach a screenshot of an earlier draft of this blog post though, since this stuff is impossible to explain in tweets…

…and it went through the roof, becoming the most successful ReC98 tweet so far?! Apparently, y'all really like to just look at descriptions of overly complex bugs that I'd consider way beyond the typical attention span that can be expected from Twitter. Unfortunately, all those tweet impressions didn't quite translate into poll turnout. The results were pretty evenly split between 1) and 2), with option 1) just coming out slightly ahead at 49.1%, compared to 41.5% of option 2).

(And yes, I only noticed after creating the poll that warping to both the green and yellow points made more sense than warping to just one of the two. Let's hope that this additional variant wouldn't have shifted the results too much. Both warp options only got 9.4% of the vote after all, and no one else came up with the idea either. :onricdennat: In the end, you can always merge together your preferred combination of workarounds from the Git branches linked below.)


So here you go: The new definitive version of TH04, containing not only the community-chosen Kurumi and Stage 4 Marisa workaround variant, but also the 📝 No-EMS bugfix from last year. Edit (2022-05-31): This package is outdated, 📝 the current version is here! 2022-04-18-community-choice-fixes.zip Oh, and let's also add spaztron64's TH03 GDC clock fix from 2019 because why not. This binary was built from the community_choice_fixes branch, and you can find the code for all the individual workarounds on these branches:

Again, because it can't be stated often enough: These fixes are fanfiction. The gameplay community should be aware of this, and might decide to handle these cases differently.


With all of that taking way more time to evaluate and document, this research really had to become part of a proper push, instead of just being covered in the quick non-push blog post I initially intended. With ½ of a push left at the end, TH05's Stage 1-5 boss background rendering functions fit in perfectly there. If you wonder how these static backdrop images even need any boss-specific code to begin with, you're right – it's basically the same function copy-pasted 4 times, differing only in the backdrop image coordinates and some other inconsequential details.
Only Sara receives a nice variation of the typical 📝 blocky entrance animation: The usually opaque bitmap data from ST00.BB is instead used as a transition mask from stage tiles to the backdrop image, by making clever use of the tile invalidation system:

TH04 uses the same effect a bit more frequently, for its first three bosses.

Next up: Shinki, for real this time! I've already managed to decompile 10 of her 11 danmaku patterns within a little more than one push – and yes, that one is included in there. Looks like I've slightly overestimated the amount of work required for TH04's and TH05's bosses…

📝 Posted:
🚚 Summary of:
P0186, P0187, P0188
Commits:
a21ab3d...bab5634, bab5634...426a531, 426a531...e881f95
💰 Funded by:
Blue Bolt, [Anonymous], nrook
🏷 Tags:

Did you know that moving on top of a boss sprite doesn't kill the player in TH04, only in TH05?

Screenshot of Reimu moving on top of Stage 6 Yuuka, demonstrating the lack of boss↔player collision in TH04
Yup, Reimu is not getting hit… yet.

That's the first of only three interesting discoveries in these 3 pushes, all of which concern TH04. But yeah, 3 for something as seemingly simple as these shared boss functions… that's still not quite the speed-up I had hoped for. While most of this can be blamed, again, on TH04 and all of its hardcoded complexities, there still was a lot of work to be done on the maintenance front as well. These functions reference a bunch of code I RE'd years ago and that still had to be brought up to current standards, with the dependencies reaching from 📝 boss explosions over 📝 text RAM overlay functionality up to in-game dialog loading.

The latter provides a good opportunity to talk a bit about x86 memory segmentation. Many aspiring PC-98 developers these days are very scared of it, with some even going as far as to rather mess with Protected Mode and DOS extenders just so that they don't have to deal with it. I wonder where that fear comes from… Could it be because every modern programming language I know of assumes memory to be flat, and lacks any standard language-level features to even express something like segments and offsets? That's why compilers have a hard time targeting 16-bit x86 these days: Doing anything interesting on the architecture requires giving the programmer full control over segmentation, which always comes down to adding the typical non-standard language extensions of compilers from back in the day. And as soon as DOS stopped being used, these extensions no longer made sense and were subsequently removed from newer tools. A good example for this can be found in an old version of the NASM manual: The project started as an attempt to make x86 assemblers simple again by throwing out most of the segmentation features from MASM-style assemblers, which made complete sense in 1996 when 16-bit DOS and Windows were already on their way out. But there was a point to all those features, and that's why ReC98 still has to use the supposedly inferior TASM.

Not that this fear of segmentation is completely unfounded: All the segmentation-related keywords, directives, and #pragmas provided by Borland C++ and TASM absolutely can be the cause of many weird runtime bugs. Even if the compiler or linker catches them, you are often left with confusing error messages that aged just as poorly as memory segmentation itself.
However, embracing the concept does provide quite the opportunity for optimizations. While it definitely was a very crazy idea, there is a small bit of brilliance to be gained from making proper use of all these segmentation features. Case in point: The buffer for the in-game dialog scripts in TH04 and TH05.

// Thanks to the semantics of `far` pointers, we only need a single 32-bit
// pointer variable for the following code.
extern unsigned char far *dialog_p;

// This master.lib function returns a `void __seg *`, which is a 16-bit
// segment-only pointer. Converting to a `far *` yields a full segment:offset
// pointer to offset 0000h of that segment.
dialog_p = (unsigned char far *)hmem_allocbyte(/* … */);

// Running the dialog script involves pointer arithmetic. On a far pointer,
// this only affects the 16-bit offset part, complete with overflow at 64 KiB,
// from FFFFh back to 0000h.
dialog_p += /* … */;
dialog_p += /* … */;
dialog_p += /* … */;

// Since the segment part of the pointer is still identical to the one we
// allocated above, we can later correctly free the buffer by pulling the
// segment back out of the pointer.
hmem_free((void __seg *)dialog_p);

If dialog_p was a huge pointer, any pointer arithmetic would have also adjusted the segment part, requiring a second pointer to store the base address for the hmem_free call. Doing that will also be necessary for any port to a flat memory model. Depending on how you look at it, this compression of two logical pointers into a single variable is either quite nice, or really, really dumb in its reliance on the precise memory model of one single architecture. :tannedcirno:


Why look at dialog loading though, wasn't this supposed to be all about shared boss functions? Well, TH04 unnecessarily puts certain stage-specific code into the boss defeat function, such as loading the alternate Stage 5 Yuuka defeat dialog before a Bad Ending, or initializing Gengetsu after Mugetsu's defeat in the Extra Stage.
That's TH04's second core function with an explicit conditional branch for Gengetsu, after the 📝 dialog exit code we found last year during EMS research. And I've heard people say that Shinki was the most hardcoded fight in PC-98 Touhou… Really, Shinki is a perfectly regular boss, who makes proper use of all internal mechanics in the way they were intended, and doesn't blast holes into the architecture of the game. Even within TH05, it's Mai and Yuki who rely on hacks and duplicated code, not Shinki.

The worst part about this though? How the function distinguishes Mugetsu from Gengetsu. Once again, it uses its own global variable to track whether it is called the first or the second time within TH04's Extra Stage, unrelated to the same variable used in the dialog exit function. But this time, it's not just any newly created, single-use variable, oh no. In a misguided attempt to micro-optimize away a few bytes of conventional memory, TH04 reserves 16 bytes of "generic boss state", which can (and are) freely used for anything a boss doesn't want to store in a more dedicated variable.
It might have been worth it if the bosses actually used most of these 16 bytes, but the majority just use (the same) two, with only Stage 4 Reimu using a whopping seven different ones. To reverse-engineer the various uses of these variables, I pretty much had to map out which of the undecompiled danmaku-pattern functions corresponds to which boss fight. In the end, I assigned 29 different variable names for each of the semantically different use cases, which made up another full push on its own.

Now, 16 bytes of wildly shared state, isn't that the perfect recipe for bugs? At least during this cursory look, I haven't found any obvious ones yet. If they do exist, it's more likely that they involve reused state from earlier bosses – just how the Shinki death glitch in TH05 is caused by reusing cheeto data from way back in Stage 4 – and hence require much more boss-specific progress.
And yes, it might have been way too early to look into all these tiny details of specific boss scripts… but then, this happened:

TH04 crashing to the DOS prompt in the Stage 4 Marisa fight, right as the last of her bits is destroyed

Looks similar to another screenshot of a crash in the same fight that was reported in December, doesn't it? I was too much in a hurry to figure it out exactly, but notice how both crashes happen right as the last of Marisa's four bits is destroyed. KirbyComment has suspected this to be the cause for a while, and now I can pretty much confirm it to be an unguarded division by the number of on-screen bits in Marisa-specific pattern code. But what's the cause for Kurumi then? :thonk:
As for fixing it, I can go for either a fast or a slow option:

  1. Superficially fixing only this crash will probably just take a fraction of a push.
  2. But I could also go for a deeper understanding by looking at TH04's version of the 📝 custom entity structure. It not only stores the data of Marisa's bits, but is also very likely to be involved in Kurumi's crash, and would get TH04 a lot closer to 100% PI. Taking that look will probably need at least 2 pushes, and might require another 3-4 to completely decompile Marisa's fight, and 2-3 to decompile Kurumi's.

OK, now that that's out of the way, time to finish the boss defeat function… but not without stumbling over the third of TH04's quirks, relating to the Clear Bonus for the main game or the Extra Stage:

And after another few collision-related functions, we're now truly, finally ready to decompile bosses in both TH04 and TH05! Just as the anything funds were running out… :onricdennat: The remaining ¼ of the third push then went to Shinki's 32×32 ball bullets, rounding out this delivery with a small self-contained piece of the first TH05 boss we're probably going to look at.

Next up, though: I'm not sure, actually. Both Shinki and Elis seem just a little bit larger than the 2¼ or 4 pushes purchased so far, respectively. Now that there's a bunch of room left in the cap again, I'll just let the next contribution decide – with a preference for Shinki in case of a tie. And if it will take longer than usual for the store to sell out again this time (heh), there's still the 📝 PC-98 text RAM JIS trail word rendering research waiting to be documented.

📝 Posted:
🚚 Summary of:
P0128, P0129
Commits:
dc65b59...dde36f7, dde36f7...f4c2e45
💰 Funded by:
Yanga
🏷 Tags:

So, only one card-flipping function missing, and then we can start decompiling TH01's two final bosses? Unfortunately, that had to be the one big function that initializes and renders all gameplay objects. #17 on the list of longest functions in all of PC-98 Touhou, requiring two pushes to fully understand what's going on there… and then it immediately returns for all "boss" stages whose number is divisible by 5, yet is still called during Sariel's and Konngara's initialization 🤦

Oh well. This also involved the final file format we hadn't looked at yet – the STAGE?.DAT files that describe the layout for all stages within a single 5-stage scene. Which, for a change is a very well-designed form– no, of course it's completely weird, what did you expect? Development must have looked somewhat like this:

With all that, it's almost not worth mentioning how there are 12 turret types, which only differ in which hardcoded pellet group they fire at a hardcoded interval of either 100 or 200 frames, and that they're all explicitly spelled out in every single switch statement. Or how the layout of the internal card and obstacle SoA classes is quite disjointed. So here's the new ZUN bugs you've probably already been expecting!


Cards and obstacles are blitted to both VRAM pages. This way, any other entities moving on top of them can simply be unblitted by restoring pixels from VRAM page 1, without requiring the stationary objects to be redrawn from main memory. Obviously, the backgrounds behind the cards have to be stored somewhere, since the player can remove them. For faster transitions between stages of a scene, ZUN chose to store the backgrounds behind obstacles as well. This way, the background image really only needs to be blitted for the first stage in a scene.

All that memory for the object backgrounds adds up quite a bit though. ZUN actually made the correct choice here and picked a memory allocation function that can return more than the 64 KiB of a single x86 Real Mode segment. He then accesses the individual backgrounds via regular array subscripts… and that's where the bug lies, because he stores the returned address in a regular far pointer rather than a huge one. This way, the game still can only display a total of 102 objects (i. e., cards and obstacles combined) per stage, without any unblitting glitches.
What a shame, that limit could have been 127 if ZUN didn't needlessly allocate memory for alpha planes when backing up VRAM content. :onricdennat:

And since array subscripts on far pointers wrap around after 64 KiB, trying to save the background of the 103rd object is guaranteed to corrupt the memory block header at the beginning of the returned segment. :zunpet: When TH01 runs in debug mode, it correctly reports a corrupted heap in this case.
After detecting such a corruption, the game loudly reports it by playing the "player hit" sound effect and locking up, freezing any further gameplay or rendering. The locking loop can be left by pressing ↵ Return, but the game will simply re-enter it if the corruption is still present during the next heapcheck(), in the next frame. And since heap corruptions don't tend to repair themselves, you'd have to constantly hold ↵ Return to resume gameplay. Doing that could actually get you safely to the next boss, since the game doesn't allocate or free any further heap memory during a 5-stage card-flipping scene, and just throws away its C heap when restarting the process for a boss. But then again, holding ↵ Return will also auto-flip all cards on the way there… 🤨


Finally, some unused content! Upon discovering TH01's stage selection debug feature, probably everyone tried to access Stage 21, just to see what happens, and indeed landed in an actual stage, with a black background and a weird color palette. Turns out that ZUN did ship an unused scene in SCENE7.DAT, which is exactly what's loaded there.
However, it's easy to believe that this is just garbage data (as I initially did): At the beginning of "Stage 22", the game seems to enter an infinite loop somewhere during the flip-in animation.

Well, we've had a heap overflow above, and the cause here is nothing but a stack buffer overflow – a perhaps more modern kind of classic C bug, given its prevalence in the Windows Touhou games. Explained in a few lines of code:

void stageobjs_init_and_render()
{
	int card_animation_frames[50]; // even though there can be up to 200?!
	int total_frames = 0;

	(code that would end up resetting total_frames if it ever tried to reset
	card_animation_frames[50]…)
}

The number of cards in "Stage 22"? 76. There you have it.

But of course, it's trivial to disable this animation and fix these stage transitions. So here they are, Stages 21 to 24, as shipped with the game in STAGE7.DAT:

TH01 stage 21, loaded from <code>STAGE7.DAT</code>TH01 stage 22, loaded from <code>STAGE7.DAT</code>TH01 stage 23, loaded from <code>STAGE7.DAT</code>TH01 stage 24, loaded from <code>STAGE7.DAT</code>

Wow, what a mess. All that was just a bit too much to be covered in two pushes… Next up, assuming the current subscriptions: Taking a vacation with one smaller TH01 push, covering some smaller functions here and there to ensure some uninterrupted Konngara progress later on.

📝 Posted:
🚚 Summary of:
P0099, P0100, P0101, P0102
Commits:
1799d67...1b25830, 1b25830...ceb81db, ceb81db...c11a956, c11a956...b60f38d
💰 Funded by:
Ember2528, Yanga
🏷 Tags:

Well, make that three days. Trying to figure out all the details behind the sprite flickering was absolutely dreadful…
It started out easy enough, though. Unsurprisingly, TH01 had a quite limited pellet system compared to TH04 and TH05:

As expected from TH01, the code comes with its fair share of smaller, insignificant ZUN bugs and oversights. As you would also expect though, the sprite flickering points to the biggest and most consequential flaw in all of this.


Apparently, it started with ZUN getting the impression that it's only possible to use the PC-98 EGC for fast blitting of all 4 bitplanes in one CPU instruction if you blit 16 horizontal pixels (= 2 bytes) at a time. Consequently, he only wrote one function for EGC-accelerated sprite unblitting, which can only operate on a "grid" of 16×1 tiles in VRAM. But wait, pellets are not only just 8×8, but can also be placed at any unaligned X position…

… yet the game still insists on using this 16-dot-aligned function to unblit pellets, forcing itself into using a super sloppy 16×8 rectangle for the job. 🤦 ZUN then tried to mitigate the resulting flickering in two hilarious ways that just make it worse:

  1. An… "interlaced rendering" mode? This one's activated for all Stage 15 and 20 fights, and separates pellets into two halves that are rendered on alternating frames. Collision detection with the Yin-Yang Orb and the player is only done for the visible half, but collision detection with player shots is still done for all pellets every frame, as are motion updates – so that pellets don't end up moving half as fast as they should.
    So yeah, your eyes weren't deceiving you. The game does effectively drop its perceived frame rate in the Elis, Kikuri, Sariel, and Konngara fights, and it does so deliberately.
  2. 📝 Just like player shots, pellets are also unblitted, moved, and rendered in a single function. Thanks to the 16×8 rectangle, there's now the (completely unnecessary) possibility of accidentally unblitting parts of a sprite that was previously drawn into the 8 pixels right of a pellet. And this is where ZUN went full :tannedcirno: and went "oh, I know, let's test the entire 16 pixels, and in case we got an entity there, we simply make the pellet invisible for this frame! Then we don't even have to unblit it later!" :zunpet:

    Except that this is only done for the first 3 elements of the player shot array…?! Which don't even necessarily have to contain the 3 shots fired last. It's not done for the player sprite, the Orb, or, heck, other pellets that come earlier in the pellet array. (At least we avoided going 𝑂(𝑛²) there?)

    Actually, and I'm only realizing this now as I type this blog post: This test is done even if the shots at those array elements aren't active. So, pellets tend to be made invisible based on comparisons with garbage data. :onricdennat:

    And then you notice that the player shot unblit​/​move​/​render function is actually only ever called from the pellet unblit​/​move​/​render function on the one global instance of the player shot manager class, after pellets were unblitted. So, we end up with a sequence of

    Pellet unblit → Pellet move → Shot unblit → Shot move → Shot render → Pellet render

    which means that we can't ever unblit a previously rendered shot with a pellet. Sure, as terrible as this one function call is from a software architecture perspective, it was enough to fix this issue. Yet we don't even get the intended positive effect, and walk away with pellets that are made temporarily invisible for no reason at all. So, uh, maybe it all just was an attempt at increasing the ramerate on lower spec PC-98 models?

Yup, that's it, we've found the most stupid piece of code in this game, period. It'll be hard to top this.


I'm confident that it's possible to turn TH01 into a well-written, fluid PC-98 game, with no flickering, and no perceived lag, once it's position-independent. With some more in-depth knowledge and documentation on the EGC (remember, there's still 📝 this one TH03 push waiting to be funded), you might even be able to continue using that piece of blitter hardware. And no, you certainly won't need ASM micro-optimizations – just a bit of knowledge about which optimizations Turbo C++ does on its own, and what you'd have to improve in your own code. It'd be very hard to write worse code than what you find in TH01 itself.

(Godbolt for Turbo C++ 4.0J when? Seriously though, that would 📝 also be a great project for outside contributors!)


Oh well. In contrast to TH04 and TH05, where 4 pushes only covered all the involved data types, they were enough to completely cover all of the pellet code in TH01. Everything's already decompiled, and we never have to look at it again. 😌 And with that, TH01 has also gone from by far the least RE'd to the most RE'd game within ReC98, in just half a year! 🎉
Still, that was enough TH01 game logic for a while. :tannedcirno: Next up: Making up for the delay with some more relaxing and easy pieces of TH01 code, that hopefully make just a bit more sense than all this garbage. More image formats, mainly.