⮜ Blog

⮜ List of tags

Showing all posts tagged
and

📝 Posted:
🚚 Summary of:
P0227, P0228
Commits:
4f85326...bfd24c6, bfd24c6...739e1d8
💰 Funded by:
nrook, [Anonymous]
🏷 Tags:

Starting the year with a delivery that wasn't delayed until the last day of the month for once, nice! Still, very soon and high-maintenance did not go well together…

It definitely wasn't Sara's fault though. As you would expect from a Stage 1 Boss, her code was no challenge at all. Most of the TH02, TH04, and TH05 bosses follow the same overall structure, so let's introduce a new table to replace most of the boilerplate overview text:

Phase # Patterns HP boundary Timeout condition
Sprite of Sara in TH05 (Entrance) 4,650 288 frames
2 4 2,550 2,568 frames (= 32 patterns)
3 4 450 5,296 frames (= 24 patterns)
4 1 0 1,300 frames
Total 9 9,452 frames

And that's all the gameplay-relevant detail that ZUN put into Sara's code. It doesn't even make sense to describe the remaining patterns in depth, as their groups can significantly change between difficulties and rank values. The 📝 general code structure of TH05 bosses won't ever make for good-code, but Sara's code is just a lesser example of what I already documented for Shinki.
So, no bugs, no unused content, only inconsequential bloat to be found here, and less than 1 push to get it done… That makes 9 PC-98 Touhou bosses decompiled, with 22 to go, and gets us over the sweet 50% overall finalization mark! 🎉 And sure, it might be possible to pass through the lasers in Sara's final pattern, but the boss script just controls the origin, angle, and activity of lasers, so any quirk there would be part of the laser code… wait, you can do what?!?


TH05 expands TH04's one-off code for Yuuka's Master and Double Sparks into a more featureful laser system, and Sara is the first boss to show it off. Thus, it made sense to look at it again in more detail and finalize the code I had purportedly 📝 reverse-engineered over 4 years ago. That very short delivery notice already hinted at a very time-consuming future finalization of this code, and that prediction certainly came true. On the surface, all of the low-level laser ray rendering and collision detection code is undecompilable: It uses the SI and DI registers without Turbo C++'s safety backups on the stack, and its helper functions take their input and output parameters from convenient registers, completely ignoring common calling conventions. And just to raise the confusion even further, the code doesn't just set these registers for the helper function calls and then restores their original values, but permanently shifts them via additions and subtractions. Unfortunately, these convenient registers also include the BP base pointer to the stack frame of a function… and shifting that register throws any intuition behind accessed local variables right out of the window for a good part of the function, requiring a correctly shifted view of the stack frame just to make sense of it again. :godzun: How could such code even have been written?! This goes well beyond the already wrong assumption that using more stack space is somehow bad, and straight into the territory of self-inflicted pain.

So while it's not a lot of instructions, it's quite dense and really hard to follow. This code would really benefit from a decompilation that anchors all this madness as much as possible in existing C++ structures… so let's decompile it anyway? :tannedcirno:
Doing so would involve emitting lots of raw machine code bytes to hide the SI and DI registers from the compiler, but I already had a certain 📝 batshit insane compiler bug workaround abstraction lying around that could make such code more readable. Hilariously, it only took this one additional use case for that abstraction to reveal itself as premature and way too complicated. :onricdennat: Expanding the core idea into a full-on x86 instruction generator ended up simplifying the code structure a lot. All we really want there is a way to set all potential parameters to e.g. a specific form of the MOV instruction, which can all be expressed as the parameters to a force-inlined __emit__() function. Type safety can help by providing overloads for different operand widths here, but there really is no need for classes, templates, or explicit specialization of templates based on classes. We only need a couple of enums with opcode, register, and prefix constants from the x86 reference documentation, and a set of associated macros that token-paste pseudoregisters onto the prefixes of these enum constants.
And that's how you get a custom compile-time assembler in a 1994 C++ compiler and expand the limits of decompilability even further. What's even truly left now? Self-modifying code, layout tricks that can't be replicated with regularly structured control flow… and that's it. That leaves quite a few functions I previously considered undecompilable to be revisited once I get to work on making this game more portable.

With that, we've turned the low-level laser code into the expected horrible monstrosity that exposes all the hidden complexity in those few ASM instructions. The high-level part should be no big deal now… except that we're immediately bombarded with Fixup overflow errors at link time? Oh well, time to finally learn the true way of fixing this highly annoying issue in a second new piece of decompilation tech – and one that might actually be useful for other x86 Real Mode retro developers at that.
Earlier in the RE history of TH04 and TH05, I often wrote about the need to split the two original code segments into multiple segments within two groups, which makes it possible to slot in code from different translation units at arbitrary places within the original segment. If we don't want to define a unique segment name for each of these slotted-in translation units, we need a way to set custom segment and group names in C land. Turbo C++ offers two #pragmas for that:

For the most part, these #pragmas work well, but they seemed to not help much when it came to calling near functions declared in different segments within the same group. It took a bit of trial and error to figure out what was actually going on in that case, but there is a clear logic to it:

Summarized in code:

#pragma option -zCfoo_TEXT -zPfoo

void bar(void);
void near qux(void); // defined somewhere else, maybe in a different segment

#pragma codeseg baz_TEXT baz

// Despite the segment change in the line above, this function will still be
// put into `foo_TEXT`, the active segment during the first appearance of the
// function name.
void bar(void) {
}

// This function hasn't been declared yet, so it will go into `baz_TEXT` as
// expected.
void baz(void) {
	// This `near` function pointer will be calculated by subtracting the
	// flat/linear address of qux() inside the binary from the base address
	// of qux()'s declared segment, i.e., `foo_TEXT`.
	void (near *ptr_to_qux)(void) = qux;
}

So yeah, you might have to put #pragma codeseg into your headers to tell the linker about the correct segment of a near function in advance. 🤯 This is an important insight for everyone using this compiler, and I'm shocked that none of the Borland C++ books documented the interaction of code segment definitions and near references at least at this level of clarity. The TASM manuals did have a few pages on the topic of groups, but that syntax obviously doesn't apply to a C compiler. Fixup overflows in particular are such a common error and really deserved better than the unhelpful 🤷 of an explanation that ended up in the User's Guide. Maybe this whole technique of custom code segment names was considered arcane even by 1993, judging from the mere three sentences that #pragma codeseg was documented with? Still, it must have been common knowledge among Amusement Makers, because they couldn't have built these exact binaries without knowing about these details. This is the true solution to 📝 any issues involving references to near functions, and I'm glad to see that ZUN did not in fact lie to the compiler. 👍


OK, but now the remaining laser code compiles, and we get to write C++ code to draw some hitboxes during the two collision-detected states of each laser. These confirm what the low-level code from earlier already uncovered: Collision detection against lasers is done by testing a 12×12-pixel box at every 16 pixels along the length of a laser, which leaves obvious 4-pixel gaps at regular intervals that the player can just pass through. :zunpet: This adds 📝 yet 📝 another 📝 quirk to the growing list of quirks that were either intentional or must have been deliberately left in the game after their initial discovery. This is what constants were invented for, and there really is no excuse for not using them – especially during intoxicated coding, and/or if you don't have a compile-time abstraction for Q12.4 literals.

When detecting laser collisions, the game checks the player's single center coordinate against any of the aforementioned 12×12-pixel boxes. Therefore, it's correct to split these 12×12 pixels into two 6×6-pixel boxes and assign the other half to the player for a more natural visualization. Always remember that hitbox visualizations need to keep all colliding entities in mind – 📝 assigning a constant-sized hitbox to "the player" and "the bullets" will be wrong in most other cases.

Using subpixel coordinates in collision detection also introduces a slight inaccuracy into any hitbox visualization recorded in-engine on a 16-color PC-98. Since we have to render discrete pixels, we cannot exactly place a Q12.4 coordinate in the 93.75% of cases where the fractional part is non-zero. This is why pretty much every laser segment hitbox in the video above shows up as 7×7 rather than 6×6: The actual W×H area of each box is 13 pixels smaller, but since the hitbox lies between these pixels, we cannot indicate where it lies exactly, and have to err on the side of caution. It's also why Reimu's box slightly changes size as she moves: Her non-diagonal movement speed is 3.5 pixels per frame, and the constant focused movement in the video above halves that to 1.75 pixels, making her end up on an exact pixel every 4 frames. Looking forward to the glorious future of displays that will allow us to scale up the playfield to 16× its original pixel size, thus rendering the game at its exact internal resolution of 6144×5888 pixels. Such a port would definitely add a lot of value to the game…

The remaining high-level laser code is rather unremarkable for the most part, but raises one final interesting question: With no explicitly defined limit, how wide can a laser be? Looking at the laser structure's 1-byte width field and the unsigned comparisons all throughout the update and rendering code, the answer seems to be an obvious 255 pixels. However, the laser system also contains an automated shrinking state, which can be most notably seen in Mai's wheel pattern. This state shrinks a laser by 2 pixels every 2 frames until it reached a width of 0. This presents a problem with odd widths, which would fall below 0 and overflow back to 255 due to the unsigned nature of this variable. So rather than, I don't know, treating width values of 0 as invalid and stopping at a width of 1, or even adding a condition for that specific case, the code just performs a signed comparison, effectively limiting the width of a shrinkable laser to a maximum of 127 pixels. :zunpet: This small signedness inconsistency now forces the distinction between shrinkable and non-shrinkable lasers onto every single piece of code that uses lasers. Yet another instance where 📝 aiming for a cinematic 30 FPS look made the resulting code much more complicated than if ZUN had just evenly spread out the subtraction across 2 frames. 🤷
Oh well, it's not as if any of the fixed lasers in the original scripts came close to any of these limits. Moving lasers are much more streamlined and limited to begin with: Since they're hardcoded to 6 pixels, the game can safely assume that they're always thinner than the 28 pixels they get gradually widened to during their decay animation.

Finally, in case you were missing a mention of hitboxes in the previous paragraph: Yes, the game always uses the aforementioned 12×12 boxes, regardless of a laser's width.

This video also showcases the 127-pixel limit because I wanted to include the shrink animation for a seamless loop.

That was what, 50% of this blog post just being about complications that made laser difficult for no reason? Next up: The first TH01 Anniversary Edition build, where I finally get to reap the rewards of having a 100% decompiled game and write some good code for once.

📝 Posted:
🚚 Summary of:
P0189
Commits:
22abdd1...b4876b6
💰 Funded by:
Arandui, Lmocinemod
🏷 Tags:

(Before we start: Make sure you've read the current version of the FAQ section on a potential takedown of this project, updated in light of the recent DMCA claims against PC-98 Touhou game downloads.)


Slight change of plans, because we got instructions for reliably reproducing the TH04 Kurumi Divide Error crash! Major thanks to Colin Douglas Howell. With those, it also made sense to immediately look at the crash in the Stage 4 Marisa fight as well. This way, I could release both of the obligatory bugfix mods at the same time.
Especially since it turned out that I was wrong: Both crashes are entirely unrelated to the custom entity structure that would have required PI-centric progress. They are completely specific to Kurumi's and Marisa's danmaku-pattern code, and really are two separate bugs with no connection to each other. All of the necessary research nicely fit into Arandui's 0.5 pushes, with no further deep understanding required here.

But why were there still three weeks between Colin's message and this blog post? DMCA distractions aside: There are no easy fixes this time, unlike 📝 back when I looked at the Stage 5 Yuuka crash. Just like how division by zero is undefined in mathematics, it's also, literally, undefined what should happen instead of these two Divide error crashes. This means that any possible "fix" can only ever be a fanfiction interpretation of the intentions behind ZUN's code. The gameplay community should be aware of this, and might decide to handle these cases differently. And if we have to go into fanfiction territory to work around crashes in the canon games, we'd better document what exactly we're fixing here and how, as comprehensible as possible.

  1. Kurumi's crash
  2. Marisa's crash

With that out of the way, let's look at Kurumi's crash first, since it's way easier to grasp. This one is known to primarily happen to new players, and it's easy to see why:

The pattern that causes the crash in Kurumi's fight. Also demonstrates how the number of bullets in a ring is always halved on Easy Mode after the rank-based tuning, leading to just a 3-ring on playperf = 16.

So, what should the workaround look like? Obviously, we want to modify neither the default number of ring bullets nor the tuning algorithm – that would change all other non-crashing variations of this pattern on other difficulties and ranks, creating a fork of the original gameplay. Instead, I came up with four possible workarounds that all seemed somewhat logical to me:

  1. Firing no bullet, i.e., interpreting 0-ring literally. This would create the only constellation in which a call to the bullet group spawn functions would not spawn at least one new bullet.
  2. Firing a "1-ring", i.e., a single bullet. This would be consistent with how the bullet spawn functions behave for "0-way" stack and spread groups.
  3. Firing a "∞-ring", i.e., 200 bullets, which is as much as the game's cap on 16×16 bullets would allow. This would poke fun at the whole "division by zero" idea… but given that we're still talking about Easy Mode (and especially new players) here, it might be a tad too cruel. Certainly the most trollish interpretation.
  4. Triggering an immediate Game Over, exchanging the hard crash for a softer and more controlled shutdown. Certainly the option that would be closest to the behavior of the original games, and perhaps the only one to be accepted in Serious, High-Level Play™.

As I was writing this post, it felt increasingly wrong for me to make this decision. So I once again went to Twitter, where 56.3% voted in favor of the 1-bullet option. Good that I asked! I myself was more leaning towards the 0-bullet interpretation, which only got 28.7% of the vote. Also interesting are the 2.3% in favor of the Game Over option but I get it, low-rank Easy Mode isn't exactly the most competitive mode of playing TH04.
There are reports of Kurumi crashing on higher difficulties as well, but I could verify none of them. If they aren't fixed by this workaround, they're caused by an entirely different bug that we have yet to discover.


Onto the Stage 4 Marisa crash then, which does in fact apply to all difficulty levels. I was also wrong on this one – it's a hell of a lot more intricate than being just a division by the number of on-screen bits. Without having decompiled the entire fight, I can't give a completely accurate picture of what happens there yet, but here's the rough idea:

Reference points for Marisa's point-reflected movement. Cyan: Marisa's position, green: (192, 112), yellow: the intended end point.
One of the two patterns in TH04's Stage 4 Marisa boss fight that feature frame number-dependent point-reflected movement. The bits were hacked to self-destruct on the respective frame.

tl;dr: "Game crashes if last bit destroyed within 4-frame window near end of two patterns". For an informed decision on a new movement behavior for these last 8 frames, we definitely need to know all the details behind the crash though. Here's what I would interpret into the code:

  1. Not moving at all, i.e., interpreting 0 as the middle ground between positive and negative movement. This would also make sense because a 12-frame duration implies 100% of the movement to consist of the braking phase – and Marisa wasn't moving before, after all.
  2. Move at maximum speed, i.e., dividing by 1 rather than 0. Since the movement duration is still 12 in this case, Marisa will immediately start braking. In total, she will move exactly ¾ of the way from her initial position to (192, 112) within the 8 frames before the pattern ends.
  3. Directly warping to (192, 112) on frame 0, and to the point-reflected target on 4, respectively. This "emulates" the division by zero by moving Marisa at infinite speed to the exact two points indicated by the velocity formula. It also fits nicely into the 8 frames we have to fill here. Sure, Marisa can't reach these points at any other duration, but why shouldn't she be able to, with infinite speed? Then again, if Marisa is far away enough from (192, 112), this workaround would warp her across the entire playfield. Can Marisa teleport according to lore? I have no idea… :tannedcirno:
  4. Triggering an immediate Game O– hell no, this is the Stage 4 boss, people already hate losing runs to this bug!

Asking Twitter worked great for the Kurumi workaround, so let's do it again! Gotta attach a screenshot of an earlier draft of this blog post though, since this stuff is impossible to explain in tweets…

…and it went through the roof, becoming the most successful ReC98 tweet so far?! Apparently, y'all really like to just look at descriptions of overly complex bugs that I'd consider way beyond the typical attention span that can be expected from Twitter. Unfortunately, all those tweet impressions didn't quite translate into poll turnout. The results were pretty evenly split between 1) and 2), with option 1) just coming out slightly ahead at 49.1%, compared to 41.5% of option 2).

(And yes, I only noticed after creating the poll that warping to both the green and yellow points made more sense than warping to just one of the two. Let's hope that this additional variant wouldn't have shifted the results too much. Both warp options only got 9.4% of the vote after all, and no one else came up with the idea either. :onricdennat: In the end, you can always merge together your preferred combination of workarounds from the Git branches linked below.)


So here you go: The new definitive version of TH04, containing not only the community-chosen Kurumi and Stage 4 Marisa workaround variant, but also the 📝 No-EMS bugfix from last year. Edit (2022-05-31): This package is outdated, 📝 the current version is here! 2022-04-18-community-choice-fixes.zip Oh, and let's also add spaztron64's TH03 GDC clock fix from 2019 because why not. This binary was built from the community_choice_fixes branch, and you can find the code for all the individual workarounds on these branches:

Again, because it can't be stated often enough: These fixes are fanfiction. The gameplay community should be aware of this, and might decide to handle these cases differently.


With all of that taking way more time to evaluate and document, this research really had to become part of a proper push, instead of just being covered in the quick non-push blog post I initially intended. With ½ of a push left at the end, TH05's Stage 1-5 boss background rendering functions fit in perfectly there. If you wonder how these static backdrop images even need any boss-specific code to begin with, you're right – it's basically the same function copy-pasted 4 times, differing only in the backdrop image coordinates and some other inconsequential details.
Only Sara receives a nice variation of the typical 📝 blocky entrance animation: The usually opaque bitmap data from ST00.BB is instead used as a transition mask from stage tiles to the backdrop image, by making clever use of the tile invalidation system:

TH04 uses the same effect a bit more frequently, for its first three bosses.

Next up: Shinki, for real this time! I've already managed to decompile 10 of her 11 danmaku patterns within a little more than one push – and yes, that one is included in there. Looks like I've slightly overestimated the amount of work required for TH04's and TH05's bosses…