⮜ Blog

⮜ List of tags

Showing all posts tagged th05-, bullet- and micro-optimization-

📝 Posted:
🚚 Summary of:
P0149, P0150, P0151, P0152
e1a26bb...05e4c4a, 05e4c4a...768251d, 768251d...4d24ca5, 4d24ca5...81fc861
💰 Funded by:
Blue Bolt, Ember2528, -Tom-, [Anonymous]
🏷 Tags:
rec98+ th04+ th05- gameplay+ bullet- animation+ score+ glitch+ jank+ waste+ micro-optimization- tcc+ uth05win+

…or maybe not that soon, as it would have only wasted time to untangle the bullet update commits from the rest of the progress. So, here's all the bullet spawning code in TH04 and TH05 instead. I hope you're ready for this, there's a lot to talk about!

(For the sake of readability, "bullets" in this blog post refers to the white 8×8 pellets and all 16×16 bullets loaded from MIKO16.BFT, nothing else.)

But first, what was going on 📝 in 2020? Spent 4 pushes on the basic types and constants back then, still ended up confusing a couple of things, and even getting some wrong. Like how TH05's "bullet slowdown" flag actually always prevents slowdown and fires bullets at a constant speed instead. :tannedcirno: Or how "random spread" is not the best term to describe that unused bullet group type in TH04.
Or that there are two distinct ways of clearing all bullets on screen, which deserve different names:

Bullets are zapped at the end of most midboss and boss phases, and cleared everywhere else – most notably, during bombs, when losing a life, or as rewards for extends or a maximized Dream bonus. The Bonus!! points awarded for zapping bullets are calculated iteratively, so it's not trivial to give an exact formula for these. For a small number 𝑛 of bullets, it would exactly be 5𝑛³ - 10𝑛² + 15𝑛 points – or, using uth05win's (correct) recursive definition, Bonus(𝑛) = Bonus(𝑛-1) + 15𝑛² - 5𝑛 + 10. However, one of the internal step variables is capped at a different number of points for each difficulty (and game), after which the points only increase linearly. Hence, "semi-exponential".

On to TH04's bullet spawn code then, because that one can at least be decompiled. And immediately, we have to deal with a pointless distinction between regular bullets, with either a decelerating or constant velocity, and special bullets, with preset velocity changes during their lifetime. That preset has to be set somewhere, so why have separate functions? In TH04, this separation continues even down to the lowest level of functions, where values are written into the global bullet array. TH05 merges those two functions into one, but then goes too far and uses self-modifying code to save a grand total of two local variables… Luckily, the rest of its actual code is identical to TH04.

Most of the complexity in bullet spawning comes from the (thankfully shared) helper function that calculates the velocities of the individual bullets within a group. Both games handle each group type via a large switch statement, which is where TH04 shows off another Turbo C++ 4.0 optimization: If the range of case values is too sparse to be meaningfully expressed in a jump table, it usually generates a linear search through a second value table. But with the -G command-line option, it instead generates branching code for a binary search through the set of cases. 𝑂(log 𝑛) as the worst case for a switch statement in a C++ compiler from 1994… that's so cool. But still, why are the values in TH04's group type enum all over the place to begin with? :onricdennat:
Unfortunately, this optimization is pretty rare in PC-98 Touhou. It only shows up here and in a few places in TH02, compared to at least 50 switch value tables.

In all of its micro-optimized pointlessness, TH05's undecompilable version at least fixes some of TH04's redundancy. While it's still not even optimal, it's at least a decently written piece of ASM… if you take the time to understand what's going on there, because it certainly took quite a bit of that to verify that all of the things which looked like bugs or quirks were in fact correct. And that's how the code for this function ended up with 35% comments and blank lines before I could confidently call it "reverse-engineered"…
Oh well, at least it finally fixes a correctness issue from TH01 and TH04, where an invalid bullet group type would fill all remaining slots in the bullet array with identical versions of the first bullet.

Something that both games also share in these functions is an over-reliance on globals for return values or other local state. The most ridiculous example here: Tuning the speed of a bullet based on rank actually mutates the global bullet template… which ZUN then works around by adding a wrapper function around both regular and special bullet spawning, which saves the base speed before executing that function, and restores it afterward. :zunpet: Add another set of wrappers to bypass that exact tuning, and you've expanded your nice 1-function interface to 4 functions. Oh, and did I mention that TH04 pointlessly duplicates the first set of wrapper functions for 3 of the 4 difficulties, which can't even be explained with "debugging reasons"? That's 10 functions then… and probably explains why I've procrastinated this feature for so long.

At this point, I also finally stopped decompiling ZUN's original ASM just for the sake of it. All these small TH05 functions would look horribly unidiomatic, are identical to their decompiled TH04 counterparts anyway, except for some unique constant… and, in the case of TH05's rank-based speed tuning function, actually become undecompilable as soon as we want to return a C++ class to preserve the semantic meaning of the return value. Mainly, this is because Turbo C++ does not allow register pseudo-variables like _AX or _AL to be cast into class types, even if their size matches. Decompiling that function would have therefore lowered the quality of the rest of the decompiled code, in exchange for the additional maintenance and compile-time cost of another translation unit. Not worth it – and for a TH05 port, you'd already have to decompile all the rest of the bullet spawning code anyway!

The only thing in there that was still somewhat worth being decompiled was the pre-spawn clipping and collision detection function. Due to what's probably a micro-optimization mistake, the TH05 version continues to spawn a bullet even if it was spawned on top of the player. This might sound like it has a different effect on gameplay… until you realize that the player got hit in this case and will either lose a life or deathbomb, both of which will cause all on-screen bullets to be cleared anyway. So it's at most a visual glitch.

But while we're at it, can we please stop talking about hitboxes? At least in the context of TH04 and TH05 bullets. The actual collision detection is described way better as a kill delta of 8×8 pixels between the center points of the player and a bullet. You can distribute these pixels to any combination of bullet and player "hitboxes" that make up 8×8. 4×4 around both the player and bullets? 1×1 for bullets, and 8×8 for the player? All equally valid… or perhaps none of them, once you keep in mind that other entity types might have different kill deltas. With that in mind, the concept of a "hitbox" turns into just a confusing abstraction.

The same is true for the 36×44 graze box delta. For some reason, this one is not exactly around the center of a bullet, but shifted to the right by 2 pixels. So, a bullet can be grazed up to 20 pixels right of the player, but only up to 16 pixels left of the player. uth05win also spotted this… and rotated the deltas clockwise by 90°?!

Which brings us to the bullet updates… for which I still had to research a decompilation workaround, because 📝 P0148 turned out to not help at all? Instead, the solution was to lie to the compiler about the true segment distance of the popup function and declare its signature far rather than near. This allowed ZUN to save that ridiculous overhead of 1 additional far function call/return per frame, and those precious 2 bytes in the BSS segment that he didn't have to spend on a segment value. 📝 Another function that didn't have just a single declaration in a common header file… really, 📝 how were these games even built???

The function itself is among the longer ones in both games. It especially stands out in the indentation department, with 7 levels at its most indented point – and that's the minimum of what's possible without goto. Only two more notable discoveries there:

  1. Bullets are the only entity affected by Slow Mode. If the number of bullets on screen is ≥ (24 + (difficulty * 8) + rank) in TH04, or (42 + (difficulty * 8)) in TH05, Slow Mode reduces the frame rate by 33%, by waiting for one additional VSync event every two frames.
    The code also reveals a second tier, with 50% slowdown for a slightly higher number of bullets, but that conditional branch can never be executed :zunpet:
  2. Bullets must have been grazed in a previous frame before they can be collided with. (Note how this does not apply to bullets that spawned on top of the player, as explained earlier!)

Whew… When did ReC98 turn into a full-on code review?! 😅 And after all this, we're still not done with TH04 and TH05 bullets, with all the special movement types still missing. That should be less than one push though, once we get to it. Next up: Back to TH01 and Konngara! Now have fun rewriting the Touhou Wiki Gameplay pages 😛

📝 Posted:
🚚 Summary of:
💰 Funded by:
[Anonymous], Blue Bolt
🏷 Tags:
rec98+ th04+ th05- gameplay+ bullet- micro-optimization- glitch+ uth05win+ tasm+

Back to TH05! Thanks to the good funding situation, I can strike a nice balance between getting TH05 position-independent as quickly as possible, and properly reverse-engineering some missing important parts of the game. Once 100% PI will get the attention of modders, the code will then be in better shape, and a bit more usable than if I just rushed that goal.

By now, I'm apparently also pretty spoiled by TH01's immediate decompilability, after having worked on that game for so long. Reverse-engineering in ASM land is pretty annoying, after all, since it basically boils down to meticulously editing a piece of ASM into something I can confidently call "reverse-engineered". Most of the time, simply decompiling that piece of code would take just a little bit longer, but be massively more useful. So, I immediately tried decompiling with TH05… and it just worked, at every place I tried!? Whatever the issue was that made 📝 segment splitting so annoying at my first attempt, I seem to have completely solved it in the meantime. 🤷 So yeah, backers can now request pretty much any part of TH04 and TH05 to be decompiled immediately, with no additional segment splitting cost.

(Protip for everyone interested in starting their own ReC project: Just declare one segment per function, right from the start, then group them together to restore the original code segmentation…)

Except that TH05 then just throws more of its infamous micro-optimized and undecompilable ASM at you. 🙄 This push covered the function that adjusts the bullet group template based on rank and the selected difficulty, called every time such a group is configured. Which, just like pretty much all of TH05's bullet spawning code, is one of those undecompilable functions. If C allowed labels of other functions as goto targets, it might have been decompilable into something useful to modders… maybe. But like this, there's no point in even trying.

This is such a terrible idea from a software architecture point of view, I can't even. Because now, you suddenly have to mirror your C++ declarations in ASM land, and keep them in sync with each other. I'm always happy when I get to delete an ASM declaration from the codebase once I've decompiled all the instances where it was referenced. But for TH05, we now have to keep those declarations around forever. 😕 And all that for a performance increase you probably couldn't even measure. Oh well, pulling off Galaxy Brain-level ASM optimizations is kind of fun if you don't have portability plans… I guess?

If I started a full fangame mod of a PC-98 Touhou game, I'd base it on TH04 rather than TH05, and backport selective features from TH05 as needed. Just because it was released later doesn't make it better, and this is by far not the only one of ZUN's micro-optimizations that just went way too far.

Dropping down to ASM also makes it easier to introduce weird quirks. Decompiled, one of TH05's tuning conditions for stack groups on Easy Mode would look something like:

case BP_STACK:
	// […]
	if(spread_angle_delta >= 2) {
The fields of the bullet group template aren't typically reset when setting up a new group. So, spread_angle_delta in the context of a stack group effectively refers to "the delta angle of the last spread group that was fired before this stack – whenever that was". uth05win also spotted this quirk, considered it a bug, and wrote fanfiction by changing spread_angle_delta to stack_bullet_count.
As usual for functions that occur in more than one game, I also decompiled the TH04 bullet group tuning function, and it's perfectly sane, with no such quirks.

In the more PI-focused parts of this push, we got the TH05-exclusive smooth boss movement functions, for flying randomly or towards a given point. Pretty unspectacular for the most part, but we've got yet another uth05win inconsistency in the latter one. Once the Y coordinate gets close enough to the target point, it actually speeds up twice as much as the X coordinate would, whereas uth05win used the same speedup factors for both. This might make uth05win a couple of frames slower in all boss fights from Stage 3 on. Hard to measure though – and boss movement partly depends on RNG anyway.

Next up: Shinki's background animations – which are actually the single biggest source of position dependence left in TH05.

📝 Posted:
🚚 Summary of:
P0072, P0073, P0074, P0075
4bb04ab...cea3ea6, cea3ea6...5286417, 5286417...1807906, 1807906...222fc99
💰 Funded by:
[Anonymous], -Tom-, Myles
🏷 Tags:
rec98+ th04+ th05- gameplay+ blitting+ bullet- micro-optimization- uth05win+

Long time no see! And this is exactly why I've been procrastinating bullets while there was still meaningful progress to be had in other parts of TH04 and TH05: There was bound to be quite some complexity in this most central piece of game logic, and so I couldn't possibly get to a satisfying understanding in just one push.

Or in two, because their rendering involves another bunch of micro-optimized functions adapted from master.lib.

Or in three, because we'd like to actually name all the bullet sprites, since there are a number of sprite ID-related conditional branches. And so, I was refining things I supposedly RE'd in the the commits from the first push until the very end of the fourth.

When we talk about "bullets" in TH04 and TH05, we mean just two things: the white 8×8 pellets, with a cap of 240 in TH04 and 180 in TH05, and any 16×16 sprites from MIKO16.BFT, with a cap of 200 in TH04 and 220 in TH05. These are by far the most common types of… err, "things the player can collide with", and so ZUN provides a whole bunch of pre-made motion, animation, and n-way spread / ring / stack group options for those, which can be selected by simply setting a few fields in the bullet template. All the other "non-bullets" have to be fired and controlled individually.

Which is nothing new, since uth05win covered this part pretty accurately – I don't think anyone could just make up these structure member overloads. The interesting insights here all come from applying this research to TH04, and figuring out its differences compared to TH05. The most notable one there is in the default groups: TH05 allows you to add a stack to any single bullet, n-way spread or ring, but TH04 only lets you create stacks separately from n-way spreads and rings, and thus gets by with fewer fields in its bullet template structure. On the other hand, TH04 has a separate "n-way spread with random angles, yet still aimed at the player" group? Which seems to be unused, at least as far as midbosses and bosses are concerned; can't say anything about stage enemies yet.

In fact, TH05's larger bullet template structure illustrates that these distinct group types actually are a rather redundant piece of over-engineering. You can perfectly indicate any permutation of the basic groups through just the stack bullet count (1 = no stack), spread bullet count (1 = no spread), and spread delta angle (0 = ring instead of spread). Add a 4-flag bitfield to cover the rest (aim to player, randomize angle, randomize speed, force single bullet regardless of difficulty or rank), and the result would be less redundant and even slightly more capable.

Even those 4 pushes didn't quite finish all of the bullet-related types, stopping just shy of the most trivial and consistent enum that defines special movement. This also left us in a 📝 TH03-like situation, in which we're still a bit away from actually converting all this research into actual RE%. Oh well, at least this got us way past 50% in overall position independence. On to the second half! 🎉

For the next push though, we'll first have a quick detour to the remaining C code of all the ZUN.COM binaries. Now that the 📝 TH04 and TH05 resident structures no longer block those, -Tom- has requested TH05's RES_KSO.COM to be covered in one of his outstanding pushes. And since 32th System recently RE'd TH03's resident structure, it makes sense to also review and merge that, before decompiling all three remaining RES_*.COM binaries in hopefully a single push. It might even get done faster than that, in which case I'll then review and merge some more of WindowsTiger's research.