🚚 Summary of: P0227, P0228
Commits: 4f85326...bfd24c6, bfd24c6...739e1d8
💰 Funded by: nrook, [Anonymous]

Starting the year with a delivery that wasn't delayed until the last day of the month for once, nice! Still, very soon and high-maintenance did not go well together…

It definitely wasn't Sara's fault though. As you would expect from a Stage 1 Boss, her code was no challenge at all. Most of the TH02, TH04, and TH05 bosses follow the same overall structure, so let's introduce a new table to replace most of the boilerplate overview text:

| Phase # | Patterns | HP boundary | Timeout condition |
|---|---|---|---|
| (Entrance) | | 4,650 | 288 frames |
| 2 | 4 | 2,550 | 2,568 frames (= 32 patterns) |
| 3 | 4 | 450 | 5,296 frames (= 24 patterns) |
| 4 | 1 | 0 | 1,300 frames |
| Total | 9 | | 9,452 frames |

And that's all the gameplay-relevant detail that ZUN put into Sara's code. It doesn't even make sense to describe the remaining patterns in depth, as their groups can significantly change between difficulties and rank values. The 📝 general code structure of TH05 bosses won't ever make for good-code, but Sara's code is just a lesser example of what I already documented for Shinki.
So, no bugs, no unused content, only inconsequential bloat to be found here, and less than 1 push to get it done… That makes 9 PC-98 Touhou bosses decompiled, with 22 to go, and gets us over the sweet 50% overall finalization mark! 🎉 And sure, it might be possible to pass through the lasers in Sara's final pattern, but the boss script just controls the origin, angle, and activity of lasers, so any quirk there would be part of the laser code… wait, you can do what?!?


TH05 expands TH04's one-off code for Yuuka's Master and Double Sparks into a more featureful laser system, and Sara is the first boss to show it off. Thus, it made sense to look at it again in more detail and finalize the code I had purportedly 📝 reverse-engineered over 4 years ago. That very short delivery notice already hinted at a very time-consuming future finalization of this code, and that prediction certainly came true. On the surface, all of the low-level laser ray rendering and collision detection code is undecompilable: It uses the SI and DI registers without Turbo C++'s safety backups on the stack, and its helper functions take their input and output parameters from convenient registers, completely ignoring common calling conventions. And just to raise the confusion even further, the code doesn't just set these registers for the helper function calls and then restore their original values, but permanently shifts them via additions and subtractions. Unfortunately, these convenient registers also include the BP base pointer to the stack frame of a function… and shifting that register throws any intuition behind accessed local variables right out of the window for a good part of the function, requiring a correctly shifted view of the stack frame just to make sense of it again. :godzun: How could such code even have been written?! This goes well beyond the already wrong assumption that using more stack space is somehow bad, and straight into the territory of self-inflicted pain.

So while it's not a lot of instructions, it's quite dense and really hard to follow. This code would really benefit from a decompilation that anchors all this madness as much as possible in existing C++ structures… so let's decompile it anyway? :tannedcirno:
Doing so would involve emitting lots of raw machine code bytes to hide the SI and DI registers from the compiler, but I already had a certain 📝 batshit insane compiler bug workaround abstraction lying around that could make such code more readable. Hilariously, it only took this one additional use case for that abstraction to reveal itself as premature and way too complicated. :onricdennat: Expanding the core idea into a full-on x86 instruction generator ended up simplifying the code structure a lot. All we really want there is a way to set all potential parameters to e.g. a specific form of the MOV instruction, which can all be expressed as the parameters to a force-inlined __emit__() function. Type safety can help by providing overloads for different operand widths here, but there really is no need for classes, templates, or explicit specialization of templates based on classes. We only need a couple of enums with opcode, register, and prefix constants from the x86 reference documentation, and a set of associated macros that token-paste pseudoregisters onto the prefixes of these enum constants.
And that's how you get a custom compile-time assembler in a 1994 C++ compiler and expand the limits of decompilability even further. What's even truly left now? Self-modifying code, layout tricks that can't be replicated with regularly structured control flow… and that's it. That leaves quite a few functions I previously considered undecompilable to be revisited once I get to work on making this game more portable.
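To make the core idea a bit more tangible, here is a minimal sketch of such an instruction generator. It is written in modern C++ and collects the bytes in a `std::vector`, whereas the real thing would hand them to Turbo C++'s `__emit__()`. All identifiers (`Reg16`, `Opcode`, `Group1`, `mov()`, `group1_imm8()`) are made up for this example and are not the names used in ReC98; only the opcode and register numbers are taken from the x86 reference documentation.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Register numbers as they appear in the ModR/M byte.
enum Reg16 : std::uint8_t {
	R_AX = 0, R_CX = 1, R_DX = 2, R_BX = 3,
	R_SP = 4, R_BP = 5, R_SI = 6, R_DI = 7,
};

enum Opcode : std::uint8_t {
	OP_MOV_R16_RM16 = 0x8B,   // MOV r16, r/m16
	OP_GRP1_RM16_IMM8 = 0x83, // ADD, SUB, … r/m16, sign-extended imm8
};

// Operation selectors for OP_GRP1_RM16_IMM8, stored in the ModR/M `reg` field.
enum Group1 : std::uint8_t { G1_ADD = 0, G1_SUB = 5 };

typedef std::vector<std::uint8_t> Bytes;

// ModR/M byte in register-direct mode (mod = 11b).
inline std::uint8_t modrm_direct(unsigned reg_field, Reg16 rm) {
	return static_cast<std::uint8_t>(0xC0 | (reg_field << 3) | rm);
}

// MOV dst, src
inline Bytes mov(Reg16 dst, Reg16 src) {
	return Bytes{ OP_MOV_R16_RM16, modrm_direct(dst, src) };
}

// ADD dst, imm / SUB dst, imm, i.e., a permanent shift of a register.
inline Bytes group1_imm8(Group1 op, Reg16 dst, int imm) {
	assert((imm >= -128) && (imm <= 127)); // must fit into a signed byte
	return Bytes{
		OP_GRP1_RM16_IMM8,
		modrm_direct(op, dst),
		static_cast<std::uint8_t>(imm & 0xFF),
	};
}
```

With these definitions, `mov(R_SI, R_AX)` yields the bytes `8B F0`, and `group1_imm8(G1_ADD, R_BP, 4)` yields `83 C5 04`, which is the kind of BP shift that makes the stack frame so hard to follow.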

With that, we've turned the low-level laser code into the expected horrible monstrosity that exposes all the hidden complexity in those few ASM instructions. The high-level part should be no big deal now… except that we're immediately bombarded with Fixup overflow errors at link time? Oh well, time to finally learn the true way of fixing this highly annoying issue in a second new piece of decompilation tech – and one that might actually be useful for other x86 Real Mode retro developers at that.
Earlier in the RE history of TH04 and TH05, I often wrote about the need to split the two original code segments into multiple segments within two groups, which makes it possible to slot in code from different translation units at arbitrary places within the original segment. If we don't want to define a unique segment name for each of these slotted-in translation units, we need a way to set custom segment and group names in C land. Turbo C++ offers two #pragmas for that: #pragma option with the -zC (segment name) and -zP (group name) settings, and #pragma codeseg, which takes the segment name followed by the group name.

For the most part, these #pragmas work well, but they seemed to not help much when it came to calling near functions declared in different segments within the same group. It took a bit of trial and error to figure out what was actually going on in that case, but there is a clear logic to it, summarized in code:

#pragma option -zCfoo_TEXT -zPfoo

void bar(void);
void near qux(void); // defined somewhere else, maybe in a different segment

#pragma codeseg baz_TEXT baz

// Despite the segment change in the line above, this function will still be
// put into `foo_TEXT`, the active segment during the first appearance of the
// function name.
void bar(void) {
}

// This function hasn't been declared yet, so it will go into `baz_TEXT` as
// expected.
void baz(void) {
	// This `near` function pointer will be calculated by subtracting the base
	// address of qux()'s declared segment, i.e., `foo_TEXT`, from the
	// flat/linear address of qux() inside the binary.
	void (near *ptr_to_qux)(void) = qux;
}

So yeah, you might have to put #pragma codeseg into your headers to tell the linker about the correct segment of a near function in advance. 🤯 This is an important insight for everyone using this compiler, and I'm shocked that none of the Borland C++ books documented the interaction of code segment definitions and near references at least at this level of clarity. The TASM manuals did have a few pages on the topic of groups, but that syntax obviously doesn't apply to a C compiler. Fixup overflows in particular are such a common error and really deserved better than the unhelpful 🤷 of an explanation that ended up in the User's Guide. Maybe this whole technique of custom code segment names was considered arcane even by 1993, judging from the mere three sentences that #pragma codeseg was documented with? Still, it must have been common knowledge among Amusement Makers, because they couldn't have built these exact binaries without knowing about these details. This is the true solution to 📝 any issues involving references to near functions, and I'm glad to see that ZUN did not in fact lie to the compiler. 👍
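The arithmetic behind a near reference can be modeled in a few lines. This is a deliberately simplified sketch and not how the Borland linker is actually implemented: `NearFixup` and `resolve_near()` are hypothetical names, the addresses in the usage note are invented, and a target below the declared base is simply treated as not fitting.

```cpp
#include <cassert>
#include <cstdint>

struct NearFixup {
	bool fits;            // false: the distance can't be stored in 16 bits
	std::uint16_t offset; // only meaningful if `fits` is true
};

// `declared_base_paragraph` is the Real Mode segment value (address / 16) of
// the segment that was active when the function name first appeared.
inline NearFixup resolve_near(
	std::uint32_t target_linear, std::uint16_t declared_base_paragraph
) {
	assert(target_linear < (1UL << 20)); // 20-bit Real Mode address space
	const std::uint32_t base_linear = (
		std::uint32_t(declared_base_paragraph) * 16
	);
	if(target_linear < base_linear) {
		return NearFixup{ false, 0 };
	}
	const std::uint32_t distance = (target_linear - base_linear);
	if(distance > 0xFFFF) {
		return NearFixup{ false, 0 }; // the "Fixup overflow" case
	}
	return NearFixup{ true, static_cast<std::uint16_t>(distance) };
}
```

For a function at linear address 0x12345, a declared segment at paragraph 0x1000 gives the offset 0x2345, while a declared segment at paragraph 0x1200 gives 0x0345. Both values fit into 16 bits, so a wrong segment declaration can produce a wrong pointer long before the distance grows large enough to be reported as an overflow.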


OK, but now the remaining laser code compiles, and we get to write C++ code to draw some hitboxes during the two collision-detected states of each laser. These confirm what the low-level code from earlier already uncovered: Collision detection against lasers is done by testing a 12×12-pixel box at every 16 pixels along the length of a laser, which leaves obvious 4-pixel gaps at regular intervals that the player can just pass through. :zunpet: This adds 📝 yet 📝 another 📝 quirk to the growing list of quirks that were either intentional or must have been deliberately left in the game after their initial discovery. This is what constants were invented for, and there really is no excuse for not using them – especially during intoxicated coding, and/or if you don't have a compile-time abstraction for Q12.4 literals.

When detecting laser collisions, the game checks the player's single center coordinate against any of the aforementioned 12×12-pixel boxes. Therefore, it's correct to split these 12×12 pixels into two 6×6-pixel boxes and assign the other half to the player for a more natural visualization. Always remember that hitbox visualizations need to keep all colliding entities in mind – 📝 assigning a constant-sized hitbox to "the player" and "the bullets" will be wrong in most other cases.
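The gaps are easy to reproduce in a 1D model along the laser's axis. The sketch below works in whole pixels instead of the game's subpixels, and it assumes that the boxes are centered on the sampled points, that the first box sits at the laser's origin, and that the comparison is strict. None of these details are confirmed by the text above, and `laser_hits_at()` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdlib>

const int BOX_SIZE = 12;     // 12×12-pixel box…
const int BOX_DISTANCE = 16; // …tested every 16 pixels along the laser

// Returns whether a player center point that lies `distance_along_laser`
// pixels away from the laser's origin, exactly on the laser's axis, would
// collide with a laser of the given length.
inline bool laser_hits_at(int distance_along_laser, int laser_length) {
	assert(laser_length >= 0);
	for(int center = 0; center <= laser_length; center += BOX_DISTANCE) {
		if(std::abs(distance_along_laser - center) < (BOX_SIZE / 2)) {
			return true;
		}
	}
	return false;
}
```

Under these assumptions, a point 3 or 19 pixels along the laser is hit, while points at 8 or 24 pixels fall into the 4-pixel gaps between two boxes.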

Using subpixel coordinates in collision detection also introduces a slight inaccuracy into any hitbox visualization recorded in-engine on a 16-color PC-98. Since we have to render discrete pixels, we cannot exactly place a Q12.4 coordinate in the 93.75% of cases where the fractional part is non-zero. This is why pretty much every laser segment hitbox in the video above shows up as 7×7 rather than 6×6: The actual W×H area of each box is 13 pixels smaller, but since the hitbox lies between these pixels, we cannot indicate where it lies exactly, and have to err on the side of caution. It's also why Reimu's box slightly changes size as she moves: Her non-diagonal movement speed is 3.5 pixels per frame, and the constant focused movement in the video above halves that to 1.75 pixels, making her end up on an exact pixel every 4 frames. Looking forward to the glorious future of displays that will allow us to scale up the playfield to 16× its original pixel size, thus rendering the game at its exact internal resolution of 6144×5888 pixels. Such a port would definitely add a lot of value to the game…
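The numbers in this paragraph follow directly from the Q12.4 format, where one pixel consists of 16 subpixels: 15 of the 16 possible fractional values are non-zero (15 / 16 = 93.75%), and 6144 / 16 = 384 and 5888 / 16 = 368 give the playfield size in pixels. The following sketch replays the movement example; the helper names are made up, and raw values are kept in plain `int`s for simplicity.

```cpp
#include <cassert>

const int SUBPIXELS_PER_PIXEL = 16; // Q12.4: 4 fractional bits

// Converts whole pixels plus a number of 1/16ths into a raw Q12.4 value.
inline int to_subpixels(int pixels, int sixteenths) {
	return ((pixels * SUBPIXELS_PER_PIXEL) + sixteenths);
}

inline bool on_exact_pixel(int raw) {
	return ((raw % SUBPIXELS_PER_PIXEL) == 0);
}

// Number of frames of constant movement until an entity that started on an
// exact pixel ends up on an exact pixel again.
inline int frames_until_aligned(int speed_raw) {
	assert(speed_raw > 0);
	int pos = 0;
	int frames = 0;
	do {
		pos += speed_raw;
		frames++;
	} while(!on_exact_pixel(pos));
	return frames;
}
```

A focused speed of 1.75 pixels is `to_subpixels(1, 12)`, or 28 subpixels, and `frames_until_aligned()` returns 4 for it. The unfocused 3.5 pixels (56 subpixels) realign every 2 frames.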

The remaining high-level laser code is rather unremarkable for the most part, but raises one final interesting question: With no explicitly defined limit, how wide can a laser be? Looking at the laser structure's 1-byte width field and the unsigned comparisons all throughout the update and rendering code, the answer seems to be an obvious 255 pixels. However, the laser system also contains an automated shrinking state, which can be most notably seen in Mai's wheel pattern. This state shrinks a laser by 2 pixels every 2 frames until it reaches a width of 0. This presents a problem with odd widths, which would fall below 0 and wrap around to 255 due to the unsigned nature of this variable. So rather than, I don't know, treating width values of 0 as invalid and stopping at a width of 1, or even adding a condition for that specific case, the code just performs a signed comparison, effectively limiting the width of a shrinkable laser to a maximum of 127 pixels. :zunpet: This small signedness inconsistency now forces the distinction between shrinkable and non-shrinkable lasers onto every single piece of code that uses lasers. Yet another instance where 📝 aiming for a cinematic 30 FPS look made the resulting code much more complicated than if ZUN had just evenly spread out the subtraction across 2 frames. 🤷
Oh well, it's not as if any of the fixed lasers in the original scripts came close to any of these limits. Moving lasers are much more streamlined and limited to begin with: Since they're hardcoded to 6 pixels, the game can safely assume that they're always thinner than the 28 pixels they get gradually widened to during their decay animation.
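The effect of mixing an unsigned width field with a signed comparison can be replayed in isolation. This is only a sketch of the described behavior: `shrink_steps()` is a hypothetical function, and the exact form of the comparison (`<= 0` on the width reinterpreted as a signed byte) is an assumption rather than the game's literal code.

```cpp
#include <cassert>
#include <cstdint>

// Returns the number of 2-pixel shrink steps until a laser of the given
// width is considered gone.
inline int shrink_steps(std::uint8_t width) {
	assert(width > 0);
	int steps = 0;
	for(;;) {
		width = static_cast<std::uint8_t>(width - 2); // wraps below 0
		steps++;
		// Signed view of the same byte. Values above 127 turn negative on
		// all mainstream compilers (guaranteed since C++20).
		if(static_cast<std::int8_t>(width) <= 0) {
			return steps;
		}
	}
}
```

Widths of 6 and 5 both take 3 steps, since the odd width wraps to 255 and is then read as -1. A width of 200, however, already counts as negative after the first subtraction, so the laser would disappear after a single step instead of shrinking gradually.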

Finally, in case you were missing a mention of hitboxes in the previous paragraph: Yes, the game always uses the aforementioned 12×12 boxes, regardless of a laser's width.

This video also showcases the 127-pixel limit because I wanted to include the shrink animation for a seamless loop.

That was what, 50% of this blog post just being about complications that made lasers difficult for no reason? Next up: The first TH01 Anniversary Edition build, where I finally get to reap the rewards of having a 100% decompiled game and write some good code for once.

🚚 Summary of: P0190, P0191, P0192
Commits: 5734815...293e16a, 293e16a...71cb7b5, 71cb7b5...e1f3f9f
💰 Funded by: nrook, -Tom-, [Anonymous]

The important things first:

So, Shinki! As far as final boss code is concerned, she's surprisingly economical, with 📝 her background animations making up more than ⅓ of her entire code. Going straight from TH01's 📝 final 📝 bosses to TH05's final boss definitely showed how much ZUN had streamlined danmaku pattern code by the end of PC-98 Touhou. Don't get me wrong, there is still room for improvement: TH05 not only 📝 reuses the same 16 bytes of generic boss state we saw in TH04 last month, but also uses them 4× as often, and even for midbosses. Most importantly though, defining danmaku patterns using a single global instance of the group template structure is just bad no matter how you look at it:

Declaring a separate structure instance with the static data for every pattern would be both safer and more space-efficient, and there's more than enough space left for that in the game's data segment.
But all in all, the pattern functions are short, sweet, and easy to follow. The "devil" pattern is significantly more complex than the others, but still far from TH01's final bosses at their worst. I especially like the clear architectural separation between "one-shot pattern" functions that return true once they're done, and "looping pattern" functions that run as long as they're being called from a boss's main function. Not many all too interesting things in these pattern functions for the most part, except for two pieces of evidence that Shinki was coded after Yumeko:


Speaking about that wing sprite: If you look at ST05.BB2 (or any other file with a large sprite, for that matter), you notice a rather weird file layout:

Raw file layout of TH05's ST05.BB2, demonstrating master.lib's supposed BFNT width limit of 64 pixels
A large sprite split into multiple smaller ones with a width of 64 pixels each? What's this, hardware sprite limitations? On my PC-98?!

And it's not a limitation of the sprite width field in the BFNT+ header either. Instead, it's master.lib's BFNT functions which are limited to sprite widths up to 64 pixels… or at least that's what MASTER.MAN claims. Whatever the restriction was, it seems to be completely nonexistent as of master.lib version 0.23, and none of the master.lib functions used by the games have any issues with larger sprites.
Since ZUN stuck to the supposed 64-pixel width limit though, it's now the game that expects Shinki's winged form to consist of 4 physical sprites, not just 1. Any conversion from another, more logical sprite sheet layout back into BFNT+ must therefore replicate the original number of sprites. Otherwise, the sequential IDs ("patnums") assigned to every newly loaded sprite no longer match ZUN's hardcoded IDs, causing the game to crash. This is exactly what used to happen with -Tom-'s MysticTK automation scripts, which combined these exact sprites into a single large one. This issue has now been fixed – just in case there are some underground modders out there who used these scripts and wonder why their game crashed as soon as the Shinki fight started.


And then the code quality takes a nosedive with Shinki's main function. :onricdennat: Even in TH05, these boss and midboss update functions are still very imperative:

The biggest WTF in there, however, goes to using one of the 16 state bytes as a "relative phase" variable for differentiating between boss phases that share the same branch within the switch(boss.phase) statement. While it's commendable that ZUN tried to reduce code duplication for once, he could have just branched depending on the actual boss.phase variable? The same state byte is then reused in the "devil" pattern to track the activity state of the big jerky lasers in the second half of the pattern. If you somehow managed to end the phase after the first few bullets of the pattern, but before these lasers are up, Shinki's update function would think that you're still in the phase before the "devil" pattern. The main function then sequence-breaks right to the defeat phase, skipping the final pattern with the burning Makai background. Luckily, the HP boundaries are far away enough to make this impossible in practice.
The takeaway here: If you want to use the state bytes for your custom boss script mods, alias them to your own 16-byte structure, and limit each of the bytes to a clearly defined meaning across your entire boss script.
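One possible shape for such an alias is sketched below. `ShinkiState` and its field names are invented for this example and do not reflect the game's actual state layout. In Turbo C++, a plain pointer cast onto the state bytes would be the obvious choice; the sketch copies the bytes with `std::memcpy()` instead to stay within the aliasing rules of modern C++.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

const std::size_t BOSS_STATE_SIZE = 16;

// Hypothetical per-script view: every byte has exactly one meaning.
struct ShinkiState {
	std::uint8_t relative_phase;      // only used for phase branching
	std::uint8_t devil_lasers_active; // only used by the "devil" pattern
	std::uint8_t unused[14];
};
static_assert(
	sizeof(ShinkiState) == BOSS_STATE_SIZE, "view must cover all 16 bytes"
);

template <class T> T load_state(const std::uint8_t* bytes, std::size_t size) {
	assert(size == sizeof(T));
	T view;
	std::memcpy(&view, bytes, sizeof(T));
	return view;
}

template <class T> void store_state(
	std::uint8_t* bytes, std::size_t size, const T& view
) {
	assert(size == sizeof(T));
	std::memcpy(bytes, &view, sizeof(T));
}
```

Because the laser activity flag gets its own byte in this layout, setting it can no longer be mistaken for a change of the relative phase.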

One final discovery that doesn't seem to be documented anywhere yet: Shinki actually has a hidden bomb shield during her two purple-wing phases. uth05win got this part slightly wrong though: It's not a complete shield, and hitting Shinki will still deal 1 point of chip damage per frame. For comparison, the first phase lasts for 3,000 HP, and the "devil" pattern phase lasts for 5,800 HP.

And there we go, 3rd PC-98 Touhou boss script* decompiled, 28 to go! 🎉 In case you were expecting a fix for the Shinki death glitch: That one is more appropriately fixed as part of the Mai & Yuki script. It also requires new code, should ideally look a bit prettier than just removing cheetos between one frame and the next, and I'd still like it to fit within the original position-dependent code layout… Let's do that some other time.
Not much to say about the Stage 1 midboss, or midbosses in general even, except that their update functions have to imperatively handle even more subsystems, due to the relative lack of helper functions.


The remaining ¾ of the third push went to a bunch of smaller RE and finalization work that would have hardly got any attention otherwise, to help secure that 50% RE mark. The nicest piece of code in there shows off what looks like the optimal way of setting up the 📝 GRCG tile register for monochrome blitting in a variable color:

mov ah, palette_index ; Any other non-AL 8-bit register works too.
                      ; (x86 only supports AL as the source operand for OUTs.)

rept 4                ; For all 4 bitplanes…
    shr ah,  1        ; Shift the next color bit into the x86 carry flag
    sbb al,  al       ; Extend the carry flag to a full byte
                      ; (CF=0 → 0x00, CF=1 → 0xFF)
    out 7Eh, al       ; Write AL to the GRCG tile register
endm

Thanks to Turbo C++'s inlining capabilities, the loop body even decompiles into a surprisingly nice one-liner. What a beautiful micro-optimization, at a place where micro-optimization doesn't hurt and is almost expected.
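For readers who don't speak x86, the following portable sketch computes the 4 bytes that the loop writes to port 7Eh, without performing any port I/O. It is not the one-liner from the ReC98 codebase, and `grcg_tile_bytes()` is a hypothetical name.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Returns the tile register bytes for bitplanes 0 to 3, in write order.
inline std::array<std::uint8_t, 4> grcg_tile_bytes(std::uint8_t palette_index) {
	assert(palette_index < 16); // 16-color palette
	std::array<std::uint8_t, 4> ret = {};
	for(std::size_t plane = 0; plane < 4; plane++) {
		// SHR AH, 1: shift the lowest remaining bit into the carry flag.
		const bool carry = ((palette_index & 1) != 0);
		palette_index >>= 1;
		// SBB AL, AL: CF=0 → 0x00, CF=1 → 0xFF.
		ret[plane] = (carry ? 0xFF : 0x00);
	}
	return ret;
}
```

Palette index 5 (binary 0101) therefore results in the byte sequence FF 00 FF 00.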
Unfortunately, the micro-optimizations went all downhill from there, becoming increasingly dumb and undecompilable. Was it really necessary to save 4 x86 instructions in the highly unlikely case of a new spark sprite being spawned outside the playfield? That one 2D polar→Cartesian conversion function then pointed out Turbo C++ 4.0J's woefully limited support for 32-bit micro-optimizations. The code generation for 32-bit 📝 pseudo-registers is so bad that they almost aren't worth using for arithmetic operations, and the inline assembler just flat out doesn't support anything 32-bit. No use in decompiling a function that you'd have to entirely spell out in machine code, especially if the same function already exists in multiple other, more idiomatic C++ variations.
Rounding out the third push, we got the TH04/TH05 DEMO?.REC replay file reading code, which should finally prove that nothing about the game's original replay system could serve as even just the foundation for community-usable replays. Just in case anyone was still thinking that.


Next up: Back to TH01, with the Elis fight! Got a bit of room left in the cap again, and there are a lot of things that would make a lot of sense now:

🚚 Summary of: P0149, P0150, P0151, P0152
Commits: e1a26bb...05e4c4a, 05e4c4a...768251d, 768251d...4d24ca5, 4d24ca5...81fc861
💰 Funded by: Blue Bolt, Ember2528, -Tom-, [Anonymous]

…or maybe not that soon, as it would have only wasted time to untangle the bullet update commits from the rest of the progress. So, here's all the bullet spawning code in TH04 and TH05 instead. I hope you're ready for this, there's a lot to talk about!

(For the sake of readability, "bullets" in this blog post refers to the white 8×8 pellets and all 16×16 bullets loaded from MIKO16.BFT, nothing else.)


But first, what was going on 📝 in 2020? Spent 4 pushes on the basic types and constants back then, still ended up confusing a couple of things, and even getting some wrong. Like how TH05's "bullet slowdown" flag actually always prevents slowdown and fires bullets at a constant speed instead. :tannedcirno: Or how "random spread" is not the best term to describe that unused bullet group type in TH04.
Or that there are two distinct ways of clearing all bullets on screen, which deserve different names:

Mechanic #1: Clearing bullets for a custom amount of time, awarding 1000 points for all bullets alive on the first frame, and 100 points for all bullets spawned during the clear time.
Mechanic #2: Zapping bullets for a fixed 16 frames, awarding a semi-exponential and loudly announced Bonus!! for all bullets alive on the first frame, and preventing new bullets from being spawned during those 16 frames. In TH04 at least; thanks to a ZUN bug, zapping got reduced to 1 frame and no animation in TH05…

Bullets are zapped at the end of most midboss and boss phases, and cleared everywhere else – most notably, during bombs, when losing a life, or as rewards for extends or a maximized Dream bonus. The Bonus!! points awarded for zapping bullets are calculated iteratively, so it's not trivial to give an exact formula for these. For a small number 𝑛 of bullets, it would exactly be 5𝑛³ - 10𝑛² + 15𝑛 points – or, using uth05win's (correct) recursive definition with the step derived from the number of previously counted bullets, Bonus(𝑛) = Bonus(𝑛-1) + 15(𝑛-1)² - 5(𝑛-1) + 10, starting from Bonus(0) = 0. However, one of the internal step variables is capped at a different number of points for each difficulty (and game), after which the points only increase linearly. Hence, "semi-exponential".
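The uncapped part of this calculation can be checked with a few lines of code. The sketch assumes that the per-bullet step is indexed from 0, i.e., that the first bullet is worth 15·0² - 5·0 + 10 = 10 points, which is the reading under which the iterative sum matches the closed form. The difficulty-specific cap is left out because its values are not given here, and both function names are made up.

```cpp
#include <cassert>

inline long bonus_closed_form(long n) {
	return ((5 * n * n * n) - (10 * n * n) + (15 * n));
}

// Adds up the per-bullet step for the bullets with the 0-based indices
// 0 to (n - 1).
inline long bonus_iterative(long n) {
	assert(n >= 0);
	long total = 0;
	for(long k = 0; k < n; k++) {
		total += ((15 * k * k) - (5 * k) + 10);
	}
	return total;
}
```

Both functions return 10, 30, and 90 points for 1, 2, and 3 bullets, and agree for every other small bullet count as well.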


On to TH04's bullet spawn code then, because that one can at least be decompiled. And immediately, we have to deal with a pointless distinction between regular bullets, with either a decelerating or constant velocity, and special bullets, with preset velocity changes during their lifetime. That preset has to be set somewhere, so why have separate functions? In TH04, this separation continues even down to the lowest level of functions, where values are written into the global bullet array. TH05 merges those two functions into one, but then goes too far and uses self-modifying code to save a grand total of two local variables… Luckily, the rest of its actual code is identical to TH04.

Most of the complexity in bullet spawning comes from the (thankfully shared) helper function that calculates the velocities of the individual bullets within a group. Both games handle each group type via a large switch statement, which is where TH04 shows off another Turbo C++ 4.0 optimization: If the range of case values is too sparse to be meaningfully expressed in a jump table, it usually generates a linear search through a second value table. But with the -G command-line option, it instead generates branching code for a binary search through the set of cases. 𝑂(log 𝑛) as the worst case for a switch statement in a C++ compiler from 1994… that's so cool. But still, why are the values in TH04's group type enum all over the place to begin with? :onricdennat:
Unfortunately, this optimization is pretty rare in PC-98 Touhou. It only shows up here and in a few places in TH02, compared to at least 50 switch value tables.

In all of its micro-optimized pointlessness, TH05's undecompilable version at least fixes some of TH04's redundancy. While it's still not even optimal, it's at least a decently written piece of ASM… if you take the time to understand what's going on there, because it certainly took quite a bit of that to verify that all of the things which looked like bugs or quirks were in fact correct. And that's how the code for this function ended up with 35% comments and blank lines before I could confidently call it "reverse-engineered"…
Oh well, at least it finally fixes a correctness issue from TH01 and TH04, where an invalid bullet group type would fill all remaining slots in the bullet array with identical versions of the first bullet.

Something that both games also share in these functions is an over-reliance on globals for return values or other local state. The most ridiculous example here: Tuning the speed of a bullet based on rank actually mutates the global bullet template… which ZUN then works around by adding a wrapper function around both regular and special bullet spawning, which saves the base speed before executing that function, and restores it afterward. :zunpet: Add another set of wrappers to bypass that exact tuning, and you've expanded your nice 1-function interface to 4 functions. Oh, and did I mention that TH04 pointlessly duplicates the first set of wrapper functions for 3 of the 4 difficulties, which can't even be explained with "debugging reasons"? That's 10 functions then… and probably explains why I've procrastinated this feature for so long.
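Reduced to its essence, the save-and-restore pattern looks something like the sketch below. Everything in it is a stand-in: the structure has a single field, the tuning formula is invented, and none of the names match the ones in the games.

```cpp
#include <cassert>

struct BulletTemplate {
	int speed; // hypothetical unit
};

// Single global instance, as in the original games.
BulletTemplate bullet_template = { 32 };

// Stand-in for the rank-based tuning, which mutates the global template.
inline void tune_speed_for_rank(int rank) {
	assert(rank >= 0);
	bullet_template.speed += rank;
}

// Stand-in for the spawn function. Returns the speed the bullet ended up with.
inline int spawn() {
	return bullet_template.speed;
}

// The workaround: save the base speed, tune, spawn, and restore.
inline int spawn_with_tuning(int rank) {
	const int base_speed = bullet_template.speed;
	tune_speed_for_rank(rank);
	const int spawned_speed = spawn();
	bullet_template.speed = base_speed;
	return spawned_speed;
}

// A side-effect-free alternative that wouldn't need any wrapper.
inline int tuned_speed(int base_speed, int rank) {
	return (base_speed + rank);
}
```

A tuning function in the style of `tuned_speed()`, which returns its result instead of writing it back into the template, would have removed the need for both sets of wrappers.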

At this point, I also finally stopped decompiling ZUN's original ASM just for the sake of it. All these small TH05 functions would look horribly unidiomatic, are identical to their decompiled TH04 counterparts anyway, except for some unique constant… and, in the case of TH05's rank-based speed tuning function, actually become undecompilable as soon as we want to return a C++ class to preserve the semantic meaning of the return value. Mainly, this is because Turbo C++ does not allow register pseudo-variables like _AX or _AL to be cast into class types, even if their size matches. Decompiling that function would have therefore lowered the quality of the rest of the decompiled code, in exchange for the additional maintenance and compile-time cost of another translation unit. Not worth it – and for a TH05 port, you'd already have to decompile all the rest of the bullet spawning code anyway!


The only thing in there that was still somewhat worth being decompiled was the pre-spawn clipping and collision detection function. Due to what's probably a micro-optimization mistake, the TH05 version continues to spawn a bullet even if it was spawned on top of the player. This might sound like it has a different effect on gameplay… until you realize that the player got hit in this case and will either lose a life or deathbomb, both of which will cause all on-screen bullets to be cleared anyway. So it's at most a visual glitch.

But while we're at it, can we please stop talking about hitboxes? At least in the context of TH04 and TH05 bullets. The actual collision detection is described way better as a kill delta of 8×8 pixels between the center points of the player and a bullet. You can distribute these pixels to any combination of bullet and player "hitboxes" that make up 8×8. 4×4 around both the player and bullets? 1×1 for bullets, and 8×8 for the player? All equally valid… or perhaps none of them, once you keep in mind that other entity types might have different kill deltas. With that in mind, the concept of a "hitbox" turns into just a confusing abstraction.

The same is true for the 36×44 graze box delta. For some reason, this one is not exactly around the center of a bullet, but shifted to the right by 2 pixels. So, a bullet can be grazed up to 20 pixels right of the player, but only up to 16 pixels left of the player. uth05win also spotted this… and rotated the deltas clockwise by 90°?!
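Both points can be expressed in code. The sketch below uses whole pixels rather than subpixels, and the exact treatment of the boundary values (strict for the kill delta, inclusive for the graze delta) as well as the vertically centered graze range are assumptions. All names are hypothetical.

```cpp
#include <cassert>
#include <cstdlib>

const int KILL_DELTA = 8;

// 1D overlap test for two boxes of the given sizes whose center points are
// `center_distance` pixels apart.
inline bool boxes_overlap_1d(int center_distance, int size_a, int size_b) {
	assert((size_a >= 0) && (size_b >= 0));
	return ((2 * std::abs(center_distance)) < (size_a + size_b));
}

// The same test, expressed as a delta between two center points.
inline bool within_kill_delta_1d(int center_distance) {
	return boxes_overlap_1d(center_distance, KILL_DELTA, 0);
}

// `dx` and `dy` are the bullet's center minus the player's center.
// 36 pixels wide and shifted to the right by 2, 44 pixels high.
inline bool within_graze_delta(int dx, int dy) {
	return ((dx >= -16) && (dx <= 20) && (std::abs(dy) <= 22));
}
```

Every split of the 8 pixels between `size_a` and `size_b` gives the same result as `within_kill_delta_1d()`, which is why no single split deserves to be called "the" hitbox.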


Which brings us to the bullet updates… for which I still had to research a decompilation workaround, because 📝 P0148 turned out to not help at all? Instead, the solution was to lie to the compiler about the true segment distance of the popup function and declare its signature far rather than near. This allowed ZUN to save that ridiculous overhead of 1 additional far function call/return per frame, and those precious 2 bytes in the BSS segment that he didn't have to spend on a segment value. 📝 Another function that didn't have just a single declaration in a common header file… really, 📝 how were these games even built???

The function itself is among the longer ones in both games. It especially stands out in the indentation department, with 7 levels at its most indented point – and that's the minimum of what's possible without goto. Only two more notable discoveries there:

  1. Bullets are the only entity affected by Slow Mode. If the number of bullets on screen is ≥ (24 + (difficulty * 8) + rank) in TH04, or (42 + (difficulty * 8)) in TH05, Slow Mode reduces the frame rate by 33%, by waiting for one additional VSync event every two frames.
    The code also reveals a second tier, with 50% slowdown for a slightly higher number of bullets, but that conditional branch can never be executed :zunpet:
  2. Bullets must have been grazed in a previous frame before they can be collided with. (Note how this does not apply to bullets that spawned on top of the player, as explained earlier!)
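The Slow Mode thresholds and the resulting frame rate can be written down as follows. The function and enum names are hypothetical, difficulty is assumed to be a 0-based integer, and the sketch does not try to model which of the two frames receives the additional VSync wait.

```cpp
#include <cassert>

enum Game { TH04, TH05 };

// Minimum number of on-screen bullets at which Slow Mode kicks in.
inline int slow_mode_threshold(Game game, int difficulty, int rank) {
	assert(difficulty >= 0);
	return (game == TH04)
		? (24 + (difficulty * 8) + rank)
		: (42 + (difficulty * 8));
}

inline bool slow_mode_active(int bullet_count, int threshold) {
	return (bullet_count >= threshold);
}

// One additional VSync wait every two frames.
inline int vsyncs_for_frames(int frames, bool slow) {
	return (frames + (slow ? (frames / 2) : 0));
}
```

With slowdown active, 60 frames take 90 VSync events, which is the 33% reduction in frame rate mentioned in the first item.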

Whew… When did ReC98 turn into a full-on code review?! 😅 And after all this, we're still not done with TH04 and TH05 bullets, with all the special movement types still missing. That should be less than one push though, once we get to it. Next up: Back to TH01 and Konngara! Now have fun rewriting the Touhou Wiki Gameplay pages 😛

🚚 Summary of: P0109
Commits: dcf4e2c...2c7d86b
💰 Funded by: [Anonymous], Blue Bolt

Back to TH05! Thanks to the good funding situation, I can strike a nice balance between getting TH05 position-independent as quickly as possible, and properly reverse-engineering some missing important parts of the game. Once 100% PI gets the attention of modders, the code will then be in better shape, and a bit more usable than if I just rushed that goal.

By now, I'm apparently also pretty spoiled by TH01's immediate decompilability, after having worked on that game for so long. Reverse-engineering in ASM land is pretty annoying, after all, since it basically boils down to meticulously editing a piece of ASM into something I can confidently call "reverse-engineered". Most of the time, simply decompiling that piece of code would take just a little bit longer, but be massively more useful. So, I immediately tried decompiling with TH05… and it just worked, at every place I tried!? Whatever the issue was that made 📝 segment splitting so annoying at my first attempt, I seem to have completely solved it in the meantime. 🤷 So yeah, backers can now request pretty much any part of TH04 and TH05 to be decompiled immediately, with no additional segment splitting cost.

(Protip for everyone interested in starting their own ReC project: Just declare one segment per function, right from the start, then group them together to restore the original code segmentation…)


Except that TH05 then just throws more of its infamous micro-optimized and undecompilable ASM at you. 🙄 This push covered the function that adjusts the bullet group template based on rank and the selected difficulty, called every time such a group is configured. Which, just like pretty much all of TH05's bullet spawning code, is one of those undecompilable functions. If C allowed labels of other functions as goto targets, it might have been decompilable into something useful to modders… maybe. But like this, there's no point in even trying.

This is such a terrible idea from a software architecture point of view, I can't even. Because now, you suddenly have to mirror your C++ declarations in ASM land, and keep them in sync with each other. I'm always happy when I get to delete an ASM declaration from the codebase once I've decompiled all the instances where it was referenced. But for TH05, we now have to keep those declarations around forever. 😕 And all that for a performance increase you probably couldn't even measure. Oh well, pulling off Galaxy Brain-level ASM optimizations is kind of fun if you don't have portability plans… I guess?

If I started a full fangame mod of a PC-98 Touhou game, I'd base it on TH04 rather than TH05, and backport selected features from TH05 as needed. Just because it was released later doesn't make it better, and this is by far not the only one of ZUN's micro-optimizations that just went way too far.

Dropping down to ASM also makes it easier to introduce weird quirks. Decompiled, one of TH05's tuning conditions for stack groups on Easy Mode would look something like:

case BP_STACK:
	// […]
	if(spread_angle_delta >= 2) {
		stack_bullet_count--;
	}

The fields of the bullet group template aren't typically reset when setting up a new group. So, spread_angle_delta in the context of a stack group effectively refers to "the delta angle of the last spread group that was fired before this stack – whenever that was". uth05win also spotted this quirk, considered it a bug, and wrote fanfiction by changing spread_angle_delta to stack_bullet_count.
As usual for functions that occur in more than one game, I also decompiled the TH04 bullet group tuning function, and it's perfectly sane, with no such quirks.
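
To make the quirk a bit more tangible, here's a minimal sketch in C of how a template that is never reset leaks the previous spread's delta angle into a later stack group. Only `BP_STACK`, `spread_angle_delta`, `stack_bullet_count`, and the quoted condition come from the game; the struct layout, `BP_SPREAD`, and the `setup_*()` and `tune_easy()` helpers are hypothetical and only exist for illustration.

```c
/* Hypothetical, heavily simplified bullet group template. The real
 * structure has more fields; only the two names below are original. */
typedef enum { BP_SPREAD, BP_STACK } group_type_t;

typedef struct {
	group_type_t group;
	int spread_angle_delta; /* only written when setting up a spread */
	int stack_bullet_count; /* only written when setting up a stack */
} bullet_template_t;

/* Hypothetical setup helpers. Like the patterns described above, they
 * only write the fields their own group type cares about. */
void setup_spread(bullet_template_t *t, int angle_delta)
{
	t->group = BP_SPREAD;
	t->spread_angle_delta = angle_delta;
}

void setup_stack(bullet_template_t *t, int bullet_count)
{
	t->group = BP_STACK;
	t->stack_bullet_count = bullet_count;
	/* spread_angle_delta keeps whatever the last spread left in it. */
}

/* Contains nothing but the Easy Mode condition quoted above. */
void tune_easy(bullet_template_t *t)
{
	switch(t->group) {
	case BP_STACK:
		if(t->spread_angle_delta >= 2) {
			t->stack_bullet_count--;
		}
		break;
	default:
		break;
	}
}
```

In this sketch, firing a spread with a delta of 8 followed by a 3-bullet stack leaves the stack with 2 bullets after tuning, while the same stack following a spread with a delta of 1 keeps all 3. The stack's own parameters are identical in both cases.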


In the more PI-focused parts of this push, we got the TH05-exclusive smooth boss movement functions, for flying randomly or towards a given point. Pretty unspectacular for the most part, but we've got yet another uth05win inconsistency in the latter one. Once the Y coordinate gets close enough to the target point, it actually speeds up twice as much as the X coordinate would, whereas uth05win used the same speedup factors for both. This might make uth05win a couple of frames slower in all boss fights from Stage 3 on. Hard to measure though – and boss movement partly depends on RNG anyway.


Next up: Shinki's background animations – which are actually the single biggest source of position dependence left in TH05.

📝 Posted:
🚚 Summary of:
P0072, P0073, P0074, P0075
Commits:
4bb04ab...cea3ea6, cea3ea6...5286417, 5286417...1807906, 1807906...222fc99
💰 Funded by:
[Anonymous], -Tom-, Myles
🏷 Tags:

Long time no see! And this is exactly why I've been procrastinating bullets while there was still meaningful progress to be had in other parts of TH04 and TH05: There was bound to be quite some complexity in this most central piece of game logic, and so I couldn't possibly get to a satisfying understanding in just one push.

Or in two, because their rendering involves another bunch of micro-optimized functions adapted from master.lib.

Or in three, because we'd like to actually name all the bullet sprites, since there are a number of sprite ID-related conditional branches. And so, I was refining things I supposedly RE'd in the commits from the first push until the very end of the fourth.

When we talk about "bullets" in TH04 and TH05, we mean just two things: the white 8×8 pellets, with a cap of 240 in TH04 and 180 in TH05, and any 16×16 sprites from MIKO16.BFT, with a cap of 200 in TH04 and 220 in TH05. These are by far the most common types of… err, "things the player can collide with", and so ZUN provides a whole bunch of pre-made motion, animation, and n-way spread / ring / stack group options for those, which can be selected by simply setting a few fields in the bullet template. All the other "non-bullets" have to be fired and controlled individually.

Which is nothing new, since uth05win covered this part pretty accurately – I don't think anyone could just make up these structure member overloads. The interesting insights here all come from applying this research to TH04, and figuring out its differences compared to TH05. The most notable one there is in the default groups: TH05 allows you to add a stack to any single bullet, n-way spread or ring, but TH04 only lets you create stacks separately from n-way spreads and rings, and thus gets by with fewer fields in its bullet template structure. On the other hand, TH04 has a separate "n-way spread with random angles, yet still aimed at the player" group? Which seems to be unused, at least as far as midbosses and bosses are concerned; can't say anything about stage enemies yet.

In fact, TH05's larger bullet template structure illustrates that these distinct group types actually are a rather redundant piece of over-engineering. You can perfectly indicate any permutation of the basic groups through just the stack bullet count (1 = no stack), spread bullet count (1 = no spread), and spread delta angle (0 = ring instead of spread). Add a 4-flag bitfield to cover the rest (aim to player, randomize angle, randomize speed, force single bullet regardless of difficulty or rank), and the result would be less redundant and even slightly more capable.
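
As a rough sketch of what such a less redundant template could look like in C: the three counts and the four flag meanings are taken from the paragraph above, but every identifier, the field types, and the two helper functions are assumptions made up for this example, not anything found in the games.

```c
/* Hypothetical flag names for the four properties listed above. */
enum {
	BF_AIM_AT_PLAYER = 1 << 0,
	BF_RANDOM_ANGLE  = 1 << 1,
	BF_RANDOM_SPEED  = 1 << 2,
	BF_FORCE_SINGLE  = 1 << 3  /* ignore difficulty and rank tuning */
};

/* Hypothetical group description without a separate group type. */
typedef struct {
	unsigned char stack_count;  /* 1 = no stack */
	unsigned char spread_count; /* 1 = no spread */
	int spread_angle_delta;     /* 0 = ring instead of spread */
	unsigned char flags;        /* combination of BF_* */
} bullet_group_t;

/* A ring is any multi-bullet spread without a delta angle. */
int group_is_ring(const bullet_group_t *g)
{
	return ((g->spread_count > 1) && (g->spread_angle_delta == 0));
}

/* Total bullets fired by one group. Treating BF_FORCE_SINGLE as
 * "always exactly one bullet" is this sketch's own interpretation. */
unsigned int group_bullet_total(const bullet_group_t *g)
{
	if(g->flags & BF_FORCE_SINGLE) {
		return 1;
	}
	return ((unsigned int)(g->stack_count) * g->spread_count);
}
```

With this layout, an aimed 16-way ring with a 3-stack would be `{ 3, 16, 0, BF_AIM_AT_PLAYER }`, and a plain single bullet would be `{ 1, 1, 0, 0 }`, without needing a dedicated group type for either.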

Even those 4 pushes didn't quite finish all of the bullet-related types, stopping just shy of the most trivial and consistent enum that defines special movement. This also left us in a 📝 TH03-like situation, in which we're still a bit away from actually converting all this research into actual RE%. Oh well, at least this got us way past 50% in overall position independence. On to the second half! 🎉

For the next push though, we'll first have a quick detour to the remaining C code of all the ZUN.COM binaries. Now that the 📝 TH04 and TH05 resident structures no longer block those, -Tom- has requested TH05's RES_KSO.COM to be covered in one of his outstanding pushes. And since 32th System recently RE'd TH03's resident structure, it makes sense to also review and merge that, before decompiling all three remaining RES_*.COM binaries in hopefully a single push. It might even get done faster than that, in which case I'll then review and merge some more of WindowsTiger's research.

📝 Posted:
🚚 Summary of:
P0046
Commits:
612beb8...deb45ea
💰 Funded by:
-Tom-
🏷 Tags:

Stumbled across one more drawing function in the way… which was only a duplicated and seemingly pointlessly micro-optimized copy of master.lib's super_roll_put_tiny() function, used for fast display of 4-color 16×16 sprites.

With this out of the way, we can tackle player shot sprite animation next. This will get rid of a lot of code, since every power level of every character's shot type is implemented in its own function. Which makes up thousands of instructions in both TH04 and TH05 that we can nicely decompile in the future without going through a dedicated reverse-engineering step.

📝 Posted:
🚚 Summary of:
P0023, P0024
Commits:
807df3d...0cde4b7
💰 Funded by:
zorg
🏷 Tags:

Actually, I lied, and lasers ended up coming with everything that makes reverse-engineering ZUN code so difficult: weirdly reused variables, unexpected structures within structures, and those TH05-specific nasty, premature ASM micro-optimizations that will waste a lot of time during decompilation, since the majority of the code actually was C, except for where it wasn't.