⮜ Blog

⮜ List of tags

Showing all posts tagged
,
and

📝 Posted:
🏷 Tags:

Turns out I was not quite done with the TH01 Anniversary Edition yet. You might have noticed some white streaks at the beginning of Sariel's second form, which are in fact a bug that I accidentally added to the initial release. :tannedcirno:
These can be traced back to a quirk I wasn't aware of, and hadn't documented so far. When defeating Sariel's first form during a pattern that spawns pellets, it's likely for the second form to start with additional pellets that resemble the previous pattern, but come out of seemingly nowhere. This shouldn't really happen if you look at the code: Nothing outside the typical pattern code spawns new pellets, and all existing ones are reset before the form transition…

Except if they're currently showing the 10-frame delay cloud animation , activated for all pellets during the symmetrical radial 2-ring pattern in Phase 2 and left activated for the rest of the fight. These pellets will continue their animation after the transition to the second form, and turn into regular pellets you have to dodge once their animation completed.

By itself, this is just one more quirk to keep in mind during refactoring. It only turned into a bug in the Anniversary Edition because the game tracks the number of living pellets in a separate counter variable. After resetting all pellets, this counter is simply set to 0, regardless of any delay cloud pellets that may still be alive, and it's merely incremented or decremented when pellets are spawned or leave the playfield. :zunpet:
In the original game, this counter is only used as an optimization to skip spawning new pellets once the cap is reached. But with batched EGC-accelerated unblitting, it also makes sense to skip the rather costly setup and shutdown of the EGC if no pellets are active anyway. Except if the counter you use to check for that case can be 0 even if there are pellets alive, which consequently don't get unblitted… :onricdennat:
There is an optimal fix though: Instead of unconditionally resetting the living pellet counter to 0, we decrement it for every pellet that does get reset. This preserves the quirk and gives us a consistently correct counter, allowing us to still skip every unnecessary loop over the pellet array.

Cutting out the lengthy defeat animation makes it easier to see where the additional pellets come from.
Cutting out the lengthy defeat animation makes it easier to see where the additional pellets come from. Also, note how regular unblitting resumes once the first pellet gets clipped at the top of the playfield – the living pellet counter then gets decremented to -1, and who uses <= rather than == on a seemingly unsigned counter, right?
Cutting out the lengthy defeat animation makes it easier to see where the additional pellets come from.

Ultimately, this was a harmless bug that didn't affect gameplay, but it's still something that players would have probably reported a few more times. So here's a free bugfix:

:th01: TH01 Anniversary Edition, version P0234-1 2023-03-14-th01-anniv.zip

Thanks to mu021 for reporting this issue and providing helpful videos to identify the cause!

📝 Posted:
🚚 Summary of:
P0201, P0202
Commits:
9342665...ff49e9e, ff49e9e...4568bf7
💰 Funded by:
Ember2528, Yanga, [Anonymous]
🏷 Tags:

The positive:

The negative:

The overview:


This time, we're back to the Orb hitbox being a logical 49×49 pixels in SinGyoku's center, and the shot hitbox being the weird one. What happens if you want the shot hitbox to be both offset to the left a bit and stretch the entire width of SinGyoku's sprite? You get a hitbox that ends in mid-air, far away from the right edge of the sprite:

Due to VRAM byte alignment, all player shots fired between gx = 376 and gx = 383 inclusive appear at the same visual X position, but are internally already partly outside the hitbox and therefore won't hit SinGyoku – compare the marked shot at gx = 376 to the one at gx = 380. So much for precisely visualizing hitboxes in this game…

Since the female and male forms also use the sphere entity's coordinates, they share the same hitbox.


Onto the rendering glitches then, which can – you guessed it – all be found in the sphere form's slam movement:

By having the sphere move from the right edge of the playfield to the left, this video demonstrates both the lazy reblitting and broken unblitting at the right edge for negative X velocities. Also, isn't it funny how Reimu can partly disappear from all the sloppy SinGyoku-related unblitting going on after her sprite was blitted?

Due to the low contrast of the sphere against the background, you typically don't notice these glitches, but the white invincibility flashing after a hit really does draw attention to them. This time, all of these glitches aren't even directly caused by ZUN having never learned about the EGC's bit length register – if he just wrote correct code for SinGyoku, none of this would have been an issue. Sigh… I wonder how many more glitches will be caused by improper use of this one function in the last 18% of REIIDEN.EXE.

There's even another bug here, with ZUN hardcoding a horizontal delta of 8 pixels rather than just passing the actual X velocity. Luckily, the maximum movement speed is 6 pixels on Lunatic, and this would have only turned into an additional observable glitch if the X velocity were to exceed 24 pixels. But that just means it's the kind of bug that still drains RE attention to prove that you can't actually observe it in-game under some circumstances.


The 5 pellet patterns are all pretty straightforward, with nothing to talk about. The code architecture during phase 2 does hint towards ZUN having had more creative patterns in mind – especially for the male form, which uses the transformation function's three pattern callback slots for three repetitions of the same pellet group.
There is one more oddity to be found at the very end of the fight:

The first frame of TH01 SinGyoku's defeat animation, showing the sphere blitted on top of a potentially active person form

Right before the defeat white-out animation, the sphere form is explicitly reblitted for no reason, on top of the form that was blitted to VRAM in the previous frame, and regardless of which form is currently active. If SinGyoku was meant to immediately transform back to the sphere form before being defeated, why isn't the person form unblitted before then? Therefore, the visibility of both forms is undeniably canon, and there is some lore meaning to be found here… :thonk:
In any case, that's SinGyoku done! 6th PC-98 Touhou boss fully decompiled, 25 remaining.


No FUUIN.EXE code rounding out the last push for a change, as the 📝 remaining missile code has been waiting in front of SinGyoku for a while. It already looked bad in November, but the angle-based sprite selection function definitely takes the cake when it comes to unnecessary and decadent floating-point abuse in this game.
The algorithm itself is very trivial: Even with 📝 .PTN requiring an additional quarter parameter to access 16×16 sprites, it's essentially just one bit shift, one addition, and one binary AND. For whatever reason though, ZUN casts the 8-bit missile angle into a 64-bit double, which turns the following explicit comparisons (!) against all possible 4 + 16 boundary angles (!!) into FPU operations. :zunpet: Even with naive and readable division and modulo operations, and the whole existence of this function not playing well with Turbo C++ 4.0J's terrible code generation at all, this could have been 3 lines of code and 35 un-inlined constant-time instructions. Instead, we've got this 207-instruction monster… but hey, at least it works. 🤷
The remaining time then went to YuugenMagan's initialization code, which allowed me to immediately remove more declarations from ASM land, but more on that once we get to the rest of that boss fight.

That leaves 76 functions until we're done with TH01! Next up: Card-flipping stage obstacles.

📝 Posted:
🚚 Summary of:
P0184, P0185
Commits:
f9d983e...f918298, f918298...a21ab3d
💰 Funded by:
-Tom-, Blue Bolt, [Anonymous]
🏷 Tags:

Two years after 📝 the first look at TH04's and TH05's bullets, we finally get to finish their logic code by looking at the special motion types. Bullets as a whole still aren't completely finished as the rendering code is still waiting to be RE'd, but now we've got everything about them that's required for decompiling the midboss and boss fights of these games.

Just like the motion types of TH01's pellets, the ones we've got here really are special enough to warrant an enum, despite all the overlap in the "slow down and turn" and "bounce at certain edges of the playfield" types. Sure, including them in the bitfield I proposed two years ago would have allowed greater variety, but it wouldn't have saved any memory. On the contrary: These types use a single global state variable for the maximum turn count and delta speed, which a proper customizable architecture would have to integrate into the bullet structure. Maybe it is possible to stuff everything into the same amount of bytes, but not without first completely rearchitecting the bullet structure and removing every single piece of redundancy in there. Simply extending the system by adding a new enum value for a new motion type would be way more straightforward for modders.

Speaking about memory, TH05 already extends the bullet structure by 6 bytes for the "exact linear movement" type exclusive to that game. This type is particularly interesting for all the prospective PC-98 game developers out there, as it nicely points out the precision limits of Q12.4 subpixels.
Regular bullet movement works by adding a Q12.4 velocity to a Q12.4 position every frame, with the velocity typically being calculated only once on spawn time from an 8-bit angle and a Q12.4 speed. Quantization errors from this initial calculation can quickly compound over all the frames a bullet spends moving across the playfield. If a bullet is only supposed to move on a straight line though, there is a more precise way of calculating its position: By storing the origin point, movement angle, and total distance traveled, you can perform a full polar→Cartesian transformation every frame. Out of the 10 danmaku patterns in TH05 that use this motion type, the difference to regular bullet movement can be best seen in Louise's final pattern:

Louise's final pattern in its original form, demonstrating exact linear bullet movement. Note how each bullet spawns slightly behind the delay cloud: ZUN simply forgot to shift the fixed origin point along with it.
The same pattern with standard bullet movement, corrupting its intended appearance. No delay cloud-related oversights here though, at least.

Not far away from the regular bullet code, we've also got the movement function for the infamous curve / "cheeto" bullets. I would have almost called them "cheetos" in the code as well, which surely fits more nicely into 8.3 filenames than "curve bullets" does, but eh, trademarks…

As for hitboxes, we got a 16×16 one on the head node, and a 12×12 one on the 16 trail nodes. The latter simply store the position of the head node during the last 16 frames, Snake style. But what you're all here for is probably the turning and homing algorithm, right? Boiled down to its essence, it works like this:

// [head] points to the controlled "head" part of a curve bullet entity.
// Angles are stored with 8 bits representing a full circle, providing free
// normalization on arithmetic overflow.
// The directions are ordered as you would expect:
// • 0x00: right	(sin(0x00) =  0, cos(0x00) = +1)
// • 0x40: down 	(sin(0x40) = +1, cos(0x40) =  0)
// • 0x80: left 	(sin(0x80) =  0, cos(0x80) = -1)
// • 0xC0: up   	(sin(0xC0) = -1, cos(0xC0) =  0)
uint8_t angle_delta = (head->angle - player_angle_from(
	head->pos.cur.x, head->pos.cur.y
));

// Stop turning if the player is 1/128ths of a circle away from this bullet
const uint8_t SNAP = 0x02;

// Else, turn either clockwise or counterclockwise by 1/256th of a circle,
// depending on what would reach the player the fastest.
if((angle_delta > SNAP) && (angle_delta < static_cast<uint8_t>(-SNAP))) {
	angle_delta = (angle_delta >= 0x80) ? -0x01 : +0x01;
}
head_p->angle -= angle_delta;

5 lines of code, and not all too difficult to follow once you are familiar with 8-bit angles… unlike what ZUN actually wrote. Which is 26 lines, and includes an unused "friction" variable that is never set to any value that makes a difference in the formula. :zunpet: uth05win correctly saw through that all and simplified this code to something equivalent to my explanation. Redoing that work certainly wasted a bit of my time, and means that I now definitely need to spend another push on RE'ing all the shared boss functions before I can start with Shinki.

So while a curve bullet's speed does get faster over time, its angular velocity is always limited to 1/256th of a circle per frame. This reveals the optimal strategy for dodging them: Maximize this delta angle by staying as close to 180° away from their current direction as possible, and let their acceleration do the rest.

At least that's the theory for dodging a single one. As a danmaku designer, you can now of course place other bullets at these technically optimal places to prevent a curve bullet pattern from being cheesed like that. I certainly didn't record the video above in a single take either… :tannedcirno:


After another bunch of boring entity spawn and update functions, the playfield shaking feature turned out as the most notable (and tricky) one to round out these two pushes. It's actually implemented quite well in how it simply "un-shakes" the screen by just marking every stage tile to be redrawn. In the context of all the other tile invalidation that can take place during a frame, that's definitely more performant than 📝 doing another EGC-accelerated memmove(). Due to these two games being double-buffered via page flipping, this invalidation only really needs to happen for the frame after the next one though. The immediately next frame will show the regular, un-shaken playfield on the other VRAM page first, except during the multi-frame shake animation when defeating a midboss, where it will also appear shifted in a different direction… 😵 Yeah, no wonder why ZUN just always invalidates all stage tiles for the next two frames after every shaking animation, which is guaranteed to handle both sporadic single-frame shakes and continuous ones. So close to good-code here.

Finally, this delivery was delayed a bit because -Tom- requested his round-up amount to be limited to the cap in the future. Since that makes it kind of hard to explain on a static page how much money he will exactly provide, I now properly modeled these discounts in the website code. The exact round-up amount is now included in both the pre-purchase breakdown, as well as the cap bar on the main page.
With that in place, the system is now also set up for round-up offers from other patrons. If you'd also like to support certain goals in this way, with any amount of money, now's the time for getting in touch with me about that. Known contributors only, though! 😛

Next up: The final bunch of shared boring boss functions. Which certainly will give me a break from all the maintenance and research work, and speed up delivery progress again… right?

📝 Posted:
🚚 Summary of:
P0165, P0166, P0167
Commits:
7a0e5d8...f2bca01, f2bca01...e697907, e697907...c2de6ab
💰 Funded by:
Ember2528
🏷 Tags:

OK, TH01 missile bullets. Can we maybe have a well-behaved entity type, without any weirdness? Just once?

Ehh, kinda. Apart from another 150 bytes wasted on unused structure members, this code is indeed more on the low end in terms of overall jank. It does become very obvious why dodging these missiles in the YuugenMagan, Mima, and Elis fights feels so awful though: An unfair 46×46 pixel hitbox around Reimu's center pixel, combined with the comeback of 📝 interlaced rendering, this time in every stage. ZUN probably did this because missiles are the only 16×16 sprite in TH01 that is blitted to unaligned X positions, which effectively ends up touching a 32×16 area of VRAM per sprite.
But even if we assume VRAM writes to be the bottleneck here, it would have been totally possible to render every missile in every frame at roughly the same amount of CPU time that the original game uses for interlaced rendering:

That's an optimization that would have significantly benefitted the game, in contrast to all of the fake ones introduced in later games. Then again, this optimization is actually something that the later games do, and it might have in fact been necessary to achieve their higher bullet counts without significant slowdown.

Unfortunately, it was only worth decompiling half of the missile code right now, thanks to gratuitous FPU usage in the other half, where 📝 double variables are compared to float literals. That one will have to wait 📝 until after SinGyoku.


After some effectively unused Mima sprite effect code that is so broken that it's impossible to make sense out of it, we get to the final feature I wanted to cover for all bosses in parallel before returning to Sariel: The separate sprite background storage for moving or animated boss sprites in the Mima, Elis, and Sariel fights. But, uh… why is this necessary to begin with? Doesn't TH01 already reserve the other VRAM page for backgrounds?
Well, these sprites are quite big, and ZUN didn't want to blit them from main memory on every frame. After all, TH01 and TH02 had a minimum required clock speed of 33 MHz, half of the speed required for the later three games. So, he simply blitted these boss sprites to both VRAM pages, leading the usual unblitting calls to only remove the other sprites on top of the boss. However, these bosses themselves want to move across the screen… and this makes it necessary to save the stage background behind them in some other way.

Enter .PTN, and its functions to capture a 16×16 or 32×32 square from VRAM into a sprite slot. No problem with that approach in theory, as the size of all these bigger sprites is a multiple of 32×32; splitting a larger sprite into these smaller 32×32 chunks makes the code look just a little bit clumsy (and, of course, slower).
But somewhere during the development of Mima's fight, ZUN apparently forgot that those sprite backgrounds existed. And once Mima's 🚫 casting sprite is blitted on top of her regular sprite, using just regular sprite transparency, she ends up with her infamous third arm:

TH01 Mima's third arm

Ironically, there's an unused code path in Mima's unblit function where ZUN assumes a height of 48 pixels for Mima's animation sprites rather than the actual 64. This leads to even clumsier .PTN function calls for the bottom 128×16 pixels… Failing to unblit the bottom 16 pixels would have also yielded that third arm, although it wouldn't have looked as natural. Still wouldn't say that it was intentional; maybe this casting sprite was just added pretty late in the game's development?


So, mission accomplished, Sariel unblocked… at 2¼ pushes. :thonk: That's quite some time left for some smaller stage initialization code, which bundles a bunch of random function calls in places where they logically really don't belong. The stage opening animation then adds a bunch of VRAM inter-page copies that are not only redundant but can't even be understood without knowing the hidden internal state of the last VRAM page accessed by previous ZUN code…
In better news though: Turbo C++ 4.0 really doesn't seem to have any complexity limit on inlining arithmetic expressions, as long as they only operate on compile-time constants. That's how we get macro-free, compile-time Shift-JIS to JIS X 0208 conversion of the individual code points in the 東方★靈異伝 string, in a compiler from 1994. As long as you don't store any intermediate results in variables, that is… :tannedcirno:

But wait, there's more! With still ¼ of a push left, I also went for the boss defeat animation, which includes the route selection after the SinGyoku fight.
As in all other instances, the 2× scaled font is accomplished by first rendering the text at regular 1× resolution to the other, invisible VRAM page, and then scaled from there to the visible one. However, the route selection is unique in that its scaled text is both drawn transparently on top of the stage background (not onto a black one), and can also change colors depending on the selection. It would have been no problem to unblit and reblit the text by rendering the 1× version to a position on the invisible VRAM page that isn't covered by the 2× version on the visible one, but ZUN (needlessly) clears the invisible page before rendering any text. :zunpet: Instead, he assigned a separate VRAM color for both the 魔界 and 地獄 options, and only changed the palette value for these colors to white or gray, depending on the correct selection. This is another one of the 📝 rare cases where TH01 demonstrates good use of PC-98 hardware, as the 魔界へ and 地獄へ strings don't need to be reblitted during the selection process, only the Orb "cursor" does.

Then, why does this still not count as good-code? When changing palette colors, you kinda need to be aware of everything else that can possibly be on screen, which colors are used there, and which aren't and can therefore be used for such an effect without affecting other sprites. In this case, well… hover over the image below, and notice how Reimu's hair and the bomb sprites in the HUD light up when Makai is selected:

Demonstration of palette changes in TH01's route selection

This push did end on a high note though, with the generic, non-SinGyoku version of the defeat animation being an easily parametrizable copy. And that's how you decompile another 2.58% of TH01 in just slightly over three pushes.


Now, we're not only ready to decompile Sariel, but also Kikuri, Elis, and SinGyoku without needing any more detours into non-boss code. Thanks to the current TH01 funding subscriptions, I can plan to cover most, if not all, of Sariel in a single push series, but the currently 3 pending pushes probably won't suffice for Sariel's 8.10% of all remaining code in TH01. We've got quite a lot of not specifically TH01-related funds in the backlog to pass the time though.

Due to recent developments, it actually makes quite a lot of sense to take a break from TH01: spaztron64 has managed what every Touhou download site so far has failed to do: Bundling all 5 game onto a single .HDI together with pre-configured PC-98 emulators and a nice boot menu, and hosting the resulting package on a proper website. While this first release is already quite good (and much better than my attempt from 2014), there is still a bit of room for improvement to be gained from specific ReC98 research. Next up, therefore:

📝 Posted:
🚚 Summary of:
P0160, P0161
Commits:
e491cd7...42ba4a5, 42ba4a5...81dd96e
💰 Funded by:
Yanga, [Anonymous]
🏷 Tags:

Nothing really noteworthy in TH01's stage timer code, just yet another HUD element that is needlessly drawn into VRAM. Sure, ZUN applies his custom boldfacing effect on top of the glyphs retrieved from font ROM, but he could have easily installed those modified glyphs as gaiji.
Well, OK, halfwidth gaiji aren't exactly well documented, and sometimes not even correctly emulated 📝 due to the same PC-98 hardware oddity I was researching last month. I've reserved two of the pending anonymous "anything" pushes for the conclusion of this research, just in case you were wondering why the outstanding workload is now lower after the two delivered here.

And since it doesn't seem to be clearly documented elsewhere: Every 2 ticks on the stage timer correspond to 4 frames.


So, TH01 rank pellet speed. The resident pellet speed value is a factor ranging from a minimum of -0.375 up to a maximum of 0.5 (pixels per frame), multiplied with the difficulty-adjusted base speed for each pellet and added on top of that same speed. This multiplier is modified

Apparently, ZUN noted that these deltas couldn't be losslessly stored in an IEEE 754 floating-point variable, and therefore didn't store the pellet speed factor exactly in a way that would correspond to its gameplay effect. Instead, it's stored similar to Q12.4 subpixels: as a simple integer, pre-multiplied by 40. This results in a raw range of -15 to 20, which is what the undecompiled ASM calls still use. When spawning a new pellet, its base speed is first multiplied by that factor, and then divided by 40 again. This is actually quite smart: The calculation doesn't need to be aware of either Q12.4 or the 40× format, as ((Q12.4 * factor×40) / factor×40) still comes out as a Q12.4 subpixel even if all numbers are integers. The only limiting issue here would be the potential overflow of the 16-bit multiplication at unadjusted base speeds of more than 50 pixels per frame, but that'd be seriously unplayable.
So yeah, pellet speed modifications are indeed gradual, and don't just fall into the coarse three "high, normal, and low" categories.


That's ⅝ of P0160 done, and the continue and pause menus would make good candidates to fill up the remaining ⅜… except that it seemed impossible to figure out the correct compiler options for this code?
The issues centered around the two effects of Turbo C++ 4.0J's -O switch:

  1. Optimizing jump instructions: merging duplicate successive jumps into a single one, and merging duplicated instructions at the end of conditional branches into a single place under a single branch, which the other branches then jump to
  2. Compressing ADD SP and POP CX stack-clearing instructions after multiple successive CALLs to __cdecl functions into a single ADD SP with the combined parameter stack size of all function calls

But how can the ASM for these functions exhibit #1 but not #2? How can it be seemingly optimized and unoptimized at the same time? The only option that gets somewhat close would be -O- -y, which emits line number information into the .OBJ files for debugging. This combination provides its own kind of #1, but these functions clearly need the real deal.

The research into this issue ended up consuming a full push on its own. In the end, this solution turned out to be completely unrelated to compiler options, and instead came from the effects of a compiler bug in a totally different place. Initializing a local structure instance or array like

const uint4_t flash_colors[3] = { 3, 4, 5 };

always emits the { 3, 4, 5 } array into the program's data segment, and then generates a call to the internal SCOPY@ function which copies this data array to the local variable on the stack. And as soon as this SCOPY@ call is emitted, the -O optimization #1 is disabled for the entire rest of the translation unit?!
So, any code segment with an SCOPY@ call followed by __cdecl functions must strictly be decompiled from top to bottom, mirroring the original layout of translation units. That means no TH01 continue and pause menus before we haven't decompiled the bomb animation, which contains such an SCOPY@ call. 😕
Luckily, TH01 is the only game where this bug leads to significant restrictions in decompilation order, as later games predominantly use the pascal calling convention, in which each function itself clears its stack as part of its RET instruction.


What now, then? With 51% of REIIDEN.EXE decompiled, we're slowly running out of small features that can be decompiled within ⅜ of a push. Good that I haven't been looking a lot into OP.EXE and FUUIN.EXE, which pretty much only got easy pieces of code left to do. Maybe I'll end up finishing their decompilations entirely within these smaller gaps?
I still ended up finding one more small piece in REIIDEN.EXE though: The particle system, seen in the Mima fight.

I like how everything about this animation is contained within a single function that is called once per frame, but ZUN could have really consolidated the spawning code for new particles a bit. In Mima's fight, particles are only spawned from the top and right edges of the screen, but the function in fact contains unused code for all other 7 possible directions, written in quite a bloated manner. This wouldn't feel quite as unused if ZUN had used an angle parameter instead… :thonk: Also, why unnecessarily waste another 40 bytes of the BSS segment?

But wait, what's going on with the very first spawned particle that just stops near the bottom edge of the screen in the video above? Well, even in such a simple and self-contained function, ZUN managed to include an off-by-one error. This one then results in an out-of-bounds array access on the 80th frame, where the code attempts to spawn a 41st particle. If the first particle was unlucky to be both slow enough and spawned away far enough from the bottom and right edges, the spawning code will then kill it off before its unblitting code gets to run, leaving its pixel on the screen until something else overlaps it and causes it to be unblitted.
Which, during regular gameplay, will quickly happen with the Orb, all the pellets flying around, and your own player movement. Also, the RNG can easily spawn this particle at a position and velocity that causes it to leave the screen more quickly. Kind of impressive how ZUN laid out the structure of arrays in a way that ensured practically no effect of this bug on the game; this glitch could have easily happened every 80 frames instead. He almost got close to all bugs canceling out each other here! :godzun:

Next up: The player control functions, including the second-biggest function in all of PC-98 Touhou.

📝 Posted:
🚚 Summary of:
P0149, P0150, P0151, P0152
Commits:
e1a26bb...05e4c4a, 05e4c4a...768251d, 768251d...4d24ca5, 4d24ca5...81fc861
💰 Funded by:
Blue Bolt, Ember2528, -Tom-, [Anonymous]
🏷 Tags:

…or maybe not that soon, as it would have only wasted time to untangle the bullet update commits from the rest of the progress. So, here's all the bullet spawning code in TH04 and TH05 instead. I hope you're ready for this, there's a lot to talk about!

(For the sake of readability, "bullets" in this blog post refers to the white 8×8 pellets and all 16×16 bullets loaded from MIKO16.BFT, nothing else.)


But first, what was going on 📝 in 2020? Spent 4 pushes on the basic types and constants back then, still ended up confusing a couple of things, and even getting some wrong. Like how TH05's "bullet slowdown" flag actually always prevents slowdown and fires bullets at a constant speed instead. :tannedcirno: Or how "random spread" is not the best term to describe that unused bullet group type in TH04.
Or that there are two distinct ways of clearing all bullets on screen, which deserve different names:

Mechanic #1: Clearing bullets for a custom amount of time, awarding 1000 points for all bullets alive on the first frame, and 100 points for all bullets spawned during the clear time.
Mechanic #2: Zapping bullets for a fixed 16 frames, awarding a semi-exponential and loudly announced Bonus!! for all bullets alive on the first frame, and preventing new bullets from being spawned during those 16 frames. In TH04 at least; thanks to a ZUN bug, zapping got reduced to 1 frame and no animation in TH05…

Bullets are zapped at the end of most midboss and boss phases, and cleared everywhere else – most notably, during bombs, when losing a life, or as rewards for extends or a maximized Dream bonus. The Bonus!! points awarded for zapping bullets are calculated iteratively, so it's not trivial to give an exact formula for these. For a small number 𝑛 of bullets, it would exactly be 5𝑛³ - 10𝑛² + 15𝑛 points – or, using uth05win's (correct) recursive definition, Bonus(𝑛) = Bonus(𝑛-1) + 15𝑛² - 5𝑛 + 10. However, one of the internal step variables is capped at a different number of points for each difficulty (and game), after which the points only increase linearly. Hence, "semi-exponential".


On to TH04's bullet spawn code then, because that one can at least be decompiled. And immediately, we have to deal with a pointless distinction between regular bullets, with either a decelerating or constant velocity, and special bullets, with preset velocity changes during their lifetime. That preset has to be set somewhere, so why have separate functions? In TH04, this separation continues even down to the lowest level of functions, where values are written into the global bullet array. TH05 merges those two functions into one, but then goes too far and uses self-modifying code to save a grand total of two local variables… Luckily, the rest of its actual code is identical to TH04.

Most of the complexity in bullet spawning comes from the (thankfully shared) helper function that calculates the velocities of the individual bullets within a group. Both games handle each group type via a large switch statement, which is where TH04 shows off another Turbo C++ 4.0 optimization: If the range of case values is too sparse to be meaningfully expressed in a jump table, it usually generates a linear search through a second value table. But with the -G command-line option, it instead generates branching code for a binary search through the set of cases. 𝑂(log 𝑛) as the worst case for a switch statement in a C++ compiler from 1994… that's so cool. But still, why are the values in TH04's group type enum all over the place to begin with? :onricdennat:
Unfortunately, this optimization is pretty rare in PC-98 Touhou. It only shows up here and in a few places in TH02, compared to at least 50 switch value tables.

In all of its micro-optimized pointlessness, TH05's undecompilable version at least fixes some of TH04's redundancy. While it's still not even optimal, it's at least a decently written piece of ASM… if you take the time to understand what's going on there, because it certainly took quite a bit of that to verify that all of the things which looked like bugs or quirks were in fact correct. And that's how the code for this function ended up with 35% comments and blank lines before I could confidently call it "reverse-engineered"…
Oh well, at least it finally fixes a correctness issue from TH01 and TH04, where an invalid bullet group type would fill all remaining slots in the bullet array with identical versions of the first bullet.

Something that both games also share in these functions is an over-reliance on globals for return values or other local state. The most ridiculous example here: Tuning the speed of a bullet based on rank actually mutates the global bullet template… which ZUN then works around by adding a wrapper function around both regular and special bullet spawning, which saves the base speed before executing that function, and restores it afterward. :zunpet: Add another set of wrappers to bypass that exact tuning, and you've expanded your nice 1-function interface to 4 functions. Oh, and did I mention that TH04 pointlessly duplicates the first set of wrapper functions for 3 of the 4 difficulties, which can't even be explained with "debugging reasons"? That's 10 functions then… and probably explains why I've procrastinated this feature for so long.

At this point, I also finally stopped decompiling ZUN's original ASM just for the sake of it. All these small TH05 functions would look horribly unidiomatic, are identical to their decompiled TH04 counterparts anyway, except for some unique constant… and, in the case of TH05's rank-based speed tuning function, actually become undecompilable as soon as we want to return a C++ class to preserve the semantic meaning of the return value. Mainly, this is because Turbo C++ does not allow register pseudo-variables like _AX or _AL to be cast into class types, even if their size matches. Decompiling that function would have therefore lowered the quality of the rest of the decompiled code, in exchange for the additional maintenance and compile-time cost of another translation unit. Not worth it – and for a TH05 port, you'd already have to decompile all the rest of the bullet spawning code anyway!


The only thing in there that was still somewhat worth being decompiled was the pre-spawn clipping and collision detection function. Due to what's probably a micro-optimization mistake, the TH05 version continues to spawn a bullet even if it was spawned on top of the player. This might sound like it has a different effect on gameplay… until you realize that the player got hit in this case and will either lose a life or deathbomb, both of which will cause all on-screen bullets to be cleared anyway. So it's at most a visual glitch.

But while we're at it, can we please stop talking about hitboxes? At least in the context of TH04 and TH05 bullets. The actual collision detection is described way better as a kill delta of 8×8 pixels between the center points of the player and a bullet. You can distribute these pixels to any combination of bullet and player "hitboxes" that make up 8×8. 4×4 around both the player and bullets? 1×1 for bullets, and 8×8 for the player? All equally valid… or perhaps none of them, once you keep in mind that other entity types might have different kill deltas. With that in mind, the concept of a "hitbox" turns into just a confusing abstraction.

The same is true for the 36×44 graze box delta. For some reason, this one is not exactly around the center of a bullet, but shifted to the right by 2 pixels. So, a bullet can be grazed up to 20 pixels right of the player, but only up to 16 pixels left of the player. uth05win also spotted this… and rotated the deltas clockwise by 90°?!


Which brings us to the bullet updates… for which I still had to research a decompilation workaround, because 📝 P0148 turned out to not help at all? Instead, the solution was to lie to the compiler about the true segment distance of the popup function and declare its signature far rather than near. This allowed ZUN to save that ridiculous overhead of 1 additional far function call/return per frame, and those precious 2 bytes in the BSS segment that he didn't have to spend on a segment value. 📝 Another function that didn't have just a single declaration in a common header file… really, 📝 how were these games even built???

The function itself is among the longer ones in both games. It especially stands out in the indentation department, with 7 levels at its most indented point – and that's the minimum of what's possible without goto. Only two more notable discoveries there:

  1. Bullets are the only entity affected by Slow Mode. If the number of bullets on screen is ≥ (24 + (difficulty * 8) + rank) in TH04, or (42 + (difficulty * 8)) in TH05, Slow Mode reduces the frame rate by 33%, by waiting for one additional VSync event every two frames.
    The code also reveals a second tier, with 50% slowdown for a slightly higher number of bullets, but that conditional branch can never be executed :zunpet:
  2. Bullets must have been grazed in a previous frame before they can be collided with. (Note how this does not apply to bullets that spawned on top of the player, as explained earlier!)

Whew… When did ReC98 turn into a full-on code review?! 😅 And after all this, we're still not done with TH04 and TH05 bullets, with all the special movement types still missing. That should be less than one push though, once we get to it. Next up: Back to TH01 and Konngara! Now have fun rewriting the Touhou Wiki Gameplay pages 😛

📝 Posted:
🚚 Summary of:
P0109
Commits:
dcf4e2c...2c7d86b
💰 Funded by:
[Anonymous], Blue Bolt
🏷 Tags:

Back to TH05! Thanks to the good funding situation, I can strike a nice balance between getting TH05 position-independent as quickly as possible, and properly reverse-engineering some missing important parts of the game. Once 100% PI will get the attention of modders, the code will then be in better shape, and a bit more usable than if I just rushed that goal.

By now, I'm apparently also pretty spoiled by TH01's immediate decompilability, after having worked on that game for so long. Reverse-engineering in ASM land is pretty annoying, after all, since it basically boils down to meticulously editing a piece of ASM into something I can confidently call "reverse-engineered". Most of the time, simply decompiling that piece of code would take just a little bit longer, but be massively more useful. So, I immediately tried decompiling with TH05… and it just worked, at every place I tried!? Whatever the issue was that made 📝 segment splitting so annoying at my first attempt, I seem to have completely solved it in the meantime. 🤷 So yeah, backers can now request pretty much any part of TH04 and TH05 to be decompiled immediately, with no additional segment splitting cost.

(Protip for everyone interested in starting their own ReC project: Just declare one segment per function, right from the start, then group them together to restore the original code segmentation…)


Except that TH05 then just throws more of its infamous micro-optimized and undecompilable ASM at you. 🙄 This push covered the function that adjusts the bullet group template based on rank and the selected difficulty, called every time such a group is configured. Which, just like pretty much all of TH05's bullet spawning code, is one of those undecompilable functions. If C allowed labels of other functions as goto targets, it might have been decompilable into something useful to modders… maybe. But like this, there's no point in even trying.

This is such a terrible idea from a software architecture point of view, I can't even. Because now, you suddenly have to mirror your C++ declarations in ASM land, and keep them in sync with each other. I'm always happy when I get to delete an ASM declaration from the codebase once I've decompiled all the instances where it was referenced. But for TH05, we now have to keep those declarations around forever. 😕 And all that for a performance increase you probably couldn't even measure. Oh well, pulling off Galaxy Brain-level ASM optimizations is kind of fun if you don't have portability plans… I guess?

If I started a full fangame mod of a PC-98 Touhou game, I'd base it on TH04 rather than TH05, and backport selected features from TH05 as needed. Just because it was released later doesn't make it better, and this is by far not the only one of ZUN's micro-optimizations that just went way too far.

Dropping down to ASM also makes it easier to introduce weird quirks. Decompiled, one of TH05's tuning conditions for stack groups on Easy Mode would look something like:

case BP_STACK:
	// […]
	if(spread_angle_delta >= 2) {
		stack_bullet_count--;
	}

The fields of the bullet group template aren't typically reset when setting up a new group. So, spread_angle_delta in the context of a stack group effectively refers to "the delta angle of the last spread group that was fired before this stack – whenever that was". uth05win also spotted this quirk, considered it a bug, and wrote fanfiction by changing spread_angle_delta to stack_bullet_count.
As usual for functions that occur in more than one game, I also decompiled the TH04 bullet group tuning function, and it's perfectly sane, with no such quirks.


In the more PI-focused parts of this push, we got the TH05-exclusive smooth boss movement functions, for flying randomly or towards a given point. Pretty unspectacular for the most part, but we've got yet another uth05win inconsistency in the latter one. Once the Y coordinate gets close enough to the target point, it actually speeds up twice as much as the X coordinate would, whereas uth05win used the same speedup factors for both. This might make uth05win a couple of frames slower in all boss fights from Stage 3 on. Hard to measure though – and boss movement partly depends on RNG anyway.


Next up: Shinki's background animations – which are actually the single biggest source of position dependence left in TH05.

📝 Posted:
🚚 Summary of:
P0099, P0100, P0101, P0102
Commits:
1799d67...1b25830, 1b25830...ceb81db, ceb81db...c11a956, c11a956...b60f38d
💰 Funded by:
Ember2528, Yanga
🏷 Tags:

Well, make that three days. Trying to figure out all the details behind the sprite flickering was absolutely dreadful…
It started out easy enough, though. Unsurprisingly, TH01 had a quite limited pellet system compared to TH04 and TH05:

As expected from TH01, the code comes with its fair share of smaller, insignificant ZUN bugs and oversights. As you would also expect though, the sprite flickering points to the biggest and most consequential flaw in all of this.


Apparently, it started with ZUN getting the impression that it's only possible to use the PC-98 EGC for fast blitting of all 4 bitplanes in one CPU instruction if you blit 16 horizontal pixels (= 2 bytes) at a time. Consequently, he only wrote one function for EGC-accelerated sprite unblitting, which can only operate on a "grid" of 16×1 tiles in VRAM. But wait, pellets are not only just 8×8, but can also be placed at any unaligned X position…

… yet the game still insists on using this 16-dot-aligned function to unblit pellets, forcing itself into using a super sloppy 16×8 rectangle for the job. 🤦 ZUN then tried to mitigate the resulting flickering in two hilarious ways that just make it worse:

  1. An… "interlaced rendering" mode? This one's activated for all Stage 15 and 20 fights, and separates pellets into two halves that are rendered on alternating frames. Collision detection with the Yin-Yang Orb and the player is only done for the visible half, but collision detection with player shots is still done for all pellets every frame, as are motion updates – so that pellets don't end up moving half as fast as they should.
    So yeah, your eyes weren't deceiving you. The game does effectively drop its perceived frame rate in the Elis, Kikuri, Sariel, and Konngara fights, and it does so deliberately.
  2. 📝 Just like player shots, pellets are also unblitted, moved, and rendered in a single function. Thanks to the 16×8 rectangle, there's now the (completely unnecessary) possibility of accidentally unblitting parts of a sprite that was previously drawn into the 8 pixels right of a pellet. And this is where ZUN went full :tannedcirno: and went "oh, I know, let's test the entire 16 pixels, and in case we got an entity there, we simply make the pellet invisible for this frame! Then we don't even have to unblit it later!" :zunpet:

    Except that this is only done for the first 3 elements of the player shot array…?! Which don't even necessarily have to contain the 3 shots fired last. It's not done for the player sprite, the Orb, or, heck, other pellets that come earlier in the pellet array. (At least we avoided going 𝑂(𝑛²) there?)

    Actually, and I'm only realizing this now as I type this blog post: This test is done even if the shots at those array elements aren't active. So, pellets tend to be made invisible based on comparisons with garbage data. :onricdennat:

    And then you notice that the player shot unblit​/​move​/​render function is actually only ever called from the pellet unblit​/​move​/​render function on the one global instance of the player shot manager class, after pellets were unblitted. So, we end up with a sequence of

    Pellet unblit → Pellet move → Shot unblit → Shot move → Shot render → Pellet render

    which means that we can't ever unblit a previously rendered shot with a pellet. Sure, as terrible as this one function call is from a software architecture perspective, it was enough to fix this issue. Yet we don't even get the intended positive effect, and walk away with pellets that are made temporarily invisible for no reason at all. So, uh, maybe it all just was an attempt at increasing the ramerate on lower spec PC-98 models?

Yup, that's it, we've found the most stupid piece of code in this game, period. It'll be hard to top this.


I'm confident that it's possible to turn TH01 into a well-written, fluid PC-98 game, with no flickering, and no perceived lag, once it's position-independent. With some more in-depth knowledge and documentation on the EGC (remember, there's still 📝 this one TH03 push waiting to be funded), you might even be able to continue using that piece of blitter hardware. And no, you certainly won't need ASM micro-optimizations – just a bit of knowledge about which optimizations Turbo C++ does on its own, and what you'd have to improve in your own code. It'd be very hard to write worse code than what you find in TH01 itself.

(Godbolt for Turbo C++ 4.0J when? Seriously though, that would 📝 also be a great project for outside contributors!)


Oh well. In contrast to TH04 and TH05, where 4 pushes only covered all the involved data types, they were enough to completely cover all of the pellet code in TH01. Everything's already decompiled, and we never have to look at it again. 😌 And with that, TH01 has also gone from by far the least RE'd to the most RE'd game within ReC98, in just half a year! 🎉
Still, that was enough TH01 game logic for a while. :tannedcirno: Next up: Making up for the delay with some more relaxing and easy pieces of TH01 code, that hopefully make just a bit more sense than all this garbage. More image formats, mainly.

📝 Posted:
🚚 Summary of:
P0078, P0079
Commits:
f4eb7a8...9e52cb1, 9e52cb1...cd48aa3
💰 Funded by:
iruleatgames, -Tom-
🏷 Tags:

To finish this TH05 stretch, we've got a feature that's exclusive to TH05 for once! As the final memory management innovation in PC-98 Touhou, TH05 provides a single static (64 * 26)-byte array for storing up to 64 entities of a custom type, specific to a stage or boss portion. (Edit (2023-05-29): This system actually debuted in 📝 TH04, where it was used for much simpler entities.)

TH05 uses this array for

  1. the Stage 2 star particles,
  2. Alice's puppets,
  3. the tip of curve ("jello") bullets,
  4. Mai's snowballs and Yuki's fireballs,
  5. Yumeko's swords,
  6. and Shinki's 32×32 bullets,

which makes sense, given that only one of those will be active at any given time.

On the surface, they all appear to share the same 26-byte structure, with consistently sized fields, merely using its 5 generic fields for different purposes. Looking closer though, there actually are differences in the signedness of certain fields across the six types. uth05win chose to declare them as entirely separate structures, and given all the semantic differences (pixels vs. subpixels, regular vs. tiny master.lib sprites, …), it made sense to do the same in ReC98. It quickly turned out to be the only solution to meet my own standards of code readability.

Which blew this one up to two pushes once again… But now, modders can trivially resize any of those structures without affecting the other types within the original (64 * 26)-byte boundary, even without full position independence. While you'd still have to reduce the type-specific number of distinct entities if you made any structure larger, you could also have more entities with fewer structure members.

As for the types themselves, they're full of redundancy once again – as you might have already expected from seeing #4, #5, and #6 listed as unrelated to each other. Those could have indeed been merged into a single 32×32 bullet type, supporting all the unique properties of #4 (destructible, with optional revenge bullets), #5 (optional number of twirl animation frames before they begin to move) and #6 (delay clouds). The *_add(), *_update(), and *_render() functions of #5 and #6 could even already be completely reverse-engineered from just applying the structure onto the ASM, with the ones of #3 and #4 only needing one more RE push.

But perhaps the most interesting discovery here is in the curve bullets: TH05 only renders every second one of the 17 nodes in a curve bullet, yet hit-tests every single one of them. In practice, this is an acceptable optimization though – you only start to notice jagged edges and gaps between the fragments once their speed exceeds roughly 11 pixels per second:

And that brings us to the last 20% of TH05 position independence! But first, we'll have more cheap and fast TH01 progress.

📝 Posted:
🚚 Summary of:
P0072, P0073, P0074, P0075
Commits:
4bb04ab...cea3ea6, cea3ea6...5286417, 5286417...1807906, 1807906...222fc99
💰 Funded by:
[Anonymous], -Tom-, Myles
🏷 Tags:

Long time no see! And this is exactly why I've been procrastinating bullets while there was still meaningful progress to be had in other parts of TH04 and TH05: There was bound to be quite some complexity in this most central piece of game logic, and so I couldn't possibly get to a satisfying understanding in just one push.

Or in two, because their rendering involves another bunch of micro-optimized functions adapted from master.lib.

Or in three, because we'd like to actually name all the bullet sprites, since there are a number of sprite ID-related conditional branches. And so, I was refining things I supposedly RE'd in the the commits from the first push until the very end of the fourth.

When we talk about "bullets" in TH04 and TH05, we mean just two things: the white 8×8 pellets, with a cap of 240 in TH04 and 180 in TH05, and any 16×16 sprites from MIKO16.BFT, with a cap of 200 in TH04 and 220 in TH05. These are by far the most common types of… err, "things the player can collide with", and so ZUN provides a whole bunch of pre-made motion, animation, and n-way spread / ring / stack group options for those, which can be selected by simply setting a few fields in the bullet template. All the other "non-bullets" have to be fired and controlled individually.

Which is nothing new, since uth05win covered this part pretty accurately – I don't think anyone could just make up these structure member overloads. The interesting insights here all come from applying this research to TH04, and figuring out its differences compared to TH05. The most notable one there is in the default groups: TH05 allows you to add a stack to any single bullet, n-way spread or ring, but TH04 only lets you create stacks separately from n-way spreads and rings, and thus gets by with fewer fields in its bullet template structure. On the other hand, TH04 has a separate "n-way spread with random angles, yet still aimed at the player" group? Which seems to be unused, at least as far as midbosses and bosses are concerned; can't say anything about stage enemies yet.

In fact, TH05's larger bullet template structure illustrates that these distinct group types actually are a rather redundant piece of over-engineering. You can perfectly indicate any permutation of the basic groups through just the stack bullet count (1 = no stack), spread bullet count (1 = no spread), and spread delta angle (0 = ring instead of spread). Add a 4-flag bitfield to cover the rest (aim to player, randomize angle, randomize speed, force single bullet regardless of difficulty or rank), and the result would be less redundant and even slightly more capable.

Even those 4 pushes didn't quite finish all of the bullet-related types, stopping just shy of the most trivial and consistent enum that defines special movement. This also left us in a 📝 TH03-like situation, in which we're still a bit away from actually converting all this research into actual RE%. Oh well, at least this got us way past 50% in overall position independence. On to the second half! 🎉

For the next push though, we'll first have a quick detour to the remaining C code of all the ZUN.COM binaries. Now that the 📝 TH04 and TH05 resident structures no longer block those, -Tom- has requested TH05's RES_KSO.COM to be covered in one of his outstanding pushes. And since 32th System recently RE'd TH03's resident structure, it makes sense to also review and merge that, before decompiling all three remaining RES_*.COM binaries in hopefully a single push. It might even get done faster than that, in which case I'll then review and merge some more of WindowsTiger's research.