⮜ Blog

⮜ List of tags

Showing all posts tagged animation-

📝 Posted:
🚚 Summary of:
P0201, P0202
Commits:
9342665...ff49e9e, ff49e9e...4568bf7
💰 Funded by:
Ember2528, Yanga, [Anonymous]
🏷 Tags:
rec98+ th01+ gameplay+ boss+ singyoku+ blitting+ glitch+ animation- bullet+

The positive:

The negative:

The overview:


This time, we're back to the Orb hitbox being a logical 49×49 pixels in SinGyoku's center, and the shot hitbox being the weird one. What happens if you want the shot hitbox to be both offset to the left a bit and stretch the entire width of SinGyoku's sprite? You get a hitbox that ends in mid-air, far away from the right edge of the sprite:

Due to VRAM byte alignment, all player shots fired between gx = 376 and gx = 383 inclusive appear at the same visual X position, but are internally already partly outside the hitbox and therefore won't hit SinGyoku – compare gx = 376 to gx = 380. So much for precisely visualizing hitboxes in this game…

Since the female and male forms also use the sphere entity's coordinates, they share the same hitbox.


Onto the rendering glitches then, which can – you guessed it – all be found in the sphere form's slam movement:

By having the sphere move from the right edge of the playfield to the left, this video demonstrates both the lazy reblitting and broken unblitting at the right edge for negative X velocities. Also, isn't it funny how Reimu can partly disappear from all the sloppy SinGyoku-related unblitting going on after her sprite was blitted?

Due to the low contrast of the sphere against the background, you typically don't notice these glitches, but the white invincibility flashing after a hit really does draw attention to them. This time, all of these glitches aren't even directly caused by ZUN having never learned about the EGC's bit length register – if he just wrote correct code for SinGyoku, none of this would have been an issue. Sigh… I wonder how many more glitches will be caused by improper use of this one function in the last 18% of REIIDEN.EXE.

There's even another bug here, with ZUN hardcoding a horizontal delta of 8 pixels rather than just passing the actual X velocity. Luckily, the maximum movement speed is 6 pixels on Lunatic, and this would have only turned into an additional observable glitch if the X velocity were to exceed 24 pixels. But that just means it's the kind of bug that still drains RE attention to prove that you can't actually observe it in-game under some circumstances.


The 5 pellet patterns are all pretty straightforward, with nothing to talk about. The code architecture during phase 2 does hint towards ZUN having had more creative patterns in mind – especially for the male form, which uses the transformation function's three pattern callback slots for three repetitions of the same pellet group.
There is one more oddity to be found at the very end of the fight:

The first frame of TH01 SinGyoku's defeat animation, showing the sphere blitted on top of a potentially active person form

Right before the defeat white-out animation, the sphere form is explicitly reblitted for no reason, on top of the form that was blitted to VRAM in the previous frame, and regardless of which form is currently active. If SinGyoku was meant to immediately transform back to the sphere form before being defeated, why isn't the person form unblitted before then? Therefore, the visibility of both forms is undeniably canon, and there is some lore meaning to be found here… :thonk:
In any case, that's SinGyoku done! 6th PC-98 Touhou boss fully decompiled, 25 remaining.


No FUUIN.EXE code rounding out the last push for a change, as the 📝 remaining missile code has been waiting in front of SinGyoku for a while. It already looked bad in November, but the angle-based sprite selection function definitely takes the cake when it comes to unnecessary and decadent floating-point abuse in this game.
The algorithm itself is very trivial: Even with 📝 .PTN requiring an additional quarter parameter to access 16×16 sprites, it's essentially just one bit shift, one addition, and one binary AND. For whatever reason though, ZUN casts the 8-bit missile angle into a 64-bit double, which turns the following explicit comparisons (!) against all possible 4 + 16 boundary angles (!!) into FPU operations. :zunpet: Even with naive and readable division and modulo operations, and the whole existence of this function not playing well with Turbo C++ 4.0J's terrible code generation at all, this could have been 3 lines of code and 35 un-inlined constant-time instructions. Instead, we've got this 207-instruction monster… but hey, at least it works. 🤷
The remaining time then went to YuugenMagan's initialization code, which allowed me to immediately remove more declarations from ASM land, but more on that once we get to the rest of that boss fight.

That leaves 76 functions until we're done with TH01! Next up: Card-flipping stage obstacles.

📝 Posted:
🚚 Summary of:
P0189
Commits:
22abdd1...b4876b6
💰 Funded by:
Arandui, Lmocinemod
🏷 Tags:
rec98+ th04+ th05+ gameplay+ boss+ kurumi+ marisa-4+ bug+ danmaku-pattern+ mod+ sara+ blitting+ animation-

(Before we start: Make sure you've read the current version of the FAQ section on a potential takedown of this project, updated in light of the recent DMCA claims against PC-98 Touhou game downloads.)


Slight change of plans, because we got instructions for reliably reproducing the TH04 Kurumi Divide Error crash! Major thanks to Colin Douglas Howell. With those, it also made sense to immediately look at the crash in the Stage 4 Marisa fight as well. This way, I could release both of the obligatory bugfix mods at the same time.
Especially since it turned out that I was wrong: Both crashes are entirely unrelated to the custom entity structure that would have required PI-centric progress. They are completely specific to Kurumi's and Marisa's danmaku-pattern code, and really are two separate bugs with no connection to each other. All of the necessary research nicely fit into Arandui's 0.5 pushes, with no further deep understanding required here.

But why were there still three weeks between Colin's message and this blog post? DMCA distractions aside: There are no easy fixes this time, unlike 📝 back when I looked at the Stage 5 Yuuka crash. Just like how division by zero is undefined in mathematics, it's also, literally, undefined what should happen instead of these two Divide error crashes. This means that any possible "fix" can only ever be a fanfiction interpretation of the intentions behind ZUN's code. The gameplay community should be aware of this, and might decide to handle these cases differently. And if we have to go into fanfiction territory to work around crashes in the canon games, we'd better document what exactly we're fixing here and how, as comprehensible as possible.


With that out of the way, let's look at Kurumi's crash first, since it's way easier to grasp. This one is known to primarily happen to new players, and it's easy to see why:


The pattern that causes the crash in Kurumi's fight. Also demonstrates how the number of bullets in a ring is always halved on Easy Mode after the rank-based tuning, leading to just a 3-ring on playperf = 16.

So, what should the workaround look like? Obviously, we want to modify neither the default number of ring bullets nor the tuning algorithm – that would change all other non-crashing variations of this pattern on other difficulties and ranks, creating a fork of the original gameplay. Instead, I came up with four possible workarounds that all seemed somewhat logical to me:

  1. Firing no bullet, i.e., interpreting 0-ring literally. This would create the only constellation in which a call to the bullet group spawn functions would not spawn at least one new bullet.
  2. Firing a "1-ring", i.e., a single bullet. This would be consistent with how the bullet spawn functions behave for "0-way" stack and spread groups.
  3. Firing a "∞-ring", i.e., 200 bullets, which is as much as the game's cap on 16×16 bullets would allow. This would poke fun at the whole "division by zero" idea… but given that we're still talking about Easy Mode (and especially new players) here, it might be a tad too cruel. Certainly the most trollish interpretation.
  4. Triggering an immediate Game Over, exchanging the hard crash for a softer and more controlled shutdown. Certainly the option that would be closest to the behavior of the original games, and perhaps the only one to be accepted in Serious, High-Level Play™.

As I was writing this post, it felt increasingly wrong for me to make this decision. So I once again went to Twitter, where 56.3% voted in favor of the 1-bullet option. Good that I asked! I myself was more leaning towards the 0-bullet interpretation, which only got 28.7% of the vote. Also interesting are the 2.3% in favor of the Game Over option but I get it, low-rank Easy Mode isn't exactly the most competitive mode of playing TH04.
There are reports of Kurumi crashing on higher difficulties as well, but I could verify none of them. If they aren't fixed by this workaround, they're caused by an entirely different bug that we have yet to discover.


Onto the Stage 4 Marisa crash then, which does in fact apply to all difficulty levels. I was also wrong on this one – it's a hell of a lot more intricate than being just a division by the number of on-screen bits. Without having decompiled the entire fight, I can't give a completely accurate picture of what happens there yet, but here's the rough idea:

Reference points for Marisa's point-reflected movement. Cyan: Marisa's position, green: (192, 112), yellow: the intended end point.

One of the two patterns in TH04's Stage 4 Marisa boss fight that feature frame number-dependent point-reflected movement. The bits were hacked to self-destruct on .

tl;dr: "Game crashes if last bit destroyed within 4-frame window near end of two patterns". For an informed decision on a new movement behavior for these last 8 frames, we definitely need to know all the details behind the crash though. Here's what I would interpret into the code:

  1. Not moving at all, i.e., interpreting 0 as the middle ground between positive and negative movement. This would also make sense because a 12-frame duration implies 100% of the movement to consist of the braking phase – and Marisa wasn't moving before, after all.
  2. Move at maximum speed, i.e., dividing by 1 rather than 0. Since the movement duration is still 12 in this case, Marisa will immediately start braking. In total, she will move exactly ¾ of the way from her initial position to (192, 112) within the 8 frames before the pattern ends.
  3. Directly warping to (192, 112) on frame 0, and to the point-reflected target on 4, respectively. This "emulates" the division by zero by moving Marisa at infinite speed to the exact two points indicated by the velocity formula. It also fits nicely into the 8 frames we have to fill here. Sure, Marisa can't reach these points at any other duration, but why shouldn't she be able to, with infinite speed? Then again, if Marisa is far away enough from (192, 112), this workaround would warp her across the entire playfield. Can Marisa teleport according to lore? I have no idea… :tannedcirno:
  4. Triggering an immediate Game O– hell no, this is the Stage 4 boss, people already hate losing runs to this bug!

Asking Twitter worked great for the Kurumi workaround, so let's do it again! Gotta attach a screenshot of an earlier draft of this blog post though, since this stuff is impossible to explain in tweets…

…and it went through the roof, becoming the most successful ReC98 tweet so far?! Apparently, y'all really like to just look at descriptions of overly complex bugs that I'd consider way beyond the typical attention span that can be expected from Twitter. Unfortunately, all those tweet impressions didn't quite translate into poll turnout. The results were pretty evenly split between 1) and 2), with option 1) just coming out slightly ahead at 49.1%, compared to 41.5% of option 2).

(And yes, I only noticed after creating the poll that warping to both the green and yellow points made more sense than warping to just one of the two. Let's hope that this additional variant wouldn't have shifted the results too much. Both warp options only got 9.4% of the vote after all, and no one else came up with the idea either. :onricdennat: In the end, you can always merge together your preferred combination of workarounds from the Git branches linked below.)


So here you go: The new definitive version of TH04, containing not only the community-chosen Kurumi and Stage 4 Marisa workaround variant, but also the 📝 No-EMS bugfix from last year. Edit (2022-05-31): This package is outdated, 📝 the current version is here! 2022-04-18-community-choice-fixes.zip Oh, and let's also add spaztron64's TH03 GDC clock fix from 2019 because why not. This binary was built from the community_choice_fixes branch, and you can find the code for all the individual workarounds on these branches:

Again, because it can't be stated often enough: These fixes are fanfiction. The gameplay community should be aware of this, and might decide to handle these cases differently.


With all of that taking way more time to evaluate and document, this research really had to become part of a proper push, instead of just being covered in the quick non-push blog post I initially intended. With ½ of a push left at the end, TH05's Stage 1-5 boss background rendering functions fit in perfectly there. If you wonder how these static backdrop images even need any boss-specific code to begin with, you're right – it's basically the same function copy-pasted 4 times, differing only in the backdrop image coordinates and some other inconsequential details.
Only Sara receives a nice variation of the typical 📝 blocky entrance animation: The usually opaque bitmap data from ST00.BB is instead used as a transition mask from stage tiles to the backdrop image, by making clever use of the tile invalidation system:

TH04 uses the same effect a bit more frequently, for its first three bosses.

Next up: Shinki, for real this time! I've already managed to decompile 10 of her 11 danmaku patterns within a little more than one push – and yes, that one is included in there. Looks like I've slightly overestimated the amount of work required for TH04's and TH05's bosses…

📝 Posted:
🚚 Summary of:
P0165, P0166, P0167
Commits:
7a0e5d8...f2bca01, f2bca01...e697907, e697907...c2de6ab
💰 Funded by:
Ember2528
🏷 Tags:
rec98+ th01+ gameplay+ animation- blitting+ bullet+ glitch+ waste+ boss+ singyoku+ mima-th01+ elis+ tcc+ menu+

OK, TH01 missile bullets. Can we maybe have a well-behaved entity type, without any weirdness? Just once?

Ehh, kinda. Apart from another 150 bytes wasted on unused structure members, this code is indeed more on the low end in terms of overall jank. It does become very obvious why dodging these missiles in the YuugenMagan, Mima, and Elis fights feels so awful though: An unfair 46×46 pixel hitbox around Reimu's center pixel, combined with the comeback of 📝 interlaced rendering, this time in every stage. ZUN probably did this because missiles are the only 16×16 sprite in TH01 that is blitted to unaligned X positions, which effectively ends up touching a 32×16 area of VRAM per sprite.
But even if we assume VRAM writes to be the bottleneck here, it would have been totally possible to render every missile in every frame at roughly the same amount of CPU time that the original game uses for interlaced rendering:

That's an optimization that would have significantly benefitted the game, in contrast to all of the fake ones introduced in later games. Then again, this optimization is actually something that the later games do, and it might have in fact been necessary to achieve their higher bullet counts without significant slowdown.

Unfortunately, it was only worth decompiling half of the missile code right now, thanks to gratuitous FPU usage in the other half, where 📝 double variables are compared to float literals. That one will have to wait 📝 until after SinGyoku.


After some effectively unused Mima sprite effect code that is so broken that it's impossible to make sense out of it, we get to the final feature I wanted to cover for all bosses in parallel before returning to Sariel: The separate sprite background storage for moving or animated boss sprites in the Mima, Elis, and Sariel fights. But, uh… why is this necessary to begin with? Doesn't TH01 already reserve the other VRAM page for backgrounds?
Well, these sprites are quite big, and ZUN didn't want to blit them from main memory on every frame. After all, TH01 and TH02 had a minimum required clock speed of 33 MHz, half of the speed required for the later three games. So, he simply blitted these boss sprites to both VRAM pages, leading the usual unblitting calls to only remove the other sprites on top of the boss. However, these bosses themselves want to move across the screen… and this makes it necessary to save the stage background behind them in some other way.

Enter .PTN, and its functions to capture a 16×16 or 32×32 square from VRAM into a sprite slot. No problem with that approach in theory, as the size of all these bigger sprites is a multiple of 32×32; splitting a larger sprite into these smaller 32×32 chunks makes the code look just a little bit clumsy (and, of course, slower).
But somewhere during the development of Mima's fight, ZUN apparently forgot that those sprite backgrounds existed. And once Mima's 🚫 casting sprite is blitted on top of her regular sprite, using just regular sprite transparency, she ends up with her infamous third arm:

TH01 Mima's third arm

Ironically, there's an unused code path in Mima's unblit function where ZUN assumes a height of 48 pixels for Mima's animation sprites rather than the actual 64. This leads to even clumsier .PTN function calls for the bottom 128×16 pixels… Failing to unblit the bottom 16 pixels would have also yielded that third arm, although it wouldn't have looked as natural. Still wouldn't say that it was intentional; maybe this casting sprite was just added pretty late in the game's development?


So, mission accomplished, Sariel unblocked… at 2¼ pushes. :thonk: That's quite some time left for some smaller stage initialization code, which bundles a bunch of random function calls in places where they logically really don't belong. The stage opening animation then adds a bunch of VRAM inter-page copies that are not only redundant but can't even be understood without knowing the hidden internal state of the last VRAM page accessed by previous ZUN code…
In better news though: Turbo C++ 4.0 really doesn't seem to have any complexity limit on inlining arithmetic expressions, as long as they only operate on compile-time constants. That's how we get macro-free, compile-time Shift-JIS to JIS X 0208 conversion of the individual code points in the 東方★靈異伝 string, in a compiler from 1994. As long as you don't store any intermediate results in variables, that is… :tannedcirno:

But wait, there's more! With still ¼ of a push left, I also went for the boss defeat animation, which includes the route selection after the SinGyoku fight.
As in all other instances, the 2× scaled font is accomplished by first rendering the text at regular 1× resolution to the other, invisible VRAM page, and then scaled from there to the visible one. However, the route selection is unique in that its scaled text is both drawn transparently on top of the stage background (not onto a black one), and can also change colors depending on the selection. It would have been no problem to unblit and reblit the text by rendering the 1× version to a position on the invisible VRAM page that isn't covered by the 2× version on the visible one, but ZUN (needlessly) clears the invisible page before rendering any text. :zunpet: Instead, he assigned a separate VRAM color for both the 魔界 and 地獄 options, and only changed the palette value for these colors to white or gray, depending on the correct selection. This is another one of the 📝 rare cases where TH01 demonstrates good use of PC-98 hardware, as the 魔界へ and 地獄へ strings don't need to be reblitted during the selection process, only the Orb "cursor" does.

Then, why does this still not count as good-code? When changing palette colors, you kinda need to be aware of everything else that can possibly be on screen, which colors are used there, and which aren't and can therefore be used for such an effect without affecting other sprites. In this case, well… hover over the image below, and notice how Reimu's hair and the bomb sprites in the HUD light up when Makai is selected:

Demonstration of palette changes in TH01's route selection

This push did end on a high note though, with the generic, non-SinGyoku version of the defeat animation being an easily parametrizable copy. And that's how you decompile another 2.58% of TH01 in just slightly over three pushes.


Now, we're not only ready to decompile Sariel, but also Kikuri, Elis, and SinGyoku without needing any more detours into non-boss code. Thanks to the current TH01 funding subscriptions, I can plan to cover most, if not all, of Sariel in a single push series, but the currently 3 pending pushes probably won't suffice for Sariel's 8.10% of all remaining code in TH01. We've got quite a lot of not specifically TH01-related funds in the backlog to pass the time though.

Due to recent developments, it actually makes quite a lot of sense to take a break from TH01: spaztron64 has managed what every Touhou download site so far has failed to do: Bundling all 5 game onto a single .HDI together with pre-configured PC-98 emulators and a nice boot menu, and hosting the resulting package on a proper website. While this first release is already quite good (and much better than my attempt from 2014), there is still a bit of room for improvement to be gained from specific ReC98 research. Next up, therefore:

📝 Posted:
🚚 Summary of:
P0162, P0163, P0164
Commits:
81dd96e...24b3a0d, 24b3a0d...6d572b3, 6d572b3...7a0e5d8
💰 Funded by:
Ember2528, Yanga
🏷 Tags:
rec98+ th01+ gameplay+ player+ shot+ animation- glitch+ waste+ jank+ unused+

No technical obstacles for once! Just pure overcomplicated ZUN code. Unlike 📝 Konngara's main function, the main TH01 player function was every bit as difficult to decompile as you would expect from its size.

With TH01 using both separate left- and right-facing sprites for all of Reimu's moves and separate classes for Reimu's 32×32 and 48×* sprites, we're already off to a bad start. Sure, sprite mirroring is minimally more involved on PC-98, as the planar nature of VRAM requires the bits within an 8-pixel byte to also be mirrored, in addition to writing the sprite bytes from right to left. TH03 uses a 256-byte lookup table for this, generated at runtime by an infamous micro-optimized and undecompilable ASM algorithm. With TH01's existing architecture, ZUN would have then needed to write 3 additional blitting functions. But instead, he chose to waste a total of 26,112 bytes of memory on pre-mirrored sprites… :godzun:

Alright, but surely selecting those sprites from code is no big deal? Just store the direction Reimu is facing in, and then add some branches to the rendering code. And there is in fact a variable for Reimu's direction… during regular arrow-key movement, and another one while shooting and sliding, and a third as part of the special attack types, launched out of a slide.
Well, OK, technically, the last two are the same variable. But that's even worse, because it means that ZUN stores two distinct enums at the same place in memory: Shooting and sliding uses 1 for left, 2 for right, and 3 for the "invalid" direction of holding both, while the special attack types indicate the direction in their lowest bit, with 0 for right and 1 for left. I decompiled the latter as bitflags, but in ZUN's code, each of the 8 permutations is handled as a distinct type, with copy-pasted and adapted code… :zunpet: The interpretation of this two-enum "sub-mode" union variable is controlled by yet another "mode" variable… and unsurprisingly, two of the bugs in this function relate to the sub-mode variable being interpreted incorrectly.

Also, "rendering code"? This one big function basically consists of separate unblit→update→render code snippets for every state and direction Reimu can be in (moving, shooting, swinging, sliding, special-attacking, and bombing), pasted together into a tangled mess of nested if(…) statements. While a lot of the code is copy-pasted, there are still a number of inconsistencies that defeat the point of my usual refactoring treatment. After all, with a total of 85 conditional branches, anything more than I did would have just obscured the control flow too badly, making it even harder to understand what's going on.
In the end, I spotted a total of 8 bugs in this function, all of which leave Reimu invisible for one or more frames:

Thanks to the last one, Reimu's first swing animation frame is never actually rendered. So whenever someone complains about TH01 sprite flickering on an emulator: That emulator is accurate, it's the game that's poorly written. :tannedcirno:

And guess what, this function doesn't even contain everything you'd associate with per-frame player behavior. While it does handle Yin-Yang Orb repulsion as part of slides and special attacks, it does not handle the actual player/Orb collision that results in lives being lost. The funny thing about this: These two things are done in the same function… :onricdennat:

Therefore, the life loss animation is also part of another function. This is where we find the final glitch in this 3-push series: Before the 16-frame shake, this function only unblits a 32×32 area around Reimu's center point, even though it's possible to lose a life during the non-deflecting part of a 48×48-pixel animation. In that case, the extra pixels will just stay on screen during the shake. They are unblitted afterwards though, which suggests that ZUN was at least somewhat aware of the issue?
Finally, the chance to see the alternate life loss sprite Alternate TH01 life loss sprite is exactly ⅛.


As for any new insights into game mechanics… you know what? I'm just not going to write anything, and leave you with this flowchart instead. Here's the definitive guide on how to control Reimu in TH01 we've been waiting for 24 years:

(SVG download)

Pellets are deflected during all gray states. Not shown is the obvious "double-tap Z and X" transition from all non-(#1) states to the Bomb state, but that would have made this diagram even more unwieldy than it turned out. And yes, you can shoot twice as fast while moving left or right.

While I'm at it, here are two more animations from MIKO.PTN which aren't referenced by any code:

An unused animation from TH01's MIKO.PTNAn unused animation from TH01's MIKO.PTN

With that monster of a function taken care of, we've only got boss sprite animation as the final blocker of uninterrupted Sariel progress. Due to some unfavorable code layout in the Mima segment though, I'll need to spend a bit more time with some of the features used there. Next up: The missile bullets used in the Mima and YuugenMagan fights.

📝 Posted:
🚚 Summary of:
P0160, P0161
Commits:
e491cd7...42ba4a5, 42ba4a5...81dd96e
💰 Funded by:
Yanga, [Anonymous]
🏷 Tags:
rec98+ th01+ gameplay+ resident+ bullet+ boss+ mima-th01+ animation- waste+ glitch+ tcc+

Nothing really noteworthy in TH01's stage timer code, just yet another HUD element that is needlessly drawn into VRAM. Sure, ZUN applies his custom boldfacing effect on top of the glyphs retrieved from font ROM, but he could have easily installed those modified glyphs as gaiji.
Well, OK, halfwidth gaiji aren't exactly well documented, and sometimes not even correctly emulated 📝 due to the same PC-98 hardware oddity I was researching last month. I've reserved two of the pending anonymous "anything" pushes for the conclusion of this research, just in case you were wondering why the outstanding workload is now lower after the two delivered here.

And since it doesn't seem to be clearly documented elsewhere: Every 2 ticks on the stage timer correspond to 4 frames.


So, TH01 rank pellet speed. The resident pellet speed value is a factor ranging from a minimum of -0.375 up to a maximum of 0.5 (pixels per frame), multiplied with the difficulty-adjusted base speed for each pellet and added on top of that same speed. This multiplier is modified

Apparently, ZUN noted that these deltas couldn't be losslessly stored in an IEEE 754 floating-point variable, and therefore didn't store the pellet speed factor exactly in a way that would correspond to its gameplay effect. Instead, it's stored similar to Q12.4 subpixels: as a simple integer, pre-multiplied by 40. This results in a raw range of -15 to 20, which is what the undecompiled ASM calls still use. When spawning a new pellet, its base speed is first multiplied by that factor, and then divided by 40 again. This is actually quite smart: The calculation doesn't need to be aware of either Q12.4 or the 40× format, as ((Q12.4 * factor×40) / factor×40) still comes out as a Q12.4 subpixel even if all numbers are integers. The only limiting issue here would be the potential overflow of the 16-bit multiplication at unadjusted base speeds of more than 50 pixels per frame, but that'd be seriously unplayable.
So yeah, pellet speed modifications are indeed gradual, and don't just fall into the coarse three "high, normal, and low" categories.


That's ⅝ of P0160 done, and the continue and pause menus would make good candidates to fill up the remaining ⅜… except that it seemed impossible to figure out the correct compiler options for this code?
The issues centered around the two effects of Turbo C++ 4.0J's -O switch:

  1. Optimizing jump instructions: merging duplicate successive jumps into a single one, and merging duplicated instructions at the end of conditional branches into a single place under a single branch, which the other branches then jump to
  2. Compressing ADD SP and POP CX stack-clearing instructions after multiple successive CALLs to __cdecl functions into a single ADD SP with the combined parameter stack size of all function calls

But how can the ASM for these functions exhibit #1 but not #2? How can it be seemingly optimized and unoptimized at the same time? The only option that gets somewhat close would be -O- -y, which emits line number information into the .OBJ files for debugging. This combination provides its own kind of #1, but these functions clearly need the real deal.

The research into this issue ended up consuming a full push on its own. In the end, this solution turned out to be completely unrelated to compiler options, and instead came from the effects of a compiler bug in a totally different place. Initializing a local structure instance or array like

const uint4_t flash_colors[3] = { 3, 4, 5 };

always emits the { 3, 4, 5 } array into the program's data segment, and then generates a call to the internal SCOPY@ function which copies this data array to the local variable on the stack. And as soon as this SCOPY@ call is emitted, the -O optimization #1 is disabled for the entire rest of the translation unit?!
So, any code segment with an SCOPY@ call followed by __cdecl functions must strictly be decompiled from top to bottom, mirroring the original layout of translation units. That means no TH01 continue and pause menus before we haven't decompiled the bomb animation, which contains such an SCOPY@ call. 😕
Luckily, TH01 is the only game where this bug leads to significant restrictions in decompilation order, as later games predominantly use the pascal calling convention, in which each function itself clears its stack as part of its RET instruction.


What now, then? With 51% of REIIDEN.EXE decompiled, we're slowly running out of small features that can be decompiled within ⅜ of a push. Good that I haven't been looking a lot into OP.EXE and FUUIN.EXE, which pretty much only got easy pieces of code left to do. Maybe I'll end up finishing their decompilations entirely within these smaller gaps?
I still ended up finding one more small piece in REIIDEN.EXE though: The particle system, seen in the Mima fight.

I like how everything about this animation is contained within a single function that is called once per frame, but ZUN could have really consolidated the spawning code for new particles a bit. In Mima's fight, particles are only spawned from the top and right edges of the screen, but the function in fact contains unused code for all other 7 possible directions, written in quite a bloated manner. This wouldn't feel quite as unused if ZUN had used an angle parameter instead… :thonk: Also, why unnecessarily waste another 40 bytes of the BSS segment?

But wait, what's going on with the very first spawned particle that just stops near the bottom edge of the screen in the video above? Well, even in such a simple and self-contained function, ZUN managed to include an off-by-one error. This one then results in an out-of-bounds array access on the 80th frame, where the code attempts to spawn a 41st particle. If the first particle was unlucky to be both slow enough and spawned away far enough from the bottom and right edges, the spawning code will then kill it off before its unblitting code gets to run, leaving its pixel on the screen until something else overlaps it and causes it to be unblitted.
Which, during regular gameplay, will quickly happen with the Orb, all the pellets flying around, and your own player movement. Also, the RNG can easily spawn this particle at a position and velocity that causes it to leave the screen more quickly. Kind of impressive how ZUN laid out the structure of arrays in a way that ensured practically no effect of this bug on the game; this glitch could have easily happened every 80 frames instead. He almost got close to all bugs canceling out each other here! :godzun:

Next up: The player control functions, including the second-biggest function in all of PC-98 Touhou.

📝 Posted:
🚚 Summary of:
P0158, P0159
Commits:
bf7bb7e...c0c0ebc, c0c0ebc...e491cd7
💰 Funded by:
Yanga
🏷 Tags:
rec98+ th01+ gameplay+ card-flipping+ rng+ score+ bomb+ animation- jank+ waste+

Of course, Sariel's potentially bloated and copy-pasted code is blocked by even more definitely bloated and copy-pasted code. It's TH01, what did you expect? :tannedcirno:

But even then, TH01's item code is on a new level of software architecture ridiculousness. First, ZUN uses distinct arrays for both types of items, with their own caps of 4 for bomb items, and 10 for point items. Since that obviously makes any type-related switch statement redundant, he also used distinct functions for both types, with copy-pasted boilerplate code. The main per-item update and render function is shared though… and takes every single accessed member of the item structure as its own reference parameter. Like, why, you have a structure, right there?! That's one way to really practice the C++ language concept of passing arbitrary structure fields by mutable reference… :zunpet:
To complete the unwarranted grand generic design of this function, it calls back into per-type collision detection, drop, and collect functions with another three reference parameters. Yeah, why use C++ virtual methods when you can also implement the effectively same polymorphism functionality by hand? Oh, and the coordinate clamping code in one of these callbacks could only possibly have come from nested min() and max() preprocessor macros. And that's how you extend such dead-simple functionality to 1¼ pushes…

Amidst all this jank, we've at least got a sensible item↔player hitbox this time, with 24 pixels around Reimu's center point to the left and right, and extending from 24 pixels above Reimu down to the bottom of the playfield. It absolutely didn't look like that from the initial naive decompilation though. Changing entity coordinates from left/top to center was one of the better lessons from TH01 that ZUN implemented in later games, it really makes collision detection code much more intuitive to grasp.


The card flip code is where we find out some slightly more interesting aspects about item drops in this game, and how they're controlled by a hidden cycle variable:

Then again, score players largely ignore point items anyway, as card combos simply have a much bigger effect on the score. With this, I should have RE'd all information necessary to construct a tool-assisted score run, though?
Edit: Turns out that 1) point items are becoming increasingly important in score runs, and 2) Pearl already did a TAS some months ago. Thanks to spaztron64 for the info!

The Orb↔card hitbox also makes perfect sense, with 24 pixels around the center point of a card in every direction.

The rest of the code confirms the card flip score formula documented on Touhou Wiki, as well as the way cards are flipped by bombs: During every of the 90 "damaging" frames of the 140-frame bomb animation, there is a 75% chance to flip the card at the [bomb_frame % total_card_count_in_stage] array index. Since stages can only have up to 50 cards 📝 thanks to a bug, even a 75% chance is high enough to typically flip most cards during a bomb. Each of these flips still only removes a single card HP, just like after a regular collision with the Orb.
Also, why are the card score popups rendered before the cards themselves? That's two needless frames of flicker during that 25-frame animation. Not all too noticeable, but still.


And that's over 50% of REIIDEN.EXE decompiled as well! Next up: More HUD update and rendering code… with a direct dependency on rank pellet speed modifications?

📝 Posted:
🚚 Summary of:
P0157
Commits:
4bc6405...bf7bb7e
💰 Funded by:
Yanga
🏷 Tags:
rec98+ th01+ pc98+ animation-

Yup, there still are features that can be fully covered in a single push and don't lead to sprawling blog posts. The giant STAGE number and HARRY UP messages, as well as the flashing transparent 東方★靈異伝 at the beginning of each scene are drawn by retrieving the glyphs for each letter from font ROM, and then "blitting" them to text RAM by placing a colored fullwidth 16×16 square at every pixel that is set in the font bitmap.
And 📝 once again, ZUN's code there matches the mediocre example code for the related hardware interrupt from the PC-9801 Programmers' Bible. It's not 100% copied this time, but definitely inspired by the code on page 121. Therefore, we can conclude that these letters are probably only displayed as these 16× scaled glyphs because that book had code on how to achieve this effect.

ZUN "improved" on the example code by implementing a write-only cursor over the entire text RAM that fills every 16×16 cell with a differently colored space character, fully clearing the text RAM as a side effect. For once, he even removed some redundancy here by using helper functions! It's all still far from good-code though. For example, there's a function for filling 5 rows worth of cells, which he uses for both the top and bottom margin of these letters. But since the bottom margin starts at the 22nd line, the code writes past the 25th line and into the second TRAM page. Good that this page is not used by either the hardware or the game.

These cursor functions can actually write any fullwidth JIS code point to text RAM… and seem to do that in a rather simplified way, because shouldn't you set the most significant bit to indicate the right half of a fullwidth character? That's what's written in the same book that ZUN copied all functions out of, after all. 🤔 Researching this led me down quite the rabbit hole, where I found an oddity in PC-98 text RAM rendering that no single one of the widely-used PC-98 emulators gets completely right. I'm almost done with the 2-push research into this issue, which will include fixes for DOSBox-X and Neko Project II. The only thing I'm missing to get these fully accurate is a screenshot of the output created by this binary, on any PC-98 model made by EPSON: 2021-09-12-jist0x28.com.zip That's the reason why this push was rather delayed. Thanks in advance to anyone who'd like to help with this!


In maybe more disappointing news: Sariel is going to be delayed for a while longer. 😕 The player- and HUD-related functions, which previously delayed further progress there, turned out to call a lot of not yet RE'd functions themselves. Seems as if we're doing most of the card-flipping code second, after all? Next up: Point and bomb items, which at least are a significant step in terms of position independence.

📝 Posted:
🚚 Summary of:
P0149, P0150, P0151, P0152
Commits:
e1a26bb...05e4c4a, 05e4c4a...768251d, 768251d...4d24ca5, 4d24ca5...81fc861
💰 Funded by:
Blue Bolt, Ember2528, -Tom-, [Anonymous]
🏷 Tags:
rec98+ th04+ th05+ gameplay+ bullet+ animation- score+ glitch+ jank+ waste+ micro-optimization+ tcc+ uth05win+

…or maybe not that soon, as it would have only wasted time to untangle the bullet update commits from the rest of the progress. So, here's all the bullet spawning code in TH04 and TH05 instead. I hope you're ready for this, there's a lot to talk about!

(For the sake of readability, "bullets" in this blog post refers to the white 8×8 pellets and all 16×16 bullets loaded from MIKO16.BFT, nothing else.)


But first, what was going on 📝 in 2020? Spent 4 pushes on the basic types and constants back then, still ended up confusing a couple of things, and even getting some wrong. Like how TH05's "bullet slowdown" flag actually always prevents slowdown and fires bullets at a constant speed instead. :tannedcirno: Or how "random spread" is not the best term to describe that unused bullet group type in TH04.
Or that there are two distinct ways of clearing all bullets on screen, which deserve different names:

Mechanic #1: Clearing bullets for a custom amount of time, awarding 1000 points for all bullets alive on the first frame, and 100 points for all bullets spawned during the clear time.
Mechanic #2: Zapping bullets for a fixed 16 frames, awarding a semi-exponential and loudly announced Bonus!! for all bullets alive on the first frame, and preventing new bullets from being spawned during those 16 frames. In TH04 at least; thanks to a ZUN bug, zapping got reduced to 1 frame and no animation in TH05…

Bullets are zapped at the end of most midboss and boss phases, and cleared everywhere else – most notably, during bombs, when losing a life, or as rewards for extends or a maximized Dream bonus. The Bonus!! points awarded for zapping bullets are calculated iteratively, so it's not trivial to give an exact formula for these. For a small number 𝑛 of bullets, it would exactly be 5𝑛³ - 10𝑛² + 15𝑛 points – or, using uth05win's (correct) recursive definition, Bonus(𝑛) = Bonus(𝑛-1) + 15𝑛² - 5𝑛 + 10. However, one of the internal step variables is capped at a different number of points for each difficulty (and game), after which the points only increase linearly. Hence, "semi-exponential".


On to TH04's bullet spawn code then, because that one can at least be decompiled. And immediately, we have to deal with a pointless distinction between regular bullets, with either a decelerating or constant velocity, and special bullets, with preset velocity changes during their lifetime. That preset has to be set somewhere, so why have separate functions? In TH04, this separation continues even down to the lowest level of functions, where values are written into the global bullet array. TH05 merges those two functions into one, but then goes too far and uses self-modifying code to save a grand total of two local variables… Luckily, the rest of its actual code is identical to TH04.

Most of the complexity in bullet spawning comes from the (thankfully shared) helper function that calculates the velocities of the individual bullets within a group. Both games handle each group type via a large switch statement, which is where TH04 shows off another Turbo C++ 4.0 optimization: If the range of case values is too sparse to be meaningfully expressed in a jump table, it usually generates a linear search through a second value table. But with the -G command-line option, it instead generates branching code for a binary search through the set of cases. 𝑂(log 𝑛) as the worst case for a switch statement in a C++ compiler from 1994… that's so cool. But still, why are the values in TH04's group type enum all over the place to begin with? :onricdennat:
Unfortunately, this optimization is pretty rare in PC-98 Touhou. It only shows up here and in a few places in TH02, compared to at least 50 switch value tables.

In all of its micro-optimized pointlessness, TH05's undecompilable version at least fixes some of TH04's redundancy. While it's still not even optimal, it's at least a decently written piece of ASM… if you take the time to understand what's going on there, because it certainly took quite a bit of that to verify that all of the things which looked like bugs or quirks were in fact correct. And that's how the code for this function ended up with 35% comments and blank lines before I could confidently call it "reverse-engineered"…
Oh well, at least it finally fixes a correctness issue from TH01 and TH04, where an invalid bullet group type would fill all remaining slots in the bullet array with identical versions of the first bullet.

Something that both games also share in these functions is an over-reliance on globals for return values or other local state. The most ridiculous example here: Tuning the speed of a bullet based on rank actually mutates the global bullet template… which ZUN then works around by adding a wrapper function around both regular and special bullet spawning, which saves the base speed before executing that function, and restores it afterward. :zunpet: Add another set of wrappers to bypass that exact tuning, and you've expanded your nice 1-function interface to 4 functions. Oh, and did I mention that TH04 pointlessly duplicates the first set of wrapper functions for 3 of the 4 difficulties, which can't even be explained with "debugging reasons"? That's 10 functions then… and probably explains why I've procrastinated this feature for so long.

At this point, I also finally stopped decompiling ZUN's original ASM just for the sake of it. All these small TH05 functions would look horribly unidiomatic, are identical to their decompiled TH04 counterparts anyway, except for some unique constant… and, in the case of TH05's rank-based speed tuning function, actually become undecompilable as soon as we want to return a C++ class to preserve the semantic meaning of the return value. Mainly, this is because Turbo C++ does not allow register pseudo-variables like _AX or _AL to be cast into class types, even if their size matches. Decompiling that function would have therefore lowered the quality of the rest of the decompiled code, in exchange for the additional maintenance and compile-time cost of another translation unit. Not worth it – and for a TH05 port, you'd already have to decompile all the rest of the bullet spawning code anyway!


The only thing in there that was still somewhat worth being decompiled was the pre-spawn clipping and collision detection function. Due to what's probably a micro-optimization mistake, the TH05 version continues to spawn a bullet even if it was spawned on top of the player. This might sound like it has a different effect on gameplay… until you realize that the player got hit in this case and will either lose a life or deathbomb, both of which will cause all on-screen bullets to be cleared anyway. So it's at most a visual glitch.

But while we're at it, can we please stop talking about hitboxes? At least in the context of TH04 and TH05 bullets. The actual collision detection is described way better as a kill delta of 8×8 pixels between the center points of the player and a bullet. You can distribute these pixels to any combination of bullet and player "hitboxes" that make up 8×8. 4×4 around both the player and bullets? 1×1 for bullets, and 8×8 for the player? All equally valid… or perhaps none of them, once you keep in mind that other entity types might have different kill deltas. With that in mind, the concept of a "hitbox" turns into just a confusing abstraction.

The same is true for the 36×44 graze box delta. For some reason, this one is not exactly around the center of a bullet, but shifted to the right by 2 pixels. So, a bullet can be grazed up to 20 pixels right of the player, but only up to 16 pixels left of the player. uth05win also spotted this… and rotated the deltas clockwise by 90°?!


Which brings us to the bullet updates… for which I still had to research a decompilation workaround, because 📝 P0148 turned out to not help at all? Instead, the solution was to lie to the compiler about the true segment distance of the popup function and declare its signature far rather than near. This allowed ZUN to save that ridiculous overhead of 1 additional far function call/return per frame, and those precious 2 bytes in the BSS segment that he didn't have to spend on a segment value. 📝 Another function that didn't have just a single declaration in a common header file… really, 📝 how were these games even built???

The function itself is among the longer ones in both games. It especially stands out in the indentation department, with 7 levels at its most indented point – and that's the minimum of what's possible without goto. Only two more notable discoveries there:

  1. Bullets are the only entity affected by Slow Mode. If the number of bullets on screen is ≥ (24 + (difficulty * 8) + rank) in TH04, or (42 + (difficulty * 8)) in TH05, Slow Mode reduces the frame rate by 33%, by waiting for one additional VSync event every two frames.
    The code also reveals a second tier, with 50% slowdown for a slightly higher number of bullets, but that conditional branch can never be executed :zunpet:
  2. Bullets must have been grazed in a previous frame before they can be collided with. (Note how this does not apply to bullets that spawned on top of the player, as explained earlier!)

Whew… When did ReC98 turn into a full-on code review?! 😅 And after all this, we're still not done with TH04 and TH05 bullets, with all the special movement types still missing. That should be less than one push though, once we get to it. Next up: Back to TH01 and Konngara! Now have fun rewriting the Touhou Wiki Gameplay pages 😛

📝 Posted:
🚚 Summary of:
P0148
Commits:
c940059...e1a26bb
💰 Funded by:
[Anonymous]
🏷 Tags:
rec98+ th03+ th04+ th05+ animation- hud+ gaiji+ kaja+

Back after taking way too long to get Touhou Patch Center's MediaWiki update feature complete… I'm still waiting for more translators to test and review the new translation interface before delivering and deploying it all, which will most likely lead to another break from ReC98 within the next few months. For now though, I'm happy to have mostly addressed the nagging responsibility I still had after willing that site into existence, and to be back working on ReC98. 🙂

As announced, the next few pushes will focus on TH04's and TH05's bullet spawning code, before I get to put all that accumulated TH01 money towards finishing all of konngara's code in TH01. For a full picture of what's happening with bullets, we'd really also like to have the bullet update function as readable C code though.
Clearing all bullets on the playfield will trigger a Bonus!! popup, displayed as 📝 gaiji in that proportional font. Unfortunately, TLINK refused to link the code as soon as I referenced the function for animating the popups at the top of the playfield? Which can only mean that we have to decompile that function first… :thonk:

So, let's turn that piece of technical debt into a full push, and first decompile another random set of previously reverse-engineered TH04 and TH05 functions. Most of these are stored in a different place within the two MAIN.EXE binaries, and the tried-and-true method of matching segment names would therefore have introduced several unnecessary translation units. So I resorted to a segment splitting technique I should have started using way earlier: Simply creating new segments with names derived from their functions, at the exact positions they're needed. All the new segment start and end directives do bloat the ASM code somewhat, and certainly contributed to this push barely removing any actual lines of code. However, what we get in return is total freedom as far as decompilation order is concerned, 📝 which should be the case for any ReC project, really. And in the end, all these tiny code segments will cancel out anyway.
If only we could do the same with the data segment…


The popup function happened to be the final one I RE'd before my long break in the spring of 2019. Back then, I didn't even bother looking into that 64-frame delay between changing popups, and what that meant for the game.
Each of these popups stays on screen for 128 frames, during which, of course, another popup-worthy event might happen. Handling this cleanly without removing previous popups too early would involve some sort of event queue, whose size might even be meaningfully limited to the number of distinct events that can happen. But still, that'd be a data structure, and we're not gonna have that! :zunpet: Instead, ZUN simply keeps two variables for the new and current popup ID. During an active popup, any change to that ID will only be committed once the current popup has been shown for at least 64 frames. And during that time, that new ID can be freely overwritten with a different one, which drops any previous, undisplayed event. But surely, there won't be more than two events happening within 63 frames, right? :tannedcirno:

The rest was fairly uneventful – no newly RE'd functions in this push, after all – until I reached the widely used helper function for applying the current vertical scrolling offset to a Y coordinate. Its combination of a function parameter, the pascal calling convention, and no stack frame was previously thought to be undecompilable… except that it isn't, and the decompilation didn't even require any new workarounds to be developed? Good thing that I already forgot how impossible it was to decompile the first function I looked at that fell into this category!
Oh well, this discovery wasn't too groundbreaking. Looking back at all the other functions with that combination only revealed a grand total of 1 additional one where a decompilation made sense: TH05's version of snd_kaja_interrupt(), which is now compiled from the same C++ file for all 4 games that use it. And well, looks like some quirks really remain unnoticed and undocumented until you look at a function for the 11th time: Its return value is undefined if BGM is inactive – that is, if the user disabled it, or if no FM board is installed. Not that it matters for the original code, which never uses this function to retrieve anything from KAJA's drivers. But people apparently do copy ReC98 code into their own projects, so it is something to keep in mind.


All in all, nothing quite at jank level in this one, but we were surely grazing that tag. Next up, with that out of the way: The bullet update/step function! Very soon in fact, since I've mostly got it done already.

📝 Posted:
🚚 Summary of:
P0147
Commits:
456b621...c940059
💰 Funded by:
Ember2528, -Tom-
🏷 Tags:
rec98+ th04+ th05+ file-format+ pc98+ player+ bomb+ boss+ shinki+ ex-alice+ animation- waste+

Didn't quite get to cover background rendering for TH05's Stage 1-5 bosses in this one, as I had to reverse-engineer two more fundamental parts involved in boss background rendering before.

First, we got the those blocky transitions from stage tiles to bomb and boss backgrounds, loaded from BB*.BB and ST*.BB, respectively. These files store 16 frames of animation, with every bit corresponding to a 16×16 tile on the playfield. With 384×368 pixels to be covered, that would require 69 bytes per frame. But since that's a very odd number to work with in micro-optimized ASM, ZUN instead stores 512×512 pixels worth of bits, ending up with a frame size of 128 bytes, and a per-frame waste of 59 bytes. :tannedcirno: At least it was possible to decompile the core blitting function as __fastcall for once.
But wait, TH05 comes with, and loads, a bomb .BB file for every character, not just for the Reimu and Yuuka bomb transitions you see in-game… 🤔 Restoring those unused stage tile → bomb image transition animations for Mima and Marisa isn't that trivial without having decompiled their actual bomb animation functions before, so stay tuned!

Interestingly though, the code leaves out what would look like the most obvious optimization: All stage tiles are unconditionally redrawn each frame before they're erased again with the 16×16 blocks, no matter if they weren't covered by such a block in the previous frame, or are going to be covered by such a block in this frame. The same is true for the static bomb and boss background images, where ZUN simply didn't write a .CDG blitting function that takes the dirty tile array into account. If VRAM writes on PC-98 really were as slow as the games' README.TXT files claim them to be, shouldn't all the optimization work have gone towards minimizing them? :thonk: Oh well, it's not like I have any idea what I'm talking about here. I'd better stop talking about anything relating to VRAM performance on PC-98… :onricdennat:


Second, it finally was time to solve the long-standing confusion about all those callbacks that are supposed to render the playfield background. Given the aforementioned static bomb background images, ZUN chose to make this needlessly complicated. And so, we have two callback function pointers: One during bomb animations, one outside of bomb animations, and each boss update function is responsible for keeping the former in sync with the latter. :zunpet:

Other than that, this was one of the smoothest pushes we've had in a while; the hardest parts of boss background rendering all were part of 📝 the last push. Once you figured out that ZUN does indeed dynamically change hardware color #0 based on the current boss phase, the remaining one function for Shinki, and all of EX-Alice's background rendering becomes very straightforward and understandable.


Meanwhile, -Tom- told me about his plans to publicly release 📝 his TH05 scripting toolkit once TH05's MAIN.EXE would hit around 50% RE! That pretty much defines what the next bunch of generic TH05 pushes will go towards: bullets, shared boss code, and one full, concrete boss script to demonstrate how it's all combined. Next up, therefore: TH04's bullet firing code…? Yes, TH04's. I want to see what I'm doing before I tackle the undecompilable mess that is TH05's bullet firing code, and you all probably want readable code for that feature as well. Turns out it's also the perfect place for Blue Bolt's pending contributions.

📝 Posted:
🚚 Summary of:
P0146
Commits:
08bc188...456b621
💰 Funded by:
Ember2528, -Tom-
🏷 Tags:
rec98+ th05+ tcc+ animation- boss+ shinki+ micro-optimization+ waste+ uth05win+

Y'know, I kinda prefer the pending crowdfunded workload to stay more near the middle of the cap, rather than being sold out all the time. So to reach this point more quickly, let's do the most relaxing thing that can be easily done in TH05 right now: The boss backgrounds, starting with Shinki's, 📝 now that we've got the time to look at it in detail.

… Oh come on, more things that are borderline undecompilable, and require new workarounds to be developed? Yup, Borland C++ always optimizes any comparison of a register with a literal 0 to OR reg, reg, no matter how many calculations and inlined function calls you replace the 0 with. Shinki's background particle rendering function contains a CMP AX, 0 instruction though… so yeah, 📝 yet another piece of custom ASM that's worse than what Turbo C++ 4.0J would have generated if ZUN had just written readable C. This was probably motivated by ZUN insisting that his modified master.lib function for blitting particles takes its X and Y parameters as registers. If he had just used the __fastcall convention, he also would have got the sprite ID passed as a register. 🤷
So, we really don't want to be forced into inline assembly just because of the third comparison in the otherwise perfectly decompilable four-comparison if() expression that prevents invisible particles from being drawn. The workaround: Comparing to a pointer instead, which only the linker gets to resolve to the actual value of 0. :tannedcirno: This way, the compiler has to make room for any 16-bit literal, and can't optimize anything.


And then we go straight from micro-optimization to waste, with all the duplication in the code that animates all those particles together with the zooming and spinning lines. This push decompiled 1.31% of all code in TH05, and thanks to alignment, we're still missing Shinki's high-level background rendering function that calls all the subfunctions I decompiled here.
With all the manipulated state involved here, it's not at all trivial to see how this code produces what you see in-game. Like:

  1. If all lines have the same Y velocity, how do the other three lines in background type B get pushed down into this vertical formation while the top one stays still? (Answer: This velocity is only applied to the top line, the other lines are only pushed based on some delta.)
  2. How can this delta be calculated based on the distance of the top line with its supposed target point around Shinki's wings? (Answer: The velocity is never set to 0, so the top line overshoots this target point in every frame. After calculating the delta, the top line itself is pushed down as well, canceling out the movement. :zunpet:)
  3. Why don't they get pushed down infinitely, but stop eventually? (Answer: We only see four lines out of 20, at indices #0, #6, #12, and #18. In each frame, lines [0..17] are copied to lines [1..18], before anything gets moved. The invisible lines are pushed down based on the delta as well, which defines a distance between the visible lines of (velocity * array gap). And since the velocity is capped at -14 pixels per frame, this also means a maximum distance of 84 pixels between the midpoints of each line.)
  4. And why are the lines moving back up when switching to background type C, before moving down? (Answer: Because type C increases the velocity rather than decreasing it. Therefore, it relies on the previous velocity state from type B to show a gapless animation.)

So yeah, it's a nice-looking effect, just very hard to understand. 😵

With the amount of effort I'm putting into this project, I typically gravitate towards more descriptive function names. Here, however, uth05win's simple and seemingly tiny-brained "background type A/B/C/D" was quite a smart choice. It clearly defines the sequence in which these animations are intended to be shown, and as we've seen with point 4 from the list above, that does indeed matter.

Next up: At least EX-Alice's background animations, and probably also the high-level parts of the background rendering for all the other TH05 bosses.

📝 Posted:
🚚 Summary of:
P0123
Commits:
4406c3d...72dfa09
💰 Funded by:
Yanga
🏷 Tags:
rec98+ th01+ file-format+ player+ animation- blitting+ waste+ jank+

Done with the .BOS format, at last! While there's still quite a bunch of undecompiled non-format blitting code left, this was in fact the final piece of graphics format loading code in TH01.

📝 Continuing the trend from three pushes ago, we've got yet another class, this time for the 48×48 and 48×32 sprites used in Reimu's gohei, slide, and kick animations. The only reason these had to use the .BOS format at all is simply because Reimu's regular sprites are 32×32, and are therefore loaded from 📝 .PTN files.
Yes, this makes no sense, because why would you split animations for the same character across two file formats and two APIs, just because of a sprite size difference? This necessity for switching blitting APIs might also explain why Reimu vanishes for a few frames at the beginning and the end of the gohei swing animation, but more on that once we get to the high-level rendering code.

Now that we've decompiled all the .BOS implementations in TH01, here's an overview of all of them, together with .PTN to show that there really was no reason for not using the .BOS API for all of Reimu's sprites:

CBossEntity CBossAnim CPlayerAnim ptn_* (32×32)
Format .BOS .BOS .BOS .PTN
Hitbox
Byte-aligned blitting
Byte-aligned unblitting
Unaligned blitting Single-line and wave only
Precise unblitting
Per-file sprite limit 8 8 32 64
Pixels blitted at once 16 16 8 32

And even that last property could simply be handled by branching based on the sprite width, and wouldn't be a reason for switching formats. But well, it just wouldn't be TH01 without all that redundant bloat though, would it?

The basic loading, freeing, and blitting code was yet another variation on the other .BOS code we've seen before. So this should have caused just as little trouble as the CBossAnim code… except that CPlayerAnim did add one slightly difficult function to the mix, which led to it requiring almost a full push after all. Similar to 📝 the unblitting code for moving lasers we've seen in the last push, ZUN tries to minimize the amount of VRAM writes when unblitting Reimu's slide animations. Technically, it's only necessary to restore the pixels that Reimu traveled by, plus the ones that wouldn't be redrawn by the new animation frame at the new X position.
The theoretically arbitrary distance between the two sprites is, of course, modeled by a fixed-size buffer on the stack :onricdennat:, coming with the further assumption that the sprite surely hasn't moved by more than 1 horizontal VRAM byte compared to the last frame. Which, of course, results in glitches if that's not the case, leaving little Reimu parts in VRAM if the slide speed ever exceeded 8 pixels per frame. :tannedcirno: (Which it never does, being hardcoded to 6 pixels, but still.). As it also turns out, all those bit masking operations easily lead to incredibly sloppy C code. Which compiles into incredibly terrible ASM, which in turn might end up wasting way more CPU time than the final VRAM write optimization would have gained? Then again, in-depth profiling is way beyond the scope of this project at this point.

Next up: The TH04 main menu, and some more technical debt.

📝 Posted:
🚚 Summary of:
P0111, P0112
Commits:
8b5c146...4ef4c9e, 4ef4c9e...e447a2d
💰 Funded by:
[Anonymous], Blue Bolt
🏷 Tags:
rec98+ th02+ th04+ th05+ gameplay+ player+ bomb+ boss+ ex-alice+ animation- glitch+ jank+

Only one newly ordered push since I've reopened the store? Great, that's all the justification I needed for the extended maintenance delay that was part of these two pushes 😛

Having to write comments to explain whether coordinates are relative to the top-left corner of the screen or the top-left corner of the playfield has finally become old. So, I introduced distinct types for all the coordinate systems we typically encounter, applying them to all code decompiled so far. Note how the planar nature of PC-98 VRAM meant that X and Y coordinates also had to be different from each other. On the X side, there's mainly the distinction between the [0; 640] screen space and the corresponding [0; 80] VRAM byte space. On the Y side, we also have the [0; 400] screen space, but the visible area of VRAM might be limited to [0; 200] when running in the PC-98's line-doubled 640×200 mode. A VRAM Y coordinate also always implies an added offset for vertical scrolling.
During all of the code reconstruction, these types can only have a documenting purpose. Turning them into anything more than just typedefs to int, in order to define conversion operators between them, simply won't recompile into identical binaries. Modding and porting projects, however, now have a nice foundation for doing just that, and can entirely lift coordinate system transformations into the type system, without having to proofread all the meaningless int declarations themselves.


So, what was left in terms of memory references? EX-Alice's fire waves were our final unknown entity that can collide with the player. Decently implemented, with little to say about them.

That left the bomb animation structures as the one big remaining PI blocker. They started out nice and simple in TH04, with a small 6-byte star animation structure used for both Reimu and Marisa. TH05, however, gave each character her own animation… and what the hell is going on with Reimu's blue stars there? Nope, not going to figure this out on ASM level.

A decompilation first required some more bomb-related variables to be named though. Since this was part of a generic RE push, it made sense to do this in all 5 games… which then led to nice PI gains in anything but TH05. :tannedcirno: Most notably, we now got the "pulling all items to player" flag in TH04 and TH05, which is actually separate from bombing. The obvious cheat mod is left as an exercise to the reader.


So, TH05 bomb animations. Just like the 📝 custom entity types of this game, all 4 characters share the same memory, with the superficially same 10-byte structure.
But let's just look at the very first field. Seen from a low level, it's a simple struct { int x, y; } pos, storing the current position of the character-specific bomb animation entity. But all 4 characters use this field differently:

Therefore, I decompiled it as 4 separate structures once again, bundled into an union of arrays.

As for Reimu… yup, that's some pointer arithmetic straight out of Jigoku* for setting and updating the positions of the falling star trails. :zunpet: While that certainly required several comments to wrap my head around the current array positions, the one "bug" in all this arithmetic luckily has no effect on the game.
There is a small glitch with the growing circles, though. They are spawned at the end of the loop, with their position taken from the star pointer… but after that pointer has already been incremented. On the last loop iteration, this leads to an out-of-bounds structure access, with the position taken from some unknown EX-Alice data, which is 0 during most of the game. If you look at the animation, you can easily spot these bugged circles, consistently growing from the top-left corner (0, 0) of the playfield:


After all that, there was barely enough remaining time to filter out and label the final few memory references. But now, TH05's MAIN.EXE is technically position-independent! 🎉 -Tom- is going to work on a pretty extensive demo of this unprecedented level of efficient Touhou game modding. For a more impactful effect of both the 100% PI mark and that demo, I'll be delaying the push covering the remaining false positives in that binary until that demo is done. I've accumulated a pretty huge backlog of minor maintenance issues by now…
Next up though: The first part of the long-awaited build system improvements. I've finally come up with a way of sanely accelerating the 32-bit build part on most setups you could possibly want to build ReC98 on, without making the building experience worse for the other few setups.

📝 Posted:
🚚 Summary of:
P0110
Commits:
2c7d86b...8b5c146
💰 Funded by:
[Anonymous], Blue Bolt
🏷 Tags:
rec98+ th02+ th03+ th04+ th05+ animation- tcc+ shinki+ ex-alice+

… and just as I explained 📝 in the last post how decompilation is typically more sensible and efficient than ASM-level reverse-engineering, we have this push demonstrating a counter-example. The reason why the background particles and lines in the Shinki and EX-Alice battles contributed so much to position dependence was simply because they're accessed in a relatively large amount of functions, one for each different animation. Too many to spend the remaining precious crowdfunded time on reverse-engineering or even decompiling them all, especially now that everyone anticipates 100% PI for TH05's MAIN.EXE.

Therefore, I only decompiled the two functions of the line structure that also demonstrate best how it works, which in turn also helped with RE. Sadly, this revealed that we actually can't 📝 overload operator =() to get that nice assignment syntax for 12.4 fixed-point values, because one of those new functions relies on Turbo C++'s built-in optimizations for trivially copyable structures. Still, impressive that this abstraction caused no other issues for almost one year.

As for the structures themselves… nope, nothing to criticize this time! Sure, one good particle system would have been awesome, instead of having separate structures for the Stage 2 "starfield" particles and the one used in Shinki's battle, with hardcoded animations for both. But given the game's short development time, that was quite an acceptable compromise, I'd say.
And as for the lines, there just has to be a reason why the game reserves 20 lines per set, but only renders lines #0, #6, #12, and #18. We'll probably see once we get to look at those animation functions more closely.

This was quite a 📝 TH03-style RE push, which yielded way more PI% than RE%. But now that that's done, I can finally not get distracted by all that stuff when looking at the list of remaining memory references. Next up: The last few missing structures in TH05's MAIN.EXE!

📝 Posted:
🚚 Summary of:
P0105, P0106, P0107, P0108
Commits:
3622eb6...11b776b, 11b776b...1f1829d, 1f1829d...1650241, 1650241...dcf4e2c
💰 Funded by:
Yanga
🏷 Tags:
rec98+ th01+ meta+ file-format+ animation- blitting+ boss+ singyoku+ yuugenmagan+ elis+ kikuri+ konngara+ waste+

And indeed, I got to end my vacation with a lot of image format and blitting code, covering the final two formats, .GRC and .BOS. .GRC was nothing noteworthy – one function for loading, one function for byte-aligned blitting, and one function for freeing memory. That's it – not even a unblitting function for this one. .BOS, on the other hand…

…has no generic (read: single/sane) implementation, and is only implemented as methods of some boss entity class. And then again for Sariel's dress and wand animations, and then again for Reimu's animations, both of which weren't even part of these 4 pushes. Looking forward to decompiling essentially the same algorithms all over again… And that's how TH01 became the largest and most bloated PC-98 Touhou game. So yeah, still not done with image formats, even at 44% RE.

This means I also had to reverse-engineer that "boss entity" class… yeah, what else to call something a boss can have multiple of, that may or may not be part of a larger boss sprite, may or may not be animated, and that may or may not have an orb hitbox?
All bosses except for Kikuri share the same 5 global instances of this class. Since renaming all these variables in ASM land is tedious anyway, I went the extra mile and directly defined separate, meaningful names for the entities of all bosses. These also now document the natural order in which the bosses will ultimately be decompiled. So, unless a backer requests anything else, this order will be:

  1. Konngara
  2. Sariel
  3. Elis
  4. Kikuri
  5. SinGyoku
  6. (code for regular card-flipping stages)
  7. Mima
  8. YuugenMagan

As everyone kind of expects from TH01 by now, this class reveals yet another… um, unique and quirky piece of code architecture. In addition to the position and hitbox members you'd expect from a class like this, the game also stores the .BOS metadata – width, height, animation frame count, and 📝 bitplane pointer slot number – inside the same class. But if each of those still corresponds to one individual on-screen sprite, how can YuugenMagan have 5 eye sprites, or Kikuri have more than one soul and tear sprite? By duplicating that metadata, of course! And copying it from one entity to another :onricdennat:
At this point, I feel like I even have to congratulate the game for not actually loading YuugenMagan's eye sprites 5 times. But then again, 53,760 bytes of waste would have definitely been noticeable in the DOS days. Makes much more sense to waste that amount of space on an unused C++ exception handler, and a bunch of redundant, unoptimized blitting functions :tannedcirno:

(Thinking about it, YuugenMagan fits this entire system perfectly. And together with its position in the game's code – last to be decompiled means first on the linker command line – we might speculate that YuugenMagan was the first boss to be programmed for TH01?)

So if a boss wants to use sprites with different sizes, there's no way around using another entity. And that's why Girl-Elis and Bat-Elis are two distinct entities internally, and have to manually sync their position. Except that there's also a third one for Attacking-Girl-Elis, because Girl-Elis has 9 frames of animation in total, and the global .BOS bitplane pointers are divided into 4 slots of only 8 images each. :zunpet:
Same for SinGyoku, who is split into a sphere entity, a person entity, and a… white flash entity for all three forms, all at the same resolution. Or Konngara's facial expressions, which also require two entities just for themselves.


And once you decompile all this code, you notice just how much of it the game didn't even use. 13 of the 50 bytes of the boss entity class are outright unused, and 10 bytes are used for a movement clamping and lock system that would have been nice if ZUN also used it outside of Kikuri's soul sprites. Instead, all other bosses ignore this system completely, and just party on the X/Y coordinates of the boss entities directly.

As for the rendering functions, 5 out of 10 are unused. And while those definitely make up less than half of the code, I still must have spent at least 1 of those 4 pushes on effectively unused functionality.
Only one of these functions lends itself to some speculation. For Elis' entrance animation, the class provides functions for wavy blitting and unblitting, which use a separate X coordinate for every line of the sprite. But there's also an unused and sort of broken one for unblitting two overlapping wavy sprites, located at the same Y coordinate. This might indicate that Elis could originally split herself into two sprites, similar to TH04 Stage 6 Yuuka? Or it might just have been some other kind of animation effect, who knows.


After over 3 months of TH01 progress though, it's finally time to look at other games, to cover the rest of the crowdfunding backlog. Next up: Going back to TH05, and getting rid of those last PI false positives. And since I can potentially spend the next 7 weeks on almost full-time ReC98 work, I've also re-opened the store until October!

📝 Posted:
🚚 Summary of:
P0088, P0089
Commits:
97ce7b7...da6b856, da6b856...90252cc
💰 Funded by:
-Tom-, [Anonymous], Blue Bolt
🏷 Tags:
rec98+ th02+ th04+ th05+ gameplay+ enemy+ hud+ animation- portability+

As expected, we've now got the TH04 and TH05 stage enemy structure, finishing position independence for all big entity types. This one was quite straightfoward, as the .STD scripting system is pretty simple.

Its most interesting aspect can be found in the way timing is handled. In Windows Touhou, all .ECL script instructions come with a frame field that defines when they are executed. In TH04's and TH05's .STD scripts, on the other hand, it's up to each individual instruction to add a frame time parameter, anywhere in its parameter list. This frame time defines for how long this instruction should be repeatedly executed, before it manually advances the instruction pointer to the next one. From what I've seen so far, these instruction typically apply their effect on the first frame they run on, and then do nothing for the remaining frames.
Oh, and you can't nest the LOOP instruction, since the enemy structure only stores one single counter for the current loop iteration.

Just from the structure, the only innovation introduced by TH05 seems to have been enemy subtypes. These can be used to parametrize scripts via conditional jumps based on this value, as a first attempt at cutting down the need to duplicate entire scripts for similar enemy behavior. And thanks to TH05's favorable segment layout, this game's version of the .STD enemy script interpreter is even immediately ready for decompilation, in one single future push.

As far as I can tell, that now only leaves

until TH05's MAIN.EXE is completely position-independent.
Which, however, won't be all it needs for that 100% PI rating on the front page. And with that many false positives, it's quite easy to get lost with immediately reverse-engineering everything around them. This time, the rendering of the text dissolve circles, used for the stage and BGM title popups, caught my eye… and since the high-level code to handle all of that was near the end of a segment in both TH04 and TH05, I just decided to immediately decompile it all. Like, how hard could it possibly be? Sure, it needed another segment split, which was a bit harder due to all the existing ASM referencing code in that segment, but certainly not impossible…

Oh wait, this code depends on 9 other sets of identifiers that haven't been declared in C land before, some of which require vast reorganizations to bring them up to current consistency standards. Whoops! Good thing that this is the part of the project I'm still offering for free…
Among the referenced functions was tiles_invalidate_around(), which marks the stage background tiles within a rectangular area to be redrawn this frame. And this one must have had the hardest function signature to figure out in all of PC-98 Touhou, because it actually seems impossible. Looking at all the ways the game passes the center coordinate to this function, we have

  1. X and Y as 16-bit integer literals, merged into a single PUSH of a 32-bit immediate
  2. X and Y calculated and pushed independently from each other
  3. by-value copies of entire Point instances

Any single declaration would only lead to at most two of the three cases generating the original instructions. No way around separately declaring the function in every translation unit then, with the correct parameter list for the respective calls. That's how ZUN must have also written it.

Oh well, we would have needed to do all of this some time. At least there were quite a bit of insights to be gained from the actual decompilation, where using const references actually made it possible to turn quite a number of potentially ugly macros into wholesome inline functions.

But still, TH04 and TH05 will come out of ReC98's decompilation as one big mess. A lot of further manual decompilation and refactoring, beyond the limits of the original binary, would be needed to make these games portable to any non-PC-98, non-x86 architecture.
And yes, that includes IBM-compatible DOS – which, for some reason, a number of people see as the obvious choice for a first system to port PC-98 Touhou to. This will barely be easier. Sure, you'll save the effort of decompiling all the remaining original ASM. But even with master.lib's MASTER_DOSV setting, these games still very much rely on PC-98 hardware, with corresponding assumptions all over ZUN's code. You will need to provide abstractions for the PC-98's superimposed text mode, the gaiji, and planar 4-bit color access in general, exchanging the use of the PC-98's GRCG and EGC blitter chips with something else. At that point, you might as well port the game to one generic 640×400 framebuffer and away from the constraints of DOS, resulting in that Doom source code-like situation which made that game easily portable to every architecture to begin with. But ZUN just wasn't a John Carmack, sorry.

Or what do I know. I've never programmed for IBM-compatible DOS, but maybe ReC98's audience does include someone who is intimately familiar with IBM-compatible DOS so that the constraints aren't much of an issue for them? But even then, 16-bit Windows would make much more sense as a first porting target if you don't want to bother with that undecompilable ASM.

At least I won't have to look at TH04 and TH05 for quite a while now. :tannedcirno: The delivery delays have made it obvious that my life has become pretty busy again, probably until September. With a total of 9 TH01 pushes from monthly subscriptions now waiting in the backlog, the shop will stay closed until I've caught up with most of these. Which I'm quite hyped for!

📝 Posted:
🚚 Summary of:
P0086, P0087
Commits:
54ee99b...24b96cd, 24b96cd...97ce7b7
💰 Funded by:
[Anonymous], Blue Bolt, -Tom-
🏷 Tags:
rec98+ th02+ th04+ th05+ gameplay+ animation- hud+ blitting+ boss+ yuuka-6+

Alright, the score popup numbers shown when collecting items or defeating (mid)bosses. The second-to-last remaining big entity type in TH05… with quite some PI false positives in the memory range occupied by its data. Good thing I still got some outstanding generic RE pushes that haven't been claimed for anything more specific in over a month! These conveniently allowed me to RE most of these functions right away, the right way.

Most of the false positives were boss HP values, passed to a "boss phase end" function which sets the HP value at which the next phase should end. Stage 6 Yuuka, Mugetsu, and EX-Alice have their own copies of this function, in which they also reset certain boss-specific global variables. Since I always like to cover all varieties of such duplicated functions at once, it made sense to reverse-engineer all the involved variables while I was at it… and that's why this was exactly the right time to cover the implementation details of Stage 6 Yuuka's parasol and vanishing animations in TH04. :zunpet:

With still a bit of time left in that RE push afterwards, I could also start looking into some of the smaller functions that didn't quite fit into other pushes. The most notable one there was a simple function that aims from any point to the current player position. Which actually only became a separate function in TH05, probably since it's called 27 times in total. That's 27 places no longer being blocked from further RE progress.

WindowsTiger already did most of the work for the score popup numbers in January, which meant that I only had to review it and bring it up to ReC98's current coding styles and standards. This one turned out to be one of those rare features whose TH05 implementation is significantly less insane than the TH04 one. Both games lazily redraw only the tiles of the stage background that were drawn over in the previous frame, and try their best to minimize the amount of tiles to be redrawn in this way. For these popup numbers, this involves calculating the on-screen width, based on the exact number of digits in the point value. TH04 calculates this width every frame during the rendering function, and even resorts to setting that field through the digit iteration pointer via self-modifying code… yup. TH05, on the other hand, simply calculates the width once when spawning a new popup number, during the conversion of the point value to binary-coded decimal. The "×2" multiplier suffix being removed in TH05 certainly also helped in simplifying that feature in this game.

And that's ⅓ of TH05 reverse-engineered! Next up, one more TH05 PI push, in which the stage enemies hopefully finish all the big entity types. Maybe it will also be accompanied by another RE push? In any case, that will be the last piece of TH05 progress for quite some time. The next TH01 stretch will consist of 6 pushes at the very least, and I currently have no idea of how much time I can spend on ReC98 a month from now…

📝 Posted:
🚚 Summary of:
P0085
Commits:
110d6dd...54ee99b
💰 Funded by:
-Tom-
🏷 Tags:
rec98+ th02+ th04+ th05+ blitting+ animation- waste+ good-code+

Wait, PI for FUUIN.EXE is mainly blocked by the high score menu? That one should really be properly decompiled in a separate RE push, since it's also present in largely identical form in REIIDEN.EXE… but I currently lack the explicit funding to do that.

And as it turns out, I shouldn't really capture any of the existing generic RE contributions for it either. Back in 2018 when I ran the crowdfunding on the Touhou Patch Center Discord server, I said that generic RE contributions would never go towards TH01. No one was interested in that game back then, and as it's significantly different from all the other games, it made sense to only cover it if explicitly requested.
As Touhou Patch Center still remains one of the biggest supporters and advertisers for ReC98, someone recently believed that this rule was still in effect, despite not being mentioned anywhere on this website.

Fast forward to today, and TH01 has become the single most supported game lately, with plenty of incomplete pushes still open to be completed. Reverse-engineering it has proven to be quite efficient, yielding lots of completion percentage points per push. This, I suppose, is exactly what backers that don't give any specific priorities are mainly interested in. Therefore, I will allocate future partial contributions to TH01, whenever it makes sense.

So, instead of rushing TH01 PI, let's wait for Ember2528's April subscription, and get the 25% total RE milestone with some TH05 PI progress instead. This one primarily focused on the gather circles (spirals…?), the third-last missing entity type in TH05. These are rendered using the same 8×8 pellet sprite introduced in TH02… except that the actual pellets received a darkened bottom part in TH04 . Which, in turn, is actually rendered quite efficiently – the games first render the top white part of all pellets, followed by the bottom gray part of all pellets. The PC-98 GRCG is used throughout the process, doing its typical job of accelerating monochrome blitting, and by arranging the rendering like this, only two GRCG color changes are required to draw any number of pellets. I guess that makes it quite a worthwhile optimization? Don't ask me for specific performance numbers or even saved cycles, though :onricdennat:

Next up, one more TH05 PI push!

📝 Posted:
🚚 Summary of:
P0078, P0079
Commits:
f4eb7a8...9e52cb1, 9e52cb1...cd48aa3
💰 Funded by:
iruleatgames, -Tom-
🏷 Tags:
rec98+ th02+ th04+ th05+ gameplay+ animation- bullet+ boss+ alice+ mai+ yuki+ yumeko+ shinki+ ex-alice+ uth05win+

To finish this TH05 stretch, we've got a feature that's exclusive to TH05 for once! As the final memory management innovation in PC-98 Touhou, TH05 provides a single static (64 * 26)-byte array for storing up to 64 entities of a custom type, specific to a stage or boss portion. The game uses this array for

  1. the Stage 2 star particles,
  2. Alice's puppets,
  3. the tip of curve ("jello") bullets,
  4. Mai's snowballs and Yuki's fireballs,
  5. Yumeko's swords,
  6. and Shinki's 32×32 bullets,

which makes sense, given that only one of those will be active at any given time.

On the surface, they all appear to share the same 26-byte structure, with consistently sized fields, merely using its 5 generic fields for different purposes. Looking closer though, there actually are differences in the signedness of certain fields across the six types. uth05win chose to declare them as entirely separate structures, and given all the semantic differences (pixels vs. subpixels, regular vs. tiny master.lib sprites, …), it made sense to do the same in ReC98. It quickly turned out to be the only solution to meet my own standards of code readability.

Which blew this one up to two pushes once again… But now, modders can trivially resize any of those structures without affecting the other types within the original (64 * 26)-byte boundary, even without full position independence. While you'd still have to reduce the type-specific number of distinct entities if you made any structure larger, you could also have more entities with fewer structure members.

As for the types themselves, they're full of redundancy once again – as you might have already expected from seeing #4, #5, and #6 listed as unrelated to each other. Those could have indeed been merged into a single 32×32 bullet type, supporting all the unique properties of #4 (destructible, with optional revenge bullets), #5 (optional number of twirl animation frames before they begin to move) and #6 (delay clouds). The *_add(), *_update(), and *_render() functions of #5 and #6 could even already be completely reverse-engineered from just applying the structure onto the ASM, with the ones of #3 and #4 only needing one more RE push.

But perhaps the most interesting discovery here is in the curve bullets: TH05 only renders every second one of the 17 nodes in a curve bullet, yet hit-tests every single one of them. In practice, this is an acceptable optimization though – you only start to notice jagged edges and gaps between the fragments once their speed exceeds roughly 11 pixels per second:

And that brings us to the last 20% of TH05 position independence! But first, we'll have more cheap and fast TH01 progress.

📝 Posted:
🚚 Summary of:
P0062
Commits:
1d6fbb8...f275e04
💰 Funded by:
Touhou Patch Center
🏷 Tags:
rec98+ th04+ th05+ gameplay+ player+ shot+ position-independence+ animation-

Big gains, as expected, but not much to say about this one. With TH05 Reimu being way too easy to decompile after 📝 the shot control groundwork done in October, there was enough time to give the comprehensive PI false-positive treatment to two other sets of functions present in TH04's and TH05's OP.EXE. One of them, master.lib's super_*() functions, was used a lot in TH02, more than in any other game… I wonder how much more that game will progress without even focusing on it in particular.

Alright then! 100% PI for TH04's and TH05's OP.EXE upcoming… (Edit: Already got funding to cover this!)

📝 Posted:
🚚 Summary of:
P0061
Commits:
96684f4...5f4f5d8
💰 Funded by:
Touhou Patch Center
🏷 Tags:
rec98+ th03+ gameplay+ player+ shot+ animation-

… nope, with a game whose MAIN.EXE is still just 5% reverse-engineered and which naturally makes heavy use of structures, there's still a lot more PI groundwork to be done before RE progress can speed up to the levels that we've now reached with TH05. The good news is that this game is (now) way easier to understand: In contrast to TH04 and TH05, where we needed to work towards player shots over a two-digit number of pushes, TH03 only needed two for SPRITE16, and a half one for the playfield shaking mechanism. After that, I could even already decompile the per-frame shot update and render functions, thanks to TH03's high number of code segments. Now, even the big 128-byte player structure doesn't seem all too far off.

Then again, as TH03 shares no code with any other game, this actually was a completely average PI push. For the remaining three, we'll return to TH04 and TH05 though, which should more than make up for the slight drop in RE speed after this one.

In other news, we've now also reached peak C++, with the introduction of templates! TH03 stores movement speeds in a 4.4 fixed-point format, which is an 8-bit spin on the usual 16-bit, 12.4 fixed-point format.

📝 Posted:
🚚 Summary of:
P0057, P0058
Commits:
1cb9731...ac7540d, ac7540d...fef0299
💰 Funded by:
[Anonymous], -Tom-
🏷 Tags:
rec98+ th04+ th05+ gameplay+ item+ animation- uth05win+ meta+

So, here we have the first two pushes with an explicit focus on position independence… and they start out looking barely different from regular reverse-engineering? They even already deduplicate a bunch of item-related code, which was simple enough that it required little additional work? Because the actual work, once again, was in comparing uth05win's interpretations and naming choices with the original PC-98 code? So that we only ended up removing a handful of memory references there?

(Oh well, you can mod item drops now!)

So, continuing to interpret PI as a mere by-product of reverse-engineering might ultimately drive up the total PI cost quite a bit. But alright then, let's systematically clear out some false positives by looking at master.lib function calls instead… and suddenly we get the PI progress we were looking for, nicely spread out over all games since TH02. That kinda makes it sound like useless work, only done because it's dictated by some counting algorithm on a website. But decompilation will want to convert all of these values to decimal anyway. We're merely doing that right now, across all games.

Then again, it doesn't actually make any game more position-independent, and only proves how position-independent it already was. So I'm really wondering right now whether I should just rush actual position independence by simply identifying structures and their sizes, and not bother with members or false positives until that's done. That would certainly get the job done for TH04 and TH05 in just a few more pushes, but then leave all the proving work (and the road to 100% PI on the front page) to reverse-engineering.

I don't know. Would it be worth it to have a game that's „maybe fully position-independent“, only for there to maybe be rare edge cases where it isn't?

Or maybe, continuing to strike a balance between identifying false positives (fast) and reverse-engineering structures (slow) will continue to work out like it did now, and make us end up close to the current estimate, which was attractive enough to sell out the crowdfunding for the first time… 🤔

Please give feedback! If possible, by Friday evening UTC+1, before I start working on the next PI push, this time with a focus on TH04.

📝 Posted:
🚚 Summary of:
P0034, P0035
Commits:
6cdd229...6f1f367, 6f1f367...a533b5d
💰 Funded by:
zorg
🏷 Tags:
rec98+ th01+ th02+ th04+ th05+ animation- gameplay+ player+ bomb+

Deathbombs confirmed, in both TH04 and TH05! On the surface, it's the same 8-frame window as in most Windows games, but due to the slightly lower PC-98 frame rate of 56.4 Hz, it's actually slightly more lenient in TH04 and TH05.

The last function in front of the TH05 shot type control functions marks the player's previous position in VRAM to be redrawn. But as it turns out, "player" not only means "the player's option satellites on shot levels ≥ 2", but also "the explosion animation if you lose a life", which required reverse-engineering both things, ultimately leading to the confirmation of deathbombs.

It actually was kind of surprising that we then had reverse-engineered everything related to rendering all three things mentioned above, and could also cover the player rendering function right now. Luckily, TH05 didn't decide to also micro-optimize that function into un-decompilability; in fact, it wasn't changed at all from TH04. Unlike the one invalidation function whose decompilation would have actually been the goal here…

But now, we've finally gotten to where we wanted to… and only got 2 outstanding decompilation pushes left. Time to get the website ready for hosting an actual crowdfunding campaign, I'd say – It'll make a better impression if people can still see things being delivered after the big announcement.

📝 Posted:
🚚 Summary of:
P0051, P0052, P0053
Commits:
6ed8e60...3ba536a
💰 Funded by:
-Tom-
🏷 Tags:
rec98+ th03+ th04+ th05+ animation- micro-optimization+ hud+

Boss explosions! And… urgh, I really also had to wade through that overly complicated HUD rendering code. Even though I had to pick -Tom-'s 7th push here as well, the worst of that is still to come. TH04 and TH05 exclusively store the current and high score internally as unpacked little-endian BCD, with some pretty dense ASM code involving the venerable x86 BCD instructions to update it.

So, what's actually the goal here. Since I was given no priorities :onricdennat:, I still haven't had to (potentially) waste time researching whether we really can decompile from anywhere else inside a segment other than backwards from the end. So, the most efficient place for decompilation right now still is the end of TH05's main_01_TEXT segment. With maybe 1 or 2 more reverse-engineering commits, we'd have everything for an efficient decompilation up to sub_123AD. And that mass of code just happens to include all the shot type control functions, and makes up 3,007 instructions in total, or 12% of the entire remaining unknown code in MAIN.EXE.

So, the most reasonable thing would be to actually put some of the upcoming decompilation pushes towards reverse-engineering that missing part. I don't think that's a bad deal since it will allow us to mod TH05 shot types in C sooner, but zorg and qp might disagree :thonk:

Next up: thcrap TL notes, followed by finally finishing GhostPhanom's old ReC98 future-proofing pushes. I really don't want to decompile without a proper build system.

📝 Posted:
🚚 Summary of:
P0047, P0048
Commits:
9a2c6f7...893bd46
💰 Funded by:
-Tom-
🏷 Tags:
rec98+ th02+ th04+ th05+ gameplay+ player+ shot+ animation- waste+

So, let's continue with player shots! …eh, or maybe not directly, since they involve two other structure types in TH05, which we'd have to cover first. One of them is a different sort of sprite, and since I like me some context in my reverse-engineering, let's disable every other sprite type first to figure out what it is.

One of those other sprite types were the little sparks flying away from killed stage enemies, midbosses, and grazed bullets; easy enough to also RE right now. Turns out they use the same 8 hardcoded 8×8 sprites in TH02, TH04, and TH05. Except that it's actually 64 16×8 sprites, because ZUN wanted to pre-shift them for all 8 possible start pixels within a planar VRAM byte (rather than, like, just writing a few instructions to shift them programmatically), leading to them taking up 1,024 bytes rather than just 64.
Oh, and the thing I wanted to RE *actually* was the decay animation whenever a shot hits something. Not too complex either, especially since it's exclusive to TH05.

And since there was some time left and I actually have to pick some of the next RE places strategically to best prepare for the upcoming 17 decompilation pushes, here's two more function pointers for good measure.

📝 Posted:
🚚 Summary of:
P0040
Commits:
d7483c0...b03bc91
💰 Funded by:
-Tom-
🏷 Tags:
rec98+ th04+ th05+ pc98+ animation-

Let's start this stretch with a pretty simple entity type, the growing and shrinking circles shown during bomb animations and around bosses in TH04 and TH05. Which can be drawn in varying colors… wait, what's all this inlined and duplicated GRCG mode and color setting code? Let's move that out into macros too, it takes up too much space when grepping for constants, and will raise a "wait, what was that I/O port doing again" question for most people reading the code again after a few months.

🎉 With this push, we've also hit a milestone! Less than 200,000 unknown x86 instructions remain until we've completely reverse-engineered all of PC-98 Touhou.