P0304
TH02 RE (Stage / (mid)boss variables) + Decompilation (Bullets, part 1/2)
P0305
TH02 decompilation (Bullets, part 2/2 + Sparks, part 1/2)
P0306
TH02 decompilation (Player, part 1/2: Update/render functions + Miss animation) + Random TH04/TH05 finalization
💰 Funded by:
Yanga, iruleatgames, nrook, [Anonymous]
Sometimes, the gameplay community will come up with the most outlandish theories before they even begin to consider the idea that certain safespots might not be intentional and only work by accident to begin with. Want more details? Read on…
So, TH02's bullet system! At a high level, it marks an interesting transitional point: It's still very much based on TH01's design with its predefined static or aimed spreads, but also introduces a few features that would later return in TH04 and TH05. By transplanting the TH01 system into a double-buffered environment, ZUN eliminated the 📝 worst 📝 unblitting-related parts that plagued TH01, ending up with the simplest and cleanest implementation of bullets I've seen so far. That's not to say it's good-code – far from it – but it also hasn't reached the messy levels that TH04 and especially TH05 would bring later. Of course, there's still TH03's system left to be done until I can say for sure, but TH02's is a pretty strong contender.
The more detailed overview of the system:
TH02 introduces the distinction between the white 8×8 pellets and the 16×16 sprite bullets that TH04 and TH05 would later expand upon.
The game has a single cap of 150 that is shared among both 8×8 and 16×16 bullets, unlike TH04 and TH05 where the cap is split for optimization reasons.
In 封魔録.TXT, ZUN claims that TH02 could even compete with DoDonPachi in terms of bullet amounts:
怒首領蜂もびっくりな判定の小ささ、弾の量。
Can it really, though? DoDonPachi spawns decidedly more bullets than TH02 throughout all of the game, and this pattern definitely exceeds 150 bullets. Hence, we can immediately debunk this claim as marketing hyperbole rather than a factual statement about the game. It would be nice to have a specific bullet cap number for DoDonPachi as well, but I can't find a decompilation project or annotated disassembly. Nor for any other CAVE game either, for that matter… 👀
TH01's decay and delay cloud effects were removed for TH02. Slightly unfortunate as it leaves bullets completely without any sprite effect, but hey, less code surface to mess up!
All bullets lose 0.625 pixels of per-frame speed on Easy and gain an extra 0.75 pixels of per-frame speed on Lunatic. Each bullet's speed is then clamped to a minimum of 1 pixel per frame; on Easy, the game also filters every second bullet that would have been slower. This mechanism mainly kicks in with the blob enemies at minimum rank during Stage 4.
TH02 sticks with the fixed 2-, 3-, 4-, and 5-way spreads that TH01 introduced, but adds a third delta angle variant on top of TH01's two "narrow" and "wide" ones. 2-spreads even get a fourth "ultrawide" angle, which Evil Eye Σ uses in the pellet corridor pattern during its last phase.
TH02 also adds predefined 4-, 8-, 16-, and 32-ring groups, all of which are used by bosses.
The game does not yet offer predefined stack groups, but has an auto-stacking system that automatically turns every spawned group into a potential 2-stack on Hard and Lunatic. This system forms the main way in which these difficulties differ from the easier ones, and is exactly why going from Normal to Hard roughly doubles the number of bullets fired. On Hard, the second bullet in each stack moves at half the speed of the primary bullet, while Lunatic adds another 0.5 pixels per frame onto that halved speed.
The game also has a function to apply a further multiplier on top of the difficulty-specific stack count, but only uses it to temporarily disable stacking during three patterns, one of them used by the Five Magic Stones and two of them used by Mima.
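The difficulty-based speed tuning and the auto-stacking speeds described above boil down to a few lines of fixed-point arithmetic. Here's a minimal sketch, assuming Q12.4 subpixel speeds (16 units per pixel). All names (`tune_speed()`, `stack_speed()`, `to_sp()`) are hypothetical and not taken from the game's code, and it's also just an assumption that the stacked bullet's speed is derived from the already tuned primary speed:

```cpp
#include <algorithm>
#include <cstdint>

// Q12.4 subpixels: 16 units correspond to 1 pixel per frame.
using subpixel_t = int16_t;

constexpr subpixel_t to_sp(double pixels)
{
    return static_cast<subpixel_t>(pixels * 16);
}

enum class Difficulty { Easy, Normal, Hard, Lunatic };

// Applies the difficulty-specific speed delta, then clamps to the minimum
// speed of 1 pixel per frame.
subpixel_t tune_speed(subpixel_t speed, Difficulty difficulty)
{
    if(difficulty == Difficulty::Easy) {
        speed = static_cast<subpixel_t>(speed - to_sp(0.625));
    } else if(difficulty == Difficulty::Lunatic) {
        speed = static_cast<subpixel_t>(speed + to_sp(0.75));
    }
    return std::max(speed, to_sp(1.0));
}

// Speed of the second bullet in an auto-generated 2-stack. Returns 0 on
// difficulties without auto-stacking.
subpixel_t stack_speed(subpixel_t primary, Difficulty difficulty)
{
    switch(difficulty) {
    case Difficulty::Hard:
        return static_cast<subpixel_t>(primary / 2);
    case Difficulty::Lunatic:
        return static_cast<subpixel_t>((primary / 2) + to_sp(0.5));
    default:
        return 0;
    }
}
```

With these assumptions, a base speed of 2.5 pixels per frame (40 subpixels) would turn into 30 subpixels on Easy and 52 on Lunatic, with the Lunatic stack bullet moving at (52 / 2) + 8 = 34 subpixels.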
Just like all other games, TH02 offers a variety of special bullet motion types. For some reason, ZUN limited these to single 16×16 bullets in TH02; they are not supported for either 8×8 pellets or any of the multi-pellet groups. There is no technical reason for this, so ZUN likely did this as a deliberate game design choice. The upside is that you as a player can be certain that every 8×8 pellet moves in a straight line, which may or may not help reading patterns.
Chase bullets adjust their X/Y velocity by a configurable amount on every frame relative to the player's location. These are exclusively used by the 呪 bullets fired by the Stage 2 midboss.
Homing bullets work in a very similar way, re-aiming at the player more properly for a customizable number of frames after a bullet was spawned. These are completely unused.
Decelerating bullets reduce their speed to 0 by halving their velocity every 8 frames, and then turn and repeat this process a fixed number of times. In TH02, this movement type is only used in a symmetric green-ball pattern used by the eastern and western Magic Stones, but it would become really popular later on, showing up in 6 of TH04's midboss and/or boss patterns and 9 of TH05's.
Gravity bullets add a customizable acceleration factor to their Y position on every frame. Another movement type exclusive to a single green-ball pattern by the northern Magic Stone, and interestingly special-cased to bypass any difficulty- or rank-based speed tuning.
Drift bullets either add a remote-controlled angle and speed delta value to a bullet's angle and speed on every frame, or use that remote-controlled angle to chase toward the player using the same algorithm as the 呪 bullets. These two types are criminally underutilized and could have created some widely inventive patterns that you wouldn't have expected out of the first PC-98 Touhou shmup. Instead, they're only used for two of Marisa's rotating star patterns.
And finally, of course, we have bullets that bounce and flip their direction near the edge of the playfield. In this game, the bounce edges actually lie 8 pixels inside the playfield. The velocity flip only happens on the frame in which a bullet enters the red bounce margin zone. So, faster bullets might still travel a good deal toward the actual edge of the playfield before getting flipped.
This type is not only used by Meira's and Evil Eye Σ's red and purple billiard ball bullets, but also by some star bullet patterns during the Mima fight.
Pellet rendering is batched! For the first time, ZUN preserves the GRCG state for successively blitted pellets, avoiding the extra >168 cycles per pellet that master.lib's grcg_setcolor() and grcg_off() would cost on a 486. The caveat, however, lies in the words successively blitted. Without an architectural split between pellets and sprite bullets, the rendering code ends up looking like this:
While this definitely is suboptimal once you start mixing the two size types, it's not too bad in context. The actual bullet scripts in TH02 mostly stick to one of the two sprite types, and once the script switches from one to the other, the old and new bullets will occupy mostly contiguous areas of the bullet array anyway. The game doesn't actually mix 8×8 and 16×16 bullets within the same pattern until literally the last pattern of Mima's second form.
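To get a feeling for why contiguous runs matter, here's a small model of the batching idea. It is not ZUN's rendering code, just a sketch under the assumption that the GRCG is only switched on when a pellet follows a non-pellet and switched off when a sprite bullet follows a pellet. The function and type names are hypothetical, and the sketch merely counts how often master.lib's grcg_setcolor() and grcg_off() would have to be called for a given bullet array order:

```cpp
#include <cstddef>
#include <vector>

enum class BulletSize { Pellet8x8, Sprite16x16 };

struct GrcgCost {
    std::size_t setcolor_calls = 0;
    std::size_t off_calls = 0;
};

// Walks the bullet array in order and only toggles the simulated GRCG state
// when the sprite type changes between two successive bullets.
GrcgCost count_grcg_toggles(const std::vector<BulletSize>& bullets)
{
    GrcgCost cost;
    bool grcg_active = false;
    for(const BulletSize size : bullets) {
        if(size == BulletSize::Pellet8x8) {
            if(!grcg_active) {
                ++cost.setcolor_calls;
                grcg_active = true;
            }
            // (a real renderer would blit the pellet through the GRCG here)
        } else {
            if(grcg_active) {
                ++cost.off_calls;
                grcg_active = false;
            }
            // (a real renderer would blit the 16×16 sprite here)
        }
    }
    // Leave the GRCG disabled for whatever gets rendered next.
    if(grcg_active) {
        ++cost.off_calls;
    }
    return cost;
}
```

In this model, an array of three pellets followed by two sprite bullets costs a single on/off pair, while strictly alternating pellets and sprite bullets cost one pair per pellet, which is the unbatched worst case.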
The four other ZUN quirks in the system are all related to clipping and aim point calculations. ZUN tries very hard to use constants that are supposed to work for both 8×8 and 16×16 bullets, but they never perfectly fit either of the two.
To find out where all these bullet types are used, I of course had to label all the individual pattern functions and assign them to their (mid)boss owners. As a side effect, we now also know the preferred boss decompilation order for this game!
Marisa
Mima
Evil Eye Σ
Meira
Rika
5 Magic Stones
Quite a satisfying order, if I may say so myself – burning off the big fireworks right in the beginning, getting slightly more unexciting later on, but then ending on arguably the best Touhou character ever conceived.
Each of these decompilations will be preceded by the stage's respective midboss. This includes the Extra Stage – you might not think that this stage has a midboss, but it technically does, in the form of this combination of patterns:
Lasting exactly these 420 frames.
There's nothing in TH02's code that mandates midbosses to have sprite-like entities or even something like an HP bar. Instead, the code-level definition of a midboss is all about these properties:
It assigns control functions to the same function pointers that the other stages use for their midbosses.
These functions are activated at a fixed, specific point throughout the stage.
Regular stage enemy spawns are deactivated until these control functions signal completion.
If a pattern manipulates stage tiles, it can only be part of a boss or midboss with custom C code, as this is not supported for regular stage enemy scripts.
Stage 5, on the other hand, indeed doesn't have anything that can be interpreted as a midboss.
Finally, and probably most importantly, hitboxes! The raw decompilation of TH02's bullet collision detection code looks like this:
However, if you aren't deeply familiar with the sizes of all involved sprites, these top-left positions slightly obscure the actual position of the hitbox. That top-left point might also not be where you think it is:
It's the red point.
So let's transform these checks to a more useful comparison of the respective center points against each other, and also fix that inconsistency of the right coordinates being compared with < instead of <= like the other values:
Now also revealing the horizontal asymmetry that ZUN's code was sneakily hiding.
TH02 has only 5 different bullet shapes and no directional or vector bullets, so we can exactly visualize all of them:
📝 As 📝 usual, a bullet sprite has to be fully surrounded by the blue box for a hit to be registered.
Yup. Quite asymmetric indeed, and probably surprising no one.
While experimenting with the various hardcoded group types, I stumbled over a quite surprising quirk that you might have already noticed in the spread showcase video further above. For some reason, none of these spreads are perfectly symmetric, what the…?
By the time the bullets have reached the bottom of the playfield, the inaccuracy has compounded so much that the right lane ends up 6 pixels closer to the player's center position than the left lane. Depending on which of the two lanes actually gets the correct angle, this either means that the left lane is moving too far (2️⃣) or that the right lane is not moving far enough (3️⃣).
This is very weird because the angles that go into the velocity calculations are demonstrably correct. You'd therefore get this asymmetry for not only the hardcoded spreads, but also for code that does its own angle calculations and spawns each bullet manually. It's not something that can arise from the other known issue of 📝 Q12.4 quantization either, because that would affect all parts of a pattern equally.
Instead, the inaccuracy originates in the conversion from the polar coordinates of angles and speeds into the per-frame X/Y pixel velocities that the game uses for actual movement. The integer math algorithm that ZUN uses here is pretty much the single most fundamental piece of code shared by all 5 games:
// Using 📝 typical 8-bit angles.
int16_t polar_x(int16_t center, int16_t radius, uint8_t angle)
{
    // Ensure that the multiplication below doesn't overflow
    int32_t radius32 = radius;
    // Get the cosine value from master.lib's lookup table, which scales the
    // real-number range of [-1; +1] to the integer range of [-256; +256].
    int16_t cosine = CosTable8[angle];
    // The multiplication will include master.lib's 256× scaling factor, so
    // divide the result to bring it within the intended radius.
    return (((radius32 * cosine) >> 8) + center);
}
This exact algorithm is even recommended in the master.lib manual.
The pattern above uses TH02's medium delta angle for 2-spreads and moves at a Q12.4 subpixel speed of 2.5, which corresponds to a radius of 40 in the context of polar coordinate calculation. Let's step through it:
| Angle | Cosine | Multiplied | In hex | Shift result | In decimal | In Q12.4 |
|---|---|---|---|---|---|---|
| (0x40 - 6) | 38 | 1520 | 000005F0 | 00000005 | 5 | 0.3125 |
| (0x40 + 6) | -38 | -1520 | FFFFFA10 | FFFFFFFA | -6 | -0.3750 |
Whoa, talk about getting a basic lesson about how computers work! PC-98 Touhou has just taught us that signedness-preserving arithmetic bitshifts are not equivalent to the apparently corresponding division by a power of two, because the typical two's complement representation of negative numbers causes the result to effectively get rounded away from zero rather than toward zero like the corresponding positive value. In our example, this means that the right lane is correct and moves at the angle we passed in, while the left lane moves 1/16 pixels per frame further to the left than intended. Since we're talking about the most basic piece of trigonometry code here, this inaccuracy also applies to every other entity in PC-98 Touhou that moves left relative to its origin point – and/or up, because Y coordinates are calculated analogously. Imagine that… it's been 10 years since I decompiled the first variant of this function, and I'm only now noticing how fundamentally broken it is.
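If you want to verify the numbers for yourself, the scaling step is easy to isolate. The following sketch uses the radius of 40 and the cosine values of ±38 from the walkthrough above. The two helper names are hypothetical. Note that right-shifting a negative signed integer is only guaranteed to be an arithmetic shift since C++20; on earlier standards, this is implementation-defined, although the common compilers behave that way:

```cpp
#include <cstdint>

// The scaling step as done via bitshift: rounds toward negative infinity.
constexpr int32_t scale_shift(int32_t radius, int32_t cosine)
{
    return ((radius * cosine) >> 8);
}

// The same step as a regular division: rounds toward zero.
constexpr int32_t scale_div(int32_t radius, int32_t cosine)
{
    return ((radius * cosine) / 256);
}

// Right lane: 1520 >> 8 = 5, or 0.3125 pixels in Q12.4.
static_assert(scale_shift(40, 38) == 5);
static_assert(scale_div(40, 38) == 5);

// Left lane: -1520 >> 8 = -6, but -1520 / 256 = -5.
static_assert(scale_shift(40, -38) == -6);
static_assert(scale_div(40, -38) == -5);
```

The two variants only disagree for negative products that aren't exact multiples of 256, which is exactly the left-moving lane in the pattern above.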
It's understandable why master.lib's manual recommends bitshifts instead of the more correct division here. On a 486, a single 32-bit IDIV takes a whopping >33 cycles, and it would have been even slower on the 286 systems that master.lib is geared toward. But there's no need to go that far: By simply rounding up negative numbers, we can emulate the rounding behavior of regular division while still using a bitshift:
  int16_t polar_x(int16_t center, int16_t radius, uint8_t angle)
  {
      int32_t ret = (static_cast<int32_t>(radius) * CosTable8[angle]);
+     if(ret < 0) {
+         // Round the multiplication result so that the shift below will yield a number
+         // that's 1 closer to 0, thus rounding toward zero rather than away from zero as
+         // bitshifts with negative numbers would usually do. This ensures that we return
+         // the same absolute value after the bitshift that we would return if [ret] were
+         // positive, thus repairing certain broken symmetries in PC-98 Touhou.
+         ret += 255;
+     }
      return ((ret >> 8) + center);
  }
You could also do this in a branchless way, which is coincidentally very close to what current Clang would generate if you just wrote a regular division by 256. This branchless way does seem slightly slower on a 486 though, as it adds a constant >8 cycles worth of instructions. The branching implementation only adds >4 cycles for positive numbers and >3 for negative ones.
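For reference, one common way of writing that branchless variant looks like the sketch below. This is just an illustration of the general technique with a hypothetical function name, not necessarily the exact instruction sequence a specific compiler emits. Like above, it assumes that `>>` on negative signed integers is an arithmetic shift:

```cpp
#include <cstdint>

// Divides [v] by 256 with rounding toward zero, using only shifts, a mask, and
// an addition.
constexpr int32_t shr8_toward_zero(int32_t v)
{
    // (v >> 31) is all 1 bits for negative values and 0 otherwise, so the mask
    // yields a bias of 255 for negative inputs and 0 for everything else.
    const int32_t bias = ((v >> 31) & 255);
    return ((v + bias) >> 8);
}

// Matches regular division for the values from the table above…
static_assert(shr8_toward_zero(1520) == (1520 / 256));
static_assert(shr8_toward_zero(-1520) == (-1520 / 256));

// …as well as for exact multiples and values right next to them.
static_assert(shr8_toward_zero(-256) == -1);
static_assert(shr8_toward_zero(-257) == -1);
static_assert(shr8_toward_zero(-1) == 0);
```

The bias can never overflow here because it's only added to negative values.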
But that would be deep quirk-fixing territory. uth05win just uses floating-point math for this transformation, exchanging master.lib's 8-bit lookup tables for the C library's regular sin() and cos() functions, but bypassing the issue like this also forms the single biggest source of porting inaccuracy. Can't really win here… 🤷
Now it will be interesting to see whether ZUN worked around this inaccuracy in certain places by using slightly lower left- or up-pointing angles…
Alright, but aren't we still missing the single biggest quirk about bullets in TH02? What's with Reimu's hitbox misaligning when dying? I can't release a blog post about TH02's bullet system without solving the single most infamous bullet-related mystery that this game has to offer. So, time to start a third push for looking at all the player movement, rendering, and death sequence code…
If you remember the code above, there is no way that a hitbox defined using hardcoded numbers can ever shift in response to anything. Any so-called hitbox misalignment would therefore be a player position misalignment, which sounds even harder to believe. And sure enough, after decompiling all of it, there's nothing of that sort to be found in the player code either.
If we take player position misalignment literally, we're only left with one other place where it could possibly somehow come from: the strange vertical shaking you can observe right in the first few frames of most stages. So let's visualize the hitbox and… nope, the shaking is purely a scrolling bug, nothing about it changes the internal player position used for collision detection.
So, uh, what are people even talking about? It doesn't help that no one cites any source for this claim and just presents it as a natural and seemingly self-evident fact, as if it was the most obvious and most easily verified property about the game.
Thankfully though, there have been two relatively recent videos about the issue, but both of them only showcase the supposed hitbox shifting in relation to a specific safespot at the end of the Extra Stage midboss. So is that what's been going on here? The community taking the game's behavior in just a single instance of collision detection within a single stage, and extending it to a general claim about the game as a whole?
But indeed, the described behavior cleanly reproduces every time. Enter the spot with 2 remaining lives and you survive, but enter with 1 remaining life and you die:
Whatever this is about, it's not due to a difference in hitboxes because Reimu's position demonstrably stays identical. But if we switch between these two videos, we can easily spot that it's the patterns that are different! With 1 life left, the pattern moves at an ever so slightly slower speed, which apparently adds up to a life-or-death difference at that specific spot.
And that's what the supposed hitbox shifting ultimately boils down to: The natural impact of rank on patterns, adjusting bullet speed with a factor of ((playperf + 48) / 48) times 1/16 pixels. And nothing else.
Let's visualize the hitbox and also track one of the bullets:
If we look at the respective frames in the playperf = +2 case, we see that the bullet misses the hitbox by either one or two pixels on three successive frames:
That's not a safespot, that's Reimu barely surviving only thanks to rounding.
So, for once, this is not a quirk, and doesn't even qualify as a "funny ZUN code moment" if you ask me. This is the game working exactly as designed, and it's the players who are instead making wild assumptions about safespots that only hold when the rank system plugs very specific numbers into the game's fixed-point math.
If anything, you could make the stronger case that this safespot should not work under any circumstance. If the game tested the whole parallelogram covered by a bullet's trajectory between two successive frames instead of just looking at a bullet's current position, it would consistently detect this collision regardless of rank. But even the later games don't go to these lengths.
By testing with parallelograms, the game would not only look at the distinct bullet positions in green, but also detect that the bullet traveled through the position highlighted in cyan, which does lie fully within the hitbox.
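To illustrate the difference between testing discrete positions and testing the path in between, here's a heavily simplified sketch. It is purely hypothetical and has nothing to do with the actual code of any of the games: Instead of a real parallelogram test against a bullet's full sprite area, it reduces the bullet to a single point and samples the straight line between its previous and current position in steps of at most one unit. All names are made up for this example:

```cpp
#include <algorithm>
#include <cstdlib>

struct Point {
    int x, y;
};

// Axis-aligned box with inclusive edges.
struct Box {
    int left, top, right, bottom;
};

constexpr bool contains(const Box& box, Point p)
{
    return (
        (p.x >= box.left) && (p.x <= box.right) &&
        (p.y >= box.top) && (p.y <= box.bottom)
    );
}

// Tests every sampled position on the line from [prev] to [cur], including
// both end points, instead of only looking at [cur].
bool swept_hit(const Box& box, Point prev, Point cur)
{
    const int dx = (cur.x - prev.x);
    const int dy = (cur.y - prev.y);
    const int steps = std::max({ std::abs(dx), std::abs(dy), 1 });
    for(int i = 0; i <= steps; i++) {
        const Point p = {
            (prev.x + ((dx * i) / steps)),
            (prev.y + ((dy * i) / steps)),
        };
        if(contains(box, p)) {
            return true;
        }
    }
    return false;
}
```

A point that moves from (0, 0) to (10, 0) would jump right over a box spanning X coordinates 4 to 6 if only its two end points were tested, but `swept_hit()` would still report the collision.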
Amusingly, if you die twice before this pattern and reach a rank of -2, bullet speed drops enough for the safespot to work again:
It's even the same bullet that fails to hit Reimu, although coming in 5 frames later.
If you're now sad because you liked the idea of ZUN deliberately putting hitbox-shifting code into the game, you don't have to be! You might have already noticed it in the 1-life videos above, but TH02 does have one funny but inconsequential instance of death-induced player position shifting. In the 19 frames between the end of the animation and Reimu respawning at the bottom of the playfield, ZUN just adds 4 pixels to Reimu's Y position. You don't really notice it because the game doesn't render Reimu's sprite during these frames, but this modified position still partakes in collision detection, causing bullets to be removed accordingly.
Hilariously, ZUN was well aware that this shift could move the player's Y position beyond the bottom of the playfield, and thus cause sparks to be spawned at Y coordinates larger than 400. So he just… wrapped these spark spawn coordinates back into the visible range of VRAM, thus moving them to the top of the playfield…
The off-center spawn point of these sparks was the only actual bug in this delivery, by the way.
To round out the third push, I took some of the Anything budget towards finalizing random bits of previously RE'd TH04 and TH05 code that wouldn't add anything more to this blog post. These posts aren't really meant to be a reference – that's the job of the code, the actual primary source of the facts discussed here – but people have still started to use them as such. So it makes sense to try focusing them a bit more in the future, and not bundle all too many topics into a single one.
This finalization work was mostly centered on some tile rendering and .STD file loading boilerplate, but it also covered some of TH05's unfortunately undecompilable HUD number display code. The irony is that it's actually quite good ASM code that makes smart register choices and uses secondary side effects of certain instructions in a way that's clever but not overly incomprehensible. Too bad that these optimizations have no right to exist in logic code that is called way less than once per frame…
Next up: An unexpected quick return to the Shuusou Gyoku Linux port, as Arch Linux is bullying us onto SDL 3 faster than I would have liked.
P0203
TH01 decompilation (Card-flipping stages, part 3/4: Bumpers and turrets)
P0204
TH01 decompilation (Card-flipping stages, part 4/4: Portals + Bomb animation)
💰 Funded by:
GhostRiderCog, [Anonymous], Yanga
Let's start right with the milestones:
More than 50% of all PC-98 Touhou game code has now been
reverse-engineered! 🎉 While this number isn't equally distributed among the
games, we've got one game very close to 100% and reverse-engineered most of
the core features of two others. During the last 32 months of continuous
funding, I've averaged an overall speed of 1.11% total RE per month. That
looks like a decent prediction of how much more time it will take for 100%
across all games – unless, of course, I'd get to work towards some of the
non-RE goals in the meantime.
70 functions left in TH01, with less than 10,000 ASM instructions
remaining! Due to immense hype, I've temporarily raised the cap by 50% until
August 15. With the last TH01 pushes delivering at roughly 1.5× of the
currently calculated average speed, that should be more than enough to get
TH01 done – especially since I expect YuugenMagan to come with lots of
redundant code. Therefore, please also request a secondary priority for
these final TH01 RE contributions.
So, how did this card-flipping stage obstacle delivery get so horribly
delayed? With all the different layouts showcased in the 28 card-flipping
stages, you'd expect this to be among the more stable and bug-free parts of
the codebase. Heck, with all stage objects being placed on a 32×32-pixel
grid, this is the first TH01-related blog post this year that doesn't have
to describe an alignment-related unblitting glitch!
That alone doesn't mean that this code is free from quirky behavior though,
and we have to look no further than the first few lines of the collision
handling for round bumpers to already find a whole lot of that. Simplified,
they do the following:
Immediately, you wonder why these assignments only exist for the Y
coordinate. Sure, hitting a bumper from the left or right side should happen
less often, but it's definitely possible. Is it really a good idea to warp
the Orb to the top or bottom edge of a bumper regardless?
What's more important though: The fact that these immediate assignments
exist at all. The game's regular Orb physics work by producing a Y velocity
from the single force acting on the Orb and a gravity factor, and are
completely independent of its current Y position. A bumper collision does
also apply a new force onto the Orb further down in the code, but these
assignments still bypass the physics system and are bound to have
some knock-on effect on the Orb's movement.
To observe that effect, we just have to enter Stage 18 on the 地獄/Jigoku route, where it's particularly trivial to
reproduce. At a 📝 horizontal velocity of ±4,
these assignments are exactly what can cause the Orb to endlessly
bounce between two bumpers. As rudimentary as the Orb's physics may be, just
letting them do their work would have entirely prevented these loops:
One of at least three infinite bumper loop constellations within just
this 10×5-tile section of TH01's Stage 18 on the 地獄/Jigoku route. With an effective 56 horizontal
pixels between both hitboxes, the Orb would have to travel an absolute
Y distance of at least 16 vertical pixels within
(56 / 4) = 14 frames to escape the
other bumper's hitbox. If the initial bounce reduces the Orb's Y
velocity far enough for it to not manage that distance the first time,
it will never reach the necessary speed again. In this loop, the
bounce-off force even stabilizes, though this doesn't have to happen.
The blue areas indicate the pixel-perfect* hitboxes of each bumper.
TH01 bumper collision handling without ZUN's manual assignment of the Y
coordinate. The Orb still bounces back and forth between two bumpers
for a while, but its top position always follows naturally
from its Y velocity and the force applied to it, and gravity wins out
in the end. The blue areas indicate the pixel-perfect* hitboxes of each bumper.
Now, you might be thinking that these Y assignments were just an attempt to
prevent the Orb from colliding with the same bumper again on the next frame.
After all, those 24 pixels exactly correspond to ⅓ of the height of a
bumper's hitbox with an additional pixel added on top. However, the game
already perfectly prevents repeated collisions by turning off collision
testing with the same bumper for the next 7 frames after a collision. Thus,
we can conclude that ZUN either explicitly coded bumper collision handling
to facilitate these loops, or just didn't take out that code after
inevitably discovering what it did. This is not janky code, it's not a
glitch, it's not sarcasm from my end, and it's not the game's physics being
bad.
But wait. Couldn't these assignments just be a remnant from a time in
development before ZUN decided on the 7-frame delay on further
collisions? Well, even that explanation stops holding water after the next
few lines of code. Simplified, again:
What's important here is the part that's not in the code – namely,
anything that handles X velocities of -8 or +8. In those cases, the Orb
simply continues in the same horizontal direction. The manual Y assignment
is the only part of the code that actually prevents a collision there, as
the newly applied force is not guaranteed to be enough:
An infinite loop across three bumpers, made possible by the edge of the
playfield and bumper bars on opposite sides, an unchanged horizontal
direction, and the Y assignments neatly placing the Orb on either the
top or bottom side of a bumper. The alternating sign of the force
further ensures that the Orb will travel upwards half the time,
canceling out gravity during the short time between two hitboxes.
With the unchanged horizontal direction and the Y assignments removed,
nothing keeps an Orb at ±8 pixels per frame from flying into/over a
bumper. The collision force pushes the Orb slightly, but not enough to
truly matter. The final force sends the Orb on a significant downward
trajectory beyond the next bumper's hitbox, breaking the original loop.
Forgetting to handle ⅖ of your discrete X velocity cases is simply not
something you do by accident. So we might as well say that ZUN deliberately
designed the game to behave exactly as it does in this regard.
Bumpers also come in vertical or horizontal bar shapes. Their collision
handling also turns off further collision testing for the next 7 frames, and
doesn't do any manual coordinate assignment. That's definitely a step up in
cleanliness from round bumpers, but it doesn't seem to keep in mind that the
player can fire a new shot every 4 frames when standing still. That makes it
immediately obvious why this works:
The green numbers show the amount of
frames since the last detected collision with the respective bumper bar,
and indicate that collision testing with the bar below is currently
disabled.
That's the most well-known case of reducing the Orb's horizontal velocity to
0 by exactly hitting it with shots in its center and then button-mashing it
through a horizontal bar. This also works with vertical bars and yields even
more interesting results there, but if we want to have any chance of
understanding what happens there, we have to first go over some basics:
Collision detection for all stage obstacles is done in row-major
order from the top-left to the bottom-right corner of the
playfield.
All obstacles are collision-tested independently from each other, with
the collision response code immediately following the test.
The hitboxes for bumper bars extend far past their 32×32 sprites to make
sure that the Orb can collide with them from any side. They are a
pixel-perfect* 87×56 pixels for horizontal bars, and 57×87 pixels for
vertical ones. Yes, that's no typo, they really do differ in one pixel.
Changing the Y velocity during such a collision just involves applying a
new force with the magnitude of the negated current Y velocity, which can be
done multiple times during a frame without changing the result. This
explains why the force is correctly inverted in the clip above, despite the
Orb colliding with two bumpers simultaneously.
Lacking a similar force system, the X velocity is simply directly
inverted.
However, if that were everything the game did, kicking the Orb into a column
of vertical bumper bars would lead them to behave more like a rope that the
Orb can climb, as the initial collision with two hitboxes cancels out the
intended sign change that reflects the Orb away from the bars:
This footage was recorded without the workaround I am about to describe.
It does not reflect the behavior of the original game. You
cannot do this in the original game.
While the visualization reveals small sections where three hitboxes
overlap, the Orb can never actually collide with three of them at the
same time, as those 3-hitbox regions are 2 pixels smaller than they
would need to be to fit the Orb. That's exactly the difference between
using < rather than <= in these hitbox
comparisons.
While that would have been a fun gameplay mechanic on its own, it
immediately breaks apart once you place two vertical bumper bars next to
each other. Due to how these bumper bar hitboxes extend past their sprites,
any two adjacent vertical bars will end up with the exact same hitbox in
absolute screen coordinates. Stage 17 on the
魔界/Makai route contains exactly such a layout:
The collision handlers of adjacent vertical bars always activate in the
same frame, independently invert the Orb's X velocity, and therefore
fully cancel out their intended effect on the Orb… if the game did not
have the workaround I am about to describe. This cannot happen
in the original game.
ZUN's workaround: Setting a "vertical bumper bar block flag" after any
collision with such a bar, which simply disables any collision with
any vertical bar for the next 7 frames. This quick hack made all
vertical bars work as intended, and avoided the need for involving the Orb's
X velocity in any kind of physics system.
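The effect of that flag on a single frame can be modeled in a few lines. The following is a minimal sketch with hypothetical names, not the game's actual code. It assumes a nonzero X velocity and that every colliding vertical bar responds by inverting it, and it leaves out the 7-frame countdown and only looks at the one frame in which the collisions happen:

```cpp
// Runs the collision responses of [bars_hit] vertical bumper bars whose
// hitboxes all contain the Orb on the same frame, and returns the resulting X
// velocity.
int velocity_x_after_frame(int velocity_x, int bars_hit, bool use_block_flag)
{
    bool vertical_bars_blocked = false;
    for(int i = 0; i < bars_hit; i++) {
        if(use_block_flag && vertical_bars_blocked) {
            // Any further vertical bar is ignored once one of them responded.
            continue;
        }
        velocity_x = -velocity_x;
        vertical_bars_blocked = true;
    }
    return velocity_x;
}
```

Without the flag, two bars with identical hitboxes turn an X velocity of +4 into -4 and then right back into +4. With the flag, only the first bar gets to respond, and the Orb leaves the frame at -4.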
Edit (2022-07-12): This flag only works around glitches
that would be caused by simultaneously colliding with more than one vertical
bar. The actual response to a bumper bar collision still remains unaffected,
and is very naive:
Horizontal bars always invert the Orb's Y velocity
Vertical bars invert either the Y or X velocity depending on whether
the Orb's current X velocity is 0 (Y) or not (X)
These conditions are only correct if the Orb comes in at an angle roughly
between 45° and 135° on either side of a bar. If it's anywhere close to 0°
or 180°, this response will be incorrect, and send the Orb straight
through the bar. Since the large hitboxes make this easily possible, you can
still get the Orb to climb a vertical column, or glide along a horizontal
row:
Here's the hitbox overlay for
地獄/Jigoku Stage 19, and here's an updated
version of the 📝 Orb physics debug mod that
now also shows bumper bar collision frame numbers:
2022-07-10-TH01OrbPhysicsDebug.zip
See the th01_orb_debug
branch for the code. To use it, simply replace REIIDEN.EXE, and
run the game in debug mode, via game d on the DOS prompt. If you
encounter a gameplay situation that doesn't seem to be covered by this blog
post, you can now verify it for yourself. Thanks to touhou-memories for bringing these
issues to my attention! That definitely was a glaring omission from the
initial version of this blog post.
With that clarified, we can now try mashing the Orb into these two vertical
bars:
At first, that workaround doesn't seem to make a difference here. As we
expect, the frame numbers now tell us that only one of the two bumper bars
in a row activates, but we couldn't have told otherwise as the number of
bars has no effect on newly applied Y velocity forces. On a closer look, the
Orb's rise to the top of the playfield is in fact caused by that
workaround though, combined with the unchanged top-to-bottom order of
collision testing. As soon as any bumper bar completed its 7
collision delay frames, it resets the aforementioned flag, which already
reactivates collision handling for any remaining vertical bumper bars during
the same frame. Look out for frames with both a 7 and a 1, like the one marked in the video above:
The 7 will always appear before
the 1 in the row-major order. Whenever
this happens, the current oscillation period is cut down from 7 to 6
frames – and because collision testing runs from top to bottom, this will
always happen during the falling part. Depending on the Y velocity, the
rising part may also be cut down to 6 frames from time to time, but that one
at least has a chance to last for the full 7 frames. This difference
adds those crucial extra frames of upward movement, which add up to send the
Orb to the top. Without the flag, you'd always see the Orb oscillating
between a fixed range of the bar column.
Finally, it's the "top of playfield" force that gradually slows down the Orb
and makes sure it ultimately only moves at sub-pixel velocities, which have
no visible effect. Because
📝 the regular effect of gravity is reset with
each newly applied force, it's completely negated during most of the climb.
This even holds true once the Orb reached the top: Since the Orb requires a
negative force to repeatedly arrive up there and be bounced back, this force
will stay active for the first 5 of the 7 collision frames and not move the
Orb at all. Once gravity kicks in at the 5th frame and adds 1 to
the Y velocity, it's already too late: The new velocity can't be larger than
0.5, and the Orb only has 1 or 2 frames before the flag reset causes it to
be bounced back up to the top again.
Portals, on the other hand, turn out to be much simpler than the old
description that ended up on Touhou Wiki in October 2005 might suggest.
Everything about their teleportations is random: The destination portal, the
exit force (as an integer between -9 and +9), as well as the exit X
velocity, with each of the
📝 5 distinct horizontal velocities having an
equal chance of being chosen. Of course, if the destination portal is next
to the left or right edge of the playfield and it chooses to fire the Orb
towards that edge, it immediately bounces off into the opposite direction,
whereas the 0 velocity is always selected with a constant 20% probability.
The selection process for the destination portal involves a bit more than a
single rand() call. The game bundles all obstacles in a single
structure of dynamically allocated arrays, and only knows how many obstacles
there are in total, not per type. Now, that alone wouldn't have much
of an impact on random portal selection, as you could simply roll a random
obstacle ID and try again if it's not a portal. But just to be extra cute,
ZUN instead iterates over all obstacles, selects any non-entered portal with
a chance of ¼, and just gives up if that dice roll wasn't successful after
16 loops over the whole array, defaulting to the entered portal in that
case.
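Roughly, that selection process could be sketched like this. This is not ZUN's code: the types, the function name, and the `roll4` callback (standing in for the game's rand() call, reduced to a value from 0 to 3) are all assumptions made for illustration. Only the ¼ chance, the 16 loops, and the fallback to the entered portal come from the description above:

```cpp
#include <cstddef>
#include <functional>
#include <vector>

enum class ObstacleType { Portal, Other };

// Returns the index of the destination portal within `obstacles`.
std::size_t pick_destination_portal(
    const std::vector<ObstacleType>& obstacles,
    std::size_t entered,
    const std::function<int()>& roll4 // hypothetical: returns 0, 1, 2, or 3
)
{
    for(int loop = 0; loop < 16; loop++) {
        for(std::size_t i = 0; i < obstacles.size(); i++) {
            if((obstacles[i] != ObstacleType::Portal) || (i == entered)) {
                continue;
            }
            if(roll4() == 0) { // ¼ chance per non-entered portal
                return i;
            }
        }
    }
    return entered; // All dice rolls failed: exit where the Orb came in
}
```

Passing the random source as a parameter is purely a convenience for testing the two extremes: a roll that always succeeds picks the first non-entered portal, and one that never succeeds falls back to the entered one.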
In all its silliness though, this works perfectly fine, and results in a
chance of 0.75^(16 × (𝑛 - 1)) for the Orb exiting out of the
same portal it entered, with 𝑛 being the total number of portals in a
stage. That's 1% for two portals, and 0.01% for three. Pretty decent for a
random result you don't want to happen, but that hurts nobody if it does.
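For anyone who wants to double-check these percentages: every non-entered portal gets 16 independent ¼ rolls, and the Orb only exits through the entered portal if all of them fail. A small helper (the function name is made up) reproduces the numbers:

```cpp
#include <cmath>

// Chance that all 16 × (portal_count - 1) rolls fail, each with a ¾ chance.
inline double same_portal_chance(int portal_count)
{
    return std::pow(0.75, 16.0 * (portal_count - 1));
}
```

For two portals, this evaluates to roughly 0.01, and for three portals to roughly 0.0001, matching the rounded 1% and 0.01% above.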
The one tiny ZUN bug with portals is technically not even part of the newly
decompiled code here. If Reimu gets hit while the Orb is being sent through
a portal, the Orb is immediately kicked out of the portal it entered, no
matter whether it already shows up inside the sprite of the destination
portal. Neither of the two portal sprites is reset when this happens,
leading to "two Orbs" being visible simultaneously.
This makes very little sense no matter how you look at it. The Orb doesn't
receive a new velocity or force when this happens, so it will simply
re-enter the same portal once the gameplay resumes on Reimu's next life:
That left another ½ of a push over at the end. Way too much time to finish
FUUIN.exe, way too little time to start with Mima… but the bomb
animation fit perfectly in there. No secrets or bugs there, just a bunch of
sprite animation code wasting at least another 82 bytes in the data segment.
The special effect after the kuji-in sprites uses the same single-bitplane
32×32 square inversion effect seen at the end of Kikuri's and Sariel's
entrance animation, except that it's a 3-stack of 16-rings moving at 6, 7,
and 8 pixels per frame respectively. At these comparatively slow speeds, the
byte alignment of each square adds some further noise to the discoloration
pattern… if you even notice it below all the shaking and seizure-inducing
hardware palette manipulation.
And yes, due to the very destructive nature of the effect, the game does in
fact rely on it only being applied to VRAM page 0. While that will cause
every moving sprite to tear holes into the inverted squares along its
trajectory, keeping a clean playfield on VRAM page 1 is what allows all that
pixel damage to be easily undone at the end of this 89-frame animation.
Next up: Mima! Let's hope that stage obstacles already were the most complex
part remaining in TH01…
Been 📝 a while since we last looked at any of
TH03's game code! But before that, we need to talk about Y coordinates.
During TH03's MAIN.EXE, the PC-98 graphics GDC runs in its
line-doubled 640×200 resolution, which gives the in-game portion its
distinctive stretched low-res look. This lower resolution is a consequence
of using 📝 Promisence Soft's SPRITE16 driver:
Its performance simply stems from the fact that it expects sprites to be
stored in the bottom half of VRAM, which allows them to be blitted using the
same EGC-accelerated VRAM-to-VRAM copies we've seen again and again in all
other games. Reducing the visible resolution also means that the sprites can
be stored on both VRAM pages, allowing the game to still be double-buffered.
If you force the graphics chip to run at 640×400, you can see them:
The full VRAM contents during TH03's in-game portion, as seen when forcing the system into a 640×400 resolution.
Note that the text chip still displays its overlaid contents at 640×400,
which means that TH03's in-game portion technically runs at two
resolutions at the same time.
But that means that any mention of a Y coordinate is ambiguous: Does it
refer to undoubled VRAM pixels, or on-screen stretched pixels? Especially
people who have known about the line doubling for years might almost expect
technical blog posts on this game to use undoubled VRAM coordinates. So,
let's introduce a new formatting convention for both on-screen
640×400 and undoubled 640×200 coordinates,
and always write out both to minimize the confusion.
Alright, now what's the thing gonna be? The enemy structure is highly
overloaded, being used for enemies, fireballs, and explosions with seemingly
different semantics for each. Maybe a bit too much to be figured out in what
should ideally be a single push, especially with all the functions that
would need to be decompiled? Bullet code would be easier, but not exactly
single-push material either. As it turns out though, there's something more
fundamental left to be done first, which both of these subsystems depend on:
collision detection!
And it's implemented exactly how I always naively imagined collision
detection to be implemented in a fixed-resolution 2D bullet hell game with
small hitboxes: By keeping a separate 1bpp bitmap of both playfields in
memory, drawing in the collidable regions of all entities on every frame,
and then checking whether any pixels at the current location of the player's
hitbox are set to 1. It's probably not done in the other games because their
single data segment was already too packed for the necessary 17,664 bytes to
store such a bitmap at pixel resolution, and 282,624 bytes for a bitmap at
Q12.4 subpixel resolution would have been prohibitively expensive in 16-bit
Real Mode DOS anyway. In TH03, on the other hand, this bitmap is doubly
useful, as the AI also uses it to elegantly learn what's on the playfield.
By halving the resolution and only tracking tiles of 2×2 / 2×1 pixels, TH03 only requires an adequate total
of 6,624 bytes of memory for the collision bitmaps of both playfields.
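To illustrate the general technique and the memory math, here's a small sketch of such a tile-based collision bitmap. It has nothing to do with the game's actual hand-written ASM: the type, its methods, and the playfield size of 288×368 on-screen pixels are assumptions, with the latter picked because it reproduces the 6,624 bytes quoted above. The 17,664 bytes correspond to a 384×368 playfield at full pixel resolution:

```cpp
#include <bitset>
#include <cstddef>

// Assumed playfield size in on-screen pixels, not taken from the game's code.
constexpr int PLAYFIELD_W = 288;
constexpr int PLAYFIELD_H = 368;
constexpr int TILE_W = 2; // on-screen pixels per tile
constexpr int TILE_H = 2;
constexpr int TILES_X = (PLAYFIELD_W / TILE_W);
constexpr int TILES_Y = (PLAYFIELD_H / TILE_H);
constexpr std::size_t BYTES_PER_PLAYFIELD = ((TILES_X * TILES_Y) / 8);

static_assert((BYTES_PER_PLAYFIELD * 2) == 6624, "two playfields, 1 bit per tile");
static_assert(((384 * 368) / 8) == 17664, "384x368 playfield, 1 bit per pixel");

struct CollisionBitmap {
    std::bitset<TILES_X * TILES_Y> tiles;

    static bool in_bounds(int x, int y) {
        return ((x >= 0) && (x < TILES_X) && (y >= 0) && (y < TILES_Y));
    }
    static std::size_t index(int x, int y) {
        return static_cast<std::size_t>((y * TILES_X) + x);
    }

    // Marks a rectangle of collidable tiles, clipped to the playfield.
    void mark_rect(int left, int top, int w, int h) {
        for(int y = top; y < (top + h); y++) {
            for(int x = left; x < (left + w); x++) {
                if(in_bounds(x, y)) {
                    tiles.set(index(x, y));
                }
            }
        }
    }

    // Returns true if any tile inside the given hitbox is marked.
    bool hit(int left, int top, int w, int h) const {
        for(int y = top; y < (top + h); y++) {
            for(int x = left; x < (left + w); x++) {
                if(in_bounds(x, y) && tiles.test(index(x, y))) {
                    return true;
                }
            }
        }
        return false;
    }
};
```

The bitmap would be cleared and redrawn on every frame, with all coordinates given in tiles rather than pixels.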
So how did the implementation not earn the good-code tag this time? Because the code for drawing into these bitmaps is undecompilable hand-written x86 assembly. And not just your usual ASM that was basically compiled from C and then edited to maybe optimize register allocation and maybe replace a bunch of local variables with self-modifying code, oh no. This code is full of overly clever bit twiddling, abusing the fact that the 16-bit AX,
BX, CX, and DX registers can also be
accessed as two 8-bit registers, calculations that change the semantic
meaning behind the value of a register, or just straight-up reassignments of
different values to the same small set of registers. Sure, in some way it is
impressive, and it all does work and correctly covers every edge
case, but come on. This could have all been a lot more readable in
exchange for just a few CPU cycles.
What's most interesting though are the actual shapes that these functions
draw into the collision bitmap. On the surface, we have:
1) vertical slopes at any angle across the whole playfield; exclusively
used for Chiyuri's diagonal laser EX attack
2) straight vertical lines, with a width of 1 tile; exclusively used for
the 2×2 / 2×1 hitboxes of bullets
3) rectangles at arbitrary sizes
But only 2) actually draws a full solid line. 1) and 3) are only ever drawn
as horizontal stripes, with a hardcoded distance of 2 vertical tiles
between every stripe of a slope, and 4 vertical tiles between every stripe
of a rectangle. That's 66-75% of each rectangular entity's intended hitbox
not actually taking part in collision detection. Now, if player hitboxes
were ≤ 6 / 3 pixels, we'd have one
possible explanation of how the AI can "cheat", because it could just
precisely move through those blank regions at TAS speeds. So, let's make
this two pushes after all and tell the complete story, since this is one of
the more interesting aspects to still be documented in this game.
And the code only gets worse. While the player
collision detection function is decompilable, it might as well not
have been, because it's just more of the same "optimized", hard-to-follow
assembly. With the four splittable 16-bit registers having a total of 20
different meanings in this function, I would have almost preferred
self-modifying code…
In fact, it was so bad that it prompted some maintenance work on my inline
assembly coding standards as a whole. Turns out that the _asm
keyword is not only still supported in modern Visual Studio compilers, but
also in Clang with the -fms-extensions flag, and compiles fine
there even for 64-bit targets. While that might sound like amazing news at
first ("awesome, no need to rewrite this stuff for my x86_64 Linux
port!"), you quickly realize that almost all inline assembly in this
codebase assumes either PC-98 hardware, segmented 16-bit memory addressing,
or is a temporary hack that will be removed with further RE progress.
That's mainly because most of the raw arithmetic code uses Turbo C++'s
register pseudovariables where possible. While they certainly have their
drawbacks, being a non-standard extension that's not supported in other
x86-targeting C compilers, their advantages are quite significant: They
allow this code to stay in the same language, and provide slightly more
immediate portability to any other architecture, together with
📝 readability and maintainability improvements that can get quite significant when combined with inlining:
// This one line compiles to five ASM instructions, which would need to be
// spelled out in any C compiler that doesn't support register pseudovariables.
// By adding typed aliases for these registers via `#define`, this code can be
// both made even more readable, and be prepared for an easier transformation
// into more portable local variables.
_ES = (((_AX * 4) + _BX) + SEG_PLANE_B);
However, register pseudovariables might cause potential portability issues
as soon as they are mixed with inline assembly instructions that rely on
their state. The lazy way of "supporting pseudo-registers" in other
compilers would involve declaring the full set as global variables, which
would immediately break every one of those instances:
_DI = 0;
_AX = 0xFFFF;
// Special x86 instruction doing the equivalent of
//
// *reinterpret_cast<uint16_t far *>(MK_FP(_ES, _DI)) = _AX;
// _DI += sizeof(uint16_t);
//
// Only generated by Turbo C++ in very specific cases, and therefore only
// reliably available through inline assembly.
asm { stosw; }
What's also not all too standardized, though, are certain variants of
the asm keyword. That's why I've now introduced a distinction
between the _asm keyword for "decently sane" inline assembly,
and the slightly less standard asm keyword for inline assembly
that relies on the contents of pseudo-registers, and should break on
compilers that don't support them. So yeah, have some minor
portability work in exchange for these two pushes not having all that much
in RE'd content.
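For reference, the store-and-advance operation spelled out in the comment of the snippet above matches what x86's STOSW does with a cleared direction flag. A port could model it explicitly instead of relying on real registers. The following is just a sketch under that assumption: `Regs` and `store_word_and_advance()` are made-up names, and a flat byte vector stands in for the segment addressed by ES:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for the two registers involved.
struct Regs {
    std::uint16_t ax;
    std::uint16_t di;
};

// Writes AX to the little-endian word at offset DI, then moves DI forward
// by one word. Throws std::out_of_range if DI points past the buffer.
inline void store_word_and_advance(std::vector<std::uint8_t>& segment, Regs& r)
{
    segment.at(r.di) = static_cast<std::uint8_t>(r.ax & 0xFF);
    segment.at(r.di + 1u) = static_cast<std::uint8_t>(r.ax >> 8);
    r.di = static_cast<std::uint16_t>(r.di + sizeof(std::uint16_t));
}
```

Passing the register state explicitly makes the dependency visible, which is exactly what gets lost if pseudo-registers are declared as plain global variables while the instruction itself keeps reading the real CPU registers.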
With that out of the way and the function deciphered, we can confirm the
player hitboxes to be a constant 8×8 /
8×4 pixels, and prove that the hit stripes are nothing but
an adequate optimization that doesn't affect gameplay in any way.
And what's the obvious thing to immediately do if you have both the
collision bitmap and the player hitbox? Writing a "real hitbox" mod, of
course:
Reorder the calls to rendering functions so that player and shot sprites
are rendered after bullets
Blank out all player sprite pixels outside an
8×8 / 8×4 box around the center
point
After the bullet rendering function, turn on the GRCG in RMW mode and
set the tile register set to the background color
Stretch the negated contents of the collision bitmap onto each playfield,
leaving only collidable pixels untouched
Do the same with the actual, non-negated contents and a white color, for
extra contrast against the background. This also makes sure to show any
collidable areas whose sprite pixels are transparent, such as with the moon
enemy. (Yeah, how unfair.) Doing that also loses a lot of information about
the playfield, such as enemy HP indicated by their color, but what can you
do:
A decently busy TH03 in-game frame and its underlying collision bitmap,
showing off all three different collision shapes together with the
player hitboxes.
2022-02-18-TH03-real-hitbox.zip
The secret for writing such mods before having reached a sufficient level of
position independence? Put your new code segment into DGROUP,
past the end of the uninitialized data section. That's why this modded
MAIN.EXE is a lot larger than you would expect from the raw amount of new code: The file now actually needs to store all these
uninitialized 0 bytes between the end of the data segment and the first
instruction of the mod code – normally, this number is simply a part of the
MZ EXE header, and doesn't need to be redundantly stored on disk. Check the
th03_real_hitbox
branch for the code.
And now we know why so many "real hitbox" mods for the Windows Touhou games
are inaccurate: The games would simply be unplayable otherwise – or can
you dodge rapidly moving 2×2 /
2×1 blocks as an 8×8 /
8×4 rectangle that is smaller than your shot sprites,
especially without focused movement? I can't.
Maybe it will feel more playable after making explosions visible, but that
would need more RE groundwork first.
It's also interesting how adding two full GRCG-accelerated redraws of both
playfields per frame doesn't significantly drop the game's frame rate – so
why did the drawing functions have to be micro-optimized again? It
would be possible in one pass by using the GRCG's TDW mode, which
should theoretically be 8× faster, but I have to stop somewhere.
Next up: The final missing piece of TH04's and TH05's
bullet-moving code, which will include a certain other
type of projectile as well.
P0162
TH01 decompilation (Player control, part 1/3)
P0163
TH01 decompilation (Player control, part 2/3)
P0164
TH01 decompilation (Player control, part 3/3)
💰 Funded by:
Ember2528, Yanga
🏷️ Tags:
No technical obstacles for once! Just pure overcomplicated ZUN code. Unlike
📝 Konngara's main function, the main TH01
player function was every bit as difficult to decompile as you would expect
from its size.
With TH01 using both separate left- and right-facing sprites for all of
Reimu's moves and separate classes for Reimu's 32×32 and 48×*
sprites, we're already off to a bad start. Sure, sprite mirroring is
minimally more involved on PC-98, as the planar
nature of VRAM requires the bits within an 8-pixel byte to also be
mirrored, in addition to writing the sprite bytes from right to left. TH03
uses a 256-byte lookup table for this, generated at runtime by an infamous
micro-optimized and undecompilable ASM algorithm. With TH01's existing
architecture, ZUN would have then needed to write 3 additional blitting
functions. But instead, he chose to waste a total of 26,112 bytes of memory
on pre-mirrored sprites…
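For comparison, here's what the lookup table approach boils down to in plain C++. This is a straightforward sketch with made-up function names, not a reconstruction of TH03's ASM algorithm:

```cpp
#include <array>
#include <cstdint>

// Builds a table that maps every byte to its bit-reversed counterpart.
inline std::array<std::uint8_t, 256> make_mirror_table()
{
    std::array<std::uint8_t, 256> table{};
    for(int value = 0; value < 256; value++) {
        std::uint8_t mirrored = 0;
        for(int bit = 0; bit < 8; bit++) {
            if(value & (1 << bit)) {
                mirrored |= (0x80 >> bit);
            }
        }
        table[value] = mirrored;
    }
    return table;
}

// Mirrors one row of a single bitplane: The bytes are written from right to
// left, and the 8 pixels within each byte are flipped through the table.
inline void mirror_row(
    const std::uint8_t* src,
    std::uint8_t* dst,
    int bytes,
    const std::array<std::uint8_t, 256>& table
)
{
    for(int i = 0; i < bytes; i++) {
        dst[bytes - 1 - i] = table[src[i]];
    }
}
```

At 256 bytes for the table, this is about 1% of the 26,112 bytes spent on the pre-mirrored sprites.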
Alright, but surely selecting those sprites from code is no big deal? Just
store the direction Reimu is facing in, and then add some branches to the
rendering code. And there is in fact a variable for Reimu's direction…
during regular arrow-key movement, and another one while shooting and
sliding, and a third as part of the special attack types,
launched out of a slide.
Well, OK, technically, the last two are the same variable. But that's even
worse, because it means that ZUN stores two distinct enums at
the same place in memory: Shooting and sliding uses 1 for left,
2 for right, and 3 for the "invalid" direction of
holding both, while the special attack types indicate the direction in their
lowest bit, with 0 for right and 1 for left. I
decompiled the latter as bitflags, but in ZUN's code, each of the 8
permutations is handled as a distinct type, with copy-pasted and adapted
code… The interpretation of this
two-enum "sub-mode" union variable is controlled
by yet another "mode" variable… and unsurprisingly, two of the bugs in this
function relate to the sub-mode variable being interpreted incorrectly.
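To show why this is so error-prone, here's a hypothetical decoder for that variable. The enum and function names are mine, and the real code has more modes than the two modeled here. Only the value encodings are taken from the description above:

```cpp
// Hypothetical names; the game has more modes than these two.
enum class Mode { ShootOrSlide, SpecialAttack };
enum class Facing { Left, Right, Invalid };

// The same raw `submode` value means different things depending on `mode`.
inline Facing facing_from_submode(Mode mode, unsigned int submode)
{
    if(mode == Mode::SpecialAttack) {
        // Direction lives in the lowest bit: 0 = right, 1 = left
        return ((submode & 1) ? Facing::Left : Facing::Right);
    }
    switch(submode) {
    case 1:  return Facing::Left;
    case 2:  return Facing::Right;
    default: return Facing::Invalid; // 3 = both directions held
    }
}
```

A raw value of 3 is the "invalid" direction while shooting or sliding, but a perfectly valid left-facing special attack otherwise. Interpreting the variable under the wrong mode silently yields a different direction.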
Also, "rendering code"? This one big function basically consists of separate
unblit→update→render code snippets for every state and direction Reimu can
be in (moving, shooting, swinging, sliding, special-attacking, and bombing),
pasted together into a tangled mess of nested if(…) statements.
While a lot of the code is copy-pasted, there are still a number of
inconsistencies that defeat the point of my usual refactoring treatment.
After all, with a total of 85 conditional branches, anything more than I did
would have just obscured the control flow too badly, making it even harder
to understand what's going on.
In the end, I spotted a total of 8 bugs in this function, all of which leave
Reimu invisible for one or more frames:
2 frames after all special attacks
2 frames after swing attacks, and
4 frames before swing attacks
Thanks to the last one, Reimu's first swing animation frame is never
actually rendered. So whenever someone complains about TH01 sprite
flickering on an emulator: That emulator is accurate, it's the game that's
poorly written.
And guess what, this function doesn't even contain everything you'd
associate with per-frame player behavior. While it does
handle Yin-Yang Orb repulsion as part of slides and special attacks, it does
not handle the actual player/Orb collision that results in lives being lost.
The funny thing about this: These two things are done in the same function…
Therefore, the life loss animation is also part of another function. This is
where we find the final glitch in this 3-push series: Before the 16-frame
shake, this function only unblits a 32×32 area around Reimu's center point,
even though it's possible to lose a life during the non-deflecting part of a
48×48-pixel animation. In that case, the extra pixels will just stay on
screen during the shake. They are unblitted afterwards though, which
suggests that ZUN was at least somewhat aware of the issue?
Finally, the chance to see the alternate life loss sprite is exactly ⅛.
As for any new insights into game mechanics… you know what? I'm just not
going to write anything, and leave you with this flowchart instead. Here's
the definitive guide on how to control Reimu in TH01 we've been waiting for
24 years:
Pellets are deflected during all gray
states. Not shown is the obvious "double-tap Z and X" transition from
all non-(#1) states to the Bomb state, but that would have made this
diagram even more unwieldy than it turned out. And yes, you can shoot
twice as fast while moving left or right.
While I'm at it, here are two more animations from MIKO.PTN
which aren't referenced by any code:
With that monster of a function taken care of, we've only got boss sprite animation as the final blocker of uninterrupted Sariel progress. Due to some unfavorable code layout in the Mima segment though, I'll need to spend a bit more time with some of the features used there. Next up: The missile bullets used in the Mima and YuugenMagan fights.
Didn't quite get to cover background rendering for TH05's Stage 1-5
bosses in this one, as I had to reverse-engineer two more fundamental parts
involved in boss background rendering before.
First, we got those blocky transitions from stage tiles to bomb and
boss backgrounds, loaded from BB*.BB and ST*.BB,
respectively. These files store 16 frames of animation, with every bit
corresponding to a 16×16 tile on the playfield. With 384×368 pixels to be
covered, that would require 69 bytes per frame. But since that's a very odd
number to work with in micro-optimized ASM, ZUN instead stores 512×512
pixels worth of bits, ending up with a frame size of 128 bytes, and a
per-frame waste of 59 bytes. At least it was
possible to decompile the core blitting function as __fastcall
for once.
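The byte counts above are easy to verify at compile time. The bit lookup function below is a guess on my part, though: I'm assuming a row-major layout with 32 tiles (4 bytes) per row and the most significant bit coming first, which is not confirmed by anything stated here, and all names are made up:

```cpp
#include <cstdint>

constexpr int BB_TILE_SIZE = 16;
constexpr int BB_BITS_NEEDED = ((384 / BB_TILE_SIZE) * (368 / BB_TILE_SIZE));
constexpr int BB_BYTES_NEEDED = (BB_BITS_NEEDED / 8);
constexpr int BB_TILES_PER_ROW = (512 / BB_TILE_SIZE);
constexpr int BB_FRAME_SIZE = ((BB_TILES_PER_ROW * BB_TILES_PER_ROW) / 8);

static_assert(BB_BITS_NEEDED == 552, "24 x 23 tiles on the playfield");
static_assert(BB_BYTES_NEEDED == 69, "bytes required per frame");
static_assert(BB_FRAME_SIZE == 128, "bytes stored per frame");
static_assert((BB_FRAME_SIZE - BB_BYTES_NEEDED) == 59, "bytes wasted per frame");

// Assumed layout: row-major, 4 bytes per tile row, leftmost tile in bit 7.
inline bool bb_tile_is_set(const std::uint8_t* frame, int tile_x, int tile_y)
{
    const int bit = ((tile_y * BB_TILES_PER_ROW) + tile_x);
    return ((frame[bit / 8] & (0x80 >> (bit % 8))) != 0);
}
```

With a power-of-two row stride, the byte offset of a tile row turns into a simple shift, which is probably what makes the padded size attractive for ASM code.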
But wait, TH05 comes with, and loads, a bomb .BB file for every character,
not just for the Reimu and Yuuka bomb transitions you see in-game… 🤔
Restoring those unused stage tile → bomb image transition
animations for Mima and Marisa isn't that trivial without having decompiled
their actual bomb animation functions before, so stay tuned!
Interestingly though, the code leaves out what would look like the most
obvious optimization: All stage tiles are unconditionally redrawn
each frame before they're erased again with the 16×16 blocks, no matter if
they weren't covered by such a block in the previous frame, or are
going to be covered by such a block in this frame. The same is true
for the static bomb and boss background images, where ZUN simply didn't
write a .CDG blitting function that takes the dirty tile array into
account. If VRAM writes on PC-98 really were as slow as the games'
README.TXT files claim them to be, shouldn't all the
optimization work have gone towards minimizing them?
Oh well, it's not like I have any idea what I'm talking about here. I'd
better stop talking about anything relating to VRAM performance on PC-98…
Second, it finally was time to solve the long-standing confusion about all
those callbacks that are supposed to render the playfield background. Given
the aforementioned static bomb background images, ZUN chose to make this
needlessly complicated. And so, we have two callback function
pointers: One during bomb animations, one outside of bomb
animations, and each boss update function is responsible for keeping the
former in sync with the latter.
Other than that, this was one of the smoothest pushes we've had in a while;
the hardest parts of boss background rendering all were part of
📝 the last push. Once you figured out that
ZUN does indeed dynamically change hardware color #0 based on the current
boss phase, the remaining one function for Shinki, and all of EX-Alice's
background rendering becomes very straightforward and understandable.
Meanwhile, -Tom- told me about his plans to publicly
release 📝 his TH05 scripting toolkit once
TH05's MAIN.EXE would hit around 50% RE! That pretty much
defines what the next bunch of generic TH05 pushes will go towards:
bullets, shared boss code, and one
full, concrete boss script to demonstrate how it's all combined. Next up,
therefore: TH04's bullet firing code…? Yes, TH04's. I want to see what I'm
doing before I tackle the undecompilable mess that is TH05's bullet firing
code, and you all probably want readable code for that feature as
well. Turns out it's also the perfect place for Blue Bolt's
pending contributions.
Done with the .BOS format, at last! While there's still quite a bunch of
undecompiled non-format blitting code left, this was in fact the final
piece of graphics format loading code in TH01.
📝 Continuing the trend from three pushes ago,
we've got yet another class, this time for the 48×48 and 48×32 sprites
used in Reimu's gohei, slide, and kick animations. The only reason these
had to use the .BOS format at all is simply because Reimu's regular
sprites are 32×32, and are therefore loaded from
📝 .PTN files.
Yes, this makes no sense, because why would you split animations for
the same character across two file formats and two APIs, just because
of a sprite size difference?
This necessity for switching blitting APIs might also explain why Reimu
vanishes for a few frames at the beginning and the end of the gohei swing
animation, but more on that once we get to the high-level rendering code.
Now that we've decompiled all the .BOS implementations in TH01, here's an
overview of all of them, together with .PTN to show that there really was
no reason for not using the .BOS API for all of Reimu's sprites:
| | CBossEntity | CBossAnim | CPlayerAnim | ptn_* (32×32) |
|---|---|---|---|---|
| Format | .BOS | .BOS | .BOS | .PTN |
| Hitbox | ✔ | ✘ | ✘ | ✘ |
| Byte-aligned blitting | ✔ | ✔ | ✔ | ✔ |
| Byte-aligned unblitting | ✔ | ✘ | ✔ | ✔ |
| Unaligned blitting | Single-line and wave only | ✘ | ✘ | ✘ |
| Precise unblitting | ✔ | ✘ | ✔ | ✔ |
| Per-file sprite limit | 8 | 8 | 32 | 64 |
| Pixels blitted at once | 16 | 16 | 8 | 32 |
And even that last property could simply be handled by branching based on
the sprite width, and wouldn't be a reason for switching formats. But
well, it just wouldn't be TH01 without all that redundant bloat though,
would it?
The basic loading, freeing, and blitting code was yet another variation
on the other .BOS code we've seen before. So this should have caused just
as little trouble as the CBossAnim code… except that
CPlayerAnim did add one slightly difficult function to
the mix, which led to it requiring almost a full push after all.
Similar to 📝 the unblitting code for moving lasers we've seen in the last push,
ZUN tries to minimize the amount of VRAM writes when unblitting Reimu's
slide animations. Technically, it's only necessary to restore the pixels
that Reimu traveled by, plus the ones that wouldn't be redrawn by
the new animation frame at the new X position.
The theoretically arbitrary distance between the two sprites is, of
course, modeled by a fixed-size buffer on the stack, coming with the
further assumption that the sprite surely hasn't moved by more than 1
horizontal VRAM byte compared to the last frame. Which, of course, results
in glitches if that's not the case, leaving little Reimu parts in VRAM if
the slide speed ever exceeded 8 pixels per frame. (Which it never does,
being hardcoded to 6 pixels, but still.) As it also turns out, all those
bit masking operations easily lead to incredibly sloppy C code.
Which compiles into incredibly terrible ASM, which in turn might end up
wasting way more CPU time than the final VRAM write optimization would
have gained? Then again, in-depth profiling is way beyond the scope of
this project at this point.
Next up: The TH04 main menu, and some more technical debt.
P0111
TH05 RE (Code around the final MAIN.EXE data references, part 1/2)
P0112
TH05 RE (Code around the final MAIN.EXE data references, part 2/2)
💰 Funded by:
[Anonymous], Blue Bolt
🏷️ Tags:
Only one newly ordered push since I've reopened the store? Great, that's
all the justification I needed for the extended maintenance delay that was
part of these two pushes 😛
Having to write comments to explain whether coordinates are relative to
the top-left corner of the screen or the top-left corner of the playfield
has finally become old. So, I introduced
distinct
types for all the coordinate systems we typically encounter, applying
them to all code decompiled so far. Note how the planar nature of PC-98
VRAM meant that X and Y coordinates also had to be different from each
other. On the X side, there's mainly the distinction between the
[0; 640] screen space and the corresponding [0; 80] VRAM byte
space. On the Y side, we also have the [0; 400] screen space, but
the visible area of VRAM might be limited to [0; 200] when running in
the PC-98's line-doubled 640×200 mode. A VRAM Y coordinate also always
implies an added offset for vertical scrolling.
During all of the code reconstruction, these types can only have a
documenting purpose. Turning them into anything more than just
typedefs to int, in order to define conversion
operators between them, simply won't recompile into identical binaries.
Modding and porting projects, however, now have a nice foundation for
doing just that, and can entirely lift coordinate system transformations
into the type system, without having to proofread all the meaningless
int declarations themselves.
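As a rough idea of what such a mod or port could do, here's a sketch that turns two of these coordinate systems into real types. The struct and function names are hypothetical and not part of the ReC98 codebase, and the scrolling offset implied by VRAM Y coordinates is deliberately left out:

```cpp
// Distinct wrapper types: Mixing them up becomes a compile error.
struct ScreenX { int v; }; // on-screen pixels
struct ScreenY { int v; }; // on-screen pixels
struct VramX { int v; };   // VRAM byte column, 8 pixels per byte
struct VramY { int v; };   // VRAM row, without any scrolling offset

constexpr VramX to_vram(ScreenX x)
{
    return VramX{ x.v / 8 };
}

// Only valid in the line-doubled 640×200 mode, where every VRAM row is shown
// as two on-screen rows.
constexpr VramY to_vram_line_doubled(ScreenY y)
{
    return VramY{ y.v / 2 };
}
```

With this, `to_vram(ScreenY{ 100 })` simply doesn't compile, whereas the typedef-based version would happily accept any int.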
So, what was left in terms of memory references? EX-Alice's fire waves
were our final unknown entity that can collide with the player. Decently
implemented, with little to say about them.
That left the bomb animation structures as the one big remaining PI
blocker. They started out nice and simple in TH04, with a small 6-byte
star animation structure used for both Reimu and Marisa. TH05, however,
gave each character her own animation… and what the hell is going
on with Reimu's blue stars there? Nope, not going to figure this out on
ASM level.
A decompilation first required some more bomb-related variables to be
named though. Since this was part of a generic RE push, it made sense to
do this in all 5 games… which then led to nice PI gains in anything
but TH05. Most notably, we now got the
"pulling all items to player" flag in TH04 and TH05, which is
actually separate from bombing. The obvious cheat mod is left as an
exercise to the reader.
So, TH05 bomb animations. Just like the
📝 custom entity types of this game, all 4
characters share the same memory, with the superficially same 10-byte
structure.
But let's just look at the very first field. Seen from a low level, it's a
simple struct { int x, y; } pos, storing the current position
of the character-specific bomb animation entity. But all 4 characters use
this field differently:
For Reimu's blue stars, it's the top-left position of each star, in the
12.4 fixed-point format. But unlike the vast majority of these values in
TH04 and TH05, it's relative to the top-left corner of the
screen, not the playfield. Much better represented as
struct { Subpixel screen_x, screen_y; } topleft.
For Marisa's lasers, it's the center of each circle, as a regular 12.4
fixed-point coordinate, relative to the top-left corner of the playfield.
Much better represented as
struct { Subpixel x, y; } center.
For Mima's shrinking circles, it's the center of each circle in regular
pixel coordinates. Much better represented as
struct { screen_x_t x; screen_y_t y; } center.
For Yuuka's spinning heart, it's the top-left corner in regular pixel
coordinates. Much better represented as
struct { screen_x_t x; screen_y_t y; } topleft.
And yes, singular. The game is actually smart enough to only store a single
heart, and then create the rest of the circle on the fly. (If it were even
smarter, it wouldn't even use this structure member, but oh well.)
Therefore, I decompiled it as 4 separate structures once again, bundled
into a union of arrays.
As for Reimu… yup, that's some pointer arithmetic straight out of
Jigoku* for setting and updating the positions of the falling star
trails. While that certainly required several
comments to wrap my head around the current array positions, the one "bug"
in all this arithmetic luckily has no effect on the game.
There is a small glitch with the growing circles, though. They are
spawned at the end of the loop, with their position taken from the star
pointer… but after that pointer has already been incremented. On
the last loop iteration, this leads to an out-of-bounds structure access,
with the position taken from some unknown EX-Alice data, which is 0 during
most of the game. If you look at the animation, you can easily spot these
bugged circles, consistently growing from the top-left corner (0, 0)
of the playfield.
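The general shape of this kind of bug can be modeled in a few lines. This is a deliberately simplified sketch and not the decompiled game code: `STAR_COUNT`, `Pos`, and the function name are hypothetical, and the zero-filled memory behind the star array is represented by one extra array element so that the example stays free of undefined behavior.

```cpp
struct Pos { int x, y; };

const int STAR_COUNT = 4; // hypothetical, not the game's actual count

// `mem` holds STAR_COUNT stars, followed by one element that stands in for
// the unrelated (and usually zeroed) data located behind the real array.
void update_stars_and_spawn_circles(const Pos* mem, Pos* circles)
{
    const Pos* star = mem;
    for(int i = 0; i < STAR_COUNT; i++) {
        // (star position update omitted)
        star++;             // The pointer moves on to the next star first…
        circles[i] = *star; // …so the circle copies the position of the
                            // *next* slot, and the last one reads past
                            // the end of the stars.
    }
}
```

In this model, the last circle always ends up at whatever the memory behind the array contains, which is (0, 0) as long as that memory is zeroed.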
After all that, there was barely enough remaining time to filter out and
label the final few memory references. But now, TH05's
MAIN.EXE is technically position-independent! 🎉
-Tom- is going to work on a pretty extensive demo of this
unprecedented level of efficient Touhou game modding. For a more impactful
effect of both the 100% PI mark and that demo, I'll be delaying the push
covering the remaining false positives in that binary until that demo is
done. I've accumulated a pretty huge backlog of minor maintenance issues
by now…
Next up though: The first part of the long-awaited build system
improvements. I've finally come up with a way of sanely accelerating the
32-bit build part on most setups you could possibly want to build ReC98
on, without making the building experience worse for the other few setups.
P0096
TH01 decompilation (.PTN format, part 2)
P0097
TH01 decompilation (Orb physics)
P0098
TH01 decompilation (Player shots)
💰 Funded by:
Ember2528, Yanga
🏷️ Tags:
So, let's finally look at some TH01 gameplay structures! The obvious
choices here are player shots and pellets, which are conveniently located
in the last code segment. Covering these would therefore also help in
transferring some first bits of data in REIIDEN.EXE from ASM
land to C land. (Splitting the data segment would still be quite
annoying.) Player shots are immediately at the beginning…
…but wait, these are drawn as transparent sprites loaded from .PTN files.
Guess we first have to spend a push on
📝 Part 2 of this format.
Hm, 4 functions for alpha-masked blitting and unblitting of both 16×16 and
32×32 .PTN sprites that align the X coordinate to a multiple of 8
(remember, the PC-98 uses a planar
VRAM memory layout, where 8 pixels correspond to a byte), but only one
function that supports unaligned blitting to any X coordinate, and only
for 16×16 sprites? Which is only called twice? And doesn't come with a
corresponding unblitting function?
Yeah, "unblitting". TH01 isn't double-buffered,
and uses the PC-98's second VRAM page exclusively to store a stage's
background and static sprites. Since the PC-98 has no hardware sprites,
all you can do is write pixels into VRAM, and any animated sprite needs to
be manually removed from VRAM at the beginning of each frame. Not using
double-buffering theoretically allows TH01 to simply copy back all 128 KB
of VRAM once per frame to do this. But that
would be pretty wasteful, so TH01 just looks at all animated sprites, and
selectively copies only their occupied pixels from the second to the first
VRAM page.
Alright, player shot class methods… oh, wait, the collision functions
directly act on the Yin-Yang Orb, so we first have to spend a push on
that one. And that's where the impression we got from the .PTN
functions is confirmed: The orb is, in fact, only ever displayed at
byte-aligned X coordinates, divisible by 8. It's only thanks to the
constant spinning that its movement appears at least somewhat
smooth.
This is purely a rendering issue; internally, its position is
tracked at pixel precision. Sadly, smooth orb rendering at any unaligned X
coordinate wouldn't be that trivial of a mod, because well, the
necessary functions for unaligned blitting and unblitting of 32×32 sprites
don't exist in TH01's code. Then again, there's so much potential for
optimization in this code, so it might be very possible to squeeze those
additional two functions into the same C++ translation unit, even without
position independence…
More importantly though, this was the right time to decompile the core
functions controlling the orb physics – probably the highlight in these
three pushes for most people.
Well, "physics". The X velocity is restricted to the 5 discrete states of
-8, -4, 0, 4, and 8, and gravity is applied by simply adding 1 to the Y
velocity every 5 frames. No wonder that this can
easily lead to situations in which the orb infinitely bounces from the
ground.
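Reduced to just the two rules stated above, the model looks roughly like the following sketch. The structure layout, the names, and the exact frame on which gravity kicks in are assumptions made for illustration; bouncing and collision are left out entirely.

```cpp
// Hypothetical reduction of the orb state, not ZUN's actual structure.
struct OrbSketch {
    int x_velocity;     // always one of the 5 states below
    double y_velocity;  // positive = downwards
    unsigned int frame; // frames since the orb started moving
};

// The only horizontal speeds the orb can ever have.
const int ORB_X_VELOCITIES[5] = { -8, -4, 0, 4, 8 };

// Called once per frame. Gravity is not a continuous acceleration, but a
// fixed +1 step on every 5th frame (assumed here: frames divisible by 5).
void orb_apply_gravity(OrbSketch& orb)
{
    orb.frame++;
    if((orb.frame % 5) == 0) {
        orb.y_velocity += 1.0;
    }
}
```

Starting with a Y velocity of -3 and calling `orb_apply_gravity()` for 10 frames leaves the orb at a Y velocity of -1, with the X velocity untouched.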
At least fangame authors now have a
reference of how ZUN did it originally, because really, this bad
approximation of physics had to have been written that way on purpose. But
hey, it uses 64-bit floating-point variables!
…sometimes at least, and quite randomly. This was also where I had to
learn about Turbo C++'s floating-point code generation, and how rigorously
it defines the order of instructions when mixing double and
float variables in arithmetic or conditional expressions.
This meant that I could only get ZUN's original instruction order by using
literal constants instead of variables, which is impossible right now
without somehow splitting the data segment. In the end, I had to resort to
spelling out ⅔ of one function, and one conditional branch of another, in
inline ASM. 😕 If ZUN had just written 16.0 instead of
16.0f there, I would have saved quite some hours of my life
trying to decompile this correctly…
To sort of make up for the slowdown in progress, here's the TH01 orb
physics debug mod I made to properly understand them. Edit
(2022-07-12): This mod is outdated,
📝 the current version is here!
2020-06-13-TH01OrbPhysicsDebug.zip
To use it, simply replace REIIDEN.EXE, and run the game
in debug mode, via game d on the DOS prompt.
Its code might also serve as an example of how to achieve this sort of
thing without position independence.
Alright, now it's time for player shots though. Yeah, sure, they
don't move horizontally, so it's not too bad that those are also
always rendered at byte-aligned positions. But, uh… why does this code
only use the 16×16 alpha-masked unblitting function for decaying shots,
and just sloppily unblits an entire 16×16 square everywhere else?
The worst part though: Unblitting, moving, and rendering player shots
is done in a single function, in that order. And that's exactly where
TH01's sprite flickering comes from. Since different types of sprites are
free to overlap each other, you'd have to first unblit all types, then
move all types, and then render all types, as done in later
PC-98 Touhou games. If you do these three steps per-type instead, you
will unblit sprites of other types that have been rendered before… and
therefore end up with flicker.
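The effect of that ordering can be reproduced with a toy model. The following sketch is not TH01's code: it uses a one-dimensional "VRAM" made of characters, 1-pixel sprites, and made-up names, and only compares the two orderings described above.

```cpp
#include <string>
#include <vector>

// Toy model: every sprite is a single character in a 1-D "VRAM" row.
struct Sprite { int x; int dx; char glyph; };

// Stands in for the second VRAM page holding the background.
const std::string BACKGROUND = "........";

void unblit(std::string& vram, const Sprite& s) { vram[s.x] = BACKGROUND[s.x]; }
void render(std::string& vram, const Sprite& s) { vram[s.x] = s.glyph; }

// TH01-style: each sprite type runs all three steps before the next one.
std::string frame_per_type(std::string vram, std::vector<Sprite> sprites)
{
    for(Sprite& s : sprites) {
        unblit(vram, s);
        s.x += s.dx;
        render(vram, s);
    }
    return vram;
}

// Later games: unblit everything, then move everything, then render.
std::string frame_per_phase(std::string vram, std::vector<Sprite> sprites)
{
    for(Sprite& s : sprites) { unblit(vram, s); }
    for(Sprite& s : sprites) { s.x += s.dx; }
    for(Sprite& s : sprites) { render(vram, s); }
    return vram;
}
```

With `A` at X = 2 and `B` at X = 3, both moving one pixel to the right, `frame_per_type("..AB....", …)` returns `"....B..."`: `B`'s unblit at its old position wipes out the freshly rendered `A`. `frame_per_phase()` returns the expected `"...AB..."` for the same input.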
Oh, and finally, ZUN also added an additional sloppy 16×16 square unblit
call if a shot collides with a pellet or a boss, for some
guaranteed flicker. Sigh.
And that's ⅓ of all ZUN code in TH01 decompiled! Next up: Pellets!
Turns out that covering TH03's 128-byte player structure was way
more insightful than expected! And while it doesn't include every
bit of per-player data, we still got to know quite a bit about the game
from just trying to name its members:
50 frames of invincibility when starting a new round
110 frames of invincibility when getting hit
64 frames of knockback when getting hit
128 frames before a charged up gauge/boss attack is fired
automatically
The damage a player will take from the next hit starts out at ½ heart
at the beginning of each round, and increases by another ½ heart every
1024 frames, capped at a maximum of 3 hearts. This guarantees that a
player will always survive at least two hits.
In Story Mode, hit damage is biased in favor of the player for the
first 6 stages. The CPU will always take an additional 1½ hearts of damage
in stages 1 and 2, 1 heart in stages 3 and 4, and ½ heart in stages 5 and
6, plus the above frame-based and capped damage amount. So while it's
therefore possible to cause 4½ hearts of damage in Stages 1 and 2 if the
first hit is somehow delayed for at least 5120 frames, you'd still win
faster if the CPU gets hit as soon as possible.
CPU players will charge up a gauge/boss attack as soon as their gauge
has reached a certain level. These levels are now proven to be random; at
the start of every round, the game generates a sequence of 64 gauge level
positions (from 1 to 4), separately for each player. If a round were to
last long enough for a CPU player to fire all 64 of those predetermined
attacks, you'd observe that sequence repeating.
Yes, that means that in theory, these levels can be
RNG-manipulated. More details on that once we got this game's resident
structure, where the seed is stored.
CPU players follow two main strategies: trying to not get hit, and…
not quite doing that once they've survived for a certain safety threshold
of frames. For the first 2000 frames of a round, this safety frame counter
is reset to 0 every 64 frames, leading the CPU to switch quickly between
the two strategies in the first few Story Mode stages on lower
difficulties, where this safety threshold is less than 64. The calculation
of the actual value is a bit more complex; more on that also once we got
this game's resident structure.
Section 13 of 夢時空.TXT states that Boss Attacks are only counted
towards the Clear Bonus if they were caused by reaching a certain number
of spell points. This is incorrect; manually charged Level 4 Boss Attacks
are counted as well.
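The two damage rules from the list above can be condensed into a small sketch, counting in units of ½ heart to stay in integers. The function names are hypothetical, and the integer division assumes that the increase happens exactly on every 1024th frame of a round, which matches the 5120-frame example but isn't taken from the game's code.

```cpp
// All values are in half-hearts: 1 = ½ heart, 6 = 3 hearts.

// Frame-based damage of the next hit, for any player.
int hit_damage_base(unsigned long round_frame)
{
    unsigned long damage = (1 + (round_frame / 1024));
    return ((damage > 6) ? 6 : static_cast<int>(damage));
}

// Additional damage taken by the CPU in Story Mode. `stage` is 1-based.
int hit_damage_story_cpu_bonus(int stage)
{
    if(stage <= 2) { return 3; } // 1½ hearts
    if(stage <= 4) { return 2; } // 1 heart
    if(stage <= 6) { return 1; } // ½ heart
    return 0;
}
```

For Stage 1 and a first hit at frame 5120, this adds up to 6 + 3 = 9 half-hearts, i.e., the 4½ hearts mentioned above.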
The next TH03 pushes can now cover all the functions that reference this
structure in one way or another, and actually commit all this research and
translate it into some RE%. Since the non-TH05 priorities have become a
bit unclear after the last 50 € RE contribution though (as of this
writing, it's still 10 € to decide on what game to cover in two RE
pushes!), I'll be returning to TH05 until that's decided.
Big gains, as expected, but not much to say about this one. With TH05 Reimu
being way too easy to decompile after
📝 the shot control groundwork done in October,
there was enough time to give the comprehensive PI false-positive
treatment to two other sets of functions present in TH04's and TH05's
OP.EXE. One of them, master.lib's super_*()
functions, was used a lot in TH02, more than in any other game… I
wonder how much more that game will progress without even focusing on it
in particular.
Alright then! 100% PI for TH04's and TH05's OP.EXE upcoming…
(Edit: Already got funding to cover this!)
… nope, with a game whose MAIN.EXE is still just 5%
reverse-engineered and which naturally makes heavy use of
structures, there's still a lot more PI groundwork to be done before RE
progress can speed up to the levels that we've now reached with TH05. The
good news is that this game is (now) way easier to understand: In contrast
to TH04 and TH05, where we needed to work towards player shots over a
two-digit number of pushes, TH03 only needed two for SPRITE16, and a half
one for the playfield shaking mechanism. After that, I could even already
decompile the per-frame shot update and render functions, thanks to TH03's
high number of code segments. Now, even the big 128-byte player structure
doesn't seem all too far off.
Then again, as TH03 shares no code with any other game, this actually was
a completely average PI push. For the remaining three, we'll return to
TH04 and TH05 though, which should more than make up for the slight drop
in RE speed after this one.
In other news, we've now also reached peak C++, with the introduction of
templates! TH03 stores movement speeds in a 4.4 fixed-point
format, which is an 8-bit spin on the usual 16-bit, 12.4 fixed-point
format.
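To illustrate why a template is a natural fit here, this is a minimal sketch of a fixed-point type that is parametrized over its storage type. It's not the actual ReC98 declaration: the `SubpixelSketch` name, its methods, and the choice of signed storage types are assumptions. The one thing both formats share is the 4 fractional bits, i.e., a scale factor of 16.

```cpp
#include <cstdint>

// T is the underlying integer type; the lower 4 bits are the fraction.
template <class T> struct SubpixelSketch {
    T raw;

    // Converts a pixel value to fixed-point by scaling it with 2⁴.
    static SubpixelSketch from_pixels(float pixels)
    {
        SubpixelSketch ret;
        ret.raw = static_cast<T>(pixels * 16.0f);
        return ret;
    }

    // Drops the fractional part, truncating toward zero.
    int to_pixel() const { return (raw / 16); }
};

typedef SubpixelSketch<int16_t> Subpixel12_4; // the usual 16-bit format
typedef SubpixelSketch<int8_t> Subpixel4_4;   // 8-bit spin for speeds
```

A speed of 2.5 pixels per frame is stored as the raw value 40 in both formats; the 4.4 variant just does so in a single byte, at the cost of a much smaller value range.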
And just in time for zorg's last outstanding pushes, the
TH05 shot type control functions made the speedup happen!
TH05 as a whole is now 20% reverse-engineered, and 50% position
independent,
TH05's MAIN.EXE is now even below TH02's in terms of not
yet RE'd instructions,
and all price estimates have now fallen significantly.
It would have been really nice to also include Reimu's shot
control functions in this last push, but figuring out this entire system,
with its weird bitflags and switch statement
micro-optimizations, was once again taking way longer than it should
have. Especially with my new-found insistence on turning this obvious
copy-pasta into something somewhat readable and terse…
But with such a rather tabular visual structure, things should now be
moddable in a hopefully easy and consistent way. Of course, since we're
only at 54% position independence for MAIN.EXE,
this isn't possible yet without
crashing the game, but modifying damage would already work.
Deathbombs confirmed, in both TH04 and TH05! On the surface, it's the same
8-frame window as in
most Windows games, but due to the slightly lower PC-98 frame rate of
56.4 Hz, it's actually slightly more lenient in TH04 and TH05.
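As a quick sanity check of that claim, here's the arithmetic, assuming that the Windows games run at 60 frames per second. The function name is made up for this example.

```cpp
// Length of a window of `frames` frames in milliseconds, at a frame rate
// of `hz` frames per second.
constexpr double window_ms(int frames, double hz)
{
    return ((frames * 1000.0) / hz);
}

// 8 frames at 56.4 Hz last longer than 8 frames at 60 Hz.
static_assert(window_ms(8, 56.4) > window_ms(8, 60.0), "PC-98 window is longer");
```

This works out to roughly 141.8 ms on PC-98 versus roughly 133.3 ms at 60 FPS, so about 8.5 ms of additional time to react.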
The last function in front of the TH05 shot type control functions marks
the player's previous position in VRAM to be redrawn. But as it turns out,
"player" not only means "the player's option satellites on shot levels ≥
2", but also "the explosion animation if you lose a life", which required
reverse-engineering both things, ultimately leading to the confirmation of
deathbombs.
It actually was kind of surprising that we then had reverse-engineered
everything related to rendering all three things mentioned above,
and could also cover the player rendering function right now. Luckily,
TH05 didn't decide to also micro-optimize that function into
un-decompilability; in fact, it wasn't changed at all from TH04. Unlike
the one invalidation function whose decompilation would have
actually been the goal here…
But now, we've finally gotten to where we wanted to… and only got 2
outstanding decompilation pushes left. Time to get the website ready for
hosting an actual crowdfunding campaign, I'd say – It'll make a better
impression if people can still see things being delivered after the big
announcement.
So, let's continue with player shots! …eh, or maybe not directly, since they involve two other structure types in TH05, which we'd have to cover first. One of them is a different sort of sprite, and since I like me some context in my reverse-engineering, let's disable every other sprite type first to figure out what it is.
One of those other sprite types was the little sparks flying away from killed stage enemies, midbosses, and grazed bullets; easy enough to also RE right now. Turns out they use the same 8 hardcoded 8×8 sprites in TH02, TH04, and TH05. Except that it's actually 64 16×8 sprites, because ZUN wanted to pre-shift them for all 8 possible start pixels within a planar VRAM byte (rather than, like, just writing a few instructions to shift them programmatically), leading to them taking up 1,024 bytes rather than just 64.
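For reference, this is roughly what the programmatic alternative and the size math look like. It's a sketch with made-up names, not code from any of the games, and it assumes 1 bit per pixel with the most significant bit of a byte being its leftmost pixel.

```cpp
#include <cstdint>

// Shifts one 8-pixel sprite row to the given start pixel (0 to 7) within a
// VRAM byte. The result is a 16-pixel row: the high byte is the left VRAM
// byte, the low byte is the right one.
inline uint16_t spark_row_shifted(uint8_t row, unsigned int start_pixel)
{
    return static_cast<uint16_t>((row << 8) >> start_pixel);
}

// 8 sprites × 8 rows × 1 byte per 8-pixel row.
const unsigned int SPARK_BYTES_RAW = (8 * 8 * 1);

// 8 sprites × 8 start pixels × 8 rows × 2 bytes per 16-pixel row.
const unsigned int SPARK_BYTES_PRESHIFTED = (8 * 8 * 8 * 2);
```

A fully set row of `0xFF` shifted to start pixel 3 turns into `0x1FE0`, covering the 5 rightmost pixels of the left byte and the 3 leftmost pixels of the right one.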
Oh, and the thing I wanted to RE *actually* was the decay animation whenever a shot hits something. Not too complex either, especially since it's exclusive to TH05.
And since there was some time left and I actually have to pick some of the next RE places strategically to best prepare for the upcoming 17 decompilation pushes, here's two more function pointers for good measure.
Stumbled across one more drawing function in the way… which was only a duplicated and seemingly pointlessly micro-optimized copy of master.lib's super_roll_put_tiny() function, used for fast display of 4-color 16×16 sprites.
With this out of the way, we can tackle player shot sprite animation next. This will get rid of a lot of code, since every power level of every character's shot type is implemented in its own function. Which makes up thousands of instructions in both TH04 and TH05 that we can nicely decompile in the future without going through a dedicated reverse-engineering step.
What do you do if the TH06 text image feature for thcrap should have been done 3 days™ ago, but keeps getting more and more complex, and you have a ton of other pushes to deliver anyway? Get some distraction with some light ReC98 reverse-engineering work. This is where it becomes very obvious how much uth05win helps us with all the games, not just TH05.
5a5c347 is the most important one in there, this was the missing substructure that now makes every other sprite-like structure trivial to figure out.