Slight change of plans, because we got instructions for
reliably reproducing the TH04 Kurumi Divide Error crash! Major thanks to
Colin Douglas Howell. With those, it also made sense to immediately look at
the crash in the Stage 4 Marisa fight as well. This way, I could release
both of the obligatory bugfix mods at the same time.
Especially since it turned out that I was wrong: Both crashes are entirely
unrelated to the custom entity structure that would have required PI-centric
progress. They are completely specific to Kurumi's and Marisa's
danmaku-pattern code, and really are two separate bugs
with no connection to each other. All of the necessary research nicely fit
into Arandui's 0.5 pushes, with no further deep understanding
required here.
But why were there still three weeks between Colin's message and this blog
post? DMCA distractions aside: There are no easy fixes this time, unlike
📝 back when I looked at the Stage 5 Yuuka crash.
Just like how division by zero is undefined in mathematics, it's also,
literally, undefined what should happen instead of these two
Divide error crashes. This means that any possible "fix" can
only ever be a fanfiction interpretation of the intentions behind ZUN's
code. The gameplay community should be aware of this, and
might decide to handle these cases differently. And if we
have to go into fanfiction territory to work around crashes in the
canon games, we'd better document what exactly we're fixing here and how, as
comprehensible as possible.
With that out of the way, let's look at Kurumi's crash first, since it's way
easier to grasp. This one is known to primarily happen to new players, and
it's easy to see why:
In one of the patterns in her third phase, Kurumi fires a series of 3
aimed rings from both edges of the playfield. By default (that is, on Normal
and with regular rank), these are 6-way rings.
6 happens to be quite a peculiar number here, due to how rings are
(manually) tuned based on the current "rank" value (playperf)
before being fired. The code, abbreviated for clarity:
Let's look at the range of possible playperf values per
difficulty level:
Easy
Normal
Hard
Lunatic
Extra
playperf_min
4
11
20
22
16
playperf_max
16
24
32
34
20
Edit (2022-05-24): This blog post initially had
26 instead of 16 for playperf_min for the Extra Stage. Thanks
to Popfan for pointing out that typo!
Reducing rank to its minimum on Easy mode will therefore result in a
0-ring after tuning.
To calculate the individual angles of each bullet in a ring, ZUN divides
360° (or, more correctly,
📝 0x100) by the total number of
bullets…
Boom, division by zero.
So, what should the workaround look like? Obviously, we want to modify
neither the default number of ring bullets nor the tuning algorithm – that
would change all other non-crashing variations of this pattern on other
difficulties and ranks, creating a fork of the original gameplay. Instead, I
came up with four possible workarounds that all seemed somewhat logical to
me:
Firing no bullet, i.e., interpreting 0-ring literally. This would
create the only constellation in which a call to the bullet group spawn
functions would not spawn at least one new bullet.
Firing a "1-ring", i.e., a single bullet. This would be consistent with
how the bullet spawn functions behave for "0-way" stack and spread
groups.
Firing a "∞-ring", i.e., 200 bullets, which is as much as the game's cap
on 16×16 bullets would allow. This would poke fun at the whole "division by
zero" idea… but given that we're still talking about Easy Mode (and
especially new players) here, it might be a tad too cruel. Certainly the
most trollish interpretation.
Triggering an immediate Game Over, exchanging the hard crash for a
softer and more controlled shutdown. Certainly the option that would be
closest to the behavior of the original games, and perhaps the only one to
be accepted in Serious, High-Level Play™.
As I was writing this post, it felt increasingly wrong for me to make this
decision. So I once again went to Twitter, where 56.3%
voted in favor of the 1-bullet option. Good that I asked! I myself was
more leaning towards the 0-bullet interpretation, which only got 28.7% of
the vote. Also interesting are the 2.3% in favor of the Game Over option but
I get it, low-rank Easy Mode isn't exactly the most competitive mode of
playing TH04.
There are reports of Kurumi crashing on higher difficulties as well, but I
could verify none of them. If they aren't fixed by this workaround, they're
caused by an entirely different bug that we have yet to discover.
Onto the Stage 4 Marisa crash then, which does in fact apply to all
difficulty levels. I was also wrong on this one – it's a hell of a lot more
intricate than being just a division by the number of on-screen bits.
Without having decompiled the entire fight, I can't give a completely
accurate picture of what happens there yet, but here's the rough idea:
Marisa uses different patterns, depending on whether at least one of her
bits is still alive, or all of them have been destroyed.
Destroying the last bit will immediately switch to the bit-less
counterpart of the current pattern.
The bits won't respawn before the pattern ended, which ensures that the
bit-less version is always shown in its entirety after being started or
switched into.
In two of the bit-less patterns, Marisa gradually moves to the point
reflection of her position at the start of the pattern across the playfield
coordinate of (192, 112), or (224, 128) on screen.
The velocity of this movement is determined by both her distance to that
point and the total amount of frames that this instance of the bit-less
pattern will last.
Since this frame amount is directly tied to the frame the player
destroyed the last bit on, it becomes a user-controlled variable. I think
you can see where this is going…
The last 12 frames of this duration, however, are always reserved for a
"braking phase", where Marisa's velocity is halved on each frame.
This part of the code only runs every 4 frames though. This expands the
time window for this crash to 4 frames, rather than just the two frames you
would expect from looking at the division itself.
Both of the broken patterns run for a maximum of 160 frames. Therefore,
the crash will occur when Marisa's last bit is destroyed between frame 152
and 155 inclusive. On these frames, the
last_frame_with_bits_alive variable is set to 148, which is the
crucial 12 duration frames away from the maximum of 160.
Interestingly enough, the calculated velocity is also only
applied every 4 frames, with Marisa actually staying still for the 3 frames
inbetween. As a result, she either moves
too slowly to ever actually reach the yellow point if the last bit
was destroyed early in the pattern (see destruction frames 68 or
112),
or way too quickly, and almost in a jerky, teleporting way (see
destruction frames 144 or 148).
Finally, as you may have already gathered from the formula: Destroying
the last bit between frame 156 and 160 inclusive results in
duration values of 8 or 4. These actually push Marisa
away from the intended point, as the divisor becomes negative.
tl;dr: "Game crashes if last bit destroyed within 4-frame window near end of
two patterns". For an informed decision on a new movement behavior for these
last 8 frames, we definitely need to know all the details behind the crash
though. Here's what I would interpret into the code:
Not moving at all, i.e., interpreting 0 as the middle ground between
positive and negative movement. This would also make sense because a
12-frame duration implies 100% of the movement to consist of
the braking phase – and Marisa wasn't moving before, after all.
Move at maximum speed, i.e., dividing by 1 rather than 0. Since the
movement duration is still 12 in this case, Marisa will immediately start
braking. In total, she will move exactly ¾ of the way from her initial
position to (192, 112) within the 8 frames before the pattern
ends.
Directly warping to (192, 112) on frame 0, and to the
point-reflected target on 4, respectively. This "emulates" the division by
zero by moving Marisa at infinite speed to the exact two points indicated by
the velocity formula. It also fits nicely into the 8 frames we have to fill
here. Sure, Marisa can't reach these points at any other duration, but why
shouldn't she be able to, with infinite speed? Then again, if Marisa
is far away enough from (192, 112), this workaround would warp her
across the entire playfield. Can Marisa teleport according to lore? I
have no idea…
Triggering an immediate Game O– hell no, this is the Stage 4 boss,
people already hate losing runs to this bug!
Asking Twitter worked great for the Kurumi workaround, so let's do it again!
Gotta attach a screenshot of an earlier draft of this blog post though,
since this stuff is impossible to explain in tweets…
…and it went
through the roof, becoming the most successful ReC98 tweet so far?!
Apparently, y'all really like to just look at descriptions of overly complex
bugs that I'd consider way beyond the typical attention span that can be
expected from Twitter. Unfortunately, all those tweet impressions didn't
quite translate into poll turnout. The results
were pretty evenly split between 1) and 2), with option 1) just coming out
slightly ahead at 49.1%, compared to 41.5% of option 2).
(And yes, I only noticed after creating the poll that warping to both the
green and yellow points made more sense than warping to just one of the two.
Let's hope that this additional variant wouldn't have shifted the results
too much. Both warp options only got 9.4% of the vote after all, and no one
else came up with the idea either. In the end,
you can always merge together your preferred combination of workarounds from
the Git branches linked below.)
So here you go: The new definitive version of TH04, containing not only the
community-chosen Kurumi and Stage 4 Marisa workaround variant, but also the
📝 No-EMS bugfix from last year.
Edit (2022-05-31): This package is outdated, 📝 the current version is here!2022-04-18-community-choice-fixes.zip
Oh, and let's also add spaztron64's TH03 GDC clock fix
from 2019 because why not. This binary was built from the community_choice_fixes
branch, and you can find the code for all the individual workarounds on
these branches:
Again, because it can't be stated often enough: These fixes are
fanfiction. The gameplay community should be aware of
this, and might decide to handle these cases differently.
With all of that taking way more time to evaluate and document, this
research really had to become part of a proper push, instead of just being
covered in the quick non-push blog post I initially intended. With ½ of a
push left at the end, TH05's Stage 1-5 boss background rendering functions
fit in perfectly there. If you wonder how these static backdrop images even
need any boss-specific code to begin with, you're right – it's basically the
same function copy-pasted 4 times, differing only in the backdrop image
coordinates and some other inconsequential details.
Only Sara receives a nice variation of the typical
📝 blocky entrance animation: The usually
opaque bitmap data from ST00.BB is instead used as a transition
mask from stage tiles to the backdrop image, by making clever use of the
tile invalidation system:
TH04 uses the same effect a bit more frequently, for its first three bosses.
Next up: Shinki, for real this time! I've already managed to decompile 10 of
her 11 danmaku patterns within a little more than one push – and yes,
that one is included in there. Looks like I've slightly
overestimated the amount of work required for TH04's and TH05's bosses…
Two years after
📝 the first look at TH04's and TH05's bullets,
we finally get to finish their logic code by looking at the special motion
types. Bullets as a whole still aren't completely finished as the
rendering code is still waiting to be RE'd, but now we've got everything
about them that's required for decompiling the midboss and boss fights of
these games.
Just like the motion types of TH01's pellets, the ones we've got here really
are special enough to warrant an enum, despite all the
overlap in the "slow down and turn" and "bounce at certain edges of the
playfield" types. Sure, including them in the bitfield I proposed two years
ago would have allowed greater variety, but it wouldn't have saved any
memory. On the contrary: These types use a single global state variable for
the maximum turn count and delta speed, which a proper customizable
architecture would have to integrate into the bullet structure. Maybe it is
possible to stuff everything into the same amount of bytes, but not without
first completely rearchitecting the bullet structure and removing every
single piece of redundancy in there. Simply extending the system by adding a
new enum value for a new motion type would be way more
straightforward for modders.
Speaking about memory, TH05 already extends the bullet structure by 6 bytes
for the "exact linear movement" type exclusive to that game. This type is
particularly interesting for all the prospective PC-98 game developers out
there, as it nicely points out the precision limits of Q12.4 subpixels.
Regular bullet movement works by adding a Q12.4 velocity to a Q12.4 position
every frame, with the velocity typically being calculated only once on spawn
time from an 8-bit angle and a Q12.4 speed. Quantization errors from this
initial calculation can quickly compound over all the frames a bullet spends
moving across the playfield. If a bullet is only supposed to move on a
straight line though, there is a more precise way of calculating its
position: By storing the origin point, movement angle, and total distance
traveled, you can perform a full polar→Cartesian transformation every frame.
Out of the 10 danmaku patterns in TH05 that use this motion type, the
difference to regular bullet movement can be best seen in Louise's final
pattern:
Not far away from the regular bullet code, we've also got the movement
function for the infamous curve / "cheeto" bullets. I would have almost
called them "cheetos" in the code as well, which surely fits more nicely
into 8.3 filenames than "curve bullets" does, but eh, trademarks…
As for hitboxes, we got a 16×16 one on the head node, and a 12×12 one on the
16 trail nodes. The latter simply store the position of the head node during
the last 16 frames, Snake style. But what you're all here for is probably
the turning and homing algorithm, right? Boiled down to its essence, it
works like this:
// [head] points to the controlled "head" part of a curve bullet entity.
// Angles are stored with 8 bits representing a full circle, providing free
// normalization on arithmetic overflow.
// The directions are ordered as you would expect:
// • 0x00: right (sin(0x00) = 0, cos(0x00) = +1)
// • 0x40: down (sin(0x40) = +1, cos(0x40) = 0)
// • 0x80: left (sin(0x80) = 0, cos(0x80) = -1)
// • 0xC0: up (sin(0xC0) = -1, cos(0xC0) = 0)
uint8_t angle_delta = (head->angle - player_angle_from(
head->pos.cur.x, head->pos.cur.y
));
// Stop turning if the player is 1/128ths of a circle away from this bullet
const uint8_t SNAP = 0x02;
// Else, turn either clockwise or counterclockwise by 1/256th of a circle,
// depending on what would reach the player the fastest.
if((angle_delta > SNAP) && (angle_delta < static_cast<uint8_t>(-SNAP))) {
angle_delta = (angle_delta >= 0x80) ? -0x01 : +0x01;
}
head_p->angle -= angle_delta;
5 lines of code, and not all too difficult to follow once you are familiar
with 8-bit angles… unlike what ZUN actually wrote. Which is 26 lines,
and includes an unused "friction" variable that is never set to any value
that makes a difference in the formula. uth05win
correctly saw through that all and simplified this code to something
equivalent to my explanation. Redoing that work certainly wasted a bit of my
time, and means that I now definitely need to spend another push on RE'ing
all the shared boss functions before I can start with Shinki.
So while a curve bullet's speed does get faster over time, its
angular velocity is always limited to 1/256th of a
circle per frame. This reveals the optimal strategy for dodging them:
Maximize this delta angle by staying as close to 180° away from their
current direction as possible, and let their acceleration do the rest.
At least that's the theory for dodging a single one. As a danmaku
designer, you can now of course place other bullets at these technically
optimal places to prevent a curve bullet pattern from being cheesed like
that. I certainly didn't record the video above in a single take either…
After another bunch of boring entity spawn and update functions, the
playfield shaking feature turned out as the most notable (and tricky) one to
round out these two pushes. It's actually implemented quite well in how it
simply "un-shakes" the screen by just marking every stage tile to be
redrawn. In the context of all the other tile invalidation that can take
place during a frame, that's definitely more performant than
📝 doing another EGC-accelerated memmove().
Due to these two games being double-buffered via page flipping, this
invalidation only really needs to happen for the frame after the next
one though. The immediately next frame will show the regular, un-shaken
playfield on the other VRAM page first, except during the multi-frame
shake animation when defeating a midboss, where it will also appear shifted
in a different direction… 😵 Yeah, no wonder why ZUN just always invalidates
all stage tiles for the next two frames after every shaking animation, which
is guaranteed to handle both sporadic single-frame shakes and continuous
ones. So close to good-code here.
Finally, this delivery was delayed a bit because -Tom-
requested his round-up amount to be limited to the cap in the future. Since
that makes it kind of hard to explain on a static page how much money he
will exactly provide, I now properly modeled these discounts in the website
code. The exact round-up amount is now included in both the pre-purchase
breakdown, as well as the cap bar on the main page.
With that in place, the system is now also set up for round-up offers from
other patrons. If you'd also like to support certain goals in this way, with
any amount of money, now's the time for getting in touch with me about that.
Known contributors only, though! 😛
Next up: The final bunch of shared boring boss functions. Which certainly
will give me a break from all the maintenance and research work, and speed
up delivery progress again… right?
Alright, the score popup numbers shown when collecting items or defeating
(mid)bosses. The second-to-last remaining big entity type in TH05… with
quite some PI false positives in the memory range occupied by its data.
Good thing I still got some outstanding generic RE pushes that haven't
been claimed for anything more specific in over a month! These
conveniently allowed me to RE most of these functions right away, the
right way.
Most of the false positives were boss HP values, passed to a "boss phase
end" function which sets the HP value at which the next phase should end.
Stage 6 Yuuka, Mugetsu, and EX-Alice have their own copies of this
function, in which they also reset certain boss-specific global variables.
Since I always like to cover all varieties of such duplicated functions at
once, it made sense to reverse-engineer all the involved variables while I
was at it… and that's why this was exactly the right time to cover the
implementation details of Stage 6 Yuuka's parasol and vanishing animations
in TH04.
With still a bit of time left in that RE push afterwards, I could also
start looking into some of the smaller functions that didn't quite fit
into other pushes. The most notable one there was a simple function that
aims from any point to the current player position. Which actually only
became a separate function in TH05, probably since it's called 27 times in
total. That's 27 places no longer being blocked from further RE progress.
WindowsTiger already
did most of the work for the score popup numbers in January, which meant
that I only had to review it and bring it up to ReC98's current coding
styles and standards. This one turned out to be one of those rare features
whose TH05 implementation is significantly less insane than the
TH04 one. Both games lazily redraw only the tiles of the stage background
that were drawn over in the previous frame, and try their best to minimize
the amount of tiles to be redrawn in this way. For these popup numbers,
this involves calculating the on-screen width, based on the exact number
of digits in the point value. TH04 calculates this width every frame
during the rendering function, and even resorts to setting that field
through the digit iteration pointer via self-modifying code… yup. TH05, on
the other hand, simply calculates the width once when spawning a new popup
number, during the conversion of the point value to
binary-coded
decimal. The "×2" multiplier suffix being removed in TH05 certainly
also helped in simplifying that feature in this game.
And that's ⅓ of TH05 reverse-engineered! Next up, one more TH05 PI push,
in which the stage enemies hopefully finish all the big entity types.
Maybe it will also be accompanied by another RE push? In any case, that
will be the last piece of TH05 progress for quite some time. The next TH01
stretch will consist of 6 pushes at the very least, and I currently have
no idea of how much time I can spend on ReC98 a month from now…
Long time no see! And this is exactly why I've been procrastinating
bullets while there was still meaningful progress to be had in other parts
of TH04 and TH05: There was bound to be quite some complexity in this most
central piece of game logic, and so I couldn't possibly get to a
satisfying understanding in just one push.
Or in two, because their rendering involves another bunch of
micro-optimized functions adapted from master.lib.
Or in three, because we'd like to actually name all the bullet sprites,
since there are a number of sprite ID-related conditional branches. And
so, I was refining things I supposedly RE'd in the the commits from the
first push until the very end of the fourth.
When we talk about "bullets" in TH04 and TH05, we mean just two things:
the white 8×8 pellets, with a cap of 240 in TH04 and 180 in TH05, and any
16×16 sprites from MIKO16.BFT, with a cap of 200 in TH04 and
220 in TH05. These are by far the most common types of… err, "things the
player can collide with", and so ZUN provides a whole bunch of pre-made
motion, animation, and
n-way spread / ring / stack group options for those, which can be
selected by simply setting a few fields in the bullet template. All the
other "non-bullets" have to be fired and controlled individually.
Which is nothing new, since uth05win covered this part pretty accurately –
I don't think anyone could just make up these structure member
overloads. The interesting insights here all come from applying this
research to TH04, and figuring out its differences compared to TH05. The
most notable one there is in the default groups: TH05 allows you to add
a stack
to any single bullet, n-way spread or ring, but TH04 only lets you create
stacks separately from n-way spreads and rings, and thus gets by with
fewer fields in its bullet template structure. On the other hand, TH04 has
a separate "n-way spread with random angles, yet still aimed at the
player" group? Which seems to be unused, at least as far as
midbosses and bosses are concerned; can't say anything about stage enemies
yet.
In fact, TH05's larger bullet template structure illustrates that these
distinct group types actually are a rather redundant piece of
over-engineering. You can perfectly indicate any permutation of the basic
groups through just the stack bullet count (1 = no stack), spread bullet
count (1 = no spread), and spread delta angle (0 = ring instead of
spread). Add a 4-flag bitfield to cover the rest (aim to player, randomize
angle, randomize speed, force single bullet regardless of difficulty or
rank), and the result would be less redundant and even slightly
more capable.
Even those 4 pushes didn't quite finish all of the bullet-related types,
stopping just shy of the most trivial and consistent enum that defines
special movement. This also left us in a
📝 TH03-like situation, in which we're still
a bit away from actually converting all this research into actual RE%. Oh
well, at least this got us way past 50% in overall position independence.
On to the second half! 🎉
For the next push though, we'll first have a quick detour to the remaining
C code of all the ZUN.COM binaries. Now that the
📝 TH04 and TH05 resident structures no
longer block those, -Tom- has requested TH05's
RES_KSO.COM to be covered in one of his outstanding pushes.
And since 32th System
recently RE'd TH03's resident structure, it makes sense to also review and
merge that, before decompiling all three remaining RES_*.COM
binaries in hopefully a single push. It might even get done faster than
that, in which case I'll then review and merge some more of
WindowsTiger's
research.
Here we go, new C code! …eh, it will still take a bit to really get
decompilation going at the speeds I was hoping for. Especially with the
sheer amount of stuff that is set in the first few significant
functions we actually can decompile, which now all has to be
correctly declared in the C world. Turns out I spent the last 2 years
screwing up the case of exported functions, and even some of their names,
so that it didn't actually reflect their calling convention… yup. That's
just the stuff you tend to forget while it doesn't matter.
To make up for that, I decided to research whether we can make use of some
C++ features to improve code readability after all. Previously, it seemed
that TH01 was the only game that included any C++ code, whereas TH02 and
later seemed to be 100% C and ASM. However, during the development of the
soon to be released new build system, I noticed that even this old
compiler from the mid-90's, infamous for prioritizing compile speeds over
all but the most trivial optimizations, was capable of quite surprising
levels of automatic inlining with class methods…
…leading the research to culminate in the mindblow that is
9d121c7 – yes, we can use C++ class methods
and operator overloading to make the code more readable, while still
generating the same code than if we had just used C and preprocessor
macros.
Looks like there's now the potential for a few pull requests from outside
devs that apply C++ features to improve the legibility of previously
decompiled and terribly macro-ridden code. So, if anyone wants to help
without spending money…