TH05 has passed the 50% RE mark, with both MAIN.EXE and the
game as a whole! With that, we've also reached what -Tom-
wanted out of the project, so he's suspending his discount offer for a
bit.
Curve bullets are now officially called cheetos! 76.7% of
fans prefer this term, and it fits into the 8.3 DOS filename scheme much
better than homing lasers (as they're called in
OMAKE.TXT) or Taito
lasers (which would indeed have made sense as well).
…oh, and I managed to decompile Shinki within 2 pushes after all. That
left enough budget to also add the Stage 1 midboss on top.
So, Shinki! As far as final boss code is concerned, she's surprisingly
economical, with 📝 her background animations
making up more than ⅓ of her entire code. Going straight from TH01's
📝 final📝 bosses
to TH05's final boss definitely showed how much ZUN had streamlined
danmaku pattern code by the end of PC-98 Touhou. Don't get me wrong, there
is still room for improvement: TH05 not only
📝 reuses the same 16 bytes of generic boss state we saw in TH04 last month,
but also uses them 4× as often, and even for midbosses. Most importantly
though, defining danmaku patterns using a single global instance of the
group template structure is just bad no matter how you look at it:
The script code ends up rather bloated, with a single MOV
instruction for setting one of the fields taking up 5 bytes. By comparison,
the entire structure for regular bullets is 14 bytes large, while the
template structure for Shinki's 32×32 ball bullets could have easily been
reduced to 8 bytes.
Since it's also one piece of global state, you can easily forget to set
one of the required fields for a group type. The resulting danmaku group
then reuses these values from the last time they were set… which might have
been as far back as another boss fight from a previous stage.
And of course, I wouldn't point this out if it
didn't actually happen in Shinki's pattern code. Twice.
Declaring a separate structure instance with the static data for every
pattern would be both safer and more space-efficient, and there's
more than enough space left for that in the game's data segment.
But all in all, the pattern functions are short, sweet, and easy to follow.
The "devil"
patternis significantly more complex than the others, but still
far from TH01's final bosses at their worst. I especially like the clear
architectural separation between "one-shot pattern" functions that return
true once they're done, and "looping pattern" functions that
run as long as they're being called from a boss's main function. Not many
all too interesting things in these pattern functions for the most part,
except for two pieces of evidence that Shinki was coded after Yumeko:
The gather animation function in the first two phases contains a bullet
group configuration that looks like it's part of an unused danmaku
pattern. It quickly turns out to just be copy-pasted from a similar function
in Yumeko's fight though, where it is turned into actual
bullets.
As one of the two places where ZUN forgot to set a template field, the
lasers at the end of the white wing preparation pattern reuse the 6-pixel
width of Yumeko's final laser pattern. This actually has an effect on
gameplay: Since these lasers are active for the first 8 frames after
Shinki's wings appear on screen, the player can get hit by them in the last
2 frames after they grew to their final width.
Speaking about that wing sprite: If you look at ST05.BB2 (or
any other file with a large sprite, for that matter), you notice a rather
weird file layout:
And it's not a limitation of the sprite width field in the BFNT+ header
either. Instead, it's master.lib's BFNT functions which are limited to
sprite widths up to 64 pixels… or at least that's what
MASTER.MAN claims. Whatever the restriction was, it seems to be
completely nonexistent as of master.lib version 0.23, and none of the
master.lib functions used by the games have any issues with larger
sprites.
Since ZUN stuck to the supposed 64-pixel width limit though, it's now the
game that expects Shinki's winged form to consist of 4 physical
sprites, not just 1. Any conversion from another, more logical sprite sheet
layout back into BFNT+ must therefore replicate the original number of
sprites. Otherwise, the sequential IDs ("patnums") assigned to every newly
loaded sprite no longer match ZUN's hardcoded IDs, causing the game to
crash. This is exactly what used to happen with -Tom-'s
MysticTK automation scripts,
which combined these exact sprites into a single large one. This issue has
now been fixed – just in case there are some underground modders out there
who used these scripts and wonder why their game crashed as soon as the
Shinki fight started.
And then the code quality takes a nosedive with Shinki's main function.
Even in TH05, these boss and midboss update
functions are still very imperative:
The origin point of all bullet types used by a boss must be manually set
to the current boss/midboss position; there is no concept of a bullet type
tracking a certain entity.
The same is true for the target point of a player's homing shots…
… and updating the HP bar. At least the initial fill animation is
abstracted away rather decently.
Incrementing the phase frame variable also must be done manually. TH05
even "innovates" here by giving the boss update function exclusive ownership
of that variable, in contrast to TH04 where that ownership is given out to
the player shot collision detection (?!) and boss defeat helper
functions.
Speaking about collision detection: That is done by calling different
functions depending on whether the boss is supposed to be invincible or
not.
Timeout conditions? No standard way either, and all done with manual
if statements. In combination with the regular phase end
condition of lowering (mid)boss HP to a certain value, this leads to quite a
convoluted control flow.
The manual calls to the score bonus functions for cleared phases at least provide some sense of orientation.
One potentially nice aspect of all this imperative freedom is that
phases can end outside of HP boundaries… by manually incrementing the
phase variable and resetting the phase frame variable to 0.
The biggest WTF in there, however, goes to using one of the 16 state bytes
as a "relative phase" variable for differentiating between boss phases that
share the same branch within the switch(boss.phase)
statement. While it's commendable that ZUN tried to reduce code duplication
for once, he could have just branched depending on the actual
boss.phase variable? The same state byte is then reused in the
"devil" pattern to track the activity state of the big jerky lasers in the
second half of the pattern. If you somehow managed to end the phase after
the first few bullets of the pattern, but before these lasers are up,
Shinki's update function would think that you're still in the phase
before the "devil" pattern. The main function then sequence-breaks
right to the defeat phase, skipping the final pattern with the burning Makai
background. Luckily, the HP boundaries are far away enough to make this
impossible in practice.
The takeaway here: If you want to use the state bytes for your custom
boss script mods, alias them to your own 16-byte structure, and limit each
of the bytes to a clearly defined meaning across your entire boss script.
One final discovery that doesn't seem to be documented anywhere yet: Shinki
actually has a hidden bomb shield during her two purple-wing phases.
uth05win got this part slightly wrong though: It's not a complete
shield, and hitting Shinki will still deal 1 point of chip damage per
frame. For comparison, the first phase lasts for 3,000 HP, and the "devil"
pattern phase lasts for 5,800 HP.
And there we go, 3rd PC-98 Touhou boss
script* decompiled, 28 to go! 🎉 In case you were expecting a fix for
the Shinki death glitch: That one
is more appropriately fixed as part of the Mai & Yuki script. It also
requires new code, should ideally look a bit prettier than just removing
cheetos between one frame and the next, and I'd still like it to fit within
the original position-dependent code layout… Let's do that some other
time.
Not much to say about the Stage 1 midboss, or midbosses in general even,
except that their update functions have to imperatively handle even more
subsystems, due to the relative lack of helper functions.
The remaining ¾ of the third push went to a bunch of smaller RE and
finalization work that would have hardly got any attention otherwise, to
help secure that 50% RE mark. The nicest piece of code in there shows off
what looks like the optimal way of setting up the
📝 GRCG tile register for monochrome blitting
in a variable color:
mov ah, palette_index ; Any other non-AL 8-bit register works too.
; (x86 only supports AL as the source operand for OUTs.)
rept 4 ; For all 4 bitplanes…
shr ah, 1 ; Shift the next color bit into the x86 carry flag
sbb al, al ; Extend the carry flag to a full byte
; (CF=0 → 0x00, CF=1 → 0xFF)
out 7Eh, al ; Write AL to the GRCG tile register
endm
Thanks to Turbo C++'s inlining capabilities, the loop body even decompiles
into a surprisingly nice one-liner. What a beautiful micro-optimization, at
a place where micro-optimization doesn't hurt and is almost expected.
Unfortunately, the micro-optimizations went all downhill from there,
becoming increasingly dumb and undecompilable. Was it really necessary to
save 4 x86 instructions in the highly unlikely case of a new spark sprite
being spawned outside the playfield? That one 2D polar→Cartesian
conversion function then pointed out Turbo C++ 4.0J's woefully limited
support for 32-bit micro-optimizations. The code generation for 32-bit
📝 pseudo-registers is so bad that they almost
aren't worth using for arithmetic operations, and the inline assembler just
flat out doesn't support anything 32-bit. No use in decompiling a function
that you'd have to entirely spell out in machine code, especially if the
same function already exists in multiple other, more idiomatic C++
variations.
Rounding out the third push, we got the TH04/TH05 DEMO?.REC
replay file reading code, which should finally prove that nothing about the
game's original replay system could serve as even just the foundation for
community-usable replays. Just in case anyone was still thinking that.
Next up: Back to TH01, with the Elis fight! Got a bit of room left in the
cap again, and there are a lot of things that would make a lot of
sense now:
TH04 would really enjoy a large number of dedicated pushes to catch up
with TH05. This would greatly support the finalization of both games.
Continuing with TH05's bosses and midbosses has shown to be good value
for your money. Shinki would have taken even less than 2 pushes if she
hadn't been the first boss I looked at.
Oh, and I also added Seihou as a selectable goal, for the two people out
there who genuinely like it. If I ever want to quit my day job, I need to
branch out into safer territory that isn't threatened by takedowns, after
all.
Slight change of plans, because we got instructions for
reliably reproducing the TH04 Kurumi Divide Error crash! Major thanks to
Colin Douglas Howell. With those, it also made sense to immediately look at
the crash in the Stage 4 Marisa fight as well. This way, I could release
both of the obligatory bugfix mods at the same time.
Especially since it turned out that I was wrong: Both crashes are entirely
unrelated to the custom entity structure that would have required PI-centric
progress. They are completely specific to Kurumi's and Marisa's
danmaku-pattern code, and really are two separate bugs
with no connection to each other. All of the necessary research nicely fit
into Arandui's 0.5 pushes, with no further deep understanding
required here.
But why were there still three weeks between Colin's message and this blog
post? DMCA distractions aside: There are no easy fixes this time, unlike
📝 back when I looked at the Stage 5 Yuuka crash.
Just like how division by zero is undefined in mathematics, it's also,
literally, undefined what should happen instead of these two
Divide error crashes. This means that any possible "fix" can
only ever be a fanfiction interpretation of the intentions behind ZUN's
code. The gameplay community should be aware of this, and
might decide to handle these cases differently. And if we
have to go into fanfiction territory to work around crashes in the
canon games, we'd better document what exactly we're fixing here and how, as
comprehensible as possible.
With that out of the way, let's look at Kurumi's crash first, since it's way
easier to grasp. This one is known to primarily happen to new players, and
it's easy to see why:
In one of the patterns in her third phase, Kurumi fires a series of 3
aimed rings from both edges of the playfield. By default (that is, on Normal
and with regular rank), these are 6-way rings.
6 happens to be quite a peculiar number here, due to how rings are
(manually) tuned based on the current "rank" value (playperf)
before being fired. The code, abbreviated for clarity:
Let's look at the range of possible playperf values per
difficulty level:
Easy
Normal
Hard
Lunatic
Extra
playperf_min
4
11
20
22
16
playperf_max
16
24
32
34
20
Edit (2022-05-24): This blog post initially had
26 instead of 16 for playperf_min for the Extra Stage. Thanks
to Popfan for pointing out that typo!
Reducing rank to its minimum on Easy mode will therefore result in a
0-ring after tuning.
To calculate the individual angles of each bullet in a ring, ZUN divides
360° (or, more correctly,
📝 0x100) by the total number of
bullets…
Boom, division by zero.
So, what should the workaround look like? Obviously, we want to modify
neither the default number of ring bullets nor the tuning algorithm – that
would change all other non-crashing variations of this pattern on other
difficulties and ranks, creating a fork of the original gameplay. Instead, I
came up with four possible workarounds that all seemed somewhat logical to
me:
Firing no bullet, i.e., interpreting 0-ring literally. This would
create the only constellation in which a call to the bullet group spawn
functions would not spawn at least one new bullet.
Firing a "1-ring", i.e., a single bullet. This would be consistent with
how the bullet spawn functions behave for "0-way" stack and spread
groups.
Firing a "∞-ring", i.e., 200 bullets, which is as much as the game's cap
on 16×16 bullets would allow. This would poke fun at the whole "division by
zero" idea… but given that we're still talking about Easy Mode (and
especially new players) here, it might be a tad too cruel. Certainly the
most trollish interpretation.
Triggering an immediate Game Over, exchanging the hard crash for a
softer and more controlled shutdown. Certainly the option that would be
closest to the behavior of the original games, and perhaps the only one to
be accepted in Serious, High-Level Play™.
As I was writing this post, it felt increasingly wrong for me to make this
decision. So I once again went to Twitter, where 56.3%
voted in favor of the 1-bullet option. Good that I asked! I myself was
more leaning towards the 0-bullet interpretation, which only got 28.7% of
the vote. Also interesting are the 2.3% in favor of the Game Over option but
I get it, low-rank Easy Mode isn't exactly the most competitive mode of
playing TH04.
There are reports of Kurumi crashing on higher difficulties as well, but I
could verify none of them. If they aren't fixed by this workaround, they're
caused by an entirely different bug that we have yet to discover.
Onto the Stage 4 Marisa crash then, which does in fact apply to all
difficulty levels. I was also wrong on this one – it's a hell of a lot more
intricate than being just a division by the number of on-screen bits.
Without having decompiled the entire fight, I can't give a completely
accurate picture of what happens there yet, but here's the rough idea:
Marisa uses different patterns, depending on whether at least one of her
bits is still alive, or all of them have been destroyed.
Destroying the last bit will immediately switch to the bit-less
counterpart of the current pattern.
The bits won't respawn before the pattern ended, which ensures that the
bit-less version is always shown in its entirety after being started or
switched into.
In two of the bit-less patterns, Marisa gradually moves to the point
reflection of her position at the start of the pattern across the playfield
coordinate of (192, 112), or (224, 128) on screen.
The velocity of this movement is determined by both her distance to that
point and the total amount of frames that this instance of the bit-less
pattern will last.
Since this frame amount is directly tied to the frame the player
destroyed the last bit on, it becomes a user-controlled variable. I think
you can see where this is going…
The last 12 frames of this duration, however, are always reserved for a
"braking phase", where Marisa's velocity is halved on each frame.
This part of the code only runs every 4 frames though. This expands the
time window for this crash to 4 frames, rather than just the two frames you
would expect from looking at the division itself.
Both of the broken patterns run for a maximum of 160 frames. Therefore,
the crash will occur when Marisa's last bit is destroyed between frame 152
and 155 inclusive. On these frames, the
last_frame_with_bits_alive variable is set to 148, which is the
crucial 12 duration frames away from the maximum of 160.
Interestingly enough, the calculated velocity is also only
applied every 4 frames, with Marisa actually staying still for the 3 frames
inbetween. As a result, she either moves
too slowly to ever actually reach the yellow point if the last bit
was destroyed early in the pattern (see destruction frames 68 or
112),
or way too quickly, and almost in a jerky, teleporting way (see
destruction frames 144 or 148).
Finally, as you may have already gathered from the formula: Destroying
the last bit between frame 156 and 160 inclusive results in
duration values of 8 or 4. These actually push Marisa
away from the intended point, as the divisor becomes negative.
tl;dr: "Game crashes if last bit destroyed within 4-frame window near end of
two patterns". For an informed decision on a new movement behavior for these
last 8 frames, we definitely need to know all the details behind the crash
though. Here's what I would interpret into the code:
Not moving at all, i.e., interpreting 0 as the middle ground between
positive and negative movement. This would also make sense because a
12-frame duration implies 100% of the movement to consist of
the braking phase – and Marisa wasn't moving before, after all.
Move at maximum speed, i.e., dividing by 1 rather than 0. Since the
movement duration is still 12 in this case, Marisa will immediately start
braking. In total, she will move exactly ¾ of the way from her initial
position to (192, 112) within the 8 frames before the pattern
ends.
Directly warping to (192, 112) on frame 0, and to the
point-reflected target on 4, respectively. This "emulates" the division by
zero by moving Marisa at infinite speed to the exact two points indicated by
the velocity formula. It also fits nicely into the 8 frames we have to fill
here. Sure, Marisa can't reach these points at any other duration, but why
shouldn't she be able to, with infinite speed? Then again, if Marisa
is far away enough from (192, 112), this workaround would warp her
across the entire playfield. Can Marisa teleport according to lore? I
have no idea…
Triggering an immediate Game O– hell no, this is the Stage 4 boss,
people already hate losing runs to this bug!
Asking Twitter worked great for the Kurumi workaround, so let's do it again!
Gotta attach a screenshot of an earlier draft of this blog post though,
since this stuff is impossible to explain in tweets…
…and it went
through the roof, becoming the most successful ReC98 tweet so far?!
Apparently, y'all really like to just look at descriptions of overly complex
bugs that I'd consider way beyond the typical attention span that can be
expected from Twitter. Unfortunately, all those tweet impressions didn't
quite translate into poll turnout. The results
were pretty evenly split between 1) and 2), with option 1) just coming out
slightly ahead at 49.1%, compared to 41.5% of option 2).
(And yes, I only noticed after creating the poll that warping to both the
green and yellow points made more sense than warping to just one of the two.
Let's hope that this additional variant wouldn't have shifted the results
too much. Both warp options only got 9.4% of the vote after all, and no one
else came up with the idea either. In the end,
you can always merge together your preferred combination of workarounds from
the Git branches linked below.)
So here you go: The new definitive version of TH04, containing not only the
community-chosen Kurumi and Stage 4 Marisa workaround variant, but also the
📝 No-EMS bugfix from last year.
Edit (2022-05-31): This package is outdated, 📝 the current version is here!2022-04-18-community-choice-fixes.zip
Oh, and let's also add spaztron64's TH03 GDC clock fix
from 2019 because why not. This binary was built from the community_choice_fixes
branch, and you can find the code for all the individual workarounds on
these branches:
Again, because it can't be stated often enough: These fixes are
fanfiction. The gameplay community should be aware of
this, and might decide to handle these cases differently.
With all of that taking way more time to evaluate and document, this
research really had to become part of a proper push, instead of just being
covered in the quick non-push blog post I initially intended. With ½ of a
push left at the end, TH05's Stage 1-5 boss background rendering functions
fit in perfectly there. If you wonder how these static backdrop images even
need any boss-specific code to begin with, you're right – it's basically the
same function copy-pasted 4 times, differing only in the backdrop image
coordinates and some other inconsequential details.
Only Sara receives a nice variation of the typical
📝 blocky entrance animation: The usually
opaque bitmap data from ST00.BB is instead used as a transition
mask from stage tiles to the backdrop image, by making clever use of the
tile invalidation system:
TH04 uses the same effect a bit more frequently, for its first three bosses.
Next up: Shinki, for real this time! I've already managed to decompile 10 of
her 11 danmaku patterns within a little more than one push – and yes,
that one is included in there. Looks like I've slightly
overestimated the amount of work required for TH04's and TH05's bosses…
Two years after
📝 the first look at TH04's and TH05's bullets,
we finally get to finish their logic code by looking at the special motion
types. Bullets as a whole still aren't completely finished as the
rendering code is still waiting to be RE'd, but now we've got everything
about them that's required for decompiling the midboss and boss fights of
these games.
Just like the motion types of TH01's pellets, the ones we've got here really
are special enough to warrant an enum, despite all the
overlap in the "slow down and turn" and "bounce at certain edges of the
playfield" types. Sure, including them in the bitfield I proposed two years
ago would have allowed greater variety, but it wouldn't have saved any
memory. On the contrary: These types use a single global state variable for
the maximum turn count and delta speed, which a proper customizable
architecture would have to integrate into the bullet structure. Maybe it is
possible to stuff everything into the same amount of bytes, but not without
first completely rearchitecting the bullet structure and removing every
single piece of redundancy in there. Simply extending the system by adding a
new enum value for a new motion type would be way more
straightforward for modders.
Speaking about memory, TH05 already extends the bullet structure by 6 bytes
for the "exact linear movement" type exclusive to that game. This type is
particularly interesting for all the prospective PC-98 game developers out
there, as it nicely points out the precision limits of Q12.4 subpixels.
Regular bullet movement works by adding a Q12.4 velocity to a Q12.4 position
every frame, with the velocity typically being calculated only once on spawn
time from an 8-bit angle and a Q12.4 speed. Quantization errors from this
initial calculation can quickly compound over all the frames a bullet spends
moving across the playfield. If a bullet is only supposed to move on a
straight line though, there is a more precise way of calculating its
position: By storing the origin point, movement angle, and total distance
traveled, you can perform a full polar→Cartesian transformation every frame.
Out of the 10 danmaku patterns in TH05 that use this motion type, the
difference to regular bullet movement can be best seen in Louise's final
pattern:
Not far away from the regular bullet code, we've also got the movement
function for the infamous curve / "cheeto" bullets. I would have almost
called them "cheetos" in the code as well, which surely fits more nicely
into 8.3 filenames than "curve bullets" does, but eh, trademarks…
As for hitboxes, we got a 16×16 one on the head node, and a 12×12 one on the
16 trail nodes. The latter simply store the position of the head node during
the last 16 frames, Snake style. But what you're all here for is probably
the turning and homing algorithm, right? Boiled down to its essence, it
works like this:
// [head] points to the controlled "head" part of a curve bullet entity.
// Angles are stored with 8 bits representing a full circle, providing free
// normalization on arithmetic overflow.
// The directions are ordered as you would expect:
// • 0x00: right (sin(0x00) = 0, cos(0x00) = +1)
// • 0x40: down (sin(0x40) = +1, cos(0x40) = 0)
// • 0x80: left (sin(0x80) = 0, cos(0x80) = -1)
// • 0xC0: up (sin(0xC0) = -1, cos(0xC0) = 0)
uint8_t angle_delta = (head->angle - player_angle_from(
head->pos.cur.x, head->pos.cur.y
));
// Stop turning if the player is 1/128ths of a circle away from this bullet
const uint8_t SNAP = 0x02;
// Else, turn either clockwise or counterclockwise by 1/256th of a circle,
// depending on what would reach the player the fastest.
if((angle_delta > SNAP) && (angle_delta < static_cast<uint8_t>(-SNAP))) {
angle_delta = (angle_delta >= 0x80) ? -0x01 : +0x01;
}
head_p->angle -= angle_delta;
5 lines of code, and not all too difficult to follow once you are familiar
with 8-bit angles… unlike what ZUN actually wrote. Which is 26 lines,
and includes an unused "friction" variable that is never set to any value
that makes a difference in the formula. uth05win
correctly saw through that all and simplified this code to something
equivalent to my explanation. Redoing that work certainly wasted a bit of my
time, and means that I now definitely need to spend another push on RE'ing
all the shared boss functions before I can start with Shinki.
So while a curve bullet's speed does get faster over time, its
angular velocity is always limited to 1/256th of a
circle per frame. This reveals the optimal strategy for dodging them:
Maximize this delta angle by staying as close to 180° away from their
current direction as possible, and let their acceleration do the rest.
At least that's the theory for dodging a single one. As a danmaku
designer, you can now of course place other bullets at these technically
optimal places to prevent a curve bullet pattern from being cheesed like
that. I certainly didn't record the video above in a single take either…
After another bunch of boring entity spawn and update functions, the
playfield shaking feature turned out as the most notable (and tricky) one to
round out these two pushes. It's actually implemented quite well in how it
simply "un-shakes" the screen by just marking every stage tile to be
redrawn. In the context of all the other tile invalidation that can take
place during a frame, that's definitely more performant than
📝 doing another EGC-accelerated memmove().
Due to these two games being double-buffered via page flipping, this
invalidation only really needs to happen for the frame after the next
one though. The immediately next frame will show the regular, un-shaken
playfield on the other VRAM page first, except during the multi-frame
shake animation when defeating a midboss, where it will also appear shifted
in a different direction… 😵 Yeah, no wonder why ZUN just always invalidates
all stage tiles for the next two frames after every shaking animation, which
is guaranteed to handle both sporadic single-frame shakes and continuous
ones. So close to good-code here.
Finally, this delivery was delayed a bit because -Tom-
requested his round-up amount to be limited to the cap in the future. Since
that makes it kind of hard to explain on a static page how much money he
will exactly provide, I now properly modeled these discounts in the website
code. The exact round-up amount is now included in both the pre-purchase
breakdown, as well as the cap bar on the main page.
With that in place, the system is now also set up for round-up offers from
other patrons. If you'd also like to support certain goals in this way, with
any amount of money, now's the time for getting in touch with me about that.
Known contributors only, though! 😛
Next up: The final bunch of shared boring boss functions. Which certainly
will give me a break from all the maintenance and research work, and speed
up delivery progress again… right?