OK, let's decompile TH02's HUD code first, gain a solid understanding of how
increasing the score works, and then look at the item system of this game.
Should be no big deal, no surprises expected, let's go!
…Yeah, right, that's never how things end up in ReC98 land.
And so, we get the usual host of newly discovered
oddities in addition to the expected insights into the item mechanics. Let's
start with the latter:
Some regular stage enemies appear to randomly drop either or items. In reality, there is
very little randomness at play here: These items are picked from a
hardcoded, repeating ring of 10 items
(𝄆 𝄇), and the only source of
randomness is the initial position within this ring, which changes at
the beginning of every stage. ZUN further increased the illusion of
randomness by only dropping such a semi-random item for every
4th defeated enemy that is coded to drop one, and also having
enemies that drop fixed, non-random items. I'd say it's a decent way of
ensuring both randomness and balance.
There's a 1/512 chance for such a semi-random
item drop to turn into a item instead –
which translates to 1/2048 enemies due to the
fixed drop rate.
Edit (2023-06-11): These are the only ways that items can randomly drop in this game. All other drops, including
any items, are scripted and deterministic.
After using a continue (both after a Game Over, or after manually
choosing to do so through the Pause menu for whatever reason), the
next
(Stage number + 1) semi-random item
drops are turned into items instead.
Items can contribute up to 25 points to the skill value and subsequent
rating (あなたの腕前) on the final verdict
screen. Doing well at item collection first increases a separate
collect_skill value:
Item
Collection condition
collect_skill change
below max power
+1
at or above max power
+2
value == 51,200
+8
value ≥20,000 and <51,200
+4
value ≥10,000 and <20,000
+2
value <10,000
+1
with 5 bombs in stock
+16
Note, again, the lack of anything involving
items. At the maximum of 5 lives, the item spawn function transforms
them into bomb items anyway. It is possible though to gain
the 5th life by reaching one of the extend scores while a
item is still on screen; in that case,
collecting the 1-up has no effect at all.
Every 32 collect_skill points will then raise the
item_skill by 1, whereas every 16 dropped items will lower
it by 1. Before launching into the ending sequence,
item_skill is clamped to the [0; 25] range and
added to the other skill-relevant metrics we're going to look at in
future pushes.
When losing a life, the game will drop a single
and 4 randomly picked or items in a random order
around Reimu's position. Contrary to an
unsourced Touhou Wiki edit from 2009, each of the 4 does have an
equal and independent chance of being either a
or item.
Finally, and perhaps most
interestingly, item values! These are
determined by the top Y coordinate of an item during the frame it is
collected on. The maximum value of 51,200 points applies to the top 48
pixels of the playfield, and drops off as soon as an item falls below
that line. For the rest of the playfield, point items then use a formula
of (28,000 - (top Y coordinate of item in
screen space × 70)):
Onto score tracking then, which only took a single commit to raise another
big research question. It's widely known that TH02 grants extra lives upon
reaching a score of 1, 2, 3, 5, or 8 million points. But what hasn't been
documented is the fact that the game does not stop at the end of the
hardcoded extend score array. ZUN merely ends it with a sentinel value of
999,999,990 points, but if the score ever increased beyond this value, the
game will interpret adjacent memory as signed 32-bit score values and
continue giving out extra lives based on whatever thresholds it ends up
finding there. Since the following bytes happen to turn into a negative
number, the next extra life would be awarded right after gaining another 10
points at exactly 1,000,000,000 points, and the threshold after that would
be 11,114,905,600 points. Without an explicit counterstop, the number of
score-based extra lives is theoretically unlimited, and would even continue
after the signed 32-bit value overflowed into the negative range. Although
we certainly have bigger problems once scores ever reach that point…
That said, it seems impossible that any of this could ever happen
legitimately. The current high scores of 42,942,800 points on
Lunatic and 42,603,800 points on
Extra don't even reach 1/20 of ZUN's sentinel
value. Without either a graze or a bullet cancel system, the scoring
potential in this game is fairly limited, making it unlikely for high scores
to ever increase by that additional order of magnitude to end up anywhere
near the 1 billion mark.
But can we really be sure? Is this a landmine because it's impossible
to ever reach such high scores, or is it a quirk because these extends
could be observed under rare conditions, perhaps as the result of
other quirks? And if it's the latter, how many of these adjacent bytes do we
need to preserve in cleaned-up versions and ports? We'd pretty much need to
know the upper bound of high scores within the original stage and boss
scripts to tell. This value should be rather easy to calculate in a
game with such a simple scoring system, but doing that only makes sense
after we RE'd all scoring-related code and could efficiently run such
simulations. It's definitely something we'd need to look at before working
on this game's debloated version in the far future, which is
when the difference between quirks and landmines will become relevant.
Still, all that uncertainty just because ZUN didn't restrict a loop to the
size of the extend threshold array…
TH02 marks a pivotal point in how the PC-98 Touhou games handle the current
score. It's the last game to use a 32-bit variable before the later games
would regrettably start using arrays of binary-coded
decimals. More importantly though, TH02 is also the first game to
introduce the delayed score counting animation, where the displayed score
intentionally lags behind and gradually counts towards the real one over
multiple frames. This could be implemented in one of two ways:
Keep the displayed score as a separate variable inside the presentation
layer, and let it gradually count up to the real score value passed in from
the logic layer
Burden the game logic with this presentation detail, and split the score
into two variables: One for the displayed score, and another for the
delta between that score and the actual one. Newly gained points are
first added to the delta variable, and then gradually subtracted from there
and added to the real score before being displayed.
And by now, we can all tell which option ZUN picked for the rest of the
PC-98 games, even if you don't remember
📝 me mentioning this system last year.
📝 Once again, TH02 immortalized ZUN's initial
attempt at the concept, which lacks the abstraction boundaries you'd want
for managing this one piece of state across two variables, and messes up the
abstractions it does have. In addition to the regular score
transfer/render function, the codebase therefore has
a function that transfers the current delta to the score immediately,
but does not re-render the HUD, and
a function that adds the delta to the score and re-renders the HUD, but
does not reset the delta.
And – you guessed it – I wouldn't have mentioned any of this if it didn't
result in one bug and one quirk in TH02. The bug resulting from 1) is pretty
minor: The function is called when losing a life, and simply stops any
active score-counting animation at the value rendered on the frame where the
player got hit. This one is only a rendering issue – no points are lost, and
you just need to gain 10 more for the rendered value to jump back up to its
actual value. You'll probably never notice this one because you're likely
busy collecting the single spawned around Reimu
when losing a life, which always awards at least 10 points.
The quirk resulting from 2) is more intriguing though. Without a separate
reset of the score delta, the function effectively awards the current delta
value as a one-time point bonus, since the same delta will still be
regularly transferred to the score on further game frames.
This function is called at the start of every dialog sequence. However, TH02
stops running the regular game loop between the post-boss dialog and the
next stage where the delta is reset, so we can only observe this quirk for
the pre-boss sequences and the dialog before Mima's form change.
Unfortunately, it's not all too exploitable in either case: Each of the
pre-boss dialog sequences is preceded by an ungrazeable pellet pattern and
followed by multiple seconds of flying over an empty playfield with zero
scoring opportunities. By the time the sequence starts, the game will have
long transferred any big score delta from max-valued point items. It's
slightly better with Mima since you can at least shoot her and use a bomb to
keep the delta at a nonzero value, but without a health bar, there is little
indication of when the dialog starts, and it'd be long after Mima
gave out her last bonus items in any case.
But two of the bosses – that is, Rika, and the Five Magic Stones – are
scrolled onto the playfield as part of the stage script, and can also be hit
with player shots and bombs for a few seconds before their dialog starts.
While I'll only get to cover shot types and bomb damage within the next few
TH02 pushes, there is an obvious initial strategy for maximizing the effect
of this quirk: Spreading out the A-Type / Wide / High Mobility shot to land
as many hits as possible on all Five Magic Stones, while firing off a bomb.
Wow, a grand total of 1,750 extra points! Totally worth wasting a bomb for…
yeah, probably not. But at the very least, it's
something that a TAS score run would want to keep in mind. And all that just
because ZUN "forgot" a single score_delta = 0; assignment at
the end of one function…
And that brings TH02 over the 30% RE mark! Next up: 100% position
independence for TH04. If anyone wants to grab the
that have now been freed up in the cap: Any small Touhou-related task would
be perfect to round out that upcoming TH04 PI delivery.
TH05 has passed the 50% RE mark, with both MAIN.EXE and the
game as a whole! With that, we've also reached what -Tom-
wanted out of the project, so he's suspending his discount offer for a
bit.
Curve bullets are now officially called cheetos! 76.7% of
fans prefer this term, and it fits into the 8.3 DOS filename scheme much
better than homing lasers (as they're called in
OMAKE.TXT) or Taito
lasers (which would indeed have made sense as well).
…oh, and I managed to decompile Shinki within 2 pushes after all. That
left enough budget to also add the Stage 1 midboss on top.
So, Shinki! As far as final boss code is concerned, she's surprisingly
economical, with 📝 her background animations
making up more than ⅓ of her entire code. Going straight from TH01's
📝 final📝 bosses
to TH05's final boss definitely showed how much ZUN had streamlined
danmaku pattern code by the end of PC-98 Touhou. Don't get me wrong, there
is still room for improvement: TH05 not only
📝 reuses the same 16 bytes of generic boss state we saw in TH04 last month,
but also uses them 4× as often, and even for midbosses. Most importantly
though, defining danmaku patterns using a single global instance of the
group template structure is just bad no matter how you look at it:
The script code ends up rather bloated, with a single MOV
instruction for setting one of the fields taking up 5 bytes. By comparison,
the entire structure for regular bullets is 14 bytes large, while the
template structure for Shinki's 32×32 ball bullets could have easily been
reduced to 8 bytes.
Since it's also one piece of global state, you can easily forget to set
one of the required fields for a group type. The resulting danmaku group
then reuses these values from the last time they were set… which might have
been as far back as another boss fight from a previous stage.
And of course, I wouldn't point this out if it
didn't actually happen in Shinki's pattern code. Twice.
Declaring a separate structure instance with the static data for every
pattern would be both safer and more space-efficient, and there's
more than enough space left for that in the game's data segment.
But all in all, the pattern functions are short, sweet, and easy to follow.
The "devil"
patternis significantly more complex than the others, but still
far from TH01's final bosses at their worst. I especially like the clear
architectural separation between "one-shot pattern" functions that return
true once they're done, and "looping pattern" functions that
run as long as they're being called from a boss's main function. Not many
all too interesting things in these pattern functions for the most part,
except for two pieces of evidence that Shinki was coded after Yumeko:
The gather animation function in the first two phases contains a bullet
group configuration that looks like it's part of an unused danmaku
pattern. It quickly turns out to just be copy-pasted from a similar function
in Yumeko's fight though, where it is turned into actual
bullets.
As one of the two places where ZUN forgot to set a template field, the
lasers at the end of the white wing preparation pattern reuse the 6-pixel
width of Yumeko's final laser pattern. This actually has an effect on
gameplay: Since these lasers are active for the first 8 frames after
Shinki's wings appear on screen, the player can get hit by them in the last
2 frames after they grew to their final width.
Speaking about that wing sprite: If you look at ST05.BB2 (or
any other file with a large sprite, for that matter), you notice a rather
weird file layout:
And it's not a limitation of the sprite width field in the BFNT+ header
either. Instead, it's master.lib's BFNT functions which are limited to
sprite widths up to 64 pixels… or at least that's what
MASTER.MAN claims. Whatever the restriction was, it seems to be
completely nonexistent as of master.lib version 0.23, and none of the
master.lib functions used by the games have any issues with larger
sprites.
Since ZUN stuck to the supposed 64-pixel width limit though, it's now the
game that expects Shinki's winged form to consist of 4 physical
sprites, not just 1. Any conversion from another, more logical sprite sheet
layout back into BFNT+ must therefore replicate the original number of
sprites. Otherwise, the sequential IDs ("patnums") assigned to every newly
loaded sprite no longer match ZUN's hardcoded IDs, causing the game to
crash. This is exactly what used to happen with -Tom-'s
MysticTK automation scripts,
which combined these exact sprites into a single large one. This issue has
now been fixed – just in case there are some underground modders out there
who used these scripts and wonder why their game crashed as soon as the
Shinki fight started.
And then the code quality takes a nosedive with Shinki's main function.
Even in TH05, these boss and midboss update
functions are still very imperative:
The origin point of all bullet types used by a boss must be manually set
to the current boss/midboss position; there is no concept of a bullet type
tracking a certain entity.
The same is true for the target point of a player's homing shots…
… and updating the HP bar. At least the initial fill animation is
abstracted away rather decently.
Incrementing the phase frame variable also must be done manually. TH05
even "innovates" here by giving the boss update function exclusive ownership
of that variable, in contrast to TH04 where that ownership is given out to
the player shot collision detection (?!) and boss defeat helper
functions.
Speaking about collision detection: That is done by calling different
functions depending on whether the boss is supposed to be invincible or
not.
Timeout conditions? No standard way either, and all done with manual
if statements. In combination with the regular phase end
condition of lowering (mid)boss HP to a certain value, this leads to quite a
convoluted control flow.
The manual calls to the score bonus functions for cleared phases at least provide some sense of orientation.
One potentially nice aspect of all this imperative freedom is that
phases can end outside of HP boundaries… by manually incrementing the
phase variable and resetting the phase frame variable to 0.
The biggest WTF in there, however, goes to using one of the 16 state bytes
as a "relative phase" variable for differentiating between boss phases that
share the same branch within the switch(boss.phase)
statement. While it's commendable that ZUN tried to reduce code duplication
for once, he could have just branched depending on the actual
boss.phase variable? The same state byte is then reused in the
"devil" pattern to track the activity state of the big jerky lasers in the
second half of the pattern. If you somehow managed to end the phase after
the first few bullets of the pattern, but before these lasers are up,
Shinki's update function would think that you're still in the phase
before the "devil" pattern. The main function then sequence-breaks
right to the defeat phase, skipping the final pattern with the burning Makai
background. Luckily, the HP boundaries are far away enough to make this
impossible in practice.
The takeaway here: If you want to use the state bytes for your custom
boss script mods, alias them to your own 16-byte structure, and limit each
of the bytes to a clearly defined meaning across your entire boss script.
One final discovery that doesn't seem to be documented anywhere yet: Shinki
actually has a hidden bomb shield during her two purple-wing phases.
uth05win got this part slightly wrong though: It's not a complete
shield, and hitting Shinki will still deal 1 point of chip damage per
frame. For comparison, the first phase lasts for 3,000 HP, and the "devil"
pattern phase lasts for 5,800 HP.
And there we go, 3rd PC-98 Touhou boss
script* decompiled, 28 to go! 🎉 In case you were expecting a fix for
the Shinki death glitch: That one
is more appropriately fixed as part of the Mai & Yuki script. It also
requires new code, should ideally look a bit prettier than just removing
cheetos between one frame and the next, and I'd still like it to fit within
the original position-dependent code layout… Let's do that some other
time.
Not much to say about the Stage 1 midboss, or midbosses in general even,
except that their update functions have to imperatively handle even more
subsystems, due to the relative lack of helper functions.
The remaining ¾ of the third push went to a bunch of smaller RE and
finalization work that would have hardly got any attention otherwise, to
help secure that 50% RE mark. The nicest piece of code in there shows off
what looks like the optimal way of setting up the
📝 GRCG tile register for monochrome blitting
in a variable color:
mov ah, palette_index ; Any other non-AL 8-bit register works too.
; (x86 only supports AL as the source operand for OUTs.)
rept 4 ; For all 4 bitplanes…
shr ah, 1 ; Shift the next color bit into the x86 carry flag
sbb al, al ; Extend the carry flag to a full byte
; (CF=0 → 0x00, CF=1 → 0xFF)
out 7Eh, al ; Write AL to the GRCG tile register
endm
Thanks to Turbo C++'s inlining capabilities, the loop body even decompiles
into a surprisingly nice one-liner. What a beautiful micro-optimization, at
a place where micro-optimization doesn't hurt and is almost expected.
Unfortunately, the micro-optimizations went all downhill from there,
becoming increasingly dumb and undecompilable. Was it really necessary to
save 4 x86 instructions in the highly unlikely case of a new spark sprite
being spawned outside the playfield? That one 2D polar→Cartesian
conversion function then pointed out Turbo C++ 4.0J's woefully limited
support for 32-bit micro-optimizations. The code generation for 32-bit
📝 pseudo-registers is so bad that they almost
aren't worth using for arithmetic operations, and the inline assembler just
flat out doesn't support anything 32-bit. No use in decompiling a function
that you'd have to entirely spell out in machine code, especially if the
same function already exists in multiple other, more idiomatic C++
variations.
Rounding out the third push, we got the TH04/TH05 DEMO?.REC
replay file reading code, which should finally prove that nothing about the
game's original replay system could serve as even just the foundation for
community-usable replays. Just in case anyone was still thinking that.
Next up: Back to TH01, with the Elis fight! Got a bit of room left in the
cap again, and there are a lot of things that would make a lot of
sense now:
TH04 would really enjoy a large number of dedicated pushes to catch up
with TH05. This would greatly support the finalization of both games.
Continuing with TH05's bosses and midbosses has shown to be good value
for your money. Shinki would have taken even less than 2 pushes if she
hadn't been the first boss I looked at.
Oh, and I also added Seihou as a selectable goal, for the two people out
there who genuinely like it. If I ever want to quit my day job, I need to
branch out into safer territory that isn't threatened by takedowns, after
all.
Did you know that moving on top of a boss sprite doesn't kill the player in
TH04, only in TH05?
That's the first of only three interesting discoveries in these 3 pushes,
all of which concern TH04. But yeah, 3 for something as seemingly simple as
these shared boss functions… that's still not quite the speed-up I had hoped
for. While most of this can be blamed, again, on TH04 and all of its
hardcoded complexities, there still was a lot of work to be done on the
maintenance front as well. These functions reference a bunch of code I RE'd
years ago and that still had to be brought up to current standards, with the
dependencies reaching from 📝 boss explosions
over 📝 text RAM overlay functionality up to
in-game dialog loading.
The latter provides a good opportunity to talk a bit about x86 memory
segmentation. Many aspiring PC-98 developers these days are very scared
of it, with some even going as far as to rather mess with Protected Mode and
DOS extenders just so that they don't have to deal with it. I wonder where
that fear comes from… Could it be because every modern programming language
I know of assumes memory to be flat, and lacks any standard language-level
features to even express something like segments and offsets? That's why
compilers have a hard time targeting 16-bit x86 these days: Doing anything
interesting on the architecture requires giving the programmer full
control over segmentation, which always comes down to adding the
typical non-standard language extensions of compilers from back in the day.
And as soon as DOS stopped being used, these extensions no longer made sense
and were subsequently removed from newer tools. A good example for this can
be found in an old version of the
NASM manual: The project started as an attempt to make x86 assemblers
simple again by throwing out most of the segmentation features from
MASM-style assemblers, which made complete sense in 1996 when 16-bit DOS and
Windows were already on their way out. But there was a point to all
those features, and that's why ReC98 still has to use the supposedly
inferior TASM.
Not that this fear of segmentation is completely unfounded: All the
segmentation-related keywords, directives, and #pragmas
provided by Borland C++ and TASM absolutely can be the cause of many
weird runtime bugs. Even if the compiler or linker catches them, you are
often left with confusing error messages that aged just as poorly as memory
segmentation itself.
However, embracing the concept does provide quite the opportunity for
optimizations. While it definitely was a very crazy idea, there is a small
bit of brilliance to be gained from making proper use of all these
segmentation features. Case in point: The buffer for the in-game dialog
scripts in TH04 and TH05.
// Thanks to the semantics of `far` pointers, we only need a single 32-bit
// pointer variable for the following code.
extern unsigned char far *dialog_p;
// This master.lib function returns a `void __seg *`, which is a 16-bit
// segment-only pointer. Converting to a `far *` yields a full segment:offset
// pointer to offset 0000h of that segment.
dialog_p = (unsigned char far *)hmem_allocbyte(/* … */);
// Running the dialog script involves pointer arithmetic. On a far pointer,
// this only affects the 16-bit offset part, complete with overflow at 64 KiB,
// from FFFFh back to 0000h.
dialog_p += /* … */;
dialog_p += /* … */;
dialog_p += /* … */;
// Since the segment part of the pointer is still identical to the one we
// allocated above, we can later correctly free the buffer by pulling the
// segment back out of the pointer.
hmem_free((void __seg *)dialog_p);
If dialog_p was a huge pointer, any pointer
arithmetic would have also adjusted the segment part, requiring a second
pointer to store the base address for the hmem_free call. Doing
that will also be necessary for any port to a flat memory model. Depending
on how you look at it, this compression of two logical pointers into a
single variable is either quite nice, or really, really dumb in its
reliance on the precise memory model of one single architecture.
Why look at dialog loading though, wasn't this supposed to be all about
shared boss functions? Well, TH04 unnecessarily puts certain stage-specific
code into the boss defeat function, such as loading the alternate Stage 5
Yuuka defeat dialog before a Bad Ending, or initializing Gengetsu after
Mugetsu's defeat in the Extra Stage.
That's TH04's second core function with an explicit conditional branch for
Gengetsu, after the
📝 dialog exit code we found last year during EMS research.
And I've heard people say that Shinki was the most hardcoded fight in PC-98
Touhou… Really, Shinki is a perfectly regular boss, who makes proper use of
all internal mechanics in the way they were intended, and doesn't blast
holes into the architecture of the game. Even within TH05, it's Mai and Yuki
who rely on hacks and duplicated code, not Shinki.
The worst part about this though? How the function distinguishes Mugetsu
from Gengetsu. Once again, it uses its own global variable to track whether
it is called the first or the second time within TH04's Extra Stage,
unrelated to the same variable used in the dialog exit function. But this
time, it's not just any newly created, single-use variable, oh no. In a
misguided attempt to micro-optimize away a few bytes of conventional memory,
TH04 reserves 16 bytes of "generic boss state", which can (and are) freely
used for anything a boss doesn't want to store in a more dedicated
variable.
It might have been worth it if the bosses actually used most of these
16 bytes, but the majority just use (the same) two, with only Stage 4 Reimu
using a whopping seven different ones. To reverse-engineer the various uses
of these variables, I pretty much had to map out which of the undecompiled
danmaku-pattern functions corresponds to which boss
fight. In the end, I assigned 29 different variable names for each of the
semantically different use cases, which made up another full push on its
own.
Now, 16 bytes of wildly shared state, isn't that the perfect recipe for
bugs? At least during this cursory look, I haven't found any obvious ones
yet. If they do exist, it's more likely that they involve reused state from
earlier bosses – just how the Shinki death glitch in
TH05 is caused by reusing cheeto data from way back in Stage 4 – and
hence require much more boss-specific progress.
And yes, it might have been way too early to look into all these tiny
details of specific boss scripts… but then, this happened:
Looks similar to another
screenshot of a crash in the same fight that was reported in December,
doesn't it? I was too much in a hurry to figure it out exactly, but notice
how both crashes happen right as the last of Marisa's four bits is destroyed.
KirbyComment has suspected
this to be the cause for a while, and now I can pretty much confirm it
to be an unguarded division by the number of on-screen bits in
Marisa-specific pattern code. But what's the cause for Kurumi then?
As for fixing it, I can go for either a fast or a slow option:
Superficially fixing only this crash will probably just take a fraction
of a push.
But I could also go for a deeper understanding by looking at TH04's
version of the 📝 custom entity structure. It
not only stores the data of Marisa's bits, but is also very likely to be
involved in Kurumi's crash, and would get TH04 a lot closer to 100%
PI. Taking that look will probably need at least 2 pushes, and might require
another 3-4 to completely decompile Marisa's fight, and 2-3 to decompile
Kurumi's.
OK, now that that's out of the way, time to finish the boss defeat function…
but not without stumbling over the third of TH04's quirks, relating to the
Clear Bonus for the main game or the Extra Stage:
To achieve the incremental addition effect for the in-game score display
in the HUD, all new points are first added to a score_delta
variable, which is then added to the actual score at a maximum rate of
61,110 points per frame.
There are a fixed 416 frames between showing the score tally and
launching into MAINE.EXE.
As a result, TH04's Clear Bonus is effectively limited to
(416 × 61,110) = 25,421,760 points.
Only TH05 makes sure to commit the entirety of the
score_delta to the actual score before switching binaries,
which fixes this issue.
And after another few collision-related functions, we're now truly,
finally ready to decompile bosses in both TH04 and TH05! Just as the
anything funds were running out… The
remaining ¼ of the third push then went to Shinki's 32×32 ball bullets,
rounding out this delivery with a small self-contained piece of the first
TH05 boss we're probably going to look at.
Next up, though: I'm not sure, actually. Both Shinki and Elis seem just a
little bit larger than the 2¼ or 4 pushes purchased so far, respectively.
Now that there's a bunch of room left in the cap again, I'll just let the
next contribution decide – with a preference for Shinki in case of a tie.
And if it will take longer than usual for the store to sell out again this
time (heh), there's still the
📝 PC-98 text RAM JIS trail word rendering research
waiting to be documented.
Only one newly ordered push since I've reopened the store? Great, that's
all the justification I needed for the extended maintenance delay that was
part of these two pushes 😛
Having to write comments to explain whether coordinates are relative to
the top-left corner of the screen or the top-left corner of the playfield
has finally become old. So, I introduced
distinct
types for all the coordinate systems we typically encounter, applying
them to all code decompiled so far. Note how the planar nature of PC-98
VRAM meant that X and Y coordinates also had to be different from each
other. On the X side, there's mainly the distinction between the
[0; 640] screen space and the corresponding [0; 80] VRAM byte
space. On the Y side, we also have the [0; 400] screen space, but
the visible area of VRAM might be limited to [0; 200] when running in
the PC-98's line-doubled 640×200 mode. A VRAM Y coordinate also always
implies an added offset for vertical scrolling.
During all of the code reconstruction, these types can only have a
documenting purpose. Turning them into anything more than just
typedefs to int, in order to define conversion
operators between them, simply won't recompile into identical binaries.
Modding and porting projects, however, now have a nice foundation for
doing just that, and can entirely lift coordinate system transformations
into the type system, without having to proofread all the meaningless
int declarations themselves.
So, what was left in terms of memory references? EX-Alice's fire waves
were our final unknown entity that can collide with the player. Decently
implemented, with little to say about them.
That left the bomb animation structures as the one big remaining PI
blocker. They started out nice and simple in TH04, with a small 6-byte
star animation structure used for both Reimu and Marisa. TH05, however,
gave each character her own animation… and what the hell is going
on with Reimu's blue stars there? Nope, not going to figure this out on
ASM level.
A decompilation first required some more bomb-related variables to be
named though. Since this was part of a generic RE push, it made sense to
do this in all 5 games… which then led to nice PI gains in anything
but TH05. Most notably, we now got the
"pulling all items to player" flag in TH04 and TH05, which is
actually separate from bombing. The obvious cheat mod is left as an
exercise to the reader.
So, TH05 bomb animations. Just like the
📝 custom entity types of this game, all 4
characters share the same memory, with the superficially same 10-byte
structure.
But let's just look at the very first field. Seen from a low level, it's a
simple struct { int x, y; } pos, storing the current position
of the character-specific bomb animation entity. But all 4 characters use
this field differently:
For Reimu's blue stars, it's the top-left position of each star, in the
12.4 fixed-point format. But unlike the vast majority of these values in
TH04 and TH05, it's relative to the top-left corner of the
screen, not the playfield. Much better represented as
struct { Subpixel screen_x, screen_y; } topleft.
For Marisa's lasers, it's the center of each circle, as a regular 12.4
fixed-point coordinate, relative to the top-left corner of the playfield.
Much better represented as
struct { Subpixel x, y; } center.
For Mima's shrinking circles, it's the center of each circle in regular
pixel coordinates. Much better represented as
struct { screen_x_t x; screen_y_t y; } center.
For Yuuka's spinning heart, it's the top-left corner in regular pixel
coordinates. Much better represented as
struct { screen_x_t x; screen_y_t y; } topleft.
And yes, singular. The game is actually smart enough to only store a single
heart, and then create the rest of the circle on the fly. (If it were even
smarter, it wouldn't even use this structure member, but oh well.)
Therefore, I decompiled it as 4 separate structures once again, bundled
into an union of arrays.
As for Reimu… yup, that's some pointer arithmetic straight out of
Jigoku* for setting and updating the positions of the falling star
trails. While that certainly required several
comments to wrap my head around the current array positions, the one "bug"
in all this arithmetic luckily has no effect on the game.
There is a small glitch with the growing circles, though. They are
spawned at the end of the loop, with their position taken from the star
pointer… but after that pointer has already been incremented. On
the last loop iteration, this leads to an out-of-bounds structure access,
with the position taken from some unknown EX-Alice data, which is 0 during
most of the game. If you look at the animation, you can easily spot these
bugged circles, consistently growing from the top-left corner (0, 0)
of the playfield:
After all that, there was barely enough remaining time to filter out and
label the final few memory references. But now, TH05's
MAIN.EXE is technically position-independent! 🎉
-Tom- is going to work on a pretty extensive demo of this
unprecedented level of efficient Touhou game modding. For a more impactful
effect of both the 100% PI mark and that demo, I'll be delaying the push
covering the remaining false positives in that binary until that demo is
done. I've accumulated a pretty huge backlog of minor maintenance issues
by now…
Next up though: The first part of the long-awaited build system
improvements. I've finally come up with a way of sanely accelerating the
32-bit build part on most setups you could possibly want to build ReC98
on, without making the building experience worse for the other few setups.
As expected, we've now got the TH04 and TH05 stage enemy structure,
finishing position independence for all big entity types. This one was
quite straightfoward, as the .STD scripting system is pretty simple.
Its most interesting aspect can be found in the way timing is handled. In
Windows Touhou, all .ECL script instructions come with a frame field that
defines when they are executed. In TH04's and TH05's .STD scripts, on the
other hand, it's up to each individual instruction to add a frame time
parameter, anywhere in its parameter list. This frame time defines for how
long this instruction should be repeatedly executed, before it manually
advances the instruction pointer to the next one. From what I've seen so
far, these instruction typically apply their effect on the first frame
they run on, and then do nothing for the remaining frames.
Oh, and you can't nest the LOOP instruction, since the enemy
structure only stores one single counter for the current loop iteration.
Just from the structure, the only innovation introduced by TH05 seems to
have been enemy subtypes. These can be used to parametrize scripts via
conditional jumps based on this value, as a first attempt at cutting down
the need to duplicate entire scripts for similar enemy behavior. And
thanks to TH05's favorable segment layout, this game's version of the
.STD enemy script interpreter is even immediately ready for decompilation,
in one single future push.
As far as I can tell, that now only leaves
.MPN file loading
player bomb animations
some structures specific to the Shinki and EX-Alice battles
plus some smaller things I've missed over the years
until TH05's MAIN.EXE is completely position-independent.
Which, however, won't be all it needs for that 100% PI rating on the front
page. And with that many false positives, it's quite easy to get lost with
immediately reverse-engineering everything around them. This time, the
rendering of the text dissolve circles, used for the stage and BGM title
popups, caught my eye… and since the high-level code to handle all of
that was near the end of a segment in both TH04 and TH05, I just decided
to immediately decompile it all. Like, how hard could it possibly be?
Sure, it needed another segment split, which was a bit harder due
to all the existing ASM referencing code in that segment, but certainly
not impossible…
Oh wait, this code depends on 9 other sets of identifiers that haven't
been declared in C land before, some of which require vast reorganizations
to bring them up to current consistency standards. Whoops! Good thing that
this is the part of the project I'm still offering for free…
Among the referenced functions was tiles_invalidate_around(),
which marks the stage background tiles within a rectangular area to be
redrawn this frame. And this one must have had the hardest function
signature to figure out in all of PC-98 Touhou, because it actually
seems impossible. Looking at all the ways the game passes the center
coordinate to this function, we have
X and Y as 16-bit integer literals, merged into a single
PUSH of a 32-bit immediate
X and Y calculated and pushed independently from each other
by-value copies of entire Point instances
Any single declaration would only lead to at most two of the three cases
generating the original instructions. No way around separately declaring
the function in every translation unit then, with the correct parameter
list for the respective calls. That's how ZUN must have also written it.
Oh well, we would have needed to do all of this some time. At least
there were quite a bit of insights to be gained from the actual
decompilation, where using const references actually made it
possible to turn quite a number of potentially ugly macros into wholesome
inline functions.
But still, TH04 and TH05 will come out of ReC98's decompilation as one big
mess. A lot of further manual decompilation and refactoring, beyond the
limits of the original binary, would be needed to make these games
portable to any non-PC-98, non-x86 architecture.
And yes, that includes IBM-compatible DOS – which, for some reason, a
number of people see as the obvious choice for a first system to port
PC-98 Touhou to. This will barely be easier. Sure, you'll save the effort
of decompiling all the remaining original ASM. But even with
master.lib's MASTER_DOSV setting, these games still very much
rely on PC-98 hardware, with corresponding assumptions all over ZUN's
code. You will need to provide abstractions for the PC-98's
superimposed text mode, the gaiji, and planar 4-bit color access in
general, exchanging the use of the PC-98's GRCG and EGC blitter chips with
something else. At that point, you might as well port the game to one
generic 640×400 framebuffer and away from the constraints of DOS,
resulting in that Doom source code-like situation which made that
game easily portable to every architecture to begin with. But ZUN just
wasn't a John Carmack, sorry.
Or what do I know. I've never programmed for IBM-compatible DOS, but maybe
ReC98's audience does include someone who is intimately familiar
with IBM-compatible DOS so that the constraints aren't much of an issue
for them? But even then, 16-bit Windows would make much more sense
as a first porting target if you don't want to bother with that
undecompilable ASM.
At least I won't have to look at TH04 and TH05 for quite a while now.
The delivery delays have made it obvious that
my life has become pretty busy again, probably until September. With a
total of 9 TH01 pushes from monthly subscriptions now waiting in the
backlog, the shop will stay closed until I've caught up with most of
these. Which I'm quite hyped for!
Alright, the score popup numbers shown when collecting items or defeating
(mid)bosses. The second-to-last remaining big entity type in TH05… with
quite some PI false positives in the memory range occupied by its data.
Good thing I still got some outstanding generic RE pushes that haven't
been claimed for anything more specific in over a month! These
conveniently allowed me to RE most of these functions right away, the
right way.
Most of the false positives were boss HP values, passed to a "boss phase
end" function which sets the HP value at which the next phase should end.
Stage 6 Yuuka, Mugetsu, and EX-Alice have their own copies of this
function, in which they also reset certain boss-specific global variables.
Since I always like to cover all varieties of such duplicated functions at
once, it made sense to reverse-engineer all the involved variables while I
was at it… and that's why this was exactly the right time to cover the
implementation details of Stage 6 Yuuka's parasol and vanishing animations
in TH04.
With still a bit of time left in that RE push afterwards, I could also
start looking into some of the smaller functions that didn't quite fit
into other pushes. The most notable one there was a simple function that
aims from any point to the current player position. Which actually only
became a separate function in TH05, probably since it's called 27 times in
total. That's 27 places no longer being blocked from further RE progress.
WindowsTiger already
did most of the work for the score popup numbers in January, which meant
that I only had to review it and bring it up to ReC98's current coding
styles and standards. This one turned out to be one of those rare features
whose TH05 implementation is significantly less insane than the
TH04 one. Both games lazily redraw only the tiles of the stage background
that were drawn over in the previous frame, and try their best to minimize
the amount of tiles to be redrawn in this way. For these popup numbers,
this involves calculating the on-screen width, based on the exact number
of digits in the point value. TH04 calculates this width every frame
during the rendering function, and even resorts to setting that field
through the digit iteration pointer via self-modifying code… yup. TH05, on
the other hand, simply calculates the width once when spawning a new popup
number, during the conversion of the point value to
binary-coded
decimal. The "×2" multiplier suffix being removed in TH05 certainly
also helped in simplifying that feature in this game.
And that's ⅓ of TH05 reverse-engineered! Next up, one more TH05 PI push,
in which the stage enemies hopefully finish all the big entity types.
Maybe it will also be accompanied by another RE push? In any case, that
will be the last piece of TH05 progress for quite some time. The next TH01
stretch will consist of 6 pushes at the very least, and I currently have
no idea of how much time I can spend on ReC98 a month from now…
To finish this TH05 stretch, we've got a feature that's exclusive to TH05
for once! As the final memory management innovation in PC-98 Touhou, TH05
provides a single static (64 * 26)-byte array for storing up to 64
entities of a custom type, specific to a stage or boss portion.
(Edit (2023-05-29): This system actually debuted in
📝 TH04, where it was used for much simpler
entities.)
TH05 uses this array for
the Stage 2 star particles,
Alice's puppets,
the tip of curve ("jello") bullets,
Mai's snowballs and Yuki's fireballs,
Yumeko's swords,
and Shinki's 32×32 bullets,
which makes sense, given that only one of those will be active at any
given time.
On the surface, they all appear to share the same 26-byte structure, with
consistently sized fields, merely using its 5 generic fields for different
purposes. Looking closer though, there actually are differences in
the signedness of certain fields across the six types. uth05win chose to
declare them as entirely separate structures, and given all the semantic
differences (pixels vs. subpixels, regular vs. tiny master.lib sprites,
…), it made sense to do the same in ReC98. It quickly turned out to be the
only solution to meet my own standards of code readability.
Which blew this one up to two pushes once again… But now, modders can
trivially resize any of those structures without affecting the other types
within the original (64 * 26)-byte boundary, even without full position
independence. While you'd still have to reduce the type-specific
number of distinct entities if you made any structure larger, you
could also have more entities with fewer structure members.
As for the types themselves, they're full of redundancy once again – as
you might have already expected from seeing #4, #5, and #6 listed as
unrelated to each other. Those could have indeed been merged into a single
32×32 bullet type, supporting all the unique properties of #4
(destructible, with optional revenge bullets), #5 (optional number of
twirl animation frames before they begin to move) and #6 (delay clouds).
The *_add(), *_update(), and *_render()
functions of #5 and #6 could even already be completely
reverse-engineered from just applying the structure onto the ASM, with the
ones of #3 and #4 only needing one more RE push.
But perhaps the most interesting discovery here is in the curve bullets:
TH05 only renders every second one of the 17 nodes in a curve
bullet, yet hit-tests every single one of them. In practice, this is an
acceptable optimization though – you only start to notice jagged edges and
gaps between the fragments once their speed exceeds roughly 11 pixels per
second:
And that brings us to the last 20% of TH05 position independence! But
first, we'll have more cheap and fast TH01 progress.
Deathbombs confirmed, in both TH04 and TH05! On the surface, it's the same
8-frame window as in
most Windows games, but due to the slightly lower PC-98 frame rate of
56.4 Hz, it's actually slightly more lenient in TH04 and TH05.
The last function in front of the TH05 shot type control functions marks
the player's previous position in VRAM to be redrawn. But as it turns out,
"player" not only means "the player's option satellites on shot levels ≥
2", but also "the explosion animation if you lose a life", which required
reverse-engineering both things, ultimately leading to the confirmation of
deathbombs.
It actually was kind of surprising that we then had reverse-engineered
everything related to rendering all three things mentioned above,
and could also cover the player rendering function right now. Luckily,
TH05 didn't decide to also micro-optimize that function into
un-decompilability; in fact, it wasn't changed at all from TH04. Unlike
the one invalidation function whose decompilation would have
actually been the goal here…
But now, we've finally gotten to where we wanted to… and only got 2
outstanding decompilation pushes left. Time to get the website ready for
hosting an actual crowdfunding campaign, I'd say – It'll make a better
impression if people can still see things being delivered after the big
announcement.
Here we go, new C code! …eh, it will still take a bit to really get
decompilation going at the speeds I was hoping for. Especially with the
sheer amount of stuff that is set in the first few significant
functions we actually can decompile, which now all has to be
correctly declared in the C world. Turns out I spent the last 2 years
screwing up the case of exported functions, and even some of their names,
so that it didn't actually reflect their calling convention… yup. That's
just the stuff you tend to forget while it doesn't matter.
To make up for that, I decided to research whether we can make use of some
C++ features to improve code readability after all. Previously, it seemed
that TH01 was the only game that included any C++ code, whereas TH02 and
later seemed to be 100% C and ASM. However, during the development of the
soon to be released new build system, I noticed that even this old
compiler from the mid-90's, infamous for prioritizing compile speeds over
all but the most trivial optimizations, was capable of quite surprising
levels of automatic inlining with class methods…
…leading the research to culminate in the mindblow that is
9d121c7 – yes, we can use C++ class methods
and operator overloading to make the code more readable, while still
generating the same code than if we had just used C and preprocessor
macros.
Looks like there's now the potential for a few pull requests from outside
devs that apply C++ features to improve the legibility of previously
decompiled and terribly macro-ridden code. So, if anyone wants to help
without spending money…
So, let's continue with player shots! …eh, or maybe not directly, since they involve two other structure types in TH05, which we'd have to cover first. One of them is a different sort of sprite, and since I like me some context in my reverse-engineering, let's disable every other sprite type first to figure out what it is.
One of those other sprite types were the little sparks flying away from killed stage enemies, midbosses, and grazed bullets; easy enough to also RE right now. Turns out they use the same 8 hardcoded 8×8 sprites in TH02, TH04, and TH05. Except that it's actually 64 16×8 sprites, because ZUN wanted to pre-shift them for all 8 possible start pixels within a planar VRAM byte (rather than, like, just writing a few instructions to shift them programmatically), leading to them taking up 1,024 bytes rather than just 64.
Oh, and the thing I wanted to RE *actually* was the decay animation whenever a shot hits something. Not too complex either, especially since it's exclusive to TH05.
And since there was some time left and I actually have to pick some of the next RE places strategically to best prepare for the upcoming 17 decompilation pushes, here's two more function pointers for good measure.
Actually, I lied, and lasers ended up coming with everything that makes reverse-engineering ZUN code so difficult: weirdly reused variables, unexpected structures within structures, and those TH05-specific nasty, premature ASM micro-optimizations that will waste a lot of time during decompilation, since the majority of the code actually was C, except for where it wasn't.
What do you do if the TH06 text image feature for thcrap should have been done 3 days™ ago, but keeps getting more and more complex, and you have a ton of other pushes to deliver anyway? Get some distraction with some light ReC98 reverse-engineering work. This is where it becomes very obvious how much uth05win helps us with all the games, not just TH05.
5a5c347 is the most important one in there, this was the missing substructure that now makes every other sprite-like structure trivial to figure out.