Didn't quite get to cover background rendering for TH05's Stage 1-5
bosses in this one, as I had to reverse-engineer two more fundamental parts
involved in boss background rendering before.
First, we got the those blocky transitions from stage tiles to bomb and
boss backgrounds, loaded from BB*.BB and ST*.BB,
respectively. These files store 16 frames of animation, with every bit
corresponding to a 16×16 tile on the playfield. With 384×368 pixels to be
covered, that would require 69 bytes per frame. But since that's a very odd
number to work with in micro-optimized ASM, ZUN instead stores 512×512
pixels worth of bits, ending up with a frame size of 128 bytes, and a
per-frame waste of 59 bytes. At least it was
possible to decompile the core blitting function as __fastcall
for once.
But wait, TH05 comes with, and loads, a bomb .BB file for every character,
not just for the Reimu and Yuuka bomb transitions you see in-game… 🤔
Restoring those unused stage tile → bomb image transition
animations for Mima and Marisa isn't that trivial without having decompiled
their actual bomb animation functions before, so stay tuned!
Interestingly though, the code leaves out what would look like the most
obvious optimization: All stage tiles are unconditionally redrawn
each frame before they're erased again with the 16×16 blocks, no matter if
they weren't covered by such a block in the previous frame, or are
going to be covered by such a block in this frame. The same is true
for the static bomb and boss background images, where ZUN simply didn't
write a .CDG blitting function that takes the dirty tile array into
account. If VRAM writes on PC-98 really were as slow as the games'
README.TXT files claim them to be, shouldn't all the
optimization work have gone towards minimizing them?
Oh well, it's not like I have any idea what I'm talking about here. I'd
better stop talking about anything relating to VRAM performance on PC-98…
Second, it finally was time to solve the long-standing confusion about all
those callbacks that are supposed to render the playfield background. Given
the aforementioned static bomb background images, ZUN chose to make this
needlessly complicated. And so, we have two callback function
pointers: One during bomb animations, one outside of bomb
animations, and each boss update function is responsible for keeping the
former in sync with the latter.
Other than that, this was one of the smoothest pushes we've had in a while;
the hardest parts of boss background rendering all were part of
📝 the last push. Once you figured out that
ZUN does indeed dynamically change hardware color #0 based on the current
boss phase, the remaining one function for Shinki, and all of EX-Alice's
background rendering becomes very straightforward and understandable.
Meanwhile, -Tom- told me about his plans to publicly
release 📝 his TH05 scripting toolkit once
TH05's MAIN.EXE would hit around 50% RE! That pretty much
defines what the next bunch of generic TH05 pushes will go towards:
bullets, shared boss code, and one
full, concrete boss script to demonstrate how it's all combined. Next up,
therefore: TH04's bullet firing code…? Yes, TH04's. I want to see what I'm
doing before I tackle the undecompilable mess that is TH05's bullet firing
code, and you all probably want readable code for that feature as
well. Turns out it's also the perfect place for Blue Bolt's
pending contributions.
Y'know, I kinda prefer the pending crowdfunded workload to stay more near
the middle of the cap, rather than being sold out all the time. So to reach
this point more quickly, let's do the most relaxing thing that can be
easily done in TH05 right now: The boss backgrounds, starting with Shinki's,
📝 now that we've got the time to look at it in detail.
… Oh come on, more things that are borderline undecompilable, and
require new workarounds to be developed? Yup, Borland C++ always optimizes
any comparison of a register with a literal 0 to OR reg, reg,
no matter how many calculations and inlined function calls you replace the
0 with. Shinki's background particle rendering function contains a
CMP AX, 0 instruction though… so yeah,
📝 yet another piece of custom ASM that's worse
than what Turbo C++ 4.0J would have generated if ZUN had just written
readable C. This was probably motivated by ZUN insisting that his modified
master.lib function for blitting particles takes its X and Y parameters as
registers. If he had just used the __fastcall convention, he
also would have got the sprite ID passed as a register. 🤷
So, we really don't want to be forced into inline assembly just
because of the third comparison in the otherwise perfectly decompilable
four-comparison if() expression that prevents invisible
particles from being drawn. The workaround: Comparing to a pointer
instead, which only the linker gets to resolve to the actual value of 0.
This way, the compiler has to make room for
any 16-bit literal, and can't optimize anything.
And then we go straight from micro-optimization to
waste, with all the duplication in the code that
animates all those particles together with the zooming and spinning lines.
This push decompiled 1.31% of all code in TH05, and thanks to alignment,
we're still missing Shinki's high-level background rendering function that
calls all the subfunctions I decompiled here.
With all the manipulated state involved here, it's not at all trivial to
see how this code produces what you see in-game. Like:
If all lines have the same Y velocity, how do the other three lines in
background type B get pushed down into this vertical formation while the
top one stays still? (Answer: This velocity is only applied to the top
line, the other lines are only pushed based on some delta.)
How can this delta be calculated based on the distance of the top line
with its supposed target point around Shinki's wings? (Answer: The velocity
is never set to 0, so the top line overshoots this target point in every
frame. After calculating the delta, the top line itself is pushed down as
well, canceling out the movement. )
Why don't they get pushed down infinitely, but stop eventually?
(Answer: We only see four lines out of 20, at indices #0, #6, #12, and
#18. In each frame, lines [0..17] are copied to lines [1..18], before
anything gets moved. The invisible lines are pushed down based on the delta
as well, which defines a distance between the visible lines of (velocity *
array gap). And since the velocity is capped at -14 pixels per frame, this
also means a maximum distance of 84 pixels between the midpoints of each
line.)
And why are the lines moving back up when switching to background type
C, before moving down? (Answer: Because type C increases the
velocity rather than decreasing it. Therefore, it relies on the previous
velocity state from type B to show a gapless animation.)
So yeah, it's a nice-looking effect, just very hard to understand. 😵
With the amount of effort I'm putting into this project, I typically
gravitate towards more descriptive function names. Here, however,
uth05win's simple and seemingly tiny-brained "background type A/B/C/D" was
quite a smart choice. It clearly defines the sequence in which these
animations are intended to be shown, and as we've seen with point 4
from the list above, that does indeed matter.
Next up: At least EX-Alice's background animations, and probably also the
high-level parts of the background rendering for all the other TH05 bosses.
… and just as I explained 📝 in the last post
how decompilation is typically more sensible and efficient than ASM-level
reverse-engineering, we have this push demonstrating a counter-example.
The reason why the background particles and lines in the Shinki and
EX-Alice battles contributed so much to position dependence was simply
because they're accessed in a relatively large amount of functions, one
for each different animation. Too many to spend the remaining precious
crowdfunded time on reverse-engineering or even decompiling them all,
especially now that everyone anticipates 100% PI for TH05's
MAIN.EXE.
Therefore, I only decompiled the two functions of the line structure that
also demonstrate best how it works, which in turn also helped with RE.
Sadly, this revealed that we actually can't📝 overload operator =() to get
that nice assignment syntax for 12.4 fixed-point values, because one of
those new functions relies on Turbo C++'s built-in optimizations for
trivially copyable structures. Still, impressive that this abstraction
caused no other issues for almost one year.
As for the structures themselves… nope, nothing to criticize this time!
Sure, one good particle system would have been awesome, instead of having
separate structures for the Stage 2 "starfield" particles and the one used
in Shinki's battle, with hardcoded animations for both. But given the
game's short development time, that was quite an acceptable compromise,
I'd say.
And as for the lines, there just has to be a reason why the game
reserves 20 lines per set, but only renders lines #0, #6, #12, and #18.
We'll probably see once we get to look at those animation functions more
closely.
This was quite a 📝 TH03-style RE push,
which yielded way more PI% than RE%. But now that that's done, I can
finally not get distracted by all that stuff when looking at the
list of remaining memory references. Next up: The last few missing
structures in TH05's MAIN.EXE!
To finish this TH05 stretch, we've got a feature that's exclusive to TH05
for once! As the final memory management innovation in PC-98 Touhou, TH05
provides a single static (64 * 26)-byte array for storing up to 64
entities of a custom type, specific to a stage or boss portion.
(Edit (2023-05-29): This system actually debuted in
📝 TH04, where it was used for much simpler
entities.)
TH05 uses this array for
the Stage 2 star particles,
Alice's puppets,
the tip of curve ("jello") bullets,
Mai's snowballs and Yuki's fireballs,
Yumeko's swords,
and Shinki's 32×32 bullets,
which makes sense, given that only one of those will be active at any
given time.
On the surface, they all appear to share the same 26-byte structure, with
consistently sized fields, merely using its 5 generic fields for different
purposes. Looking closer though, there actually are differences in
the signedness of certain fields across the six types. uth05win chose to
declare them as entirely separate structures, and given all the semantic
differences (pixels vs. subpixels, regular vs. tiny master.lib sprites,
…), it made sense to do the same in ReC98. It quickly turned out to be the
only solution to meet my own standards of code readability.
Which blew this one up to two pushes once again… But now, modders can
trivially resize any of those structures without affecting the other types
within the original (64 * 26)-byte boundary, even without full position
independence. While you'd still have to reduce the type-specific
number of distinct entities if you made any structure larger, you
could also have more entities with fewer structure members.
As for the types themselves, they're full of redundancy once again – as
you might have already expected from seeing #4, #5, and #6 listed as
unrelated to each other. Those could have indeed been merged into a single
32×32 bullet type, supporting all the unique properties of #4
(destructible, with optional revenge bullets), #5 (optional number of
twirl animation frames before they begin to move) and #6 (delay clouds).
The *_add(), *_update(), and *_render()
functions of #5 and #6 could even already be completely
reverse-engineered from just applying the structure onto the ASM, with the
ones of #3 and #4 only needing one more RE push.
But perhaps the most interesting discovery here is in the curve bullets:
TH05 only renders every second one of the 17 nodes in a curve
bullet, yet hit-tests every single one of them. In practice, this is an
acceptable optimization though – you only start to notice jagged edges and
gaps between the fragments once their speed exceeds roughly 11 pixels per
second:
And that brings us to the last 20% of TH05 position independence! But
first, we'll have more cheap and fast TH01 progress.