Didn't quite get to cover background rendering for TH05's Stage 1-5
bosses in this one, as I had to reverse-engineer two more fundamental parts
involved in boss background rendering before.
First, we got the those blocky transitions from stage tiles to bomb and
boss backgrounds, loaded from BB*.BB and ST*.BB,
respectively. These files store 16 frames of animation, with every bit
corresponding to a 16×16 tile on the playfield. With 384×368 pixels to be
covered, that would require 69 bytes per frame. But since that's a very odd
number to work with in micro-optimized ASM, ZUN instead stores 512×512
pixels worth of bits, ending up with a frame size of 128 bytes, and a
per-frame waste of 59 bytes. At least it was
possible to decompile the core blitting function as __fastcall
for once.
But wait, TH05 comes with, and loads, a bomb .BB file for every character,
not just for the Reimu and Yuuka bomb transitions you see in-game… 🤔
Restoring those unused stage tile → bomb image transition
animations for Mima and Marisa isn't that trivial without having decompiled
their actual bomb animation functions before, so stay tuned!
Interestingly though, the code leaves out what would look like the most
obvious optimization: All stage tiles are unconditionally redrawn
each frame before they're erased again with the 16×16 blocks, no matter if
they weren't covered by such a block in the previous frame, or are
going to be covered by such a block in this frame. The same is true
for the static bomb and boss background images, where ZUN simply didn't
write a .CDG blitting function that takes the dirty tile array into
account. If VRAM writes on PC-98 really were as slow as the games'
README.TXT files claim them to be, shouldn't all the
optimization work have gone towards minimizing them?
Oh well, it's not like I have any idea what I'm talking about here. I'd
better stop talking about anything relating to VRAM performance on PC-98…
Second, it finally was time to solve the long-standing confusion about all
those callbacks that are supposed to render the playfield background. Given
the aforementioned static bomb background images, ZUN chose to make this
needlessly complicated. And so, we have two callback function
pointers: One during bomb animations, one outside of bomb
animations, and each boss update function is responsible for keeping the
former in sync with the latter.
Other than that, this was one of the smoothest pushes we've had in a while;
the hardest parts of boss background rendering all were part of
📝 the last push. Once you figured out that
ZUN does indeed dynamically change hardware color #0 based on the current
boss phase, the remaining one function for Shinki, and all of EX-Alice's
background rendering becomes very straightforward and understandable.
Meanwhile, -Tom- told me about his plans to publicly
release 📝 his TH05 scripting toolkit once
TH05's MAIN.EXE would hit around 50% RE! That pretty much
defines what the next bunch of generic TH05 pushes will go towards:
bullets, shared boss code, and one
full, concrete boss script to demonstrate how it's all combined. Next up,
therefore: TH04's bullet firing code…? Yes, TH04's. I want to see what I'm
doing before I tackle the undecompilable mess that is TH05's bullet firing
code, and you all probably want readable code for that feature as
well. Turns out it's also the perfect place for Blue Bolt's
pending contributions.
… and just as I explained 📝 in the last post
how decompilation is typically more sensible and efficient than ASM-level
reverse-engineering, we have this push demonstrating a counter-example.
The reason why the background particles and lines in the Shinki and
EX-Alice battles contributed so much to position dependence was simply
because they're accessed in a relatively large amount of functions, one
for each different animation. Too many to spend the remaining precious
crowdfunded time on reverse-engineering or even decompiling them all,
especially now that everyone anticipates 100% PI for TH05's
MAIN.EXE.
Therefore, I only decompiled the two functions of the line structure that
also demonstrate best how it works, which in turn also helped with RE.
Sadly, this revealed that we actually can't📝 overload operator =() to get
that nice assignment syntax for 12.4 fixed-point values, because one of
those new functions relies on Turbo C++'s built-in optimizations for
trivially copyable structures. Still, impressive that this abstraction
caused no other issues for almost one year.
As for the structures themselves… nope, nothing to criticize this time!
Sure, one good particle system would have been awesome, instead of having
separate structures for the Stage 2 "starfield" particles and the one used
in Shinki's battle, with hardcoded animations for both. But given the
game's short development time, that was quite an acceptable compromise,
I'd say.
And as for the lines, there just has to be a reason why the game
reserves 20 lines per set, but only renders lines #0, #6, #12, and #18.
We'll probably see once we get to look at those animation functions more
closely.
This was quite a 📝 TH03-style RE push,
which yielded way more PI% than RE%. But now that that's done, I can
finally not get distracted by all that stuff when looking at the
list of remaining memory references. Next up: The last few missing
structures in TH05's MAIN.EXE!
To finish this TH05 stretch, we've got a feature that's exclusive to TH05
for once! As the final memory management innovation in PC-98 Touhou, TH05
provides a single static (64 * 26)-byte array for storing up to 64
entities of a custom type, specific to a stage or boss portion.
(Edit (2023-05-29): This system actually debuted in
📝 TH04, where it was used for much simpler
entities.)
TH05 uses this array for
the Stage 2 star particles,
Alice's puppets,
the tip of curve ("jello") bullets,
Mai's snowballs and Yuki's fireballs,
Yumeko's swords,
and Shinki's 32×32 bullets,
which makes sense, given that only one of those will be active at any
given time.
On the surface, they all appear to share the same 26-byte structure, with
consistently sized fields, merely using its 5 generic fields for different
purposes. Looking closer though, there actually are differences in
the signedness of certain fields across the six types. uth05win chose to
declare them as entirely separate structures, and given all the semantic
differences (pixels vs. subpixels, regular vs. tiny master.lib sprites,
…), it made sense to do the same in ReC98. It quickly turned out to be the
only solution to meet my own standards of code readability.
Which blew this one up to two pushes once again… But now, modders can
trivially resize any of those structures without affecting the other types
within the original (64 * 26)-byte boundary, even without full position
independence. While you'd still have to reduce the type-specific
number of distinct entities if you made any structure larger, you
could also have more entities with fewer structure members.
As for the types themselves, they're full of redundancy once again – as
you might have already expected from seeing #4, #5, and #6 listed as
unrelated to each other. Those could have indeed been merged into a single
32×32 bullet type, supporting all the unique properties of #4
(destructible, with optional revenge bullets), #5 (optional number of
twirl animation frames before they begin to move) and #6 (delay clouds).
The *_add(), *_update(), and *_render()
functions of #5 and #6 could even already be completely
reverse-engineered from just applying the structure onto the ASM, with the
ones of #3 and #4 only needing one more RE push.
But perhaps the most interesting discovery here is in the curve bullets:
TH05 only renders every second one of the 17 nodes in a curve
bullet, yet hit-tests every single one of them. In practice, this is an
acceptable optimization though – you only start to notice jagged edges and
gaps between the fragments once their speed exceeds roughly 11 pixels per
second:
And that brings us to the last 20% of TH05 position independence! But
first, we'll have more cheap and fast TH01 progress.