And then, the supposed boilerplate code revealed yet another confusing issue
that quickly forced me back to serial work, leading to no parallel progress
made with Shuusou Gyoku after all. 🥲 The list of functions I put together
for the first ½ of this push seemed so boring at first, and I was so sure
that there was almost nothing I could possibly talk about:
TH02's gaiji animations at the start and end of each stage, resembling
opening and closing window blind slats. ZUN should have maybe not defined
the regular whitespace gaiji as what's technically the last frame of the
closing animation, but that's a minor nitpick. Nothing special there
otherwise.
The remaining spawn functions for TH04's and TH05's gather circles. The
only dumb antic there is the way ZUN initializes the template for bullets
fired at the end of the animation, featuring ASM instructions that are
equivalent to what Turbo C++ 4.0J generates for the __memcpy__
intrinsic, but show up in a different order. Which means that they must have
been handwritten. I already figured that out in 2022
though, so this was just more of the same.
EX-Alice's override for the game's main 16×16 sprite sheet, loaded
during her dialog script. More of a naming and consistency challenge, if
anything.
The rendering function for TH04's Stage 4 midboss, which seems to
feature the same premature clipping quirk we've seen for
📝 TH05's Stage 5 midboss, 7 months ago?
The rendering function for the big 48×48 explosion sprite, which also
features the same clipping quirk?
That's three instances of ZUN removing sprites way earlier than you'd want
to, intentionally deciding against those sprites flying smoothly in and out
of the playfield. Clearly, there has to be a system and a reason behind it.
Turns out that it can be almost completely blamed on master.lib. None of the
super_*() sprite blitting functions can clip the rendered
sprite to the edges of VRAM, and much less to the custom playfield rectangle
we would actually want here. This is exactly the wrong choice to make for a
game engine: Not only is the game developer now stuck with either rendering
the sprite in full or not at all, but they're also left with the burden of
manually calculating when not to display a sprite.
However, strictly limiting the top-left screen-space coordinate to
(0, 0) and the bottom-right one to (640, 400) would actually
stop rendering some of the sprites much earlier than the clipping conditions
we encounter in these games. So what's going on there?
The answer is a combination of playfield borders, hardware scrolling, and
master.lib needing to provide at least some help to support the
latter. Hardware scrolling on PC-98 works by dividing VRAM into two vertical
partitions along the Y-axis and telling the GDC to display one of them at
the top of the screen and the other one below. The contents of VRAM remain
unmodified throughout, which raises the interesting question of how to deal
with sprites that reach the vertical edges of VRAM. If the top VRAM row that
starts at offset 0x0000 ends up being displayed below
the bottom row of VRAM that starts at offset 0x7CB0 for 399 of
the 400 possible scrolling positions, wouldn't we then need to vertically
wrap most of the rendered sprites?
For this reason, master.lib provides the super_roll_*()
functions, which unconditionally perform exactly this vertical wrapping. But
this creates a new problem: If these functions still can't clip, and don't
even know which VRAM rows currently correspond to the top and bottom row of
the screen (since master.lib's graph_scrollup() function
doesn't retain this information), won't we also see sprites wrapping around
the actual edges of the screen? That's something we certainly
wouldn't want in a vertically scrolling game…
The answer is yes, and master.lib offers no solution for this issue. But
this is where the playfield borders come in, and helpfully cover 16 pixels
at the top and 16 pixels at the bottom of the screen. As a result, they can
hide up to 32 rows of potentially wrapped sprite pixels below them:
And that's how the lowest possible top Y coordinate for sprites blitted
using the master.lib super_roll_*() functions during the
scrolling portions of TH02, TH04, and TH05 is not 0, but -16. Any lower, and
you would actually see some of the sprite's upper pixels at the
bottom of the playfield, as there are no more opaque black text cells to
cover them. Theoretically, you could lower this number for
some animation frames that start with multiple rows of transparent
pixels, but I thankfully haven't found any instance of ZUN using such a
hack. So far, at least…
Visualized like that, it all looks quite simple and logical, but for days, I
did not realize that these sprites were rendered to a scrolling VRAM.
This led to a much more complicated initial explanation involving the
invisible extra space of VRAM between offsets 0x7D00 and
0x7FFF that effectively grant a hidden additional 9.6 lines
below the playfield. Or even above, since PC-98 hardware ignores the highest
bit of any offset into a VRAM bitplane segment
(& 0x7FFF), which prevents blitting operations from
accidentally reaching into a different bitplane. Together with the
aforementioned rows of transparent pixels at the top of these midboss
sprites, the math would have almost worked out exactly.
The need for manual clipping also applies to the X-axis. Due to the lack of
scrolling in this dimension, the boundaries there are much more
straightforward though. The minimum left coordinate of a sprite can't fall
below 0 because any smaller coordinate would wrap around into the
📝 tile source area and overwrite some of the
pixels there, which we obviously don't want to re-blit every frame.
Similarly, the right coordinate must not extend into the HUD, which starts
at 448 pixels.
The last part might be surprising if you aren't familiar with the PC-98 text
chip. Contrary to the CGA and VGA text modes of IBM-compatibles, PC-98 text
cells can only use a single color for either their foreground or
background, with the other pixels being transparent and always revealing the
pixels in VRAM below. If you look closely at the HUD in the images above,
you can see how the background of cells with gaiji glyphs is slightly
brighter (◼ #100) than the opaque black
cells (◼ #000) surrounding them. This
rather custom color clearly implies that those pixels must have been
rendered by the graphics GDC. If any other sprite was rendered below the
HUD, you would equally see it below the glyphs.
So in the end, I did find the clear and logical system I was looking for,
and managed to reduce the new clipping conditions down to a
set of basic rules for each edge. Unfortunately, we also need a second
macro for each edge to differentiate between sprites that are smaller or
larger than the playfield border, which is treated as either 32×32 (for
super_roll_*()) or 32×16 (for non-"rolling"
super_*() functions). Since smaller sprites can be fully
contained within this border, the games can stop rendering them as soon as
their bottom-right coordinate is no longer seen within the playfield, by
comparing against the clipping boundaries with <= and
>=. For example, a 16×16 sprite would be completely
invisible once it reaches (16, 0), so it would still be rendered at
(17, 1). A larger sprite during the scrolling part of a stage, like,
say, the 64×64 midbosses, would still be rendered if their top-left
coordinate was (0, -16), so ZUN used < and
> comparisons to at least get an additional pixel before
having to stop rendering such a sprite. Turbo C++ 4.0J sadly can't
constant-fold away such a difference in comparison operators.
And for the most part, ZUN did follow this system consistently. Except for,
of course, the typical mistakes you make when faced with such manual
decisions, like how he treated TH04's Stage 4 midboss as a "small" sprite
below 32×32 pixels (it's 64×64), losing that precious one extra pixel. Or
how the entire rendering code for the 48×48 boss explosion sprite pretends
that it's actually 64×64 pixels large, which causes even the initial
transformation into screen space to be misaligned from the get-go.
But these are additional bugs on top of the single
one that led to all this research.
Because that's what this is, a bug. 🐞 Every resulting pixel boundary is a
systematic result of master.lib's unfortunate lack of clipping. It's as much
of a bug as TH01's byte-aligned rendering of entities whose internal
position is not byte-aligned. In both cases, the entities are alive,
simulated, and partake in collision detection, but their rendered appearance
doesn't accurately reflect their internal position.
Initially, I classified
📝 the sudden pop-in of TH05's Stage 5 midboss
as a quirk because we had no conclusive evidence that this wasn't
intentional, but now we do. There have been multiple explanations for why
ZUN put borders around the playfield, but master.lib's lack of sprite
clipping might be the biggest reason.
And just like byte-aligned rendering, the clipping conditions can easily be
removed when porting the game away from PC-98 hardware. That's also what
uth05win chose to do: By using OpenGL and not having to rely on hardware
scrolling, it can simply place every sprite as a textured quad at its exact
position in screen space, and then draw the black playfield borders on top
in the end to clip everything in a single draw call. This way, the Stage 5
midboss can smoothly fly into the playfield, just as defined by its movement
code:
Meanwhile, I designed the interface of the 📝 generic blitter used in the TH01 Anniversary Edition entirely around
clipping the blitted sprite at any explicit combination of VRAM edges. This
was nothing I tacked on in the end, but a core aspect that informed the
architecture of the code from the very beginning. You really want to
have one and only one place where sprite clipping is done right – and
only once per sprite, regardless of how many bitplanes you want to write to.
Which brings us to the goal that the final ¼ of this push went toward. I
thought I was going to start cleaning up the
📝 player movement and rendering code, but
that turned out too complicated for that amount of time – especially if you
want to start with just cleanup, preserving all original bugs for the
time being.
Fixing and smoothening player and Orb movement would be the next big task in
Anniversary Edition development, needing about 3 pushes. It would start with
more performance research into runtime-shifting of larger sprites, followed
by extending my generic blitter according to the results, writing new
optimized loaders for the original image formats, and finally rewriting all
rendering code accordingly. With that code in place, we can then start
cleaning up and fixing the unique code for each boss, one by one.
Until that's funded, the code still contains a few smaller and easier pieces
of code that are equally related to rendering bugs, but could be dealt with
in a more incremental way. Line rendering is one of those, and first needs
some refactoring of every call site, including
📝 the rotating squares around Mima and
📝 YuugenMagan's pentagram. So far, I managed
to remove another 1,360 bytes from the binary within this final ¼ of a push,
but there's still quite a bit to do in that regard.
This is the perfect kind of feature for smaller (micro-)transactions. Which
means that we've now got meaningful TH01 code cleanup and Anniversary
Edition subtasks at every price range, no matter whether you want to invest
a lot or just a little into this goal.
If you can, because Ember2528 revealed the plan behind
his Shuusou Gyoku contributions: A full-on Linux port of the game, which
will be receiving all the funding it needs to happen. 🐧 Next up, therefore:
Turning this into my main project within ReC98 for the next couple of
months, and getting started by shipping the long-awaited first step towards
that goal.
I've raised the cap to avoid the potential of rounding errors, which might
prevent the last needed Shuusou Gyoku push from being correctly funded. I
already had to pick the larger one of the two pending TH02 transactions for
this push, because we would have mathematically ended up
1/25500 short of a full push with the smaller
transaction. And if I'm already at it, I might
as well free up enough capacity to potentially ship the complete OpenGL
backend in a single delivery, which is currently estimated to cost 7 pushes
in total.
Two years after
📝 the first look at TH04's and TH05's bullets,
we finally get to finish their logic code by looking at the special motion
types. Bullets as a whole still aren't completely finished as the
rendering code is still waiting to be RE'd, but now we've got everything
about them that's required for decompiling the midboss and boss fights of
these games.
Just like the motion types of TH01's pellets, the ones we've got here really
are special enough to warrant an enum, despite all the
overlap in the "slow down and turn" and "bounce at certain edges of the
playfield" types. Sure, including them in the bitfield I proposed two years
ago would have allowed greater variety, but it wouldn't have saved any
memory. On the contrary: These types use a single global state variable for
the maximum turn count and delta speed, which a proper customizable
architecture would have to integrate into the bullet structure. Maybe it is
possible to stuff everything into the same amount of bytes, but not without
first completely rearchitecting the bullet structure and removing every
single piece of redundancy in there. Simply extending the system by adding a
new enum value for a new motion type would be way more
straightforward for modders.
Speaking about memory, TH05 already extends the bullet structure by 6 bytes
for the "exact linear movement" type exclusive to that game. This type is
particularly interesting for all the prospective PC-98 game developers out
there, as it nicely points out the precision limits of Q12.4 subpixels.
Regular bullet movement works by adding a Q12.4 velocity to a Q12.4 position
every frame, with the velocity typically being calculated only once on spawn
time from an 8-bit angle and a Q12.4 speed. Quantization errors from this
initial calculation can quickly compound over all the frames a bullet spends
moving across the playfield. If a bullet is only supposed to move on a
straight line though, there is a more precise way of calculating its
position: By storing the origin point, movement angle, and total distance
traveled, you can perform a full polar→Cartesian transformation every frame.
Out of the 10 danmaku patterns in TH05 that use this motion type, the
difference to regular bullet movement can be best seen in Louise's final
pattern:
Not far away from the regular bullet code, we've also got the movement
function for the infamous curve / "cheeto" bullets. I would have almost
called them "cheetos" in the code as well, which surely fits more nicely
into 8.3 filenames than "curve bullets" does, but eh, trademarks…
As for hitboxes, we got a 16×16 one on the head node, and a 12×12 one on the
16 trail nodes. The latter simply store the position of the head node during
the last 16 frames, Snake style. But what you're all here for is probably
the turning and homing algorithm, right? Boiled down to its essence, it
works like this:
// [head] points to the controlled "head" part of a curve bullet entity.
// Angles are stored with 8 bits representing a full circle, providing free
// normalization on arithmetic overflow.
// The directions are ordered as you would expect:
// • 0x00: right (sin(0x00) = 0, cos(0x00) = +1)
// • 0x40: down (sin(0x40) = +1, cos(0x40) = 0)
// • 0x80: left (sin(0x80) = 0, cos(0x80) = -1)
// • 0xC0: up (sin(0xC0) = -1, cos(0xC0) = 0)
uint8_t angle_delta = (head->angle - player_angle_from(
head->pos.cur.x, head->pos.cur.y
));
// Stop turning if the player is 1/128ths of a circle away from this bullet
const uint8_t SNAP = 0x02;
// Else, turn either clockwise or counterclockwise by 1/256th of a circle,
// depending on what would reach the player the fastest.
if((angle_delta > SNAP) && (angle_delta < static_cast<uint8_t>(-SNAP))) {
angle_delta = (angle_delta >= 0x80) ? -0x01 : +0x01;
}
head_p->angle -= angle_delta;
5 lines of code, and not all too difficult to follow once you are familiar
with 8-bit angles… unlike what ZUN actually wrote. Which is 26 lines,
and includes an unused "friction" variable that is never set to any value
that makes a difference in the formula. uth05win
correctly saw through that all and simplified this code to something
equivalent to my explanation. Redoing that work certainly wasted a bit of my
time, and means that I now definitely need to spend another push on RE'ing
all the shared boss functions before I can start with Shinki.
So while a curve bullet's speed does get faster over time, its
angular velocity is always limited to 1/256th of a
circle per frame. This reveals the optimal strategy for dodging them:
Maximize this delta angle by staying as close to 180° away from their
current direction as possible, and let their acceleration do the rest.
At least that's the theory for dodging a single one. As a danmaku
designer, you can now of course place other bullets at these technically
optimal places to prevent a curve bullet pattern from being cheesed like
that. I certainly didn't record the video above in a single take either…
After another bunch of boring entity spawn and update functions, the
playfield shaking feature turned out as the most notable (and tricky) one to
round out these two pushes. It's actually implemented quite well in how it
simply "un-shakes" the screen by just marking every stage tile to be
redrawn. In the context of all the other tile invalidation that can take
place during a frame, that's definitely more performant than
📝 doing another EGC-accelerated memmove().
Due to these two games being double-buffered via page flipping, this
invalidation only really needs to happen for the frame after the next
one though. The immediately next frame will show the regular, un-shaken
playfield on the other VRAM page first, except during the multi-frame
shake animation when defeating a midboss, where it will also appear shifted
in a different direction… 😵 Yeah, no wonder why ZUN just always invalidates
all stage tiles for the next two frames after every shaking animation, which
is guaranteed to handle both sporadic single-frame shakes and continuous
ones. So close to good-code here.
Finally, this delivery was delayed a bit because -Tom-
requested his round-up amount to be limited to the cap in the future. Since
that makes it kind of hard to explain on a static page how much money he
will exactly provide, I now properly modeled these discounts in the website
code. The exact round-up amount is now included in both the pre-purchase
breakdown, as well as the cap bar on the main page.
With that in place, the system is now also set up for round-up offers from
other patrons. If you'd also like to support certain goals in this way, with
any amount of money, now's the time for getting in touch with me about that.
Known contributors only, though! 😛
Next up: The final bunch of shared boring boss functions. Which certainly
will give me a break from all the maintenance and research work, and speed
up delivery progress again… right?
Long time no see! And this is exactly why I've been procrastinating
bullets while there was still meaningful progress to be had in other parts
of TH04 and TH05: There was bound to be quite some complexity in this most
central piece of game logic, and so I couldn't possibly get to a
satisfying understanding in just one push.
Or in two, because their rendering involves another bunch of
micro-optimized functions adapted from master.lib.
Or in three, because we'd like to actually name all the bullet sprites,
since there are a number of sprite ID-related conditional branches. And
so, I was refining things I supposedly RE'd in the the commits from the
first push until the very end of the fourth.
When we talk about "bullets" in TH04 and TH05, we mean just two things:
the white 8×8 pellets, with a cap of 240 in TH04 and 180 in TH05, and any
16×16 sprites from MIKO16.BFT, with a cap of 200 in TH04 and
220 in TH05. These are by far the most common types of… err, "things the
player can collide with", and so ZUN provides a whole bunch of pre-made
motion, animation, and
n-way spread / ring / stack group options for those, which can be
selected by simply setting a few fields in the bullet template. All the
other "non-bullets" have to be fired and controlled individually.
Which is nothing new, since uth05win covered this part pretty accurately –
I don't think anyone could just make up these structure member
overloads. The interesting insights here all come from applying this
research to TH04, and figuring out its differences compared to TH05. The
most notable one there is in the default groups: TH05 allows you to add
a stack
to any single bullet, n-way spread or ring, but TH04 only lets you create
stacks separately from n-way spreads and rings, and thus gets by with
fewer fields in its bullet template structure. On the other hand, TH04 has
a separate "n-way spread with random angles, yet still aimed at the
player" group? Which seems to be unused, at least as far as
midbosses and bosses are concerned; can't say anything about stage enemies
yet.
In fact, TH05's larger bullet template structure illustrates that these
distinct group types actually are a rather redundant piece of
over-engineering. You can perfectly indicate any permutation of the basic
groups through just the stack bullet count (1 = no stack), spread bullet
count (1 = no spread), and spread delta angle (0 = ring instead of
spread). Add a 4-flag bitfield to cover the rest (aim to player, randomize
angle, randomize speed, force single bullet regardless of difficulty or
rank), and the result would be less redundant and even slightly
more capable.
Even those 4 pushes didn't quite finish all of the bullet-related types,
stopping just shy of the most trivial and consistent enum that defines
special movement. This also left us in a
📝 TH03-like situation, in which we're still
a bit away from actually converting all this research into actual RE%. Oh
well, at least this got us way past 50% in overall position independence.
On to the second half! 🎉
For the next push though, we'll first have a quick detour to the remaining
C code of all the ZUN.COM binaries. Now that the
📝 TH04 and TH05 resident structures no
longer block those, -Tom- has requested TH05's
RES_KSO.COM to be covered in one of his outstanding pushes.
And since 32th System
recently RE'd TH03's resident structure, it makes sense to also review and
merge that, before decompiling all three remaining RES_*.COM
binaries in hopefully a single push. It might even get done faster than
that, in which case I'll then review and merge some more of
WindowsTiger's
research.
Turns out I had only been about half done with the drawing routines. The rest was all related to redrawing the scrolling stage backgrounds after other sprites were drawn on top. Since the PC-98 does have hardware-accelerated scrolling, but no hardware-accelerated sprites, everything that draws animated sprites into a scrolling VRAM must then also make sure that the background tiles covered by the sprite are redrawn in the next frame, which required a bit of ZUN code. And that are the functions that have been in the way of the expected rapid reverse-engineering progress that uth05win was supposed to bring. So, looks like everything's going to go really fast now?