- 📝 Posted:
- 🚚 Summary of:
- ⌨ Commits:
- 💰 Funded by:
- Blue Bolt, Ember2528, [Anonymous], Yanga
- 🏷 Tags:
- rec98 th02 th04 th05 midboss blitting animation master.lib pc98 uth05win debloating
And then, the supposed boilerplate code revealed yet another confusing issue that quickly forced me back to serial work, leading to no parallel progress made with Shuusou Gyoku after all. 🥲 The list of functions I put together for the first ½ of this push seemed so boring at first, and I was so sure that there was almost nothing I could possibly talk about:
- TH02's gaiji animations at the start and end of each stage, resembling opening and closing window blind slats. ZUN maybe should not have defined the regular whitespace gaiji as what's technically the last frame of the closing animation, but that's a minor nitpick. Nothing special there otherwise.
- The remaining spawn functions for TH04's and TH05's gather circles. The
only dumb antic there is the way ZUN initializes the template for bullets
fired at the end of the animation, featuring ASM instructions that are
equivalent to what Turbo C++ 4.0J generates for the
__memcpy__ intrinsic, but show up in a different order. Which means that they must have been handwritten. I already figured that out in 2022 though, so this was just more of the same.
- EX-Alice's override for the game's main 16×16 sprite sheet, loaded during her dialog script. More of a naming and consistency challenge, if anything.
- The rendering function for TH04's Stage 4 midboss, which seems to feature the same premature clipping quirk we've seen for 📝 TH05's Stage 5 midboss, 7 months ago?
- The rendering function for the big 48×48 explosion sprite, which also features the same clipping quirk?
That's three instances of ZUN removing sprites way earlier than you'd want to, intentionally deciding against those sprites flying smoothly in and out of the playfield. Clearly, there has to be a system and a reason behind it.
Turns out that it can be almost completely blamed on master.lib. None of the
super_*() sprite blitting functions can clip the rendered
sprite to the edges of VRAM, let alone to the custom playfield rectangle
we would actually want here. This is exactly the wrong choice to make for a
game engine: Not only is the game developer now stuck with either rendering
the sprite in full or not at all, but they're also left with the burden of
manually calculating when not to display a sprite.
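In its most naive form, this manual decision boils down to an all-or-nothing test against the edges of VRAM. Here is a minimal C sketch of that idea, assuming a 640×400 VRAM; the function name is hypothetical and not taken from master.lib or ReC98:

```c
#include <assert.h>
#include <stdbool.h>

/* Dimensions of the PC-98's 640×400 graphics VRAM. */
enum { VRAM_W = 640, VRAM_H = 400 };

/* Hypothetical helper: since the blitter itself cannot clip, the caller can
 * only choose between drawing the sprite in full or not at all. This naive
 * version only allows sprites that lie completely inside VRAM. */
static bool sprite_fits_in_vram(int left, int top, int w, int h)
{
    assert((w > 0) && (h > 0));
    return (left >= 0) && (top >= 0) &&
           ((left + w) <= VRAM_W) && ((top + h) <= VRAM_H);
}
```

With this test, a 64×64 sprite at (-1, 0) would already be culled entirely, even though 63 of its 64 columns would still be on screen.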
However, strictly limiting the top-left screen-space coordinate to (0, 0) and the bottom-right one to (640, 400) would actually stop rendering some of the sprites much earlier than the clipping conditions we encounter in these games. So what's going on there?
The answer is a combination of playfield borders, hardware scrolling, and
master.lib needing to provide at least some help to support the
latter. Hardware scrolling on PC-98 works by dividing VRAM into two vertical
partitions along the Y-axis and telling the GDC to display one of them at
the top of the screen and the other one below. The contents of VRAM remain
unmodified throughout, which raises the interesting question of how to deal
with sprites that reach the vertical edges of VRAM. If the top VRAM row that
starts at offset
0x0000 ends up being displayed below
the bottom row of VRAM that starts at offset
0x7CB0 for 399 of
the 400 possible scrolling positions, wouldn't we then need to vertically
wrap most of the rendered sprites?
For this reason, master.lib provides the super_roll_*()
functions, which unconditionally perform exactly this vertical wrapping. But
this creates a new problem: If these functions still can't clip, and don't
even know which VRAM rows currently correspond to the top and bottom row of
the screen (since master.lib's scrolling function
doesn't retain this information), won't we also see sprites wrapping around
the actual edges of the screen? That's something we certainly
wouldn't want in a vertically scrolling game…
The answer is yes, and master.lib offers no solution for this issue. But this is where the playfield borders come in, and helpfully cover 16 pixels at the top and 16 pixels at the bottom of the screen. As a result, they can at least hide 32 rows of potentially wrapped sprite pixels below them.
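The arithmetic behind this is small enough to sketch. At 1 bit per pixel and bitplane, every 640-pixel row takes up 80 bytes, which is why row 399 starts at 0x7CB0. The sketch below assumes a simplified model of the scroll split (with the split at VRAM row s, screen row y shows VRAM row (s + y) mod 400) and shows the kind of modulo-400 wrapping a "rolling" blitter has to perform, without claiming to match master.lib's actual implementation. All function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* 640 pixels at 1 bit per pixel and bitplane = 80 bytes per row. */
enum { ROW_SIZE = 640 / 8, VRAM_ROWS = 400 };

/* Byte offset of pixel (x, y) within a single VRAM bitplane. */
static uint16_t vram_offset(int x, int y)
{
    assert((x >= 0) && (x < 640) && (y >= 0) && (y < VRAM_ROWS));
    return (uint16_t)((y * ROW_SIZE) + (x / 8));
}

/* Simplified scrolling model (an assumption): with the split at VRAM row
 * [split], screen row [screen_y] displays VRAM row (split + screen_y) mod 400. */
static int vram_row_at_screen_row(int split, int screen_y)
{
    assert((split >= 0) && (split < VRAM_ROWS));
    assert((screen_y >= 0) && (screen_y < VRAM_ROWS));
    return (split + screen_y) % VRAM_ROWS;
}

/* Wrapping as a "rolling" blitter needs it: each sprite row is written to
 * its target row mod 400. Only VRAM rows are involved here; the blitter has
 * no idea which of them are currently at the edges of the screen. */
static int rolled_vram_row(int sprite_top, int sprite_row)
{
    int y;

    assert(sprite_row >= 0);
    y = (sprite_top + sprite_row) % VRAM_ROWS;
    return (y < 0) ? (y + VRAM_ROWS) : y;
}
```

With the split at row 1, screen row 398 shows VRAM row 399 and screen row 399 shows VRAM row 0; only a split at row 0 keeps the two apart. And since the screen always displays all 400 VRAM rows, the same modulo applies in screen space: a sprite whose top edge sits 16 rows above the top of the screen has its first 16 rows appear in screen rows 384 to 399, right where the bottom border covers them.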
And that's how the lowest possible top Y coordinate for sprites blitted
using the master.lib
super_roll_*() functions during the
scrolling portions of TH02, TH04, and TH05 is not 0, but -16. Any lower, and
you would actually see some of the sprite's upper pixels at the
bottom of the playfield, as there are no more opaque black text cells to
cover them. Theoretically, you could lower this number for
some animation frames that start with multiple rows of transparent
pixels, but I thankfully haven't found any instance of ZUN using such a
hack. So far, at least…
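Expressed as a condition in screen space, and assuming 16-pixel borders at the top and bottom of the 400-row screen, a rolling sprite's wrapped rows stay hidden as long as it sticks out by at most 16 rows past either vertical edge. A minimal sketch with a hypothetical function name, which deliberately ignores the transparent-row trick mentioned above:

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed layout, taken from the text: the borders hide 16 rows at the top
 * and 16 rows at the bottom of the 400-row screen. */
enum { SCREEN_H = 400, BORDER_TOP_H = 16, BORDER_BOTTOM_H = 16 };

/* Hypothetical check: can a "rolling" sprite with the given screen-space
 * top coordinate and height be blitted without any of its vertically
 * wrapped rows becoming visible? Rows above the screen reappear at the
 * bottom and must stay under the bottom border; rows below the screen
 * reappear at the top and must stay under the top border. */
static bool roll_wrap_is_hidden(int top, int h)
{
    assert(h > 0);
    return (top >= -BORDER_BOTTOM_H) &&
           ((top + h) <= (SCREEN_H + BORDER_TOP_H));
}
```

Under this model, a 64×64 sprite can be placed anywhere from -16 to 352 on the Y-axis.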
Visualized like that, it all looks quite simple and logical, but for days, I did not realize that these sprites were rendered to a scrolling VRAM. This led to a much more complicated initial explanation involving the invisible extra space of VRAM between offsets
0x7D00 and 0x7FFF that effectively grants a hidden additional 9.6 lines
below the playfield. Or even above, since PC-98 hardware ignores the highest
bit of any offset into a VRAM bitplane segment (effectively using
offset & 0x7FFF), which prevents blitting operations from
accidentally reaching into a different bitplane. Together with the
aforementioned rows of transparent pixels at the top of these midboss
sprites, the math would have worked out almost exactly.
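Both numbers in this abandoned explanation are quick to verify. The sketch below computes the size of the hidden area behind the visible rows of a bitplane and models the hardware ignoring the highest offset bit. Again, the function names are only illustrative:

```c
#include <assert.h>
#include <stdint.h>

enum { ROW_SIZE = 80 }; /* bytes per 640-pixel row and bitplane */

/* Number of (partial) rows that still fit into the 32 KiB offset space of a
 * bitplane after the given number of visible rows. */
static double hidden_rows(int visible_rows)
{
    assert((visible_rows > 0) && ((visible_rows * ROW_SIZE) <= 0x8000));
    return (double)(0x8000 - (visible_rows * ROW_SIZE)) / ROW_SIZE;
}

/* Models the hardware ignoring the highest bit of a bitplane offset: an
 * offset that underflows below 0 ends up near the end of the 32 KiB range
 * instead of reaching into another bitplane. */
static uint16_t effective_vram_offset(uint16_t offset)
{
    return (uint16_t)(offset & 0x7FFF);
}
```

For 400 visible rows, hidden_rows() yields 768 / 80 = 9.6 rows, and an offset one row above 0 (0x0000 - 80, as a 16-bit value) ends up at 0x7FB0, inside that hidden area.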
The need for manual clipping also applies to the X-axis. Due to the lack of
scrolling in this dimension, the boundaries there are much more
straightforward though. The minimum left coordinate of a sprite can't fall
below 0 because any smaller coordinate would wrap around into the
📝 tile source area and overwrite some of the
pixels there, which we obviously don't want to re-blit every frame.
Similarly, the right coordinate must not extend into the HUD, which starts
at 448 pixels.
The last part might be surprising if you aren't familiar with the PC-98 text chip. Unlike the CGA and VGA text modes of IBM-compatibles, PC-98 text cells can only use a single color for either their foreground or background, with the other pixels being transparent and always revealing the pixels in VRAM below. If you look closely at the HUD in the images above, you can see how the background of cells with gaiji glyphs is slightly brighter (◼ #100) than the opaque black (◼ #000) surrounding them. This rather custom color clearly implies that those pixels must have been rendered by the graphics GDC. If any other sprite were rendered below the HUD, you would equally see it behind the glyphs.
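The horizontal limits from the last two paragraphs follow from the same linear addressing, assuming a simple row * 80 + x / 8 offset calculation without any clipping. This sketch (hypothetical names, byte-aligned X coordinates only so that the division stays exact) shows where a negative X coordinate ends up, together with the resulting horizontal condition:

```c
#include <assert.h>
#include <stdbool.h>

enum { ROW_SIZE = 80, HUD_LEFT = 448 };

/* Offset of the leftmost byte of a sprite, computed the way a non-clipping
 * blitter would. A negative X does not stop at the left edge of VRAM, but
 * simply lands at the end of the previous row. */
static int sprite_start_offset(int left, int top)
{
    assert((left % 8) == 0); /* keep the example byte-aligned */
    return (top * ROW_SIZE) + (left / 8);
}

/* Hypothetical horizontal condition: the sprite must neither start left of
 * VRAM column 0 nor extend into the HUD. */
static bool sprite_x_is_safe(int left, int w)
{
    return (left >= 0) && ((left + w) <= HUD_LEFT);
}
```

A sprite starting at (-8, 10) thus begins at offset 799, which is the last byte of row 9, i.e. the pixels from X = 632 to 639 of the row above.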
So in the end, I did find the clear and logical system I was looking for,
and managed to reduce the new clipping conditions down to one basic macro
for each edge. Unfortunately, we also need a second
macro for each edge to differentiate between sprites that are smaller or
larger than the playfield border, which is treated as either 32×32 (for
super_roll_*()) or 32×16 (for non-"rolling"
super_*() functions). Since smaller sprites can be fully
contained within this border, the games can stop rendering them as soon as
their bottom-right coordinate is no longer seen within the playfield, by
comparing against the clipping boundaries with
>=. For example, a 16×16 sprite would be completely
invisible once its top-left coordinate reaches (16, 0), so it would still be
rendered at (17, 1). A larger sprite during the scrolling part of a stage,
like, say, one of the 64×64 midbosses, would still be rendered if its
top-left coordinate was (0, -16), so ZUN used
> comparisons to at least get an additional pixel before
having to stop rendering such a sprite. Turbo C++ 4.0J sadly can't
constant-fold away such a difference in comparison operators.
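The two families of conditions can be sketched as follows, for the left and top edges only; the right and bottom edges work analogously. All names, as well as the screen-space playfield origin of (32, 16) derived from the border sizes above, are assumptions for illustration and not ReC98's actual macros. Both families are written as "still visible" conditions rather than reproducing ZUN's exact >= and > comparisons:

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed screen-space playfield origin, plus the lowest top coordinate
 * for "rolling" sprites during scrolling. */
enum { PLAYFIELD_LEFT = 32, PLAYFIELD_TOP = 16, ROLL_TOP_MIN = -16 };

/* Hypothetical "small sprite" conditions: the sprite fits inside the border,
 * so it stays visible as long as its exclusive right/bottom edge still lies
 * past the playfield's left/top edge. */
#define SMALL_VISIBLE_LEFT(left, w) (((left) + (w)) > PLAYFIELD_LEFT)
#define SMALL_VISIBLE_TOP(top, h) (((top) + (h)) > PLAYFIELD_TOP)

/* Hypothetical "large sprite" conditions: such a sprite can't hide inside
 * the border, so it is kept until it would cross the limits the blitter can
 * handle, i.e. the left edge of VRAM and row -16 during scrolling. */
#define LARGE_VISIBLE_LEFT(left) ((left) >= 0)
#define LARGE_VISIBLE_TOP(top) ((top) >= ROLL_TOP_MIN)

static bool small_sprite_visible(int left, int top, int w, int h)
{
    assert((w > 0) && (h > 0));
    return SMALL_VISIBLE_LEFT(left, w) && SMALL_VISIBLE_TOP(top, h);
}

static bool large_sprite_visible(int left, int top)
{
    return LARGE_VISIBLE_LEFT(left) && LARGE_VISIBLE_TOP(top);
}
```

With these definitions, a 16×16 sprite at (16, 0) is culled while the same sprite at (17, 1) is not, and a 64×64 midboss at (0, -16) is still rendered, matching the examples in the text.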
And for the most part, ZUN did follow this system consistently. Except for,
of course, the typical mistakes you make when faced with such manual
decisions, like how he treated TH04's Stage 4 midboss as a "small" sprite
below 32×32 pixels (it's 64×64), losing that precious one extra pixel. Or
how the entire rendering code for the 48×48 boss explosion sprite pretends
that it's actually 64×64 pixels large, which causes even the initial
transformation into screen space to be misaligned from the get-go.
But these are additional bugs on top of the single
one that led to all this research.
Because that's what this is, a bug. 🐞 Every resulting pixel boundary is a systematic result of master.lib's unfortunate lack of clipping. It's as much of a bug as TH01's byte-aligned rendering of entities whose internal position is not byte-aligned. In both cases, the entities are alive, simulated, and partake in collision detection, but their rendered appearance doesn't accurately reflect their internal position.
Initially, I classified 📝 the sudden pop-in of TH05's Stage 5 midboss as a quirk because we had no conclusive evidence that this wasn't intentional, but now we do. There have been multiple explanations for why ZUN put borders around the playfield, but master.lib's lack of sprite clipping might be the biggest reason.
And just like byte-aligned rendering, the clipping conditions can easily be removed when porting the game away from PC-98 hardware. That's also what uth05win chose to do: By using OpenGL and not having to rely on hardware scrolling, it can simply place every sprite as a textured quad at its exact position in screen space, and then draw the black playfield borders on top in the end to clip everything in a single draw call. This way, the Stage 5 midboss can smoothly fly into the playfield, just as defined by its movement code.
Meanwhile, I designed the interface of the 📝 generic blitter used in the TH01 Anniversary Edition entirely around clipping the blitted sprite at any explicit combination of VRAM edges. This was nothing I tacked on in the end, but a core aspect that informed the architecture of the code from the very beginning. You really want to have one and only one place where sprite clipping is done right – and only once per sprite, regardless of how many bitplanes you want to write to.
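The core idea of clipping once per sprite can be sketched in C like this. It is not the Anniversary Edition's actual blitter: to keep it short, it assumes a hypothetical layout with one byte per pixel and plane instead of the PC-98's packed 1-bit planes, and all type and function names are made up. The sprite is intersected with the clipping rectangle a single time, and the resulting source and destination coordinates are then reused for every bitplane:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum { PLANE_COUNT = 4 };

/* right/bottom are exclusive */
typedef struct { int left, top, right, bottom; } rect_t;

typedef struct {
    int src_x, src_y; /* first visible sprite pixel */
    int dst_x, dst_y; /* where it lands in the destination */
    int w, h;         /* visible size */
} clip_t;

/* Clips a w×h sprite at (x, y) against [clip], exactly once.
 * Returns false if nothing remains visible. */
static bool clip_sprite(int x, int y, int w, int h, rect_t clip, clip_t *out)
{
    int left = (x > clip.left) ? x : clip.left;
    int top = (y > clip.top) ? y : clip.top;
    int right = ((x + w) < clip.right) ? (x + w) : clip.right;
    int bottom = ((y + h) < clip.bottom) ? (y + h) : clip.bottom;

    assert((w > 0) && (h > 0));
    if((left >= right) || (top >= bottom)) {
        return false;
    }
    out->src_x = (left - x);
    out->src_y = (top - y);
    out->dst_x = left;
    out->dst_y = top;
    out->w = (right - left);
    out->h = (bottom - top);
    return true;
}

/* Copies the visible part of every plane, reusing the single clip result. */
static void blit_planes(
    unsigned char *dst[PLANE_COUNT], int dst_pitch,
    const unsigned char *src[PLANE_COUNT], int src_pitch,
    const clip_t *c
)
{
    assert(c != NULL);
    for(int p = 0; p < PLANE_COUNT; p++) {
        for(int row = 0; row < c->h; row++) {
            memcpy(
                &dst[p][((c->dst_y + row) * dst_pitch) + c->dst_x],
                &src[p][((c->src_y + row) * src_pitch) + c->src_x],
                (size_t)c->w
            );
        }
    }
}
```

Since the clipping rectangle is a parameter, the same function covers the edges of VRAM, the playfield, or any other combination of edges.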
Which brings us to where the final ¼ of this push went. I thought I
was going to start cleaning up the
📝 player movement and rendering code, but
that turned out to be too complicated for that amount of time – especially if you
want to start with just cleanup, preserving all original bugs.
Fixing and smoothing player and Orb movement would be the next big task in Anniversary Edition development, needing about 3 pushes. It would start with more performance research into runtime-shifting of larger sprites, followed by extending my generic blitter according to the results, writing new optimized loaders for the original image formats, and finally rewriting all rendering code accordingly. With that code in place, we can then start cleaning up and fixing the unique code for each boss, one by one.
Until that's funded, there are still a few smaller and easier pieces
of code that are equally related to rendering bugs, but could be dealt with
in a more incremental way. Line rendering is one of those, and first needs
some refactoring of every call site, including
📝 the rotating squares around Mima and
📝 YuugenMagan's pentagram. So far, I managed
to remove another 1,360 bytes from the binary within this final ¼ of a push,
but there's still quite a bit to do in that regard.
This is the perfect kind of feature for smaller (micro-)transactions. Which means that we've now got meaningful TH01 code cleanup and Anniversary Edition subtasks at every price range, no matter whether you want to invest a lot or just a little into this goal.
If you can, because Ember2528 revealed the plan behind
his Shuusou Gyoku contributions: A full-on Linux port of the game, which
will be receiving all the funding it needs to happen. 🐧 Next up, therefore:
Turning this into my main project within ReC98 for the next couple of
months, and getting started by shipping the long-awaited first step towards
I've raised the cap to avoid potential rounding errors, which might prevent the last needed Shuusou Gyoku push from being correctly funded. I already had to pick the larger one of the two pending TH02 transactions for this push, because we would have mathematically ended up 1/25500 short of a full push with the smaller transaction. And while I'm at it, I might as well free up enough capacity to potentially ship the complete OpenGL backend in a single delivery, which is currently estimated to cost 7 pushes in total.