Oh, it's 2024 already and I didn't even have a delivery for December or January? Yeah… I can only repeat what I said at the end of November, although the finish line is actually in sight now. With 10 pushes across 4 repositories and a blog post that has already reached a word count of 9,240, the Shuusou Gyoku SC-88Pro BGM release is going to break 📝 both the push record set by TH01 Sariel two years ago, and 📝 the blog post length record set by the last Shuusou Gyoku delivery. Until that's done though, let's clear some more PC-98 Touhou pushes out of the backlog, and continue the preparation work for the non-ASCII translation project starting later this year.
But first, we got another free bugfix according to my policy! 📝 Back in April 2022 when I researched the Divide Error crash that can occur in TH04's Stage 4 Marisa fight, I proposed and implemented four possible workarounds and let the community pick one of them for the generally recommended small bugfix mod. I still pushed the others onto individual branches in case the gameplay community ever wants to look more closely into them and maybe pick a different one… except that I accidentally pushed the wrong code for the warp workaround, probably because I got confused with the second warp variant I developed later on.
Fortunately, I still had the intended code for both variants lying around, and used the occasion to merge the current master branch into all of these mod branches. Thanks to wyatt8740 for spotting and reporting this oversight!
As the final piece of code shared in largely identical form between 4 of the 5 games, the Music Rooms were the biggest remaining piece of low-hanging fruit that guaranteed big finalization% gains for comparatively little effort. They seemed to be especially easy because I already decompiled TH02's Music Room together with the rest of that game's OP.EXE back in early 2015, when this project focused on just raw decompilation with little to no research. 9 years of increased standards later though, it turns out that I missed a lot of details, and ended up renaming most variables and functions. Combined with larger-than-expected changes in later games and the usual quality level of ZUN's menu code, this ended up taking noticeably longer than the single push I expected.
The undoubtedly most interesting part about this screen is the animation in the background, with the spinning and falling polygons cutting into a single-color background to reveal a spacey image below. However, the only background image loaded in the Music Room is OP3.PI (TH02/TH03) or MUSIC3.PI (TH04/TH05), which looks like this in a .PI viewer or when converted into another image format with the usual tools:
That is definitely the color that appears on top of the polygons, but where is the spacey background? If there is no other .PI file where it could come from, it has to be somewhere in that same file, right?
And indeed: This effect is another bitplane/color palette trick, exactly like the 📝 three falling stars in the background of TH04's Stage 5. If we set every bit on the first bitplane and thus change any of the resulting even hardware palette color indices to odd ones, we reveal a full second 8-color sub-image hiding in the same .PI file:
On a high level, the first bitplane therefore acts as a stencil buffer that selects between the blank and spacey sub-image for every pixel. The important part here, however, is that the first bitplane of the blank sub-images does not consist entirely of 0 bits, but does have 1 bits at the pixels that represent the caption that's supposed to be overlaid on top of the animation. Since there now are some pixels that should always be taken from the spacey sub-image regardless of whether they're covered by a polygon, the game can no longer just clear the first bitplane at the start of every frame. Instead, it has to keep a separate copy of the first bitplane's original state (called nopoly_B in the code), captured right after it blitted the .PI image to VRAM. Turns out that this copy also comes in quite handy with the text, but more on that later.
Then, the game simply draws polygons onto only the reblitted first bitplane to conditionally set the respective bits. ZUN used master.lib's grcg_polygon_c() function for this, which means that we can entirely thank the uncredited master.lib developers for this iconic animation – if they hadn't included such a function, the Music Rooms would most certainly look completely different.
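To make the per-frame logic concrete, here's a minimal software model of it. This is only a sketch and not the game's code: `Vram`, `Plane`, `color_at()`, and `render_frame()` are hypothetical names, the tiny plane size is arbitrary, and only `nopoly_B` comes from the actual source.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// 8 pixels per byte, as in PC-98 VRAM. 4 bytes = 32 pixels, for illustration.
constexpr std::size_t PLANE_SIZE = 4;
using Plane = std::array<std::uint8_t, PLANE_SIZE>;

struct Vram {
	Plane B, R, G, E; // bitplanes 0 to 3
};

// 4-bit palette index of pixel `x`, with the leftmost pixel in the MSB.
inline int color_at(const Vram& v, std::size_t x)
{
	const std::size_t byte = (x / 8);
	const int bit = (7 - static_cast<int>(x % 8));
	return (
		(((v.B[byte] >> bit) & 1) << 0) |
		(((v.R[byte] >> bit) & 1) << 1) |
		(((v.G[byte] >> bit) & 1) << 2) |
		(((v.E[byte] >> bit) & 1) << 3)
	);
}

// One frame: reset the first bitplane to its captured original state, then
// OR the polygon coverage on top. The other three planes are never touched.
inline void render_frame(Vram& v, const Plane& nopoly_B, const Plane& polygons)
{
	for(std::size_t i = 0; i < PLANE_SIZE; i++) {
		v.B[i] = static_cast<std::uint8_t>(nopoly_B[i] | polygons[i]);
	}
}
```

Any pixel with an even color index in the image turns into the next odd index wherever either `nopoly_B` or the current polygon mask has a 1 bit, which is all the stencil effect needs.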
This is where we get to complete the series on the PC-98 GRCG chip with the last remaining four bits of its mode register. So far, we only needed the highest bit (0x80) to either activate or deactivate it, and the bit below (0x40) to choose between the 📝 RMW and 📝 TCR/📝 TDW modes. But you can also use the lowest four bits to restrict the GRCG's operations to any subset of the four bitplanes, leaving the other ones untouched:
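As a rough sketch, the bit layout described above could be wrapped like this. The constant and function names are hypothetical. The polarity of the low nibble is an assumption here: I'm treating a set bit as excluding that plane, so that a value of 0 in the low nibble addresses all four planes; flip the inversion if your hardware reference says otherwise.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint8_t GRCG_ENABLE = 0x80; // chip on/off
constexpr std::uint8_t GRCG_RMW = 0x40;    // set: RMW, clear: TCR/TDW

// One bit per bitplane, in hardware order. Hypothetical names.
enum PlaneBit : std::uint8_t {
	PLANE_B = 0x01,
	PLANE_R = 0x02,
	PLANE_G = 0x04,
	PLANE_E = 0x08,
	PLANE_ALL = 0x0F,
};

// Builds the mode register value for RMW mode, restricted to the given
// planes. ASSUMPTION: a 1 bit in the low nibble *excludes* that plane.
constexpr std::uint8_t grcg_rmw_mode_for(std::uint8_t planes_to_write)
{
	return static_cast<std::uint8_t>(
		GRCG_ENABLE | GRCG_RMW | (~planes_to_write & PLANE_ALL)
	);
}
```

Under that assumption, `grcg_rmw_mode_for(PLANE_B)` yields the value for drawing polygons onto only the first bitplane.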
This could be used for some unusual effects when writing to two or three of the four planes, but it seems rather pointless for this specific case at first. If we only want to write to a single plane, why not just do so directly, without the GRCG? Using that chip only involves more hardware and is therefore slower by definition, and the blitting code would be the same, right?
This is another one of these questions that would be interesting to benchmark one day, but in this case, the reason is purely practical: All of master.lib's polygon drawing functions expect the GRCG to be running in RMW mode. They write their pixels as bitmasks where 1 and 0 represent pixels that should or should not change, and leave it to the GRCG to combine these masks with its tile register and OR the result into the bitplanes instead of doing so themselves. Since GRCG writes are done via MOV instructions, not using the GRCG would turn these bitmasks into actual dot patterns, overwriting any previous contents of each VRAM byte that gets modified.
Technically, you'd only have to replace a few MOV instructions with OR to build a non-GRCG version of such a function, but why would you do that if you haven't measured polygon drawing to be an actual bottleneck?
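The difference between the three kinds of write can be modeled on a single VRAM byte. This is a simplified sketch with hypothetical function names; RMW is modeled as a per-bit select between the old byte and the tile byte, which reduces to a plain OR whenever the plane's tile byte is 0xFF.

```cpp
#include <cassert>
#include <cstdint>

// Plain MOV without the GRCG: the mask becomes the new dot pattern and
// whatever was in the byte before is lost.
constexpr std::uint8_t write_direct(std::uint8_t /* vram */, std::uint8_t mask)
{
	return mask;
}

// What a non-GRCG polygon function would have to do instead: OR.
constexpr std::uint8_t write_or(std::uint8_t vram, std::uint8_t mask)
{
	return static_cast<std::uint8_t>(vram | mask);
}

// MOV with the GRCG in RMW mode: bits selected by the mask take the value of
// the tile register for this plane, all other bits keep their old value.
constexpr std::uint8_t write_rmw(
	std::uint8_t vram, std::uint8_t tile, std::uint8_t mask
)
{
	return static_cast<std::uint8_t>((vram & ~mask) | (tile & mask));
}
```

With a previous byte of 0xF0 and a mask of 0x0F, the direct write leaves 0x0F behind, while both the OR and the RMW write with a 0xFF tile produce 0xFF.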
As far as complexity is concerned though, the worst part is the implicit logic that allows all this text to show up on top of the polygons in the first place. If every single piece of text is only rendered a single time, how can it appear on top of the polygons if those are drawn every frame?
Depending on the game (because of course it's game-specific), the answer involves either the individual bits of the text color index or the actual contents of the palette:
Colors 0 or 1 can't be used, because those don't include any of the bits that can stay constant between frames.
If the lowest bit of a palette color index has no effect on the displayed color, text drawn in either of the two colors won't be visually affected by the polygon animation and will always appear on top. TH04 and TH05 rely on this property with their colors 2/3, 4/5, and 6/7 being identical, but this would work in TH02 and TH03 as well.
But this doesn't apply to TH02 and TH03's palettes, so how do they do it? The secret: They simply include all text pixels in nopoly_B. This allows text to use any color with an odd palette index – the lowest bit then won't be affected by the polygons ORed into the first bitplane, and the other bitplanes remain unchanged.
TH04 is a curious case. Ostensibly, it seems to remove support for odd text colors, probably because the new 10-frame fade-in animation on the comment text would require at least the comment area in VRAM to be captured into nopoly_B on every one of the 10 frames. However, the initial pixels of the tracklist are still included in nopoly_B, which would allow those to still use any odd color in this game. ZUN only removed those from nopoly_B in TH05, where it had to be changed because that game lets you scroll and browse through multiple tracklists.
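Both conditions boil down to a one-line check each. The following sketch uses hypothetical names and an arbitrary palette representation; it only models the two rules described above and is not taken from the games.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Color index of a pixel once a polygon has been ORed into the first plane.
constexpr int with_polygon(int color)
{
	return (color | 1);
}

// TH02/TH03 approach: text pixels are part of nopoly_B, so their lowest bit
// is set on every frame. An odd text color therefore never changes.
constexpr bool survives_by_index(int color)
{
	return (with_polygon(color) == color);
}

// TH04/TH05 approach: the even and odd color of a pair look identical, so it
// doesn't matter whether the lowest bit is currently set or not.
using Palette = std::array<std::uint32_t, 16>;

inline bool survives_by_palette(const Palette& pal, int color)
{
	const auto even = static_cast<std::size_t>(color & ~1);
	const auto odd = static_cast<std::size_t>(color | 1);
	return (pal[even] == pal[odd]);
}
```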
Finally, here's a list of all the smaller details that turn the Music Rooms into such a mess:
Due to the polygon animation, the Music Room is one of the few double-buffered menus in PC-98 Touhou, rendering to both VRAM pages on alternate frames instead of using the other page to store a background image. Unfortunately though, this doesn't actually translate to tearing-free rendering because ZUN's initial implementation for TH02 mixed up the order of the required operations. You're supposed to first wait for the GDC's VSync interrupt and then, within the display's vertical blanking interval, write to the relevant I/O ports to flip the accessed and shown pages. Doing it the other way around and flipping as soon as you're finished with the last draw call of a frame means that you'll very likely hit a point where the (real or emulated) electron beam is still traveling across the screen. This ensures that there will be a tearing line somewhere on the screen on all but the fastest PC-98 models that can render an entire frame of the Music Room completely within the vertical blanking interval, causing the very issue that double-buffering was supposed to prevent.
ZUN only fixed this landmine in TH05.
The polygons have a fixed vertex count and radius depending on their index, everything else is randomized. They are also never reinitialized while OP.EXE is running – if you leave the Music Room and reenter it, they will continue animating from the same position.
TH02 and TH04 don't handle it at all, causing held keys to be processed again after about a second.
TH03 and TH05 correctly work around the quirk, at the usual cost of a 614.4 µs delay per frame. Except that the delay is actually twice as long in frames in which a previously held key is released, because this code is a mess.
But even in 2024, DOSBox-X is the only emulator that actually replicates this detail of real hardware. On anything else, keyboard input will behave as ZUN intended it to. At least I've now mentioned this once for every game, and can just link back to this blog post for the other menus we still have to go through, in case their game-specific behavior matches this one.
TH02 is the only game that
1) separately lists the stage and boss themes of the main game, rather than following the in-game order of appearance,
2) continues playing the selected track when leaving the Music Room,
3) always loads both MIDI and PMD versions, regardless of the currently selected mode, and
4) does not stop the currently playing track before loading the new one into the PMD and MMD drivers.
The combination of 2) and 3) allows you to leave the Music Room and change the music mode in the Option menu to listen to the same track in the other version, without the game changing back to the title screen theme. 4), however, might cause the PMD and MMD drivers to play garbage for a short while if the music data is loaded from a slow storage device that takes longer than a single period of the OPN timer to fill the driver's song buffer. Probably not worth mentioning anymore though, now that people no longer try fitting PC-98 Touhou games on floppy disks.
Exactly 40 (TH02/TH03) / 38 (TH04/TH05) visible bytes per line,
padded with 2 bytes that can hold a CR/LF newline sequence for easier editing.
Every track starts with a title line that mostly just duplicates the names from the hardcoded tracklist,
followed by a fixed 19 (TH02/TH03/TH04) / 9 (TH05) comment lines.
In TH04 and TH05, lines can start with a semicolon (;) to prevent them from being rendered. This is purely a performance hint, and is visually equivalent to filling the line with spaces.
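With fixed-size lines and a fixed line count per track, finding a line is pure arithmetic. Here's a sketch of that calculation. All names are hypothetical, and it assumes that the tracks are simply stored back to back without any header, which the description above doesn't explicitly confirm.

```cpp
#include <cassert>
#include <cstddef>

struct CommentFormat {
	std::size_t visible_bytes; // per line
	std::size_t comment_lines; // per track, not counting the title line
};

constexpr std::size_t LINE_PADDING = 2; // room for a CR/LF sequence

constexpr CommentFormat FORMAT_TH02_TH03 = { 40, 19 };
constexpr CommentFormat FORMAT_TH04 = { 38, 19 };
constexpr CommentFormat FORMAT_TH05 = { 38, 9 };

constexpr std::size_t line_size(const CommentFormat& f)
{
	return (f.visible_bytes + LINE_PADDING);
}

// Title line plus comment lines.
constexpr std::size_t track_size(const CommentFormat& f)
{
	return (line_size(f) * (1 + f.comment_lines));
}

// Byte offset of a line within the file. Line 0 is the title.
// ASSUMPTION: tracks follow each other directly, starting at offset 0.
constexpr std::size_t line_offset(
	const CommentFormat& f, std::size_t track, std::size_t line
)
{
	return ((track * track_size(f)) + (line * line_size(f)));
}

// TH04/TH05 only: lines starting with a semicolon are not rendered.
constexpr bool is_skipped(const char* line)
{
	return (line[0] == ';');
}
```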
All in all, the quality of the code is even slightly below the already poor standard for PC-98 Touhou: More VRAM page copies than necessary, conditional logic that is nested way too deeply, a distinct avoidance of state in favor of loops within loops, and – of course – a couple of gotos to jump around as needed.
In TH05, this gets so bad with the scrolling and game-changing tracklist that it all gives birth to a wonderfully obscure inconsistency: When pressing both ⬆️/⬇️ and ⬅️/➡️ at the same time, the game first processes the vertical input and then the horizontal one in the next frame, making it appear as if the latter takes precedence. Except when the cursor is highlighting the first (⬆️ ) or 12th (⬇️ ) element of the list, and said list element is not the first track (⬆️ ) or the quit option (⬇️ ), in which case the horizontal input is ignored.
And that's all the Music Rooms! The OP.EXE binaries of TH04 and especially TH05 are now very close to being 100% RE'd, with only the respective High Score menus and TH04's title animation still missing. As for actual completion though, the finalization% metric is more relevant as it also includes the ZUN Soft logo, which I RE'd on paper but haven't decompiled. I'm 📝 still hoping that this will be the final piece of code I decompile for these two games, and that no one pays to get it done earlier…
For the rest of the second push, there was a specific goal I wanted to reach for the remaining anything budget, which was blocked by a few functions at the beginning of TH04's and TH05's MAINE.EXE. In another anticlimactic development, this involved yet another way too early decompilation of a main() function…
Generally, this main() function just calls the top-level functions of all other ending-related screens in sequence, but it also handles the TH04-exclusive congratulating All Clear images within itself. After a 1CC, these are an additional reward on top of the Good Ending, showing the player character wearing a different outfit depending on the selected difficulty. On Easy Mode, however, the Good Ending is unattainable because the game always ends after Stage 5 with a Bad Ending, but ZUN still chose to show the EASY ALL CLEAR!! image in this case, regardless of how many continues you used.
While this might seem inconsistent with the other difficulties, it is consistent within Easy Mode itself, as the enforced Bad Ending after Stage 5 also doesn't distinguish between the number of continues. Also, Try to Normal Rank!! could very well be ZUN's roundabout way of implying "because this is how you avoid the Bad Ending".
With that out of the way, I was finally able to separate the VRAM text renderer of TH04 and TH05 into its own assembly unit, 📝 finishing the technical debt repayment project that I couldn't complete in 2021 due to assembly-time code segment label arithmetic in the data segment. This now allows me to translate this undecompilable self-modifying mess of ASM into C++ for the non-ASCII translation project, and thus unify the text renderers of all games and enhance them with support for Unicode characters loaded from a bitmap font. As the final finalized function in the SHARED segment, it also allowed me to remove 143 lines of particularly ugly segmentation workarounds 🙌
The remaining 1/6th of the second push provided the perfect occasion for some light TH02 PI work. The global boss position and damage variables represented some equally low-hanging fruit, being easily identified global variables that aren't part of a larger structure in this game. In an interesting twist, TH02 is the only game that uses an increasing damage value to track boss health rather than decreasing HP, and also doesn't internally distinguish between bosses and midbosses as far as these variables are concerned. Obviously, there's quite a bit of state left to be RE'd, not least because Marisa is doing her own thing with a bunch of redundant copies of her position, but that was too complex to figure out right now.
Also doing their own thing are the Five Magic Stones, which need five positions rather than a single one. Since they don't move, the game doesn't have to keep 📝 separate position variables for both VRAM pages, and can handle their positions in a much simpler way that made for a nice final commit.
And for the first time in a long while, I quite like what ZUN did there!
Not only are their positions stored in an array that is indexed with a consistent ID for every stone, but these IDs also follow the order you fight the stones in: The two inner ones use 0 and 1, the two outer ones use 2 and 3, and the one in the center uses 4. This might look like an odd choice at first because it doesn't match their horizontal order on the playfield. But then you notice that ZUN uses this property in the respective phase control functions to iterate over only the subrange of active stones, and you realize how brilliant it actually is.
This seems like a really basic thing to get excited about, especially since the rest of their data layout sure isn't perfect. Splitting each piece of state and even the individual X and Y coordinates into separate 5-element arrays is still counter-productive because the game ends up paying more memory and CPU cycles to recalculate the element offsets over and over again than this would have ever saved in cache misses on a 486. But that's a minor issue that could be fixed with a few regex replacements, not a misdesigned architecture that would require a full rewrite to clean it up. Compared to the hardcoded and bloated mess that was 📝 YuugenMagan's five eyes, this is definitely an improvement worthy of the good-code tag. The first actual one in two years, and a welcome change after the Music Room!
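In sketch form, the idea looks roughly like this. Everything here is hypothetical: the names, the placeholder coordinates, which inner or outer stone gets which of its two IDs, and the per-stone damage array. Only the ID order (inner, outer, center) and the split into separate 5-element arrays follow the description above.

```cpp
#include <cassert>

constexpr int STONE_COUNT = 5;

// IDs follow the order in which the stones are fought.
enum StoneID {
	STONE_INNER_A = 0,
	STONE_INNER_B = 1,
	STONE_OUTER_A = 2,
	STONE_OUTER_B = 3,
	STONE_CENTER = 4,
};

// Half-open ID range of the stones that are active during a phase.
struct StoneRange {
	int first;
	int end;
};
constexpr StoneRange PHASE_INNER = { 0, 2 };
constexpr StoneRange PHASE_OUTER = { 2, 4 };
constexpr StoneRange PHASE_CENTER = { 4, 5 };

// One array per piece of state. The X values are placeholders that only
// show that ID order and horizontal order differ.
int stone_left[STONE_COUNT] = { 256, 352, 160, 448, 304 };
int stone_damage[STONE_COUNT] = { 0, 0, 0, 0, 0 };

// Applies damage to every active stone and returns how many were hit.
inline int damage_active_stones(const StoneRange& active, int amount)
{
	int hit = 0;
	for(int i = active.first; i < active.end; i++) {
		stone_damage[i] += amount;
		hit++;
	}
	return hit;
}
```

Because the active stones of every phase form a contiguous ID range, a phase function needs nothing more than a start and end index.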
These three pieces of data alone yielded a whopping 5% of overall TH02 PI in just 1/6th of a push, bringing that game comfortably over the 60% PI mark. MAINE.EXE is guaranteed to reach 100% PI before I start working on the non-ASCII translations, but at this rate, it might even be realistic to go for 100% PI on MAIN.EXE as well? Or at least technical position independence, without the false positives.
Next up: Shuusou Gyoku SC-88Pro BGM. It's going to be wild.
Here we go, TH01 Sariel! This is the single biggest boss fight in all of
PC-98 Touhou: If we include all custom effect code we previously decompiled,
it amounts to a total of 10.31% of all code in TH01 (and 3.14%
overall). These 8 pushes cover the final 8.10% (or 2.47% overall),
and are likely to be the single biggest delivery this project will ever see.
Considering that I only managed to decompile 6.00% across all games in 2021,
2022 is already off to a much better start!
So, how can Sariel's code be that large? Well, we've got:
16 danmaku patterns; including the one snowflake detonating into a giant
94×32 hitbox
Gratuitous usage of floating-point variables, bloating the binary thanks
to Turbo C++ 4.0J's particularly horrid code generation
The hatching birds that shoot pellets
3 separate particle systems, sharing the general idea, overall code
structure, and blitting algorithm, but differing in every little detail
The "gust of wind" background transition animation
5 sets of custom monochrome sprite animations, loaded from
BOSS6GR?.GRC
A further 3 hardcoded monochrome 8×8 sprites for the "swaying leaves"
pattern during the second form
In total, it's just under 3,000 lines of C++ code, containing a total of 8
definite ZUN bugs, 3 of them being subpixel/pixel confusions. That might not
look all too bad if you compare it to the
📝 player control function's 8 bugs in 900 lines of code,
but given that Konngara had 0… (Edit (2022-07-17):
Konngara contains two bugs after all: A
📝 possible heap corruption in test or debug mode,
and the infamous
📝 temporary green discoloration.)
And no, the code doesn't make it obvious whether ZUN coded Konngara or
Sariel first; there's just as much evidence for either.
Some terminology before we start: Sariel's first form is separated
into four phases, indicated by different background images, that
cycle until Sariel's HP reach 0 and the second, single-phase form
starts. The danmaku patterns within each phase are also on a cycle,
and the game picks a random but limited number of patterns per phase before
transitioning to the next one. The fight always starts at pattern 1 of phase
1 (the random purple lasers), and each new phase also starts at its
respective first pattern.
Sariel's bugs already start at the graphics asset level, before any code
gets to run. Some of the patterns include a wand raise animation, which is
stored in BOSS6_2.BOS:
The "lowered wand" sprite is missing in this file simply because it's
captured from the regular background image in VRAM, at the beginning of the
fight and after every background transition. What I previously thought to be
📝 background storage code has therefore a
different meaning in Sariel's case. Since this captured sprite is fully
opaque, it will reset the entire 128×128 wand area… wait, 128×128, rather
than 96×96? Yup, this lowered sprite is larger than necessary, wasting 1,967
bytes of conventional memory. That still doesn't quite explain the
second sprite in BOSS6_2.BOS though. Turns out that the black
part is indeed meant to unblit the purple reflection (?) in the first
sprite. But… that's not how you would correctly unblit that?
The first sprite already eats up part of the red HUD line, and the second
one additionally fails to recover the seal pixels underneath, leaving a nice
little black hole and some stray purple pixels until the next background
transition. Quite ironic given that both
sprites do include the right part of the seal, which isn't even part of the
animation.
Just like Konngara, Sariel continues the approach of using a single function
per danmaku pattern or custom entity. While I appreciate that this allows
all pattern- and entity-specific state to be scoped locally to that one
function, it quickly gets ugly as soon as such a function has to do more than one thing.
The "bird function" is particularly awful here: It's just one if(…)
{…} else if(…) {…} else if(…) {…} chain with different
branches for the subfunction parameter, with zero shared code between any of
these branches. It also uses 64-bit floating-point double as
its subpixel type… and since it also takes four of those as parameters
(y'know, just in case the "spawn new bird" subfunction is called), every
call site has to also push four double values onto the stack.
Thanks to Turbo C++ even using the FPU for pushing a 0.0 constant, we
have already reached maximum floating-point decadence before even having
seen a single danmaku pattern. Why decadence? Every possible spawn position
and velocity in both bird patterns just uses pixel resolution, with no
fractional component in sight. And there goes another 720 bytes of
conventional memory.
Speaking about bird patterns, the red-bird one is where we find the first
code-level ZUN bug: The spawn cross circle sprite suddenly disappears after
it finished spawning all the bird eggs. How can we tell it's a bug? Because
there is code to smoothly fly this sprite off the playfield, that
code just suddenly forgets that the sprite's position is stored in Q12.4
subpixels, and treats it as raw screen pixels instead.
As a result, the well-intentioned 640×400
screen-space clipping rectangle effectively shrinks to 38×23 pixels in the
top-left corner of the screen. Which the sprite is always outside of, and
thus never rendered again.
The intended animation is easily restored though:
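In code, this class of bug and its fix can be sketched as follows. This is not ZUN's code: the function names are made up, and both the 32×32 sprite size and the "fully inside the screen" clipping rule are assumptions, chosen because they happen to reproduce the 38×23 figure.

```cpp
#include <cassert>

constexpr int SUBPIXEL_FACTOR = 16; // Q12.4: 4 fractional bits
constexpr int RES_X = 640;
constexpr int RES_Y = 400;
constexpr int SPRITE_W = 32; // ASSUMPTION
constexpr int SPRITE_H = 32; // ASSUMPTION

constexpr int to_pixel(int subpixel)
{
	return (subpixel / SUBPIXEL_FACTOR);
}

// Clipping test in screen pixels.
constexpr bool on_screen(int left, int top)
{
	return (
		(left >= 0) && (left <= (RES_X - SPRITE_W)) &&
		(top >= 0) && (top <= (RES_Y - SPRITE_H))
	);
}

// Bug: the subpixel position is passed on as if it were in pixels. Any
// sprite beyond pixel (38, 23) is now considered to be off-screen.
constexpr bool visible_buggy(int left_subpixel, int top_subpixel)
{
	return on_screen(left_subpixel, top_subpixel);
}

// Fix: convert to pixels before clipping.
constexpr bool visible_fixed(int left_subpixel, int top_subpixel)
{
	return on_screen(to_pixel(left_subpixel), to_pixel(top_subpixel));
}
```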
Also, did you know that birds actually have a quite unfair 14×38-pixel
hitbox? Not that you'd ever collide with them in any of the patterns…
Another 3 of the 8 bugs can be found in the symmetric, interlaced spawn rays
used in three of the patterns, and the 32×32 debris "sprites" shown at their endpoint, at
the edge of the screen. You kinda have to commend ZUN's attention to detail
here, and how he wrote a lot of code for those few rapidly animated pixels
that you most likely don't
even notice, especially with all the other wrong pixels
resulting from rendering glitches. One of the bugs in the very final pattern
of phase 4 even turns them into the vortex sprites from the second pattern
in phase 1 during the first 5 frames of
the first time the pattern is active, and I had to single-step the blitting
calls to verify it.
It certainly was annoying how much time I spent making sense of these bugs,
and all weird blitting offsets, for just a few pixels… Let's look at
something more wholesome, shall we?
So far, we've only seen the PC-98 GRCG being used in RMW (read-modify-write)
mode, which I previously
📝 explained in the context of TH01's red-white HP pattern.
The second of its three modes, TCR (Tile Compare Read), affects VRAM reads
rather than writes, and performs "color extraction" across all 4 bitplanes:
Instead of returning raw 1bpp data from one plane, a VRAM read will instead
return a bitmask, with a 1 bit at every pixel whose full 4-bit color exactly
matches the color at that offset in the GRCG's tile register, and 0
everywhere else. Sariel uses this mode to make sure that the 2×2 particles
and the wind effect are only blitted on top of "air color" pixels, with
other parts of the background behaving like a mask. The algorithm:
Set the GRCG to TCR mode, and all 8 tile register dots to the air
color
Read N bits from the target VRAM position to obtain an N-bit mask where
all 1 bits indicate air color pixels at the respective position
AND that mask with the alpha plane of the sprite to be drawn, shifted to
the correct start bit within the 8-pixel VRAM byte
Set the GRCG to RMW mode, and all 8 tile register dots to the color that
should be drawn
Write the previously obtained bitmask to the same position in VRAM
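The steps above can be modeled in software for a single 8-pixel VRAM byte. This is a minimal sketch of the technique rather than the game's implementation: all names are hypothetical, the GRCG is replaced with plain functions, and the sprite's alpha byte is assumed to be already shifted into place.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Dots = std::uint8_t;
using Planes = std::array<Dots, 4>; // B, R, G, E

// Tile register contents with all 8 dots set to `color`.
inline Planes solid_tile(int color)
{
	Planes tile{};
	for(std::size_t p = 0; p < 4; p++) {
		tile[p] = (((color >> p) & 1) ? 0xFF : 0x00);
	}
	return tile;
}

// TCR read: 1 bits where the pixel matches the tile on all 4 planes.
inline Dots tcr_read(const Planes& vram, const Planes& tile)
{
	Dots match = 0xFF;
	for(std::size_t p = 0; p < 4; p++) {
		match &= static_cast<Dots>(~(vram[p] ^ tile[p]));
	}
	return match;
}

// RMW write: masked pixels take the tile color, the rest stays as it was.
inline void rmw_write(Planes& vram, const Planes& tile, Dots mask)
{
	for(std::size_t p = 0; p < 4; p++) {
		vram[p] = static_cast<Dots>((vram[p] & ~mask) | (tile[p] & mask));
	}
}

// Blits a monochrome sprite byte, but only on top of air-colored pixels.
inline void blit_on_air(
	Planes& vram, Dots sprite_alpha, int air_color, int draw_color
)
{
	const Dots air = tcr_read(vram, solid_tile(air_color)); // Steps 1 and 2
	const Dots mask = static_cast<Dots>(air & sprite_alpha); // Step 3
	rmw_write(vram, solid_tile(draw_color), mask); // Steps 4 and 5
}
```

If the left half of a byte is air-colored and the sprite's alpha byte straddles both halves, only the left half of the sprite ends up in VRAM.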
Quite clever how the extracted colors double as a secondary alpha plane,
making for another well-earned good-code tag. The wind effect really doesn't deserve it, though:
ZUN calculates every intermediate result inside this function
over and over and over again… Together with some ugly
pointer arithmetic, this function turned into one of the most tedious
decompilations in a long while.
This gradual effect is blitted exclusively to the front page of VRAM,
since parts of it need to be unblitted to create the illusion of a gust of
wind. Then again, anything that moves on top of air-colored background –
most likely the Orb – will also unblit whatever it covered of the effect…
As far as I can tell, ZUN didn't use TCR mode anywhere else in PC-98 Touhou.
Tune in again later during a TH04 or TH05 push to learn about TDW, the final
GRCG mode!
Speaking about the 2×2 particle systems, why do we need three of them? Their
only observable difference lies in the way they move their particles:
1) Up or down in a straight line (used in phases 4 and 2,
respectively)
2) Left or right in a straight line (used in the second form)
3) Left and right in a sinusoidal motion (used in phase 3, the "dark
orange" one)
Out of all possible formats ZUN could have used for storing the positions
and velocities of individual particles, he chose a) 64-bit /
double-precision floating-point, and b) raw screen pixels. Want to take a
guess at which data type is used for which particle system?
If you picked double for 1) and 2), and raw screen pixels for
3), you are of course correct! Not that I'm implying
that it should have been the other way round – screen pixels would have
perfectly fit all three systems' use cases, as all 16-bit coordinates
are extended to 32 bits for trigonometric calculations anyway. That's what,
another 1,080 bytes of wasted conventional memory? And that's even
calculated while keeping the current architecture, which allocates
space for 3×30 particles as part of the game's global data, although only
one of the three particle systems is active at any given time.
That's it for the first form, time to put on "Civilization
of Magic"! Or "死なばもろとも"? Or "Theme of 地獄めくり"? Or whatever SYUGEN is
supposed to mean…
… and the code of these final patterns comes out roughly as exciting as
their in-game impact. With the big exception of the very final "swaying
leaves" pattern: After 📝 Q4.4,
📝 Q28.4,
📝 Q24.8, and double variables,
this pattern uses… decimal subpixels? Like, multiplying the number by
10, and using the decimal one's digit to represent the fractional part?
Well, sure, if you really insist on moving the leaves in cleanly
represented integer multiples of ⅒, which is infamously impossible in IEEE
754. Aside from aesthetic reasons, it only really combines less precision
(10 possible fractions rather than the usual 16) with the inferior
performance of having to use integer divisions and multiplications rather
than simple bit shifts. And it's surely not because the leaf sprites needed
an extended integer value range of [-3276, +3276], compared to
Q12.4's [-2048, +2047]: They are clipped to 640×400 screen space
anyway, and are removed as soon as they leave this area.
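For comparison, here's how the two formats convert back to screen pixels when stored in a signed 16-bit integer. This is just a sketch with hypothetical names, assuming that the decimal format really is nothing more than "pixels × 10 + tenths".

```cpp
#include <cassert>
#include <cstdint>

constexpr int DECIMAL_FACTOR = 10;
constexpr int Q12_4_BITS = 4;

// Decimal subpixels: needs an integer division.
constexpr int decimal_to_pixel(std::int16_t v)
{
	return (v / DECIMAL_FACTOR);
}

// Q12.4 subpixels: a single bit shift. Right-shifting a negative value is
// arithmetic on every mainstream compiler, and guaranteed since C++20.
constexpr int q12_4_to_pixel(std::int16_t v)
{
	return (v >> Q12_4_BITS);
}

// Builds a decimal subpixel from whole pixels and tenths of a pixel.
constexpr std::int16_t decimal_from(int pixels, int tenths)
{
	return static_cast<std::int16_t>((pixels * DECIMAL_FACTOR) + tenths);
}
```

Passing `INT16_MAX` and `INT16_MIN` to both conversion functions gives the whole-pixel range of each format.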
This pattern also contains the second bug in the "subpixel/pixel confusion
hiding an entire animation" category, causing all of
BOSS6GR4.GRC to effectively become unused:
At least their hitboxes are what you would expect, exactly covering the
30×30 pixels of Reimu's sprite. Both animation fixes are available on the th01_sariel_fixes
branch.
After all that, Sariel's main function turned out fairly unspectacular, just
putting everything together and adding some shake, transition, and color
pulse effects with a bunch of unnecessary hardware palette changes. There is
one reference to a missing BOSS6.GRP file during the
first→second form transition, suggesting that Sariel originally had a
separate "first form defeat" graphic, before it was replaced with just the
shaking effect in the final game.
Speaking about the transition code, it is kind of funny how the… um,
imperative and concrete nature of TH01 leads to these 2×24
lines of straight-line code. They kind of look like ZUN rattling off a
laundry list of subsystems and raw variables to be reinitialized, making
damn sure to not forget anything.
Whew! Second PC-98 Touhou boss completely decompiled, 29 to go, and they'll
only get easier from here! 🎉 The next one in line, Elis, is somewhere
between Konngara and Sariel as far as x86 instruction count is concerned, so
that'll need to wait for some additional funding. Next up, therefore:
Looking at a thing in TH03's main game code – really, I have little
idea what it will be!
Now that the store is open again, also check out the
📝 updated RE progress overview I've posted
together with this one. In addition to more RE, you can now also directly
order a variety of mods; all of these are further explained in the order
form itself.
50% hype! 🎉 But as usual for TH01, even that final set of functions
shared between all bosses had to consume two pushes rather than one…
First up, in the ongoing series "Things that TH01 draws to the PC-98
graphics layer that really should have been drawn to the text layer
instead": The boss HP bar. Oh well, using the graphics layer at least made
it possible to have this half-red, half-white pattern
for the middle section.
This one pattern is drawn by making surprisingly good use of the GRCG. So
far, we've only seen it used for fast monochrome drawing:
// Setting up fast drawing using color #9 (1001 in binary)
grcg_setmode(GC_RMW);
outportb(0x7E, 0xFF); // Plane 0: (B): (********)
outportb(0x7E, 0x00); // Plane 1: (R): (        )
outportb(0x7E, 0x00); // Plane 2: (G): (        )
outportb(0x7E, 0xFF); // Plane 3: (E): (********)
// Write a checkerboard pattern (* * * * ) in color #9 to the top-left corner,
// with transparent blanks. Requires only 1 VRAM write to a single bitplane:
// The GRCG automatically writes to the correct bitplanes, as specified above
*(uint8_t *)(MK_FP(0xA800, 0)) = 0xAA;
But since this is actually an 8-pixel tile register, we can set any
8-pixel pattern for any bitplane. This way, we can get different colors
for every one of the 8 pixels, with still just a single VRAM write of the
alpha mask to a single bitplane:
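The following portable sketch models this in software instead of talking to the actual hardware. All names are hypothetical, and the two alternating colors in the usage note below are placeholders rather than the values TH01 necessarily uses.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Dots = std::uint8_t;
using Planes = std::array<Dots, 4>; // one 8-dot pattern per plane: B, R, G, E

// Builds the tile register contents for 8 individually colored pixels,
// leftmost pixel first.
inline Planes tile_from_colors(const std::array<int, 8>& colors)
{
	Planes tile{};
	for(std::size_t x = 0; x < 8; x++) {
		for(std::size_t p = 0; p < 4; p++) {
			if((colors[x] >> p) & 1) {
				tile[p] |= static_cast<Dots>(0x80 >> x);
			}
		}
	}
	return tile;
}

// Model of a single VRAM write in RMW mode: pixels with a 1 bit in the mask
// receive the color of the tile at that position on all four planes.
inline void rmw_write(Planes& vram, const Planes& tile, Dots mask)
{
	for(std::size_t p = 0; p < 4; p++) {
		vram[p] = static_cast<Dots>((vram[p] & ~mask) | (tile[p] & mask));
	}
}

// 4-bit color of pixel `x` within the 8-pixel byte.
inline int color_at(const Planes& vram, std::size_t x)
{
	int color = 0;
	for(std::size_t p = 0; p < 4; p++) {
		color |= (((vram[p] >> (7 - x)) & 1) << p);
	}
	return color;
}
```

Calling `tile_from_colors({10, 7, 10, 7, 10, 7, 10, 7})` yields the four per-plane patterns 0x55, 0xFF, 0x55, and 0xAA, and a single `rmw_write()` with a mask of 0xFF then produces all 8 alternating pixels at once.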
And I thought TH01 only suffered the drawbacks of PC-98 hardware, making
so little use of its actual features that it's perhaps not fair to even
call it "a PC-98 game"… Still, I'd say that "bad PC-98 port of an idea"
describes it best.
However, after that tiny flash of brilliance, the surrounding HP rendering
code goes right back to being the typical sort of confusing TH01 jank.
There's only a single function for the three distinct jobs of
incrementing HP during the boss entrance animation,
decrementing HP if hit by the Orb, and
redrawing the entire bar, because it's still all in VRAM, and Sariel
wants different backgrounds,
with magic numbers to select between all of these.
VRAM of course also means that the backgrounds behind the individual hit
points have to be stored, so that they can be unblitted later as the boss
is losing HP. That's no big deal though, right? Just allocate some memory,
copy what's initially in VRAM, then blit it back later using your
foundational set of blitting funct– oh, wait, TH01 doesn't have this sort
of thing, right? The closest thing,
📝 once again, are the .PTN functions. And
so, the game ends up handling these 8×16 background sprites with 16×16
wrappers around functions for 32×32 sprites.
That's quite the recipe for confusion, especially since ZUN
preferred copy-pasting the necessary ridiculous arithmetic expressions for
calculating positions, .PTN sprite IDs, and the ID of the 16×16 quarter
inside the 32×32 sprite, instead of just writing simple helper functions.
He did manage to make the result mostly bug-free this time
around, though! (Edit (2022-05-31): Nope, there's a
📝 potential heap corruption after all, which can be triggered in some fights in test mode (game t) or debug mode (game d).)
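For comparison, this is roughly what such a helper function could look like. It is a sketch and not ZUN's code: the struct, the function name, the base .PTN ID, and the assumption that backgrounds are stored in consecutive quarters from left to right are all hypothetical. Only the 8×16, 16×16, and 32×32 sizes come from the description above.

```c
#include <assert.h>

enum {
	HP_POINT_W = 8,  /* width of a single hit point */
	QUARTER_W = 16,  /* width of one 16×16 quarter of a .PTN image */
	PTN_W = 32,      /* width of a full .PTN image */
	POINTS_PER_QUARTER = (QUARTER_W / HP_POINT_W),                  /* 2 */
	QUARTERS_PER_PTN = ((PTN_W / QUARTER_W) * (PTN_W / QUARTER_W))  /* 4 */
};

/* Where the background behind a given hit point would be stored. */
typedef struct {
	int ptn_id;  /* ID of the 32×32 image */
	int quarter; /* 16×16 quarter inside that image, 0 to 3 */
	int left;    /* left edge of the 16-pixel-wide area on screen */
} hp_bg_location_t;

/* Hypothetical layout: two hit points share one quarter, and quarters are
 * filled consecutively, starting at [ptn_id_base]. */
hp_bg_location_t hp_bg_locate(int hp_index, int ptn_id_base, int bar_left)
{
	hp_bg_location_t ret;
	int quarter_index;

	assert(hp_index >= 0);
	quarter_index = (hp_index / POINTS_PER_QUARTER);
	ret.ptn_id = (ptn_id_base + (quarter_index / QUARTERS_PER_PTN));
	ret.quarter = (quarter_index % QUARTERS_PER_PTN);
	ret.left = (bar_left + (quarter_index * QUARTER_W));
	return ret;
}
```

With all three values coming out of a single function, every call site would agree on the layout by construction, instead of each copy of the arithmetic having to be checked separately.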
There's one minor hit point discoloration bug if the red-white or white
sections start at an odd number of hit points, but that's never the case for
any of the original 7 bosses.
The remaining sloppiness is ultimately inconsequential as well: The game
always backs up twice the number of hit point backgrounds, and thus
uses twice the amount of memory actually required. Also, this
self-restriction of only unblitting 16×16 pixels at a time requires any
remaining odd hit point at the last position to, of course, be rendered
again.
After stumbling over the weakest imaginable random number
generator, we finally arrive at the shared boss↔orb collision
handling function, the final blocker among the final blockers. This
function takes a whopping 12 parameters, 3 of them being references to
int values, some of which are duplicated for every one of the
7 bosses, with no generic boss struct anywhere.
📝 Previously, I speculated that YuugenMagan might have been the first boss to be programmed for TH01.
With all these variables though, there is some new evidence that SinGyoku
might have been the first one after all: It's the only boss to use its own
HP and phase frame variables, with the other bosses sharing the same two
globals.
While this function only handles the response to a boss↔orb
collision, it still does way too much to describe it briefly. Took me
quite a while to frame it in terms of invincibility (which is the
main impact of all of this that can be observed in gameplay code). That
made at least some sort of sense, considering the other usages of
the variables passed as references to that function. Turns out that
YuugenMagan, Kikuri, and Elis abuse what's meant to be the "invincibility
frame" variable as a frame counter for some of their animations 🙄
Oh well, the game at least doesn't call the collision handling function
during those, so "invincibility frame" is technically still a
correct variable name there.
And that's it! We're finally ready to start with Konngara, in 2021. I've
been waiting quite a while for this, as all this high-level boss code is
very likely to speed up TH01 progress quite a bit. Next up though: Closing
out 2020 with more of the technical debt in the other games.
Wait, PI for FUUIN.EXE is mainly blocked by the high score
menu? That one should really be properly decompiled in a separate
RE push, since it's also present in largely identical form in
REIIDEN.EXE… but I currently lack the explicit funding to do
that.
And as it turns out, I shouldn't really capture any of the existing generic
RE contributions for it either. Back in 2018 when I ran the crowdfunding
on the Touhou Patch Center Discord server, I said that generic RE
contributions would never go towards TH01. No one was interested in that
game back then, and as it's significantly different from all the other
games, it made sense to only cover it if explicitly requested.
As Touhou Patch Center still remains one of the biggest supporters and
advertisers for ReC98, someone recently believed that this rule was still
in effect, despite it not being mentioned anywhere on this website.
Fast forward to today, and TH01 has become the single most supported game
lately, with plenty of incomplete pushes still open to be completed.
Reverse-engineering it has proven to be quite efficient, yielding lots of
completion percentage points per push. This, I suppose, is exactly what
backers that don't give any specific priorities are mainly interested in.
Therefore, I will allocate future partial
contributions to TH01, whenever it makes sense.
So, instead of rushing TH01 PI, let's wait for Ember2528's
April subscription, and get the 25% total RE milestone with some TH05 PI
progress instead. This one primarily focused on the gather circles
(spirals…?), the third-last missing entity type in TH05. These are
rendered using the same 8×8 pellet sprite introduced in TH02… except that
the actual pellets received a darkened bottom part in TH04.
Which, in turn, is actually rendered quite efficiently – the games first
render the top white part of all pellets, followed by the bottom gray part
of all pellets. The PC-98 GRCG is used throughout the process, doing its
typical job of accelerating monochrome blitting, and by arranging the
rendering like this, only two GRCG color changes are required to draw any
number of pellets. I guess that makes it quite a worthwhile
optimization? Don't ask me for specific performance numbers or even saved
cycles, though.
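What can at least be counted is the number of color changes. The following sketch compares the batched order described above with the naive per-pellet order. All names are hypothetical, the two color indices are placeholders, and the stubs only count calls instead of touching any hardware:

```c
#include <stddef.h>

typedef struct { int x, y; } pellet_t;

typedef struct {
	unsigned color_changes; /* simulated GRCG color changes */
	unsigned blits;         /* simulated sprite part blits */
} render_stats_t;

/* Placeholder palette indices, not taken from the games. */
enum { COLOR_TOP = 15, COLOR_BOTTOM = 7 };
enum { PART_TOP, PART_BOTTOM };

/* Counting stubs that stand in for the actual GRCG and blitting code. */
static void set_color(render_stats_t *stats, int color)
{
	(void)color;
	stats->color_changes++;
}

static void blit_part(render_stats_t *stats, const pellet_t *pellet, int part)
{
	(void)pellet;
	(void)part;
	stats->blits++;
}

/* Batched order: all top parts first, then all bottom parts. */
render_stats_t pellets_render_batched(const pellet_t *pellets, size_t count)
{
	render_stats_t stats = { 0, 0 };
	if(count == 0) {
		return stats;
	}
	set_color(&stats, COLOR_TOP);
	for(size_t i = 0; i < count; i++) {
		blit_part(&stats, &pellets[i], PART_TOP);
	}
	set_color(&stats, COLOR_BOTTOM);
	for(size_t i = 0; i < count; i++) {
		blit_part(&stats, &pellets[i], PART_BOTTOM);
	}
	return stats;
}

/* Naive order: both parts of one pellet before moving on to the next. */
render_stats_t pellets_render_interleaved(const pellet_t *pellets, size_t count)
{
	render_stats_t stats = { 0, 0 };
	for(size_t i = 0; i < count; i++) {
		set_color(&stats, COLOR_TOP);
		blit_part(&stats, &pellets[i], PART_TOP);
		set_color(&stats, COLOR_BOTTOM);
		blit_part(&stats, &pellets[i], PART_BOTTOM);
	}
	return stats;
}
```

For `n` pellets, the batched order stays at 2 color changes while the interleaved order needs `2 * n`, with the same number of blits in both cases. How much time a single color change costs on real hardware is a separate question that this model says nothing about.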
Nope, RL has given me plenty of things to do from home after all,
so the current cap still remains an accurate representation of my free
time. 😕
For now though, we've got one more TH01 file format push, covering the
core functions for loading and displaying the 32×32 and 16×16 sprites from
the .PTN files, as announced – and probably one of the last ones for quite
a while to yield both RE and PI progress way above average. But what is
this, error return values in a ZUN game?! And actually good code
for deriving the alpha channel from the 16th color in the hardware
palette?! Sure, the rest of the code could still be improved a lot, but
that was quite a surprise, especially after the spaghetti code of
📝 the last push. That makes up for two of
the .PTN structure fields (one of them always 0, and one of them always 1)
remaining unused, and therefore unknown.
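To illustrate why deriving alpha from the 16th color is so cheap in a planar format, here is a sketch of the general technique, not a copy of the game's function. A pixel has color #15 exactly when its bit is set in all four bitplanes, so one AND across the planes finds all such pixels of a row at once. The names are hypothetical, and I'm assuming that color #15 is the transparent one and that a set alpha bit means "opaque":

```c
#include <stdint.h>

enum { PTN_H = 32 }; /* rows in a 32×32 .PTN image */

/* One 32-pixel row per bitplane, leftmost pixel in the highest bit. */
typedef struct {
	uint32_t B[PTN_H];
	uint32_t R[PTN_H];
	uint32_t G[PTN_H];
	uint32_t E[PTN_H];
} planar_image_t;

/* Alpha mask for a single row: color #15 (1111 in binary) has its bit set in
 * every plane, so ANDing the planes marks exactly those pixels. Inverting
 * the result yields 1 = opaque, 0 = transparent. */
uint32_t alpha_row_from_color15(uint32_t B, uint32_t R, uint32_t G, uint32_t E)
{
	return (uint32_t)~(B & R & G & E);
}

/* Derives the alpha mask for a whole image, row by row. */
void alpha_from_color15(uint32_t alpha[PTN_H], const planar_image_t *image)
{
	for(int y = 0; y < PTN_H; y++) {
		alpha[y] = alpha_row_from_color15(
			image->B[y], image->R[y], image->G[y], image->E[y]
		);
	}
}
```

That works out to three ANDs and one NOT per row, with no per-pixel loop anywhere.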
ZUN also uses the .PTN image slots to store the background of frequently
updated VRAM sections, in order to be able to repeatedly draw on top of
them – like for example the HUD area where the score and time numbers are
drawn. Future games would simply use the text RAM and gaiji for those
numbers. This would have worked just fine for TH01 too – especially since
all the functions decompiled so far align the VRAM X coordinate to the
8-pixel byte grid, which is the simplest way of accessing VRAM given the
PC-98's planar memory layout. Looks as if ZUN simply wasn't aware of gaiji during the
development of TH01.
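For reference, a short sketch of the addressing that this byte grid results in. The segment addresses and the 640×400 resolution are standard for the PC-98's 16-color mode, but the function and constant names are my own and not taken from the game:

```c
#include <assert.h>
#include <stdint.h>

enum {
	RES_X = 640,
	RES_Y = 400,
	ROW_SIZE = (RES_X / 8),          /* 80 bytes per row and plane */
	PLANE_SIZE = (ROW_SIZE * RES_Y)  /* 32,000 bytes per plane */
};

/* Segments of the four bitplanes (B, R, G, E). The same offset addresses the
 * same 8 pixels in each of them. */
static const uint16_t PLANE_SEGMENTS[4] = { 0xA800, 0xB000, 0xB800, 0xE000 };

/* Offset of the byte that contains the pixel at (x, y) within any plane. */
unsigned vram_offset(int x, int y)
{
	assert((x >= 0) && (x < RES_X) && (y >= 0) && (y < RES_Y));
	return (((unsigned)y * ROW_SIZE) + ((unsigned)x >> 3));
}

/* Bit of that byte that corresponds to the pixel; leftmost pixel = bit 7. */
uint8_t vram_bit(int x)
{
	return (uint8_t)(0x80 >> (x & 7));
}

/* Nonzero if X lies on the 8-pixel byte grid. Only then can sprite data be
 * copied byte by byte, without shifting it across byte boundaries. */
int vram_x_is_byte_aligned(int x)
{
	return ((x & 7) == 0);
}
```

Anything drawn at a byte-aligned X position therefore also fits the 8-pixel-wide cells of the text layer, which is what would have made text RAM and gaiji a workable alternative for the HUD numbers.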
This won't be the last time I cover the .PTN format, since all the
blitting functions that actually use alpha are exclusive to
REIIDEN.EXE, and currently out of decompilation reach. But after
some more long overdue cleaning work, TH01 has now passed both TH02 and
even TH04 to become the second-most reverse-engineered game in
all of ReC98, in terms of absolute numbers! 🎉
Also, PI for TH01's OP.EXE is imminent. Next up though, we've
first got the probably final double-speed push for TH01, covering the last
set of duplicated functions between the three binaries – quite fitting for
the currently last fully funded, outstanding TH01 RE push. Then, we also
might get FUUIN.EXE PI within the same push
afterwards? After that, TH01 progress will be slowing down, since
I'd then have to cover either the main menu, the in-game code,
or the cutscenes, depending on what the backers request. (By
default, it's going to be in-game code, of course.)