⮜ Blog

⮜ List of tags

Showing all posts tagged
and

📝 Posted:
🚚 Summary of:
P0235, P0236, P0237
Commits:
e7a9262...62c4b7f, 62c4b7f...7fa9038, 7fa9038...c5e51e6
💰 Funded by:
Ember2528, Yanga
🏷 Tags:

So, TH02! Being the only game whose main binary hadn't seen any dedicated attention ever, we get to start the TH02-related blog posts at the very beginning with the most foundational pieces of code. The stage tile system is the best place to start here: It not only blocks every entity that is rendered on top of these tiles, but is curiously placed right next to master.lib code in TH02, and would need to be separated out into its own translation unit before we can do the same with all the master.lib functions.

In late 2018, I already RE'd 📝 TH04's and TH05's stage tile implementation, but haven't properly documented it on this blog yet, so this post is also going to include the details that are unique to those games. On a high level, the stage tile system works identically in all three games:

The differences between the three games can best be summarized in a table:

:th02: TH02 :th04: TH04 :th05: TH05
Tile image file extension .MPN
Tile section format .MAP
Tile section order defined as part of .DT1 .STD
Tile section index format 0-based ID 0-based ID × 2
Tile image index format Index between 0 and 100, 1 byte VRAM offset in tile source area, 2 bytes
Scroll speed control Hardcoded Part of the .STD format, defined per referenced tile section
Redraw granularity Full tiles (16×16) Half tiles (16×8)
Rows per tile section 8 5
Maximum number of tile sections 16 32
Lowest number of tile sections used 5 (Stage 3 / Extra) 8 (Stage 6) 11 (Stage 2 / 4)
Highest number of tile sections used 13 (Stage 4) 19 (Extra) 24 (Stage 3)
Maximum length of a map 320 sections (static buffer) 256 sections (format limitation)
Shortest map 14 sections (Stage 5) 20 sections (Stage 5) 15 sections (Stage 2)
Longest map 143 sections (Stage 4) 95 sections (Stage 4) 40 sections (Stage 1 / 4 / Extra)

The most interesting part about stage tiles is probably the fact that some of the .MAP files contain unused tile sections. 👀 Many of these are empty, duplicates, or don't really make sense, but a few are unique, fit naturally into their respective stage, and might have been part of the map during development. In TH02, we can find three unused sections in Stage 5:

Section 0 of TH02's STAGE4.MAPSection 1 of TH02's STAGE4.MAPSection 2 of TH02's STAGE4.MAPSection 3 of TH02's STAGE4.MAPSection 4 of TH02's STAGE4.MAPSection 5 of TH02's STAGE4.MAPSection 6 of TH02's STAGE4.MAPSection 7 of TH02's STAGE4.MAP
The non-empty tile sections defined in TH02's STAGE4.MAP, showing off three unused ones.
These unused tile sections are much more common in the later games though, where we can find them in TH04's Stage 3, 4, and 5, and TH05's Stage 1, 2, and 4. I'll document those once I get to finalize the tile rendering code of these games, to leave some more content for that blog post. TH04/TH05 tile code would be quite an effective investment of your money in general, as most of it is identical across both games. Or how about going for a full-on PC-98 Touhou map viewer and editor GUI?


Compared to TH04 and TH05, TH02's stage tile code definitely feels like ZUN was just starting to understand how to pull off smooth vertical scrolling on a PC-98. As such, it comes with a few inefficiencies and suboptimal implementation choices:

Even though this was ZUN's first attempt at scrolling tiles, he already saw it fit to write most of the code in assembly. This was probably a reaction to all of TH01's performance issues, and the frame rate reduction workarounds he implemented to keep the game from slowing down too much in busy places. "If TH01 was all C++ and slow, TH02 better contain more ASM code, and then it will be fast, right?" :zunpet:
Another reason for going with ASM might be found in the kind of documentation that may have been available to ZUN. Last year, the PC-98 community discovered and scanned two new game programming tutorial books from 1991 (1, 2). Their example code is not only entirely written in assembly, but restricts itself to the bare minimum of x86 instructions that were available on the 8086 CPU used by the original PC-9801 model 9 years earlier. Such code is not only suboptimal on the 486, but can often be actually worse than what your C++ compiler would generate. TH02 is where the trend of bad hand-written ASM code started, and it 📝 only intensified in ZUN's later games. So, don't copy code from these books unless you absolutely want to target the earlier 8086 and 286 models. Which, 📝 as we've gathered from the recent blitting benchmark results, are not all too common among current real-hardware owners.
That said, all that ASM code really only impacts readability and maintainability. Apart from the aforementioned issues, the algorithms themselves are mostly fine – especially since most EGC and GRCG operations are decently batched this time around, in contrast to TH01.


Luckily, the tile functions merely use inline assembly within a typical C function and can therefore be at least part of a C++ source file, even if the result is pretty ugly. This time, we can actually be sure that they weren't written directly in a .ASM file, because they feature x86 instruction encodings that can only be generated with Turbo C++ 4.0J's inline assembler, not with TASM. The same can't unfortunately be said about the following function in the same segment, which marks the tiles covered by the spark sprites for redrawing. In this one, it took just one dumb hand-written ASM inconsistency in the function's epilog to make the entire function undecompilable.
The standard x86 instruction sequence to set up a stack frame in a function prolog looks like this:

PUSH	BP
MOV 	BP, SP
SUB 	SP, ?? ; if the function needs the stack for local variables
When compiling without optimizations, Turbo C++ 4.0J will replace this sequence with a single ENTER instruction. That one is two bytes smaller, but much slower on every x86 CPU except for the 80186 where it was introduced.
In functions without local variables, BP and SP remain identical, and a single POP BP is all that's needed in the epilog to tear down such a stack frame before returning from the function. Otherwise, the function needs an additional MOV SP, BP instruction to pop all local variables. With x86 being the helpful CISC architecture that it is, the 80186 also introduced the LEAVE instruction to perform both tasks. Unlike ENTER, this single instruction is faster than the raw two instructions on a lot of x86 CPUs (and even current ones!), and it's always smaller, taking up just 1 byte instead of 3.
So what if you use LEAVE even if your function doesn't use local variables? :thonk: The fact that the instruction first does the equivalent of MOV SP, BP doesn't matter if these registers are identical, and who cares about the additional CPU cycles of LEAVE compared to just POP BP, right? So that's definitely something you could theoretically do, but not something that any compiler would ever generate.

And so, TH02 MAIN.EXE decompilation already hits the first brick wall after two pushes. Awesome! :godzun: Theoretically, we could slowly mash through this wall using the 📝 code generator. But having such an inconsistency in the function epilog would mean that we'd have to keep Turbo C++ 4.0J from emitting any epilog or prolog code so that we can write our own. This means that we'd once again have to hide any use of the SI and DI registers from the compiler… and doing that requires code generation macros for 22 of the 49 instructions of the function in question, almost none of which we currently have. So, this gets quite silly quite fast, especially if we only need to do it for one single byte.

Instead, wouldn't it be much better if we had a separate build step between compile and link time that allowed us to replicate mistakes like these by just patching the compiled .OBJ files? These files still contain the names of exported functions for linking, which would allow us to look up the code of a function in a robust manner, navigate to specific instructions using a disassembler, replace them, and write the modified .OBJ back to disk before linking. Such a system could then naturally expand to cover all other decompilation issues, culminating in a full-on optimizer that could even recreate ZUN's self-modifying code. At that point, we would have sealed away all of ZUN's ugly ASM code within a separate build step, and could finally decompile everything into readable C++.

Pulling that off would require a significant tooling investment though. Patching that one byte in TH02's spark invalidation function could be done within 1 or 2 pushes, but that's just one issue, and we currently have 32 other .ASM files with undecompilable code. Also, note that this is fundamentally different from what we're doing with the debloated branch and the Anniversary Editions. Mistake patching would purely be about having readable code on master that compiles into ZUN's exact binaries, without fixing weird code. The Anniversary Editions go much further and rewrite such code in a much more fundamental way, improving it further than mistake patching ever could.
Right now, the Anniversary Editions seem much more popular, which suggests that people just want 100% RE as fast as possible so that I can start working on them. In that case, why bother with such undecompilable functions, and not just leave them in raw and unreadable x86 opcode form if necessary… :tannedcirno: But let's first see how much backer support there actually is for mistake patching before falling back on that.

The best part though: Once we've made a decision and then covered TH02's spark and particle systems, that was it, and we will have already RE'd all ZUN-written PC-98-specific blitting code in this game. Every further sprite or shape is rendered via master.lib, and is thus decently abstracted. Guess I'll need to update 📝 the assessment of which PC-98 Touhou game is the easiest to port, because it sure isn't TH01, as we've seen with all the work required for the first Anniversary Edition build.


Until then, there are still enough parts of the game that don't use any of the remaining few functions in the _TEXT segment. Previously, I mentioned in the 📝 status overview blog post that TH02 had a seemingly weird sprite system, but the spark and point popup (〇一二三四五六七八九十×÷) structures showed that the game just stores the current and previous position of its entities in a slightly different way compared to the rest of PC-98 Touhou. Instead of having dedicated structure fields, TH02 uses two-element arrays indexed with the active VRAM page. Same thing, and such a pattern even helps during RE since it's easy to spot once you know what to look for.
There's not much to criticize about the point popup system, except for maybe a landmine that causes sprite glitches when trying to display more than 99,990 points. Sadly, the final push in this delivery was rounded out by yet another piece of code at the opposite end of the quality spectrum. The particle and smear effects for Reimu's bomb animations consist almost entirely of assembly bloat, which would just be replaced with generic calls to the generic blitter in this game's future Anniversary Edition.

If I continue to decompile TH02 while avoiding the brick wall, items would be next, but they probably require two pushes. Next up, therefore: Integrating Stripe as an alternative payment provider into the order form. There have been at least three people who reported issues with PayPal, and Stripe has been working much better in tests. In the meantime, here's a temporary Stripe order link for everyone. This one is not connected to the cap yet, so please make sure to stay within whatever value is currently shown on the front page – I will treat any excess money as donations. :onricdennat: If there's some time left afterward, I might also add some small improvements to the TH01 Anniversary Edition.

📝 Posted:
🚚 Summary of:
P0207, P0208, P0209, P0210, P0211
Commits:
454c105...c26ef4b, c26ef4b...239a3ec, 239a3ec...5030867, 5030867...149fbca, 149fbca...d398a94
💰 Funded by:
GhostPhanom, Yanga, Arandui, Lmocinemod
🏷 Tags:

Whew, TH01's boss code just had to end with another beast of a boss, taking way longer than it should have and leaving uncomfortably little time for the rest of the game. Let's get right into the overview of YuugenMagan, the most sequential and scripted battle in this game:


At a pixel-perfect 81×61 pixels, the Orb hitboxes are laid out rather generously this time, reaching quite a bit outside the 64×48 eye sprites:

TH01 YuugenMagan's hitboxes.

And that's about the only positive thing I can say about a position calculation in this fight. Phase 0 already starts with the lasers being off by 1 pixel from the center of the iris. Sure, 28 may be a nicer number to add than 29, but the result won't be byte-aligned either way? This is followed by the eastern laser's hitbox somehow being 24 pixels larger than the others, stretching a rather unexpected 70 pixels compared to the 46 of every other laser.
On a more hilarious note, the eye closing keyframe contains the following (pseudo-)code, comprising the only real accidentally "unused" danmaku subpattern in TH01:

// Did you mean ">= RANK_HARD"?
if(rank == RANK_HARD) {
	eye_north.fire_aimed_wide_5_spread();
	eye_southeast.fire_aimed_wide_5_spread();
	eye_southwest.fire_aimed_wide_5_spread();

	// Because this condition can never be true otherwise.
	// As a result, no pellets will be spawned on Lunatic mode.
	// (There is another Lunatic-exclusive subpattern later, though.)
	if(rank == RANK_LUNATIC) {
		eye_west.fire_aimed_wide_5_spread();
		eye_east.fire_aimed_wide_5_spread();
	}
}

Featuring the weirdly extended hitbox for the eastern laser, as well as an initial Reimu position that points out the disparity between byte-aligned rendering and the internal coordinates one final time.

After a few utility functions that look more like a quickly abandoned refactoring attempt, we quickly get to the main attraction: YuugenMagan combines the entire boss script and most of the pattern code into a single 2,634-instruction function, totaling 9,677 bytes inside REIIDEN.EXE. For comparison, ReC98's version of this code consists of at least 49 functions, excluding those I had to add to work around ZUN's little inconsistencies, or the ones I added for stylistic reasons.
In fact, this function is so large that Turbo C++ 4.0J refuses to generate assembly output for it via the -S command-line option, aborting with a Compiler table limit exceeded in function error. Contrary to what the Borland C++ 4.0 User Guide suggests, this instance of the error is not at all related to the number of function bodies or any metric of algorithmic complexity, but is simply a result of the compiler's internal text representation for a single function overflowing a 64 KiB memory segment. Merely shortening the names of enough identifiers within the function can help to get that representation down below 64 KiB. If you encounter this error during regular software development, you might interpret it as the compiler's roundabout way of telling you that it inlined way more function calls than you probably wanted to have inlined. Because you definitely won't explicitly spell out such a long function in newly-written code, right? :tannedcirno:
At least it wasn't the worst copy-pasting job in this game; that trophy still goes to 📝 Elis. And while the tracking code for adjusting an eye's sprite according to the player's relative position is one of the main causes behind all the bloat, it's also 100% consistent, and might have been an inlined class method in ZUN's original code as well.

The clear highlight in this fight though? Almost no coordinate is precisely calculated where you'd expect it to be. In particular, all bullet spawn positions completely ignore the direction the eyes are facing to:

Pellets unexpectedly spawned at the exact
	bottom center of an eye
Combining the bottom of the pupil with the exact horizontal center of the sprite as a whole might sound like a good idea, but looks especially wrong if the eye is facing right.
Missile spawn positions in the TH01
	YuugenMagan fight
Here it's the other way round: OK for a right-facing eye, really wrong for a left-facing one.
Spawn position of the 3-pixel laser in the
	TH01 YuugenMagan fight
Dude, the eye is even supposed to track the laser in this one!
The final center position of the regular
	pentagram in the TH01 YuugenMagan fight
Hint: That's not the center of the playfield. At least the pellets spawned from the corners are sort of correct, but with the corner calculates precomputed, you could only get them wrong on purpose.

Due to their effect on gameplay, these inaccuracies can't even be called "bugs", and made me devise a new "quirk" category instead. More on that in the TH01 100% blog post, though.


While we did see an accidentally unused bullet pattern earlier, I can now say with certainty that there are no truly unused danmaku patterns in TH01, i.e., pattern code that exists but is never called. However, the code for YuugenMagan's phase 5 reveals another small piece of danmaku design intention that never shows up within the parameters of the original game.
By default, pellets are clipped when they fly past the top of the playfield, which we can clearly observe for the first few pellets of this pattern. Interestingly though, the second subpattern actually configures its pellets to fall straight down from the top of the playfield instead. You never see this happening in-game because ZUN limited that subpattern to a downwards angle range of 0x73 or 162°, resulting in none of its pellets ever getting close to the top of the playfield. If we extend that range to a full 360° though, we can see how ZUN might have originally planned the pattern to end:

YuugenMagan's phase 5 patterns on every difficulty, with the second subpattern extended to reveal the different pellet behavior that remained in the final game code. In the original game, the eyes would stop spawning bullets on the marked frame.

If we also disregard everything else about YuugenMagan that fits the upcoming definition of quirk, we're left with 6 "fixable" bugs, all of which are a symptom of general blitting and unblitting laziness. Funnily enough, they can all be demonstrated within a short 9-second part of the fight, from the end of phase 9 up until the pentagram starts spinning in phase 13:

  1. General flickering whenever any sprite overlaps an eye. This is caused by only reblitting each eye every 3 frames, and is an issue all throughout the fight. You might have already spotted it in the videos above.
  2. Each of the two lasers is unblitted and blitted individually instead of each operation being done for both lasers together. Remember how 📝 ZUN unblits 32 horizontal pixels for every row of a line regardless of its width? That's why the top part of the left, right-moving laser is never visible, because it's blitted before the other laser is unblitted.
  3. ZUN forgot to unblit the lasers when phase 9 ends. This footage was recorded by pressing ↵ Return in test mode (game t or game d), and it's probably impossible to achieve this during actual gameplay without TAS techniques. You would have to deal the required 6 points of damage within 491 frames, with the eye being invincible during 240 of them. Simply shooting up an Orb with a horizontal velocity of 0 would also only work a single time, as boss entities always repel the Orb with a horizontal velocity of ±4.
  4. The shrinking pentagram is unblitted after the eyes were blitted, adding another guaranteed frame of flicker on top of the ones in 1). Like in 2), the blockiness of the holes is another result of unblitting 32 pixels per row at a time.
  5. Another missing unblitting call in a phase transition, as the pentagram switches from its not quite correctly interpolated shrunk form to a regular star polygon with a radius of 64 pixels. Indirectly caused by the massively bloated coordinate calculation for the shrink animation being done separately for the unblitting and blitting calls. Instead of, y'know, just doing it once and storing the result in variables that can later be reused.
  6. The pentagram is not reblitted at all during the first 100 frames of phase 13. During that rather long time, it's easily possible to remove it from VRAM completely by covering its area with player shots. Or HARRY UP pellets.

Definitely an appropriate end for this game's entity blitting code. :onricdennat: I'm really looking forward to writing a proper sprite system for the Anniversary Edition…

And just in case you were wondering about the hitboxes of these pentagrams as they slam themselves into Reimu:

62 pixels on the X axis, centered around each corner point of the star, 16 pixels below, and extending infinitely far up. The latter part becomes especially devious because the game always collision-detects all 5 corners, regardless of whether they've already clipped through the bottom of the playfield. The simultaneously occurring shape distortions are simply a result of the line drawing function's rather poor re-interpolation of any line that runs past the 640×400 VRAM boundaries; 📝 I described that in detail back when I debugged the shootout laser crash. Ironically, using fixed-size hitboxes for a variable-sized pentagram means that the larger one is easier to dodge.


The final puzzle in TH01's boss code comes 📝 once again in the form of weird hardware palette changes. The kanji on the background image goes through various colors throughout the fight, which ZUN implemented by gradually incrementing and decrementing either a single one or none of the color's three 4-bit components at the beginning of each even-numbered phase. The resulting color sequence, however, doesn't quite seem to follow these simple rules:

Adding some debug output sheds light on what's going on there:

Since each iteration of phase 12 adds 63 to the red component, integer overflow will cause the color to infinitely alternate between dark-blue and red colors on every 2.03 iterations of the pentagram phase loop. The 65th iteration will therefore be the first one with a dark-blue color for a third iteration in a row – just in case you manage to stall the fight for that long.

Yup, ZUN had so much trust in the color clamping done by his hardware palette functions that he did not clamp the increment operation on the stage_palette itself. :zunpet: Therefore, the 邪 colors and even the timing of their changes from Phase 6 onwards are "defined" by wildly incrementing color components beyond their intended domain, so much that even the underlying signed 8-bit integer ends up overflowing. Given that the decrement operation on the stage_palette is clamped though, this might be another one of those accidents that ZUN deliberately left in the game, 📝 similar to the conclusion I reached with infinite bumper loops.
But guess what, that's also the last time we're going to encounter this type of palette component domain quirk! Later games use master.lib's 8-bit palette system, which keeps the comfort of using a single byte per component, but shifts the actual hardware color into the top 4 bits, leaving the bottom 4 bits for added precision during fades.

OK, but now we're done with TH01's bosses! 🎉That was the 8th PC-98 Touhou boss in total, leaving 23 to go.


With all the necessary research into these quirks going well into a fifth push, I spent the remaining time in that one with transferring most of the data between YuugenMagan and the upcoming rest of REIIDEN.EXE into C land. This included the one piece of technical debt in TH01 we've been carrying around since March 2015, as well as the final piece of the ending sequence in FUUIN.EXE. Decompiling that executable's main() function in a meaningful way requires pretty much all remaining data from REIIDEN.EXE to also be moved into C land, just in case you were wondering why we're stuck at 99.46% there.
On a more disappointing note, the static initialization code for the 📝 5 boss entity slots ultimately revealed why YuugenMagan's code is as bloated and redundant as it is: The 5 slots really are 5 distinct variables rather than a single 5-element array. That's why ZUN explicitly spells out all 5 eyes every time, because the array he could have just looped over simply didn't exist. 😕 And while these slot variables are stored in a contiguous area of memory that I could just have taken the address of and then indexed it as if it were an array, I didn't want to annoy future port authors with what would technically be out-of-bounds array accesses for purely stylistic reasons. At least it wasn't that big of a deal to rewrite all boss code to use these distinct variables, although I certainly had to get a bit creative with Elis.

Next up: Finding out how many points we got in totle, and hoping that ZUN didn't hide more unexpected complexities in the remaining 45 functions of this game. If you have to spare, there are two ways in which that amount of money would help right now:

📝 Posted:
🚚 Summary of:
P0174, P0175, P0176, P0177, P0178, P0179, P0180, P0181
Commits:
27f901c...a0fe812, a0fe812...40ac9a7, 40ac9a7...c5dc45b, c5dc45b...5f0cabc, 5f0cabc...60621f8, 60621f8...9e5b344, 9e5b344...091f19f, 091f19f...313450f
💰 Funded by:
Ember2528, Yanga
🏷 Tags:

Here we go, TH01 Sariel! This is the single biggest boss fight in all of PC-98 Touhou: If we include all custom effect code we previously decompiled, it amounts to a total of 10.31% of all code in TH01 (and 3.14% overall). These 8 pushes cover the final 8.10% (or 2.47% overall), and are likely to be the single biggest delivery this project will ever see. Considering that I only managed to decompile 6.00% across all games in 2021, 2022 is already off to a much better start!

So, how can Sariel's code be that large? Well, we've got:

In total, it's just under 3,000 lines of C++ code, containing a total of 8 definite ZUN bugs, 3 of them being subpixel/pixel confusions. That might not look all too bad if you compare it to the 📝 player control function's 8 bugs in 900 lines of code, but given that Konngara had 0… (Edit (2022-07-17): Konngara contains two bugs after all: A 📝 possible heap corruption in test or debug mode, and the infamous 📝 temporary green discoloration.) And no, the code doesn't make it obvious whether ZUN coded Konngara or Sariel first; there's just as much evidence for either.

Some terminology before we start: Sariel's first form is separated into four phases, indicated by different background images, that cycle until Sariel's HP reach 0 and the second, single-phase form starts. The danmaku patterns within each phase are also on a cycle, and the game picks a random but limited number of patterns per phase before transitioning to the next one. The fight always starts at pattern 1 of phase 1 (the random purple lasers), and each new phase also starts at its respective first pattern.


Sariel's bugs already start at the graphics asset level, before any code gets to run. Some of the patterns include a wand raise animation, which is stored in BOSS6_2.BOS:

TH01 BOSS6_2.BOS
Umm… OK? The same sprite twice, just with slightly different colors? So how is the wand lowered again?

The "lowered wand" sprite is missing in this file simply because it's captured from the regular background image in VRAM, at the beginning of the fight and after every background transition. What I previously thought to be 📝 background storage code has therefore a different meaning in Sariel's case. Since this captured sprite is fully opaque, it will reset the entire 128×128 wand area… wait, 128×128, rather than 96×96? Yup, this lowered sprite is larger than necessary, wasting 1,967 bytes of conventional memory.
That still doesn't quite explain the second sprite in BOSS6_2.BOS though. Turns out that the black part is indeed meant to unblit the purple reflection (?) in the first sprite. But… that's not how you would correctly unblit that?

VRAM after blitting the first sprite of TH01's BOSS6_2.BOS VRAM after blitting the second sprite of TH01's BOSS6_2.BOS

The first sprite already eats up part of the red HUD line, and the second one additionally fails to recover the seal pixels underneath, leaving a nice little black hole and some stray purple pixels until the next background transition. :tannedcirno: Quite ironic given that both sprites do include the right part of the seal, which isn't even part of the animation.


Just like Konngara, Sariel continues the approach of using a single function per danmaku pattern or custom entity. While I appreciate that this allows all pattern- and entity-specific state to be scoped locally to that one function, it quickly gets ugly as soon as such a function has to do more than one thing.
The "bird function" is particularly awful here: It's just one if(…) {…} else if(…) {…} else if(…) {…} chain with different branches for the subfunction parameter, with zero shared code between any of these branches. It also uses 64-bit floating-point double as its subpixel type… and since it also takes four of those as parameters (y'know, just in case the "spawn new bird" subfunction is called), every call site has to also push four double values onto the stack. Thanks to Turbo C++ even using the FPU for pushing a 0.0 constant, we have already reached maximum floating-point decadence before even having seen a single danmaku pattern. Why decadence? Every possible spawn position and velocity in both bird patterns just uses pixel resolution, with no fractional component in sight. And there goes another 720 bytes of conventional memory.

Speaking about bird patterns, the red-bird one is where we find the first code-level ZUN bug: The spawn cross circle sprite suddenly disappears after it finished spawning all the bird eggs. How can we tell it's a bug? Because there is code to smoothly fly this sprite off the playfield, that code just suddenly forgets that the sprite's position is stored in Q12.4 subpixels, and treats it as raw screen pixels instead. :zunpet: As a result, the well-intentioned 640×400 screen-space clipping rectangle effectively shrinks to 38×23 pixels in the top-left corner of the screen. Which the sprite is always outside of, and thus never rendered again.
The intended animation is easily restored though:

Sariel's third pattern, and the first to spawn birds, in its original and fixed versions. Note that I somewhat fixed the bird hatch animation as well: ZUN's code never unblits any frame of animation there, and simply blits every new one on top of the previous one.

Also, did you know that birds actually have a quite unfair 14×38-pixel hitbox? Not that you'd ever collide with them in any of the patterns…

Another 3 of the 8 bugs can be found in the symmetric, interlaced spawn rays used in three of the patterns, and the 32×32 debris "sprites" shown at their endpoint, at the edge of the screen. You kinda have to commend ZUN's attention to detail here, and how he wrote a lot of code for those few rapidly animated pixels that you most likely don't even notice, especially with all the other wrong pixels resulting from rendering glitches. One of the bugs in the very final pattern of phase 4 even turns them into the vortex sprites from the second pattern in phase 1 during the first 5 frames of the first time the pattern is active, and I had to single-step the blitting calls to verify it.
It certainly was annoying how much time I spent making sense of these bugs, and all weird blitting offsets, for just a few pixels… Let's look at something more wholesome, shall we?


So far, we've only seen the PC-98 GRCG being used in RMW (read-modify-write) mode, which I previously 📝 explained in the context of TH01's red-white HP pattern. The second of its three modes, TCR (Tile Compare Read), affects VRAM reads rather than writes, and performs "color extraction" across all 4 bitplanes: Instead of returning raw 1bpp data from one plane, a VRAM read will instead return a bitmask, with a 1 bit at every pixel whose full 4-bit color exactly matches the color at that offset in the GRCG's tile register, and 0 everywhere else. Sariel uses this mode to make sure that the 2×2 particles and the wind effect are only blitted on top of "air color" pixels, with other parts of the background behaving like a mask. The algorithm:

  1. Set the GRCG to TCR mode, and all 8 tile register dots to the air color
  2. Read N bits from the target VRAM position to obtain an N-bit mask where all 1 bits indicate air color pixels at the respective position
  3. AND that mask with the alpha plane of the sprite to be drawn, shifted to the correct start bit within the 8-pixel VRAM byte
  4. Set the GRCG to RMW mode, and all 8 tile register dots to the color that should be drawn
  5. Write the previously obtained bitmask to the same position in VRAM

Quite clever how the extracted colors double as a secondary alpha plane, making for another well-earned good-code tag. The wind effect really doesn't deserve it, though:

As far as I can tell, ZUN didn't use TCR mode anywhere else in PC-98 Touhou. Tune in again later during a TH04 or TH05 push to learn about TDW, the final GRCG mode!


Speaking about the 2×2 particle systems, why do we need three of them? Their only observable difference lies in the way they move their particles:

  1. Up or down in a straight line (used in phases 4 and 2, respectively)
  2. Left or right in a straight line (used in the second form)
  3. Left and right in a sinusoidal motion (used in phase 3, the "dark orange" one)

Out of all possible formats ZUN could have used for storing the positions and velocities of individual particles, he chose a) 64-bit / double-precision floating-point, and b) raw screen pixels. Want to take a guess at which data type is used for which particle system?

If you picked double for 1) and 2), and raw screen pixels for 3), you are of course correct! :godzun: Not that I'm implying that it should have been the other way round – screen pixels would have perfectly fit all three systems use cases, as all 16-bit coordinates are extended to 32 bits for trigonometric calculations anyway. That's what, another 1.080 bytes of wasted conventional memory? And that's even calculated while keeping the current architecture, which allocates space for 3×30 particles as part of the game's global data, although only one of the three particle systems is active at any given time.

That's it for the first form, time to put on "Civilization of Magic"! Or "死なばもろとも"? Or "Theme of 地獄めくり"? Or whatever SYUGEN is supposed to mean…


… and the code of these final patterns comes out roughly as exciting as their in-game impact. With the big exception of the very final "swaying leaves" pattern: After 📝 Q4.4, 📝 Q28.4, 📝 Q24.8, and double variables, this pattern uses… decimal subpixels? Like, multiplying the number by 10, and using the decimal one's digit to represent the fractional part? Well, sure, if you really insist on moving the leaves in cleanly represented integer multiples of ⅒, which is infamously impossible in IEEE 754. Aside from aesthetic reasons, it only really combines less precision (10 possible fractions rather than the usual 16) with the inferior performance of having to use integer divisions and multiplications rather than simple bit shifts. And it's surely not because the leaf sprites needed an extended integer value range of [-3276, +3276], compared to Q12.4's [-2047, +2048]: They are clipped to 640×400 screen space anyway, and are removed as soon as they leave this area.

This pattern also contains the second bug in the "subpixel/pixel confusion hiding an entire animation" category, causing all of BOSS6GR4.GRC to effectively become unused:

The "swaying leaves" pattern. ZUN intended a splash animation to be shown once each leaf "spark" reaches the top of the playfield, which is never displayed in the original game.

At least their hitboxes are what you would expect, exactly covering the 30×30 pixels of Reimu's sprite. Both animation fixes are available on the th01_sariel_fixes branch.

After all that, Sariel's main function turned out fairly unspectacular, just putting everything together and adding some shake, transition, and color pulse effects with a bunch of unnecessary hardware palette changes. There is one reference to a missing BOSS6.GRP file during the first→second form transition, suggesting that Sariel originally had a separate "first form defeat" graphic, before it was replaced with just the shaking effect in the final game.
Speaking about the transition code, it is kind of funny how the… um, imperative and concrete nature of TH01 leads to these 2×24 lines of straight-line code. They kind of look like ZUN rattling off a laundry list of subsystems and raw variables to be reinitialized, making damn sure to not forget anything.


Whew! Second PC-98 Touhou boss completely decompiled, 29 to go, and they'll only get easier from here! 🎉 The next one in line, Elis, is somewhere between Konngara and Sariel as far as x86 instruction count is concerned, so that'll need to wait for some additional funding. Next up, therefore: Looking at a thing in TH03's main game code – really, I have little idea what it will be!

Now that the store is open again, also check out the 📝 updated RE progress overview I've posted together with this one. In addition to more RE, you can now also directly order a variety of mods; all of these are further explained in the order form itself.