⮜ Blog

⮜ List of tags

Showing all posts tagged gameplay-

📝 Posted:
🚚 Summary of:
P0201, P0202
Commits:
9342665...ff49e9e, ff49e9e...4568bf7
💰 Funded by:
Ember2528, Yanga, [Anonymous]
🏷 Tags:
rec98+ th01+ gameplay- boss+ singyoku+ blitting+ glitch+ animation+ bullet+

The positive:

The negative:

The overview:


This time, we're back to the Orb hitbox being a logical 49×49 pixels in SinGyoku's center, and the shot hitbox being the weird one. What happens if you want the shot hitbox to be both offset to the left a bit and stretch the entire width of SinGyoku's sprite? You get a hitbox that ends in mid-air, far away from the right edge of the sprite:

Due to VRAM byte alignment, all player shots fired between gx = 376 and gx = 383 inclusive appear at the same visual X position, but are internally already partly outside the hitbox and therefore won't hit SinGyoku – compare gx = 376 to gx = 380. So much for precisely visualizing hitboxes in this game…

Since the female and male forms also use the sphere entity's coordinates, they share the same hitbox.


Onto the rendering glitches then, which can – you guessed it – all be found in the sphere form's slam movement:

By having the sphere move from the right edge of the playfield to the left, this video demonstrates both the lazy reblitting and broken unblitting at the right edge for negative X velocities. Also, isn't it funny how Reimu can partly disappear from all the sloppy SinGyoku-related unblitting going on after her sprite was blitted?

Due to the low contrast of the sphere against the background, you typically don't notice these glitches, but the white invincibility flashing after a hit really does draw attention to them. This time, all of these glitches aren't even directly caused by ZUN having never learned about the EGC's bit length register – if he just wrote correct code for SinGyoku, none of this would have been an issue. Sigh… I wonder how many more glitches will be caused by improper use of this one function in the last 18% of REIIDEN.EXE.

There's even another bug here, with ZUN hardcoding a horizontal delta of 8 pixels rather than just passing the actual X velocity. Luckily, the maximum movement speed is 6 pixels on Lunatic, and this would have only turned into an additional observable glitch if the X velocity were to exceed 24 pixels. But that just means it's the kind of bug that still drains RE attention to prove that you can't actually observe it in-game under some circumstances.


The 5 pellet patterns are all pretty straightforward, with nothing to talk about. The code architecture during phase 2 does hint towards ZUN having had more creative patterns in mind – especially for the male form, which uses the transformation function's three pattern callback slots for three repetitions of the same pellet group.
There is one more oddity to be found at the very end of the fight:

The first frame of TH01 SinGyoku's defeat animation, showing the sphere blitted on top of a potentially active person form

Right before the defeat white-out animation, the sphere form is explicitly reblitted for no reason, on top of the form that was blitted to VRAM in the previous frame, and regardless of which form is currently active. If SinGyoku was meant to immediately transform back to the sphere form before being defeated, why isn't the person form unblitted before then? Therefore, the visibility of both forms is undeniably canon, and there is some lore meaning to be found here… :thonk:
In any case, that's SinGyoku done! 6th PC-98 Touhou boss fully decompiled, 25 remaining.


No FUUIN.EXE code rounding out the last push for a change, as the 📝 remaining missile code has been waiting in front of SinGyoku for a while. It already looked bad in November, but the angle-based sprite selection function definitely takes the cake when it comes to unnecessary and decadent floating-point abuse in this game.
The algorithm itself is very trivial: Even with 📝 .PTN requiring an additional quarter parameter to access 16×16 sprites, it's essentially just one bit shift, one addition, and one binary AND. For whatever reason though, ZUN casts the 8-bit missile angle into a 64-bit double, which turns the following explicit comparisons (!) against all possible 4 + 16 boundary angles (!!) into FPU operations. :zunpet: Even with naive and readable division and modulo operations, and the whole existence of this function not playing well with Turbo C++ 4.0J's terrible code generation at all, this could have been 3 lines of code and 35 un-inlined constant-time instructions. Instead, we've got this 207-instruction monster… but hey, at least it works. 🤷
The remaining time then went to YuugenMagan's initialization code, which allowed me to immediately remove more declarations from ASM land, but more on that once we get to the rest of that boss fight.

That leaves 76 functions until we're done with TH01! Next up: Card-flipping stage obstacles.

📝 Posted:
🚚 Summary of:
P0198, P0199, P0200
Commits:
48db0b7...440637e, 440637e...5af2048, 5af2048...67e46b5
💰 Funded by:
Ember2528, Lmocinemod, Yanga
🏷 Tags:
rec98+ th01+ gameplay- boss+ kikuri+ blitting+ glitch+ danmaku-pattern+ cutscene+ pc98+ tcc+

What's this? A simple, straightforward, easy-to-decompile TH01 boss with just a few minor quirks and only two rendering-related ZUN bugs? Yup, 2½ pushes, and Kikuri was done. Let's get right into the overview:

So yeah, there's your new timeout challenge. :godzun:


The few issues in this fight all relate to hitboxes, starting with the main one of Kikuri against the Orb. The coordinates in the code clearly describe a hitbox in the upper center of the disc, but then ZUN wrote a < sign instead of a > sign, resulting in an in-game hitbox that's not quite where it was intended to be…

TODO TH01 Kikuri's intended hitboxTH01 Kikuri's actual hitbox
Kikuri's actual hitbox. Since the Orb sprite doesn't change its shape, we can visualize the hitbox in a pixel-perfect way here. The Orb must be completely within the red area for a hit to be registered.

Much worse, however, are the teardrop ripples. It already starts with their rendering routine, which places the sprites from TAMAYEN.PTN at byte-aligned VRAM positions in the ultimate piece of if(…) {…} else if(…) {…} else if(…) {…} meme code. Rather than tracking the position of each of the five ripple sprites, ZUN suddenly went purely functional and manually hardcoded the exact rendering and collision detection calls for each frame of the animation, based on nothing but its total frame counter. :zunpet:
Each of the (up to) 5 columns is also unblitted and blitted individually before moving to the next column, starting at the center and then symmetrically moving out to the left and right edges. This wouldn't be a problem if ZUN's EGC-powered unblitting function didn't word-align its X coordinates to a 16×1 grid. If the ripple sprites happen to start at an odd VRAM byte position, their unblitting coordinates get rounded both down and up to the nearest 16 pixels, thus touching the adjacent 8 pixels of the previously blitted columns and leaving the well-known black vertical bars in their place. :tannedcirno:

OK, so where's the hitbox issue here? If you just look at the raw calculation, it's a slightly confusingly expressed, but perfectly logical 17 pixels. But this is where byte-aligned blitting has a direct effect on gameplay: These ripples can be spawned at any arbitrary, non-byte-aligned VRAM position, and collisions are calculated relative to this internal position. Therefore, the actual hitbox is shifted up to 7 pixels to the right, compared to where you would expect it from a ripple sprite's on-screen position:

Due to the deterministic nature of this part of the fight, it's always 5 pixels for this first set of ripples. These visualizations are obviously not pixel-perfect due to the different potential, shapes of Reimu's sprite, so they instead relate to her 32×32 bounding box, which needs to be entirely inside the red area.

We've previously seen the same issue with the 📝 shot hitbox of Elis' bat form, where pixel-perfect collision detection against a byte-aligned sprite was merely a sidenote compared to the more serious X=Y coordinate bug. So why do I elevate it to bug status here? Because it directly affects dodging: Reimu's regular movement speed is 4 pixels per frame, and with the internal position of an on-screen ripple sprite varying by up to 7 pixels, any micrododging (or "grazing") attempt turns into a coin flip. It's sort of mitigated by the fact that Reimu is also only ever rendered at byte-aligned VRAM positions, but I wouldn't say that these two bugs cancel out each other.
Oh well, another set of rendering issues to be fixed in the hypothetical Anniversary Edition – obviously, the hitboxes should remain unchanged. Until then, you can always memorize the exact internal positions. The sequence of teardrop spawn points is completely deterministic and only controlled by the fixed per-difficulty spawn interval.


Aside from more minor coordinate inaccuracies, there's not much of interest in the rest of the pattern code. In another parallel to Elis though, the first soul pattern in phase 4 is aimed on every difficulty except Lunatic, where the pellets are once again statically fired downwards. This time, however, the pattern's difficulty is much more appropriately distributed across the four levels, with the simultaneous spinning circle pellets adding a constant aimed component to every difficulty level.

Kikuri's phase 4 patterns, on .


That brings us to 5 fully decompiled PC-98 Touhou bosses, with 26 remaining… and another ½ of a push going to the cutscene code in FUUIN.EXE.
You wouldn't expect something as mundane as the boss slideshow code to contain anything interesting, but there is in fact a slight bit of speculation fuel there. The text typing functions take explicit string lengths, which precisely match the corresponding strings… for the most part. For the "Gatekeeper 'SinGyoku'" string though, ZUN passed 23 characters, not 22. Could that have been the "h" from the Hepburn romanization of 神玉?!
Also, come on, if this text is already blitted to VRAM for no reason, you could have gone for perfect centering at unaligned byte positions; the rendering function would have perfectly supported it. Instead, the X coordinates are still rounded up to the nearest byte.

The hardcoded ending cutscene functions should be even less interesting – don't they just show a bunch of images followed by frame delays? Until they don't, and we reach the 地獄/Jigoku Bad Ending with its special shake/"boom" effect, and this picture:

Picture #2 from ED2A.GRP.

Which is rendered by the following code:

for(int i = 0; i <= boom_duration; i++) { // (yes, off-by-one)
	if((i & 3) == 0) {
		graph_scrollup(8);
	} else {
		graph_scrollup(0);
	}

	end_pic_show(1); // ← different picture is rendered
	frame_delay(2);  // ← blocks until 2 VSync interrupts have occurred

	if(i & 1) {
		end_pic_show(2); // ← picture above is rendered
	} else {
		end_pic_show(1);
	}
}

Notice something? You should never see this picture because it's immediately overwritten before the frame is supposed to end. And yet it's clearly flickering up for about one frame with common emulation settings as well as on my real PC-9821 Nw133, clocked at 133 MHz. master.lib's graph_scrollup() doesn't block until VSync either, and removing these calls doesn't change anything about the blitted images. end_pic_show() uses the EGC to blit the given 320×200 quarter of VRAM from page 1 to the visible page 0, so the bottleneck shouldn't be there either…

…or should it? After setting it up via a few I/O port writes, the common method of EGC-powered blitting works like this:

  1. Read 16 bits from the source VRAM position on any single bitplane. This fills the EGC's 4 16-bit tile registers with the VRAM contents at that specific position on every bitplane. You do not care about the value the CPU returns from the read – in optimized code, you would make sure to just read into a register to avoid useless additional stores into local variables.
  2. Write any 16 bits to the target VRAM position on any single bitplane. This copies the contents of the EGC's tile registers to that specific position on every bitplane.

To transfer pixels from one VRAM page to another, you insert an additional write to I/O port 0xA6 before 1) and 2) to set your source and destination page… and that's where we find the bottleneck. Taking a look at the i486 CPU and its cycle counts, a single one of these page switches costs 17 cycles – 1 for MOVing the page number into AL, and 16 for the OUT instruction itself. Therefore, the 8,000 page switches required for EGC-copying a 320×200-pixel image require 136,000 cycles in total.

And that's the optimal case of using only those two instructions. 📝 As I implied last time, TH01 uses a function call for VRAM page switches, complete with creating and destroying a useless stack frame and unnecessarily updating a global variable in main memory. I tried optimizing ZUN's code by throwing out unnecessary code and using 📝 pseudo-registers to generate probably optimal assembly code, and that did speed up the blitting to almost exactly 50% of the original version's run time. However, it did little about the flickering itself. Here's a comparison of the first loop with boom_duration = 16, recorded in DOSBox-X with cputype=auto and cycles=max, and with i overlaid using the text chip. Caution, flashing lights:

(Note how the background of the ドカーン image is shifted 1 pixel to the left compared to pic #1.)

I pushed the optimized code to the th01_end_pic_optimize branch, to also serve as an example of how to get close to optimal code out of Turbo C++ 4.0J without writing a single ASM instruction.
And if you really want to use the EGC for this, that's the best you can do. It really sucks that it merely expanded the GRCG's 4×8-bit tile register to 4×16 bits. With 32 bits, ≥386 CPUs could have taken advantage of their wider registers and instructions to double the blitting performance. Instead, we now know the reason why 📝 Promisence Soft's EGC-powered sprite driver that ZUN later stole for TH03 is called SPRITE16 and not SPRITE32. What a massive disappointment.

But what's perhaps a bigger surprise: Blitting planar images from main memory is much faster than EGC-powered inter-page VRAM copies, despite the required manual access to all 4 bitplanes. In fact, the blitting functions for the .CDG/.CD2 format, used from TH03 onwards, would later demonstrate the optimal method of using REP MOVSD for blitting every line in 32-pixel chunks. If that was also used for these ending images, the core blitting operation would have taken ((12 + (3 × (320 / 32))) × 200 × 4) = 33,600 cycles, with not much more overhead for the surrounding row and bitplane loops. Sure, this doesn't factor in the whole infamous issue of VRAM being slow on PC-98, but the aforementioned 136,000 cycles don't even include any actual blitting either. And as you move up to later PC-98 models with Pentium CPUs, the gap between OUT and REP MOVSD only becomes larger. (Note that the page I linked above has a typo in the cycle count of REP MOVSD on Pentium CPUs: According to the original Intel Architecture and Programming Manual, it's 13+𝑛, not 3+𝑛.)
This difference explains why later games rarely use EGC-"accelerated" inter-page VRAM copies, and keep all of their larger images in main memory. It especially explains why TH04 and TH05 can get away with naively redrawing boss backdrop images on every frame.

In the end, the whole fact that ZUN did not define how long this image should be visible is enough for me to increment the game's overall bug counter. Who would have thought that looking at endings of all things would teach us a PC-98 performance lesson… Sure, optimizing TH01 already seemed promising just by looking at its bloated code, but I had no idea that its performance issues extended so far past that level.

That only leaves the common beginning part of all endings and a short main() function before we're done with FUUIN.EXE, and 98 functions until all of TH01 is decompiled! Next up: SinGyoku, who not only is the quickest boss to defeat in-game, but also comes with the least amount of code. See you very soon!

📝 Posted:
🚚 Summary of:
P0193, P0194, P0195, P0196, P0197
Commits:
e1f3f9f...183d7a2, 183d7a2...5d93a50, 5d93a50...e18c53d, e18c53d...57c9ac5, 57c9ac5...48db0b7
💰 Funded by:
Ember2528, Yanga
🏷 Tags:
rec98+ th01+ gameplay- boss+ elis+ mima-th01+ rng+ danmaku-pattern+ bug+ jank+ glitch+ unused+ pc98+ blitting+ mod+ score+

With Elis, we've not only reached the midway point in TH01's boss code, but also a bunch of other milestones: Both REIIDEN.EXE and TH01 as a whole have crossed the 75% RE mark, and overall position independence has also finally cracked 80%!

And it got done in 4 pushes again? Yup, we're back to 📝 Konngara levels of redundancy and copy-pasta. This time, it didn't even stop at the big copy-pasted code blocks for the rift sprite and 256-pixel circle animations, with the words "redundant" and "unnecessary" ending up a total of 18 times in my source code comments.
But damn is this fight broken. As usual with TH01 bosses, let's start with a high-level overview:

This puts the earliest possible end of the fight at the first frame of phase 5. However, nothing prevents Elis' HP from reaching 0 before that point. You can nicely see this in 📝 debug mode: Wait until the HP bar has filled up to avoid heap corruption, hold ↵ Return to reduce her HP to 0, and watch how Elis still goes through a total of two patterns* and four teleport animations before accepting defeat.

But wait, heap corruption? Yup, there's a bug in the HP bar that already affected Konngara as well, and it isn't even just about the graphical glitches generated by negative HP:

Since Elis starts with 14 HP, which is an even number, this corruption is trivial to cause: Simply hold ↵ Return from the beginning of the fight, and the completion condition will never be true, as the HP and frame numbers run past the off-by-one meeting point.

Regular gameplay, however, entirely prevents this due to the fixed start positions of Reimu and the Orb, the Orb's fixed initial trajectory, and the 50 frames of delay until a bomb deals damage to a boss. These aspects make it impossible to hit Elis within the first 14 frames of phase 1, and ensure that her HP bar is always filled up completely. So ultimately, this bug ends up comparable in seriousness to the 📝 recursion / stack overflow bug in the memory info screen.


These wavy teleport animations point to a quite frustrating architectural issue in this fight. It's not even the fact that unblitting the yellow star sprites rips temporary holes into Elis' sprite; that's almost expected from TH01 at this point. Instead, it's all because of this unused frame of the animation:

An unused wave animation frame from TH01's BOSS5.BOS

With this sprite still being part of BOSS5.BOS, Girl-Elis has a total of 9 animation frames, 1 more than the 📝 8 per-entity sprites allowed by ZUN's architecture. The quick and easy solution would have been to simply bump the sprite array size by 1, but… nah, this would have added another 20 bytes to all 6 of the .BOS image slots. :zunpet: Instead, ZUN wrote the manual position synchronization code I mentioned in that 2020 blog post. Ironically, he then copy-pasted this snippet of code often enough that it ended up taking up more than 120 bytes in the Elis fight alone – with, you guessed it, some of those copies being redundant. Not to mention that just going from 8 to 9 sprites would have allowed ZUN to go down from 6 .BOS image slots to 3. That would have actually saved 420 bytes in addition to the manual synchronization trouble. Looking forward to SinGyoku, that's going to be fun again…


As for the fight itself, it doesn't take long until we reach its most janky danmaku pattern, right in phase 1:

The "pellets along circle" pattern on Lunatic, in its original version and with fanfiction fixes for everything that can potentially be interpreted as a bug.

Then again, it might very well be that all of this was intended, or, most likely, just left in the game as a happy accident. The latter interpretation would explain why ZUN didn't just delete the rendering calls for the lower-right quarter of the circle, because seriously, how would you not spot that? The phase 3 patterns continue with more minor graphical glitches that aren't even worth talking about anymore.


And then Elis transforms into her bat form at the beginning of Phase 5, which displays some rather unique hitboxes. The one against the Orb is fine, but the one against player shots…

… uses the bat's X coordinate for both X and Y dimensions. :zunpet: In regular gameplay, it's not too bad as most of the bat patterns fire aimed pellets which typically don't allow you to move below her sprite to begin with. But if you ever tried destroying these pellets while standing near the middle of the playfield, now you know why that didn't work. This video also nicely points out how the bat, like any boss sprite, is only ever blitted at positions on the 8×1-pixel VRAM byte grid, while collision detection uses the actual pixel position.

The bat form patterns are all relatively simple, with little variation depending on the difficulty level, except for the "slow pellet spreads" pattern. This one is almost easiest to dodge on Lunatic, where the 5-spreads are not only always fired downwards, but also at the hardcoded narrow delta angle, leaving plenty of room for the player to move out of the way:

The "slow pellet spreads" pattern of Elis' bat form, on . Which version do you think is the easiest one?

Finally, we've got another potential timesave in the girl form's "safety circle" pattern:

After the circle spawned completely, you lose a life by moving outside it, but doing that immediately advances the pattern past the circle part. This part takes 200 frames, but the defeat animation only takes 82 frames, so you can save up to 118 frames there.

Final funny tidbit: As with all dynamic entities, this circle is only blitted to VRAM page 0 to allow easy unblitting. However, it's also kind of static, and there needs to be some way to keep the Orb, the player shots, and the pellets from ripping holes into it. So, ZUN just re-blits the circle every… 4 frames?! 🤪 The same is true for the Star of David and its surrounding circle, but there you at least get a flash animation to justify it. All the overlap is actually quite a good reason for not even attempting to 📝 mess with the hardware color palette instead.


And that's the 4th PC-98 Touhou boss decompiled, 27 to go… but wait, all these quirks, and I still got nothing about the one actual crash that can appear in regular gameplay? There has even been a recent video about it. The cause has to be in Elis' main function, after entering the defeat branch and before the blocking white-out animation. It can't be anywhere else other than in the 📝 central line blitting and unblitting function, called from 📝 that one broken laser reset+unblit function, because everything else in that branch looks fine… and I think we can rule out a crash in MDRV2's non-blocking fade-out call. That's going to need some extra research, and a 5th push added on top of this delivery.

Reproducing the crash was the whole challenge here. Even after moving Elis and Reimu to the exact positions seen in Pearl's video and setting Elis' HP to 0 on the exact same frame, everything ran fine for me. It's definitely no division by 0 this time, the function perfectly guards against that possibility. The line specified in the function's parameters is always clipped to the VRAM region as well, so we can also rule out illegal memory accesses here…

… or can we? Stepping through it all reminded me of how this function brings unblitting sloppiness to the next level: For each VRAM byte touched, ZUN actually unblits the 4 surrounding bytes, adding one byte to the left and two bytes to the right, and using a single 32-bit read and write per bitplane. So what happens if the function tries to unblit the topmost byte of VRAM, covering the pixel positions from (0, 0) to (7, 0) inclusive? The VRAM offset of 0x0000 is decremented to 0xFFFF to cover the one byte to the left, 4 bytes are written to this address, the CPU's internal offset overflows… and as it turns out, that is illegal even in Real Mode as of the 80286, and will raise a General Protection Fault. Which is… ignored by DOSBox-X, every Neko Project II version in common use, the CSCP emulators, SL9821, and T98-Next. Only Anex86 accurately emulates the behavior of real hardware here.

OK, but no laser fired by Elis ever reaches the top-left corner of the screen. How can such a fault even happen in practice? That's where the broken laser reset+unblit function comes in: Not only does it just flat out pass the wrong parameters to the line unblitting function – describing the line already traveled by the laser and stopping where the laser begins – but it also passes them wrongly, in the form of raw 32-bit fixed-point Q24.8 values, with no conversion other than a truncation to the signed 16-bit pixels expected by the function. What then follows is an attempt at interpolation and clipping to find a line segment between those garbage coordinates that actually falls within the boundaries of VRAM:

  1. right/bottom correspond to a laser's origin position, and left/top to the leftmost pixel of its moved-out top line. The bug therefore only occurs with lasers that stopped growing and have started moving.
  2. Moreover, it will only happen if either (left % 256) or (right % 256) is ≤ 127 and the other one of the two is ≥ 128. The typecast to signed 16-bit integers then turns the former into a large positive value and the latter into a large negative value, triggering the function's clipping code.
  3. The function then follows Bresenham's algorithm: left is ensured to be smaller than right by swapping the two values if necessary. If that happened, top and bottom are also swapped, regardless of their value – the algorithm does not care about their order.
  4. The slope in the X dimension is calculated using an integer division of ((bottom - top) / (right - left)). Both subtractions are done on signed 16-bit integers, and overflow accordingly.
  5. (-left × slope_x) is added to top, and left is set to 0.
  6. If both top and bottom are < 0 or ≥ 640, there's nothing to be unblitted. Otherwise, the final coordinates are clipped to the VRAM range of [(0, 0), (639, 399)].
  7. If the function got this far, the line to be unblitted is now very likely to reach from
    1. the top-left to the bottom-right corner, starting out at (0, 0) right away, or
    2. from the bottom-left corner to the top-right corner. In this case, you'd expect unblitting to end at (639, 0), but thanks to an off-by-one error, it actually ends at (640, -1), which is equivalent to (0, 0). Why add clipping to VRAM offset calculations when everything else is clipped already, right? :godzun:
Possible laser states that will cause the fault, with some debug output to help understand the cause, and any pellets removed for better readability. This can happen for all bosses that can potentially have shootout lasers on screen when being defeated, so it also applies to Mima. Fixing this is easier than understanding why it happens, but since y'all love reading this stuff…

tl;dr: TH01 has a high chance of freezing at a boss defeat sequence if there are diagonally moving lasers on screen, and if your PC-98 system raises a General Protection Fault on a 4-byte write to offset 0xFFFF, and if you don't run a TSR with an INT 0Dh handler that might handle this fault differently.

The easiest fix option would be to just remove the attempted laser unblitting entirely, but that would also have an impact on this game's… distinctive visual glitches, in addition to touching a whole lot of code bytes. If I ever get funded to work on a hypothetical TH01 Anniversary Edition that completely rearchitects the game to fix all these glitches, it would be appropriate there, but not for something that purports to be the original game.

(Sidenote to further hype up this Anniversary Edition idea for PC-98 hardware owners: With the amount of performance left on the table at every corner of this game, I'm pretty confident that we can get it to work decently on PC-98 models with just an 80286 CPU.)

Since we're in critical infrastructure territory once again, I went for the most conservative fix with the least impact on the binary: Simply changing any VRAM offsets >= 0xFFFD to 0x0000 to avoid the GPF, and leave all other bugs in place. Sure, it's rather lazy and "incorrect"; the function still unblits a 32-pixel block there, but adding a special case for blitting 24 pixels would add way too much code. And seriously, it's not like anything happens in the 8 pixels between (24, 0) and (31, 0) inclusive during gameplay to begin with. To balance out the additional per-row if() branch, I inlined the VRAM page change I/O, saving two function calls and one memory write per unblitted row.

That means it's time for a new community_choice_fixes build, containing the new definitive bugfixed versions of these games: 2022-05-31-community-choice-fixes.zip Check the th01_critical_fixes branch for the modified TH01 code. It also contains a fix for the HP bar heap corruption in debug mode – simply changing the == comparison to <= is enough to avoid it, and negative HP will still create aesthetic glitch art.


Once again, I then was left with ½ of a push, which I finally filled with some FUUIN.EXE code, specifically the verdict screen. The most interesting part here is the player title calculation, which is quite sneaky: There are only 6 skill levels, but three groups of titles for each level, and the title you'll see is picked from a random group. It looks like this is the first time anyone has documented the calculation?
As for the levels, ZUN definitely didn't expect players to do particularly well. With a 1cc being the standard goal for completing a Touhou game, it's especially funny how TH01 expects you to continue a lot: The code has branches for up to 21 continues, and the on-screen table explicitly leaves room for 3 digits worth of continues per 5-stage scene. Heck, these counts are even stored in 32-bit long variables.

Next up: 📝 Finally finishing the long overdue Touhou Patch Center MediaWiki update work, while continuing with Kikuri in the meantime. Originally I wasn't sure about what to do between Elis and Seihou, but with Ember2528's surprise contribution last week, y'all have demonstrated more than enough interest in the idea of getting TH01 done sooner rather than later. And I agree – after all, we've got the 25th anniversary of its first public release coming up on August 15, and I might still manage to completely decompile this game by that point…

📝 Posted:
🚚 Summary of:
P0190, P0191, P0192
Commits:
5734815...293e16a, 293e16a...71cb7b5, 71cb7b5...e1f3f9f
💰 Funded by:
nrook, -Tom-, [Anonymous]
🏷 Tags:
rec98+ th02+ th03+ th04+ th05+ gameplay- boss+ shinki+ danmaku-pattern+ midboss+ master.lib+ file-format+ tcc+ pc98+ micro-optimization+

The important things first:

So, Shinki! As far as final boss code is concerned, she's surprisingly economical, with 📝 her background animations making up more than ⅓ of her entire code. Going straight from TH01's 📝 final 📝 bosses to TH05's final boss definitely showed how much ZUN had streamlined danmaku pattern code by the end of PC-98 Touhou. Don't get me wrong, there is still room for improvement: TH05 not only 📝 reuses the same 16 bytes of generic boss state we saw in TH04 last month, but also uses them 4× as often, and even for midbosses. Most importantly though, defining danmaku patterns using a single global instance of the group template structure is just bad no matter how you look at it:

Declaring a separate structure instance with the static data for every pattern would be both safer and more space-efficient, and there's more than enough space left for that in the game's data segment.
But all in all, the pattern functions are short, sweet, and easy to follow. The "devil" pattern is significantly more complex than the others, but still far from TH01's final bosses at their worst. I especially like the clear architectural separation between "one-shot pattern" functions that return true once they're done, and "looping pattern" functions that run as long as they're being called from a boss's main function. Not many all too interesting things in these pattern functions for the most part, except for two pieces of evidence that Shinki was coded after Yumeko:


Speaking about that wing sprite: If you look at ST05.BB2 (or any other file with a large sprite, for that matter), you notice a rather weird file layout:

Raw file layout of TH05's ST05.BB2, demonstrating master.lib's supposed BFNT width limit of 64 pixels
A large sprite split into multiple smaller ones with a width of 64 pixels each? What's this, hardware sprite limitations? On my PC-98?!

And it's not a limitation of the sprite width field in the BFNT+ header either. Instead, it's master.lib's BFNT functions which are limited to sprite widths up to 64 pixels… or at least that's what MASTER.MAN claims. Whatever the restriction was, it seems to be completely nonexistent as of master.lib version 0.23, and none of the master.lib functions used by the games have any issues with larger sprites.
Since ZUN stuck to the supposed 64-pixel width limit though, it's now the game that expects Shinki's winged form to consist of 4 physical sprites, not just 1. Any conversion from another, more logical sprite sheet layout back into BFNT+ must therefore replicate the original number of sprites. Otherwise, the sequential IDs ("patnums") assigned to every newly loaded sprite no longer match ZUN's hardcoded IDs, causing the game to crash. This is exactly what used to happen with -Tom-'s MysticTK automation scripts, which combined these exact sprites into a single large one. This issue has now been fixed – just in case there are some underground modders out there who used these scripts and wonder why their game crashed as soon as the Shinki fight started.


And then the code quality takes a nosedive with Shinki's main function. :onricdennat: Even in TH05, these boss and midboss update functions are still very imperative:

The biggest WTF in there, however, goes to using one of the 16 state bytes as a "relative phase" variable for differentiating between boss phases that share the same branch within the switch(boss.phase) statement. While it's commendable that ZUN tried to reduce code duplication for once, he could have just branched depending on the actual boss.phase variable? The same state byte is then reused in the "devil" pattern to track the activity state of the big jerky lasers in the second half of the pattern. If you somehow managed to end the phase after the first few bullets of the pattern, but before these lasers are up, Shinki's update function would think that you're still in the phase before the "devil" pattern. The main function then sequence-breaks right to the defeat phase, skipping the final pattern with the burning Makai background. Luckily, the HP boundaries are far away enough to make this impossible.
The takeaway here: If you want to use the state bytes for your custom boss script mods, alias them to your own 16-byte structure, and limit each of the bytes to a clearly defined meaning across your entire boss script.

One final discovery that doesn't seem to be documented anywhere yet: Shinki actually has a hidden bomb shield during her two purple-wing phases. uth05win got this part slightly wrong though: It's not a complete shield, and hitting Shinki will still deal 1 point of chip damage per frame. For comparison, the first phase lasts for 3,000 HP, and the "devil" pattern phase lasts for 5,800 HP.

And there we go, 3rd PC-98 Touhou boss script* decompiled, 28 to go! 🎉 In case you were expecting a fix for the Shinki death glitch: That one is more appropriately fixed as part of the Mai & Yuki script. It also requires new code, should ideally look a bit prettier than just removing cheetos between one frame and the next, and I'd still like it to fit within the original position-dependent code layout… Let's do that some other time.
Not much to say about the Stage 1 midboss, or midbosses in general even, except that their update functions have to imperatively handle even more subsystems, due to the relative lack of helper functions.


The remaining ¾ of the third push went to a bunch of smaller RE and finalization work that would have hardly got any attention otherwise, to help secure that 50% RE mark. The nicest piece of code in there shows off what looks like the optimal way of setting up the 📝 GRCG tile register for monochrome blitting in a variable color:

mov ah, palette_index ; Any other non-AL 8-bit register works too.
                      ; (x86 only supports AL as the source operand for OUTs.)

rept 4                ; For all 4 bitplanes…
    shr ah,  1        ; Shift the next color bit into the x86 carry flag
    sbb al,  al       ; Extend the carry flag to a full byte
                      ; (CF=0 → 0x00, CF=1 → 0xFF)
    out 7Eh, al       ; Write AL to the GRCG tile register
endm

Thanks to Turbo C++'s inlining capabilities, the loop body even decompiles into a surprisingly nice one-liner. What a beautiful micro-optimization, at a place where micro-optimization doesn't hurt and is almost expected.
Unfortunately, the micro-optimizations went all downhill from there, becoming increasingly dumb and undecompilable. Was it really necessary to save 4 x86 instructions in the highly unlikely case of a new spark sprite being spawned outside the playfield? That one 2D polar→Cartesian conversion function then pointed out Turbo C++ 4.0J's woefully limited support for 32-bit micro-optimizations. The code generation for 32-bit 📝 pseudo-registers is so bad that they almost aren't worth using for arithmetic operations, and the inline assembler just flat out doesn't support anything 32-bit. No use in decompiling a function that you'd have to entirely spell out in machine code, especially if the same function already exists in multiple other, more idiomatic C++ variations.
Rounding out the third push, we got the TH04/TH05 DEMO?.REC replay file reading code, which should finally prove that nothing about the game's original replay system could serve as even just the foundation for community-usable replays. Just in case anyone was still thinking that.


Next up: Back to TH01, with the Elis fight! Got a bit of room left in the cap again, and there are a lot of things that would make a lot of sense now:

📝 Posted:
🚚 Summary of:
P0189
Commits:
22abdd1...b4876b6
💰 Funded by:
Arandui, Lmocinemod
🏷 Tags:
rec98+ th04+ th05+ gameplay- boss+ kurumi+ marisa-4+ bug+ danmaku-pattern+ mod+ sara+ blitting+ animation+

(Before we start: Make sure you've read the current version of the FAQ section on a potential takedown of this project, updated in light of the recent DMCA claims against PC-98 Touhou game downloads.)


Slight change of plans, because we got instructions for reliably reproducing the TH04 Kurumi Divide Error crash! Major thanks to Colin Douglas Howell. With those, it also made sense to immediately look at the crash in the Stage 4 Marisa fight as well. This way, I could release both of the obligatory bugfix mods at the same time.
Especially since it turned out that I was wrong: Both crashes are entirely unrelated to the custom entity structure that would have required PI-centric progress. They are completely specific to Kurumi's and Marisa's danmaku-pattern code, and really are two separate bugs with no connection to each other. All of the necessary research nicely fit into Arandui's 0.5 pushes, with no further deep understanding required here.

But why were there still three weeks between Colin's message and this blog post? DMCA distractions aside: There are no easy fixes this time, unlike 📝 back when I looked at the Stage 5 Yuuka crash. Just like how division by zero is undefined in mathematics, it's also, literally, undefined what should happen instead of these two Divide error crashes. This means that any possible "fix" can only ever be a fanfiction interpretation of the intentions behind ZUN's code. The gameplay community should be aware of this, and might decide to handle these cases differently. And if we have to go into fanfiction territory to work around crashes in the canon games, we'd better document what exactly we're fixing here and how, as comprehensible as possible.


With that out of the way, let's look at Kurumi's crash first, since it's way easier to grasp. This one is known to primarily happen to new players, and it's easy to see why:


The pattern that causes the crash in Kurumi's fight. Also demonstrates how the number of bullets in a ring is always halved on Easy Mode after the rank-based tuning, leading to just a 3-ring on playperf = 16.

So, what should the workaround look like? Obviously, we want to modify neither the default number of ring bullets nor the tuning algorithm – that would change all other non-crashing variations of this pattern on other difficulties and ranks, creating a fork of the original gameplay. Instead, I came up with four possible workarounds that all seemed somewhat logical to me:

  1. Firing no bullet, i.e., interpreting 0-ring literally. This would create the only constellation in which a call to the bullet group spawn functions would not spawn at least one new bullet.
  2. Firing a "1-ring", i.e., a single bullet. This would be consistent with how the bullet spawn functions behave for "0-way" stack and spread groups.
  3. Firing a "∞-ring", i.e., 200 bullets, which is as much as the game's cap on 16×16 bullets would allow. This would poke fun at the whole "division by zero" idea… but given that we're still talking about Easy Mode (and especially new players) here, it might be a tad too cruel. Certainly the most trollish interpretation.
  4. Triggering an immediate Game Over, exchanging the hard crash for a softer and more controlled shutdown. Certainly the option that would be closest to the behavior of the original games, and perhaps the only one to be accepted in Serious, High-Level Play™.

As I was writing this post, it felt increasingly wrong for me to make this decision. So I once again went to Twitter, where 56.3% voted in favor of the 1-bullet option. Good that I asked! I myself was more leaning towards the 0-bullet interpretation, which only got 28.7% of the vote. Also interesting are the 2.3% in favor of the Game Over option but I get it, low-rank Easy Mode isn't exactly the most competitive mode of playing TH04.
There are reports of Kurumi crashing on higher difficulties as well, but I could verify none of them. If they aren't fixed by this workaround, they're caused by an entirely different bug that we have yet to discover.


Onto the Stage 4 Marisa crash then, which does in fact apply to all difficulty levels. I was also wrong on this one – it's a hell of a lot more intricate than being just a division by the number of on-screen bits. Without having decompiled the entire fight, I can't give a completely accurate picture of what happens there yet, but here's the rough idea:

Reference points for Marisa's point-reflected movement. Cyan: Marisa's position, green: (192, 112), yellow: the intended end point.

One of the two patterns in TH04's Stage 4 Marisa boss fight that feature frame number-dependent point-reflected movement. The bits were hacked to self-destruct on .

tl;dr: "Game crashes if last bit destroyed within 4-frame window near end of two patterns". For an informed decision on a new movement behavior for these last 8 frames, we definitely need to know all the details behind the crash though. Here's what I would interpret into the code:

  1. Not moving at all, i.e., interpreting 0 as the middle ground between positive and negative movement. This would also make sense because a 12-frame duration implies 100% of the movement to consist of the braking phase – and Marisa wasn't moving before, after all.
  2. Move at maximum speed, i.e., dividing by 1 rather than 0. Since the movement duration is still 12 in this case, Marisa will immediately start braking. In total, she will move exactly ¾ of the way from her initial position to (192, 112) within the 8 frames before the pattern ends.
  3. Directly warping to (192, 112) on frame 0, and to the point-reflected target on 4, respectively. This "emulates" the division by zero by moving Marisa at infinite speed to the exact two points indicated by the velocity formula. It also fits nicely into the 8 frames we have to fill here. Sure, Marisa can't reach these points at any other duration, but why shouldn't she be able to, with infinite speed? Then again, if Marisa is far away enough from (192, 112), this workaround would warp her across the entire playfield. Can Marisa teleport according to lore? I have no idea… :tannedcirno:
  4. Triggering an immediate Game O– hell no, this is the Stage 4 boss, people already hate losing runs to this bug!

Asking Twitter worked great for the Kurumi workaround, so let's do it again! Gotta attach a screenshot of an earlier draft of this blog post though, since this stuff is impossible to explain in tweets…

…and it went through the roof, becoming the most successful ReC98 tweet so far?! Apparently, y'all really like to just look at descriptions of overly complex bugs that I'd consider way beyond the typical attention span that can be expected from Twitter. Unfortunately, all those tweet impressions didn't quite translate into poll turnout. The results were pretty evenly split between 1) and 2), with option 1) just coming out slightly ahead at 49.1%, compared to 41.5% of option 2).

(And yes, I only noticed after creating the poll that warping to both the green and yellow points made more sense than warping to just one of the two. Let's hope that this additional variant wouldn't have shifted the results too much. Both warp options only got 9.4% of the vote after all, and no one else came up with the idea either. :onricdennat: In the end, you can always merge together your preferred combination of workarounds from the Git branches linked below.)


So here you go: The new definitive version of TH04, containing not only the community-chosen Kurumi and Stage 4 Marisa workaround variant, but also the 📝 No-EMS bugfix from last year. Edit (2022-05-31): This package is outdated, 📝 the current version is here! 2022-04-18-community-choice-fixes.zip Oh, and let's also add spaztron64's TH03 GDC clock fix from 2019 because why not. This binary was built from the community_choice_fixes branch, and you can find the code for all the individual workarounds on these branches:

Again, because it can't be stated often enough: These fixes are fanfiction. The gameplay community should be aware of this, and might decide to handle these cases differently.


With all of that taking way more time to evaluate and document, this research really had to become part of a proper push, instead of just being covered in the quick non-push blog post I initially intended. With ½ of a push left at the end, TH05's Stage 1-5 boss background rendering functions fit in perfectly there. If you wonder how these static backdrop images even need any boss-specific code to begin with, you're right – it's basically the same function copy-pasted 4 times, differing only in the backdrop image coordinates and some other inconsequential details.
Only Sara receives a nice variation of the typical 📝 blocky entrance animation: The usually opaque bitmap data from ST00.BB is instead used as a transition mask from stage tiles to the backdrop image, by making clever use of the tile invalidation system:

TH04 uses the same effect a bit more frequently, for its first three bosses.

Next up: Shinki, for real this time! I've already managed to decompile 10 of her 11 danmaku patterns within a little more than one push – and yes, that one is included in there. Looks like I've slightly overestimated the amount of work required for TH04's and TH05's bosses…

📝 Posted:
🚚 Summary of:
P0186, P0187, P0188
Commits:
a21ab3d...bab5634, bab5634...426a531, 426a531...e881f95
💰 Funded by:
Blue Bolt, [Anonymous], nrook
🏷 Tags:
rec98+ th02+ th04+ th05+ gameplay- boss+ portability+ tcc+ master.lib+ bug+ marisa-4+ gengetsu+ score+

Did you know that moving on top of a boss sprite doesn't kill the player in TH04, only in TH05?

Screenshot of Reimu moving on top of Stage 6 Yuuka, demonstrating the lack of boss↔player collision in TH04
Yup, Reimu is not getting hit… yet.

That's the first of only three interesting discoveries in these 3 pushes, all of which concern TH04. But yeah, 3 for something as seemingly simple as these shared boss functions… that's still not quite the speed-up I had hoped for. While most of this can be blamed, again, on TH04 and all of its hardcoded complexities, there still was a lot of work to be done on the maintenance front as well. These functions reference a bunch of code I RE'd years ago and that still had to be brought up to current standards, with the dependencies reaching from 📝 boss explosions over 📝 text RAM overlay functionality up to in-game dialog loading.

The latter provides a good opportunity to talk a bit about x86 memory segmentation. Many aspiring PC-98 developers these days are very scared of it, with some even going as far as to rather mess with Protected Mode and DOS extenders just so that they don't have to deal with it. I wonder where that fear comes from… Could it be because every modern programming language I know of assumes memory to be flat, and lacks any standard language-level features to even express something like segments and offsets? That's why compilers have a hard time targeting 16-bit x86 these days: Doing anything interesting on the architecture requires giving the programmer full control over segmentation, which always comes down to adding the typical non-standard language extensions of compilers from back in the day. And as soon as DOS stopped being used, these extensions no longer made sense and were subsequently removed from newer tools. A good example for this can be found in an old version of the NASM manual: The project started as an attempt to make x86 assemblers simple again by throwing out most of the segmentation features from MASM-style assemblers, which made complete sense in 1996 when 16-bit DOS and Windows were already on their way out. But there was a point to all those features, and that's why ReC98 still has to use the supposedly inferior TASM.

Not that this fear of segmentation is completely unfounded: All the segmentation-related keywords, directives, and #pragmas provided by Borland C++ and TASM absolutely can be the cause of many weird runtime bugs. Even if the compiler or linker catches them, you are often left with confusing error messages that aged just as poorly as memory segmentation itself.
However, embracing the concept does provide quite the opportunity for optimizations. While it definitely was a very crazy idea, there is a small bit of brilliance to be gained from making proper use of all these segmentation features. Case in point: The buffer for the in-game dialog scripts in TH04 and TH05.

// Thanks to the semantics of `far` pointers, we only need a single 32-bit
// pointer variable for the following code.
extern unsigned char far *dialog_p;

// This master.lib function returns a `void __seg *`, which is a 16-bit
// segment-only pointer. Converting to a `far *` yields a full segment:offset
// pointer to offset 0000h of that segment.
dialog_p = (unsigned char far *)hmem_allocbyte(/* … */);

// Running the dialog script involves pointer arithmetic. On a far pointer,
// this only affects the 16-bit offset part, complete with overflow at 64 KiB,
// from FFFFh back to 0000h.
dialog_p += /* … */;
dialog_p += /* … */;
dialog_p += /* … */;

// Since the segment part of the pointer is still identical to the one we
// allocated above, we can later correctly free the buffer by pulling the
// segment back out of the pointer.
hmem_free((void __seg *)dialog_p);

If dialog_p was a huge pointer, any pointer arithmetic would have also adjusted the segment part, requiring a second pointer to store the base address for the hmem_free call. Doing that will also be necessary for any port to a flat memory model. Depending on how you look at it, this compression of two logical pointers into a single variable is either quite nice, or really, really dumb in its reliance on the precise memory model of one single architecture. :tannedcirno:


Why look at dialog loading though, wasn't this supposed to be all about shared boss functions? Well, TH04 unnecessarily puts certain stage-specific code into the boss defeat function, such as loading the alternate Stage 5 Yuuka defeat dialog before a Bad Ending, or initializing Gengetsu after Mugetsu's defeat in the Extra Stage.
That's TH04's second core function with an explicit conditional branch for Gengetsu, after the 📝 dialog exit code we found last year during EMS research. And I've heard people say that Shinki was the most hardcoded fight in PC-98 Touhou… Really, Shinki is a perfectly regular boss, who makes proper use of all internal mechanics in the way they were intended, and doesn't blast holes into the architecture of the game. Even within TH05, it's Mai and Yuki who rely on hacks and duplicated code, not Shinki.

The worst part about this though? How the function distinguishes Mugetsu from Gengetsu. Once again, it uses its own global variable to track whether it is called the first or the second time within TH04's Extra Stage, unrelated to the same variable used in the dialog exit function. But this time, it's not just any newly created, single-use variable, oh no. In a misguided attempt to micro-optimize away a few bytes of conventional memory, TH04 reserves 16 bytes of "generic boss state", which can (and are) freely used for anything a boss doesn't want to store in a more dedicated variable.
It might have been worth it if the bosses actually used most of these 16 bytes, but the majority just use (the same) two, with only Stage 4 Reimu using a whopping seven different ones. To reverse-engineer the various uses of these variables, I pretty much had to map out which of the undecompiled danmaku-pattern functions corresponds to which boss fight. In the end, I assigned 29 different variable names for each of the semantically different use cases, which made up another full push on its own.

Now, 16 bytes of wildly shared state, isn't that the perfect recipe for bugs? At least during this cursory look, I haven't found any obvious ones yet. If they do exist, it's more likely that they involve reused state from earlier bosses – just how the Shinki death glitch in TH05 is caused by reusing cheeto data from way back in Stage 4 – and hence require much more boss-specific progress.
And yes, it might have been way too early to look into all these tiny details of specific boss scripts… but then, this happened:

TH04 crashing to the DOS prompt in the Stage 4 Marisa fight, right as the last of her bits is destroyed

Looks similar to another screenshot of a crash in the same fight that was reported in December, doesn't it? I was too much in a hurry to figure it out exactly, but notice how both crashes happen right as the last of Marisa's four bits is destroyed. KirbyComment has suspected this to be the cause for a while, and now I can pretty much confirm it to be an unguarded division by the number of on-screen bits in Marisa-specific pattern code. But what's the cause for Kurumi then? :thonk:
As for fixing it, I can go for either a fast or a slow option:

  1. Superficially fixing only this crash will probably just take a fraction of a push.
  2. But I could also go for a deeper understanding by looking at TH04's version of the 📝 custom entity structure. It not only stores the data of Marisa's bits, but is also very likely to be involved in Kurumi's crash, and would get TH04 a lot closer to 100% PI. Taking that look will probably need at least 2 pushes, and might require another 3-4 to completely decompile Marisa's fight, and 2-3 to decompile Kurumi's.

OK, now that that's out of the way, time to finish the boss defeat function… but not without stumbling over the third of TH04's quirks, relating to the Clear Bonus for the main game or the Extra Stage:

And after another few collision-related functions, we're now truly, finally ready to decompile bosses in both TH04 and TH05! Just as the anything funds were running out… :onricdennat: The remaining ¼ of the third push then went to Shinki's 32×32 ball bullets, rounding out this delivery with a small self-contained piece of the first TH05 boss we're probably going to look at.

Next up, though: I'm not sure, actually. Both Shinki and Elis seem just a little bit larger than the 2¼ or 4 pushes purchased so far, respectively. Now that there's a bunch of room left in the cap again, I'll just let the next contribution decide – with a preference for Shinki in case of a tie. And if it will take longer than usual for the store to sell out again this time (heh), there's still the 📝 PC-98 text RAM JIS trail word rendering research waiting to be documented.

📝 Posted:
🚚 Summary of:
P0184, P0185
Commits:
f9d983e...f918298, f918298...a21ab3d
💰 Funded by:
-Tom-, Blue Bolt, [Anonymous]
🏷 Tags:
rec98+ th04+ th05+ gameplay- louise+ shinki+ bullet+ danmaku-pattern+ uth05win+ blitting+

Two years after 📝 the first look at TH04's and TH05's bullets, we finally get to finish their logic code by looking at the special motion types. Bullets as a whole still aren't completely finished as the rendering code is still waiting to be RE'd, but now we've got everything about them that's required for decompiling the midboss and boss fights of these games.

Just like the motion types of TH01's pellets, the ones we've got here really are special enough to warrant an enum, despite all the overlap in the "slow down and turn" and "bounce at certain edges of the playfield" types. Sure, including them in the bitfield I proposed two years ago would have allowed greater variety, but it wouldn't have saved any memory. On the contrary: These types use a single global state variable for the maximum turn count and delta speed, which a proper customizable architecture would have to integrate into the bullet structure. Maybe it is possible to stuff everything into the same amount of bytes, but not without first completely rearchitecting the bullet structure and removing every single piece of redundancy in there. Simply extending the system by adding a new enum value for a new motion type would be way more straightforward for modders.

Speaking about memory, TH05 already extends the bullet structure by 6 bytes for the "exact linear movement" type exclusive to that game. This type is particularly interesting for all the prospective PC-98 game developers out there, as it nicely points out the precision limits of Q12.4 subpixels.
Regular bullet movement works by adding a Q12.4 velocity to a Q12.4 position every frame, with the velocity typically being calculated only once on spawn time from an 8-bit angle and a Q12.4 speed. Quantization errors from this initial calculation can quickly compound over all the frames a bullet spends moving across the playfield. If a bullet is only supposed to move on a straight line though, there is a more precise way of calculating its position: By storing the origin point, movement angle, and total distance traveled, you can perform a full polar→Cartesian transformation every frame. Out of the 10 danmaku patterns in TH05 that use this motion type, the difference to regular bullet movement can be best seen in Louise's final pattern:

Louise's final pattern in its original form, demonstrating exact linear bullet movement. Note how each bullet spawns slightly behind the delay cloud: ZUN simply forgot to shift the fixed origin point along with it.
The same pattern with standard bullet movement, corrupting its intended appearance. No delay cloud-related oversights here though, at least.

Not far away from the regular bullet code, we've also got the movement function for the infamous curve / "cheeto" bullets. I would have almost called them "cheetos" in the code as well, which surely fits more nicely into 8.3 filenames than "curve bullets" does, but eh, trademarks…

As for hitboxes, we got a 16×16 one on the head node, and a 12×12 one on the 16 trail nodes. The latter simply store the position of the head node during the last 16 frames, Snake style. But what you're all here for is probably the turning and homing algorithm, right? Boiled down to its essence, it works like this:

// [head] points to the controlled "head" part of a curve bullet entity.
// Angles are stored with 8 bits representing a full circle, providing free
// normalization on arithmetic overflow.
// The directions are ordered as you would expect:
// • 0x00: right	(sin(0x00) =  0, cos(0x00) = +1)
// • 0x40: down 	(sin(0x40) = +1, cos(0x40) =  0)
// • 0x80: left 	(sin(0x80) =  0, cos(0x80) = -1)
// • 0xC0: up   	(sin(0xC0) = -1, cos(0xC0) =  0)
uint8_t angle_delta = (head->angle - player_angle_from(
	head->pos.cur.x, head->pos.cur.y
));

// Stop turning if the player is 1/128ths of a circle away from this bullet
const uint8_t SNAP = 0x02;

// Else, turn either clockwise or counterclockwise by 1/256th of a circle,
// depending on what would reach the player the fastest.
if((angle_delta > SNAP) && (angle_delta < static_cast(-SNAP))) {
	angle_delta = (angle_delta >= 0x80) ? -0x01 : +0x01;
}
head_p->angle -= angle_delta;

5 lines of code, and not all too difficult to follow once you are familiar with 8-bit angles… unlike what ZUN actually wrote. Which is 26 lines, and includes an unused "friction" variable that is never set to any value that makes a difference in the formula. :zunpet:. uth05win correctly saw through that all and simplified this code to something equivalent to my explanation. Redoing that work certainly wasted a bit of my time, and means that I now definitely need to spend another push on RE'ing all the shared boss functions before I can start with Shinki.

So while a curve bullet's speed does get faster over time, its angular velocity is always limited to 1/256th of a circle per frame. This reveals the optimal strategy for dodging them: Maximize this delta angle by staying as close to 180° away from their current direction as possible, and let their acceleration do the rest.

At least that's the theory for dodging a single one. As a danmaku designer, you can now of course place other bullets at these technically optimal places to prevent a curve bullet pattern from being cheesed like that. I certainly didn't record the video above in a single take either… :tannedcirno:


After another bunch of boring entity spawn and update functions, the playfield shaking feature turned out as the most notable (and tricky) one to round out these two pushes. It's actually implemented quite well in how it simply "un-shakes" the screen by just marking every stage tile to be redrawn. In the context of all the other tile invalidation that can take place during a frame, that's definitely more performant than 📝 doing another EGC-accelerated memmove() . Due to these two games being double-buffered via page flipping, this invalidation only really needs to happen for the frame after the next one though. The immediately next frame will show the regular, un-shaken playfield on the other VRAM page first, except during the multi-frame shake animation when defeating a midboss, where it will also appear shifted in a different direction… 😵 Yeah, no wonder why ZUN just always invalidates all stage tiles for the next two frames after every shaking animation, which is guaranteed to handle both sporadic single-frame shakes and continuous ones. So close to good-code here.

Finally, this delivery was delayed a bit because -Tom- requested his round-up amount to be limited to the cap in the future. Since that makes it kind of hard to explain on a static page how much money he will exactly provide, I now properly modeled these discounts in the website code. The exact round-up amount is now included in both the pre-purchase breakdown, as well as the cap bar on the main page.
With that in place, the system is now also set up for round-up offers from other patrons. If you'd also like to support certain goals in this way, with any amount of money, now's the time for getting in touch with me about that. Known contributors only, though! 😛

Next up: The final bunch of shared boring boss functions. Which certainly will give me a break from all the maintenance and research work, and speed up delivery progress again… right?

📝 Posted:
🚚 Summary of:
P0182, P0183
Commits:
313450f...1e2c7ad, 1e2c7ad...f9d983e
💰 Funded by:
Lmocinemod, [Anonymous], Yanga
🏷 Tags:
rec98+ th03+ pc98+ gameplay- player+ micro-optimization+ tcc+ portability+ mod+

Been 📝 a while since we last looked at any of TH03's game code! But before that, we need to talk about Y coordinates.

During TH03's MAIN.EXE, the PC-98 graphics GDC runs in its line-doubled 640×200 resolution, which gives the in-game portion its distinctive stretched low-res look. This lower resolution is a consequence of using 📝 Promisence Soft's SPRITE16 driver: Its performance simply stems from the fact that it expects sprites to be stored in the bottom half of VRAM, which allows them to be blitted using the same EGC-accelerated VRAM-to-VRAM copies we've seen again and again in all other games. Reducing the visible resolution also means that the sprites can be stored on both VRAM pages, allowing the game to still be double-buffered. If you force the graphics chip to run at 640×400, you can see them:

TH03's VRAM at regular line-doubled 640×200 resolutionTH03's VRAM at full 640×400 resolution, including the SPRITE16 sprite areaTH03's text layer during an in-game round.
The full VRAM contents during TH03's in-game portion, as seen when forcing the system into a 640×400 resolution.

Note that the text chip still displays its overlaid contents at 640×400, which means that TH03's in-game portion technically runs at two resolutions at the same time.

But that means that any mention of a Y coordinate is ambiguous: Does it refer to undoubled VRAM pixels, or on-screen stretched pixels? Especially people who have known about the line doubling for years might almost expect technical blog posts on this game to use undoubled VRAM coordinates. So, let's introduce a new formatting convention for both on-screen 640×400 and undoubled 640×200 coordinates, and always write out both to minimize the confusion.


Alright, now what's the thing gonna be? The enemy structure is highly overloaded, being used for enemies, fireballs, and explosions with seemingly different semantics for each. Maybe a bit too much to be figured out in what should ideally be a single push, especially with all the functions that would need to be decompiled? Bullet code would be easier, but not exactly single-push material either. As it turns out though, there's something more fundamental left to be done first, which both of these subsystems depend on: collision detection!

And it's implemented exactly how I always naively imagined collision detection to be implemented in a fixed-resolution 2D bullet hell game with small hitboxes: By keeping a separate 1bpp bitmap of both playfields in memory, drawing in the collidable regions of all entities on every frame, and then checking whether any pixels at the current location of the player's hitbox are set to 1. It's probably not done in the other games because their single data segment was already too packed for the necessary 17,664 bytes to store such a bitmap at pixel resolution, and 282,624 bytes for a bitmap at Q12.4 subpixel resolution would have been prohibitively expensive in 16-bit Real Mode DOS anyway. In TH03, on the other hand, this bitmap is doubly useful, as the AI also uses it to elegantly learn what's on the playfield. By halving the resolution and only tracking tiles of 2×2 / 2×1 pixels, TH03 only requires an adequate total of 6,624 bytes of memory for the collision bitmaps of both playfields.

So how did the implementation not earn the good-code tag this time? Because the code for drawing into these bitmaps is undecompilable hand-written x86 assembly. :zunpet: And not just your usual ASM that was basically compiled from C and then edited to maybe optimize register allocation and maybe replace a bunch of local variables with self-modifying code, oh no. This code is full of overly clever bit twiddling, abusing the fact that the 16-bit AX, BX, CX, and DX registers can also be accessed as two 8-bit registers, calculations that change the semantic meaning behind the value of a register, or just straight-up reassignments of different values to the same small set of registers. Sure, in some way it is impressive, and it all does work and correctly covers every edge case, but come on. This could have all been a lot more readable in exchange for just a few CPU cycles.

What's most interesting though are the actual shapes that these functions draw into the collision bitmap. On the surface, we have:

  1. vertical slopes at any angle across the whole playfield; exclusively used for Chiyuri's diagonal laser EX attack
  2. straight vertical lines, with a width of 1 tile; exclusively used for the 2×2 / 2×1 hitboxes of bullets
  3. rectangles at arbitrary sizes

But only 2) actually draws a full solid line. 1) and 3) are only ever drawn as horizontal stripes, with a hardcoded distance of 2 vertical tiles between every stripe of a slope, and 4 vertical tiles between every stripe of a rectangle. That's 66-75% of each rectangular entity's intended hitbox not actually taking part in collision detection. Now, if player hitboxes were ≤ 6 / 3 pixels, we'd have one possible explanation of how the AI can "cheat", because it could just precisely move through those blank regions at TAS speeds. So, let's make this two pushes after all and tell the complete story, since this is one of the more interesting aspects to still be documented in this game.


And the code only gets worse. :godzun: While the player collision detection function is decompilable, it might as well not have been, because it's just more of the same "optimized", hard-to-follow assembly. With the four splittable 16-bit registers having a total of 20 different meanings in this function, I would have almost preferred self-modifying code…

In fact, it was so bad that it prompted some maintenance work on my inline assembly coding standards as a whole. Turns out that the _asm keyword is not only still supported in modern Visual Studio compilers, but also in Clang with the -fms-extensions flag, and compiles fine there even for 64-bit targets. While that might sound like amazing news at first ("awesome, no need to rewrite this stuff for my x86_64 Linux port!"), you quickly realize that almost all inline assembly in this codebase assumes either PC-98 hardware, segmented 16-bit memory addressing, or is a temporary hack that will be removed with further RE progress.
That's mainly because most of the raw arithmetic code uses Turbo C++'s register pseudovariables where possible. While they certainly have their drawbacks, being a non-standard extension that's not supported in other x86-targeting C compilers, their advantages are quite significant: They allow this code to stay in the same language, and provide slightly more immediate portability to any other architecture, together with 📝 readability and maintainability improvements that can get quite significant when combined with inlining:

// This one line compiles to five ASM instructions, which would need to be
// spelled out in any C compiler that doesn't support register pseudovariables.
// By adding typed aliases for these registers via `#define`, this code can be
// both made even more readable, and be prepared for an easier transformation
// into more portable local variables.
_ES = (((_AX * 4) + _BX) + SEG_PLANE_B);

However, register pseudovariables might cause potential portability issues as soon as they are mixed with inline assembly instructions that rely on their state. The lazy way of "supporting pseudo-registers" in other compilers would involve declaring the full set as global variables, which would immediately break every one of those instances:

_DI = 0;
_AX = 0xFFFF;

// Special x86 instruction doing the equivalent of
//
// 	*reinterpret_cast(MK_FP(_ES, _DI)) = _AX;
// 	_DI += sizeof(uint16_t);
//
// Only generated by Turbo C++ in very specific cases, and therefore only
// reliably available through inline assembly.
asm { movsw; }

What's also not all too standardized, though, are certain variants of the asm keyword. That's why I've now introduced a distinction between the _asm keyword for "decently sane" inline assembly, and the slightly less standard asm keyword for inline assembly that relies on the contents of pseudo-registers, and should break on compilers that don't support them.
So yeah, have some minor portability work in exchange for these two pushes not having all that much in RE'd content.

With that out of the way and the function deciphered, we can confirm the player hitboxes to be a constant 8×8 / 8×4 pixels, and prove that the hit stripes are nothing but an adequate optimization that doesn't affect gameplay in any way.


And what's the obvious thing to immediately do if you have both the collision bitmap and the player hitbox? Writing a "real hitbox" mod, of course:

  1. Reorder the calls to rendering functions so that player and shot sprites are rendered after bullets
  2. Blank out all player sprite pixels outside an 8×8 / 8×4 box around the center point
  3. After the bullet rendering function, turn on the GRCG in RMW mode and set the tile register set to the background color
  4. Stretch the negated contents of collision bitmap onto each playfield, leaving only collidable pixels untouched
  5. Do the same with the actual, non-negated contents and a white color, for extra contrast against the background. This also makes sure to show any collidable areas whose sprite pixels are transparent, such as with the moon enemy. (Yeah, how unfair.) Doing that also loses a lot of information about the playfield, such as enemy HP indicated by their color, but what can you do:
A decently busy TH03 in-game frame.The underlying content of the collision bitmap, showing off all three different shapes together with the player hitboxes.
A decently busy TH03 in-game frame and its underlying collision bitmap, showing off all three different collision shapes together with the player hitboxes.

2022-02-18-TH03-real-hitbox.zip The secret for writing such mods before having reached a sufficient level of position independence? Put your new code segment into DGROUP, past the end of the uninitialized data section. That's why this modded MAIN.EXE is a lot larger than you would expect from the raw amount of new code: The file now actually needs to store all these uninitialized 0 bytes between the end of the data segment and the first instruction of the mod code – normally, this number is simply a part of the MZ EXE header, and doesn't need to be redundantly stored on disk. Check the th03_real_hitbox branch for the code.

And now we know why so many "real hitbox" mods for the Windows Touhou games are inaccurate: The games would simply be unplayable otherwise – or can you dodge rapidly moving 2×2 / 2×1 blocks as an 8×8 / 8×4 rectangle that is smaller than your shot sprites, especially without focused movement? I can't. :tannedcirno: Maybe it will feel more playable after making explosions visible, but that would need more RE groundwork first.
It's also interesting how adding two full GRCG-accelerated redraws of both playfields per frame doesn't significantly drop the game's frame rate – so why did the drawing functions have to be micro-optimized again? It would be possible in one pass by using the GRCG's TDW mode, which should theoretically be 8× faster, but I have to stop somewhere. :onricdennat:

Next up: The final missing piece of TH04's and TH05's bullet-moving code, which will include a certain other type of projectile as well.

📝 Posted:
🚚 Summary of:
P0174, P0175, P0176, P0177, P0178, P0179, P0180, P0181
Commits:
27f901c...a0fe812, a0fe812...40ac9a7, 40ac9a7...c5dc45b, c5dc45b...5f0cabc, 5f0cabc...60621f8, 60621f8...9e5b344, 9e5b344...091f19f, 091f19f...313450f
💰 Funded by:
Ember2528, Yanga
🏷 Tags:
rec98+ th01+ gameplay- boss+ sariel+ danmaku-pattern+ rng+ waste+ pc98+ blitting+ good-code+ glitch+ tcc+ unused+ jank+

Here we go, TH01 Sariel! This is the single biggest boss fight in all of PC-98 Touhou: If we include all custom effect code we previously decompiled, it amounts to a total of 10.31% of all code in TH01 (and 3.14% overall). These 8 pushes cover the final 8.10% (or 2.47% overall), and are likely to be the single biggest delivery this project will ever see. Considering that I only managed to decompile 6.00% across all games in 2021, 2022 is already off to a much better start!

So, how can Sariel's code be that large? Well, we've got:

In total, it's 3,020 lines of C++ code, containing a total of 8 definite ZUN bugs, 3 of them being subpixel/pixel confusions. That might not look all too bad if you compare it to the 📝 player control function's 8 bugs in 900 lines of code, but given that Konngara had 0… And no, the code doesn't make it obvious whether ZUN coded Konngara or Sariel first; there's just as much evidence for either.

Some terminology before we start: Sariel's first form is separated into four phases, indicated by different background images, that cycle until Sariel's HP reach 0 and the second, single-phase form starts. The danmaku patterns within each phase are also on a cycle, and the game picks a random but limited number of patterns per phase before transitioning to the next one. The fight always starts at pattern 1 of phase 1 (the random purple lasers), and each new phase also starts at its respective first pattern.


Sariel's bugs already start at the graphics asset level, before any code gets to run. Some of the patterns include a wand raise animation, which is stored in BOSS6_2.BOS:

TH01 BOSS6_2.BOS
Umm… OK? The same sprite twice, just with slightly different colors? So how is the wand lowered again?

The "lowered wand" sprite is missing in this file simply because it's captured from the regular background image in VRAM, at the beginning of the fight and after every background transition. What I previously thought to be 📝 background storage code has therefore a different meaning in Sariel's case. Since this captured sprite is fully opaque, it will reset the entire 128×128 wand area… wait, 128×128, rather than 96×96? Yup, this lowered sprite is larger than necessary, wasting 1,967 bytes of conventional memory.
That still doesn't quite explain the second sprite in BOSS6_2.BOS though. Turns out that the black part is indeed meant to unblit the purple reflection (?) in the first sprite. But… that's not how you would correctly unblit that?

VRAM after blitting the first sprite of TH01's BOSS6_2.BOS VRAM after blitting the second sprite of TH01's BOSS6_2.BOS

The first sprite already eats up part of the red HUD line, and the second one additionally fails to recover the seal pixels underneath, leaving a nice little black hole and some stray purple pixels until the next background transition. :tannedcirno: Quite ironic given that both sprites do include the right part of the seal, which isn't even part of the animation.


Just like Konngara, Sariel continues the approach of using a single function per danmaku pattern or custom entity. While I appreciate that this allows all pattern- and entity-specific state to be scoped locally to that one function, it quickly gets ugly as soon as such a function has to do more than one thing.
The "bird function" is particularly awful here: It's just one if(…) {…} else if(…) {…} else if(…) {…} chain with different branches for the subfunction parameter, with zero shared code between any of these branches. It also uses 64-bit floating-point double as its subpixel type… and since it also takes four of those as parameters (y'know, just in case the "spawn new bird" subfunction is called), every call site has to also push four double values onto the stack. Thanks to Turbo C++ even using the FPU for pushing a 0.0 constant, we have already reached maximum floating-point decadence before even having seen a single danmaku pattern. Why decadence? Every possible spawn position and velocity in both bird patterns just uses pixel resolution, with no fractional component in sight. And there goes another 720 bytes of conventional memory.

Speaking about bird patterns, the red-bird one is where we find the first code-level ZUN bug: The spawn cross circle sprite suddenly disappears after it finished spawning all the bird eggs. How can we tell it's a bug? Because there is code to smoothly fly this sprite off the playfield, that code just suddenly forgets that the sprite's position is stored in Q12.4 subpixels, and treats it as raw screen pixels instead. :zunpet: As a result, the well-intentioned 640×400 screen-space clipping rectangle effectively shrinks to 38×23 pixels in the top-left corner of the screen. Which the sprite is always outside of, and thus never rendered again.
The intended animation is easily restored though:

Sariel's third pattern, and the first to spawn birds, in its original and fixed versions. Note that I somewhat fixed the bird hatch animation as well: ZUN's code never unblits any frame of animation there, and simply blits every new one on top of the previous one.

Also, did you know that birds actually have a quite unfair 14×38-pixel hitbox? Not that you'd ever collide with them in any of the patterns…

Another 3 of the 8 bugs can be found in the symmetric, interlaced spawn rays used in three of the patterns, and the 32×32 debris "sprites" shown at their endpoint, at the edge of the screen. You kinda have to commend ZUN's attention to detail here, and how he wrote a lot of code for those few rapidly animated pixels that you most likely don't even notice, especially with all the other wrong pixels resulting from rendering glitches. One of the bugs in the very final pattern of phase 4 even turns them into the vortex sprites from the second pattern in phase 1 during the first 5 frames of the first time the pattern is active, and I had to single-step the blitting calls to verify it.
It certainly was annoying how much time I spent making sense of these bugs, and all weird blitting offsets, for just a few pixels… Let's look at something more wholesome, shall we?


So far, we've only seen the PC-98 GRCG being used in RMW (read-modify-write) mode, which I previously 📝 explained in the context of TH01's red-white HP pattern. The second of its three modes, TCR (Tile Compare Read), affects VRAM reads rather than writes, and performs "color extraction" across all 4 bitplanes: Instead of returning raw 1bpp data from one plane, a VRAM read will instead return a bitmask, with a 1 bit at every pixel whose full 4-bit color exactly matches the color at that offset in the GRCG's tile register, and 0 everywhere else. Sariel uses this mode to make sure that the 2×2 particles and the wind effect are only blitted on top of "air color" pixels, with other parts of the background behaving like a mask. The algorithm:

  1. Set the GRCG to TCR mode, and all 8 tile register dots to the air color
  2. Read N bits from the target VRAM position to obtain an N-bit mask where all 1 bits indicate air color pixels at the respective position
  3. AND that mask with the alpha plane of the sprite to be drawn, shifted to the correct start bit within the 8-pixel VRAM byte
  4. Set the GRCG to RMW mode, and all 8 tile register dots to the color that should be drawn
  5. Write the previously obtained bitmask to the same position in VRAM

Quite clever how the extracted colors double as a secondary alpha plane, making for another well-earned good-code tag. The wind effect really doesn't deserve it, though:

As far as I can tell, ZUN didn't use TCR mode anywhere else in PC-98 Touhou. Tune in again later during a TH04 or TH05 push to learn about TDW, the final GRCG mode!


Speaking about the 2×2 particle systems, why do we need three of them? Their only observable difference lies in the way they move their particles:

  1. Up or down in a straight line (used in phases 4 and 2, respectively)
  2. Left or right in a straight line (used in the second form)
  3. Left and right in a sinusoidal motion (used in phase 3, the "dark orange" one)

Out of all possible formats ZUN could have used for storing the positions and velocities of individual particles, he chose a) 64-bit / double-precision floating-point, and b) raw screen pixels. Want to take a guess at which data type is used for which particle system?

If you picked double for 1) and 2), and raw screen pixels for 3), you are of course correct! :godzun: Not that I'm implying that it should have been the other way round – screen pixels would have perfectly fit all three systems use cases, as all 16-bit coordinates are extended to 32 bits for trigonometric calculations anyway. That's what, another 1.080 bytes of wasted conventional memory? And that's even calculated while keeping the current architecture, which allocates space for 3×30 particles as part of the game's global data, although only one of the three particle systems is active at any given time.

That's it for the first form, time to put on "Civilization of Magic"! Or "死なばもろとも"? Or "Theme of 地獄めくり"? Or whatever SYUGEN is supposed to mean…


… and the code of these final patterns comes out roughly as exciting as their in-game impact. With the big exception of the very final "swaying leaves" pattern: After 📝 Q4.4, 📝 Q28.4, 📝 Q24.8, and double variables, this pattern uses… decimal subpixels? Like, multiplying the number by 10, and using the decimal one's digit to represent the fractional part? Well, sure, if you really insist on moving the leaves in cleanly represented integer multiples of ⅒, which is infamously impossible in IEEE 754. Aside from aesthetic reasons, it only really combines less precision (10 possible fractions rather than the usual 16) with the inferior performance of having to use integer divisions and multiplications rather than simple bit shifts. And it's surely not because the leaf sprites needed an extended integer value range of [-3276, +3276], compared to Q12.4's [-2047, +2048]: They are clipped to 640×400 screen space anyway, and are removed as soon as they leave this area.

This pattern also contains the second bug in the "subpixel/pixel confusion hiding an entire animation" category, causing all of BOSS6GR4.GRC to effectively become unused:

The "swaying leaves" pattern. ZUN intended a splash animation to be shown once each leaf "spark" reaches the top of the playfield, which is never displayed in the original game.

At least their hitboxes are what you would expect, exactly covering the 30×30 pixels of Reimu's sprite. Both animation fixes are available on the th01_sariel_fixes branch.

After all that, Sariel's main function turned out fairly unspectacular, just putting everything together and adding some shake, transition, and color pulse effects with a bunch of unnecessary hardware palette changes. There is one reference to a missing BOSS6.GRP file during the first→second form transition, suggesting that Sariel originally had a separate "first form defeat" graphic, before it was replaced with just the shaking effect in the final game.
Speaking about the transition code, it is kind of funny how the… um, imperative and concrete nature of TH01 leads to these 2×24 lines of straight-line code. They kind of look like ZUN rattling off a laundry list of subsystems and raw variables to be reinitialized, making damn sure to not forget anything.


Whew! Second PC-98 Touhou boss completely decompiled, 29 to go, and they'll only get easier from here! 🎉 The next one in line, Elis, is somewhere between Konngara and Sariel as far as x86 instruction count is concerned, so that'll need to wait for some additional funding. Next up, therefore: Looking at a thing in TH03's main game code – really, I have little idea what it will be!

Now that the store is open again, also check out the 📝 updated RE progress overview I've posted together with this one. In addition to more RE, you can now also directly order a variety of mods; all of these are further explained in the order form itself.

📝 Posted:
🚚 Summary of:
P0165, P0166, P0167
Commits:
7a0e5d8...f2bca01, f2bca01...e697907, e697907...c2de6ab
💰 Funded by:
Ember2528
🏷 Tags:
rec98+ th01+ gameplay- animation+ blitting+ bullet+ glitch+ waste+ boss+ singyoku+ mima-th01+ elis+ tcc+ menu+

OK, TH01 missile bullets. Can we maybe have a well-behaved entity type, without any weirdness? Just once?

Ehh, kinda. Apart from another 150 bytes wasted on unused structure members, this code is indeed more on the low end in terms of overall jank. It does become very obvious why dodging these missiles in the YuugenMagan, Mima, and Elis fights feels so awful though: An unfair 46×46 pixel hitbox around Reimu's center pixel, combined with the comeback of 📝 interlaced rendering, this time in every stage. ZUN probably did this because missiles are the only 16×16 sprite in TH01 that is blitted to unaligned X positions, which effectively ends up touching a 32×16 area of VRAM per sprite.
But even if we assume VRAM writes to be the bottleneck here, it would have been totally possible to render every missile in every frame at roughly the same amount of CPU time that the original game uses for interlaced rendering:

That's an optimization that would have significantly benefitted the game, in contrast to all of the fake ones introduced in later games. Then again, this optimization is actually something that the later games do, and it might have in fact been necessary to achieve their higher bullet counts without significant slowdown.

Unfortunately, it was only worth decompiling half of the missile code right now, thanks to gratuitous FPU usage in the other half, where 📝 double variables are compared to float literals. That one will have to wait 📝 until after SinGyoku.


After some effectively unused Mima sprite effect code that is so broken that it's impossible to make sense out of it, we get to the final feature I wanted to cover for all bosses in parallel before returning to Sariel: The separate sprite background storage for moving or animated boss sprites in the Mima, Elis, and Sariel fights. But, uh… why is this necessary to begin with? Doesn't TH01 already reserve the other VRAM page for backgrounds?
Well, these sprites are quite big, and ZUN didn't want to blit them from main memory on every frame. After all, TH01 and TH02 had a minimum required clock speed of 33 MHz, half of the speed required for the later three games. So, he simply blitted these boss sprites to both VRAM pages, leading the usual unblitting calls to only remove the other sprites on top of the boss. However, these bosses themselves want to move across the screen… and this makes it necessary to save the stage background behind them in some other way.

Enter .PTN, and its functions to capture a 16×16 or 32×32 square from VRAM into a sprite slot. No problem with that approach in theory, as the size of all these bigger sprites is a multiple of 32×32; splitting a larger sprite into these smaller 32×32 chunks makes the code look just a little bit clumsy (and, of course, slower).
But somewhere during the development of Mima's fight, ZUN apparently forgot that those sprite backgrounds existed. And once Mima's 🚫 casting sprite is blitted on top of her regular sprite, using just regular sprite transparency, she ends up with her infamous third arm:

TH01 Mima's third arm

Ironically, there's an unused code path in Mima's unblit function where ZUN assumes a height of 48 pixels for Mima's animation sprites rather than the actual 64. This leads to even clumsier .PTN function calls for the bottom 128×16 pixels… Failing to unblit the bottom 16 pixels would have also yielded that third arm, although it wouldn't have looked as natural. Still wouldn't say that it was intentional; maybe this casting sprite was just added pretty late in the game's development?


So, mission accomplished, Sariel unblocked… at 2¼ pushes. :thonk: That's quite some time left for some smaller stage initialization code, which bundles a bunch of random function calls in places where they logically really don't belong. The stage opening animation then adds a bunch of VRAM inter-page copies that are not only redundant but can't even be understood without knowing the hidden internal state of the last VRAM page accessed by previous ZUN code…
In better news though: Turbo C++ 4.0 really doesn't seem to have any complexity limit on inlining arithmetic expressions, as long as they only operate on compile-time constants. That's how we get macro-free, compile-time Shift-JIS to JIS X 0208 conversion of the individual code points in the 東方★靈異伝 string, in a compiler from 1994. As long as you don't store any intermediate results in variables, that is… :tannedcirno:

But wait, there's more! With still ¼ of a push left, I also went for the boss defeat animation, which includes the route selection after the SinGyoku fight.
As in all other instances, the 2× scaled font is accomplished by first rendering the text at regular 1× resolution to the other, invisible VRAM page, and then scaled from there to the visible one. However, the route selection is unique in that its scaled text is both drawn transparently on top of the stage background (not onto a black one), and can also change colors depending on the selection. It would have been no problem to unblit and reblit the text by rendering the 1× version to a position on the invisible VRAM page that isn't covered by the 2× version on the visible one, but ZUN (needlessly) clears the invisible page before rendering any text. :zunpet: Instead, he assigned a separate VRAM color for both the 魔界 and 地獄 options, and only changed the palette value for these colors to white or gray, depending on the correct selection. This is another one of the 📝 rare cases where TH01 demonstrates good use of PC-98 hardware, as the 魔界へ and 地獄へ strings don't need to be reblitted during the selection process, only the Orb "cursor" does.

Then, why does this still not count as good-code? When changing palette colors, you kinda need to be aware of everything else that can possibly be on screen, which colors are used there, and which aren't and can therefore be used for such an effect without affecting other sprites. In this case, well… hover over the image below, and notice how Reimu's hair and the bomb sprites in the HUD light up when Makai is selected:

Demonstration of palette changes in TH01's route selection

This push did end on a high note though, with the generic, non-SinGyoku version of the defeat animation being an easily parametrizable copy. And that's how you decompile another 2.58% of TH01 in just slightly over three pushes.


Now, we're not only ready to decompile Sariel, but also Kikuri, Elis, and SinGyoku without needing any more detours into non-boss code. Thanks to the current TH01 funding subscriptions, I can plan to cover most, if not all, of Sariel in a single push series, but the currently 3 pending pushes probably won't suffice for Sariel's 8.10% of all remaining code in TH01. We've got quite a lot of not specifically TH01-related funds in the backlog to pass the time though.

Due to recent developments, it actually makes quite a lot of sense to take a break from TH01: spaztron64 has managed what every Touhou download site so far has failed to do: Bundling all 5 game onto a single .HDI together with pre-configured PC-98 emulators and a nice boot menu, and hosting the resulting package on a proper website. While this first release is already quite good (and much better than my attempt from 2014), there is still a bit of room for improvement to be gained from specific ReC98 research. Next up, therefore:

📝 Posted:
🚚 Summary of:
P0162, P0163, P0164
Commits:
81dd96e...24b3a0d, 24b3a0d...6d572b3, 6d572b3...7a0e5d8
💰 Funded by:
Ember2528, Yanga
🏷 Tags:
rec98+ th01+ gameplay- player+ shot+ animation+ glitch+ waste+ jank+ unused+

No technical obstacles for once! Just pure overcomplicated ZUN code. Unlike 📝 Konngara's main function, the main TH01 player function was every bit as difficult to decompile as you would expect from its size.

With TH01 using both separate left- and right-facing sprites for all of Reimu's moves and separate classes for Reimu's 32×32 and 48×* sprites, we're already off to a bad start. Sure, sprite mirroring is minimally more involved on PC-98, as the planar nature of VRAM requires the bits within an 8-pixel byte to also be mirrored, in addition to writing the sprite bytes from right to left. TH03 uses a 256-byte lookup table for this, generated at runtime by an infamous micro-optimized and undecompilable ASM algorithm. With TH01's existing architecture, ZUN would have then needed to write 3 additional blitting functions. But instead, he chose to waste a total of 26,112 bytes of memory on pre-mirrored sprites… :godzun:

Alright, but surely selecting those sprites from code is no big deal? Just store the direction Reimu is facing in, and then add some branches to the rendering code. And there is in fact a variable for Reimu's direction… during regular arrow-key movement, and another one while shooting and sliding, and a third as part of the special attack types, launched out of a slide.
Well, OK, technically, the last two are the same variable. But that's even worse, because it means that ZUN stores two distinct enums at the same place in memory: Shooting and sliding uses 1 for left, 2 for right, and 3 for the "invalid" direction of holding both, while the special attack types indicate the direction in their lowest bit, with 0 for right and 1 for left. I decompiled the latter as bitflags, but in ZUN's code, each of the 8 permutations is handled as a distinct type, with copy-pasted and adapted code… :zunpet: The interpretation of this two-enum "sub-mode" union variable is controlled by yet another "mode" variable… and unsurprisingly, two of the bugs in this function relate to the sub-mode variable being interpreted incorrectly.

Also, "rendering code"? This one big function basically consists of separate unblit→update→render code snippets for every state and direction Reimu can be in (moving, shooting, swinging, sliding, special-attacking, and bombing), pasted together into a tangled mess of nested if(…) statements. While a lot of the code is copy-pasted, there are still a number of inconsistencies that defeat the point of my usual refactoring treatment. After all, with a total of 85 conditional branches, anything more than I did would have just obscured the control flow too badly, making it even harder to understand what's going on.
In the end, I spotted a total of 8 bugs in this function, all of which leave Reimu invisible for one or more frames:

Thanks to the last one, Reimu's first swing animation frame is never actually rendered. So whenever someone complains about TH01 sprite flickering on an emulator: That emulator is accurate, it's the game that's poorly written. :tannedcirno:

And guess what, this function doesn't even contain everything you'd associate with per-frame player behavior. While it does handle Yin-Yang Orb repulsion as part of slides and special attacks, it does not handle the actual player/Orb collision that results in lives being lost. The funny thing about this: These two things are done in the same function… :onricdennat:

Therefore, the life loss animation is also part of another function. This is where we find the final glitch in this 3-push series: Before the 16-frame shake, this function only unblits a 32×32 area around Reimu's center point, even though it's possible to lose a life during the non-deflecting part of a 48×48-pixel animation. In that case, the extra pixels will just stay on screen during the shake. They are unblitted afterwards though, which suggests that ZUN was at least somewhat aware of the issue?
Finally, the chance to see the alternate life loss sprite Alternate TH01 life loss sprite is exactly ⅛.


As for any new insights into game mechanics… you know what? I'm just not going to write anything, and leave you with this flowchart instead. Here's the definitive guide on how to control Reimu in TH01 we've been waiting for 24 years:

(SVG download)

Pellets are deflected during all gray states. Not shown is the obvious "double-tap Z and X" transition from all non-(#1) states to the Bomb state, but that would have made this diagram even more unwieldy than it turned out. And yes, you can shoot twice as fast while moving left or right.

While I'm at it, here are two more animations from MIKO.PTN which aren't referenced by any code:

An unused animation from TH01's MIKO.PTNAn unused animation from TH01's MIKO.PTN

With that monster of a function taken care of, we've only got boss sprite animation as the final blocker of uninterrupted Sariel progress. Due to some unfavorable code layout in the Mima segment though, I'll need to spend a bit more time with some of the features used there. Next up: The missile bullets used in the Mima and YuugenMagan fights.

📝 Posted:
🚚 Summary of:
P0160, P0161
Commits:
e491cd7...42ba4a5, 42ba4a5...81dd96e
💰 Funded by:
Yanga, [Anonymous]
🏷 Tags:
rec98+ th01+ gameplay- resident+ bullet+ boss+ mima-th01+ animation+ waste+ glitch+ tcc+

Nothing really noteworthy in TH01's stage timer code, just yet another HUD element that is needlessly drawn into VRAM. Sure, ZUN applies his custom boldfacing effect on top of the glyphs retrieved from font ROM, but he could have easily installed those modified glyphs as gaiji.
Well, OK, halfwidth gaiji aren't exactly well documented, and sometimes not even correctly emulated 📝 due to the same PC-98 hardware oddity I was researching last month. I've reserved two of the pending anonymous "anything" pushes for the conclusion of this research, just in case you were wondering why the outstanding workload is now lower after the two delivered here.

And since it doesn't seem to be clearly documented elsewhere: Every 2 ticks on the stage timer correspond to 4 frames.


So, TH01 rank pellet speed. The resident pellet speed value is a factor ranging from a minimum of -0.375 up to a maximum of 0.5 (pixels per frame), multiplied with the difficulty-adjusted base speed for each pellet and added on top of that same speed. This multiplier is modified

Apparently, ZUN noted that these deltas couldn't be losslessly stored in an IEEE 754 floating-point variable, and therefore didn't store the pellet speed factor exactly in a way that would correspond to its gameplay effect. Instead, it's stored similar to Q12.4 subpixels: as a simple integer, pre-multiplied by 40. This results in a raw range of -15 to 20, which is what the undecompiled ASM calls still use. When spawning a new pellet, its base speed is first multiplied by that factor, and then divided by 40 again. This is actually quite smart: The calculation doesn't need to be aware of either Q12.4 or the 40× format, as ((Q12.4 * factor×40) / factor×40) still comes out as a Q12.4 subpixel even if all numbers are integers. The only limiting issue here would be the potential overflow of the 16-bit multiplication at unadjusted base speeds of more than 50 pixels per frame, but that'd be seriously unplayable.
So yeah, pellet speed modifications are indeed gradual, and don't just fall into the coarse three "high, normal, and low" categories.


That's ⅝ of P0160 done, and the continue and pause menus would make good candidates to fill up the remaining ⅜… except that it seemed impossible to figure out the correct compiler options for this code?
The issues centered around the two effects of Turbo C++ 4.0J's -O switch:

  1. Optimizing jump instructions: merging duplicate successive jumps into a single one, and merging duplicated instructions at the end of conditional branches into a single place under a single branch, which the other branches then jump to
  2. Compressing ADD SP and POP CX stack-clearing instructions after multiple successive CALLs to __cdecl functions into a single ADD SP with the combined parameter stack size of all function calls

But how can the ASM for these functions exhibit #1 but not #2? How can it be seemingly optimized and unoptimized at the same time? The only option that gets somewhat close would be -O- -y, which emits line number information into the .OBJ files for debugging. This combination provides its own kind of #1, but these functions clearly need the real deal.

The research into this issue ended up consuming a full push on its own. In the end, this solution turned out to be completely unrelated to compiler options, and instead came from the effects of a compiler bug in a totally different place. Initializing a local structure instance or array like

const uint4_t flash_colors[3] = { 3, 4, 5 };

always emits the { 3, 4, 5 } array into the program's data segment, and then generates a call to the internal SCOPY@ function which copies this data array to the local variable on the stack. And as soon as this SCOPY@ call is emitted, the -O optimization #1 is disabled for the entire rest of the translation unit?!
So, any code segment with an SCOPY@ call followed by __cdecl functions must strictly be decompiled from top to bottom, mirroring the original layout of translation units. That means no TH01 continue and pause menus before we haven't decompiled the bomb animation, which contains such an SCOPY@ call. 😕
Luckily, TH01 is the only game where this bug leads to significant restrictions in decompilation order, as later games predominantly use the pascal calling convention, in which each function itself clears its stack as part of its RET instruction.


What now, then? With 51% of REIIDEN.EXE decompiled, we're slowly running out of small features that can be decompiled within ⅜ of a push. Good that I haven't been looking a lot into OP.EXE and FUUIN.EXE, which pretty much only got easy pieces of code left to do. Maybe I'll end up finishing their decompilations entirely within these smaller gaps?
I still ended up finding one more small piece in REIIDEN.EXE though: The particle system, seen in the Mima fight.

I like how everything about this animation is contained within a single function that is called once per frame, but ZUN could have really consolidated the spawning code for new particles a bit. In Mima's fight, particles are only spawned from the top and right edges of the screen, but the function in fact contains unused code for all other 7 possible directions, written in quite a bloated manner. This wouldn't feel quite as unused if ZUN had used an angle parameter instead… :thonk: Also, why unnecessarily waste another 40 bytes of the BSS segment?

But wait, what's going on with the very first spawned particle that just stops near the bottom edge of the screen in the video above? Well, even in such a simple and self-contained function, ZUN managed to include an off-by-one error. This one then results in an out-of-bounds array access on the 80th frame, where the code attempts to spawn a 41st particle. If the first particle was unlucky to be both slow enough and spawned away far enough from the bottom and right edges, the spawning code will then kill it off before its unblitting code gets to run, leaving its pixel on the screen until something else overlaps it and causes it to be unblitted.
Which, during regular gameplay, will quickly happen with the Orb, all the pellets flying around, and your own player movement. Also, the RNG can easily spawn this particle at a position and velocity that causes it to leave the screen more quickly. Kind of impressive how ZUN laid out the structure of arrays in a way that ensured practically no effect of this bug on the game; this glitch could have easily happened every 80 frames instead. He almost got close to all bugs canceling out each other here! :godzun:

Next up: The player control functions, including the second-biggest function in all of PC-98 Touhou.

📝 Posted:
🚚 Summary of:
P0158, P0159
Commits:
bf7bb7e...c0c0ebc, c0c0ebc...e491cd7
💰 Funded by:
Yanga
🏷 Tags:
rec98+ th01+ gameplay- card-flipping+ rng+ score+ bomb+ animation+ jank+ waste+

Of course, Sariel's potentially bloated and copy-pasted code is blocked by even more definitely bloated and copy-pasted code. It's TH01, what did you expect? :tannedcirno:

But even then, TH01's item code is on a new level of software architecture ridiculousness. First, ZUN uses distinct arrays for both types of items, with their own caps of 4 for bomb items, and 10 for point items. Since that obviously makes any type-related switch statement redundant, he also used distinct functions for both types, with copy-pasted boilerplate code. The main per-item update and render function is shared though… and takes every single accessed member of the item structure as its own reference parameter. Like, why, you have a structure, right there?! That's one way to really practice the C++ language concept of passing arbitrary structure fields by mutable reference… :zunpet:
To complete the unwarranted grand generic design of this function, it calls back into per-type collision detection, drop, and collect functions with another three reference parameters. Yeah, why use C++ virtual methods when you can also implement the effectively same polymorphism functionality by hand? Oh, and the coordinate clamping code in one of these callbacks could only possibly have come from nested min() and max() preprocessor macros. And that's how you extend such dead-simple functionality to 1¼ pushes…

Amidst all this jank, we've at least got a sensible item↔player hitbox this time, with 24 pixels around Reimu's center point to the left and right, and extending from 24 pixels above Reimu down to the bottom of the playfield. It absolutely didn't look like that from the initial naive decompilation though. Changing entity coordinates from left/top to center was one of the better lessons from TH01 that ZUN implemented in later games, it really makes collision detection code much more intuitive to grasp.


The card flip code is where we find out some slightly more interesting aspects about item drops in this game, and how they're controlled by a hidden cycle variable:

Then again, score players largely ignore point items anyway, as card combos simply have a much bigger effect on the score. With this, I should have RE'd all information necessary to construct a tool-assisted score run, though?
Edit: Turns out that 1) point items are becoming increasingly important in score runs, and 2) Pearl already did a TAS some months ago. Thanks to spaztron64 for the info!

The Orb↔card hitbox also makes perfect sense, with 24 pixels around the center point of a card in every direction.

The rest of the code confirms the card flip score formula documented on Touhou Wiki, as well as the way cards are flipped by bombs: During every of the 90 "damaging" frames of the 140-frame bomb animation, there is a 75% chance to flip the card at the [bomb_frame % total_card_count_in_stage] array index. Since stages can only have up to 50 cards 📝 thanks to a bug, even a 75% chance is high enough to typically flip most cards during a bomb. Each of these flips still only removes a single card HP, just like after a regular collision with the Orb.
Also, why are the card score popups rendered before the cards themselves? That's two needless frames of flicker during that 25-frame animation. Not all too noticeable, but still.


And that's over 50% of REIIDEN.EXE decompiled as well! Next up: More HUD update and rendering code… with a direct dependency on rank pellet speed modifications?

📝 Posted:
🚚 Summary of:
P0153, P0154, P0155, P0156
Commits:
624e0cb...d05c9ba, d05c9ba...031b526, 031b526...9ad578e, 9ad578e...4bc6405
💰 Funded by:
Ember2528
🏷 Tags:
rec98+ th01+ gameplay- boss+ konngara+ danmaku-pattern+ waste+ rng+

📝 7 pushes to get Konngara done, according to my previous estimate? Well, how about being twice as fast, and getting the entire boss fight done in 3.5 pushes instead? So much copy-pasted code in there… without any flashy unused content, apart from four calculations with an unclear purpose. And the three strings "ANGEL", "OF", "DEATH", which were probably meant to be rendered using those giant upscaled font ROM glyphs that also display the STAGE # and HARRY UP strings? Those three strings are also part of Sariel's code, though.

On to the remaining 11 patterns then! Konngara's homing snakes, shown in the video above, are one of the more notorious parts of this battle. They occur in two patterns – one with two snakes and one with four – with all of the spawn, aim, update, and render code copy-pasted between the two. :zunpet: Three gameplay-related discoveries here:


This was followed by really weird aiming code for the "sprayed pellets from cup" pattern… which can only possibly have been done on purpose, but is sort of mitigated by the spraying motion anyway.
After a bunch of long if(…) {…} else if(…) {…} else if(…) {…} chains, which remain quite popular in certain corners of the game dev scene to this day, we've got the three sword slash patterns as the final notable ones. At first, it seemed as if ZUN just improvised those raw number constants involved in the pellet spawner's movement calculations to describe some sort of path that vaguely resembles the sword slash. But once I tried to express these numbers in terms of the slash animation's keyframes, it all worked out perfectly, and resulted in this:

Triangular path of the pellet spawner during Konngara's slash patterns

Yup, the spawner always takes an exact path along this triangle. Sometimes, I wonder whether I should just rush this project and don't bother about naming these repeated number literals. Then I gain insights like these, and it's all worth it.


Finally, we've got Konngara's main function, which coordinates the entire fight. Third-longest function in both TH01 and all of PC-98 Touhou, only behind some player-related stuff and YuugenMagan's gigantic main function… and it's even more of a copy-pasta, making it feel not nearly as long as it is. Key insights there:

Seriously, 📝 line drawing was much harder to decompile.


And that's it for Konngara! First boss with not a single piece of ASM left, 30 more to go! 🎉 But wait, what about the cause behind the temporary green discoloration after leaving the Pause menu? I expected to find something on that as well, but nope, it's nothing in Konngara's code segment. We'll probably only get to figure that out near the very end of TH01's decompilation, once we get to the one function that directly calls all of the boss-specific main functions in a switch statement.

So, Sariel next? With half of a push left, I did cover Sariel's first few initialization functions, but all the sprite unblitting and HUD manipulation will need some extra attention first. The first one of these functions is related to the HUD, the stage timer, and the HARRY UP mode, whose pellet pattern I've also decompiled now.

All of this brings us past 75% PI in all games, and TH01 to under 30,000 remaining ASM instructions, leaving TH03 as the now most expensive game to be completely decompiled. Looking forward to how much more TH01's code will fall apart if you just tap it lightly… Next up: The aforementioned helper functions related to HARRY UP, drawing the HUD, and unblitting the other bosses whose sprites are a bit more animated.

📝 Posted:
🚚 Summary of:
P0149, P0150, P0151, P0152
Commits:
e1a26bb...05e4c4a, 05e4c4a...768251d, 768251d...4d24ca5, 4d24ca5...81fc861
💰 Funded by:
Blue Bolt, Ember2528, -Tom-, [Anonymous]
🏷 Tags:
rec98+ th04+ th05+ gameplay- bullet+ animation+ score+ glitch+ jank+ waste+ micro-optimization+ tcc+ uth05win+

…or maybe not that soon, as it would have only wasted time to untangle the bullet update commits from the rest of the progress. So, here's all the bullet spawning code in TH04 and TH05 instead. I hope you're ready for this, there's a lot to talk about!

(For the sake of readability, "bullets" in this blog post refers to the white 8×8 pellets and all 16×16 bullets loaded from MIKO16.BFT, nothing else.)


But first, what was going on 📝 in 2020? Spent 4 pushes on the basic types and constants back then, still ended up confusing a couple of things, and even getting some wrong. Like how TH05's "bullet slowdown" flag actually always prevents slowdown and fires bullets at a constant speed instead. :tannedcirno: Or how "random spread" is not the best term to describe that unused bullet group type in TH04.
Or that there are two distinct ways of clearing all bullets on screen, which deserve different names:

Mechanic #1: Clearing bullets for a custom amount of time, awarding 1000 points for all bullets alive on the first frame, and 100 points for all bullets spawned during the clear time.
Mechanic #2: Zapping bullets for a fixed 16 frames, awarding a semi-exponential and loudly announced Bonus!! for all bullets alive on the first frame, and preventing new bullets from being spawned during those 16 frames. In TH04 at least; thanks to a ZUN bug, zapping got reduced to 1 frame and no animation in TH05…

Bullets are zapped at the end of most midboss and boss phases, and cleared everywhere else – most notably, during bombs, when losing a life, or as rewards for extends or a maximized Dream bonus. The Bonus!! points awarded for zapping bullets are calculated iteratively, so it's not trivial to give an exact formula for these. For a small number 𝑛 of bullets, it would exactly be 5𝑛³ - 10𝑛² + 15𝑛 points – or, using uth05win's (correct) recursive definition, Bonus(𝑛) = Bonus(𝑛-1) + 15𝑛² - 5𝑛 + 10. However, one of the internal step variables is capped at a different number of points for each difficulty (and game), after which the points only increase linearly. Hence, "semi-exponential".


On to TH04's bullet spawn code then, because that one can at least be decompiled. And immediately, we have to deal with a pointless distinction between regular bullets, with either a decelerating or constant velocity, and special bullets, with preset velocity changes during their lifetime. That preset has to be set somewhere, so why have separate functions? In TH04, this separation continues even down to the lowest level of functions, where values are written into the global bullet array. TH05 merges those two functions into one, but then goes too far and uses self-modifying code to save a grand total of two local variables… Luckily, the rest of its actual code is identical to TH04.

Most of the complexity in bullet spawning comes from the (thankfully shared) helper function that calculates the velocities of the individual bullets within a group. Both games handle each group type via a large switch statement, which is where TH04 shows off another Turbo C++ 4.0 optimization: If the range of case values is too sparse to be meaningfully expressed in a jump table, it usually generates a linear search through a second value table. But with the -G command-line option, it instead generates branching code for a binary search through the set of cases. 𝑂(log 𝑛) as the worst case for a switch statement in a C++ compiler from 1994… that's so cool. But still, why are the values in TH04's group type enum all over the place to begin with? :onricdennat:
Unfortunately, this optimization is pretty rare in PC-98 Touhou. It only shows up here and in a few places in TH02, compared to at least 50 switch value tables.

In all of its micro-optimized pointlessness, TH05's undecompilable version at least fixes some of TH04's redundancy. While it's still not even optimal, it's at least a decently written piece of ASM… if you take the time to understand what's going on there, because it certainly took quite a bit of that to verify that all of the things which looked like bugs or quirks were in fact correct. And that's how the code for this function ended up with 35% comments and blank lines before I could confidently call it "reverse-engineered"…
Oh well, at least it finally fixes a correctness issue from TH01 and TH04, where an invalid bullet group type would fill all remaining slots in the bullet array with identical versions of the first bullet.

Something that both games also share in these functions is an over-reliance on globals for return values or other local state. The most ridiculous example here: Tuning the speed of a bullet based on rank actually mutates the global bullet template… which ZUN then works around by adding a wrapper function around both regular and special bullet spawning, which saves the base speed before executing that function, and restores it afterward. :zunpet: Add another set of wrappers to bypass that exact tuning, and you've expanded your nice 1-function interface to 4 functions. Oh, and did I mention that TH04 pointlessly duplicates the first set of wrapper functions for 3 of the 4 difficulties, which can't even be explained with "debugging reasons"? That's 10 functions then… and probably explains why I've procrastinated this feature for so long.

At this point, I also finally stopped decompiling ZUN's original ASM just for the sake of it. All these small TH05 functions would look horribly unidiomatic, are identical to their decompiled TH04 counterparts anyway, except for some unique constant… and, in the case of TH05's rank-based speed tuning function, actually become undecompilable as soon as we want to return a C++ class to preserve the semantic meaning of the return value. Mainly, this is because Turbo C++ does not allow register pseudo-variables like _AX or _AL to be cast into class types, even if their size matches. Decompiling that function would have therefore lowered the quality of the rest of the decompiled code, in exchange for the additional maintenance and compile-time cost of another translation unit. Not worth it – and for a TH05 port, you'd already have to decompile all the rest of the bullet spawning code anyway!


The only thing in there that was still somewhat worth being decompiled was the pre-spawn clipping and collision detection function. Due to what's probably a micro-optimization mistake, the TH05 version continues to spawn a bullet even if it was spawned on top of the player. This might sound like it has a different effect on gameplay… until you realize that the player got hit in this case and will either lose a life or deathbomb, both of which will cause all on-screen bullets to be cleared anyway. So it's at most a visual glitch.

But while we're at it, can we please stop talking about hitboxes? At least in the context of TH04 and TH05 bullets. The actual collision detection is described way better as a kill delta of 8×8 pixels between the center points of the player and a bullet. You can distribute these pixels to any combination of bullet and player "hitboxes" that make up 8×8. 4×4 around both the player and bullets? 1×1 for bullets, and 8×8 for the player? All equally valid… or perhaps none of them, once you keep in mind that other entity types might have different kill deltas. With that in mind, the concept of a "hitbox" turns into just a confusing abstraction.

The same is true for the 36×44 graze box delta. For some reason, this one is not exactly around the center of a bullet, but shifted to the right by 2 pixels. So, a bullet can be grazed up to 20 pixels right of the player, but only up to 16 pixels left of the player. uth05win also spotted this… and rotated the deltas clockwise by 90°?!


Which brings us to the bullet updates… for which I still had to research a decompilation workaround, because 📝 P0148 turned out to not help at all? Instead, the solution was to lie to the compiler about the true segment distance of the popup function and declare its signature far rather than near. This allowed ZUN to save that ridiculous overhead of 1 additional far function call/return per frame, and those precious 2 bytes in the BSS segment that he didn't have to spend on a segment value. 📝 Another function that didn't have just a single declaration in a common header file… really, 📝 how were these games even built???

The function itself is among the longer ones in both games. It especially stands out in the indentation department, with 7 levels at its most indented point – and that's the minimum of what's possible without goto. Only two more notable discoveries there:

  1. Bullets are the only entity affected by Slow Mode. If the number of bullets on screen is ≥ (24 + (difficulty * 8) + rank) in TH04, or (42 + (difficulty * 8)) in TH05, Slow Mode reduces the frame rate by 33%, by waiting for one additional VSync event every two frames.
    The code also reveals a second tier, with 50% slowdown for a slightly higher number of bullets, but that conditional branch can never be executed :zunpet:
  2. Bullets must have been grazed in a previous frame before they can be collided with. (Note how this does not apply to bullets that spawned on top of the player, as explained earlier!)

Whew… When did ReC98 turn into a full-on code review?! 😅 And after all this, we're still not done with TH04 and TH05 bullets, with all the special movement types still missing. That should be less than one push though, once we get to it. Next up: Back to TH01 and Konngara! Now have fun rewriting the Touhou Wiki Gameplay pages 😛

📝 Posted:
🚚 Summary of:
P0140, P0141, P0142
Commits:
d985811...d856f7d, d856f7d...5afee78, 5afee78...08bc188
💰 Funded by:
[Anonymous], rosenrose, Yanga
🏷 Tags:
rec98+ th01+ pc98+ dosbox-x+ gameplay- boss+ konngara+ danmaku-pattern+

Alright, onto Konngara! Let's quickly move the escape sequences used later in the battle to C land, and then we can immediately decompile the loading and entrance animation function together with its filenames. Might as well reverse-engineer those escape sequences while I'm at it, though – even if they aren't implemented in DOSBox-X, they're well documented in all those Japanese PDFs, so this should be no big deal…

…wait, ESC )3 switches to "graph mode"? As opposed to the default "kanji mode", which can be re-entered via ESC )0? Let's look up graph mode in the PC-9801 Programmers' Bible then…

> Kanji cannot be handled in this mode.

…and that's apparently all it has to say. Why have it then, on a platform whose main selling point is a kanji ROM, and where Shift-JIS (and, well, 7-bit ASCII) are the only native encodings? No support for graph mode in DOSBox-X either… yeah, let's take a deep dive into NEC's IO.SYS, and get to the bottom of this.

And yes, graph mode pretty much just disables Shift-JIS decoding for characters written via INT 29h, the lowest-level way of "just printing a char" on DOS, which every printf() will ultimately end up calling. Turns out there is a use for it though, which we can spot by looking at the 8×16 half-width section of font ROM:

8×16 half-width section of font ROM, with the characters in the Shift-JIS lead byte range highlighted in red

The half-width glyphs marked in red correspond to the byte ranges from 0x80-0x9F and 0xE0-0xFF… which Shift-JIS defines as lead bytes for two-byte, full-width characters. But if we turn off Shift-JIS decoding…

Visible differences between the kanji and graph modes on PC-98 DOS
(Yes, that g in the function row is how NEC DOS indicates that graph mode is active. Try it yourself by pressing Ctrl+F4!)

Jackpot, we get those half-width characters when printing their corresponding bytes.
I've re-implemented all my findings into DOSBox-X, which will include graph mode in the upcoming 0.83.14 release. If P0140 looks a bit empty as a result, that's why – most of the immediate feature work went into DOSBox-X, not into ReC98. That's the beauty of "anything" pushes. :tannedcirno:

So, after switching to graph mode, TH01 does… one of the slowest possible memset()s over all of text RAM – one printf(" ") call for every single one of its 80×25 half-width cells – before switching back to kanji mode. What a waste of RE time…? Oh well, at least we've now got plenty of proof that these weird escape sequences actually do nothing of interest.


As for the Konngara code itself… well, it's script-like code, what can you say. Maybe minimally sloppy in some places, but ultimately harmless.
One small thing that might not be widely known though: The large, blue-green Siddhaṃ seed syllables are supposed to show up immediately, with no delay between them? Good to know. Clocking your emulator too low tends to roll them down from the top of the screen, and will certainly add a noticeable delay between the four individual images.

… Wait, but this means that ZUN could have intended this "effect". Why else would he not only put those syllables into four individual images (and therefore add at least the latency of disk I/O between them), but also show them on the foreground VRAM page, rather than on the "back buffer"?

Meanwhile, in 📝 another instance of "maybe having gone too far in a few places": Expressing distances on the playfield as fractions of its width and height, just to avoid absolute numbers? Raw numbers are bad because they're in screen space in this game. But we've already been throwing PLAYFIELD_ constants into the mix as a way of explicitly communicating screen space, and keeping raw number literals for the actual playfield coordinates is looking increasingly sloppy… I don't know, fractions really seemed like the most sensible thing to do with what we're given here. 😐


So, 2 pushes in, and we've got the loading code, the entrance animation, facial expression rendering, and the first one out of Konngara's 12 danmaku patterns. Might not sound like much, but since that first pattern involves those ◆ blue-green diamond sprites and therefore is one of the more complicated ones, it all amounts to roughly 21.6% of Konngara's code. That's 7 more pushes to get Konngara done, then? Next up though: Two pushes of website improvements.

📝 Posted:
🚚 Summary of:
P0130, P0131
Commits:
6d69ea8...576def5, 576def5...dc9e3ee
💰 Funded by:
Yanga
🏷 Tags:
rec98+ th01+ pc98+ blitting+ good-code+ jank+ gameplay- boss+ rng+

50% hype! 🎉 But as usual for TH01, even that final set of functions shared between all bosses had to consume two pushes rather than one…

First up, in the ongoing series "Things that TH01 draws to the PC-98 graphics layer that really should have been drawn to the text layer instead": The boss HP bar. Oh well, using the graphics layer at least made it possible to have this half-red, half-white pattern for the middle section.
This one pattern is drawn by making surprisingly good use of the GRCG. So far, we've only seen it used for fast monochrome drawing:

// Setting up fast drawing using color #9 (1001 in binary)
grcg_setmode(GC_RMW);
outportb(0x7E, 0xFF); // Plane 0: (B): (********)
outportb(0x7E, 0x00); // Plane 1: (R): (        )
outportb(0x7E, 0x00); // Plane 2: (G): (        )
outportb(0x7E, 0xFF); // Plane 3: (E): (********)

// Write a checkerboard pattern (* * * * ) in color #9 to the top-left corner,
// with transparent blanks. Requires only 1 VRAM write to a single bitplane:
// The GRCG automatically writes to the correct bitplanes, as specified above
*(uint8_t *)(MK_FP(0xA800, 0)) = 0xAA;

But since this is actually an 8-pixel tile register, we can set any 8-pixel pattern for any bitplane. This way, we can get different colors for every one of the 8 pixels, with still just a single VRAM write of the alpha mask to a single bitplane:

grcg_setmode(GC_RMW); //  Final color: (A7A7A7A7)
outportb(0x7E, 0x55); // Plane 0: (B): ( * * * *)
outportb(0x7E, 0xFF); // Plane 1: (R): (********)
outportb(0x7E, 0x55); // Plane 2: (G): ( * * * *)
outportb(0x7E, 0xAA); // Plane 3: (E): (* * * * )

And I thought TH01 only suffered the drawbacks of PC-98 hardware, making so little use of its actual features that it's perhaps not fair to even call it "a PC-98 game"… Still, I'd say that "bad PC-98 port of an idea" describes it best.

However, after that tiny flash of brilliance, the surrounding HP rendering code goes right back to being the typical sort of confusing TH01 jank. There's only a single function for the three distinct jobs of

with magic numbers to select between all of these.

VRAM of course also means that the backgrounds behind the individual hit points have to be stored, so that they can be unblitted later as the boss is losing HP. That's no big deal though, right? Just allocate some memory, copy what's initially in VRAM, then blit it back later using your foundational set of blitting funct– oh, wait, TH01 doesn't have this sort of thing, right :tannedcirno: The closest thing, 📝 once again, are the .PTN functions. And so, the game ends up handling these 8×16 background sprites with 16×16 wrappers around functions for 32×32 sprites. :zunpet: That's quite the recipe for confusion, especially since ZUN preferred copy-pasting the necessary ridiculous arithmetic expressions for calculating positions, .PTN sprite IDs, and the ID of the 16×16 quarter inside the 32×32 sprite, instead of just writing simple helper functions. He did manage to make the result mostly bug-free this time around, though! (Edit (2022-05-31): Nope, there's a 📝 potential heap corruption after all, which can be triggered in some fights in debug mode.) There's one minor hit point discoloration bug if the red-white or white sections start at an odd number of hit points, but that's never the case for any of the original 7 bosses.
The remaining sloppiness is ultimately inconsequential as well: The game always backs up twice the number of hit point backgrounds, and thus uses twice the amount of memory actually required. Also, this self-restriction of only unblitting 16×16 pixels at a time requires any remaining odd hit point at the last position to, of course, be rendered again :onricdennat:


After stumbling over the weakest imaginable random number generator, we finally arrive at the shared boss↔orb collision handling function, the final blocker among the final blockers. This function takes a whopping 12 parameters, 3 of them being references to int values, some of which are duplicated for every one of the 7 bosses, with no generic boss struct anywhere. 📝 Previously, I speculated that YuugenMagan might have been the first boss to be programmed for TH01. With all these variables though, there is some new evidence that SinGyoku might have been the first one after all: It's the only boss to use its own HP and phase frame variables, with the other bosses sharing the same two globals.

While this function only handles the response to a boss↔orb collision, it still does way too much to describe it briefly. Took me quite a while to frame it in terms of invincibility (which is the main impact of all of this that can be observed in gameplay code). That made at least some sort of sense, considering the other usages of the variables passed as references to that function. Turns out that YuugenMagan, Kikuri, and Elis abuse what's meant to be the "invincibility frame" variable as a frame counter for some of their animations 🙄
Oh well, the game at least doesn't call the collision handling function during those, so "invincibility frame" is technically still a correct variable name there.


And that's it! We're finally ready to start with Konngara, in 2021. I've been waiting quite a while for this, as all this high-level boss code is very likely to speed up TH01 progress quite a bit. Next up though: Closing out 2020 with more of the technical debt in the other games.

📝 Posted:
🚚 Summary of:
P0128, P0129
Commits:
dc65b59...dde36f7, dde36f7...f4c2e45
💰 Funded by:
Yanga
🏷 Tags:
rec98+ th01+ file-format+ gameplay- card-flipping+ waste+ hidden-content+ bug+

So, only one card-flipping function missing, and then we can start decompiling TH01's two final bosses? Unfortunately, that had to be the one big function that initializes and renders all gameplay objects. #17 on the list of longest functions in all of PC-98 Touhou, requiring two pushes to fully understand what's going on there… and then it immediately returns for all "boss" stages whose number is divisible by 5, yet is still called during Sariel's and Konngara's initialization 🤦

Oh well. This also involved the final file format we hadn't looked at yet – the STAGE?.DAT files that describe the layout for all stages within a single 5-stage scene. Which, for a change is a very well-designed form– no, of course it's completely weird, what did you expect? Development must have looked somewhat like this:

With all that, it's almost not worth mentioning how there are 12 turret types, which only differ in which hardcoded pellet group they fire at a hardcoded interval of either 100 or 200 frames, and that they're all explicitly spelled out in every single switch statement. Or how the layout of the internal card and obstacle SoA classes is quite disjointed. So here's the new ZUN bugs you've probably already been expecting!


Cards and obstacles are blitted to both VRAM pages. This way, any other entities moving on top of them can simply be unblitted by restoring pixels from VRAM page 1, without requiring the stationary objects to be redrawn from main memory. Obviously, the backgrounds behind the cards have to be stored somewhere, since the player can remove them. For faster transitions between stages of a scene, ZUN chose to store the backgrounds behind obstacles as well. This way, the background image really only needs to be blitted for the first stage in a scene.

All that memory for the object backgrounds adds up quite a bit though. ZUN actually made the correct choice here and picked a memory allocation function that can return more than the 64 KiB of a single x86 Real Mode segment. He then accesses the individual backgrounds via regular array subscripts… and that's where the bug lies, because he stores the returned address in a regular far pointer rather than a huge one. This way, the game still can only display a total of 102 objects (i. e., cards and obstacles combined) per stage, without any unblitting glitches.
What a shame, that limit could have been 127 if ZUN didn't needlessly allocate memory for alpha planes when backing up VRAM content. :onricdennat:

And since array subscripts on far pointers wrap around after 64 KiB, trying to save the background of the 103rd object is guaranteed to corrupt the memory block header at the beginning of the returned segment. :zunpet:. When TH01 runs in test mode, it correctly reports a corrupted heap in this case.


Finally, some unused content! Upon discovering TH01's debug mode, probably everyone tried to access Stage 21, just to see what happens, and indeed landed in an actual stage, with a black background and a weird color palette. Turns out that ZUN did ship an unused scene in SCENE7.DAT, which is exactly was loaded there.
Unfortunately, it's easy to believe that this is just garbage data (as I initially did): At the beginning of "Stage 22", the game seems to enter an infinite loop somewhere during the flip-in animation.

Well, we've had a heap overflow above, and the cause here is nothing but a stack buffer overflow – a perhaps more modern kind of classic C bug, given its prevalence in the Windows Touhou games. Explained in a few lines of code:

void stageobjs_init_and_render()
{
	int card_animation_frames[50]; // even though there can be up to 200?!
	int total_frames = 0;

	(code that would end up resetting total_frames if it ever tried to reset
	card_animation_frames[50]…)
}

The number of cards in "Stage 22"? 76. There you have it.

But of course, it's trivial to disable this animation and fix these stage transitions. So here they are, Stages 21 to 24, as shipped with the game in STAGE7.DAT:

TH01 stage 21, loaded from <code>STAGE7.DAT</code>TH01 stage 22, loaded from <code>STAGE7.DAT</code>TH01 stage 23, loaded from <code>STAGE7.DAT</code>TH01 stage 24, loaded from <code>STAGE7.DAT</code>

Wow, what a mess. All that was just a bit too much to be covered in two pushes… Next up, assuming the current subscriptions: Taking a vacation with one smaller TH01 push, covering some smaller functions here and there to ensure some uninterrupted Konngara progress later on.

📝 Posted:
🚚 Summary of:
P0122
Commits:
164591f...4406c3d
💰 Funded by:
Yanga
🏷 Tags:
rec98+ th01+ blitting+ waste+ jank+ gameplay- laser+

This time around, laser is 📝 actually not difficult, with TH01's shootout laser class being simple enough to nicely fit into a single push. All other stationary lasers (as used by YuugenMagan, for example) don't even use a class, and are simply treated as regular lines with collision detection.

But of course, the shootout lasers also come with the typical share of TH01 jank we've all come to expect by now. This time, it already starts with the hardcoded sprite data:

TH01 shootout laser 'sprites'

A shootout laser can have a width from 1 to 8 pixels, so ZUN stored a separate 16×1 sprite with a line for each possible width (left-to-right). Then, he shifted all of these sprites 1 pixel to the right for all of the 8 possible start positions within a planar VRAM byte (top-to-bottom). Because… doing that bit shift programmatically is way too expensive, so let's pre-shift at compile time, and use 16× the memory per sprite? :tannedcirno:

Since a bunch of other sprite sheets need to be pre-shifted as well (this is the 5th one we've found so far), our sprite converter has a feature to automatically generate those pre-shifted variations. This way, we can abstract away that implementation detail and leave modders with .BMP files that still only contain a single version of each sprite. But, uh…, wait, in this sprite sheet, the second row for 1-pixel lasers is accidentally shifted right by one more pixel that it should have been?! Which means that

  1. we can't use the auto-preshift feature here, and have to store this weird-looking (and quite frankly, completely unnecessary) sprite sheet in its entirety
  2. ZUN did, at least during TH01's development, not have a sprite converter, and directly hardcoded these dot patterns in the C++ code :zunpet:

The waste continues with the class itself. 69 bytes, with 22 bytes outright unused, and 11 not really necessary. As for actual innovations though, we've got 📝 another 32-bit fixed-point type, this time actually using 8 bits for the fractional part. Therefore, the ray position is tracked to the 1/256th of a pixel, using the full precision of master.lib's 8-bit sin() and cos() lookup tables.
Unblitting is also remarkably efficient: It's only done once the laser stopped extending and started moving, and only for the exact pixels at the start of the ray that the laser traveled by in a single frame. If only the ray part was also rendered as efficiently – it's fully blitted every frame, right next to the collision detection for each row of the ray.


With a public interface of two functions (spawn, and update / collide / unblit / render), that's superficially all there is to lasers in this game. There's another (apparently inlined) function though, to both reset and, uh, "fully unblit" all lasers at the end of every boss fight… except that it fails hilariously at doing the latter, and ends up effectively unblitting random 32-pixel line segments, due to ZUN confusing both the coordinates and the parameter types for the line unblitting function. :zunpet:
A while ago, I was asked about this crash that tends to happen when defeating Elis. And while you can clearly see the random unblitted line segments that are missing from the sprites, I don't quite think we've found the cause for the crash, since the 📝 line unblitting function used there does clip its coordinates to the VRAM range.

Next up: The final piece of image format code in TH01, covering Reimu's sprites!

📝 Posted:
🚚 Summary of:
P0111, P0112
Commits:
8b5c146...4ef4c9e, 4ef4c9e...e447a2d
💰 Funded by:
[Anonymous], Blue Bolt
🏷 Tags:
rec98+ th02+ th04+ th05+ gameplay- player+ bomb+ boss+ ex-alice+ animation+ glitch+ jank+

Only one newly ordered push since I've reopened the store? Great, that's all the justification I needed for the extended maintenance delay that was part of these two pushes 😛

Having to write comments to explain whether coordinates are relative to the top-left corner of the screen or the top-left corner of the playfield has finally become old. So, I introduced distinct types for all the coordinate systems we typically encounter, applying them to all code decompiled so far. Note how the planar nature of PC-98 VRAM meant that X and Y coordinates also had to be different from each other. On the X side, there's mainly the distinction between the [0; 640] screen space and the corresponding [0; 80] VRAM byte space. On the Y side, we also have the [0; 400] screen space, but the visible area of VRAM might be limited to [0; 200] when running in the PC-98's line-doubled 640×200 mode. A VRAM Y coordinate also always implies an added offset for vertical scrolling.
During all of the code reconstruction, these types can only have a documenting purpose. Turning them into anything more than just typedefs to int, in order to define conversion operators between them, simply won't recompile into identical binaries. Modding and porting projects, however, now have a nice foundation for doing just that, and can entirely lift coordinate system transformations into the type system, without having to proofread all the meaningless int declarations themselves.


So, what was left in terms of memory references? EX-Alice's fire waves were our final unknown entity that can collide with the player. Decently implemented, with little to say about them.

That left the bomb animation structures as the one big remaining PI blocker. They started out nice and simple in TH04, with a small 6-byte star animation structure used for both Reimu and Marisa. TH05, however, gave each character her own animation… and what the hell is going on with Reimu's blue stars there? Nope, not going to figure this out on ASM level.

A decompilation first required some more bomb-related variables to be named though. Since this was part of a generic RE push, it made sense to do this in all 5 games… which then led to nice PI gains in anything but TH05. :tannedcirno: Most notably, we now got the "pulling all items to player" flag in TH04 and TH05, which is actually separate from bombing. The obvious cheat mod is left as an exercise to the reader.


So, TH05 bomb animations. Just like the 📝 custom entity types of this game, all 4 characters share the same memory, with the superficially same 10-byte structure.
But let's just look at the very first field. Seen from a low level, it's a simple struct { int x, y; } pos, storing the current position of the character-specific bomb animation entity. But all 4 characters use this field differently:

Therefore, I decompiled it as 4 separate structures once again, bundled into an union of arrays.

As for Reimu… yup, that's some pointer arithmetic straight out of Jigoku* for setting and updating the positions of the falling star trails. :zunpet: While that certainly required several comments to wrap my head around the current array positions, the one "bug" in all this arithmetic luckily has no effect on the game.
There is a small glitch with the growing circles, though. They are spawned at the end of the loop, with their position taken from the star pointer… but after that pointer has already been incremented. On the last loop iteration, this leads to an out-of-bounds structure access, with the position taken from some unknown EX-Alice data, which is 0 during most of the game. If you look at the animation, you can easily spot these bugged circles, consistently growing from the top-left corner (0, 0) of the playfield:


After all that, there was barely enough remaining time to filter out and label the final few memory references. But now, TH05's MAIN.EXE is technically position-independent! 🎉 -Tom- is going to work on a pretty extensive demo of this unprecedented level of efficient Touhou game modding. For a more impactful effect of both the 100% PI mark and that demo, I'll be delaying the push covering the remaining false positives in that binary until that demo is done. I've accumulated a pretty huge backlog of minor maintenance issues by now…
Next up though: The first part of the long-awaited build system improvements. I've finally come up with a way of sanely accelerating the 32-bit build part on most setups you could possibly want to build ReC98 on, without making the building experience worse for the other few setups.

📝 Posted:
🚚 Summary of:
P0109
Commits:
dcf4e2c...2c7d86b
💰 Funded by:
[Anonymous], Blue Bolt
🏷 Tags:
rec98+ th04+ th05+ gameplay- bullet+ micro-optimization+ glitch+ uth05win+ tasm+

Back to TH05! Thanks to the good funding situation, I can strike a nice balance between getting TH05 position-independent as quickly as possible, and properly reverse-engineering some missing important parts of the game. Once 100% PI will get the attention of modders, the code will then be in better shape, and a bit more usable than if I just rushed that goal.

By now, I'm apparently also pretty spoiled by TH01's immediate decompilability, after having worked on that game for so long. Reverse-engineering in ASM land is pretty annoying, after all, since it basically boils down to meticulously editing a piece of ASM into something I can confidently call "reverse-engineered". Most of the time, simply decompiling that piece of code would take just a little bit longer, but be massively more useful. So, I immediately tried decompiling with TH05… and it just worked, at every place I tried!? Whatever the issue was that made 📝 segment splitting so annoying at my first attempt, I seem to have completely solved it in the meantime. 🤷 So yeah, backers can now request pretty much any part of TH04 and TH05 to be decompiled immediately, with no additional segment splitting cost.

(Protip for everyone interested in starting their own ReC project: Just declare one segment per function, right from the start, then group them together to restore the original code segmentation…)


Except that TH05 then just throws more of its infamous micro-optimized and undecompilable ASM at you. 🙄 This push covered the function that adjusts the bullet group template based on rank and the selected difficulty, called every time such a group is configured. Which, just like pretty much all of TH05's bullet spawning code, is one of those undecompilable functions. If C allowed labels of other functions as goto targets, it might have been decompilable into something useful to modders… maybe. But like this, there's no point in even trying.

This is such a terrible idea from a software architecture point of view, I can't even. Because now, you suddenly have to mirror your C++ declarations in ASM land, and keep them in sync with each other. I'm always happy when I get to delete an ASM declaration from the codebase once I've decompiled all the instances where it was referenced. But for TH05, we now have to keep those declarations around forever. 😕 And all that for a performance increase you probably couldn't even measure. Oh well, pulling off Galaxy Brain-level ASM optimizations is kind of fun if you don't have portability plans… I guess?

If I started a full fangame mod of a PC-98 Touhou game, I'd base it on TH04 rather than TH05, and backport selected features from TH05 as needed. Just because it was released later doesn't make it better, and this is by far not the only one of ZUN's micro-optimizations that just went way too far.

Dropping down to ASM also makes it easier to introduce weird quirks. Decompiled, one of TH05's tuning conditions for stack groups on Easy Mode would look something like:

case BP_STACK:
	// […]
	if(spread_angle_delta >= 2) {
		stack_bullet_count--;
	}

The fields of the bullet group template aren't typically reset when setting up a new group. So, spread_angle_delta in the context of a stack group effectively refers to "the delta angle of the last spread group that was fired before this stack – whenever that was". uth05win also spotted this quirk, considered it a bug, and wrote fanfiction by changing spread_angle_delta to stack_bullet_count.
As usual for functions that occur in more than one game, I also decompiled the TH04 bullet group tuning function, and it's perfectly sane, with no such quirks.


In the more PI-focused parts of this push, we got the TH05-exclusive smooth boss movement functions, for flying randomly or towards a given point. Pretty unspectacular for the most part, but we've got yet another uth05win inconsistency in the latter one. Once the Y coordinate gets close enough to the target point, it actually speeds up twice as much as the X coordinate would, whereas uth05win used the same speedup factors for both. This might make uth05win a couple of frames slower in all boss fights from Stage 3 on. Hard to measure though – and boss movement partly depends on RNG anyway.


Next up: Shinki's background animations – which are actually the single biggest source of position dependence left in TH05.

📝 Posted:
🚚 Summary of:
P0099, P0100, P0101, P0102
Commits:
1799d67...1b25830, 1b25830...ceb81db, ceb81db...c11a956, c11a956...b60f38d
💰 Funded by:
Ember2528, Yanga
🏷 Tags:
rec98+ th01+ gameplay- bullet+ jank+ contribution-ideas+ bug+ boss+ elis+ kikuri+ sariel+ konngara+

Well, make that three days. Trying to figure out all the details behind the sprite flickering was absolutely dreadful…
It started out easy enough, though. Unsurprisingly, TH01 had a quite limited pellet system compared to TH04 and TH05:

As expected from TH01, the code comes with its fair share of smaller, insignificant ZUN bugs and oversights. As you would also expect though, the sprite flickering points to the biggest and most consequential flaw in all of this.


Apparently, it started with ZUN getting the impression that it's only possible to use the PC-98 EGC for fast blitting of all 4 bitplanes in one CPU instruction if you blit 16 horizontal pixels (= 2 bytes) at a time. Consequently, he only wrote one function for EGC-accelerated sprite unblitting, which can only operate on a "grid" of 16×1 tiles in VRAM. But wait, pellets are not only just 8×8, but can also be placed at any unaligned X position…

… yet the game still insists on using this 16-dot-aligned function to unblit pellets, forcing itself into using a super sloppy 16×8 rectangle for the job. 🤦 ZUN then tried to mitigate the resulting flickering in two hilarious ways that just make it worse:

  1. An… "interlaced rendering" mode? This one's activated for all Stage 15 and 20 fights, and separates pellets into two halves that are rendered on alternating frames. Collision detection with the Yin-Yang Orb and the player is only done for the visible half, but collision detection with player shots is still done for all pellets every frame, as are motion updates – so that pellets don't end up moving half as fast as they should.
    So yeah, your eyes weren't deceiving you. The game does effectively drop its perceived frame rate in the Elis, Kikuri, Sariel, and Konngara fights, and it does so deliberately.
  2. 📝 Just like player shots, pellets are also unblitted, moved, and rendered in a single function. Thanks to the 16×8 rectangle, there's now the (completely unnecessary) possibility of accidentally unblitting parts of a sprite that was previously drawn into the 8 pixels right of a pellet. And this is where ZUN went full :tannedcirno: and went "oh, I know, let's test the entire 16 pixels, and in case we got an entity there, we simply make the pellet invisible for this frame! Then we don't even have to unblit it later!" :zunpet:

    Except that this is only done for the first 3 elements of the player shot array…?! Which don't even necessarily have to contain the 3 shots fired last. It's not done for the player sprite, the Orb, or, heck, other pellets that come earlier in the pellet array. (At least we avoided going 𝑂(𝑛²) there?)

    Actually, and I'm only realizing this now as I type this blog post: This test is done even if the shots at those array elements aren't active. So, pellets tend to be made invisible based on comparisons with garbage data. :onricdennat:

    And then you notice that the player shot unblit​/​move​/​render function is actually only ever called from the pellet unblit​/​move​/​render function on the one global instance of the player shot manager class, after pellets were unblitted. So, we end up with a sequence of

    Pellet unblit → Pellet move → Shot unblit → Shot move → Shot render → Pellet render

    which means that we can't ever unblit a previously rendered shot with a pellet. Sure, as terrible as this one function call is from a software architecture perspective, it was enough to fix this issue. Yet we don't even get the intended positive effect, and walk away with pellets that are made temporarily invisible for no reason at all. So, uh, maybe it all just was an attempt at increasing the ramerate on lower spec PC-98 models?

Yup, that's it, we've found the most stupid piece of code in this game, period. It'll be hard to top this.


I'm confident that it's possible to turn TH01 into a well-written, fluid PC-98 game, with no flickering, and no perceived lag, once it's position-independent. With some more in-depth knowledge and documentation on the EGC (remember, there's still 📝 this one TH03 push waiting to be funded), you might even be able to continue using that piece of blitter hardware. And no, you certainly won't need ASM micro-optimizations – just a bit of knowledge about which optimizations Turbo C++ does on its own, and what you'd have to improve in your own code. It'd be very hard to write worse code than what you find in TH01 itself.

(Godbolt for Turbo C++ 4.0J when? Seriously though, that would 📝 also be a great project for outside contributors!)


Oh well. In contrast to TH04 and TH05, where 4 pushes only covered all the involved data types, they were enough to completely cover all of the pellet code in TH01. Everything's already decompiled, and we never have to look at it again. 😌 And with that, TH01 has also gone from by far the least RE'd to the most RE'd game within ReC98, in just half a year! 🎉
Still, that was enough TH01 game logic for a while. :tannedcirno: Next up: Making up for the delay with some more relaxing and easy pieces of TH01 code, that hopefully make just a bit more sense than all this garbage. More image formats, mainly.

📝 Posted:
🚚 Summary of:
P0096, P0097, P0098
Commits:
8ddb778...8283c5e, 8283c5e...600f036, 600f036...ad06748
💰 Funded by:
Ember2528, Yanga
🏷 Tags:
rec98+ th01+ file-format+ pc98+ blitting+ gameplay- player+ shot+ jank+ mod+ tcc+

So, let's finally look at some TH01 gameplay structures! The obvious choices here are player shots and pellets, which are conveniently located in the last code segment. Covering these would therefore also help in transferring some first bits of data in REIIDEN.EXE from ASM land to C land. (Splitting the data segment would still be quite annoying.) Player shots are immediately at the beginning…

…but wait, these are drawn as transparent sprites loaded from .PTN files. Guess we first have to spend a push on 📝 Part 2 of this format.
Hm, 4 functions for alpha-masked blitting and unblitting of both 16×16 and 32×32 .PTN sprites that align the X coordinate to a multiple of 8 (remember, the PC-98 uses a planar VRAM memory layout, where 8 pixels correspond to a byte), but only one function that supports unaligned blitting to any X coordinate, and only for 16×16 sprites? Which is only called twice? And doesn't come with a corresponding unblitting function? :thonk:

Yeah, "unblitting". TH01 isn't double-buffered, and uses the PC-98's second VRAM page exclusively to store a stage's background and static sprites. Since the PC-98 has no hardware sprites, all you can do is write pixels into VRAM, and any animated sprite needs to be manually removed from VRAM at the beginning of each frame. Not using double-buffering theoretically allows TH01 to simply copy back all 128 KB of VRAM once per frame to do this. :tannedcirno: But that would be pretty wasteful, so TH01 just looks at all animated sprites, and selectively copies only their occupied pixels from the second to the first VRAM page.


Alright, player shot class methods… oh, wait, the collision functions directly act on the Yin-Yang Orb, so we first have to spend a push on that one. And that's where the impression we got from the .PTN functions is confirmed: The orb is, in fact, only ever displayed at byte-aligned X coordinates, divisible by 8. It's only thanks to the constant spinning that its movement appears at least somewhat smooth.
This is purely a rendering issue; internally, its position is tracked at pixel precision. Sadly, smooth orb rendering at any unaligned X coordinate wouldn't be that trivial of a mod, because well, the necessary functions for unaligned blitting and unblitting of 32×32 sprites don't exist in TH01's code. Then again, there's so much potential for optimization in this code, so it might be very possible to squeeze those additional two functions into the same C++ translation unit, even without position independence…

More importantly though, this was the right time to decompile the core functions controlling the orb physics – probably the highlight in these three pushes for most people.
Well, "physics". The X velocity is restricted to the 5 discrete states of -8, -4, 0, 4, and 8, and gravity is applied by simply adding 1 to the Y velocity every 5 frames :zunpet: No wonder that this can easily lead to situations in which the orb infinitely bounces from the ground.
At least fangame authors now have a reference of how ZUN did it originally, because really, this bad approximation of physics had to have been written that way on purpose. But hey, it uses 64-bit floating-point variables! :onricdennat:

…sometimes at least, and quite randomly. This was also where I had to learn about Turbo C++'s floating-point code generation, and how rigorously it defines the order of instructions when mixing double and float variables in arithmetic or conditional expressions. This meant that I could only get ZUN's original instruction order by using literal constants instead of variables, which is impossible right now without somehow splitting the data segment. In the end, I had to resort to spelling out ⅔ of one function, and one conditional branch of another, in inline ASM. 😕 If ZUN had just written 16.0 instead of 16.0f there, I would have saved quite some hours of my life trying to decompile this correctly…

To sort of make up for the slowdown in progress, here's the TH01 orb physics debug mod I made to properly understand them: 2020-06-13-TH01OrbPhysicsDebug.zip To use it, simply replace REIIDEN.EXE, and run the game in debug mode, via game d on the DOS prompt.
Its code might also serve as an example of how to achieve this sort of thing without position independence.

Screenshot of the TH01 orb physics debug mod

Alright, now it's time for player shots though. Yeah, sure, they don't move horizontally, so it's not too bad that those are also always rendered at byte-aligned positions. But, uh… why does this code only use the 16×16 alpha-masked unblitting function for decaying shots, and just sloppily unblits an entire 16×16 square everywhere else?

The worst part though: Unblitting, moving, and rendering player shots is done in a single function, in that order. And that's exactly where TH01's sprite flickering comes from. Since different types of sprites are free to overlap each other, you'd have to first unblit all types, then move all types, and then render all types, as done in later PC-98 Touhou games. If you do these three steps per-type instead, you will unblit sprites of other types that have been rendered before… and therefore end up with flicker.
Oh, and finally, ZUN also added an additional sloppy 16×16 square unblit call if a shot collides with a pellet or a boss, for some guaranteed flicker. Sigh.


And that's ⅓ of all ZUN code in TH01 decompiled! Next up: Pellets!

📝 Posted:
🚚 Summary of:
P0088, P0089
Commits:
97ce7b7...da6b856, da6b856...90252cc
💰 Funded by:
-Tom-, [Anonymous], Blue Bolt
🏷 Tags:
rec98+ th02+ th04+ th05+ gameplay- enemy+ hud+ animation+ portability+

As expected, we've now got the TH04 and TH05 stage enemy structure, finishing position independence for all big entity types. This one was quite straightfoward, as the .STD scripting system is pretty simple.

Its most interesting aspect can be found in the way timing is handled. In Windows Touhou, all .ECL script instructions come with a frame field that defines when they are executed. In TH04's and TH05's .STD scripts, on the other hand, it's up to each individual instruction to add a frame time parameter, anywhere in its parameter list. This frame time defines for how long this instruction should be repeatedly executed, before it manually advances the instruction pointer to the next one. From what I've seen so far, these instruction typically apply their effect on the first frame they run on, and then do nothing for the remaining frames.
Oh, and you can't nest the LOOP instruction, since the enemy structure only stores one single counter for the current loop iteration.

Just from the structure, the only innovation introduced by TH05 seems to have been enemy subtypes. These can be used to parametrize scripts via conditional jumps based on this value, as a first attempt at cutting down the need to duplicate entire scripts for similar enemy behavior. And thanks to TH05's favorable segment layout, this game's version of the .STD enemy script interpreter is even immediately ready for decompilation, in one single future push.

As far as I can tell, that now only leaves

until TH05's MAIN.EXE is completely position-independent.
Which, however, won't be all it needs for that 100% PI rating on the front page. And with that many false positives, it's quite easy to get lost with immediately reverse-engineering everything around them. This time, the rendering of the text dissolve circles, used for the stage and BGM title popups, caught my eye… and since the high-level code to handle all of that was near the end of a segment in both TH04 and TH05, I just decided to immediately decompile it all. Like, how hard could it possibly be? Sure, it needed another segment split, which was a bit harder due to all the existing ASM referencing code in that segment, but certainly not impossible…

Oh wait, this code depends on 9 other sets of identifiers that haven't been declared in C land before, some of which require vast reorganizations to bring them up to current consistency standards. Whoops! Good thing that this is the part of the project I'm still offering for free…
Among the referenced functions was tiles_invalidate_around(), which marks the stage background tiles within a rectangular area to be redrawn this frame. And this one must have had the hardest function signature to figure out in all of PC-98 Touhou, because it actually seems impossible. Looking at all the ways the game passes the center coordinate to this function, we have

  1. X and Y as 16-bit integer literals, merged into a single PUSH of a 32-bit immediate
  2. X and Y calculated and pushed independently from each other
  3. by-value copies of entire Point instances

Any single declaration would only lead to at most two of the three cases generating the original instructions. No way around separately declaring the function in every translation unit then, with the correct parameter list for the respective calls. That's how ZUN must have also written it.

Oh well, we would have needed to do all of this some time. At least there were quite a bit of insights to be gained from the actual decompilation, where using const references actually made it possible to turn quite a number of potentially ugly macros into wholesome inline functions.

But still, TH04 and TH05 will come out of ReC98's decompilation as one big mess. A lot of further manual decompilation and refactoring, beyond the limits of the original binary, would be needed to make these games portable to any non-PC-98, non-x86 architecture.
And yes, that includes IBM-compatible DOS – which, for some reason, a number of people see as the obvious choice for a first system to port PC-98 Touhou to. This will barely be easier. Sure, you'll save the effort of decompiling all the remaining original ASM. But even with master.lib's MASTER_DOSV setting, these games still very much rely on PC-98 hardware, with corresponding assumptions all over ZUN's code. You will need to provide abstractions for the PC-98's superimposed text mode, the gaiji, and planar 4-bit color access in general, exchanging the use of the PC-98's GRCG and EGC blitter chips with something else. At that point, you might as well port the game to one generic 640×400 framebuffer and away from the constraints of DOS, resulting in that Doom source code-like situation which made that game easily portable to every architecture to begin with. But ZUN just wasn't a John Carmack, sorry.

Or what do I know. I've never programmed for IBM-compatible DOS, but maybe ReC98's audience does include someone who is intimately familiar with IBM-compatible DOS so that the constraints aren't much of an issue for them? But even then, 16-bit Windows would make much more sense as a first porting target if you don't want to bother with that undecompilable ASM.

At least I won't have to look at TH04 and TH05 for quite a while now. :tannedcirno: The delivery delays have made it obvious that my life has become pretty busy again, probably until September. With a total of 9 TH01 pushes from monthly subscriptions now waiting in the backlog, the shop will stay closed until I've caught up with most of these. Which I'm quite hyped for!

📝 Posted:
🚚 Summary of:
P0086, P0087
Commits:
54ee99b...24b96cd, 24b96cd...97ce7b7
💰 Funded by:
[Anonymous], Blue Bolt, -Tom-
🏷 Tags:
rec98+ th02+ th04+ th05+ gameplay- animation+ hud+ blitting+ boss+ yuuka-6+

Alright, the score popup numbers shown when collecting items or defeating (mid)bosses. The second-to-last remaining big entity type in TH05… with quite some PI false positives in the memory range occupied by its data. Good thing I still got some outstanding generic RE pushes that haven't been claimed for anything more specific in over a month! These conveniently allowed me to RE most of these functions right away, the right way.

Most of the false positives were boss HP values, passed to a "boss phase end" function which sets the HP value at which the next phase should end. Stage 6 Yuuka, Mugetsu, and EX-Alice have their own copies of this function, in which they also reset certain boss-specific global variables. Since I always like to cover all varieties of such duplicated functions at once, it made sense to reverse-engineer all the involved variables while I was at it… and that's why this was exactly the right time to cover the implementation details of Stage 6 Yuuka's parasol and vanishing animations in TH04. :zunpet:

With still a bit of time left in that RE push afterwards, I could also start looking into some of the smaller functions that didn't quite fit into other pushes. The most notable one there was a simple function that aims from any point to the current player position. Which actually only became a separate function in TH05, probably since it's called 27 times in total. That's 27 places no longer being blocked from further RE progress.

WindowsTiger already did most of the work for the score popup numbers in January, which meant that I only had to review it and bring it up to ReC98's current coding styles and standards. This one turned out to be one of those rare features whose TH05 implementation is significantly less insane than the TH04 one. Both games lazily redraw only the tiles of the stage background that were drawn over in the previous frame, and try their best to minimize the amount of tiles to be redrawn in this way. For these popup numbers, this involves calculating the on-screen width, based on the exact number of digits in the point value. TH04 calculates this width every frame during the rendering function, and even resorts to setting that field through the digit iteration pointer via self-modifying code… yup. TH05, on the other hand, simply calculates the width once when spawning a new popup number, during the conversion of the point value to binary-coded decimal. The "×2" multiplier suffix being removed in TH05 certainly also helped in simplifying that feature in this game.

And that's ⅓ of TH05 reverse-engineered! Next up, one more TH05 PI push, in which the stage enemies hopefully finish all the big entity types. Maybe it will also be accompanied by another RE push? In any case, that will be the last piece of TH05 progress for quite some time. The next TH01 stretch will consist of 6 pushes at the very least, and I currently have no idea of how much time I can spend on ReC98 a month from now…

📝 Posted:
🚚 Summary of:
P0078, P0079
Commits:
f4eb7a8...9e52cb1, 9e52cb1...cd48aa3
💰 Funded by:
iruleatgames, -Tom-
🏷 Tags:
rec98+ th02+ th04+ th05+ gameplay- animation+ bullet+ boss+ alice+ mai+ yuki+ yumeko+ shinki+ ex-alice+ uth05win+

To finish this TH05 stretch, we've got a feature that's exclusive to TH05 for once! As the final memory management innovation in PC-98 Touhou, TH05 provides a single static (64 * 26)-byte array for storing up to 64 entities of a custom type, specific to a stage or boss portion. The game uses this array for

  1. the Stage 2 star particles,
  2. Alice's puppets,
  3. the tip of curve ("jello") bullets,
  4. Mai's snowballs and Yuki's fireballs,
  5. Yumeko's swords,
  6. and Shinki's 32×32 bullets,

which makes sense, given that only one of those will be active at any given time.

On the surface, they all appear to share the same 26-byte structure, with consistently sized fields, merely using its 5 generic fields for different purposes. Looking closer though, there actually are differences in the signedness of certain fields across the six types. uth05win chose to declare them as entirely separate structures, and given all the semantic differences (pixels vs. subpixels, regular vs. tiny master.lib sprites, …), it made sense to do the same in ReC98. It quickly turned out to be the only solution to meet my own standards of code readability.

Which blew this one up to two pushes once again… But now, modders can trivially resize any of those structures without affecting the other types within the original (64 * 26)-byte boundary, even without full position independence. While you'd still have to reduce the type-specific number of distinct entities if you made any structure larger, you could also have more entities with fewer structure members.

As for the types themselves, they're full of redundancy once again – as you might have already expected from seeing #4, #5, and #6 listed as unrelated to each other. Those could have indeed been merged into a single 32×32 bullet type, supporting all the unique properties of #4 (destructible, with optional revenge bullets), #5 (optional number of twirl animation frames before they begin to move) and #6 (delay clouds). The *_add(), *_update(), and *_render() functions of #5 and #6 could even already be completely reverse-engineered from just applying the structure onto the ASM, with the ones of #3 and #4 only needing one more RE push.

But perhaps the most interesting discovery here is in the curve bullets: TH05 only renders every second one of the 17 nodes in a curve bullet, yet hit-tests every single one of them. In practice, this is an acceptable optimization though – you only start to notice jagged edges and gaps between the fragments once their speed exceeds roughly 11 pixels per second:

And that brings us to the last 20% of TH05 position independence! But first, we'll have more cheap and fast TH01 progress.

📝 Posted:
🚚 Summary of:
P0072, P0073, P0074, P0075
Commits:
4bb04ab...cea3ea6, cea3ea6...5286417, 5286417...1807906, 1807906...222fc99
💰 Funded by:
[Anonymous], -Tom-, Myles
🏷 Tags:
rec98+ th04+ th05+ gameplay- blitting+ bullet+ micro-optimization+ uth05win+

Long time no see! And this is exactly why I've been procrastinating bullets while there was still meaningful progress to be had in other parts of TH04 and TH05: There was bound to be quite some complexity in this most central piece of game logic, and so I couldn't possibly get to a satisfying understanding in just one push.

Or in two, because their rendering involves another bunch of micro-optimized functions adapted from master.lib.

Or in three, because we'd like to actually name all the bullet sprites, since there are a number of sprite ID-related conditional branches. And so, I was refining things I supposedly RE'd in the the commits from the first push until the very end of the fourth.

When we talk about "bullets" in TH04 and TH05, we mean just two things: the white 8×8 pellets, with a cap of 240 in TH04 and 180 in TH05, and any 16×16 sprites from MIKO16.BFT, with a cap of 200 in TH04 and 220 in TH05. These are by far the most common types of… err, "things the player can collide with", and so ZUN provides a whole bunch of pre-made motion, animation, and n-way spread / ring / stack group options for those, which can be selected by simply setting a few fields in the bullet template. All the other "non-bullets" have to be fired and controlled individually.

Which is nothing new, since uth05win covered this part pretty accurately – I don't think anyone could just make up these structure member overloads. The interesting insights here all come from applying this research to TH04, and figuring out its differences compared to TH05. The most notable one there is in the default groups: TH05 allows you to add a stack to any single bullet, n-way spread or ring, but TH04 only lets you create stacks separately from n-way spreads and rings, and thus gets by with fewer fields in its bullet template structure. On the other hand, TH04 has a separate "n-way spread with random angles, yet still aimed at the player" group? Which seems to be unused, at least as far as midbosses and bosses are concerned; can't say anything about stage enemies yet.

In fact, TH05's larger bullet template structure illustrates that these distinct group types actually are a rather redundant piece of over-engineering. You can perfectly indicate any permutation of the basic groups through just the stack bullet count (1 = no stack), spread bullet count (1 = no spread), and spread delta angle (0 = ring instead of spread). Add a 4-flag bitfield to cover the rest (aim to player, randomize angle, randomize speed, force single bullet regardless of difficulty or rank), and the result would be less redundant and even slightly more capable.

Even those 4 pushes didn't quite finish all of the bullet-related types, stopping just shy of the most trivial and consistent enum that defines special movement. This also left us in a 📝 TH03-like situation, in which we're still a bit away from actually converting all this research into actual RE%. Oh well, at least this got us way past 50% in overall position independence. On to the second half! 🎉

For the next push though, we'll first have a quick detour to the remaining C code of all the ZUN.COM binaries. Now that the 📝 TH04 and TH05 resident structures no longer block those, -Tom- has requested TH05's RES_KSO.COM to be covered in one of his outstanding pushes. And since 32th System recently RE'd TH03's resident structure, it makes sense to also review and merge that, before decompiling all three remaining RES_*.COM binaries in hopefully a single push. It might even get done faster than that, in which case I'll then review and merge some more of WindowsTiger's research.

📝 Posted:
🚚 Summary of:
P0071
Commits:
327021b...4bb04ab
💰 Funded by:
KirbyComment, -Tom-
🏷 Tags:
rec98+ th03+ gameplay- player+

Turns out that covering TH03's 128-byte player structure was way more insightful than expected! And while it doesn't include every bit of per-player data, we still got to know quite a bit about the game from just trying to name its members:

The next TH03 pushes can now cover all the functions that reference this structure in one way or another, and actually commit all this research and translate it into some RE%. Since the non-TH05 priorities have become a bit unclear after the last 50 € RE contribution though (as of this writing, it's still 10 € to decide on what game to cover in two RE pushes!), I'll be returning to TH05 until that's decided.

📝 Posted:
🚚 Summary of:
P0070
Commits:
a931758...327021b
💰 Funded by:
KirbyComment
🏷 Tags:
rec98+ th03+ gameplay- score+

As noted in 📝 P0061, TH03 gameplay RE is indeed going to progress very slowly in the beginning. A lot of the initial progress won't even be reflected in the RE% – there are just so many features in this game that are intertwined into each other, and I only consider functions to be "reverse-engineered" once we understand every involved piece of code and data, and labeled every absolute memory reference in it. (Yes, that means that the percentages on the front page are actually underselling ReC98's progress quite a bit, and reflect a pretty low bound of our actual understanding of the games.)

So, when I get asked to look directly at gameplay code right now, it's quite the struggle to find a place that can be covered within a push or two and that would immediately benefit scoreplayers. The basics of score and combo handling themselves managed to fit in pretty well, though:

Oh well, at least we still got a bit of PI% out of this one. From this point though, the next push (or two) should be enough to cover the big 128-byte player structure – which by itself might not be immediately interesting to scoreplayers, but surely is quite a blocker for everything else.

📝 Posted:
🚚 Summary of:
P0062
Commits:
1d6fbb8...f275e04
💰 Funded by:
Touhou Patch Center
🏷 Tags:
rec98+ th04+ th05+ gameplay- player+ shot+ position-independence+ animation+

Big gains, as expected, but not much to say about this one. With TH05 Reimu being way too easy to decompile after 📝 the shot control groundwork done in October, there was enough time to give the comprehensive PI false-positive treatment to two other sets of functions present in TH04's and TH05's OP.EXE. One of them, master.lib's super_*() functions, was used a lot in TH02, more than in any other game… I wonder how much more that game will progress without even focusing on it in particular.

Alright then! 100% PI for TH04's and TH05's OP.EXE upcoming… (Edit: Already got funding to cover this!)

📝 Posted:
🚚 Summary of:
P0061
Commits:
96684f4...5f4f5d8
💰 Funded by:
Touhou Patch Center
🏷 Tags:
rec98+ th03+ gameplay- player+ shot+ animation+

… nope, with a game whose MAIN.EXE is still just 5% reverse-engineered and which naturally makes heavy use of structures, there's still a lot more PI groundwork to be done before RE progress can speed up to the levels that we've now reached with TH05. The good news is that this game is (now) way easier to understand: In contrast to TH04 and TH05, where we needed to work towards player shots over a two-digit number of pushes, TH03 only needed two for SPRITE16, and a half one for the playfield shaking mechanism. After that, I could even already decompile the per-frame shot update and render functions, thanks to TH03's high number of code segments. Now, even the big 128-byte player structure doesn't seem all too far off.

Then again, as TH03 shares no code with any other game, this actually was a completely average PI push. For the remaining three, we'll return to TH04 and TH05 though, which should more than make up for the slight drop in RE speed after this one.

In other news, we've now also reached peak C++, with the introduction of templates! TH03 stores movement speeds in a 4.4 fixed-point format, which is an 8-bit spin on the usual 16-bit, 12.4 fixed-point format.

📝 Posted:
🚚 Summary of:
P0057, P0058
Commits:
1cb9731...ac7540d, ac7540d...fef0299
💰 Funded by:
[Anonymous], -Tom-
🏷 Tags:
rec98+ th04+ th05+ gameplay- item+ animation+ uth05win+ meta+

So, here we have the first two pushes with an explicit focus on position independence… and they start out looking barely different from regular reverse-engineering? They even already deduplicate a bunch of item-related code, which was simple enough that it required little additional work? Because the actual work, once again, was in comparing uth05win's interpretations and naming choices with the original PC-98 code? So that we only ended up removing a handful of memory references there?

(Oh well, you can mod item drops now!)

So, continuing to interpret PI as a mere by-product of reverse-engineering might ultimately drive up the total PI cost quite a bit. But alright then, let's systematically clear out some false positives by looking at master.lib function calls instead… and suddenly we get the PI progress we were looking for, nicely spread out over all games since TH02. That kinda makes it sound like useless work, only done because it's dictated by some counting algorithm on a website. But decompilation will want to convert all of these values to decimal anyway. We're merely doing that right now, across all games.

Then again, it doesn't actually make any game more position-independent, and only proves how position-independent it already was. So I'm really wondering right now whether I should just rush actual position independence by simply identifying structures and their sizes, and not bother with members or false positives until that's done. That would certainly get the job done for TH04 and TH05 in just a few more pushes, but then leave all the proving work (and the road to 100% PI on the front page) to reverse-engineering.

I don't know. Would it be worth it to have a game that's „maybe fully position-independent“, only for there to maybe be rare edge cases where it isn't?

Or maybe, continuing to strike a balance between identifying false positives (fast) and reverse-engineering structures (slow) will continue to work out like it did now, and make us end up close to the current estimate, which was attractive enough to sell out the crowdfunding for the first time… 🤔

Please give feedback! If possible, by Friday evening UTC+1, before I start working on the next PI push, this time with a focus on TH04.

📝 Posted:
🚚 Summary of:
P0036, P0037
Commits:
a533b5d...82b0e1d, 82b0e1d...e7e1cbc
💰 Funded by:
zorg
🏷 Tags:
rec98+ th04+ th05+ gameplay- player+ shot+ tcc+

And just in time for zorg's last outstanding pushes, the TH05 shot type control functions made the speedup happen!

It would have been really nice to also include Reimu's shot control functions in this last push, but figuring out this entire system, with its weird bitflags and switch statement micro-optimizations, was once again taking way longer than it should have. Especially with my new-found insistence on turning this obvious copy-pasta into something somewhat readable and terse…

But with such a rather tabular visual structure, things should now be moddable in hopefully easily consistent way. Of course, since we're only at 54% position independence for MAIN.EXE, this isn't possible yet without crashing the game, but modifiying damage would already work.

Despite my earlier claims of ZUN only having used C++ in TH01, as it's the only game using new and delete, it's now pretty much confirmed that ZUN used it for all games, as inlined functions (and by extension, C++ class methods) are the only way to get certain instructions out of the Turbo C++ code generator. Also, I've kept my promise and started really filling that decompilation pattern file.

And now, with the reverse-engineering backlog finally being cleared out, we wait for the next orders, and the direction they might focus on…

📝 Posted:
🚚 Summary of:
P0034, P0035
Commits:
6cdd229...6f1f367, 6f1f367...a533b5d
💰 Funded by:
zorg
🏷 Tags:
rec98+ th01+ th02+ th04+ th05+ animation+ gameplay- player+ bomb+

Deathbombs confirmed, in both TH04 and TH05! On the surface, it's the same 8-frame window as in most Windows games, but due to the slightly lower PC-98 frame rate of 56.4 Hz, it's actually slightly more lenient in TH04 and TH05.

The last function in front of the TH05 shot type control functions marks the player's previous position in VRAM to be redrawn. But as it turns out, "player" not only means "the player's option satellites on shot levels ≥ 2", but also "the explosion animation if you lose a life", which required reverse-engineering both things, ultimately leading to the confirmation of deathbombs.

It actually was kind of surprising that we then had reverse-engineered everything related to rendering all three things mentioned above, and could also cover the player rendering function right now. Luckily, TH05 didn't decide to also micro-optimize that function into un-decompilability; in fact, it wasn't changed at all from TH04. Unlike the one invalidation function whose decompilation would have actually been the goal here…

But now, we've finally gotten to where we wanted to… and only got 2 outstanding decompilation pushes left. Time to get the website ready for hosting an actual crowdfunding campaign, I'd say – It'll make a better impression if people can still see things being delivered after the big announcement.

📝 Posted:
🚚 Summary of:
P0029, P0030
Commits:
6ff427a...c7fc4ca, c7fc4ca...dea40ad
💰 Funded by:
zorg
🏷 Tags:
rec98+ th02+ th04+ th05+ blitting+ gameplay- midboss+ boss+ tcc+

Here we go, new C code! …eh, it will still take a bit to really get decompilation going at the speeds I was hoping for. Especially with the sheer amount of stuff that is set in the first few significant functions we actually can decompile, which now all has to be correctly declared in the C world. Turns out I spent the last 2 years screwing up the case of exported functions, and even some of their names, so that it didn't actually reflect their calling convention… yup. That's just the stuff you tend to forget while it doesn't matter.

To make up for that, I decided to research whether we can make use of some C++ features to improve code readability after all. Previously, it seemed that TH01 was the only game that included any C++ code, whereas TH02 and later seemed to be 100% C and ASM. However, during the development of the soon to be released new build system, I noticed that even this old compiler from the mid-90's, infamous for prioritizing compile speeds over all but the most trivial optimizations, was capable of quite surprising levels of automatic inlining with class methods…

…leading the research to culminate in the mindblow that is 9d121c7 – yes, we can use C++ class methods and operator overloading to make the code more readable, while still generating the same code than if we had just used C and preprocessor macros.

Looks like there's now the potential for a few pull requests from outside devs that apply C++ features to improve the legibility of previously decompiled and terribly macro-ridden code. So, if anyone wants to help without spending money…

📝 Posted:
🚚 Summary of:
P0028
Commits:
6023f5c...6ff427a
💰 Funded by:
zorg
🏷 Tags:
rec98+ th04+ th05+ gameplay- midboss+ boss+

Back to actual development! Starting off this stretch with something fairly mechanical, the few remaining generic boss and midboss state variables. And once we start converting the constant numbers used for and around those variables into decimal, the estimated position independence probability immediately jumped by 5.31% for TH04's MAIN.EXE, and 4.49% for TH05's – despite not having made the game any more position- independent than it was before. Yup… lots of false positives in there, but who can really know for sure without having put in the work.

But now, we've RE'd enough to finally decompile something again next, 4 years after the last decompilation of anything!

📝 Posted:
🚚 Summary of:
P0049, P0050
Commits:
893bd46...6ed8e60
💰 Funded by:
-Tom-
🏷 Tags:
rec98+ th04+ th05+ file-format+ gameplay- boss+ yumeko+

Sometimes, "strategically picking things to reverse-engineer" unfortunately also means "having to move seemingly random and utterly uninteresting stuff, which will only make sense later, out of the way". Really, this was so boring. Gonna get a lot more exciting in the next ones though.

📝 Posted:
🚚 Summary of:
P0047, P0048
Commits:
9a2c6f7...893bd46
💰 Funded by:
-Tom-
🏷 Tags:
rec98+ th02+ th04+ th05+ gameplay- player+ shot+ animation+ waste+

So, let's continue with player shots! …eh, or maybe not directly, since they involve two other structure types in TH05, which we'd have to cover first. One of them is a different sort of sprite, and since I like me some context in my reverse-engineering, let's disable every other sprite type first to figure out what it is.

One of those other sprite types were the little sparks flying away from killed stage enemies, midbosses, and grazed bullets; easy enough to also RE right now. Turns out they use the same 8 hardcoded 8×8 sprites in TH02, TH04, and TH05. Except that it's actually 64 16×8 sprites, because ZUN wanted to pre-shift them for all 8 possible start pixels within a planar VRAM byte (rather than, like, just writing a few instructions to shift them programmatically), leading to them taking up 1,024 bytes rather than just 64.
Oh, and the thing I wanted to RE *actually* was the decay animation whenever a shot hits something. Not too complex either, especially since it's exclusive to TH05.

And since there was some time left and I actually have to pick some of the next RE places strategically to best prepare for the upcoming 17 decompilation pushes, here's two more function pointers for good measure.

📝 Posted:
🚚 Summary of:
P0046
Commits:
612beb8...deb45ea
💰 Funded by:
-Tom-
🏷 Tags:
rec98+ th04+ th05+ micro-optimization+ gameplay- player+ shot+

Stumbled across one more drawing function in the way… which was only a duplicated and seemingly pointlessly micro-optimized copy of master.lib's super_roll_put_tiny() function, used for fast display of 4-color 16×16 sprites.

With this out of the way, we can tackle player shot sprite animation next. This will get rid of a lot of code, since every power level of every character's shot type is implemented in its own function. Which makes up thousands of instructions in both TH04 and TH05 that we can nicely decompile in the future without going through a dedicated reverse-engineering step.

📝 Posted:
🚚 Summary of:
P0023, P0024
Commits:
807df3d...0cde4b7
💰 Funded by:
zorg
🏷 Tags:
rec98+ th01+ th02+ th04+ th05+ gameplay- laser+ micro-optimization+

Actually, I lied, and lasers ended up coming with everything that makes reverse-engineering ZUN code so difficult: weirdly reused variables, unexpected structures within structures, and those TH05-specific nasty, premature ASM micro-optimizations that will waste a lot of time during decompilation, since the majority of the code actually was C, except for where it wasn't.

📝 Posted:
🚚 Summary of:
P0042
Commits:
f3b6137...807df3d
💰 Funded by:
-Tom-
🏷 Tags:
rec98+ th04+ th05+ gameplay- laser+ uth05win+

Laser… is not difficult. In fact, out of the remaining entity types I checked, it's the easiest one to fully grasp from uth05win alone, as it's only drawn using master.lib's line, circle, and polygon functions. Everything else ends up calling… something sprite-related that needs to be RE'd separately, and which uth05win doesn't help with, at all.

Oh, and since the speed of shoot-out lasers (as used by TH05's Stage 2 boss, for example) always depends on rank, we also got this variable now.

This only covers the structure itself – uth05win's member names for the LASER structure were not only a bit too unclear, but also plain wrong and misleading in one instance. The actual implementation will follow in the next one.

📝 Posted:
🚚 Summary of:
P0041
Commits:
b03bc91...f3b6137
💰 Funded by:
-Tom-
🏷 Tags:
rec98+ th04+ th05+ gameplay- midboss+ boss+

So, after introducing instruction number statistics… let's go for over 2,000 lines that won't show up there immediately :tannedcirno: That being (mid-)boss HP, position, and sprite ID variables for TH04/TH05. Doesn't sound like much, but it kind of is if you insist on decimal numbers for easier comparison with uth05win's source code.

📝 Posted:
🚚 Summary of:
P0018
Commits:
746681d...178d589
💰 Funded by:
zorg
🏷 Tags:
rec98+ th02+ th04+ th05+ gameplay- player+ score+ uth05win+

What do you do if the TH06 text image feature for thcrap should have been done 3 days™ ago, but keeps getting more and more complex, and you have a ton of other pushes to deliver anyway? Get some distraction with some light ReC98 reverse-engineering work. This is where it becomes very obvious how much uth05win helps us with all the games, not just TH05.

5a5c347 is the most important one in there, this was the missing substructure that now makes every other sprite-like structure trivial to figure out.

📝 Posted:
🚚 Summary of:
P0008
Commits:
47e5601...d62dd06
💰 Funded by:
-Tom-
🏷 Tags:
rec98+ th02+ th04+ th05+ gameplay- player+ shot+

You could use this to get a homing Mima, for example.