⮜ Blog

⮜ List of tags

Showing all posts tagged

📝 Posted:
🚚 Summary of:
P0258, P0259, P0260, P0261
Commits:
5876755...e8a0b3e, e8a0b3e...dfaa3c6, dfaa3c6...ed9ee93, ed9ee93...ae2fc28
💰 Funded by:
Blue Bolt, [Anonymous], Yanga, Splashman
🏷 Tags:

And we're back to PC-98 Touhou for a brief interruption of the ongoing Shuusou Gyoku Linux port. Let's clear some of the Touhou-related progress from the backlog, and use the unconstrained nature of these contributions to prepare the 📝 upcoming non-ASCII translations commissioned by Touhou Patch Center. The current budget won't cover all of my ambitions, but it would at least be nice if all text in these games was feasibly translatable by the time I officially start working on that project.

At a little over 3 pushes, it might be surprising to see that this took longer than the 📝 TH03/TH04/TH05 cutscene system. It's obvious that TH02 started out with a different system for in-game dialog, but while TH04 and TH05 look identical on the surface, they only actually share 30% of their dialog code. So this felt more like decompiling 2.4 distinct systems, as opposed to one identical base with tons of game-specific differences on top.

The table of contents was pretty popular last time around, so let's have another one:

  1. Overview of TH04's dialog system
  2. Changes introduced in TH05
  3. Command reference for the TH04 and TH05 systems
  4. Overview of TH02's dialog system
  5. TH02's face portrait images
  6. Bugs during TH02's dialog box slide-in animation
  7. Bugs and quirks in Mima's defeat dialog (might be lore-relevant)
  8. TH03 win messages

Let's start with the ones from TH04 and TH05, since they are not that broken. For TH04, ZUN started out by copy-pasting the cutscene system, causing the result to inherit many of the caveats I already described in the cutscene blog post:

Then, however, he greatly simplified the system. Mainly, this was done by moving text rendering from the PC-98 graphics chip to the text chip, which avoids the need for any text-related unblitting code, but ZUN also added a bunch of smaller changes:

While it would seem that TH05 has no issues with ASCII 0x20 spaces, the text as a whole is still blindly processed two bytes at a time, and any commands can only appear at even byte positions within a line. I dimmed the VRAM pixels to 25% of their original brightness to make the text easier to read.
The same text backported to TH04, additionally demonstrating how that game's dialog system inherited the whitespace skipping behavior of TH03's cutscene system. Just like there, ASCII 0x20 spaces only work at odd byte positions because the game treats them as the trailing byte of a full-width Shift-JIS codepoint. I don't know how large the budget for the upcoming non-ASCII translations will be, but I'm going to fix this even in the very basic fully static variant. I dimmed the VRAM pixels to 25% of their original brightness to make the text easier to read.
Demonstrating the lack of automatic line or box breaks in TH05's dialog systemDemonstrating the lack of automatic line or box breaks in TH04's dialog system, in addition to its lack of support for ASCII 0x20 spaces carried over from TH03's cutscene system

TH05 then moved from TH04's plaintext scripts to the binary .TX2 format while removing all the unused commands copy-pasted from the cutscene system. Except for a single additional command intended to clear a text box, TH05's dialog system only supports a strict subset of the features of TH04's system.
This change also introduced the following differences compared to TH04:

Writing the 0x02 byte to text RAM results in an SX character, which is simply the PC-98 font ROM's glyph for that Shift-JIS codepoint.
Also note how each face change is now preceded by two frames of delay.
No problem in TH04. Note how the dialog also runs a bit faster – TH04 only adds the aforementioned one frame of delay to each face change, and has fewer two-byte chunks of text to display overall.

For modding these files, you probably want to use TXDEF from -Tom-'s MysticTK. It decodes these files into a text representation, and its encoder then takes care of the character-specific byte offsets in the 10-byte header. This text representation simplifies the format a lot by avoiding all corner cases and landmines you'd experience during hex-editing – most notably by interpreting the box-starting 0x0D as a command to show text that takes a string parameter, avoiding the broken calls to script commands in the middle of text. However, you'd still have to manually ensure an even number of bytes on every line of text.

In the entry function of TH05's dialog loop, we also encounter the hack that is responsible for properly handling 📝 ZUN's hidden Extra Stage replay. Since the dialog loop doesn't access the replay inputs but still requires key presses to advance through the boxes, ZUN chose to just skip the dialog altogether in the specific case of the Extra Stage replay being active, and replicated all sprite management commands from the dialog script by just hardcoding them.
And you know what? Not only do I not mind this hack, but I would have preferred it over the actual dialog system! The aforementioned sprite management commands effectively boil down to manual memory management, deallocating all stage enemy and midboss sprites and thus ensuring that the boss sprites end up at specific master.lib sprite IDs (patnums). The hardcoded boss rendering function then expects these sprites to be available at these exact IDs… which means that the otherwise hardcoded bosses can't render properly without the dialog script running before them. :zunpet:
There is absolutely no excuse for the game to burden dialog scripts with this functionality. Sure, delayed deallocation would allow them to blit stage-specific sprites, but the original games don't do that; probably because none of the two games feature an unblitting command. And even if they did, it would have still been cleaner to expose the boss-specific sprite setup as a single script command that can then also be called from game code if the script didn't do so. Commands like these just are a recipe for crashes, especially with parsers that expect fullwidth Shift-JIS text and where misaligned ASCII text can easily cause these commands to be skipped.

But then again, it does make for funny screenshot material if you accidentally the deallocation and then see bosses being turned into stage enemies:

TH04's dialog before the Stage 4 Marisa fight without deallocating the stage sprites inside the script, causing Marisa to be turned into one of the stage enemiesTH04's dialog before the Stage 6 Yuuka fight without deallocating the stage sprites inside the script, causing Yuuka to be turned into two different cels of the same stage enemyTH05's dialog before the Louise fight without deallocating the stage sprites inside the script, causing Louise to be turned into one of the ice enemies from TH05's Stage 2TH05's dialog before the Louise fight without deallocating the stage sprites inside the script, causing Mai and Yuki to be turned into a windmill and fairy/demon enemy, respectively
Some of the more amusing consequences of not calling the sprite-deallocating :th04: \c /  :th05: 0x04 command inside a dialog script.
In the case of 4️⃣, the game then even crashes on this frame at the end of the dialog, in a way that resembles the infamous 📝 TH04 crash before Stage 5 Yuuka if no EMS driver is loaded. Both the stage- and boss-specific BFNT sprites are loaded into memory at this point, leaving no room for the 256×256-pixel background image on the size-limited master.lib heap.

With all the general details out of the way, here's the command reference:

:th04: :th05:
0
1
0x00
0x01
Selects either the player character (0) or the boss (1) as the currently speaking character, and moves the cursor to the beginning of the text box. In TH04, this command also directly starts the new dialog box, which is probably why it's not prefixed with a \ as it only makes sense outside of text. TH05 requires a separate 0x0D command to do the same.
\=1 0x02 0x!! Replaces the face portrait of the currently active speaking character with image #1 within her .CD2 file.
\=255 0x02 0xFF Removes the face portrait from the currently active text box.
\l,filename 0x03 filename 0x00 Calls master.lib's super_entry_bfnt() function, which loads sprites from a BFNT file to consecutive IDs starting at the current patnum write cursor.
\c 0x04 Deallocates all stage-specific BFNT sprites (i.e., stage enemies and midbosses), freeing up conventional RAM for the boss sprites and ensuring that master.lib's patnum write cursor ends up at :th04: 128 / :th05: 180.
In TH05's Extra Stage, this command also replaces 📝 the sprites loaded from MIKO16.BFT with the ones from ST06_16.BFT.
\d Deallocates all face portrait images.
The game automatically does this at the end of each dialog sequence. However, ZUN wanted to load Stage 6 Yuuka's 76 KiB of additional animations inside the script via \l, and would have once again run up against the master.lib heap size limit without that extra free memory.
\m,filename 0x05 filename 0x00 Stops the currently playing BGM, loads a new one from the given file, and starts playback.
\m$ 0x05 $ 0x00 Stops the currently playing BGM.
Note that TH05 interprets $ as a null-terminated filename as well.
\m* Restarts playback of the currently loaded BGM from the beginning.
\b0,0,0 0x06 0x!!!! 0x!!!! 0x!! Blits the master.lib patnum with the ID indicated by the third parameter to the current VRAM page at the top-left screen position indicated by the first two parameters.
\e0 Plays the sound effect with the given ID.
\t100 Sets palette brightness via master.lib's palette_settone() to any value from 0 (fully black) to 200 (fully white). 100 corresponds to the palette's original colors.
\fo1
\fi1
Calls master.lib's palette_black_out() or palette_black_in() to play a hardware palette fade animation from or to black, spending roughly 1 frame on each of the 16 fade steps.
\wo1
\wi1
0x09 0x!!
0x0A 0x!!
Calls master.lib's palette_white_out() or palette_white_in() to play a hardware palette fade animation from or to white, spending roughly 1 frame on each of the 16 fade steps.
The TH05 version of 0x09 also clears the text in both boxes before the animation.
\n 0x0B Starts a new line by resetting the X coordinate of the TRAM cursor to the left edge of the text area and incrementing the Y coordinate.
The new line will always be the next one below the last one that was properly started, regardless of whether the text previously wrapped to the next TRAM row at the edge of the screen.
\g8 Plays a blocking 8-frame screen shake animation. Copy-pasted from the cutscene parser, but actually used right at the end of the dialog shown before TH04's Bad Ending.
\ga0 0x0C 0x!! Shows the gaiji with the given ID from 0 to 255 at the current cursor position, ignoring the per-glyph delay.
\k0 Waits 0 frames (0 = forever) for any key to be pressed before continuing script execution.
0x0D Starts a new dialog box with the previously selected speaker. All text until the next 0xFF command will appear on screen.
Inside dialogs, this is a no-op.
0x0E Takes the current dialog cursor as the top-left corner of a 240×48-pixel rectangle, and replaces all text RAM characters within that rectangle with whitespace.
This is only used to clear the player character's text box before Shinki's final いくよ‼ box. Shinki has two consecutive text boxes in all 4 scripts here, and ZUN probably wanted to clear the otherwise blue text to imply a dramatic pause before Shinki's final sentence. Nice touch.
(You could, however, also use it after a box-ending 0xFF command to mess with text RAM in general.)
\# Quits the currently running loop. This returns from either the text loop to the command loop, or it ends the dialog sequence by returning from the command loop back to gameplay. If this stage of the game later starts another dialog sequence, it will start at the next script byte.
\$ Like \#, but first waits for any key to be pressed.
0xFF Behaves like TH04's \$ in the text loop, and like \# in the command loop. Hence, it's not possible in TH05 to automatically end a text box and advance to the next one without waiting for a key press.
Unused commands are in gray.

At the end of the day, you might criticize the system for how its landmines make it annoying to mod in ASCII text, but it all works and does what it's supposed to. ZUN could have written the cleanest single and central Shift-JIS iterator that properly chunks a byte buffer into halfwidth and fullwidth codepoints, and I'd still be throwing it out for the upcoming non-ASCII translations in favor of something that either also supports UTF-8 or performs dictionary lookups with a full box of text.
The only actual bug can be found in the input detection, which once again doesn't correctly handle the infamous key up/key down scancode quirk of PC-98 keyboards. All it takes is one wrongly placed input polling call, and suddenly you have to think about how the update cycle behind the PC-98 keyboard state bytes might cause the game to run the regular 2-frame delay for a single 2-byte chunk of text before it shows the full text of a box after all… But even this bug is highly theoretical and could probably only be observed very, very rarely, and exclusively on real hardware.


The same can't be said about TH02 though, but more on that later. Let's first take a look at its data, which started out much simpler in that game. The STAGE?.TXT files contain just raw Shift-JIS text with no trace of commands or structure. Turning on the whitespace display feature in your editor reveals how the dialog system even assumes a fixed byte length for each box: 36 bytes per line which will appear on screen, followed by 4 bytes of padding, which the original files conveniently use to visually split the lines via a CR/LF newline sequence. Make sure to disable trimming of trailing whitespace in your editor to not ruin the file when modding the text… :onricdennat:

靈夢:あんた、まだ名前も聞いてないの··
······に覚えられないわよ。・・・・・··
里香:あたいは、里香よ。覚えときなさ··
・・・い。・・・・・・················
Two boxes from TH02's STAGE5.TXT with visualized whitespace. These also demonstrate how the CR/LF newlines only make up 2 of the 4 padding bytes, and require each line to be padded with two more bytes; you could not use these trailing spaces for actual text. Also note how the exquisite mixture of fullwidth and halfwidth spaces demands the text to be viewed with only the most metrically consistent monospace fonts to preserve the intended alignment. 🍷 It appears quite misaligned on my phone.

Consequently, everything else is hardcoded – every effect shown between text boxes, the face portrait shown for each box, and even how many boxes are part of each dialog sequence. Which means that the source code now contains a long hardcoded list of face IDs for most of the text boxes in the game, with the rest being part of the dedicated hardcoded dialog scripts for 2/3 of the game's stages.
Without the restriction to a fixed set of scripting commands, TH02 naturally gravitated to having the most varied dialog sequences of all PC-98 Touhou games. This flexibility certainly facilitated Mima's grand entrance animation in Stage 4, or the different lines in Stage 4 and 5 depending on whether you already used a continue or not. Marisa's post-boss dialog even inserts the number of continues into the text itself – by, you guessed it, writing to hardcoded byte offsets inside the dialog text before printing it to the screen. :godzun: But once again, I have nothing to criticize here – not even the fact that the alternate dialog scripts have to mutate the "box cursor" to jump to the intended boxes within the file. I know that some people in my audience like VMs, but I would have considered it more bloated if ZUN had implemented a full-blown scripting language just to handle all these special cases.


Another unique aspect of TH02 is the way it stores its face portraits, which are infamous for how hard they are to find in the original data files. These sprites are actually map tiles, stored in MIKO_K.MPN, and drawn using the same functions used to blit the regular map tiles to the 📝 tile source area in VRAM. We can only guess why ZUN chose this one out of the three graphics formats he used in TH02:

TH02's MIKO_K.PTN, arranged into a 16×16-tile layout that reveals how these tiles are combined into face portraits.
MPNDEF from -Tom-'s MysticTK conveniently uses this exact layout in its .BMP output. Earlier MPNDEF versions crashed when converting this file as its 256 tiles led to an 8-bit overflow bug, so make sure you've updated to the current version from the end of October 2023 if you want to convert this file yourself. The format stores the 4 bitplanes of each 16×16 tile in order, so good luck finding a different planar image viewer that would support both such a tiled layout and a custom palette. Sometimes, a weird internal format is the best type of obfuscation. :tannedcirno:
TH02's MIKO_K.PTN with the 16×16 tile grid overlaid

And since you're certainly wondering about all these black tiles at the edges: Yes, these are not only part of the file and pad it from the required 240×192 pixels to 256×256, but also kept in memory during a stage, wasting 9.5 KiB of conventional RAM. That's 172 seconds of potential input replay data, just for those people who might still think that we need EMS for replays.


Alright, we've got the text, we've got the faces, let's slide in the box and display it all on screen. Apparently though, we also have to blit the player and option sprites using raw, low-level master.lib function calls in the process? :thonk: This can't be right, especially because ZUN always blits the option sprite associated with the Reimu-A shot type, regardless of which one the player actually selected. And if you keep moving above the box area before the dialog starts, you get to see exactly how wrong this is:

Let's look closer at Reimu's sprite during the slide-in animation, and in the two frames before:

Zoomed-in area around Reimu's sprite from frame 35 of the video aboveZoomed-in area around Reimu's sprite from frame 36 of the video aboveZoomed-in area around Reimu's sprite from frame 37 of the video above

This one image shows off no less than 4 bugs:

  1. ZUN blits the stationary player sprite here, regardless of whether the player was previously moving left or right. This is a nice way of indicating that Reimu stops moving once the dialog starts, but maybe ZUN should have unblitted the old sprite so that the new one wouldn't have appeared on top. The game only unblits the 384×64 pixels covered by the dialog box on every frame of the slide-in animation, so Reimu would only appear correctly if her sprite happened to be entirely located within that area.
  2. All sprites are shifted up by 1 pixel in frame 2️⃣. This one is not a bug in the dialog system, but in the main game loop. The game runs the relevant actions in the following order:

    1. Invalidate any map tiles covered by entities
    2. Redraw invalidated tiles
    3. Decrement the Y coordinate at the top of VRAM according to the scroll speed
    4. Update and render all game entities
    5. Scroll in new tiles as necessary according to the scroll speed, and report whether the game has scrolled one pixel past the end of the map
    6. If that happened, pretend it didn't by incrementing the value calculated in #3 for all further frames and skipping to #8.
    7. Issue a GDC SCROLL command to reflect the line calculated in #3 on the display
    8. Wait for VSync
    9. Flip VRAM pages
    10. Start boss if we're past the end of the map

    The problem here: Once the dialog starts, the game has already rendered an entire new frame, with all sprites being offset by a new Y scroll offset, without adjusting the graphics GDC's scroll registers to compensate. Hence, the Y position in 3️⃣ is the correct one, and the whole existence of frame 2️⃣ is a bug in itself. (Well… OK, probably a quirk because speedrunning exists, and it would be pretty annoying to synchronize any video regression tests of the future TH02 Anniversary Edition if it renders one fewer frame in the middle of a stage.)

  3. ZUN blits the option sprites to their position from frame 1️⃣. This brings us back to 📝 TH02's special way of retaining the previous and current position in a two-element array, indexed with a VRAM page ID. Normally, this would be equivalent to using dedicated prev and cur structure fields and you'd just index it with the back page for every rendering call. But if you then decide to go single-buffered for dialogs and render them onto the front page instead… :zunpet:
    Note that fixing bug #2 would not cancel out this one – the sprites would then simply be rendered to their position in the frame before 1️⃣.

  4. And of course, the fixed option sprite ID also counts as a bug.

As for the boxes themselves, it's yet another loop that prints 2-byte chunks of Shift-JIS text at an even slower fixed interval of 3 frames. In an interesting quirk though, ZUN assumes that every box starts with the name of the speaking character in its first two fullwidth Shift-JIS characters, followed by a fullwidth colon. These 6 bytes are displayed immediately at the start of every box, without the usual delay. The resulting alignment looks rather janky with Genjii, whose single right-padded kanji looks quite awkward with the fullwidth space between the name and the colon. Kind of makes you wonder why ZUN just didn't spell out his proper name, 玄爺, instead, but I get the stylistic difference.
In Stage 4, the two-kanji assumption then breaks with Marisa's three-kanji name, which causes the full-width colon to be printed as the first delayed character in each of her boxes:


That's all the issues and quirks in the system itself. The scripts themselves don't leave much room for bugs as they basically just loop over the hardcoded face ID array at this level… until we reach the end of the game. Previously, the slide-in animation could simply use the tile invalidation and re-rendering system to unblit the box on each frame, which also explained why Reimu had to be separately rendered on top. But this no longer works with a custom-rendered boss background, and so the game just chooses to flood-fill the area with graphics chip color #0:

Then again, transferring pixels from the back page would be just as wrong as they lag one frame behind. No way around capturing these 384×64 pixels to main memory here… Oh well, this flood-fill at least adds even more legibility on top of the already half-transparent text box. A property that the following dialog sequence unfortunately lacks…

For Mima's final defeat dialog though, ZUN chose to not even show the box. He might have realized the issue by that point, or simply preferred the more dramatic effect this had on the lines. The resulting issues, however, might even have ramifications for such un-technical things as lore and character dynamics. :zunpet: As it turns out, the code for this dialog sequence does in fact render Mima's smiling face for all boxes?! You only don't see it in the original game because it's rendered to the other VRAM page that remains invisible during the dialog sequence:

Caution, flashing lights.

Here's how I interpret the situation:

So, the future TH02 Anniversary Edition will fix the bug by showing the back page, but retain the quirk by rewriting the dialog code to not blit the face.


And with that, we've secured all in-game dialog for the upcoming non-ASCII translations! The remaining 2/3 of the last push made for a good occasion to also decompile the small amount of code related to TH03's win messages, stored in the @0?TX.TXT files. Similar to TH02's dialog format, these files are also split into fixed-size blocks of 3×60 bytes. But this time, TH03 loads all 60 bytes of a line, including the CR/LF line breaking codepoints in the original files, into the statically allocated buffer that it renders from. These control characters are then only filtered to whitespace by ZUN's graph_putsa_fx() function. If you remove the line breaks, you get to use the full 60 bytes on every line.
The final commits went to the MIKO.CFG loading and saving functions used in TH04's and TH05's OP.EXE, as well as TH04's game startup code to finally catch up with 📝 TH05's counterpart from over 3 years ago. This brought us right in front of the main menu rendering code in both TH04 and TH05, which is identical in both games and will be tackled in the next PC-98 Touhou delivery.

Next up, though: Returning to Shuusou Gyoku, and adding support for SC-88Pro recordings as BGM. Which may or may not come with a slight controversy…

📝 Posted:
🏷 Tags:

Turns out I was not quite done with the TH01 Anniversary Edition yet. You might have noticed some white streaks at the beginning of Sariel's second form, which are in fact a bug that I accidentally added to the initial release. :tannedcirno:
These can be traced back to a quirk I wasn't aware of, and hadn't documented so far. When defeating Sariel's first form during a pattern that spawns pellets, it's likely for the second form to start with additional pellets that resemble the previous pattern, but come out of seemingly nowhere. This shouldn't really happen if you look at the code: Nothing outside the typical pattern code spawns new pellets, and all existing ones are reset before the form transition…

Except if they're currently showing the 10-frame delay cloud animation , activated for all pellets during the symmetrical radial 2-ring pattern in Phase 2 and left activated for the rest of the fight. These pellets will continue their animation after the transition to the second form, and turn into regular pellets you have to dodge once their animation completed.

By itself, this is just one more quirk to keep in mind during refactoring. It only turned into a bug in the Anniversary Edition because the game tracks the number of living pellets in a separate counter variable. After resetting all pellets, this counter is simply set to 0, regardless of any delay cloud pellets that may still be alive, and it's merely incremented or decremented when pellets are spawned or leave the playfield. :zunpet:
In the original game, this counter is only used as an optimization to skip spawning new pellets once the cap is reached. But with batched EGC-accelerated unblitting, it also makes sense to skip the rather costly setup and shutdown of the EGC if no pellets are active anyway. Except if the counter you use to check for that case can be 0 even if there are pellets alive, which consequently don't get unblitted… :onricdennat:
There is an optimal fix though: Instead of unconditionally resetting the living pellet counter to 0, we decrement it for every pellet that does get reset. This preserves the quirk and gives us a consistently correct counter, allowing us to still skip every unnecessary loop over the pellet array.

Cutting out the lengthy defeat animation makes it easier to see where the additional pellets come from.
Cutting out the lengthy defeat animation makes it easier to see where the additional pellets come from. Also, note how regular unblitting resumes once the first pellet gets clipped at the top of the playfield – the living pellet counter then gets decremented to -1, and who uses <= rather than == on a seemingly unsigned counter, right?
Cutting out the lengthy defeat animation makes it easier to see where the additional pellets come from.

Ultimately, this was a harmless bug that didn't affect gameplay, but it's still something that players would have probably reported a few more times. So here's a free bugfix:

:th01: TH01 Anniversary Edition, version P0234-1 2023-03-14-th01-anniv.zip

Thanks to mu021 for reporting this issue and providing helpful videos to identify the cause!

📝 Posted:
🚚 Summary of:
P0227, P0228
Commits:
4f85326...bfd24c6, bfd24c6...739e1d8
💰 Funded by:
nrook, [Anonymous]
🏷 Tags:

Starting the year with a delivery that wasn't delayed until the last day of the month for once, nice! Still, very soon and high-maintenance did not go well together…

It definitely wasn't Sara's fault though. As you would expect from a Stage 1 Boss, her code was no challenge at all. Most of the TH02, TH04, and TH05 bosses follow the same overall structure, so let's introduce a new table to replace most of the boilerplate overview text:

Phase # Patterns HP boundary Timeout condition
Sprite of Sara in TH05 (Entrance) 4,650 288 frames
2 4 2,550 2,568 frames (= 32 patterns)
3 4 450 5,296 frames (= 24 patterns)
4 1 0 1,300 frames
Total 9 9,452 frames

And that's all the gameplay-relevant detail that ZUN put into Sara's code. It doesn't even make sense to describe the remaining patterns in depth, as their groups can significantly change between difficulties and rank values. The 📝 general code structure of TH05 bosses won't ever make for good-code, but Sara's code is just a lesser example of what I already documented for Shinki.
So, no bugs, no unused content, only inconsequential bloat to be found here, and less than 1 push to get it done… That makes 9 PC-98 Touhou bosses decompiled, with 22 to go, and gets us over the sweet 50% overall finalization mark! 🎉 And sure, it might be possible to pass through the lasers in Sara's final pattern, but the boss script just controls the origin, angle, and activity of lasers, so any quirk there would be part of the laser code… wait, you can do what?!?


TH05 expands TH04's one-off code for Yuuka's Master and Double Sparks into a more featureful laser system, and Sara is the first boss to show it off. Thus, it made sense to look at it again in more detail and finalize the code I had purportedly 📝 reverse-engineered over 4 years ago. That very short delivery notice already hinted at a very time-consuming future finalization of this code, and that prediction certainly came true. On the surface, all of the low-level laser ray rendering and collision detection code is undecompilable: It uses the SI and DI registers without Turbo C++'s safety backups on the stack, and its helper functions take their input and output parameters from convenient registers, completely ignoring common calling conventions. And just to raise the confusion even further, the code doesn't just set these registers for the helper function calls and then restores their original values, but permanently shifts them via additions and subtractions. Unfortunately, these convenient registers also include the BP base pointer to the stack frame of a function… and shifting that register throws any intuition behind accessed local variables right out of the window for a good part of the function, requiring a correctly shifted view of the stack frame just to make sense of it again. :godzun: How could such code even have been written?! This goes well beyond the already wrong assumption that using more stack space is somehow bad, and straight into the territory of self-inflicted pain.

So while it's not a lot of instructions, it's quite dense and really hard to follow. This code would really benefit from a decompilation that anchors all this madness as much as possible in existing C++ structures… so let's decompile it anyway? :tannedcirno:
Doing so would involve emitting lots of raw machine code bytes to hide the SI and DI registers from the compiler, but I already had a certain 📝 batshit insane compiler bug workaround abstraction lying around that could make such code more readable. Hilariously, it only took this one additional use case for that abstraction to reveal itself as premature and way too complicated. :onricdennat: Expanding the core idea into a full-on x86 instruction generator ended up simplifying the code structure a lot. All we really want there is a way to set all potential parameters to e.g. a specific form of the MOV instruction, which can all be expressed as the parameters to a force-inlined __emit__() function. Type safety can help by providing overloads for different operand widths here, but there really is no need for classes, templates, or explicit specialization of templates based on classes. We only need a couple of enums with opcode, register, and prefix constants from the x86 reference documentation, and a set of associated macros that token-paste pseudoregisters onto the prefixes of these enum constants.
And that's how you get a custom compile-time assembler in a 1994 C++ compiler and expand the limits of decompilability even further. What's even truly left now? Self-modifying code, layout tricks that can't be replicated with regularly structured control flow… and that's it. That leaves quite a few functions I previously considered undecompilable to be revisited once I get to work on making this game more portable.

With that, we've turned the low-level laser code into the expected horrible monstrosity that exposes all the hidden complexity in those few ASM instructions. The high-level part should be no big deal now… except that we're immediately bombarded with Fixup overflow errors at link time? Oh well, time to finally learn the true way of fixing this highly annoying issue in a second new piece of decompilation tech – and one that might actually be useful for other x86 Real Mode retro developers at that.
Earlier in the RE history of TH04 and TH05, I often wrote about the need to split the two original code segments into multiple segments within two groups, which makes it possible to slot in code from different translation units at arbitrary places within the original segment. If we don't want to define a unique segment name for each of these slotted-in translation units, we need a way to set custom segment and group names in C land. Turbo C++ offers two #pragmas for that:

For the most part, these #pragmas work well, but they seemed to not help much when it came to calling near functions declared in different segments within the same group. It took a bit of trial and error to figure out what was actually going on in that case, but there is a clear logic to it:

Summarized in code:

#pragma option -zCfoo_TEXT -zPfoo

void bar(void);
void near qux(void); // defined somewhere else, maybe in a different segment

#pragma codeseg baz_TEXT baz

// Despite the segment change in the line above, this function will still be
// put into `foo_TEXT`, the active segment during the first appearance of the
// function name.
void bar(void) {
}

// This function hasn't been declared yet, so it will go into `baz_TEXT` as
// expected.
void baz(void) {
	// This `near` function pointer will be calculated by subtracting the
	// flat/linear address of qux() inside the binary from the base address
	// of qux()'s declared segment, i.e., `foo_TEXT`.
	void (near *ptr_to_qux)(void) = qux;
}

So yeah, you might have to put #pragma codeseg into your headers to tell the linker about the correct segment of a near function in advance. 🤯 This is an important insight for everyone using this compiler, and I'm shocked that none of the Borland C++ books documented the interaction of code segment definitions and near references at least at this level of clarity. The TASM manuals did have a few pages on the topic of groups, but that syntax obviously doesn't apply to a C compiler. Fixup overflows in particular are such a common error and really deserved better than the unhelpful 🤷 of an explanation that ended up in the User's Guide. Maybe this whole technique of custom code segment names was considered arcane even by 1993, judging from the mere three sentences that #pragma codeseg was documented with? Still, it must have been common knowledge among Amusement Makers, because they couldn't have built these exact binaries without knowing about these details. This is the true solution to 📝 any issues involving references to near functions, and I'm glad to see that ZUN did not in fact lie to the compiler. 👍


OK, but now the remaining laser code compiles, and we get to write C++ code to draw some hitboxes during the two collision-detected states of each laser. These confirm what the low-level code from earlier already uncovered: Collision detection against lasers is done by testing a 12×12-pixel box at every 16 pixels along the length of a laser, which leaves obvious 4-pixel gaps at regular intervals that the player can just pass through. :zunpet: This adds 📝 yet 📝 another 📝 quirk to the growing list of quirks that were either intentional or must have been deliberately left in the game after their initial discovery. This is what constants were invented for, and there really is no excuse for not using them – especially during intoxicated coding, and/or if you don't have a compile-time abstraction for Q12.4 literals.

When detecting laser collisions, the game checks the player's single center coordinate against any of the aforementioned 12×12-pixel boxes. Therefore, it's correct to split these 12×12 pixels into two 6×6-pixel boxes and assign the other half to the player for a more natural visualization. Always remember that hitbox visualizations need to keep all colliding entities in mind – 📝 assigning a constant-sized hitbox to "the player" and "the bullets" will be wrong in most other cases.

Using subpixel coordinates in collision detection also introduces a slight inaccuracy into any hitbox visualization recorded in-engine on a 16-color PC-98. Since we have to render discrete pixels, we cannot exactly place a Q12.4 coordinate in the 93.75% of cases where the fractional part is non-zero. This is why pretty much every laser segment hitbox in the video above shows up as 7×7 rather than 6×6: The actual W×H area of each box is 13 pixels smaller, but since the hitbox lies between these pixels, we cannot indicate where it lies exactly, and have to err on the side of caution. It's also why Reimu's box slightly changes size as she moves: Her non-diagonal movement speed is 3.5 pixels per frame, and the constant focused movement in the video above halves that to 1.75 pixels, making her end up on an exact pixel every 4 frames. Looking forward to the glorious future of displays that will allow us to scale up the playfield to 16× its original pixel size, thus rendering the game at its exact internal resolution of 6144×5888 pixels. Such a port would definitely add a lot of value to the game…

The remaining high-level laser code is rather unremarkable for the most part, but raises one final interesting question: With no explicitly defined limit, how wide can a laser be? Looking at the laser structure's 1-byte width field and the unsigned comparisons all throughout the update and rendering code, the answer seems to be an obvious 255 pixels. However, the laser system also contains an automated shrinking state, which can be most notably seen in Mai's wheel pattern. This state shrinks a laser by 2 pixels every 2 frames until it reached a width of 0. This presents a problem with odd widths, which would fall below 0 and overflow back to 255 due to the unsigned nature of this variable. So rather than, I don't know, treating width values of 0 as invalid and stopping at a width of 1, or even adding a condition for that specific case, the code just performs a signed comparison, effectively limiting the width of a shrinkable laser to a maximum of 127 pixels. :zunpet: This small signedness inconsistency now forces the distinction between shrinkable and non-shrinkable lasers onto every single piece of code that uses lasers. Yet another instance where 📝 aiming for a cinematic 30 FPS look made the resulting code much more complicated than if ZUN had just evenly spread out the subtraction across 2 frames. 🤷
Oh well, it's not as if any of the fixed lasers in the original scripts came close to any of these limits. Moving lasers are much more streamlined and limited to begin with: Since they're hardcoded to 6 pixels, the game can safely assume that they're always thinner than the 28 pixels they get gradually widened to during their decay animation.

Finally, in case you were missing a mention of hitboxes in the previous paragraph: Yes, the game always uses the aforementioned 12×12 boxes, regardless of a laser's width.

This video also showcases the 127-pixel limit because I wanted to include the shrink animation for a seamless loop.

That was what, 50% of this blog post just being about complications that made laser difficult for no reason? Next up: The first TH01 Anniversary Edition build, where I finally get to reap the rewards of having a 100% decompiled game and write some good code for once.

📝 Posted:
🚚 Summary of:
P0223, P0224, P0225
Commits:
139746c...371292d, 371292d...8118e61, 8118e61...4f85326
💰 Funded by:
rosenrose, Blue Bolt, Splashman, -Tom-, Yanga, Enderwolf, 32th System
🏷 Tags:

More than three months without any reverse-engineering progress! It's been way too long. Coincidentally, we're at least back with a surprising 1.25% of overall RE, achieved within just 3 pushes. The ending script system is not only more or less the same in TH04 and TH05, but actually originated in TH03, where it's also used for the cutscenes before stages 8 and 9. This means that it was one of the final pieces of code shared between three of the four remaining games, which I got to decompile at roughly 3× the usual speed, or ⅓ of the price.
The only other bargains of this nature remain in OP.EXE. The Music Room is largely equivalent in all three remaining games as well, and the sound device selection, ZUN Soft logo screens, and main/option menus are the same in TH04 and TH05. A lot of that code is in the "technically RE'd but not yet decompiled" ASM form though, so it would shift Finalized% more significantly than RE%. Therefore, make sure to order the new Finalization option rather than Reverse-engineering if you want to make number go up.

  1. General overview
  2. Game-specific differences
  3. Command reference
  4. Thoughts about translation support

So, cutscenes. On the surface, the .TXT files look simple enough: You directly write the text that should appear on the screen into the file without any special markup, and add commands to define visuals, music, and other effects at any place within the script. Let's start with the basics of how text is rendered, which are the same in all three games:


Superficially, the list of game-specific differences doesn't look too long, and can be summarized in a rather short table:

:th03: TH03 :th04: TH04 :th05: TH05
Script size limit 65536 bytes (heap-allocated) 8192 bytes (statically allocated)
Delay between every 2 bytes of text 1 frame by default, customizable via \v None
Text delay when holding ESC Varying speed-up factor None
Visibility of new text Immediately typed onto the screen Rendered onto invisible VRAM page, faded in on wait commands
Visibility of old text Unblitted when starting a new box Left on screen until crossfaded out with new text
Key binding for advancing the script Any key ⏎ Return, Shot, or ESC
Animation while waiting for an advance key None ⏎⃣, past right edge of current row
Inexplicable delays None 1 frame before changing pictures and after rendering new text boxes
Additional delay per interpreter loop 614.4 µs None 614.4 µs
The 614.4 µs correspond to the necessary delay for working around the repeated key up and key down events sent by PC-98 keyboards when holding down a key. While the absence of this delay significantly speeds up TH04's interpreter, it's also the reason why that game will stop recognizing a held ESC key after a few seconds, requiring you to press it again.

It's when you get into the implementation that the combined three systems reveal themselves as a giant mess, with more like 56 differences between the games. :zunpet: Every single new weird line of code opened up another can of worms, which ultimately made all of this end up with 24 pieces of bloat and 14 bugs. The worst of these should be quite interesting for the general PC-98 homebrew developers among my audience:


That brings us to the individual script commands… and yes, I'm going to document every single one of them. Some of their interactions and edge cases are not clear at all from just looking at the code.

Almost all commands are preceded by… well, a 0x5C lead byte. :thonk: Which raises the question of whether we should document it as an ASCII-encoded \ backslash, or a Shift-JIS-encoded ¥ yen sign. From a gaijin perspective, it seems obvious that it's a backslash, as it's consistently displayed as one in most of the editors you would actually use nowadays. But interestingly, iconv -f shift-jis -t utf-8 does convert any 0x5C lead bytes to actual ¥ U+00A5 YEN SIGN code points :tannedcirno:.
Ultimately, the distinction comes down to the font. There are fonts that still render 0x5C as ¥, but mainly do so out of an obvious concern about backward compatibility to JIS X 0201, where this mapping originated. Unsurprisingly, this group includes MS Gothic/Mincho, the old Japanese fonts from Windows 3.1, but even Meiryo and Yu Gothic/Mincho, Microsoft's modern Japanese fonts. Meanwhile, pretty much every other modern font, and freely licensed ones in particular, render this code point as \, even if you set your editor to Shift-JIS. And while ZUN most definitely saw it as a ¥, documenting this code point as \ is less ambiguous in the long run. It can only possibly correspond to one specific code point in either Shift-JIS or UTF-8, and will remain correct even if we later mod the cutscene system to support full-blown Unicode.

Now we've only got to clarify the parameter syntax, and then we can look at the big table of commands:

:th03: :th04: :th05: \@ Clears both VRAM pages by filling them with VRAM color 0.
🐞 In TH03 and TH04, this command does not update the internal text area background used for unblitting. This bug effectively restricts usage of this command to either the beginning of a script (before the first background image is shown) or its end (after no more new text boxes are started). See the image below for an example of using it anywhere else.
:th03: :th04: :th05: \b2 Sets the font weight to a value between 0 (raw font ROM glyphs) to 3 (very thicc). Specifying any other value has no effect.
:th04: :th05: 🐞 In TH04 and TH05, \b3 leads to glitched pixels when rendering half-width glyphs due to a bug in the newly micro-optimized ASM version of 📝 graph_putsa_fx(); see the image below for an example.
In these games, the parameter also directly corresponds to the graph_putsa_fx() effect function, removing the sanity check that was present in TH03. In exchange, you can also access the four dissolve masks for the bold font (\b2) by specifying a parameter between 4 (fewest pixels) to 7 (most pixels). Demo video below.
:th03: :th04: :th05: \c15 Changes the text color to VRAM color 15.
:th05: \c=,15 Adds a color map entry: If is the first code point inside the name area on a new line, the text color is automatically set to 15. Up to 8 such entries can be registered before overflowing the statically allocated buffer.
🐞 The comma is assumed to be present even if the color parameter is omitted.
:th03: :th04: :th05: \e0 Plays the sound effect with the given ID.
:th03: :th04: :th05: \f (no-op)
:th03: :th04: :th05: \fi1
\fo1
Calls master.lib's palette_black_in() or palette_black_out() to play a hardware palette fade animation from or to black, spending roughly 1 frame on each of the 16 fade steps.
:th03: :th04: :th05: \fm1 Fades out BGM volume via PMD's AH=02h interrupt call, in a non-blocking way. The fade speed can range from 1 (slowest) to 127 (fastest).
Values from 128 to 255 technically correspond to AH=02h's fade-in feature, which can't be used from cutscene scripts because it requires BGM volume to first be lowered via AH=19h, and there is no command to do that.
:th03: :th04: :th05: \g8 Plays a blocking 8-frame screen shake animation.
:th03: :th04: \ga0 Shows the gaiji with the given ID from 0 to 255 at the current cursor position. Even in TH03, gaiji always ignore the text delay interval configured with \v.
:th05: @3 TH05's replacement for the \ga command from TH03 and TH04. The default ID of 3 corresponds to the ♫ gaiji. Not to be confused with \@, which starts with a backslash, unlike this command.
:th05: @h Shows the 🎔 gaiji.
:th05: @t Shows the 💦 gaiji.
:th05: @! Shows the ! gaiji.
:th05: @? Shows the ? gaiji.
:th05: @!! Shows the ‼ gaiji.
:th05: @!? Shows the ⁉ gaiji.
:th03: :th04: :th05: \k0 Waits 0 frames (0 = forever) for an advance key to be pressed before continuing script execution. Before waiting, TH05 crossfades in any new text that was previously rendered to the invisible VRAM page…
🐞 …but TH04 doesn't, leaving the text invisible during the wait time. As a workaround, \vp1 can be used before \k to immediately display that text without a fade-in animation.
:th03: :th04: :th05: \m$ Stops the currently playing BGM.
:th03: :th04: :th05: \m* Restarts playback of the currently loaded BGM from the beginning.
:th03: :th04: :th05: \m,filename Stops the currently playing BGM, loads a new one from the given file, and starts playback.
:th03: :th04: :th05: \n Starts a new line at the leftmost X coordinate of the box, i.e., the start of the name area. This is how scripts can "change" the name of the currently speaking character, or use the entire 480×64 pixels without being restricted to the non-name area.
Note that automatic line breaks already move the cursor into a new line. Using this command at the "end" of a line with the maximum number of 30 full-width glyphs would therefore start a second new line and leave the previously started line empty.
If this command moved the cursor into the 5th line of a box, \s is executed afterward, with any of \n's parameters passed to \s.
:th03: :th04: :th05: \p (no-op)
:th03: :th04: :th05: \p- Deallocates the loaded .PI image.
:th03: :th04: :th05: \p,filename Loads the .PI image with the given file into the single .PI slot available to cutscenes. TH04 and TH05 automatically deallocate any previous image, 🐞 TH03 would leak memory without a manual prior call to \p-.
:th03: :th04: :th05: \pp Sets the hardware palette to the one of the loaded .PI image.
:th03: :th04: :th05: \p@ Sets the loaded .PI image as the full-screen 640×400 background image and overwrites both VRAM pages with its pixels, retaining the current hardware palette.
:th03: :th04: :th05: \p= Runs \pp followed by \p@.
:th03: :th04: :th05: \s0
\s-
Ends a text box and starts a new one. Fades in any text rendered to the invisible VRAM page, then waits 0 frames (0 = forever) for an advance key to be pressed. Afterward, the new text box is started with the cursor moved to the top-left corner of the name area.
\s- skips the wait time and starts the new box immediately.
:th03: :th04: :th05: \t100 Sets palette brightness via master.lib's palette_settone() to any value from 0 (fully black) to 200 (fully white). 100 corresponds to the palette's original colors. Preceded by a 1-frame delay unless ESC is held.
:th03: \v1 Sets the number of frames to wait between every 2 bytes of rendered text.
:th04: Sets the number of frames to spend on each of the 4 fade steps when crossfading between old and new text. The game-specific default value is also used before the first use of this command.
:th05: \v2
:th03: :th04: :th05: \vp0 Shows VRAM page 0. Completely useless in TH03 (this game always synchronizes both VRAM pages at a command boundary), only of dubious use in TH04 (for working around a bug in \k), and the games always return to their intended shown page before every blitting operation anyway. A debloated mod of this game would just remove this command, as it exposes an implementation detail that script authors should not need to worry about. None of the original scripts use it anyway.
:th03: :th04: :th05: \w64
  • \w and \wk wait for the given number of frames
  • \wm and \wmk wait until PMD has played back the current BGM for the total number of measures, including loops, given in the first parameter, and fall back on calling \w and \wk with the second parameter as the frame number if BGM is disabled.
    🐞 Neither PMD nor MMD reset the internal measure when stopping playback. If no BGM is playing and the previous BGM hasn't been played back for at least the given number of measures, this command will deadlock.
Since both TH04 and TH05 fade in any new text from the invisible VRAM page, these commands can be used to simulate TH03's typing effect in those games. Demo video below.
Contrary to \k and \s, specifying 0 frames would simply remove any frame delay instead of waiting forever.
The TH03-exclusive k variants allow the delay to be interrupted if ⏎ Return or Shot are held down. TH04 and TH05 recognize the k as well, but removed its functionality.
All of these commands have no effect if ESC is held.
\wm64,64
:th03: \wk64
\wmk64,64
:th03: :th04: :th05: \wi1
\wo1
Calls master.lib's palette_white_in() or palette_white_out() to play a hardware palette fade animation from or to white, spending roughly 1 frame on each of the 16 fade steps.
:th03: :th04: :th05: \=4 Immediately displays the given quarter of the loaded .PI image in the picture area, with no fade effect. Any value ≥ 4 resets the picture area to black.
:th03: :th04: :th05: \==4,1 Crossfades the picture area between its current content and quarter #4 of the loaded .PI image, spending 1 frame on each of the 4 fade steps unless ESC is held. Any value ≥ 4 is replaced with quarter #0.
:th03: :th04: :th05: \$ Stops script execution. Must be called at the end of each file; otherwise, execution continues into whatever lies after the script buffer in memory.
TH05 automatically deallocates the loaded .PI image, TH03 and TH04 require a separate manual call to \p- to not leak its memory.
Bold values signify the default if the parameter is omitted; \c is therefore equivalent to \c15.
Using the \@ command in the middle of a TH03 or TH04 cutscene script
The \@ bug. Yes, the ¥ is fake. It was easier to GIMP it than to reword the sentences so that the backslashes landed on the second byte of a 2-byte half-width character pair. :onricdennat:
Cutscene font weights in TH03Cutscene font weights in TH05, demonstrating the <code>\b3</code> bug that also affects TH04Cutscene font weights in TH03, rendered at a hypothetical unaligned X positionCutscene font weights in TH05, rendered at a hypothetical unaligned X position
The font weights and effects available through \b, including the glitch with \b3 in TH04 and TH05.
Font weight 3 is technically not rendered correctly in TH03 either; if you compare 1️⃣ with 4️⃣, you notice a single missing column of pixels at the left side of each glyph, which would extend into the previous VRAM byte. Ironically, the TH04/TH05 version is more correct in this regard: For half-width glyphs, it preserves any further pixel columns generated by the weight functions in the high byte of the 16-dot glyph variable. Unlike TH03, which still cuts them off when rendering text to unaligned X positions (3️⃣), TH04 and TH05 do bit-rotate them towards their correct place (4️⃣). It's only at byte-aligned X positions (2️⃣) where they remain at their internally calculated place, and appear on screen as these glitched pixel columns, 15 pixels away from the glyph they belong to. It's easy to blame bugs like these on micro-optimized ASM code, but in this instance, you really can't argue against it if the original C++ version was equally incorrect.
Combining \b and s- into a partial dissolve animation. The speed can be controlled with \v.
Simulating TH03's typing effect in TH04 and TH05 via \w. Even prettier in TH05 where we also get an additional fade animation after the box ends.

So yeah, that's the cutscene system. I'm dreading the moment I will have to deal with the other command interpreter in these games, i.e., the stage enemy system. Luckily, that one is completely disconnected from any other system, so I won't have to deal with it until we're close to finishing MAIN.EXE… that is, unless someone requests it before. And it won't involve text encodings or unblitting…


The cutscene system got me thinking in greater detail about how I would implement translations, being one of the main dependencies behind them. This goal has been on the order form for a while and could soon be implemented for these cutscenes, with 100% PI being right around the corner for the TH03 and TH04 cutscene executables.
Once we're there, the "Virgin" old-school way of static translation patching for Latin-script languages could be implemented fairly quickly:

  1. Establish basic UTF-8 parsing for less painful manual editing of the source files
  2. Procedurally generate glyphs for the few required additional letters based on existing font ROM glyphs. For example, we'd generate ä by painting two short lines on top of the font ROM's a glyph, or generate ¿ by vertically flipping the question mark. This way, the text retains a consistent look regardless of whether the translated game is run with an NEC or EPSON font ROM, or the hideous abomination that Neko Project II auto-generates if you don't provide either.
  3. (Optional) Change automatic line breaks to work on a per-word basis, rather than per-glyph

That's it – script editing and distribution would be handled by your local translation group. It might seem as if this would also work for Greek and Cyrillic scripts due to their presence in the PC-98 font ROM, but I'm not sure if I want to attempt procedurally shrinking these glyphs from 16×16 to 8×16… For any more thorough solution, we'd need to go for a more "Chad" kind of full-blown translation support:

  1. Implement text subdivisions at a sensible granularity while retaining automatic line and box breaks
  2. Compile translatable text into a Japanese→target language dictionary (I'm too old to develop any further translation systems that would overwrite modded source text with translations of the original text)
  3. Implement a custom Unicode font system (glyphs would be taken from GNU Unifont unless translators provide a different 8×16 font for their language)
  4. Combine the text compiler with the font compiler to only store needed glyphs as part of the translation's font file (dealing with a multi-MB font file would be rather ugly in a Real Mode game)
  5. Write a simple install/update/patch stacking tool that supports both .HDI and raw-file DOSBox-X scenarios (it's different enough from thcrap to warrant a separate tool – each patch stack would be statically compiled into a single package file in the game's directory)
  6. Add a nice language selection option to the main menu
  7. (Optional) Support proportional fonts

Which sounds more like a separate project to be commissioned from Touhou Patch Center's Open Collective funds, separate from the ReC98 cap. This way, we can make sure that the feature is completely implemented, and I can talk with every interested translator to make sure that their language works.
It's still cheaper overall to do this on PC-98 than to first port the games to a modern system and then translate them. On the other hand, most of the tasks in the Chad variant (3, 4, 5, and half of 2) purely deal with the difficulty of getting arbitrary Unicode characters to work natively in a PC-98 DOS game at all, and would be either unnecessary or trivial if we had already ported the game. Depending on where the patrons' interests lie, it may not be worth it. So let's see what all of you think about which way we should go, or whether it's worth doing at all. (Edit (2022-12-01): With Splashman's order towards the stage dialogue system, we've pretty much confirmed that it is.) Maybe we want to meet in the middle – using e.g. procedural glyph generation for dynamic translations to keep text rendering consistent with the rest of the PC-98 system, and just not support non-Latin-script languages in the beginning? In any case, I've added both options to the order form.
Edit (2023-07-28): Touhou Patch Center has agreed to fund a basic feature set somewhere between the Virgin and Chad level. Check the 📝 dedicated announcement blog post for more details and ideas, and to find out how you can support this goal!


Surprisingly, there was still a bit of RE work left in the third push after all of this, which I filled with some small rendering boilerplate. Since I also wanted to include TH02's playfield overlay functions, 1/15 of that last push went towards getting a TH02-exclusive function out of the way, which also ended up including that game in this delivery. :tannedcirno:
The other small function pointed out how TH05's Stage 5 midboss pops into the playfield quite suddenly, since its clipping test thinks it's only 32 pixels tall rather than 64:

Good chance that the pop-in might have been intended.
Edit (2023-06-30): Actually, it's a 📝 systematic consequence of ZUN having to work around the lack of clipping in master.lib's sprite functions.
There's even another quirk here: The white flash during its first frame is actually carried over from the previous midboss, which the game still considers as actively getting hit by the player shot that defeated it. It's the regular boilerplate code for rendering a midboss that resets the responsible damage variable, and that code doesn't run during the defeat explosion animation.

Next up: Staying with TH05 and looking at more of the pattern code of its boss fights. Given the remaining TH05 budget, it makes the most sense to continue in in-game order, with Sara and the Stage 2 midboss. If more money comes in towards this goal, I could alternatively go for the Mai & Yuki fight and immediately develop a pretty fix for the cheeto storage glitch. Also, there's a rather intricate pull request for direct ZMBV decoding on the website that I've still got to review…

📝 Posted:
🚚 Summary of:
P0214, P0215
Commits:
158a91e...414770c, 414770c...3123c9d
💰 Funded by:
Ember2528, Yanga
🏷 Tags:

Last blog post before the 100% completion of TH01! The final parts of REIIDEN.EXE would feel rather out of place in a celebratory blog post, after all. They provided quite a neat summary of the typical technical details that are wrong with this game, and that I now get to mention for one final time:

But hey, there's an error message if you start REIIDEN.EXE without a resident MDRV2 or a correctly prepared resident structure! And even a good, user-friendly one, asking the user to launch the batch file instead. For some reason, this convenience went out of fashion in the later games.


The Game Over animation (how fitting) gives us TH01's final piece of weird sprite blitting code, which seriously manages to include 2 bugs and 3 quirks in under 50 lines of code. In test mode (game t or game d), you can trigger this effect by pressing the ⬇️ down arrow key, which certainly explains why I encountered seemingly random Game Over events during all the tests I did with this game…
The animation appears to have changed quite a bit during development, to the point that probably even ZUN himself didn't know what he wanted it to look like in the end:

The original version unblits a 32×32 rectangle around Reimu that only grows on the X axis… for the first 5 frames. The unblitting call is only run if the corresponding sprite wasn't clipped at the edges of the playfield in the frame before, and ZUN uses the animation's frame number rather than the sprite loop variable to index the per-sprite clip flag array. The resulting out-of-bounds access then reads the sprite coordinates instead, which are never 0, thus interpreting all 5 sprites as clipped.
This variant would interpret the declared 5 effect coordinates as distinct sprites and unblit them correctly every frame. The end result is rather wimpy though… hardly appropriate for a Game Over, especially with the original animation in mind.
This variant would not unblit anything, and is probably closest to what the final animation should have been.

Finally, we get to the big main() function, serving as the duct tape that holds this game together. It may read rather disorganized with all the (actually necessary) assignments and function calls, but the only actual minor issue I've seen there is that you're robbed of any pellet destroy bonus collected on the final frame of the final boss. There is a certain charm in directly nesting the infinite main gameplay loop within the infinite per-life loop within the infinite stage loop. But come on, why is there no fourth scene loop? :zunpet: Instead, the game just starts a new REIIDEN.EXE process before and after a boss fight. With all the wildly mutated global state, that was probably a much saner choice.

The final secrets can be found in the debug stage selection. ZUN implemented the prompts using the C standard library's scanf() function, which is the natural choice for quick-and-dirty testing features like this one. However, the C standard library is also complete and utter trash, and so it's not surprising that both of the scanf() calls do… well, probably not what ZUN intended. The guaranteed out-of-bounds memory access in the select_flag route prompt thankfully has no real effect on the game, but it gets really interesting with the 面数 stage prompt.
Back in 2020, I already wrote about 📝 stages 21-24, and how they're loaded from actual data that ZUN shipped with the game. As it now turns out, the code that maps stage IDs to STAGE?.DAT scene numbers contains an explicit branch that maps any (1-based) stage number ≥21 to scene 7. Does this mean that an Extra Stage was indeed planned at some point? That branch seems way too specific to just be meant as a fallback. Maybe Asprey was on to something after all…

However, since ZUN passed the stage ID as a signed integer to scanf(), you can also enter negative numbers. The only place that kind of accidentally checks for them is the aforementioned stage ID → scene mapping, which ensures that (1-based) stages < 5 use the shrine's background image and BGM. With no checks anywhere else, we get a new set of "glitch stages":

TH01's stage -1
Stage -1
TH01's stage -2
Stage -2
TH01's stage -3
Stage -3
TH01's stage -4
Stage -4
TH01's stage -5
Stage -5

The scene loading function takes the entered 0-based stage ID value modulo 5, so these 4 are the only ones that "exist", and lower stage numbers will simply loop around to them. When loading these stages, the function accesses the data in REIIDEN.EXE that lies before the statically allocated 5-element stages-of-scene array, which happens to encompass Borland C++'s locale and exception handling data, as well as a small bit of ZUN's global variables. In particular, the obstacle/card HP on the tile I highlighted in green corresponds to the lowest byte of the 32-bit RNG seed. If it weren't for that and the fact that the obstacles/card HP on the few tiles before are similarly controlled by the x86 segment values of certain initialization function addresses, these glitch stages would be completely deterministic across PC-98 systems, and technically canon… :tannedcirno:
Stage -4 is the only playable one here as it's the only stage to end up below the 📝 heap corruption limit of 102 stage objects. Completing it loads Stage -3, which crashes with a Divide Error just like it does if it's directly selected. Unsurprisingly, this happens because all 50 card bytes at that memory location are 0, so one division (or in this case, modulo operation) by the number of cards is enough to crash the game.
Stage -5 is modulo'd to 0 and thus loads the first regular stage. The only apparent broken element there is the timer, which is handled by a completely different function that still operates with a (0-based) stage ID value of -5. Completing the stage loads Stage -4, which also crashes, but only because its 61 cards naturally cause the 📝 stack overflow in the flip-in animation for any stage with more than 50 cards.

And that's REIIDEN.EXE, the biggest and most bloated PC-98 Touhou executable, fully decompiled! Next up: Finishing this game with the main menu, and hoping I'll actually pull it off within 24 hours. (If I do, we might all have to thank 32th System, who independently decompiled half of the remaining 14 functions…)

📝 Posted:
🚚 Summary of:
P0205, P0206
Commits:
3259190...327730f, 327730f...454c105
💰 Funded by:
[Anonymous], Yanga
🏷 Tags:

Oh look, it's another rather short and straightforward boss with a rather small number of bugs and quirks. Yup, contrary to the character's popularity, Mima's premiere is really not all that special in terms of code, and continues the trend established with 📝 Kikuri and 📝 SinGyoku. I've already covered 📝 the initial sprite-related bugs last November, so this post focuses on the main code of the fight itself. The overview:


And there aren't even any weird hitboxes this time. What is maybe special about Mima, however, is how there's something to cover about all of her patterns. Since this is TH01, it's won't surprise anyone that the rotating square patterns are one giant copy-pasta of unblitting, updating, and rendering code. At least ZUN placed the core polar→Cartesian transformation in a separate function for creating regular polygons with an arbitrary number of sides, which might hint toward some more varied shapes having been planned at one point?
5 of the 6 patterns even follow the exact same steps during square update frames:

  1. Calculate square corner coordinates
  2. Unblit the square
  3. Update the square angle and radius
  4. Use the square corner coordinates for spawning pellets or missiles
  5. Recalculate square corner coordinates
  6. Render the square

Notice something? Bullets are spawned before the corner coordinates are updated. That's why their initial positions seem to be a bit off – they are spawned exactly in the corners of the square, it's just that it's the square from 8 frames ago. :tannedcirno:

Mima's first pattern on Normal difficulty.

Once ZUN reached the final laser pattern though, he must have noticed that there's something wrong there… or maybe he just wanted to fire those lasers independently from the square unblit/update/render timer for a change. Spending an additional 16 bytes of the data segment for conveniently remembering the square corner coordinates across frames was definitely a decent investment.

Mima's laser pattern on Lunatic difficulty, now with correct laser spawn positions. If this pattern reminds you of the game crashing immediately when defeating Mima, 📝 check out the Elis blog post for the details behind this bug, and grab the bugfix patch from there.

When Mima isn't shooting bullets from the corners of a square or hopping across the playfield, she's raising flame pillars from the bottom of the playfield within very specifically calculated random ranges… which are then rendered at byte-aligned VRAM positions, while collision detection still uses their actual pixel position. Since I don't want to sound like a broken record all too much, I'll just direct you to 📝 Kikuri, where we've seen the exact same issue with the teardrop ripple sprites. The conclusions are identical as well.

Mima's flame pillar pattern. This video was recorded on a particularly unlucky seed that resulted in great disparities between a pillar's internal X coordinate and its byte-aligned on-screen appearance, leading to lots of right-shifted hitboxes.
Also note how the change from the meteor animation to the three-arm 🚫 casting sprite doesn't unblit the meteor, and leaves that job to any sprite that happens to fly over those pixels.

However, I'd say that the saddest part about this pattern is how choppy it is, with the circle/pillar entities updating and rendering at a meager 7 FPS. Why go that low on purpose when you can just make the game render ✨ smoothly ✨ instead?

So smooth it's almost uncanny.

The reason quickly becomes obvious: With TH01's lack of optimization, going for the full 56.4 FPS would have significantly slowed down the game on its intended 33 MHz CPUs, requiring more than cheap surface-level ASM optimization for a stable frame rate. That might very well have been ZUN's reason for only ever rendering one circle per frame to VRAM, and designing the pattern with these time offsets in mind. It's always been typical for PC-98 developers to target the lowest-spec models that could possibly still run a game, and implementing dynamic frame rates into such an engine-less game is nothing I would wish on anybody. And it's not like TH01 is particularly unique in its choppiness anyway; low frame rates are actually a rather typical part of the PC-98 game aesthetic.


The final piece of weirdness in this fight can be found in phase 1's hop pattern, and specifically its palette manipulation. Just from looking at the pattern code itself, each of the 4 hops is supposed to darken the hardware palette by subtracting #444 from every color. At the last hop, every color should have therefore been reduced to a pitch-black #000, leaving the player completely blind to the movement of the chasing pellets for 30 frames and making the pattern quite ghostly indeed. However, that's not what we see in the actual game:

Nothing in the pattern's code would cause the hardware palette to get brighter before the end of the pattern, and yet…
The expected version doesn't look all too unfair, even on Lunatic… well, at least at the default rank pellet speed shown in this video. At maximum pellet speed, it is in fact rather brutal.

Looking at the frame counter, it appears that something outside the pattern resets the palette every 40 frames. The only known constant with a value of 40 would be the invincibility frames after hitting a boss with the Orb, but we're not hitting Mima here… :thonk:
But as it turns out, that's exactly where the palette reset comes from: The hop animation darkens the hardware palette directly, while the 📝 infamous 12-parameter boss collision handler function unconditionally resets the hardware palette to the "default boss palette" every 40 frames, regardless of whether the boss was hit or not. I'd classify this as a bug: That function has no business doing periodic hardware palette resets outside the invincibility flash effect, and it completely defies common sense that it does.

That explains one unexpected palette change, but could this function possibly also explain the other infamous one, namely, the temporary green discoloration in the Konngara fight? That glitch comes down to how the game actually uses two global "default" palettes: a default boss palette for undoing the invincibility flash effect, and a default stage palette for returning the colors back to normal at the end of the bomb animation or when leaving the Pause menu. And sure enough, the stage palette is the one with the green color, while the boss palette contains the intended colors used throughout the fight. Sending the latter palette to the graphics chip every 40 frames is what corrects the discoloration, which would otherwise be permanent.

The green color comes from BOSS7_D1.GRP, the scrolling background of the entrance animation. That's what turns this into a clear bug: The stage palette is only set a single time in the entire fight, at the beginning of the entrance animation, to the palette of this image. Apart from consistency reasons, it doesn't even make sense to set the stage palette there, as you can't enter the Pause menu or bomb during a blocking animation function.
And just 3 lines of code later, ZUN loads BOSS8_A1.GRP, the main background image of the fight. Moving the stage palette assignment there would have easily prevented the discoloration.

But yeah, as you can tell, palette manipulation is complete jank in this game. Why differentiate between a stage and a boss palette to begin with? The blocking Pause menu function could have easily copied the original palette to a local variable before darkening it, and then restored it after closing the menu. It's not so easy for bombs as the intended palette could change between the start and end of the animation, but the code could have still been simplified a lot if there was just one global "default palette" variable instead of two. Heck, even the other bosses who manipulate their palettes correctly only do so because they manually synchronize the two after every change. The proper defense against bugs that result from wild mutation of global state is to get rid of global state, and not to put up safety nets hidden in the middle of existing effect code.

The easiest way of reproducing the green discoloration bug in the TH01 Konngara fight, timed to show the maximum amount of time the discoloration can possibly last.

In any case, that's Mima done! 7th PC-98 Touhou boss fully decompiled, 24 bosses remaining, and 59 functions left in all of TH01.


In other thrilling news, my call for secondary funding priorities in new TH01 contributions has given us three different priorities so far. This raises an interesting question though: Which of these contributions should I now put towards TH01 immediately, and which ones should I leave in the backlog for the time being? Since I've never liked deciding on priorities, let's turn this into a popularity contest instead: The contributions with the least popular secondary priorities will go towards TH01 first, giving the most popular priorities a higher chance to still be left over after TH01 is done. As of this delivery, we'd have the following popularity order:

  1. TH05 (1.67 pushes), from T0182
  2. Seihou (1 push), from T0184
  3. TH03 (0.67 pushes), from T0146

Which means that T0146 will be consumed for TH01 next, followed by T0184 and then T0182. I only assign transactions immediately before a delivery though, so you all still have the chance to change up these priorities before the next one.

Next up: The final boss of TH01 decompilation, YuugenMagan… if the current or newly incoming TH01 funds happen to be enough to cover the entire fight. If they don't turn out to be, I will have to pass the time with some Seihou work instead, missing the TH01 anniversary deadline as a result. Edit (2022-07-18): Thanks to Yanga for securing the funding for YuugenMagan after all! That fight will feature slightly more than half of all remaining code in TH01's REIIDEN.EXE and the single biggest function in all of PC-98 Touhou, let's go!

📝 Posted:
🚚 Summary of:
P0201, P0202
Commits:
9342665...ff49e9e, ff49e9e...4568bf7
💰 Funded by:
Ember2528, Yanga, [Anonymous]
🏷 Tags:

The positive:

The negative:

The overview:


This time, we're back to the Orb hitbox being a logical 49×49 pixels in SinGyoku's center, and the shot hitbox being the weird one. What happens if you want the shot hitbox to be both offset to the left a bit and stretch the entire width of SinGyoku's sprite? You get a hitbox that ends in mid-air, far away from the right edge of the sprite:

Due to VRAM byte alignment, all player shots fired between gx = 376 and gx = 383 inclusive appear at the same visual X position, but are internally already partly outside the hitbox and therefore won't hit SinGyoku – compare the marked shot at gx = 376 to the one at gx = 380. So much for precisely visualizing hitboxes in this game…

Since the female and male forms also use the sphere entity's coordinates, they share the same hitbox.


Onto the rendering glitches then, which can – you guessed it – all be found in the sphere form's slam movement:

By having the sphere move from the right edge of the playfield to the left, this video demonstrates both the lazy reblitting and broken unblitting at the right edge for negative X velocities. Also, isn't it funny how Reimu can partly disappear from all the sloppy SinGyoku-related unblitting going on after her sprite was blitted?

Due to the low contrast of the sphere against the background, you typically don't notice these glitches, but the white invincibility flashing after a hit really does draw attention to them. This time, all of these glitches aren't even directly caused by ZUN having never learned about the EGC's bit length register – if he just wrote correct code for SinGyoku, none of this would have been an issue. Sigh… I wonder how many more glitches will be caused by improper use of this one function in the last 18% of REIIDEN.EXE.

There's even another bug here, with ZUN hardcoding a horizontal delta of 8 pixels rather than just passing the actual X velocity. Luckily, the maximum movement speed is 6 pixels on Lunatic, and this would have only turned into an additional observable glitch if the X velocity were to exceed 24 pixels. But that just means it's the kind of bug that still drains RE attention to prove that you can't actually observe it in-game under some circumstances.


The 5 pellet patterns are all pretty straightforward, with nothing to talk about. The code architecture during phase 2 does hint towards ZUN having had more creative patterns in mind – especially for the male form, which uses the transformation function's three pattern callback slots for three repetitions of the same pellet group.
There is one more oddity to be found at the very end of the fight:

The first frame of TH01 SinGyoku's defeat animation, showing the sphere blitted on top of a potentially active person form

Right before the defeat white-out animation, the sphere form is explicitly reblitted for no reason, on top of the form that was blitted to VRAM in the previous frame, and regardless of which form is currently active. If SinGyoku was meant to immediately transform back to the sphere form before being defeated, why isn't the person form unblitted before then? Therefore, the visibility of both forms is undeniably canon, and there is some lore meaning to be found here… :thonk:
In any case, that's SinGyoku done! 6th PC-98 Touhou boss fully decompiled, 25 remaining.


No FUUIN.EXE code rounding out the last push for a change, as the 📝 remaining missile code has been waiting in front of SinGyoku for a while. It already looked bad in November, but the angle-based sprite selection function definitely takes the cake when it comes to unnecessary and decadent floating-point abuse in this game.
The algorithm itself is very trivial: Even with 📝 .PTN requiring an additional quarter parameter to access 16×16 sprites, it's essentially just one bit shift, one addition, and one binary AND. For whatever reason though, ZUN casts the 8-bit missile angle into a 64-bit double, which turns the following explicit comparisons (!) against all possible 4 + 16 boundary angles (!!) into FPU operations. :zunpet: Even with naive and readable division and modulo operations, and the whole existence of this function not playing well with Turbo C++ 4.0J's terrible code generation at all, this could have been 3 lines of code and 35 un-inlined constant-time instructions. Instead, we've got this 207-instruction monster… but hey, at least it works. 🤷
The remaining time then went to YuugenMagan's initialization code, which allowed me to immediately remove more declarations from ASM land, but more on that once we get to the rest of that boss fight.

That leaves 76 functions until we're done with TH01! Next up: Card-flipping stage obstacles.

📝 Posted:
🚚 Summary of:
P0198, P0199, P0200
Commits:
48db0b7...440637e, 440637e...5af2048, 5af2048...67e46b5
💰 Funded by:
Ember2528, Lmocinemod, Yanga
🏷 Tags:

What's this? A simple, straightforward, easy-to-decompile TH01 boss with just a few minor quirks and only two rendering-related ZUN bugs? Yup, 2½ pushes, and Kikuri was done. Let's get right into the overview:

So yeah, there's your new timeout challenge. :godzun:


The few issues in this fight all relate to hitboxes, starting with the main one of Kikuri against the Orb. The coordinates in the code clearly describe a hitbox in the upper center of the disc, but then ZUN wrote a < sign instead of a > sign, resulting in an in-game hitbox that's not quite where it was intended to be…

Kikuri's actual hitbox. Since the Orb sprite doesn't change its shape, we can visualize the hitbox in a pixel-perfect way here. The Orb must be completely within the red area for a hit to be registered.
TODO TH01 Kikuri's intended hitboxTH01 Kikuri's actual hitbox

Much worse, however, are the teardrop ripples. It already starts with their rendering routine, which places the sprites from TAMAYEN.PTN at byte-aligned VRAM positions in the ultimate piece of if(…) {…} else if(…) {…} else if(…) {…} meme code. Rather than tracking the position of each of the five ripple sprites, ZUN suddenly went purely functional and manually hardcoded the exact rendering and collision detection calls for each frame of the animation, based on nothing but its total frame counter. :zunpet:
Each of the (up to) 5 columns is also unblitted and blitted individually before moving to the next column, starting at the center and then symmetrically moving out to the left and right edges. This wouldn't be a problem if ZUN's EGC-powered unblitting function didn't word-align its X coordinates to a 16×1 grid. If the ripple sprites happen to start at an odd VRAM byte position, their unblitting coordinates get rounded both down and up to the nearest 16 pixels, thus touching the adjacent 8 pixels of the previously blitted columns and leaving the well-known black vertical bars in their place. :tannedcirno:

OK, so where's the hitbox issue here? If you just look at the raw calculation, it's a slightly confusingly expressed, but perfectly logical 17 pixels. But this is where byte-aligned blitting has a direct effect on gameplay: These ripples can be spawned at any arbitrary, non-byte-aligned VRAM position, and collisions are calculated relative to this internal position. Therefore, the actual hitbox is shifted up to 7 pixels to the right, compared to where you would expect it from a ripple sprite's on-screen position:

Due to the deterministic nature of this part of the fight, it's always 5 pixels for this first set of ripples. These visualizations are obviously not pixel-perfect due to the different potential shapes of Reimu's sprite, so they instead relate to her 32×32 bounding box, which needs to be entirely inside the red area.

We've previously seen the same issue with the 📝 shot hitbox of Elis' bat form, where pixel-perfect collision detection against a byte-aligned sprite was merely a sidenote compared to the more serious X=Y coordinate bug. So why do I elevate it to bug status here? Because it directly affects dodging: Reimu's regular movement speed is 4 pixels per frame, and with the internal position of an on-screen ripple sprite varying by up to 7 pixels, any micrododging (or "grazing") attempt turns into a coin flip. It's sort of mitigated by the fact that Reimu is also only ever rendered at byte-aligned VRAM positions, but I wouldn't say that these two bugs cancel out each other.
Oh well, another set of rendering issues to be fixed in the hypothetical Anniversary Edition – obviously, the hitboxes should remain unchanged. Until then, you can always memorize the exact internal positions. The sequence of teardrop spawn points is completely deterministic and only controlled by the fixed per-difficulty spawn interval.


Aside from more minor coordinate inaccuracies, there's not much of interest in the rest of the pattern code. In another parallel to Elis though, the first soul pattern in phase 4 is aimed on every difficulty except Lunatic, where the pellets are once again statically fired downwards. This time, however, the pattern's difficulty is much more appropriately distributed across the four levels, with the simultaneous spinning circle pellets adding a constant aimed component to every difficulty level.

Kikuri's phase 4 patterns, on every difficulty.


That brings us to 5 fully decompiled PC-98 Touhou bosses, with 26 remaining… and another ½ of a push going to the cutscene code in FUUIN.EXE.
You wouldn't expect something as mundane as the boss slideshow code to contain anything interesting, but there is in fact a slight bit of speculation fuel there. The text typing functions take explicit string lengths, which precisely match the corresponding strings… for the most part. For the "Gatekeeper 'SinGyoku'" string though, ZUN passed 23 characters, not 22. Could that have been the "h" from the Hepburn romanization of 神玉?!
Also, come on, if this text is already blitted to VRAM for no reason, you could have gone for perfect centering at unaligned byte positions; the rendering function would have perfectly supported it. Instead, the X coordinates are still rounded up to the nearest byte.

The hardcoded ending cutscene functions should be even less interesting – don't they just show a bunch of images followed by frame delays? Until they don't, and we reach the 地獄/Jigoku Bad Ending with its special shake/"boom" effect, and this picture:

Picture #2 from ED2A.GRP.

Which is rendered by the following code:

for(int i = 0; i <= boom_duration; i++) { // (yes, off-by-one)
	if((i & 3) == 0) {
		graph_scrollup(8);
	} else {
		graph_scrollup(0);
	}

	end_pic_show(1); // ← different picture is rendered
	frame_delay(2);  // ← blocks until 2 VSync interrupts have occurred

	if(i & 1) {
		end_pic_show(2); // ← picture above is rendered
	} else {
		end_pic_show(1);
	}
}

Notice something? You should never see this picture because it's immediately overwritten before the frame is supposed to end. And yet it's clearly flickering up for about one frame with common emulation settings as well as on my real PC-9821 Nw133, clocked at 133 MHz. master.lib's graph_scrollup() doesn't block until VSync either, and removing these calls doesn't change anything about the blitted images. end_pic_show() uses the EGC to blit the given 320×200 quarter of VRAM from page 1 to the visible page 0, so the bottleneck shouldn't be there either…

…or should it? After setting it up via a few I/O port writes, the common method of EGC-powered blitting works like this:

  1. Read 16 bits from the source VRAM position on any single bitplane. This fills the EGC's 4 16-bit tile registers with the VRAM contents at that specific position on every bitplane. You do not care about the value the CPU returns from the read – in optimized code, you would make sure to just read into a register to avoid useless additional stores into local variables.
  2. Write any 16 bits to the target VRAM position on any single bitplane. This copies the contents of the EGC's tile registers to that specific position on every bitplane.

To transfer pixels from one VRAM page to another, you insert an additional write to I/O port 0xA6 before 1) and 2) to set your source and destination page… and that's where we find the bottleneck. Taking a look at the i486 CPU and its cycle counts, a single one of these page switches costs 17 cycles – 1 for MOVing the page number into AL, and 16 for the OUT instruction itself. Therefore, the 8,000 page switches required for EGC-copying a 320×200-pixel image require 136,000 cycles in total.

And that's the optimal case of using only those two instructions. 📝 As I implied last time, TH01 uses a function call for VRAM page switches, complete with creating and destroying a useless stack frame and unnecessarily updating a global variable in main memory. I tried optimizing ZUN's code by throwing out unnecessary code and using 📝 pseudo-registers to generate probably optimal assembly code, and that did speed up the blitting to almost exactly 50% of the original version's run time. However, it did little about the flickering itself. Here's a comparison of the first loop with boom_duration = 16, recorded in DOSBox-X with cputype=auto and cycles=max, and with i overlaid using the text chip. Caution, flashing lights:

The original animation, completing in 50 frames instead of the expected 34, thanks to slow blitting. Combined with the lack of double-buffering, this results in noticeable tearing as the screen refreshes while blitting is still in progress. (Note how the background of the ドカーン image is shifted 1 pixel to the left compared to pic #1.)
This optimized version completes in the expected 34 frames. No tearing happens to be visible in this recording, but the ドカーン image is still visible on every second loop iteration. (Note how the background of the ドカーン image is shifted 1 pixel to the left compared to pic #1.)

I pushed the optimized code to the th01_end_pic_optimize branch, to also serve as an example of how to get close to optimal code out of Turbo C++ 4.0J without writing a single ASM instruction.
And if you really want to use the EGC for this, that's the best you can do. It really sucks that it merely expanded the GRCG's 4×8-bit tile register to 4×16 bits. With 32 bits, ≥386 CPUs could have taken advantage of their wider registers and instructions to double the blitting performance. Instead, we now know the reason why 📝 Promisence Soft's EGC-powered sprite driver that ZUN later stole for TH03 is called SPRITE16 and not SPRITE32. What a massive disappointment.

But what's perhaps a bigger surprise: Blitting planar images from main memory is much faster than EGC-powered inter-page VRAM copies, despite the required manual access to all 4 bitplanes. In fact, the blitting functions for the .CDG/.CD2 format, used from TH03 onwards, would later demonstrate the optimal method of using REP MOVSD for blitting every line in 32-pixel chunks. If that was also used for these ending images, the core blitting operation would have taken ((12 + (3 × (320 / 32))) × 200 × 4) = 33,600 cycles, with not much more overhead for the surrounding row and bitplane loops. Sure, this doesn't factor in the whole infamous issue of VRAM being slow on PC-98, but the aforementioned 136,000 cycles don't even include any actual blitting either. And as you move up to later PC-98 models with Pentium CPUs, the gap between OUT and REP MOVSD only becomes larger. (Note that the page I linked above has a typo in the cycle count of REP MOVSD on Pentium CPUs: According to the original Intel Architecture and Programming Manual, it's 13+𝑛, not 3+𝑛.)
This difference explains why later games rarely use EGC-"accelerated" inter-page VRAM copies, and keep all of their larger images in main memory. It especially explains why TH04 and TH05 can get away with naively redrawing boss backdrop images on every frame.

In the end, the whole fact that ZUN did not define how long this image should be visible is enough for me to increment the game's overall bug counter. Who would have thought that looking at endings of all things would teach us a PC-98 performance lesson… Sure, optimizing TH01 already seemed promising just by looking at its bloated code, but I had no idea that its performance issues extended so far past that level.

That only leaves the common beginning part of all endings and a short main() function before we're done with FUUIN.EXE, and 98 functions until all of TH01 is decompiled! Next up: SinGyoku, who not only is the quickest boss to defeat in-game, but also comes with the least amount of code. See you very soon!

📝 Posted:
🚚 Summary of:
P0193, P0194, P0195, P0196, P0197
Commits:
e1f3f9f...183d7a2, 183d7a2...5d93a50, 5d93a50...e18c53d, e18c53d...57c9ac5, 57c9ac5...48db0b7
💰 Funded by:
Ember2528, Yanga
🏷 Tags:

With Elis, we've not only reached the midway point in TH01's boss code, but also a bunch of other milestones: Both REIIDEN.EXE and TH01 as a whole have crossed the 75% RE mark, and overall position independence has also finally cracked 80%!

And it got done in 4 pushes again? Yup, we're back to 📝 Konngara levels of redundancy and copy-pasta. This time, it didn't even stop at the big copy-pasted code blocks for the rift sprite and 256-pixel circle animations, with the words "redundant" and "unnecessary" ending up a total of 18 times in my source code comments.
But damn is this fight broken. As usual with TH01 bosses, let's start with a high-level overview:

This puts the earliest possible end of the fight at the first frame of phase 5. However, nothing prevents Elis' HP from reaching 0 before that point. You can nicely see this in 📝 debug mode: Wait until the HP bar has filled up to avoid heap corruption, hold ↵ Return to reduce her HP to 0, and watch how Elis still goes through a total of two patterns* and four teleport animations before accepting defeat.

But wait, heap corruption? Yup, there's a bug in the HP bar that already affected Konngara as well, and it isn't even just about the graphical glitches generated by negative HP:

Since Elis starts with 14 HP, which is an even number, this corruption is trivial to cause: Simply hold ↵ Return from the beginning of the fight, and the completion condition will never be true, as the HP and frame numbers run past the off-by-one meeting point.

Edit (2023-07-21): Pressing ↵ Return to reduce HP also works in test mode (game t). There, the game doesn't even check the heap, and consequently won't report any corruption, allowing the HP bar to be glitched even further.

Regular gameplay, however, entirely prevents this due to the fixed start positions of Reimu and the Orb, the Orb's fixed initial trajectory, and the 50 frames of delay until a bomb deals damage to a boss. These aspects make it impossible to hit Elis within the first 14 frames of phase 1, and ensure that her HP bar is always filled up completely. So ultimately, this bug ends up comparable in seriousness to the 📝 recursion / stack overflow bug in the memory info screen.


These wavy teleport animations point to a quite frustrating architectural issue in this fight. It's not even the fact that unblitting the yellow star sprites rips temporary holes into Elis' sprite; that's almost expected from TH01 at this point. Instead, it's all because of this unused frame of the animation:

An unused wave animation frame from TH01's BOSS5.BOS

With this sprite still being part of BOSS5.BOS, Girl-Elis has a total of 9 animation frames, 1 more than the 📝 8 per-entity sprites allowed by ZUN's architecture. The quick and easy solution would have been to simply bump the sprite array size by 1, but… nah, this would have added another 20 bytes to all 6 of the .BOS image slots. :zunpet: Instead, ZUN wrote the manual position synchronization code I mentioned in that 2020 blog post. Ironically, he then copy-pasted this snippet of code often enough that it ended up taking up more than 120 bytes in the Elis fight alone – with, you guessed it, some of those copies being redundant. Not to mention that just going from 8 to 9 sprites would have allowed ZUN to go down from 6 .BOS image slots to 3. That would have actually saved 420 bytes in addition to the manual synchronization trouble. Looking forward to SinGyoku, that's going to be fun again…


As for the fight itself, it doesn't take long until we reach its most janky danmaku pattern, right in phase 1:

The "pellets along circle" pattern on Lunatic, in its original version and with fanfiction fixes for everything that can potentially be interpreted as a bug.

Then again, it might very well be that all of this was intended, or, most likely, just left in the game as a happy accident. The latter interpretation would explain why ZUN didn't just delete the rendering calls for the lower-right quarter of the circle, because seriously, how would you not spot that? The phase 3 patterns continue with more minor graphical glitches that aren't even worth talking about anymore.


And then Elis transforms into her bat form at the beginning of Phase 5, which displays some rather unique hitboxes. The one against the Orb is fine, but the one against player shots…

… uses the bat's X coordinate for both X and Y dimensions. :zunpet: In regular gameplay, it's not too bad as most of the bat patterns fire aimed pellets which typically don't allow you to move below her sprite to begin with. But if you ever tried destroying these pellets while standing near the middle of the playfield, now you know why that didn't work. This video also nicely points out how the bat, like any boss sprite, is only ever blitted at positions on the 8×1-pixel VRAM byte grid, while collision detection uses the actual pixel position.

The bat form patterns are all relatively simple, with little variation depending on the difficulty level, except for the "slow pellet spreads" pattern. This one is almost easiest to dodge on Lunatic, where the 5-spreads are not only always fired downwards, but also at the hardcoded narrow delta angle, leaving plenty of room for the player to move out of the way:

The "slow pellet spreads" pattern of Elis' bat form, on every difficulty. Which version do you think is the easiest one?

Finally, we've got another potential timesave in the girl form's "safety circle" pattern:

After the circle spawned completely, you lose a life by moving outside it, but doing that immediately advances the pattern past the circle part. This part takes 200 frames, but the defeat animation only takes 82 frames, so you can save up to 118 frames there.

Final funny tidbit: As with all dynamic entities, this circle is only blitted to VRAM page 0 to allow easy unblitting. However, it's also kind of static, and there needs to be some way to keep the Orb, the player shots, and the pellets from ripping holes into it. So, ZUN just re-blits the circle every… 4 frames?! 🤪 The same is true for the Star of David and its surrounding circle, but there you at least get a flash animation to justify it. All the overlap is actually quite a good reason for not even attempting to 📝 mess with the hardware color palette instead.


And that's the 4th PC-98 Touhou boss decompiled, 27 to go… but wait, all these quirks, and I still got nothing about the one actual crash that can appear in regular gameplay? There has even been a recent video about it. The cause has to be in Elis' main function, after entering the defeat branch and before the blocking white-out animation. It can't be anywhere else other than in the 📝 central line blitting and unblitting function, called from 📝 that one broken laser reset+unblit function, because everything else in that branch looks fine… and I think we can rule out a crash in MDRV2's non-blocking fade-out call. That's going to need some extra research, and a 5th push added on top of this delivery.

Reproducing the crash was the whole challenge here. Even after moving Elis and Reimu to the exact positions seen in Pearl's video and setting Elis' HP to 0 on the exact same frame, everything ran fine for me. It's definitely no division by 0 this time, the function perfectly guards against that possibility. The line specified in the function's parameters is always clipped to the VRAM region as well, so we can also rule out illegal memory accesses here…

… or can we? Stepping through it all reminded me of how this function brings unblitting sloppiness to the next level: For each VRAM byte touched, ZUN actually unblits the 4 surrounding bytes, adding one byte to the left and two bytes to the right, and using a single 32-bit read and write per bitplane. So what happens if the function tries to unblit the topmost byte of VRAM, covering the pixel positions from (0, 0) to (7, 0) inclusive? The VRAM offset of 0x0000 is decremented to 0xFFFF to cover the one byte to the left, 4 bytes are written to this address, the CPU's internal offset overflows… and as it turns out, that is illegal even in Real Mode as of the 80286, and will raise a General Protection Fault. Which is… ignored by DOSBox-X, every Neko Project II version in common use, the CSCP emulators, SL9821, and T98-Next. Only Anex86 accurately emulates the behavior of real hardware here.

OK, but no laser fired by Elis ever reaches the top-left corner of the screen. How can such a fault even happen in practice? That's where the broken laser reset+unblit function comes in: Not only does it just flat out pass the wrong parameters to the line unblitting function – describing the line already traveled by the laser and stopping where the laser begins – but it also passes them wrongly, in the form of raw 32-bit fixed-point Q24.8 values, with no conversion other than a truncation to the signed 16-bit pixels expected by the function. What then follows is an attempt at interpolation and clipping to find a line segment between those garbage coordinates that actually falls within the boundaries of VRAM:

  1. right/bottom correspond to a laser's origin position, and left/top to the leftmost pixel of its moved-out top line. The bug therefore only occurs with lasers that stopped growing and have started moving.
  2. Moreover, it will only happen if either (left % 256) or (right % 256) is ≤ 127 and the other one of the two is ≥ 128. The typecast to signed 16-bit integers then turns the former into a large positive value and the latter into a large negative value, triggering the function's clipping code.
  3. The function then follows Bresenham's algorithm: left is ensured to be smaller than right by swapping the two values if necessary. If that happened, top and bottom are also swapped, regardless of their value – the algorithm does not care about their order.
  4. The slope in the X dimension is calculated using an integer division of ((bottom - top) / (right - left)). Both subtractions are done on signed 16-bit integers, and overflow accordingly.
  5. (-left × slope_x) is added to top, and left is set to 0.
  6. If both top and bottom are < 0 or ≥ 640, there's nothing to be unblitted. Otherwise, the final coordinates are clipped to the VRAM range of [(0, 0), (639, 399)].
  7. If the function got this far, the line to be unblitted is now very likely to reach from
    1. the top-left to the bottom-right corner, starting out at (0, 0) right away, or
    2. from the bottom-left corner to the top-right corner. In this case, you'd expect unblitting to end at (639, 0), but thanks to an off-by-one error, it actually ends at (640, -1), which is equivalent to (0, 0). Why add clipping to VRAM offset calculations when everything else is clipped already, right? :godzun:
Possible laser states that will cause the fault, with some debug output to help understand the cause, and any pellets removed for better readability. This can happen for all bosses that can potentially have shootout lasers on screen when being defeated, so it also applies to Mima. Fixing this is easier than understanding why it happens, but since y'all love reading this stuff…

tl;dr: TH01 has a high chance of freezing at a boss defeat sequence if there are diagonally moving lasers on screen, and if your PC-98 system raises a General Protection Fault on a 4-byte write to offset 0xFFFF, and if you don't run a TSR with an INT 0Dh handler that might handle this fault differently.

The easiest fix option would be to just remove the attempted laser unblitting entirely, but that would also have an impact on this game's… distinctive visual glitches, in addition to touching a whole lot of code bytes. If I ever get funded to work on a hypothetical TH01 Anniversary Edition that completely rearchitects the game to fix all these glitches, it would be appropriate there, but not for something that purports to be the original game.

(Sidenote to further hype up this Anniversary Edition idea for PC-98 hardware owners: With the amount of performance left on the table at every corner of this game, I'm pretty confident that we can get it to work decently on PC-98 models with just an 80286 CPU.)

Since we're in critical infrastructure territory once again, I went for the most conservative fix with the least impact on the binary: Simply changing any VRAM offsets >= 0xFFFD to 0x0000 to avoid the GPF, and leaving all other bugs in place. Sure, it's rather lazy and "incorrect"; the function still unblits a 32-pixel block there, but adding a special case for blitting 24 pixels would add way too much code. And seriously, it's not like anything happens in the 8 pixels between (24, 0) and (31, 0) inclusive during gameplay to begin with. To balance out the additional per-row if() branch, I inlined the VRAM page change I/O, saving two function calls and one memory write per unblitted row.

That means it's time for a new community_choice_fixes build, containing the new definitive bugfixed versions of these games: 2022-05-31-community-choice-fixes.zip Check the th01_critical_fixes branch for the modified TH01 code. It also contains a fix for the HP bar heap corruption in test or debug mode – simply changing the == comparison to <= is enough to avoid it, and negative HP will still create aesthetic glitch art.


Once again, I then was left with ½ of a push, which I finally filled with some FUUIN.EXE code, specifically the verdict screen. The most interesting part here is the player title calculation, which is quite sneaky: There are only 6 skill levels, but three groups of titles for each level, and the title you'll see is picked from a random group. It looks like this is the first time anyone has documented the calculation?
As for the levels, ZUN definitely didn't expect players to do particularly well. With a 1cc being the standard goal for completing a Touhou game, it's especially funny how TH01 expects you to continue a lot: The code has branches for up to 21 continues, and the on-screen table explicitly leaves room for 3 digits worth of continues per 5-stage scene. Heck, these counts are even stored in 32-bit long variables.

Next up: 📝 Finally finishing the long overdue Touhou Patch Center MediaWiki update work, while continuing with Kikuri in the meantime. Originally I wasn't sure about what to do between Elis and Seihou, but with Ember2528's surprise contribution last week, y'all have demonstrated more than enough interest in the idea of getting TH01 done sooner rather than later. And I agree – after all, we've got the 25th anniversary of its first public release coming up on August 15, and I might still manage to completely decompile this game by that point…

📝 Posted:
🚚 Summary of:
P0174, P0175, P0176, P0177, P0178, P0179, P0180, P0181
Commits:
27f901c...a0fe812, a0fe812...40ac9a7, 40ac9a7...c5dc45b, c5dc45b...5f0cabc, 5f0cabc...60621f8, 60621f8...9e5b344, 9e5b344...091f19f, 091f19f...313450f
💰 Funded by:
Ember2528, Yanga
🏷 Tags:

Here we go, TH01 Sariel! This is the single biggest boss fight in all of PC-98 Touhou: If we include all custom effect code we previously decompiled, it amounts to a total of 10.31% of all code in TH01 (and 3.14% overall). These 8 pushes cover the final 8.10% (or 2.47% overall), and are likely to be the single biggest delivery this project will ever see. Considering that I only managed to decompile 6.00% across all games in 2021, 2022 is already off to a much better start!

So, how can Sariel's code be that large? Well, we've got:

In total, it's just under 3,000 lines of C++ code, containing a total of 8 definite ZUN bugs, 3 of them being subpixel/pixel confusions. That might not look all too bad if you compare it to the 📝 player control function's 8 bugs in 900 lines of code, but given that Konngara had 0… (Edit (2022-07-17): Konngara contains two bugs after all: A 📝 possible heap corruption in test or debug mode, and the infamous 📝 temporary green discoloration.) And no, the code doesn't make it obvious whether ZUN coded Konngara or Sariel first; there's just as much evidence for either.

Some terminology before we start: Sariel's first form is separated into four phases, indicated by different background images, that cycle until Sariel's HP reach 0 and the second, single-phase form starts. The danmaku patterns within each phase are also on a cycle, and the game picks a random but limited number of patterns per phase before transitioning to the next one. The fight always starts at pattern 1 of phase 1 (the random purple lasers), and each new phase also starts at its respective first pattern.


Sariel's bugs already start at the graphics asset level, before any code gets to run. Some of the patterns include a wand raise animation, which is stored in BOSS6_2.BOS:

TH01 BOSS6_2.BOS
Umm… OK? The same sprite twice, just with slightly different colors? So how is the wand lowered again?

The "lowered wand" sprite is missing in this file simply because it's captured from the regular background image in VRAM, at the beginning of the fight and after every background transition. What I previously thought to be 📝 background storage code has therefore a different meaning in Sariel's case. Since this captured sprite is fully opaque, it will reset the entire 128×128 wand area… wait, 128×128, rather than 96×96? Yup, this lowered sprite is larger than necessary, wasting 1,967 bytes of conventional memory.
That still doesn't quite explain the second sprite in BOSS6_2.BOS though. Turns out that the black part is indeed meant to unblit the purple reflection (?) in the first sprite. But… that's not how you would correctly unblit that?

VRAM after blitting the first sprite of TH01's BOSS6_2.BOS VRAM after blitting the second sprite of TH01's BOSS6_2.BOS

The first sprite already eats up part of the red HUD line, and the second one additionally fails to recover the seal pixels underneath, leaving a nice little black hole and some stray purple pixels until the next background transition. :tannedcirno: Quite ironic given that both sprites do include the right part of the seal, which isn't even part of the animation.


Just like Konngara, Sariel continues the approach of using a single function per danmaku pattern or custom entity. While I appreciate that this allows all pattern- and entity-specific state to be scoped locally to that one function, it quickly gets ugly as soon as such a function has to do more than one thing.
The "bird function" is particularly awful here: It's just one if(…) {…} else if(…) {…} else if(…) {…} chain with different branches for the subfunction parameter, with zero shared code between any of these branches. It also uses 64-bit floating-point double as its subpixel type… and since it also takes four of those as parameters (y'know, just in case the "spawn new bird" subfunction is called), every call site has to also push four double values onto the stack. Thanks to Turbo C++ even using the FPU for pushing a 0.0 constant, we have already reached maximum floating-point decadence before even having seen a single danmaku pattern. Why decadence? Every possible spawn position and velocity in both bird patterns just uses pixel resolution, with no fractional component in sight. And there goes another 720 bytes of conventional memory.

Speaking about bird patterns, the red-bird one is where we find the first code-level ZUN bug: The spawn cross circle sprite suddenly disappears after it finished spawning all the bird eggs. How can we tell it's a bug? Because there is code to smoothly fly this sprite off the playfield, that code just suddenly forgets that the sprite's position is stored in Q12.4 subpixels, and treats it as raw screen pixels instead. :zunpet: As a result, the well-intentioned 640×400 screen-space clipping rectangle effectively shrinks to 38×23 pixels in the top-left corner of the screen. Which the sprite is always outside of, and thus never rendered again.
The intended animation is easily restored though:

Sariel's third pattern, and the first to spawn birds, in its original and fixed versions. Note that I somewhat fixed the bird hatch animation as well: ZUN's code never unblits any frame of animation there, and simply blits every new one on top of the previous one.

Also, did you know that birds actually have a quite unfair 14×38-pixel hitbox? Not that you'd ever collide with them in any of the patterns…

Another 3 of the 8 bugs can be found in the symmetric, interlaced spawn rays used in three of the patterns, and the 32×32 debris "sprites" shown at their endpoint, at the edge of the screen. You kinda have to commend ZUN's attention to detail here, and how he wrote a lot of code for those few rapidly animated pixels that you most likely don't even notice, especially with all the other wrong pixels resulting from rendering glitches. One of the bugs in the very final pattern of phase 4 even turns them into the vortex sprites from the second pattern in phase 1 during the first 5 frames of the first time the pattern is active, and I had to single-step the blitting calls to verify it.
It certainly was annoying how much time I spent making sense of these bugs, and all weird blitting offsets, for just a few pixels… Let's look at something more wholesome, shall we?


So far, we've only seen the PC-98 GRCG being used in RMW (read-modify-write) mode, which I previously 📝 explained in the context of TH01's red-white HP pattern. The second of its three modes, TCR (Tile Compare Read), affects VRAM reads rather than writes, and performs "color extraction" across all 4 bitplanes: Instead of returning raw 1bpp data from one plane, a VRAM read will instead return a bitmask, with a 1 bit at every pixel whose full 4-bit color exactly matches the color at that offset in the GRCG's tile register, and 0 everywhere else. Sariel uses this mode to make sure that the 2×2 particles and the wind effect are only blitted on top of "air color" pixels, with other parts of the background behaving like a mask. The algorithm:

  1. Set the GRCG to TCR mode, and all 8 tile register dots to the air color
  2. Read N bits from the target VRAM position to obtain an N-bit mask where all 1 bits indicate air color pixels at the respective position
  3. AND that mask with the alpha plane of the sprite to be drawn, shifted to the correct start bit within the 8-pixel VRAM byte
  4. Set the GRCG to RMW mode, and all 8 tile register dots to the color that should be drawn
  5. Write the previously obtained bitmask to the same position in VRAM

Quite clever how the extracted colors double as a secondary alpha plane, making for another well-earned good-code tag. The wind effect really doesn't deserve it, though:

As far as I can tell, ZUN didn't use TCR mode anywhere else in PC-98 Touhou. Tune in again later during a TH04 or TH05 push to learn about TDW, the final GRCG mode!


Speaking about the 2×2 particle systems, why do we need three of them? Their only observable difference lies in the way they move their particles:

  1. Up or down in a straight line (used in phases 4 and 2, respectively)
  2. Left or right in a straight line (used in the second form)
  3. Left and right in a sinusoidal motion (used in phase 3, the "dark orange" one)

Out of all possible formats ZUN could have used for storing the positions and velocities of individual particles, he chose a) 64-bit / double-precision floating-point, and b) raw screen pixels. Want to take a guess at which data type is used for which particle system?

If you picked double for 1) and 2), and raw screen pixels for 3), you are of course correct! :godzun: Not that I'm implying that it should have been the other way round – screen pixels would have perfectly fit all three systems use cases, as all 16-bit coordinates are extended to 32 bits for trigonometric calculations anyway. That's what, another 1.080 bytes of wasted conventional memory? And that's even calculated while keeping the current architecture, which allocates space for 3×30 particles as part of the game's global data, although only one of the three particle systems is active at any given time.

That's it for the first form, time to put on "Civilization of Magic"! Or "死なばもろとも"? Or "Theme of 地獄めくり"? Or whatever SYUGEN is supposed to mean…


… and the code of these final patterns comes out roughly as exciting as their in-game impact. With the big exception of the very final "swaying leaves" pattern: After 📝 Q4.4, 📝 Q28.4, 📝 Q24.8, and double variables, this pattern uses… decimal subpixels? Like, multiplying the number by 10, and using the decimal one's digit to represent the fractional part? Well, sure, if you really insist on moving the leaves in cleanly represented integer multiples of ⅒, which is infamously impossible in IEEE 754. Aside from aesthetic reasons, it only really combines less precision (10 possible fractions rather than the usual 16) with the inferior performance of having to use integer divisions and multiplications rather than simple bit shifts. And it's surely not because the leaf sprites needed an extended integer value range of [-3276, +3276], compared to Q12.4's [-2047, +2048]: They are clipped to 640×400 screen space anyway, and are removed as soon as they leave this area.

This pattern also contains the second bug in the "subpixel/pixel confusion hiding an entire animation" category, causing all of BOSS6GR4.GRC to effectively become unused:

The "swaying leaves" pattern. ZUN intended a splash animation to be shown once each leaf "spark" reaches the top of the playfield, which is never displayed in the original game.

At least their hitboxes are what you would expect, exactly covering the 30×30 pixels of Reimu's sprite. Both animation fixes are available on the th01_sariel_fixes branch.

After all that, Sariel's main function turned out fairly unspectacular, just putting everything together and adding some shake, transition, and color pulse effects with a bunch of unnecessary hardware palette changes. There is one reference to a missing BOSS6.GRP file during the first→second form transition, suggesting that Sariel originally had a separate "first form defeat" graphic, before it was replaced with just the shaking effect in the final game.
Speaking about the transition code, it is kind of funny how the… um, imperative and concrete nature of TH01 leads to these 2×24 lines of straight-line code. They kind of look like ZUN rattling off a laundry list of subsystems and raw variables to be reinitialized, making damn sure to not forget anything.


Whew! Second PC-98 Touhou boss completely decompiled, 29 to go, and they'll only get easier from here! 🎉 The next one in line, Elis, is somewhere between Konngara and Sariel as far as x86 instruction count is concerned, so that'll need to wait for some additional funding. Next up, therefore: Looking at a thing in TH03's main game code – really, I have little idea what it will be!

Now that the store is open again, also check out the 📝 updated RE progress overview I've posted together with this one. In addition to more RE, you can now also directly order a variety of mods; all of these are further explained in the order form itself.

📝 Posted:
🚚 Summary of:
P0172, P0173
Commits:
49e6789...2d5491e, 2d5491e...27f901c
💰 Funded by:
Blue Bolt, [Anonymous]
🏷 Tags:

TH03 finally passed 20% RE, and the newly decompiled code contains no serious ZUN bugs! What a nice way to end the year.

There's only a single unlockable feature in TH03: Chiyuri and Yumemi as playable characters, unlocked after a 1CC on any difficulty. Just like the Extra Stages in TH04 and TH05, YUME.NEM contains a single designated variable for this unlocked feature, making it trivial to craft a fully unlocked score file without recording any high scores that others would have to compete against. So, we can now put together a complete set for all PC-98 Touhou games: 2021-12-27-Fully-unlocked-clean-score-files.zip It would have been cool to set the randomly generated encryption keys in these files to a fixed value so that they cancel out and end up not actually encrypting the file. Too bad that TH03 also started feeding each encrypted byte back into its stream cipher, which makes this impossible.

The main loading and saving code turned out to be the second-cleanest implementation of a score file format in PC-98 Touhou, just behind TH02. Only two of the YUME.NEM functions come with nonsensical differences between OP.EXE and MAINL.EXE, rather than 📝 all of them, as in TH01 or 📝 too many of them, as in TH04 and TH05. As for the rest of the per-difficulty structure though… well, it quickly becomes clear why this was the final score file format to be RE'd. The name, score, and stage fields are directly stored in terms of the internal REGI*.BFT sprite IDs used on the high score screen. TH03 also stores 10 score digits for each place rather than the 9 possible ones, keeps any leading 0 digits, and stores the letters of entered names in reverse order… yeah, let's decompile the high score screen as well, for a full understanding of why ZUN might have done all that. (Answer: For no reason at all. :zunpet:)


And wow, what a breath of fresh air. It's surely not good-code: The overlapping shadows resulting from using a 24-pixel letterspacing with 32-pixel glyphs in the name column led ZUN to do quite a lot of unnecessary and slightly confusing rendering work when moving the cursor back and forth, and he even forgot about the EGC there. But it's nowhere close to the level of jank we saw in 📝 TH01's high score menu last year. Good to see that ZUN had learned a thing or two by his third game – especially when it comes to storing the character map cursor in terms of a character ID, and improving the layout of the character map:

The alphabet available for TH03 high score names.

That's almost a nicely regular grid there. With the question mark and the double-wide SP, BS, and END options, the cursor movement code only comes with a reasonable two exceptions, which are easily handled. And while I didn't get this screen completely decompiled, one additional push was enough to cover all important code there.

The only potential glitch on this screen is a result of ZUN's continued use of binary-coded decimal digits without any bounds check or cap. Like the in-game HUD score display in TH04 and TH05, TH03's high score screen simply uses the next glyph in the character set for the most significant digit of any score above 1,000,000,000 points – in this case, the period. Still, it only really gets bad at 8,000,000,000 points: Once the glyphs are exhausted, the blitting function ends up accessing garbage data and filling the entire screen with garbage pixels. For comparison though, the current world record is 133,650,710 points, so good luck getting 8 billion in the first place.

Next up: Starting 2022 with the long-awaited decompilation of TH01's Sariel fight! Due to the 📝 recent price increase, we now got a window in the cap that is going to remain open until tomorrow, providing an early opportunity to set a new priority after Sariel is done.

📝 Posted:
🚚 Summary of:
P0165, P0166, P0167
Commits:
7a0e5d8...f2bca01, f2bca01...e697907, e697907...c2de6ab
💰 Funded by:
Ember2528
🏷 Tags:

OK, TH01 missile bullets. Can we maybe have a well-behaved entity type, without any weirdness? Just once?

Ehh, kinda. Apart from another 150 bytes wasted on unused structure members, this code is indeed more on the low end in terms of overall jank. It does become very obvious why dodging these missiles in the YuugenMagan, Mima, and Elis fights feels so awful though: An unfair 46×46 pixel hitbox around Reimu's center pixel, combined with the comeback of 📝 interlaced rendering, this time in every stage. ZUN probably did this because missiles are the only 16×16 sprite in TH01 that is blitted to unaligned X positions, which effectively ends up touching a 32×16 area of VRAM per sprite.
But even if we assume VRAM writes to be the bottleneck here, it would have been totally possible to render every missile in every frame at roughly the same amount of CPU time that the original game uses for interlaced rendering:

That's an optimization that would have significantly benefitted the game, in contrast to all of the fake ones introduced in later games. Then again, this optimization is actually something that the later games do, and it might have in fact been necessary to achieve their higher bullet counts without significant slowdown.

Unfortunately, it was only worth decompiling half of the missile code right now, thanks to gratuitous FPU usage in the other half, where 📝 double variables are compared to float literals. That one will have to wait 📝 until after SinGyoku.


After some effectively unused Mima sprite effect code that is so broken that it's impossible to make sense out of it, we get to the final feature I wanted to cover for all bosses in parallel before returning to Sariel: The separate sprite background storage for moving or animated boss sprites in the Mima, Elis, and Sariel fights. But, uh… why is this necessary to begin with? Doesn't TH01 already reserve the other VRAM page for backgrounds?
Well, these sprites are quite big, and ZUN didn't want to blit them from main memory on every frame. After all, TH01 and TH02 had a minimum required clock speed of 33 MHz, half of the speed required for the later three games. So, he simply blitted these boss sprites to both VRAM pages, leading the usual unblitting calls to only remove the other sprites on top of the boss. However, these bosses themselves want to move across the screen… and this makes it necessary to save the stage background behind them in some other way.

Enter .PTN, and its functions to capture a 16×16 or 32×32 square from VRAM into a sprite slot. No problem with that approach in theory, as the size of all these bigger sprites is a multiple of 32×32; splitting a larger sprite into these smaller 32×32 chunks makes the code look just a little bit clumsy (and, of course, slower).
But somewhere during the development of Mima's fight, ZUN apparently forgot that those sprite backgrounds existed. And once Mima's 🚫 casting sprite is blitted on top of her regular sprite, using just regular sprite transparency, she ends up with her infamous third arm:

TH01 Mima's third arm

Ironically, there's an unused code path in Mima's unblit function where ZUN assumes a height of 48 pixels for Mima's animation sprites rather than the actual 64. This leads to even clumsier .PTN function calls for the bottom 128×16 pixels… Failing to unblit the bottom 16 pixels would have also yielded that third arm, although it wouldn't have looked as natural. Still wouldn't say that it was intentional; maybe this casting sprite was just added pretty late in the game's development?


So, mission accomplished, Sariel unblocked… at 2¼ pushes. :thonk: That's quite some time left for some smaller stage initialization code, which bundles a bunch of random function calls in places where they logically really don't belong. The stage opening animation then adds a bunch of VRAM inter-page copies that are not only redundant but can't even be understood without knowing the hidden internal state of the last VRAM page accessed by previous ZUN code…
In better news though: Turbo C++ 4.0 really doesn't seem to have any complexity limit on inlining arithmetic expressions, as long as they only operate on compile-time constants. That's how we get macro-free, compile-time Shift-JIS to JIS X 0208 conversion of the individual code points in the 東方★靈異伝 string, in a compiler from 1994. As long as you don't store any intermediate results in variables, that is… :tannedcirno:

But wait, there's more! With still ¼ of a push left, I also went for the boss defeat animation, which includes the route selection after the SinGyoku fight.
As in all other instances, the 2× scaled font is accomplished by first rendering the text at regular 1× resolution to the other, invisible VRAM page, and then scaled from there to the visible one. However, the route selection is unique in that its scaled text is both drawn transparently on top of the stage background (not onto a black one), and can also change colors depending on the selection. It would have been no problem to unblit and reblit the text by rendering the 1× version to a position on the invisible VRAM page that isn't covered by the 2× version on the visible one, but ZUN (needlessly) clears the invisible page before rendering any text. :zunpet: Instead, he assigned a separate VRAM color for both the 魔界 and 地獄 options, and only changed the palette value for these colors to white or gray, depending on the correct selection. This is another one of the 📝 rare cases where TH01 demonstrates good use of PC-98 hardware, as the 魔界へ and 地獄へ strings don't need to be reblitted during the selection process, only the Orb "cursor" does.

Then, why does this still not count as good-code? When changing palette colors, you kinda need to be aware of everything else that can possibly be on screen, which colors are used there, and which aren't and can therefore be used for such an effect without affecting other sprites. In this case, well… hover over the image below, and notice how Reimu's hair and the bomb sprites in the HUD light up when Makai is selected:

Demonstration of palette changes in TH01's route selection

This push did end on a high note though, with the generic, non-SinGyoku version of the defeat animation being an easily parametrizable copy. And that's how you decompile another 2.58% of TH01 in just slightly over three pushes.


Now, we're not only ready to decompile Sariel, but also Kikuri, Elis, and SinGyoku without needing any more detours into non-boss code. Thanks to the current TH01 funding subscriptions, I can plan to cover most, if not all, of Sariel in a single push series, but the currently 3 pending pushes probably won't suffice for Sariel's 8.10% of all remaining code in TH01. We've got quite a lot of not specifically TH01-related funds in the backlog to pass the time though.

Due to recent developments, it actually makes quite a lot of sense to take a break from TH01: spaztron64 has managed what every Touhou download site so far has failed to do: Bundling all 5 game onto a single .HDI together with pre-configured PC-98 emulators and a nice boot menu, and hosting the resulting package on a proper website. While this first release is already quite good (and much better than my attempt from 2014), there is still a bit of room for improvement to be gained from specific ReC98 research. Next up, therefore:

📝 Posted:
🚚 Summary of:
P0162, P0163, P0164
Commits:
81dd96e...24b3a0d, 24b3a0d...6d572b3, 6d572b3...7a0e5d8
💰 Funded by:
Ember2528, Yanga
🏷 Tags:

No technical obstacles for once! Just pure overcomplicated ZUN code. Unlike 📝 Konngara's main function, the main TH01 player function was every bit as difficult to decompile as you would expect from its size.

With TH01 using both separate left- and right-facing sprites for all of Reimu's moves and separate classes for Reimu's 32×32 and 48×* sprites, we're already off to a bad start. Sure, sprite mirroring is minimally more involved on PC-98, as the planar nature of VRAM requires the bits within an 8-pixel byte to also be mirrored, in addition to writing the sprite bytes from right to left. TH03 uses a 256-byte lookup table for this, generated at runtime by an infamous micro-optimized and undecompilable ASM algorithm. With TH01's existing architecture, ZUN would have then needed to write 3 additional blitting functions. But instead, he chose to waste a total of 26,112 bytes of memory on pre-mirrored sprites… :godzun:

Alright, but surely selecting those sprites from code is no big deal? Just store the direction Reimu is facing in, and then add some branches to the rendering code. And there is in fact a variable for Reimu's direction… during regular arrow-key movement, and another one while shooting and sliding, and a third as part of the special attack types, launched out of a slide.
Well, OK, technically, the last two are the same variable. But that's even worse, because it means that ZUN stores two distinct enums at the same place in memory: Shooting and sliding uses 1 for left, 2 for right, and 3 for the "invalid" direction of holding both, while the special attack types indicate the direction in their lowest bit, with 0 for right and 1 for left. I decompiled the latter as bitflags, but in ZUN's code, each of the 8 permutations is handled as a distinct type, with copy-pasted and adapted code… :zunpet: The interpretation of this two-enum "sub-mode" union variable is controlled by yet another "mode" variable… and unsurprisingly, two of the bugs in this function relate to the sub-mode variable being interpreted incorrectly.

Also, "rendering code"? This one big function basically consists of separate unblit→update→render code snippets for every state and direction Reimu can be in (moving, shooting, swinging, sliding, special-attacking, and bombing), pasted together into a tangled mess of nested if(…) statements. While a lot of the code is copy-pasted, there are still a number of inconsistencies that defeat the point of my usual refactoring treatment. After all, with a total of 85 conditional branches, anything more than I did would have just obscured the control flow too badly, making it even harder to understand what's going on.
In the end, I spotted a total of 8 bugs in this function, all of which leave Reimu invisible for one or more frames:

Thanks to the last one, Reimu's first swing animation frame is never actually rendered. So whenever someone complains about TH01 sprite flickering on an emulator: That emulator is accurate, it's the game that's poorly written. :tannedcirno:

And guess what, this function doesn't even contain everything you'd associate with per-frame player behavior. While it does handle Yin-Yang Orb repulsion as part of slides and special attacks, it does not handle the actual player/Orb collision that results in lives being lost. The funny thing about this: These two things are done in the same function… :onricdennat:

Therefore, the life loss animation is also part of another function. This is where we find the final glitch in this 3-push series: Before the 16-frame shake, this function only unblits a 32×32 area around Reimu's center point, even though it's possible to lose a life during the non-deflecting part of a 48×48-pixel animation. In that case, the extra pixels will just stay on screen during the shake. They are unblitted afterwards though, which suggests that ZUN was at least somewhat aware of the issue?
Finally, the chance to see the alternate life loss sprite Alternate TH01 life loss sprite is exactly ⅛.


As for any new insights into game mechanics… you know what? I'm just not going to write anything, and leave you with this flowchart instead. Here's the definitive guide on how to control Reimu in TH01 we've been waiting for 24 years:

(SVG download)

Pellets are deflected during all gray states. Not shown is the obvious "double-tap Z and X" transition from all non-(#1) states to the Bomb state, but that would have made this diagram even more unwieldy than it turned out. And yes, you can shoot twice as fast while moving left or right.

While I'm at it, here are two more animations from MIKO.PTN which aren't referenced by any code:

An unused animation from TH01's MIKO.PTNAn unused animation from TH01's MIKO.PTN

With that monster of a function taken care of, we've only got boss sprite animation as the final blocker of uninterrupted Sariel progress. Due to some unfavorable code layout in the Mima segment though, I'll need to spend a bit more time with some of the features used there. Next up: The missile bullets used in the Mima and YuugenMagan fights.

📝 Posted:
🚚 Summary of:
P0160, P0161
Commits:
e491cd7...42ba4a5, 42ba4a5...81dd96e
💰 Funded by:
Yanga, [Anonymous]
🏷 Tags:

Nothing really noteworthy in TH01's stage timer code, just yet another HUD element that is needlessly drawn into VRAM. Sure, ZUN applies his custom boldfacing effect on top of the glyphs retrieved from font ROM, but he could have easily installed those modified glyphs as gaiji.
Well, OK, halfwidth gaiji aren't exactly well documented, and sometimes not even correctly emulated 📝 due to the same PC-98 hardware oddity I was researching last month. I've reserved two of the pending anonymous "anything" pushes for the conclusion of this research, just in case you were wondering why the outstanding workload is now lower after the two delivered here.

And since it doesn't seem to be clearly documented elsewhere: Every 2 ticks on the stage timer correspond to 4 frames.


So, TH01 rank pellet speed. The resident pellet speed value is a factor ranging from a minimum of -0.375 up to a maximum of 0.5 (pixels per frame), multiplied with the difficulty-adjusted base speed for each pellet and added on top of that same speed. This multiplier is modified

Apparently, ZUN noted that these deltas couldn't be losslessly stored in an IEEE 754 floating-point variable, and therefore didn't store the pellet speed factor exactly in a way that would correspond to its gameplay effect. Instead, it's stored similar to Q12.4 subpixels: as a simple integer, pre-multiplied by 40. This results in a raw range of -15 to 20, which is what the undecompiled ASM calls still use. When spawning a new pellet, its base speed is first multiplied by that factor, and then divided by 40 again. This is actually quite smart: The calculation doesn't need to be aware of either Q12.4 or the 40× format, as ((Q12.4 * factor×40) / factor×40) still comes out as a Q12.4 subpixel even if all numbers are integers. The only limiting issue here would be the potential overflow of the 16-bit multiplication at unadjusted base speeds of more than 50 pixels per frame, but that'd be seriously unplayable.
So yeah, pellet speed modifications are indeed gradual, and don't just fall into the coarse three "high, normal, and low" categories.


That's ⅝ of P0160 done, and the continue and pause menus would make good candidates to fill up the remaining ⅜… except that it seemed impossible to figure out the correct compiler options for this code?
The issues centered around the two effects of Turbo C++ 4.0J's -O switch:

  1. Optimizing jump instructions: merging duplicate successive jumps into a single one, and merging duplicated instructions at the end of conditional branches into a single place under a single branch, which the other branches then jump to
  2. Compressing ADD SP and POP CX stack-clearing instructions after multiple successive CALLs to __cdecl functions into a single ADD SP with the combined parameter stack size of all function calls

But how can the ASM for these functions exhibit #1 but not #2? How can it be seemingly optimized and unoptimized at the same time? The only option that gets somewhat close would be -O- -y, which emits line number information into the .OBJ files for debugging. This combination provides its own kind of #1, but these functions clearly need the real deal.

The research into this issue ended up consuming a full push on its own. In the end, this solution turned out to be completely unrelated to compiler options, and instead came from the effects of a compiler bug in a totally different place. Initializing a local structure instance or array like

const uint4_t flash_colors[3] = { 3, 4, 5 };

always emits the { 3, 4, 5 } array into the program's data segment, and then generates a call to the internal SCOPY@ function which copies this data array to the local variable on the stack. And as soon as this SCOPY@ call is emitted, the -O optimization #1 is disabled for the entire rest of the translation unit?!
So, any code segment with an SCOPY@ call followed by __cdecl functions must strictly be decompiled from top to bottom, mirroring the original layout of translation units. That means no TH01 continue and pause menus before we haven't decompiled the bomb animation, which contains such an SCOPY@ call. 😕
Luckily, TH01 is the only game where this bug leads to significant restrictions in decompilation order, as later games predominantly use the pascal calling convention, in which each function itself clears its stack as part of its RET instruction.


What now, then? With 51% of REIIDEN.EXE decompiled, we're slowly running out of small features that can be decompiled within ⅜ of a push. Good that I haven't been looking a lot into OP.EXE and FUUIN.EXE, which pretty much only got easy pieces of code left to do. Maybe I'll end up finishing their decompilations entirely within these smaller gaps?
I still ended up finding one more small piece in REIIDEN.EXE though: The particle system, seen in the Mima fight.

I like how everything about this animation is contained within a single function that is called once per frame, but ZUN could have really consolidated the spawning code for new particles a bit. In Mima's fight, particles are only spawned from the top and right edges of the screen, but the function in fact contains unused code for all other 7 possible directions, written in quite a bloated manner. This wouldn't feel quite as unused if ZUN had used an angle parameter instead… :thonk: Also, why unnecessarily waste another 40 bytes of the BSS segment?

But wait, what's going on with the very first spawned particle that just stops near the bottom edge of the screen in the video above? Well, even in such a simple and self-contained function, ZUN managed to include an off-by-one error. This one then results in an out-of-bounds array access on the 80th frame, where the code attempts to spawn a 41st particle. If the first particle was unlucky to be both slow enough and spawned away far enough from the bottom and right edges, the spawning code will then kill it off before its unblitting code gets to run, leaving its pixel on the screen until something else overlaps it and causes it to be unblitted.
Which, during regular gameplay, will quickly happen with the Orb, all the pellets flying around, and your own player movement. Also, the RNG can easily spawn this particle at a position and velocity that causes it to leave the screen more quickly. Kind of impressive how ZUN laid out the structure of arrays in a way that ensured practically no effect of this bug on the game; this glitch could have easily happened every 80 frames instead. He almost got close to all bugs canceling out each other here! :godzun:

Next up: The player control functions, including the second-biggest function in all of PC-98 Touhou.

📝 Posted:
🚚 Summary of:
P0149, P0150, P0151, P0152
Commits:
e1a26bb...05e4c4a, 05e4c4a...768251d, 768251d...4d24ca5, 4d24ca5...81fc861
💰 Funded by:
Blue Bolt, Ember2528, -Tom-, [Anonymous]
🏷 Tags:

…or maybe not that soon, as it would have only wasted time to untangle the bullet update commits from the rest of the progress. So, here's all the bullet spawning code in TH04 and TH05 instead. I hope you're ready for this, there's a lot to talk about!

(For the sake of readability, "bullets" in this blog post refers to the white 8×8 pellets and all 16×16 bullets loaded from MIKO16.BFT, nothing else.)


But first, what was going on 📝 in 2020? Spent 4 pushes on the basic types and constants back then, still ended up confusing a couple of things, and even getting some wrong. Like how TH05's "bullet slowdown" flag actually always prevents slowdown and fires bullets at a constant speed instead. :tannedcirno: Or how "random spread" is not the best term to describe that unused bullet group type in TH04.
Or that there are two distinct ways of clearing all bullets on screen, which deserve different names:

Mechanic #1: Clearing bullets for a custom amount of time, awarding 1000 points for all bullets alive on the first frame, and 100 points for all bullets spawned during the clear time.
Mechanic #2: Zapping bullets for a fixed 16 frames, awarding a semi-exponential and loudly announced Bonus!! for all bullets alive on the first frame, and preventing new bullets from being spawned during those 16 frames. In TH04 at least; thanks to a ZUN bug, zapping got reduced to 1 frame and no animation in TH05…

Bullets are zapped at the end of most midboss and boss phases, and cleared everywhere else – most notably, during bombs, when losing a life, or as rewards for extends or a maximized Dream bonus. The Bonus!! points awarded for zapping bullets are calculated iteratively, so it's not trivial to give an exact formula for these. For a small number 𝑛 of bullets, it would exactly be 5𝑛³ - 10𝑛² + 15𝑛 points – or, using uth05win's (correct) recursive definition, Bonus(𝑛) = Bonus(𝑛-1) + 15𝑛² - 5𝑛 + 10. However, one of the internal step variables is capped at a different number of points for each difficulty (and game), after which the points only increase linearly. Hence, "semi-exponential".


On to TH04's bullet spawn code then, because that one can at least be decompiled. And immediately, we have to deal with a pointless distinction between regular bullets, with either a decelerating or constant velocity, and special bullets, with preset velocity changes during their lifetime. That preset has to be set somewhere, so why have separate functions? In TH04, this separation continues even down to the lowest level of functions, where values are written into the global bullet array. TH05 merges those two functions into one, but then goes too far and uses self-modifying code to save a grand total of two local variables… Luckily, the rest of its actual code is identical to TH04.

Most of the complexity in bullet spawning comes from the (thankfully shared) helper function that calculates the velocities of the individual bullets within a group. Both games handle each group type via a large switch statement, which is where TH04 shows off another Turbo C++ 4.0 optimization: If the range of case values is too sparse to be meaningfully expressed in a jump table, it usually generates a linear search through a second value table. But with the -G command-line option, it instead generates branching code for a binary search through the set of cases. 𝑂(log 𝑛) as the worst case for a switch statement in a C++ compiler from 1994… that's so cool. But still, why are the values in TH04's group type enum all over the place to begin with? :onricdennat:
Unfortunately, this optimization is pretty rare in PC-98 Touhou. It only shows up here and in a few places in TH02, compared to at least 50 switch value tables.

In all of its micro-optimized pointlessness, TH05's undecompilable version at least fixes some of TH04's redundancy. While it's still not even optimal, it's at least a decently written piece of ASM… if you take the time to understand what's going on there, because it certainly took quite a bit of that to verify that all of the things which looked like bugs or quirks were in fact correct. And that's how the code for this function ended up with 35% comments and blank lines before I could confidently call it "reverse-engineered"…
Oh well, at least it finally fixes a correctness issue from TH01 and TH04, where an invalid bullet group type would fill all remaining slots in the bullet array with identical versions of the first bullet.

Something that both games also share in these functions is an over-reliance on globals for return values or other local state. The most ridiculous example here: Tuning the speed of a bullet based on rank actually mutates the global bullet template… which ZUN then works around by adding a wrapper function around both regular and special bullet spawning, which saves the base speed before executing that function, and restores it afterward. :zunpet: Add another set of wrappers to bypass that exact tuning, and you've expanded your nice 1-function interface to 4 functions. Oh, and did I mention that TH04 pointlessly duplicates the first set of wrapper functions for 3 of the 4 difficulties, which can't even be explained with "debugging reasons"? That's 10 functions then… and probably explains why I've procrastinated this feature for so long.

At this point, I also finally stopped decompiling ZUN's original ASM just for the sake of it. All these small TH05 functions would look horribly unidiomatic, are identical to their decompiled TH04 counterparts anyway, except for some unique constant… and, in the case of TH05's rank-based speed tuning function, actually become undecompilable as soon as we want to return a C++ class to preserve the semantic meaning of the return value. Mainly, this is because Turbo C++ does not allow register pseudo-variables like _AX or _AL to be cast into class types, even if their size matches. Decompiling that function would have therefore lowered the quality of the rest of the decompiled code, in exchange for the additional maintenance and compile-time cost of another translation unit. Not worth it – and for a TH05 port, you'd already have to decompile all the rest of the bullet spawning code anyway!


The only thing in there that was still somewhat worth being decompiled was the pre-spawn clipping and collision detection function. Due to what's probably a micro-optimization mistake, the TH05 version continues to spawn a bullet even if it was spawned on top of the player. This might sound like it has a different effect on gameplay… until you realize that the player got hit in this case and will either lose a life or deathbomb, both of which will cause all on-screen bullets to be cleared anyway. So it's at most a visual glitch.

But while we're at it, can we please stop talking about hitboxes? At least in the context of TH04 and TH05 bullets. The actual collision detection is described way better as a kill delta of 8×8 pixels between the center points of the player and a bullet. You can distribute these pixels to any combination of bullet and player "hitboxes" that make up 8×8. 4×4 around both the player and bullets? 1×1 for bullets, and 8×8 for the player? All equally valid… or perhaps none of them, once you keep in mind that other entity types might have different kill deltas. With that in mind, the concept of a "hitbox" turns into just a confusing abstraction.

The same is true for the 36×44 graze box delta. For some reason, this one is not exactly around the center of a bullet, but shifted to the right by 2 pixels. So, a bullet can be grazed up to 20 pixels right of the player, but only up to 16 pixels left of the player. uth05win also spotted this… and rotated the deltas clockwise by 90°?!


Which brings us to the bullet updates… for which I still had to research a decompilation workaround, because 📝 P0148 turned out to not help at all? Instead, the solution was to lie to the compiler about the true segment distance of the popup function and declare its signature far rather than near. This allowed ZUN to save that ridiculous overhead of 1 additional far function call/return per frame, and those precious 2 bytes in the BSS segment that he didn't have to spend on a segment value. 📝 Another function that didn't have just a single declaration in a common header file… really, 📝 how were these games even built???

The function itself is among the longer ones in both games. It especially stands out in the indentation department, with 7 levels at its most indented point – and that's the minimum of what's possible without goto. Only two more notable discoveries there:

  1. Bullets are the only entity affected by Slow Mode. If the number of bullets on screen is ≥ (24 + (difficulty * 8) + rank) in TH04, or (42 + (difficulty * 8)) in TH05, Slow Mode reduces the frame rate by 33%, by waiting for one additional VSync event every two frames.
    The code also reveals a second tier, with 50% slowdown for a slightly higher number of bullets, but that conditional branch can never be executed :zunpet:
  2. Bullets must have been grazed in a previous frame before they can be collided with. (Note how this does not apply to bullets that spawned on top of the player, as explained earlier!)

Whew… When did ReC98 turn into a full-on code review?! 😅 And after all this, we're still not done with TH04 and TH05 bullets, with all the special movement types still missing. That should be less than one push though, once we get to it. Next up: Back to TH01 and Konngara! Now have fun rewriting the Touhou Wiki Gameplay pages 😛

📝 Posted:
🚚 Summary of:
P0111, P0112
Commits:
8b5c146...4ef4c9e, 4ef4c9e...e447a2d
💰 Funded by:
[Anonymous], Blue Bolt
🏷 Tags:

Only one newly ordered push since I've reopened the store? Great, that's all the justification I needed for the extended maintenance delay that was part of these two pushes 😛

Having to write comments to explain whether coordinates are relative to the top-left corner of the screen or the top-left corner of the playfield has finally become old. So, I introduced distinct types for all the coordinate systems we typically encounter, applying them to all code decompiled so far. Note how the planar nature of PC-98 VRAM meant that X and Y coordinates also had to be different from each other. On the X side, there's mainly the distinction between the [0; 640] screen space and the corresponding [0; 80] VRAM byte space. On the Y side, we also have the [0; 400] screen space, but the visible area of VRAM might be limited to [0; 200] when running in the PC-98's line-doubled 640×200 mode. A VRAM Y coordinate also always implies an added offset for vertical scrolling.
During all of the code reconstruction, these types can only have a documenting purpose. Turning them into anything more than just typedefs to int, in order to define conversion operators between them, simply won't recompile into identical binaries. Modding and porting projects, however, now have a nice foundation for doing just that, and can entirely lift coordinate system transformations into the type system, without having to proofread all the meaningless int declarations themselves.


So, what was left in terms of memory references? EX-Alice's fire waves were our final unknown entity that can collide with the player. Decently implemented, with little to say about them.

That left the bomb animation structures as the one big remaining PI blocker. They started out nice and simple in TH04, with a small 6-byte star animation structure used for both Reimu and Marisa. TH05, however, gave each character her own animation… and what the hell is going on with Reimu's blue stars there? Nope, not going to figure this out on ASM level.

A decompilation first required some more bomb-related variables to be named though. Since this was part of a generic RE push, it made sense to do this in all 5 games… which then led to nice PI gains in anything but TH05. :tannedcirno: Most notably, we now got the "pulling all items to player" flag in TH04 and TH05, which is actually separate from bombing. The obvious cheat mod is left as an exercise to the reader.


So, TH05 bomb animations. Just like the 📝 custom entity types of this game, all 4 characters share the same memory, with the superficially same 10-byte structure.
But let's just look at the very first field. Seen from a low level, it's a simple struct { int x, y; } pos, storing the current position of the character-specific bomb animation entity. But all 4 characters use this field differently:

Therefore, I decompiled it as 4 separate structures once again, bundled into an union of arrays.

As for Reimu… yup, that's some pointer arithmetic straight out of Jigoku* for setting and updating the positions of the falling star trails. :zunpet: While that certainly required several comments to wrap my head around the current array positions, the one "bug" in all this arithmetic luckily has no effect on the game.
There is a small glitch with the growing circles, though. They are spawned at the end of the loop, with their position taken from the star pointer… but after that pointer has already been incremented. On the last loop iteration, this leads to an out-of-bounds structure access, with the position taken from some unknown EX-Alice data, which is 0 during most of the game. If you look at the animation, you can easily spot these bugged circles, consistently growing from the top-left corner (0, 0) of the playfield:


After all that, there was barely enough remaining time to filter out and label the final few memory references. But now, TH05's MAIN.EXE is technically position-independent! 🎉 -Tom- is going to work on a pretty extensive demo of this unprecedented level of efficient Touhou game modding. For a more impactful effect of both the 100% PI mark and that demo, I'll be delaying the push covering the remaining false positives in that binary until that demo is done. I've accumulated a pretty huge backlog of minor maintenance issues by now…
Next up though: The first part of the long-awaited build system improvements. I've finally come up with a way of sanely accelerating the 32-bit build part on most setups you could possibly want to build ReC98 on, without making the building experience worse for the other few setups.

📝 Posted:
🚚 Summary of:
P0109
Commits:
dcf4e2c...2c7d86b
💰 Funded by:
[Anonymous], Blue Bolt
🏷 Tags:

Back to TH05! Thanks to the good funding situation, I can strike a nice balance between getting TH05 position-independent as quickly as possible, and properly reverse-engineering some missing important parts of the game. Once 100% PI will get the attention of modders, the code will then be in better shape, and a bit more usable than if I just rushed that goal.

By now, I'm apparently also pretty spoiled by TH01's immediate decompilability, after having worked on that game for so long. Reverse-engineering in ASM land is pretty annoying, after all, since it basically boils down to meticulously editing a piece of ASM into something I can confidently call "reverse-engineered". Most of the time, simply decompiling that piece of code would take just a little bit longer, but be massively more useful. So, I immediately tried decompiling with TH05… and it just worked, at every place I tried!? Whatever the issue was that made 📝 segment splitting so annoying at my first attempt, I seem to have completely solved it in the meantime. 🤷 So yeah, backers can now request pretty much any part of TH04 and TH05 to be decompiled immediately, with no additional segment splitting cost.

(Protip for everyone interested in starting their own ReC project: Just declare one segment per function, right from the start, then group them together to restore the original code segmentation…)


Except that TH05 then just throws more of its infamous micro-optimized and undecompilable ASM at you. 🙄 This push covered the function that adjusts the bullet group template based on rank and the selected difficulty, called every time such a group is configured. Which, just like pretty much all of TH05's bullet spawning code, is one of those undecompilable functions. If C allowed labels of other functions as goto targets, it might have been decompilable into something useful to modders… maybe. But like this, there's no point in even trying.

This is such a terrible idea from a software architecture point of view, I can't even. Because now, you suddenly have to mirror your C++ declarations in ASM land, and keep them in sync with each other. I'm always happy when I get to delete an ASM declaration from the codebase once I've decompiled all the instances where it was referenced. But for TH05, we now have to keep those declarations around forever. 😕 And all that for a performance increase you probably couldn't even measure. Oh well, pulling off Galaxy Brain-level ASM optimizations is kind of fun if you don't have portability plans… I guess?

If I started a full fangame mod of a PC-98 Touhou game, I'd base it on TH04 rather than TH05, and backport selected features from TH05 as needed. Just because it was released later doesn't make it better, and this is by far not the only one of ZUN's micro-optimizations that just went way too far.

Dropping down to ASM also makes it easier to introduce weird quirks. Decompiled, one of TH05's tuning conditions for stack groups on Easy Mode would look something like:

case BP_STACK:
	// […]
	if(spread_angle_delta >= 2) {
		stack_bullet_count--;
	}

The fields of the bullet group template aren't typically reset when setting up a new group. So, spread_angle_delta in the context of a stack group effectively refers to "the delta angle of the last spread group that was fired before this stack – whenever that was". uth05win also spotted this quirk, considered it a bug, and wrote fanfiction by changing spread_angle_delta to stack_bullet_count.
As usual for functions that occur in more than one game, I also decompiled the TH04 bullet group tuning function, and it's perfectly sane, with no such quirks.


In the more PI-focused parts of this push, we got the TH05-exclusive smooth boss movement functions, for flying randomly or towards a given point. Pretty unspectacular for the most part, but we've got yet another uth05win inconsistency in the latter one. Once the Y coordinate gets close enough to the target point, it actually speeds up twice as much as the X coordinate would, whereas uth05win used the same speedup factors for both. This might make uth05win a couple of frames slower in all boss fights from Stage 3 on. Hard to measure though – and boss movement partly depends on RNG anyway.


Next up: Shinki's background animations – which are actually the single biggest source of position dependence left in TH05.

📝 Posted:
🚚 Summary of:
P0067, P0068, P0069
Commits:
e55a48b...ebb30ce, ebb30ce...2ac00d4, e0d0dcd...0f18dbc
💰 Funded by:
Splashman, Yanga, [Anonymous]
🏷 Tags:

Now that's more like the speed I was expecting! After a few more unused functions for palette fading and rectangle blitting, we've reached the big line drawing functions. And the biggest one among them, drawing a straight line at any angle between two points using Bresenham's algorithm, actually happens to be the single longest function present in more than one binary in all of PC-98 Touhou, and #23 on the list of individual longest functions.

And it technically has a ZUN bug! If you pass a point outside the (0, 0) - (639, 399) screen range, the function will calculate a new point at the edge of the screen, so that the resulting line will retain the angle intended by the points given. Except that it does so by calculating the line slope using an integer division rather than a floating-point one :zunpet: Doesn't seem like it actually causes any weirdly skewed lines to be drawn in-game, though; that case is only hit in the Mima boss fight, which draws a few lines with a bottom coordinate of 400 rather than the maximum of 399. It might also cause the wrong background pixels to be restored during parts of the YuugenMagan fight, leading to flickering sprites, but seriously, that's an issue everywhere you look in this game.

Together with the rendering-text-to-VRAM function we've mostly already known from TH02, this pushed the total RE percentage well over 20%, and almost doubled the TH01 RE percentage, all within three pushes. And comparatively, it went really smoothly, to the point (ha) where I even had enough time left to also include the single-point functions that come next in that code segment. Since about half of the remaining functions in OP.EXE are present in more than just itself, I'll be able to at least keep up this speed until OP.EXE hits the 70% RE mark. That is, as long as the backers' priorities continue to be generic RE or "giving some love to TH01"… we don't have a precedent for TH01's actual game code yet.

And that's all the TH01 progress funded for January! Next up, we actually do have a focus on TH03's game and scoring mechanics… or at least the foundation for that.