⮜ Blog

⮜ List of tags

Showing all posts tagged
,
and

📝 Posted:
🚚 Summary of:
P0258, P0259, P0260, P0261
Commits:
5876755...e8a0b3e, e8a0b3e...dfaa3c6, dfaa3c6...ed9ee93, ed9ee93...ae2fc28
💰 Funded by:
Blue Bolt, [Anonymous], Yanga, Splashman
🏷 Tags:

And we're back to PC-98 Touhou for a brief interruption of the ongoing Shuusou Gyoku Linux port. Let's clear some of the Touhou-related progress from the backlog, and use the unconstrained nature of these contributions to prepare the 📝 upcoming non-ASCII translations commissioned by Touhou Patch Center. The current budget won't cover all of my ambitions, but it would at least be nice if all text in these games was feasibly translatable by the time I officially start working on that project.

At a little over 3 pushes, it might be surprising to see that this took longer than the 📝 TH03/TH04/TH05 cutscene system. It's obvious that TH02 started out with a different system for in-game dialog, but while TH04 and TH05 look identical on the surface, they only actually share 30% of their dialog code. So this felt more like decompiling 2.4 distinct systems, as opposed to one identical base with tons of game-specific differences on top.

The table of contents was pretty popular last time around, so let's have another one:

  1. Overview of TH04's dialog system
  2. Changes introduced in TH05
  3. Command reference for the TH04 and TH05 systems
  4. Overview of TH02's dialog system
  5. TH02's face portrait images
  6. Bugs during TH02's dialog box slide-in animation
  7. Bugs and quirks in Mima's defeat dialog (might be lore-relevant)
  8. TH03 win messages

Let's start with the ones from TH04 and TH05, since they are not that broken. For TH04, ZUN started out by copy-pasting the cutscene system, causing the result to inherit many of the caveats I already described in the cutscene blog post:

Then, however, he greatly simplified the system. Mainly, this was done by moving text rendering from the PC-98 graphics chip to the text chip, which avoids the need for any text-related unblitting code, but ZUN also added a bunch of smaller changes:

While it would seem that TH05 has no issues with ASCII 0x20 spaces, the text as a whole is still blindly processed two bytes at a time, and any commands can only appear at even byte positions within a line. I dimmed the VRAM pixels to 25% of their original brightness to make the text easier to read.
The same text backported to TH04, additionally demonstrating how that game's dialog system inherited the whitespace skipping behavior of TH03's cutscene system. Just like there, ASCII 0x20 spaces only work at odd byte positions because the game treats them as the trailing byte of a full-width Shift-JIS codepoint. I don't know how large the budget for the upcoming non-ASCII translations will be, but I'm going to fix this even in the very basic fully static variant. I dimmed the VRAM pixels to 25% of their original brightness to make the text easier to read.
Demonstrating the lack of automatic line or box breaks in TH05's dialog systemDemonstrating the lack of automatic line or box breaks in TH04's dialog system, in addition to its lack of support for ASCII 0x20 spaces carried over from TH03's cutscene system

TH05 then moved from TH04's plaintext scripts to the binary .TX2 format while removing all the unused commands copy-pasted from the cutscene system. Except for a single additional command intended to clear a text box, TH05's dialog system only supports a strict subset of the features of TH04's system.
This change also introduced the following differences compared to TH04:

Writing the 0x02 byte to text RAM results in an SX character, which is simply the PC-98 font ROM's glyph for that Shift-JIS codepoint.
Also note how each face change is now preceded by two frames of delay.
No problem in TH04. Note how the dialog also runs a bit faster – TH04 only adds the aforementioned one frame of delay to each face change, and has fewer two-byte chunks of text to display overall.

For modding these files, you probably want to use TXDEF from -Tom-'s MysticTK. It decodes these files into a text representation, and its encoder then takes care of the character-specific byte offsets in the 10-byte header. This text representation simplifies the format a lot by avoiding all corner cases and landmines you'd experience during hex-editing – most notably by interpreting the box-starting 0x0D as a command to show text that takes a string parameter, avoiding the broken calls to script commands in the middle of text. However, you'd still have to manually ensure an even number of bytes on every line of text.

In the entry function of TH05's dialog loop, we also encounter the hack that is responsible for properly handling 📝 ZUN's hidden Extra Stage replay. Since the dialog loop doesn't access the replay inputs but still requires key presses to advance through the boxes, ZUN chose to just skip the dialog altogether in the specific case of the Extra Stage replay being active, and replicated all sprite management commands from the dialog script by just hardcoding them.
And you know what? Not only do I not mind this hack, but I would have preferred it over the actual dialog system! The aforementioned sprite management commands effectively boil down to manual memory management, deallocating all stage enemy and midboss sprites and thus ensuring that the boss sprites end up at specific master.lib sprite IDs (patnums). The hardcoded boss rendering function then expects these sprites to be available at these exact IDs… which means that the otherwise hardcoded bosses can't render properly without the dialog script running before them. :zunpet:
There is absolutely no excuse for the game to burden dialog scripts with this functionality. Sure, delayed deallocation would allow them to blit stage-specific sprites, but the original games don't do that; probably because none of the two games feature an unblitting command. And even if they did, it would have still been cleaner to expose the boss-specific sprite setup as a single script command that can then also be called from game code if the script didn't do so. Commands like these just are a recipe for crashes, especially with parsers that expect fullwidth Shift-JIS text and where misaligned ASCII text can easily cause these commands to be skipped.

But then again, it does make for funny screenshot material if you accidentally the deallocation and then see bosses being turned into stage enemies:

TH04's dialog before the Stage 4 Marisa fight without deallocating the stage sprites inside the script, causing Marisa to be turned into one of the stage enemiesTH04's dialog before the Stage 6 Yuuka fight without deallocating the stage sprites inside the script, causing Yuuka to be turned into two different cels of the same stage enemyTH05's dialog before the Louise fight without deallocating the stage sprites inside the script, causing Louise to be turned into one of the ice enemies from TH05's Stage 2TH05's dialog before the Louise fight without deallocating the stage sprites inside the script, causing Mai and Yuki to be turned into a windmill and fairy/demon enemy, respectively
Some of the more amusing consequences of not calling the sprite-deallocating :th04: \c /  :th05: 0x04 command inside a dialog script.
In the case of 4️⃣, the game then even crashes on this frame at the end of the dialog, in a way that resembles the infamous 📝 TH04 crash before Stage 5 Yuuka if no EMS driver is loaded. Both the stage- and boss-specific BFNT sprites are loaded into memory at this point, leaving no room for the 256×256-pixel background image on the size-limited master.lib heap.

With all the general details out of the way, here's the command reference:

:th04: :th05:
0
1
0x00
0x01
Selects either the player character (0) or the boss (1) as the currently speaking character, and moves the cursor to the beginning of the text box. In TH04, this command also directly starts the new dialog box, which is probably why it's not prefixed with a \ as it only makes sense outside of text. TH05 requires a separate 0x0D command to do the same.
\=1 0x02 0x!! Replaces the face portrait of the currently active speaking character with image #1 within her .CD2 file.
\=255 0x02 0xFF Removes the face portrait from the currently active text box.
\l,filename 0x03 filename 0x00 Calls master.lib's super_entry_bfnt() function, which loads sprites from a BFNT file to consecutive IDs starting at the current patnum write cursor.
\c 0x04 Deallocates all stage-specific BFNT sprites (i.e., stage enemies and midbosses), freeing up conventional RAM for the boss sprites and ensuring that master.lib's patnum write cursor ends up at :th04: 128 / :th05: 180.
In TH05's Extra Stage, this command also replaces 📝 the sprites loaded from MIKO16.BFT with the ones from ST06_16.BFT.
\d Deallocates all face portrait images.
The game automatically does this at the end of each dialog sequence. However, ZUN wanted to load Stage 6 Yuuka's 76 KiB of additional animations inside the script via \l, and would have once again run up against the master.lib heap size limit without that extra free memory.
\m,filename 0x05 filename 0x00 Stops the currently playing BGM, loads a new one from the given file, and starts playback.
\m$ 0x05 $ 0x00 Stops the currently playing BGM.
Note that TH05 interprets $ as a null-terminated filename as well.
\m* Restarts playback of the currently loaded BGM from the beginning.
\b0,0,0 0x06 0x!!!! 0x!!!! 0x!! Blits the master.lib patnum with the ID indicated by the third parameter to the current VRAM page at the top-left screen position indicated by the first two parameters.
\e0 Plays the sound effect with the given ID.
\t100 Sets palette brightness via master.lib's palette_settone() to any value from 0 (fully black) to 200 (fully white). 100 corresponds to the palette's original colors.
\fo1
\fi1
Calls master.lib's palette_black_out() or palette_black_in() to play a hardware palette fade animation from or to black, spending roughly 1 frame on each of the 16 fade steps.
\wo1
\wi1
0x09 0x!!
0x0A 0x!!
Calls master.lib's palette_white_out() or palette_white_in() to play a hardware palette fade animation from or to white, spending roughly 1 frame on each of the 16 fade steps.
The TH05 version of 0x09 also clears the text in both boxes before the animation.
\n 0x0B Starts a new line by resetting the X coordinate of the TRAM cursor to the left edge of the text area and incrementing the Y coordinate.
The new line will always be the next one below the last one that was properly started, regardless of whether the text previously wrapped to the next TRAM row at the edge of the screen.
\g8 Plays a blocking 8-frame screen shake animation. Copy-pasted from the cutscene parser, but actually used right at the end of the dialog shown before TH04's Bad Ending.
\ga0 0x0C 0x!! Shows the gaiji with the given ID from 0 to 255 at the current cursor position, ignoring the per-glyph delay.
\k0 Waits 0 frames (0 = forever) for any key to be pressed before continuing script execution.
0x0D Starts a new dialog box with the previously selected speaker. All text until the next 0xFF command will appear on screen.
Inside dialogs, this is a no-op.
0x0E Takes the current dialog cursor as the top-left corner of a 240×48-pixel rectangle, and replaces all text RAM characters within that rectangle with whitespace.
This is only used to clear the player character's text box before Shinki's final いくよ‼ box. Shinki has two consecutive text boxes in all 4 scripts here, and ZUN probably wanted to clear the otherwise blue text to imply a dramatic pause before Shinki's final sentence. Nice touch.
(You could, however, also use it after a box-ending 0xFF command to mess with text RAM in general.)
\# Quits the currently running loop. This returns from either the text loop to the command loop, or it ends the dialog sequence by returning from the command loop back to gameplay. If this stage of the game later starts another dialog sequence, it will start at the next script byte.
\$ Like \#, but first waits for any key to be pressed.
0xFF Behaves like TH04's \$ in the text loop, and like \# in the command loop. Hence, it's not possible in TH05 to automatically end a text box and advance to the next one without waiting for a key press.
Unused commands are in gray.

At the end of the day, you might criticize the system for how its landmines make it annoying to mod in ASCII text, but it all works and does what it's supposed to. ZUN could have written the cleanest single and central Shift-JIS iterator that properly chunks a byte buffer into halfwidth and fullwidth codepoints, and I'd still be throwing it out for the upcoming non-ASCII translations in favor of something that either also supports UTF-8 or performs dictionary lookups with a full box of text.
The only actual bug can be found in the input detection, which once again doesn't correctly handle the infamous key up/key down scancode quirk of PC-98 keyboards. All it takes is one wrongly placed input polling call, and suddenly you have to think about how the update cycle behind the PC-98 keyboard state bytes might cause the game to run the regular 2-frame delay for a single 2-byte chunk of text before it shows the full text of a box after all… But even this bug is highly theoretical and could probably only be observed very, very rarely, and exclusively on real hardware.


The same can't be said about TH02 though, but more on that later. Let's first take a look at its data, which started out much simpler in that game. The STAGE?.TXT files contain just raw Shift-JIS text with no trace of commands or structure. Turning on the whitespace display feature in your editor reveals how the dialog system even assumes a fixed byte length for each box: 36 bytes per line which will appear on screen, followed by 4 bytes of padding, which the original files conveniently use to visually split the lines via a CR/LF newline sequence. Make sure to disable trimming of trailing whitespace in your editor to not ruin the file when modding the text… :onricdennat:

靈夢:あんた、まだ名前も聞いてないの··
······に覚えられないわよ。・・・・・··
里香:あたいは、里香よ。覚えときなさ··
・・・い。・・・・・・················
Two boxes from TH02's STAGE5.TXT with visualized whitespace. These also demonstrate how the CR/LF newlines only make up 2 of the 4 padding bytes, and require each line to be padded with two more bytes; you could not use these trailing spaces for actual text. Also note how the exquisite mixture of fullwidth and halfwidth spaces demands the text to be viewed with only the most metrically consistent monospace fonts to preserve the intended alignment. 🍷 It appears quite misaligned on my phone.

Consequently, everything else is hardcoded – every effect shown between text boxes, the face portrait shown for each box, and even how many boxes are part of each dialog sequence. Which means that the source code now contains a long hardcoded list of face IDs for most of the text boxes in the game, with the rest being part of the dedicated hardcoded dialog scripts for 2/3 of the game's stages.
Without the restriction to a fixed set of scripting commands, TH02 naturally gravitated to having the most varied dialog sequences of all PC-98 Touhou games. This flexibility certainly facilitated Mima's grand entrance animation in Stage 4, or the different lines in Stage 4 and 5 depending on whether you already used a continue or not. Marisa's post-boss dialog even inserts the number of continues into the text itself – by, you guessed it, writing to hardcoded byte offsets inside the dialog text before printing it to the screen. :godzun: But once again, I have nothing to criticize here – not even the fact that the alternate dialog scripts have to mutate the "box cursor" to jump to the intended boxes within the file. I know that some people in my audience like VMs, but I would have considered it more bloated if ZUN had implemented a full-blown scripting language just to handle all these special cases.


Another unique aspect of TH02 is the way it stores its face portraits, which are infamous for how hard they are to find in the original data files. These sprites are actually map tiles, stored in MIKO_K.MPN, and drawn using the same functions used to blit the regular map tiles to the 📝 tile source area in VRAM. We can only guess why ZUN chose this one out of the three graphics formats he used in TH02:

TH02's MIKO_K.PTN, arranged into a 16×16-tile layout that reveals how these tiles are combined into face portraits.
MPNDEF from -Tom-'s MysticTK conveniently uses this exact layout in its .BMP output. Earlier MPNDEF versions crashed when converting this file as its 256 tiles led to an 8-bit overflow bug, so make sure you've updated to the current version from the end of October 2023 if you want to convert this file yourself. The format stores the 4 bitplanes of each 16×16 tile in order, so good luck finding a different planar image viewer that would support both such a tiled layout and a custom palette. Sometimes, a weird internal format is the best type of obfuscation. :tannedcirno:
TH02's MIKO_K.PTN with the 16×16 tile grid overlaid

And since you're certainly wondering about all these black tiles at the edges: Yes, these are not only part of the file and pad it from the required 240×192 pixels to 256×256, but also kept in memory during a stage, wasting 9.5 KiB of conventional RAM. That's 172 seconds of potential input replay data, just for those people who might still think that we need EMS for replays.


Alright, we've got the text, we've got the faces, let's slide in the box and display it all on screen. Apparently though, we also have to blit the player and option sprites using raw, low-level master.lib function calls in the process? :thonk: This can't be right, especially because ZUN always blits the option sprite associated with the Reimu-A shot type, regardless of which one the player actually selected. And if you keep moving above the box area before the dialog starts, you get to see exactly how wrong this is:

Let's look closer at Reimu's sprite during the slide-in animation, and in the two frames before:

Zoomed-in area around Reimu's sprite from frame 35 of the video aboveZoomed-in area around Reimu's sprite from frame 36 of the video aboveZoomed-in area around Reimu's sprite from frame 37 of the video above

This one image shows off no less than 4 bugs:

  1. ZUN blits the stationary player sprite here, regardless of whether the player was previously moving left or right. This is a nice way of indicating that Reimu stops moving once the dialog starts, but maybe ZUN should have unblitted the old sprite so that the new one wouldn't have appeared on top. The game only unblits the 384×64 pixels covered by the dialog box on every frame of the slide-in animation, so Reimu would only appear correctly if her sprite happened to be entirely located within that area.
  2. All sprites are shifted up by 1 pixel in frame 2️⃣. This one is not a bug in the dialog system, but in the main game loop. The game runs the relevant actions in the following order:

    1. Invalidate any map tiles covered by entities
    2. Redraw invalidated tiles
    3. Decrement the Y coordinate at the top of VRAM according to the scroll speed
    4. Update and render all game entities
    5. Scroll in new tiles as necessary according to the scroll speed, and report whether the game has scrolled one pixel past the end of the map
    6. If that happened, pretend it didn't by incrementing the value calculated in #3 for all further frames and skipping to #8.
    7. Issue a GDC SCROLL command to reflect the line calculated in #3 on the display
    8. Wait for VSync
    9. Flip VRAM pages
    10. Start boss if we're past the end of the map

    The problem here: Once the dialog starts, the game has already rendered an entire new frame, with all sprites being offset by a new Y scroll offset, without adjusting the graphics GDC's scroll registers to compensate. Hence, the Y position in 3️⃣ is the correct one, and the whole existence of frame 2️⃣ is a bug in itself. (Well… OK, probably a quirk because speedrunning exists, and it would be pretty annoying to synchronize any video regression tests of the future TH02 Anniversary Edition if it renders one fewer frame in the middle of a stage.)

  3. ZUN blits the option sprites to their position from frame 1️⃣. This brings us back to 📝 TH02's special way of retaining the previous and current position in a two-element array, indexed with a VRAM page ID. Normally, this would be equivalent to using dedicated prev and cur structure fields and you'd just index it with the back page for every rendering call. But if you then decide to go single-buffered for dialogs and render them onto the front page instead… :zunpet:
    Note that fixing bug #2 would not cancel out this one – the sprites would then simply be rendered to their position in the frame before 1️⃣.

  4. And of course, the fixed option sprite ID also counts as a bug.

As for the boxes themselves, it's yet another loop that prints 2-byte chunks of Shift-JIS text at an even slower fixed interval of 3 frames. In an interesting quirk though, ZUN assumes that every box starts with the name of the speaking character in its first two fullwidth Shift-JIS characters, followed by a fullwidth colon. These 6 bytes are displayed immediately at the start of every box, without the usual delay. The resulting alignment looks rather janky with Genjii, whose single right-padded kanji looks quite awkward with the fullwidth space between the name and the colon. Kind of makes you wonder why ZUN just didn't spell out his proper name, 玄爺, instead, but I get the stylistic difference.
In Stage 4, the two-kanji assumption then breaks with Marisa's three-kanji name, which causes the full-width colon to be printed as the first delayed character in each of her boxes:


That's all the issues and quirks in the system itself. The scripts themselves don't leave much room for bugs as they basically just loop over the hardcoded face ID array at this level… until we reach the end of the game. Previously, the slide-in animation could simply use the tile invalidation and re-rendering system to unblit the box on each frame, which also explained why Reimu had to be separately rendered on top. But this no longer works with a custom-rendered boss background, and so the game just chooses to flood-fill the area with graphics chip color #0:

Then again, transferring pixels from the back page would be just as wrong as they lag one frame behind. No way around capturing these 384×64 pixels to main memory here… Oh well, this flood-fill at least adds even more legibility on top of the already half-transparent text box. A property that the following dialog sequence unfortunately lacks…

For Mima's final defeat dialog though, ZUN chose to not even show the box. He might have realized the issue by that point, or simply preferred the more dramatic effect this had on the lines. The resulting issues, however, might even have ramifications for such un-technical things as lore and character dynamics. :zunpet: As it turns out, the code for this dialog sequence does in fact render Mima's smiling face for all boxes?! You only don't see it in the original game because it's rendered to the other VRAM page that remains invisible during the dialog sequence:

Caution, flashing lights.

Here's how I interpret the situation:

So, the future TH02 Anniversary Edition will fix the bug by showing the back page, but retain the quirk by rewriting the dialog code to not blit the face.


And with that, we've secured all in-game dialog for the upcoming non-ASCII translations! The remaining 2/3 of the last push made for a good occasion to also decompile the small amount of code related to TH03's win messages, stored in the @0?TX.TXT files. Similar to TH02's dialog format, these files are also split into fixed-size blocks of 3×60 bytes. But this time, TH03 loads all 60 bytes of a line, including the CR/LF line breaking codepoints in the original files, into the statically allocated buffer that it renders from. These control characters are then only filtered to whitespace by ZUN's graph_putsa_fx() function. If you remove the line breaks, you get to use the full 60 bytes on every line.
The final commits went to the MIKO.CFG loading and saving functions used in TH04's and TH05's OP.EXE, as well as TH04's game startup code to finally catch up with 📝 TH05's counterpart from over 3 years ago. This brought us right in front of the main menu rendering code in both TH04 and TH05, which is identical in both games and will be tackled in the next PC-98 Touhou delivery.

Next up, though: Returning to Shuusou Gyoku, and adding support for SC-88Pro recordings as BGM. Which may or may not come with a slight controversy…

📝 Posted:
🚚 Summary of:
P0168, P0169
Commits:
c2de6ab...8b046da, 8b046da...479b766
💰 Funded by:
rosenrose, Blue Bolt
🏷 Tags:

EMS memory! The infamous stopgap measure between the 640 KiB ("ought to be enough for everyone") of conventional memory offered by DOS from the very beginning, and the later XMS standard for accessing all the rest of memory up to 4 GiB in the x86 Protected Mode. With an optionally active EMS driver, TH04 and TH05 will make use of EMS memory to preload a bunch of situational .CDG images at the beginning of MAIN.EXE:

  1. The "eye catch" game title image, shown while stages are loaded
  2. The character-specific background image, shown while bombing
  3. The player character dialog portraits
  4. TH05 additionally stores the boss portraits there, preloading them at the beginning of each stage. (TH04 instead keeps them in conventional memory during the entire stage.)

Once these images are needed, they can then be copied into conventional memory and accessed as usual.

Uh… wait, copied? It certainly would have been possible to map EMS memory to a regular 16-bit Real Mode segment for direct access, bank-switching out rarely used system or peripheral memory in exchange for the EMS data. However, master.lib doesn't expose this functionality, and only provides functions for copying data from EMS to regular memory and vice versa.
But even that still makes EMS an excellent fit for the large image files it's used for, as it's possible to directly copy their pixel data from EMS to VRAM. (Yes, I tried!) Well… would, because ZUN doesn't do that either, and always naively copies the images to newly allocated conventional memory first. In essence, this dumbs down EMS into just another layer of the memory hierarchy, inserted between conventional memory and disk: Not quite as slow as disk, but still requiring that memcpy() to retrieve the data. Most importantly though: Using EMS in this way does not increase the total amount of memory simultaneously accessible to the game. After all, some other data will have to be freed from conventional memory to make room for the newly loaded data.


The most idiomatic way to define the game-specific layout of the EMS area would be either a struct or an enum. Unfortunately, the total size of all these images exceeds the range of a 16-bit value, and Turbo C++ 4.0J supports neither 32-bit enums (which are silently degraded to 16-bit) nor 32-bit structs (which simply don't compile). That still leaves raw compile-time constants though, you only have to manually define the offset to each image in terms of the size of its predecessor. But instead of doing that, ZUN just placed each image at a nice round decimal offset, each slightly larger than the actual memory required by the previous image, just to make sure that everything fits. :tannedcirno: This results not only in quite a bit of unnecessary padding, but also in technically the single biggest amount of "wasted" memory in PC-98 Touhou: Out of the 180,000 (TH04) and 320,000 (TH05) EMS bytes requested, the game only uses 135,552 (TH04) and 175,904 (TH05) bytes. But hey, it's EMS, so who cares, right? Out of all the opportunities to take shortcuts during development, this is among the most acceptable ones. Any actual PC-98 model that could run these two games comes with plenty of memory for this to not turn into an actual issue.

On to the EMS-using functions themselves, which are the definition of "cross-cutting concerns". Most of these have a fallback path for the non-EMS case, and keep the loaded .CDG images in memory if they are immediately needed. Which totally makes sense, but also makes it difficult to find names that reflect all the global state changed by these functions. Every one of these is also just called from a single place, so inlining them would have saved me a lot of naming and documentation trouble there.
The TH04 version of the EMS allocation code was actually displayed on ZUN's monitor in the 2010 MAG・ネット documentary; WindowsTiger already transcribed the low-quality video image in 2019. By 2015 ReC98 standards, I would have just run with that, but the current project goal is to write better code than ZUN, so I didn't. 😛 We sure ain't going to use magic numbers for EMS offsets.

The dialog init and exit code then is completely different in both games, yet equally cross-cutting. TH05 goes even further in saving conventional memory, loading each individual player or boss portrait into a single .CDG slot immediately before blitting it to VRAM and freeing the pixel data again. People who play TH05 without an active EMS driver are surely going to enjoy the hard drive access lag between each portrait change… :godzun: TH04, on the other hand, also abuses the dialog exit function to preload the Mugetsu defeat / Gengetsu entrance and Gengetsu defeat portraits, using a static variable to track how often the function has been called during the Extra Stage… who needs function parameters anyway, right? :zunpet:

This is also the function in which TH04 infamously crashes after the Stage 5 pre-boss dialog when playing with Reimu and without any active EMS driver. That crash is what motivated this look into the games' EMS usage… but the code looks perfectly fine? Oh well, guess the crash is not related to EMS then. Next u–

OK, of course I can't leave it like that. Everyone is expecting a fix now, and I still got half of a push left over after decompiling the regular EMS code. Also, I've now RE'd every function that could possibly be involved in the crash, and this is very likely to be the last time I'll be looking at them.


Turns out that the bug has little to do with EMS, and everything to do with ZUN limiting the amount of conventional RAM that TH04's MAIN.EXE is allowed to use, and then slightly miscalculating this upper limit. Playing Stage 5 with Reimu is the most asset-intensive configuration in this game, due to the combination of

The star image used in TH04's Stage 5.
The star image used in TH04's Stage 5.

Remove any single one of the above points, and this crash would have never occurred. But with all of them combined, the total amount of memory consumed by TH04's MAIN.EXE just barely exceeds ZUN's limit of 320,000 bytes, by no more than 3,840 bytes, the size of the star image.

But wait: As we established earlier, EMS does nothing to reduce the amount of conventional memory used by the game. In fact, if you disabled TH04's EMS handling, you'd still get this crash even if you are running an EMS driver and loaded DOS into the High Memory Area to free up as much conventional RAM as possible. How can EMS then prevent this crash in the first place?

The answer: It's only because ZUN's usage of EMS bypasses the need to load the cached images back out of the XOR-encrypted 東方幻想.郷 packfile. Leaving aside the general stupidity of any game data file encryption*, master.lib's decryption implementation is also quite wasteful: It uses a separate buffer that receives fixed-size chunks of the file, before decrypting every individual byte and copying it to its intended destination buffer. That really resembles the typical slowness of a C fread() implementation more than it does the highly optimized ASM code that master.lib purports to be… And how large is this well-hidden decryption buffer? 4 KiB. :onricdennat:

So, looking back at the game, here is what happens once the Stage 5 pre-battle dialog ends:

  1. Reimu's bomb background image, which was previously freed to make space for her dialog portraits, has to be loaded back into conventional memory from disk
  2. BB0.CDG is found inside the 東方幻想.郷 packfile
  3. file_ropen() ends up allocating a 4 KiB buffer for the encrypted packfile data, getting us the decisive ~4 KiB closer to the memory limit
  4. The .CDG loader tries to allocate 52 608 contiguous bytes for the pixel data of Reimu's bomb image
  5. This would exceed the memory limit, so hmem_allocbyte() fails and returns a nullptr
  6. ZUN doesn't check for this case (as usual)
  7. The pixel data is loaded to address 0000:0000, overwriting the Interrupt Vector Table and whatever comes after
  8. The game crashes
The final frame rendered before the TH04 Stage 5 Reimu No-EMS crash
The final frame rendered by a crashing TH04.

The 4 KiB encryption buffer would only be freed by the corresponding file_close() call, which of course never happens because the game crashes before it gets there. At one point, I really did suspect the cause to be some kind of memory leak or fragmentation inside master.lib, which would have been quite delightful to fix.
Instead, the most straightforward fix here is to bump up that memory limit by at least 4 KiB. Certainly easier than squeezing in a cdg_free() call for the star image before the pre-boss dialog without breaking position dependence.

Or, even better, let's nuke all these memory limits from orbit because they make little sense to begin with, and fix every other potential out-of-memory crash that modders would encounter when adding enough data to any of the 4 games that impose such limits on themselves. Unless you want to launch other binaries (which need to do their own memory allocations) after launching the game, there's really no reason to restrict the amount of memory available to a DOS process. Heck, whenever DOS creates a new one, it assigns all remaining free memory by default anyway.
Removing the memory limits also removes one of ZUN's few error checks, which end up quitting the game if there isn't at least a given maximum amount of conventional RAM available. While it might be tempting to reserve enough memory at the beginning of execution and then never check any allocation for a potential failure, that's exactly where something like TH04's crash comes from.
This game is also still running on DOS, where such an initial allocation failure is very unlikely to happen – no one fills close to half of conventional RAM with TSRs and then tries running one of these games. It might have been useful to detect systems with less than 640 KiB of actual, physical RAM, but none of the PC-98 models with that little amount of memory are fast enough to run these games to begin with. How ironic… a place where ZUN actually added an error check, and then it's mostly pointless.

Here's an archive that contains both fix variants, just in case. These were compiled from the th04_noems_crash_fix and mem_assign_all branches, and contain as little code changes as possible.
Edit (2022-04-18): For TH04, you probably want to download the 📝 community choice fix package instead, which contains this fix along with other workarounds for the Divide error crashes. 2021-11-29-Memory-limit-fixes.zip

So yeah, quite a complex bug, leaving no time for the TH03 scorefile format research after all. Next up: Raising prices.

📝 Posted:
🚚 Summary of:
P0085
Commits:
110d6dd...54ee99b
💰 Funded by:
-Tom-
🏷 Tags:

Wait, PI for FUUIN.EXE is mainly blocked by the high score menu? That one should really be properly decompiled in a separate RE push, since it's also present in largely identical form in REIIDEN.EXE… but I currently lack the explicit funding to do that.

And as it turns out, I shouldn't really capture any of the existing generic RE contributions for it either. Back in 2018 when I ran the crowdfunding on the Touhou Patch Center Discord server, I said that generic RE contributions would never go towards TH01. No one was interested in that game back then, and as it's significantly different from all the other games, it made sense to only cover it if explicitly requested.
As Touhou Patch Center still remains one of the biggest supporters and advertisers for ReC98, someone recently believed that this rule was still in effect, despite not being mentioned anywhere on this website.

Fast forward to today, and TH01 has become the single most supported game lately, with plenty of incomplete pushes still open to be completed. Reverse-engineering it has proven to be quite efficient, yielding lots of completion percentage points per push. This, I suppose, is exactly what backers that don't give any specific priorities are mainly interested in. Therefore, I will allocate future partial contributions to TH01, whenever it makes sense.

So, instead of rushing TH01 PI, let's wait for Ember2528's April subscription, and get the 25% total RE milestone with some TH05 PI progress instead. This one primarily focused on the gather circles (spirals…?), the third-last missing entity type in TH05. These are rendered using the same 8×8 pellet sprite introduced in TH02… except that the actual pellets received a darkened bottom part in TH04 . Which, in turn, is actually rendered quite efficiently – the games first render the top white part of all pellets, followed by the bottom gray part of all pellets. The PC-98 GRCG is used throughout the process, doing its typical job of accelerating monochrome blitting, and by arranging the rendering like this, only two GRCG color changes are required to draw any number of pellets. I guess that makes it quite a worthwhile optimization? Don't ask me for specific performance numbers or even saved cycles, though :onricdennat:

Next up, one more TH05 PI push!

📝 Posted:
🚚 Summary of:
P0060
Commits:
29385dd...73f5ae7
💰 Funded by:
Touhou Patch Center
🏷 Tags:

So, where to start? Well, TH04 bullets are hard, so let's procrastinate start with TH03 instead :tannedcirno: The 📝 sprite display functions are the obvious blocker for any structure describing a sprite, and therefore most meaningful PI gains in that game… and I actually did manage to fit a decompilation of those three functions into exactly the amount of time that the Touhou Patch Center community votes alloted to TH03 reverse-engineering!

And a pretty amazing one at that. The original code was so obviously written in ASM and was just barely decompilable by exclusively using register pseudovariables and a bit of goto, but I was able to abstract most of that away, not least thanks to a few helpful optimization properties of Turbo C++… seriously, I can't stop marveling at this ancient compiler. The end result is both readable, clear, and dare I say portable?! To anyone interested in porting TH03, take a look. How painful would it be to port that away from 16-bit x86?

However, this push is also a typical example that the RE/PI priorities can only control what I look at, and the outcome can actually differ greatly. Even though the priorities were 65% RE and 35% PI, the progress outcome was +0.13% RE and +1.35% PI. But hey, we've got one more push with a focus on TH03 PI, so maybe that one will include more RE than PI, and then everything will end up just as ordered? :onricdennat: