⮜ Blog

⮜ List of tags

Showing all posts tagged
,
and

📝 Posted:
🚚 Summary of:
P0240, P0241
Commits:
be69ab6...40c900f, 40c900f...08352a5
💰 Funded by:
JonathKane, Blue Bolt, [Anonymous]
🏷 Tags:

Well, well. My original plan was to ship the first step of Shuusou Gyoku OpenGL support on the next day after this delivery. But unfortunately, the complications just kept piling up, to a point where the required solutions definitely blow the current budget for that goal. I'm currently sitting on over 70 commits that would take at least 5 pushes to deliver as a meaningful release, and all of that is just rearchitecting work, preparing the game for a not too Windows-specific OpenGL backend in the first place. I haven't even written a single line of OpenGL yet… 🥲
This shifts the intended Big Release Month™ to June after all. Now I know that the next round of Shuusou Gyoku features should better start with the SC-88Pro recordings, which are much more likely to get done within their current budget. At least I've already completed the configuration versioning system required for that goal, which leaves only the actual audio part.

So, TH04 position independence. Thanks to a bit of funding for stage dialogue RE, non-ASCII translations will soon become viable, which finally presents a reason to push TH04 to 100% position independence after 📝 TH05 had been there for almost 3 years. I haven't heard back from Touhou Patch Center about how much they want to be involved in funding this goal, if at all, but maybe other backers are interested as well.
And sure, it would be entirely possible to implement non-ASCII translations in a way that retains the layout of the original binaries and can be easily compared at a binary level, in case we consider translations to be a critical piece of infrastructure. This wouldn't even just be an exercise in needless perfectionism, and we only have to look to Shuusou Gyoku to realize why: Players expected that my builds were compatible with existing SpoilerAL SSG files, which was something I hadn't even considered the need for. I mean, the game is open-source 📝 and I made it easy to build. You can just fork the code, implement all the practice features you want in a much more efficient way, and I'd probably even merge your code into my builds then?
But I get it – recompiling the game yields just yet another build that can't be easily compared to the original release. A cheat table is much more trustworthy in giving players the confidence that they're still practicing the same original game. And given the current priorities of my backers, it'll still take a while for me to implement proof by replay validation, which will ultimately free every part of the community from depending on the original builds of both Seihou and PC-98 Touhou.

However, such an implementation within the original binary layout would significantly drive up the budget of non-ASCII translations, and I sure don't want to constantly maintain this layout during development. So, let's chase TH04 position independence like it's 2020, and quickly cover a larger amount of PI-relevant structures and functions at a shallow level. The only parts I decompiled for now contain calculations whose intent can't be clearly communicated in ASM. Hitbox visualizations or other more in-depth research would have to wait until I get to the proper decompilation of these features.
But even this shallow work left us with a large amount of TH04-exclusive code that had its worst parts RE'd and could be decompiled fairly quickly. If you want to see big TH04 finalization% gains, general TH04 progress would be a very good investment.


The first push went to the often-mentioned stage-specific custom entities that share a single statically allocated buffer. Back in 2020, I 📝 wrongly claimed that these were a TH05 innovation, but the system actually originated in TH04. Both games use a 26-byte structure, but TH04 only allocates a 32-element array rather than TH05's 64-element one. The conclusions from back then still apply, but I also kept wondering why these games used a static array for these entities to begin with. You know what they call an area of memory that you can cleanly repurpose for things? That's right, a heap! :tannedcirno: And absolutely no one would mind one additional heap allocation at the start of a stage, next to the ones for all the sprites and portraits.
However, we are still running in Real Mode with segmented memory. Accessing anything outside a common data segment involves modifying segment registers, which has a nonzero CPU cycle cost, and Turbo C++ 4.0J is terrible at optimizing away the respective instructions. Does this matter? Probably not, but you don't take "risks" like these if you're in a permanent micro-optimization mindset… :godzun:

In TH04, this system is used for:

  1. Kurumi's symmetric bullet spawn rays, fired from her hands towards the left and right edges of the playfield. These are rather infamous for being the last thing you see before 📝 the Divide Error crash that can happen in ZUN's original build. Capped to 6 entities.

  2. The 4 📝 bits used in Marisa's Stage 4 boss fight. Coincidentally also related to the rare Divide Error crash in that fight.

  3. Stage 4 Reimu's spinning orbs. Note how the game uses two different sets of sprites just to have two different outline colors. This was probably better than messing with the palette, which can easily cause unintended effects if you only have 16 colors to work with. Heck, I have an entire blog post tag just to highlight these cases. Capped to the full 32 entities.

  4. The chasing cross bullets, seen in Phase 14 of the same Stage 6 Yuuka fight. Featuring some smart sprite work, making use of point symmetry to achieve a fluid animation in just 4 frames. This is good-code in sprite form. Capped to 31 entities, because the 32nd custom entity during this fight is defined to be…

  5. The single purple pulsating and shrinking safety circle, seen in Phase 4 of the same fight. The most interesting aspect here is actually still related to the cross bullets, whose spawn function is wrongly limited to 32 entities and could theoretically overwrite this circle. :zunpet: This is strictly landmine territory though:

    • Yuuka never uses these bullets and the safety circle simultaneously
    • She never spawns more than 24 cross bullets
    • All cross bullets are fast enough to have left the screen by the time Yuuka restarts the corresponding subpattern
    • The cross bullets spawn at Yuuka's center position, and assign its Q12.4 coordinates to structure fields that the safety circle interprets as raw pixels. The game does try to render the circle afterward, but since Yuuka's static position during this phase is nowhere near a valid pixel coordinate, it is immediately clipped.

  6. The flashing lines seen in Phase 5 of the Gengetsu fight, telegraphing the slightly random bullet columns.

    The spawn column lines in the TH05 Gengetsu fight, in the first of their two flashing colors.The spawn column lines in the TH05 Gengetsu fight, in the second of their two flashing colors.

These structures only took 1 push to reverse-engineer rather than the 2 I needed for their TH05 counterparts because they are much simpler in this game. The "structure" for Gengetsu's lines literally uses just a single X position, with the remaining 24 bytes being basically padding. The only minor bug I found on this shallow level concerns Marisa's bits, which are clipped at the right and bottom edges of the playfield 16 pixels earlier than you would expect:


The remaining push went to a bunch of smaller structures and functions:


To top off the second push, we've got the vertically scrolling checkerboard background during the Stage 6 Yuuka fight, made up of 32×32 squares. This one deserves a special highlight just because of its needless complexity. You'd think that even a performant implementation would be pretty simple:

  1. Set the GRCG to TDW mode
  2. Set the GRCG tile to one of the two square colors
  3. Start with Y as the current scroll offset, and X as some indicator of which color is currently shown at the start of each row of squares
  4. Iterate over all lines of the playfield, filling in all pixels that should be displayed in the current color, skipping over the other ones
  5. Count down Y for each line drawn
  6. If Y reaches 0, reset it to 32 and flip X
  7. At the bottom of the playfield, change the GRCG tile to the other color, and repeat with the initial value of X flipped

The most important aspect of this algorithm is how it reduces GRCG state changes to a minimum, avoiding the costly port I/O that we've identified time and time again as one of the main bottlenecks in TH01. With just 2 state variables and 3 loops, the resulting code isn't that complex either. A naive implementation that just drew the squares from top to bottom in a single pass would barely be simpler, but much slower: By changing the GRCG tile on every color, such an implementation would burn a low 5-digit number of CPU cycles per frame for the 12×11.5-square checkerboard used in the game.
And indeed, ZUN retained all important aspects of this algorithm… but still implemented it all in ASM, with a ridiculous layer of x86 segment arithmetic on top? :zunpet: Which blows up the complexity to 4 state variables, 5 nested loops, and a bunch of constants in unusual units. I'm not sure what this code is supposed to optimize for, especially with that rather questionable register allocation that nevertheless leaves one of the general-purpose registers unused. :onricdennat: Fortunately, the function was still decompilable without too many code generation hacks, and retains the 5 nested loops in all their goto-connected glory. If you want to add a checkerboard to your next PC-98 demo, just stick to the algorithm I gave above.
(Using a single XOR for flipping the starting X offset between 32 and 64 pixels is pretty nice though, I have to give him that.)


This makes for a good occasion to talk about the third and final GRCG mode, completing the series I started with my previous coverage of the 📝 RMW and 📝 TCR modes. The TDW (Tile Data Write) mode is the simplest of the three and just writes the 8×1 GRCG tile into VRAM as-is, without applying any alpha bitmask. This makes it perfect for clearing rectangular areas of pixels – or even all of VRAM by doing a single memset():

// Set up the GRCG in TDW mode.
outportb(0x7C, 0x80);

// Fill the tile register with color #7 (0111 in binary).
outportb(0x7E, 0xFF); // Plane 0: (B): (********)
outportb(0x7E, 0xFF); // Plane 1: (R): (********)
outportb(0x7E, 0xFF); // Plane 2: (G): (********)
outportb(0x7E, 0x00); // Plane 3: (E): (        )

// Set the 32 pixels at the top-left corner of VRAM to the exact contents of
// the tile register, effectively repeating the tile 4 times. In TDW mode, the
// GRCG ignores the CPU-supplied operand, so we might as well just pass the
// contents of a register with the intended width. This eliminates useless load
// instructions in the compiled assembly, and even sort of signals to readers
// of this code that we do not care about the source value.
*reinterpret_cast<uint32_t far *>(MK_FP(0xA800, 0)) = _EAX;

// Fill the entirety of VRAM with the GRCG tile. A simple C one-liner that will
// probably compile into a single `REP STOS` instruction. Unfortunately, Turbo
// C++ 4.0J only ever generates the 16-bit `REP STOSW` here, even when using
// the `__memset__` intrinsic and when compiling in 386 mode. When targeting
// that CPU and above, you'd ideally want `REP STOSD` for twice the speed.
memset(MK_FP(0xA800, 0), _AL, ((640 / 8) * 400));

However, this might make you wonder why TDW mode is even necessary. If it's functionally equivalent to RMW mode with a CPU-supplied bitmask made up entirely of 1 bits (i.e., 0xFF, 0xFFFF, or 0xFFFFFFFF), what's the point? The difference lies in the hardware implementation: If all you need to do is write tile data to VRAM, you don't need the read and modify parts of RMW mode which require additional processing time. The PC-9801 Programmers' Bible claims a speedup of almost 2× when using TDW mode over equivalent operations in RMW mode.
And that's the only performance claim I found, because none of these old PC-98 hardware and programming books did any benchmarks. Then again, it's not too interesting of a question to benchmark either, as the byte-aligned nature of TDW blitting severely limits its use in a game engine anyway. Sure, maybe it makes sense to temporarily switch from RMW to TDW mode if you've identified a large rectangular and byte-aligned section within a sprite that could be blitted without a bitmask? But the necessary identification work likely nullifies the performance gained from TDW mode, I'd say. In any case, that's pretty deep micro-optimization territory. Just use TDW mode for the few cases it's good at, and stick to RMW mode for the rest.

So is this all that can be said about the GRCG? Not quite, because there are 4 bits I haven't talked about yet…


And now we're just 5.37% away from 100% position independence for TH04! From this point, another 2 pushes should be enough to reach this goal. It might not look like we're that close based on the current estimate, but a big chunk of the remaining numbers are false positives from the player shot control functions. Since we've got a very special deadline to hit, I'm going to cobble these two pushes together from the two current general subscriptions and the rest of the backlog. But you can, of course, still invest in this goal to allow the existing contributions to go to something else.
… Well, if the store was actually open. :thonk: So I'd better continue with a quick task to free up some capacity sooner rather than later. Next up, therefore: Back to TH02, and its item and player systems. Shouldn't take that long, I'm not expecting any surprises there. (Yeah, I know, famous last words…)

📝 Posted:
🚚 Summary of:
P0238, P0239
Commits:
(Website) 4698397...edf2926, c5e51e6...P0239
💰 Funded by:
Ember2528
🏷 Tags:

:stripe: Stripe is now properly integrated into this website as an alternative to PayPal! Now, you can also financially support the project if PayPal doesn't work for you, or if you prefer using a provider out of Stripe's greater variety. It's unfortunate that I had to ship this integration while the store is still sold out, but the Shuusou Gyoku OpenGL backend has turned out way too complicated to be finished next to these two pushes within a month. It will take quite a while until the store reopens and you all can start using Stripe, so I'll just link back to this blog post when it happens.

Integrating Stripe wasn't the simplest task in the world either. At first, the Checkout API seems pretty friendly to developers: The entire payment flow is handled on the backend, in the server language of your choice, and requires no frontend JavaScript except for the UI feedback code you choose to write. Your backend API endpoint initiates the Stripe Checkout session, answers with a redirect to Stripe, and Stripe then sends a redirect back to your server if the customer completed the payment. Superficially, this server-based approach seems much more GDPR-friendly than PayPal, because there are no remote scripts to obtain consent for. In reality though, Stripe shares much more potential personal data about your credit card or bank account with a merchant, compared to PayPal's almost bare minimum of necessary data. :thonk:
It's also rather annoying how the backend has to persist the order form information throughout the entire Checkout session, because it would otherwise be lost if the server restarts while a customer is still busy entering data into Stripe's Checkout form. Compare that to the PayPal JavaScript SDK, which only POSTs back to your server after the customer completed a payment. In Stripe's case, more JavaScript actually only makes the integration harder: If you trigger the initial payment HTTP request from JavaScript, you will have to improvise a bit to avoid the CORS error when redirecting away to a different domain.

But sure, it's all not too bad… for regular orders at least. With subscriptions, however, things get much worse. Unlike PayPal, Stripe kind of wants to stay out of the way of the payment process as much as possible, and just be a wrapper around its supported payment methods. So if customers aren't really meant to register with Stripe, how would they cancel their subscriptions? :thonk:
Answer: Through the… merchant? Which I quite dislike in principle, because why should you have to trust me to actually cancel your subscription after you requested it? It also means that I probably should add some sort of UI for self-canceling a Stripe subscription, ideally without adding full-blown user accounts. Not that this solves the underlying trust issue, but it's more convenient than contacting me via email or, worse, going through your bank somehow. Here is how my solution works:

I might have gone a bit overboard with the crypto there, but I liked the idea of not storing any of the Stripe session IDs in the server database. It's not like that makes the system more complex anyway, and it's nice to have a separate confirmation step before canceling a subscription.

But even that wasn't everything I had to keep in mind here. Once you switch from test to production mode for the final tests, you'll notice that certain SEPA-based payment providers take their sweet time to process and activate new subscriptions. The Checkout session object even informs you about that, by including a payment status field. Which initially seems just like another field that could indicate hacking attempts, but treating it as such and rejecting any unpaid session can also reject perfectly valid subscriptions. I don't want all this control… 🥲
Instead, all I can do in this case is to tell you about it. In my test, the Stripe dashboard said that it might take days or even weeks for the initial subscription transaction to be confirmed. In such a case, the respective fraction of the cap will unfortunately need to remain red for that entire time.

And that was 1½ pushes just to replicate the basic functionality of a simple PayPal integration with the simplest type of Stripe integration. On the architectural site, all the necessary refactoring work made me finally upgrade my frontend code to TypeScript at least, using the amazing esbuild to handle transpilation inside the server binary. Let's see how long it will now take for me to upgrade to SCSS…


With the new payment options, it makes sense to go for another slight price increase, from up to per push. The amount of taxes I have to pay on this income is slowly becoming significant, and the store has been selling out almost immediately for the last few months anyway. If demand remains at the current level or even increases, I plan to gradually go up to by the end of the year.
📝 As 📝 usual, I'm going to deliver existing orders in the backlog at the value they were originally purchased at. Due to the way the cap has to be calculated, these contributions now appear to have increased in value by a rather awkward 13.33%.


This left ½ of a push for some more work on the TH01 Anniversary Edition. Unfortunately, this was too little time for the grand issue of removing byte-aligned rendering of bigger sprites, which will need some additional blitting performance research. Instead, I went for a bunch of smaller bugfixes:

The final point, however, raised the question of what we're now going to do about 📝 a certain issue in the 地獄/Jigoku Bad Ending. ZUN's original expensive way of switching the accessed VRAM page was the main reason behind the lag frames on slower PC-98 systems, and search-replacing the respective function calls would immediately get us to the optimized version shown in that blog post. But is this something we actually want? If we wanted to retain the lag, we could surely preserve that function just for this one instance…
The discovery of this issue predates the clear distinction between bloat, quirks, and bugs, so it makes sense to first classify what this issue even is. The distinction comes all down to observability, which I defined as changes to rendered frames between explicitly defined frame boundaries. That alone would be enough to categorize any cause behind lag frames as bloat, but it can't hurt to be more explicit here.

Therefore, I now officially judge observability in terms of an infinitely fast PC-98 that can instantly render everything between two explicitly defined frames, and will never add additional lag frames. If we plan to port the games to faster architectures that aren't bottlenecked by disappointing blitter chips, this is the only reasonable assumption to make, in my opinion: The minimum system requirements in the games' README files are minimums, after all, not recommendations. Chasing the exact frame drop behavior that ZUN must have experienced during the time he developed these games can only be a guessing game at best, because how can we know which PC-98 model ZUN actually developed the games on? There might even be more than one model, especially when it comes to TH01 which had been in development for at least two years before ZUN first sold it. It's also not like any current PC-98 emulator even claims to emulate the specific timing of any existing model, and I sure hope that nobody expects me to import a bunch of bulky obsolete hardware just to count dropped frames.

That leaves the tearing, where it's much more obvious how it's a bug. On an infinitely fast PC-98, the ドカーン frame would never be visible, and thus falls into the same category as the 📝 two unused animations in the Sariel fight. With only a single unconditional 2-frame delay inside the animation loop, it becomes clear that ZUN intended both frames of the animation to be displayed for 2 frames each:

No tearing, and 34 frames in total for the first of the two instances of this animation.

:th01: TH01 Anniversary Edition, version P0239 2023-05-01-th01-anniv.zip

Next up: Taking the oldest still undelivered push and working towards TH04 position independence in preparation for multilingual translations. The Shuusou Gyoku OpenGL backend shouldn't take that much longer either, so I should have lots of stuff coming up in May afterward.

📝 Posted:
🚚 Summary of:
P0223, P0224, P0225
Commits:
139746c...371292d, 371292d...8118e61, 8118e61...4f85326
💰 Funded by:
rosenrose, Blue Bolt, Splashman, -Tom-, Yanga, Enderwolf, 32th System
🏷 Tags:

More than three months without any reverse-engineering progress! It's been way too long. Coincidentally, we're at least back with a surprising 1.25% of overall RE, achieved within just 3 pushes. The ending script system is not only more or less the same in TH04 and TH05, but actually originated in TH03, where it's also used for the cutscenes before stages 8 and 9. This means that it was one of the final pieces of code shared between three of the four remaining games, which I got to decompile at roughly 3× the usual speed, or ⅓ of the price.
The only other bargains of this nature remain in OP.EXE. The Music Room is largely equivalent in all three remaining games as well, and the sound device selection, ZUN Soft logo screens, and main/option menus are the same in TH04 and TH05. A lot of that code is in the "technically RE'd but not yet decompiled" ASM form though, so it would shift Finalized% more significantly than RE%. Therefore, make sure to order the new Finalization option rather than Reverse-engineering if you want to make number go up.

  1. General overview
  2. Game-specific differences
  3. Command reference
  4. Thoughts about translation support

So, cutscenes. On the surface, the .TXT files look simple enough: You directly write the text that should appear on the screen into the file without any special markup, and add commands to define visuals, music, and other effects at any place within the script. Let's start with the basics of how text is rendered, which are the same in all three games:


Superficially, the list of game-specific differences doesn't look too long, and can be summarized in a rather short table:

:th03: TH03 :th04: TH04 :th05: TH05
Script size limit 65536 bytes (heap-allocated) 8192 bytes (statically allocated)
Delay between every 2 bytes of text 1 frame by default, customizable via \v None
Text delay when holding ESC Varying speed-up factor None
Visibility of new text Immediately typed onto the screen Rendered onto invisible VRAM page, faded in on wait commands
Visibility of old text Unblitted when starting a new box Left on screen until crossfaded out with new text
Key binding for advancing the script Any key ⏎ Return, Shot, or ESC
Animation while waiting for an advance key None ⏎⃣, past right edge of current row
Inexplicable delays None 1 frame before changing pictures and after rendering new text boxes
Additional delay per interpreter loop 614.4 µs None 614.4 µs
The 614.4 µs correspond to the necessary delay for working around the repeated key up and key down events sent by PC-98 keyboards when holding down a key. While the absence of this delay significantly speeds up TH04's interpreter, it's also the reason why that game will stop recognizing a held ESC key after a few seconds, requiring you to press it again.

It's when you get into the implementation that the combined three systems reveal themselves as a giant mess, with more like 56 differences between the games. :zunpet: Every single new weird line of code opened up another can of worms, which ultimately made all of this end up with 24 pieces of bloat and 14 bugs. The worst of these should be quite interesting for the general PC-98 homebrew developers among my audience:


That brings us to the individual script commands… and yes, I'm going to document every single one of them. Some of their interactions and edge cases are not clear at all from just looking at the code.

Almost all commands are preceded by… well, a 0x5C lead byte. :thonk: Which raises the question of whether we should document it as an ASCII-encoded \ backslash, or a Shift-JIS-encoded ¥ yen sign. From a gaijin perspective, it seems obvious that it's a backslash, as it's consistently displayed as one in most of the editors you would actually use nowadays. But interestingly, iconv -f shift-jis -t utf-8 does convert any 0x5C lead bytes to actual ¥ U+00A5 YEN SIGN code points :tannedcirno:.
Ultimately, the distinction comes down to the font. There are fonts that still render 0x5C as ¥, but mainly do so out of an obvious concern about backward compatibility to JIS X 0201, where this mapping originated. Unsurprisingly, this group includes MS Gothic/Mincho, the old Japanese fonts from Windows 3.1, but even Meiryo and Yu Gothic/Mincho, Microsoft's modern Japanese fonts. Meanwhile, pretty much every other modern font, and freely licensed ones in particular, render this code point as \, even if you set your editor to Shift-JIS. And while ZUN most definitely saw it as a ¥, documenting this code point as \ is less ambiguous in the long run. It can only possibly correspond to one specific code point in either Shift-JIS or UTF-8, and will remain correct even if we later mod the cutscene system to support full-blown Unicode.

Now we've only got to clarify the parameter syntax, and then we can look at the big table of commands:

:th03: :th04: :th05: \@ Clears both VRAM pages by filling them with VRAM color 0.
🐞 In TH03 and TH04, this command does not update the internal text area background used for unblitting. This bug effectively restricts usage of this command to either the beginning of a script (before the first background image is shown) or its end (after no more new text boxes are started). See the image below for an example of using it anywhere else.
:th03: :th04: :th05: \b2 Sets the font weight to a value between 0 (raw font ROM glyphs) to 3 (very thicc). Specifying any other value has no effect.
:th04: :th05: 🐞 In TH04 and TH05, \b3 leads to glitched pixels when rendering half-width glyphs due to a bug in the newly micro-optimized ASM version of 📝 graph_putsa_fx(); see the image below for an example.
In these games, the parameter also directly corresponds to the graph_putsa_fx() effect function, removing the sanity check that was present in TH03. In exchange, you can also access the four dissolve masks for the bold font (\b2) by specifying a parameter between 4 (fewest pixels) to 7 (most pixels). Demo video below.
:th03: :th04: :th05: \c15 Changes the text color to VRAM color 15.
:th05: \c=,15 Adds a color map entry: If is the first code point inside the name area on a new line, the text color is automatically set to 15. Up to 8 such entries can be registered before overflowing the statically allocated buffer.
🐞 The comma is assumed to be present even if the color parameter is omitted.
:th03: :th04: :th05: \e0 Plays the sound effect with the given ID.
:th03: :th04: :th05: \f (no-op)
:th03: :th04: :th05: \fi1
\fo1
Calls master.lib's palette_black_in() or palette_black_out() to play a hardware palette fade animation from or to black, spending roughly 1 frame on each of the 16 fade steps.
:th03: :th04: :th05: \fm1 Fades out BGM volume via PMD's AH=02h interrupt call, in a non-blocking way. The fade speed can range from 1 (slowest) to 127 (fastest).
Values from 128 to 255 technically correspond to AH=02h's fade-in feature, which can't be used from cutscene scripts because it requires BGM volume to first be lowered via AH=19h, and there is no command to do that.
:th03: :th04: :th05: \g8 Plays a blocking 8-frame screen shake animation.
:th03: :th04: \ga0 Shows the gaiji with the given ID from 0 to 255 at the current cursor position. Even in TH03, gaiji always ignore the text delay interval configured with \v.
:th05: @3 TH05's replacement for the \ga command from TH03 and TH04. The default ID of 3 corresponds to the ♫ gaiji. Not to be confused with \@, which starts with a backslash, unlike this command.
:th05: @h Shows the 🎔 gaiji.
:th05: @t Shows the 💦 gaiji.
:th05: @! Shows the ! gaiji.
:th05: @? Shows the ? gaiji.
:th05: @!! Shows the ‼ gaiji.
:th05: @!? Shows the ⁉ gaiji.
:th03: :th04: :th05: \k0 Waits 0 frames (0 = forever) for an advance key to be pressed before continuing script execution. Before waiting, TH05 crossfades in any new text that was previously rendered to the invisible VRAM page…
🐞 …but TH04 doesn't, leaving the text invisible during the wait time. As a workaround, \vp1 can be used before \k to immediately display that text without a fade-in animation.
:th03: :th04: :th05: \m$ Stops the currently playing BGM.
:th03: :th04: :th05: \m* Restarts playback of the currently loaded BGM from the beginning.
:th03: :th04: :th05: \m,filename Stops the currently playing BGM, loads a new one from the given file, and starts playback.
:th03: :th04: :th05: \n Starts a new line at the leftmost X coordinate of the box, i.e., the start of the name area. This is how scripts can "change" the name of the currently speaking character, or use the entire 480×64 pixels without being restricted to the non-name area.
Note that automatic line breaks already move the cursor into a new line. Using this command at the "end" of a line with the maximum number of 30 full-width glyphs would therefore start a second new line and leave the previously started line empty.
If this command moved the cursor into the 5th line of a box, \s is executed afterward, with any of \n's parameters passed to \s.
:th03: :th04: :th05: \p (no-op)
:th03: :th04: :th05: \p- Deallocates the loaded .PI image.
:th03: :th04: :th05: \p,filename Loads the .PI image with the given file into the single .PI slot available to cutscenes. TH04 and TH05 automatically deallocate any previous image, 🐞 TH03 would leak memory without a manual prior call to \p-.
:th03: :th04: :th05: \pp Sets the hardware palette to the one of the loaded .PI image.
:th03: :th04: :th05: \p@ Sets the loaded .PI image as the full-screen 640×400 background image and overwrites both VRAM pages with its pixels, retaining the current hardware palette.
:th03: :th04: :th05: \p= Runs \pp followed by \p@.
:th03: :th04: :th05: \s0
\s-
Ends a text box and starts a new one. Fades in any text rendered to the invisible VRAM page, then waits 0 frames (0 = forever) for an advance key to be pressed. Afterward, the new text box is started with the cursor moved to the top-left corner of the name area.
\s- skips the wait time and starts the new box immediately.
:th03: :th04: :th05: \t100 Sets palette brightness via master.lib's palette_settone() to any value from 0 (fully black) to 200 (fully white). 100 corresponds to the palette's original colors. Preceded by a 1-frame delay unless ESC is held.
:th03: \v1 Sets the number of frames to wait between every 2 bytes of rendered text.
:th04: Sets the number of frames to spend on each of the 4 fade steps when crossfading between old and new text. The game-specific default value is also used before the first use of this command.
:th05: \v2
:th03: :th04: :th05: \vp0 Shows VRAM page 0. Completely useless in TH03 (this game always synchronizes both VRAM pages at a command boundary), only of dubious use in TH04 (for working around a bug in \k), and the games always return to their intended shown page before every blitting operation anyway. A debloated mod of this game would just remove this command, as it exposes an implementation detail that script authors should not need to worry about. None of the original scripts use it anyway.
:th03: :th04: :th05: \w64
  • \w and \wk wait for the given number of frames
  • \wm and \wmk wait until PMD has played back the current BGM for the total number of measures, including loops, given in the first parameter, and fall back on calling \w and \wk with the second parameter as the frame number if BGM is disabled.
    🐞 Neither PMD nor MMD reset the internal measure when stopping playback. If no BGM is playing and the previous BGM hasn't been played back for at least the given number of measures, this command will deadlock.
Since both TH04 and TH05 fade in any new text from the invisible VRAM page, these commands can be used to simulate TH03's typing effect in those games. Demo video below.
Contrary to \k and \s, specifying 0 frames would simply remove any frame delay instead of waiting forever.
The TH03-exclusive k variants allow the delay to be interrupted if ⏎ Return or Shot are held down. TH04 and TH05 recognize the k as well, but removed its functionality.
All of these commands have no effect if ESC is held.
\wm64,64
:th03: \wk64
\wmk64,64
:th03: :th04: :th05: \wi1
\wo1
Calls master.lib's palette_white_in() or palette_white_out() to play a hardware palette fade animation from or to white, spending roughly 1 frame on each of the 16 fade steps.
:th03: :th04: :th05: \=4 Immediately displays the given quarter of the loaded .PI image in the picture area, with no fade effect. Any value ≥ 4 resets the picture area to black.
:th03: :th04: :th05: \==4,1 Crossfades the picture area between its current content and quarter #4 of the loaded .PI image, spending 1 frame on each of the 4 fade steps unless ESC is held. Any value ≥ 4 is replaced with quarter #0.
:th03: :th04: :th05: \$ Stops script execution. Must be called at the end of each file; otherwise, execution continues into whatever lies after the script buffer in memory.
TH05 automatically deallocates the loaded .PI image, TH03 and TH04 require a separate manual call to \p- to not leak its memory.
Bold values signify the default if the parameter is omitted; \c is therefore equivalent to \c15.
Using the \@ command in the middle of a TH03 or TH04 cutscene script
The \@ bug. Yes, the ¥ is fake. It was easier to GIMP it than to reword the sentences so that the backslashes landed on the second byte of a 2-byte half-width character pair. :onricdennat:
Cutscene font weights in TH03Cutscene font weights in TH05, demonstrating the <code>\b3</code> bug that also affects TH04Cutscene font weights in TH03, rendered at a hypothetical unaligned X positionCutscene font weights in TH05, rendered at a hypothetical unaligned X position
The font weights and effects available through \b, including the glitch with \b3 in TH04 and TH05.
Font weight 3 is technically not rendered correctly in TH03 either; if you compare 1️⃣ with 4️⃣, you notice a single missing column of pixels at the left side of each glyph, which would extend into the previous VRAM byte. Ironically, the TH04/TH05 version is more correct in this regard: For half-width glyphs, it preserves any further pixel columns generated by the weight functions in the high byte of the 16-dot glyph variable. Unlike TH03, which still cuts them off when rendering text to unaligned X positions (3️⃣), TH04 and TH05 do bit-rotate them towards their correct place (4️⃣). It's only at byte-aligned X positions (2️⃣) where they remain at their internally calculated place, and appear on screen as these glitched pixel columns, 15 pixels away from the glyph they belong to. It's easy to blame bugs like these on micro-optimized ASM code, but in this instance, you really can't argue against it if the original C++ version was equally incorrect.
Combining \b and s- into a partial dissolve animation. The speed can be controlled with \v.
Simulating TH03's typing effect in TH04 and TH05 via \w. Even prettier in TH05 where we also get an additional fade animation after the box ends.

So yeah, that's the cutscene system. I'm dreading the moment I will have to deal with the other command interpreter in these games, i.e., the stage enemy system. Luckily, that one is completely disconnected from any other system, so I won't have to deal with it until we're close to finishing MAIN.EXE… that is, unless someone requests it before. And it won't involve text encodings or unblitting…


The cutscene system got me thinking in greater detail about how I would implement translations, being one of the main dependencies behind them. This goal has been on the order form for a while and could soon be implemented for these cutscenes, with 100% PI being right around the corner for the TH03 and TH04 cutscene executables.
Once we're there, the "Virgin" old-school way of static translation patching for Latin-script languages could be implemented fairly quickly:

  1. Establish basic UTF-8 parsing for less painful manual editing of the source files
  2. Procedurally generate glyphs for the few required additional letters based on existing font ROM glyphs. For example, we'd generate ä by painting two short lines on top of the font ROM's a glyph, or generate ¿ by vertically flipping the question mark. This way, the text retains a consistent look regardless of whether the translated game is run with an NEC or EPSON font ROM, or the hideous abomination that Neko Project II auto-generates if you don't provide either.
  3. (Optional) Change automatic line breaks to work on a per-word basis, rather than per-glyph

That's it – script editing and distribution would be handled by your local translation group. It might seem as if this would also work for Greek and Cyrillic scripts due to their presence in the PC-98 font ROM, but I'm not sure if I want to attempt procedurally shrinking these glyphs from 16×16 to 8×16… For any more thorough solution, we'd need to go for a more "Chad" kind of full-blown translation support:

  1. Implement text subdivisions at a sensible granularity while retaining automatic line and box breaks
  2. Compile translatable text into a Japanese→target language dictionary (I'm too old to develop any further translation systems that would overwrite modded source text with translations of the original text)
  3. Implement a custom Unicode font system (glyphs would be taken from GNU Unifont unless translators provide a different 8×16 font for their language)
  4. Combine the text compiler with the font compiler to only store needed glyphs as part of the translation's font file (dealing with a multi-MB font file would be rather ugly in a Real Mode game)
  5. Write a simple install/update/patch stacking tool that supports both .HDI and raw-file DOSBox-X scenarios (it's different enough from thcrap to warrant a separate tool – each patch stack would be statically compiled into a single package file in the game's directory)
  6. Add a nice language selection option to the main menu
  7. (Optional) Support proportional fonts

Which sounds more like a separate project to be commissioned from Touhou Patch Center's Open Collective funds, separate from the ReC98 cap. This way, we can make sure that the feature is completely implemented, and I can talk with every interested translator to make sure that their language works.
It's still cheaper overall to do this on PC-98 than to first port the games to a modern system and then translate them. On the other hand, most of the tasks in the Chad variant (3, 4, 5, and half of 2) purely deal with the difficulty of getting arbitrary Unicode characters to work natively in a PC-98 DOS game at all, and would be either unnecessary or trivial if we had already ported the game. Depending on where the patrons' interests lie, it may not be worth it. So let's see what all of you think about which way we should go, or whether it's worth doing at all. (Edit (2022-12-01): With Splashman's order towards the stage dialogue system, we've pretty much confirmed that it is.) Maybe we want to meet in the middle – using e.g. procedural glyph generation for dynamic translations to keep text rendering consistent with the rest of the PC-98 system, and just not support non-Latin-script languages in the beginning? In any case, I've added both options to the order form.
Edit (2023-07-28): Touhou Patch Center has agreed to fund a basic feature set somewhere between the Virgin and Chad level. Check the 📝 dedicated announcement blog post for more details and ideas, and to find out how you can support this goal!


Surprisingly, there was still a bit of RE work left in the third push after all of this, which I filled with some small rendering boilerplate. Since I also wanted to include TH02's playfield overlay functions, 1/15 of that last push went towards getting a TH02-exclusive function out of the way, which also ended up including that game in this delivery. :tannedcirno:
The other small function pointed out how TH05's Stage 5 midboss pops into the playfield quite suddenly, since its clipping test thinks it's only 32 pixels tall rather than 64:

Good chance that the pop-in might have been intended.
Edit (2023-06-30): Actually, it's a 📝 systematic consequence of ZUN having to work around the lack of clipping in master.lib's sprite functions.
There's even another quirk here: The white flash during its first frame is actually carried over from the previous midboss, which the game still considers as actively getting hit by the player shot that defeated it. It's the regular boilerplate code for rendering a midboss that resets the responsible damage variable, and that code doesn't run during the defeat explosion animation.

Next up: Staying with TH05 and looking at more of the pattern code of its boss fights. Given the remaining TH05 budget, it makes the most sense to continue in in-game order, with Sara and the Stage 2 midboss. If more money comes in towards this goal, I could alternatively go for the Mai & Yuki fight and immediately develop a pretty fix for the cheeto storage glitch. Also, there's a rather intricate pull request for direct ZMBV decoding on the website that I've still got to review…

📝 Posted:
🚚 Summary of:
P0105, P0106, P0107, P0108
Commits:
3622eb6...11b776b, 11b776b...1f1829d, 1f1829d...1650241, 1650241...dcf4e2c
💰 Funded by:
Yanga
🏷 Tags:

And indeed, I got to end my vacation with a lot of image format and blitting code, covering the final two formats, .GRC and .BOS. .GRC was nothing noteworthy – one function for loading, one function for byte-aligned blitting, and one function for freeing memory. That's it – not even a unblitting function for this one. .BOS, on the other hand…

…has no generic (read: single/sane) implementation, and is only implemented as methods of some boss entity class. And then again for Sariel's dress and wand animations, and then again for Reimu's animations, both of which weren't even part of these 4 pushes. Looking forward to decompiling essentially the same algorithms all over again… And that's how TH01 became the largest and most bloated PC-98 Touhou game. So yeah, still not done with image formats, even at 44% RE.

This means I also had to reverse-engineer that "boss entity" class… yeah, what else to call something a boss can have multiple of, that may or may not be part of a larger boss sprite, may or may not be animated, and that may or may not have an orb hitbox?
All bosses except for Kikuri share the same 5 global instances of this class. Since renaming all these variables in ASM land is tedious anyway, I went the extra mile and directly defined separate, meaningful names for the entities of all bosses. These also now document the natural order in which the bosses will ultimately be decompiled. So, unless a backer requests anything else, this order will be:

  1. Konngara
  2. Sariel
  3. Elis
  4. Kikuri
  5. SinGyoku
  6. (code for regular card-flipping stages)
  7. Mima
  8. YuugenMagan

As everyone kind of expects from TH01 by now, this class reveals yet another… um, unique and quirky piece of code architecture. In addition to the position and hitbox members you'd expect from a class like this, the game also stores the .BOS metadata – width, height, animation frame count, and 📝 bitplane pointer slot number – inside the same class. But if each of those still corresponds to one individual on-screen sprite, how can YuugenMagan have 5 eye sprites, or Kikuri have more than one soul and tear sprite? By duplicating that metadata, of course! And copying it from one entity to another :onricdennat:
At this point, I feel like I even have to congratulate the game for not actually loading YuugenMagan's eye sprites 5 times. But then again, 53,760 bytes of waste would have definitely been noticeable in the DOS days. Makes much more sense to waste that amount of space on an unused C++ exception handler, and a bunch of redundant, unoptimized blitting functions :tannedcirno:

(Thinking about it, YuugenMagan fits this entire system perfectly. And together with its position in the game's code – last to be decompiled means first on the linker command line – we might speculate that YuugenMagan was the first boss to be programmed for TH01?)

So if a boss wants to use sprites with different sizes, there's no way around using another entity. And that's why Girl-Elis and Bat-Elis are two distinct entities internally, and have to manually sync their position. Except that there's also a third one for Attacking-Girl-Elis, because Girl-Elis has 9 frames of animation in total, and the global .BOS bitplane pointers are divided into 4 slots of only 8 images each. :zunpet:
Same for SinGyoku, who is split into a sphere entity, a person entity, and a… white flash entity for all three forms, all at the same resolution. Or Konngara's facial expressions, which also require two entities just for themselves.


And once you decompile all this code, you notice just how much of it the game didn't even use. 13 of the 50 bytes of the boss entity class are outright unused, and 10 bytes are used for a movement clamping and lock system that would have been nice if ZUN also used it outside of Kikuri's soul sprites. Instead, all other bosses ignore this system completely, and just party on the X/Y coordinates of the boss entities directly.

As for the rendering functions, 5 out of 10 are unused. And while those definitely make up less than half of the code, I still must have spent at least 1 of those 4 pushes on effectively unused functionality.
Only one of these functions lends itself to some speculation. For Elis' entrance animation, the class provides functions for wavy blitting and unblitting, which use a separate X coordinate for every line of the sprite. But there's also an unused and sort of broken one for unblitting two overlapping wavy sprites, located at the same Y coordinate. This might indicate that Elis could originally split herself into two sprites, similar to TH04 Stage 6 Yuuka? Or it might just have been some other kind of animation effect, who knows.


After over 3 months of TH01 progress though, it's finally time to look at other games, to cover the rest of the crowdfunding backlog. Next up: Going back to TH05, and getting rid of those last PI false positives. And since I can potentially spend the next 7 weeks on almost full-time ReC98 work, I've also re-opened the store until October!

📝 Posted:
🚚 Summary of:
P0057, P0058
Commits:
1cb9731...ac7540d, ac7540d...fef0299
💰 Funded by:
[Anonymous], -Tom-
🏷 Tags:

So, here we have the first two pushes with an explicit focus on position independence… and they start out looking barely different from regular reverse-engineering? They even already deduplicate a bunch of item-related code, which was simple enough that it required little additional work? Because the actual work, once again, was in comparing uth05win's interpretations and naming choices with the original PC-98 code? So that we only ended up removing a handful of memory references there?

(Oh well, you can mod item drops now!)

So, continuing to interpret PI as a mere by-product of reverse-engineering might ultimately drive up the total PI cost quite a bit. But alright then, let's systematically clear out some false positives by looking at master.lib function calls instead… and suddenly we get the PI progress we were looking for, nicely spread out over all games since TH02. That kinda makes it sound like useless work, only done because it's dictated by some counting algorithm on a website. But decompilation will want to convert all of these values to decimal anyway. We're merely doing that right now, across all games.

Then again, it doesn't actually make any game more position-independent, and only proves how position-independent it already was. So I'm really wondering right now whether I should just rush actual position independence by simply identifying structures and their sizes, and not bother with members or false positives until that's done. That would certainly get the job done for TH04 and TH05 in just a few more pushes, but then leave all the proving work (and the road to 100% PI on the front page) to reverse-engineering.

I don't know. Would it be worth it to have a game that's "maybe fully position-independent", only for there to maybe be rare edge cases where it isn't?

Or maybe, continuing to strike a balance between identifying false positives (fast) and reverse-engineering structures (slow) will continue to work out like it did now, and make us end up close to the current estimate, which was attractive enough to sell out the crowdfunding for the first time… 🤔

Please give feedback! If possible, by Friday evening UTC+1, before I start working on the next PI push, this time with a focus on TH04.