⮜ Blog

⮜ List of tags

Showing all posts tagged blitting-

📝 Posted:
🚚 Summary of:
P0172, P0173
Commits:
49e6789...2d5491e, 2d5491e...27f901c
💰 Funded by:
Blue Bolt, [Anonymous]
🏷 Tags:
rec98+ th03+ file-format+ menu+ score+ blitting- glitch+

TH03 finally passed 20% RE, and the newly decompiled code contains no serious ZUN bugs! What a nice way to end the year.

There's only a single unlockable feature in TH03: Chiyuri and Yumemi as playable characters, unlocked after a 1CC on any difficulty. Just like the Extra Stages in TH04 and TH05, YUME.NEM contains a single designated variable for this unlocked feature, making it trivial to craft a fully unlocked score file without recording any high scores that others would have to compete against. So, we can now put together a complete set for all PC-98 Touhou games: 2021-12-27-Fully-unlocked-clean-score-files.zip It would have been cool to set the randomly generated encryption keys in these files to a fixed value so that they cancel out and end up not actually encrypting the file. Too bad that TH03 also started feeding each encrypted byte back into its stream cipher, which makes this impossible.

The main loading and saving code turned out to be the second-cleanest implementation of a score file format in PC-98 Touhou, just behind TH02. Only two of the YUME.NEM functions come with nonsensical differences between OP.EXE and MAINL.EXE, rather than 📝 all of them, as in TH01 or 📝 too many of them, as in TH04 and TH05. As for the rest of the per-difficulty structure though… well, it quickly becomes clear why this was the final score file format to be RE'd. The name, score, and stage fields are directly stored in terms of the internal REGI*.BFT sprite IDs used on the high score screen. TH03 also stores 10 score digits for each place rather than the 9 possible ones, keeps any leading 0 digits, and stores the letters of entered names in reverse order… yeah, let's decompile the high score screen as well, for a full understanding of why ZUN might have done all that. (Answer: For no reason at all. :zunpet:)


And wow, what a breath of fresh air. It's surely not good-code: The overlapping shadows resulting from using a 24-pixel letterspacing with 32-pixel glyphs in the name column led ZUN to do quite a lot of unnecessary and slightly confusing rendering work when moving the cursor back and forth, and he even forgot about the EGC there. But it's nowhere close to the level of jank we saw in 📝 TH01's high score menu last year. Good to see that ZUN had learned a thing or two by his third game – especially when it comes to storing the character map cursor in terms of a character ID, and improving the layout of the character map:

That's almost a nicely regular grid there. With the question mark and the double-wide SP, BS, and END options, the cursor movement code only comes with a reasonable two exceptions, which are easily handled. And while I didn't get this screen completely decompiled, one additional push was enough to cover all important code there.

The only potential glitch on this screen is a result of ZUN's continued use of binary-coded decimal digits without any bounds check or cap. Like the in-game HUD score display in TH04 and TH05, TH03's high score screen simply uses the next glyph in the character set for the most significant digit of any score above 1,000,000,000 points – in this case, the period. Still, it only really gets bad at 8,000,000,000 points: Once the glyphs are exhausted, the blitting function ends up accessing garbage data and filling the entire screen with garbage pixels. For comparison though, the current world record is 133,650,710 points, so good luck getting 8 billion in the first place.

Next up: Starting 2022 with the long-awaited decompilation of TH01's Sariel fight! Due to the 📝 recent price increase, we now got a window in the cap that is going to remain open until tomorrow, providing an early opportunity to set a new priority after Sariel is done.

📝 Posted:
🚚 Summary of:
P0168, P0169
Commits:
c2de6ab...8b046da, 8b046da...479b766
💰 Funded by:
rosenrose, Blue Bolt
🏷 Tags:
rec98+ th04+ th05+ boss+ yuuka-5+ blitting- bug+ master.lib+ waste+ mod+

EMS memory! The infamous stopgap measure between the 640 KiB ("ought to be enough for everyone") of conventional memory offered by DOS from the very beginning, and the later XMS standard for accessing all the rest of memory up to 4 GiB in the x86 Protected Mode. With an optionally active EMS driver, TH04 and TH05 will make use of EMS memory to preload a bunch of situational .CDG images at the beginning of MAIN.EXE:

  1. The "eye catch" game title image, shown while stages are loaded
  2. The character-specific background image, shown while bombing
  3. The player character dialog portraits
  4. TH05 additionally stores the boss portraits there, preloading them at the beginning of each stage. (TH04 instead keeps them in conventional memory during the entire stage.)

Once these images are needed, they can then be copied into conventional memory and accessed as usual.

Uh… wait, copied? It certainly would have been possible to map EMS memory to a regular 16-bit Real Mode segment for direct access, bank-switching out rarely used system or peripheral memory in exchange for the EMS data. However, master.lib doesn't expose this functionality, and only provides functions for copying data from EMS to regular memory and vice versa.
But even that still makes EMS an excellent fit for the large image files it's used for, as it's possible to directly copy their pixel data from EMS to VRAM. (Yes, I tried!) Well… would, because ZUN doesn't do that either, and always naively copies the images to newly allocated conventional memory first. In essence, this dumbs down EMS into just another layer of the memory hierarchy, inserted between conventional memory and disk: Not quite as slow as disk, but still requiring that memcpy() to retrieve the data. Most importantly though: Using EMS in this way does not increase the total amount of memory simultaneously accessible to the game. After all, some other data will have to be freed from conventional memory to make room for the newly loaded data.


The most idiomatic way to define the game-specific layout of the EMS area would be either a struct or an enum. Unfortunately, the total size of all these images exceeds the range of a 16-bit value, and Turbo C++ 4.0J supports neither 32-bit enums (which are silently degraded to 16-bit) nor 32-bit structs (which simply don't compile). That still leaves raw compile-time constants though, you only have to manually define the offset to each image in terms of the size of its predecessor. But instead of doing that, ZUN just placed each image at a nice round decimal offset, each slightly larger than the actual memory required by the previous image, just to make sure that everything fits. :tannedcirno: This results not only in quite a bit of unnecessary padding, but also in technically the single biggest amount of "wasted" memory in PC-98 Touhou: Out of the 180,000 (TH04) and 320,000 (TH05) EMS bytes requested, the game only uses 135,552 (TH04) and 175,904 (TH05) bytes. But hey, it's EMS, so who cares, right? Out of all the opportunities to take shortcuts during development, this is among the most acceptable ones. Any actual PC-98 model that could run these two games comes with plenty of memory for this to not turn into an actual issue.

On to the EMS-using functions themselves, which are the definition of "cross-cutting concerns". Most of these have a fallback path for the non-EMS case, and keep the loaded .CDG images in memory if they are immediately needed. Which totally makes sense, but also makes it difficult to find names that reflect all the global state changed by these functions. Every one of these is also just called from a single place, so inlining them would have saved me a lot of naming and documentation trouble there.
The TH04 version of the EMS allocation code was actually displayed on ZUN's monitor in the 2010 MAG・ネット documentary; WindowsTiger already transcribed the low-quality video image in 2019. By 2015 ReC98 standards, I would have just run with that, but the current project goal is to write better code than ZUN, so I didn't. 😛 We sure ain't going to use magic numbers for EMS offsets.

The dialog init and exit code then is completely different in both games, yet equally cross-cutting. TH05 goes even further in saving conventional memory, loading each individual player or boss portrait into a single .CDG slot immediately before blitting it to VRAM and freeing the pixel data again. People who play TH05 without an active EMS driver are surely going to enjoy the hard drive access lag between each portrait change… :godzun: TH04, on the other hand, also abuses the dialog exit function to preload the Mugetsu defeat / Gengetsu entrance and Gengetsu defeat portraits, using a static variable to track how often the function has been called during the Extra Stage… who needs function parameters anyway, right? :zunpet:

This is also the function in which TH04 infamously crashes after the Stage 5 pre-boss dialog when playing with Reimu and without any active EMS driver. That crash is what motivated this look into the games' EMS usage… but the code looks perfectly fine? Oh well, guess the crash is not related to EMS then. Next u–

OK, of course I can't leave it like that. Everyone is expecting a fix now, and I still got half of a push left over after decompiling the regular EMS code. Also, I've now RE'd every function that could possibly be involved in the crash, and this is very likely to be the last time I'll be looking at them.


Turns out that the bug has little to do with EMS, and everything to do with ZUN limiting the amount of conventional RAM that TH04's MAIN.EXE is allowed to use, and then slightly miscalculating this upper limit. Playing Stage 5 with Reimu is the most asset-intensive configuration in this game, due to the combination of

Remove any single one of the above points, and this crash would have never occurred. But with all of them combined, the total amount of memory consumed by TH04's MAIN.EXE just barely exceeds ZUN's limit of 320,000 bytes, by no more than 3,840 bytes, the size of the star image.

But wait: As we established earlier, EMS does nothing to reduce the amount of conventional memory used by the game. In fact, if you disabled TH04's EMS handling, you'd still get this crash even if you are running an EMS driver and loaded DOS into the High Memory Area to free up as much conventional RAM as possible. How can EMS then prevent this crash in the first place?

The answer: It's only because ZUN's usage of EMS bypasses the need to load the cached images back out of the XOR-encrypted 東方幻想.郷 packfile. Leaving aside the general stupidity of any game data file encryption*, master.lib's decryption implementation is also quite wasteful: It uses a separate buffer that receives fixed-size chunks of the file, before decrypting every individual byte and copying it to its intended destination buffer. That really resembles the typical slowness of a C fread() implementation more than it does the highly optimized ASM code that master.lib purports to be… And how large is this well-hidden decryption buffer? 4 KiB. :onricdennat:

So, looking back at the game, here is what happens once the Stage 5 pre-battle dialog ends:

  1. Reimu's bomb background image, which was previously freed to make space for her dialog portraits, has to be loaded back into conventional memory from disk
  2. BB0.CDG is found inside the 東方幻想.郷 packfile
  3. file_ropen() ends up allocating a 4 KiB buffer for the encrypted packfile data, getting us the decisive ~4 KiB closer to the memory limit
  4. The .CDG loader tries to allocate 52 608 contiguous bytes for the pixel data of Reimu's bomb image
  5. This would exceed the memory limit, so hmem_allocbyte() fails and returns a nullptr
  6. ZUN doesn't check for this case (as usual)
  7. The pixel data is loaded to address 0000:0000, overwriting the Interrupt Vector Table and whatever comes after
  8. The game crashes

The 4 KiB encryption buffer would only be freed by the corresponding file_close() call, which of course never happens because the game crashes before it gets there. At one point, I really did suspect the cause to be some kind of memory leak or fragmentation inside master.lib, which would have been quite delightful to fix.
Instead, the most straightforward fix here is to bump up that memory limit by at least 4 KiB. Certainly easier than squeezing in a cdg_free() call for the star image before the pre-boss dialog without breaking position dependence.

Or, even better, let's nuke all these memory limits from orbit because they make little sense to begin with, and fix every other potential out-of-memory crash that modders would encounter when adding enough data to any of the 4 games that impose such limits on themselves. Unless you want to launch other binaries (which need to do their own memory allocations) after launching the game, there's really no reason to restrict the amount of memory available to a DOS process. Heck, whenever DOS creates a new one, it assigns all remaining free memory by default anyway.
Removing the memory limits also removes one of ZUN's few error checks, which end up quitting the game if there isn't at least a given maximum amount of conventional RAM available. While it might be tempting to reserve enough memory at the beginning of execution and then never check any allocation for a potential failure, that's exactly where something like TH04's crash comes from.
This game is also still running on DOS, where such an initial allocation failure is very unlikely to happen – no one fills close to half of conventional RAM with TSRs and then tries running one of these games. It might have been useful to detect systems with less than 640 KiB of actual, physical RAM, but none of the PC-98 models with that little amount of memory are fast enough to run these games to begin with. How ironic… a place where ZUN actually added an error check, and then it's mostly pointless.

Here's an archive that contains both fix variants, just in case. These were compiled from the th04_noems_crash_fix and mem_assign_all branches, and contain as little code changes as possible: 2021-11-29-Memory-limit-fixes.zip

So yeah, quite a complex bug, leaving no time for the TH03 scorefile format research after all. Next up: Raising prices.

📝 Posted:
🚚 Summary of:
P0165, P0166, P0167
Commits:
7a0e5d8...f2bca01, f2bca01...e697907, e697907...c2de6ab
💰 Funded by:
Ember2528
🏷 Tags:
rec98+ th01+ gameplay+ animation+ blitting- bullet+ glitch+ waste+ boss+ singyoku+ mima-th01+ elis+ tcc+ menu+

OK, TH01 missile bullets. Can we maybe have a well-behaved entity type, without any weirdness? Just once?

Ehh, kinda. Apart from another 150 bytes wasted on unused structure members, this code is indeed more on the low end in terms of overall jank. It does become very obvious why dodging these missiles in the YuugenMagan, Mima, and Elis fights feels so awful though: An unfair 46×46 pixel hitbox around Reimu's center pixel, combined with the comeback of 📝 interlaced rendering, this time in every stage. ZUN probably did this because missiles are the only 16×16 sprite in TH01 that is blitted to unaligned X positions, which effectively ends up touching a 32×16 area of VRAM per sprite.
But even if we assume VRAM writes to be the bottleneck here, it would have been totally possible to render every missile in every frame at roughly the same amount of CPU time that the original game uses for interlaced rendering:

That's an optimization that would have significantly benefitted the game, in contrast to all of the fake ones introduced in later games. Then again, this optimization is actually something that the later games do, and it might have in fact been necessary to achieve their higher bullet counts without significant slowdown.

Unfortunately, it was only worth decompiling half of the missile code right now, thanks to gratuitous FPU usage in the other half, where 📝 double variables are compared to float literals. That one will have to wait 📝 until after SinGyoku.


After some effectively unused Mima sprite effect code that is so broken that it's impossible to make sense out of it, we get to the final feature I wanted to cover for all bosses in parallel before returning to Sariel: The separate sprite background storage for moving or animated boss sprites in the Mima, Elis, and Sariel fights. But, uh… why is this necessary to begin with? Doesn't TH01 already reserve the other VRAM page for backgrounds?
Well, these sprites are quite big, and ZUN didn't want to blit them from main memory on every frame. After all, TH01 and TH02 had a minimum required clock speed of 33 MHz, half of the speed required for the later three games. So, he simply blitted these boss sprites to both VRAM pages, leading the usual unblitting calls to only remove the other sprites on top of the boss. However, these bosses themselves want to move across the screen… and this makes it necessary to save the stage background behind them in some other way.

Enter .PTN, and its functions to capture a 16×16 or 32×32 square from VRAM into a sprite slot. No problem with that approach in theory, as the size of all these bigger sprites is a multiple of 32×32; splitting a larger sprite into these smaller 32×32 chunks makes the code look just a little bit clumsy (and, of course, slower).
But somewhere during the development of Mima's fight, ZUN apparently forgot that those sprite backgrounds existed. And once Mima's 🚫 casting sprite is blitted on top of her regular sprite, using just regular sprite transparency, she ends up with her infamous third arm:

Ironically, there's an unused code path in Mima's unblit function where ZUN assumes a height of 48 pixels for Mima's animation sprites rather than the actual 64. This leads to even clumsier .PTN function calls for the bottom 128×16 pixels… Failing to unblit the bottom 16 pixels would have also yielded that third arm, although it wouldn't have looked as natural. Still wouldn't say that it was intentional; maybe this casting sprite was just added pretty late in the game's development?


So, mission accomplished, Sariel unblocked… at 2¼ pushes. :thonk: That's quite some time left for some smaller stage initialization code, which bundles a bunch of random function calls in places where they logically really don't belong. The stage opening animation then adds a bunch of VRAM inter-page copies that are not only redundant but can't even be understood without knowing the hidden internal state of the last VRAM page accessed by previous ZUN code…
In better news though: Turbo C++ 4.0 really doesn't seem to have any complexity limit on inlining arithmetic expressions, as long as they only operate on compile-time constants. That's how we get macro-free, compile-time Shift-JIS to JIS X 0208 conversion of the individual code points in the 東方★靈異伝 string, in a compiler from 1994. As long as you don't store any intermediate results in variables, that is… :tannedcirno:

But wait, there's more! With still ¼ of a push left, I also went for the boss defeat animation, which includes the route selection after the SinGyoku fight.
As in all other instances, the 2× scaled font is accomplished by first rendering the text at regular 1× resolution to the other, invisible VRAM page, and then scaled from there to the visible one. However, the route selection is unique in that its scaled text is both drawn transparently on top of the stage background (not onto a black one), and can also change colors depending on the selection. It would have been no problem to unblit and reblit the text by rendering the 1× version to a position on the invisible VRAM page that isn't covered by the 2× version on the visible one, but ZUN (needlessly) clears the invisible page before rendering any text. :zunpet: Instead, he assigned a separate VRAM color for both the 魔界 and 地獄 options, and only changed the palette value for these colors to white or gray, depending on the correct selection. This is another one of the 📝 rare cases where TH01 demonstrates good use of PC-98 hardware, as the 魔界へ and 地獄へ strings don't need to be reblitted during the selection process, only the Orb "cursor" does.

Then, why does this still not count as good-code? When changing palette colors, you kinda need to be aware of everything else that can possibly be on screen, which colors are used there, and which aren't and can therefore be used for such an effect without affecting other sprites. In this case, well… hover over the image below, and notice how Reimu's hair and the bomb sprites in the HUD light up when Makai is selected:

This push did end on a high note though, with the generic, non-SinGyoku version of the defeat animation being an easily parametrizable copy. And that's how you decompile another 2.58% of TH01 in just slightly over three pushes.


Now, we're not only ready to decompile Sariel, but also Kikuri, Elis, and SinGyoku without needing any more detours into non-boss code. Thanks to the current TH01 funding subscriptions, I can plan to cover most, if not all, of Sariel in a single push series, but the currently 3 pending pushes probably won't suffice for Sariel's 8.10% of all remaining code in TH01. We've got quite a lot of not specifically TH01-related funds in the backlog to pass the time though.

Due to recent developments, it actually makes quite a lot of sense to take a break from TH01: spaztron64 has managed what every Touhou download site so far has failed to do: Bundling all 5 game onto a single .HDI together with pre-configured PC-98 emulators and a nice boot menu, and hosting the resulting package on a proper website. While this first release is already quite good (and much better than my attempt from 2014), there is still a bit of room for improvement to be gained from specific ReC98 research. Next up, therefore:

📝 Posted:
🚚 Summary of:
P0134
Commits:
1d5db71...a6eed55
💰 Funded by:
[Anonymous]
🏷 Tags:
rec98+ th05+ blitting- portability+ micro-optimization+ jank+ tasm+ tcc+

Technical debt, part 5… and we only got TH05's stupidly optimized .PI functions this time?

As far as actual progress is concerned, that is. In maintenance news though, I was really hyped for the #include improvements I've mentioned in 📝 the last post. The result: A new x86real.h file, bundling all the declarations specific to the 16-bit x86 Real Mode in a smaller file than Turbo C++'s own DOS.H. After all, DOS is something else than the underlying CPU. And while it didn't speed up build times quite as much as I had hoped, it now clearly indicates the x86-specific parts of PC-98 Touhou code to future port authors.

After another couple of improvements to parameter declaration in ASM land, we get to TH05's .PI functions… and really, why did ZUN write all of them in ASM? Why (re)declare all the necessary structures and data in ASM land, when all these functions are merely one layer of abstraction above master.lib, which does all the actual work?
I get that ZUN might have wanted masked blitting to be faster, which is used for the fade-in effect seen during TH05's main menu animation and the ending artwork. But, uh… he knew how to modify master.lib. In fact, he did already modify the graph_pack_put_8() function used for rendering a single .PI image row, to ignore master.lib's VRAM clipping region. For this effect though, he first blits each row regularly to the invisible 400th row of VRAM, and then does an EGC-accelerated VRAM-to-VRAM blit of that row to its actual target position with the mask enabled. It would have been way more efficient to add another version of this function that takes a mask pattern. No amount of REP MOVSW is going to change the fact that two VRAM writes per line are slower than a single one. Not to mention that it doesn't justify writing every other .PI function in ASM to go along with it…
This is where we also find the most hilarious aspect about this: For most of ZUN's pointless micro-optimizations, you could have maybe made the argument that they do save some CPU cycles here and there, and therefore did something positive to the final, PC-98-exclusive result. But some of the hand-written ASM here doesn't even constitute a micro-optimization, because it's worse than what you would have got out of even Turbo C++ 4.0J with its 80386 optimization flags! :zunpet:

At least it was possible to "decompile" 6 out of the 10 functions here, making them easy to clean up for future modders and port authors. Could have been 7 functions if I also decided to "decompile" pi_free(), but all the C++ code is already surrounded by ASM, resulting in 2 ASM translation units and 2 C++ translation units. pi_free() would have needed a single translation unit by itself, which wasn't worth it, given that I would have had to spell out every single ASM instruction anyway.

void pascal pi_free(int slot)
{
	if(pi_buffers[slot]) {
		graph_pi_free(&pi_headers[slot], &pi_buffers[slot]);
		pi_buffers[slot] = NULL;
	}
}

There you go. What about this needed to be written in ASM?!?

The function calls between these small translation units even seemed to glitch out TASM and the linker in the end, leading to one CALL offset being weirdly shifted by 32 bytes. Usually, TLINK reports a fixup overflow error when this happens, but this time it didn't, for some reason? Mirroring the segment grouping in the affected translation unit did solve the problem, and I already knew this, but only thought of it after spending quite some RTFM time… during which I discovered the -lE switch, which enables TLINK to use the expanded dictionaries in Borland's .OBJ and .LIB files to speed up linking. That shaved off roughly another second from the build time of the complete ReC98 repository. The more you know… Binary blobs compiled with non-Borland tools would be the only reason not to use this flag.

So, even more slowdown with this 5th dedicated push, since we've still only repaid 41% of the technical debt in the SHARED segment so far. Next up: Part 6, which hopefully manages to decompile the FM and SSG channel animations in TH05's Music Room, and hopefully ends up being the final one of the slow ones.

📝 Posted:
🚚 Summary of:
P0130, P0131
Commits:
6d69ea8...576def5, 576def5...dc9e3ee
💰 Funded by:
Yanga
🏷 Tags:
rec98+ th01+ pc98+ blitting- good-code+ jank+ gameplay+ boss+ rng+

50% hype! 🎉 But as usual for TH01, even that final set of functions shared between all bosses had to consume two pushes rather than one…

First up, in the ongoing series "Things that TH01 draws to the PC-98 graphics layer that really should have been drawn to the text layer instead": The boss HP bar. Oh well, using the graphics layer at least made it possible to have this half-red, half-white pattern for the middle section.
This one pattern is drawn by making surprisingly good use of the GRCG. So far, we've only seen it used for fast monochrome drawing:

// Setting up fast drawing using color #9 (1001 in binary)
grcg_setmode(GC_RMW);
outportb(0x7E, 0xFF); // Plane 0: (B): (********)
outportb(0x7E, 0x00); // Plane 1: (R): (        )
outportb(0x7E, 0x00); // Plane 2: (G): (        )
outportb(0x7E, 0xFF); // Plane 3: (E): (********)

// Write a checkerboard pattern (* * * * ) in color #9 to the top-left corner,
// with transparent blanks. Requires only 1 VRAM write to a single bitplane:
// The GRCG automatically writes to the correct bitplanes, as specified above
*(uint8_t *)(MK_FP(0xA800, 0)) = 0xAA;

But since this is actually an 8-pixel tile register, we can set any 8-pixel pattern for any bitplane. This way, we can get different colors for every one of the 8 pixels, with still just a single VRAM write of the alpha mask to a single bitplane:

grcg_setmode(GC_RMW); //  Final color: (A7A7A7A7)
outportb(0x7E, 0x55); // Plane 0: (B): ( * * * *)
outportb(0x7E, 0xFF); // Plane 1: (R): (********)
outportb(0x7E, 0x55); // Plane 2: (G): ( * * * *)
outportb(0x7E, 0xAA); // Plane 3: (E): (* * * * )

And I thought TH01 only suffered the drawbacks of PC-98 hardware, making so little use of its actual features that it's perhaps not fair to even call it "a PC-98 game"… Still, I'd say that "bad PC-98 port of an idea" describes it best.

However, after that tiny flash of brilliance, the surrounding HP rendering code goes right back to being the typical sort of confusing TH01 jank. There's only a single function for the three distinct jobs of

with magic numbers to select between all of these.

VRAM of course also means that the backgrounds behind the individual hit points have to be stored, so that they can be unblitted later as the boss is losing HP. That's no big deal though, right? Just allocate some memory, copy what's initially in VRAM, then blit it back later using your foundational set of blitting funct– oh, wait, TH01 doesn't have this sort of thing, right :tannedcirno: The closest thing, 📝 once again, are the .PTN functions. And so, the game ends up handling these 8×16 background sprites with 16×16 wrappers around functions for 32×32 sprites. :zunpet: That's quite the recipe for confusion, especially since ZUN preferred copy-pasting the necessary ridiculous arithmetic expressions for calculating positions, .PTN sprite IDs, and the ID of the 16×16 quarter inside the 32×32 sprite, instead of just writing simple helper functions. He did manage to make the result mostly bug-free this time around, though! There's one minor hit point discoloration bug if the red-white or white sections start at an odd number of hit points, but that's never the case for any of the original 7 bosses.
The remaining sloppiness is ultimately inconsequential as well: The game always backs up twice the number of hit point backgrounds, and thus uses twice the amount of memory actually required. Also, this self-restriction of only unblitting 16×16 pixels at a time requires any remaining odd hit point at the last position to, of course, be rendered again :onricdennat:


After stumbling over the weakest imaginable random number generator, we finally arrive at the shared boss↔orb collision handling function, the final blocker among the final blockers. This function takes a whopping 12 parameters, 3 of them being references to int values, some of which are duplicated for every one of the 7 bosses, with no generic boss struct anywhere. 📝 Previously, I speculated that YuugenMagan might have been the first boss to be programmed for TH01. With all these variables though, there is some new evidence that SinGyoku might have been the first one after all: It's the only boss to use its own HP and phase frame variables, with the other bosses sharing the same two globals.

While this function only handles the response to a boss↔orb collision, it still does way too much to describe it briefly. Took me quite a while to frame it in terms of invincibility (which is the main impact of all of this that can be observed in gameplay code). That made at least some sort of sense, considering the other usages of the variables passed as references to that function. Turns out that YuugenMagan, Kikuri, and Elis abuse what's meant to be the "invincibility frame" variable as a frame counter for some of their animations 🙄
Oh well, the game at least doesn't call the collision handling function during those, so "invincibility frame" is technically still a correct variable name there.


And that's it! We're finally ready to start with Konngara, in 2021. I've been waiting quite a while for this, as all this high-level boss code is very likely to speed up TH01 progress quite a bit. Next up though: Closing out 2020 with more of the technical debt in the other games.

📝 Posted:
🚚 Summary of:
P0123
Commits:
4406c3d...72dfa09
💰 Funded by:
Yanga
🏷 Tags:
rec98+ th01+ file-format+ player+ animation+ blitting- waste+ jank+

Done with the .BOS format, at last! While there's still quite a bunch of undecompiled non-format blitting code left, this was in fact the final piece of graphics format loading code in TH01.

📝 Continuing the trend from three pushes ago, we've got yet another class, this time for the 48×48 and 48×32 sprites used in Reimu's gohei, slide, and kick animations. The only reason these had to use the .BOS format at all is simply because Reimu's regular sprites are 32×32, and are therefore loaded from 📝 .PTN files.
Yes, this makes no sense, because why would you split animations for the same character across two file formats and two APIs, just because of a sprite size difference? This necessity for switching blitting APIs might also explain why Reimu vanishes for a few frames at the beginning and the end of the gohei swing animation, but more on that once we get to the high-level rendering code.

Now that we've decompiled all the .BOS implementations in TH01, here's an overview of all of them, together with .PTN to show that there really was no reason for not using the .BOS API for all of Reimu's sprites:

CBossEntity CBossAnim CPlayerAnim ptn_* (32×32)
Format .BOS .BOS .BOS .PTN
Hitbox
Byte-aligned blitting
Byte-aligned unblitting
Unaligned blitting Single-line and wave only
Precise unblitting
Per-file sprite limit 8 8 32 64
Pixels blitted at once 16 16 8 32

And even that last property could simply be handled by branching based on the sprite width, and wouldn't be a reason for switching formats. But well, it just wouldn't be TH01 without all that redundant bloat though, would it?

The basic loading, freeing, and blitting code was yet another variation on the other .BOS code we've seen before. So this should have caused just as little trouble as the CBossAnim code… except that CPlayerAnim did add one slightly difficult function to the mix, which led to it requiring almost a full push after all. Similar to 📝 the unblitting code for moving lasers we've seen in the last push, ZUN tries to minimize the amount of VRAM writes when unblitting Reimu's slide animations. Technically, it's only necessary to restore the pixels that Reimu traveled by, plus the ones that wouldn't be redrawn by the new animation frame at the new X position.
The theoretically arbitrary distance between the two sprites is, of course, modeled by a fixed-size buffer on the stack :onricdennat:, coming with the further assumption that the sprite surely hasn't moved by more than 1 horizontal VRAM byte compared to the last frame. Which, of course, results in glitches if that's not the case, leaving little Reimu parts in VRAM if the slide speed ever exceeded 8 pixels per frame. :tannedcirno: (Which it never does, being hardcoded to 6 pixels, but still.). As it also turns out, all those bit masking operations easily lead to incredibly sloppy C code. Which compiles into incredibly terrible ASM, which in turn might end up wasting way more CPU time than the final VRAM write optimization would have gained? Then again, in-depth profiling is way beyond the scope of this project at this point.

Next up: The TH04 main menu, and some more technical debt.

📝 Posted:
🚚 Summary of:
P0122
Commits:
164591f...4406c3d
💰 Funded by:
Yanga
🏷 Tags:
rec98+ th01+ blitting- waste+ jank+ gameplay+ laser+

This time around, laser is 📝 actually not difficult, with TH01's shootout laser class being simple enough to nicely fit into a single push. All other stationary lasers (as used by YuugenMagan, for example) don't even use a class, and are simply treated as regular lines with collision detection.

But of course, the shootout lasers also come with the typical share of TH01 jank we've all come to expect by now. This time, it already starts with the hardcoded sprite data:

A shootout laser can have a width from 1 to 8 pixels, so ZUN stored a separate 16×1 sprite with a line for each possible width (left-to-right). Then, he shifted all of these sprites 1 pixel to the right for all of the 8 possible start positions within a planar VRAM byte (top-to-bottom). Because… doing that bit shift programmatically is way too expensive, so let's pre-shift at compile time, and use 16× the memory per sprite? :tannedcirno:

Since a bunch of other sprite sheets need to be pre-shifted as well (this is the 5th one we've found so far), our sprite converter has a feature to automatically generate those pre-shifted variations. This way, we can abstract away that implementation detail and leave modders with .BMP files that still only contain a single version of each sprite. But, uh…, wait, in this sprite sheet, the second row for 1-pixel lasers is accidentally shifted right by one more pixel that it should have been?! Which means that

  1. we can't use the auto-preshift feature here, and have to store this weird-looking (and quite frankly, completely unnecessary) sprite sheet in its entirety
  2. ZUN did, at least during TH01's development, not have a sprite converter, and directly hardcoded these dot patterns in the C++ code :zunpet:

The waste continues with the class itself. 69 bytes, with 22 bytes outright unused, and 11 not really necessary. As for actual innovations though, we've got 📝 another 32-bit fixed-point type, this time actually using 8 bits for the fractional part. Therefore, the ray position is tracked to the 1/256th of a pixel, using the full precision of master.lib's 8-bit sin() and cos() lookup tables.
Unblitting is also remarkably efficient: It's only done once the laser stopped extending and started moving, and only for the exact pixels at the start of the ray that the laser traveled by in a single frame. If only the ray part was also rendered as efficiently – it's fully blitted every frame, right next to the collision detection for each row of the ray.


With a public interface of two functions (spawn, and update / collide / unblit / render), that's superficially all there is to lasers in this game. There's another (apparently inlined) function though, to both reset and, uh, "fully unblit" all lasers at the end of every boss fight… except that it fails hilariously at doing the latter, and ends up effectively unblitting random 32-pixel line segments, due to ZUN confusing both the coordinates and the parameter types for the line unblitting function. :zunpet:
A while ago, I was asked about this crash that tends to happen when defeating Elis. And while you can clearly see the random unblitted line segments that are missing from the sprites, I don't quite think we've found the cause for the crash, since the 📝 line unblitting function used there does clip its coordinates to the VRAM range.

Next up: The final piece of image format code in TH01, covering Reimu's sprites!

📝 Posted:
🚚 Summary of:
P0120, P0121
Commits:
453dd3c...3c008b6, 3c008b6...5c42fcd
💰 Funded by:
Yanga
🏷 Tags:
rec98+ th01+ pc98+ blitting- waste+ jank+ boss+ mima-th01+

Back to TH01, and its boss sprite format… with a separate class for storing animations that only differs minutely from the 📝 regular boss entity class I covered last time? Decompiling this class was almost free, and the main reason why the first of these pushes ended up looking pretty huge.

Next up were the remaining shape drawing functions from the code segment that started with the .GRC functions. P0105 already started these with the (surprisingly sanely implemented) 8×8 diamond, star, and… uh, snowflake (?) sprites , prominently seen in the Konngara, Elis, and Sariel fights, respectively. Now, we've also got:

The weirdness becomes obvious with just a single screenshot:

First, we've got the obvious issue of the sprites not being clipped at the right edge of VRAM, with the rightmost pixels in each row of the sprite extending to the beginning of the next row. Well, that's just what you get if you insist on writing unique low-level blitting code for the majority of the individual sprites in the game… 🤷
More importantly though, the sprite sheet looks like this: So how do we even get these fully filled red diamonds?

Well, turns out that the sprites are never consistently unblitted during their 8 frames of animation. There is a function that looks like it unblits the sprite… except that it starts with by enabling the GRCG and… reading from the first bitplane on the background page? If this was the EGC, such a read would fill some internal registers with the contents of all 4 bitplanes, which can then subsequently be blitted to all 4 bitplanes of any VRAM page with a single memory write. But with the GRCG in RMW mode, reads do nothing special, and simply copy the memory contents of one bitplane to the read destination. Maybe ZUN thought that setting the RMW color to red also sets some internal 4-plane mask register to match that color? :zunpet:
Instead, the rather random pixels read from the first bitplane are then used as a mask for a second blit of the same red sprite. Effectively, this only really "unblits" the invincibility pixels that are drawn on top of Reimu's sprite. Since Reimu is drawn first, the invincibility sprites are overwritten anyway. But due to the palette color layout of Reimu's sprite, its pixels end up fully masking away any invincibility sprite pixels in that second blit, leaving VRAM untouched as a result. Anywhere else though, this animation quickly turns into the union of all animation frames.

Then again, if that 16-dot-aligned rectangular unblitting function is all you know about the EGC, and you can't be bothered to write a perfect unblitter for 8×8 sprites, it becomes obvious why you wouldn't want to use it:

Because Reimu would barely be visible under all that flicker. In comparison, those fully filled diamonds actually look pretty good.


After all that, the remaining time wouldn't have been enough for the next few essential classes, so I closed out the push with three more VRAM effects instead:


And with that, ReC98, as a whole, is not only ⅓ done, but I've also fully caught up with the feature backlog for the first time in the history of this crowdfunding! Time to go into maintenance mode then, while we wait for the next pushes to be funded. Got a huge backlog of tiny maintenance issues to address at a leisurely pace, and of course there's also the 📝 16-bit build system waiting to be finished.

📝 Posted:
🚚 Summary of:
P0118
Commits:
0bb5bc3...cbf14eb
💰 Funded by:
-Tom-, Ember2528
🏷 Tags:
rec98+ th01+ th02+ th04+ th05+ position-independence+ hud+ blitting- unused+

🎉 TH05 is finally fully position-independent! 🎉 To celebrate this milestone, -Tom- coded a little demo, which we recorded on both an emulator and on real PC-98 hardware:

For all the new people who are unfamiliar with PC-98 Touhou internals: Boss behavior is hardcoded into MAIN.EXE, rather than being scriptable via separate .ECL files like in Windows Touhou. That's what makes this kind of a big deal.


What does this mean?

You can now freely add or remove both data and code anywhere in TH05, by editing the ReC98 codebase, writing your mod in ASM or C/C++, and recompiling the code. Since all absolute memory addresses have now been converted to labels, this will work without causing any instability. See the position independence section in the FAQ for a more thorough explanation about why this was a problem.

By extension, this also means that it's now theoretically possible to use a different compiler on the source code. But:

What does this not mean?

The original ZUN code hasn't been completely reverse-engineered yet, let alone decompiled. As the final PC-98 Touhou game, TH05 also happens to have the largest amount of actual ZUN-written ASM that can't ever be decompiled within ReC98's constraints of a legit source code reconstruction. But a lot of the originally-in-C code is also still in ASM, which might make modding a bit inconvenient right now. And while I have decompiled a bunch of functions, I selected them largely because they would help with PI (as requested by the backers), and not because they are particularly relevant to typical modding interests.

As a result, the code might also be a bit confusingly organized. There's quite a conflict between various goals there: On the one hand, I'd like to only have a single instance of every function shared with earlier games, as well as reduce ZUN's code duplication within a single game. On the other hand, this leads to quite a lot of code being scattered all over the place and then #include-pasted back together, except for the places where 📝 this doesn't work, and you'd have to use multiple translation units anyway… I'm only beginning to figure out the best structure here, and some more reverse-engineering attention surely won't hurt.

Also, keep in mind that the code still targets x86 Real Mode. To work effectively in this codebase, you'd need some familiarity with memory segmentation, and how to express it all in code. This tends to make even regular C++ development about an order of magnitude harder, especially once you want to interface with the remaining ASM code. That part made -Tom- struggle quite a bit with implementing his custom scripting language for the demo above. For now, he built that demo on quite a limited foundation – which is why he also chose to release neither the build nor the source publically for the time being.
So yeah, you're definitely going to need the TASM and Borland C++ manuals there.

tl;dr: We now know everything about this game's data, but not quite as much about this game's code.

So, how long until source ports become a realistic project?

You probably want to wait for 100% RE, which is when everything that can be decompiled has been decompiled.

Unless your target system is 16-bit Windows, in which case you could theoretically start right away. 📝 Again, this would be the ideal first system to port PC-98 Touhou to: It would require all the generic portability work to remove the dependency on PC-98 hardware, thus paving the way for a subsequent port to modern systems, yet you could still just drop in any undecompiled ASM.

Porting to IBM-compatible DOS would only be a harder and less universally useful version of that. You'd then simply exchange one architecture, with its idiosyncrasies and limits, for another, with its own set of idiosyncrasies and limits. (Unless, of course, you already happen to be intimately familiar with that architecture.) The fact that master.lib provides DOS/V support would have only mattered if ZUN consistently used it to abstract away PC-98 hardware at every single place in the code, which is definitely not the case.


The list of actually interesting findings in this push is, 📝 again, very short. Probably the most notable discovery: The low-level part of the code that renders Marisa's laser from her TH04 Illusion Laser shot type is still present in TH05. Insert wild mass guessing about potential beta version shot types… Oh, and did you know that the order of background images in the Extra Stage staff roll differs by character?

Next up: Finally driving up the RE% bar again, by decompiling some TH05 main menu code.

📝 Posted:
🚚 Summary of:
P0115, P0116
Commits:
967bb8b...e5328a3, e5328a3...03048c3
💰 Funded by:
Lmocinemod, Blue Bolt, [Anonymous]
🏷 Tags:
rec98+ th03+ th04+ th05+ file-format+ cutscene+ blitting- menu+

Finally, after a long while, we've got two pushes with barely anything to talk about! Continuing the road towards 100% PI for TH05, these were exactly the two pushes that TH05 MAINE.EXE PI was estimated to additionally cost, relative to TH04's. Consequently, they mostly went to TH05's unique data structures in the ending cutscenes, the score name registration menu, and the staff roll.

A unique feature in there is TH05's support for automatic text color changes in its ending scripts, based on the first full-width Shift-JIS codepoint in a line. The \c=codepoint,color commands at the top of the _ED??.TXT set up exactly this codepoint→color mapping. As far as I can tell, TH05 is the only Touhou game with a feature like this – even the Windows Touhou games went back to manually spelling out each color change.

The orb particles in TH05's staff roll also try to be a bit unique by using 32-bit X and Y subpixel variables for their current position. With still just 4 fractional bits, I can't really tell yet whether the extended range was actually necessary. Maybe due to how the "camera scrolling" through "space" was implemented? All other entities were pretty much the usual fare, though.
12.4, 4.4, and now a 28.4 fixed-point format… yup, 📝 C++ templates were definitely the right choice.

At the end of its staff roll, TH05 not only displays the usual performance verdict, but then scrolls in the scores at the end of each stage before switching to the high score menu. The simplest way to smoothly scroll between two full screens on a PC-98 involves a separate bitmap… which is exactly what TH05 does here, reserving 28,160 bytes of its global data segment for just one overly large monochrome 320×704 bitmap where both the screens are rendered to. That's… one benefit of splitting your game into multiple executables, I guess? :tannedcirno:
Not sure if it's common knowledge that you can actually scroll back and forth between the two screens with the Up and Down keys before moving to the score menu. I surely didn't know that before. But it makes sense – might as well get the most out of that memory.


The necessary groundwork for all of this may have actually made TH04's (yes, TH04's) MAINE.EXE technically position-independent. Didn't quite reach the same goal for TH05's – but what we did reach is ⅔ of all PC-98 Touhou code now being position-independent! Next up: Celebrating even more milestones, as -Tom- is about to finish development on his TH05 MAIN.EXE PI demo…

📝 Posted:
🚚 Summary of:
P0113, P0114
Commits:
150d2c6...6204fdd, 6204fdd...967bb8b
💰 Funded by:
Lmocinemod
🏷 Tags:
rec98+ th02+ th03+ th04+ build-process+ pipeline+ tasm+ blitting- file-format+ input+

Alright, tooling and technical debt. Shouldn't be really much to talk about… oh, wait, this is still ReC98 :tannedcirno:

For the tooling part, I finished up the remaining ergonomics and error handling for the 📝 sprite converter that Jonathan Campbell contributed two months ago. While I familiarized myself with the tool, I've actually ran into some unreported errors myself, so this was sort of important to me. Still got no command-line help in there, but the error messages can now do that job probably even better, since we would have had to write them anyway.

So, what's up with the technical debt then? Well, by now we've accumulated quite a number of 📝 ASM code slices that need to be either decompiled or clearly marked as undecompilable. Since we define those slices as "already reverse-engineered", that decision won't affect the numbers on the front page at all. But for a complete decompilation, we'd still have to do this someday. So, rather than incorporating this work into pushes that were purchased with the expectation of measurable progress in a certain area, let's take the "anything goes" pushes, and focus entirely on that during them.

The second code segment seemed like the best place to start with this, since it affects the largest number of games simultaneously. Starting with TH02, this segment contains a set of random "core" functions needed by the binary. Image formats, sounds, input, math, it's all there in some capacity. You could maybe call it all "libzun" or something like that? But for the time being, I simply went with the obvious name, seg2. Maybe I'll come up with something more convincing in the future.


Oh, but wait, why were we assembling all the previous undecompilable ASM translation units in the 16-bit build part? By moving those to the 32-bit part, we don't even need a 16-bit TASM in our list of dependencies, as long as our build process is not fully 16-bit.
And with that, ReC98 now also builds on Windows 95, and thus, every 32-bit Windows version. 🎉 Which is certainly the most user-visible improvement in all of these two pushes. :onricdennat:


Back in 2015, I already decompiled all of TH02's seg2 functions. As suggested by the Borland compiler, I tried to follow a "one translation unit per segment" layout, bundling the binary-specific contents via #include. In the end, it required two translation units – and that was even after manually inserting the original padding bytes via #pragma codestring… yuck. But it worked, compiled, and kept the linker's job (and, by extension, segmentation worries) to a minimum. And as long as it all matched the original binaries, it still counted as a valid reconstruction of ZUN's code. :zunpet:

However, that idea ultimately falls apart once TH03 starts mixing undecompilable ASM code inbetween C functions. Now, we officially have no choice but to use multiple C and ASM translation units, with maybe only just one or two #includes in them…

…or we finally start reconstructing the actual seg2 library, turning every sequence of related functions into its own translation unit. This way, we can simply reuse the once-compiled .OBJ files for all the binaries those functions appear in, without requiring that additional layer of translation units mirroring the original segmentation.
The best example for this is TH03's almost undecompilable function that generates a lookup table for horizontally flipping 8 1bpp pixels. It's part of every binary since TH03, but only used in that game. With the previous approach, we would have had to add 9 C translation units, which would all have just #included that one file. Now, we simply put the .OBJ file into the correct place on the linker command line, as soon as we can.

💡 And suddenly, the linker just inserts the correct padding bytes itself.

The most immediate gains there also happened to come from TH03. Which is also where we did get some tiny RE% and PI% gains out of this after all, by reverse-engineering some of its sprite blitting setup code. Sure, I should have done even more RE here, to also cover those 5 functions at the end of code segment #2 in TH03's MAIN.EXE that were in front of a number of library functions I already covered in this push. But let's leave that to an actual RE push 😛


All in all though, I was just getting started with this; the real gains in terms of removed ASM files are still to come. But in the meantime, the funding situation has become even better in terms of allowing me to focus on things nobody asked for. 🙂 So here's a slightly better idea: Instead of spending two more pushes on this, let's shoot for TH05 MAINE.EXE position independence next. If I manage to get it done, we'll have a 100% position-independent TH05 by the time -Tom- finishes his MAIN.EXE PI demo, rather than the 94% we'd get from just MAIN.EXE. That's bound to make a much better impression on all the people who will then (re-)discover the project.

📝 Posted:
🚚 Summary of:
P0105, P0106, P0107, P0108
Commits:
3622eb6...11b776b, 11b776b...1f1829d, 1f1829d...1650241, 1650241...dcf4e2c
💰 Funded by:
Yanga
🏷 Tags:
rec98+ th01+ meta+ file-format+ animation+ blitting- boss+ singyoku+ yuugenmagan+ elis+ kikuri+ konngara+ waste+

And indeed, I got to end my vacation with a lot of image format and blitting code, covering the final two formats, .GRC and .BOS. .GRC was nothing noteworthy – one function for loading, one function for byte-aligned blitting, and one function for freeing memory. That's it – not even a unblitting function for this one. .BOS, on the other hand…

…has no generic (read: single/sane) implementation, and is only implemented as methods of some boss entity class. And then again for Sariel's dress and wand animations, and then again for Reimu's animations, both of which weren't even part of these 4 pushes. Looking forward to decompiling essentially the same algorithms all over again… And that's how TH01 became the largest and most bloated PC-98 Touhou game. So yeah, still not done with image formats, even at 44% RE.

This means I also had to reverse-engineer that "boss entity" class… yeah, what else to call something a boss can have multiple of, that may or may not be part of a larger boss sprite, may or may not be animated, and that may or may not have an orb hitbox?
All bosses except for Kikuri share the same 5 global instances of this class. Since renaming all these variables in ASM land is tedious anyway, I went the extra mile and directly defined separate, meaningful names for the entities of all bosses. These also now document the natural order in which the bosses will ultimately be decompiled. So, unless a backer requests anything else, this order will be:

  1. Konngara
  2. Sariel
  3. Elis
  4. Kikuri
  5. SinGyoku
  6. (code for regular card-flipping stages)
  7. Mima
  8. YuugenMagan

As everyone kind of expects from TH01 by now, this class reveals yet another… um, unique and quirky piece of code architecture. In addition to the position and hitbox members you'd expect from a class like this, the game also stores the .BOS metadata – width, height, animation frame count, and 📝 bitplane pointer slot number – inside the same class. But if each of those still corresponds to one individual on-screen sprite, how can YuugenMagan have 5 eye sprites, or Kikuri have more than one soul and tear sprite? By duplicating that metadata, of course! And copying it from one entity to another :onricdennat:
At this point, I feel like I even have to congratulate the game for not actually loading YuugenMagan's eye sprites 5 times. But then again, 53,760 bytes of waste would have definitely been noticeable in the DOS days. Makes much more sense to waste that amount of space on an unused C++ exception handler, and a bunch of redundant, unoptimized blitting functions :tannedcirno:

(Thinking about it, YuugenMagan fits this entire system perfectly. And together with its position in the game's code – last to be decompiled means first on the linker command line – we might speculate that YuugenMagan was the first boss to be programmed for TH01?)

So if a boss wants to use sprites with different sizes, there's no way around using another entity. And that's why Girl-Elis and Bat-Elis are two distinct entities internally, and have to manually sync their position. Except that there's also a third one for Attacking-Girl-Elis, because Girl-Elis has 9 frames of animation in total, and the global .BOS bitplane pointers are divided into 4 slots of only 8 images each. :zunpet:
Same for SinGyoku, who is split into a sphere entity, a person entity, and a… white flash entity for all three forms, all at the same resolution. Or Konngara's facial expressions, which also require two entities just for themselves.


And once you decompile all this code, you notice just how much of it the game didn't even use. 13 of the 50 bytes of the boss entity class are outright unused, and 10 bytes are used for a movement clamping and lock system that would have been nice if ZUN also used it outside of Kikuri's soul sprites. Instead, all other bosses ignore this system completely, and just party on the X/Y coordinates of the boss entities directly.

As for the rendering functions, 5 out of 10 are unused. And while those definitely make up less than half of the code, I still must have spent at least 1 of those 4 pushes on effectively unused functionality.
Only one of these functions lends itself to some speculation. For Elis' entrance animation, the class provides functions for wavy blitting and unblitting, which use a separate X coordinate for every line of the sprite. But there's also an unused and sort of broken one for unblitting two overlapping wavy sprites, located at the same Y coordinate. This might indicate that Elis could originally split herself into two sprites, similar to TH04 Stage 6 Yuuka? Or it might just have been some other kind of animation effect, who knows.


After over 3 months of TH01 progress though, it's finally time to look at other games, to cover the rest of the crowdfunding backlog. Next up: Going back to TH05, and getting rid of those last PI false positives. And since I can potentially spend the next 7 weeks on almost full-time ReC98 work, I've also re-opened the store until October!

📝 Posted:
🚚 Summary of:
P0096, P0097, P0098
Commits:
8ddb778...8283c5e, 8283c5e...600f036, 600f036...ad06748
💰 Funded by:
Ember2528, Yanga
🏷 Tags:
rec98+ th01+ file-format+ pc98+ blitting- gameplay+ player+ shot+ jank+ mod+ tcc+

So, let's finally look at some TH01 gameplay structures! The obvious choices here are player shots and pellets, which are conveniently located in the last code segment. Covering these would therefore also help in transferring some first bits of data in REIIDEN.EXE from ASM land to C land. (Splitting the data segment would still be quite annoying.) Player shots are immediately at the beginning…

…but wait, these are drawn as transparent sprites loaded from .PTN files. Guess we first have to spend a push on 📝 Part 2 of this format.
Hm, 4 functions for alpha-masked blitting and unblitting of both 16×16 and 32×32 .PTN sprites that align the X coordinate to a multiple of 8 (remember, the PC-98 uses a planar VRAM memory layout, where 8 pixels correspond to a byte), but only one function that supports unaligned blitting to any X coordinate, and only for 16×16 sprites? Which is only called twice? And doesn't come with a corresponding unblitting function? :thonk:

Yeah, "unblitting". TH01 isn't double-buffered, and uses the PC-98's second VRAM page exclusively to store a stage's background and static sprites. Since the PC-98 has no hardware sprites, all you can do is write pixels into VRAM, and any animated sprite needs to be manually removed from VRAM at the beginning of each frame. Not using double-buffering theoretically allows TH01 to simply copy back all 128 KB of VRAM once per frame to do this. :tannedcirno: But that would be pretty wasteful, so TH01 just looks at all animated sprites, and selectively copies only their occupied pixels from the second to the first VRAM page.


Alright, player shot class methods… oh, wait, the collision functions directly act on the Yin-Yang Orb, so we first have to spend a push on that one. And that's where the impression we got from the .PTN functions is confirmed: The orb is, in fact, only ever displayed at byte-aligned X coordinates, divisible by 8. It's only thanks to the constant spinning that its movement appears at least somewhat smooth.
This is purely a rendering issue; internally, its position is tracked at pixel precision. Sadly, smooth orb rendering at any unaligned X coordinate wouldn't be that trivial of a mod, because well, the necessary functions for unaligned blitting and unblitting of 32×32 sprites don't exist in TH01's code. Then again, there's so much potential for optimization in this code, so it might be very possible to squeeze those additional two functions into the same C++ translation unit, even without position independence…

More importantly though, this was the right time to decompile the core functions controlling the orb physics – probably the highlight in these three pushes for most people.
Well, "physics". The X velocity is restricted to the 5 discrete states of -8, -4, 0, 4, and 8, and gravity is applied by simply adding 1 to the Y velocity every 5 frames :zunpet: No wonder that this can easily lead to situations in which the orb infinitely bounces from the ground.
At least fangame authors now have a reference of how ZUN did it originally, because really, this bad approximation of physics had to have been written that way on purpose. But hey, it uses 64-bit floating-point variables! :onricdennat:

…sometimes at least, and quite randomly. This was also where I had to learn about Turbo C++'s floating-point code generation, and how rigorously it defines the order of instructions when mixing double and float variables in arithmetic or conditional expressions. This meant that I could only get ZUN's original instruction order by using literal constants instead of variables, which is impossible right now without somehow splitting the data segment. In the end, I had to resort to spelling out ⅔ of one function, and one conditional branch of another, in inline ASM. 😕 If ZUN had just written 16.0 instead of 16.0f there, I would have saved quite some hours of my life trying to decompile this correctly…

To sort of make up for the slowdown in progress, here's the TH01 orb physics debug mod I made to properly understand them: 2020-06-13-TH01OrbPhysicsDebug.zip To use it, simply replace REIIDEN.EXE, and run the game in debug mode, via game d on the DOS prompt.
Its code might also serve as an example of how to achieve this sort of thing without position independence.


Alright, now it's time for player shots though. Yeah, sure, they don't move horizontally, so it's not too bad that those are also always rendered at byte-aligned positions. But, uh… why does this code only use the 16×16 alpha-masked unblitting function for decaying shots, and just sloppily unblits an entire 16×16 square everywhere else?

The worst part though: Unblitting, moving, and rendering player shots is done in a single function, in that order. And that's exactly where TH01's sprite flickering comes from. Since different types of sprites are free to overlap each other, you'd have to first unblit all types, then move all types, and then render all types, as done in later PC-98 Touhou games. If you do these three steps per-type instead, you will unblit sprites of other types that have been rendered before… and therefore end up with flicker.
Oh, and finally, ZUN also added an additional sloppy 16×16 square unblit call if a shot collides with a pellet or a boss, for some guaranteed flicker. Sigh.


And that's ⅓ of all ZUN code in TH01 decompiled! Next up: Pellets!

📝 Posted:
🚚 Summary of:
P0086, P0087
Commits:
54ee99b...24b96cd, 24b96cd...97ce7b7
💰 Funded by:
[Anonymous], Blue Bolt, -Tom-
🏷 Tags:
rec98+ th02+ th04+ th05+ gameplay+ animation+ hud+ blitting- boss+ yuuka-6+

Alright, the score popup numbers shown when collecting items or defeating (mid)bosses. The second-to-last remaining big entity type in TH05… with quite some PI false positives in the memory range occupied by its data. Good thing I still got some outstanding generic RE pushes that haven't been claimed for anything more specific in over a month! These conveniently allowed me to RE most of these functions right away, the right way.

Most of the false positives were boss HP values, passed to a "boss phase end" function which sets the HP value at which the next phase should end. Stage 6 Yuuka, Mugetsu, and EX-Alice have their own copies of this function, in which they also reset certain boss-specific global variables. Since I always like to cover all varieties of such duplicated functions at once, it made sense to reverse-engineer all the involved variables while I was at it… and that's why this was exactly the right time to cover the implementation details of Stage 6 Yuuka's parasol and vanishing animations in TH04. :zunpet:

With still a bit of time left in that RE push afterwards, I could also start looking into some of the smaller functions that didn't quite fit into other pushes. The most notable one there was a simple function that aims from any point to the current player position. Which actually only became a separate function in TH05, probably since it's called 27 times in total. That's 27 places no longer being blocked from further RE progress.

WindowsTiger already did most of the work for the score popup numbers in January, which meant that I only had to review it and bring it up to ReC98's current coding styles and standards. This one turned out to be one of those rare features whose TH05 implementation is significantly less insane than the TH04 one. Both games lazily redraw only the tiles of the stage background that were drawn over in the previous frame, and try their best to minimize the amount of tiles to be redrawn in this way. For these popup numbers, this involves calculating the on-screen width, based on the exact number of digits in the point value. TH04 calculates this width every frame during the rendering function, and even resorts to setting that field through the digit iteration pointer via self-modifying code… yup. TH05, on the other hand, simply calculates the width once when spawning a new popup number, during the conversion of the point value to binary-coded decimal. The "×2" multiplier suffix being removed in TH05 certainly also helped in simplifying that feature in this game.

And that's ⅓ of TH05 reverse-engineered! Next up, one more TH05 PI push, in which the stage enemies hopefully finish all the big entity types. Maybe it will also be accompanied by another RE push? In any case, that will be the last piece of TH05 progress for quite some time. The next TH01 stretch will consist of 6 pushes at the very least, and I currently have no idea of how much time I can spend on ReC98 a month from now…

📝 Posted:
🚚 Summary of:
P0085
Commits:
110d6dd...54ee99b
💰 Funded by:
-Tom-
🏷 Tags:
rec98+ th02+ th04+ th05+ blitting- animation+ waste+ good-code+

Wait, PI for FUUIN.EXE is mainly blocked by the high score menu? That one should really be properly decompiled in a separate RE push, since it's also present in largely identical form in REIIDEN.EXE… but I currently lack the explicit funding to do that.

And as it turns out, I shouldn't really capture any of the existing generic RE contributions for it either. Back in 2018 when I ran the crowdfunding on the Touhou Patch Center Discord server, I said that generic RE contributions would never go towards TH01. No one was interested in that game back then, and as it's significantly different from all the other games, it made sense to only cover it if explicitly requested.
As Touhou Patch Center still remains one of the biggest supporters and advertisers for ReC98, someone recently believed that this rule was still in effect, despite not being mentioned anywhere on this website.

Fast forward to today, and TH01 has become the single most supported game lately, with plenty of incomplete pushes still open to be completed. Reverse-engineering it has proven to be quite efficient, yielding lots of completion percentage points per push. This, I suppose, is exactly what backers that don't give any specific priorities are mainly interested in. Therefore, I will allocate future partial contributions to TH01, whenever it makes sense.

So, instead of rushing TH01 PI, let's wait for Ember2528's April subscription, and get the 25% total RE milestone with some TH05 PI progress instead. This one primarily focused on the gather circles (spirals…?), the third-last missing entity type in TH05. These are rendered using the same 8×8 pellet sprite introduced in TH02… except that the actual pellets received a darkened bottom part in TH04 . Which, in turn, is actually rendered quite efficiently – the games first render the top white part of all pellets, followed by the bottom gray part of all pellets. The PC-98 GRCG is used throughout the process, doing its typical job of accelerating monochrome blitting, and by arranging the rendering like this, only two GRCG color changes are required to draw any number of pellets. I guess that makes it quite a worthwhile optimization? Don't ask me for specific performance numbers or even saved cycles, though :onricdennat:

Next up, one more TH05 PI push!

📝 Posted:
🚚 Summary of:
P0081
Commits:
0252da2...5ac9b30
💰 Funded by:
Ember2528
🏷 Tags:
rec98+ th01+ file-format+ blitting- boss+ konngara+ waste+ contribution-ideas+

Sadly, we've already reached the end of fast triple-speed TH01 progress with 📝 the last push, which decompiled the last segment shared by all three of TH01's executables. There's still a bit of double-speed progress left though, with a small number of code segments that are shared between just two of the three executables.

At the end of the first one of these, we've got all the code for the .GRZ format – which is yet another run-length encoded image format, but this time storing up to 16 full 640×400 16-color images with an alpha bit. This one is exclusively used to wastefully store Konngara's sword slash and kuji-in kill animations. Due to… suboptimal code organization, the code for the format is also present in OP.EXE, despite not being used there. But hey, that brings TH01 to over 20% in RE!

Decoupling the RLE command stream from the pixel data sounds like a nice idea at first, allowing the format to efficiently encode a variety of animation frames displayed all over the screen… if ZUN actually made use of it. The RLE stream also has quite some ridiculous overhead, starting with 1 byte to store the 1-bit command (putting a single 8×1 pixel block, or entering a run of N such blocks). Run commands then store another 1-byte run length, which has to be followed by another command byte to identify the run as putting N blocks, or skipping N blocks. And the pixel data is just a sequence of these blocks for all 4 bitplanes, in uncompressed form…

Also, have some rips of all the images this format is used for:

To make these, I just wrote a small viewer, calling the same decompiled TH01 code: 2020-03-07-grzview.zip Obviously, this means that it not only must to be run on a PC-98, but also discards the alpha information. If any backers are really interested in having a proper converter to and from PNG, I can implement that in an upcoming push… although that would be the perfect thing for outside contributors to do.

Next up, we got some code for the PI format… oh, wait, the actual files are called "GRP" in TH01.

📝 Posted:
🚚 Summary of:
P0072, P0073, P0074, P0075
Commits:
4bb04ab...cea3ea6, cea3ea6...5286417, 5286417...1807906, 1807906...222fc99
💰 Funded by:
[Anonymous], -Tom-, Myles
🏷 Tags:
rec98+ th04+ th05+ gameplay+ blitting- bullet+ micro-optimization+ uth05win+

Long time no see! And this is exactly why I've been procrastinating bullets while there was still meaningful progress to be had in other parts of TH04 and TH05: There was bound to be quite some complexity in this most central piece of game logic, and so I couldn't possibly get to a satisfying understanding in just one push.

Or in two, because their rendering involves another bunch of micro-optimized functions adapted from master.lib.

Or in three, because we'd like to actually name all the bullet sprites, since there are a number of sprite ID-related conditional branches. And so, I was refining things I supposedly RE'd in the the commits from the first push until the very end of the fourth.

When we talk about "bullets" in TH04 and TH05, we mean just two things: the white 8×8 pellets, with a cap of 240 in TH04 and 180 in TH05, and any 16×16 sprites from MIKO16.BFT, with a cap of 200 in TH04 and 220 in TH05. These are by far the most common types of… err, "things the player can collide with", and so ZUN provides a whole bunch of pre-made motion, animation, and n-way spread / ring / stack group options for those, which can be selected by simply setting a few fields in the bullet template. All the other "non-bullets" have to be fired and controlled individually.

Which is nothing new, since uth05win covered this part pretty accurately – I don't think anyone could just make up these structure member overloads. The interesting insights here all come from applying this research to TH04, and figuring out its differences compared to TH05. The most notable one there is in the default groups: TH05 allows you to add a stack to any single bullet, n-way spread or ring, but TH04 only lets you create stacks separately from n-way spreads and rings, and thus gets by with fewer fields in its bullet template structure. On the other hand, TH04 has a separate "n-way spread with random angles, yet still aimed at the player" group? Which seems to be unused, at least as far as midbosses and bosses are concerned; can't say anything about stage enemies yet.

In fact, TH05's larger bullet template structure illustrates that these distinct group types actually are a rather redundant piece of over-engineering. You can perfectly indicate any permutation of the basic groups through just the stack bullet count (1 = no stack), spread bullet count (1 = no spread), and spread delta angle (0 = ring instead of spread). Add a 4-flag bitfield to cover the rest (aim to player, randomize angle, randomize speed, force single bullet regardless of difficulty or rank), and the result would be less redundant and even slightly more capable.

Even those 4 pushes didn't quite finish all of the bullet-related types, stopping just shy of the most trivial and consistent enum that defines special movement. This also left us in a 📝 TH03-like situation, in which we're still a bit away from actually converting all this research into actual RE%. Oh well, at least this got us way past 50% in overall position independence. On to the second half! 🎉

For the next push though, we'll first have a quick detour to the remaining C code of all the ZUN.COM binaries. Now that the 📝 TH04 and TH05 resident structures no longer block those, -Tom- has requested TH05's RES_KSO.COM to be covered in one of his outstanding pushes. And since 32th System recently RE'd TH03's resident structure, it makes sense to also review and merge that, before decompiling all three remaining RES_*.COM binaries in hopefully a single push. It might even get done faster than that, in which case I'll then review and merge some more of WindowsTiger's research.

📝 Posted:
🚚 Summary of:
P0067, P0068, P0069
Commits:
e55a48b...ebb30ce, ebb30ce...2ac00d4, e0d0dcd...0f18dbc
💰 Funded by:
Splashman, Yanga, [Anonymous]
🏷 Tags:
rec98+ th01+ pc98+ blitting- glitch+ boss+ mima-th01+ yuugenmagan+

Now that's more like the speed I was expecting! After a few more unused functions for palette fading and rectangle blitting, we've reached the big line drawing functions. And the biggest one among them, drawing a straight line at any angle between two points using Bresenham's algorithm, actually happens to be the single longest function present in more than one binary in all of PC-98 Touhou, and #23 on the list of individual longest functions.

And it technically has a ZUN bug! If you pass a point outside the (0, 0) - (639, 399) screen range, the function will calculate a new point at the edge of the screen, so that the resulting line will retain the angle intended by the points given. Except that it does so by calculating the line slope using an integer division rather than a floating-point one :zunpet: Doesn't seem like it actually causes any weirdly skewed lines to be drawn in-game, though; that case is only hit in the Mima boss fight, which draws a few lines with a bottom coordinate of 400 rather than the maximum of 399. It might also cause the wrong background pixels to be restored during parts of the YuugenMagan fight, leading to flickering sprites, but seriously, that's an issue everywhere you look in this game.

Together with the rendering-text-to-VRAM function we've mostly already known from TH02, this pushed the total RE percentage well over 20%, and almost doubled the TH01 RE percentage, all within three pushes. And comparatively, it went really smoothly, to the point (ha) where I even had enough time left to also include the single-point functions that come next in that code segment. Since about half of the remaining functions in OP.EXE are present in more than just itself, I'll be able to at least keep up this speed until OP.EXE hits the 70% RE mark. That is, as long as the backers' priorities continue to be generic RE or "giving some love to TH01"… we don't have a precedent for TH01's actual game code yet.

And that's all the TH01 progress funded for January! Next up, we actually do have a focus on TH03's game and scoring mechanics… or at least the foundation for that.

📝 Posted:
🚚 Summary of:
P0060
Commits:
29385dd...73f5ae7
💰 Funded by:
Touhou Patch Center
🏷 Tags:
rec98+ th02+ th03+ th04+ th05+ blitting- micro-optimization+ waste+

So, where to start? Well, TH04 bullets are hard, so let's procrastinate start with TH03 instead :tannedcirno: The 📝 sprite display functions are the obvious blocker for any structure describing a sprite, and therefore most meaningful PI gains in that game… and I actually did manage to fit a decompilation of those three functions into exactly the amount of time that the Touhou Patch Center community votes alloted to TH03 reverse-engineering!

And a pretty amazing one at that. The original code was so obviously written in ASM and was just barely decompilable by exclusively using register pseudovariables and a bit of goto, but I was able to abstract most of that away, not least thanks to a few helpful optimization properties of Turbo C++… seriously, I can't stop marveling at this ancient compiler. The end result is both readable, clear, and dare I say portable?! To anyone interested in porting TH03, take a look. How painful would it be to port that away from 16-bit x86?

However, this push is also a typical example that the RE/PI priorities can only control what I look at, and the outcome can actually differ greatly. Even though the priorities were 65% RE and 35% PI, the progress outcome was +0.13% RE and +1.35% PI. But hey, we've got one more push with a focus on TH03 PI, so maybe that one will include more RE than PI, and then everything will end up just as ordered? :onricdennat:

📝 Posted:
🚚 Summary of:
P0029, P0030
Commits:
6ff427a...c7fc4ca, c7fc4ca...dea40ad
💰 Funded by:
zorg
🏷 Tags:
rec98+ th02+ th04+ th05+ blitting- gameplay+ midboss+ boss+ tcc+

Here we go, new C code! …eh, it will still take a bit to really get decompilation going at the speeds I was hoping for. Especially with the sheer amount of stuff that is set in the first few significant functions we actually can decompile, which now all has to be correctly declared in the C world. Turns out I spent the last 2 years screwing up the case of exported functions, and even some of their names, so that it didn't actually reflect their calling convention… yup. That's just the stuff you tend to forget while it doesn't matter.

To make up for that, I decided to research whether we can make use of some C++ features to improve code readability after all. Previously, it seemed that TH01 was the only game that included any C++ code, whereas TH02 and later seemed to be 100% C and ASM. However, during the development of the soon to be released new build system, I noticed that even this old compiler from the mid-90's, infamous for prioritizing compile speeds over all but the most trivial optimizations, was capable of quite surprising levels of automatic inlining with class methods…

…leading the research to culminate in the mindblow that is 9d121c7 – yes, we can use C++ class methods and operator overloading to make the code more readable, while still generating the same code than if we had just used C and preprocessor macros.

Looks like there's now the potential for a few pull requests from outside devs that apply C++ features to improve the legibility of previously decompiled and terribly macro-ridden code. So, if anyone wants to help without spending money…

📝 Posted:
🚚 Summary of:
P0043, P0044, P0045
Commits:
261d503...612beb8
💰 Funded by:
-Tom-
🏷 Tags:
rec98+ th02+ th03+ th04+ th05+ pc98+ blitting- uth05win+

Turns out I had only been about half done with the drawing routines. The rest was all related to redrawing the scrolling stage backgrounds after other sprites were drawn on top. Since the PC-98 does have hardware-accelerated scrolling, but no hardware-accelerated sprites, everything that draws animated sprites into a scrolling VRAM must then also make sure that the background tiles covered by the sprite are redrawn in the next frame, which required a bit of ZUN code. And that are the functions that have been in the way of the expected rapid reverse-engineering progress that uth05win was supposed to bring. So, looks like everything's going to go really fast now?

📝 Posted:
🚚 Summary of:
P0025, P0026, P0027
Commits:
0cde4b7...261d503
💰 Funded by:
zorg
🏷 Tags:
rec98+ th02+ th03+ th04+ th05+ pc98+ blitting- file-format+ mod+

… yeah, no, we won't get very far without figuring out these drawing routines.
Which process data that comes from the .STD files. Which has various arrays related to the background… including one to specify the scrolling speed. And wait, setting that to 0 actually is what starts a boss battle?

So, have a TH05 Boss Rush patch: 2018-12-26-TH05BossRush.zip Theoretically, this should have also worked for TH04, but for some reason, the Stage 3 boss gets stuck on the first phase if we do this?

Here's the diff for the Boss Rush. Turning it into a thcrap-style Skipgame patch is left as an exercise for the reader.