⮜ Blog

⮜ List of tags

Showing all posts tagged
,
and

📝 Posted:
🚚 Summary of:
P0182, P0183
Commits:
313450f...1e2c7ad, 1e2c7ad...f9d983e
💰 Funded by:
Lmocinemod, [Anonymous], Yanga
🏷 Tags:

Been 📝 a while since we last looked at any of TH03's game code! But before that, we need to talk about Y coordinates.

During TH03's MAIN.EXE, the PC-98 graphics GDC runs in its line-doubled 640×200 resolution, which gives the in-game portion its distinctive stretched low-res look. This lower resolution is a consequence of using 📝 Promisence Soft's SPRITE16 driver: Its performance simply stems from the fact that it expects sprites to be stored in the bottom half of VRAM, which allows them to be blitted using the same EGC-accelerated VRAM-to-VRAM copies we've seen again and again in all other games. Reducing the visible resolution also means that the sprites can be stored on both VRAM pages, allowing the game to still be double-buffered. If you force the graphics chip to run at 640×400, you can see them:

The full VRAM contents during TH03's in-game portion, as seen when forcing the system into a 640×400 resolution.
TH03's VRAM at regular line-doubled 640×200 resolutionTH03's VRAM at full 640×400 resolution, including the SPRITE16 sprite areaTH03's text layer during an in-game round.

Note that the text chip still displays its overlaid contents at 640×400, which means that TH03's in-game portion technically runs at two resolutions at the same time.

But that means that any mention of a Y coordinate is ambiguous: Does it refer to undoubled VRAM pixels, or on-screen stretched pixels? Especially people who have known about the line doubling for years might almost expect technical blog posts on this game to use undoubled VRAM coordinates. So, let's introduce a new formatting convention for both on-screen 640×400 and undoubled 640×200 coordinates, and always write out both to minimize the confusion.


Alright, now what's the thing gonna be? The enemy structure is highly overloaded, being used for enemies, fireballs, and explosions with seemingly different semantics for each. Maybe a bit too much to be figured out in what should ideally be a single push, especially with all the functions that would need to be decompiled? Bullet code would be easier, but not exactly single-push material either. As it turns out though, there's something more fundamental left to be done first, which both of these subsystems depend on: collision detection!

And it's implemented exactly how I always naively imagined collision detection to be implemented in a fixed-resolution 2D bullet hell game with small hitboxes: By keeping a separate 1bpp bitmap of both playfields in memory, drawing in the collidable regions of all entities on every frame, and then checking whether any pixels at the current location of the player's hitbox are set to 1. It's probably not done in the other games because their single data segment was already too packed for the necessary 17,664 bytes to store such a bitmap at pixel resolution, and 282,624 bytes for a bitmap at Q12.4 subpixel resolution would have been prohibitively expensive in 16-bit Real Mode DOS anyway. In TH03, on the other hand, this bitmap is doubly useful, as the AI also uses it to elegantly learn what's on the playfield. By halving the resolution and only tracking tiles of 2×2 / 2×1 pixels, TH03 only requires an adequate total of 6,624 bytes of memory for the collision bitmaps of both playfields.

So how did the implementation not earn the good-code tag this time? Because the code for drawing into these bitmaps is undecompilable hand-written x86 assembly. :zunpet: And not just your usual ASM that was basically compiled from C and then edited to maybe optimize register allocation and maybe replace a bunch of local variables with self-modifying code, oh no. This code is full of overly clever bit twiddling, abusing the fact that the 16-bit AX, BX, CX, and DX registers can also be accessed as two 8-bit registers, calculations that change the semantic meaning behind the value of a register, or just straight-up reassignments of different values to the same small set of registers. Sure, in some way it is impressive, and it all does work and correctly covers every edge case, but come on. This could have all been a lot more readable in exchange for just a few CPU cycles.

What's most interesting though are the actual shapes that these functions draw into the collision bitmap. On the surface, we have:

  1. vertical slopes at any angle across the whole playfield; exclusively used for Chiyuri's diagonal laser EX attack
  2. straight vertical lines, with a width of 1 tile; exclusively used for the 2×2 / 2×1 hitboxes of bullets
  3. rectangles at arbitrary sizes

But only 2) actually draws a full solid line. 1) and 3) are only ever drawn as horizontal stripes, with a hardcoded distance of 2 vertical tiles between every stripe of a slope, and 4 vertical tiles between every stripe of a rectangle. That's 66-75% of each rectangular entity's intended hitbox not actually taking part in collision detection. Now, if player hitboxes were ≤ 6 / 3 pixels, we'd have one possible explanation of how the AI can "cheat", because it could just precisely move through those blank regions at TAS speeds. So, let's make this two pushes after all and tell the complete story, since this is one of the more interesting aspects to still be documented in this game.


And the code only gets worse. :godzun: While the player collision detection function is decompilable, it might as well not have been, because it's just more of the same "optimized", hard-to-follow assembly. With the four splittable 16-bit registers having a total of 20 different meanings in this function, I would have almost preferred self-modifying code…

In fact, it was so bad that it prompted some maintenance work on my inline assembly coding standards as a whole. Turns out that the _asm keyword is not only still supported in modern Visual Studio compilers, but also in Clang with the -fms-extensions flag, and compiles fine there even for 64-bit targets. While that might sound like amazing news at first ("awesome, no need to rewrite this stuff for my x86_64 Linux port!"), you quickly realize that almost all inline assembly in this codebase assumes either PC-98 hardware, segmented 16-bit memory addressing, or is a temporary hack that will be removed with further RE progress.
That's mainly because most of the raw arithmetic code uses Turbo C++'s register pseudovariables where possible. While they certainly have their drawbacks, being a non-standard extension that's not supported in other x86-targeting C compilers, their advantages are quite significant: They allow this code to stay in the same language, and provide slightly more immediate portability to any other architecture, together with 📝 readability and maintainability improvements that can get quite significant when combined with inlining:

// This one line compiles to five ASM instructions, which would need to be
// spelled out in any C compiler that doesn't support register pseudovariables.
// By adding typed aliases for these registers via `#define`, this code can be
// both made even more readable, and be prepared for an easier transformation
// into more portable local variables.
_ES = (((_AX * 4) + _BX) + SEG_PLANE_B);

However, register pseudovariables might cause potential portability issues as soon as they are mixed with inline assembly instructions that rely on their state. The lazy way of "supporting pseudo-registers" in other compilers would involve declaring the full set as global variables, which would immediately break every one of those instances:

_DI = 0;
_AX = 0xFFFF;

// Special x86 instruction doing the equivalent of
//
// 	*reinterpret_cast(MK_FP(_ES, _DI)) = _AX;
// 	_DI += sizeof(uint16_t);
//
// Only generated by Turbo C++ in very specific cases, and therefore only
// reliably available through inline assembly.
asm { movsw; }

What's also not all too standardized, though, are certain variants of the asm keyword. That's why I've now introduced a distinction between the _asm keyword for "decently sane" inline assembly, and the slightly less standard asm keyword for inline assembly that relies on the contents of pseudo-registers, and should break on compilers that don't support them.
So yeah, have some minor portability work in exchange for these two pushes not having all that much in RE'd content.

With that out of the way and the function deciphered, we can confirm the player hitboxes to be a constant 8×8 / 8×4 pixels, and prove that the hit stripes are nothing but an adequate optimization that doesn't affect gameplay in any way.


And what's the obvious thing to immediately do if you have both the collision bitmap and the player hitbox? Writing a "real hitbox" mod, of course:

  1. Reorder the calls to rendering functions so that player and shot sprites are rendered after bullets
  2. Blank out all player sprite pixels outside an 8×8 / 8×4 box around the center point
  3. After the bullet rendering function, turn on the GRCG in RMW mode and set the tile register set to the background color
  4. Stretch the negated contents of collision bitmap onto each playfield, leaving only collidable pixels untouched
  5. Do the same with the actual, non-negated contents and a white color, for extra contrast against the background. This also makes sure to show any collidable areas whose sprite pixels are transparent, such as with the moon enemy. (Yeah, how unfair.) Doing that also loses a lot of information about the playfield, such as enemy HP indicated by their color, but what can you do:
A decently busy TH03 in-game frame.The underlying content of the collision bitmap, showing off all three different shapes together with the player hitboxes.
A decently busy TH03 in-game frame and its underlying collision bitmap, showing off all three different collision shapes together with the player hitboxes.

2022-02-18-TH03-real-hitbox.zip The secret for writing such mods before having reached a sufficient level of position independence? Put your new code segment into DGROUP, past the end of the uninitialized data section. That's why this modded MAIN.EXE is a lot larger than you would expect from the raw amount of new code: The file now actually needs to store all these uninitialized 0 bytes between the end of the data segment and the first instruction of the mod code – normally, this number is simply a part of the MZ EXE header, and doesn't need to be redundantly stored on disk. Check the th03_real_hitbox branch for the code.

And now we know why so many "real hitbox" mods for the Windows Touhou games are inaccurate: The games would simply be unplayable otherwise – or can you dodge rapidly moving 2×2 / 2×1 blocks as an 8×8 / 8×4 rectangle that is smaller than your shot sprites, especially without focused movement? I can't. :tannedcirno: Maybe it will feel more playable after making explosions visible, but that would need more RE groundwork first.
It's also interesting how adding two full GRCG-accelerated redraws of both playfields per frame doesn't significantly drop the game's frame rate – so why did the drawing functions have to be micro-optimized again? It would be possible in one pass by using the GRCG's TDW mode, which should theoretically be 8× faster, but I have to stop somewhere. :onricdennat:

Next up: The final missing piece of TH04's and TH05's bullet-moving code, which will include a certain other type of projectile as well.

📝 Posted:
🚚 Summary of:
P0071
Commits:
327021b...4bb04ab
💰 Funded by:
KirbyComment, -Tom-
🏷 Tags:

Turns out that covering TH03's 128-byte player structure was way more insightful than expected! And while it doesn't include every bit of per-player data, we still got to know quite a bit about the game from just trying to name its members:

The next TH03 pushes can now cover all the functions that reference this structure in one way or another, and actually commit all this research and translate it into some RE%. Since the non-TH05 priorities have become a bit unclear after the last 50 € RE contribution though (as of this writing, it's still 10 € to decide on what game to cover in two RE pushes!), I'll be returning to TH05 until that's decided.

📝 Posted:
🚚 Summary of:
P0061
Commits:
96684f4...5f4f5d8
💰 Funded by:
Touhou Patch Center
🏷 Tags:

… nope, with a game whose MAIN.EXE is still just 5% reverse-engineered and which naturally makes heavy use of structures, there's still a lot more PI groundwork to be done before RE progress can speed up to the levels that we've now reached with TH05. The good news is that this game is (now) way easier to understand: In contrast to TH04 and TH05, where we needed to work towards player shots over a two-digit number of pushes, TH03 only needed two for SPRITE16, and a half one for the playfield shaking mechanism. After that, I could even already decompile the per-frame shot update and render functions, thanks to TH03's high number of code segments. Now, even the big 128-byte player structure doesn't seem all too far off.

Then again, as TH03 shares no code with any other game, this actually was a completely average PI push. For the remaining three, we'll return to TH04 and TH05 though, which should more than make up for the slight drop in RE speed after this one.

In other news, we've now also reached peak C++, with the introduction of templates! TH03 stores movement speeds in a 4.4 fixed-point format, which is an 8-bit spin on the usual 16-bit, 12.4 fixed-point format.