⮜ Blog

⮜ List of tags

Showing all posts tagged
,
and

📝 Posted:
🚚 Summary of:
P0203, P0204
Commits:
4568bf7...86cdf5f, 86cdf5f...0c682b5
💰 Funded by:
GhostRiderCog, [Anonymous], Yanga
🏷 Tags:

Let's start right with the milestones:


So, how did this card-flipping stage obstacle delivery get so horribly delayed? With all the different layouts showcased in the 28 card-flipping stages, you'd expect this to be among the more stable and bug-free parts of the codebase. Heck, with all stage objects being placed on a 32×32-pixel grid, this is the first TH01-related blog post this year that doesn't have to describe an alignment-related unblitting glitch!

That alone doesn't mean that this code is free from quirky behavior though, and we have to look no further than the first few lines of the collision handling for round bumpers to already find a whole lot of that. Simplified, they do the following:

pixel_t delta_y_between_orb_and_bumper = (orb.top - bumper.top);
if(delta_y_between_orb_and_bumper <= 0) {
	orb.top = (bumper.top - 24);
} else {
	orb.top = (bumper.top + 24);
}

Immediately, you wonder why these assignments only exist for the Y coordinate. Sure, hitting a bumper from the left or right side should happen less often, but it's definitely possible. Is it really a good idea to warp the Orb to the top or bottom edge of a bumper regardless?
What's more important though: The fact that these immediate assignments exist at all. The game's regular Orb physics work by producing a Y velocity from the single force acting on the Orb and a gravity factor, and are completely independent of its current Y position. A bumper collision does also apply a new force onto the Orb further down in the code, but these assignments still bypass the physics system and are bound to have some knock-on effect on the Orb's movement.

To observe that effect, we just have to enter Stage 18 on the 地獄/Jigoku route, where it's particularly trivial to reproduce. At a 📝 horizontal velocity of ±4, these assignments are exactly what can cause the Orb to endlessly bounce between two bumpers. As rudimentary as the Orb's physics may be, just letting them do their work would have entirely prevented these loops:

One of at least three infinite bumper loop constellations within just this 10×5-tile section of TH01's Stage 18 on the 地獄/Jigoku route. With an effective 56 horizontal pixels between both hitboxes, the Orb would have to travel an absolute Y distance of at least 16 vertical pixels within (56 / 4) = 14 frames to escape the other bumper's hitbox. If the initial bounce reduces the Orb's Y velocity far enough for it to not manage that distance the first time, it will never reach the necessary speed again. In this loop, the bounce-off force even stabilizes, though this doesn't have to happen. The blue areas indicate the pixel-perfect* hitboxes of each bumper.
TH01 bumper collision handling without ZUN's manual assignment of the Y coordinate. The Orb still bounces back and forth between two bumpers for a while, but its top position always follows naturally from its Y velocity and the force applied to it, and gravity wins out in the end. The blue areas indicate the pixel-perfect* hitboxes of each bumper.

Now, you might be thinking that these Y assignments were just an attempt to prevent the Orb from colliding with the same bumper again on the next frame. After all, those 24 pixels exactly correspond to ⅓ of the height of a bumper's hitbox with an additional pixel added on top. However, the game already perfectly prevents repeated collisions by turning off collision testing with the same bumper for the next 7 frames after a collision. Thus, we can conclude that ZUN either explicitly coded bumper collision handling to facilitate these loops, or just didn't take out that code after inevitably discovering what it did. This is not janky code, it's not a glitch, it's not sarcasm from my end, and it's not the game's physics being bad.

But wait. Couldn't these assignments just be a remnant from a time in development before ZUN decided on the 7-frame delay on further collisions? Well, even that explanation stops holding water after the next few lines of code. Simplified, again:

pixel_t delta_x_between_orb_and_bumper = (orb.left - bumper.left);
if((orb.velocity.x == +4) && (delta_x_between_orb_and_bumper < 0)) {
	orb.velocity.x = -4;
} else if((orb.velocity.x == -4) && (delta_x_between_orb_and_bumper > 0)) {
	orb.velocity.x = +4;
}

What's important here is the part that's not in the code – namely, anything that handles X velocities of -8 or +8. In those cases, the Orb simply continues in the same horizontal direction. The manual Y assignment is the only part of the code that actually prevents a collision there, as the newly applied force is not guaranteed to be enough:

An infinite loop across three bumpers, made possible by the edge of the playfield and bumper bars on opposite sides, an unchanged horizontal direction, and the Y assignments neatly placing the Orb on either the top or bottom side of a bumper. The alternating sign of the force further ensures that the Orb will travel upwards half the time, canceling out gravity during the short time between two hitboxes.
With the unchanged horizontal direction and the Y assignments removed, nothing keeps an Orb at ±8 pixels per frame from flying into/over a bumper. The collision force pushes the Orb slightly, but not enough to truly matter. The final force sends the Orb on a significant downward trajectory beyond the next bumper's hitbox, breaking the original loop.

Forgetting to handle ⅖ of your discrete X velocity cases is simply not something you do by accident. So we might as well say that ZUN deliberately designed the game to behave exactly as it does in this regard.


Bumpers also come in vertical or horizontal bar shapes. Their collision handling also turns off further collision testing for the next 7 frames, and doesn't do any manual coordinate assignment. That's definitely a step up in cleanliness from round bumpers, but it doesn't seem to keep in mind that the player can fire a new shot every 4 frames when standing still. That makes it immediately obvious why this works:

The green numbers show the amount of frames since the last detected collision with the respective bumper bar, and indicate that collision testing with the bar below is currently disabled.

That's the most well-known case of reducing the Orb's horizontal velocity to 0 by exactly hitting it with shots in its center and then button-mashing it through a horizontal bar. This also works with vertical bars and yields even more interesting results there, but if we want to have any chance of understanding what happens there, we have to first go over some basics:

However, if that were everything the game did, kicking the Orb into a column of vertical bumper bars would lead them to behave more like a rope that the Orb can climb, as the initial collision with two hitboxes cancels out the intended sign change that reflects the Orb away from the bars:

This footage was recorded without the workaround I am about to describe. It does not reflect the behavior of the original game. You cannot do this in the original game.
While the visualization reveals small sections where three hitboxes overlap, the Orb can never actually collide with three of them at the same time, as those 3-hitbox regions are 2 pixels smaller than they would need to be to fit the Orb. That's exactly the difference between using < rather than <= in these hitbox comparisons.

While that would have been a fun gameplay mechanic on its own, it immediately breaks apart once you place two vertical bumper bars next to each other. Due to how these bumper bar hitboxes extend past their sprites, any two adjacent vertical bars will end up with the exact same hitbox in absolute screen coordinates. Stage 17 on the 魔界/Makai route contains exactly such a layout:

The collision handlers of adjacent vertical bars always activate in the same frame, independently invert the Orb's X velocity, and therefore fully cancel out their intended effect on the Orb… if the game did not have the workaround I am about to describe. This cannot happen in the original game.

ZUN's workaround: Setting a "vertical bumper bar block flag" after any collision with such a bar, which simply disables any collision with any vertical bar for the next 7 frames. This quick hack made all vertical bars work as intended, and avoided the need for involving the Orb's X velocity in any kind of physics system. :zunpet:


Edit (2022-07-12): This flag only works around glitches that would be caused by simultaneously colliding with more than one vertical bar. The actual response to a bumper bar collision still remains unaffected, and is very naive:

These conditions are only correct if the Orb comes in at an angle roughly between 45° and 135° on either side of a bar. If it's anywhere close to 0° or 180°, this response will be incorrect, and send the Orb straight through the bar. Since the large hitboxes make this easily possible, you can still get the Orb to climb a vertical column, or glide along a horizontal row:

Here's the hitbox overlay for 地獄/Jigoku Stage 19, and here's an updated version of the 📝 Orb physics debug mod that now also shows bumper bar collision frame numbers: 2022-07-10-TH01OrbPhysicsDebug.zip See the th01_orb_debug branch for the code. To use it, simply replace REIIDEN.EXE, and run the game in debug mode, via game d on the DOS prompt. If you encounter a gameplay situation that doesn't seem to be covered by this blog post, you can now verify it for yourself. Thanks to touhou-memories for bringing these issues to my attention! That definitely was a glaring omission from the initial version of this blog post.


With that clarified, we can now try mashing the Orb into these two vertical bars:

At first, that workaround doesn't seem to make a difference here. As we expect, the frame numbers now tell us that only one of the two bumper bars in a row activates, but we couldn't have told otherwise as the number of bars has no effect on newly applied Y velocity forces. On a closer look, the Orb's rise to the top of the playfield is in fact caused by that workaround though, combined with the unchanged top-to-bottom order of collision testing. As soon as any bumper bar completed its 7 collision delay frames, it resets the aforementioned flag, which already reactivates collision handling for any remaining vertical bumper bars during the same frame. Look out for frames with both a 7 and a 1, like the one marked in the video above: The 7 will always appear before the 1 in the row-major order. Whenever this happens, the current oscillation period is cut down from 7 to 6 frames – and because collision testing runs from top to bottom, this will always happen during the falling part. Depending on the Y velocity, the rising part may also be cut down to 6 frames from time to time, but that one at least has a chance to last for the full 7 frames. This difference adds those crucial extra frames of upward movement, which add up to send the Orb to the top. Without the flag, you'd always see the Orb oscillating between a fixed range of the bar column.
Finally, it's the "top of playfield" force that gradually slows down the Orb and makes sure it ultimately only moves at sub-pixel velocities, which have no visible effect. Because 📝 the regular effect of gravity is reset with each newly applied force, it's completely negated during most of the climb. This even holds true once the Orb reached the top: Since the Orb requires a negative force to repeatedly arrive up there and be bounced back, this force will stay active for the first 5 of the 7 collision frames and not move the Orb at all. Once gravity kicks in at the 5th frame and adds 1 to the Y velocity, it's already too late: The new velocity can't be larger than 0.5, and the Orb only has 1 or 2 frames before the flag reset causes it to be bounced back up to the top again.


Portals, on the other hand, turn out to be much simpler than the old description that ended up on Touhou Wiki in October 2005 might suggest. Everything about their teleportations is random: The destination portal, the exit force (as an integer between -9 and +9), as well as the exit X velocity, with each of the 📝 5 distinct horizontal velocities having an equal chance of being chosen. Of course, if the destination portal is next to the left or right edge of the playfield and it chooses to fire the Orb towards that edge, it immediately bounces off into the opposite direction, whereas the 0 velocity is always selected with a constant 20% probability.

The selection process for the destination portal involves a bit more than a single rand() call. The game bundles all obstacles in a single structure of dynamically allocated arrays, and only knows how many obstacles there are in total, not per type. Now, that alone wouldn't have much of an impact on random portal selection, as you could simply roll a random obstacle ID and try again if it's not a portal. But just to be extra cute, ZUN instead iterates over all obstacles, selects any non-entered portal with a chance of ¼, and just gives up if that dice roll wasn't successful after 16 loops over the whole array, defaulting to the entered portal in that case.
In all its silliness though, this works perfectly fine, and results in a chance of 0.7516(𝑛 - 1) for the Orb exiting out of the same portal it entered, with 𝑛 being the total number of portals in a stage. That's 1% for two portals, and 0.01% for three. Pretty decent for a random result you don't want to happen, but that hurts nobody if it does.

The one tiny ZUN bug with portals is technically not even part of the newly decompiled code here. If Reimu gets hit while the Orb is being sent through a portal, the Orb is immediately kicked out of the portal it entered, no matter whether it already shows up inside the sprite of the destination portal. Neither of the two portal sprites is reset when this happens, leading to "two Orbs" being visible simultaneously. :tannedcirno::onricdennat:
This makes very little sense no matter how you look at it. The Orb doesn't receive a new velocity or force when this happens, so it will simply re-enter the same portal once the gameplay resumes on Reimu's next life:

And that's it! At least the turrets don't have anything notable to say about them 📝 that I haven't said before.


That left another ½ of a push over at the end. Way too much time to finish FUUIN.exe, way too little time to start with Mima… but the bomb animation fit perfectly in there. No secrets or bugs there, just a bunch of sprite animation code wasting at least another 82 bytes in the data segment. The special effect after the kuji-in sprites uses the same single-bitplane 32×32 square inversion effect seen at the end of Kikuri's and Sariel's entrance animation, except that it's a 3-stack of 16-rings moving at 6, 7, and 8 pixels per frame respectively. At these comparatively slow speeds, the byte alignment of each square adds some further noise to the discoloration pattern… if you even notice it below all the shaking and seizure-inducing hardware palette manipulation.
And yes, due to the very destructive nature of the effect, the game does in fact rely on it only being applied to VRAM page 0. While that will cause every moving sprite to tear holes into the inverted squares along its trajectory, keeping a clean playfield on VRAM page 1 is what allows all that pixel damage to be easily undone at the end of this 89-frame animation.

Next up: Mima! Let's hope that stage obstacles already were the most complex part remaining in TH01…

📝 Posted:
🚚 Summary of:
P0137
Commits:
07bfcf2...8d953dc
💰 Funded by:
[Anonymous]
🏷 Tags:

Whoops, the build was broken again? Since P0127 from mid-November 2020, on TASM32 version 5.3, which also happens to be the one in the DevKit… That version changed the alignment for the default segments of certain memory models when requesting .386 support. And since redefining segment alignment apparently is highly illegal and absolutely has to be a build error, some of the stand-alone .ASM translation units didn't assemble anymore on this version. I've only spotted this on my own because I casually compiled ReC98 somewhere else – on my development system, I happened to have TASM32 version 5.0 in the PATH during all this time.
At least this was a good occasion to get rid of some weird segment alignment workarounds from 2015, and replace them with the superior convention of using the USE16 modifier for the .MODEL directive.

ReC98 would highly benefit from a build server – both in order to immediately spot issues like this one, and as a service for modders. Even more so than the usual open-source project of its size, I would say. But that might be exactly because it doesn't seem like something you can trivially outsource to one of the big CI providers for open-source projects, and quickly set it up with a few lines of YAML.
That might still work in the beginning, and we might get by with a regular 64-bit Windows 10 and DOSBox running the exact build tools from the DevKit. Ideally, though, such a server should really run the optimal configuration of a 32-bit Windows 10, allowing both the 32-bit and the 16-bit build step to run natively, which already is something that no popular CI service out there offers. Then, we'd optimally expand to Linux, every other Windows version down to 95, emulated PC-98 systems, other TASM versions… yeah, it'd be a lot. An experimental project all on its own, with additional hosting costs and probably diminishing returns, the more it expands…
I've added it as a category to the order form, let's see how much interest there is once the store reopens (which will be at the beginning of May, at the latest). That aside, it would 📝 also be a great project for outside contributors!


So, technical debt, part 8… and right away, we're faced with TH03's low-level input function, which 📝 once 📝 again 📝 insists on being word-aligned in a way we can't fake without duplicating translation units. Being undecompilable isn't exactly the best property for a function that has been interesting to modders in the past: In 2018, spaztron64 created an ASM-level mod that hardcoded more ergonomic key bindings for human-vs-human multiplayer mode: 2021-04-04-TH03-WASD-2player.zip However, this remapping attempt remained quite limited, since we hadn't (and still haven't) reached full position independence for TH03 yet. There's quite some potential for size optimizations in this function, which would allow more BIOS key groups to already be used right now, but it's not all that obvious to modders who aren't intimately familiar with x86 ASM. Therefore, I really wouldn't want to keep such a long and important function in ASM if we don't absolutely have to…

… and apparently, that's all the motivation I needed? So I took the risk, and spent the first half of this push on reverse-engineering TCC.EXE, to hopefully find a way to get word-aligned code segments out of Turbo C++ after all.

And there is! The -WX option, used for creating DPMI applications, messes up all sorts of code generation aspects in weird ways, but does in fact mark the code segment as word-aligned. We can consider ourselves quite lucky that we get to use Turbo C++ 4.0, because this feature isn't available in any previous version of Borland's C++ compilers.
That allowed us to restore all the decompilations I previously threw away… well, two of the three, that lookup table generator was too much of a mess in C. :tannedcirno: But what an abuse this is. The subtly different code generation has basically required one creative workaround per usage of -WX. For example, enabling that option causes the regular PUSH BP and POP BP prolog and epilog instructions to be wrapped with INC BP and DEC BP, for some reason:

a_function_compiled_with_wx proc
	inc 	bp    	; ???
	push	bp
	mov 	bp, sp
	    	      	; [… function code …]
	pop 	bp
	dec 	bp    	; ???
	ret
a_function_compiled_with_wx endp

Luckily again, all the functions that currently require -WX don't set up a stack frame and don't take any parameters.
While this hasn't directly been an issue so far, it's been pretty close: snd_se_reset(void) is one of the functions that require word alignment. Previously, it shared a translation unit with the immediately following snd_se_play(int new_se), which does take a parameter, and therefore would have had its prolog and epilog code messed up by -WX. Since the latter function has a consistent (and thus, fakeable) alignment, I simply split that code segment into two, with a new -WX translation unit for just snd_se_reset(void). Problem solved – after all, two C++ translation units are still better than one ASM translation unit. :onricdennat: Especially with all the previous #include improvements.

The rest was more of the usual, getting us 74% done with repaying the technical debt in the SHARED segment. A lot of the remaining 26% is TH04 needing to catch up with TH03 and TH05, which takes comparatively little time. With some good luck, we might get this done within the next push… that is, if we aren't confronted with all too many more disgusting decompilations, like the two functions that ended this push. If we are, we might be needing 10 pushes to complete this after all, but that piece of research was definitely worth the delay. Next up: One more of these.