⮜ Blog

⮜ List of tags

Showing all posts tagged
and

📝 Posted:
🚚 Summary of:
P0238, P0239
Commits:
(Website) 4698397...edf2926, c5e51e6...P0239
💰 Funded by:
Ember2528
🏷 Tags:

:stripe: Stripe is now properly integrated into this website as an alternative to PayPal! Now, you can also financially support the project if PayPal doesn't work for you, or if you prefer using a provider out of Stripe's greater variety. It's unfortunate that I had to ship this integration while the store is still sold out, but the Shuusou Gyoku OpenGL backend has turned out way too complicated to be finished next to these two pushes within a month. It will take quite a while until the store reopens and you all can start using Stripe, so I'll just link back to this blog post when it happens.

Integrating Stripe wasn't the simplest task in the world either. At first, the Checkout API seems pretty friendly to developers: The entire payment flow is handled on the backend, in the server language of your choice, and requires no frontend JavaScript except for the UI feedback code you choose to write. Your backend API endpoint initiates the Stripe Checkout session, answers with a redirect to Stripe, and Stripe then sends a redirect back to your server if the customer completed the payment. Superficially, this server-based approach seems much more GDPR-friendly than PayPal, because there are no remote scripts to obtain consent for. In reality though, Stripe shares much more potential personal data about your credit card or bank account with a merchant, compared to PayPal's almost bare minimum of necessary data. :thonk:
It's also rather annoying how the backend has to persist the order form information throughout the entire Checkout session, because it would otherwise be lost if the server restarts while a customer is still busy entering data into Stripe's Checkout form. Compare that to the PayPal JavaScript SDK, which only POSTs back to your server after the customer completed a payment. In Stripe's case, more JavaScript actually only makes the integration harder: If you trigger the initial payment HTTP request from JavaScript, you will have to improvise a bit to avoid the CORS error when redirecting away to a different domain.

But sure, it's all not too bad… for regular orders at least. With subscriptions, however, things get much worse. Unlike PayPal, Stripe kind of wants to stay out of the way of the payment process as much as possible, and just be a wrapper around its supported payment methods. So if customers aren't really meant to register with Stripe, how would they cancel their subscriptions? :thonk:
Answer: Through the… merchant? Which I quite dislike in principle, because why should you have to trust me to actually cancel your subscription after you requested it? It also means that I probably should add some sort of UI for self-canceling a Stripe subscription, ideally without adding full-blown user accounts. Not that this solves the underlying trust issue, but it's more convenient than contacting me via email or, worse, going through your bank somehow. Here is how my solution works:

I might have gone a bit overboard with the crypto there, but I liked the idea of not storing any of the Stripe session IDs in the server database. It's not like that makes the system more complex anyway, and it's nice to have a separate confirmation step before canceling a subscription.

But even that wasn't everything I had to keep in mind here. Once you switch from test to production mode for the final tests, you'll notice that certain SEPA-based payment providers take their sweet time to process and activate new subscriptions. The Checkout session object even informs you about that, by including a payment status field. Which initially seems just like another field that could indicate hacking attempts, but treating it as such and rejecting any unpaid session can also reject perfectly valid subscriptions. I don't want all this control… 🥲
Instead, all I can do in this case is to tell you about it. In my test, the Stripe dashboard said that it might take days or even weeks for the initial subscription transaction to be confirmed. In such a case, the respective fraction of the cap will unfortunately need to remain red for that entire time.

And that was 1½ pushes just to replicate the basic functionality of a simple PayPal integration with the simplest type of Stripe integration. On the architectural site, all the necessary refactoring work made me finally upgrade my frontend code to TypeScript at least, using the amazing esbuild to handle transpilation inside the server binary. Let's see how long it will now take for me to upgrade to SCSS…


With the new payment options, it makes sense to go for another slight price increase, from up to per push. The amount of taxes I have to pay on this income is slowly becoming significant, and the store has been selling out almost immediately for the last few months anyway. If demand remains at the current level or even increases, I plan to gradually go up to by the end of the year.
📝 As 📝 usual, I'm going to deliver existing orders in the backlog at the value they were originally purchased at. Due to the way the cap has to be calculated, these contributions now appear to have increased in value by a rather awkward 13.33%.


This left ½ of a push for some more work on the TH01 Anniversary Edition. Unfortunately, this was too little time for the grand issue of removing byte-aligned rendering of bigger sprites, which will need some additional blitting performance research. Instead, I went for a bunch of smaller bugfixes:

The final point, however, raised the question of what we're now going to do about 📝 a certain issue in the 地獄/Jigoku Bad Ending. ZUN's original expensive way of switching the accessed VRAM page was the main reason behind the lag frames on slower PC-98 systems, and search-replacing the respective function calls would immediately get us to the optimized version shown in that blog post. But is this something we actually want? If we wanted to retain the lag, we could surely preserve that function just for this one instance…
The discovery of this issue predates the clear distinction between bloat, quirks, and bugs, so it makes sense to first classify what this issue even is. The distinction comes all down to observability, which I defined as changes to rendered frames between explicitly defined frame boundaries. That alone would be enough to categorize any cause behind lag frames as bloat, but it can't hurt to be more explicit here.

Therefore, I now officially judge observability in terms of an infinitely fast PC-98 that can instantly render everything between two explicitly defined frames, and will never add additional lag frames. If we plan to port the games to faster architectures that aren't bottlenecked by disappointing blitter chips, this is the only reasonable assumption to make, in my opinion: The minimum system requirements in the games' README files are minimums, after all, not recommendations. Chasing the exact frame drop behavior that ZUN must have experienced during the time he developed these games can only be a guessing game at best, because how can we know which PC-98 model ZUN actually developed the games on? There might even be more than one model, especially when it comes to TH01 which had been in development for at least two years before ZUN first sold it. It's also not like any current PC-98 emulator even claims to emulate the specific timing of any existing model, and I sure hope that nobody expects me to import a bunch of bulky obsolete hardware just to count dropped frames.

That leaves the tearing, where it's much more obvious how it's a bug. On an infinitely fast PC-98, the ドカーン frame would never be visible, and thus falls into the same category as the 📝 two unused animations in the Sariel fight. With only a single unconditional 2-frame delay inside the animation loop, it becomes clear that ZUN intended both frames of the animation to be displayed for 2 frames each:

No tearing, and 34 frames in total for the first of the two instances of this animation.

:th01: TH01 Anniversary Edition, version P0239 2023-05-01-th01-anniv.zip

Next up: Taking the oldest still undelivered push and working towards TH04 position independence in preparation for multilingual translations. The Shuusou Gyoku OpenGL backend shouldn't take that much longer either, so I should have lots of stuff coming up in May afterward.

📝 Posted:
🚚 Summary of:
P0214, P0215
Commits:
158a91e...414770c, 414770c...3123c9d
💰 Funded by:
Ember2528, Yanga
🏷 Tags:

Last blog post before the 100% completion of TH01! The final parts of REIIDEN.EXE would feel rather out of place in a celebratory blog post, after all. They provided quite a neat summary of the typical technical details that are wrong with this game, and that I now get to mention for one final time:

But hey, there's an error message if you start REIIDEN.EXE without a resident MDRV2 or a correctly prepared resident structure! And even a good, user-friendly one, asking the user to launch the batch file instead. For some reason, this convenience went out of fashion in the later games.


The Game Over animation (how fitting) gives us TH01's final piece of weird sprite blitting code, which seriously manages to include 2 bugs and 3 quirks in under 50 lines of code. In test mode (game t or game d), you can trigger this effect by pressing the ⬇️ down arrow key, which certainly explains why I encountered seemingly random Game Over events during all the tests I did with this game…
The animation appears to have changed quite a bit during development, to the point that probably even ZUN himself didn't know what he wanted it to look like in the end:

The original version unblits a 32×32 rectangle around Reimu that only grows on the X axis… for the first 5 frames. The unblitting call is only run if the corresponding sprite wasn't clipped at the edges of the playfield in the frame before, and ZUN uses the animation's frame number rather than the sprite loop variable to index the per-sprite clip flag array. The resulting out-of-bounds access then reads the sprite coordinates instead, which are never 0, thus interpreting all 5 sprites as clipped.
This variant would interpret the declared 5 effect coordinates as distinct sprites and unblit them correctly every frame. The end result is rather wimpy though… hardly appropriate for a Game Over, especially with the original animation in mind.
This variant would not unblit anything, and is probably closest to what the final animation should have been.

Finally, we get to the big main() function, serving as the duct tape that holds this game together. It may read rather disorganized with all the (actually necessary) assignments and function calls, but the only actual minor issue I've seen there is that you're robbed of any pellet destroy bonus collected on the final frame of the final boss. There is a certain charm in directly nesting the infinite main gameplay loop within the infinite per-life loop within the infinite stage loop. But come on, why is there no fourth scene loop? :zunpet: Instead, the game just starts a new REIIDEN.EXE process before and after a boss fight. With all the wildly mutated global state, that was probably a much saner choice.

The final secrets can be found in the debug stage selection. ZUN implemented the prompts using the C standard library's scanf() function, which is the natural choice for quick-and-dirty testing features like this one. However, the C standard library is also complete and utter trash, and so it's not surprising that both of the scanf() calls do… well, probably not what ZUN intended. The guaranteed out-of-bounds memory access in the select_flag route prompt thankfully has no real effect on the game, but it gets really interesting with the 面数 stage prompt.
Back in 2020, I already wrote about 📝 stages 21-24, and how they're loaded from actual data that ZUN shipped with the game. As it now turns out, the code that maps stage IDs to STAGE?.DAT scene numbers contains an explicit branch that maps any (1-based) stage number ≥21 to scene 7. Does this mean that an Extra Stage was indeed planned at some point? That branch seems way too specific to just be meant as a fallback. Maybe Asprey was on to something after all…

However, since ZUN passed the stage ID as a signed integer to scanf(), you can also enter negative numbers. The only place that kind of accidentally checks for them is the aforementioned stage ID → scene mapping, which ensures that (1-based) stages < 5 use the shrine's background image and BGM. With no checks anywhere else, we get a new set of "glitch stages":

TH01's stage -1
Stage -1
TH01's stage -2
Stage -2
TH01's stage -3
Stage -3
TH01's stage -4
Stage -4
TH01's stage -5
Stage -5

The scene loading function takes the entered 0-based stage ID value modulo 5, so these 4 are the only ones that "exist", and lower stage numbers will simply loop around to them. When loading these stages, the function accesses the data in REIIDEN.EXE that lies before the statically allocated 5-element stages-of-scene array, which happens to encompass Borland C++'s locale and exception handling data, as well as a small bit of ZUN's global variables. In particular, the obstacle/card HP on the tile I highlighted in green corresponds to the lowest byte of the 32-bit RNG seed. If it weren't for that and the fact that the obstacles/card HP on the few tiles before are similarly controlled by the x86 segment values of certain initialization function addresses, these glitch stages would be completely deterministic across PC-98 systems, and technically canon… :tannedcirno:
Stage -4 is the only playable one here as it's the only stage to end up below the 📝 heap corruption limit of 102 stage objects. Completing it loads Stage -3, which crashes with a Divide Error just like it does if it's directly selected. Unsurprisingly, this happens because all 50 card bytes at that memory location are 0, so one division (or in this case, modulo operation) by the number of cards is enough to crash the game.
Stage -5 is modulo'd to 0 and thus loads the first regular stage. The only apparent broken element there is the timer, which is handled by a completely different function that still operates with a (0-based) stage ID value of -5. Completing the stage loads Stage -4, which also crashes, but only because its 61 cards naturally cause the 📝 stack overflow in the flip-in animation for any stage with more than 50 cards.

And that's REIIDEN.EXE, the biggest and most bloated PC-98 Touhou executable, fully decompiled! Next up: Finishing this game with the main menu, and hoping I'll actually pull it off within 24 hours. (If I do, we might all have to thank 32th System, who independently decompiled half of the remaining 14 functions…)

📝 Posted:
🚚 Summary of:
P0201, P0202
Commits:
9342665...ff49e9e, ff49e9e...4568bf7
💰 Funded by:
Ember2528, Yanga, [Anonymous]
🏷 Tags:

The positive:

The negative:

The overview:


This time, we're back to the Orb hitbox being a logical 49×49 pixels in SinGyoku's center, and the shot hitbox being the weird one. What happens if you want the shot hitbox to be both offset to the left a bit and stretch the entire width of SinGyoku's sprite? You get a hitbox that ends in mid-air, far away from the right edge of the sprite:

Due to VRAM byte alignment, all player shots fired between gx = 376 and gx = 383 inclusive appear at the same visual X position, but are internally already partly outside the hitbox and therefore won't hit SinGyoku – compare the marked shot at gx = 376 to the one at gx = 380. So much for precisely visualizing hitboxes in this game…

Since the female and male forms also use the sphere entity's coordinates, they share the same hitbox.


Onto the rendering glitches then, which can – you guessed it – all be found in the sphere form's slam movement:

By having the sphere move from the right edge of the playfield to the left, this video demonstrates both the lazy reblitting and broken unblitting at the right edge for negative X velocities. Also, isn't it funny how Reimu can partly disappear from all the sloppy SinGyoku-related unblitting going on after her sprite was blitted?

Due to the low contrast of the sphere against the background, you typically don't notice these glitches, but the white invincibility flashing after a hit really does draw attention to them. This time, all of these glitches aren't even directly caused by ZUN having never learned about the EGC's bit length register – if he just wrote correct code for SinGyoku, none of this would have been an issue. Sigh… I wonder how many more glitches will be caused by improper use of this one function in the last 18% of REIIDEN.EXE.

There's even another bug here, with ZUN hardcoding a horizontal delta of 8 pixels rather than just passing the actual X velocity. Luckily, the maximum movement speed is 6 pixels on Lunatic, and this would have only turned into an additional observable glitch if the X velocity were to exceed 24 pixels. But that just means it's the kind of bug that still drains RE attention to prove that you can't actually observe it in-game under some circumstances.


The 5 pellet patterns are all pretty straightforward, with nothing to talk about. The code architecture during phase 2 does hint towards ZUN having had more creative patterns in mind – especially for the male form, which uses the transformation function's three pattern callback slots for three repetitions of the same pellet group.
There is one more oddity to be found at the very end of the fight:

The first frame of TH01 SinGyoku's defeat animation, showing the sphere blitted on top of a potentially active person form

Right before the defeat white-out animation, the sphere form is explicitly reblitted for no reason, on top of the form that was blitted to VRAM in the previous frame, and regardless of which form is currently active. If SinGyoku was meant to immediately transform back to the sphere form before being defeated, why isn't the person form unblitted before then? Therefore, the visibility of both forms is undeniably canon, and there is some lore meaning to be found here… :thonk:
In any case, that's SinGyoku done! 6th PC-98 Touhou boss fully decompiled, 25 remaining.


No FUUIN.EXE code rounding out the last push for a change, as the 📝 remaining missile code has been waiting in front of SinGyoku for a while. It already looked bad in November, but the angle-based sprite selection function definitely takes the cake when it comes to unnecessary and decadent floating-point abuse in this game.
The algorithm itself is very trivial: Even with 📝 .PTN requiring an additional quarter parameter to access 16×16 sprites, it's essentially just one bit shift, one addition, and one binary AND. For whatever reason though, ZUN casts the 8-bit missile angle into a 64-bit double, which turns the following explicit comparisons (!) against all possible 4 + 16 boundary angles (!!) into FPU operations. :zunpet: Even with naive and readable division and modulo operations, and the whole existence of this function not playing well with Turbo C++ 4.0J's terrible code generation at all, this could have been 3 lines of code and 35 un-inlined constant-time instructions. Instead, we've got this 207-instruction monster… but hey, at least it works. 🤷
The remaining time then went to YuugenMagan's initialization code, which allowed me to immediately remove more declarations from ASM land, but more on that once we get to the rest of that boss fight.

That leaves 76 functions until we're done with TH01! Next up: Card-flipping stage obstacles.

📝 Posted:
🚚 Summary of:
P0165, P0166, P0167
Commits:
7a0e5d8...f2bca01, f2bca01...e697907, e697907...c2de6ab
💰 Funded by:
Ember2528
🏷 Tags:

OK, TH01 missile bullets. Can we maybe have a well-behaved entity type, without any weirdness? Just once?

Ehh, kinda. Apart from another 150 bytes wasted on unused structure members, this code is indeed more on the low end in terms of overall jank. It does become very obvious why dodging these missiles in the YuugenMagan, Mima, and Elis fights feels so awful though: An unfair 46×46 pixel hitbox around Reimu's center pixel, combined with the comeback of 📝 interlaced rendering, this time in every stage. ZUN probably did this because missiles are the only 16×16 sprite in TH01 that is blitted to unaligned X positions, which effectively ends up touching a 32×16 area of VRAM per sprite.
But even if we assume VRAM writes to be the bottleneck here, it would have been totally possible to render every missile in every frame at roughly the same amount of CPU time that the original game uses for interlaced rendering:

That's an optimization that would have significantly benefitted the game, in contrast to all of the fake ones introduced in later games. Then again, this optimization is actually something that the later games do, and it might have in fact been necessary to achieve their higher bullet counts without significant slowdown.

Unfortunately, it was only worth decompiling half of the missile code right now, thanks to gratuitous FPU usage in the other half, where 📝 double variables are compared to float literals. That one will have to wait 📝 until after SinGyoku.


After some effectively unused Mima sprite effect code that is so broken that it's impossible to make sense out of it, we get to the final feature I wanted to cover for all bosses in parallel before returning to Sariel: The separate sprite background storage for moving or animated boss sprites in the Mima, Elis, and Sariel fights. But, uh… why is this necessary to begin with? Doesn't TH01 already reserve the other VRAM page for backgrounds?
Well, these sprites are quite big, and ZUN didn't want to blit them from main memory on every frame. After all, TH01 and TH02 had a minimum required clock speed of 33 MHz, half of the speed required for the later three games. So, he simply blitted these boss sprites to both VRAM pages, leading the usual unblitting calls to only remove the other sprites on top of the boss. However, these bosses themselves want to move across the screen… and this makes it necessary to save the stage background behind them in some other way.

Enter .PTN, and its functions to capture a 16×16 or 32×32 square from VRAM into a sprite slot. No problem with that approach in theory, as the size of all these bigger sprites is a multiple of 32×32; splitting a larger sprite into these smaller 32×32 chunks makes the code look just a little bit clumsy (and, of course, slower).
But somewhere during the development of Mima's fight, ZUN apparently forgot that those sprite backgrounds existed. And once Mima's 🚫 casting sprite is blitted on top of her regular sprite, using just regular sprite transparency, she ends up with her infamous third arm:

TH01 Mima's third arm

Ironically, there's an unused code path in Mima's unblit function where ZUN assumes a height of 48 pixels for Mima's animation sprites rather than the actual 64. This leads to even clumsier .PTN function calls for the bottom 128×16 pixels… Failing to unblit the bottom 16 pixels would have also yielded that third arm, although it wouldn't have looked as natural. Still wouldn't say that it was intentional; maybe this casting sprite was just added pretty late in the game's development?


So, mission accomplished, Sariel unblocked… at 2¼ pushes. :thonk: That's quite some time left for some smaller stage initialization code, which bundles a bunch of random function calls in places where they logically really don't belong. The stage opening animation then adds a bunch of VRAM inter-page copies that are not only redundant but can't even be understood without knowing the hidden internal state of the last VRAM page accessed by previous ZUN code…
In better news though: Turbo C++ 4.0 really doesn't seem to have any complexity limit on inlining arithmetic expressions, as long as they only operate on compile-time constants. That's how we get macro-free, compile-time Shift-JIS to JIS X 0208 conversion of the individual code points in the 東方★靈異伝 string, in a compiler from 1994. As long as you don't store any intermediate results in variables, that is… :tannedcirno:

But wait, there's more! With still ¼ of a push left, I also went for the boss defeat animation, which includes the route selection after the SinGyoku fight.
As in all other instances, the 2× scaled font is accomplished by first rendering the text at regular 1× resolution to the other, invisible VRAM page, and then scaled from there to the visible one. However, the route selection is unique in that its scaled text is both drawn transparently on top of the stage background (not onto a black one), and can also change colors depending on the selection. It would have been no problem to unblit and reblit the text by rendering the 1× version to a position on the invisible VRAM page that isn't covered by the 2× version on the visible one, but ZUN (needlessly) clears the invisible page before rendering any text. :zunpet: Instead, he assigned a separate VRAM color for both the 魔界 and 地獄 options, and only changed the palette value for these colors to white or gray, depending on the correct selection. This is another one of the 📝 rare cases where TH01 demonstrates good use of PC-98 hardware, as the 魔界へ and 地獄へ strings don't need to be reblitted during the selection process, only the Orb "cursor" does.

Then, why does this still not count as good-code? When changing palette colors, you kinda need to be aware of everything else that can possibly be on screen, which colors are used there, and which aren't and can therefore be used for such an effect without affecting other sprites. In this case, well… hover over the image below, and notice how Reimu's hair and the bomb sprites in the HUD light up when Makai is selected:

Demonstration of palette changes in TH01's route selection

This push did end on a high note though, with the generic, non-SinGyoku version of the defeat animation being an easily parametrizable copy. And that's how you decompile another 2.58% of TH01 in just slightly over three pushes.


Now, we're not only ready to decompile Sariel, but also Kikuri, Elis, and SinGyoku without needing any more detours into non-boss code. Thanks to the current TH01 funding subscriptions, I can plan to cover most, if not all, of Sariel in a single push series, but the currently 3 pending pushes probably won't suffice for Sariel's 8.10% of all remaining code in TH01. We've got quite a lot of not specifically TH01-related funds in the backlog to pass the time though.

Due to recent developments, it actually makes quite a lot of sense to take a break from TH01: spaztron64 has managed what every Touhou download site so far has failed to do: Bundling all 5 game onto a single .HDI together with pre-configured PC-98 emulators and a nice boot menu, and hosting the resulting package on a proper website. While this first release is already quite good (and much better than my attempt from 2014), there is still a bit of room for improvement to be gained from specific ReC98 research. Next up, therefore:

📝 Posted:
🚚 Summary of:
P0162, P0163, P0164
Commits:
81dd96e...24b3a0d, 24b3a0d...6d572b3, 6d572b3...7a0e5d8
💰 Funded by:
Ember2528, Yanga
🏷 Tags:

No technical obstacles for once! Just pure overcomplicated ZUN code. Unlike 📝 Konngara's main function, the main TH01 player function was every bit as difficult to decompile as you would expect from its size.

With TH01 using both separate left- and right-facing sprites for all of Reimu's moves and separate classes for Reimu's 32×32 and 48×* sprites, we're already off to a bad start. Sure, sprite mirroring is minimally more involved on PC-98, as the planar nature of VRAM requires the bits within an 8-pixel byte to also be mirrored, in addition to writing the sprite bytes from right to left. TH03 uses a 256-byte lookup table for this, generated at runtime by an infamous micro-optimized and undecompilable ASM algorithm. With TH01's existing architecture, ZUN would have then needed to write 3 additional blitting functions. But instead, he chose to waste a total of 26,112 bytes of memory on pre-mirrored sprites… :godzun:

Alright, but surely selecting those sprites from code is no big deal? Just store the direction Reimu is facing in, and then add some branches to the rendering code. And there is in fact a variable for Reimu's direction… during regular arrow-key movement, and another one while shooting and sliding, and a third as part of the special attack types, launched out of a slide.
Well, OK, technically, the last two are the same variable. But that's even worse, because it means that ZUN stores two distinct enums at the same place in memory: Shooting and sliding uses 1 for left, 2 for right, and 3 for the "invalid" direction of holding both, while the special attack types indicate the direction in their lowest bit, with 0 for right and 1 for left. I decompiled the latter as bitflags, but in ZUN's code, each of the 8 permutations is handled as a distinct type, with copy-pasted and adapted code… :zunpet: The interpretation of this two-enum "sub-mode" union variable is controlled by yet another "mode" variable… and unsurprisingly, two of the bugs in this function relate to the sub-mode variable being interpreted incorrectly.

Also, "rendering code"? This one big function basically consists of separate unblit→update→render code snippets for every state and direction Reimu can be in (moving, shooting, swinging, sliding, special-attacking, and bombing), pasted together into a tangled mess of nested if(…) statements. While a lot of the code is copy-pasted, there are still a number of inconsistencies that defeat the point of my usual refactoring treatment. After all, with a total of 85 conditional branches, anything more than I did would have just obscured the control flow too badly, making it even harder to understand what's going on.
In the end, I spotted a total of 8 bugs in this function, all of which leave Reimu invisible for one or more frames:

Thanks to the last one, Reimu's first swing animation frame is never actually rendered. So whenever someone complains about TH01 sprite flickering on an emulator: That emulator is accurate, it's the game that's poorly written. :tannedcirno:

And guess what, this function doesn't even contain everything you'd associate with per-frame player behavior. While it does handle Yin-Yang Orb repulsion as part of slides and special attacks, it does not handle the actual player/Orb collision that results in lives being lost. The funny thing about this: These two things are done in the same function… :onricdennat:

Therefore, the life loss animation is also part of another function. This is where we find the final glitch in this 3-push series: Before the 16-frame shake, this function only unblits a 32×32 area around Reimu's center point, even though it's possible to lose a life during the non-deflecting part of a 48×48-pixel animation. In that case, the extra pixels will just stay on screen during the shake. They are unblitted afterwards though, which suggests that ZUN was at least somewhat aware of the issue?
Finally, the chance to see the alternate life loss sprite Alternate TH01 life loss sprite is exactly ⅛.


As for any new insights into game mechanics… you know what? I'm just not going to write anything, and leave you with this flowchart instead. Here's the definitive guide on how to control Reimu in TH01 we've been waiting for 24 years:

(SVG download)

Pellets are deflected during all gray states. Not shown is the obvious "double-tap Z and X" transition from all non-(#1) states to the Bomb state, but that would have made this diagram even more unwieldy than it turned out. And yes, you can shoot twice as fast while moving left or right.

While I'm at it, here are two more animations from MIKO.PTN which aren't referenced by any code:

An unused animation from TH01's MIKO.PTNAn unused animation from TH01's MIKO.PTN

With that monster of a function taken care of, we've only got boss sprite animation as the final blocker of uninterrupted Sariel progress. Due to some unfavorable code layout in the Mima segment though, I'll need to spend a bit more time with some of the features used there. Next up: The missile bullets used in the Mima and YuugenMagan fights.

📝 Posted:
🚚 Summary of:
P0160, P0161
Commits:
e491cd7...42ba4a5, 42ba4a5...81dd96e
💰 Funded by:
Yanga, [Anonymous]
🏷 Tags:

Nothing really noteworthy in TH01's stage timer code, just yet another HUD element that is needlessly drawn into VRAM. Sure, ZUN applies his custom boldfacing effect on top of the glyphs retrieved from font ROM, but he could have easily installed those modified glyphs as gaiji.
Well, OK, halfwidth gaiji aren't exactly well documented, and sometimes not even correctly emulated 📝 due to the same PC-98 hardware oddity I was researching last month. I've reserved two of the pending anonymous "anything" pushes for the conclusion of this research, just in case you were wondering why the outstanding workload is now lower after the two delivered here.

And since it doesn't seem to be clearly documented elsewhere: Every 2 ticks on the stage timer correspond to 4 frames.


So, TH01 rank pellet speed. The resident pellet speed value is a factor ranging from a minimum of -0.375 up to a maximum of 0.5 (pixels per frame), multiplied with the difficulty-adjusted base speed for each pellet and added on top of that same speed. This multiplier is modified

Apparently, ZUN noted that these deltas couldn't be losslessly stored in an IEEE 754 floating-point variable, and therefore didn't store the pellet speed factor exactly in a way that would correspond to its gameplay effect. Instead, it's stored similar to Q12.4 subpixels: as a simple integer, pre-multiplied by 40. This results in a raw range of -15 to 20, which is what the undecompiled ASM calls still use. When spawning a new pellet, its base speed is first multiplied by that factor, and then divided by 40 again. This is actually quite smart: The calculation doesn't need to be aware of either Q12.4 or the 40× format, as ((Q12.4 * factor×40) / factor×40) still comes out as a Q12.4 subpixel even if all numbers are integers. The only limiting issue here would be the potential overflow of the 16-bit multiplication at unadjusted base speeds of more than 50 pixels per frame, but that'd be seriously unplayable.
So yeah, pellet speed modifications are indeed gradual, and don't just fall into the coarse three "high, normal, and low" categories.


That's ⅝ of P0160 done, and the continue and pause menus would make good candidates to fill up the remaining ⅜… except that it seemed impossible to figure out the correct compiler options for this code?
The issues centered around the two effects of Turbo C++ 4.0J's -O switch:

  1. Optimizing jump instructions: merging duplicate successive jumps into a single one, and merging duplicated instructions at the end of conditional branches into a single place under a single branch, which the other branches then jump to
  2. Compressing ADD SP and POP CX stack-clearing instructions after multiple successive CALLs to __cdecl functions into a single ADD SP with the combined parameter stack size of all function calls

But how can the ASM for these functions exhibit #1 but not #2? How can it be seemingly optimized and unoptimized at the same time? The only option that gets somewhat close would be -O- -y, which emits line number information into the .OBJ files for debugging. This combination provides its own kind of #1, but these functions clearly need the real deal.

The research into this issue ended up consuming a full push on its own. In the end, this solution turned out to be completely unrelated to compiler options, and instead came from the effects of a compiler bug in a totally different place. Initializing a local structure instance or array like

const uint4_t flash_colors[3] = { 3, 4, 5 };

always emits the { 3, 4, 5 } array into the program's data segment, and then generates a call to the internal SCOPY@ function which copies this data array to the local variable on the stack. And as soon as this SCOPY@ call is emitted, the -O optimization #1 is disabled for the entire rest of the translation unit?!
So, any code segment with an SCOPY@ call followed by __cdecl functions must strictly be decompiled from top to bottom, mirroring the original layout of translation units. That means no TH01 continue and pause menus before we haven't decompiled the bomb animation, which contains such an SCOPY@ call. 😕
Luckily, TH01 is the only game where this bug leads to significant restrictions in decompilation order, as later games predominantly use the pascal calling convention, in which each function itself clears its stack as part of its RET instruction.


What now, then? With 51% of REIIDEN.EXE decompiled, we're slowly running out of small features that can be decompiled within ⅜ of a push. Good that I haven't been looking a lot into OP.EXE and FUUIN.EXE, which pretty much only got easy pieces of code left to do. Maybe I'll end up finishing their decompilations entirely within these smaller gaps?
I still ended up finding one more small piece in REIIDEN.EXE though: The particle system, seen in the Mima fight.

I like how everything about this animation is contained within a single function that is called once per frame, but ZUN could have really consolidated the spawning code for new particles a bit. In Mima's fight, particles are only spawned from the top and right edges of the screen, but the function in fact contains unused code for all other 7 possible directions, written in quite a bloated manner. This wouldn't feel quite as unused if ZUN had used an angle parameter instead… :thonk: Also, why unnecessarily waste another 40 bytes of the BSS segment?

But wait, what's going on with the very first spawned particle that just stops near the bottom edge of the screen in the video above? Well, even in such a simple and self-contained function, ZUN managed to include an off-by-one error. This one then results in an out-of-bounds array access on the 80th frame, where the code attempts to spawn a 41st particle. If the first particle was unlucky to be both slow enough and spawned away far enough from the bottom and right edges, the spawning code will then kill it off before its unblitting code gets to run, leaving its pixel on the screen until something else overlaps it and causes it to be unblitted.
Which, during regular gameplay, will quickly happen with the Orb, all the pellets flying around, and your own player movement. Also, the RNG can easily spawn this particle at a position and velocity that causes it to leave the screen more quickly. Kind of impressive how ZUN laid out the structure of arrays in a way that ensured practically no effect of this bug on the game; this glitch could have easily happened every 80 frames instead. He almost got close to all bugs canceling out each other here! :godzun:

Next up: The player control functions, including the second-biggest function in all of PC-98 Touhou.

📝 Posted:
🚚 Summary of:
P0158, P0159
Commits:
bf7bb7e...c0c0ebc, c0c0ebc...e491cd7
💰 Funded by:
Yanga
🏷 Tags:

Of course, Sariel's potentially bloated and copy-pasted code is blocked by even more definitely bloated and copy-pasted code. It's TH01, what did you expect? :tannedcirno:

But even then, TH01's item code is on a new level of software architecture ridiculousness. First, ZUN uses distinct arrays for both types of items, with their own caps of 4 for bomb items, and 10 for point items. Since that obviously makes any type-related switch statement redundant, he also used distinct functions for both types, with copy-pasted boilerplate code. The main per-item update and render function is shared though… and takes every single accessed member of the item structure as its own reference parameter. Like, why, you have a structure, right there?! That's one way to really practice the C++ language concept of passing arbitrary structure fields by mutable reference… :zunpet:
To complete the unwarranted grand generic design of this function, it calls back into per-type collision detection, drop, and collect functions with another three reference parameters. Yeah, why use C++ virtual methods when you can also implement the effectively same polymorphism functionality by hand? Oh, and the coordinate clamping code in one of these callbacks could only possibly have come from nested min() and max() preprocessor macros. And that's how you extend such dead-simple functionality to 1¼ pushes…

Amidst all this jank, we've at least got a sensible item↔player hitbox this time, with 24 pixels around Reimu's center point to the left and right, and extending from 24 pixels above Reimu down to the bottom of the playfield. It absolutely didn't look like that from the initial naive decompilation though. Changing entity coordinates from left/top to center was one of the better lessons from TH01 that ZUN implemented in later games, it really makes collision detection code much more intuitive to grasp.


The card flip code is where we find out some slightly more interesting aspects about item drops in this game, and how they're controlled by a hidden cycle variable:

Then again, score players largely ignore point items anyway, as card combos simply have a much bigger effect on the score. With this, I should have RE'd all information necessary to construct a tool-assisted score run, though?
Edit: Turns out that 1) point items are becoming increasingly important in score runs, and 2) Pearl already did a TAS some months ago. Thanks to spaztron64 for the info!

The Orb↔card hitbox also makes perfect sense, with 24 pixels around the center point of a card in every direction.

The rest of the code confirms the card flip score formula documented on Touhou Wiki, as well as the way cards are flipped by bombs: During every of the 90 "damaging" frames of the 140-frame bomb animation, there is a 75% chance to flip the card at the [bomb_frame % total_card_count_in_stage] array index. Since stages can only have up to 50 cards 📝 thanks to a bug, even a 75% chance is high enough to typically flip most cards during a bomb. Each of these flips still only removes a single card HP, just like after a regular collision with the Orb.
Also, why are the card score popups rendered before the cards themselves? That's two needless frames of flicker during that 25-frame animation. Not all too noticeable, but still.


And that's over 50% of REIIDEN.EXE decompiled as well! Next up: More HUD update and rendering code… with a direct dependency on rank pellet speed modifications?

📝 Posted:
🚚 Summary of:
P0157
Commits:
4bc6405...bf7bb7e
💰 Funded by:
Yanga
🏷 Tags:

Yup, there still are features that can be fully covered in a single push and don't lead to sprawling blog posts. The giant STAGE number and HARRY UP messages, as well as the flashing transparent 東方★靈異伝 at the beginning of each scene are drawn by retrieving the glyphs for each letter from font ROM, and then "blitting" them to text RAM by placing a colored fullwidth 16×16 square at every pixel that is set in the font bitmap.
And 📝 once again, ZUN's code there matches the mediocre example code for the related hardware interrupt from the PC-9801 Programmers' Bible. It's not 100% copied this time, but definitely inspired by the code on page 121. Therefore, we can conclude that these letters are probably only displayed as these 16× scaled glyphs because that book had code on how to achieve this effect.

ZUN "improved" on the example code by implementing a write-only cursor over the entire text RAM that fills every 16×16 cell with a differently colored space character, fully clearing the text RAM as a side effect. For once, he even removed some redundancy here by using helper functions! It's all still far from good-code though. For example, there's a function for filling 5 rows worth of cells, which he uses for both the top and bottom margin of these letters. But since the bottom margin starts at the 22nd line, the code writes past the 25th line and into the second TRAM page. Good that this page is not used by either the hardware or the game.

These cursor functions can actually write any fullwidth JIS code point to text RAM… and seem to do that in a rather simplified way, because shouldn't you set the most significant bit to indicate the right half of a fullwidth character? That's what's written in the same book that ZUN copied all functions out of, after all. 🤔 Researching this led me down quite the rabbit hole, where I found an oddity in PC-98 text RAM rendering that no single one of the widely-used PC-98 emulators gets completely right. I'm almost done with the 2-push research into this issue, which will include fixes for DOSBox-X and Neko Project II. The only thing I'm missing to get these fully accurate is a screenshot of the output created by this binary, on any PC-98 model made by EPSON: 2021-09-12-jist0x28.com.zip That's the reason why this push was rather delayed. Thanks in advance to anyone who'd like to help with this!


In maybe more disappointing news: Sariel is going to be delayed for a while longer. 😕 The player- and HUD-related functions, which previously delayed further progress there, turned out to call a lot of not yet RE'd functions themselves. Seems as if we're doing most of the card-flipping code second, after all? Next up: Point and bomb items, which at least are a significant step in terms of position independence.

📝 Posted:
🚚 Summary of:
P0123
Commits:
4406c3d...72dfa09
💰 Funded by:
Yanga
🏷 Tags:

Done with the .BOS format, at last! While there's still quite a bunch of undecompiled non-format blitting code left, this was in fact the final piece of graphics format loading code in TH01.

📝 Continuing the trend from three pushes ago, we've got yet another class, this time for the 48×48 and 48×32 sprites used in Reimu's gohei, slide, and kick animations. The only reason these had to use the .BOS format at all is simply because Reimu's regular sprites are 32×32, and are therefore loaded from 📝 .PTN files.
Yes, this makes no sense, because why would you split animations for the same character across two file formats and two APIs, just because of a sprite size difference? This necessity for switching blitting APIs might also explain why Reimu vanishes for a few frames at the beginning and the end of the gohei swing animation, but more on that once we get to the high-level rendering code.

Now that we've decompiled all the .BOS implementations in TH01, here's an overview of all of them, together with .PTN to show that there really was no reason for not using the .BOS API for all of Reimu's sprites:

CBossEntity CBossAnim CPlayerAnim ptn_* (32×32)
Format .BOS .BOS .BOS .PTN
Hitbox
Byte-aligned blitting
Byte-aligned unblitting
Unaligned blitting Single-line and wave only
Precise unblitting
Per-file sprite limit 8 8 32 64
Pixels blitted at once 16 16 8 32

And even that last property could simply be handled by branching based on the sprite width, and wouldn't be a reason for switching formats. But well, it just wouldn't be TH01 without all that redundant bloat though, would it?

The basic loading, freeing, and blitting code was yet another variation on the other .BOS code we've seen before. So this should have caused just as little trouble as the CBossAnim code… except that CPlayerAnim did add one slightly difficult function to the mix, which led to it requiring almost a full push after all. Similar to 📝 the unblitting code for moving lasers we've seen in the last push, ZUN tries to minimize the amount of VRAM writes when unblitting Reimu's slide animations. Technically, it's only necessary to restore the pixels that Reimu traveled by, plus the ones that wouldn't be redrawn by the new animation frame at the new X position.
The theoretically arbitrary distance between the two sprites is, of course, modeled by a fixed-size buffer on the stack :onricdennat:, coming with the further assumption that the sprite surely hasn't moved by more than 1 horizontal VRAM byte compared to the last frame. Which, of course, results in glitches if that's not the case, leaving little Reimu parts in VRAM if the slide speed ever exceeded 8 pixels per frame. :tannedcirno: (Which it never does, being hardcoded to 6 pixels, but still.). As it also turns out, all those bit masking operations easily lead to incredibly sloppy C code. Which compiles into incredibly terrible ASM, which in turn might end up wasting way more CPU time than the final VRAM write optimization would have gained? Then again, in-depth profiling is way beyond the scope of this project at this point.

Next up: The TH04 main menu, and some more technical debt.

📝 Posted:
🚚 Summary of:
P0105, P0106, P0107, P0108
Commits:
3622eb6...11b776b, 11b776b...1f1829d, 1f1829d...1650241, 1650241...dcf4e2c
💰 Funded by:
Yanga
🏷 Tags:

And indeed, I got to end my vacation with a lot of image format and blitting code, covering the final two formats, .GRC and .BOS. .GRC was nothing noteworthy – one function for loading, one function for byte-aligned blitting, and one function for freeing memory. That's it – not even a unblitting function for this one. .BOS, on the other hand…

…has no generic (read: single/sane) implementation, and is only implemented as methods of some boss entity class. And then again for Sariel's dress and wand animations, and then again for Reimu's animations, both of which weren't even part of these 4 pushes. Looking forward to decompiling essentially the same algorithms all over again… And that's how TH01 became the largest and most bloated PC-98 Touhou game. So yeah, still not done with image formats, even at 44% RE.

This means I also had to reverse-engineer that "boss entity" class… yeah, what else to call something a boss can have multiple of, that may or may not be part of a larger boss sprite, may or may not be animated, and that may or may not have an orb hitbox?
All bosses except for Kikuri share the same 5 global instances of this class. Since renaming all these variables in ASM land is tedious anyway, I went the extra mile and directly defined separate, meaningful names for the entities of all bosses. These also now document the natural order in which the bosses will ultimately be decompiled. So, unless a backer requests anything else, this order will be:

  1. Konngara
  2. Sariel
  3. Elis
  4. Kikuri
  5. SinGyoku
  6. (code for regular card-flipping stages)
  7. Mima
  8. YuugenMagan

As everyone kind of expects from TH01 by now, this class reveals yet another… um, unique and quirky piece of code architecture. In addition to the position and hitbox members you'd expect from a class like this, the game also stores the .BOS metadata – width, height, animation frame count, and 📝 bitplane pointer slot number – inside the same class. But if each of those still corresponds to one individual on-screen sprite, how can YuugenMagan have 5 eye sprites, or Kikuri have more than one soul and tear sprite? By duplicating that metadata, of course! And copying it from one entity to another :onricdennat:
At this point, I feel like I even have to congratulate the game for not actually loading YuugenMagan's eye sprites 5 times. But then again, 53,760 bytes of waste would have definitely been noticeable in the DOS days. Makes much more sense to waste that amount of space on an unused C++ exception handler, and a bunch of redundant, unoptimized blitting functions :tannedcirno:

(Thinking about it, YuugenMagan fits this entire system perfectly. And together with its position in the game's code – last to be decompiled means first on the linker command line – we might speculate that YuugenMagan was the first boss to be programmed for TH01?)

So if a boss wants to use sprites with different sizes, there's no way around using another entity. And that's why Girl-Elis and Bat-Elis are two distinct entities internally, and have to manually sync their position. Except that there's also a third one for Attacking-Girl-Elis, because Girl-Elis has 9 frames of animation in total, and the global .BOS bitplane pointers are divided into 4 slots of only 8 images each. :zunpet:
Same for SinGyoku, who is split into a sphere entity, a person entity, and a… white flash entity for all three forms, all at the same resolution. Or Konngara's facial expressions, which also require two entities just for themselves.


And once you decompile all this code, you notice just how much of it the game didn't even use. 13 of the 50 bytes of the boss entity class are outright unused, and 10 bytes are used for a movement clamping and lock system that would have been nice if ZUN also used it outside of Kikuri's soul sprites. Instead, all other bosses ignore this system completely, and just party on the X/Y coordinates of the boss entities directly.

As for the rendering functions, 5 out of 10 are unused. And while those definitely make up less than half of the code, I still must have spent at least 1 of those 4 pushes on effectively unused functionality.
Only one of these functions lends itself to some speculation. For Elis' entrance animation, the class provides functions for wavy blitting and unblitting, which use a separate X coordinate for every line of the sprite. But there's also an unused and sort of broken one for unblitting two overlapping wavy sprites, located at the same Y coordinate. This might indicate that Elis could originally split herself into two sprites, similar to TH04 Stage 6 Yuuka? Or it might just have been some other kind of animation effect, who knows.


After over 3 months of TH01 progress though, it's finally time to look at other games, to cover the rest of the crowdfunding backlog. Next up: Going back to TH05, and getting rid of those last PI false positives. And since I can potentially spend the next 7 weeks on almost full-time ReC98 work, I've also re-opened the store until October!

📝 Posted:
🚚 Summary of:
P0034, P0035
Commits:
6cdd229...6f1f367, 6f1f367...a533b5d
💰 Funded by:
zorg
🏷 Tags:

Deathbombs confirmed, in both TH04 and TH05! On the surface, it's the same 8-frame window as in most Windows games, but due to the slightly lower PC-98 frame rate of 56.4 Hz, it's actually slightly more lenient in TH04 and TH05.

The last function in front of the TH05 shot type control functions marks the player's previous position in VRAM to be redrawn. But as it turns out, "player" not only means "the player's option satellites on shot levels ≥ 2", but also "the explosion animation if you lose a life", which required reverse-engineering both things, ultimately leading to the confirmation of deathbombs.

It actually was kind of surprising that we then had reverse-engineered everything related to rendering all three things mentioned above, and could also cover the player rendering function right now. Luckily, TH05 didn't decide to also micro-optimize that function into un-decompilability; in fact, it wasn't changed at all from TH04. Unlike the one invalidation function whose decompilation would have actually been the goal here…

But now, we've finally gotten to where we wanted to… and only got 2 outstanding decompilation pushes left. Time to get the website ready for hosting an actual crowdfunding campaign, I'd say – It'll make a better impression if people can still see things being delivered after the big announcement.