Stripe is now
properly integrated into this website as an alternative to PayPal! Now, you
can also financially support the project if PayPal doesn't work for you, or
if you prefer using a
provider out of Stripe's greater variety. It's unfortunate that I had to
ship this integration while the store is still sold out, but the Shuusou
Gyoku OpenGL backend has turned out way too complicated to be finished next
to these two pushes within a month. It will take quite a while until the
store reopens and you all can start using Stripe, so I'll just link back to
this blog post when it happens.
Integrating Stripe wasn't the simplest task in the world either. At first,
the Checkout API
seems pretty friendly to developers: The entire payment flow is handled on
the backend, in the server language of your choice, and requires no frontend
JavaScript except for the UI feedback code you choose to write. Your
backend API endpoint initiates the Stripe Checkout session, answers with a
redirect to Stripe, and Stripe then sends a redirect back to your server if
the customer completed the payment. Superficially, this server-based
approach seems much more GDPR-friendly than PayPal, because there are no
remote scripts to obtain consent for. In reality though, Stripe shares
much more potential personal data about your credit card or bank
account with a merchant, compared to PayPal's almost bare minimum of
necessary data.
It's also rather annoying how the backend has to persist the order form
information throughout the entire Checkout session, because it would
otherwise be lost if the server restarts while a customer is still busy
entering data into Stripe's Checkout form. Compare that to the PayPal
JavaScript SDK, which only POSTs back to your server after the
customer completed a payment. In Stripe's case, more JavaScript actually
only makes the integration harder: If you trigger the initial payment
HTTP request from JavaScript, you will have
to improvise a bit to avoid the CORS error when redirecting away to a
different domain.
But sure, it's all not too bad… for regular orders at least. With
subscriptions, however, things get much worse. Unlike PayPal, Stripe
kind of wants to stay out of the way of the payment process as much as
possible, and just be a wrapper around its supported payment methods. So if
customers aren't really meant to register with Stripe, how would they cancel
their subscriptions?
Answer: Through
the… merchant? Which I quite dislike in principle, because why should
you have to trust me to actually cancel your subscription after you
requested it? It also means that I probably should add some sort of UI for
self-canceling a Stripe subscription, ideally without adding full-blown user
accounts. Not that this solves the underlying trust issue, but it's more
convenient than contacting me via email or, worse, going through your bank
somehow. Here is how my solution works:
When setting up a Stripe subscription, the server will generate a random
ID for authentication. This ID is then used as a salt for a hash
of the Stripe subscription ID, linking the two without storing the latter on
my server.
The thank you page, which is parameterized with the Stripe
Checkout session ID, will use that ID to retrieve the subscription
ID via an API call to Stripe, and display it together with the above
salt. This works indefinitely – contrary to what the expiry field in the
Checkout session object suggests, Stripe sessions are indeed stored
forever. After all, Stripe also displays this session information in a
merchant's transaction log with an excessive amount of detail. It might have
been better to add my own expiration system to these pages, but this had
been taking long enough already. For now, be aware that sharing the link to
a Stripe thank you page is equivalent to sharing your subscription
cancellation password.
The salt is then used as the key for a subscription management page. To
cancel, you visit this page and enter the Stripe subscription ID to confirm.
The server then checks whether the salt and subscription ID pair belong to
each other, and sends the actual cancellation
request back to Stripe if they do.
I might have gone a bit overboard with the crypto there, but I liked the
idea of not storing any of the Stripe session IDs in the server database.
It's not like that makes the system more complex anyway, and it's nice to
have a separate confirmation step before canceling a subscription.
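The salt/subscription ID pairing from the three steps above can be sketched as follows. This is only an illustration under stated assumptions: the post names neither the hash function nor the ID format, so SHA-256, the 16-byte URL-safe salt, and both function names are hypothetical choices.

```python
import hashlib
import hmac
import secrets


def new_subscription_record(subscription_id: str) -> tuple[str, str]:
    """Create the (salt, digest) pair a server could persist.

    The Stripe subscription ID itself is not part of the return value.
    """
    # Assumption: a random URL-safe string serves as both salt and page key.
    salt = secrets.token_urlsafe(16)
    # Assumption: SHA-256 over salt + ID; the post does not name the hash.
    digest = hashlib.sha256((salt + subscription_id).encode()).hexdigest()
    return salt, digest


def may_cancel(salt: str, stored_digest: str, entered_id: str) -> bool:
    """Check whether the entered subscription ID belongs to this salt."""
    candidate = hashlib.sha256((salt + entered_id).encode()).hexdigest()
    # Constant-time comparison of the two hex digests.
    return hmac.compare_digest(candidate, stored_digest)
```

Only if `may_cancel()` returns `True` would the server go on to send the actual cancellation request to Stripe.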
But even that wasn't everything I had to keep in mind here. Once you
switch from test to production mode for the final tests, you'll notice that
certain SEPA-based
payment providers take their sweet time to process and activate new
subscriptions. The Checkout session object even informs you about that, by
including a payment status field. Which initially seems just like
another field that could indicate hacking attempts, but treating it as such
and rejecting any unpaid session can also reject perfectly valid
subscriptions. I don't want all this control… 🥲
Instead, all I can do in this case is to tell you about it. In my test, the
Stripe dashboard said that it might take days or even weeks for the initial
subscription transaction to be confirmed. In such a case, the respective
fraction of the cap will unfortunately need to remain red for that entire time.
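A minimal sketch of what not rejecting every unpaid session could look like. The `payment_status` and `mode` keys and their values mirror the fields of Stripe's Checkout session object; the three-way classification and the function name are assumptions made for illustration, not the logic that runs on this website.

```python
def classify_session(session: dict) -> str:
    """Map a Checkout session to "confirmed", "pending", or "rejected"."""
    payment_status = session.get("payment_status")
    if payment_status in ("paid", "no_payment_required"):
        return "confirmed"
    if payment_status == "unpaid" and session.get("mode") == "subscription":
        # Slow payment methods: a valid subscription whose first
        # transaction simply has not been confirmed yet.
        return "pending"
    # Assumed policy: everything else is treated as suspicious.
    return "rejected"
```

A "pending" result would correspond to the part of the cap that stays red until the transaction is confirmed.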
And that was 1½ pushes just to replicate the basic functionality of a simple
PayPal integration with the simplest type of Stripe integration. On the
architectural side, all the necessary refactoring work made me finally
upgrade my frontend code to TypeScript at least, using the amazing esbuild to handle transpilation inside
the server binary. Let's see how long it will now take for me to upgrade to
SCSS…
With the new payment options, it makes sense to go for another slight price
increase, from up to per push.
The amount of taxes I have to pay on this income is slowly becoming
significant, and the store has been selling out almost immediately for the
last few months anyway. If demand remains at the current level or even
increases, I plan to gradually go up to by the end
of the year. 📝 As usual,
I'm going to deliver existing orders in the backlog at the value they were
originally purchased at. Due to the way the cap has to be calculated, these
contributions now appear to have increased in value by a rather awkward
13.33%.
This left ½ of a push for some more work on the TH01 Anniversary Edition.
Unfortunately, this was too little time for the grand issue of removing
byte-aligned rendering of bigger sprites, which will need some additional
blitting performance research. Instead, I went for a bunch of smaller
bugfixes:
ANNIV.EXE now launches ZUNSOFT.COM if
MDRV98 wasn't resident before. In hindsight, it's completely obvious
why this is the right thing to do: Either you start
ANNIV.EXE directly, in which case there's no resident
MDRV98 and you haven't seen the ZUN Soft logo, or you have
made a single-line edit to GAME.BAT and replaced
op with anniv, in which case MDRV98 is
resident and you have seen the logo. These are the two
reasonable cases to support out of the box. If you are doing
anything else, it shouldn't be that hard to adjust though?
You might be wondering why I didn't just include all code of
ZUNSOFT.COM inside ANNIV.EXE together with
the rest of the game. The reason: ZUNSOFT.COM has
almost nothing in common with regular TH01 code. While the rest of
TH01 uses the custom image formats and bad rendering code I
documented again and again during its RE process,
ZUNSOFT.COM fully relies on master.lib for everything
about the bouncing-ball logo animation. Its code is much closer to
TH02 in that respect, which suggests that ZUN did in fact write this
animation for TH02, and just included the binary in TH01 for
consistency when he first sold both games together at Comiket 52.
Unlike the 📝 various bad reasons for splitting the PC-98 Touhou games into three main executables,
it's still a good idea to split off animations that use a completely
different set of rendering and file format functions. Combined with
all the BFNT and shape rendering code, ZUNSOFT.COM
actually contains even more unique code than OP.EXE,
and only slightly less than FUUIN.EXE.
The optional AUTOEXEC.BAT is now correctly encoded in
Shift-JIS instead of accidentally being UTF-8, fixing the previous
mojibake in its final ECHO line.
The command-line option that just adds a stage selection without
other debug features (anniv s) now works reliably.
This one's quite interesting because it only actually ever worked
because of a ZUN bug. From a superficial look at the code, it
shouldn't, because ZUN forgot to initialize the debug flag inside
the resident structure when specifying the s parameter.
However, master.lib's resdata_create() function doesn't
clear the resident structure after allocation. So if anything on the
system previously happened to write something other than
0x00, 0x01, or 0x03 to the
debug flag byte, the lack of initialization combined with ZUN
actually turns into a distinct non-test and non-debug stage
selection mode.
This is what happens on a certain widely circulated .HDI copy of
TH01 that boots MS-DOS 3.30C. On this system, the memory that
master.lib will allocate to the TH01 resident structure was
previously used by DOS as stack for its kernel, which left the
future resident debug flag byte at address 9FF6:0012 at
a value of 0x12. This might be the entire reason why
game s is even widely documented to trigger a stage
selection to begin with – on the widely circulated TH04 .HDI that
boots MS-DOS 6.20, or on DOSBox-X, the s parameter
doesn't work because both DOS systems leave the resident debug flag
byte at 0x00. And since ANNIV.EXE pushes
MDRV98 into that area of conventional DOS RAM, anniv s
previously didn't work even on MS-DOS 3.30C.
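To make the dependency on leftover memory explicit, here is a small model of the original behavior. The flag values are the ones mentioned above; the mode labels and function names are hypothetical, and the model doesn't say which of 0x01 and 0x03 is the test mode and which is the debug mode.

```python
def mode_for_flag(flag: int) -> str:
    """Interpret the resident debug flag byte as described above."""
    if flag == 0x00:
        return "regular game"
    if flag in (0x01, 0x03):
        return "test or debug mode"
    # Any other value: stage selection without other debug features.
    return "stage selection only"


def original_s_parameter(leftover_byte: int) -> str:
    """Model of the original `s` parameter.

    The flag is never initialized on this code path and the allocation
    is not cleared, so the mode depends on whatever byte was there before.
    """
    return mode_for_flag(leftover_byte)
```

With the 0x12 left behind on the MS-DOS 3.30C image, this yields the stage selection; with the 0x00 found on the other systems, it yields a regular game.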
Both bugs in the
📝 1×1 particle system during the Mima fight
have been fixed. These include the off-by-one error that killed off the
very first particle on the 80th
frame and left it in VRAM, and, just like every other entity type, a
replacement of ZUN's EGC unblitter with the new pixel-perfect and fast
one. Until I've rearchitected unblitting as a whole, the particles will
now merely rip barely visible 1×1 holes into the sprites they overlap.
The bomb value shown in the lowest line of the in-game
debug mode output is now right-aligned together with the rest of the
values. This ensures that the game always writes a consistent number
of characters to TRAM, regardless of the magnitude of the
bomb value, preventing the seemingly wrong
timer values that appeared in the original game
whenever the value of the bomb variable changed to a
lower number of digits:
Finally, I've streamlined VRAM page access changes, which allowed me to
consistently replace ZUN's expensive function call with the optimal two
inlined x86 instructions. Interestingly, this change alone removed
2 KiB from the binary size, which is almost all of the difference
between 📝 the P0234-1 release and this
one. Let's see how much longer we can make each new release of
ANNIV.EXE smaller than the previous one.
The final point, however, raised the question of what we're now going to do
about
📝 a certain issue in the 地獄/Jigoku Bad Ending.
ZUN's original expensive way of switching the accessed VRAM page was the
main reason behind the lag frames on slower PC-98 systems, and
search-replacing the respective function calls would immediately get us to
the optimized version shown in that blog post. But is this something we
actually want? If we wanted to retain the lag, we could surely preserve that
function just for this one instance… The discovery of this issue
predates the clear distinction between bloat, quirks, and bugs, so it makes
sense to first classify what this issue even is. The distinction comes all
down to observability, which I defined as changes to rendered frames
between explicitly defined frame boundaries. That alone would be enough to
categorize any cause behind lag frames as bloat, but it can't hurt to be
more explicit here.
Therefore, I now officially judge observability in terms of an infinitely
fast PC-98 that can instantly render everything between two explicitly
defined frames, and will never add additional lag frames. If we plan to port
the games to faster architectures that aren't bottlenecked by disappointing
blitter chips, this is the only reasonable assumption to make, in my
opinion: The minimum system requirements in the games' README files are
minimums, after all, not recommendations. Chasing the exact frame
drop behavior that ZUN must have experienced during the time he developed
these games can only be a guessing game at best, because how can we know
which PC-98 model ZUN actually developed the games on? There might even be
more than one model, especially when it comes to TH01 which had been in
development for at least two years before ZUN first sold it. It's also not
like any current PC-98 emulator even claims to emulate the specific timing
of any existing model, and I sure hope that nobody expects me to import a
bunch of bulky obsolete hardware just to count dropped frames.
That leaves the tearing, where it's much more obvious how it's a bug. On an
infinitely fast PC-98, the ドカーン
frame would never be visible, and thus falls into the same category as the
📝 two unused animations in the Sariel fight.
With only a single unconditional 2-frame delay inside the animation loop, it
becomes clear that ZUN intended both frames of the animation to be displayed
for 2 frames each:
No tearing, and 34 frames in total for the first of the two
instances of this animation.
Next up: Taking the oldest still undelivered push and working towards TH04
position independence in preparation for multilingual translations. The
Shuusou Gyoku OpenGL backend shouldn't take that much longer either,
so I should have lots of stuff coming up in May afterward.
Last blog post before the 100% completion of TH01! The final parts of
REIIDEN.EXE would feel rather out of place in a celebratory
blog post, after all. They provided quite a neat summary of the typical
technical details that are wrong with this game, and that I now get to
mention for one final time:
The Orb's animation cycle is maybe two frames shorter than it should
have been, showing its last sprite for just 1 frame rather than 3:
The text in the Pause and Continue menus is not quite correctly
centered.
The memory info screen hides quite a bit of information about the .PTN
buffers, and obscures even the info that it does show behind
misleading labels. The most vital information would have been that ZUN could
have easily saved 20% of the memory by using a structure without the
unneeded alpha plane… Oh, and the REWIRTE option
mapped to the ⬇️ down arrow key simply redraws the info screen. Might be
useful after a NODE CHEAK, which replaces the output
with its own, but stays within the same input loop.
But hey, there's an error message if you start REIIDEN.EXE
without a resident MDRV2 or a correctly prepared resident structure! And
even a good, user-friendly one, asking the user to launch the batch file
instead. For some reason, this convenience went out of fashion in the later
games.
The Game Over animation (how fitting) gives us TH01's final piece of weird
sprite blitting code, which seriously manages to include 2 bugs and 3 quirks
in under 50 lines of code. In debug mode, you can trigger this effect by
pressing the ⬇️ down arrow key, which certainly explains why I encountered
seemingly random Game Over events during all the tests I did with this
game…
The animation appears to have changed quite a bit during development, to the
point that probably even ZUN himself didn't know what he wanted it to look
like in the end:
The original version unblits a 32×32 rectangle around Reimu that only
grows on the X axis… for the first 5 frames. The unblitting call is
only run if the corresponding sprite wasn't clipped at the edges of the
playfield in the frame before, and ZUN uses the animation's frame
number rather than the sprite loop variable to index the per-sprite
clip flag array. The resulting out-of-bounds access then reads the
sprite coordinates instead, which are never 0, thus interpreting
all 5 sprites as clipped.
This variant would interpret the declared 5 effect coordinates as
distinct sprites and unblit them correctly every frame. The end result
is rather wimpy though… hardly appropriate for a Game Over, especially
with the original animation in mind.
This variant would not unblit anything, and is probably closest to what
the final animation should have been.
Finally, we get to the big main() function, serving as the duct
tape that holds this game together. It may read rather disorganized with all
the (actually necessary) assignments and function calls, but the only
actual minor issue I've seen there is that you're robbed of any
pellet destroy bonus collected on the final frame of the final boss. There
is a certain charm in directly nesting the infinite main gameplay loop
within the infinite per-life loop within the infinite stage loop. But come
on, why is there no fourth scene loop? Instead, the
game just starts a new REIIDEN.EXE process before and after a
boss fight. With all the wildly mutated global state, that was probably a
much saner choice.
The final secrets can be found in the debug mode stage selection. ZUN
implemented the prompts using the C standard library's scanf()
function, which is the natural choice for quick-and-dirty testing features
like this one. However, the C standard library is also complete and utter
trash, and so it's not surprising that both of the scanf()
calls do… well, probably not what ZUN intended. The guaranteed out-of-bounds
memory access in the select_flag route prompt thankfully has no
real effect on the game, but it gets really interesting with the 面数 stage prompt.
Back in 2020, I already wrote about
📝 stages 21-24, and how they're loaded from actual data that ZUN shipped with the game.
As it now turns out, the code that maps stage IDs to STAGE?.DAT
scene numbers contains an explicit branch that maps any (1-based) stage
number ≥21 to scene 7. Does this mean that an Extra Stage was indeed planned
at some point? That branch seems way too specific to just be meant as a
fallback. Maybe
Asprey was on to something after all…
However, since ZUN passed the stage ID as a signed integer to
scanf(), you can also enter negative numbers. The only place
that kind of accidentally checks for them is the aforementioned stage
ID → scene mapping, which ensures that (1-based) stages < 5 use
the shrine's background image and BGM. With no checks anywhere else, we get
a new set of "glitch stages":
Stage -1, Stage -2, Stage -3, Stage -4, Stage -5
The scene loading function takes the entered 0-based stage ID value modulo
5, so these 4 are the only ones that "exist", and lower stage numbers will
simply loop around to them. When loading these stages, the function accesses
the data in REIIDEN.EXE that lies before the statically
allocated 5-element stages-of-scene array, which happens to encompass
Borland C++'s locale and exception handling data, as well as a small bit of
ZUN's global variables. In particular, the obstacle/card HP on the tile I
highlighted in green corresponds to the
lowest byte of the 32-bit RNG seed. If it weren't for that and the fact that
the obstacles/card HP on the few tiles before are similarly controlled by
the x86 segment values of certain initialization function addresses, these
glitch stages would be completely deterministic across PC-98 systems, and
technically canon…
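The wraparound relies on the remainder keeping the sign of the dividend, as it does with x86 `IDIV`. Python's `%` rounds differently, so the sketch below emulates the truncating variant explicitly. The helper name is hypothetical, and treating the stage labels above as the 0-based IDs passed to the modulo is an assumption.

```python
def c_mod(a: int, b: int) -> int:
    """Remainder that truncates toward zero; the sign follows `a`."""
    r = abs(a) % abs(b)
    return -r if a < 0 else r


STAGES_PER_SCENE = 5

# Index into the 5-element stages-of-scene array for a few stage IDs.
# Negative indices stand for reads *before* the start of the array.
indices = {stage: c_mod(stage, STAGES_PER_SCENE) for stage in range(-10, 1)}
```

Here, `indices[-5]` is 0 (the first regular stage), while `indices[-6]` is -1 again, which is why only the offsets -1 to -4 lead to distinct glitch stages.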
Stage -4 is the only playable one here as it's the only stage to end up
below the
📝 heap corruption limit of 102 stage objects.
Completing it loads Stage -3, which crashes with a Divide Error
just like it does if it's directly selected. Unsurprisingly, this happens
because all 50 card bytes at that memory location are 0, so one division (or
in this case, modulo operation) by the number of cards is enough to crash
the game.
Stage -5 is modulo'd to 0 and thus loads the first regular stage. The only
apparent broken element there is the timer, which is handled by a completely
different function that still operates with a (0-based) stage ID value of
-5. Completing the stage loads Stage -4, which also crashes, but only
because its 61 cards naturally cause the
📝 stack overflow in the flip-in animation for any stage with more than 50 cards.
And that's REIIDEN.EXE, the biggest and most bloated PC-98
Touhou executable, fully decompiled! Next up: Finishing this game with the
main menu, and hoping I'll actually pull it off within 24 hours. (If I do,
we might all have to thank 32th
System, who independently decompiled half of the remaining 14
functions…)
Wow, it's been 3 days and I'm already back with an unexpectedly long post
about TH01's bonus point screens? 3 days used to take much longer in my
previous projects…
Before I talk about graphics for the rest of this post, let's start with the
exact calculations for both bonuses. Touhou Wiki already got these right,
but it still makes sense to provide them here, in a format that allows you
to cross-reference them with the source code more easily. For the
card-flipping stage bonus:
Time: min((Stage timer * 3), 6553)
Continuous: min((Highest card combo * 100), 6553)
Bomb&Player: min(((Lives * 200) + (Bombs * 100)), 6553)
STAGE: min(((Stage number - 1) * 200), 6553)
BONUS Point: Sum of all above values * 10
The boss stage bonus is calculated from the exact same metrics, despite half
of them being labeled differently. The only actual differences are in the
higher multipliers and in the cap for the stage number bonus. Why remove it
if raising it high enough also effectively disables it?
Time: min((Stage timer * 5), 6553)
Continuous: min((Highest card combo * 200), 6553)
MIKOsan: min(((Lives * 500) + (Bombs * 200)), 6553)
Clear: min((Stage number * 1000), 65530)
TOTLE: Sum of all above values * 10
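For cross-checking, both bonus formulas can be written down as two small functions. This is a direct transcription of the lists above into Python; the function and parameter names are made up for this sketch, and it ignores the integer widths of the original code.

```python
def stage_bonus(timer: int, combo: int, lives: int, bombs: int, stage: int) -> int:
    """Card-flipping stage bonus ("BONUS Point")."""
    return 10 * (
        min(timer * 3, 6553)                     # Time
        + min(combo * 100, 6553)                 # Continuous
        + min(lives * 200 + bombs * 100, 6553)   # Bomb&Player
        + min((stage - 1) * 200, 6553)           # STAGE
    )


def boss_bonus(timer: int, combo: int, lives: int, bombs: int, stage: int) -> int:
    """Boss stage bonus ("TOTLE")."""
    return 10 * (
        min(timer * 5, 6553)                     # Time
        + min(combo * 200, 6553)                 # Continuous
        + min(lives * 500 + bombs * 200, 6553)   # MIKOsan
        + min(stage * 1000, 65530)               # Clear
    )
```

As an example, `stage_bonus(1000, 10, 3, 1, 2)` sums 3000 + 1000 + 700 + 200 and multiplies by 10, while `boss_bonus(1000, 10, 3, 1, 5)` sums 5000 + 2000 + 1700 + 5000 before the same multiplication.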
The transition between the gameplay and TOTLE screens is one of the more
impressive effects showcased in this game, especially due to how wavy it
often tends to look. Aside from the palette interpolation (which is, by the
way, the first time ZUN wrote a correct interpolation algorithm between two
4-bit palettes), the core of the effect is quite simple. With the TOTLE
image blitted to VRAM page 1:
Shift the contents of a line on VRAM page 0 by 32 pixels, alternating
the shift direction between right edge → left edge (even Y
values) and the other way round (odd Y values)
Keep a cursor for the destination pixels on VRAM page 1 for every line,
starting at the respective opposite edge
Blit the 32 pixels at the VRAM page 1 cursor to the newly freed 32
pixels on VRAM page 0, and advance the cursor towards the other edge
Successive line shifts will then include these newly blitted 32 pixels
as well
Repeat (640 / 32) = 20 times, after which all new pixels
will be in their intended place
So it's really more like two interlaced shift effects with opposite
directions, starting on different scanlines. No trigonometry involved at
all.
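The per-line part of the effect can be modeled with plain lists standing in for one row of each VRAM page. This sketch leaves out the palette interpolation and the staggered start of the lines, and reads "right edge → left edge" as the pixels of even lines moving leftwards; all names are hypothetical.

```python
WIDTH = 640
BLOCK = 32
STEPS = WIDTH // BLOCK  # 20


def shift_step(row: list, target: list, step: int, y: int) -> list:
    """Return `row` (page 0) after shift number `step` on line `y`.

    `target` is the same line on page 1, holding the new image.
    """
    if y % 2 == 0:
        # Even lines: drop the leftmost block, fill the freed right edge.
        # The page 1 cursor starts at the left edge and moves right.
        cursor = step * BLOCK
        return row[BLOCK:] + target[cursor:cursor + BLOCK]
    # Odd lines: mirrored. Drop the rightmost block, fill the left edge.
    # The page 1 cursor starts at the right edge and moves left.
    cursor = WIDTH - (step + 1) * BLOCK
    return target[cursor:cursor + BLOCK] + row[:-BLOCK]


def run_line(row: list, target: list, y: int) -> list:
    """Apply all 20 shifts to a single line."""
    for step in range(STEPS):
        row = shift_step(row, target, step, y)
    return row
```

Because every later shift also moves the previously blitted blocks, the first block taken from page 1 ends up at the far edge after the 20th step, and the line matches page 1 exactly.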
Horizontally scrolling pixels on a single VRAM page remains one of the few
📝 appropriate uses of the EGC in a fullscreen 640×400 PC-98 game,
regardless of the copied block size. The few inter-page copies in this
effect are also reasonable: With 8 new lines starting on each effect frame,
up to (8 × 20) = 160 lines are transferred at any given time, resulting
in a maximum of (160 × 2 × 2) = 640 VRAM page switches per frame for the newly
transferred pixels. Not that frame rate matters in this situation to begin
with though, as the game is doing nothing else while playing this effect.
What does sort of matter: Why 32 pixels every 2 frames, instead of 16
pixels on every frame? There's no performance difference between doing one
half of the work in one frame, or two halves of the work in two frames. It's
not like the overhead of another loop has a serious impact here,
especially with the PC-98 VRAM being said to have rather high
latencies. 32 pixels over 2 frames is also harder to code, so ZUN
must have done it on purpose. Guess he really wanted to go for that 📽
cinematic 30 FPS look 📽 here…
Removing the palette interpolation and transitioning from a black screen
to CLEAR3.GRP makes it a lot clearer how the effect works.
Once all the metrics have been calculated, ZUN animates each value with a
rather fancy left-to-right typing effect. As 16×16 images that use a single
bright-red color, these numbers would be
perfect candidates for gaiji… except that ZUN wanted to render them at the
more natural Y positions of the labels inside CLEAR3.GRP that
are far from aligned to the 8×16 text RAM grid. Not having been in the mood
for hardcoding another set of monochrome sprites as C arrays that day, ZUN
made the still reasonable choice of storing the image data for these numbers
in the single-color .GRC form– yeah, no, of course he once again
chose the .PTN hammer, and its
📝 16×16 "quarter" wrapper functions around nominal 32×32 sprites.
The three 32×32 TOTLE metric digit sprites inside
NUMB.PTN.
Why do I bring up such a detail? What's actually going on there is that ZUN
loops through and blits each digit from 0 to 9, and then continues the loop
with "digit" numbers from 10 to 19, stopping before the number whose ones
digit equals the one that should stay on screen. No problem with that in
theory, and the .PTN sprite selection is correct… but the .PTN
quarter selection isn't, as ZUN wrote (digit % 4)
instead of the correct ((digit % 10) % 4).
Since .PTN quarters are indexed in a row-major
way, the 10-19 part of the loop thus ends up blitting
2 → 3 → 0 → 1 → 6 → 7 → 4 → 5 → (nothing):
This footage was slowed down to show one sprite blitting operation per
frame. The actual game waits a hardcoded 4 milliseconds between each
sprite, so even theoretically, you would only see roughly every
4th digit. And yes, we can also observe the empty quarter
here, only blitted if one of the digits is a 9.
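The quirk is easy to reproduce in a few lines. The sketch assumes that the ten digits are spread across the three 32×32 sprites in order, four per sprite in row-major quarter order, with the last two quarters of the third sprite left empty; this layout is inferred from the sequence above, and the names are hypothetical.

```python
# Three sprites with four 16x16 quarters each; None marks an empty quarter.
SPRITES = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, None, None],
]


def shown_digit(digit: int, fixed: bool = False):
    """Return the glyph that ends up on screen for loop value `digit`."""
    sprite = (digit % 10) // 4  # the sprite selection is correct
    if fixed:
        quarter = (digit % 10) % 4  # what the selection should have been
    else:
        quarter = digit % 4  # what ZUN wrote
    return SPRITES[sprite][quarter]


original = [shown_digit(d) for d in range(10, 20)]
corrected = [shown_digit(d, fixed=True) for d in range(10, 20)]
```

`original` evaluates to 2, 3, 0, 1, 6, 7, 4, 5, followed by two empty quarters for 18 and 19, while `corrected` counts from 0 to 9 as intended. For loop values below 10, both variants agree.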
Seriously though? If the deadline is looming and you've got to rush
some part of your game, a standalone screen that doesn't affect
anything is the best place to pick. At 4 milliseconds per digit, the
animation goes by so fast that this quirk might even add to its
perceived fanciness. It's exactly the reason why I've always been rather
careful with labeling such quirks as "bugs". And in the end, the code does
perform one more blitting call after the loop to make sure that the correct
digit remains on screen.
The remaining ¾ of the second push went towards transferring the final data
definitions from ASM to C land. Most of the details there paint a rather
depressing picture about ZUN's original code layout and the bloat that came
with it, but it did end on a real highlight. There was some unused data
between ZUN's non-master.lib VSync and text RAM code that I just moved away
in September 2015 without taking a closer look at it. Those bytes kind of
look like another hardcoded 1bpp image though… wait, what?!
Lovely! With no mouse-related code left in the game otherwise, this cursor
sprite provides some great fuel for wild fan theories about TH01's
development history:
Could ZUN have 📝 stolen the basic PC-98
VSync or text RAM function code from a source that also implemented mouse
support?
Or was this game actually meant to have mouse-controllable portions at
some point during development? Even if it would have just been the
menus.
… Actually, you know what, with all shared data moved to C land, I might as
well finish FUUIN.EXE right now. The last secret hidden in its
main() function: Just like GAME.BAT supports
launching the game in a debug mode from the DOS command line,
FUUIN.EXE can directly launch one of the game's endings. As
long as the MDRV2 driver is installed, you can enter
fuuin t1 for the 魔界/Makai Good Ending, or
fuuin t for the 地獄/Jigoku Good Ending.
Unfortunately, the command-line parameter can only control the route.
Choosing between a Good or Bad Ending is still done exclusively through
TH01's resident structure, and the continues_per_scene array in
particular. But if you pre-allocate that structure somehow and set one of
the members to a nonzero value, it would work. Trainers, anyone?
Alright, gotta get back to the code if I want to have any chance of
finishing this game before the 15th… Next up: The final 17
functions in REIIDEN.EXE that tie everything together and add
some more debug features on top.