Oh look, it's another rather short and straightforward boss with a rather
small number of bugs and quirks. Yup, contrary to the character's
popularity, Mima's premiere is really not all that special in terms of code,
and continues the trend established with
📝 Kikuri and
📝 SinGyoku. I've already covered
📝 the initial sprite-related bugs last November,
so this post focuses on the main code of the fight itself. The overview:
The TH01 Mima fight consists of 3 phases, with phases 1 and 3 each
corresponding to one half of the 12-HP bar.
📝 Just like with SinGyoku, the distinction
between the red-white and red parts is purely visual once again, and doesn't
reflect anything about the boss script. As usual, all of the phases have to
be completed in order.
Phases 1 and 3 cycle through 4 danmaku patterns each, for a total of 8.
The cycles always start on a fixed pattern.
3 of the patterns in each phase feature rotating white squares, thus
introducing a new sprite in need of being unblitted.
Phase 1 additionally features the "hop pattern" as the last one in its
cycle. This is the only pattern where Mima leaves the seal in the center of
the playfield to hop from one edge of the playfield towards the other, while
also moving slightly higher up on the Y axis, and staying on the final
position for the next pattern cycle. For the first time, Mima selects a
random starting edge, which is then alternated on successive cycles.
Since the square entities are local to the respective pattern function,
Phase 1 can only end once the current pattern is done, even if Mima's HP are
already below 6. This makes Mima susceptible to the
📝 debug mode HP bar heap corruption bug.
Phase 2 simply consists of a spread-in teleport back to Mima's initial
position in the center of the playfield. This would only have been strictly
necessary if phase 1 ended on the hop pattern, but is done regardless of the
previous pattern, and does provide a nice visual separation between the two
That's it – nothing special in Phase 3.
And there aren't even any weird hitboxes this time. What is maybe
special about Mima, however, is how there's something to cover about all of
her patterns. Since this is TH01, it's won't surprise anyone that the
rotating square patterns are one giant copy-pasta of unblitting, updating,
and rendering code. At least ZUN placed the core polar→Cartesian
transformation in a separate function for creating regular polygons
with an arbitrary number of sides, which might hint toward some more varied
shapes having been planned at one point?
5 of the 6 patterns even follow the exact same steps during square update
Calculate square corner coordinates
Unblit the square
Update the square angle and radius
Use the square corner coordinates for spawning pellets or missiles
Recalculate square corner coordinates
Render the square
Notice something? Bullets are spawned before the corner coordinates
are updated. That's why their initial positions seem to be a bit off – they
are spawned exactly in the corners of the square, it's just that it's
the square from 8 frames ago.
Once ZUN reached the final laser pattern though, he must have noticed that
there's something wrong there… or maybe he just wanted to fire those
lasers independently from the square unblit/update/render timer for a
change. Spending an additional 16 bytes of the data segment for conveniently
remembering the square corner coordinates across frames was definitely a
When Mima isn't shooting bullets from the corners of a square or hopping
across the playfield, she's raising flame pillars from the bottom of the playfield within very specifically calculated
random ranges… which are then rendered at byte-aligned VRAM positions, while
collision detection still uses their actual pixel position. Since I don't
want to sound like a broken record all too much, I'll just direct you to
📝 Kikuri, where we've seen the exact same issue with the teardrop ripple sprites.
The conclusions are identical as well.
However, I'd say that the saddest part about this pattern is how choppy it
is, with the circle/pillar entities updating and rendering at a meager 7
FPS. Why go that low on purpose when you can just make the game render ✨
smoothly ✨ instead?
The reason quickly becomes obvious: With TH01's lack of optimization, going
for the full 56.4 FPS would have significantly slowed down the game on its
intended 33 MHz CPUs, requiring more than cheap surface-level ASM
optimization for a stable frame rate. That might very well have been ZUN's
reason for only ever rendering one circle per frame to VRAM, and designing
the pattern with these time offsets in mind. It's always been typical for
PC-98 developers to target the lowest-spec models that could possibly still
run a game, and implementing dynamic frame rates into such an engine-less
game is nothing I would wish on anybody. And it's not like TH01 is
particularly unique in its choppiness anyway; low frame rates are actually a
rather typical part of the PC-98 game aesthetic.
The final piece of weirdness in this fight can be found in phase 1's hop
pattern, and specifically its palette manipulation. Just from looking at the
pattern code itself, each of the 4 hops is supposed to darken the hardware
palette by subtracting #444 from every color. At the last hop, every color
should have therefore been reduced to a pitch-black #000,
leaving the player completely blind to the movement of the chasing pellets
for 30 frames and making the pattern quite ghostly indeed. However, that's
not what we see in the actual game:
Looking at the frame counter, it appears that something outside the
pattern resets the palette every 40 frames. The only known constant with a
value of 40 would be the invincibility frames after hitting a boss with the
Orb, but we're not hitting Mima here…
But as it turns out, that's exactly where the palette reset comes from: The
hop animation darkens the hardware palette directly, while the
📝 infamous 12-parameter boss collision handler function
unconditionally resets the hardware palette to the "default boss palette"
every 40 frames, regardless of whether the boss was hit or not. I'd classify
this as a bug: That function has no business doing periodic hardware palette
resets outside the invincibility flash effect, and it completely defies
common sense that it does.
That explains one unexpected palette change, but could this function
possibly also explain the other infamous one, namely, the temporary green
discoloration in the Konngara fight? That glitch comes down to how the game
actually uses two global "default" palettes: a default boss
palette for undoing the invincibility flash effect, and a default
stage palette for returning the colors back to normal at the end of
the bomb animation or when leaving the Pause menu. And sure enough, the
stage palette is the one with the green color, while the boss
palette contains the intended colors used throughout the fight. Sending the
latter palette to the graphics chip every 40 frames is what corrects
the discoloration, which would otherwise be permanent.
The green color comes from BOSS7_D1.GRP, the scroll background,
and that's what turns this into a clear bug: The stage palette is
only set a single time in the entire fight, at the beginning of the entrance
animation, to the palette of this image. Apart from consistency reasons, it
doesn't even make sense to set the stage palette there, as you can't enter
the Pause menu or bomb during a blocking animation function.
And just 3 lines of code later, ZUN loads BOSS8_A1.GRP, the
main background image of the fight. Moving the stage palette assignment
there would have easily prevented the discoloration.
But yeah, as you can tell, palette manipulation is complete jank in this
game. Why differentiate between a stage and a boss palette to begin with?
The blocking Pause menu function could have easily copied the original
palette to a local variable before darkening it, and then restored it after
closing the menu. It's not so easy for bombs as the intended palette could
change between the start and end of the animation, but the code could have
still been simplified a lot if there was just one global "default palette"
variable instead of two. Heck, even the other bosses who manipulate their
palettes correctly only do so because they manually synchronize the two
after every change. The proper defense against bugs that result from wild
mutation of global state is to get rid of global state, and not to put up
safety nets hidden in the middle of existing effect code.
In any case, that's Mima done! 7th PC-98 Touhou boss fully
decompiled, 24 bosses remaining, and 59 functions left in all of TH01.
In other thrilling news, my call for secondary funding priorities in new
TH01 contributions has given us three different priorities so far. This
raises an interesting question though: Which of these contributions should I
now put towards TH01 immediately, and which ones should I leave in the
backlog for the time being? Since I've never liked deciding on priorities,
let's turn this into a popularity contest instead: The contributions with
the least popular secondary priorities will go towards TH01 first, giving
the most popular priorities a higher chance to still be left over after TH01
is done. As of this delivery, we'd have the following popularity order:
TH05 (1.67 pushes), from T0182
Seihou (1 push), from T0184
TH03 (0.67 pushes), from T0146
Which means that T0146 will be consumed for TH01 next, followed by T0184 and
then T0182. I only assign transactions immediately before a delivery though,
so you all still have the chance to change up these priorities before the
Next up: The final boss of TH01 decompilation, YuugenMagan… if the current
or newly incoming TH01 funds happen to be enough to cover the entire fight.
If they don't turn out to be, I will have to pass the time with some Seihou
work instead, missing the TH01 anniversary deadline as a result.Edit (2022-07-18): Thanks to Yanga for
securing the funding for YuugenMagan after all! That fight will feature
slightly more than half of all remaining code in TH01's
REIIDEN.EXE and the single biggest function in all of PC-98
Touhou, let's go!
More than 50% of all PC-98 Touhou game code has now been
reverse-engineered! 🎉 While this number isn't equally distributed among the
games, we've got one game very close to 100% and reverse-engineered most of
the core features of two others. During the last 32 months of continuous
funding, I've averaged an overall speed of 1.11% total RE per month. That
looks like a decent prediction of how much more time it will take for 100%
across all games – unless, of course, I'd get to work towards some of the
non-RE goals in the meantime.
70 functions left in TH01, with less than 10,000 ASM instructions
remaining! Due to immense hype, I've temporarily raised the cap by 50% until
August 15. With the last TH01 pushes delivering at roughly 1.5× of the
currently calculated average speed, that should be more than enough to get
TH01 done – especially since I expect YuugenMagan to come with lots of
redundant code. Therefore, please also request a secondary priority for
these final TH01 RE contributions.
So, how did this card-flipping stage obstacle delivery get so horribly
delayed? With all the different layouts showcased in the 28 card-flipping
stages, you'd expect this to be among the more stable and bug-free parts of
the codebase. Heck, with all stage objects being placed on a 32×32-pixel
grid, this is the first TH01-related blog post this year that doesn't have
to describe an alignment-related unblitting glitch!
That alone doesn't mean that this code is free from quirky behavior though,
and we have to look no further than the first few lines of the collision
handling for round bumpers to already find a whole lot of that. Simplified,
they do the following:
Immediately, you wonder why these assignments only exist for the Y
coordinate. Sure, hitting a bumper from the left or right side should happen
less often, but it's definitely possible. Is it really a good idea to warp
the Orb to the top or bottom edge of a bumper regardless?
What's more important though: The fact that these immediate assignments
exist at all. The game's regular Orb physics work by producing a Y velocity
from the single force acting on the Orb and a gravity factor, and are
completely independent of its current Y position. A bumper collision does
also apply a new force onto the Orb further down in the code, but these
assignments still bypass the physics system and are bound to have
some knock-on effect on the Orb's movement.
To observe that effect, we just have to enter Stage 18 on the 地獄/Jigoku route, where it's particularly trivial to
reproduce. At a 📝 horizontal velocity of ±4,
these assignments are exactly what can cause the Orb to endlessly
bounce between two bumpers. As rudimentary as the Orb's physics may be, just
letting them do their work would have entirely prevented these loops:
Now, you might be thinking that these Y assignments were just an attempt to
prevent the Orb from colliding with the same bumper again on the next frame.
After all, those 24 pixels exactly correspond to ⅓ of the height of a
bumper's hitbox with an additional pixel added on top. However, the game
already perfectly prevents repeated collisions by turning off collision
testing with the same bumper for the next 7 frames after a collision. Thus,
we can conclude that ZUN either explicitly coded bumper collision handling
to facilitate these loops, or just didn't take out that code after
inevitably discovering what it did. This is not janky code, it's not a
glitch, it's not sarcasm from my end, and it's not the game's physics being
But wait. Couldn't these assignments just be a remnant from a time in
development before ZUN decided on the 7-frame delay on further
collisions? Well, even that explanation stops holding water after the next
few lines of code. Simplified, again:
What's important here is the part that's not in the code – namely,
anything that handles X velocities of -8 or +8. In those cases, the Orb
simply continues in the same horizontal direction. The manual Y assignment
is the only part of the code that actually prevents a collision there, as
the newly applied force is not guaranteed to be enough:
Forgetting to handle ⅖ of your discrete X velocity cases is simply not
something you do by accident. So we might as well say that ZUN deliberately
designed the game to behave exactly as it does in this regard.
Bumpers also come in vertical or horizontal bar shapes. Their collision
handling also turns off further collision testing for the next 7 frames, and
doesn't do any manual coordinate assignment. That's definitely a step up in
cleanliness from round bumpers, but it doesn't seem to keep in mind that the
player can fire a new shot every 4 frames when standing still. That makes it
immediately obvious why this works:
That's the most well-known case of reducing the Orb's horizontal velocity to
0 by exactly hitting it with shots in its center and then button-mashing it
through a horizontal bar. This also works with vertical bars and yields even
more interesting results there, but if we want to have any chance of
understanding what happens there, we have to first go over some basics:
Collision detection for all stage obstacles is done in row-major
order from the top-left to the bottom-right corner of the
All obstacles are collision-tested independently from each other, with
the collision response code immediately following the test.
The hitboxes for bumper bars extend far past their 32×32 sprites to make
sure that the Orb can collide with them from any side. They are a
pixel-perfect* 87×56 pixels for horizontal bars, and 57×87 pixels for
vertical ones. Yes, that's no typo, they really do differ in one pixel.
Changing the Y velocity during such a collision just involves applying a
new force with the magnitude of the negated current Y velocity, which can be
done multiple times during a frame without changing the result. This
explains why the force is correctly inverted in the clip above, despite the
Orb colliding with two bumpers simultaneously.
Lacking a similar force system, the X coordinate is simply directly
However, if that were everything the game did, kicking the Orb into a column
of vertical bumper bars would lead them to behave more like a rope that the
Orb can climb, as the initial collision with two hitboxes cancels out the
intended sign change that reflects the Orb away from the bars:
While that would have been a fun gameplay mechanic on its own, it
immediately breaks apart once you place two vertical bumper bars next to
each other. Due to how these bumper bar hitboxes extend past their sprites,
any two adjacent vertical bars will end up with the exact same hitbox in
absolute screen coordinates. Stage 17 on the 魔界
/Makai route contains exactly such a layout:
ZUN's workaround: Setting a "vertical bumper bar block flag" after any
collision with such a bar, which simply disables any collision with
any vertical bar for the next 7 frames. This quick hack made all
vertical bars work as intended, and avoided the need for involving the Orb's
X velocity in any kind of physics system.
Edit (2022-07-12): This flag only works around glitches
that would be caused by simultaneously colliding with more than one vertical
bar. The actual response to a bumper bar collision still remains unaffected,
and is very naive:
Horizontal bars always invert the Orb's Y velocity
Vertical bars invert either the Y or X velocity depending on whether
the Orb's current X velocity is 0 (Y) or not (X)
These conditions are only correct if the Orb comes in at an angle roughly
between 45° and 135° on either side of a bar. If it's anywhere close to 0°
or 180°, this response will be incorrect, and send the Orb straight
through the bar. Since the large hitboxes make this easily possible, you can
still get the Orb to climb a vertical column, or glide along a horizontal
With that clarified, we can now try mashing the Orb into these two vertical
At first, that workaround doesn't seem to make a difference here. As we
expect, the frame numbers now tell us that only one of the two bumper bars
in a row activates, but we couldn't have told otherwise as the number of
bars has no effect on newly applied Y velocity forces. On a closer look, the
Orb's rise to the top of the playfield is in fact caused by that
workaround though, combined with the unchanged top-to-bottom order of
collision testing. As soon as any bumper bar completed its 7
collision delay frames, it resets the aforementioned flag, which already
reactivates collision handling for any remaining vertical bumper bars during
the same frame. Look out for frames with both a 7 and a 1: The 7 will always
appear before the 1 in the
row-major order. Whenever this happens, the current oscillation period is
cut down from 7 to 6 frames – and because collision testing runs from top to
bottom, this will always happen during the falling part. Depending on the Y
velocity, the rising part may also be cut down to 6 frames from time to
time, but that one at least has a chance to last for the full 7
frames. This difference adds those crucial extra frames of upward movement,
which add up to send the Orb to the top. Without the flag, you'd always see
the Orb oscillating between a fixed range of the bar column.
Finally, it's the "top of playfield" force that gradually slows down the Orb
and makes sure it ultimately only moves at sub-pixel velocities, which have
no visible effect. Because
📝 the regular effect of gravity is reset with
each newly applied force, it's completely negated during most of the climb.
This even holds true once the Orb reached the top: Since the Orb requires a
negative force to repeatedly arrive up there and be bounced back, this force
will stay active for the first 5 of the 7 collision frames and not move the
Orb at all. Once gravity kicks in at the 5th frame and adds 1 to
the Y velocity, it's already too late: The new velocity can't be larger than
0.5, and the Orb only has 1 or 2 frames before the flag reset causes it to
be bounced back up to the top again.
Portals, on the other hand, turn out to be much simpler than the old
description that ended up on Touhou Wiki in October 2005 might suggest.
Everything about their teleportations is random: The destination portal, the
exit force (as an integer between -9 and +9), as well as the exit X
velocity, with each of the
📝 5 distinct horizontal velocities having an
equal chance of being chosen. Of course, if the destination portal is next
to the left or right edge of the playfield and it chooses to fire the Orb
towards that edge, it immediately bounces off into the opposite direction,
whereas the 0 velocity is always selected with a constant 20% probability.
The selection process for the destination portal involves a bit more than a
single rand() call. The game bundles all obstacles in a single
structure of dynamically allocated arrays, and only knows how many obstacles
there are in total, not per type. Now, that alone wouldn't have much
of an impact on random portal selection, as you could simply roll a random
obstacle ID and try again if it's not a portal. But just to be extra cute,
ZUN instead iterates over all obstacles, selects any non-entered portal with
a chance of ¼, and just gives up if that dice roll wasn't successful after
16 loops over the whole array, defaulting to the entered portal in that
In all its silliness though, this works perfectly fine, and results in a
chance of 0.7516(𝑛 - 1) for the Orb exiting out of the
same portal it entered, with 𝑛 being the total number of portals in a
stage. That's 1% for two portals, and 0.01% for three. Pretty decent for a
random result you don't want to happen, but that hurts nobody if it does.
The one tiny ZUN bug with portals is technically not even part of the newly
decompiled code here. If Reimu gets hit while the Orb is being sent through
a portal, the Orb is immediately kicked out of the portal it entered, no
matter whether it already shows up inside the sprite of the destination
portal. Neither of the two portal sprites is reset when this happens,
leading to "two Orbs" being visible simultaneously.
This makes very little sense no matter how you look at it. The Orb doesn't
receive a new velocity or force when this happens, so it will simply
re-enter the same portal once the gameplay resumes on Reimu's next life:
That left another ½ of a push over at the end. Way too much time to finish
FUUIN.exe, way too little time to start with Mima… but the bomb
animation fit perfectly in there. No secrets or bugs there, just a bunch of
sprite animation code wasting at least another 82 bytes in the data segment.
The special effect after the kuji-in sprites uses the same single-bitplane
32×32 square inversion effect seen at the end of Kikuri's and Sariel's
entrance animation, except that it's a 3-stack of 16-rings moving at 6, 7,
and 8 pixels per frame respectively. At these comparatively slow speeds, the
byte alignment of each square adds some further noise to the discoloration
pattern… if you even notice it below all the shaking and seizure-inducing
hardware palette manipulation.
And yes, due to the very destructive nature of the effect, the game does in
fact rely on it only being applied to VRAM page 0. While that will cause
every moving sprite to tear holes into the inverted squares along its
trajectory, keeping a clean playfield on VRAM page 1 is what allows all that
pixel damage to be easily undone at the end of this 89-frame animation.
Next up: Mima! Let's hope that stage obstacles already were the most complex
part remaining in TH01…
The "bad" news first: Expanding to Stripe in order to support Google Pay
requires bureaucratic effort that is not quite justified yet, and would only
be worth it after the next price increase.
Visualizing technical debt has definitely been overdue for a while though.
With 1 of these 2 pushes being focused on this topic, it makes sense to
summarize once again what "technical debt"
means in the context of ReC98, as this info was previously kind of scattered
over multiple blog posts. Mainly, it encompasses
any ZUN-written code
that we did name and reverse-engineer,
but which we simply moved out into dedicated files that are then
#included back into the big .ASM translation units,
without worrying about decompilation or proving undecompilability for
Technically (ha), it would also include all of master.lib, which has
always been compiled into the binaries in this way, and which will require
quite a bit of dedicated effort to be moved out into a properly linkable
library, once it's feasible. But this code has never been part of any
progress metric – in fact, 0% RE is
defined as the total number of x86 instructions in the binary minus
any library code. There is also no relation between instruction numbers and
the time it will take to finalize master.lib code, let alone a precedent of
how much it would cost.
If we now want to express technical debt as a percentage, it's clear where
the 100% point would be: when all RE'd code is also compiled in from a
translation unit outside the big .ASM one. But where would 0% be? Logically,
it would be the point where no reverse-engineered code has ever been moved
out of the big translation units yet, and nothing has ever been decompiled.
With these boundary points, this is what we get:
Not too bad! So it's 6.22% of total RE that we will have to revisit at some
point, concentrated mostly around TH04 and TH05 where it resulted from a
focus on position independence. The prices also give an accurate impression
of how much more work would be required there.
But is that really the best visualization? After all, it requires an
understanding of our definition of technical debt, so it's maybe not the
most useful measurement to have on a front page. But how about subtracting
those 6.22% from the number shown on the RE% bars? Then, we get this:
Which is where we get to the good news: Twitter surprisingly helped me out
in choosing one visualization over the other, voting
7:2 in favor of the Finalized version. While this one requires
you to manually calculate € finalized - € RE'd to
obtain the raw financial cost of technical debt, it clearly shows, for the
first time, how far away we are from the main goal of fully decompiling all
5 games… at least to the extent it's possible.
Now that the parser is looking at these recursively included .ASM files for
the first time, it needed a small number of improvements to correctly handle
the more advanced directives used there, which no automatic disassembler
would ever emit. Turns out I've been counting some directives as
instructions that never should have been, which is where the additional
0.02% total RE came from.
One more overcounting issue remains though. Some of the RE'd assembly slices
included by multiple games contain different if branches for
each game, like this:
; An example assembly file included by both TH04's and TH05's MAIN.EXE:
if (GAME eq 5)
; (Code for TH05)
; (Code for TH04)
Currently, the parser simply ignores if, else, and
endif, leading to the combined code of all branches being
counted for every game that includes such a file. This also affects the
calculated speed, and is the reason why finalization seems to be slightly
faster than reverse-engineering, at currently 471 instructions per push
compared to 463. However, it's not that bad of a signal to send: Most of the
not yet finalized code is shared between TH04 and TH05, so finalizing it
will roughly be twice as fast as regular reverse-engineering to begin with.
(Unless the code then turns out to be twice as complex than average code…
For completeness, finalization is now also shown as part of the per-commit metrics. Now it's clearly visible what I was
doing in those very slow five months between P0131 and P0140, where
the progress bar didn't move at all: Repaying 3.49% of previously
accumulated technical debt across all games. 👌
As announced, I've also implemented a new caching system for this website,
as the second main feature of these two pushes. By appending a hash string
to the URLs of static resources, your browser should now both cache them
forever and re-download them once they did change on the server. This
avoids the unnecessary (and quite frankly, embarrassing) re-requests for all
static resources that typically just return a 304 Not Modified
response. As a result, the blog should now load a bit faster on repeated
visits, especially on slower connections. That should allow me to
deliberately not paginate it for another few years, without it getting all
too slow – and should prepare us for the day when our first game
reaches 100% and the server will get smashed.
However, I am open to changing the progress blog link in the
navigation bar at the top to the list of tags, once
people start complaining.
Apart frome some more invisible correctness and QoL improvements, I've also
prepared some new funding goals, but I'll cover those once the store
reopens, next year. Syntax highlighting for code snippets would have also
been cool, but unfortunately didn't make it into those two pushes. It's
still on the list though!
Next up: Back to RE with the TH03 score file format, and other code that
Made it through almost three years without a price increase! It's been
overdue for a while, though.
With the last months being full of rather research- and documentation-heavy
pushes, I've been just about able to keep up with the existing
subscriptions. By now, the amount of quality control and documentation I
found myself putting into this project has far surpassed the raw
reverse-engineering work. Back at the beginning of 2019 when I decided on
the previous push price of 30 €, I didn't have this blog nor the current
aspirations at code quality. Neither of these have ever been reflected in
the price, and I still find it hard to put a number on them. On the other
hand, I continue to dislike the typical Patreon model of no inherent defined
obligations on my part, and no direct association of the resulting work with
the person who funded it. You might have noticed that I don't use the word
"donations" anywhere, and instead refer to them as "orders" or "purchases" –
and that's precisely for this reason.
The result, however, has been a sold-out store for pretty much all of 2021.
I can only begin to imagine how much potential revenue I've already lost
from people who might have wanted to contribute at one point, but couldn't,
and have already written off this project…
Raising prices is pretty much the only way to get the pending workload back
to a more comfortable amount. I also thought about a two-tiered system: Have
a documentation-less option for 30 €, and take 60 € for any push that should
be accompanied by a blog post. However, skimping on documentation will
compromise the quality of the code as well. Writing these blog posts
presents another chance of improving it before release, which has made quite
a difference on many occasions. And after all, this documentation is the one
thing about ReC98 that people mainly interact with. As long as we haven't
hit 100% RE, the actual code seems to be an afterthought, which is perfectly
understandable: Why start work on a bigger mod or port now if the code is
steadily improving in every aspect, and it all will be just a bit more
maintainable in a few months?
But why go more commercial then, and especially now? If the recent attention
to spaztron64's PC-98 Touhou
collection package is any indication, ReC98 has a way bigger career
potential than the dead-end RL job I found myself in. Demand for fixed
translations and replay
support is definitely there – and given that these haven't been done so
far, it's very likely that I'll end up as the one to implement such mods,
especially if that should happen before reaching 100% RE or PI. People also
still seem to want* a port to IBM-compatible DOS,
📝 even though this makes no sense? But if
this is something you all want to pay for, then sure, why not.
And even right now, working on ReC98 sure beats writing junk software using
ill-suited technologies for highly corporate clients, or living close to a
world where academic papers are valued higher than working and maintained
code. I am in fact very happy whenever I'm done with that for the day, and
get to work on ReC98! Who would have thought.
So let's try to grow this into an actual business and raise prices to match
demand, going up to a nicely divisible 60 € per push. If you all still
manage to regularly sell out the store at this level and I get to
raise prices again, I should be able to reduce RL work further and therefore
raise the cap as well. Now that I've also clarified a potential route
towards self-employment, I'm going to react to these sell-out events more
quickly, and with smaller raises. So, no further immediate doubling in the
Now, will this delay the currently highly awaited 100% completion of TH01
past August 2022, the 25th anniversary of its release? We'll see once we're
back to an almost empty backlog, after I'm done with the TH01 Sariel fight.
I'm hopeful that such a price increase will give a new voice to the goals
and priorities of less wealthy potential patrons. This crowdfunding is very
much designed to be hacked by "microtransactions" – small contributions with
specific requests that require other, larger generic contributions to be
fulfilled – and I'd like to see more of that. 😛 And even if 60 € per push
is already more than the combined fandom wants to pay, that means I can get
the 📝 16-bit build system done before the
first big 100% release. (Trust me, you really want that!)
I will still deliver the entire current backlog at the value the
contributions were originally purchased at. Due to the way the cap has to be
calculated, these contributions now appear to have doubled in value. All
existing subscriptions will then pay for half of their original pushes
starting with their respective December 2021 transaction.
Next up: A bunch of smaller website features, including:
a caching strategy for static content,
a third set of percentage bars to visualize the remaining technical
and, hopefully, some new payment methods that expand the number of
countries I can accept money from. (Yes, this was requested at one point
earlier this year!)
Secured a 30-hour RL workweek to leave plenty of time for this project,
Touhou Patch Center's commissioned MediaWiki update work is also nearing
completion, time to reopen the store! Since it's been a long time, here's
an overview of where we currently are in each game and binary, and what
the next logical step would be:
OP.EXE: Main menu and game initialization.
REIIDEN.EXE: In order: Various utility functions, the debug and Continue menus, main(), and we're done!
MAIN.EXE: Got a few variables and functions here and there, but not a single gameplay-related structure yet. The game seems to have a weird sprite system; looks like that'll be first.
MAINE.EXE: The ending sequences.
OP.EXE: Story Mode initialization, or the title screen.
MAIN.EXE: The enemy structure? Or the bullet structure, maybe? Either way, we're still missing a lot of essential gameplay structures. At least we did make 📝 first📝 steps in this game, in contrast to TH02.
MAINL.EXE: The stage end / VS dialogue followed by the stage start screen.
OP.EXE: The game title slide-in animation, followed by the High Score menu and the Music Room. The latter will be done in parallel with the TH05 version.
Turns out that this game has a 📝 shared structure for custom entity types after all. Looking at that one would get us pretty close to 100% PI, and will be required for half of the boss scripts (Kurumi, Stage 4 Reimu / Marisa, Stage 6 Yuuka, and Gengetsu). Otherwise, we could go for either
the big "Master Spark" laser, with Gengetsu's boss script afterward
HUD rendering (in parallel with TH05)
in-game dialog (in parallel with TH05)
player update code (in parallel with TH05), required for all midbosses in this game
player shot control functions
the end-of-stage bonus calculation
MAINE.EXE: High score name entry, or ending script parsing in parallel with TH05.
OP.EXE: High Score menu, followed by the Music Room. Will be done in parallel with TH04's version.
MAIN.EXE: Got quite a lot of segment splits where we could immediately continue:
midboss and boss script code, either continuing with the in-game order and the Stage 1 Sara fight, or going directly for
Mai & Yuki,
the Extra Stage midboss,
stage tile rendering
in-game dialog (in parallel with TH04)
player update code (in parallel with TH04)
items and extends
the single-color areas of boss backgrounds, which use the GRCG's TDW mode
HUD rendering (in parallel with TH04, and getting us right in front of some boss sprite animations)
player shot collision detection and rendering
MAINE.EXE: The staff roll animation, or ending script parsing in parallel with TH04.
But as always, you can request pretty much any other part of any game.
We're now at a pretty good place as far as arbitrary requests are
concerned, as I simply can't decide myself where to put all the current
pending contributions in the funding backlog. 😅 By
spending only the missing amount of money to complete any of those, you can
capture any of those "fractional" contributions towards a specific goal.
The next specific requests are going to set the priorities of this project
for quite some time! The best strategy: Spend a low amount of money on
something very specific, and watch as existing generic contributions will
necessarily have to be put towards making that specific goal happen 😛
ReC98 would highly benefit from a build server – both in order to
immediately spot issues like this one, and as a service for modders.
Even more so than the usual open-source project of its size, I would say.
But that might be exactly
because it doesn't seem like something you can trivially outsource
to one of the big CI providers for open-source projects, and quickly set
it up with a few lines of YAML.
That might still work in the beginning, and we might get by with a regular
64-bit Windows 10 and DOSBox running the exact build tools from the DevKit.
Ideally, though, such a server should really run the optimal configuration
of a 32-bit Windows 10, allowing both the 32-bit and the 16-bit build step
to run natively, which already is something that no popular CI service out
there offers. Then, we'd optimally expand to Linux, every other Windows
version down to 95, emulated PC-98 systems, other TASM versions… yeah, it'd
be a lot. An experimental project all on its own, with additional hosting
costs and probably diminishing returns, the more it expands…
I've added it as a category to the order form, let's see how much interest
there is once the store reopens (which will be at the beginning of May, at
the latest). That aside, it would 📝 also be
a great project for outside contributors!
So, technical debt, part 8… and right away, we're faced with TH03's
low-level input function, which
📝 once📝 again📝 insists on being word-aligned in a way we
can't fake without duplicating translation units.
Being undecompilable isn't exactly the best property for a function that
has been interesting to modders in the past: In 2018,
spaztron64 created an
ASM-level mod that hardcoded more ergonomic key bindings for human-vs-human
multiplayer mode: 2021-04-04-TH03-WASD-2player.zip
However, this remapping attempt remained quite limited, since we hadn't
(and still haven't) reached full position independence for TH03 yet.
There's quite some potential for size optimizations in this function, which
would allow more BIOS key groups to already be used right now, but it's not
all that obvious to modders who aren't intimately familiar with x86 ASM.
Therefore, I really wouldn't want to keep such a long and important
function in ASM if we don't absolutely have to…
… and apparently, that's all the motivation I needed? So I took the risk,
and spent the first half of this push on reverse-engineering
TCC.EXE, to hopefully find a way to get word-aligned code
segments out of Turbo C++ after all.
And there is! The -WX option, used for creating
applications, messes up all sorts of code generation aspects in weird
ways, but does in fact mark the code segment as word-aligned. We can
consider ourselves quite lucky that we get to use Turbo C++ 4.0, because
this feature isn't available in any previous version of Borland's C++
That allowed us to restore all the decompilations I previously threw away…
well, two of the three, that lookup table generator was too much of a mess
in C. But what an abuse this is. The
subtly different code generation has basically required one creative
workaround per usage of -WX. For example, enabling that option
causes the regular PUSH BP and POP BP prolog and
epilog instructions to be wrapped with INC BP and
DEC BP, for some reason:
inc bp ; ???
mov bp, sp
; [… function code …]
dec bp ; ???
Luckily again, all the functions that currently require -WX
don't set up a stack frame and don't take any parameters.
While this hasn't directly been an issue so far, it's been pretty
close: snd_se_reset(void) is one of the functions that require
word alignment. Previously, it shared a translation unit with the
immediately following snd_se_play(int new_se), which does take
a parameter, and therefore would have had its prolog and epilog code messed
up by -WX.
Since the latter function has a consistent (and thus, fakeable) alignment,
I simply split that code segment into two, with a new -WX
translation unit for just snd_se_reset(void). Problem solved –
after all, two C++ translation units are still better than one ASM
translation unit. Especially with all the
previous #include improvements.
The rest was more of the usual, getting us 74% done with repaying the
technical debt in the SHARED segment. A lot of the remaining
26% is TH04 needing to catch up with TH03 and TH05, which takes
comparatively little time. With some good luck, we might get this
done within the next push… that is, if we aren't confronted with all too
many more disgusting decompilations, like the two functions that ended this
If we are, we might be needing 10 pushes to complete this after all, but
that piece of research was definitely worth the delay. Next up: One more of
Alright, back to continuing the master.hpp transition started
in P0124, and repaying technical debt. The last blog post already
announced some ridiculous decompilations… and in fact, not a single
one of the functions in these two pushes was decompilable into
idiomatic C/C++ code.
As usual, that didn't keep me from trying though. The TH04 and TH05
version of the infamous 16-pixel-aligned, EGC-accelerated rectangle
blitting function from page 1 to page 0 was fairly average as far as
unreasonable decompilations are concerned.
The big blocker in TH03's MAIN.EXE, however, turned out to be
the .MRS functions, used to render the gauge attack portraits and bomb
backgrounds. The blitting code there uses the additional FS and GS segment
registers provided by the Intel 386… which
are not supported by Turbo C++'s inline assembler, and
can't be turned into pointers, due to a compiler bug in Turbo C++ that
generates wrong segment prefix opcodes for the _FS and
Apparently I'm the first one to even try doing that with this compiler? I
haven't found any other mention of this bug…
Compiling via assembly (#pragma inline) would work around
this bug and generate the correct instructions. But that would incur yet
another dependency on a 16-bit TASM, for something honestly quite
What we can always do, however, is using __emit__() to simply
output x86 opcodes anywhere in a function. Unlike spelled-out inline
assembly, that can even be used in helper functions that are supposed to
inline… which does in fact allow us to fully abstract away this compiler
bug. Regular if() comparisons with pseudo-registers
wouldn't inline, but "converting" them into C++ template function
specializations does. All that's left is some C preprocessor abuse
to turn the pseudo-registers into types, and then we do retain a
normal-looking poke() call in the blitting functions in the
Yeah… the result is
I may have gone too far in a few places…
One might certainly argue that all these ridiculous decompilations
actually hurt the preservation angle of this project. "Clearly, ZUN
couldn't have possibly written such unreasonable C++ code.
So why pretend he did, and not just keep it all in its more natural ASM
form?" Well, there are several reasons:
Future port authors will merely have to translate all the
pseudo-registers and inline assembly to C++. For the former, this is
typically as easy as replacing them with newly declared local variables. No
need to bother with function prolog and epilog code, calling conventions, or
the build system.
No duplication of constants and structures in ASM land.
As a more expressive language, C++ can document the code much better.
Meticulous documentation seems to have become the main attraction of ReC98
these days – I've seen it appreciated quite a number of times, and the
continued financial support of all the backers speaks volumes. Mods, on the
other hand, are still a rather rare sight.
Having as few .ASM files in the source tree as possible looks better to
casual visitors who just look at GitHub's repo language breakdown. This way,
ReC98 will also turn from an "Assembly project" to its rightful state
of "C++ project" much sooner.
And finally, it's not like the ASM versions are
gone – they're still part of the Git history.
Unfortunately, these pushes also demonstrated a second disadvantage in
trying to decompile everything possible: Since Turbo C++ lacks TASM's
fine-grained ability to enforce code alignment on certain multiples of
bytes, it might actually be unfeasible to link in a C-compiled object file
at its intended original position in some of the .EXE files it's used in.
Which… you're only going to notice once you encounter such a case. Due to
the slightly jumbled order of functions in the
📝 second, shared code segment, that might
be long after you decompiled and successfully linked in the function
And then you'll have to throw away that decompilation after all 😕 Oh
well. In this specific case (the lookup table generator for horizontally
flipping images), that decompilation was a mess anyway, and probably
helped nobody. I could have added a dummy .OBJ that does nothing but
enforce the needed 2-byte alignment before the function if I
really insisted on keeping the C version, but it really wasn't
Now that I've also described yet another meta-issue, maybe there'll
really be nothing to say about the next technical debt pushes?
Next up though: Back to actual progress
again, with TH01. Which maybe even ends up pushing that game over the 50%
And indeed, I got to end my vacation with a lot of image format and
blitting code, covering the final two formats, .GRC and .BOS. .GRC was
nothing noteworthy – one function for loading, one function for
byte-aligned blitting, and one function for freeing memory. That's it –
not even a unblitting function for this one. .BOS, on the other hand…
…has no generic (read: single/sane) implementation, and is only
implemented as methods of some boss entity class. And then again for
Sariel's dress and wand animations, and then again for Reimu's
animations, both of which weren't even part of these 4 pushes. Looking
forward to decompiling essentially the same algorithms all over again… And
that's how TH01 became the largest and most bloated PC-98 Touhou game. So
yeah, still not done with image formats, even at 44% RE.
This means I also had to reverse-engineer that "boss entity" class… yeah,
what else to call something a boss can have multiple of, that may or may
not be part of a larger boss sprite, may or may not be animated, and that
may or may not have an orb hitbox?
All bosses except for Kikuri share the same 5 global instances of this
class. Since renaming all these variables in ASM land is tedious anyway, I
went the extra mile and directly defined separate, meaningful names for
the entities of all bosses. These also now document the natural order in
which the bosses will ultimately be decompiled. So, unless a backer
requests anything else, this order will be:
(code for regular card-flipping stages)
As everyone kind of expects from TH01 by now, this class reveals yet
another… um, unique and quirky piece of code architecture. In
addition to the position and hitbox members you'd expect from a class like
this, the game also stores the .BOS metadata – width, height, animation
frame count, and 📝 bitplane pointer slot
number – inside the same class. But if each of those still corresponds to
one individual on-screen sprite, how can YuugenMagan have 5 eye sprites,
or Kikuri have more than one soul and tear sprite? By duplicating that
metadata, of course! And copying it from one entity to another
At this point, I feel like I even have to congratulate the game for not
actually loading YuugenMagan's eye sprites 5 times. But then again, 53,760
bytes of waste would have definitely been noticeable in the DOS days.
Makes much more sense to waste that amount of space on an unused C++
exception handler, and a bunch of redundant, unoptimized blitting
(Thinking about it, YuugenMagan fits this entire system perfectly. And
together with its position in the game's code – last to be decompiled
means first on the linker command line – we might speculate that
YuugenMagan was the first boss to be programmed for TH01?)
So if a boss wants to use sprites with different sizes, there's no way
around using another entity. And that's why Girl-Elis and Bat-Elis are two
distinct entities internally, and have to manually sync their position.
Except that there's also a third one for Attacking-Girl-Elis,
because Girl-Elis has 9 frames of animation in total, and the global .BOS
bitplane pointers are divided into 4 slots of only 8 images each.
Same for SinGyoku, who is split into a sphere entity, a
person entity, and a… white flash entity for all three forms,
all at the same resolution. Or Konngara's facial expressions, which also
require two entities just for themselves.
And once you decompile all this code, you notice just how much of it the
game didn't even use. 13 of the 50 bytes of the boss entity class are
outright unused, and 10 bytes are used for a movement clamping and lock
system that would have been nice if ZUN also used it outside of
Kikuri's soul sprites. Instead, all other bosses ignore this system
completely, and just
the X/Y coordinates of the boss entities directly.
As for the rendering functions, 5 out of 10 are unused. And while those
definitely make up less than half of the code, I still must have
spent at least 1 of those 4 pushes on effectively unused functionality.
Only one of these functions lends itself to some speculation. For Elis'
entrance animation, the class provides functions for wavy blitting and
unblitting, which use a separate X coordinate for every line of the
sprite. But there's also an unused and sort of broken one for unblitting
two overlapping wavy sprites, located at the same Y coordinate. This might
indicate that Elis could originally split herself into two sprites,
similar to TH04 Stage 6 Yuuka? Or it might just have been some other kind
of animation effect, who knows.
After over 3 months of TH01 progress though, it's finally time to look at
other games, to cover the rest of the crowdfunding backlog. Next up: Going
back to TH05, and getting rid of those last PI false positives. And since
I can potentially spend the next 7 weeks on almost full-time ReC98 work,
I've also re-opened the store until October!
TH01 pellets are coming up next, and for the first time, we'll have the
chance to move hardcoded sprite data from ASM land to C land. As it would
turn out, bad luck with the 2-byte alignment at the end of
REIIDEN.EXE's data segment pretty much forces us to declare
TH01's pellet sprites in C if we want to decompile the final few pellet
functions without ugly workarounds for the float literals there. And while
I could have just converted them into a C array and called it a day, it
did raise the question of when we are going to do this The Right And
Moddable Way, by auto-converting actual image files into ASM or C arrays
during the build process. These arrays are even more annoying to edit in
C, after all – unlike TASM, the old C++ we have to work with doesn't
support binary number literals, only hexadecimal or, gasp, octal.
Without the explicit funding for such a converter,
I reached out to
GitHub, asking backers and outside contributors whether they'd be in
favor of it. As something that requires no RE skills and collides with
nothing else, it would be a perfect task for C/C++ coders who want to
support ReC98 with something other than money.
And surprisingly, those still exist!
Jonathan Campbell, of
went ahead and implemented all the required functionality, within just a
few days. Thanks again! The result is probably a lot more portable than it
would have been if I had written it. Which is pretty relevant for future
port authors – any additional tooling we write ourselves should not
add to the list of problems they'll have to worry about.
Right now, all of the sprites are #included from the big ASM
dump files, which means that they have to be converted before those files
are assembled during the 32-bit build part. We could have introduced a
third distinct build step there, perhaps even a 16-bit one so that we can
use Turbo C++ 4.0J to also compile the converter… However, the more
reasonable option was to do this at the beginning of the 32-bit build
step, and add a 32-bit Windows C++ compiler to the list of tools required
for ReC98's build process.
And the best choice for ReC98 is, in fact… 🥁… the 20-year-old Borland C++
5.5 freeware release.
See the README for a lengthy justification, as well as
So yes, all sprites mentioned in the GitHub issue can now be modded by
simply editing .BMP files, using an image editor of your choice. 🖌
And now that that's dealt with, it's finally time for more actual
progress! TH01 pellets coming tomorrow.
Did WindowsTiger just cover
2% over all games on his
own? While not all of that passed my review, +1.59% RE and +1.66% PI
over all 5 games is still pretty noteworthy, and comfortably pushes TH05
over the 25% mark in RE, and the 60% mark in PI.
While I definitely do appreciate such contributions, reviewing and
adapting these to my current code organization standards also takes more
time than I'd like it to take. And taken to this level, it does
kind of undermine this crowdfunding project, causing both a literal
denial of service and exactly the stress that this crowdfunding was
designed to avoid. Most of the time, I can't merge all of that as-is
without knowingly creating annoyances down the line. But I don't want to
just ignore it either, or reject every non-perfect commit…
That's also why I let it slide this time, due to some of the RE work in
there being genuinely amazing. In the future though, be aware that your
chance of having your work merged diminishes the further you move ahead of
my current master branch. In extreme cases like this one, I'll
then just be waiting until enough generic reverse-engineering pushes have
accrued, and treat the merge as regular work.
But now, time to continue with the regular programming… I am kind
of exhausted from all of this, so no bullets for the next two
Touhou Patch Center pushes, still… Good thing there's still plenty of
simpler things with big percentage gains to be done:
WindowsTiger mostly focused on OP.EXE which I tended to
neglect, as the big MAIN executables seemed to be more
interesting to my backers. (It's not like anyone ever requestedOP to be done either – like, who even cares about boring menu
source code, right?) Good that I therefore sort of left it as low-hanging
fruit to be grabbed by outside contributors – because now, TH04's and
TH05's OP.EXE are close to 100% position-independence. The
GENSOU.SCR format is pretty much the only thing missing there
right now, so let's finally go all the way there, I'd say.
And in TH05, there's still Reimu's shot type functions left to be
With no feedback to 📝 last week's blog post,
I assume you all are fine with how things are going? Alright then, another
one towards position independence, with the same approach as before…
Since -Tom- wanted to learn something about how the PC-98
EGC is used in TH04 and TH05, I took a look at master.lib's
egc_shift_*() functions. These simply do a hardware-accelerated
memmove() of any VRAM region, and are used for screen shaking
effects. Hover over the image below for the raw effect:
Then, I finally wanted to take a look at the bullet structures, but it
required way too much reverse-engineering to even start within ¾ of
a position independence push. Even with the help of uth05win –
bullet handling was changed quite a bit from TH04 to TH05.
What I ultimately settled on was more raw, "boring" PI work based around
an already known set of functions. For this one, I looked at vector
construction… and this time, that actually made the games a little
bit more position-independent, and wasn't just all about removing
false positives from the calculation. This was one of the few sets of
functions that would also apply to TH01, and it revealed just how
chaotically that game was coded. This one commit shows three ways how ZUN
stored regular 2D points in TH01:
"regularly", like in master.lib's Point structure (X
first, Y second)
reversed, (Y first and X second), then obviously with two distinct
variables declared next to each other
… yeah. But in more productive news, this did actually lay the
groundwork for TH04 and TH05 bullet structures. Which might even be coming
up within the next big, 5-push order from Touhou Patch Center? These are
the priorities I got from them, let's see how close I can get!
So, here we have the first two pushes with an explicit focus on position
independence… and they start out looking barely different from regular
reverse-engineering? They even already deduplicate a bunch of item-related
code, which was simple enough that it required little additional work?
Because the actual work, once again, was in comparing uth05win's
interpretations and naming choices with the original PC-98 code? So that
we only ended up removing a handful of memory references there?
(Oh well, you can mod item drops now!)
So, continuing to interpret PI as a mere by-product of reverse-engineering
might ultimately drive up the total PI cost quite a bit. But alright then,
let's systematically clear out some false positives by looking at
master.lib function calls instead… and suddenly we get the PI progress we
were looking for, nicely spread out over all games since TH02. That kinda
makes it sound like useless work, only done because it's dictated by some
counting algorithm on a website. But decompilation will want to convert
all of these values to decimal anyway. We're merely doing that right now,
across all games.
Then again, it doesn't actually make any game more
position-independent, and only proves how position-independent it already
was. So I'm really wondering right now whether I should just rush
actual position independence by simply identifying structures and
their sizes, and not bother with members or false positives until that's
done. That would certainly get the job done for TH04 and TH05 in just a
few more pushes, but then leave all the proving work (and the road
to 100% PI on the front page) to reverse-engineering.
I don't know. Would it be worth it to have a game that's „maybe
fully position-independent“, only for there to maybe be rare edge
cases where it isn't?
Or maybe, continuing to strike a balance between identifying false
positives (fast) and reverse-engineering structures (slow) will continue
to work out like it did now, and make us end up close to the current
estimate, which was attractive enough to sell out the crowdfunding for the
first time… 🤔
Please give feedback! If possible, by Friday evening UTC+1, before I start
working on the next PI push, this time with a focus on TH04.