- 📝 Posted:
- 🚚 Summary of:
- P0193, P0194, P0195, P0196, P0197
- ⌨ Commits:
e1f3f9f...183d7a2
,183d7a2...5d93a50
,5d93a50...e18c53d
,e18c53d...57c9ac5
,57c9ac5...48db0b7
- 💰 Funded by:
- Ember2528, Yanga
- 🏷 Tags:
With Elis, we've not only reached the midway point in TH01's boss code, but
also a bunch of other milestones: Both REIIDEN.EXE
and TH01 as
a whole have crossed the 75% RE mark, and overall position independence has
also finally cracked 80%!
And it got done in 4 pushes again? Yup, we're back to
📝 Konngara levels of redundancy and
copy-pasta. This time, it didn't even stop at the big copy-pasted code
blocks for the rift sprite and 256-pixel circle animations, with the words
"redundant" and "unnecessary" ending up a total of 18 times in my source
code comments.
But damn is this fight broken. As usual with TH01 bosses, let's start with a
high-level overview:
- The Elis fight consists of 5 phases (excluding the entrance animation), which must be completed in order.
- In all odd-numbered phases, Elis uses a random one-shot danmaku pattern
from an exclusive per-phase pool before teleporting to a random
position.
- There are 3 exclusive girl-form patterns per phase, plus 4 additional bat-form patterns in phase 5, for a total of 13.
- Due to a quirk in the selection algorithm in phases 1 and 3, there is a 25% chance of Elis skipping an attack cycle and just teleporting again.
- In contrast to Konngara, Elis can freely select the same pattern multiple times in a row. There's nothing in the code to prevent that from happening.
- This pattern+teleport cycle is repeated until Elis' HP reach a certain threshold value. The odd-numbered phases correspond to the white (phase 1), red-white (phase 3), and red (phase 5) sections of the health bar. However, the next phase can only start at the end of each cycle, after a teleport.
- Phase 2 simply teleports Elis back to her starting screen position of (320, 144) and then advances to phase 3.
- Phase 4 does the same as phase 2, but adds the initial bat form transformation before advancing to phase 5.
- Phase 5 replaces the teleport with a transformation to the bat form. Rather than teleporting instantly to the target position, the bat gradually flies there, firing a randomly selected looping pattern from the 4-pattern bat pool on the way, before transforming back to the girl form.
This puts the earliest possible end of the fight at the first frame of phase 5. However, nothing prevents Elis' HP from reaching 0 before that point. You can nicely see this in 📝 debug mode: Wait until the HP bar has filled up to avoid heap corruption, hold ↵ Return to reduce her HP to 0, and watch how Elis still goes through a total of two patterns* and four teleport animations before accepting defeat.
But wait, heap corruption? Yup, there's a bug in the HP bar that already affected Konngara as well, and it isn't even just about the graphical glitches generated by negative HP:
- Because 📝 the backgrounds behind the points are kept in main memory, there has to be a cap on the number of hitpoints, which turns out to be 96.
- The initial fill-up animation is drawn to both VRAM pages at a rate of 1
HP per frame… by passing the current frame number as the
current_hp
number. - The
target_hp
is indicated by simply passing the current HP… - … which, however, can be reduced in debug mode at an equal rate of up to 1 HP per frame.
- The completion condition only checks if
((target_hp - 1) == current_hp)
. With the right timing, both numbers can therefore run past each other. - In that case, the function is repeatedly called on every frame, backing up the original VRAM contents for the current HP point before blitting it…
- … until frame
((96 / 2) + 1)
, where the .PTN slot pointer overflows the heap buffer and overwrites whatever comes after. 📝 Sounds familiar, right?
Since Elis starts with 14 HP, which is an even number, this corruption is
trivial to cause: Simply hold ↵ Return from the beginning of the
fight, and the completion condition will never be true
, as the
HP and frame numbers run past the off-by-one meeting point.
Regular gameplay, however, entirely prevents this due to the fixed start positions of Reimu and the Orb, the Orb's fixed initial trajectory, and the 50 frames of delay until a bomb deals damage to a boss. These aspects make it impossible to hit Elis within the first 14 frames of phase 1, and ensure that her HP bar is always filled up completely. So ultimately, this bug ends up comparable in seriousness to the 📝 recursion / stack overflow bug in the memory info screen.
These wavy teleport animations point to a quite frustrating architectural issue in this fight. It's not even the fact that unblitting the yellow star sprites rips temporary holes into Elis' sprite; that's almost expected from TH01 at this point. Instead, it's all because of this unused frame of the animation:
With this sprite still being part of BOSS5.BOS
, Girl-Elis has a
total of 9 animation frames, 1 more than the
📝 8 per-entity sprites allowed by ZUN's architecture.
The quick and easy solution would have been to simply bump the sprite array
size by 1, but… nah, this would have added another 20 bytes to all 6 of the
.BOS image slots. Instead, ZUN wrote the manual
position synchronization code I mentioned in that 2020 blog post.
Ironically, he then copy-pasted this snippet of code often enough that it
ended up taking up more than 120 bytes in the Elis fight alone – with, you
guessed it, some of those copies being redundant. Not to mention that just
going from 8 to 9 sprites would have allowed ZUN to go down from 6 .BOS
image slots to 3. That would have actually saved 420 bytes in
addition to the manual synchronization trouble. Looking forward to SinGyoku,
that's going to be fun again…
As for the fight itself, it doesn't take long until we reach its most janky danmaku pattern, right in phase 1:
- For whatever reason, the lower-right quarter of the circle isn't
animated? This animation works by only drawing the new dots added with every
subsequent animation frame, expressed as a tiny arc of a dotted circle. This
arc starts at the animation's current 8-bit angle and ends on the sum of
that angle and a hardcoded constant. In every other (copy-pasted, and
correct) instance of this animation, ZUN uses
0x02
as the constant, but this one uses…0.05
for the lower-right quarter? As in, a 64-bitdouble
constant that truncates to 0 when added to an 8-bit integer, thus leading to the start and end angles being identical and the game not drawing anything. - On Easy and Normal, the pattern then spawns 32 bullets along the outline
of the circle, no problem there. On Lunatic though, every one of these
bullets is instead turned into a narrow-angled 5-spread, resulting in 160
pellets… in a game with a pellet cap of 100.
Now, if Elis teleported herself to a position near the top of the playfield,
most of the capped pellets would have been clipped at that top edge anyway,
since the bullets are spawned in clockwise order starting at Elis' right
side with an angle of
0x00
. On lower positions though, you can definitely see a difference if the cap were high enough to allow all coded pellets to actually be spawned.
The Hard version gets dangerously close to the cap by spawning a total of 96 pellets. Since this is the only pattern in phase 1 that fires pellets though, you are guaranteed to see all of the unclipped ones. - The pellets also aren't spawned exactly on the telegraphed circle, but 4 pixels to the left.
Then again, it might very well be that all of this was intended, or, most likely, just left in the game as a happy accident. The latter interpretation would explain why ZUN didn't just delete the rendering calls for the lower-right quarter of the circle, because seriously, how would you not spot that? The phase 3 patterns continue with more minor graphical glitches that aren't even worth talking about anymore.
And then Elis transforms into her bat form at the beginning of Phase 5, which displays some rather unique hitboxes. The one against the Orb is fine, but the one against player shots…
… uses the bat's X coordinate for both X and Y dimensions. In regular gameplay, it's not too bad as most of the bat patterns fire aimed pellets which typically don't allow you to move below her sprite to begin with. But if you ever tried destroying these pellets while standing near the middle of the playfield, now you know why that didn't work. This video also nicely points out how the bat, like any boss sprite, is only ever blitted at positions on the 8×1-pixel VRAM byte grid, while collision detection uses the actual pixel position.
The bat form patterns are all relatively simple, with little variation depending on the difficulty level, except for the "slow pellet spreads" pattern. This one is almost easiest to dodge on Lunatic, where the 5-spreads are not only always fired downwards, but also at the hardcoded narrow delta angle, leaving plenty of room for the player to move out of the way:
Finally, we've got another potential timesave in the girl form's "safety circle" pattern:
After the circle spawned completely, you lose a life by moving outside it, but doing that immediately advances the pattern past the circle part. This part takes 200 frames, but the defeat animation only takes 82 frames, so you can save up to 118 frames there.
Final funny tidbit: As with all dynamic entities, this circle is only blitted to VRAM page 0 to allow easy unblitting. However, it's also kind of static, and there needs to be some way to keep the Orb, the player shots, and the pellets from ripping holes into it. So, ZUN just re-blits the circle every… 4 frames?! 🤪 The same is true for the Star of David and its surrounding circle, but there you at least get a flash animation to justify it. All the overlap is actually quite a good reason for not even attempting to 📝 mess with the hardware color palette instead.
And that's the 4th PC-98 Touhou boss decompiled, 27 to go… but wait, all these quirks, and I still got nothing about the one actual crash that can appear in regular gameplay? There has even been a recent video about it. The cause has to be in Elis' main function, after entering the defeat branch and before the blocking white-out animation. It can't be anywhere else other than in the 📝 central line blitting and unblitting function, called from 📝 that one broken laser reset+unblit function, because everything else in that branch looks fine… and I think we can rule out a crash in MDRV2's non-blocking fade-out call. That's going to need some extra research, and a 5th push added on top of this delivery.
Reproducing the crash was the whole challenge here. Even after moving Elis and Reimu to the exact positions seen in Pearl's video and setting Elis' HP to 0 on the exact same frame, everything ran fine for me. It's definitely no division by 0 this time, the function perfectly guards against that possibility. The line specified in the function's parameters is always clipped to the VRAM region as well, so we can also rule out illegal memory accesses here…
… or can we? Stepping through it all reminded me of how this function brings
unblitting sloppiness to the next level: For each VRAM byte touched, ZUN
actually unblits the 4 surrounding bytes, adding one byte to the left
and two bytes to the right, and using a single 32-bit read and write per
bitplane. So what happens if the function tries to unblit the topmost byte
of VRAM, covering the pixel positions from (0, 0) to (7, 0)
inclusive? The VRAM offset of 0x0000
is decremented to
0xFFFF
to cover the one byte to the left, 4 bytes are written
to this address, the CPU's internal offset overflows… and as it turns out,
that is illegal even in Real Mode as of the 80286, and will raise a General Protection
Fault. Which is… ignored by DOSBox-X,
every Neko Project II version in common use, the CSCP
emulators, SL9821, and T98-Next. Only Anex86 accurately emulates the
behavior of real hardware here.
OK, but no laser fired by Elis ever reaches the top-left corner of the screen. How can such a fault even happen in practice? That's where the broken laser reset+unblit function comes in: Not only does it just flat out pass the wrong parameters to the line unblitting function – describing the line already traveled by the laser and stopping where the laser begins – but it also passes them wrongly, in the form of raw 32-bit fixed-point Q24.8 values, with no conversion other than a truncation to the signed 16-bit pixels expected by the function. What then follows is an attempt at interpolation and clipping to find a line segment between those garbage coordinates that actually falls within the boundaries of VRAM:
right/bottom
correspond to a laser's origin position, andleft/top
to the leftmost pixel of its moved-out top line. The bug therefore only occurs with lasers that stopped growing and have started moving.- Moreover, it will only happen if either
(left % 256)
or(right % 256)
is ≤ 127 and the other one of the two is ≥ 128. The typecast to signed 16-bit integers then turns the former into a large positive value and the latter into a large negative value, triggering the function's clipping code. - The function then follows Bresenham's
algorithm:
left
is ensured to be smaller than right by swapping the two values if necessary. If that happened,top
andbottom
are also swapped, regardless of their value – the algorithm does not care about their order. - The slope in the X dimension is calculated using an integer division of
((bottom - top) / (right - left))
. Both subtractions are done on signed 16-bit integers, and overflow accordingly. (-left × slope_x)
is added totop
, andleft
is set to 0.- If both
top
andbottom
are < 0 or ≥ 640, there's nothing to be unblitted. Otherwise, the final coordinates are clipped to the VRAM range of [(0, 0), (639, 399)]. - If the function got this far, the line to be unblitted is now very
likely to reach from
- the top-left to the bottom-right corner, starting out at (0, 0) right away, or
- from the bottom-left corner to the top-right corner. In this case, you'd expect unblitting to end at (639, 0), but thanks to an off-by-one error, it actually ends at (640, -1), which is equivalent to (0, 0). Why add clipping to VRAM offset calculations when everything else is clipped already, right?
tl;dr: TH01 has a high chance of freezing at a boss defeat sequence if there
are diagonally moving lasers on screen, and if your PC-98 system
raises a General Protection Fault on a 4-byte write to offset
0xFFFF
, and if you don't run a TSR with an INT
0Dh
handler that might handle this fault differently.
The easiest fix option would be to just remove the attempted laser unblitting entirely, but that would also have an impact on this game's… distinctive visual glitches, in addition to touching a whole lot of code bytes. If I ever get funded to work on a hypothetical TH01 Anniversary Edition that completely rearchitects the game to fix all these glitches, it would be appropriate there, but not for something that purports to be the original game.
(Sidenote to further hype up this Anniversary Edition idea for PC-98 hardware owners: With the amount of performance left on the table at every corner of this game, I'm pretty confident that we can get it to work decently on PC-98 models with just an 80286 CPU.)
Since we're in critical infrastructure territory once again, I went for the
most conservative fix with the least impact on the binary: Simply changing
any VRAM offsets >= 0xFFFD
to 0x0000
to avoid
the GPF, and leaving all other bugs in place. Sure, it's rather lazy and
"incorrect"; the function still unblits a 32-pixel block there, but adding a
special case for blitting 24 pixels would add way too much code. And
seriously, it's not like anything happens in the 8 pixels between
(24, 0) and (31, 0) inclusive during gameplay to begin with.
To balance out the additional per-row if()
branch, I inlined
the VRAM page change I/O, saving two function calls and one memory write per
unblitted row.
That means it's time for a new community_choice_fixes
build, containing the new definitive bugfixed versions of these games:
2022-05-31-community-choice-fixes.zip
Check the th01_critical_fixes
branch for the modified TH01 code. It also contains a fix for the HP bar
heap corruption in test or debug mode – simply changing the ==
comparison to <=
is enough to avoid it, and negative HP will
still create aesthetic glitch art.
Once again, I then was left with ½ of a push, which I finally filled with
some FUUIN.EXE
code, specifically the verdict screen. The most
interesting part here is the player title calculation, which is quite
sneaky: There are only 6 skill levels, but three groups of
titles for each level, and the title you'll see is picked from a random
group. It looks like this is the first time anyone has documented the
calculation?
As for the levels, ZUN definitely didn't expect players to do particularly
well. With a 1cc being the standard goal for completing a Touhou game, it's
especially funny how TH01 expects you to continue a lot: The code has
branches for up to 21 continues, and the on-screen table explicitly leaves
room for 3 digits worth of continues per 5-stage scene. Heck, these
counts are even stored in 32-bit long
variables.
Next up: 📝 Finally finishing the long overdue Touhou Patch Center MediaWiki update work, while continuing with Kikuri in the meantime. Originally I wasn't sure about what to do between Elis and Seihou, but with Ember2528's surprise contribution last week, y'all have demonstrated more than enough interest in the idea of getting TH01 done sooner rather than later. And I agree – after all, we've got the 25th anniversary of its first public release coming up on August 15, and I might still manage to completely decompile this game by that point…