Stripe is now
properly integrated into this website as an alternative to PayPal! Now, you
can also financially support the project if PayPal doesn't work for you, or
if you prefer one of
the greater variety of payment providers that Stripe supports. It's unfortunate that I had to
ship this integration while the store is still sold out, but the Shuusou
Gyoku OpenGL backend has turned out way too complicated to be finished
alongside these two pushes within a month. It will take quite a while until the
store reopens and you all can start using Stripe, so I'll just link back to
this blog post when it happens.
Integrating Stripe wasn't the simplest task in the world either. At first,
the Checkout API
seems pretty friendly to developers: The entire payment flow is handled on
the backend, in the server language of your choice, and requires no frontend
JavaScript except for the UI feedback code you choose to write. Your
backend API endpoint initiates the Stripe Checkout session, answers with a
redirect to Stripe, and Stripe then sends a redirect back to your server if
the customer completed the payment. Superficially, this server-based
approach seems much more GDPR-friendly than PayPal, because there are no
remote scripts to obtain consent for. In reality though, Stripe shares
much more potentially personal data about your credit card or bank
account with a merchant, compared to PayPal's almost bare minimum of
necessary data.
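For reference, a Checkout endpoint of this kind can be sketched in a few lines of Go. This assumes the official stripe-go client and a pre-created Price object; the import path, price ID, URLs, and handler name are all placeholders rather than anything from this website's actual code:

package main

import (
	"net/http"

	"github.com/stripe/stripe-go/v74"
	"github.com/stripe/stripe-go/v74/checkout/session"
)

// Hypothetical sketch of a backend endpoint that starts a Checkout session
// and redirects the customer to Stripe's hosted payment page.
func checkoutHandler(w http.ResponseWriter, r *http.Request) {
	stripe.Key = "sk_test_…" // secret API key

	params := &stripe.CheckoutSessionParams{
		Mode: stripe.String(string(stripe.CheckoutSessionModePayment)),
		LineItems: []*stripe.CheckoutSessionLineItemParams{{
			Price:    stripe.String("price_example"), // placeholder Price ID
			Quantity: stripe.Int64(1),
		}},
		// Stripe fills in the {CHECKOUT_SESSION_ID} placeholder itself.
		SuccessURL: stripe.String("https://example.com/thankyou?session_id={CHECKOUT_SESSION_ID}"),
		CancelURL:  stripe.String("https://example.com/order"),
	}
	s, err := session.New(params)
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	// Send the customer off to Stripe; they only come back to the success
	// URL above after completing (or canceling) the payment.
	http.Redirect(w, r, s.URL, http.StatusSeeOther)
}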
It's also rather annoying how the backend has to persist the order form
information throughout the entire Checkout session, because it would
otherwise be lost if the server restarts while a customer is still busy
entering data into Stripe's Checkout form. Compare that to the PayPal
JavaScript SDK, which only POSTs back to your server after the
customer completed a payment. In Stripe's case, more JavaScript actually
only makes the integration harder: If you trigger the initial payment
HTTP request from JavaScript, you will have
to improvise a bit to avoid the CORS error when redirecting away to a
different domain.
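One common way to improvise here – not necessarily what this site ended up doing – is to not answer the JavaScript-initiated request with a 303 at all, but to return the Checkout URL as plain data and let the frontend assign it to window.location itself, which sidesteps any cross-origin redirect inside fetch(). Sketched on the backend side, with made-up names:

import (
	"encoding/json"
	"net/http"

	"github.com/stripe/stripe-go/v74"
	"github.com/stripe/stripe-go/v74/checkout/session"
)

// Creates the Checkout session as before, but answers with its URL as JSON
// instead of a redirect. The calling JavaScript then simply does
// `window.location.assign(response.url);`.
func checkoutURLHandler(w http.ResponseWriter, params *stripe.CheckoutSessionParams) {
	s, err := session.New(params)
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	w.Header().Set("Content-Type", "application/json")
	json.NewEncoder(w).Encode(map[string]string{"url": s.URL})
}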
But sure, it's all not too bad… for regular orders at least. With
subscriptions, however, things get much worse. Unlike PayPal, Stripe
kind of wants to stay out of the way of the payment process as much as
possible, and just be a wrapper around its supported payment methods. So if
customers aren't really meant to register with Stripe, how would they cancel
their subscriptions?
Answer: Through
the… merchant? Which I quite dislike in principle, because why should
you have to trust me to actually cancel your subscription after you
requested it? It also means that I probably should add some sort of UI for
self-canceling a Stripe subscription, ideally without adding full-blown user
accounts. Not that this solves the underlying trust issue, but it's more
convenient than contacting me via email or, worse, going through your bank
somehow. Here is how my solution works:
When setting up a Stripe subscription, the server will generate a random
ID for authentication. This ID is then used as a salt for a hash
of the Stripe subscription ID, linking the two without storing the latter on
my server.
The thank you page, which is parameterized with the Stripe
Checkout session ID, will use that ID to retrieve the subscription
ID via an API call to Stripe, and display it together with the above
salt. This works indefinitely – contrary to what the expiry field in the
Checkout session object suggests, Stripe sessions are indeed stored
forever. After all, Stripe also displays this session information in a
merchant's transaction log with an excessive amount of detail. It might have
been better to add my own expiration system to these pages, but this had
been taking long enough already. For now, be aware that sharing the link to
a Stripe thank you page is equivalent to sharing your subscription
cancellation password.
The salt is then used as the key for a subscription management page. To
cancel, you visit this page and enter the Stripe subscription ID to confirm.
The server then checks whether the salt and subscription ID pair belong to
each other, and sends the actual cancellation
request back to Stripe if they do.
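Put into code, these three steps might look roughly like this – a sketch with my own function names, assuming SHA-256 for the hash and the official stripe-go client; sub.Cancel() corresponds to Stripe's DELETE /v1/subscriptions/{id} call:

import (
	"crypto/rand"
	"crypto/sha256"
	"crypto/subtle"
	"encoding/hex"
	"errors"

	"github.com/stripe/stripe-go/v74/checkout/session"
	"github.com/stripe/stripe-go/v74/sub"
)

// Step 1: Link a new subscription to a random salt without persisting the
// subscription ID itself. Only `salt` and `hash` go into the database.
func linkSubscription(subscriptionID string) (salt, hash string, err error) {
	raw := make([]byte, 16)
	if _, err = rand.Read(raw); err != nil {
		return "", "", err
	}
	salt = hex.EncodeToString(raw)
	sum := sha256.Sum256([]byte(salt + subscriptionID))
	return salt, hex.EncodeToString(sum[:]), nil
}

// Step 2: Resolve the thank-you page's session ID parameter into the
// subscription ID that the customer needs for cancellation. For
// subscription-mode sessions, the session references the subscription it
// created; without expansion, only its ID is filled in.
func subscriptionIDFromCheckout(checkoutSessionID string) (string, error) {
	s, err := session.Get(checkoutSessionID, nil)
	if err != nil {
		return "", err
	}
	return s.Subscription.ID, nil
}

// Step 3: Cancel the subscription if the entered ID matches the stored
// (salt, hash) pair from step 1.
func cancel(salt, storedHash, enteredSubscriptionID string) error {
	sum := sha256.Sum256([]byte(salt + enteredSubscriptionID))
	if subtle.ConstantTimeCompare([]byte(hex.EncodeToString(sum[:])), []byte(storedHash)) != 1 {
		return errors.New("salt and subscription ID don't belong to each other")
	}
	_, err := sub.Cancel(enteredSubscriptionID, nil)
	return err
}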
I might have gone a bit overboard with the crypto there, but I liked the
idea of not storing any of the Stripe session IDs in the server database.
It's not like that makes the system more complex anyway, and it's nice to
have a separate confirmation step before canceling a subscription.
But even that wasn't everything I had to keep in mind here. Once you
switch from test to production mode for the final tests, you'll notice that
certain SEPA-based
payment providers take their sweet time to process and activate new
subscriptions. The Checkout session object even informs you about that, by
including a payment status field. Which initially seems just like
another field that could indicate hacking attempts, but treating it as such
and rejecting any unpaid session can also reject perfectly valid
subscriptions. I don't want all this control… 🥲
Instead, all I can do in this case is to tell you about it. In my test, the
Stripe dashboard said that it might take days or even weeks for the initial
subscription transaction to be confirmed. In such a case, the respective
fraction of the cap will unfortunately need to remain red for that entire time.
And that was 1½ pushes just to replicate the basic functionality of a simple
PayPal integration with the simplest type of Stripe integration. On the
architectural side, all the necessary refactoring work at least made me finally
upgrade my frontend code to TypeScript, using the amazing esbuild to handle transpilation inside
the server binary. Let's see how long it will now take for me to upgrade to
SCSS…
With the new payment options, it makes sense to go for another slight price
increase, from up to per push.
The amount of taxes I have to pay on this income is slowly becoming
significant, and the store has been selling out almost immediately for the
last few months anyway. If demand remains at the current level or even
increases, I plan to gradually go up to by the end
of the year. 📝 As usual,
I'm going to deliver existing orders in the backlog at the value they were
originally purchased at. Due to the way the cap has to be calculated, these
contributions now appear to have increased in value by a rather awkward
13.33%.
This left ½ of a push for some more work on the TH01 Anniversary Edition.
Unfortunately, this was too little time for the grand issue of removing
byte-aligned rendering of bigger sprites, which will need some additional
blitting performance research. Instead, I went for a bunch of smaller
bugfixes:
ANNIV.EXE now launches ZUNSOFT.COM if
MDRV98 wasn't resident before. In hindsight, it's completely obvious
why this is the right thing to do: Either you start
ANNIV.EXE directly, in which case there's no resident
MDRV98 and you haven't seen the ZUN Soft logo, or you have
made a single-line edit to GAME.BAT and replaced
op with anniv, in which case MDRV98 is
resident and you have seen the logo. These are the two
reasonable cases to support out of the box. If you are doing
anything else, it shouldn't be that hard to adjust though?
You might be wondering why I didn't just include all code of
ZUNSOFT.COM inside ANNIV.EXE together with
the rest of the game. The reason: ZUNSOFT.COM has
almost nothing in common with regular TH01 code. While the rest of
TH01 uses the custom image formats and bad rendering code I
documented again and again during its RE process,
ZUNSOFT.COM fully relies on master.lib for everything
about the bouncing-ball logo animation. Its code is much closer to
TH02 in that respect, which suggests that ZUN did in fact write this
animation for TH02, and just included the binary in TH01 for
consistency when he first sold both games together at Comiket 52.
Unlike the 📝 various bad reasons for splitting the PC-98 Touhou games into three main executables,
it's still a good idea to split off animations that use a completely
different set of rendering and file format functions. Combined with
all the BFNT and shape rendering code, ZUNSOFT.COM
actually contains even more unique code than OP.EXE,
and only slightly less than FUUIN.EXE.
The optional AUTOEXEC.BAT is now correctly encoded in
Shift-JIS instead of accidentally being UTF-8, fixing the previous
mojibake in its final ECHO line.
The command-line option that just adds a stage selection without
other debug features (anniv s) now works reliably.
This one's quite interesting because it only ever worked
because of a ZUN bug. From a superficial look at the code, it
shouldn't: While the presence of an 's' branch proves
that ZUN had such a mode during development, he nevertheless forgot
to initialize the debug flag inside the resident structure within
this branch. This mode only ever worked because master.lib's
resdata_create() function doesn't clear the resident
structure after allocation. If anything on the system previously
happened to write something other than 0x00,
0x01, or 0x03 to the specific byte that
then gets repurposed as the debug mode flag, this lack of
initialization does in fact result in a distinct non-test and
non-debug stage selection mode.
This is what happens on a certain widely circulated .HDI copy of
TH01 that boots MS-DOS 3.30C. On this system, the memory that
master.lib will allocate to the TH01 resident structure was
previously used by DOS as stack for its kernel, which left the
future resident debug flag byte at address 9FF6:0012 at
a value of 0x12. This might be the entire reason why
game s is even widely documented to trigger a stage
selection to begin with – on the widely circulated TH04 .HDI that
boots MS-DOS 6.20, or on DOSBox-X, the s parameter
doesn't work because both DOS systems leave the resident debug flag
byte at 0x00. And since ANNIV.EXE pushes
MDRV98 into that area of conventional DOS RAM, anniv s
previously didn't work even on MS-DOS 3.30C.
Both bugs in the
📝 1×1 particle system during the Mima fight
have been fixed. These include the off-by-one error that killed off the
very first particle on the 80th
frame and left it in VRAM, and, just like every other entity type, a
replacement of ZUN's EGC unblitter with the new pixel-perfect and fast
one. Until I've rearchitected unblitting as a whole, the particles will
now merely rip barely visible 1×1 holes into the sprites they overlap.
The bomb value shown in the lowest line of the in-game
debug mode output is now right-aligned together with the rest of the
values. This ensures that the game always writes a consistent number
of characters to TRAM, regardless of the magnitude of the
bomb value, preventing the seemingly wrong
timer values that appeared in the original game
whenever the value of the bomb variable changed to a
lower number of digits:
Finally, I've streamlined VRAM page access changes, which allowed me to
consistently replace ZUN's expensive function call with the optimal two
inlined x86 instructions. Interestingly, this change alone removed
2 KiB from the binary size, which is almost all of the difference
between 📝 the P0234-1 release and this
one. Let's see how much longer we can make each new release of
ANNIV.EXE smaller than the previous one.
The final point, however, raised the question of what we're now going to do
about
📝 a certain issue in the 地獄/Jigoku Bad Ending.
ZUN's original expensive way of switching the accessed VRAM page was the
main reason behind the lag frames on slower PC-98 systems, and
search-replacing the respective function calls would immediately get us to
the optimized version shown in that blog post. But is this something we
actually want? If we wanted to retain the lag, we could surely preserve that
function just for this one instance… The discovery of this issue
predates the clear distinction between bloat, quirks, and bugs, so it makes
sense to first classify what this issue even is. The distinction all comes
down to observability, which I defined as changes to rendered frames
between explicitly defined frame boundaries. That alone would be enough to
categorize any cause behind lag frames as bloat, but it can't hurt to be
more explicit here.
Therefore, I now officially judge observability in terms of an infinitely
fast PC-98 that can instantly render everything between two explicitly
defined frames, and will never add additional lag frames. If we plan to port
the games to faster architectures that aren't bottlenecked by disappointing
blitter chips, this is the only reasonable assumption to make, in my
opinion: The minimum system requirements in the games' README files are
minimums, after all, not recommendations. Chasing the exact frame
drop behavior that ZUN must have experienced during the time he developed
these games can only be a guessing game at best, because how can we know
which PC-98 model ZUN actually developed the games on? There might even be
more than one model, especially when it comes to TH01 which had been in
development for at least two years before ZUN first sold it. It's also not
like any current PC-98 emulator even claims to emulate the specific timing
of any existing model, and I sure hope that nobody expects me to import a
bunch of bulky obsolete hardware just to count dropped frames.
That leaves the tearing, where it's much more obvious how it's a bug. On an
infinitely fast PC-98, the ドカーン
frame would never be visible, and thus falls into the same category as the
📝 two unused animations in the Sariel fight.
With only a single unconditional 2-frame delay inside the animation loop, it
becomes clear that ZUN intended both frames of the animation to be displayed
for 2 frames each:
No tearing, and 34 frames in total for the first of the two
instances of this animation.
Next up: Taking the oldest still undelivered push and working towards TH04
position independence in preparation for multilingual translations. The
Shuusou Gyoku OpenGL backend shouldn't take that much longer either,
so I should have lots of stuff coming up in May afterward.
Yes, I'm still alive. This delivery was just plagued by all of the worst
luck: Data loss, physical hard drive failure, exploding phone batteries,
minor illness… and after taking 4 weeks to recover from all of that, I had
to face this beast of a task. 😵
Turns out that neither part of improving video performance and usability on
this blog was particularly easy. Decently encoding the videos into all
web-supported formats required unexpected trade-offs even for the low-res,
low-color material we are working with, and writing custom video player
controls ran into HTML <video>'s resistance to precise timing, on top of the
inherent complexity of frontend web
development. Why did this need to be 800 lines of commented JavaScript and
200 lines of commented CSS, and come close to consuming more than 5 pushes?!
Apparently, the latest price increase also seemed to have raised the minimum
level of acceptable polish in my work, since that's more than the maximum of
3.67 pushes it should have taken. To fund the rest, I stole some of the
reserved JIS trail word rendering research pushes, which means that the next
towards anything will go back towards that goal.
The codec situation is especially sad because it seems like so much of a
solved problem. ZMBV, the lossless capture codec introduced by DOSBox, is
both very well suited for retro game footage and remarkably simple too:
DOSBox-X's implementation of both an encoder and decoder comes in at under
650 lines of C++, excluding the Deflate implementation. Heck, the AVI
container around the codec is more complicated to write than the
compressed video data itself, and AVI is already the easiest choice you have
for a widely supported video container format.
Currently, this blog contains 9:02 minutes of video across 86 files, with a
total frame count of 24,515. In case this post attracts a general video
encoding audience that isn't familiar with what I'm encoding here: The
maximum resolution is 640×400, and most of the video uses 16 colors, with
some parts occasionally using more. With ZMBV, the lossless source files
take up 43.8 MiB, and that's even with AVI's infamously bad
overhead. While you can always spend more time on any compression task and
precisely tune your algorithm to match your source data even better,
43.8 MiB looks like a more than reasonable amount for this type of
content.
Especially compared with what I actually have to ship here, because sadly,
ZMBV is not supported by browsers. 😔 Writing a WebAssembly player for ZMBV
would have certainly been interesting, but it already took 5 pushes to get
to what we have now. So, let's instead shell out to ffmpeg and build a
pipeline to convert ZMBV to the ill-suited codecs supported by web browsers,
replacing the previously committed VP9 and VP8 files. From that point, we
can then look into AV1, the latest and greatest web-supported video codec,
to save some additional bandwidth.
But first, we've got to gather all the ZMBV source files. While I was
working on the 📝 2022-07-10 blog post, I
noticed some weirdly washed-out colors in the converted videos, leading to
the shocking realization that my previous, historically grown conversion
script didn't actually encode in a lossless way. 😢 By extension,
this meant that every video before that post could have had minor
discolorations as well.
For the majority of videos, I still had the original ZMBV capture files
straight out of DOSBox-X, and reproducing the final videos wasn't too big of
a deal. For the few cases where I didn't, I went the extra mile, took the
VP9 files, and manually fixed up all the minor color errors based on
reference videos from the same gameplay stage. There might be a huge ffmpeg
command line with a complicated filter graph to do the job, but for such a
small 4-digit number of frames, it is much more straightforward to just dump
each frame as an image and perform the color replacement with ImageMagick's
-opaque and -fill options.
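For anyone wanting to replicate that kind of repair: per video, it boils down to one ffmpeg frame dump plus one ImageMagick invocation per frame, wrapped here in a hypothetical Go helper with made-up paths and color values:

import (
	"fmt"
	"os"
	"os/exec"
)

// Dumps every frame of a lossy video and replaces one wrong color with its
// value from the reference footage.
func fixColors(video string, frames int, wrong, correct string) error {
	if err := os.MkdirAll("frames", 0o755); err != nil {
		return err
	}
	// e.g. ffmpeg -i video.webm frames/%04d.png
	if err := exec.Command("ffmpeg", "-i", video, "frames/%04d.png").Run(); err != nil {
		return err
	}
	for i := 1; i <= frames; i++ {
		frame := fmt.Sprintf("frames/%04d.png", i)
		// -fill sets the replacement color used by the following -opaque,
		// which selects the pixels to be replaced.
		err := exec.Command("magick", frame,
			"-fill", correct, "-opaque", wrong, frame,
		).Run()
		if err != nil {
			return err
		}
	}
	return nil
}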
So, time to encode our new definite collection of source files into AV1, and
what the hell, how slow is this codec? With ffmpeg's
libaom-av1, fully encoding all 86 videos takes almost 9
hours on my mid-range
development system, regardless of the quality selected.
But sure, the encoded videos are managed by a cache, and this obviously only
needs to be done once. If the results are amazing, they might even justify
these glacial encoding speeds. Unfortunately, they don't: In its lossless
-crf 0 mode, AV1 performs even worse than VP9, taking up
222 MiB rather than 182 MiB. It might not sound bad now,
but as we're later going to find out, we want to have a lot of
keyframes in these videos, which will blow up video sizes even further.
So, time to go lossy and maybe take a deep dive into AV1 tuning? Turns out
that it only gets worse from there:
The alternative libsvtav1 encoder is fast and creates small
files… but even on the highest-quality settings, -crf 0 and
-qp 0, the video quality resembled the terrible x264 YUV420P
format that Twitter enforces on uploaded videos.
I don't remember the librav1e results, but they sure
weren't convincing either.
libaom-av1's -usage realtime option is a
complete joke. 771 MiB for all videos, and it doesn't even compress
in real time on my system, more like 2.5× real-time. For comparison,
a certain stone-age technology by the name of "animated GIF" would take
54.3 MiB, encode in sub-realtime (0.47×), and the only necessary tuning
you need is an easily
googled palette generation and usage filter. Why can't I just use
those in a <video> tag?! These results have
clearly proven the top-voted "just use modern video codecs" Stack
Overflow answers wrong.
What you're actually supposed to do is to drop -cpu-used to
maybe 2 or 3, and then selectively add back prediction filters that suit
your type of content. In our case, these are
and maybe others, depending on how much time you want to waste.
Because that's what all this tuning ended up being: a complete waste of
time. No matter which tuning options I tried, all they did was cut down
encoding time in exchange for slightly larger files on average. If there is
a magic tuning option that would suddenly cause AV1 to maybe even beat ZMBV,
I haven't found it. Heck, at particularly low settings,
-enable-intrabc even caused blocky glitches with certain pellet
patterns that looked like the internal frame block hashes were colliding all
over the place. Unfortunately, I didn't save the video where it happened.
So yeah, if you've already invested the computation time and encoded your
content by just specifying a -crf value and keeping the
remaining settings at their time-consuming defaults, any further tuning will
make no difference. Which is… an interesting choice from a usability
perspective. I would have expected the exact
opposite: default to a reasonably fast and efficient profile, and leave the
vast selection of tuning options for those people to explore who do
want to wait 5× as long for their encoder for that additional 5% of
compression efficiency. On the other hand, that surely is one way to get
people to extensively study your glorious engineering efforts, I guess? You
know what would maybe even motivate people to intrinsically do that?
Good documentation, with examples of the intent behind every option and its
optimal use case. Nobody needs long help strings that just spell out all of
the abbreviations that occur in the name of the option…
But hey, that at least means there's no reason to use anything but ZMBV
for storing and archiving the lossless source files. Best compression
efficiency, encodes in real-time, and the files are much easier to edit.
OK, end of rant. To understand why anyone could be hyped about AV1 to begin
with, we just have to compare it to VP9, not to ZMBV. In that light, AV1
is pretty impressive even at -crf 1, compressing all 86
videos to 68.9 MiB, and even preserving 22.3% of frames completely
losslessly. The remaining frames exhibit the exact kind of quality loss
you'd want for retro game footage: Minor discoloration in individual pixels,
so minuscule that subtracting the encoded image from the source yields an
almost completely black image. Even after highlighting the errors by
normalizing such a difference image, they are barely visible even if you
know where to look. If "compressed PNG size of the normalized difference
between ZMBV and AV1 -crf 1" is a useful metric, this would be
its median frame among the 77.7% of non-lossless frames:
Whether you can actually spot the difference is pretty much down to the
glass between the physical pixels and your eyes. In any case, it's very
hard, even if you know where to look. As far as I'm concerned, I can
confidently call this "visually lossless", and it's definitely good enough
for regular watching and even single-frame stepping on this blog.
Since the appeal of the original lossless files is undeniable though, I also
made those more easily available. You can directly download the one for the
currently active video with the ⍗ button in the new video player – or directly
get all of them from the Git repository if you don't like clicking.
Unfortunately, even that only made up for half of the complexity in this
pipeline. As impressive as the AV1 -crf 1 result may be, it
does in fact come with the drawback of also being impressively heavy to
decode within today's browsers. Seeking is dog slow, with even the latencies
for single-frame stepping being way beyond what I'd consider
tolerable. To compensate, we have to invest another 78 MiB into turning
every 10th frame into a keyframe until single-stepping through an
entire video becomes as fast as it could be on my system.
But fine, 146 MiB, that's still less than the 178 MiB that the old
committed VP9 files used to take up. However, we still want to support VP9
for older browsers, older
hardware, and people who use Safari. And it's this codec where keyframes
are so bad that there is no clear best solution, only compromises. The main
issue: The lower you turn VP9's -crf value, the slower the
seeking performance with the same number of keyframes. Conversely,
this means that raising quality also requires more keyframes for the same
seeking performance – and at these file sizes, you really don't want to
raise either. We're talking 1.2 GiB for all 86 videos at
-crf 10 and -g 5, and even on that configuration,
seeking takes 1.3× as long as it would in the optimal case.
Thankfully, a full VP9 encode of all 86 videos only takes some 30 minutes as
opposed to 9 hours. At that speed, it made sense to try a larger number of
encoding settings during the ongoing development of the player. Here's a
table with all the trials I've kept:
Codec  -crf  -g  Other parameters              Total size  Seek time
VP9    32    20  -vf format=yuv420p               111 MiB       32 s
VP8    10    30  -qmin 10 -qmax 10 -b:v 1G        120 MiB       32 s
VP8    7     30  -qmin 7 -qmax 7 -b:v 1G          140 MiB       32 s
AV1    1     10                                   146 MiB       32 s
VP8    10    20  -qmin 10 -qmax 10 -b:v 1G        147 MiB       32 s
VP8    6     30  -qmin 6 -qmax 6 -b:v 1G          149 MiB       32 s
VP8    15    10  -qmin 15 -qmax 15 -b:v 1G        177 MiB       32 s
VP8    10    10  -qmin 10 -qmax 10 -b:v 1G        225 MiB       32 s
VP9    32    10  -vf format=yuv422p               329 MiB       32 s
VP8    0-4   10  -qmin 0 -qmax 4 -b:v 1G          376 MiB       32 s
VP8    5     30  -qmin 5 -qmax 5 -b:v 1G          169 MiB       33 s
VP9    63    40                                    47 MiB       34 s
VP9    32    20  -vf format=yuv422p               146 MiB       34 s
VP8    4     30  -qmin 0 -qmax 4 -b:v 1G          192 MiB       34 s
VP8    4     40  -qmin 4 -qmax 4 -b:v 1G          168 MiB       35 s
VP9    25    20  -vf format=yuv422p               173 MiB       36 s
VP9    15    15  -vf format=yuv422p               252 MiB       36 s
VP9    32    25  -vf format=yuv422p               118 MiB       37 s
VP9    20    20  -vf format=yuv422p               190 MiB       37 s
VP9    19    21  -vf format=yuv422p               187 MiB       38 s
VP9    32    10                                   553 MiB       38 s
VP9    32    10  -tune-content screen             553 MiB
VP9    32    10  -tile-columns 6 -tile-rows 2     553 MiB
VP9    15    20  -vf format=yuv422p               207 MiB       39 s
VP9    10    5                                   1210 MiB       43 s
VP9    32    20                                   264 MiB       45 s
VP9    32    20  -vf format=yuv444p               215 MiB       46 s
VP9    32    20  -vf format=gbrp10le              272 MiB       49 s
VP9    63                                          24 MiB       67 s
VP8    0-4       -qmin 0 -qmax 4 -b:v 1G          119 MiB       76 s
VP9    32                                         107 MiB      170 s
The bold rows correspond to the final encoding choices that
are live right now. The seeking time was measured by holding → Right on
the 📝 cheeto dodge strategy video.
Yup, the compromise ended up including a chroma subsampling conversion to
YUV422P. That's the one thing you don't want to do for retro pixel
graphics, as it's the exact cause behind washed-out colors and red fringing
around edges:
The worst example of chroma subsampling in a VP9-encoded file according
to the above metric, from frame 130 (0-based) of
📝 Sariel's restored leaf "spark" animation,
featuring smeared-out contours and even an all-around darker image,
blowing up the image to a whopping 3653 colors. It's certainly an
aesthetic.
But there simply was no satisfying solution around the ~200 MiB mark
with RGB colors, and even this compromise is still a disappointment in both
size and seeking speed. Let's hope that Safari
users do get AV1 support soon… Heck, even VP8, with its exclusive
support for YUV420P, performs much better here, with the impact of
-crf on seeking speed being much less pronounced. Encoding VP8
also just takes 3 minutes for all 86 videos, so I could have experimented
much more. Too bad that it only matters for really ancient systems…
Two final takeaways about VP9:
-tune-content screen and the tile options make no
difference at all.
All results used two-pass encoding. VP9 is the only codec where two
passes made a noticeable difference, cutting down the final encoded size
from 224 MiB to 207 MiB. For AV1, compression even seems to be
slightly worse with two passes, yielding 154,201,892 bytes rather than the
153,643,316 bytes we get with a single pass. But that's a difference of
0.36%, and hardly significant.
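For the record, a single conversion step in this pipeline is little more than shelling out to ffmpeg with the parameters from the table above. A sketch using the final AV1 settings, with placeholder file names, and without the real pipeline's caching, VP9/VP8 fallback encodes, or two-pass handling:

import "os/exec"

// Converts one lossless ZMBV capture into web-playable AV1, using the
// -crf 1 / -g 10 settings from the table above.
func encodeAV1(zmbvAVI, outMP4 string) error {
	return exec.Command("ffmpeg",
		"-i", zmbvAVI,
		"-c:v", "libaom-av1",
		"-crf", "1",
		"-g", "10",
		outMP4,
	).Run()
}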
Alright, now we're done with codecs and get to finish the work on the
pipeline with perhaps its biggest advantage. With an ffmpeg conversion
infrastructure in place, we can also easily output a video's first frame as
a poster image to be passed into the <video> tag.
If this image is kept at the exact resolution of the video, the browser
doesn't need to wait for an indeterminate amount of "video metadata" to be
loaded, and can reserve the necessary space in the page layout much faster
and without any of these dreaded loading spinners. For the big
/blog page, this cuts down the minimum amount of required
resources from 69.5 MB to 3.6 MB, finally making it usable again without
waiting an eternity for the page to fully load. It's become pretty bad, so I
really had to prioritize this task before adding any more blog posts on top.
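Extracting that poster image is another ffmpeg one-liner, sketched here with placeholder file names:

import "os/exec"

// Dumps the first frame of a video as a PNG at the exact same resolution,
// so that the <video> tag's layout space is known before any video data
// has been loaded.
func extractPoster(video, poster string) error {
	return exec.Command("ffmpeg", "-i", video, "-frames:v", "1", poster).Run()
}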
That leaves the player itself, which is basically a sum of lots of little
implementation challenges. Single-frame stepping and seeking to discrete
frames is the biggest one of them, as it's technically
not possible within the <video> tag, which only
returns the current time as a continuous value in seconds. It only sort
of works for us because the backend can pass the necessary FPS and frame
count values to the frontend. These allow us to place a discrete grid of
frame "frets" at regular intervals, and thus establish a consistent mapping
from frames to seconds and back. The only drawback here is a noticeably
weird jump back by one frame when pausing a video within the second half of
a frame, caused by snapping the continuous time in seconds back onto the
frame grid in order to maintain a consistent frame counter. But the whole
feature of frame-based seeking more than makes up for that.
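The mapping itself is just arithmetic over the FPS value. The real code lives in the player's TypeScript, but the idea can be sketched like this (in Go, to keep all examples in this post in a single language; the half-frame offset is merely my way of illustrating how to seek safely away from frame boundaries):

// Maps the <video> element's continuous currentTime (in seconds) onto a
// discrete frame index and back, given the FPS value that the backend
// passes to the frontend.
func frameFromTime(currentTime, fps float64) int {
	return int(currentTime * fps) // truncate onto the frame grid
}

func timeForFrame(frame int, fps float64) float64 {
	// Target the middle of a frame rather than its boundary, so that
	// floating-point imprecision can't land the seek on the previous frame.
	return (float64(frame) + 0.5) / fps
}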
The new scrubbable timeline might be even nicer to use with a mouse or a
finger than just letting a video play regularly. With all the tuning work I
put into keyframes, seeking is buttery smooth, and much better than the
built-in <video> UI of either Chrome or Firefox.
Unfortunately, it still costs a whole lot of CPU, but I'd say it's worth it.
🥲
Finally, the new player also has a few features that might not be
immediately obvious:
Keybindings for almost everything you might want them for, indicated by
hovering on top of each button. The tab switchers additionally support the
↑ Up and ↓ Down keys to cycle through all tabs, or the number keys
to jump to a specific tab. Couldn't find a way to indicate these mappings in
the UI yet.
Per-video captions now reserve the maximum height of any caption in the
layout. This prevents layout reflows when switching through such videos,
which previously caused quite annoying lag on the big /blog
page.
Useful fullscreen modes on both desktop and mobile, including all
markers and the video caption. Firefox made this harder than it needed to
be, and if it weren't for display: contents, the implementation
would have been even worse. In the end though, we didn't even need any video
pixel sizes from the backend – just as it should be…
… and supporting Firefox was definitely worth it, as it's the only
browser to support nearest-neighbor interpolation on videos.
As some of the Unicode codepoints on the buttons aren't covered by the
default fonts of some operating systems, I've taken them from the Catrinity font, licensed under the SIL
Open Font License. With all
the edits I did on this font, that license definitely was necessary. I
hope I applied it correctly though; it's not straightforward at all how to
properly license a Modified Version of an original font with a
Reserved Font Name.
And with that, development hell is over, and I finally get to return to the
core business! Just more than one month late.
Next up: Shipping the oldest still pending order, covering the TH04/TH05
ending script format. Meanwhile, the Seihou community also wants to keep
investing in Shuusou Gyoku, so we're also going to see more of that on the
side.
The "bad" news first: Expanding to Stripe in order to support Google Pay
requires bureaucratic effort that is not quite justified yet, and would only
be worth it after the next price increase.
Visualizing technical debt has definitely been overdue for a while though.
With 1 of these 2 pushes being focused on this topic, it makes sense to
summarize once again what "technical debt"
means in the context of ReC98, as this info was previously kind of scattered
over multiple blog posts. Mainly, it encompasses
any ZUN-written code
that we did name and reverse-engineer,
but which we simply moved out into dedicated files that are then
#included back into the big .ASM translation units,
without worrying about decompilation or proving undecompilability for
now.
Technically (ha), it would also include all of master.lib, which has
always been compiled into the binaries in this way, and which will require
quite a bit of dedicated effort to be moved out into a properly linkable
library, once it's feasible. But this code has never been part of any
progress metric – in fact, the 0% RE baseline is
defined as the total number of x86 instructions in the binary minus
any library code. There is also no relation between instruction numbers and
the time it will take to finalize master.lib code, let alone a precedent of
how much it would cost.
If we now want to express technical debt as a percentage, it's clear where
the 100% point would be: when all RE'd code is also compiled in from a
translation unit outside the big .ASM one. But where would 0% be? Logically,
it would be the point where no reverse-engineered code has ever been moved
out of the big translation units yet, and nothing has ever been decompiled.
With these boundary points, this is what we get:
Not too bad! So it's 6.22% of total RE that we will have to revisit at some
point, concentrated mostly around TH04 and TH05 where it resulted from a
focus on position independence. The prices also give an accurate impression
of how much more work would be required there.
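Or, as a purely illustrative formula with my own variable names – every percentage here is measured against the same baseline, the total number of non-library x86 instructions:

// RE'd code that still lives inside the big .ASM translation units…
func debtPercent(reButStillInBigASM, totalInstructions int) float64 {
	return 100 * float64(reButStillInBigASM) / float64(totalInstructions)
}

// …which you can subtract from the regular RE% to get the number shown by
// the alternative visualization discussed below.
func finalizedPercent(rePercent, debt float64) float64 {
	return rePercent - debt
}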
But is that really the best visualization? After all, it requires an
understanding of our definition of technical debt, so it's maybe not the
most useful measurement to have on a front page. But how about subtracting
those 6.22% from the number shown on the RE% bars? Then, we get this:
Which is where we get to the good news: Twitter surprisingly helped me out
in choosing one visualization over the other, voting
7:2 in favor of the Finalized version. While this one requires
you to manually calculate € finalized - € RE'd to
obtain the raw financial cost of technical debt, it clearly shows, for the
first time, how far away we are from the main goal of fully decompiling all
5 games… at least to the extent it's possible.
Now that the parser is looking at these recursively included .ASM files for
the first time, it needed a small number of improvements to correctly handle
the more advanced directives used there, which no automatic disassembler
would ever emit. Turns out I've been counting some directives as
instructions that never should have been, which is where the additional
0.02% total RE came from.
One more overcounting issue remains though. Some of the RE'd assembly slices
included by multiple games contain different if branches for
each game, like this:
; An example assembly file included by both TH04's and TH05's MAIN.EXE:
if (GAME eq 5)
; (Code for TH05)
else
; (Code for TH04)
endif
Currently, the parser simply ignores if, else, and
endif, leading to the combined code of all branches being
counted for every game that includes such a file. This also affects the
calculated speed, and is the reason why finalization seems to be slightly
faster than reverse-engineering, at currently 471 instructions per push
compared to 463. However, it's not that bad of a signal to send: Most of the
not yet finalized code is shared between TH04 and TH05, so finalizing it
will roughly be twice as fast as regular reverse-engineering to begin with.
(Unless the code then turns out to be twice as complex as average code…).
For completeness, finalization is now also shown as part of the per-commit metrics. Now it's clearly visible what I was
doing in those very slow five months between P0131 and P0140, where
the progress bar didn't move at all: Repaying 3.49% of previously
accumulated technical debt across all games. 👌
As announced, I've also implemented a new caching system for this website,
as the second main feature of these two pushes. By appending a hash string
to the URLs of static resources, your browser should now both cache them
forever and re-download them once they did change on the server. This
avoids the unnecessary (and quite frankly, embarrassing) re-requests for all
static resources that typically just return a 304 Not Modified
response. As a result, the blog should now load a bit faster on repeated
visits, especially on slower connections. That should allow me to
deliberately not paginate it for another few years, without it getting all
too slow – and should prepare us for the day when our first game
reaches 100% and the server will get smashed.
However, I am open to changing the progress blog link in the
navigation bar at the top to the list of tags, once
people start complaining.
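The idea behind the cache-busting scheme can be sketched in a few lines; the actual implementation surely differs in its details, and all names here are made up:

import (
	"crypto/sha256"
	"encoding/hex"
	"os"
	"path/filepath"
)

// Returns the URL for a static resource with a content hash appended. The
// URL changes whenever the file's contents change, so the resource itself
// can be served with an immutable, never-expiring cache header.
func cacheBustedURL(name string) (string, error) {
	data, err := os.ReadFile(filepath.Join("static", name))
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(data)
	return "/static/" + name + "?" + hex.EncodeToString(sum[:8]), nil
}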
Apart from some more invisible correctness and QoL improvements, I've also
prepared some new funding goals, but I'll cover those once the store
reopens, next year. Syntax highlighting for code snippets would have also
been cool, but unfortunately didn't make it into those two pushes. It's
still on the list though!
Next up: Back to RE with the TH03 score file format, and other code that
surrounds it.
Who said working on the website was "fun"? That code is a mess.
This right here is the first time I seriously
wrote a website from (almost) scratch. Its main job is to parse over a Git
repository and calculate numbers, so any additional bulky frameworks would
only be in the way, and probably need to be run on some sort of wobbly,
unmaintainable "stack" anyway, right? 😛
📝 As with the main project though, I'm only
beginning to figure out the best structure for this, and these new features
prompted quite a lot of upfront refactoring…
Before I start ranting though, let's quickly summarize the most visible
change, the new tag system for this blog!
Yes, I manually went through every one of the 82 posts I've written so
far, and assigned labels to them.
The per-project (rec98 and
website) and per-game (th01, th02, th03, th04, th05) tags are automatically
generated from the database and the Git commit history, respectively. That might
have left us with a fair bit of category clutter, as any single change
to a tiny aspect is enough for a blog post to be tagged with an
otherwise unrelated game. For now, it doesn't seem too much of
an issue though.
Filtering already works for an arbitrary number of tags. Right now,
these are always combined with AND – no arbitrary boolean expressions for tag filtering yet.
Adding filters simply works by adding components to the URL path:
https://rec98.nmlgc.net/blog/tag/tag1/tag2/tag3/… and so
on.
Hovering over any tag shows a brief description of what that tag is
about. Some of the terms really needed a definition, so I just added one for
all of them. Hope you all enjoy them!
These descriptions are also shown on the new
tag overview page, which now kind of doubles as a
glossary.
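Parsing such a filter URL is as simple as it sounds; a sketch with a made-up function name:

import "strings"

// Splits a /blog/tag/tag1/tag2/… request path into the list of tags to
// filter for; a post is only shown if it carries every one of them.
func tagsFromPath(path string) []string {
	rest := strings.TrimPrefix(path, "/blog/tag/")
	return strings.Split(strings.Trim(rest, "/"), "/")
}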
Finally, the order page now shows the exact number of pushes a contribution
will fund – no more manual divisions required.
Shoutout to the one email I received, which pointed out this potential
improvement!
As for the "invisible" changes: The one main feature of this website, the
aforementioned calculation of the progress metrics, also turned out as its
biggest annoyance over the years. It takes a little while to parse all the
big .ASM files in the source tree, once for every push that can affect the
average number of removed instructions and unlabeled addresses. And without
a cache, we've had to do that every time we re-launch the app server
process.
Fundamentally, this is – you might have guessed it – a dependency tracking
problem, with two inputs: the .ASM files from the ReC98 repo, and the
Golang code that calculates the instruction and PI numbers. Sure, the code
has been pretty stable, but what if we do end up extending it one day? I've
always disliked manually specified version numbers for use cases like this
one, where the problem at hand could be exactly solved with a hashing
function, without being prone to human error.
(Sidenote: That's why I never actively supported thcrap mods that affected
gameplay while I was still working on that project. We still want to be
able to save and share replays made on modded games, but I do not
want to subject users to the unacceptable burden of manually remembering
which version of which patch stack they've recorded a given replay with.
So, we'd somehow need to calculate a hash of everything that defines the
gameplay, exclude the things that don't, and only show
replays that were recorded on the hash that matches the currently running
patch stack. Well, turns out that True Touhou Fans™ quite enjoy watching
the games get broken in every possible way. That's the way ZUN intended the
games to be experienced, after all. Otherwise, he'd be constantly
maintaining the games and shipping bugfix patches… 🤷)
Now, why haven't I been caching the progress numbers all along? Well,
parallelizing that parsing process onto all available CPU cores seemed
enough in 2019 when this site launched. Back then, the estimates were
calculated from slightly over 10 million lines of ASM, which took about 7
seconds to be parsed on my mid-range dev system.
Fast forward to P0142 though, and we have to parse 34.3 million lines of
ASM, which takes about 26 seconds on my dev system. That would have only
got worse with every new delivery, especially since this production server
doesn't have as many cores.
I was thinking about a "doing less" approach for a while: Parsing only the
files that had changed between the start and end commit of a push, and
keeping those deltas across push boundaries. However, that turned out to be
slightly more complex than the few hours I wanted to spend on it.
And who knows how well that would have scaled. We've still got a few
hundred pushes left to go before we're done here, after all.
So with the tag system, as always, taking longer and consuming more pushes
than I had planned, the time had come to finally address the underlying
dependency tracking problem.
Initially, this sounded like a nail that was tailor-made for
📝 my favorite hammer, Tup: Move the parser
to a separate binary, gather the list of all commits via git
rev-list, and run that parser binary on every one of the commits
returned. That should end up correctly tracking the relevant parts of
.git/ and the new binary as inputs, and cause the commits to
be re-parsed if the parser binary changes, right? Too bad that Tup both
refuses to track
anything inside .git/, and can't track a Golang binary
either, due to all of the compiler's unpredictable outputs into its build
cache. But can't we at least turn off–
> The build cache is now required as a step toward eliminating $GOPATH/pkg.
— Go 1.12 release notes
Oh, wonderful. Hey, I always liked $GOPATH! 🙁
But sure, Golang is too smart anyway to require an external build system.
The compiler's
build
ID is exactly what we need to correctly invalidate the progress number
cache. Surely there is a way to retrieve the build ID for any package that
makes up a binary at runtime via some kind of reflection, right? Right? …Of
course not, in the great Unix tradition, this functionality is only
available as a CLI tool that prints its result to stdout.
🙄
But sure, no problem, let's just exec() a separate process on
the parser's library package file… oh wait, such a thing doesn't exist
anymore, unless you manually install the package. This would
have added another complication to the build process, and you'd
still have to manually locate the package file, with its version-specific
directory name. That might have worked out in the end, but figuring
all this out would have probably gone way beyond the budget.
OK, but who cares about packages? We just care about one single file here,
anyway. Didn't they put the official Golang source code parser into the
standard library? Maybe that can give us something close to the
build ID, by hashing the abstract syntax tree of that file. Well, for
starters, one does not simply serialize the returned AST. At least
into Golang's own, most "native" Gob
format, which requires all types from the go/ast package
to be manually registered first.
That leaves
ast.Fprint() as the
only thing close to a ready-made serialization function… and guess what,
that one suffers from Golang's typical non-deterministic order when
rendering any map to a string. 🤦
Guess there's no way around the simplest, most stupid way of simply
calculating any cryptographically secure hash over the ASM parser file. 😶
It's not like we frequently change comments in this file, but still, this
could have been so much nicer.
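At least the stupid way is trivial to implement; a sketch with a made-up file name:

import (
	"crypto/sha256"
	"encoding/hex"
	"os"
)

// Derives the cache key for the progress numbers from the raw bytes of the
// ASM parser's source file. Any edit to that file – even just to a comment –
// invalidates every cached number, which is the "stupid" part.
func parserCacheKey() (string, error) {
	src, err := os.ReadFile("asm_parser.go") // hypothetical file name
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(src)
	return hex.EncodeToString(sum[:]), nil
}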
Oh well, at least I did get that issue resolved now, in an
acceptable way. If you ever happened to see this website rebuilding: That
should now be a matter of seconds, rather than minutes. Next up: Shinki's
background animations!
Calculating the average speed of the previous crowdfunded pushes, we arrive at estimated "goals" of…
So, time's up, and I didn't even get to the entire PayPal integration and FAQ parts… 😕 Still got to clarify a couple of legal questions before formally starting this, though. So for now, let's continue with zorg's next 5 TH05 reverse-engineering and decompilation pushes, and watch those prices go down a bit… hopefully quite significantly!
In order to be able to calculate how many instructions and absolute memory references are actually being removed with each push, we first need the database with the previous pushes from the Discord crowdfunding days. And while I was at it, I also imported the summary posts from back then.
Also, we now got something resembling a web design!
So yeah, "upper bound" and "probability". In reality it's certainly better than the numbers suggest, but as I keep saying, we can't say much about position independence without having reverse-engineered everything.
Now with the number of not yet RE'd x86 instructions that you might have seen in the thpatch Discord. They're a bit smaller now; I didn't filter out a couple of directives back then.
Yes, requesting these currently is super slow. That's why I didn't want to have everyone here yet!
Next step: Figuring out the actual total number of game code instructions, for that nice "% done". Also, trying to do the same for position independence.