FAQ


What is this about?

For now, the aim is to perfectly reconstruct the lost[citation needed] source code of the first five Touhou Project games by ZUN Soft (now Team Shanghai Alice), which were originally released exclusively for the NEC PC-9801 platform.

The original games being:

TH01:
東方靈異伝 ~ The Highly Responsive to Prayers (1997)
TH02:
東方封魔録 ~ the Story of Eastern Wonderland (1997)
TH03:
東方夢時空 ~ Phantasmagoria of Dim.Dream (1997)
TH04:
東方幻想郷 ~ Lotus Land Story (1998)
TH05:
東方怪綺談 ~ Mystic Square (1998)

Since we only have the binaries, we obviously can't know how ZUN named any variables and functions, or which comments the original code contained. "Perfect" therefore means that the binaries compiled from the code in the ReC98 repository are indistinguishable from ZUN's original builds, making it impossible to disprove that the original code could have looked like this. This property is maintained for every Git commit along the way.

Aside from the preservation angle and the resulting deep insight into the games' mechanics, the code can then serve as the foundation for any type of mod, or any port to non-PC-98 platforms, developed by the community.


Who are you?

I created the Touhou Community Reliant Automatic Patcher and Touhou Patch Center in 2012, and remained a core developer of both before retiring in March 2019. Older Touhou fans might also remember me for Touhou Music Room (2010/2011) and the Touhou Vorbis Compressor (2011).

Check my GitHub page as well as the crowdfunding log here for more proof of my track record.


What's the split between OP, MAIN and MAINE? What things are they responsible for? How do they interact with each other?

Since only one executable can be active at any given time, there has to be some way of sharing data between them. This is done using a resident "Config" structure, kept at the top of conventional DOS RAM. As of 2020-02-23, we've reverse-engineered the contents of this structure for all five games (TH01, TH02, TH03, TH04, TH05).
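To illustrate the idea, here is a minimal C++ sketch of three programs that run one after another and communicate only through a structure that outlives each of them. Everything in it is an assumption made for illustration: the `ResidentConfig` type, its fields, the `run_*` functions, and the roles given to OP (menu), MAIN (gameplay), and MAINE (ending). It does not reproduce the actual contents of any game's structure.

```cpp
#include <cstdint>

// Hypothetical stand-in for the resident structure.
// The real contents are not reproduced here.
struct ResidentConfig {
    std::uint8_t rank;         // hypothetical: difficulty picked in the menu
    std::uint8_t start_lives;  // hypothetical: option set in the menu
    std::uint32_t score;       // hypothetical: result left behind by the game
};

// Stand-in for OP: writes the player's menu choices.
inline void run_op(ResidentConfig& cfg) {
    cfg.rank = 2;
    cfg.start_lives = 3;
    cfg.score = 0;
}

// Stand-in for MAIN: would read the choices, and leaves a result behind.
inline void run_main(ResidentConfig& cfg) {
    cfg.score = 12345;  // placeholder result
}

// Stand-in for MAINE: only reads what the previous program left behind.
inline std::uint32_t run_maine(const ResidentConfig& cfg) {
    return cfg.score;
}

// Runs the three stand-ins in sequence, sharing nothing but the structure.
inline std::uint32_t run_all() {
    ResidentConfig cfg = {};
    run_op(cfg);
    run_main(cfg);
    return run_maine(cfg);
}
```

The point of the sketch is only that no stand-in calls another one directly: whatever one of them wants to pass on has to go through the shared structure.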


What is this "position independence" thing about?

Position dependence means that a binary's references to global variables are expressed as raw number constants, rather than being named with identifiers:

mov ax, some_data   ; Position-independent
mov ax, 1234h       ; Position-dependent; assumes that
                    ; [some_data] is at address 1234h

If you increase or decrease the number of bytes anywhere in the non-header parts of an executable, you'll end up breaking most of these position-dependent references, since global variables no longer are where the game expects them to be. This will lead to quite some instability.
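The effect can be reproduced in a toy model. The following C++ sketch is not how an assembler or the games work internally; `Image`, `make_image()`, `insert_bytes()` and the two read functions are hypothetical helpers that only mimic the two kinds of reference from the ASM example above.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model of an executable image: raw bytes plus one named symbol.
struct Image {
    std::vector<std::uint8_t> bytes;
    std::size_t some_data;  // where the symbol [some_data] currently lives
};

// Builds an image in which [some_data] sits at 1234h and holds the value 42.
inline Image make_image() {
    Image img;
    img.bytes.assign(0x2000, 0);
    img.some_data = 0x1234;
    img.bytes[img.some_data] = 42;
    return img;
}

// Inserts `count` filler bytes at `at`. A symbol at or behind the insertion
// point moves along, as it would when an assembler recomputes its address.
inline Image insert_bytes(Image img, std::size_t at, std::size_t count) {
    img.bytes.insert(
        img.bytes.begin() + static_cast<std::ptrdiff_t>(at),
        count,
        std::uint8_t(0x90)
    );
    if (img.some_data >= at) {
        img.some_data += count;
    }
    return img;
}

// Position-independent access: goes through the symbol.
inline std::uint8_t read_by_symbol(const Image& img) {
    return img.bytes[img.some_data];
}

// Position-dependent access: assumes that [some_data] is still at 1234h.
inline std::uint8_t read_by_constant(const Image& img) {
    return img.bytes[0x1234];
}
```

Before any edit, both functions return 42. After `insert_bytes(make_image(), 0x100, 3)`, only `read_by_symbol()` still does; `read_by_constant()` now reads a byte that used to sit three positions earlier.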

Now, why is this such an issue for PC-98 Touhou? 16-bit x86 code has to take segmentation into account for all its memory accesses. This means that each actual address is built out of two 16-bit values, the segment and the offset. Since offsets therefore can only range from 0 to 2¹⁶-1, the line between actual memory offsets and numeric constants becomes blurred. Most disassemblers I know of that target this architecture therefore make only a very superficial attempt at identifying data references, and give up once arrays are involved, just leaving a numeric constant in place of such a reference. And for good reason: Doing this properly effectively requires an emulator, running the game and performing control flow analysis. Anything less than that – especially anything parsing individual lines of ASM – and you're bound to

And even with an emulator, you're still faced with the fact that on the low level of ASM and C, the declared size of an array is simply advisory anyway. So what do you put, especially when confronted with out-of-bounds array access bugs in the original game itself?

So if you've chosen to deliver quality instead of experimental research, the best choice is to give up on pretending to be position-independent in the first place, and to treat every numeric constant that falls within the range of any data segment as a possible memory reference. Sure, this means that the actual number of memory references is lower, and thus, the actual percentage of position independence is higher than the front page may suggest. But we can't tell, and erring on the side of caution is, in my opinion, better than pretending that the code is more position-independent than it actually is, just because it ran through some sort of experimental tool.
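A small C++ sketch may make the ambiguity more concrete. The address translation is the standard real-mode x86 rule (segment shifted left by 4 bits, plus offset); the function names and the data segment range of 1000h to 1FFFh are assumptions chosen for illustration only.

```cpp
#include <cstdint>

// Real-mode x86 address translation: the 16-bit segment is shifted left by
// 4 bits, and the 16-bit offset is added on top.
constexpr std::uint32_t linear_address(std::uint16_t segment, std::uint16_t offset) {
    return (static_cast<std::uint32_t>(segment) << 4) + offset;
}

// The conservative rule: any constant inside a data segment's offset range
// counts as a possible memory reference. [first, last] is a hypothetical range.
constexpr bool is_possible_reference(
    std::uint16_t value, std::uint16_t first, std::uint16_t last
) {
    return (value >= first) && (value <= last);
}

// Two different segment:offset pairs can name the same byte…
static_assert(
    linear_address(0x1000, 0x1234) == linear_address(0x1123, 0x0004),
    "both pairs resolve to linear address 11234h"
);
// …and the same 16 bits can be an offset, or just the number 4660.
static_assert(is_possible_reference(4660, 0x1000, 0x1FFF), "4660 == 1234h");
static_assert(!is_possible_reference(40, 0x1000, 0x1FFF), "below the range");
```

Nothing in the value 4660 itself says whether it was meant as a score bonus or as the offset 1234h, which is why the rule above has to treat it as a possible reference.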

While position-dependent code is still a significant step up from modding game binaries via hex-editing, it effectively still suffers from most of the same constraints, despite looking like regular source code that you can just arbitrarily edit and recompile. So while modding the game in all sorts of ways is possible right now, it's definitely harder than it needs to be. Once a binary reaches 100% position independence though, developing any sort of mod, in either C/C++ or ASM, will become trivial.


How is position independence calculated?

The absolute number is the count of all remaining hexadecimal literals in all code segments of a binary's big .asm dump file that meet all of these criteria:

1) Matches the regex (-?[0-9][0-9a-fA-F]{1,4})h for hexadecimal literals in TASM/MASM syntax

IDA dumps all number literals ≥10 as hex by default. Restricting the PI calculation to hex numbers allows us to clearly mark false positives by simply converting those numbers to decimal. Having to do this manually further communicates that every such conversion was a conscious decision, based on the newly RE'd context the number is used in.

This might seem like useless work at first, only necessary because it's dictated by some counting algorithm on a website. However, most of those false positives turn out to be things like (sub)pixel coordinates, number of score points, frame counts… which the typical person does prefer and expect to be expressed in decimal. Thus, this conversion turns into quite a quality-of-life improvement for anyone reading and modding the code. Especially with the fixed-point 12.4 "subpixel" type used for playfield-space coordinates in TH03-TH05, which we can abstract away even at the ASM level.

2) Falls within the data segment ranges occupied by ZUN data

This means that structure size independence is an explicit non-goal of PI. The reason becomes clear if you look at all the things a 16-bit number literal can represent:

If we don't limit the value range to ZUN data, all the low numbers would vastly drown out the actual memory references we are trying to identify, resulting in a number that's even less representative of the code's actual position independence. And since structures can have any size, we'll necessarily have to leave them to reverse-engineering.

3) Not the argument to any x86 instruction unrelated to memory accesses

These include
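As a rough illustration of criteria 1) and 2) only, the following C++ sketch counts the hex literals in a few ASM lines that fall into a given data range. It is not the website's actual counting code: `Range`, `count_pi_candidates()`, and the range of 1000h to 1FFFh in the usage note are hypothetical, and neither criterion 3) nor the restriction to code segments is modeled.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical inclusive range of offsets occupied by ZUN data.
struct Range {
    long first;
    long last;
};

// Counts all hex literals in `asm_lines` that match the regex from
// criterion 1) and whose value lies inside `zun_data` (criterion 2).
inline int count_pi_candidates(
    const std::vector<std::string>& asm_lines, Range zun_data
) {
    static const std::regex hex_literal("(-?[0-9][0-9a-fA-F]{1,4})h");
    int count = 0;
    for (const std::string& line : asm_lines) {
        const std::sregex_iterator end;
        for (std::sregex_iterator it(line.begin(), line.end(), hex_literal); it != end; ++it) {
            // Capture group 1 holds the digits without the trailing `h`.
            const long value = std::stol((*it)[1].str(), nullptr, 16);
            if ((value >= zun_data.first) && (value <= zun_data.last)) {
                ++count;
            }
        }
    }
    return count;
}
```

With the lines `mov ax, 1234h`, `mov cx, 40`, `add bx, 0FFh` and `mov dx, 2000h` and the range 1000h to 1FFFh, only `1234h` is counted. Rewriting that line as `mov ax, 4660` removes it from the count, which is the manual decimal conversion described under criterion 1).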


Why crowdfunding?

I'm trying to steadily grow this project into an actual job so that I can spend more time working on it. Even though I don't particularly like Touhou these days, I can do a much better job here than in any corporate RL software development position, where I am typically limited by people, dumb tech stacks, and the fact that ReC98 is just a much more interesting project in general. The demand on part of the fandom is also clearly there, as evidenced by the success of this crowdfunding and this store being sold out for almost the entirety of 2021.

Another advantage: It's you, the patrons, who then get to choose which game I focus on. This has always felt wrong for me to decide, and I've never had much of a preference for a specific game to begin with.


Can't a machine automate all this work? It all seems very blue-collar and mechanical.

Maybe. While it would have been an option to collect lots of money for developing an automated decompilation solution, that would have been a huge risk, and my previous attempts at it failed spectacularly. In contrast, selling small chunks of progress for an hourly wage leads to a stream of tiny, but immediate results. It may take longer in the end, but even partially reverse-engineered game code can be a tremendous help to modders. Also, naming variables, contextualizing numeric constants, and the resulting insights into the game mechanics are things you simply can't get out of an automated solution.

Consider this piece of ASM:

; Somewhere…
	mov	byte_2CEC2, 40

; Somewhere else…
	cmp	byte_25351, 0
	jz	@@return_from_function
	; …
	cmp	byte_2CEC2, 0
	jz	@@down
	cmp	byte_2CEC2, 32
	jbe	@@return_from_function
	mov	byte_2CEC2, 0
	; …
@@down:
	dec	byte_25351

Now, I could simply decompile this into

// Somewhere…
	byte_2CEC2 = 40;

// Somewhere else…
	if(byte_25351 == 0) {
		return;
	}
	// …
	if(byte_2CEC2 != 0) {
		if(byte_2CEC2 <= 32) {
			return;
		}
		byte_2CEC2 = 0;
	}
	byte_25351--;

However, that doesn't really tell you anything that you couldn't already tell from looking at the assembly. After manually reverse-engineering the meaning of these variables, we learn that

And lo and behold, we just proved the existence of an 8-frame deathbomb window, ending up with an insight that's immediately valuable to many fans. Finally, let's define some symbols:

MISS_FRAMES = 32
DEATHBOMB_WINDOW = 8

; Somewhere…
	mov	_miss_time, MISS_FRAMES + DEATHBOMB_WINDOW

; Somewhere else…
	cmp	_bombs, 0
	jz	@@return_from_function
	; …
	cmp	_miss_time, 0
	jz	@@down
	cmp	_miss_time, MISS_FRAMES
	jbe	@@return_from_function
	mov	_miss_time, 0
	; …
@@down:
	dec	_bombs

And suddenly, it becomes both obvious and easily moddable to whoever reads the code, even while it's still assembly. This is the level I operate at. At this point, decompilation becomes mere syntactic sugar.
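For comparison, here is how the same logic might read in C++ once the symbols are in place. This is a sketch and not code from the ReC98 repository: `PlayerState`, `start_miss()`, `try_bomb()`, and `deathbomb_frames()` are hypothetical names, the parts elided with `…` above stay elided, and the helper at the end assumes that `miss_time` counts down by one per frame.

```cpp
#include <cstdint>

constexpr int MISS_FRAMES = 32;
constexpr int DEATHBOMB_WINDOW = 8;

// Hypothetical grouping of the two variables from the ASM above.
struct PlayerState {
    std::uint8_t miss_time;  // was byte_2CEC2
    std::uint8_t bombs;      // was byte_25351
};

// "Somewhere…": the miss counter is set to 40.
inline void start_miss(PlayerState& p) {
    p.miss_time = static_cast<std::uint8_t>(MISS_FRAMES + DEATHBOMB_WINDOW);
}

// "Somewhere else…": returns true if the bomb goes through.
inline bool try_bomb(PlayerState& p) {
    if (p.bombs == 0) {
        return false;
    }
    // … (elided in the original)
    if (p.miss_time != 0) {
        if (p.miss_time <= MISS_FRAMES) {
            return false;  // too late, the miss can no longer be canceled
        }
        p.miss_time = 0;   // deathbomb: cancels the miss
    }
    // … (elided in the original)
    p.bombs--;
    return true;
}

// Counts the miss_time values, out of all values a running miss counter
// can take, at which a bomb would still go through.
inline int deathbomb_frames() {
    int frames = 0;
    for (int t = 1; t <= (MISS_FRAMES + DEATHBOMB_WINDOW); t++) {
        PlayerState p = { static_cast<std::uint8_t>(t), 1 };
        if (try_bomb(p)) {
            frames++;
        }
    }
    return frames;
}
```

Under that assumption, `deathbomb_frames()` comes out as `DEATHBOMB_WINDOW`, and changing that one constant is all a mod would need to do to widen or narrow the window.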


How long is this crowdfunding campaign going to run?

Indefinitely – and that's the beauty of it. Whenever someone is interested, they can insert a coin, and see how that money gets turned into tangible progress towards their goal of choice. Effectively, this project will run for as long as the market deems it valuable. Maybe we get enough to complete one game, maybe we won't. Maybe there will be no interest whatsoever for a few months, and then a small number of big transactions. Who knows.

In a way, this is therefore closer to art commissions than it is to your typical video game crowdfunding campaign.


How does pricing work? What is a "push"?

A push is a reasonably long stretch of work towards a given goal, currently sold for a piece. You can purchase any partial amount of that sum of money though – and definitely should, if only to signal your interest in a particular goal to the wider community and maybe drum up more support for it. However, any goal requires at least one fully funded push before I start working on it. This approach works well with reverse-engineering, as it ensures that I get to concentrate on newly RE'd code for a while, leading to a more accurate picture of the details and interactions, and keeping the high standard that this project has developed over the years.

Smaller stretches of work do make sense for modding-related goals and Seihou though. For these goals, you can bypass the regular push system via microtransactions, and get immediate deliveries of any small piece of work without the upfront investment of a full push.


PC-98 emulation is getting better and better, DOSBox-X even has dynamic recompilation now. Are source ports of a single game series even worth it?

Again, you decide.


Isn't uth05win already what you wanted to achieve? We even have source code for it.

Initially, I thought the same, and had the impression that uth05win's source code release would immediately obsolete ReC98.

Fast-forward to 2022 though, and TH05 has been one of the most requested games among ReC98 backers. uth05win did legitimately reverse-engineer most of TH05, and it definitely was a tremendous help during the initial reverse-engineering phases of not only TH05, but also TH04 and, to a lesser extent, even the previous three games. However, the final port has taken quite some liberties, ranging from fanfiction fixes for even just minor inconsistencies within ZUN's original code to flat-out wrong code in certain boss scripts. It's completely understandable why die-hard PC-98 Touhou fans immediately dismiss it as "not the real thing". Which, ironically, led to ReC98's approach of a provably legit source code reconstruction being appreciated more, not less, among this group of people.

Also, there's the obvious reason: I don't restrict myself to just one game.


Why do pushes that are geared towards one specific game also tend to come with progress in other games? Aren't you wasting time there by not focusing 100% on what your patrons wanted you to do?

If the same function appears in more than one game, more or less unchanged, I'd only be wasting time re-familiarizing myself with all the involved concepts months later. I think it makes more sense to immediately cover identical functionality in all games. It's basically free progress for everyone else.

Then again, the more progress is made, the more infrequently this will happen, as the amount of not yet reverse-engineered code shared between the games approaches zero.


Do you have a refund policy?

Yes! You can request refunds for every push I haven't started working on yet. I will keep the money after having delivered a push though.


I found a bug in one of your mod releases!

Please tell me! I will release a bugfix for free, together with a short explanatory blog post, if the bug is

An example: The P0234 build of the TH01 Anniversary Edition had a bug during the transition between Sariel's two forms that caused certain pellets to be rendered as white streaks. This was due to a quirk I didn't keep in mind when writing the new pellet rendering code that this release was all about, so I fixed it shortly after I received the bug report, for free.

Two counterexamples:


Can't Team Shanghai Alice take down this project and crowdfunding at any time?

While I can't promise that they won't, the same kind of source code reconstruction has been done for the Generation I-III Pokémon games, Super Mario 64, Ocarina of Time, and Diablo, all of which still generate revenue for their rights holders. PC-98 Touhou, on the other hand, is both no longer sold and unlikely to be ever sold again in its original form due to various copyright infringements in the games themselves. That fact should even make them an inherently safer choice for a decompilation project than any of the aforementioned ones…

… or so you would think. Despite all that, full downloads for the PC-98 games are actively being DMCA'd by Team Shanghai Alice as of April 2022, officially robbing the games of their perceived abandonware status. Now, it is still unclear whether they plan to extend their copyright enforcement to the source code and research level that this project exists at. Without a precedent inside the Touhou scene, ReC98 does seem safe for the time being – especially since it has never included any asset data from the original games, and is unusable without supplying that data from existing game copies.
Takedowns of decompilation projects have happened outside Touhou though, most notably with Take-Two's DMCA claim against the GTA 3 / Vice City project. And as long as that court case is still pending, Team Shanghai Alice might very well try the same, even surpassing Nintendo in terms of corporate anti-consumer conduct in the process.

That said, it would take quite a bit more than a simple DMCA claim to GitHub to take down this project. Everything about it has always been self-hosted outside the US, and the GitHub presence of both the game code and website repositories only serves four purposes:

  1. Providing a nice code and commit browser
  2. Offering another expendable place for issues to be reported, in addition to my usual Internet presence
  3. Easy discoverability
  4. Participating in the ⭐ star count popularity contest

Yes, no "web hosting" on this list, and no essential reliance on GitHub infrastructure anywhere else. 1) could even be implemented as part of this website in a push or two, if you all consider that a worthwhile thing to have.

There is certainly an argument in favor of taking down the project at the first sign of resistance. Why continue working in the hostile environment that is canon Touhou if the perceived "free culture spirit" has been nothing but a misunderstanding from the very beginning, and the rights holders are more corporate, controlling, and distant than actual big corporations? The rational choice would definitely be to leave the sinking ship. I'd love it if there was enough money in the non-Touhou parts of the PC-98 scene for a follow-up project; it would be a shame to let my experience on the platform go to waste. Meme games, anyone?

Seihou also looks rather welcoming, don't you think?

But that's not what all the backers signed up for. I have several ideas for transforming the project after a takedown notice while still keeping its essence. I will keep it running as long as possible – even if that will someday mean that I have to manually send out the source code to people. And the blog – arguably the main attraction while development is still ongoing – should be, in theory, even safer than that. Let's wait and see how far Team Shanghai Alice will actually escalate this. There are risks, and you should be aware of them and invest responsibly, but I'm far from panicking.

Until then (and let's all hope we'll never reach that point): Always keep in mind that the product is both the code and the documentation, in the form of new commits in a Git repository. Nothing more, nothing less. Perform a git clone after I pushed the commits you bought, and you now have a DRM-free digital copy of the progress you paid for. Even if I have to start manually sending out the source code to people, rest assured that nothing I produce will ever be put behind a paywall. (The only thing that is behind a paywall is the time it takes to make it all happen.)

And finally, because it seems to be frequently misunderstood: I have never sold the promise of a finished Windows/Linux/phone release of any of the games, and still don't sell the promise of any Team Shanghai Alice release in relation to the completion of ReC98. Sure, in a fair world, Team Shanghai Alice would leave this project alive until it reached its core goal, and then acquire and commercially exploit it. They have every right to do this, and it would be fine by me, as I will have been paid my fair share at that point. But throughout all its Windows history, Touhou has always been a poster child of the "not invented here" mentality prevalent in Japanese business. Even the currently unlikely event of a takedown is much, much more likely than them acknowledging or even using my work. Heck, I remember hearing about offers for professional localization from people who are much closer to ZUN than I am[citation needed], and they've all been turned down…


Can I still help out with the reverse-engineering by contributing to the ReC98 repository?

The amount of time I spend on raw reverse-engineering and decompilation almost pales in comparison to the deeper research and documentation work that this project evolved towards. Unless you can deliver at a similar level, I would spend almost as much time reviewing your changes as if I just did everything myself, if not more. Frequent RE pull requests also reduce the chance for me to turn this into my only job, which I would very much like to do.

If you still want to help by coding, I've got a bunch of other contribution-ideas. These are slightly out of scope of the main project, but interesting for the big picture nonetheless. They come with a lower barrier to entry, offer more freedom than regular reverse-engineering work, and I would actually appreciate your help there.

In all honesty though, spending your time contributing to any other project would probably bring you much further in life than anything related to the main Touhou series ever will.


What about contributing code to the website?

There is a wide array of potential features that could be added to this site. Better accessibility for progress tables and syntax highlighting for code snippets are two examples of features that I already have in mind for future website pushes. Other features might be cool to have, but are maybe too expensive relative to their usefulness, such as porting the ZMBV codec to the Web for efficient lossless video support on the blog. And of course, there might be features that I haven't even thought about so far.
Unlike the core reverse-engineering business, improving the website is something I only get to do maybe once or twice a year, as a side project. That's why I would also highly appreciate you helping out in that regard.

Note that any code contributions to the website will be licensed under the AGPL.


Why a cap?

The cap corresponds to the maximum time I can healthily allocate to this project within the next 4 weeks. It is meant to


With the current rate of progress, and the cap being at the low level that it is, the project is never going to finish!

If you all manage to regularly sell out the store at higher and higher prices per push, I will be able to increase the cap ever so slightly by reducing those pesky RL work hours. If those reach zero and I can turn ReC98 into my only job, I can remove the cap entirely and go for a proportional "bidding war" model instead, allocating my constant amount of time relative to how much money comes in for a particular goal.

But as of December 2021, that's still far off. Meanwhile, the steadily increasing amount of care and documentation I put into this project has proved highly popular, while no one has ever requested me to compromise on that and instead rush towards 100% RE as fast as possible. So, getting everything 100% done within the foreseeable future doesn't actually seem to be much of a concern for my existing audience.


Some of the 2018 pushes were delivered months or even years after they were paid…

Back then, I not only didn't have a cap, but also vastly undersold myself, while also offering crowdfunded features for thcrap in parallel. That's why the latter are sometimes referred to in the old blog posts here. But compare that to now:

However, if you absolutely request me to prioritize an element of a game that requires a ton of not yet reverse-engineered knowledge to fully grasp, and you absolutely don't accept your money going to anything else, I will have to put that on the back burner. It will be made clear in the backlog whenever that happens, though.


I'd like to see PC-98 Policenauts (or any other DOS program compiled using Borland/Turbo C++) decompiled. What's in it for me?

The ReC98 repository includes a currently incomplete file with the ASM→C++ patterns, as well as information about the limits of decompilability. This file will be continuously updated with new insights. So while you probably wouldn't want to support this project until the very end, it might be worth supporting ReC98 for just a bit – at least until it becomes obvious that I completely figured out Turbo C++, and that other decompilation project you wanted to see made significant progress.

And who knows, maybe we will see a somewhat automated decompilation solution come out of this.


I want replays! What's in it for me?

As of January 2022, I also offer to develop PC-98-native replay mods, if you don't want to wait for your favorite game to get 100% decompiled and ported to a modern system first. There's a separate option in the order form just for that goal.


I want translations into languages with non-ASCII characters! What's in it for me?

In 2023, Touhou Patch Center commissioned the basic feature set that would allow such translations, and I expect to deliver it during 2024, after completing the Shuusou Gyoku Linux port. However, you probably want more than the basic features, so feel free to further support this goal by donating to their Open Collective page. There's no cap there, and every bit will help to improve the end product. Check my announcement blog post for details and feature ideas.


I want TH03 netplay! What's in it for me?

I'm not a low-level networking person, so who knows whether doing this natively on PC-98 is actually as impractical as it sounds. Porting the game to a modern OS with a network stack first (which, again, requires a complete decompilation) will certainly be a lot more convenient to whoever ends up trying their hand at it, though.


Do you sell ad space on this site?

Every contributor, no matter how much they paid, has the option to have their name be turned into a link to a URL of their choice. So if you consider that to be advertising, then yes. If you had more than that in mind, hit me up, and we might make it happen. No JavaScript or remote content, though!


Alright! I have understood what this project is about, and am convinced that I want to support it. Take me to the order form!