Game Science Part 2: Old is Gold
Hurdles of going cross-platform ... Fixed-point arithmetic, fast hitboxes, CRT effects
In our last dev blog, I covered some of the bare essential concepts required to build a game engine. Today I want to talk a little bit more about what makes the framework we’ve built special, and what unique technical challenges needed to be overcome for our game to eventually be made. First, a note about our development pace.
We’re behind the previously projected schedule of having working game logic by now and getting artists’ work into the game, partly because our only other programmer is occupied with many other projects, and partly because I’m still learning a lot of this with you guys too! I’ve been writing software for almost a decade, ever since middle school, but this is the furthest I’ve gotten in a serious project of this scale, and since it’s a commercial product I want it done right the first time. When the game is done, it had better be a complete package. Since I’m building all of the engine tools by myself, it’s not unreasonable for the game to linger several additional months in the early stages, for reasons you’ll soon see. This also isn’t a full-time job for me; I have to balance my time to prevent burnout. The estimated release date has been pushed back to as late as Winter 2026, but this could improve as we pick up the pace and more work gets offloaded to paid volunteers.
Our project is a vertically-scrolling shmup called TIME FALCON. Set during a 600-year long war of attrition against a former ally, the game will have you blasting away swarms of ships across seven planets using wormhole teleportation and a varied roster of weapons earned and upgraded throughout your run. I’m in the process of writing the GDD for the game, and I hope to shed more light on the premise of the gameplay once our engine tech is completed.
Keeping the Code Portable
One unusual goal for TIME FALCON is getting the game onto AGA Amigas. We're building two vastly different versions of the game from the same source code: one for a modern PC-compatible with all of its conveniences, and one for older, more esoteric platforms. I thought it would make for a good programming exercise, the rationale being that if we can get our game to run on a computer from 30+ years ago, it’s a safe bet we can port it to just about anything made since.
This would also be a huge win for accessibility, as one of our stated goals in our previous blog post was aiming for the low-income market. Even if this port doesn’t happen, there is some comfort in knowing the code will be much faster, and we can reuse the optimizations in whichever other ports do end up releasing. Most of the code can be shared, but any draw calls to Allegro will have to be wrapped and pointed at the Commodore NDK. Our toolchain of choice is amiga-gcc. It's not perfect: it's based on a version of the GNU compiler from almost a decade ago, and using its bundled libnix support bloats even a basic "Hello World!" to over a megabyte. Not exactly efficient, but it makes it easy to get fast ports of x86 Linux programs working.
The real obstacles to overcome have to do with math. First, we can’t rely on floating-point math for anything in the game logic. Because of the replay system, the math for all of the gameplay has to match on every single platform, and that means the modern PC versions of the game need to cater to the lowest end. Hardware floating-point math didn’t get fast on PCs until the Pentium, and the Amiga’s equivalent in the Motorola 68040 and 68060 was (allegedly) terrible. Still, better than none at all: many computers even today, mostly embedded systems, don't have an FPU at all.
Further research reveals that IEEE 754 is not always followed faithfully on some hardware; in fact the standard is quite lenient to begin with, so much so that different “compliant” compilers and standard libraries for the same platform can produce varying results due to precision loss. So even if the game were a PC exclusive, there could be compatibility issues between, for example, the Windows and Linux versions of the game. This isn't just a hypothetical situation; it has happened in real commercial PC games before, albeit usually with insignificant consequences.
If a new PC-only game can’t even avoid this discrepancy, what guarantees that we can get deterministic calculations on drastically different computers that are decades apart from each other? Fixed-point math isn't the answer by itself, because built-in support for fixed-point arithmetic in C compilers isn't great either.
To avoid all of these problems, as much of the game’s math as possible needs to go through custom math libraries. We’re using a third-party library called fpm to get around this, but it needs some adjustments to work with our existing codebase, as it's a bit stingy about conversions.
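To illustrate the principle, here is a minimal Q16.16 fixed-point sketch in plain C++. This is not fpm's actual API, just the core idea that such libraries wrap: store each number as an integer scaled by 65536, and widen to 64 bits during multiplication and division so the intermediate result can't overflow. Because everything is integer math, every platform computes the exact same bits.

```cpp
#include <cstdint>

// Minimal Q16.16 fixed-point sketch (illustrative only; the real code
// uses the fpm library, which wraps the same idea in a safer C++ type).
struct Fixed {
    int32_t raw; // the value multiplied by 65536

    static Fixed fromInt(int32_t v) { return {v << 16}; }
    int32_t toInt() const           { return raw >> 16; }

    Fixed operator+(Fixed o) const { return {raw + o.raw}; }
    Fixed operator-(Fixed o) const { return {raw - o.raw}; }
    // Multiply via a 64-bit intermediate, then shift the scale back down.
    Fixed operator*(Fixed o) const {
        return {(int32_t)(((int64_t)raw * o.raw) >> 16)};
    }
    // Pre-shift the dividend up so the quotient keeps its fractional bits.
    Fixed operator/(Fixed o) const {
        return {(int32_t)(((int64_t)raw << 16) / o.raw)};
    }
};
```

Addition and subtraction cost nothing extra over plain integers, which is exactly why this approach suits a 68000-class CPU with no FPU.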
The second big problem is the speed of our collision system. We have rotating hitbox objects which, on each tick, are converted into four-sided polygons and piped into an algorithm based on the Separating Axis Theorem. There are a lot of opportunities to optimize our approach, but the biggest one specific to the backports is avoiding any multiplication, division, sine, or cosine calculations in time-sensitive functions. Converting the hitboxes into real polygons for the intersection algorithm is very expensive on Motorola 680x0 processors and other old platforms because it relies heavily on sine and cosine to produce rotation. Many of the most expensive math instructions throughout the game can be cut down on, but the sine wave is tricky, because it has to be calculated for every vertex of every single bounding box, and that data has to be available on every single tick. And that brings us to…
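For reference, the heart of a Separating Axis Theorem test looks something like the sketch below (hypothetical names, plain floats for brevity; the real game logic would run this in fixed point). Two convex polygons are disjoint exactly when their projections onto some edge normal of either polygon fail to overlap.

```cpp
#include <algorithm>
#include <cstddef>

struct Vec2 { float x, y; };

static float dot(Vec2 a, Vec2 b) { return a.x * b.x + a.y * b.y; }

// Project every vertex of a polygon onto an axis, returning [lo, hi].
static void project(const Vec2* verts, std::size_t n, Vec2 axis,
                    float& lo, float& hi) {
    lo = hi = dot(verts[0], axis);
    for (std::size_t i = 1; i < n; ++i) {
        float p = dot(verts[i], axis);
        lo = std::min(lo, p);
        hi = std::max(hi, p);
    }
}

// True if the two polygons' projections onto `axis` overlap. If this
// returns false for any edge normal, the polygons cannot intersect.
static bool overlapOnAxis(const Vec2* a, std::size_t an,
                          const Vec2* b, std::size_t bn, Vec2 axis) {
    float aLo, aHi, bLo, bHi;
    project(a, an, axis, aLo, aHi);
    project(b, bn, axis, bLo, bHi);
    return aHi >= bLo && bHi >= aLo;
}
```

Note the dot products: this inner loop is all multiplication, which is why getting the vertices cheaply (and skipping the test entirely where possible) matters so much on a 680x0.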
I Am Speed
(Optimizing the Intersection Algorithm)
How can we address the speed problem with minimal compromise in a way that’s compatible between every single computer system the game will run on?
The very first optimization I did was to simply cache each confirmed collision and repeat the collideWith() call in reverse whenever something matches the cache. Unless you’re doing something to move the hitbox when it’s touched, or in between the reciprocated hits, it’s impossible for an object to have a one-way collision, and if that did happen it’s almost certainly an oversight. So it makes sense to skip all of the hard math and just call the collide event on both objects mutually once the first intersection is detected. This is a really simple shortcut to implement, and it instantly cuts the performance penalty of all hitboxes by 50%.
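A sketch of this shortcut, using hypothetical names rather than our engine's real API: once a pair tests positive, fire both objects' collide events right away and remember the pair, so the reversed check later in the same tick can be skipped without redoing the intersection math.

```cpp
#include <algorithm>
#include <set>
#include <utility>

// Illustrative stand-in for a game object with a collision event.
struct Object {
    int id;
    int hits;
    void onCollide(Object& /*other*/) { ++hits; }
};

struct CollisionCache {
    std::set<std::pair<int, int>> seen; // pairs resolved this tick

    // Order the pair so (A,B) and (B,A) map to the same key.
    static std::pair<int, int> key(const Object& a, const Object& b) {
        return {std::min(a.id, b.id), std::max(a.id, b.id)};
    }
    bool alreadyHandled(const Object& a, const Object& b) const {
        return seen.count(key(a, b)) > 0;
    }
    void resolve(Object& a, Object& b) {
        a.onCollide(b);
        b.onCollide(a); // reciprocate immediately: no one-way collisions
        seen.insert(key(a, b));
    }
    void newTick() { seen.clear(); } // the cache only lives for one tick
};
```

The broad-phase loop would call alreadyHandled() before running the expensive polygon test, and resolve() when a test comes back positive.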
We can skip more physics checks by adding a whitelist or blacklist to certain object types and ignoring them when we know the two boxes being checked will never interact with each other. In a bullet-hell shooter this is a pretty big save, since projectiles make up the majority of hitboxes. There’s no reason any of the bullets should be talking to one another unless some weapon parries another weapon; for those cases we can make exceptions to the rule.
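One common way to implement such a whitelist is a layer/mask bitmask, sketched below with hypothetical layer names: each hitbox declares what it is and what it cares about, and a pair is rejected with a couple of AND operations before any expensive math runs.

```cpp
#include <cstdint>

// Illustrative collision layers; the real game would define its own set.
enum Layer : uint32_t {
    PLAYER      = 1u << 0,
    ENEMY       = 1u << 1,
    PLAYER_SHOT = 1u << 2,
    ENEMY_SHOT  = 1u << 3,
};

struct Hitbox {
    uint32_t layer; // what this box is
    uint32_t mask;  // what this box collides with
};

// A pair reaches the expensive intersection test only when each side's
// mask accepts the other's layer, so bullet-vs-bullet pairs drop out
// almost for free.
static bool shouldTest(const Hitbox& a, const Hitbox& b) {
    return (a.mask & b.layer) != 0 && (b.mask & a.layer) != 0;
}
```

A parryable weapon would simply add the opposing shot layer to its mask, which is the exception-to-the-rule case mentioned above.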
Another solution I came up with was to allow physics calculations to be skipped on certain ticks. Slower computers can use a lower physics rate if they struggle to keep up. It’s an elegant solution that required very few changes to the code, and all it takes to make it function identically across systems is to record one more unsigned integer in the demo file. That’s it! There doesn’t need to be any code for “if this is a replay from a 68k, do this thing instead”. The game just has to change the physics rate and the rest of the demo file emerges from the inputs as planned.
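In sketch form (hypothetical names, not our actual demo format), the whole mechanism is just a stored divisor and a modulo test; any machine replaying the demo with the same divisor runs the same simulation.

```cpp
// The demo file carries one extra unsigned integer: the physics divisor
// that was active when the run was recorded.
struct DemoHeader {
    unsigned physicsDivisor; // 1 = physics every tick, 2 = every other tick...
};

// Called once per game tick to decide whether the physics pass runs.
static bool physicsRunsThisTick(unsigned tick, const DemoHeader& demo) {
    return tick % demo.physicsDivisor == 0;
}
```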
This does come with some caveats. For starters, if you have a bullet deal continuous damage for multiple ticks before fizzling out, the damage-per-second will be altered. The implementation I have in mind for bullets avoids this problem anyway, but it’s another gotcha to keep in the back of our minds when writing object behaviors. The second and most obvious problem is that this makes collision detection less accurate: there is an inherent discrepancy between what is shown on-screen and what the game counts as a successful attack. Kept at around 30 Hz or higher, this shouldn’t be too noticeable in most cases. Perhaps one or two stray bullets will pass through a thin opponent in some poor soul’s world-record run and miss a few points of damage; not a big deal in our case, since our weapons will have infinite ammo.
A convenient upside to this is that it almost entirely takes care of our interpolation problem. Our engine has no animation system yet. I wanted to introduce “intents”, a system to let objects interpolate their own sprites every frame based on what they were doing on the last tick, but I scrapped it. That’s a lot of work to make something look right for a fraction of players on high-refresh-rate monitors, on PC only, when I could simply run the game at 120 or 240 Hz instead and lower the expensive physics pass to a more reasonable 60-ish Hz. We can now brute-force the game’s speed to make up for input lag with very few consequences, if we need to.
The last big optimization for hit detection is to speed up the collision checks themselves. Step one is to cache the vertices of each hitbox. Earlier I mentioned that the conversion from bounding boxes to real polygons for the intersection algorithm was really, really slow, so we want to avoid doing this conversion whenever possible. The gist of it is: if the hitbox has not moved since the last physics tick, we pull the vertices from the last physics tick and reuse them. Finally, we can use a much simpler algorithm for the conversion if the box is axis-aligned, and a simpler one still for the entire collider process if both of them are. If we have to do the full calculation, we can use approximate LUT-based sine functions for the slower version of the algorithm. All of this cuts down on the required CPU time significantly.
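As an example of the LUT approach, here's a sketch of a 256-entry Q16.16 sine table (illustrative, not our exact table). For brevity the table is filled once at startup with std::sin; a shipping 68k build would bake the table into the binary or generate it at build time, so no float math ever runs on the target machine.

```cpp
#include <cmath>
#include <cstdint>

// Angles are 0..255 for one full turn, so they wrap for free in 8 bits;
// results are Q16.16 fixed point (65536 == 1.0).
static int32_t sineTable[256];

static void initSineTable() {
    const double kPi = 3.14159265358979323846;
    for (int i = 0; i < 256; ++i)
        sineTable[i] = (int32_t)(std::sin(i * 2.0 * kPi / 256.0) * 65536.0);
}

static int32_t fixedSin(uint8_t angle) { return sineTable[angle]; }
// Cosine is just sine shifted a quarter turn (64 of 256 steps).
static int32_t fixedCos(uint8_t angle) {
    return sineTable[(uint8_t)(angle + 64)];
}
```

Rotating a hitbox vertex then becomes two table lookups and a few fixed-point multiplies, instead of two library sine calls per vertex.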
Some of these optimizations are work-in-progress right now, but the ideas are solid and I’m confident about finishing them.
Mind-Boggling Effects
(The Scanline Interrupt)
When designing background graphics for the bosses in TIME FALCON, I wanted to convey what it would feel like to travel through the fabric of reality itself. You often see depictions of wormholes in movies, but what’s a good method to convey that aesthetic in an appealing way that fits a retro art style? For some of these battle backgrounds, I took inspiration from the SNES game EarthBound. I was intrigued by the oscillation effects in that game and thought they would be a great fit.
Below is one background I created, as an example. The animation is put together using a 30 FPS, 4-bit flipbook that is recolored and then layered on top of itself to create these 3D pillars that move into the screen. This might be replaced with realtime 3D in the future if the memory save is worth it, who knows. But how is that wavy effect done?
Old computers did something called a Raster Interrupt to accomplish special effects like this. It was a surprisingly versatile feature that became a staple of old-school hardware frequently used by all kinds of games. In a nutshell, you can tell the hardware to pause whatever code you’re running and go do something else when a certain event occurs, in this case, that event is whenever the phosphor beam in the CRT display finishes drawing a certain row of pixels. You can respond however you want to these hardware interrupts, and one common response was to change certain data used by the graphics chip on the fly as it was still sending the data out to the screen. This takes very little extra time for the computer to pull off, so it’s an excellent way to spice up your game without sacrificing much.
Huge problem: while it’s still totally feasible to do scanline interrupts on modern LCD screens, PC graphics cards present no concept of this. The OS’s graphics driver handles all of the synchronization with the monitor or television automatically and hides it behind many layers of abstraction; there is no convenient way to interact with the scanning beam from our 3D libraries nowadays. These tricks are simple to pull off on an old 90s system specifically designed for them, but ironically very difficult on our much faster systems of today.
This is one of the reasons we decided to target DirectX 9 for the PC version in our last blog post. The only way to do this effect otherwise would be to brute-force every single line as a separate draw call, resulting in millions of wasted CPU cycles telling the GPU how to recreate the effect using its fixed-function pipeline. DirectX 9's programmable pipeline (and OpenGL 2+) lets us upload a program called a "shader" and issue just a few draw calls instead of tens of thousands, so the GPU can do all of the work at once instead of waiting for the CPU to slowly pipe in the draw data for each scanline.
Below is a simple GLSL implementation of this effect. (You'll need WebGL enabled to see it.) It runs pretty well even on low-end devices. The video I showed you above, on the other hand, was done using the shader-less method. While it doesn't look like it's lagging, that's because it was recorded on a computer way above our target specs. If we tried that on a lower-end machine it would absolutely struggle. One of the computers we tested with barely managed 60 FPS, and that's without a single enemy on the screen. When we reimplemented the special effects in that background as a shader, we saw a minimum 700% increase in performance right away.
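To give a rough idea of what such a shader looks like, here is a hedged sketch of the per-scanline wave as a GLSL fragment shader (hypothetical uniform names, not necessarily the exact shader from the demo): every output row samples the background texture with a horizontal offset that oscillates with its vertical position and time, mimicking a per-scanline raster effect.

```glsl
uniform sampler2D background; // the static backdrop image
uniform float time;           // seconds since the effect started
uniform vec2 resolution;      // output size in pixels

void main() {
    vec2 uv = gl_FragCoord.xy / resolution;
    // Each row gets its own horizontal offset: a sine of y plus time.
    float wave = sin(uv.y * 40.0 + time * 3.0) * 0.01;
    uv.x = fract(uv.x + wave);
    gl_FragColor = texture2D(background, uv);
}
```

One full-screen quad and a few uniforms replace the thousands of per-scanline draw calls the fixed-function fallback would need.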
Closing Notes
The last thing I wanted to mention was controller support. I wrote our own file format for storing DirectInput key binds and mapping them to the input bitmask touched on in the previous blog entry. There’s still work to do: setting default binds for certain supported controllers, finishing the button-prompt graphics, and getting good XInput support for Xbox and DualShock controllers, to name a few. But we have the feature working with a Sega Saturn controller right now and the results are flawless.
Allegro just got improved gamepad support in 5.2.11, which isn’t out yet, so we have to compile it from source for our engine to function, and that’s easier said than done on Windows; it has made setting up MinGW a real pain in the ass. But this should be long resolved by the time the game is done.
Some of my work on this game has resulted in bug fixes appearing in Allegro the next day, and we’re happy to see that our game has indirectly contributed to free software used by thousands of developers like us. Given more free time, I have considered submitting upstream patches in the future to develop Allegro even further. Their support for tracker modules needs some work; we added our own libxmp integration into our engine to work around the rather lacking DUMB & OpenMPT backends provided by Allegro. Not that DUMB or Modplug are bad, but rather that Allegro on its own doesn’t expose enough control over tracker playback to really make using sequencer formats worthwhile.
Other minor features we’ve been working on include a pixelation filter, multi-language support, and tidying up the intro screens. The final major addition required to get our first trial version of the game out is the level-loading routine, which has been difficult to get right. I'm eager to cover it in the next episode.
May the fourth be with you!
A grateful shoutout to: SiegeLord