Since my last post in October about Cab Hustle, my Commodore 64 retro game programming project, there has been some progress, mostly under the hood. Before diving into that: the game was also covered on the Universe64 YouTube channel. That was a pleasant surprise since, outside of this blog and my YouTube channel, I haven’t been talking about it at all.

As a quick review to catch everyone up: Cab Hustle is a two-dimensional thrust/gravity-based game. The objective is to pick up passengers by landing on platforms and then bring them to their requested destination platform. Here’s what’s new (alongside some background about developing for the C64).

## Improvements

As far as changes go, I’ve replaced the algorithm that places passengers on platforms. In the code, these pick-ups are called jobs. Jobs are now generated probabilistically, so there is hopefully more variety throughout the game. They are generated such that the player is slowly taken through all the rooms, i.e. there is a slow forward progression with some occasional backtracking.

Next, the status display that you can see in the bottom left of my last video no longer flickers. The problem in the prior version was that for every frame drawn, most CPU cycles actually went into formatting the strings shown (e.g. converting numbers to strings). The game still spends most of its time on this, but it no longer takes more than a frame, so the flicker is gone. There’s room for a lot more optimization, but that isn’t required for now since I don’t need the raster time (yet) for other computations.

As a side note, benchmarking this type of code is actually quite easy on the C64. Since the function rendering the status display gets called every frame, updating the border color of the screen in specific spots throughout the function visualizes the time consumed with minimal effort. Example: on function entry, set the border color to white and on exit to black. A big block of white border on top of the screen now tells us that we’re spending too much time in this function…

## Rearranging Map Data

Another larger set of changes revolves around how the rooms of the map are stored. Cab Hustle uses the multicolor character mode of the C64, so updating the background graphics (everything but the ship) requires changing what characters are in each character cell (that’s done using Screen RAM) and selecting one of the four colors for the cell (that’s done using Color RAM; the remaining three colors are set globally). Actraiser provides a good overview of how this works on the Dustlayer blog.

Previously when entering a new room, the room’s data was copied straight into Screen RAM, and the Color RAM was set based on the character chosen in the cell. That requires exactly 1000 bytes of data per room, one byte per character cell (and there are 40×25 cells). The downside of this approach is that each character always has the same colors. The upside is that the Color RAM information can be derived from Screen RAM and doesn’t have to be stored separately, which would require another 500 bytes per room (4 bits per cell). To boot, this is also the way that CharPad, the software I use to edit the rooms, exports the data. In practice, the color limitation is negligible. In some cases I update Color RAM manually, for example for the “Game Over” text or for highlighting the target platform (only Color RAM changes to highlight it in green).

The room data for the game therefore takes 9000 bytes for the 9 rooms. In addition, to compute the right value for the Color RAM, a lookup table of 256 bytes is used, which maps the 256 possible character values to 4 bits of color information. At the cost of slowing down the lookup, this could be done in 128 bytes by packing two 4-bit entries into each byte, but since the lookup needs to be done 1000 times when showing a new room, it’s better to keep it fast. While 9000 bytes doesn’t sound like much by modern standards, it is a large chunk of the C64’s 64 kB of RAM. That in turn severely limits how large future maps can get. I’d like to add additional larger maps that allow for speeding across longer distances, for example.
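To make the trade-off between the 256-byte and the 128-byte table concrete, here is a minimal host-side sketch in Python. It is not the game’s code; the function names and the example color assignments are made up for illustration. The packed variant needs a shift and a mask on every lookup, which is the extra work that would have to be done 1000 times per room on the 6502.

```python
def build_color_table(char_colors, default=0):
    """One byte per character code; only the low 4 bits are used."""
    return bytes(char_colors.get(c, default) & 0x0F for c in range(256))


def pack_nibbles(table):
    """Pack two 4-bit entries per byte: even char in the low nibble, odd in the high."""
    return bytes(table[i] | (table[i + 1] << 4) for i in range(0, 256, 2))


def lookup_packed(packed, char):
    """Slower lookup: pick the byte, then select the right nibble."""
    value = packed[char >> 1]
    return (value >> 4) if (char & 1) else (value & 0x0F)


def color_ram_for(screen, table):
    """Derive the Color RAM contents for a room from its Screen RAM bytes."""
    return bytes(table[c] for c in screen)


# Hypothetical assignment: character 1 is a gray brick, character 2 a blue platform.
table = build_color_table({1: 12, 2: 6})
packed = pack_nibbles(table)
```

Both tables return the same color for every character code; the packed one simply trades 128 bytes of RAM for a few extra instructions per cell.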

The latest version uses zlib compression for all room data. During the build, a Python script compresses the data exported from CharPad and writes out a new assembly source file containing the compressed room data, which then gets linked into the binary. On the C64 side, cc65 provides an inflatemem() function, defined in zlib.h, which can uncompress the original data again.
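A build step along these lines could look like the following sketch. It is not the actual script: the label name and the `RODATA` segment are hypothetical, and it assumes that `inflatemem()` expects a raw Deflate stream without the zlib header and checksum, which is worth verifying against the comments in cc65’s `zlib.h`.

```python
import zlib

ROOM_SIZE = 40 * 25  # one byte per character cell


def compress_room(raw):
    """Compress one room as a raw Deflate stream (negative wbits: no zlib header)."""
    if len(raw) != ROOM_SIZE:
        raise ValueError("expected %d bytes, got %d" % (ROOM_SIZE, len(raw)))
    compressor = zlib.compressobj(9, zlib.DEFLATED, -15)
    return compressor.compress(raw) + compressor.flush()


def to_ca65(label, data, per_line=16):
    """Render bytes as a ca65 source fragment with a C-visible label."""
    lines = ['.segment "RODATA"', ".export _" + label, "_" + label + ":"]
    for i in range(0, len(data), per_line):
        chunk = data[i:i + per_line]
        lines.append("    .byte " + ", ".join("$%02X" % b for b in chunk))
    return "\n".join(lines) + "\n"


# Stand-in for a CharPad export: mostly empty space with a floor in the last row.
example_room = bytes([32] * 960 + [160] * 40)
example_asm = to_ca65("room0_data", compress_room(example_room))
```

On the C side, a declaration such as `extern const unsigned char room0_data[];` would then make the compressed bytes available to pass to the decompressor.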

Let’s tally up how well this works to conserve RAM. First, room data compresses efficiently and rooms now range from 250 to 350 bytes (less than 300 bytes on average) instead of the original 1000 bytes. However, the decompression routine requires space. Code and initialized data take about 600 bytes, which now need to be part of the final binary. In addition, the decompression code requires additional memory for uninitialized data (i.e. in the BSS segment). Uninitialized data is not added to the binary but space for it needs to be reserved by the linker in the C64’s memory map. That space is no longer available for other uses. Looking at the cc65 library code, that’s about another 800 bytes. So overall, the 9000 bytes of room data have shrunk to roughly 4000 bytes of data and code. With the original allotment of 9000 bytes, about 25 rooms could be handled instead of just 9. Let’s call that a win.
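The tally can be spelled out as a few lines of Python. The figures are the approximate ones quoted above, with 300 bytes used as a rounded average room size, so the results are estimates rather than measurements.

```python
ROOMS = 9
RAW_ROOM_BYTES = 1000
AVG_COMPRESSED_ROOM_BYTES = 300  # rounded up; the real average is a bit lower
INFLATE_CODE_BYTES = 600         # code and initialized data
INFLATE_BSS_BYTES = 800          # uninitialized data reserved by the linker

raw_total = ROOMS * RAW_ROOM_BYTES
overhead = INFLATE_CODE_BYTES + INFLATE_BSS_BYTES
compressed_total = ROOMS * AVG_COMPRESSED_ROOM_BYTES + overhead

# How many compressed rooms fit into the old 9000-byte budget?
rooms_in_old_budget = (raw_total - overhead) // AVG_COMPRESSED_ROOM_BYTES

print(raw_total, compressed_total, rooms_in_old_budget)
```

The fixed 1400 bytes of overhead are paid once, so the saving grows with every additional room.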

There’s of course no free lunch, especially with retrocomputing. And the cost to pay here is the speed of decompression. When flying from one room into the next, there’s now a slightly longer delay. It’s noticeable but I consider it acceptable. At the moment, the implementation first decompresses into a backbuffer. Then, it copies the backbuffer to Screen RAM while updating Color RAM concurrently. This works similarly to how things worked before, with the difference that the raw Screen RAM data first needs to be decompressed. Decompressing directly into Screen RAM (avoiding the detour via a backbuffer) and then updating Color RAM causes visually unpleasant color artifacts while the update is in progress, since for some of the new characters written to Screen RAM the Color RAM still has the old color information. For example, if a brick cell (gray) gets replaced with part of a platform (e.g. blue) in Screen RAM, the platform part briefly shows up in the brick’s gray. A custom decompression routine that updates Color RAM concurrently with Screen RAM could address that in a later version.

There are some other options to shrink down the size of the data required for the map. First, there are faster decompression algorithms than zlib’s deflate. LZ4 generally produces larger output but is faster, and there conveniently is also an implementation for 6502 machines that comes with cc65. I’ve yet to try it out. Alternatively, CharPad supports a tiled mode where tiles consist of multiple characters, e.g. 4×3 blocks. That reduces storage somewhat; in this example 10×9 tiles are required to cover 40×25 cells (25 rows don’t divide evenly by 3, so the last tile row is only partially visible). On top of that, tile definitions need to be stored (mapping a tile ID to 4×3 characters), but those are shared for all rooms. The downside is that this approach imposes placement constraints since every block needs to be aligned on the tile grid.
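To illustrate the tiled layout, here is a small Python sketch that expands a 10×9 tile map into the 40×25 character cells. It is a simplified model rather than CharPad’s actual export format; the data layout (one byte per tile ID, 12 bytes per tile definition in row-major order) is an assumption made for the example.

```python
import math

COLS, ROWS = 40, 25
TILE_W, TILE_H = 4, 3

TILES_X = math.ceil(COLS / TILE_W)  # 10 tiles across
TILES_Y = math.ceil(ROWS / TILE_H)  # 9 tiles down; the last row is cut off


def expand_room(tile_map, tile_defs):
    """Turn TILES_X * TILES_Y tile IDs into 1000 Screen RAM bytes."""
    screen = bytearray(COLS * ROWS)
    for row in range(ROWS):
        for col in range(COLS):
            tile_id = tile_map[(row // TILE_H) * TILES_X + col // TILE_W]
            tile = tile_defs[tile_id]
            screen[row * COLS + col] = tile[(row % TILE_H) * TILE_W + col % TILE_W]
    return bytes(screen)


room_bytes_tiled = TILES_X * TILES_Y  # 90 bytes per room instead of 1000
```

Each room then costs 90 bytes before any further compression, plus 12 bytes for every tile definition shared across the map.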

Another possible option I considered is to place elements of varying size freely at defined coordinates, i.e. having a list of elements that make up a given room alongside their coordinates. However, that would require me to write some bespoke tooling. It’s something to try for a future project.

Compression for rooms aside, another recent change is that all map-related information moved into a single struct. The idea is that in the future new maps can be loaded at runtime, and for that I need the map data in a consecutive chunk of memory with known offsets (or pointers to data at known offsets). For each map, that data includes job generation probabilities per room, locations of platforms, arrangement information for room data, and the actual compressed room data.

## Title Screen and Music

Lastly, I’ve made some progress on adding a title screen and music. The title screen is a work in progress (I find drawing geometric shapes for the tile-based map quite a bit easier). It uses multicolor bitmap mode as opposed to multicolor character mode (see the Dustlayer link above for the details). In a nutshell, in multicolor bitmap mode each attribute cell can have its own pixel pattern (it does not need to come from a set of 256 character shapes), and instead of three global colors and one cell-specific color, there’s only one global color (the background color) and three colors per cell that can be chosen freely.

What’s the cost for these lifted constraints? First, there is a new chunk of memory that is used, called Bitmap RAM. It’s a whopping 8 kB in size. On the plus side, a character set is no longer needed, which saves 2 kB. That makes sense since 1000 attribute cells are now directly defined (1000 × 8 bytes = 8000 bytes) instead of using 256 reusable characters (256 × 8 bytes = 2048 bytes, approximately one fourth).

In multicolor character mode, the Screen RAM references a character and hence the pixel data to be displayed. Since the pattern of pixels to be shown now comes from Bitmap RAM, the Screen RAM is used to define two additional cell-specific colors (4 bits of information for each color). Alongside the color defined in Color RAM and the global background color, that allows for four colors per cell again (but with fewer limitations).
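The way the four color sources combine in multicolor bitmap mode can be modeled in a few lines of Python. This is a host-side illustration of the VIC-II’s bit-pair mapping with made-up function names, not code from the game: each byte of Bitmap RAM holds four double-wide pixels of two bits each.

```python
def mc_bitmap_pixel_color(bit_pair, screen_byte, color_ram_value, background):
    """Resolve one 2-bit pixel to a color index (0-15) in multicolor bitmap mode."""
    if bit_pair == 0b00:
        return background & 0x0F       # global background color
    if bit_pair == 0b01:
        return screen_byte >> 4        # upper nibble of the Screen RAM cell
    if bit_pair == 0b10:
        return screen_byte & 0x0F      # lower nibble of the Screen RAM cell
    return color_ram_value & 0x0F      # Color RAM cell (only 4 bits are wired up)


def decode_bitmap_row(row_byte, screen_byte, color_ram_value, background):
    """Decode one byte of Bitmap RAM into its four double-wide pixels, left to right."""
    return [
        mc_bitmap_pixel_color((row_byte >> shift) & 0b11,
                              screen_byte, color_ram_value, background)
        for shift in (6, 4, 2, 0)
    ]
```

For example, `decode_bitmap_row(0b00011011, 0x5E, 0x01, 0x06)` walks through all four sources once: background, upper nibble, lower nibble, and Color RAM.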

There is one more drawback. Multicolor modes on the C64 have double-wide pixels. In multicolor character mode, however, the highest bit of the cell-specific 4-bit color value toggles the cell between multicolor and standard mode. That means the freely choosable color can only come out of a palette of 8 and not 16 colors (1 bit for the flag and 3 bits for picking the color). It does however allow rendering a cell in high resolution with two colors and standard square pixels. That is especially useful for text. In multicolor bitmap mode, that option doesn’t exist, so any text in this mode needs to be represented awkwardly using double-wide pixels. On the other hand, all colors can be chosen out of the full palette of 16 colors.
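The flag-plus-color split of the Color RAM value in multicolor character mode can be sketched as follows. The function name is made up for illustration; the bit layout is the one described above (bit 3 is the mode flag, bits 0 to 2 pick the color).

```python
def decode_mc_char_color(color_ram_value):
    """Split a Color RAM value as interpreted in multicolor character mode.

    Returns (is_multicolor, color), where color is limited to the first
    8 palette entries because bit 3 is taken by the mode flag.
    """
    nibble = color_ram_value & 0x0F      # only the low 4 bits are meaningful
    is_multicolor = bool(nibble & 0x08)  # bit 3: multicolor vs. high-resolution cell
    color = nibble & 0x07                # bits 0-2: color index 0-7
    return is_multicolor, color
```

So a value of 13 means "multicolor cell, color 5", while a value of 5 means "high-resolution cell, color 5".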

As far as adding music to the title screen goes—it’s been one step forward and two steps back. For Slither, I used SidTracker64 to compose music for the game. Well, “music” may be an overstatement for the in-game cacophony, but you get the point. SidTracker64 is an iPad app, and I’m not a fan of touch controls. That aside, my iPad is no longer supported by Apple and receives no security updates, which results in it mainly collecting dust these days. Cue a Right to Repair rant here…

For Cab Hustle, I picked up a copy of Deflemask to create the music. However, I’m running into a number of problems exporting a SID file from it that I can include in the game. First, let’s talk about what a SID file is and how music is commonly done on the C64. SID files consist of a little metadata and a bunch of 6502 code. A program that wants to play music calls part of that code (the playroutine) on every frame, which is generally accomplished by hooking the playroutine into the raster interrupt. The playroutine then writes values to the C64’s sound chip (the SID).

There are simpler ways to create music, of course. The Programmer’s Reference Guide that came with the C64 showed a couple of BASIC programs that used pitch and duration data to modify SID registers, i.e. set the pitch, trigger the note, wait a while, release the note. The downsides of this approach are that it consumes the CPU 100% of the time and that more complex sounds are harder to create this way.

While the SID chip is a great synthesizer, it lacks some features that a full-fledged synthesizer provides, such as a modulation matrix. To achieve a tremolo effect (an oscillating change in volume) on a modern synthesizer, one might hook up a low-frequency oscillator (LFO) to the synth’s volume parameter. The LFO’s frequency can be set to achieve the desired pulsing of volume, and nothing more is needed once a note is triggered. Alternatively, an envelope generator can be connected to the pitch, causing a note to glide up and down again once triggered. In the case of the SID, all of these changes need to be applied by the CPU during each frame instead. On top of triggering the right notes in the right order, this is also done by the playroutine.
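As a rough idea of what "applied by the CPU during each frame" means, here is a Python model of a software LFO driving a volume value once per frame. It is a simplified illustration and not how any particular playroutine works: the function names are made up, and it targets the SID’s master volume (the low nibble of register `$D418`) because that is the simplest volume control to show. A real playroutine would also have to preserve the filter bits in the upper nibble of that register.

```python
SID_VOLUME_REGISTER = 0xD418  # low nibble: master volume 0-15


def triangle_lfo(frame, period, low, high):
    """Integer triangle wave: low at frame 0, high at period / 2, back to low.

    period is measured in frames and assumed to be an even number >= 2.
    """
    half = period // 2
    phase = frame % period
    distance = phase if phase < half else period - phase
    return low + (high - low) * distance // half


def tremolo_writes(frames, period=50, low=8, high=15):
    """Register writes a playroutine would issue, one per frame.

    A period of 50 frames is one volume swell per second on a PAL machine.
    """
    return [
        (SID_VOLUME_REGISTER, triangle_lfo(frame, period, low, high))
        for frame in range(frames)
    ]
```

On the real machine, the equivalent of one list entry would be computed and written inside the raster interrupt on every frame, alongside whatever else the playroutine does for notes and other effects.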