Not And Xor

Making a combo randomizer part 01

Emulating the Nintendo 64 boot process on the Nintendo 64

For over three years, I have been working on OoTMM, a combo randomizer in the like of SMZ3, but for Ocarina of Time and Majora's Mask. I started working on it seriously since around a year. It's been out for a few months now[1], and I'm gonna write a few posts about the development of this project.

This post is about how the Nintendo 64 loads games, and how Ocarina of Time can be modified to load Majora's Mask.

Booting the Nintendo 64

N64 games are stored on cartridges that are plugged into the console and memory-mapped into various parts of the address space. The virtual address space of the system is 32-bit, and is split into a few large segments, only two of which are really used[2]: KSEG0 and KSEG1, starting at 0x80000000 and 0xa0000000 respectively. They map to the same physical range, but KSEG0 is cached, while KSEG1 is not.

The physical memory map is more complex, given that the system is built around a lot of components that the CPU can talk to using MMIO[3]. The main memory, called RDRAM, is mapped to the lowest 8MiB of the address space (or 4MiB, if the expansion pack is not installed). The area starting at 0x10000000 is mapped to the "main" part of the cartridge[4]. Since it's usually desirable to access the cartridge through the uncached segment, for most purposes, the cartridge can be considered to be mapped starting at 0xb0000000 in the virtual space.

When the system is turned on, it performs a 3-stage boot process, called IPL1, IPL2 and IPL3. The first two stages are handled by the internal boot rom and perform basic hardware setup, and then attempts to load IPL3 which is stored in the cartridge itself. There are a few different versions of IPL3, depending on the game, but they all perform the same basic operations: load the first MiB of the cartridge (not including its header, which is also where the IPL3 is stored) into memory, at physical offset 0x1000, checksum it, and if it's valid, jump into the entrypoint of the game.

The thing is, this process is fairly straightforward to replicate[5], and n64 roms can get pretty large, which are the two main reasons I considered this project worth investigating in the first place.

Merging ROMs

The first step is of course to create a ROM that contains both games. Both Ocarina of Time and Majora's Mask are 32MiB in size[6], but the Nintendo 64 handles 64MiB carts just fine, so a basic solution is to do the dumbest thing that could possibly work and just concatenate the two ROMs together. Ocarina lives in the low half, and Majora in the high half.

Ocarina of Time does not care the slightest that the ROM is oversized and happily boots, without having to change anything, completely ignoring the upper 32MiB. So there are only two things left to do:

  • Inject some code into OoT to replicate the n64 boot process, but with a 32 MiB offset, making MM load.
  • Patch MM so that every time it tries to load something from the cartridge, it adds 32 MiB to the address.

How exactly I managed to add custom code to the games is a fairly complex topic that will take a few posts at least, so for the time being, let's just assume I can patch and add some assembly and some C there and there.

I decided fairly early on that the trigger to load MM would be stepping in the Happy Mask Shop in OoT, since, well, masks[7]. Turns out, there is a function called Play_Init living at address 0x8009a750, that is called every time the player enters a scene. The scene is determined by an entrance index, a 32-bit value stored at 0x8011a5d0. It was not very hard to write some assembly to check for entrance 0x1d1, which is the Happy Mask Shop, and call some custom function if that's the case.

Writing the code to replicate the boot process is not extremely hard either, but while running the game, the system is in a much more dynamic state than after the IPL2. There are a lot of things that are going on: threads, audio and video processing, peripherals being read, etc. All of that needs to be stopped, in order to back to a known, simpler state that will not interfere with the loading.

Interrupts and subsystems

The very first step here is to disable all interrupts. We don't want the interrupts to interfere, and MM's entrypoint will expect instructions to be masked anyway. There is only one core on the Nintendo 64, so disabling interrupts also ensures that no other code will run, essentially forcing critical section semantics. The assembly for that is pretty straightforward:

.global comboDisableInterrupts
comboDisableInterrupts:
  /* Disable the IE bit in COP0 Status to disable interrupts */
  mfc0 a0, $12
  li a1, 0xfffffffe
  and a0, a1
  mtc0 a0, $12

  /* Mask MIPS Interface interrupts */
  la   a1, 0xa4300000
  li   a0, 0x0555
  jr   ra
   sw  a0, 0x0c(a1)

Bit 0 of the Status register of coprocessor 0 is IE, the Interrupt Enable bit. Clearing it does exactly what you would expect. The Nintendo 64 has it's own interrupt system that are used by the various peripheral, and they all assert the same interrupt line on the CPU[8]. Writing 0x0555 to the memory mapped register at 0xa4300000 masks these interrupts. Note that it does not necessarily prevents new interrupts from being generated, nor does it clear the interrupts that might be pending (as unlikely as it is in this 4 instruictions window). This is handled right afterwards, by reading and writing a bunch of MMIO registers. The code here is fairly long and not very interesting, but what it does essnetially is the following:

  • Disable the video output and clear any video interrupt
  • Stop the RSP and the RDP (the graphical co-processors, essentially) and clear their interrupts
  • Stop the audio output and disable the audio interrupt
  • Disable the controller interface and clear the controller interrupt

Cache and DMA

For console compatibility, it's also required to handle the caches. The MIPS CPU in the Nintendo 64 uses a simple system with a single data cache and a single instruction cache, with no coherency. As Majora's Mask will be loaded via DMA, bypassing the CPU caches, it's very important that the first MiB of memory is not present in the cache at all, or else there is a risk of executing garbage or silently corrupting data, depending on which cache is at fault. It is not possible to rely on the cache functions that comes with Ocarina of Time, since they are located within the first MiB of memory themselves! Instead, it's necessary to reimplement these functions. Fortunately they're not hard to implement:

#define ICACHELINE 0x20
#define DCACHELINE 0x10

void comboInvalICache(void* addr, u32 size)
{
    u32 iaddr;
    u32 iend;
    u32 count;

    iaddr = (u32)addr & ~(ICACHELINE - 1);
    iend = (u32)addr + size;
    count = (iend - iaddr + (ICACHELINE - 1)) / ICACHELINE;

    for (u32 i = 0; i < count; ++i)
    {
        __asm__ __volatile__("cache 0x10, 0(%0)" :: "r"(iaddr));
        iaddr += ICACHELINE;
    }
}

void comboInvalDCache(void* addr, u32 size)
{
    u32 daddr;
    u32 dend;
    u32 count;

    daddr = (u32)addr & ~(DCACHELINE - 1);
    dend = (u32)addr + size;
    count = (dend - daddr + (DCACHELINE - 1)) / DCACHELINE;

    for (u32 i = 0; i < count; ++i)
    {
        __asm__ __volatile__("cache 0x11, 0(%0)" :: "r"(daddr));
        daddr += DCACHELINE;
    }
}

Setting up a temporary stack is also required, since the current one might be overriden when loading the game. Any address sufficiently higher than the first MiB does the trick.

Now that all this setup is done, it's possible to actually load Majora's Mask. The only thing left to do is to load the first MiB[9] of the ROM into memory and jump to the entrypoint. Doing that through CPU reads and writes is doable but slow. Instead, a much faster DMA transfer can be used to do the job. It's as simple as writing the source, destination and size to some MMIO registers, and then busy-looping until the transfer is done:

static void waitForPi(void)
{
    u32 status;

    for (;;)
    {
        status = IO_READ(PI_STATUS_REG);
        if ((status & 3) == 0)
            return;
    }
}

void comboDma_NoCacheInval(void* dramAddr, u32 cartAddr, u32 size)
{
    u32 tmp;

    waitForPi();
    while (size)
    {
        tmp = size;
        if (tmp > 0x2000)
            tmp = 0x2000;
        IO_WRITE(PI_DRAM_ADDR_REG, (u32)dramAddr & 0x1fffffff);
        IO_WRITE(PI_CART_ADDR_REG, cartAddr | PI_DOM1_ADDR2);
        IO_WRITE(PI_WR_LEN_REG, tmp - 1);
        waitForPi();
        size -= tmp;
        dramAddr = (void*)((u32)dramAddr + tmp);
        cartAddr += tmp;
    }
}

And after jumping to the MM entry point at 0x80080000, Majora's Mask finally loads, and then immediately crash as soon as it tries to load some data from the cartridge.

Fixups

To fix that, every time Majora's Mask tries to load anything from the cartridge, 0x2000000 needs to be added to the physical address, to account for the offset in the ROM. Patching the code that initiates every DMA request in MM works, but unfortunately it's a bit of a dead-end, as we'll need to load data below the 32 MiB mark later, when adding OoT elements into MM. The correct solution is to patch every single physical address to add the 32 MiB offset. Fortunately, there are not that many of them. There is the main DMA table located at 0x7430 that is trivial to patch from the script that builds the combined ROM, and then there are a few hardcoded offsets, mostly related to audio.

Once all of these are patched, this is the result:

This was, more or less, the state of the project for most of its life, before I seriously started working on it. It was a fun proof of concept, but was not really playable in that state. The MM menu was still there, creating a save in MM didn't work, it was not possible to go back to OoT, and various anti-piracy measures triggered since the game was started in an unorthodox way. And, of course, nothing was randomized, which is kind of a problem for a randomizer.

Nowadays, the system is a bit more complex than that, as is the ROM layout. But this is still a good starting point to understand how the system works.