Metaprogramming a full Virtual Machine


If you are currently in Epita, you might be coding Corewar. In the second part of the project, you have to code the Virtual Machine.

If you are not concerned with Corewar, you may be interested in VM in any way, and this article won't be Corewar specific.

In the end, you will be able to write:

RegisterOpcode<0x83, Instr<ADD, Register<A>, Register<E>>>();

Decoding will be in O(1), and efficiency of our code will good, since operands will be bounds with operation at compile-time. You kill one level of indirection.


Virtual Machine

A virtual machine is a program which simulates a processor and a memory. It does it the following way :

  1. Fetch instruction
  2. Decode
  3. Execute
  4. Restart

Each instruction is associated to an opcode in binary. This opcode contains the instruction, and information about operands (the value itself, or its adress into memory, or in which register it is).

All this pipeline is static. Instructions won't change etc. This is a perfect playground for Metaprogramming. Perfect. We handle instructions with operands from Register.


The memory is just an indexed linear array, starting at address 0 or 1 depending on your architecture. A word may be 8 bits, 16 bits, or whatever depending on your architecture.


Since I first developped this library to code a Gameboy emulator, we will handle registers, even if it doesn't improve performance. It will improve genericity of your code. It is composed by the following element:

  1. Some registers, indexed with a letter.
  2. A "flags" byte.
  3. A stack pointer (Word), which points to the top of the stack.
  4. A program counter (Word), wich points to the instruction currently executed.


Our instructions will be composed by : an operation, one or two operands. Each operand can be a register, an address in the memory, an immediate value located in the next byte(s) of the program.


This is really simple to write. In the source code of my original emulator, this is a little bit more complicated: devices (like graphic card, sound module etc) are mapped directly into the adress space:

#include <cstdint>

uint8_t ram_[RAM_SIZE];


No surprise here, the basic code is really straight forward

class CPU
        void Process();


        byte _regs[REG_NBR];
        byte _flags;
        word _sp;
        word _pc;
        Ram _addr;
        std::function<void(CPU*)> _instr[256];

But this class does nothing. We need to add more concepts. Something which could be noted is the _instr attribute: this is an array of functions returning nothing (if there is a return value, it will be stored into the memory or a register) and takes a context, a CPU*, the this pointer of our CPU.

Operands source

We want to index our registers with a good syntax, and at compile-time

enum class RegisterLetter {A = 0, B, C, D, E, F };

template <int Reg>
class Register
    static inline byte Get(CPU* proc)
        return proc->_regs[Reg];

    static inline void Set(CPU* proc, byte val)
        proc->_regs[Reg] = val;

Here is defined an important concept: the ValueInterface, and its two static methods, Get and Set. Great. now we can access our registers with compile-time adressing doing:

byte reg_a = Register<A>::Get(cpu);

Similarly, we have to write a wrapper around around RAM to match this interface too:

template <class Addr>
struct ToAddr
    static inline byte Get(CPU* p)
        return p->_addr.Get(Addr::Get(p));

    static inline void Set(CPU* p, byte val)
        p->_addr.Set(Addr::Get(p), val);

And, we have to be able to take operands from the next byte in the program code

struct NextByte
    static byte Get(CPU* p)
        return p->_addr.Get(p->_pc);

This code is not really surprising. We increment the program counter to the next byte, and fetch the value here.


Here starts the metaprogramming symphony. We need the general concept of an instruction, expressed with templates

template <
  template <class, class>
    class Action,
  class Op1,
  class Op2
class Instr
        static void Do(CPU* p)
            Action<Op1, Op2>::Do(p);

Okay, take a breath. We defined another concept: ExecutableInterface, caracterized with the presence of T::Do(CPU*). This class is the general interface of any instruction. Three template parameters:

  1. An operation: this is a template parameter, itself templated with two template parameters, the operands. Because we want to write ADD<Register<A>, Register<B>>::Do(cpu);
  2. The first operand<
  3. The second operand

This class just binds

Instr<ADD, Register<A>, Register<B>>::Do(cpu);


Add<Register<A>, Register<B>>::Do(cpu);

And after this wrapper, we have to define a real instruction:

template <class A, class B>
struct ADD
    static void Do(CPU* p)
        int res = B::Get(p) + A::Get(p);

        int flag = 0;
        flag = (res == 0 ? (1 << 7) : 0);
        flag += ((res >> 4) ? 1 << 5 : 0);
        if (std::is_same<decltype(A::Get(p)), byte>::value)
            flag += ((res >> 8) ? 1 << 4 : 0);
            flag += ((res >> 16) ? 1 << 4 : 0);
        CPU::Register<Z80::F>::Set(p, flag);
        A::Set(p, res);

This class represents the ADD asm instruction, totally with generic inputs. Template parameters A and B are just locations of operands. I let you as an exercise to write a lot of operations. A trickier part is the range of JP instructions. JP, JPZ, JPNZ etc. We first write a JP templated with a Test class (do we jump ?), the location where is the adress (a register ? a pointer in the memory ?) and a dummy parameter, just to match the two template parameter pattern we already defined:

template <class Test, class A, class>
struct JP_Impl
    static void Do(CPU* p)
        word jmp = A::Get(p);
        if (Test::Do(p))
            Z80::Register<CPU::PC>::Set(p, jmp-1);

... and some tests to work with

struct True
    static inline bool Do(CPU*)
        return true;

struct IfZ
    static inline bool Do(CPU* p)
        return CPU::Register<Flags>::Get(p) & 10000000_b;

... and define legal instructions from that with templated using from C++11

template <class A, class B>
using JR = JR_Impl<True, A, B>;

template <class A, class B>
using JRZ = JR_Impl<IfZ, A, B>;

Processing the whole thing

Okay, we did a lot of thing, but nothing works. We still have to associate our instructions with opcodes, and process the whole thing. The first goal will be reached with a simple function. We already added a _instr member in our CPU, now we will fill it.

// Register function
template <unsigned char Opcode, class Inst>
void CPU::RegisterOpcode()
    _instr[Opcode] = std::function<void(Z80*)>(&Inst::Do);

// Register some operations somewhere
    RegisterOpcode<0xBB, Instr<CP, Register<A>, Register<E>>>();
    RegisterOpcode<0xBC, Instr<CP, Register<A>, Register<H>>>();
    RegisterOpcode<0xBD, Instr<CP, Register&ltA>, Register<L>>>();
    RegisterOpcode<0xBE, Instr<CP, Register<A>, ToAddr<Register<HL>>>>();
    RegisterOpcode<0xBF, Instr<CP, Register<A>, Register<A>>>();
    RegisterOpcode<0xC0, Instr<RETNZ, void, void>>();
    RegisterOpcode<0xC1, Instr<POP, Register<BC>, void>>();
    RegisterOpcode<0xC2, Instr<JPNZ, NextWord, void>>();
    RegisterOpcode<0xC3, Instr<JP, NextWord, void>>();
    RegisterOpcode<0xC4, Instr<CALLNZ, NextWord, void>>();
    RegisterOpcode<0xC5, Instr<PUSH, Register<BC>, void>>();
    RegisterOpcode<0xC6, Instr<ADD, Register<A>, NextByte>>();

We take the pointer to function of our instruction, and associate it whith the opcode. And we need our main loop:

void CPU::Process()
    while (true)

And we are done !


That was not so hard, and we have a design easily extensible and efficient. Any questions and appreciations are welcome in the comments !