A Simple Example

To get you started, here is a simple code snippet to be pre-processed. The lines starting with '|' (the pipe symbol) are for DynASM:

  if (ptr != NULL) {
    |  mov eax, foo+17
    |  mov edx, [eax+esi*2+0x20]
    |  add ebx, [ecx+bar(ptr, 9)]
  }

After pre-processing you get:

  if (ptr != NULL) {
    dasm_put(Dst, 123, foo+17, bar(ptr, 9));
  }

Note: yes, you usually get the assembler code as comments and proper CPP directives to match them up with the source. I've omitted them here for clarity. Oh and BTW: the pipe symbols probably line up much more nicely in your editor than in a browser.

Here 123 is an offset into the action list buffer that holds the partially specified machine code. Without going into too much detail, the embedded C library implements a tiny bytecode engine that takes the action list as input and outputs machine code. It basically copies machine code snippets from the action list and merges them with the arguments passed in by dasm_put().
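To make the idea of an action list concrete, here is a minimal sketch of such an engine. It is not DynASM's actual encoding or API: the action codes (ACT_STOP, ACT_BYTES, ACT_IMM32), the function toy_put() and the array toy_actions are all hypothetical names invented for this illustration. The only borrowed facts are two x86 encodings: B8 followed by a 32-bit immediate is "mov eax, imm32", and 01 D8 is "add eax, ebx".

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical action codes; the real DynASM encoding is different. */
enum { ACT_STOP = 0, ACT_BYTES = 1, ACT_IMM32 = 2 };

/* Two snippets stored back to back in one action list:
 *   offset 0: mov eax, <imm32 supplied at run time>
 *   offset 5: add eax, ebx
 */
static const uint8_t toy_actions[] = {
  ACT_BYTES, 1, 0xB8, ACT_IMM32, ACT_STOP,
  ACT_BYTES, 2, 0x01, 0xD8, ACT_STOP
};

/* Interpret the action list from `start` until ACT_STOP.
 * ACT_BYTES n b1..bn copies n literal machine code bytes.
 * ACT_IMM32 consumes the next int argument and emits it as
 * four little-endian bytes.
 * Returns the number of bytes written to `out`.
 */
static size_t toy_put(uint8_t *out, size_t cap,
                      const uint8_t *actions, size_t start, ...)
{
  va_list ap;
  size_t n = 0;
  const uint8_t *p = actions + start;

  va_start(ap, start);
  for (;;) {
    uint8_t act = *p++;
    if (act == ACT_STOP)
      break;
    if (act == ACT_BYTES) {
      size_t len = *p++;
      assert(n + len <= cap);
      memcpy(out + n, p, len);
      n += len;
      p += len;
    } else {
      assert(act == ACT_IMM32 && n + 4 <= cap);
      uint32_t v = (uint32_t)va_arg(ap, int);
      for (int i = 0; i < 4; i++)
        out[n++] = (uint8_t)(v >> (8 * i));
    }
  }
  va_end(ap);
  return n;
}
```

In this toy version, calling toy_put(buf, sizeof(buf), toy_actions, 0, foo+17) plays the role of the dasm_put() call shown earlier: the offset selects a snippet, and the action list itself decides how many arguments get consumed.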

The arguments can be any kind of C expression. In practical use most of them evaluate to constants (e.g. structure offsets). Your C compiler should generate very compact code for these calls.

The embedded C library knows only what's absolutely needed to generate proper machine code for the target CPU (e.g. variable displacement sizes, variable branch offset sizes and so on). It doesn't have a clue about other atrocities like x86 opcode encodings — and it doesn't need to. This dramatically reduces the minimum required code size to around 2K [sic!].

The action list buffer itself has a pretty compact encoding, too. E.g. the whole action list buffer for an early version of LuaJIT needs only around 3K.

Advanced Features

Here's a real-life example taken from LuaJIT that shows some advanced features like type maps, macros and how to access C structures:

|.type L,      lua_State,  esi  // L.
|.type BASE,   TValue,     ebx  // L->base.
|.type TOP,    TValue,     edi  // L->top.
|.type CI,     CallInfo,   ecx  // L->ci.
|.type LCL,    LClosure,   eax  // L->ci->func->value.
|.type UPVAL,  UpVal

|.macro copyslot, D, S, R1, R2, R3
|  mov R1, S.value;  mov R2, S.value.na[1];  mov R3, S.tt
|  mov D.value, R1;  mov D.value.na[1], R2;  mov D.tt, R3
|.endmacro

|.macro copyslot, D, S;  copyslot D, S, ecx, edx, eax; .endmacro

|.macro getLCL, reg
||if (!J->pt->is_vararg) {
|  mov LCL:reg, BASE[-1].value
||} else {
|  mov CI, L->ci
|  mov TOP, CI->func
|  mov LCL:reg, TOP->value
||}
|.endmacro

|.macro getLCL;  getLCL eax; .endmacro


static void jit_op_getupval(jit_State *J, int dest, int uvidx)
{
  |  getLCL
  |  mov UPVAL:ecx, LCL->upvals[uvidx]
  |  mov TOP, UPVAL:ecx->v
  |  copyslot BASE[dest], TOP[0]
}

And here is the pre-processed output (stripped a bit for clarity):

#define Dt1(_V) (int)&(((lua_State *)0)_V)
static void jit_op_getupval(jit_State *J, int dest, int uvidx)
{
  if (!J->pt->is_vararg) {
    dasm_put(Dst, 1164, Dt2([-1].value));
  } else {
    dasm_put(Dst, 1168, Dt1(->ci), Dt4(->func), Dt3(->value));
  }
  dasm_put(Dst, 1178, Dt5(->upvals[uvidx]), DtF(->v), Dt3([0].value),
           Dt3([0].value.na[1]), Dt3([0].tt), Dt2([dest].value),
           Dt2([dest].value.na[1]), Dt2([dest].tt));
}
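
The Dt macros in the output turn a member path such as ->ci or [dest].tt into a plain byte offset by applying it to a null pointer of the mapped type. The sketch below shows the same trick on two made-up structures. ToyValue, ToyState, ToyDt1, ToyDt2 and the two helper functions are hypothetical stand-ins, not the real Lua or LuaJIT definitions. The extra intptr_t cast is an addition so the sketch also compiles cleanly on 64-bit targets. Strictly speaking, forming an address from a null pointer is not sanctioned by the C standard, although common compilers accept it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not the real layouts. */
typedef struct ToyValue { double value; int tt; } ToyValue;
typedef struct ToyState { ToyValue *top; ToyValue *base; void *ci; } ToyState;

/* Same shape as the generated Dt1 macro: paste a member path onto a
 * null pointer of the mapped type and take the resulting address.
 */
#define ToyDt1(_V) ((int)(intptr_t)&(((ToyState *)0)_V))
#define ToyDt2(_V) ((int)(intptr_t)&(((ToyValue *)0)_V))

/* Offset of a field reached through a pointer-style path. */
static int toy_state_ci_offset(void)
{
  return ToyDt1(->ci);
}

/* Offset of a field inside the slot-th array element. The index may be
 * a run-time value, as with BASE[dest] in the example above.
 */
static int toy_slot_tt_offset(int slot)
{
  return ToyDt2([slot].tt);
}
```

When the path contains only constants, the result matches what offsetof() would give, so the compiler can fold it into an immediate. With a run-time index like dest, it reduces to one multiply and one add.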