
Advanced & Internals

The deep end: JIT tracing, bytecode format, machine code generation, schema system, capability model, FFI, and reflection. For contributors, embedders, and people who want to understand how ilusm works from the inside.

JIT & tracing

ilusm uses a tracing JIT: the interpreter runs normally until a loop or function call is executed frequently enough to cross a hotness threshold. At that point, the JIT records a trace - a linear sequence of operations - and compiles it to native code.

How tracing works

Interpreter: executes bytecode, counts iterations
Hotness detector: threshold of 1000 iterations by default
Trace recorder: records ops as a linear IR
Optimizer: constant folding, dead code elimination, type specialization
Code generator: mch module → native code (x86-64; ARM64 planned)
Native execution: guard checks → fallback to interpreter on deopt
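The first three stages can be modeled in a few lines of Python. This is only a sketch of the idea, not ilusm's implementation: the HotnessCounter class, the "loop@0" identifier, and the recorded op strings are hypothetical, and only the default threshold of 1000 comes from the text above.

```python
from collections import defaultdict

HOT_THRESHOLD = 1000  # mirrors the documented default; everything else is hypothetical


class HotnessCounter:
    """Counts loop iterations and reports when a loop becomes hot."""

    def __init__(self, threshold=HOT_THRESHOLD):
        self.threshold = threshold
        self.counts = defaultdict(int)

    def tick(self, loop_id):
        """Record one iteration; return True exactly once, when the threshold is reached."""
        self.counts[loop_id] += 1
        return self.counts[loop_id] == self.threshold


def run_loop(n, counter):
    """Interpret a summing loop and record a linear trace once the loop is hot."""
    total, trace = 0, None
    for i in range(n):
        if counter.tick("loop@0"):
            # A real recorder would capture the ops of the next iteration as they execute.
            trace = ["LOAD_LOCAL total", "LOAD_LOCAL i", "ADD", "STORE_LOCAL total"]
        total += i
    return total, trace


total, trace = run_loop(1500, HotnessCounter())
print(total, trace is not None)
```

A loop that runs fewer than 1000 times never produces a trace in this model, which is why run-once code gains nothing from the JIT.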

jit module API

    use jit

    # Query JIT stats for a function
    stats = jit.stats("my_hot_function")
    prn("compilations: " + str(stats.compilations))
    prn("trace length: " + str(stats.trace_len))
    prn("deopt count: " + str(stats.deopts))

    # Force compilation of a specific function
    jit.compile("my_hot_function")

    # Disable JIT for a function (useful for debugging)
    jit.disable("my_function")

    # Dump the compiled trace as pseudo-assembly
    jit.dump_trace("my_hot_function")

Tuning

Option                        Default   Description
jit.set_threshold(n)          1000      Iterations before a trace is compiled
jit.set_max_trace(n)          4096      Maximum trace length (ops)
jit.set_deopt_limit(n)        10        Deopts before a trace is abandoned
jit.enable_type_spec(bool)    true      Enable type specialization (faster but more deopts)

When the JIT helps and when it doesn't

Helps: tight numeric loops, string processing, list transforms over large collections, hot server request handlers.

Doesn't help: code that runs once, highly polymorphic code (many different value types in the same slot), code with frequent side effects that prevent optimization.
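To see why polymorphic code defeats the JIT, here is a toy Python model of a type-specialized operation with a guard. It is a sketch under stated assumptions: the Trace class and the Deopt exception are hypothetical, the deopt limit of 10 mirrors the documented default, and abandoning the trace on the 10th deopt (rather than the 11th) is a guess.

```python
class Deopt(Exception):
    """Raised when a guard in specialized code fails."""


def specialized_add_int(a, b):
    # Guard: the trace was recorded with ints, so any other type must bail out.
    if type(a) is not int or type(b) is not int:
        raise Deopt
    return a + b


def generic_add(a, b):
    # Interpreter path: handles numbers and string concatenation alike.
    return a + b


class Trace:
    def __init__(self, deopt_limit=10):  # 10 mirrors the documented default
        self.deopt_limit = deopt_limit
        self.deopts = 0
        self.abandoned = False

    def add(self, a, b):
        if not self.abandoned:
            try:
                return specialized_add_int(a, b)
            except Deopt:
                self.deopts += 1
                if self.deopts >= self.deopt_limit:
                    self.abandoned = True  # stop trying the fast path
        return generic_add(a, b)


mono = Trace()
for i in range(100):
    mono.add(i, 1)           # always ints: the guard never fails

poly = Trace()
for _ in range(20):
    poly.add("a", "b")       # strings in an int-specialized slot: the guard fails

print(mono.deopts, poly.deopts, poly.abandoned)
```

The monomorphic caller stays on the fast path; the polymorphic one pays for failed guards until the trace is abandoned and everything runs in the interpreter again.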

Bytecode - ILBC format

The mcde module encodes the compiler's instruction list into the ILBC binary format. ILBC is a compact, stack-based bytecode designed for fast loading and straightforward interpretation.

File format

    ILBC file layout
    ─────────────────────────────────────────────────────
    offset  size  field
    ─────────────────────────────────────────────────────
    0       4     magic: 0x494C4243 ("ILBC")
    4       2     version major
    6       2     version minor
    8       4     flags (bit 0: debug info present)
    12      4     entry point (function index)
    16      4     constant pool offset
    20      4     function table offset
    24      4     debug info offset (0 if absent)
    28      4     reserved
    ─────────────────────────────────────────────────────
    [constant pool]       strings, numbers, symbols
    [function table]      {name, arity, locals, code_offset, code_len}
    [instruction stream]  flat array of variable-width instructions
    [debug info]          {line_map, source_file, local_names}

Instruction set (selected opcodes)

Opcode              Hex       Stack effect             Description
PUSH_CONST          0x01      ( -- val)                Push constant pool entry by index
PUSH_NIL            0x02      ( -- nil)                Push nil
PUSH_TRU            0x03      ( -- tru)                Push boolean true
PUSH_FLS            0x04      ( -- fls)                Push boolean false
LOAD_LOCAL          0x10      ( -- val)                Load local variable by slot index
STORE_LOCAL         0x11      (val -- )                Store top of stack to local slot
LOAD_GLOBAL         0x12      ( -- val)                Load global by name index
STORE_GLOBAL        0x13      (val -- )                Store to global
CALL                0x20      (fn a0..an -- ret)       Call function with N args
CALL_TAIL           0x21      (fn a0..an -- ret)       Tail call (reuses frame)
RET                 0x22      (val -- )                Return from function
JUMP                0x30      ( -- )                   Unconditional jump by signed offset
JUMP_IF             0x31      (cond -- )               Jump if truthy
JUMP_UNLESS         0x32      (cond -- )               Jump if falsy
ADD                 0x40      (a b -- a+b)             Add (numbers or string concat)
SUB                 0x41      (a b -- a-b)             Subtract
MUL                 0x42      (a b -- a*b)             Multiply
DIV                 0x43      (a b -- a/b)             Divide (float result)
MOD                 0x44      (a b -- a%b)             Modulo
EQ                  0x50      (a b -- bool)            Structural equality
NEQ                 0x51      (a b -- bool)            Not equal
LT / GT / LE / GE   0x52–55   (a b -- bool)            Comparisons
MAKE_LIST           0x60      (a0..an -- list)         Construct list from N stack values
MAKE_OBJ            0x61      (k0 v0..kn vn -- obj)    Construct object from N key-value pairs
GET_IDX             0x62      (obj key -- val)         Index into list or object
SET_IDX             0x63      (obj key val -- obj)     Set index (returns new object)
MAKE_CLOSURE        0x70      ( -- fn)                 Create closure capturing upvalues
SYSCALL             0xF0      (args -- result)         Host syscall by ID
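The stack effects in the table can be exercised with a small interpreter for a handful of these opcodes. This is a Python sketch, not the ilusm VM: instructions are (opcode, operand) tuples rather than encoded bytes, since operand widths are not documented here, and jump offsets are assumed to be relative to the following instruction.

```python
# Opcode values are taken from the table; the tuple encoding is an assumption.
PUSH_CONST, LOAD_LOCAL, STORE_LOCAL = 0x01, 0x10, 0x11
RET, JUMP, JUMP_UNLESS = 0x22, 0x30, 0x32
ADD, MUL, LT = 0x40, 0x42, 0x52


def run(code, consts, nlocals=0):
    """Execute a list of (opcode, operand) pairs and return the RET value."""
    stack, slots, pc = [], [None] * nlocals, 0
    while True:
        op, arg = code[pc]
        pc += 1
        if op == PUSH_CONST:
            stack.append(consts[arg])
        elif op == LOAD_LOCAL:
            stack.append(slots[arg])
        elif op == STORE_LOCAL:
            slots[arg] = stack.pop()
        elif op in (ADD, MUL, LT):
            b, a = stack.pop(), stack.pop()       # (a b -- result)
            stack.append(a + b if op == ADD else a * b if op == MUL else a < b)
        elif op == JUMP:
            pc += arg                             # signed offset
        elif op == JUMP_UNLESS:
            if not stack.pop():
                pc += arg
        elif op == RET:
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode {op:#x}")


CONSTS = [0, 1, 5]
# total = 0; i = 0; while i < 5: total += i; i += 1; return total
PROGRAM = [
    (PUSH_CONST, 0), (STORE_LOCAL, 0),            # i = 0
    (PUSH_CONST, 0), (STORE_LOCAL, 1),            # total = 0
    (LOAD_LOCAL, 0), (PUSH_CONST, 2), (LT, None), # loop head: i < 5
    (JUMP_UNLESS, 9),                             # exit the loop when false
    (LOAD_LOCAL, 1), (LOAD_LOCAL, 0), (ADD, None), (STORE_LOCAL, 1),
    (LOAD_LOCAL, 0), (PUSH_CONST, 1), (ADD, None), (STORE_LOCAL, 0),
    (JUMP, -13),                                  # back to the loop head
    (LOAD_LOCAL, 1), (RET, None),
]
print(run(PROGRAM, CONSTS, nlocals=2))
```

The backward JUMP at the end of the loop body is the kind of edge a hotness counter would watch.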

Disassembling bytecode

    $ ilusm disasm hello.ilbc
    ; hello.ilbc - ilusm bytecode v2.0
    ; entry: fn#0 (main)

    fn#0 main (arity=0, locals=1)
      0000  PUSH_CONST   0    ; "Hello, World!"
      0003  LOAD_GLOBAL  1    ; prn
      0006  CALL         1    ; 1 arg
      0009  POP
      000A  PUSH_NIL
      000B  RET

Machine code - mch module

The mch module is ilusm's native code backend. It takes the JIT's optimized IR and emits machine code. x86-64 (Linux/macOS) is the only target with working code generation so far, and it is still partial; the ARM64 (Apple Silicon, Raspberry Pi) and wasm32 backends are stubs. mch can also be used directly to write assembly-level programs in ilusm.

Architecture

JIT IR → mch.lower() → register allocator → code emitter → mmap(EXEC)
guard checks, deopt stubs

mch API - direct use

    use mch

    # Create a code buffer
    buf = mch.new_buf()

    # Emit x86-64 instructions
    mch.mov_rax_imm(buf, 42)   # mov rax, 42
    mch.ret(buf)               # ret

    # Finalize and get a callable function
    fn = mch.finalize(buf)
    result = mch.call(fn)      # returns 42
    prn(str(result))
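For reference, the two instructions in that buffer correspond to eight bytes of x86-64 machine code. The Python sketch below builds them by hand using the standard `REX.W C7 /0` encoding of `mov rax, imm32`; whether mch.mov_rax_imm picks this form or the longer 64-bit-immediate form is not stated, so treat the exact bytes as an assumption about mch's output. The helper names are hypothetical.

```python
import struct


def mov_rax_imm32(value):
    """Encode `mov rax, imm32`: REX.W (48) + opcode C7 + ModRM C0 + little-endian imm32.

    The CPU sign-extends the 32-bit immediate to 64 bits.
    """
    return b"\x48\xc7\xc0" + struct.pack("<i", value)


def ret():
    """Encode a near return."""
    return b"\xc3"


code = mov_rax_imm32(42) + ret()
print(code.hex(" "))
```

A backend would copy these bytes into executable memory (the mmap(EXEC) step) before calling them; this sketch stops at encoding.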

Supported targets

Target    Status    Notes
x86-64    Partial   System V ABI (Linux/macOS). Integer ops, basic control flow.
arm64     Stub      AArch64 AAPCS. Planned for Apple Silicon + RPi.
wasm32    Stub      WebAssembly binary format. Planned.

mcde - the encoder

mcde is the bytecode encoder (distinct from mch, which generates native code). It sits at the end of the compile pipeline and serializes the instruction list into the ILBC binary format. You rarely interact with it directly: ilusm compile calls it for you.

    use mcde
    use fs

    # Encode an instruction list to ILBC bytes
    instrs = [
        {op: "PUSH_CONST", arg: 0},
        {op: "LOAD_GLOBAL", arg: 1},
        {op: "CALL", arg: 1},
        {op: "RET"}
    ]
    consts = ["Hello, World!", "prn"]

    bytecode = mcde.encode(instrs, consts, {entry: 0})
    fs.write("out.ilbc", bytecode)

Schema system - sch module

The sch module provides runtime schema validation. Schemas describe the shape of data - types, required fields, constraints - and can validate, coerce, or generate sample data.

Defining schemas

    use sch

    # Define a schema
    User = sch.obj({
        id: sch.int({min: 1}),
        name: sch.str({min_len: 1, max_len: 100}),
        email: sch.str({pattern: ".+@.+\\..+"}),
        role: sch.one_of(["user", "admin", "moderator"]),
        age: sch.int({min: 0, max: 150, optional: tru})
    })

    # Validate data
    result = sch.validate(User, {id: 1, name: "Alice", email: "alice@example.com", role: "user"})
    if result.ok:
        prn("valid")
    | prn("invalid: " + str(result.errors))

Schema composition

    use sch

    # Base schemas
    Address = sch.obj({
        street: sch.str({}),
        city: sch.str({}),
        zip: sch.str({pattern: "\\d{5}"})
    })

    # Extend a schema
    UserWithAddress = sch.extend(User, {
        address: Address
    })

    # Union types
    StringOrInt = sch.union([sch.str({}), sch.int({})])

    # List with item schema
    Tags = sch.list(sch.str({max_len: 32}), {max_len: 20})

sch API

Function                      Description
sch.obj(fields)               Object schema with named fields
sch.str(opts)                 String schema - min_len, max_len, pattern
sch.int(opts)                 Integer schema - min, max
sch.num(opts)                 Number schema - min, max
sch.bool()                    Boolean schema
sch.list(item, opts)          List schema with item schema
sch.one_of(values)            Enum - must be one of the listed values
sch.union(schemas)            Union - must match one of the schemas
sch.extend(base, extra)       Extend a schema with additional fields
sch.validate(schema, data)    Validate data - returns {ok, errors}
sch.coerce(schema, data)      Validate and coerce types (e.g. "42" → 42)
sch.sample(schema)            Generate a sample value matching the schema
sch.to_json_schema(schema)    Export as JSON Schema (draft-07)
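The validate contract, a result with ok and errors, is easy to model. The following Python sketch shows one way such a validator could be structured, with schemas as functions that return a list of error strings. It covers only strings, integers, and objects; the function names, the error messages, and the way optional fields are passed are all hypothetical and do not describe how sch is implemented.

```python
import re


def str_schema(min_len=0, max_len=None, pattern=None):
    def check(value):
        if not isinstance(value, str):
            return ["expected str"]
        errors = []
        if len(value) < min_len:
            errors.append(f"shorter than {min_len}")
        if max_len is not None and len(value) > max_len:
            errors.append(f"longer than {max_len}")
        if pattern is not None and not re.search(pattern, value):
            errors.append("pattern mismatch")
        return errors
    return check


def int_schema(lo=None, hi=None):
    def check(value):
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not isinstance(value, int) or isinstance(value, bool):
            return ["expected int"]
        errors = []
        if lo is not None and value < lo:
            errors.append(f"below {lo}")
        if hi is not None and value > hi:
            errors.append(f"above {hi}")
        return errors
    return check


def obj_schema(fields, optional=()):
    def check(value):
        if not isinstance(value, dict):
            return ["expected obj"]
        errors = []
        for name, field in fields.items():
            if name in value:
                errors += [f"{name}: {e}" for e in field(value[name])]
            elif name not in optional:
                errors.append(f"{name}: missing")
        return errors
    return check


def validate(schema, data):
    """Return the same {ok, errors} shape that sch.validate is documented to return."""
    errors = schema(data)
    return {"ok": not errors, "errors": errors}


User = obj_schema(
    {"id": int_schema(lo=1), "name": str_schema(min_len=1, max_len=100),
     "age": int_schema(lo=0, hi=150)},
    optional={"age"},
)
print(validate(User, {"id": 1, "name": "Alice"}))
print(validate(User, {"id": 0}))
```

Collecting every error instead of stopping at the first one lets a caller report all problems with a payload in a single pass.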

Integration with ORM

    use sch
    use orm

    # Define schema once, use for both validation and DB mapping
    UserSchema = sch.obj({
        id: sch.int({min: 1}),
        name: sch.str({min_len: 1}),
        email: sch.str({})
    })

    # orm.model uses the schema for column types and validation
    UserModel = orm.model("users", UserSchema)

    # Insert validates against schema before writing
    # (db is an open database handle created elsewhere)
    orm.insert(db, UserModel, {name: "Alice", email: "alice@example.com"})

Capability model deep dive

ilusm's capability model has three enforcement layers. Understanding all three is important for building secure programs and for auditing whether a program's claimed capabilities match its actual behavior.

Layer 1 - Module-level declarations

Each stdlib module declares its required capabilities at the top of the .ilu file using a # @requires comment. The module loader checks these before executing the module body.

    # lib/stdlib/net.ilu
    # @requires net.connect net.dns

    # This declaration means: if you use net, you must have
    # net.connect and net.dns in your active capability set.
    # The loader enforces this - the module body never runs otherwise.
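A loader-side check along these lines can be sketched in Python. The regular expression, the function names, and the choice to scan every line (rather than only a leading comment header) are assumptions made for illustration; the real loader may parse the declaration differently.

```python
import re

# Matches a line such as "# @requires net.connect net.dns".
REQUIRES = re.compile(r"^#\s*@requires\s+(.+)$")


def required_caps(source):
    """Collect every capability named in `# @requires` comments."""
    caps = set()
    for line in source.splitlines():
        match = REQUIRES.match(line.strip())
        if match:
            caps.update(match.group(1).split())
    return caps


def check_load(source, active):
    """Refuse to load a module whose declared capabilities are not all active."""
    missing = required_caps(source) - set(active)
    if missing:
        raise PermissionError("capability denied: " + ", ".join(sorted(missing)))


NET_ILU = "# lib/stdlib/net.ilu\n# @requires net.connect net.dns\n"

print(sorted(required_caps(NET_ILU)))
check_load(NET_ILU, {"net.connect", "net.dns", "time"})  # passes silently
```

Because the check runs before the module body, a denied module has no chance to execute any code at all.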

Layer 2 - Syscall contract

The __sys_* host functions check the active capability set before executing. This is enforced in the runtime, not the kernel - it can catch violations before they reach the OS.

    # How __sys_connect is guarded (pseudocode from the runtime):
    __sys_connect(host, port) =
        if !cap.has("net.connect"):
            err("capability denied: net.connect")
        # ... actual connect syscall ...
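The same guard pattern, written as a runnable Python sketch. The ACTIVE_CAPS set, the requires decorator, and the stubbed sys_connect are hypothetical stand-ins for the runtime's internals; no real network call is made.

```python
import functools

ACTIVE_CAPS = set()  # stand-in for the runtime's active capability set


class CapabilityDenied(Exception):
    pass


def requires(cap):
    """Wrap a host function so it checks the active capability set before running."""
    def wrap(fn):
        @functools.wraps(fn)
        def guarded(*args, **kwargs):
            if cap not in ACTIVE_CAPS:
                raise CapabilityDenied(f"capability denied: {cap}")
            return fn(*args, **kwargs)
        return guarded
    return wrap


@requires("net.connect")
def sys_connect(host, port):
    # Placeholder for the real connect syscall.
    return f"connected to {host}:{port}"


try:
    sys_connect("example.com", 443)
except CapabilityDenied as exc:
    print(exc)

ACTIVE_CAPS.add("net.connect")
print(sys_connect("example.com", 443))
```

The check happens in user space, so a denied call fails with an ordinary error the program can handle, unlike the kernel-level layer described next.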

Layer 3 - Kernel enforcement (seccomp/pledge)

sbx.pledge() installs a seccomp BPF filter, which the kernel enforces from that point on. Any syscall not in the whitelist causes SIGKILL - the process is killed immediately, with no chance for the program to catch or bypass it.

    use sbx

    # After this call, the kernel enforces the filter.
    # Even a bug in the ilusm runtime cannot bypass it.
    sbx.pledge(["fs.read", "net.connect", "time", "rand"])

    # The seccomp filter allows only:
    #   read, write, open(O_RDONLY), stat, connect,
    #   clock_gettime, getrandom, exit, sigreturn
    # Everything else: SIGKILL

Capability inheritance

Child processes spawned via proc.spawn inherit the parent's capability set by default. You can restrict them further:

    use proc
    use sbx

    # Spawn a child with fewer capabilities than the parent
    child = proc.spawn("worker.ilu", {
        caps: ["fs.read", "time"]   # subset of parent's caps
    })
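The inheritance rule amounts to a subset check. In the Python sketch below, a child with no explicit caps inherits the parent's set, and a child may narrow it. Rejecting a request for capabilities the parent lacks is an assumption: the text only says children can be restricted further, not what happens on an attempt to widen. The function name is hypothetical.

```python
def child_caps(parent, requested=None):
    """Compute the capability set for a spawned child process."""
    parent = frozenset(parent)
    if requested is None:
        return parent                      # default: inherit everything
    requested = frozenset(requested)
    extra = requested - parent
    if extra:
        # Assumed behavior: a child can never gain what the parent lacks.
        raise PermissionError("cannot grant: " + ", ".join(sorted(extra)))
    return requested


parent = {"fs.read", "net.connect", "time", "rand"}
print(sorted(child_caps(parent)))
print(sorted(child_caps(parent, ["fs.read", "time"])))
```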

Auditing a program's capabilities

    $ ilusm --audit myprogram.ilu
    # Prints all capabilities required by the program and its imports:
    #   net.connect  (net.ilu line 3)
    #   net.dns      (net.ilu line 3)
    #   fs.read      (fs.ilu line 1)
    #   time         (tim.ilu line 1)
    # Recommended pledge: ["net.connect", "net.dns", "fs.read", "time"]

FFI & interop

The ffi module lets ilusm call functions in native shared libraries (.so, .dylib, .dll) and embed ilusm as a scripting engine inside C programs.

Capability required. ffi requires the ffi capability in your pledge. It is intentionally excluded from the default capability set because it bypasses all other safety guarantees.

Calling native libraries from ilusm

    use ffi

    # Load a shared library (libc.so.6 is the glibc name on Linux)
    libc = ffi.open("libc.so.6")

    # Declare a function signature
    # ffi.fn(lib, name, return_type, [arg_types])
    strlen = ffi.fn(libc, "strlen", "i64", ["ptr"])
    printf = ffi.fn(libc, "printf", "i32", ["ptr", "..."])

    # Call it
    n = strlen(ffi.str("hello"))
    prn("strlen: " + str(n))   # 5

    # Allocate memory
    buf = ffi.alloc(256)
    # Copy 12 bytes: 11 characters plus the NUL terminator that %s needs
    ffi.memcpy(buf, ffi.str("hello world"), 12)
    printf(ffi.str("from C: %s\n"), buf)
    ffi.free(buf)
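For comparison, here is the same sequence using Python's ctypes, which follows a similar open, declare, call pattern. This is an analogy rather than a description of how ffi is implemented, and it assumes a POSIX system where the C library can be located.

```python
import ctypes
import ctypes.util

# Locate and load the C library (on POSIX, CDLL(None) falls back to the running process).
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the signature: size_t strlen(const char *)
libc.strlen.restype = ctypes.c_size_t
libc.strlen.argtypes = [ctypes.c_char_p]

n = libc.strlen(b"hello")
print("strlen:", n)

# Allocate a 256-byte buffer and copy a NUL-terminated string into it.
buf = ctypes.create_string_buffer(256)
src = b"hello world\0"                 # the terminator is part of the copy
ctypes.memmove(buf, src, len(src))
print("from C:", buf.value.decode())
```

As with ffi, nothing checks that the declared signature matches the real function, which is why the capability is kept out of the default set.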

Embedding ilusm in C

    // C program embedding ilusm
    #include <stdio.h>
    #include "ilusm.h"

    int main(void) {
        ilu_vm_t *vm = ilu_vm_new();

        // Load and run a .ilbc file
        ilu_load_file(vm, "script.ilbc");
        ilu_run(vm);

        // Call an ilusm function from C
        ilu_val_t result = ilu_call(vm, "my_function",
                                    ilu_str(vm, "hello"),
                                    ilu_int(vm, 42),
                                    ILU_END);
        printf("result: %s\n", ilu_to_str(vm, result));

        ilu_vm_free(vm);
        return 0;
    }

ffi type mapping

ilusm type         C type         ffi type string
num (integer)      int64_t        "i64"
num (float)        double         "f64"
str                const char*    "ptr"
nil                void           "void"
ffi.alloc(n)       void*          "ptr"
ffi.struct({…})    struct         "struct"

Reflection & metaprogramming

The rfl module provides runtime introspection: inspect values, functions, and modules at runtime. Combined with ast and iluc, it enables full metaprogramming - programs that generate and evaluate other programs.

Value inspection

    use rfl

    x = {name: "alice", age: 30}

    # Type information
    prn(rfl.type(x))          # "obj"
    prn(rfl.keys(x))          # ["name", "age"]
    prn(rfl.has(x, "name"))   # tru

    # Function inspection
    add = \(a, b) a + b
    prn(rfl.arity(add))       # 2
    prn(rfl.is_fn(add))       # tru
    prn(rfl.upvals(add))      # [] (no captured variables)

    # Module inspection
    use txt
    prn(rfl.exports(txt))     # ["upr", "lwr", "spl", "has", ...]

Dynamic dispatch

    use rfl

    # Call a function by name (string)
    obj = {
        greet: \(name) "hello " + name,
        farewell: \(name) "goodbye " + name
    }
    method = "greet"
    result = rfl.call(obj[method], "alice")
    prn(result)   # "hello alice"

    # Apply a function to a list of arguments
    args = ["alice", "bob"]
    rfl.apply(\(a, b) prn(a + " and " + b), args)

Code generation with ast + iluc

    use ast
    use iluc
    use rfl

    # Build an AST node programmatically
    call_node = ast.call(
        ast.name("prn"),
        [ast.str("generated!")]
    )

    # Compile the AST to bytecode
    bytecode = iluc.compile_ast(call_node)

    # Execute the compiled bytecode
    iluc.exec(bytecode)   # prints: generated!

Macros via ast transformation

    use ast
    use iluc

    # Define a macro: unless(cond, body) → if(!cond) body
    unless_macro(node) =
        if node.type != "call" or node.fn.name != "unless":
            node   # not our macro - return unchanged
        | cond = node.args[0]
          body = node.args[1]
          # Transform to: if !cond: body
          ast.if(ast.not(cond), body, nil)

    # Register the macro transformer
    iluc.add_transform(unless_macro)

    # Now "unless" works as a keyword:
    # unless x > 10:
    #     prn("x is small")
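The same rewrite can be demonstrated with Python's own ast module, which makes the mechanics concrete: find a call to unless, replace it with an if-not node, then compile and run the transformed tree. This is an analogy in another language, not ilusm code; the UnlessMacro class and the expand_and_run helper are hypothetical.

```python
import ast


class UnlessMacro(ast.NodeTransformer):
    """Rewrite the statement `unless(cond, body)` into `if not cond: body`."""

    def visit_Expr(self, node):
        self.generic_visit(node)
        call = node.value
        is_unless = (
            isinstance(call, ast.Call)
            and isinstance(call.func, ast.Name)
            and call.func.id == "unless"
            and len(call.args) == 2
        )
        if not is_unless:
            return node  # not our macro - leave unchanged
        cond, body = call.args
        rewritten = ast.If(
            test=ast.UnaryOp(op=ast.Not(), operand=cond),
            body=[ast.Expr(value=body)],
            orelse=[],
        )
        return ast.copy_location(rewritten, node)


def expand_and_run(source, env):
    """Parse, apply the macro, compile, and execute in the given namespace."""
    tree = UnlessMacro().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    exec(compile(tree, "<macro>", "exec"), env)
    return env


SOURCE = "unless(x > 10, out.append('x is small'))"
print(expand_and_run(SOURCE, {"x": 3, "out": []})["out"])
print(expand_and_run(SOURCE, {"x": 20, "out": []})["out"])
```

No function named unless exists at run time. The body expression is only evaluated when the condition is false, which an ordinary function call could not guarantee.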

rfl API reference

Function               Description
rfl.type(v)            Return type string: "str", "num", "bl", "nil", "lst", "obj", "fn"
rfl.keys(obj)          List of keys in an object
rfl.has(obj, key)      True if object has the key
rfl.arity(fn)          Number of parameters a function takes
rfl.is_fn(v)           True if value is callable
rfl.upvals(fn)         List of captured variable names
rfl.exports(mod)       List of exported names from a module
rfl.call(fn, arg)      Call a function with a single argument
rfl.apply(fn, args)    Call a function with a list of arguments
rfl.src(fn)            Source location of a function (file, line)
rfl.eq(a, b)           Deep structural equality check
rfl.clone(v)           Deep clone of any value