Lesson 2

What Is a Compiler?

The 10,000-foot view of how source code becomes something a computer can run.

20 min read · Beginner friendly

The Simple Answer

A compiler is a program that translates code from one language to another.

That's it. Really.

When you write print("Hello") in Python, your computer doesn't understand that. It only understands machine code — ones and zeros. Something has to translate your human-readable code into machine-readable instructions.

Think of It Like Translation

You speak English. Your computer speaks Machine. A compiler is the translator in between. Unlike human translation, though, it is supposed to be exact: a correct compiler preserves the meaning of your program every single time.

The Compilation Pipeline

Most compilers follow roughly the same pipeline. The source code flows through a series of transformations, each one bringing it closer to machine code.

Source Code → Lexer → Parser → Type Checker → Codegen → Machine Code

Let's walk through each stage with a simple example. Say you write this Nova code:

fn add(a: Int, b: Int) -> Int {
    a + b
}

add(3, 5)

1. Lexer: Breaking Into Tokens

The lexer reads your source code character by character and groups them into tokens — the atomic units of your language.

// Your code becomes a stream of tokens:
[FN] [IDENT:"add"] [LPAREN] [IDENT:"a"] [COLON] [IDENT:"Int"]
[COMMA] [IDENT:"b"] [COLON] [IDENT:"Int"] [RPAREN] [ARROW]
[IDENT:"Int"] [LBRACE] [IDENT:"a"] [PLUS] [IDENT:"b"] [RBRACE]
[IDENT:"add"] [LPAREN] [NUMBER:3] [COMMA] [NUMBER:5] [RPAREN]

It doesn't understand what the code means yet. It just identifies the pieces.
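To make the character-by-character grouping concrete, here is a minimal Rust sketch of a lexer for just the tokens in the example above. The Token enum and the lex function are names assumed for this illustration, not Nova's actual implementation, and the sketch tracks no source positions.

```rust
// Hypothetical token type for this sketch; variants mirror the token stream above.
#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    Fn,
    Ident(String),
    Number(i64),
    LParen,
    RParen,
    LBrace,
    RBrace,
    Colon,
    Comma,
    Arrow,
    Plus,
}

/// Turn source text into tokens, or report the first character we cannot handle.
pub fn lex(src: &str) -> Result<Vec<Token>, String> {
    let chars: Vec<char> = src.chars().collect();
    let mut tokens = Vec::new();
    let mut i = 0;
    while i < chars.len() {
        let c = chars[i];
        if c.is_whitespace() {
            // Whitespace separates tokens but produces none.
            i += 1;
        } else if c.is_ascii_alphabetic() || c == '_' {
            // Keyword or identifier: consume letters, digits, and underscores.
            let start = i;
            while i < chars.len() && (chars[i].is_ascii_alphanumeric() || chars[i] == '_') {
                i += 1;
            }
            let word: String = chars[start..i].iter().collect();
            tokens.push(if word == "fn" { Token::Fn } else { Token::Ident(word) });
        } else if c.is_ascii_digit() {
            // Number: consume a run of digits.
            let start = i;
            while i < chars.len() && chars[i].is_ascii_digit() {
                i += 1;
            }
            let digits: String = chars[start..i].iter().collect();
            let value = digits.parse::<i64>().map_err(|e| e.to_string())?;
            tokens.push(Token::Number(value));
        } else if c == '-' && i + 1 < chars.len() && chars[i + 1] == '>' {
            // Two-character symbol "->".
            tokens.push(Token::Arrow);
            i += 2;
        } else {
            // Single-character symbols.
            let tok = match c {
                '(' => Token::LParen,
                ')' => Token::RParen,
                '{' => Token::LBrace,
                '}' => Token::RBrace,
                ':' => Token::Colon,
                ',' => Token::Comma,
                '+' => Token::Plus,
                other => return Err(format!("unexpected character {:?} at offset {}", other, i)),
            };
            tokens.push(tok);
            i += 1;
        }
    }
    Ok(tokens)
}
```

Calling lex("add(3, 5)") yields the same six tokens as the last line of the token stream above. Note that the lexer happily accepts nonsense like ") ) fn" because it only recognizes pieces; rejecting bad structure is the parser's job.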

2. Parser: Building a Tree

The parser takes the flat stream of tokens and builds a tree structure called an Abstract Syntax Tree (AST). This represents the structure of your code.

// Your code becomes a tree:
Program
├── FunctionDef
│   ├── name: "add"
│   ├── params: [(a, Int), (b, Int)]
│   ├── return_type: Int
│   └── body: BinaryExpr
│       ├── op: Add
│       ├── left: Var("a")
│       └── right: Var("b")
└── FunctionCall
    ├── name: "add"
    └── args: [Literal(3), Literal(5)]

Now we understand the grammar — what's a function, what's an expression, how they nest.
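As a rough illustration of how a flat token stream becomes a tree, the sketch below is a tiny recursive-descent parser in Rust that handles only expressions: literals, variables, +, and calls. The Token, Expr, and Parser types are assumptions made for this example, and function definitions are left out to keep it short.

```rust
// Hypothetical token and AST types for this sketch (expressions only).
#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    Ident(String),
    Number(i64),
    LParen,
    RParen,
    Comma,
    Plus,
}

#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Literal(i64),
    Var(String),
    Add(Box<Expr>, Box<Expr>),
    Call { name: String, args: Vec<Expr> },
}

pub struct Parser {
    tokens: Vec<Token>,
    pos: usize,
}

impl Parser {
    pub fn new(tokens: Vec<Token>) -> Self {
        Parser { tokens, pos: 0 }
    }

    /// Look at the current token without consuming it.
    fn peek(&self) -> Option<&Token> {
        self.tokens.get(self.pos)
    }

    /// Consume and return the current token, if any.
    fn advance(&mut self) -> Option<Token> {
        let tok = self.tokens.get(self.pos).cloned();
        if tok.is_some() {
            self.pos += 1;
        }
        tok
    }

    /// expr := primary ('+' primary)*
    pub fn parse_expr(&mut self) -> Result<Expr, String> {
        let mut left = self.parse_primary()?;
        while self.peek() == Some(&Token::Plus) {
            self.advance(); // consume '+'
            let right = self.parse_primary()?;
            left = Expr::Add(Box::new(left), Box::new(right));
        }
        Ok(left)
    }

    /// primary := NUMBER | IDENT | IDENT '(' (expr (',' expr)*)? ')'
    fn parse_primary(&mut self) -> Result<Expr, String> {
        match self.advance() {
            Some(Token::Number(n)) => Ok(Expr::Literal(n)),
            Some(Token::Ident(name)) => {
                if self.peek() != Some(&Token::LParen) {
                    return Ok(Expr::Var(name));
                }
                self.advance(); // consume '('
                let mut args = Vec::new();
                if self.peek() != Some(&Token::RParen) {
                    loop {
                        args.push(self.parse_expr()?);
                        if self.peek() == Some(&Token::Comma) {
                            self.advance();
                        } else {
                            break;
                        }
                    }
                }
                match self.advance() {
                    Some(Token::RParen) => Ok(Expr::Call { name, args }),
                    other => Err(format!("expected ')', found {:?}", other)),
                }
            }
            other => Err(format!("expected expression, found {:?}", other)),
        }
    }
}
```

Each grammar rule in the comments maps to one method, which is why this style is called recursive descent: parse_expr calls parse_primary, which calls parse_expr again for nested arguments. Feeding it the tokens for add(3, 5) produces a Call node with two Literal children, matching the FunctionCall branch of the tree above.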

3. Type Checker: Ensuring Correctness

The type checker walks the AST and verifies that everything makes sense. Does a + b work if both are Ints? Is add(3, 5) calling the function with the right types?

// Type checking passes:
✓ add: (Int, Int) -> Int
✓ a: Int, b: Int
✓ a + b: Int (Int + Int = Int)
✓ add(3, 5): Int (call matches signature)

If you tried add("hello", 5), the type checker would catch it before the code ever runs.
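The walk over the tree can be sketched as one recursive function that returns the type of each expression or an error message. This is a simplified Rust illustration with only Int and Str types; the Type, Expr, Signature, and Env names are assumptions for this sketch rather than Nova's real data structures.

```rust
use std::collections::HashMap;

// Hypothetical types for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub enum Type {
    Int,
    Str,
}

#[derive(Debug, Clone)]
pub enum Expr {
    IntLit(i64),
    StrLit(String),
    Var(String),
    Add(Box<Expr>, Box<Expr>),
    Call { name: String, args: Vec<Expr> },
}

/// A function signature: parameter types and a return type.
pub struct Signature {
    pub params: Vec<Type>,
    pub ret: Type,
}

/// Everything the checker knows about names in scope.
pub struct Env {
    pub vars: HashMap<String, Type>,
    pub funcs: HashMap<String, Signature>,
}

/// Compute the type of `expr`, or explain why it has none.
pub fn check(expr: &Expr, env: &Env) -> Result<Type, String> {
    match expr {
        Expr::IntLit(_) => Ok(Type::Int),
        Expr::StrLit(_) => Ok(Type::Str),
        Expr::Var(name) => env
            .vars
            .get(name)
            .cloned()
            .ok_or_else(|| format!("unknown variable `{}`", name)),
        Expr::Add(left, right) => {
            // Int + Int = Int; anything else is rejected in this sketch.
            let lt = check(left, env)?;
            let rt = check(right, env)?;
            if lt == Type::Int && rt == Type::Int {
                Ok(Type::Int)
            } else {
                Err(format!("cannot add {:?} and {:?}", lt, rt))
            }
        }
        Expr::Call { name, args } => {
            let sig = env
                .funcs
                .get(name)
                .ok_or_else(|| format!("unknown function `{}`", name))?;
            if args.len() != sig.params.len() {
                return Err(format!(
                    "`{}` expects {} arguments, got {}",
                    name,
                    sig.params.len(),
                    args.len()
                ));
            }
            // Every argument must match the declared parameter type.
            for (i, (arg, expected)) in args.iter().zip(&sig.params).enumerate() {
                let actual = check(arg, env)?;
                if &actual != expected {
                    return Err(format!(
                        "argument {} of `{}`: expected {:?}, found {:?}",
                        i + 1,
                        name,
                        expected,
                        actual
                    ));
                }
            }
            Ok(sig.ret.clone())
        }
    }
}
```

With add registered as (Int, Int) -> Int in the environment, checking add(3, 5) returns Int, while add("hello", 5) returns an error naming the first argument. No code has run at that point; the checker only inspects the tree.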

4. Code Generation: Emit Machine Code

Finally, the code generator walks the type-checked AST and emits actual machine code (or in Nova's case, WebAssembly).

// WebAssembly output:
(func $add (param $a i32) (param $b i32) (result i32)
    local.get $a
    local.get $b
    i32.add)

// Calling add(3, 5):
i32.const 3
i32.const 5
call $add

This is the output Nova hands to the WebAssembly runtime, which turns it into instructions your machine can execute. The computer finally understands!
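Because WebAssembly is a stack machine, emitting code for an expression tree amounts to a post-order walk: emit the operands first, then the operator. The Rust sketch below shows that idea by producing WebAssembly text for i32 expressions only. The Expr type and the emit_expr and emit_func helpers are assumptions made for this illustration.

```rust
// Hypothetical AST for this sketch; every value is treated as an i32.
#[derive(Debug, Clone)]
pub enum Expr {
    Literal(i32),
    Var(String),
    Add(Box<Expr>, Box<Expr>),
    Call { name: String, args: Vec<Expr> },
}

/// Append the instructions for `expr` to `out`, one instruction per entry.
pub fn emit_expr(expr: &Expr, out: &mut Vec<String>) {
    match expr {
        Expr::Literal(n) => out.push(format!("i32.const {}", n)),
        Expr::Var(name) => out.push(format!("local.get ${}", name)),
        Expr::Add(left, right) => {
            // Operands first, operator last: both values are on the stack
            // by the time i32.add runs.
            emit_expr(left, out);
            emit_expr(right, out);
            out.push("i32.add".to_string());
        }
        Expr::Call { name, args } => {
            // Arguments are pushed in order, then the call consumes them.
            for arg in args {
                emit_expr(arg, out);
            }
            out.push(format!("call ${}", name));
        }
    }
}

/// Wrap a body expression in a function definition with i32 parameters.
pub fn emit_func(name: &str, params: &[&str], body: &Expr) -> String {
    let mut header = format!("(func ${}", name);
    for p in params {
        header.push_str(&format!(" (param ${} i32)", p));
    }
    header.push_str(" (result i32)");

    let mut instrs = Vec::new();
    emit_expr(body, &mut instrs);
    let lines: Vec<String> = instrs.iter().map(|i| format!("    {}", i)).collect();
    format!("{}\n{})", header, lines.join("\n"))
}
```

Calling emit_func("add", &["a", "b"], &body) with body set to the a + b tree reproduces the $add function text shown above, and emit_expr on the add(3, 5) call produces the three instructions of the call sequence.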

Compiler vs. Interpreter

You might wonder: "What about Python? It doesn't compile to machine code."

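Good catch! There are actually two approaches. A compiler translates the whole program ahead of time, and the translated output is what runs. An interpreter reads the program and executes it directly, one step at a time, without producing a separate translated program first.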

The Reality Is Messier

Most modern languages use a hybrid approach. CPython, the standard Python implementation, compiles your code to bytecode and then interprets that bytecode. JavaScript engines such as V8 compile hot paths to machine code on the fly (JIT compilation). Nova compiles to WebAssembly, which is then compiled to native code by the browser or runtime.
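The "compile to bytecode, then interpret it" hybrid can be sketched in a few lines of Rust. This is a toy stack machine for arithmetic only, not how CPython or any real engine is implemented; the Expr, Op, compile, and run names are assumptions for this example.

```rust
// Hypothetical expression tree for this sketch.
#[derive(Debug, Clone)]
pub enum Expr {
    Literal(i64),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

// Hypothetical bytecode: a flat list of stack-machine instructions.
#[derive(Debug, Clone, PartialEq)]
pub enum Op {
    Push(i64),
    Add,
    Mul,
}

/// Compile step: flatten the tree into bytecode (operands before operator).
pub fn compile(expr: &Expr, code: &mut Vec<Op>) {
    match expr {
        Expr::Literal(n) => code.push(Op::Push(*n)),
        Expr::Add(left, right) => {
            compile(left, code);
            compile(right, code);
            code.push(Op::Add);
        }
        Expr::Mul(left, right) => {
            compile(left, code);
            compile(right, code);
            code.push(Op::Mul);
        }
    }
}

/// Interpret step: execute bytecode one instruction at a time.
/// Returns None if the bytecode is malformed (stack underflow or empty result).
pub fn run(code: &[Op]) -> Option<i64> {
    let mut stack: Vec<i64> = Vec::new();
    for op in code {
        match op {
            Op::Push(n) => stack.push(*n),
            Op::Add => {
                let b = stack.pop()?;
                let a = stack.pop()?;
                stack.push(a.wrapping_add(b));
            }
            Op::Mul => {
                let b = stack.pop()?;
                let a = stack.pop()?;
                stack.push(a.wrapping_mul(b));
            }
        }
    }
    stack.pop()
}
```

Compiling 3 + 5 gives the bytecode Push(3), Push(5), Add, and running that bytecode gives 8. The compile function is a small compiler and the run function is a small interpreter; a JIT would add a third step that turns frequently executed bytecode into machine code.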

Why This Matters for Nova

Nova adds an extra stage to the pipeline that most languages skip:

Source → Lexer → Parser → Types → Verifier → Codegen → WASM

That Verifier stage is what makes Nova special. It uses an SMT solver (a kind of mathematical proof engine) to verify that your code satisfies its contracts.

If you write:

fn divide(a: Int, b: Int) -> Int
where
    requires b != 0
{
    a / b
}

Nova's verifier will prove that every call to divide has a non-zero second argument. Not by testing — by mathematical proof.
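To show what "every call" means in practice, here is a deliberately tiny Rust stand-in for that check. It is not an SMT solver and not Nova's verifier: it only looks at one call's second argument and answers from literal values and a set of variables already known to be non-zero. The Arg, Verdict, and check_nonzero names are assumptions for this sketch.

```rust
use std::collections::HashSet;

// Hypothetical representation of the argument passed as `b` at one call site.
#[derive(Debug, Clone)]
pub enum Arg {
    Literal(i64),
    Var(String),
}

// Outcome of checking `requires b != 0` at one call site.
#[derive(Debug, PartialEq)]
pub enum Verdict {
    Proved,   // the argument can never be zero
    Violated, // the argument is definitely zero
    Unknown,  // this simple check cannot decide; a real solver would reason further
}

/// Check the contract for a single call. `known_nonzero` holds variables that
/// earlier facts (for example the caller's own `requires`) guarantee are non-zero.
pub fn check_nonzero(arg: &Arg, known_nonzero: &HashSet<String>) -> Verdict {
    match arg {
        Arg::Literal(0) => Verdict::Violated,
        Arg::Literal(_) => Verdict::Proved,
        Arg::Var(name) if known_nonzero.contains(name) => Verdict::Proved,
        Arg::Var(_) => Verdict::Unknown,
    }
}
```

A call like divide(10, 2) is proved and divide(10, 0) is a violation, both without running anything. The Unknown case is where a real SMT solver earns its keep: it reasons about conditions and arithmetic on variables instead of giving up.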

The Foundation: 5 Components

Every compiler needs at least these 5 fundamental components:

  1. Span — Where in the source code? (line, column, file)
  2. Token — What did we see? (keyword, number, identifier)
  3. Source — What is the source code? (file contents)
  4. Error — What went wrong? (diagnostics, messages)
  5. Lexer — How do we tokenize? (the first transformer)

Production compilers (Rust, Go, Swift, TypeScript) all have some form of these 5 components, alongside many others. We'll build each one in the coming lessons.
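As a preview, the first four components might look roughly like the Rust types below. This is only a sketch under assumed names and fields (Diagnostic stands in for the Error component, and the file name in the usage note is made up); the coming lessons may define them differently. The Lexer itself is left for the next lesson.

```rust
/// Span: a half-open byte range `start..end` into the source text.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Span {
    pub start: usize,
    pub end: usize,
}

/// Token: what we saw, and where we saw it.
#[derive(Debug, Clone, PartialEq)]
pub enum TokenKind {
    Keyword,
    Number,
    Ident,
    Symbol,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Token {
    pub kind: TokenKind,
    pub span: Span,
}

/// Source: the file name plus its full contents.
pub struct Source {
    pub name: String,
    pub text: String,
}

impl Source {
    /// 1-based (line, column) of a byte offset; columns count characters.
    pub fn line_col(&self, offset: usize) -> (usize, usize) {
        let mut line = 1;
        let mut col = 1;
        for (i, ch) in self.text.char_indices() {
            if i >= offset {
                break;
            }
            if ch == '\n' {
                line += 1;
                col = 1;
            } else {
                col += 1;
            }
        }
        (line, col)
    }

    /// The text a span covers (the span must lie on character boundaries).
    pub fn slice(&self, span: Span) -> &str {
        &self.text[span.start..span.end]
    }
}

/// Error: a message tied to a location, so it can be shown to the user.
pub struct Diagnostic {
    pub message: String,
    pub span: Span,
}

impl Diagnostic {
    /// Format as `file:line:column: error: message`.
    pub fn render(&self, source: &Source) -> String {
        let (line, col) = source.line_col(self.span.start);
        format!("{}:{}:{}: error: {}", source.name, line, col, self.message)
    }
}
```

The pieces depend on each other in a simple chain: a Token carries a Span, a Span only has meaning relative to a Source, and a Diagnostic uses both to point at the exact spot that went wrong. For a hypothetical file main.nova containing a stray $ at line 2, column 5, render would produce "main.nova:2:5: error: unexpected character".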

Key Takeaway

A compiler is just a series of transformations: text → tokens → tree → type-checked tree → machine code. Each stage gets us closer to something the computer understands, while catching more errors along the way.

What's Next?

Now you understand the big picture. In the next lesson, we'll zoom into the first stage: Lexers and Tokens. You'll see actual Rust code that tokenizes source text, and understand how the lexer identifies keywords, numbers, and symbols.