Architecture

Parser Architecture

How Nova transforms tokens into an Abstract Syntax Tree, and the libraries we evaluated.

Library Comparison

We evaluated five main approaches for parsing in Rust:

Library	Type	Error Recovery	Learning Curve	Best For
Chumsky	Combinator	Excellent	Medium	Languages with good errors
LALRPOP	Generator (LR)	Basic	Medium	Traditional grammars
Pest	Generator (PEG)	Good	Easy	Quick prototyping
Winnow	Combinator	Manual	Medium	Performance-critical
Hand-rolled (Pratt)	Manual	Full control	Higher	Maximum flexibility

Option 1: Chumsky (Recommended for Most)

Chumsky is a parser combinator library designed for ergonomics and excellent error recovery. It produces beautiful error messages out of the box.

Pros

Best-in-class error recovery, declarative API, strong typing, active development

Cons

Compile times can be slow for complex grammars, learning curve for combinators

Example: Nova Expression Parser with Chumsky

use chumsky::prelude::*;

#[derive(Debug, Clone)]
pub enum Expr {
    Int(i64),
    Ident(String),
    Binary {
        op: BinOp,
        lhs: Box<Expr>,
        rhs: Box<Expr>,
    },
    Call {
        name: String,
        args: Vec<Expr>,
    },
    If {
        cond: Box<Expr>,
        then: Box<Expr>,
        else_: Option<Box<Expr>>,
    },
}

#[derive(Debug, Clone)]
pub enum BinOp { Add, Sub, Mul, Div, Eq, Ne, Lt, Gt }

fn parser<'a>() -> impl Parser<'a, &'a str, Expr, extra::Err<Rich<'a, char>>> {
    let ident = text::ident()
        .map(|s: &str| s.to_string());

    let int = text::int(10)
        .map(|s: &str| Expr::Int(s.parse().unwrap()));

    // Recursive expression parser
    recursive(|expr| {
        let atom = choice((
            int,
            ident.clone().map(Expr::Ident),
            expr.clone()
                .delimited_by(just('('), just(')'))
                .padded(),
        ));

        // Function calls
        let call = ident.clone()
            .then(
                expr.clone()
                    .separated_by(just(',').padded())
                    .collect()
                    .delimited_by(just('('), just(')'))
            )
            .map(|(name, args)| Expr::Call { name, args });

        let primary = choice((call, atom));

        // Binary operators with precedence
        let op = |c| just(c).padded();

        let product = primary.clone()
            .foldl(
                choice((
                    op('*').to(BinOp::Mul),
                    op('/').to(BinOp::Div),
                ))
                .then(primary)
                .repeated(),
                |lhs, (op, rhs)| Expr::Binary {
                    op,
                    lhs: Box::new(lhs),
                    rhs: Box::new(rhs),
                },
            );

        let sum = product.clone()
            .foldl(
                choice((
                    op('+').to(BinOp::Add),
                    op('-').to(BinOp::Sub),
                ))
                .then(product)
                .repeated(),
                |lhs, (op, rhs)| Expr::Binary {
                    op,
                    lhs: Box::new(lhs),
                    rhs: Box::new(rhs),
                },
            );

        sum
    })
}

Cargo.toml

[dependencies]
chumsky = "1.0.0-alpha.8"

Option 2: LALRPOP

LALRPOP is a parser generator that takes a grammar file and produces Rust code. It uses LR(1) parsing.

Pros

Generates typed AST directly, powerful for context-free grammars, macro-like syntax

Cons

Separate grammar file, built-in lexer is limited, requires build.rs setup

Example: Nova Grammar with LALRPOP

// nova.lalrpop
use crate::ast::{Expr, BinOp};

grammar;

pub Expr: Box<Expr> = {
    Expr ExprOp Factor => Box::new(Expr::Binary {
        lhs: <>,
        op: <>,
        rhs: <>,
    }),
    Factor,
};

ExprOp: BinOp = {
    "+" => BinOp::Add,
    "-" => BinOp::Sub,
};

Factor: Box<Expr> = {
    Factor FactorOp Term => Box::new(Expr::Binary {
        lhs: <>,
        op: <>,
        rhs: <>,
    }),
    Term,
};

FactorOp: BinOp = {
    "*" => BinOp::Mul,
    "/" => BinOp::Div,
};

Term: Box<Expr> = {
    Num => Box::new(Expr::Int(<>)),
    Ident => Box::new(Expr::Ident(<>.to_string())),
    "(" <Expr> ")",
};

Num: i64 = r"[0-9]+" => <>.parse().unwrap();
Ident: &'input str = r"[a-zA-Z_][a-zA-Z0-9_]*";

Option 3: Pratt Parsing (Hand-Rolled)

Pratt parsing (operator-precedence parsing) is a simple, elegant algorithm for parsing expressions with operators at different precedence levels. This is what Nova uses.

Pros

Simple to understand, easy to extend, full control over error messages, no dependencies

Cons

More code to write, must implement error recovery manually

Example: Pratt Parser Core

impl Parser<'_> {
    /// Parse expression with precedence
    fn parse_expr(&mut self, min_prec: u8) -> Result<Expr, ParseError> {
        // Parse the left-hand side (prefix expression)
        let mut lhs = self.parse_prefix()?;

        // Parse binary operators at or above min_prec
        while let Some((op, prec, assoc)) = self.peek_binop() {
            if prec < min_prec {
                break;
            }

            self.advance();  // Consume operator

            // Right associative: same precedence
            // Left associative: higher precedence
            let next_prec = if assoc == Assoc::Right { prec } else { prec + 1 };
            let rhs = self.parse_expr(next_prec)?;

            lhs = Expr::Binary {
                op,
                lhs: Box::new(lhs),
                rhs: Box::new(rhs),
                span: self.span_from(start),
            };
        }

        Ok(lhs)
    }

    /// Get binary operator info: (operator, precedence, associativity)
    fn peek_binop(&self) -> Option<(BinOp, u8, Assoc)> {
        let kind = self.current()?.kind();
        match kind {
            // Assignment (lowest, right assoc)
            TokenKind::Eq => Some((BinOp::Assign, 1, Assoc::Right)),

            // Logical OR
            TokenKind::PipePipe => Some((BinOp::Or, 2, Assoc::Left)),

            // Logical AND
            TokenKind::AmpAmp => Some((BinOp::And, 3, Assoc::Left)),

            // Equality
            TokenKind::EqEq => Some((BinOp::Eq, 4, Assoc::Left)),
            TokenKind::NotEq => Some((BinOp::Ne, 4, Assoc::Left)),

            // Comparison
            TokenKind::Lt => Some((BinOp::Lt, 5, Assoc::Left)),
            TokenKind::Gt => Some((BinOp::Gt, 5, Assoc::Left)),
            TokenKind::LtEq => Some((BinOp::Le, 5, Assoc::Left)),
            TokenKind::GtEq => Some((BinOp::Ge, 5, Assoc::Left)),

            // Additive
            TokenKind::Plus => Some((BinOp::Add, 6, Assoc::Left)),
            TokenKind::Minus => Some((BinOp::Sub, 6, Assoc::Left)),

            // Multiplicative (highest)
            TokenKind::Star => Some((BinOp::Mul, 7, Assoc::Left)),
            TokenKind::Slash => Some((BinOp::Div, 7, Assoc::Left)),
            TokenKind::Percent => Some((BinOp::Rem, 7, Assoc::Left)),

            _ => None,
        }
    }
}

Nova's Decision: Pratt Parsing

Nova uses hand-rolled Pratt parsing for:

Minimal dependencies: Bootstrap compiler is self-contained
Full control: Custom error recovery and reporting
Performance: No abstraction overhead
Security: Depth limits prevent stack overflow attacks

Security: Nesting Limits

const MAX_EXPR_DEPTH: usize = 64;
const MAX_BLOCK_DEPTH: usize = 64;

fn parse_expr_with_depth(&mut self, depth: usize) -> Result<Expr, ParseError> {
    if depth > MAX_EXPR_DEPTH {
        return Err(ParseError::NestingTooDeep {
            depth,
            max: MAX_EXPR_DEPTH,
            span: self.current_span(),
        });
    }
    // ... parse expression
}

Comparison: Error Messages

Chumsky produces the best error messages out of the box:

// Chumsky error output
Error: unexpected token '+', expected identifier
  --> src/main.nova:3:5
   |
 3 |     + invalid
   |     ^

Error: unclosed delimiter
  --> src/main.nova:1:10
   |
 1 | fn foo() {
   |          ^ this opening brace...
...
 5 |
   | ^ ...was never closed

Parser Architecture

Library Comparison

Option 1: Chumsky (Recommended for Most)

Pros

Cons

Example: Nova Expression Parser with Chumsky

Cargo.toml

Option 2: LALRPOP

Pros

Cons

Example: Nova Grammar with LALRPOP

Option 3: Pratt Parsing (Hand-Rolled)

Pros

Cons

Example: Pratt Parser Core

Nova's Decision: Pratt Parsing

Security: Nesting Limits

Comparison: Error Messages

References