Architecture

Parser Architecture

How Nova transforms tokens into an Abstract Syntax Tree, and the libraries we evaluated.

Library Comparison

We evaluated five main approaches for parsing in Rust:

Library Type Error Recovery Learning Curve Best For
Chumsky Combinator Excellent Medium Languages with good errors
LALRPOP Generator (LR) Basic Medium Traditional grammars
Pest Generator (PEG) Good Easy Quick prototyping
Winnow Combinator Manual Medium Performance-critical
Hand-rolled (Pratt) Manual Full control Higher Maximum flexibility

Option 1: Chumsky (Recommended for Most)

Chumsky is a parser combinator library designed for ergonomics and excellent error recovery. It produces beautiful error messages out of the box.

Pros

Best-in-class error recovery, declarative API, strong typing, active development

Cons

Compile times can be slow for complex grammars, learning curve for combinators

Example: Nova Expression Parser with Chumsky

use chumsky::prelude::*;

#[derive(Debug, Clone)]
pub enum Expr {
    Int(i64),
    Ident(String),
    Binary {
        op: BinOp,
        lhs: Box<Expr>,
        rhs: Box<Expr>,
    },
    Call {
        name: String,
        args: Vec<Expr>,
    },
    If {
        cond: Box<Expr>,
        then: Box<Expr>,
        else_: Option<Box<Expr>>,
    },
}

#[derive(Debug, Clone)]
pub enum BinOp { Add, Sub, Mul, Div, Eq, Ne, Lt, Gt }

fn parser<'a>() -> impl Parser<'a, &'a str, Expr, extra::Err<Rich<'a, char>>> {
    let ident = text::ident()
        .map(|s: &str| s.to_string());

    let int = text::int(10)
        .map(|s: &str| Expr::Int(s.parse().unwrap()));

    // Recursive expression parser
    recursive(|expr| {
        let atom = choice((
            int,
            ident.clone().map(Expr::Ident),
            expr.clone()
                .delimited_by(just('('), just(')'))
                .padded(),
        ));

        // Function calls
        let call = ident.clone()
            .then(
                expr.clone()
                    .separated_by(just(',').padded())
                    .collect()
                    .delimited_by(just('('), just(')'))
            )
            .map(|(name, args)| Expr::Call { name, args });

        let primary = choice((call, atom));

        // Binary operators with precedence
        let op = |c| just(c).padded();

        let product = primary.clone()
            .foldl(
                choice((
                    op('*').to(BinOp::Mul),
                    op('/').to(BinOp::Div),
                ))
                .then(primary)
                .repeated(),
                |lhs, (op, rhs)| Expr::Binary {
                    op,
                    lhs: Box::new(lhs),
                    rhs: Box::new(rhs),
                },
            );

        let sum = product.clone()
            .foldl(
                choice((
                    op('+').to(BinOp::Add),
                    op('-').to(BinOp::Sub),
                ))
                .then(product)
                .repeated(),
                |lhs, (op, rhs)| Expr::Binary {
                    op,
                    lhs: Box::new(lhs),
                    rhs: Box::new(rhs),
                },
            );

        sum
    })
}

Cargo.toml

[dependencies]
chumsky = "1.0.0-alpha.8"

Option 2: LALRPOP

LALRPOP is a parser generator that takes a grammar file and produces Rust code. It uses LR(1) parsing.

Pros

Generates typed AST directly, powerful for context-free grammars, macro-like syntax

Cons

Separate grammar file, built-in lexer is limited, requires build.rs setup

Example: Nova Grammar with LALRPOP

// nova.lalrpop
use crate::ast::{Expr, BinOp};

grammar;

pub Expr: Box<Expr> = {
    Expr ExprOp Factor => Box::new(Expr::Binary {
        lhs: <>,
        op: <>,
        rhs: <>,
    }),
    Factor,
};

ExprOp: BinOp = {
    "+" => BinOp::Add,
    "-" => BinOp::Sub,
};

Factor: Box<Expr> = {
    Factor FactorOp Term => Box::new(Expr::Binary {
        lhs: <>,
        op: <>,
        rhs: <>,
    }),
    Term,
};

FactorOp: BinOp = {
    "*" => BinOp::Mul,
    "/" => BinOp::Div,
};

Term: Box<Expr> = {
    Num => Box::new(Expr::Int(<>)),
    Ident => Box::new(Expr::Ident(<>.to_string())),
    "(" <Expr> ")",
};

Num: i64 = r"[0-9]+" => <>.parse().unwrap();
Ident: &'input str = r"[a-zA-Z_][a-zA-Z0-9_]*";

Option 3: Pratt Parsing (Hand-Rolled)

Pratt parsing (operator-precedence parsing) is a simple, elegant algorithm for parsing expressions with operators at different precedence levels. This is what Nova uses.

Pros

Simple to understand, easy to extend, full control over error messages, no dependencies

Cons

More code to write, must implement error recovery manually

Example: Pratt Parser Core

impl Parser<'_> {
    /// Parse expression with precedence
    fn parse_expr(&mut self, min_prec: u8) -> Result<Expr, ParseError> {
        // Parse the left-hand side (prefix expression)
        let mut lhs = self.parse_prefix()?;

        // Parse binary operators at or above min_prec
        while let Some((op, prec, assoc)) = self.peek_binop() {
            if prec < min_prec {
                break;
            }

            self.advance();  // Consume operator

            // Right associative: same precedence
            // Left associative: higher precedence
            let next_prec = if assoc == Assoc::Right { prec } else { prec + 1 };
            let rhs = self.parse_expr(next_prec)?;

            lhs = Expr::Binary {
                op,
                lhs: Box::new(lhs),
                rhs: Box::new(rhs),
                span: self.span_from(start),
            };
        }

        Ok(lhs)
    }

    /// Get binary operator info: (operator, precedence, associativity)
    fn peek_binop(&self) -> Option<(BinOp, u8, Assoc)> {
        let kind = self.current()?.kind();
        match kind {
            // Assignment (lowest, right assoc)
            TokenKind::Eq => Some((BinOp::Assign, 1, Assoc::Right)),

            // Logical OR
            TokenKind::PipePipe => Some((BinOp::Or, 2, Assoc::Left)),

            // Logical AND
            TokenKind::AmpAmp => Some((BinOp::And, 3, Assoc::Left)),

            // Equality
            TokenKind::EqEq => Some((BinOp::Eq, 4, Assoc::Left)),
            TokenKind::NotEq => Some((BinOp::Ne, 4, Assoc::Left)),

            // Comparison
            TokenKind::Lt => Some((BinOp::Lt, 5, Assoc::Left)),
            TokenKind::Gt => Some((BinOp::Gt, 5, Assoc::Left)),
            TokenKind::LtEq => Some((BinOp::Le, 5, Assoc::Left)),
            TokenKind::GtEq => Some((BinOp::Ge, 5, Assoc::Left)),

            // Additive
            TokenKind::Plus => Some((BinOp::Add, 6, Assoc::Left)),
            TokenKind::Minus => Some((BinOp::Sub, 6, Assoc::Left)),

            // Multiplicative (highest)
            TokenKind::Star => Some((BinOp::Mul, 7, Assoc::Left)),
            TokenKind::Slash => Some((BinOp::Div, 7, Assoc::Left)),
            TokenKind::Percent => Some((BinOp::Rem, 7, Assoc::Left)),

            _ => None,
        }
    }
}

Nova's Decision: Pratt Parsing

Nova uses hand-rolled Pratt parsing for:

Security: Nesting Limits

const MAX_EXPR_DEPTH: usize = 64;
const MAX_BLOCK_DEPTH: usize = 64;

fn parse_expr_with_depth(&mut self, depth: usize) -> Result<Expr, ParseError> {
    if depth > MAX_EXPR_DEPTH {
        return Err(ParseError::NestingTooDeep {
            depth,
            max: MAX_EXPR_DEPTH,
            span: self.current_span(),
        });
    }
    // ... parse expression
}

Comparison: Error Messages

Chumsky produces the best error messages out of the box:

// Chumsky error output
Error: unexpected token '+', expected identifier
  --> src/main.nova:3:5
   |
 3 |     + invalid
   |     ^

Error: unclosed delimiter
  --> src/main.nova:1:10
   |
 1 | fn foo() {
   |          ^ this opening brace...
...
 5 |
   | ^ ...was never closed

References