How Nova transforms tokens into an Abstract Syntax Tree, and the libraries we evaluated.
We evaluated five main approaches for parsing in Rust:
| Library | Type | Error Recovery | Learning Curve | Best For |
|---|---|---|---|---|
| Chumsky | Combinator | Excellent | Medium | Languages with good errors |
| LALRPOP | Generator (LR) | Basic | Medium | Traditional grammars |
| Pest | Generator (PEG) | Good | Easy | Quick prototyping |
| Winnow | Combinator | Manual | Medium | Performance-critical |
| Hand-rolled (Pratt) | Manual | Full control | Higher | Maximum flexibility |
Chumsky is a parser combinator library designed for ergonomics and excellent error recovery. It produces beautiful error messages out of the box.
Best-in-class error recovery, declarative API, strong typing, active development
Compile times can be slow for complex grammars, learning curve for combinators
use chumsky::prelude::*;
#[derive(Debug, Clone)]
pub enum Expr {
Int(i64),
Ident(String),
Binary {
op: BinOp,
lhs: Box<Expr>,
rhs: Box<Expr>,
},
Call {
name: String,
args: Vec<Expr>,
},
If {
cond: Box<Expr>,
then: Box<Expr>,
else_: Option<Box<Expr>>,
},
}
#[derive(Debug, Clone)]
pub enum BinOp { Add, Sub, Mul, Div, Eq, Ne, Lt, Gt }
fn parser<'a>() -> impl Parser<'a, &'a str, Expr, extra::Err<Rich<'a, char>>> {
let ident = text::ident()
.map(|s: &str| s.to_string());
let int = text::int(10)
.map(|s: &str| Expr::Int(s.parse().unwrap()));
// Recursive expression parser
recursive(|expr| {
let atom = choice((
int,
ident.clone().map(Expr::Ident),
expr.clone()
.delimited_by(just('('), just(')'))
.padded(),
));
// Function calls
let call = ident.clone()
.then(
expr.clone()
.separated_by(just(',').padded())
.collect()
.delimited_by(just('('), just(')'))
)
.map(|(name, args)| Expr::Call { name, args });
let primary = choice((call, atom));
// Binary operators with precedence
let op = |c| just(c).padded();
let product = primary.clone()
.foldl(
choice((
op('*').to(BinOp::Mul),
op('/').to(BinOp::Div),
))
.then(primary)
.repeated(),
|lhs, (op, rhs)| Expr::Binary {
op,
lhs: Box::new(lhs),
rhs: Box::new(rhs),
},
);
let sum = product.clone()
.foldl(
choice((
op('+').to(BinOp::Add),
op('-').to(BinOp::Sub),
))
.then(product)
.repeated(),
|lhs, (op, rhs)| Expr::Binary {
op,
lhs: Box::new(lhs),
rhs: Box::new(rhs),
},
);
sum
})
}
[dependencies]
chumsky = "1.0.0-alpha.8"
LALRPOP is a parser generator that takes a grammar file and produces Rust code. It uses LR(1) parsing.
Generates typed AST directly, powerful for context-free grammars, macro-like syntax
Separate grammar file, built-in lexer is limited, requires build.rs setup
// nova.lalrpop
use crate::ast::{Expr, BinOp};
grammar;
pub Expr: Box<Expr> = {
Expr ExprOp Factor => Box::new(Expr::Binary {
lhs: <>,
op: <>,
rhs: <>,
}),
Factor,
};
ExprOp: BinOp = {
"+" => BinOp::Add,
"-" => BinOp::Sub,
};
Factor: Box<Expr> = {
Factor FactorOp Term => Box::new(Expr::Binary {
lhs: <>,
op: <>,
rhs: <>,
}),
Term,
};
FactorOp: BinOp = {
"*" => BinOp::Mul,
"/" => BinOp::Div,
};
Term: Box<Expr> = {
Num => Box::new(Expr::Int(<>)),
Ident => Box::new(Expr::Ident(<>.to_string())),
"(" <Expr> ")",
};
Num: i64 = r"[0-9]+" => <>.parse().unwrap();
Ident: &'input str = r"[a-zA-Z_][a-zA-Z0-9_]*";
Pratt parsing (operator-precedence parsing) is a simple, elegant algorithm for parsing expressions with operators at different precedence levels. This is what Nova uses.
Simple to understand, easy to extend, full control over error messages, no dependencies
More code to write, must implement error recovery manually
impl Parser<'_> {
/// Parse expression with precedence
fn parse_expr(&mut self, min_prec: u8) -> Result<Expr, ParseError> {
// Parse the left-hand side (prefix expression)
let mut lhs = self.parse_prefix()?;
// Parse binary operators at or above min_prec
while let Some((op, prec, assoc)) = self.peek_binop() {
if prec < min_prec {
break;
}
self.advance(); // Consume operator
// Right associative: same precedence
// Left associative: higher precedence
let next_prec = if assoc == Assoc::Right { prec } else { prec + 1 };
let rhs = self.parse_expr(next_prec)?;
lhs = Expr::Binary {
op,
lhs: Box::new(lhs),
rhs: Box::new(rhs),
span: self.span_from(start),
};
}
Ok(lhs)
}
/// Get binary operator info: (operator, precedence, associativity)
fn peek_binop(&self) -> Option<(BinOp, u8, Assoc)> {
let kind = self.current()?.kind();
match kind {
// Assignment (lowest, right assoc)
TokenKind::Eq => Some((BinOp::Assign, 1, Assoc::Right)),
// Logical OR
TokenKind::PipePipe => Some((BinOp::Or, 2, Assoc::Left)),
// Logical AND
TokenKind::AmpAmp => Some((BinOp::And, 3, Assoc::Left)),
// Equality
TokenKind::EqEq => Some((BinOp::Eq, 4, Assoc::Left)),
TokenKind::NotEq => Some((BinOp::Ne, 4, Assoc::Left)),
// Comparison
TokenKind::Lt => Some((BinOp::Lt, 5, Assoc::Left)),
TokenKind::Gt => Some((BinOp::Gt, 5, Assoc::Left)),
TokenKind::LtEq => Some((BinOp::Le, 5, Assoc::Left)),
TokenKind::GtEq => Some((BinOp::Ge, 5, Assoc::Left)),
// Additive
TokenKind::Plus => Some((BinOp::Add, 6, Assoc::Left)),
TokenKind::Minus => Some((BinOp::Sub, 6, Assoc::Left)),
// Multiplicative (highest)
TokenKind::Star => Some((BinOp::Mul, 7, Assoc::Left)),
TokenKind::Slash => Some((BinOp::Div, 7, Assoc::Left)),
TokenKind::Percent => Some((BinOp::Rem, 7, Assoc::Left)),
_ => None,
}
}
}
Nova uses hand-rolled Pratt parsing for:
const MAX_EXPR_DEPTH: usize = 64;
const MAX_BLOCK_DEPTH: usize = 64;
fn parse_expr_with_depth(&mut self, depth: usize) -> Result<Expr, ParseError> {
if depth > MAX_EXPR_DEPTH {
return Err(ParseError::NestingTooDeep {
depth,
max: MAX_EXPR_DEPTH,
span: self.current_span(),
});
}
// ... parse expression
}
Chumsky produces the best error messages out of the box:
// Chumsky error output
Error: unexpected token '+', expected identifier
--> src/main.nova:3:5
|
3 | + invalid
| ^
Error: unclosed delimiter
--> src/main.nova:1:10
|
1 | fn foo() {
| ^ this opening brace...
...
5 |
| ^ ...was never closed