Here’s a collection of resources for learning to create programming languages. I’m using Rust and some C so that’s the focus of the resources listed at the end. First, I’ll talk about Crafting Interpreters, a book that’s applicable no matter what you plan to use to build your language.
Crafting Interpreters: a fun, practical introduction to writing interpreters.
The book is divided into two parts: a simple “tree-walk” interpreter that also introduces “Lox”, the language being interpreted, and then the same language again with a bytecode compiler and VM. The two parts take different approaches to parsing. Part One builds the simple interpreter in Java and Part Two uses C. Even if you don’t plan to use those languages yourself, it’s very much worth a read. In fact, following along and implementing the projects in a different language may be an even better learning experience.
If you have taken compiler courses you’ll already know a lot of the material, especially from part one, but it’s worth reading anyway – there are lots of notes on design decisions.
Implementing the first half in Rust made me appreciate how easy Java makes some tasks. Certain parts were a lot more difficult than they were in the Java version, since Java has a garbage collector and lets you cast to “Object”.
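To make that difference concrete, here is a minimal sketch of how a Rust port might represent runtime values. It is an illustration only, not code from the book: the `Value` enum and the `add` helper are hypothetical names. Where Java can hold anything in an `Object` and downcast later, Rust has to name every case up front and match on it.

```rust
/// A dynamically typed, Lox-style runtime value (hypothetical sketch).
/// Every kind of value the interpreter can hold is an explicit variant.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Nil,
    Bool(bool),
    Number(f64),
    Str(String),
}

impl Value {
    /// Lox truthiness: `nil` and `false` are falsey, everything else is truthy.
    pub fn is_truthy(&self) -> bool {
        !matches!(self, Value::Nil | Value::Bool(false))
    }
}

/// The `+` operator: adds two numbers or concatenates two strings.
/// Instead of an unchecked cast, the type check is a pattern match
/// and the failure case is an ordinary `Err`.
pub fn add(left: &Value, right: &Value) -> Result<Value, String> {
    match (left, right) {
        (Value::Number(a), Value::Number(b)) => Ok(Value::Number(a + b)),
        (Value::Str(a), Value::Str(b)) => Ok(Value::Str(format!("{a}{b}"))),
        _ => Err("operands must be two numbers or two strings".to_string()),
    }
}
```

The enum handles simple values well. The harder part in Rust tends to be shared, mutable, possibly cyclic data such as environments and closures, which Java's garbage collector handles for free.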
Part two uses C to implement a much faster version of the interpreter. It’s not just the use of C that makes the interpreter fast, but the approach of a bytecode compiler and interpreter plus some clever optimizations. I was familiar with a number of the techniques used, but not with the design of a bytecode VM. The parsing approach – Pratt parsing – was also something I’d never done.
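For readers who have not seen the two ideas together, here is a heavily simplified sketch in Rust: a Pratt-style expression parser that emits bytecode directly (no AST) and a stack machine that runs it. It is not the book's design. The book drives parsing from a table of rules per token type, while this sketch uses a single binding-power function, and every name here (`Op`, `Compiler`, `compile`, `run`) is hypothetical.

```rust
/// Instructions for a tiny stack machine.
#[derive(Debug, Clone, PartialEq)]
pub enum Op { Constant(f64), Add, Subtract, Multiply, Divide, Negate }

/// Binding power of an infix operator; higher binds tighter.
fn infix_power(op: char) -> Option<u8> {
    match op {
        '+' | '-' => Some(1),
        '*' | '/' => Some(2),
        _ => None,
    }
}

/// Single-pass compiler: parses and emits bytecode without building an AST.
struct Compiler<'a> { src: &'a [u8], pos: usize, code: Vec<Op> }

impl<'a> Compiler<'a> {
    /// Skip spaces and return the next character without consuming it.
    fn peek(&mut self) -> Option<char> {
        while self.src.get(self.pos) == Some(&b' ') { self.pos += 1; }
        self.src.get(self.pos).map(|&b| b as char)
    }

    /// Parse an expression whose infix operators bind at least `min_power`.
    fn expression(&mut self, min_power: u8) -> Result<(), String> {
        // Prefix position: a number, unary minus, or a parenthesized group.
        match self.peek() {
            Some(c) if c.is_ascii_digit() => {
                let start = self.pos;
                while self.src.get(self.pos).map_or(false, |b| b.is_ascii_digit() || *b == b'.') {
                    self.pos += 1;
                }
                let text = String::from_utf8_lossy(&self.src[start..self.pos]);
                let n = text.parse::<f64>().map_err(|e| e.to_string())?;
                self.code.push(Op::Constant(n));
            }
            Some('-') => {
                self.pos += 1;
                self.expression(3)?; // unary minus binds tighter than any infix
                self.code.push(Op::Negate);
            }
            Some('(') => {
                self.pos += 1;
                self.expression(1)?;
                if self.peek() != Some(')') { return Err("expected ')'".to_string()); }
                self.pos += 1;
            }
            other => return Err(format!("unexpected {other:?}")),
        }
        // Infix position: keep consuming operators that bind tightly enough.
        while let Some(op) = self.peek() {
            let power = match infix_power(op) {
                Some(p) if p >= min_power => p,
                _ => break,
            };
            self.pos += 1;
            self.expression(power + 1)?; // +1 makes the operator left-associative
            self.code.push(match op {
                '+' => Op::Add,
                '-' => Op::Subtract,
                '*' => Op::Multiply,
                _ => Op::Divide,
            });
        }
        Ok(())
    }
}

/// Compile an arithmetic expression to bytecode, rejecting trailing input.
pub fn compile(src: &str) -> Result<Vec<Op>, String> {
    let mut c = Compiler { src: src.as_bytes(), pos: 0, code: Vec::new() };
    c.expression(1)?;
    match c.peek() {
        None => Ok(c.code),
        Some(ch) => Err(format!("unexpected '{ch}'")),
    }
}

/// Execute bytecode on a value stack; `None` means the stack underflowed.
pub fn run(code: &[Op]) -> Option<f64> {
    let mut stack: Vec<f64> = Vec::new();
    for op in code {
        match op {
            Op::Constant(n) => stack.push(*n),
            Op::Negate => { let a = stack.pop()?; stack.push(-a); }
            _ => {
                let (b, a) = (stack.pop()?, stack.pop()?);
                stack.push(match op {
                    Op::Add => a + b,
                    Op::Subtract => a - b,
                    Op::Multiply => a * b,
                    _ => a / b,
                });
            }
        }
    }
    stack.pop()
}
```

Calling `compile("1 + 2 * 3")` produces the instructions in postfix order (constants 1, 2, 3, then multiply, then add), which is exactly the order a stack machine wants to execute them in.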
After building part one with Rust, I encountered this post as I began on the second project: “My Experience Crafting an Interpreter in Rust”. It’s an amazing deep dive into implementing the Clox bytecode interpreter in Rust. TLDR: the Rust version needs a substantially different design from the straightforward safe-Rust one if you want the VM to run at a reasonable clip. It’s a lot of work to get there.
This is where I departed from the book and started on my own project. I used parts of the lexing and parsing design from the book to build my language, which is compiled and strongly, statically typed. I discarded the interpreter, added a symbol table, and integrated it into the parser. For some of the runtime code emitted by the compiler (I’m targeting C99) I used some ideas from part two of the book, mostly for representing values and for convenient macros. While the explanation of a garbage collector in the book is excellent, I used an off-the-shelf GC, since the one in the book is meant to be part of the bytecode VM and I didn’t need the VM for my project.
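As an illustration of what a symbol table for a statically typed language can look like, here is a minimal scoped sketch in Rust. It is an assumption about the general technique, not the design used in the project described above: `Type`, `SymbolTable`, and all method names are hypothetical. A parser would call `enter_scope` and `exit_scope` at block boundaries, `declare` at each declaration, and `lookup` at each use.

```rust
use std::collections::HashMap;

/// The static types this toy language knows about (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Type { Int, Float, Bool, Str }

/// A stack of scopes; the innermost scope is last.
pub struct SymbolTable {
    scopes: Vec<HashMap<String, Type>>,
}

impl SymbolTable {
    /// Start with a single global scope.
    pub fn new() -> Self {
        SymbolTable { scopes: vec![HashMap::new()] }
    }

    /// Called when the parser enters a block.
    pub fn enter_scope(&mut self) {
        self.scopes.push(HashMap::new());
    }

    /// Called when the parser leaves a block; the global scope is never popped.
    pub fn exit_scope(&mut self) {
        if self.scopes.len() > 1 {
            self.scopes.pop();
        }
    }

    /// Record a declaration in the innermost scope.
    /// Redeclaring a name in the same scope is an error; shadowing an outer one is allowed.
    pub fn declare(&mut self, name: &str, ty: Type) -> Result<(), String> {
        let scope = self.scopes.last_mut().expect("global scope always exists");
        if scope.contains_key(name) {
            return Err(format!("'{name}' is already declared in this scope"));
        }
        scope.insert(name.to_string(), ty);
        Ok(())
    }

    /// Resolve a name, searching from the innermost scope outward.
    pub fn lookup(&self, name: &str) -> Option<Type> {
        self.scopes.iter().rev().find_map(|scope| scope.get(name).copied())
    }
}
```

Resolving names during parsing like this lets a single-pass compiler report undeclared variables and type mismatches without a separate analysis pass, at the cost of requiring declarations to appear before their uses.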
Anyway this is all to say I found the book inspirational even though I didn’t strictly follow along with the book after part one.
It would be interesting to see if someone extends the Clox vm in the book to be a more full-featured bytecode interpreter and uses it for other language interpreters. This is Clox with some performance improvements.
C++ and Rust Code Generation and Interpretation
Originally I chose to build a compiler targeting C99 as an intermediate language, since I already knew it and C compilers are available for almost all platforms. A C compiler will also optimize your intermediate code quite well. While this has worked out nicely, by the time I got something working with C I realized it wouldn’t be much harder to emit some other target, such as bytecode for a VM. There are code generation libraries that make that task easier, so you don’t have to compose the text of the intermediate language yourself.
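"Composing the text yourself" can be as simple as a recursive function from syntax tree to string. The sketch below shows the idea in Rust for a toy expression language. It is not the author's compiler: `Expr`, `emit_expr`, and `emit_function` are hypothetical, and every value is assumed to be a C99 `long long`.

```rust
/// A toy expression tree (hypothetical; a real compiler has many more node kinds).
pub enum Expr {
    Int(i64),
    Var(String),
    Binary(Box<Expr>, char, Box<Expr>),
}

/// Emit a C expression. Every binary operation is fully parenthesized,
/// so the C compiler's precedence rules never come into play.
pub fn emit_expr(expr: &Expr) -> String {
    match expr {
        Expr::Int(n) => n.to_string(),
        Expr::Var(name) => name.clone(),
        Expr::Binary(left, op, right) => {
            format!("({} {} {})", emit_expr(left), op, emit_expr(right))
        }
    }
}

/// Emit a C99 function that returns the value of `body`.
/// Assumption for this sketch: all parameters and results are `long long`.
pub fn emit_function(name: &str, params: &[&str], body: &Expr) -> String {
    let params = if params.is_empty() {
        "void".to_string()
    } else {
        params
            .iter()
            .map(|p| format!("long long {p}"))
            .collect::<Vec<_>>()
            .join(", ")
    };
    format!("long long {name}({params}) {{\n    return {};\n}}\n", emit_expr(body))
}
```

The output is plain C source that any C99 compiler can build and optimize. A code generation library replaces the string formatting with calls that construct instructions directly, which removes a whole class of quoting and syntax mistakes.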
Here are the best resources I found for learning LLVM and Cranelift and other related topics.
- Create Your Own Programming Language in Rust: a tutorial on implementing a calculation language with either LLVM or a custom bytecode interpreter.
- LLVM Tutorial: a tutorial on implementing a compiler in C++ targeting LLVM. This is the “Kaleidoscope” teaching language. LLVM is generally best suited as a target for statically typed languages.
- Inkwell: the “Inkwell” library for LLVM code generation in Rust. Includes some toy language examples.
- Cranelift: a tutorial on the Cranelift code generator implementing a toy language. Cranelift is suited to support either dynamically or statically typed languages.
- simple-jit: the “simple-jit” Rust crate for Cranelift.
- Calling Rust Functions shows how to register and use Rust functions from Cranelift generated code. If you want to build a standard library for your language in Rust you’ll need to be able to do this.
- zub is a good little high-performance VM written in Rust. It’s not currently being developed, but it’s a nice example. It could be interesting to pick this up and add to it.
For my current project, my next step will be to try targeting Cranelift to get a real REPL running (and hopefully a pretty fast one).