Colin's Notes

What follows are a few sort of random observations on topics I’ve pondered while evaluating new languages and thinking about building my own language projects. They aren’t radical design choices or anything groundbreaking but they lend a language its feel for better or worse.

Mainstream PL not quite big ideas

Recently I watched an interview with Chris Lattner, the creator of Mojo, Swift and LLVM. It’s fantastic, you should listen to it or watch it. He pointed out several choices for Mojo’s design that got me thinking about language design. Do you want to pick one “big idea” and do everything else utterly conventionally, or innovate everywhere you can as much as developers will tolerate? Developers can be very cranky. While putting most of your energy into the “big idea”, can you still choose a few really good modern language features to sweeten the deal?

Mojo makes a couple of relatively new improvements in its memory management but otherwise every aspect seems to come from other languages. Nothing wrong with that; they combine to further a few goals for the language. These are what I’m calling “little big ideas.” As I’ve worked on my hobby languages I’ve had a couple of these. They are uninteresting academically but nice ergonomically.

(Mojo’s big idea is its compiler target of MLIR, not the language itself.)

On the other hand: The Go language presents a special counterexample. Its “big idea” was to make the simplest language that was practical to develop in. Even for the time it eschewed nearly all modern conveniences, except garbage collection and nice concurrency. Its maintainers only add new language features after a lot of deliberation and their changes have to fit within the current language – no breaking changes. The very thing that defines Go has repelled developers who otherwise like many aspects of Go. If only it had proper enums, or pattern matching, or immutable data or slightly terser error-handling. One nicety Go gives us is errors as return values. Seen as a step backward from exceptions at the time Go came out, the consensus now is probably that this was actually a forward-thinking decision.

I’d argue the real “big idea” of Go ended up being its ecosystem of tools and standard library, not the language itself. Everything surrounding the language is very pleasant. The compiler is ultra-fast. Deployment is as easy as it gets. Formatting code is a non-issue.

Here are a number of not-quite big ideas a language could adopt. Go has none of these; Mojo has all of them except the last one (enums), and I’m sure they’ll adopt some flavor of enums soon.

Single ownership

For Mojo, Lattner described how they wanted to realize benefits of functional languages in an imperative style language. One nice aspect of a functional language is it typically has immutable data structures. You never modify structures owned by other parts of a program because no data changes – you only make new data structures. The purity is nice, but at odds with how computers work. The performance isn’t great. Compilers will have to do a lot of optimizations to avoid so much copying. Imperative style code doesn’t do well with all immutable data.

Making the compiler aware of data “ownership” is a different way to avoid mutation unsafety. You can safely allow mutation of data if that change happens only when no other parts of a program use it. Rust does this with a “borrow checker” by analyzing the code to see if you alter a mutable value in an unsafe way. If data is provably unchanged it’s known to be safe. If the only change to data is when it has a single owner it’s safe. Rust has to use “lifetimes” to do this analysis. Usually they are determined automatically but programmers can attach lifetimes to data. Sometimes automatic lifetimes extend beyond what the programmer would want or expect, making the borrow checker more strict than necessary.

Mojo uses a different approach – it looks at the last use of a value and drops it after that. So lifetime analysis gets simpler and fewer programs have ownership errors. Like Rust, at least superficially, it uses “references” to pass data without handing off ownership. The compiler has to be sure no references exist where a value gets dropped. Lattner made a good case that his approach results in more efficient programs and a simpler compiler.
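To make the vocabulary concrete, here is a minimal sketch of the Rust side of this comparison: a shared borrow, an exclusive borrow, and a move of ownership. The `Buffer` type and the function names are invented for illustration. Note that Rust runs destructors at the end of a scope rather than after the last use, so this does not demonstrate Mojo's approach, only the ownership rules both languages build on.

```rust
// Illustrative only: `Buffer`, `total`, `append`, and `consume` are made-up names.
#[derive(Debug)]
struct Buffer {
    data: Vec<i32>,
}

// Shared borrow: reads the buffer without taking ownership.
fn total(buf: &Buffer) -> i32 {
    buf.data.iter().sum()
}

// Exclusive borrow: the only live reference, so mutation is safe.
fn append(buf: &mut Buffer, value: i32) {
    buf.data.push(value);
}

// Takes ownership: the caller can no longer use the buffer afterwards.
fn consume(buf: Buffer) -> usize {
    buf.data.len()
}

fn main() {
    let mut buf = Buffer { data: vec![1, 2, 3] };
    append(&mut buf, 4);
    println!("total = {}", total(&buf));
    let len = consume(buf); // ownership moves into `consume` here
    println!("len = {}", len);
    // println!("{:?}", buf); // would not compile: `buf` was moved above
}
```

Uncommenting the last line produces a compile error rather than a runtime bug, which is the whole point of making ownership visible to the compiler.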

My own languages have used the tried and true garbage collection approach along with limited mutation. You don’t get the maximum performance but it’s a lot easier to build a compiler without complicated ownership analysis.

Value Semantics

Mojo uses pass-by-value by default. Internally the function “owns” the parameter value, simply because it’s a copy and so entirely safe to change. However if you want to use the function to effect a change outside the function you can, but you must explicitly declare that as part of the function’s definition on that parameter. This is a really good little idea.

One of my language projects took one more step in this direction: pass-by-value, pass-by-read-only-value or pass-by-mutable-value. Essentially if you’re passing by value but no changes to the value are needed you can just implement this as a reference, saving a copy. I used the val, var and cpy effects on the parameters, but val is the default, so it doesn’t need to appear in the syntax.
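A rough Rust analogue of three such passing modes might look like the following. The function names are invented for illustration, and Rust's references are only an approximation of the val, var and cpy effects, not a description of how they are implemented.

```rust
// Hypothetical names, for illustration only.

// Read-only access: implemented as a reference, so no copy is made.
fn sum_readonly(values: &[i32]) -> i32 {
    values.iter().sum()
}

// Mutable access: changes are visible to the caller.
fn double_in_place(values: &mut [i32]) {
    for v in values.iter_mut() {
        *v *= 2;
    }
}

// Owned copy: the function may change its copy freely;
// the caller's data is untouched.
fn sorted_copy(mut values: Vec<i32>) -> Vec<i32> {
    values.sort();
    values
}

fn main() {
    let mut data = vec![3, 1, 2];
    println!("sum = {}", sum_readonly(&data));
    double_in_place(&mut data);
    let sorted = sorted_copy(data.clone()); // explicit copy at the call site
    println!("{:?} {:?}", data, sorted);
}
```

In Rust the mode is spelled out at both the definition and the call site (`&`, `&mut`, `.clone()`), which is noisier than a default but makes the cost of each call easy to see.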

Few built in types – everything’s a library

I don’t fully understand all the implications of this but it feels right. Mojo allows you to make new struct or other types and implement operations on them. So for example one could make their own complex or quaternion or matrix type and it would act like a built-in numeric type in other languages. Also Mojo lets you implement operations differently for different architectures for maximum performance.
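The same idea can be sketched in Rust, where a library-defined type picks up the ordinary `+` and `*` operators by implementing the standard `Add` and `Mul` traits. This is a minimal `Complex` type written for illustration, not Mojo code and not taken from any particular library.

```rust
use std::ops::{Add, Mul};

// A user-defined complex number; nothing here is built into the language.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Complex {
    re: f64,
    im: f64,
}

impl Add for Complex {
    type Output = Complex;
    fn add(self, rhs: Complex) -> Complex {
        Complex { re: self.re + rhs.re, im: self.im + rhs.im }
    }
}

impl Mul for Complex {
    type Output = Complex;
    // (a + bi)(c + di) = (ac - bd) + (ad + bc)i
    fn mul(self, rhs: Complex) -> Complex {
        Complex {
            re: self.re * rhs.re - self.im * rhs.im,
            im: self.re * rhs.im + self.im * rhs.re,
        }
    }
}

fn main() {
    let a = Complex { re: 1.0, im: 2.0 };
    let b = Complex { re: 3.0, im: -1.0 };
    println!("{:?}", a + b);
    println!("{:?}", a * b);
}
```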

While too much operator overloading can be a problem for sure, freeing the developer from outdated design decisions is great. For instance if you were stuck with small ints in an old C version but a new processor supports 128-bit ints and you want to do a lot of fast operations on UUID values you’re in trouble. That is, unless you’re prepared to update the compiler. Not so with Mojo.

I need to consider the implications some more. I understand the concept, but need to see how it looks in practice.

Comp time metaprogramming

This comes from Zig. The idea is, you use only one language to do metaprogramming rather than templating or strange extra functions or macros like with Rust or C++.

At compile time your “comp time” code chooses which code to compile – to essentially generate code to fit the call site, maybe based on architecture, maybe on a data type. In some ways this does the same thing as C++ templates but is more powerful. Since the metaprogramming is all in the main language, debugging and error messages are straightforward in contrast to Rust macros or C++ templates.
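For a small taste of "same language at compile time", here is a sketch using Rust's `const fn`. It is a much narrower facility than Zig's comptime (it cannot inspect types or generate new declarations), so treat it only as an illustration of ordinary code running inside the compiler. The `squares` helper is invented for this example.

```rust
// Hypothetical helper: builds a table of squares. Because it is a `const fn`,
// it can be evaluated during compilation when used in a const context.
const fn squares<const N: usize>() -> [u32; N] {
    let mut table = [0u32; N];
    let mut i = 0;
    // `for` loops are not allowed in const fn, so use `while`.
    while i < N {
        table[i] = (i * i) as u32;
        i += 1;
    }
    table
}

// Evaluated at compile time; the finished array is stored in the binary.
const SQUARES: [u32; 8] = squares::<8>();

fn main() {
    println!("{:?}", SQUARES);
}
```

The function body is plain Rust, with no separate template or macro language involved, which is the property the comp time approach generalizes.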

Initialization is distinct from assignment

This comes along with allowing mutable or immutable variable declarations, but doesn’t have to. The idea is that initialization of variable values uses different syntax, and also is implemented differently so you can apply different analysis to initial values versus analyzing effects of assignment. It’s a small thing from the point of view of the programmer, but a really, really nice idea for a language designer.

In my language RCI, you declare variables like

var x = 9

And you assign a new value like:

x := 11

Syntactically this is nice, it’s easy to see the difference. := is mutation, and only mutation. = is an equality check or initial value declaration. More importantly however, the initialization and assignment work completely differently. Type solving can be done on the initializers and safety checks on the assignments. Plus, optimizations can be focused on the assignments which might work very differently to how the initialization does.
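Rust does not separate the two syntactically the way RCI does, but its compiler does analyze them separately, which makes for a reasonable illustration. In this sketch (function names are made up), a binding declared without a value may be initialized exactly once on each path even though it is immutable, while later writes count as assignment and require `mut`.

```rust
fn classify(n: i32) -> &'static str {
    // Declared without a value: the compiler tracks that `label` is uninitialized.
    let label;
    if n < 0 {
        label = "negative"; // initialization, allowed although `label` is immutable
    } else {
        label = "non-negative"; // initialization on the other path
    }
    // label = "other"; // would not compile: a second write is an assignment
    label
}

fn count_up(limit: u32) -> u32 {
    let mut total = 0; // initialization
    for i in 1..=limit {
        total += i; // assignment (mutation), permitted because of `mut`
    }
    total
}

fn main() {
    println!("{}", classify(-5));
    println!("{}", count_up(4));
}
```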

I need to understand more of how this distinction helps the design of Mojo to say anything Mojo specific.

Union types, Tagged Unions, Enums

This isn’t one specific language construct. There are different flavors.

On one end are re-assignable union types like in Crystal which give it the feel of a Ruby-like dynamic language. The compiler has to do some analysis to know the content type in different parts of the code. Superficially at least a Crystal union type variable looks like a Ruby variable. On the other end are Rust enumerated types (a “nominal heterogeneous disjoint union type”). You give variants names and associate any type of data with each name. You can destructure the values of the enum to get at the data – the structures can be complex.

Last I checked, Mojo doesn’t have support for any type of union or nominal enum, but proposals exist.

C has “union” types but they don’t capture the type of data they carry. They are “untagged.” From the runtime support in my RCI language:

typedef union {	
	rci_bool _boolean;	
	rci_number _number;
	rci_float _float;
	rci_integer _integer;
	rci_enumeration _enumeration;
	rci_object * _object;
} rci_data;

To know the type and correctly extract a value you have to “tag” the type:

typedef struct rci_value{
	rci_type type;	
	rci_data as;	
} rci_value;

(I took the idea for the ‘.as’ field name from the Crafting Interpreters book. You get to access values like data.as._integer. Pretty neat.) You need to consult the type field next door to know which field to access.

if (value.type == _integer_) { ... do stuff }

Nothing stops you from assigning the wrong types to other things, or assigning the wrong types onto the union type value.

So this is kind of terrible and other languages do better than C.
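For contrast, here is a loose Rust counterpart of the tagged struct above, as a sketch. The variant names are illustrative and only cover a few of the C union's members. The tag and the payload are a single thing, so there is no way to read the wrong field.

```rust
// A loose counterpart of the C `rci_value`; names are illustrative only.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Boolean(bool),
    Float(f64),
    Integer(i64),
}

fn describe(value: &Value) -> String {
    // `match` must cover every variant, and each arm sees only its own payload.
    match value {
        Value::Boolean(b) => format!("bool {}", b),
        Value::Float(f) => format!("float {}", f),
        Value::Integer(i) => format!("int {}", i),
    }
}

fn main() {
    let values = vec![Value::Integer(42), Value::Boolean(true), Value::Float(1.5)];
    for v in &values {
        println!("{}", describe(v));
    }
}
```

Adding a new variant without updating `describe` is a compile error, where the C version would silently fall through.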

Here’s an interesting description of the different sorts of union types: Odin types.

The “Algol” style of tagged union would use the type name as the tag: You can only have one variant per type. This prevents you from using these types as “enumerations” as Rust / ML do but still gives you some polymorphism. You check the tag at runtime for pattern matching.

Rust lets you name the variants anything you want; with no interior type they function as simple enumerations (but much better type-checked than C enums.) In Rust you can associate complex structures with variants, like this classic way to describe expressions in a simple interpreter.

enum Operator {
	Plus, Minus, Times, Divide, Gt, Lt, Equal, NotEqual
}
enum Expression {
	Ternary { test: Box<Expression>, cond_true: Box<Expression>, cond_false: Box<Expression>},
	Binary { left: Box<Expression>, op: Operator, right: Box<Expression> },
	Unary { op: Operator, expr: Box<Expression>},
	Primary(Value),
	Call { func: String, args: Vec<Expression> },
	Variable(String),
}

With Algol style enums you could do something similar but first you’d need to define types for each of the above members.
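As a sketch of what that might look like, here is the Algol-style shape imitated in Rust: each member is first defined as its own named type, and the union then has exactly one variant per type. This is a trimmed-down version of the expression type (fewer operators, a hypothetical `Literal` type standing in for values), with a small evaluator to show that matching still works.

```rust
#[derive(Debug, Clone, Copy)]
enum Operator { Plus, Minus, Times, Divide }

// Step 1: every member gets its own named type.
#[derive(Debug)]
struct Binary { left: Box<Expression>, op: Operator, right: Box<Expression> }
#[derive(Debug)]
struct Unary { op: Operator, expr: Box<Expression> }
#[derive(Debug)]
struct Literal(f64);

// Step 2: the union lists one variant per type; the type name acts as the tag.
#[derive(Debug)]
enum Expression {
    Binary(Binary),
    Unary(Unary),
    Literal(Literal),
}

fn eval(expr: &Expression) -> f64 {
    match expr {
        Expression::Literal(Literal(n)) => *n,
        Expression::Unary(Unary { op, expr }) => match op {
            Operator::Minus => -eval(expr),
            _ => eval(expr), // other unary operators are a no-op in this sketch
        },
        Expression::Binary(Binary { left, op, right }) => {
            let (l, r) = (eval(left), eval(right));
            match op {
                Operator::Plus => l + r,
                Operator::Minus => l - r,
                Operator::Times => l * r,
                Operator::Divide => l / r,
            }
        }
    }
}

fn lit(n: f64) -> Box<Expression> {
    Box::new(Expression::Literal(Literal(n)))
}

fn main() {
    // (2 + 3) * -4
    let sum = Expression::Binary(Binary { left: lit(2.0), op: Operator::Plus, right: lit(3.0) });
    let neg = Expression::Unary(Unary { op: Operator::Minus, expr: lit(4.0) });
    let product = Expression::Binary(Binary {
        left: Box::new(sum),
        op: Operator::Times,
        right: Box::new(neg),
    });
    println!("{}", eval(&product));
}
```

The extra type definitions are the cost; the benefit is that each member is a real type that can be passed around on its own.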