The last few years have brought us an explosion in the number of new systems languages under development. Most of them are trying to strike a good balance between safety, performance and expressivity. In this post I’ll first outline briefly what’s meant by “safety” and touch on how today’s systems languages try to achieve it.
Here are some notable new languages you could use today: Rust, Zig, Odin, Jakt, Hare
This next group ranges from somewhat to very experimental but can still be tried out: Vale, Austral, Myrddin, V, Lobster, Compis, Cone
These are the second-most recent generation of systems languages. They’re all a bit more on the applications development side, less focused on highly optimized memory management. They’re all well past 1.0 status and heavily used in real-world software: Nim, Crystal, D, Go
I plan a follow-up post with details and my thoughts on each. All can be used today but not all have gotten beyond an early experimental state.
What do the new systems languages bring to the table over older established languages? Primarily “safety” in broad terms, and secondarily syntactic conveniences. Aside from advanced memory management, they mostly do not try to explore new programming language concepts. That said, compared to C, C++ and Fortran they have borrowed innovations from functional languages, such as sum types.
The new wave of systems languages also typically have much saner build systems and library / dependency management and IDE integration but that’s a topic for another post.
Safety includes: avoiding memory leaks and double-free errors, preventing buffer overflows, minimizing the chances for concurrency bugs, automatic resource management, and generally designing to avoid dangerous patterns that lead to incorrect output or program crashes. Safer languages will have fewer security vulnerabilities, but security is a separate issue from safety: additional measures are needed to enhance program security.
Research and experience have found shared, mutable state to be a locus of a lot of unsafe conditions: bugs, or bugs waiting to happen after a slight code change. Therefore Rust and many other modern systems languages seek to minimize it through a number of techniques.
Shared mutable state arises most commonly during concurrent execution. So, unless a language prohibits concurrency, it needs to limit shared mutable state. Doing so makes for generally safer programs anyway. Reducing shared, mutable state is a form of “temporal” memory safety.
Memory safety has both spatial and temporal dimensions. An example of a spatial safety feature is array bounds checking: the program is prevented from reading or writing memory beyond the end of an array. Temporal safety relates to tracking the use of memory over time as the program runs, ensuring areas of memory aren’t used or reused erroneously. The simplest example of a temporal memory bug is failing to initialize a variable before using it later on.
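As a concrete illustration of spatial safety, here’s a minimal Rust sketch of bounds checking (the array and its values are arbitrary):

```rust
fn main() {
    let xs = [10, 20, 30];

    // In-bounds indexing works as expected.
    println!("{}", xs[1]); // prints 20

    // Checked access: `get` returns an Option instead of ever
    // reading past the end of the array.
    assert_eq!(xs.get(2), Some(&30));
    assert_eq!(xs.get(5), None); // out of bounds: no crash, no garbage read

    // A direct out-of-bounds index such as `xs[5]` would be rejected
    // at compile time here (constant index) or panic at run time,
    // rather than silently reading adjacent memory the way C would.
}
```

The `Option`-returning `get` turns a potential memory error into an ordinary value the program must handle.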
Preventing uninitialized variables is a very easy problem to design around in a new language. Mechanisms that enforce single ownership (C++ unique_ptr, or the more comprehensive borrow checker in Rust) are much more difficult to build and to use.
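To sketch the easy case, here’s how definite-initialization analysis looks in Rust; the `size_label` function and its threshold are made up for illustration:

```rust
// The compiler proves `label` is assigned on every path before it is
// read. Deleting the `else` branch would be a compile-time error,
// not a run-time read of garbage memory.
fn size_label(n: i32) -> &'static str {
    let label; // declared, but not yet initialized
    if n > 100 {
        label = "big";
    } else {
        label = "small";
    }
    label
}

fn main() {
    println!("{}", size_label(7)); // prints "small"
}
```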
Manual memory reclamation (free, delete) is also a source of temporal memory unsafety in the form of bugs or crashes. A garbage collector completely removes this class of safety problems, at some expense in performance and memory use. Borrow checking plus lifetimes, or linear types, are another way to relieve the programmer from manually freeing memory. They can result in faster programs than GC-managed memory, but they are more difficult for the programmer to use, harder to build into a language, and may slow down compile times.
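A minimal Rust sketch of ownership-based reclamation, with no GC and no manual free (the string is arbitrary):

```rust
fn main() {
    // `s` owns a heap-allocated buffer.
    let s = String::from("hello");

    // Ownership moves to `t`; `s` can no longer be used.
    // Reading `s` after this line would be a compile-time error,
    // not a use-after-free at run time.
    let t = s;

    println!("{}", t.len()); // prints 5
} // `t` goes out of scope here and the buffer is freed automatically:
  // no garbage collector, no manual call to free()
```

The compiler inserts the deallocation at a point it can prove is safe, which is how this style avoids both the leak and the double-free.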
Techniques to Achieve Safety
- Type-checking: The first step toward safety is simply a good type checker, so your program is more likely to be doing what you intended at a basic level. C and C++, for instance, while statically typed, don’t feature particularly strong type checking. Depending on which errors and warnings you’ve enabled, you can easily miss mistakes due to typecasting and type coercion that would be caught immediately in Rust, or even in Free Pascal or Go.
- Immutable data by default: Variables are immutable unless explicitly declared otherwise, limiting incidental mutability. Later attempts to change the data result in compiler errors.
- Pure functions: Eliminating state is a great way to make both automatic and manual checking of program logic easier. Functional languages limit mutation and state. Pure functions are stateless: instead of altering data they return new data. Sometimes this approach sacrifices performance through excessive data copying, though clever compilers can minimize the amount of copying by tracking ownership of data.
- Enforced single ownership: Rust and Austral use this approach. Austral is particularly interesting for its use of linear types to manage resources, including but not limited to memory. Rust employs a “borrow checker” ensuring single ownership of memory – there’s only one owner at any one time, but values can be “borrowed” according to the rules: one borrower at a time if the borrow is mutable, more if it’s immutable. Memory is only freed after a value has no owner. (This seemingly simple rule is why Rust occasionally requires the explicit use of “lifetimes” to indicate how long a value “lives.”) Borrow-checking prevents memory leaks but, more importantly, highlights places where potentially unsafe and unintended behavior could happen.
- Tracing garbage collection: While less powerful than borrow-checking, it is simple to use. Garbage collection manages memory behind the scenes, preventing memory leaks and memory-freeing mistakes. It can’t help manage non-memory resources, though, and it won’t on its own prevent shared state. The trade-off is less power and somewhat less performance in exchange for programs that are simpler to read and write, and faster compilation.
- Reference counting: GC can add unpredictability to run times, since you don’t know when a GC cycle will occur. Reference counting at least produces predictable run times (though not the fastest) but can’t neatly dispose of retain cycles (objects that reference themselves or each other). Lobster and Vale have techniques to get the best of reference counting and limited GC. Jakt uses reference counting with strong and weak pointers to allow self-referential data structures. The experimental Vale language tries a mixed approach where non-deterministic GC is rarely needed but resources can be cleaned up without restrictive ownership constraints or manual memory management.
- Sum types: Also known as “tagged unions”, which is how they’re often implemented, these are values that can be one of a number of types, or a predefined type that’s a union of a number of types. Rust and Jakt (to name two) call these “enums” – they can be used as simple enumerations, but they can also carry data, and that data can vary in type for each enum variant. This is powerful for representing results such as “a value of type T, or an Error”.
- Pattern matching: This accounts for all possible states a value can take on, so that parts of your program’s logic can be exhaustively checked at compile time. Sum types are often used in conjunction with pattern matching, where you match against every possible variant a sum type can have. You’ll find pattern matching in most modern languages from the last ten years.
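The single-ownership bullet above can be sketched in Rust; the vector and its values are arbitrary:

```rust
fn main() {
    let mut v = vec![1, 2, 3];

    // Any number of shared (immutable) borrows may coexist.
    let a = &v;
    let b = &v;
    println!("{} {}", a.len(), b.len());

    // A mutable borrow is exclusive: it is only allowed here because
    // the shared borrows above are no longer used.
    let m = &mut v;
    m.push(4);
    // println!("{}", a.len()); // compile error: `a` can't overlap the &mut borrow

    println!("{:?}", v); // v's memory is freed automatically at end of scope
}
```

Uncommenting the marked line is a good way to watch the borrow checker reject shared-plus-mutable access at compile time.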
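The reference-counting bullet above can be illustrated with Rust’s `Rc`/`Weak` types; the parent/child `Node` structure is made up for illustration, and the weak back-pointer is what prevents a retain cycle:

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

// A node that points back at its parent. Using a strong Rc in both
// directions would create a retain cycle and leak; the Weak
// back-pointer breaks the cycle.
struct Node {
    value: i32,
    parent: RefCell<Weak<Node>>,
}

fn main() {
    let parent = Rc::new(Node { value: 1, parent: RefCell::new(Weak::new()) });
    let child = Rc::new(Node { value: 2, parent: RefCell::new(Weak::new()) });
    *child.parent.borrow_mut() = Rc::downgrade(&parent);

    // The weak pointer does not keep `parent` alive on its own...
    assert_eq!(Rc::strong_count(&parent), 1);
    // ...but it can be upgraded to a strong reference while `parent` exists.
    assert_eq!(child.parent.borrow().upgrade().unwrap().value, 1);
    println!("child {} sees parent {}", child.value, parent.value);
}
```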
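The last two bullets work together; here’s a minimal Rust sketch using a made-up `Shape` enum, where the `match` must cover every variant:

```rust
// A sum type: each variant can carry different data.
enum Shape {
    Circle { radius: f64 },
    Rect { w: f64, h: f64 },
}

// `match` must handle every variant; adding a `Triangle` variant
// later would turn this into a compile error until it is covered.
fn area(s: &Shape) -> f64 {
    match s {
        Shape::Circle { radius } => std::f64::consts::PI * radius * radius,
        Shape::Rect { w, h } => w * h,
    }
}

fn main() {
    let shapes = [
        Shape::Circle { radius: 1.0 },
        Shape::Rect { w: 2.0, h: 3.0 },
    ];
    for s in &shapes {
        println!("{:.2}", area(s));
    }
}
```

The same pairing underlies “type T or Error”: Rust’s `Result<T, E>` is just a two-variant sum type that `match` forces you to handle exhaustively.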
With multicore processors becoming the norm, modern languages must offer support for using them in a reasonably safe manner. Most of the new systems languages provide good support (compared to C, anyhow). Earlier I mentioned temporal and spatial safety; concurrency is where temporal safety is really beneficial. Threads run concurrently over time, and we need ways of ensuring data isn’t invalidated while any thread using it is still running. Rust is particularly good in this respect. From what I’ve read (I haven’t tried it), Vale should be quite good as well.
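A minimal Rust sketch of shared mutable state made safe across threads (the counter and the thread/iteration counts are arbitrary):

```rust
use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    // Shared state must be wrapped explicitly: Arc for shared
    // ownership across threads, Mutex for exclusive access.
    let counter = Arc::new(Mutex::new(0));
    let mut handles = Vec::new();

    for _ in 0..4 {
        let counter = Arc::clone(&counter);
        handles.push(thread::spawn(move || {
            for _ in 0..1000 {
                *counter.lock().unwrap() += 1;
            }
        }));
    }
    for h in handles {
        h.join().unwrap();
    }

    // Handing the threads a plain `&mut i32` instead would be rejected
    // at compile time: the data race never gets a chance to run.
    println!("{}", counter.lock().unwrap()); // prints 4000
}
```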
The new systems languages mostly prioritize developer convenience over raw power. They pick advances from academic language research only where they don’t stray too far from conventional practice and they add safety. They adopt advances from software engineering for better compilers and tooling. Rust is somewhat of an exception, but even so, Rust isn’t all that hard to learn compared to Haskell. Rust trades some ease-of-use for a lot of safety.
In the next post I’ll list a lot of the specific differences and notable features of the new systems languages and give a few opinions.