Value and Reference Semantics in Modern Programming Languages

Some relatively new programming languages give the developer a clear way to choose value or reference semantics through the data type, which is unusual. The choice controls whether data is passed by value or by reference to and from functions. Languages have nearly always supported the distinction, but it wasn’t always so obvious what was going on.

When people talk about “semantics” in programming languages, they’re talking about the precise meaning of the language’s constructs and syntax. In the case of value and reference semantics, what exactly happens when we “pass” something to a function? What does it mean to “assign” something to something else? Does the language “move stuff around” or make copies or what? As a new programmer you can get away for quite a while without knowing the answers to those questions in every case. You can follow the logic and usually figure out what the result of a small bit of code should be.

Refresher on Conventional Value and Reference Semantics

Superficially, pseudo-code like the following seems obvious:

    struct S { x: i64, expired: bool }

    function do_something(arg: S) -> S {
        arg.x = arg.x * 9;
        return arg;
    }    

But we know better. If the code behaves like Rust, calls to do_something() move the value: ownership transfers into the function and comes back out with the return value. Conceptually the bytes are copied in and copied out, though the compiler should be smart enough not to generate copying code in reality.

If the language behaved like Java, it would be passing in a reference to an “S” type object, changing that same object and pointlessly returning it. In Rust we could pass a reference to an S instead, but we’d need to change the function signature to take &mut S (a plain & reference would not allow the assignment) and borrow the value at the call site.
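To make the Rust reading concrete, here is a minimal sketch of both variants. The name do_something_in_place is made up for illustration, and the pseudo-code has been adjusted to what Rust requires (a mut binding on the parameter).

```rust
struct S {
    x: i64,
    expired: bool,
}

// Takes ownership of `arg`, mutates it, and hands ownership back to the caller.
fn do_something(mut arg: S) -> S {
    arg.x *= 9;
    arg
}

// Hypothetical variant: borrows the caller's value mutably, so nothing is moved or returned.
fn do_something_in_place(arg: &mut S) {
    arg.x *= 9;
}

fn main() {
    let s = S { x: 2, expired: false };
    // `s` is moved into the call; the old binding is unusable until we rebind the result.
    let s = do_something(s);
    println!("{} {}", s.x, s.expired);

    let mut t = S { x: 2, expired: false };
    // `t` stays owned by main; the function only gets temporary mutable access.
    do_something_in_place(&mut t);
    println!("{} {}", t.x, t.expired);
}
```

In the second form both the signature and the call site carry the &mut, so the reader can see at the call that the argument may be changed.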

Reference Semantics

Java and Ruby use reference semantics for objects. That is, the meaning of passing a variable to a function or returning one in Ruby or Java is that we’re passing a reference to an object. (Java’s primitive types such as int are the exception and are copied.) We’re not copying the object’s data and we’re not transferring ownership. There’s one object, and code can read and write that original object wherever it’s passed. This isn’t made explicit in the syntax of the language. Rather, the syntax for passing and returning values means pass by reference as part of the language definition.
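Rust has no implicit object references of this kind, but the behavior can be approximated with a shared, mutable handle. The sketch below is an imitation rather than how Java or Ruby work internally, and the SRef alias is a name invented for the example.

```rust
use std::cell::RefCell;
use std::rc::Rc;

struct S {
    x: i64,
    expired: bool,
}

// Hypothetical alias: a shared, mutable handle standing in for a Java/Ruby object reference.
type SRef = Rc<RefCell<S>>;

// Receives a handle, mutates the one shared object, and returns the same handle.
fn do_something(arg: SRef) -> SRef {
    arg.borrow_mut().x *= 9;
    arg
}

fn main() {
    let original: SRef = Rc::new(RefCell::new(S { x: 2, expired: false }));
    // Rc::clone copies the handle, not the S behind it.
    let returned = do_something(Rc::clone(&original));
    // Both handles point at the same object, so returning it told us nothing new.
    assert!(Rc::ptr_eq(&original, &returned));
    println!("{} {}", original.borrow().x, original.borrow().expired);
}
```

The caller sees the change through its own handle, which is why returning the object is pointless under these semantics.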

Value Semantics

C and Pascal, on the other hand, default to pass by value. Function arguments and return values are copied. The copy is shallow, which ends up being problematic, particularly in C, when the copied values include memory addresses: the address is duplicated but the data it points to is not. To pass by reference you declare a var parameter in Pascal, or in C you pass a pointer, declaring the parameter with * and taking the address with & at the call site. So these languages provide for value vs. reference semantics in the part of the syntax where the passing takes place.
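The trouble with copied addresses can be sketched in Rust by using a raw pointer to stand in for a C pointer member. Buf and scribble are made-up names, and this is an imitation of C-style by-value passing, not C itself.

```rust
// Hypothetical type: a plain field plus a raw address, like a C struct with a pointer member.
#[derive(Clone, Copy)]
struct Buf {
    len: i64,
    data: *mut i64,
}

// Receives a bitwise copy of `b`, much like a C function taking a struct by value.
fn scribble(mut b: Buf) {
    // Changes only the callee's copy of the field.
    b.len = 0;
    // The copied address still points at the caller's storage, so this write escapes the copy.
    unsafe {
        *b.data = 99;
    }
}

fn main() {
    let mut storage = 7_i64;
    let b = Buf { len: 1, data: &mut storage };
    // `Buf` is Copy, so `b` remains usable after being passed by value.
    scribble(b);
    // The plain field was protected by the copy.
    assert_eq!(b.len, 1);
    // The pointed-to data was not.
    assert_eq!(storage, 99);
}
```

Passing by value protected len but not the data behind the pointer, which is the shallow-copy hazard in a nutshell.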

Value and Reference Semantics as Types

C#, Crystal and Jakt provide different kinds of types to control whether value or reference semantics will be used. They copy struct types by value and pass class types by reference. This moves the way you control value and reference passing from function calls and function definitions to type definitions. Crystal makes this super clear: the struct type inherits from Value and class inherits from Reference. In principle Crystal therefore supports both value and reference semantics in a very OO manner.
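Rust does not have this struct/class split, but the idea of deciding the semantics once, at the type definition, can be roughly imitated. In the sketch below PointValue and PointRef are invented names, and unlike Crystal or C#, Rust still asks for an explicit clone when a handle is duplicated.

```rust
use std::cell::Cell;
use std::rc::Rc;

// Value-like type: `Copy` makes assignment and argument passing duplicate the data,
// roughly the role a `struct` plays in Crystal or C#.
#[derive(Clone, Copy)]
struct PointValue {
    x: i64,
}

// Reference-like type: the data lives behind a shared handle,
// roughly the role a `class` plays in Crystal or C#.
#[derive(Clone)]
struct PointRef {
    inner: Rc<Cell<i64>>,
}

fn bump_value(mut p: PointValue) -> i64 {
    // Modifies the callee's private copy only.
    p.x += 1;
    p.x
}

fn bump_ref(p: PointRef) -> i64 {
    // Modifies the single shared object.
    p.inner.set(p.inner.get() + 1);
    p.inner.get()
}

fn main() {
    let v = PointValue { x: 1 };
    let seen_inside = bump_value(v);
    // The callee saw 2, but the caller's value is untouched.
    assert_eq!((seen_inside, v.x), (2, 1));

    let r = PointRef { inner: Rc::new(Cell::new(1)) };
    let seen_inside = bump_ref(r.clone());
    // The callee saw 2, and so does the caller.
    assert_eq!((seen_inside, r.inner.get()), (2, 2));
}
```

The two functions have the same shape; only the type definitions decide whether the caller observes the change.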

At first I found this strange and troubling, since you wouldn’t know by reading a call to a function whether it was getting a copy of a value or a reference. But thinking about it some more, I favor this approach. Having only references (Ruby, Python, Java) isn’t great for some types of high-performance programs and not great for some aspects of safety. Needing pointers or special parameter syntax for pass by reference (old-school languages like C and Pascal) is not so easy. Unless you write a copy constructor or overload assignment (where the language allows it, as C++ does), those older languages only make shallow copies with their pass by value semantics. The choice between struct and class lets you choose the right approach at a higher level in the design, and do it case by case.