Carbon’s most exciting feature is its calling convention

29 Jul 2022 by Jonathan

Last week, Chandler Carruth announced Carbon, a potential C++ replacement they’ve been working on for the past two years. It has the usual cool features you expect from a modern language: useful generics, compile-time interfaces/traits/concepts, modules, etc. – but the thing I’m most excited about is a tiny detail about the way parameters are passed there.

It’s something I’ve been thinking about in the past myself, and to my knowledge it hasn’t been done in any low-level language before, but the concept has a lot of potential. Let me explain what I’m talking about.

Carbon’s parameter passing

By default, i.e. if you don’t write anything else, Carbon parameters are passed by the equivalent of a const T& in C++.

class Point
{
  var x: i64;
  var y: i64;
  var z: i64;
}

fn Print(p : Point);

struct Point
{
    std::int64_t x, y, z;
};

void Print(const Point& p);

However – and this is the important part – the compiler is allowed to convert that to a T under the as-if rule.

fn Print(x : i32);

void Print(std::int32_t x);

… and so what? Why am I so excited about that?

Advantage #1: Performance

Passing things by const T& is always good, right? After all, you’re avoiding a copy!

While true, references are essentially pointers on the assembly level. This means that passing an argument by const T& sets a register to its address, which means

in the caller, the argument needs an address and must be stored in memory somewhere, and
in the callee, the parameter’s value needs to be loaded from memory when it’s read.

This is the only option for types that don’t fit in a register, or small types with non-trivial copy constructors, but it’s less than ideal for trivially copyable types that do fit.

Compare the assembly between the add function that takes its arguments by const T&

[[gnu::noinline]] int add(const int& a, const int& b)
{
    return a + b;
}

int foo()
{
    return add(11, 42);
}

and the one that doesn’t

[[gnu::noinline]] int add(int a, int b)
{
    return a + b;
}

int foo()
{
    return add(11, 42);
}

All the memory stores and loads just disappear; you don’t want to be passing ints by reference!

So it’s really nice that in Carbon you don’t need to think about it – the compiler will just do the correct thing for you. Furthermore, you can’t always do it manually.

Advantage #2: Optimal calling convention in generic code

Suppose we want to write a generic print function in C++. The type can be arbitrarily large with an arbitrarily expensive copy constructor, so you should use const T& in generic code.

template <typename T>
void Print(const T& obj);

However, this pessimizes the situation for small and cheap types, which is unfortunate. It’s also not something the compiler can fix with optimizations, because the function signature and calling convention are part of the – here comes our favorite three-letter acronym – ABI. At best, the compiler can inline it and elide the entire call.

There are ways to work around that problem, because of course there are, but it just works™ in Carbon, which is nice.

But the real reason I’m excited about the feature has nothing to do with eliding memory load/stores.

Advantage #3: Copies that aren’t copies

Note that the transformation the compiler can do isn’t quite the same as const T& -> T in C++ would do. The latter creates a copy of the argument: if needed, it will invoke the copy constructor and destructor.

In Carbon, this isn’t the case: the value is simply set to a register. As the called function does not call the destructor of the parameter, the caller does not need to call the copy constructor. This means that the optimization would even be valid for Carbon’s equivalent of std::unique_ptr. The caller simply sets a register to the underlying pointer value, and the callee can access it. No transfer of ownership happens here.

This isn’t something you can do in (standard) C++.
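To see the copy that C++ has to make, here is a minimal sketch with a type that counts its copy constructor calls. Tracked, ByValue, ByRef and the counter are hypothetical names made up for this illustration.

```cpp
// Global counter, incremented by every copy of a Tracked object.
int g_copies = 0;

// Hypothetical type with a non-trivial copy constructor and destructor.
struct Tracked
{
    Tracked() = default;
    Tracked(const Tracked&) { ++g_copies; }
    ~Tracked() {}
};

void ByValue(Tracked) {}       // the parameter is a new object
void ByRef(const Tracked&) {}  // the parameter aliases the argument

// Number of copies made by a single by-value call.
int CopiesForByValue()
{
    Tracked t;
    int before = g_copies;
    ByValue(t);
    return g_copies - before;
}

// Number of copies made by a single by-reference call.
int CopiesForByRef()
{
    Tracked t;
    int before = g_copies;
    ByRef(t);
    return g_copies - before;
}
```

The by-value call must run the copy constructor (and later the destructor) because both have observable side effects; the by-reference call runs neither.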

Advantage #4: Parameters without address

If you’ve been thinking about the consequences of that language feature, you might wonder about Carbon code like the following:

fn Print(p : Point)
{
    var ptr : Point* = &p;
    …
}

If the compiler decides to pass p in a register, you can’t create a pointer to it. So the code doesn’t compile – you must not take the address of a parameter (unless it’s declared using the var keyword).

Without additional annotation, parameters of a Carbon function do not expose their address to the programmer, as they might not have any. This is the real reason I’m so excited about that feature.

More precise escape analysis

Since a programmer can’t take the address of a parameter, escape analysis does not need to consider them. For example, in the following C++ code, what is returned by the function?

void take_by_ref(const int& i);

void do_sth();

int foo()
{
    int i = 0;
    take_by_ref(i);
    i = 11;
    do_sth();
    return i;
}

Well, 11 right?

However, the following is a valid implementation of take_by_ref() and do_sth():

int* ptr; // global variable

void take_by_ref(const int& i)
{
    // i wasn't const to begin with, so it's fine
    ptr = &const_cast<int&>(i);
}

void do_sth()
{
    *ptr = 42;
}

Suddenly, foo() returns 42 – and this was 100% valid. As such, the compiler has to reload the value stored in i before returning, because i escapes.

In Carbon, this is impossible: take_by_ref() can’t sneakily store the address somewhere where it can come back to haunt you. As such, i will not escape and the compiler can optimize the function to return 11.

Explicit address syntax

Is the following C++ code okay?

class Widget
{
public:
    void DoSth(const std::string& str);
};

Widget Foo()
{
    Widget result;

    std::string str = "Hello!";
    result.DoSth(str);

    return result;
}

It depends.

Widget::DoSth() can get the address of the function-local string and store it somewhere. Then when the Widget is returned from Foo(), it contains a dangling pointer.

In Carbon, this is impossible – if Widget wants to store a pointer, it needs to accept a pointer:

class Widget
{
    fn DoSth[addr me : Self*](str : String*);
}

Crucially, calling code then also needs to take the address:

fn Foo() -> Widget
{
    var result : Widget;

    var str : String = "Hello";
    result.DoSth(&str);

    return result;
}

The extra syntax in the call makes it really obvious that something problematic might be going on here.

For the same reason, the Google C++ style guide used to require pointers in C++ code in such situations. This has the unfortunate side-effect that you can pass nullptr to the parameter, so I’ve suggested in the past to use my type_safe::object_ref instead.

This situation also makes it clear that references aren’t simply non-null pointers, which is a common misconception. References and pointers have crucial differences.
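The idea of a non-null parameter that is still explicit at the call site can be sketched in a few lines of C++. To be clear, ObjectRef and Ref below are hypothetical names and this is not the actual interface of type_safe::object_ref; it only illustrates the principle.

```cpp
#include <string>

// Hypothetical non-null reference wrapper (NOT type_safe's real API).
template <typename T>
class ObjectRef
{
public:
    // Explicit, so the caller has to spell out that a reference is formed.
    explicit ObjectRef(T& obj) : ptr_(&obj) {}

    T& operator*() const { return *ptr_; }
    T* operator->() const { return ptr_; }

private:
    T* ptr_;  // set from an lvalue in the constructor, so not null
};

// Hypothetical helper to keep call sites short.
template <typename T>
ObjectRef<T> Ref(T& obj)
{
    return ObjectRef<T>(obj);
}

class Widget
{
public:
    // The parameter type signals that Widget may keep the reference around.
    void DoSth(ObjectRef<std::string> str) { stored_ = &*str; }

    const std::string* stored() const { return stored_; }

private:
    const std::string* stored_ = nullptr;
};
```

A call then reads widget.DoSth(Ref(str)), which is as visible as &str but can’t be given nullptr.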

Future language extensions

Disclaimer: I’m not a Carbon developer, I’m just someone with opinions.

In parameters, foo : T is a parameter whose address can’t be taken, and var foo : T is a parameter with an address. The same principle can also be applied to more situations. For example, consider the following classes:

class Birthday
{
    var year : i32;
    var month : i8;
    var day : i8;
}

class Person
{
    var birthday : Birthday;
    var number_of_children : i8;
}

I know, it’s silly. Bear with me.

Assuming Carbon follows the same rules for data layout as C++, the size of Birthday is 8 bytes (4 bytes for year, 1 for month, 1 for day and 2 padding bytes at the end), and the size of Person is 12 bytes (8 bytes for Birthday, 1 byte for number_of_children, and 3 for padding).

A more optimal layout would eliminate Birthday and inline the members into Person:

class Person
{
    var birthday_year : i32;
    var birthday_month : i8;
    var birthday_day : i8;
    var number_of_children : i8;
}

Now, the size of Person is only 8 bytes because number_of_children can be stored in what were padding bytes before.
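The sizes can be checked with the C++ equivalents of the classes. This is a sketch that assumes a typical ABI where std::int32_t is 4-byte aligned; FlatPerson is a hypothetical name for the hand-inlined version.

```cpp
#include <cstdint>

struct Birthday
{
    std::int32_t year;
    std::int8_t  month;
    std::int8_t  day;
    // 2 bytes of tail padding so that arrays of Birthday stay aligned
};

struct Person
{
    Birthday    birthday;
    std::int8_t number_of_children;
    // 3 bytes of tail padding
};

// Hand-inlined layout: number_of_children reuses former padding.
struct FlatPerson
{
    std::int32_t birthday_year;
    std::int8_t  birthday_month;
    std::int8_t  birthday_day;
    std::int8_t  number_of_children;
    // 1 byte of tail padding
};

// Assumption: typical ABI with 4-byte aligned std::int32_t.
static_assert(sizeof(Birthday) == 8, "4 + 1 + 1 + 2 padding");
static_assert(sizeof(Person) == 12, "8 + 1 + 3 padding");
static_assert(sizeof(FlatPerson) == 8, "4 + 1 + 1 + 1 + 1 padding");
```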

Is this an optimization the compiler could do?

Not really, because it needs to preserve a separate Birthday subobject: someone could take the address of the birthday member and pass it around.

While it could work here, because we’re just stuffing things into padding bytes, in general an optimal layout might require splitting an existing subobject into two different parts or shuffling the members around differently. Then there simply exists no contiguous sequence of bytes that makes up the member, so no pointer to it can exist.

However, we could imagine member variables where you can’t take the address, signified by a lack of var:

class Person
{
    birthday : Birthday;
    number_of_children : i8;
}

Now the compiler is free to change the layout, inline struct members and shuffle them around. Note that taking the address of birthday.month (and the other members) is still fine: it’s been declared with var and it’s stored contiguously in memory – just not necessarily next to year and day. var and non-var members can be freely mixed.

Similarly, an optimization that transforms Array of Structs to Struct of Arrays is also invalid: in the first layout each individual struct lives in one contiguous chunk of memory that has an address, but in the second the struct members have been split. If you have an array where you can’t take the address of elements, however, this isn’t something you can observe.
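Done by hand in C++, the two layouts look roughly like this. It is only a sketch of the transformation, and BirthdaysAoS, BirthdaysSoA and ToSoA are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct Birthday
{
    std::int32_t year;
    std::int8_t  month;
    std::int8_t  day;
};

// Array of Structs: every element is one contiguous Birthday object.
using BirthdaysAoS = std::vector<Birthday>;

// Struct of Arrays: the fields of element i live in three separate arrays.
struct BirthdaysSoA
{
    std::vector<std::int32_t> year;
    std::vector<std::int8_t>  month;
    std::vector<std::int8_t>  day;

    std::size_t size() const { return year.size(); }

    // Elements are handed out by value only:
    // there is no Birthday object in memory to point to.
    Birthday get(std::size_t i) const
    {
        return Birthday{year[i], month[i], day[i]};
    }
};

// Split every Birthday into its fields.
BirthdaysSoA ToSoA(const BirthdaysAoS& aos)
{
    BirthdaysSoA soa;
    for (const Birthday& b : aos)
    {
        soa.year.push_back(b.year);
        soa.month.push_back(b.month);
        soa.day.push_back(b.day);
    }
    return soa;
}
```

The interface of BirthdaysSoA shows the restriction: get() can return a copy, but nothing can return a Birthday* or Birthday&.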

Finally, extending it to local variables essentially enables the register keyword from C: local variables without an address that can safely live in registers. While it isn’t necessary for modern optimizers, it’s still less work if the compiler doesn’t need to consider them during escape analysis at all. More importantly, it documents intent to the reader.

Conclusion

Creating entities whose address can’t be taken is a simple feature with lots of potential. It enables many optimizations that change layout, as layout can’t be observed; it simplifies escape analysis; and it optimizes parameter passing.

It’s also not really a limitation in many cases: how often do you actually need to take the address of something? Marking those few situations with an extra keyword doesn’t cost you anything.

I really wish C++ had it as well, but it wouldn’t work with functions that take references, which makes it useless unless the language was designed around it from the start.

This is exactly where Carbon comes in.