saturating_add vs. saturating_int – new function vs. new type?

Suppose you want to do integer arithmetic that saturates instead of overflowing. The built-in operator+ doesn’t behave that way, so you need to roll something yourself. Do you write a saturating_add() function or a new saturating_int type with overloaded operator+? What about atomic_load(x) vs. atomic<int> x? Or volatile_store(ptr, value) vs. volatile int*?

When should you provide functions that implement new behavior and when should you write a wrapper type? Let’s look at the pros and cons.

Writing a new function

If you want to have a saturating addition, just write saturating_add(int, int); to load something atomically, just write atomic_load(int*); to store something that isn’t optimized away, just write volatile_store(int*, int).

It’s a simple, straightforward solution, and for some of you the post can end here. However, it isn’t quite ideal.

Disadvantage #1: Can’t re-use existing names/operators

The following code computes something with overflowing (undefined) behavior:

int x = …;
int result = x * 42 + 11;

This is the same code, but using saturating behavior:

int x = …;
int result = saturating_add(saturating_mul(x, 42), 11);

Which version is more readable?

As operator* and operator+ already have meaning for ints, we can’t re-use them for saturating arithmetic; we have to use functions instead. This means we lose the nice operator syntax and have to untangle nested function calls.

The problem can be solved at the language level. For example, Swift has +, which raises an error on overflow, and &+, which wraps around on overflow. By defining new syntax, we don’t need to resort to function calls. Of course, this option is unavailable to users who don’t work on the language itself, unless they use a language that lets them define their own operators. But even Swift has no saturating operator, and C++ doesn’t have anything at all.

If we instead decide to write a new saturating_int type, we can overload operator* and operator+ to implement the desired functionality,

struct saturating_int
{
    int value;

    explicit saturating_int(int v)
    : value(v) {}

    explicit operator int() const
    {
        return value;
    }

    friend saturating_int operator+(saturating_int lhs, saturating_int rhs);
    friend saturating_int operator*(saturating_int lhs, saturating_int rhs);
    
};

then code that performs saturating arithmetic looks almost identical to regular code; we just need to change the types:

int x = …;
auto result = int(saturating_int(x) * 42 + 11);

Disadvantage #2: Can’t directly use generic code

This is really the same as the first disadvantage: as we have to invent a new name for the operation and can’t re-use the existing one, generic code doesn’t work out of the box. In C++, templates use duck typing: they call operations based on syntax. If the syntax isn’t available or doesn’t do what we want, we can’t use them.

For example, using our saturating_add() function, we can’t use std::accumulate directly, as it calls operator+. Instead, we have to pass in a custom operation that calls saturating_add.

Disadvantage #3: Can’t enforce behavior

Suppose we want to control some sort of embedded peripheral (e.g. an LED) by writing to the special address 0xABCD. The following code is buggy:

const auto led = reinterpret_cast<unsigned char*>(0xABCD);
*led = 1; // turn it on
std::this_thread::sleep_for(std::chrono::seconds(1));
*led = 0; // turn it off

As the compiler can’t see anybody reading the 1 written to *led, it considers it a dead store that can be optimized away. The compiler has no idea that the store has the additional side effect of turning on an LED, which needs to be preserved!

The correct fix is to use a volatile store, which tells the compiler that it must not optimize the store away. Let’s suppose it is implemented by a hypothetical volatile_store() function:

const auto led = reinterpret_cast<unsigned char*>(0xABCD);
volatile_store(led, 1); // turn it on
std::this_thread::sleep_for(std::chrono::seconds(1));
volatile_store(led, 0); // turn it off

Now it works, but we have to manually remember to use volatile_store() as opposed to *led every time. If we forget, nobody reminds us.

In actual C++, where volatility is part of the pointer type, this isn’t an issue: once we create a volatile unsigned char*, all loads/stores are automatically volatile and we don’t need to remember it. By putting it in the type system, we can enforce the consistent use of a given behavior.

Disadvantage #4: Can’t store additional state

Suppose we want to write a generic function that can atomically load a value at a given memory address:

template <typename T>
T atomic_load(T* ptr);

On modern CPUs, implementing this function is straightforward if sizeof(T) <= 8. For sizeof(T) == 16, it becomes tricky, and for sizeof(T) == 1024, it is impossible, as there simply is no instruction that can load 1KiB of data atomically.

Yet std::atomic<T>::load() from the C++ standard library works for all T, as long as they’re trivially copyable. How do they manage that?

One possible implementation can look like this:

template <typename T>
class atomic
{
    T value;
    mutable std::mutex mutex;

public:
    T load() const
    {
        std::lock_guard<std::mutex> lock(mutex);
        return value;
    }
};

As they define a new type for atomic access, they can put additional members in there. In this case, a mutex to synchronize access. If all we have is a function that can’t change the type, this isn’t something we can do.

Writing a new type

So based on those disadvantages you decide to write a new type when you want to tweak the behavior. A saturating_int, a volatile_ptr, an atomic<T>. It’s a lot more boilerplate compared to the couple of free functions, but it’s worth it, as you have the beauty of existing operators, the flexibility of adding additional state if necessary, and the safety guarantees the type system gives you.

However, the new situation isn’t ideal either.

Disadvantage #1: Conversions everywhere

Suppose you want to do saturating arithmetic, but only sometimes; elsewhere, you want regular overflow behavior. As the behavior is provided by types, you need to change types to change the behavior:

int x = …;
saturating_int y = saturating_int(x) * 42;
int z = int(y) + 11;
saturating_int w = saturating_int(z) * 2;

For an int, this doesn’t really matter; the compiler will optimize the conversions away. But for bigger types? All of those conversions can add up, and the poor CPU needs to constantly move stuff around.

Disadvantage #2: Different types

A saturating_int is not an int. Sure, you can provide a conversion operator to make them related, but this doesn’t help in the case of std::vector<saturating_int> and std::vector<int>: they’re entirely unrelated types.

Remember how I complained about having to pass saturating_add to std::accumulate? Well, if you start with a std::vector<int> as opposed to a std::vector<saturating_int>, you’re still out of luck. Your only option is to use C++20 ranges to provide a view that turns a std::vector<int> into a range of saturating_int. Or you just provide a custom operation.

A similar issue occurs when you decide to store a value somewhere. Do you store it as an int, as that’s what it is, or as a saturating_int as that’s how it’s used? The types are different, you have to pick one.

The fundamental issue

There is a fundamental trade-off we have to make here: logically, we want to provide behavior, which is done by writing functions; but in the OOP model, we need types to do it properly.

In C++, we always have this trade-off that we need to reason about. However, there are some hypothetical language changes that could be made to improve the situation.

Disclaimer: They aren’t serious proposals and don’t work with C++ for multiple reasons.

Solution #1: Distinguish between “layout” and “type”

Right now, int and saturating_int are different types, even though for the CPU they’re essentially the same; only the functions that operate on them differ. So we can imagine that this underlying layout can be reasoned about in the language. C++20 already has the notion of “layout-compatible types”, which matters for unions; let’s build on top of that.

We can imagine a layout_cast<T>(expr) operator that changes the type of an object while keeping the layout intact:

int x = …;
auto y = layout_cast<saturating_int>(x);

This generates no assembly instructions, as nothing changes for the CPU, and it logically ends the lifetime of x. y is now a new object that lives at the same address as x and stores the same bit pattern, but has a different type. The only effect is a different overload resolution for its operator+.

This can then also be extended to containers:

std::vector<int> x = …;
auto y = layout_cast<std::vector<saturating_int>>(x);

Again, logically there is no difference between a bunch of ints and a bunch of saturating_ints, so the CPU doesn’t need to do anything. Only the type has changed.

This allows us to change the behavior without affecting actual runtime performance.

Solution #2: Packaging behavior into a separate entity

Scala has an interesting take on the problem. Consider std::accumulate() again. It takes an additional operation that controls how “addition” is performed, as well as the initial value. Mathematically, such a structure is called a monoid: it describes “addition” as well as the identity of “addition”. For int, that is operator+ and 0. However, it can also be operator* and 1. As such, std::accumulate() accepts the range of input as well as the monoid to use.

In Scala, the monoid can be passed in a special way, as an implicit parameter. Taking the example from their website, it looks like this:

abstract class Monoid[A] {
  def add(x: A, y: A): A
  def unit: A
}

object ImplicitTest {
  implicit val stringMonoid: Monoid[String] = new Monoid[String] {
    def add(x: String, y: String): String = x concat y
    def unit: String = ""
  }

  implicit val intMonoid: Monoid[Int] = new Monoid[Int] {
    def add(x: Int, y: Int): Int = x + y
    def unit: Int = 0
  }

  def sum[A](xs: List[A])(implicit m: Monoid[A]): A =
    if (xs.isEmpty) m.unit
    else m.add(xs.head, sum(xs.tail))

  def main(args: Array[String]): Unit = {
    println(sum(List(1, 2, 3)))       // uses intMonoid implicitly
    println(sum(List("a", "b", "c"))) // uses stringMonoid implicitly
  }
}

We first define Monoid as an interface that has addition and a unit, then implement it for strings and ints, and write a generic function that sums a list. It accepts the Monoid as an implicit parameter that doesn’t need to be passed at the call site. Instead, the compiler will search for the closest implicit value and pass that in.

The same principle can be applied to our problem as well. For example, we can define overflowArithmetic and saturatingArithmetic and then use something to indicate which one we want. This would then change the lookup of operator+ and operator* in our algorithms accordingly.

Of course, this requires a way to easily specify a “compile-time interface”, like Rust has with traits. However, C++ decided against C++0x concepts, which makes it impossible to add something like that now.

Conclusion

Writing a new type to change the behavior is strictly more powerful than writing a new function. As such, in situations where you have to write a new type (e.g. std::atomic<T>), the choice is easy.

In all other cases, it is a trade-off.

Do you often need to mix different behaviors? Is it important that you can’t accidentally forget the new behavior? If so, write a new type. Otherwise, write a function.

In an ideal world, where we have some way of decoupling layout from behavior, this wouldn’t be a problem. But we don’t have that, so we have to live with trade-offs. Of course, we can also provide both versions. This is what Rust does with wrapping_add and Wrapping<T>.