optional in Containers Ⅱ — Not All std::vector Usages Are The Same

17 Apr 2018 by Jonathan

Okay, so in the previous post I talked about putting optional<T> in container. I came to conclusions which I though were reasonable at the time, however, people — rightfully — pointed out some flaws in my argumentation.

As I was at ACCU last week, I wasn’t able to respond to them earlier (note to self: don’t publish and then fly away to a conference), so I’m doing that now. Let’s revisit my argumentations and see where I was wrong.

`std::optional<T>` vs. `std::variant<T, std::monostate>`

I argued that std::optional<T> and std::variant<T, std::monostate> fulfill the same purpose: Both represent a type that either stores a value of type T or none at all.

I still think this is valid. Of course — as someone on reddit pointed out — you wouldn’t want to actually use std::variant<T, std::monostate> in place of std::optional<T>: the interface is clumsier and it is simply more to type. But conceptually they are the same type.

I also argued that you shouldn’t use std::optional<T> (or std::variant<T, std::monostate>) if the empty type has a special semantic meaning like “id invalid”. Instead you should use std::variant<T, special_meaning>. I still think that following this advice can lead to cleaner code.

`std::optional<T>` in Sets

I said that you shouldn’t put std::optional<T> in a set, simply because it is somewhat pointless: You can only put a single empty optional in there anyway, and then you could also simply put nothing in there. So don’t use std::optional<T> in a set (or as a key type in a map).

If your algorithm works differently whether or not std::nullopt is in the set, you don’t actually mean std::nullopt, you mean special_meaning and want to store a std::variant.

Nobody seems to argue against that, so that advice is fine.

`std::optional<T>` in Maps

std::optional<T> as a key type in a map doesn’t make sense as argued above, so the only thing to look at is using std::optional<T> as a mapped type.

I said that a std::map<T, std::optional<U>> is a partial map: a key may or may not have a value. And if you need that, this is a fine abstraction.

However, a map of optionals is somewhat unwieldy: A potential lookup() function that returns an optional<mapped_type> leads to a nested optional, which is a little bit weird to use. A std::map<T, std::variant<U, no_value>> is a somewhat cleaner abstraction in my opinion.

But the best solution would be a partial_map<T, U> that supports it natively.

Not much objection there either, so let’s move to the main point of controversy:

`std::optional<T>` in Sequence Containers

I said that you don’t need to put std::nullopt in a sequence container: just put nothing in there instead.

And this is where many think I’m wrong. And I am — but my advice is still valid, just not for a “sequence container” per se.

Let me elaborate.

In a recent project I’m working on (just something fun for personal use) I’m using a lot of std::vector<T>. However, I’m not using them like you might want to use a std::vector<T>. In particular, I’m just using them as a place to stuff things in, and then I later need to do a range-based for over them:

std::vector<int> my_ints;
// fill container up with some integers

for (auto i : my_ints)
    do_sth(i);

// fill container up with some more ints

for (auto i : my_ints)
    do_sth_else(i);

I don’t really care about the interface that makes std::vector<T> special: I don’t need random access because asking for the i-th element doesn’t make any sense with my usage!

I don’t really care about order either: All I care about is whether or not I will process the element eventually if it is in there. This means that I would remove an element by swapping it with the last one and doing a pop_back(), which is O(1) compared to the usual O(n) of std::vector<T>::erase.

And for this kind of usage of std::vector<T> my advice is correct: I don’t need to store std::optional<T> in the container because I don’t need to process the std::nullopts. It leads to faster and more efficient code if I just store the Ts directly and nothing in case of a std::nullopt.

However, this is not the usual usage of std::vector<T>: Order usually matters — after all it is a sequence container. But I didn’t realize that my usage of std::vector<T> doesn’t match that usage, so I wrote that advice.

Bag of `T`

There’s something we can learn about this mistake: The need for a new container. A container that is like std::vector<T> but doesn’t provide ordering or an array access operator, it just has insert(element) and erase(iter), both are O(1).

Let’s call it bag<T> because it is just that: a bag where you put elements. A simple implementation on top of std::vector<T> can look like this:

template <typename T>
class bag
{
    std::vector<T> container_;

public:
    using value_type    = T;
    using iterator       = typename std::vector<T>::iterator;
    using const_iterator = typename std::vector<T>::const_iterator;

    //=== constructors/destructors ===//
    bag() = default;

    // other constructors, assignment if needed

    //=== access ===//
    iterator begin() noexcept
    {
        return container_.begin();
    }
    const_iterator begin() const noexcept
    {
        return container_.begin();
    }
    const_iterator cbegin() const noexcept
    {
        return container_.begin();
    }

    iterator end() noexcept
    {
        return container_.end();
    }
    const_iterator end() const noexcept
    {
        return container_.end();
    }
    const_iterator cend() const noexcept
    {
        return container_.end();
    }
    
    // note: no array access, front, back
    // maybe data() if necessary

    //=== capacity ===//
    bool empty() const noexcept
    {
        return container_.empty();
    }

    size_type size() const noexcept
    {
        return container_.size();
    }

    size_type capacity() const noexcept
    {
        return container_.capacity();
    }

    void reserve(size_type new_capacity)
    {
        container_.reserve(new_capacity);
    }

    void shrink_to_fit()
    {
        container_.shrink_to_fit();
    }

    //=== modifiers ===//
    template <typename... Args>
    void emplace(Args&&... args)
    {
        container_.emplace_back(std::forward<Args>(args)...);
    }

    void insert(const T& value)
    {
        emplace(value);
    }
    void insert(T&& value)
    {
        emplace(std::move(value));
    }

    // range insert if needed

    void clear() noexcept
    {
        container_.clear();
    }

    void erase(iterator iter) 
    {
        if (iter != std::prev(container_.end())
        {
            // swap with last element
            using std::swap;
            swap(*iter, container_.back());
        }
        container_.pop_back();
    }
    
    // range erase if needed
};

This is just a proof-of-concept implementation programmed in the editor. I am working on a library that contains a real implementation, expect it soon™.

Now, for this container it definitely doesn’t make sense to store optionals in there.

In the previous post I’ve also mentioned an optimization for std::vector<std::variant<T...>> that unwraps it into multiple std::vector<T>... internally. This is better for branch prediction and uses less memory. Of course, this optimization doesn’t make sense if you use std::vector<T> as a sequence container. But for bag it makes sense, and is in fact the main datastructure in my side project.

Why Bother At All?

Some of you also questioned why I was on such a crusade against std::optional<T> inside a container. The reason is simple: I had a similar design originally, realized its flaws and wanted to prevent others from doing the same. So I generalized and thought about other containers as well. What I didn’t realize at the time was that my use of std::vector was different than the normal use.

But I think this still lead to an interesting discovery: the need for a new container type, bag<T>.

This blog post was written for my old blog design and ported over. If there are any issues, please let me know.

std::optional<T> vs. std::variant<T, std::monostate>

std::optional<T> in Sets

std::optional<T> in Maps