Move Semantics and Default Constructors – Rule of Six?

24 Aug 2016 by Jonathan

A really long time ago - over four weeks! - I wrote about move safety.

I really need to force myself into blogging on a schedule. Let’s say I’ll publish something at least every two weeks.

The post spawned a lot of discussion about whether you should rely on moved-from state behavior or make any guarantees. See the first half of this CppChat episode for more.

BTW: Thanks for the nice words, Jon! Really appreciate it.

But I’m not going to continue that discussion. Both sides have convincing arguments and I don’t really want to advocate for one side here.

Instead I’m going to talk about something else related to C++ move semantics that didn’t fit into the original post: the relationship between a default constructor and move semantics.

C++98’s Rule of Three

In C++98 there was the rule of three: If you define any one of destructor, copy constructor or copy assignment operator, you most likely need to define the other two as well.

A class with a destructor usually needs to do some cleanup: your class owns some form of resource which needs to be freed. And if your class owns a resource, it also needs to take special care before copying it.

Now when you have a C++98 class with a destructor, you have two sane options for the copy constructor/assignment operator:

“Delete” it, disallow copying for your class.
Define it so that it performs a deep copy of the resource or some form of ref-counting.
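As a minimal sketch of the second option, assume a hypothetical int_buffer class that owns a heap array (the class and its members are made up for illustration). It defines all three members, and the copy operations perform a deep copy:

```cpp
#include <algorithm> // std::copy
#include <cassert>   // assert
#include <cstddef>   // std::size_t
#include <utility>   // std::swap

// Hypothetical resource-owning class, written in C++98 style.
class int_buffer
{
public:
    explicit int_buffer(std::size_t size)
    : size_(size), data_(new int[size]())
    {}

    // Destructor: frees the owned array.
    ~int_buffer()
    {
        delete[] data_;
    }

    // Copy constructor: deep copy, so both objects own separate arrays.
    int_buffer(const int_buffer& other)
    : size_(other.size_), data_(new int[other.size_])
    {
        std::copy(other.data_, other.data_ + other.size_, data_);
    }

    // Copy assignment via copy-and-swap: the temporary takes over
    // the old array and frees it when it goes out of scope.
    int_buffer& operator=(const int_buffer& other)
    {
        int_buffer tmp(other);
        std::swap(size_, tmp.size_);
        std::swap(data_, tmp.data_);
        return *this;
    }

    std::size_t size() const { return size_; }

    int& operator[](std::size_t i)
    {
        assert(i < size_);
        return data_[i];
    }

    const int& operator[](std::size_t i) const
    {
        assert(i < size_);
        return data_[i];
    }

private:
    std::size_t size_;
    int*        data_;
};
```

Leaving out either copy operation here would make two objects share one array, and the second destructor would free it again.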

So far, so simple.

C++11’s Rule of Five

C++11 added move semantics and thus the rule of three became the rule of five (destructor/copy constructor/copy assignment/move constructor/move assignment).

Move in general can be seen as an optimization of copy for those cases where you don’t need the original object anymore. Then you can just “copy” by stealing the resource of the original object - a move.

Furthermore, move semantics allow move-only types. Most “resources” cannot be copied properly, and if you disable copy you cannot return the resource holder from functions. But with move you solve both problems: instead of copying you steal the resource, and you can return from functions.
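To illustrate, here is a small sketch using std::unique_ptr as the move-only resource holder. The connection type and the open_connection() factory are hypothetical names made up for this example:

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <utility>

// Hypothetical resource; stands in for anything that cannot be copied sensibly.
struct connection
{
    int id;
};

// A move-only owner can be returned from a factory function.
std::unique_ptr<connection> open_connection(int id)
{
    return std::unique_ptr<connection>(new connection{id});
}

// Copying is disabled, moving is allowed.
static_assert(!std::is_copy_constructible<std::unique_ptr<connection>>::value,
              "not copyable");
static_assert(std::is_move_constructible<std::unique_ptr<connection>>::value,
              "movable");

// Ownership is transferred by stealing the pointer, not by copying the connection.
int transfer_and_read()
{
    std::unique_ptr<connection> first  = open_connection(7);
    std::unique_ptr<connection> second = std::move(first);
    assert(second != nullptr);
    return second->id;
}
```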

Move-only types are in my opinion the most useful feature move semantics gave us. But there is a subtle change in semantics when introducing move semantics.

A C++98 example

Consider a socket class that is a C++ wrapper around some C API for handling sockets. In C++98 it would look like this:

class socket
{
public:
    socket(…)
    : socket_handle_(open_socket(…))
    {
        if (!is_valid(socket_handle_))
            throw invalid_socket_exception(…);
    }

    ~socket()
    {
        close_socket(socket_handle_);
    }

    … 

private:
    socket(const socket&); // no definition
    socket& operator=(const socket&); // no definition

    native_handle socket_handle_;
};

Ah, the good old way to delete copy constructors, so nostalgic…

Actually, this was before my time and it just feels bad.

We have a constructor that opens a socket given some parameters and a destructor that closes the socket. Copy operations are “deleted” because there is simply no way to copy a socket.

Note that in order to prevent user errors, the socket is checked for validity in the constructor. Only a socket object with a valid handle can be created. The socket class is thus never “empty”, i.e. it never stores an invalid socket handle and always has well-defined semantics. If a user gets a socket object, they can always use it without any checks.

This is a nice feature of an API.

I’m ignoring the fact that the socket might become invalidated due to a later operation.

Migrating `socket` to C++11

Fast forward 13 years. socket has become widely used throughout the code base, even though people always complain that you can’t return it from functions.

But thanks to C++11 there is a solution: move semantics! So one day a developer goes ahead and adds a move constructor and move assignment operator to the socket class. The implementation naturally invalidates the socket from the original object, so that only the new one will destroy it.

So… end of story?

No.

Adding the two move operations was a bad idea and is a breaking change. A breaking change of the worst kind: the code still compiles, the code even works - until someone writes code similar to the following:

socket my_socket(…);
…
socket your_socket(std::move(my_socket));
…
do_sth(my_socket);

We’re passing a moved-from socket to do_sth(). As said above: the moved-from socket has an invalid handle, this is just the nature of moved-from objects. do_sth() does not expect that you give it an invalid socket object and is not prepared to handle it - why would it? It wasn’t possible to get an invalid socket object until very recently - it had a never-empty guarantee.
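To make this concrete, here is one way the added move operations might look. This is only a sketch under assumptions: native_handle is taken to be an int, invalid_handle, the int port constructor parameter and the handle() accessor are invented for illustration, and open_socket()/close_socket()/is_valid() are trivial stand-ins for the real C API.

```cpp
#include <cassert>
#include <stdexcept>
#include <utility>

namespace sketch
{
// Hypothetical stand-ins for the C API; a real wrapper would call the actual functions.
typedef int native_handle;
const native_handle invalid_handle = -1;

inline native_handle open_socket(int port) { return port > 0 ? port : invalid_handle; }
inline bool is_valid(native_handle h) { return h != invalid_handle; }
inline void close_socket(native_handle h)
{
    assert(is_valid(h)); // the stand-in expects a valid handle
    (void)h;
}

class socket
{
public:
    explicit socket(int port)
    : socket_handle_(open_socket(port))
    {
        if (!is_valid(socket_handle_))
            throw std::runtime_error("invalid socket");
    }

    // Move constructor: steal the handle and invalidate the source,
    // so that only the new object closes it.
    socket(socket&& other) noexcept
    : socket_handle_(other.socket_handle_)
    {
        other.socket_handle_ = invalid_handle;
    }

    // Move assignment: release our own handle, then steal the other one.
    socket& operator=(socket&& other) noexcept
    {
        if (this != &other)
        {
            if (is_valid(socket_handle_))
                close_socket(socket_handle_);
            socket_handle_       = other.socket_handle_;
            other.socket_handle_ = invalid_handle;
        }
        return *this;
    }

    ~socket()
    {
        // A moved-from object no longer owns a handle.
        if (is_valid(socket_handle_))
            close_socket(socket_handle_);
    }

    native_handle handle() const { return socket_handle_; }

private:
    native_handle socket_handle_;
};

// After the move, only `second` owns a handle; `first` is "empty".
inline void move_example()
{
    socket first(80);
    socket second(std::move(first));
    assert(is_valid(second.handle()));
    assert(!is_valid(first.handle())); // the never-empty guarantee is gone
}
}
```

Note that the destructor now needs a validity check that the C++98 version could do without.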

Now you can argue that it is a bad idea to write such code and that you shouldn’t write such code.

And I would agree. This is a bad idea.

But that is not the point. The point is that thanks to the introduction of move semantics, the entire semantics of the class has changed. There is now a hole in the abstraction. Previously it guaranteed that each and every object is in a valid, non-empty state. Now this guarantee is broken.

You could also argue that you simply “make it undefined behavior” to use the moved-from state. You could - but that doesn’t change the fact that an operation that could not fail before, now could fail.

The introduction of move operations has changed the semantics of the class and weakened its main guarantee. Now this is a bad idea.

Consequences of move semantics

Introducing move semantics to a class changes the semantics of this class. Where it previously modelled a resource, it now models optional<resource>: sometimes there is no resource, it can be empty.

But the introduction of move operations does not change the semantics of every class. Take std::vector: move operations are a really nice addition that leave the original object in a valid but unspecified state (the basic move safety, to keep the terminology introduced in the last post), which is most likely an empty container. Why is that so?

Simple: std::vector always modelled optional<resource>. The state without elements was always well-defined and part of the interface. Move semantics just added a different way of obtaining it, but didn’t introduce it.

Note: std::vector isn’t really the perfect example because it doesn’t specify that the moved-from state is equivalent to the empty state. But it does specify that it is in a valid but unspecified state, which means, as explained in the last post, that you can call all functions with wide contracts. And that is exactly like the empty state.
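A short sketch of what that means in practice (the function name reuse_moved_from() is made up for this example): functions without preconditions can be called on the moved-from vector, functions with preconditions cannot be relied on.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Returns the size of the moved-from vector after it has been reused.
std::size_t reuse_moved_from()
{
    std::vector<int> source{1, 2, 3};
    std::vector<int> target(std::move(source));
    assert(target.size() == 3);

    // Functions with wide contracts are fine on the moved-from vector...
    source.clear();      // now definitely empty
    source.push_back(4); // and usable again

    // ...but calling source.front() right after the move would not be,
    // because it requires a non-empty container.
    return source.size();
}
```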

Now we can finally come to the default constructor of the title.

The meaning of default constructors

A default constructor should initialize a type with an appropriate and valid default value. For classes that own resources, i.e. for classes where move semantics make sense, this is usually the state where they don’t have a resource. This means: a resource-class with a default constructor always has to deal with the “empty” state, even without any move operations!

So if you have a resource class with a default constructor, you can introduce move semantics without weakening any guarantees. Furthermore, if you add move semantics, consider also making the interface “empty”-aware, i.e. adding checker functions and a default constructor.

Adding a default constructor/making the interface “empty”-state-aware simply makes it more obvious for the user of the class that there is an empty state and that you should handle it. A default constructor also gives the user the ability to explicitly put an object into the “empty” state.
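An “empty”-aware socket could look roughly like the following sketch. As before, the C API functions are trivial stand-ins, and is_open(), close(), try_send() and the int port parameter are hypothetical names chosen for illustration, not an established interface.

```cpp
#include <cassert>
#include <stdexcept>
#include <utility>

namespace empty_aware
{
// Hypothetical stand-ins for the C API, assumptions for this sketch only.
typedef int native_handle;
const native_handle invalid_handle = -1;

inline native_handle open_socket(int port) { return port > 0 ? port : invalid_handle; }
inline void close_socket(native_handle h)
{
    assert(h != invalid_handle); // the stand-in expects a valid handle
    (void)h;
}

class socket
{
public:
    // Default constructor: explicitly creates the "empty" state.
    socket() noexcept : socket_handle_(invalid_handle) {}

    explicit socket(int port) : socket_handle_(open_socket(port))
    {
        if (!is_open())
            throw std::runtime_error("invalid socket");
    }

    socket(socket&& other) noexcept : socket_handle_(other.socket_handle_)
    {
        other.socket_handle_ = invalid_handle;
    }

    socket& operator=(socket&& other) noexcept
    {
        if (this != &other)
        {
            close();
            socket_handle_       = other.socket_handle_;
            other.socket_handle_ = invalid_handle;
        }
        return *this;
    }

    ~socket() { close(); }

    // Checker function: the "empty" state is part of the interface.
    bool is_open() const noexcept { return socket_handle_ != invalid_handle; }

    // Puts the object into the "empty" state; a no-op if it is already empty.
    void close() noexcept
    {
        if (is_open())
        {
            close_socket(socket_handle_);
            socket_handle_ = invalid_handle;
        }
    }

private:
    native_handle socket_handle_;
};

// Callers can check for the empty state instead of assuming validity.
inline bool try_send(const socket& s)
{
    if (!s.is_open())
        return false; // nothing to send on
    // ... use the handle here ...
    return true;
}
}
```

With this interface a moved-from socket is simply another closed socket, a state every caller already has to handle.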

This does not apply if your default constructor creates the resource just with some default arguments. C++ does not have the means to differentiate between a default constructor that creates using default arguments and a default constructor that does not create.

Why do you need to explicitly put an object into the “empty” state?

I’m all for “define an object only if you can properly initialize it”, but there are some situations where you need it - mainly dealing with bad APIs.

I’m looking at you, std::istream::operator>>.

And because move semantics have already weakened the interface guarantee, there is no (additional) harm in introducing a default constructor.
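The stream extraction case mentioned above can be sketched like this. The point type and parse_point() are hypothetical; what matters is that operator>> fills an object that already exists, so you need some way to create one before you have real data:

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical type that is read from a stream.
struct point
{
    int x;
    int y;

    point() : x(0), y(0) {} // default state, needed before extraction
};

std::istream& operator>>(std::istream& in, point& p)
{
    return in >> p.x >> p.y;
}

point parse_point(const std::string& text)
{
    std::istringstream in(text);
    point result; // has to exist before operator>> can fill it
    in >> result;
    assert(!in.fail()); // this sketch expects two integers in the input
    return result;
}
```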

Conclusion

I’m not going to introduce a Rule of Six: there are some cases where you don’t want to have a default constructor, and there are no bad consequences when you don’t follow it. But I want to make you aware that move semantics allow the creation of an “empty” state. And if you already have an “empty” state, adapt your entire interface to it. And then I also recommend that you introduce a default constructor.

This entire problem only occurs because C++ has no destructive move: The compiler isn’t preventing you from re-using a moved-from object. And when the compiler isn’t preventing you, some user will one day (accidentally) do it. Advocating for treating the moved-from state as a completely invalid state doesn’t help, because that rule isn’t enforced.

So with move semantics you cannot really make a never-empty guarantee, which isn’t nice. But you can at least adapt your interface to show that it can be empty.