malloc() and free() are a bad API

If you need to allocate dynamic memory in C, you use malloc() and free(). The API is very old, and while you might want to switch to a different implementation, be it jemalloc, tcmalloc, or mimalloc, they mostly copy the interface. It makes sense that they do that – they want to be a mostly drop-in replacement, but it’s still unfortunate because malloc() and free() are a bad API for memory allocation.

Let’s talk why.

The C allocation functions

malloc() and free() have a very simple interface: malloc() takes a size and returns a pointer to the allocated memory block of that size, free() takes a previously allocated pointer and frees it.

void* malloc(size_t size);

void free(void* ptr);

Then there is also calloc(), which allocates memory that has been zeroed. For whatever reason, it has a slightly different interface:

void* calloc(size_t num, size_t size);

Logically, it allocates num objects of size each, i.e. num * size bytes. It also does an overflow check for you, because why not.

Finally, there is realloc():

void* realloc(void* ptr, size_t new_size);

It attempts to grow or shrink a memory block to the new_size. This may or may not copy things around in memory, and returns the new starting address, or ptr unchanged if it was left in-place. Notably, malloc() can be implemented in terms of realloc():

void* malloc(size_t size)
{
    return realloc(NULl, size);
}

Seems straightforward enough, what’s the issue?

Problem #1: Alignment

Plain old malloc() does not allow specifying a custom alignment for the aligned memory. It returns memory that is suitable aligned for any object with fundamental alignment.

Want to allocate a SIMD vector or something aligned at a page boundary? It gets tricky:

constexpr auto page_size = 4096;

void* allocate_page_boundary(std::size_t size)
{
    // Allocate extra space to guarantee alignment.
    auto memory = std::malloc(page_size + size);

    // Align the starting address.
    auto address = reinterpret_cast<std::uintptr_t>(memory);
    auto misaligned = address & (page_size - 1);

    return static_cast<unsigned char*>(memory) + page_size - misaligned;
}

Of course, you can’t free the resulting address with std::free(), since it may point somewhere inside the allocated memory block. You have to remember the original address as well.

For example, you can store it in the alignment padding directly preceding the aligned address.

At least C11 has added aligned_alloc(), which then became part of C++17:

void* aligned_alloc(size_t alignment, size_t size);

This doesn’t help you with realloc() or calloc(), however.

Problem #2: Metadata storage

malloc() doesn’t directly go ahead and ask the OS for memory, that would be too slow. Instead there are various caches for memory blocks of varying sizes.

For example, a program often allocates 8 byte elements, so it might make sense to keep a list of 8 byte blocks. When you ask for 8 bytes, it simply returns one from the list:

void* malloc(size_t size)
{
    if (size == 8)
        return block_list_8_bytes.pop();

    
}

Then when we free an 8 byte memory block, it is added to the list instead:

void free(void* ptr)
{
    if (size_of_memory(ptr) == 8)
    {
        block_list_8_bytes.push(ptr);
        return;
    }

    
}

Of course this requires that the allocator knows the size of a memory block given its pointer. The only way to do that is to store some metadata about the allocator somewhere. This could be a global hash table that maps pointers to sizes, or extra metadata store directly in front of the address, as discussed in the over aligned example. In either case, it means that asking for 8 bytes of memory will not actually allocate 8 bytes of memory, but additional metadata as well.

This is especially wasteful because the user usually knows how big the memory block is it’s currently attempting to free!

template <typename T>
class dynamic_array
{
    T* ptr;
    std::size_t size;

public:
    explicit dynamic_array(T* ptr, std::size_t size)
    : ptr(static_cast<T*>(std::malloc(size * sizeof(T))))
    {}

    ~dynamic_array()
    {
         // call destructors

        // I know that I'm freeing size * sizeof(T) bytes!
        std::free(ptr);
    }
};

If free() took the size of the memory block as extra parameter, the implementation wouldn’t need to add extra metadata just for that.

Of course an implementation might choose to allocate extra metadata for performance reasons anyway. Currently it’s forced to do that.

Problem #3: Wasting space

Consider the implementation of std::vector<T>::push_back(). When there is no capacity to store an additional element it needs to reserve bigger memory and move everything over. To keep an amortized O(1) complexity, it grows the new memory by some factor:

void push_back(const T& obj)
{
    if (size() == capacity())
    {
        auto new_capacity = std::max(2 * capacity(), 1);
        auto new_memory = std::malloc(new_capacity * sizeof(T));

        
    }

    
}

This works, but can waste memory.

Suppose the implementation of std::malloc uses a cache of recently freed memory blocks. When attempting to allocate N blocks, it searches that cache for a block that is at least N bytes big. If it finds one (either the first one that fits, or the smallest one that fits, or …), returns it. In that case, the returned memory block might have space for more than N bytes!

This means that we ask for a memory with a capacity for e.g. 14 elements, but get a memory block with a capacity for 16 elements instead. But we don’t know that! We treat the block as if it has space for 14 elements only and trigger another unnecessary reallocation for the 15th element.

It would be great if std::malloc() could return how big the allocated memory block actually is, so we can leverage any extra space we might have gotten “for free”.

Problem #4: realloc()

realloc() attempts to grow a memory block in-place. If that’s not possible, it allocates a new one and copies the existing contents over. This is done as-if by std::memcpy().

This automatic copy is problematic.

For starters, it can’t be used with C++ objects that might want to invoke a move constructor. It also doesn’t work with C objects that have self-referential pointers such as a buffer containing a circular linked list.

This is a shame as realloc()’s ability to grow a memory block in-place is really useful and not achievable in any other way. Sadly, it can’t be used with e.g. std::vector.

A better interface

Let me propose a new interface that doesn’t have those shortcomings. It consists of three functions allocate(), deallocate(), and try_expand().

allocate() is the replacement for std::malloc(). It’s goal is to allocate a memory block for a given size and alignment. Crucially it returns not only a pointer to the allocated memory, but also the total size that is available for the user.

struct memory_block
{
    void* ptr;
    size_t size;
};

/// On success `result.ptr != NULL` and `result.size >= size`.
/// On failure, `result.ptr == NULL` and `result.size == 0`.
memory_block allocate(size_t size, size_t alignment);

This takes care of problem #1 and #3.

deallocate() is a replacement for std::free(). It takes a memory_block as well, in addition to the alignment that was used to request this block:

void deallocate(memory_block block, size_t alignment);

That way, we pass all information the caller has anyway to the allocator.

It might makes sense to relax the requirements for deallocate() a bit, so that the size of the block doesn’t need to be the exact value returned by allocate(), but some other value between the requested size and the actual size. For example, a std::vector<int> might request 12 bytes (3 ints), but gets 17 bytes (4 ints plus one spare). It would then call deallocate() with a size of 16 bytes, as it knows the capacity in number of ints.

Finally, try_expand() is a replacement for realloc(). Crucially, it will only attempt to expand the block in-place, and fail if that’s not possible.

/// If the block can be expanded in-place to `new_size`, returns true.
/// Otherwise, returns `false`.
bool try_expand(memory_block block, size_t new_size);

This solves problem #4 by making the caller responsible for copying the allocated memory if necessary.

C++ Solutions

C++’s operator new and operator delete, have inherited the same problems:

void* operator new(std::size_t size);
void operator delete(void* ptr);

// not pictured: dozens of other overloads

To its credit, it keeps making improvements.

C++17: Aligned allocation

C++17 adds an overload that accepts std::align_val_t, which allows specification of a custom alignment.

void* operator new(std::size_t size, std::align_val_t alignment);
void operator delete(void* ptr, std::align_val_t alignment);

The alignment is passed as std::align_val_t because a user might have overloaded operator new(std::size_t, std::size_t) already.

C++17: Sized deallocation

A user can actually define its own implementation of operator new/delete to control all memory allocations. This is then invoked by the compiler to allocate memory. Since C++17, the compiler will also attempt to invoke the following overloads:

void operator delete(void* ptr, std::size_t size);
void operator delete(void* ptr, std::size_t size, std::align_val_t alignment);

As the compiler knows the size of the objects its deallocating, it can pass that information to the function. If you’re writing a custom allocator implementation, you don’t need to worry about metadata.

Of course, this doesn’t help the default implementation using std::malloc and std::free.

C++23: Size feedback in std::allocator

C++23 has adopted P0401, which adds a new function to std::allocator:

template<class Pointer>
struct allocation_result
{
    Pointer ptr;
    size_t count;
};

class allocator
{
public:
     allocation_result<T*> allocate_at_least(size_t n);
};

The function does what it says: it allocates memory for at least n objects and returns the actual size of the memory available. This behaves like my proposed allocate() function.

The language side with changes for operator new as proposed by P0901 is still in the standardization process, and will hopefully come in C++26.

Conclusion

A good API requests all information it needs (duh) and returns as much information it can provide (law of useful return). malloc() and free() don’t follow those principles, which make them less useful as they could be.

It’s great to see that C++23 has finally fixed most of those shortcomings, at least on the library side. Of course, modern languages like Rust don’t make any of the mistakes in the first place.