Standardese documentation generator version 0.1

09 Jun 2016 by Jonathan

A little over a month ago, I released the first prototype of standardese. Now it has finally reached version 0.1, which took way longer than I thought.

Well, it always takes longer than the estimate.

At first glance it doesn’t bring many new features, but it comes with massive parsing improvements.

foonathan/standardese is a C++ documentation tool that aims to be a replacement for Doxygen. It is highly WIP and currently doesn’t support many features. But it can already parse a wide set of C++ and generate basic documentation in the Markdown format.

Better Parsing

Even when writing the first prototype, I quickly ran into limitations of libclang.

While it is great for parsing C++ code, it doesn’t expose all the information I need, for example whether a constructor is explicit or what the expression inside a noexcept is. But when writing the documentation, I need this information.

So I needed to manually parse code to get all the information I want. In the prototype I used clang_tokenize() and scanned the tokens. This has a big limitation though: It doesn’t mix with macros very well.

For example, if you have a function signature like so:

void foo() FOONATHAN_NOEXCEPT;

clang_tokenize() gives you the token FOONATHAN_NOEXCEPT, not the tokens it expands to.

This is a huge problem because a lot of my functions use that macro to support some older compilers.

So I tried to implement some manual macro expansion, but it didn’t work very well for more complex macros expanding to multiple tokens. Now I could just say that using macros is bad and you should feel bad, but I have a fundamental design philosophy for standardese: If your code compiles, standardese should parse it.
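To illustrate what such a manual expansion looks like, here is a minimal sketch. It is not the code from the prototype: the token representation, the names token_list, macro_table and expand_macros(), and the assumption that FOONATHAN_NOEXCEPT expands to noexcept are all made up for illustration.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical token representation: a token is just its spelling.
using token_list = std::vector<std::string>;

// Maps the name of an object-like macro to the tokens it expands to.
using macro_table = std::map<std::string, token_list>;

// Replaces each token naming a registered macro with its replacement tokens.
// Deliberately naive: function-like macros, nested macros and rescanning
// are not handled, which is where a hand-written expansion gets painful.
token_list expand_macros(const token_list& tokens, const macro_table& macros)
{
    token_list result;
    for (const auto& token : tokens)
    {
        auto iter = macros.find(token);
        if (iter == macros.end())
            result.push_back(token);
        else
            result.insert(result.end(), iter->second.begin(), iter->second.end());
    }
    return result;
}

// Usage: the signature from above, with the macro registered by hand.
// Assumption: the macro expands to the single token `noexcept`.
void expand_macros_example()
{
    macro_table macros;
    macros["FOONATHAN_NOEXCEPT"] = token_list{"noexcept"};

    token_list signature{"void", "foo", "(", ")", "FOONATHAN_NOEXCEPT", ";"};
    token_list expanded = expand_macros(signature, macros);
    assert(expanded[4] == "noexcept");
}
```

This handles the simple case, but everything the comment lists as unhandled is exactly what a real preprocessor has to get right.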

I thus needed a different solution. I decided to use Boost.Wave for tokenizing. Unlike libclang, it preprocesses the source code before tokenizing. Now I only needed to register all macros and obtain the cursor extent to read the appropriate section of the source file.

Registering macros is easy: If you pass CXTranslationUnit_DetailedPreprocessingRecord, libclang will happily give you all CXCursor_MacroExpansions. Those are at the top of the file but that doesn’t matter. All the corresponding definitions just need to be registered with the per-source-file preprocessing context and you can use them throughout.

Getting the source extent seemed easy but wasn’t quite. libclang provides a function clang_getCursorExtent() which returns the extent of a cursor. This can be mapped to the actual offset inside the file with a couple of functions, two of which are clang_getSpellingLocation() and clang_getFileLocation(). They are practically the same, but if the location refers to a macro expansion, the file location is the location of the expansion and the spelling location is the location of the macro definition. In this case I want the definition, so I used clang_getSpellingLocation().

But I ran into issues with it, so I looked at the source code:

void clang_getSpellingLocation(CXSourceLocation location,
                               CXFile *file,
                               unsigned *line,
                               unsigned *column,
                               unsigned *offset) {
  ...
  
  const SourceManager &SM =
  *static_cast<const SourceManager*>(location.ptr_data[0]);
  // FIXME: This should call SourceManager::getSpellingLoc().
  SourceLocation SpellLoc = SM.getFileLoc(Loc);

  ....
}

Source: https://github.com/llvm-mirror/clang/blob/4ab9d6e02b29c24ca44638cc61b52cde2df4a888/tools/libclang/CXSourceLocation.cpp#L312-L347

Yeah, well…

But even so, this function seems to have some issues. In some cases the returned source range is too short and cuts off essential parts, for example:

using foo = unsigned int;

This gave me using foo = unsigned. It led to a couple of workarounds.
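One conceivable workaround for a truncated extent is to extend it up to the terminating semicolon. The sketch below shows that idea; it is not necessarily what standardese does, and the function name read_declaration() and its byte-offset parameters are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns the text of a declaration whose reported extent [begin, end) may be
// too short, by extending it up to and including the next ';'.
// Hypothetical helper; begin and end are byte offsets into `source`.
std::string read_declaration(const std::string& source, std::size_t begin, std::size_t end)
{
    assert(begin <= end && end <= source.size());

    // The extent already ends with the terminating ';': nothing to fix.
    if (end > begin && source[end - 1] == ';')
        return source.substr(begin, end - begin);

    auto semicolon = source.find(';', end);
    if (semicolon == std::string::npos)
        return source.substr(begin, end - begin); // no ';' left, keep the extent
    return source.substr(begin, semicolon + 1 - begin);
}
```

For the alias above, an extent that stops after unsigned would be extended to cover int; as well. This only works for entities that end in a semicolon, so function bodies or class definitions would need a different rule.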

Sorry for this “rant” on libclang.

On a more positive note I’ve also added support for attributes. Well, not really “support”, they are just skipped in parsing.

I might store the attributes of an entity somewhere, but most of them are not important or will be supported by comment attributes. But I’m open to discussion on that.

More robust parsing

In the early prototype, if the parser encountered something weird, an assertion would fail and crash everything. This is not a good way to recover from errors.

Now if the parser encounters something weird, it will throw an exception. This exception will be caught in the top-level loop, the error will be logged and the next entity will be parsed. This means that all “bad” entities are simply ignored when parsing but everything else will be parsed.

For example, if you have a class which my parsing code doesn’t like for some reason, it (and all members) will be skipped and parsing continues after it.
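The recovery strategy can be sketched as follows. All names here (parse_error, parse_entity(), parse_all()) are hypothetical stand-ins rather than the standardese API, and the sketch logs to std::cerr instead of spdlog to stay self-contained.

```cpp
#include <cassert>
#include <iostream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical exception type thrown when the parser cannot handle an entity.
struct parse_error : std::runtime_error
{
    using std::runtime_error::runtime_error;
};

// Hypothetical stand-in for the real parser: rejects entities named "weird".
std::string parse_entity(const std::string& name)
{
    if (name == "weird")
        throw parse_error("unable to parse entity '" + name + "'");
    return name;
}

// Top-level loop: a failing entity is logged and skipped, the rest is parsed.
std::vector<std::string> parse_all(const std::vector<std::string>& names)
{
    std::vector<std::string> parsed;
    for (const auto& name : names)
    {
        try
        {
            parsed.push_back(parse_entity(name));
        }
        catch (const parse_error& ex)
        {
            std::cerr << "[error] " << ex.what() << '\n';
        }
    }
    return parsed;
}

// Usage: the bad entity in the middle does not stop the remaining ones.
void parse_all_example()
{
    auto parsed = parse_all({"foo", "weird", "bar"});
    assert(parsed.size() == 2); // "weird" was skipped, "bar" was still parsed
}
```

Because the exception is caught per top-level entity, everything nested inside a failing entity is dropped together with it, which matches the class example above.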

Logging is done with the spdlog library. I really like it: it is easy to use, supports enough features for my needs (mainly debug levels, to be fair) and uses fmt for formatting, which is a big plus.

Compilation config

I’ve also added support for configuration of the compilation options. This is a really basic thing that was missing from the prototype.

You can either directly pass include directories and macro definitions to the command line or pass the directory where a compile_commands.json file is stored.

One problem with the latter approach is the following: Inside the JSON file are the compile commands for each source file, but standardese only needs header files. Often there isn’t a one-to-one mapping between the two, so I cannot simply use the flags of a single source file.

Instead I needed to take all the flags from all translation units and pass them to libclang. This can have negative consequences if there are translation units from multiple “projects”.
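Merging the flags could look roughly like the sketch below. It is an illustration under assumptions, not the actual implementation: merge_flags() is a made-up name, each compile command is assumed to be already split into arguments, and only flags written in the joined form (-Ipath, -DNAME) are considered.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Collects the include directories (-I) and macro definitions (-D) of all
// translation units into one deduplicated flag list, keeping first-seen order.
// Assumption: flags use the joined form, e.g. "-Iinclude", not "-I include".
std::vector<std::string> merge_flags(const std::vector<std::vector<std::string>>& commands)
{
    std::vector<std::string> merged;
    std::set<std::string> seen;
    for (const auto& command : commands)
    {
        for (const auto& flag : command)
        {
            bool relevant = flag.compare(0, 2, "-I") == 0 || flag.compare(0, 2, "-D") == 0;
            if (relevant && seen.insert(flag).second)
                merged.push_back(flag);
        }
    }
    return merged;
}

// Usage: two translation units sharing one include directory.
void merge_flags_example()
{
    std::vector<std::string> a{"c++", "-Iinclude", "-DFOO=1", "-c", "a.cpp"};
    std::vector<std::string> b{"c++", "-Iinclude", "-Iext", "-c", "b.cpp"};
    auto merged = merge_flags({a, b});
    assert(merged.size() == 3); // -Iinclude, -DFOO=1, -Iext
}
```

The sketch also makes the downside visible: if two translation units define the same macro differently, both definitions end up in the merged list.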

To avoid that I’ve also added special support for CMake. If you call find_package(standardese), you’ll get a function standardese_generate(). This function creates a custom target which will generate the documentation for a given target. The compilation options can also be given directly to it which allows sharing variables for header files and include directories. All other options must be given through an external configuration file though.

See the README for more information on that.

Entity filtering

One of the more advanced features I’ve added is entity filtering, i.e. hiding entities from the documentation generation.

The API allows much more powerful filtering, but the tool has sufficient options: you can filter either all entities with a given name or only namespaces. There are also flags that control whether private entities are extracted (disabled by default) and whether a documentation comment is required (enabled by default).

But this filtering is quite smart. Take the following code:

namespace detail
{
	struct type {};
}

using type = detail::type;

If you filter the namespace detail, you’ll get the following synopsis for the alias:

using type = implementation-defined;

This works in most cases and I think is a really nice feature.
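The idea behind that replacement can be sketched with a small string-based helper. This is only an illustration: synopsis_type() is a hypothetical name, and the real tool works on parsed entities rather than on qualified names as strings.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>

// Returns the type name to show in a synopsis: if any namespace component of
// the qualified name is filtered, the whole name is replaced.
// Hypothetical helper operating on plain strings such as "detail::type".
std::string synopsis_type(const std::string& qualified_name,
                          const std::set<std::string>& filtered_namespaces)
{
    std::size_t begin = 0;
    for (auto sep = qualified_name.find("::"); sep != std::string::npos;
         sep = qualified_name.find("::", begin))
    {
        // The component between the previous "::" and this one is a scope name.
        if (filtered_namespaces.count(qualified_name.substr(begin, sep - begin)) != 0)
            return "implementation-defined";
        begin = sep + 2;
    }
    return qualified_name;
}

// Usage: only names inside the filtered namespace are replaced.
void synopsis_type_example()
{
    std::set<std::string> filtered{"detail"};
    assert(synopsis_type("detail::type", filtered) == "implementation-defined");
    assert(synopsis_type("std::size_t", filtered) == "std::size_t");
}
```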

If you don’t extract private members, it also does more than just ignore all private members: If you have private virtual functions, they are not filtered! This supports the non-virtual interface pattern.
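For readers who haven’t seen the non-virtual interface pattern, here is a minimal example. The classes shape and circle are made up for illustration.

```cpp
#include <cassert>
#include <string>

class shape
{
public:
    virtual ~shape() = default;

    // Non-virtual public interface: the only entry point for users.
    std::string describe() const
    {
        return "shape: " + do_describe();
    }

private:
    // Private virtual customization point. Derived classes must override it,
    // so it matters for the documentation even though it is private.
    virtual std::string do_describe() const = 0;
};

class circle : public shape
{
private:
    std::string do_describe() const override
    {
        return "circle";
    }
};

// Usage: callers only ever see the public, non-virtual function.
void nvi_example()
{
    circle c;
    const shape& s = c;
    assert(s.describe() == "shape: circle");
}
```

Someone deriving from shape needs to know about do_describe(), which is why hiding it along with the other private members would make the documentation incomplete.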

And while filtered and private entities are hidden from the synopsis, entities without a documentation comment are still included, just not separately documented.

What’s more?

The list of changes in this update isn’t long, so why did it take so long?

The answer is simple: I did multiple refactorings and other internal changes which are not visible. The entire internal structure is different now and will allow me to handle other features much more easily.

For example, now I can easily tackle the problem of entity linking, i.e. referring to other entities in the documentation. This will be one of the main features of the next version. Another one is entity synthesis, i.e. generating C++ source code entities from documentation comments. This is especially useful for things libclang doesn’t support, like variable templates. But it will also allow some other cool features.

So stay tuned for standardese 0.2, which will (hopefully) not take as long. In the meantime, please take a look at standardese and test it in your projects. Also share it and spread the word!

This blog post was written for my old blog design and ported over. If there are any issues, please let me know.