Standardese Documentation Generator: Post Mortem and My Open-Source Future

05 Nov 2019 by Jonathan

Back in 2016, I started standardese, a C++ documentation generator. However, in the past two years I haven’t really worked on it.

Now, I can officially announce that I have abandoned the project and transferred ownership. This blog post explains why.

Motivation

For my first big project, foonathan/memory, I used Doxygen to generate the documentation. However, C++ is tricky: what you write in your header file is not necessarily the interface you want to show in the documentation. It starts with small stuff: detail namespaces, unspecified return types you want to hide and private virtual functions you want to include. Then there are SFINAE template parameters that need to become proper requirements, typedefs that should conceptually create new types, hidden base classes that inject member functions to save code duplication, base classes for EBO that should disappear, function objects that should be documented as functions, concepts (not the C++20 feature) that need to be documented and linked, overload sets that need to be grouped together, etc. etc.

Not to mention the obvious: parsing C++ is hard, really hard, really, really hard.

No surprise then, Doxygen – at least the 2015 Doxygen – can’t handle it properly. For foonathan/memory, I used the common workaround of defining a DOXYGEN macro when Doxygen parses the code and using conditional compilation to give it different source code than the C++ compiler actually sees. This meant a couple of #ifdef DOXYGEN to include the interface description of concepts, #ifndef DOXYGEN to hide some stuff, and macros like FOONATHAN_EBO(foo) that expand to base classes unless Doxygen is active. Of course, this was annoying.
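To illustrate the workaround, here is a minimal sketch. It assumes Doxygen is configured to predefine DOXYGEN; the macro EXAMPLE_EBO is a hypothetical analogue of FOONATHAN_EBO (the real definition may differ), and the class names are made up for this example.

```cpp
// Assumption: DOXYGEN is defined only while Doxygen parses this header.
#ifdef DOXYGEN
    // Doxygen sees no base class at all.
    #define EXAMPLE_EBO(...)
#else
    // The compiler sees a base class, so the empty base optimization applies.
    #define EXAMPLE_EBO(...) : __VA_ARGS__
#endif

struct empty_allocator {}; // hypothetical stateless allocator

class arena EXAMPLE_EBO(private empty_allocator)
{
public:
    void* allocate(unsigned size);

#ifndef DOXYGEN
    // Implementation detail, hidden from the generated documentation.
    void internal_helper();
#endif

private:
    char* memory_ = nullptr;
};
```

The compiler and Doxygen now read two different class definitions from the same file, which is exactly why this approach gets annoying as a code base grows.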

So during my final weeks in high school, I planned on writing a documentation generator that just “gets” C++. It should understand all that … stuff … we have to do, and document it accordingly, in a style similar to the C++ standard, so with Effects: and Requires: and so on: standardese. I had a couple of months before I started university, so I’d just write something, right?

I had absolutely no idea what I was getting into.

Early prototypes

To my surprise (the “me” that is currently writing this blog post and looked it up), I published the first prototype in May 2016. Using libclang it could parse C++ header files, extract the documentation comments, and generate documentation. It lacked all the advanced stuff, so at this point it was just a Doxygen with fewer features, but in the following months, I added more and more features and better support. I added a lot of special commands to the documentation comments, and it learned to blacklist entities and gained arbitrary Markdown in documentation comments, cross references, documentation for base classes and parameters of an entity, grouping for overload resolution sets, modules to categorize entities, hyperlinks in the synopsis and a small templating language to embed documentation in another file.

At Meeting C++ 2016, I gave a lightning talk showcasing standardese, and I used it for my type_safe library, released in October 2016. You can find some example documentation output generated by that version of standardese here. As 2016 ended, I had a nice documentation generator.

But the things I had to do to get there…

Parsing C++ is HARD

I used libclang to do C++ parsing, which is probably the main reason I am now writing this blog post.

You see, libclang is the stable C interface to the APIs of the clang compiler. Clang gets C++, which makes it better than the regex stuff Doxygen does (again, 2015, might be different now), and a stable API is good, so all is good, right?

No, I should have used libTooling, the unstable C++ API, directly, because libclang does not expose all the information I needed. For example, it does not tell whether something is noexcept, or conditionally noexcept, which I need to generate good documentation. To get the noexceptness, I had to parse the function myself. However, this isn’t so bad: libclang gives you the tokens of the declaration, so you just iterate over them and see whether there is a noexcept in there…

Enter: the preprocessor.

Functions that take or return noexcept function pointers are also … fun.

Sometimes a function is noexcept but the noexcept token is nowhere to be seen. Instead, the function declaration contains a macro that expands to noexcept! No problem, take the tokens of a function, feed them through the preprocessor, and check for noexcept.
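A minimal sketch of that idea, not the actual standardese code: it assumes the tokens are already available as plain strings (in standardese they came from libclang) and stands in for the preprocessor with a hypothetical table of object-like macros. The names token_list, macro_table, expand_macros and has_noexcept_token are invented for this example.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

using token_list = std::vector<std::string>;
// Hypothetical stand-in for the preprocessor: object-like macros only.
using macro_table = std::map<std::string, token_list>;

// Replaces every token that names a macro with its (recursively expanded) tokens.
token_list expand_macros(const token_list& tokens, const macro_table& macros,
                         int depth = 0)
{
    assert(depth < 16); // this sketch does not handle self-referential macros
    token_list result;
    for (const auto& tok : tokens)
    {
        auto iter = macros.find(tok);
        if (iter == macros.end())
            result.push_back(tok);
        else
        {
            auto expanded = expand_macros(iter->second, macros, depth + 1);
            result.insert(result.end(), expanded.begin(), expanded.end());
        }
    }
    return result;
}

// Naive check: is there a noexcept token anywhere in the declaration?
bool has_noexcept_token(const token_list& tokens)
{
    for (const auto& tok : tokens)
        if (tok == "noexcept")
            return true;
    return false;
}
```

Note that the naive scan also fires when noexcept only appears inside a parameter or return type, which is why functions taking or returning noexcept function pointers need extra care.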

I probably should have re-considered my position of picking libclang at this point, or maybe started to extend the API a bit. But alas, I was young (not that I’m old now…) and stubborn, so I continued adding workaround after workaround. I don’t get cv-qualifiers of member functions? No problem, check the tokens. Oh, what about override and final? No problem, check the tokens.

After a while, the majority of the standardese source code was workarounds and ad-hoc implementations of a C++ parser. But it worked, and I had nicely decoupled it, so my parsing stuff gave me a class hierarchy representing a C++ entity that I could visit to query all the information I needed.

Then I got lazy in type_safe and wrote the following:

TYPE_SAFE_DETAIL_MAKE_STRONG_TYPEDEF_OP(addition, +)
TYPE_SAFE_DETAIL_MAKE_STRONG_TYPEDEF_OP(subtraction, -)
TYPE_SAFE_DETAIL_MAKE_STRONG_TYPEDEF_OP(multiplication, *)
TYPE_SAFE_DETAIL_MAKE_STRONG_TYPEDEF_OP(division, /)
TYPE_SAFE_DETAIL_MAKE_STRONG_TYPEDEF_OP(modulo, %)

Yes, those are macros that generate a bunch of code. Code that needs to be documented…

This meant my approach of taking the tokens and preprocessing them didn’t work: the preprocessor macros themselves generated entire declarations. So I needed to preprocess everything first, and then pass it to libclang…
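To see why, consider a hypothetical and much simplified macro in the same spirit. This is not the real type_safe definition; MAKE_STRONG_TYPEDEF_OP and meters are invented for illustration.

```cpp
// Hypothetical, simplified stand-in: generates a mixin class template
// that provides the operator `Op` for a strong typedef.
#define MAKE_STRONG_TYPEDEF_OP(Name, Op)                                    \
    template <class StrongTypedef>                                          \
    struct Name                                                             \
    {                                                                       \
        friend constexpr StrongTypedef operator Op(const StrongTypedef& lhs, \
                                                   const StrongTypedef& rhs) \
        {                                                                   \
            return StrongTypedef(lhs.value Op rhs.value);                   \
        }                                                                   \
    };

// Before preprocessing, these two lines contain no declaration tokens at all.
MAKE_STRONG_TYPEDEF_OP(addition, +)
MAKE_STRONG_TYPEDEF_OP(subtraction, -)

// A strong typedef opts in to the generated operators via base classes.
struct meters : addition<meters>, subtraction<meters>
{
    constexpr explicit meters(int v) : value(v) {}
    int value;
};
```

The class templates addition and subtraction only exist after the preprocessor has run, so there are no tokens of a declaration to inspect beforehand.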

Around this time, I had enough of the mess my parsing code had become, and I did the worst thing you could do: I started from scratch. I created a new project, for parsing C++ into an AST to get information for documentation generation, reflection, etc. And I actually was successful: cppast was the result. In a nutshell, it is a C++ API plus workarounds over libclang, but this time I made the smart choice of making libclang a complete implementation detail. A different parsing back-end can be added without affecting any users. Maybe one day I’ll actually use libTooling.

The rewritten parsing code was cleaner and more robust than the one standardese had, so naturally I wanted to use it in standardese. But changing code is hard, so I did the worst thing you could do, again: I started from scratch, again.

The standardese develop branch

So, now it is mid-2017. I was in the middle of my university studies and started to re-implement a C++ documentation generator. The C++ parsing code was done, so I focused on parsing the comments themselves. In order to support Markdown, I originally passed the contents of the comments to cmark to get an AST of the comment. It can also write an AST in various formats, so I also used it to generate HTML and LaTeX output of the documentation. However, it wasn’t quite perfect.

First, I added special commands like \exclude, \effects and so on, which I needed to manually parse (sounds familiar?). Second, the output AST was limited to the kinds of stuff Markdown supports, so I could build emphasis and code blocks, but not, for example, code blocks with hyperlinks. This meant I needed to fall back to pure HTML for that, which was less than ideal.
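Manually parsing such a command boils down to something like the following minimal sketch. It is not the standardese implementation; parsed_line and parse_command are hypothetical names, and it assumes a command is a backslash-prefixed word at the start of a comment line.

```cpp
#include <string>

struct parsed_line
{
    std::string command; // empty if the line does not start with a command
    std::string text;    // remaining text, to be handed to the Markdown parser
};

// Splits a comment line such as "\effects Does X." into command and text.
parsed_line parse_command(const std::string& line)
{
    if (line.empty() || line[0] != '\\')
        return {"", line};

    // The command name runs until the first space.
    auto end = line.find(' ');
    if (end == std::string::npos)
        return {line.substr(1), ""};
    return {line.substr(1, end - 1), line.substr(end + 1)};
}
```

Everything that is not a command still has to go through the Markdown parser, so the two parsers end up interleaved.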

In the rewritten standardese - work taking place on the develop branch now - I wanted to solve those problems. I could handle the comment parsing just like I did the C++ parsing: create a new project that decouples the workaround, have a new and better AST, etc. Luckily, I didn’t have to, because GitHub already did it for me! They started using cmark as their Markdown parser for ReadMes and stuff, and ran into the same problem I did: they had extensions that needed parsing. So they created a fork that allows users to register their own parsing extensions, which was exactly what I needed!

To improve the output, I basically created my own Markup AST, designed to generate documentation, and wrote code to serialize it to various formats. This just worked and is still my favorite part of the standardese code base.
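The idea of such a markup AST can be sketched in a few lines. This is a hypothetical miniature, not the standardese classes: the node types and to_html are invented for this example, only HTML output is shown, and HTML escaping is omitted.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Base class of all nodes in the (hypothetical) markup AST.
struct node
{
    virtual ~node() = default;
    virtual std::string to_html() const = 0; // escaping omitted in this sketch
};

struct text : node
{
    std::string content;
    explicit text(std::string c) : content(std::move(c)) {}
    std::string to_html() const override { return content; }
};

struct hyperlink : node
{
    std::string url, label;
    hyperlink(std::string u, std::string l) : url(std::move(u)), label(std::move(l)) {}
    std::string to_html() const override
    {
        return "<a href=\"" + url + "\">" + label + "</a>";
    }
};

// Unlike a Markdown code block, this one may contain arbitrary child nodes,
// including hyperlinks.
struct code_block : node
{
    std::vector<std::unique_ptr<node>> children;
    std::string to_html() const override
    {
        std::string result = "<pre><code>";
        for (const auto& child : children)
            result += child->to_html();
        return result + "</code></pre>";
    }
};
```

Supporting another output format then means adding another serialization function (or a visitor) without touching the code that builds the tree.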

So, to recap: I parse C++ code with cppast, I parse the comments with cmark and my parsing extensions, then magic happens that builds my Markup AST, which I then serialize. That “magic” part needs to do all of the logic of ignoring some declarations, merging multiple others, and so on. The end result was a 1.5k line file, which was my least favorite part of the standardese code base.

There was also a bigger problem: re-implementing all that logic was work.

C++ is my hobby

I program in C++, because it’s fun (something is probably wrong with me).

I wrote some memory allocators, because it’s a design and implementation challenge. I wanted to share it with others, so I put it on GitHub.

I experimented with some type-safety stuff, because it explores the limits of the C++ type system. I wanted to share it with others, so I put it on GitHub.

I wrote a documentation generator, because I needed one and it is something different from the stuff I’ve done before. I wanted to share it with others, so I put it on GitHub.

But re-writing the documentation generator, to end up with something that I’ve already had? That is work, that’s not fun!

By now it’s 2018 and I didn’t really do much with standardese anymore. I did different things, things that were fun: I wrote a container library, a bit-field library, started a tokenizer, etc. It was fun writing them, unlike standardese, which was too much work.

C++ became work

I put all that stuff on GitHub, because I wanted to share it with others; maybe others find it useful. And they did: people are using my projects! type_safe has 50,000 clones in the past two weeks by 1000 people, memory 2000 by 300.

But they filed issues, which I needed to fix, and created pull requests, which I needed to merge. And I felt I had an obligation to implement some more of the feature ideas I had in mind. It felt an awful lot like work.

So in November 2017, I started a Patreon: if what I am doing is work, I might as well get paid! It also gave me more motivation to work on things, but standardese? I kept pushing that further and further away, doing fun stuff instead.

Taking a break

2019 came and with it the final months of my Bachelor studies. In February, my university workload spiked, and I had less time for my hobby/work. Since then, I have neither written a blog post nor declared one of my “Productive Periods” on Patreon. I still programmed a bit, but fun, private stuff which I am not going to put on GitHub.

During that break, I thought about my open source stuff. I still want to write fun stuff, and I still want to share it. And for some reason, people really like some of the stuff and use it in production, which means I feel an obligation to maintain it. But I don’t want to turn a fun project into work!

So I came up with a plan.

The future

I have created a labeling system for my projects. The status of a project can be one of the following:

in-development: I am currently actively working on the project. Feel free to use it, but note that it could (drastically) change. On the up-side, you’ll get more features.
maintenance only: I will definitely review PRs and help with issues. If you request a feature, I will probably ask you to make a PR. I fix bugs when I have time, but note that this project has become “work”, so without incentives I won’t work a lot on it.
experimental prototype: this project is a fun idea I had and wanted to try out. You should probably not be using this in production. In the future, I might work on it more and polish it.
abandoned: I don’t want to work on this project anymore. This happens when I burn out on an “in-development” project, but it is not finished enough to warrant a “maintenance only” project. If you want to continue working on it, contact me, and we can work something out.

The project label can be found on my projects page, and - thanks to a fancy Hugo and shields.io setup - as a badge on the project readme. The idea is that people will probably not start to actually use something labeled as “experimental prototype”, which means I don’t need to maintain it, so it doesn’t become work.

Still, if you like what I’m doing, I’d love it if you checked out my support page. You can either donate money directly or become a Patreon supporter: I’ll charge you per “productive period”, so if I have a lot of university stuff to do, you don’t need to support me. After a productive period, I’ll write a detailed report like the current one of what I did, where you can get sneak peeks into upcoming projects, talks and blog posts.

And standardese?

standardese is one of the projects listed as “abandoned” on my website: working on it has become too much “work”. In addition, I don’t think it will ever be able to fulfill my original goal and become usable for me to the extent that I’ve hoped. So instead of finishing it and finally merging the develop branch, I’ve abandoned it. I probably will never work on it again.

But there is good news!

standardese is still a tool other people find useful, so I have given ownership to a GitHub organization consisting of multiple people. The new maintainers have already released a new version. They have full control over the project; I will only help them out if necessary.

I wish them good luck in improving the documentation generator; C++ really needs one!