Implementation Challenge: Lossless, compact parse tree with iterative traversal

My parser combinator library lexy was originally designed to parse some grammar into a user-defined data structure, comparable to Boost.Spirit. This is ideal for parsing simple “data” grammars like JSON or email addresses, and also works for parsing programming languages: simply parse into your AST. However, by design lexy::parse() will only forward data explicitly produced by the parsing combinators which does not include punctuation, comments, or whitespace.

Inspired by matklad’s blog post about modern parser generators, I’ve decided to add a way to retain all the information and produce a lossless parse tree by calling lexy::parse_as_tree(). This requires no changes to your existing grammar and simply switches the output. With that, I could also add an online playground that visualizes the parse tree of a given grammar on the given input.

Implementing the actual code that produces a parse tree during parsing wasn’t too hard – I’ve already had a handler that controls what happens during parsing to implement lexy::match() and lexy::validate(). The challenging part was the actual data structure for storing a parse tree: it should be memory-efficient, as it can be big, and users should be able to easily iterate over every node without requiring recursion.

» read more »
Jonathan

Trivially copyable does not mean trivially copy constructible

About a month ago, I got an interesting pull request for lexy, my new parser combinator library. It fixed a seemingly weird issue relating trivially copyable types and special member function of classes containing unions. While digging into it, I learned a lot about trivial special member functions and made a somewhat surprising realization:

Just because a class is std::is_trivially_copyable does not mean the class is actually std::is_trivially_copy_constructible or even std::is_copy_constructible: you can have classes that you can’t copy, but they’re still trivially copyable, and classes where the copy constructor can do arbitrary amounts of non-trivial work, but they’re nonetheless trivially copyable!

Let me explain.

» read more »
Jonathan

What is the unit of a text column number?

I’ve recently published my parsing combinator library lexy. One of the things it does is issue a lexy::error if the input does not match the grammar. This error has a .position() which gives you the position where the error occurred.

In order to keep the happy path fast, .position() is not something that is easy to use for end users: it is simply an iterator into the input range. This is no good to a human user who wants something like line and column number to easily locate the problematic input.

Converting an iterator into line/column location seems simple enough: set line = column = 1 and iterate over the entire input until you’ve reached the position of the iterator. Every time you see a newline, increment the line number and set the column number back to 1. Otherwise, the column is implemented every time you … see what exactly?

What exactly is a “column” of a text and how do I compute it?

» read more »
Jonathan

Tricks with Default Template Arguments

Just like regular function parameters, template parameters can also have default parameters. For class templates, this behaves mostly just like default function arguments: if you pass fewer template arguments than required, default template arguments are used to fill the remaining places. However, for function templates, it gets more complicated as template parameters for functions can be deduced by the normal function arguments. This leads to some interesting side-effects. In particular, default arguments of template parameters don’t need to be put at the end!

Let’s take a look at a couple of things we can do with default template arguments.

» read more »
Jonathan

constexpr is a Platform

Let me share a useful insight with you: constexpr is a platform.

Just like you write code that targets Windows or a microcontroller, you write code that targets compile-time execution. In both cases you restrict yourself to the subset of C++ that works on your target platform, use conditional compilation if your code needs to be portable, and execute it on the desired target platform. You can thus view constexpr as another platform you can target; it just so happens to be run by your compiler.

This insight can answer a lot of design questions surrounding constexpr.

» read more »
Jonathan