I’m currently working on lexy, a C++ parsing DSL library:
you describe how input should be parsed, and lexy generates code for it, taking care of error recovery, parse tree generation, and parse values.
Such parser generators are classified based on the expressiveness of the corresponding formal language.
For example, a strict regular expression can only parse regular languages, which is a strict subset of a deterministic context free language, and so on.
lexy, being essentially syntax sugar for a recursive descent parser with (manually specified!) arbitrary lookahead but no other state, falls in the latter category.
Parsers in that category are unable to parse context-sensitive languages such as XML with matching tags.
To handle them, I’ve added support for “context variables”: state that can be modified during parsing.
However, in a recent refactoring of the context variables implementation, I’ve accidentally removed a big limitation, which makes lexy Turing-complete:
the parser is thus able to do arbitrary computation while parsing the input.
TL;DR: I’ve written a lexy grammar that is able to execute, not just parse, a simple Turing-complete language.
» read more »