It’s a python library that supplies a functionality very similar to the python re module (regular expressions) but for arbitrary sequences of objects instead of strings (sequences of characters).
In addition to that, it’s possible to match each object in a sequence with not only equality, but an arbitrary python function. For example, if you have a sequence of integers you can make a regular expression that asks for a even number followed by a prime number followed by a 3-divisible number.
This software was written by Rafael Carrascosa while working at Machinalis in the first months of 2012.
Contact: email@example.com or rafacarrascosa xyz gmail.com (replace ” xyz ” with “@”)
The syntax is a little bit different than python’s re, and similar to that of pyparsing, you have to more-or-less explicitly build the syntax tree of your regular expression. For instance:
"ab"` is `Literal("a") + Literal("b")` `"a*"` is `Star(Literal("a"))`
a = Literal("a") b = Literal("b") regex = Plus(a + b) | Star(b + b, greedy=False)
You can also assign a group to any sub-match and later on retrieve the matched content, for instance:
regex = Group(Plus(a + b), "foobar") | (b + b) m = match(regex, "abab") print m.span("foobar") # prints (0, 4)
For more, check out the examples in the examples folder.
At Machinalis we use REfO for applications similar to that in examples/words.py, check it out!
I use a Thompson-like virtual machine aproach, which ensures polynomial time worst-case complexity. See examples/poly_time.py for an example of this.
The implementation is heavily based on Russ Cox notes, see http://swtch.com/~rsc/regexp/regexp2.html for the source.
If you go to read the code, some glossary:
RE – regular expression
VM – virtual machine
- Epsilon transitions – All VM instructions that do not consume a symbol
or stop the thread (for example an Accept).