Sorry poi, I'm not going to have any time to do that. I'd like to, but it's just too much time to spend. It's a really cool concept though.
<highly-interesting-to-probably-only-me>
The only programming project I'll have time for in the next few months is my private experiment. I'm trying to make a performant regex engine that doesn't backtrack. It's kinda tricky making a structure that works well when you have all three of: atomic lookaheads, capturing groups with internal backreferences, and repetition.
There's a really easy non-backtracking implementation if you just want pass/fail criteria and a full match. It's simply a matter of building a continuation graph (could even be an NFA or DFA) for each consumed regex atom in a pattern. If you run out of continuations, there was no matching path and you do a failure exit. If the top continuation is a success, you do a success exit. If neither, you just continue evaluating the continuations for the next input character until you reach one of those failure or success states.
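A minimal sketch of that pass/fail loop in JavaScript, assuming a hand-built graph for the pattern a(b|c)*d. The names `nfa` and `fullMatch` are made up for illustration, and this version only answers "does the whole input match", so it checks for a surviving accepting continuation at the end instead of ranking continuations.

```javascript
// Hypothetical hand-built graph for the pattern a(b|c)*d (full match only).
// Each state maps an input character to the states it can continue in.
const nfa = {
  start: 0,
  accept: 2,
  edges: [
    { a: [1] },                 // state 0: expects "a"
    { b: [1], c: [1], d: [2] }, // state 1: loops on "b"/"c", leaves on "d"
    {},                         // state 2: accepting, no further continuations
  ],
};

function fullMatch(nfa, input) {
  let live = new Set([nfa.start]); // every continuation that is still alive
  for (const ch of input) {
    const next = new Set();
    for (const state of live) {
      for (const target of nfa.edges[state][ch] ?? []) next.add(target);
    }
    if (next.size === 0) return false; // ran out of continuations: failure exit
    live = next;
  }
  // Success only if an accepting continuation survived to the end of the input.
  return live.has(nfa.accept);
}

console.log(fullMatch(nfa, "abcbd"), fullMatch(nfa, "abx"));
```

Because `live` is a set, two paths that reach the same state collapse into one, which is exactly why this version can't tell you anything about captures.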
Capturing matches requires actually remembering what path you took, because it's possible to have two different matches with different captures on the exact same string. Basically, you need to remember history per path and not just continuations per regex atom.
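To illustrate, here's a rough sketch for the pattern (a*)(a*), where the same string can be split between the two groups in several ways. Everything here is an assumption for the sake of the example: the function name `allCaptures`, the thread shape `{ group, split }`, and the choice to return every path rather than pick a preferred one.

```javascript
// Hypothetical lockstep matcher for (a*)(a*), full match only.
// A thread is one path plus its history: which group it is in, and where group 1 ended.
function allCaptures(input) {
  const chars = [...input];
  let threads = [{ group: 1, split: null }];
  for (let pos = 0; pos <= chars.length; pos++) {
    // Before consuming a character, a thread in group 1 may also close that group here.
    const forked = [];
    for (const t of threads) {
      forked.push(t);
      if (t.group === 1) forked.push({ group: 2, split: pos });
    }
    if (pos === chars.length) {
      // End of input: every thread that reached group 2 is a distinct successful path.
      return forked
        .filter((t) => t.group === 2)
        .map((t) => [chars.slice(0, t.split).join(""), chars.slice(t.split).join("")]);
    }
    // Both groups only accept "a"; any other character kills every path.
    threads = chars[pos] === "a" ? forked : [];
    if (threads.length === 0) return [];
  }
}

console.log(allCaptures("aa"));
```

All the threads in group 2 sit on the same regex atom, so a plain set of continuations would merge them and lose the different splits. Keeping `split` per thread is the "history per path" part.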
Backreferences make it even trickier - you need to be able to rebuild the continuation graph based on the history of the current path, which means that you can no longer keep a single continuation graph. This leads to the need for several different continuation graphs depending on which path you have taken earlier, but at the same time you're executing all paths at once since you only have one continuation per regex atom.
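A small sketch of why the history changes the continuations, using the pattern (a+)\1 as a full match. This is only an illustration under assumptions, not the engine described here: `matchRepeated` and the thread fields `capture` and `replayed` are invented names, and the pattern is hard-coded rather than compiled.

```javascript
// Hypothetical lockstep matcher for (a+)\1, full match only.
// Each thread carries its own capture, so what it expects next depends on its history.
function matchRepeated(input) {
  // replayed === null means the thread is still inside group 1.
  let threads = [{ capture: "", replayed: null }];
  for (const ch of input) {
    const next = [];
    for (const t of threads) {
      if (t.replayed === null) {
        if (ch !== "a") continue; // group 1 only accepts "a"
        const capture = t.capture + ch;
        next.push({ capture, replayed: null }); // keep extending the group
        next.push({ capture, replayed: 0 });    // or close it and start the backreference
      } else if (t.capture[t.replayed] === ch) {
        // Inside \1: the expected character comes from this thread's own capture.
        next.push({ capture: t.capture, replayed: t.replayed + 1 });
      }
    }
    if (next.length === 0) return null; // no path left: failure exit
    threads = next;
  }
  // A path succeeds only if it replayed its whole capture exactly at the end of input.
  const done = threads.find((t) => t.replayed === t.capture.length);
  return done ? done.capture : null;
}

console.log(matchRepeated("aaaa"), matchRepeated("aaa"));
```

On "aaaa" the paths that captured "a", "aa" and "aaa" all run at once, and each one effectively has its own continuation graph for the \1 part. Only the "aa" path finishes exactly at the end of the input.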
Atomic lookaheads - those make it possible to have a backreference to a capture occur BEFORE the actual capture. Easy to fix though - just unify the captures with the backreferences, so that you perform the capture on the earliest backreference and then treat the capture as if it had been a backreference instead. Which interferes nicely with having multiple paths... As if I didn't have enough of a headache already!
Repetitions aren't that much of a headache, they just allow captures to be reset to new values while still having matched another value earlier on in the input string. (With interesting effects on backreferences, especially if you consider my solution to atomic lookaheads.)
There are some minor headaches as well, such as dealing with Unicode in an appropriate manner.
</highly-interesting-to-probably-only-me>
In other words, I've got myself a nice large programming project already.
--
var Liorean = {
abode: "http://liorean.web-graphics.com/",
profile: "http://codingforums.com/member.php?u=5798"};