Problems yet to solve
Those damn attributes
In SAX, attributes arrive with the start-of-element event, so they are gone by the time you know what the element contains. Initially I handled this with a simple stack, but that wasn't enough to deal with recursive processing, where element content turns into attributes. So...
SAX and objects
I've started building container objects - mini-DOM trees, perhaps - to temporarily hold information which arrives through SAX. I process the objects, and the objects send out SAX events when that processing is complete. I'm starting to ponder making this facility into a more general toolkit.
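To make the container idea concrete, here is a minimal sketch in Python using the standard `xml.sax` module. It is not the toolkit described above: the names `BufferedElement`, `BufferingFilter`, `add_year` and `transform` are hypothetical, and the rule (copy the year of a `date` element into a `year` attribute) is an assumption chosen only for illustration. The filter holds each element until its end event, lets a processing function change its attributes, and replays everything as SAX events once the outermost element closes.

```python
import io
import re
import xml.sax
from xml.sax.handler import ContentHandler
from xml.sax.saxutils import XMLGenerator
from xml.sax.xmlreader import AttributesImpl


class BufferedElement:
    """Hypothetical container: holds one element until its content is known."""

    def __init__(self, name, attrs):
        self.name = name
        self.attrs = dict(attrs.items())  # copy, the parser may reuse attrs
        self.children = []                # text strings or BufferedElement

    def text(self):
        # Concatenated character content, including nested elements.
        return "".join(
            c if isinstance(c, str) else c.text() for c in self.children
        )

    def replay(self, handler):
        # Send the stored element back out as ordinary SAX events.
        handler.startElement(self.name, AttributesImpl(self.attrs))
        for child in self.children:
            if isinstance(child, str):
                handler.characters(child)
            else:
                child.replay(handler)
        handler.endElement(self.name)


class BufferingFilter(ContentHandler):
    """Buffers incoming events, processes each element when it ends,
    and replays the tree downstream when the outermost element closes."""

    def __init__(self, downstream, process):
        super().__init__()
        self.downstream = downstream
        self.process = process
        self.stack = []

    def startDocument(self):
        self.downstream.startDocument()

    def endDocument(self):
        self.downstream.endDocument()

    def startElement(self, name, attrs):
        element = BufferedElement(name, attrs)
        if self.stack:
            self.stack[-1].children.append(element)
        self.stack.append(element)

    def characters(self, content):
        if self.stack:
            self.stack[-1].children.append(content)

    def endElement(self, name):
        element = self.stack.pop()
        self.process(element)  # children were processed before their parent
        if not self.stack:
            element.replay(self.downstream)


def add_year(element):
    # Illustrative rule (an assumption): content becomes an attribute.
    if element.name != "date":
        return
    match = re.fullmatch(r"(\d{4})-\d{2}-\d{2}", element.text().strip())
    if match:
        element.attrs["year"] = match.group(1)


def transform(xml_bytes):
    out = io.StringIO()
    handler = BufferingFilter(XMLGenerator(out, encoding="utf-8"), add_year)
    xml.sax.parseString(xml_bytes, handler)
    return out.getvalue()
```

Calling `transform(b"<doc><date>2001-05-04</date></doc>")` produces a `date` element carrying a `year` attribute. Note that this sketch buffers the whole document, which gives up SAX's streaming advantage; a real version would presumably buffer only the elements that need it.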
Matching issues - context, multiple parts, multiple matches
There are plenty of potential cases where a regular expression won't return the expected output, especially if users are perverse. There are ways to create more robust regular expressions, of course. Also, there may be times when certain matches are appropriate only in particular contexts. I haven't yet begun to address this, but may take a look at XPath.
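As a rough illustration of both points, the sketch below anchors the expression with `fullmatch` so that stray surrounding text does not produce a partial match, and applies it only when the element sits in a given context. The context test is a simple comparison against the tail of the element path, not XPath, and the `RULES` table, the `order`/`date` element names and `match_in_context` are all hypothetical.

```python
import re

# Anchored with fullmatch below, so "around 2001-05-04 maybe" is rejected
# instead of yielding a partial match. Surrounding whitespace is tolerated.
DATE = re.compile(r"\s*(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})\s*")

# Hypothetical rule table: context (tail of the element path) -> pattern.
RULES = {
    ("order", "date"): DATE,
}


def match_in_context(path, text):
    """Return the named groups if text matches a rule whose context
    equals the end of path (a list of element names), else None."""
    for context, pattern in RULES.items():
        if tuple(path[-len(context):]) == context:
            match = pattern.fullmatch(text)
            if match:
                return match.groupdict()
    return None
```

With this table, `match_in_context(["doc", "order", "date"], " 2001-05-04 ")` returns the year, month and day, while the same text inside `["doc", "note", "date"]` returns `None`.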
Matching on markup AND lexical content
There may be times when the processor should test everything matching particular rules and apply transformation or annotation. I haven't set this up yet.