com.simonstl.fragment
Class FragmentFilter

java.lang.Object
  |
  +--org.xml.sax.helpers.XMLFilterImpl
        |
        +--com.simonstl.fragment.FragmentFilter
All Implemented Interfaces:
org.xml.sax.ContentHandler, org.xml.sax.DTDHandler, org.xml.sax.EntityResolver, org.xml.sax.ErrorHandler, org.xml.sax.XMLFilter, org.xml.sax.XMLReader

public class FragmentFilter
extends org.xml.sax.helpers.XMLFilterImpl

This class provides a SAX 2.0 filter that uses regular expressions to break the content of particular elements into smaller, labelled components. It relies on the regular expression support built into the Apache Xerces parser, notably:

org.apache.xerces.utils.regex.RegularExpression
org.apache.xerces.utils.regex.Match
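
As a rough illustration of regex-based fragmenting, the sketch below splits a date string into labelled pieces using capture groups. It is only a sketch under assumptions: it uses java.util.regex from the JDK as a stand-in for the Xerces regex classes named above, and the class name, pattern, and labels are hypothetical and not taken from FragmentFilter or its rules files.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Stand-in sketch: java.util.regex replaces the Xerces regex classes here.
final class RegexFragmentSketch {

    // Hypothetical labels, one per capture group of the pattern below.
    private static final String[] LABELS = {"year", "month", "day"};
    private static final Pattern DATE =
            Pattern.compile("(\\d{4})-(\\d{2})-(\\d{2})");

    // Returns "label=value" entries for each capture group, or an empty
    // list when the whole text does not match the pattern.
    static List<String> fragment(String text) {
        List<String> pieces = new ArrayList<>();
        Matcher m = DATE.matcher(text);
        if (m.matches()) {
            for (int i = 1; i <= m.groupCount(); i++) {
                pieces.add(LABELS[i - 1] + "=" + m.group(i));
            }
        }
        return pieces;
    }

    public static void main(String[] args) {
        System.out.println(fragment("2001-07-04"));
    }
}
```

Running main prints [year=2001, month=07, day=04]. In the filter itself, each piece would be reported as an element rather than as a string.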

The RulesLoader class provides support for loading the rules files necessary to make the FragmentFilter work.

Version 0.07 abolishes the boolStack and bufferStack approach in favor of the DocComponent approach.

Version 0.06 adds support for DocComponent, which should eventually simplify the creation and processing of pieces smaller than elements.

Version 0.05 adds support for including characters outside of element content with the startChars() and endChars() methods.

Version 0.04 adds the ability to include the first match as one of the result elements and updates the retrieval of rules so that FragmentRule can control more of the processing, notably repetition.

Version 0.03 adds the ability to skip fragments entirely by specifying an empty string for the nsURI, localName, and prefix. Nested regular expressions may produce multiple results.

The RulesLoader 0.02 supports a "skip" element, which produces an element with those empty values and avoids a lot of extra typing.

Version:
0.07 4 July 2001
Author:
Simon St.Laurent

Constructor Summary
FragmentFilter()
          An empty constructor that requires the use of setParent before starting. I don't think this works.
FragmentFilter(org.xml.sax.XMLReader parent)
          A constructor that takes the parser which will feed it SAX events.
 
Method Summary
 void characters(char[] ch, int start, int len)
          If the current context isn't matched by any rules, this passes text through.
 void endElement(java.lang.String uri, java.lang.String localName, java.lang.String qName)
          At the end of the element, break up its content using the regex and report all of it to the recipient.
 void endFreeElement(java.lang.String uri, java.lang.String localName, java.lang.String qName)
           
 void freeCharacters(char[] ch, int start, int len)
           
 FragmentRules getRules()
          Returns the rules currently held by this FragmentFilter, in case you ever need to get them out.
 void setRules(FragmentRules newRules)
          Sets up the rules, whether read in from a config file by RulesLoader or built directly in Java.
 void startElement(java.lang.String uri, java.lang.String localName, java.lang.String qName, org.xml.sax.Attributes atts)
          Check at startElement for elements which should be fragmented.
protected  void startFreeElement(java.lang.String uri, java.lang.String localName, java.lang.String qName, org.xml.sax.Attributes atts)
           
 
Methods inherited from class org.xml.sax.helpers.XMLFilterImpl
endDocument, endPrefixMapping, error, fatalError, getContentHandler, getDTDHandler, getEntityResolver, getErrorHandler, getFeature, getParent, getProperty, ignorableWhitespace, notationDecl, parse, parse, processingInstruction, resolveEntity, setContentHandler, setDocumentLocator, setDTDHandler, setEntityResolver, setErrorHandler, setFeature, setParent, setProperty, skippedEntity, startDocument, startPrefixMapping, unparsedEntityDecl, warning
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

FragmentFilter

public FragmentFilter()
An empty constructor that requires the use of setParent before starting. I don't think this works.

FragmentFilter

public FragmentFilter(org.xml.sax.XMLReader parent)
A constructor that takes the parser which will feed it SAX events.
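
The two constructors correspond to the two standard ways of wiring a SAX filter to its parent. The sketch below shows both, using a plain org.xml.sax.helpers.XMLFilterImpl as a stand-in because FragmentFilter itself is not reproduced here. The class FilterWiringSketch and its Counter handler are hypothetical names used only for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

final class FilterWiringSketch {

    // Counts the startElement events that reach the end of the chain.
    static final class Counter extends DefaultHandler {
        int elements;

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes atts) {
            elements++;
        }
    }

    // Parses xml through a filter wired up in one of the two ways.
    // Checked exceptions are wrapped to keep the demo short.
    static int countElements(String xml, boolean useSetParent) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader parser = factory.newSAXParser().getXMLReader();

            XMLFilterImpl filter;
            if (useSetParent) {
                filter = new XMLFilterImpl();       // like FragmentFilter()
                filter.setParent(parser);           // must precede parse()
            } else {
                filter = new XMLFilterImpl(parser); // like FragmentFilter(parent)
            }

            Counter counter = new Counter();
            filter.setContentHandler(counter);
            filter.parse(new InputSource(new StringReader(xml)));
            return counter.elements;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countElements("<a><b/><b/></a>", false));
        System.out.println(countElements("<a><b/><b/></a>", true));
    }
}
```

With the stand-in filter, both wirings report three elements. Whether the no-argument FragmentFilter constructor behaves the same way is not confirmed by this documentation.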
Method Detail

setRules

public void setRules(FragmentRules newRules)
Sets up the rules, whether read in from a config file by RulesLoader or built directly in Java.
Parameters:
newRules - the set of rules the FragmentFilter will apply to content

getRules

public FragmentRules getRules()
Returns the rules currently held by this FragmentFilter, in case you ever need to get them out.

startElement

public void startElement(java.lang.String uri,
                         java.lang.String localName,
                         java.lang.String qName,
                         org.xml.sax.Attributes atts)
                  throws org.xml.sax.SAXException
Check at startElement for elements which should be fragmented. If the element's namespace URI and local name match a rule, text will be suppressed until the end element event.
Overrides:
startElement in class org.xml.sax.helpers.XMLFilterImpl
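
The matching step described above can be pictured as a lookup keyed on namespace URI and local name. The sketch below is one possible shape for such a lookup. The FragmentRules API is not shown in this documentation, so RuleLookupSketch and its methods are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for a rule lookup; not the FragmentRules API.
final class RuleLookupSketch {

    private final Set<String> keys = new HashSet<>();

    // Registers an element name whose content should be fragmented.
    void addRule(String nsURI, String localName) {
        keys.add(key(nsURI, localName));
    }

    // True when text inside this element should be held back
    // until the matching end element event.
    boolean matches(String nsURI, String localName) {
        return keys.contains(key(nsURI, localName));
    }

    // Builds a "{uri}local" key; the qName and its prefix are ignored.
    private static String key(String nsURI, String localName) {
        return "{" + (nsURI == null ? "" : nsURI) + "}" + localName;
    }

    public static void main(String[] args) {
        RuleLookupSketch rules = new RuleLookupSketch();
        rules.addRule("http://example.com/ns", "date");
        System.out.println(rules.matches("http://example.com/ns", "date"));
        System.out.println(rules.matches("", "date"));
    }
}
```

Keying on the URI and local name rather than the qName means a rule still applies when a document binds the same namespace to a different prefix.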

startFreeElement

protected void startFreeElement(java.lang.String uri,
                                java.lang.String localName,
                                java.lang.String qName,
                                org.xml.sax.Attributes atts)
                         throws org.xml.sax.SAXException

characters

public void characters(char[] ch,
                       int start,
                       int len)
                throws org.xml.sax.SAXException
If the current context isn't matched by any rules, this passes text through. Otherwise, it stores it in a buffer. The strange approach to the buffering is dictated by Xerces, which passes character arrays far larger than the text being reported.
Overrides:
characters in class org.xml.sax.helpers.XMLFilterImpl
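
One way to cope with oversized character arrays is to copy only the range that the event reports. The sketch below shows that idea in isolation. It is an assumption about the intent of the buffering, not the buffering code FragmentFilter actually uses, and CharBufferSketch is a hypothetical name.

```java
// Hypothetical sketch of range-only buffering for characters() events.
final class CharBufferSketch {

    private final StringBuilder buffer = new StringBuilder();

    // Same parameter shape as ContentHandler.characters(ch, start, len).
    void characters(char[] ch, int start, int len) {
        // Copy only the reported range; the rest of the array may hold
        // unrelated data that belongs to the parser.
        buffer.append(ch, start, len);
    }

    // Returns everything buffered so far.
    String text() {
        return buffer.toString();
    }

    public static void main(String[] args) {
        CharBufferSketch sketch = new CharBufferSketch();
        sketch.characters("xxxxHello, xxxx".toCharArray(), 4, 7);
        sketch.characters("--world--".toCharArray(), 2, 5);
        System.out.println(sketch.text());
    }
}
```

Running main prints Hello, world, with the padding around each reported range left out of the buffer.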

freeCharacters

public void freeCharacters(char[] ch,
                           int start,
                           int len)
                    throws org.xml.sax.SAXException

endElement

public void endElement(java.lang.String uri,
                       java.lang.String localName,
                       java.lang.String qName)
                throws org.xml.sax.SAXException
At the end of the element, break up its content using the regex and report all of it to the recipient. Note: right now, this method just calls this class's own methods. That limits recursive processing for fragmentation, a limitation I hope to remove in the near future. Also note that this approach only works for text stored atomically. Don't try it on mixed content; the results will be ugly.
Overrides:
endElement in class org.xml.sax.helpers.XMLFilterImpl
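
The overall flow (hold text back at startElement, buffer it in characters, split and report it at endElement) can be sketched as a small XMLFilterImpl subclass. This is a sketch under stated assumptions rather than the FragmentFilter source: java.util.regex stands in for the Xerces regex classes, the single hard-coded rule for a "date" element replaces FragmentRules, and events are forwarded through the superclass instead of startFreeElement, freeCharacters, and endFreeElement. EndElementSplitSketch and run are hypothetical names.

```java
import java.io.StringReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

final class EndElementSplitSketch extends XMLFilterImpl {
    // Hypothetical hard-coded rule: split <date> text into three pieces.
    private static final Pattern DATE =
            Pattern.compile("(\\d{4})-(\\d{2})-(\\d{2})");
    private static final String[] NAMES = {"year", "month", "day"};

    private StringBuilder buffer; // non-null only inside a matched element

    EndElementSplitSketch(XMLReader parent) {
        super(parent);
    }

    @Override
    public void startElement(String uri, String local, String qName,
                             Attributes atts) throws SAXException {
        super.startElement(uri, local, qName, atts);
        if ("".equals(uri) && "date".equals(local)) {
            buffer = new StringBuilder(); // hold text back until endElement
        }
    }

    @Override
    public void characters(char[] ch, int start, int len) throws SAXException {
        if (buffer != null) {
            buffer.append(ch, start, len);
        } else {
            super.characters(ch, start, len);
        }
    }

    @Override
    public void endElement(String uri, String local, String qName)
            throws SAXException {
        if (buffer != null) {
            String text = buffer.toString();
            buffer = null;
            Matcher m = DATE.matcher(text);
            if (m.matches()) {
                // Report each capture group as its own child element.
                for (int i = 1; i <= m.groupCount(); i++) {
                    String name = NAMES[i - 1];
                    char[] piece = m.group(i).toCharArray();
                    super.startElement("", name, name, new AttributesImpl());
                    super.characters(piece, 0, piece.length);
                    super.endElement("", name, name);
                }
            } else {
                // No match: pass the buffered text through unchanged.
                super.characters(text.toCharArray(), 0, text.length());
            }
        }
        super.endElement(uri, local, qName);
    }

    // Runs xml through the filter and flattens the events into a string.
    static String run(String xml) {
        StringBuilder out = new StringBuilder();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            EndElementSplitSketch filter =
                    new EndElementSplitSketch(factory.newSAXParser().getXMLReader());
            filter.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String u, String l, String q, Attributes a) {
                    out.append('<').append(l).append('>');
                }
                @Override
                public void characters(char[] c, int s, int n) {
                    out.append(c, s, n);
                }
                @Override
                public void endElement(String u, String l, String q) {
                    out.append("</").append(l).append('>');
                }
            });
            filter.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }
}
```

Calling EndElementSplitSketch.run("<doc><date>2001-07-04</date></doc>") yields <doc><date><year>2001</year><month>07</month><day>04</day></date></doc>. Like the method described above, the sketch assumes the matched element holds only text, so it should not be used on mixed content.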

endFreeElement

public void endFreeElement(java.lang.String uri,
                           java.lang.String localName,
                           java.lang.String qName)
                    throws org.xml.sax.SAXException