com.simonstl.fragment
Class FragmentFilter

java.lang.Object
  |
  +--org.xml.sax.helpers.XMLFilterImpl
        |
        +--com.simonstl.fragment.FragmentFilter
All Implemented Interfaces:
org.xml.sax.ContentHandler, org.xml.sax.DTDHandler, org.xml.sax.EntityResolver, org.xml.sax.ErrorHandler, org.xml.sax.XMLFilter, org.xml.sax.XMLReader

public class FragmentFilter
extends org.xml.sax.helpers.XMLFilterImpl

This class provides a SAX 2.0 filter that uses regular expressions to fragment particular elements into smaller labelled components. It relies on the regular expression support built into the Apache Xerces parser, notably org.apache.xerces.utils.regex.RegularExpression and org.apache.xerces.utils.regex.Match. The RulesLoader class provides support for loading the rules files that FragmentFilter needs in order to work.
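
As a rough illustration of where a filter like this sits in a SAX pipeline, the sketch below wires parser, filter, and content handler together using only JDK classes. It is not taken from the FragmentFilter source: a plain XMLFilterImpl stands in for FragmentFilter so the example runs without this package, and the class name FilterWiringSketch and its trace method are hypothetical. Loading rules is left out because the RulesLoader methods are not documented on this page.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical demo class; not part of com.simonstl.fragment.
class FilterWiringSketch {

    /** Parses xml through a filter and returns a flat trace of the events seen downstream. */
    static String trace(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader parser = factory.newSAXParser().getXMLReader();

            // Stand-in for: new FragmentFilter(parser), followed by setRules(...).
            XMLFilterImpl filter = new XMLFilterImpl(parser);

            final StringBuilder out = new StringBuilder();
            filter.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName, String qName, Attributes atts) {
                    out.append('<').append(localName).append('>');
                }

                @Override
                public void characters(char[] ch, int start, int len) {
                    out.append(ch, start, len);
                }

                @Override
                public void endElement(String uri, String localName, String qName) {
                    out.append("</").append(localName).append('>');
                }
            });

            // Parse through the filter, not the parser, so events flow
            // parser -> filter -> content handler.
            filter.parse(new InputSource(new StringReader(xml)));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(trace("<date>2001-02-03</date>"));
    }
}
```

With the real class, the XMLFilterImpl line would presumably become a FragmentFilter constructed with the same parent, with setRules called before parse.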


Constructor Summary
FragmentFilter()
          An empty constructor that requires the use of setParent before starting. I don't think this works.
FragmentFilter(org.xml.sax.XMLReader parent)
          A constructor that takes the parser which will feed it SAX events.
 
Method Summary
 void characters(char[] ch, int start, int len)
          If the current context isn't matched by any rules, this passes text through.
 void endElement(java.lang.String uri, java.lang.String localName, java.lang.String qName)
          At the end of the element, break up its content using the regex and report all of it to the recipient.
 FragmentRules getRules()
          Returns the rules, in case you ever need to get them out of FragmentFilter.
 void setRules(FragmentRules newRules)
          Sets up the rules, read in from a config file by RulesLoader or otherwise concocted in Java.
 void startElement(java.lang.String uri, java.lang.String localName, java.lang.String qName, org.xml.sax.Attributes atts)
          Check at startElement for elements which should be fragmented.
 
Methods inherited from class org.xml.sax.helpers.XMLFilterImpl
endDocument, endPrefixMapping, error, fatalError, getContentHandler, getDTDHandler, getEntityResolver, getErrorHandler, getFeature, getParent, getProperty, ignorableWhitespace, notationDecl, parse, parse, processingInstruction, resolveEntity, setContentHandler, setDocumentLocator, setDTDHandler, setEntityResolver, setErrorHandler, setFeature, setParent, setProperty, skippedEntity, startDocument, startPrefixMapping, unparsedEntityDecl, warning
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

FragmentFilter

public FragmentFilter()
An empty constructor that requires the use of setParent before starting. I don't think this works.

FragmentFilter

public FragmentFilter(org.xml.sax.XMLReader parent)
A constructor that takes the parser which will feed it SAX events.

Method Detail

setRules

public void setRules(FragmentRules newRules)
Sets up the rules, read in from a config file by RulesLoader or otherwise concocted in Java.
Parameters:
newRules - the set of rules the FragmentFilter will apply to content

getRules

public FragmentRules getRules()
Returns the rules, in case you ever need to get them out of FragmentFilter.

startElement

public void startElement(java.lang.String uri,
                         java.lang.String localName,
                         java.lang.String qName,
                         org.xml.sax.Attributes atts)
                  throws org.xml.sax.SAXException
Check at startElement for elements which should be fragmented. If the element's namespace URI and local name match a rule, text will be suppressed until the end element event.
Overrides:
startElement in class org.xml.sax.helpers.XMLFilterImpl

characters

public void characters(char[] ch,
                       int start,
                       int len)
                throws org.xml.sax.SAXException
If the current context isn't matched by any rules, this passes text through. Otherwise, it stores it in a buffer. The strange approach to the buffering is dictated by Xerces passing character arrays that are enormously larger than the text being reported.
Overrides:
characters in class org.xml.sax.helpers.XMLFilterImpl
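
The suppress-and-buffer behaviour described for startElement and characters might look roughly like the sketch below. It is an illustration, not the FragmentFilter source: the class BufferingSketch, its "{namespaceURI}localName" rule keys, and the demo method are all assumptions, since the FragmentRules API is not shown on this page. The point it tries to make is that only the slice from start to start + len is copied, however large the array handed over by the parser.

```java
import java.util.Collections;
import java.util.Set;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical stand-in for the buffering half of FragmentFilter.
class BufferingSketch extends XMLFilterImpl {
    // Assumed rule representation: "{namespaceURI}localName" keys.
    // The real FragmentRules class may look quite different.
    private final Set<String> matched;
    private final StringBuilder buffer = new StringBuilder();
    private boolean suppressing = false;

    BufferingSketch(Set<String> matched) {
        this.matched = matched;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        if (matched.contains("{" + uri + "}" + localName)) {
            suppressing = true;   // hold back text until the end element event
            buffer.setLength(0);
        }
        super.startElement(uri, localName, qName, atts);
    }

    @Override
    public void characters(char[] ch, int start, int len) throws SAXException {
        if (suppressing) {
            // Copy only the reported slice; the array itself may be far larger.
            buffer.append(ch, start, len);
        } else {
            super.characters(ch, start, len);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (matched.contains("{" + uri + "}" + localName)) {
            suppressing = false;  // a real filter would fragment the buffer here
        }
        super.endElement(uri, localName, qName);
    }

    String buffered() {
        return buffer.toString();
    }

    /** Feeds ten characters embedded in a 4096-char array and returns what was buffered. */
    static String demo() {
        try {
            BufferingSketch filter = new BufferingSketch(Collections.singleton("{}date"));
            char[] big = new char[4096];
            "2001-02-03".getChars(0, 10, big, 100);
            filter.startElement("", "date", "date", new AttributesImpl());
            filter.characters(big, 100, 10);
            filter.endElement("", "date", "date");
            return filter.buffered();
        } catch (SAXException e) {
            throw new RuntimeException(e);
        }
    }
}
```

The sketch assumes text-only content inside a matched element, in line with the mixed-content warning on endElement.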

endElement

public void endElement(java.lang.String uri,
                       java.lang.String localName,
                       java.lang.String qName)
                throws org.xml.sax.SAXException
At the end of the element, break up its content using the regex and report all of it to the recipient. Note: right now, this method just calls this class's own methods. That puts a limit on recursive processing for fragmentation, and is a limitation I hope to remove in the near future. Also note that this approach only works for text stored atomically. Don't try it on mixed content; results will be ugly.
Overrides:
endElement in class org.xml.sax.helpers.XMLFilterImpl
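
One way to picture the "break up its content using the regex" step is the sketch below, which turns each capture group of a match into a labelled child element. It makes several assumptions: java.util.regex is swapped in for the Xerces regex classes the filter actually uses, the class FragmentEmitSketch and its emit and demo methods are hypothetical, and both the one-label-per-group scheme and the fallback to plain text on a failed match are guesses rather than documented behaviour.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper; uses java.util.regex instead of the Xerces regex package.
class FragmentEmitSketch {

    /**
     * Emits one child element per capture group, named by the matching label.
     * Assumption: if the whole text does not match, it is passed on unchanged.
     */
    static void emit(ContentHandler out, String text, Pattern pattern, String[] labels)
            throws SAXException {
        Matcher m = pattern.matcher(text);
        if (!m.matches()) {
            out.characters(text.toCharArray(), 0, text.length());
            return;
        }
        for (int i = 0; i < labels.length && i < m.groupCount(); i++) {
            String part = m.group(i + 1);
            if (part == null) {
                continue; // optional group that did not participate in the match
            }
            out.startElement("", labels[i], labels[i], new AttributesImpl());
            out.characters(part.toCharArray(), 0, part.length());
            out.endElement("", labels[i], labels[i]);
        }
    }

    /** Fragments text as an ISO-style date and returns a flat trace of the emitted events. */
    static String demo(String text) {
        final StringBuilder sb = new StringBuilder();
        DefaultHandler recorder = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                sb.append('<').append(localName).append('>');
            }

            @Override
            public void characters(char[] ch, int start, int len) {
                sb.append(ch, start, len);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                sb.append("</").append(localName).append('>');
            }
        };
        try {
            emit(recorder, text,
                 Pattern.compile("(\\d{4})-(\\d{2})-(\\d{2})"),
                 new String[] {"year", "month", "day"});
        } catch (SAXException e) {
            throw new RuntimeException(e);
        }
        return sb.toString();
    }
}
```

Under these assumptions, the buffered text 2001-02-03 would be reported as year, month, and day child elements, while text that does not match passes through as ordinary characters.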