com.simonstl.fragment
The com.simonstl.fragment package lets developers break element content into smaller pieces while parsing with a SAX-compliant parser. It is built on the filtering capabilities of the SAX2 API.
The fragments are described using regular expressions, much like those specified in Appendix F of XML Schema Part 2, Datatypes. The actual regular expression processing is done using a utility package of the Apache XML project's Xerces parser. (For convenience, the FilterTester uses the Apache parser's SAX functionality as well as David Megginson's XML Writer.)
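As a rough illustration of the group-based matching this relies on, the sketch below splits a string into its capture groups. It uses java.util.regex as a stand-in rather than the Xerces regular expression utility the package actually uses, and it assumes the pattern must match the whole element content (as XML Schema patterns do). The class name GroupSplitter and the split method are hypothetical, and the pattern is the one from the example rule on this page.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: not part of com.simonstl.fragment. */
class GroupSplitter {

    /**
     * Returns the text captured by each group when the whole input matches,
     * or an empty list when it does not match.
     */
    static List<String> split(String regex, String text) {
        List<String> parts = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        if (m.matches()) {                       // assumption: whole-content match
            for (int i = 1; i <= m.groupCount(); i++) {
                parts.add(m.group(i));
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        // The first group is greedy but backtracks so the later groups can match.
        System.out.println(split("(\\d{2,5})(\\d{2})-(\\d{2})", "1970-11"));
        // prints [19, 70, 11]
    }
}
```

The two regular expression dialects are not identical, so a pattern written for one may need adjusting for the other.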
Fragment processing applies a set of rules to a given document during parsing. The rules are specified as an XML document, using a simple vocabulary:
<?xml version="1.0" encoding="UTF-8"?>
<fragmentRules xmlns="http://simonstl.com/ns/fragments/">
  <fragmentRule matchPattern="(\d{2,5})(\d{2})-(\d{2})">
    <applyTo>
      <targetElement nsURI="http://simonstl.com/ns/test/" localName="gYearMonth"/>
      <targetElement nsURI="http://simonstl.com/ns/test/" localName="myYearMonth"/>
    </applyTo>
    <produce>
      <resultElement nsURI="http://simonstl.com/ns/types/" localName="century" prefix="type" />
      <resultElement nsURI="http://simonstl.com/ns/types/" localName="year" prefix="type" />
      <resultElement nsURI="http://simonstl.com/ns/types/" localName="month" prefix="type" />
    </produce>
  </fragmentRule>
</fragmentRules>
This set of fragmentRules includes one rule, which applies to two elements in the http://simonstl.com/ns/test/ namespace: gYearMonth and myYearMonth. The matchPattern attribute contains a regular expression, (\d{2,5})(\d{2})-(\d{2}), which defines how the content of those elements will be broken down; each parenthesized group captures one piece. The produce element lists, in order, the elements that will represent those pieces.
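For readers who want to inspect such a rules document themselves, here is a minimal sketch that reads it into plain Java objects with the JDK's DOM parser. It is not the package's own loader: the RuleSketch class, its fields, and the "{nsURI}localName" string form are assumptions made for illustration only.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical in-memory form of one fragmentRule; not the package's class. */
class RuleSketch {
    static final String RULES_NS = "http://simonstl.com/ns/fragments/";

    final String matchPattern;
    final List<String> targets = new ArrayList<>(); // "{nsURI}localName" per targetElement
    final List<String> results = new ArrayList<>(); // "{nsURI}localName" per resultElement

    RuleSketch(String matchPattern) {
        this.matchPattern = matchPattern;
    }

    /** Parses a fragmentRules document and returns one RuleSketch per fragmentRule. */
    static List<RuleSketch> load(String rulesXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(rulesXml)));
            List<RuleSketch> rules = new ArrayList<>();
            NodeList ruleNodes = doc.getElementsByTagNameNS(RULES_NS, "fragmentRule");
            for (int i = 0; i < ruleNodes.getLength(); i++) {
                Element ruleEl = (Element) ruleNodes.item(i);
                RuleSketch rule = new RuleSketch(ruleEl.getAttribute("matchPattern"));
                collect(ruleEl, "targetElement", rule.targets);
                collect(ruleEl, "resultElement", rule.results);
                rules.add(rule);
            }
            return rules;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not read rules document", e);
        }
    }

    /** Appends the expanded name of every descendant called elementName, in document order. */
    private static void collect(Element ruleEl, String elementName, List<String> out) {
        NodeList nodes = ruleEl.getElementsByTagNameNS(RULES_NS, elementName);
        for (int i = 0; i < nodes.getLength(); i++) {
            Element e = (Element) nodes.item(i);
            out.add("{" + e.getAttribute("nsURI") + "}" + e.getAttribute("localName"));
        }
    }

    public static void main(String[] args) {
        String xml = "<fragmentRules xmlns=\"" + RULES_NS + "\">"
                + "<fragmentRule matchPattern=\"(\\d{2,5})(\\d{2})-(\\d{2})\">"
                + "<applyTo><targetElement nsURI=\"http://simonstl.com/ns/test/\""
                + " localName=\"gYearMonth\"/></applyTo>"
                + "<produce><resultElement nsURI=\"http://simonstl.com/ns/types/\""
                + " localName=\"century\" prefix=\"type\"/></produce>"
                + "</fragmentRule></fragmentRules>";
        for (RuleSketch rule : load(xml)) {
            System.out.println(rule.matchPattern + " " + rule.targets + " -> " + rule.results);
        }
    }
}
```

The order of the results list matters, since the n-th resultElement corresponds to the n-th group in the pattern.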
This means that the XML document:
<test xmlns="http://simonstl.com/ns/test/">
  <message>Hello! This document contains a gYearMonth.</message>
  <gYearMonth>1970-11</gYearMonth>
  <myYearMonth>1970-11</myYearMonth>
</test>
will be reported as:
<?xml version="1.0" standalone="yes"?>
<test xmlns="http://simonstl.com/ns/test/">
  <message>Hello! This document contains a gYearMonth.</message>
  <gYearMonth><type:century xmlns:type="http://simonstl.com/ns/types/">19</type:century><type:year xmlns:type="http://simonstl.com/ns/types/">70</type:year><type:month xmlns:type="http://simonstl.com/ns/types/">11</type:month></gYearMonth>
  <myYearMonth><type:century xmlns:type="http://simonstl.com/ns/types/">19</type:century><type:year xmlns:type="http://simonstl.com/ns/types/">70</type:year><type:month xmlns:type="http://simonstl.com/ns/types/">11</type:month></myYearMonth>
</test>
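The transformation above can be approximated with a small SAX2 filter. The sketch below is not the package's implementation: FragmentFilterSketch is a hypothetical class that hard-codes the single rule from this page, uses java.util.regex instead of the Xerces utility, and assumes target elements contain only text. It extends the standard org.xml.sax.helpers.XMLFilterImpl, buffers the character content of a target element, and reports one new element per captured group.

```java
import java.io.StringReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

/** Hypothetical filter with one hard-coded rule; assumes targets hold text only. */
class FragmentFilterSketch extends XMLFilterImpl {
    private static final String TARGET_NS = "http://simonstl.com/ns/test/";
    private static final String TYPE_NS = "http://simonstl.com/ns/types/";
    private static final String[] RESULT_NAMES = {"century", "year", "month"};
    private final Pattern pattern = Pattern.compile("(\\d{2,5})(\\d{2})-(\\d{2})");
    private StringBuilder buffer; // non-null only while inside a target element

    FragmentFilterSketch(XMLReader parent) {
        super(parent);
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts)
            throws SAXException {
        super.startElement(uri, local, qName, atts);
        if (TARGET_NS.equals(uri)
                && (local.equals("gYearMonth") || local.equals("myYearMonth"))) {
            buffer = new StringBuilder(); // start collecting the element's text
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (buffer != null) {
            buffer.append(ch, start, length); // hold text back until endElement
        } else {
            super.characters(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String local, String qName) throws SAXException {
        if (buffer != null) {
            String text = buffer.toString();
            buffer = null;
            Matcher m = pattern.matcher(text);
            if (m.matches()) {
                for (int i = 0; i < RESULT_NAMES.length; i++) {
                    emit(RESULT_NAMES[i], m.group(i + 1));
                }
            } else {
                super.characters(text.toCharArray(), 0, text.length()); // no match: pass through
            }
        }
        super.endElement(uri, local, qName);
    }

    /** Reports one result element, declaring the "type" prefix around it. */
    private void emit(String local, String text) throws SAXException {
        super.startPrefixMapping("type", TYPE_NS);
        super.startElement(TYPE_NS, local, "type:" + local, new AttributesImpl());
        super.characters(text.toCharArray(), 0, text.length());
        super.endElement(TYPE_NS, local, "type:" + local);
        super.endPrefixMapping("type");
    }

    private static String name(String local, String qName) {
        return qName == null || qName.isEmpty() ? local : qName;
    }

    /** Runs the filter and renders the reported events compactly (no xmlns attributes). */
    static String run(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader reader = new FragmentFilterSketch(factory.newSAXParser().getXMLReader());
            StringBuilder out = new StringBuilder();
            reader.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String u, String l, String q, Attributes a) {
                    out.append('<').append(name(l, q)).append('>');
                }
                @Override
                public void endElement(String u, String l, String q) {
                    out.append("</").append(name(l, q)).append('>');
                }
                @Override
                public void characters(char[] ch, int start, int length) {
                    out.append(ch, start, length);
                }
            });
            reader.parse(new InputSource(new StringReader(xml)));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run("<gYearMonth xmlns=\"" + TARGET_NS + "\">1970-11</gYearMonth>"));
    }
}
```

Buffering is needed because a SAX parser may deliver an element's text in several characters() calls, so the pattern can only be applied reliably once the end tag is seen.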
Rules may also be applied recursively to the results of prior rules. For some types - notably dates - this is important for proper processing of complex pieces. For example, we could add a rule which breaks the date information above even further, into individual digits:
<fragmentRule matchPattern="(\d{1})(\d{1})">
  <applyTo>
    <targetElement nsURI="http://simonstl.com/ns/types/" localName="century" />
    <targetElement nsURI="http://simonstl.com/ns/types/" localName="year" />
    <targetElement nsURI="http://simonstl.com/ns/types/" localName="month" />
  </applyTo>
  <produce>
    <resultElement nsURI="http://simonstl.com/ns/types/" localName="digit" prefix="type" />
    <resultElement nsURI="http://simonstl.com/ns/types/" localName="digit" prefix="type" />
  </produce>
</fragmentRule>
When that rule is applied in combination with the previous rule, the results look like:
<?xml version="1.0" standalone="yes"?>
<test xmlns="http://simonstl.com/ns/test/">
  <message>Hello! This document contains a gYearMonth.</message>
  <gYearMonth><type:century xmlns:type="http://simonstl.com/ns/types/"><type:digit>1</type:digit><type:digit>9</type:digit></type:century><type:year xmlns:type="http://simonstl.com/ns/types/"><type:digit>7</type:digit><type:digit>0</type:digit></type:year><type:month xmlns:type="http://simonstl.com/ns/types/"><type:digit>1</type:digit><type:digit>1</type:digit></type:month></gYearMonth>
  <myYearMonth><type:century xmlns:type="http://simonstl.com/ns/types/"><type:digit>1</type:digit><type:digit>9</type:digit></type:century><type:year xmlns:type="http://simonstl.com/ns/types/"><type:digit>7</type:digit><type:digit>0</type:digit></type:year><type:month xmlns:type="http://simonstl.com/ns/types/"><type:digit>1</type:digit><type:digit>1</type:digit></type:month></myYearMonth>
</test>
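The recursive behaviour can be modelled without any SAX machinery. The sketch below is a simplified string-based model, not how the package works internally: RecursiveRulesSketch and its Rule class are hypothetical, element names ignore namespaces and prefixes, and java.util.regex stands in for the Xerces utility. Each element produced by a rule is looked up again, so a second rule can split the output of the first.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical model of recursive rule application; names ignore namespaces. */
class RecursiveRulesSketch {

    /** A pattern plus the element name produced for each capture group, in order. */
    static final class Rule {
        final Pattern pattern;
        final String[] resultNames;

        Rule(String regex, String... resultNames) {
            this.pattern = Pattern.compile(regex);
            this.resultNames = resultNames;
        }
    }

    private final Map<String, Rule> rulesByElement = new HashMap<>();

    /** Registers one rule for every listed target element name. */
    void addRule(Rule rule, String... targetElements) {
        for (String target : targetElements) {
            rulesByElement.put(target, rule);
        }
    }

    /**
     * Serializes an element, re-applying the rules to every element a rule
     * produces. Recursion ends at elements with no rule or no match; a rule
     * whose results match their own pattern again would never terminate.
     */
    String fragment(String element, String text) {
        StringBuilder out = new StringBuilder("<" + element + ">");
        Rule rule = rulesByElement.get(element);
        Matcher m = (rule == null) ? null : rule.pattern.matcher(text);
        if (m != null && m.matches()) {
            for (int i = 0; i < rule.resultNames.length; i++) {
                out.append(fragment(rule.resultNames[i], m.group(i + 1)));
            }
        } else {
            out.append(text); // leaf: report the text unchanged
        }
        return out.append("</").append(element).append(">").toString();
    }

    /** Applies the two rules from this page to a sample gYearMonth value. */
    static String demo() {
        RecursiveRulesSketch sketch = new RecursiveRulesSketch();
        sketch.addRule(new Rule("(\\d{2,5})(\\d{2})-(\\d{2})", "century", "year", "month"),
                "gYearMonth", "myYearMonth");
        sketch.addRule(new Rule("(\\d{1})(\\d{1})", "digit", "digit"),
                "century", "year", "month");
        return sketch.fragment("gYearMonth", "1970-11");
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In a SAX setting the same effect could plausibly be achieved by having the filter feed its own generated events back through rule matching, but that is a design guess rather than a description of the package.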
Developers may find these smaller parts easier to work with.
Future development will focus on:
style attribute
It's very simple stuff, but these kinds of transformations can significantly simplify processing XML content that contains compound chunks of information.
The contents of this package are subject to the Mozilla Public License Version 1.1 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.mozilla.org/MPL/.
Software distributed under the License is distributed on an "AS IS" basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License for the specific language governing rights and limitations under the License.
The Original Code is available at http://simonstl.com/projects/fragment/original.
The Initial Developer of the Original Code is Simon St.Laurent. Portions created by Simon St.Laurent are Copyright (C) 2001 Simon St.Laurent. All Rights Reserved.
Contributor(s):
Download here.