Finding an open parser
do provide access in some form to their internal workings; the Xerces Native Interface (XNI) is one example.
Tree-walking or streams
A streaming parser makes the most sense to me for this kind of processing, though an application could conceivably walk a tree and populate (or repopulate) its entities after the tree has been constructed.
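To make the streaming option concrete, here is a minimal sketch that uses Python's standard `xml.parsers.expat` module as a stand-in for a streaming parser. It is not the toolkit discussed in this article, and the function name `stream_events` and the event tuples are my own inventions for illustration. The sketch relies on Expat's documented behavior that installing a default handler stops internal entity references from being expanded, so they pass through the stream as separate events.

```python
import xml.parsers.expat


def stream_events(xml_text):
    """Return a list of (kind, ...) tuples in the order Expat reports them.

    Hypothetical event shapes, chosen for this sketch only:
      ("decl", name, value)  an internal entity declaration
      ("text", data)         ordinary character data
      ("ref", name)          an unexpanded entity reference
    """
    events = []
    parser = xml.parsers.expat.ParserCreate()

    def on_entity_decl(name, is_param, value, base,
                       system_id, public_id, notation):
        events.append(("decl", name, value))

    def on_text(data):
        events.append(("text", data))

    def on_default(data):
        # The default handler also receives markup that has no handler of
        # its own (start tags, DOCTYPE pieces), so keep only references.
        if data.startswith("&") and data.endswith(";"):
            events.append(("ref", data[1:-1]))

    parser.EntityDeclHandler = on_entity_decl
    parser.CharacterDataHandler = on_text
    # Setting DefaultHandler (rather than DefaultHandlerExpand) tells Expat
    # not to expand internal entity references.
    parser.DefaultHandler = on_default
    parser.Parse(xml_text, True)
    return events


if __name__ == "__main__":
    doc = '<!DOCTYPE d [<!ENTITY who "world">]><d>Hello &who;!</d>'
    for event in stream_events(doc):
        print(event)
```

An application consuming this stream sees the declaration before any reference to it, which is what makes substituting (or deliberately not substituting) entity values on the fly practical.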
Variations in (optional) access
Even among APIs, there are widely varying degrees of access, and even in the (relatively generous) DOM, "references to predefined entities are considered to be expanded by the HTML or XML processor so that characters are represented by their Unicode equivalent rather than by an entity reference. Moreover, the XML processor may completely expand references to entities while building the Document, instead of providing EntityReference nodes."
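The expansion the DOM specification permits is easy to observe. The following sketch uses Python's standard `xml.dom.minidom` as one example of a DOM implementation. The helper names `text_of` and `has_entity_reference_nodes` are hypothetical, and other DOM implementations may behave differently, since the specification leaves the choice to the processor.

```python
from xml.dom import minidom

# One internal entity (&who;) and one predefined entity (&amp;).
DOC = '<!DOCTYPE d [<!ENTITY who "world">]><d>Hello &who; &amp; friends</d>'


def text_of(xml_text):
    """Concatenate the text children of the document element."""
    root = minidom.parseString(xml_text).documentElement
    return "".join(
        node.data for node in root.childNodes
        if node.nodeType == node.TEXT_NODE
    )


def has_entity_reference_nodes(xml_text):
    """Report whether any child of the document element is an EntityReference."""
    root = minidom.parseString(xml_text).documentElement
    return any(
        node.nodeType == node.ENTITY_REFERENCE_NODE
        for node in root.childNodes
    )


if __name__ == "__main__":
    print(text_of(DOC))
    print(has_entity_reference_nodes(DOC))
```

By the time the application sees the tree, both references have already been replaced with their text, and nothing in the tree records where an entity reference once stood.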
I wrote a toolkit last year that provides access to all of these parts as they emerge from an early stage of processing. Streams are provided. All that remains (for an experiment, anyway) is to expand Ents to cover more cases and combine it with the Ripper parser.