Very Extensible Linking Language Unafraid of Markup (VELLUM)

1. Founding Points

VELLUM starts from a few basic foundations which lead it to take a different approach from most of the Web-oriented linking approaches currently specified.

2. What VELLUM Does

VELLUM attempts to provide a general-purpose solution to linking which addresses the complexities raised by the W3C's XPointer and XLink specifications by taking a very different approach. VELLUM does not assume that URIs and URI references are adequate to the task of identifying resources, representations, and fragments of representations, and strives to put XML hyperlinking on firmer but still approachable foundations. VELLUM supports and uses URIs and URI references, but offers options that extend those capabilities.

VELLUM is not a general-purpose solution to hypertext linking. VELLUM is intended to be used in cases where precision is important and verbosity is not a problem. While VELLUM could conceivably be mixed with other vocabularies and used to define links within them, it is not designed explicitly for such use. VELLUM is more appropriate for use in cases like external links and linkbases, where the links are stored separately from the resources they connect. (VELLUM's designer hopes that a simpler mechanism for in-line linking will emerge to complement VELLUM.)

VELLUM both builds on the URI framework and goes beyond the URI framework. URIs and URI references may be used within the VELLUM framework if the level of precision they provide is adequate, but developers can specify more information about issues like content-negotiation within the VELLUM framework if they choose. VELLUM also makes it possible to explicitly specify whether a connection involves an abstract resource or a particular concrete representation.

VELLUM also makes it possible for developers to create metadata which applies to their links in a local context. While XLink uses URIs for everything from href to arcrole, VELLUM lets developers use more intelligible identifiers whose meaning is defined within a particular VELLUM context. While VELLUM may be more verbose than a comparable XLink linkbase, it should (if designed thoughtfully) be more readable.

3. Working with VELLUM

VELLUM linkbases contain a "piece" of VELLUM. The simplest piece of VELLUM, with no links whatsoever, looks like:

<piece xmlns="http://simonstl.com/ns/vellum" />

Most pieces of VELLUM will contain at least a connections element and a set of traverse elements:

<piece xmlns="http://simonstl.com/ns/vellum" >
  <connections>
   <traverse>
     <from href="http://www.w3.org/TR/REC-xml#sec-common-syn" />
     <to href="http://www.w3.org/TR/REC-xml-names/#ns-qualnames" />
   </traverse>
   <traverse>
     <from href="http://www.w3.org/TR/REC-xml-names/#ns-qualnames" />
     <to href="http://www.w3.org/TR/REC-xml#sec-common-syn" />
   </traverse>
  </connections>
</piece>

This piece of VELLUM defines traversable links from Section 2.3 of the XML 1.0 Recommendation to Section 3 of Namespaces in XML and back again. Most pieces of VELLUM will define many more traversals than these, but this document illustrates both the one-way nature of traversals and the simplest form of resource addressing that VELLUM offers, URI references.

While one-to-one links are common and useful, there may be times when you want to create many-to-one links without the bother of creating each individual connection or the mess of multiple from and to elements inside a particular traversal. The set element, which may also appear inside the connections element, makes this simple:

<piece xmlns="http://simonstl.com/ns/vellum" >
  <connections>
   <set id="namespaceUses" >
     <member href="http://www.w3.org/TR/REC-xml#sec-common-syn" />
     <member href="http://www.w3.org/TR/xmlschema-2/#QName" />
   </set>
   <traverse>
     <from ref="namespaceUses" />
     <to href="http://www.w3.org/TR/REC-xml-names/#ns-qualnames" />
   </traverse>
  </connections>
</piece>

This first creates a set containing two members. That set is referenced by the ref attribute of the from element, and both of its members will be connected as origins to the Namespaces spec fragment. When ref is used instead of href anywhere in VELLUM, it points to something else (identified by an id attribute) defined in the same piece of VELLUM. The ref and href attributes are mutually exclusive: an element may have one or the other, but not both.
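
As a sketch of what that means, and assuming that a set named in a from element simply stands in for one traversal per member (which is how the description above reads, though the processing model is not yet written), the set-based piece would be equivalent to spelling out each connection by hand:

<piece xmlns="http://simonstl.com/ns/vellum" >
  <connections>
   <!-- first member of the namespaceUses set as origin -->
   <traverse>
     <from href="http://www.w3.org/TR/REC-xml#sec-common-syn" />
     <to href="http://www.w3.org/TR/REC-xml-names/#ns-qualnames" />
   </traverse>
   <!-- second member of the namespaceUses set as origin -->
   <traverse>
     <from href="http://www.w3.org/TR/xmlschema-2/#QName" />
     <to href="http://www.w3.org/TR/REC-xml-names/#ns-qualnames" />
   </traverse>
  </connections>
</piece>

With only two members the saving is small, but the set form grows by one member element per origin rather than one whole traverse element.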

The indirection provided by the ref attribute makes it possible to create lists of resources which have friendlier identifiers than often opaque URI references. The targets element lets you create locally-named resources instead of relying solely on URI references. For example, the same relationships between resources could be described using this markup:

<piece xmlns="http://simonstl.com/ns/vellum" >
  <targets>
    <target id="_xml-names" representation="http://www.w3.org/TR/REC-xml#sec-common-syn" />
    <target id="_xml-schema-qnames" representation="http://www.w3.org/TR/xmlschema-2/#QName" />
    <target id="_xmlns-qual-names" representation="http://www.w3.org/TR/REC-xml-names/#ns-qualnames" />
  </targets>
  
  <connections>
   <set id="namespaceUses" >
     <member ref="_xml-names" />
     <member ref="_xml-schema-qnames" />
   </set>
   <traverse>
     <from ref="namespaceUses" />
     <to ref="_xmlns-qual-names" />
   </traverse>
  </connections>
</piece>

The use of named targets can make the final section defining the connections between those targets much more readable. The use of targets also makes it possible to identify whether a URI is used to refer to a resource (abstract, opaque, negotiable) or a representation (concrete, stream of bits, certain). If those target elements had used the resource attribute instead of the representation attribute, they could have pointed to abstractions. Since these URI references contain fragment identifiers, and the interpretation of fragment identifiers is representation-dependent, these are identifiers for representations.
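
For contrast, a minimal sketch of targets that point to abstractions might drop the fragment identifiers and use the resource attribute instead. The id values here are hypothetical and chosen only for illustration:

<targets>
  <!-- resource marks each URI as an abstract identifier, not a particular stream of bits -->
  <target id="_xml-spec" resource="http://www.w3.org/TR/REC-xml" />
  <target id="_xmlns-spec" resource="http://www.w3.org/TR/REC-xml-names" />
</targets>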

The use of named targets also makes it possible to create more extended definitions of those targets, including more information about how to retrieve a particular resource, the role of the target (this is currently in flux), and processes like fallback among choices if a particular representation provider fails to respond or a preferred representation is unavailable.

4. VELLUM Document Structures

Each VELLUM document is a "piece" of VELLUM. Inside of (or on, to speak metaphorically) the piece of VELLUM is a list of schemes (URI schemes), targets (resources or representations of resources), roles (TBD), and connections. All of these components are optional, though an empty piece element is of little use. VELLUM processors must understand at least the HTTP scheme by default, and may process URI references.

<piece xmlns="http://simonstl.com/ns/vellum">
  <!--schemes-->
  <!--roles-->
  
  <!--targets -->
  <!--connections-->
</piece>
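
As a rough sketch of how these components might sit together, the following piece fills in that outline using only elements described in the sections below. It omits the schemes element, on the assumption that a piece connecting only HTTP resources does not need one, and it declares a role without applying it, since the way roles attach to targets and connections is not yet defined:

<piece xmlns="http://simonstl.com/ns/vellum">
  <roles>
    <role id="TOC">http://example.com/semantics/TOC</role>
  </roles>

  <targets>
    <!-- short form: an abstract resource -->
    <target id="_xml" resource="http://www.w3.org/TR/REC-xml" />
    <!-- verbose form: a URI plus a preferred representation -->
    <target id="_xmlns">
      <URI>http://www.w3.org/TR/REC-xml-names</URI>
      <representation xmlns:mime="http://simonstl.com/ns/ietf/MIME">
        <mime:content-type value="application/xml" />
      </representation>
    </target>
  </targets>

  <connections>
    <!-- one-way link from the XML spec to the Namespaces spec -->
    <traverse>
      <from ref="_xml" />
      <to ref="_xmlns" />
    </traverse>
  </connections>
</piece>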

4.1 Schemes

The schemes element contains information about which URI schemes a VELLUM processor must support in order to manage the processing of the targets. This may eventually be used as a home for information about how to process URNs, possibly with DDDS (the Dynamic Delegation Discovery System), but is currently a placeholder. HTTP must be supported by all VELLUM processors, so VELLUM documents which only connect HTTP resources can skip the schemes element entirely. [Namespace URIs will be used to identify fragment identifier schemes but may also be defined here.]

Note that the IDs used for schemes cannot conflict with each other or with the IDs used for roles, targets, or connections.

4.2 Roles

The roles element effectively provides shortcuts for more verbose URI roles like those used in XLink, but its use is completely optional. A roles element might look like:

  <roles>
    <role id="TOC">http://example.com/semantics/TOC</role>
    <role id="index">http://example.com/semantics/index</role>
  </roles>

Note that the IDs used for roles cannot conflict with each other or with the IDs used for schemes, targets, or connections.

4.3 Targets

[Large portions of this section still represent brainstorming.]

The targets element contains a set of target elements. Each target element must include an id attribute. The target element has two forms. In the short form, the target element may identify the resource or representation to which it points with either a resource or a representation attribute, and is empty. For example, this short target points to Section 2.3 of the XML 1.0 Recommendation:

<target id="_xml-names" representation= "http://www.w3.org/TR/REC-xml#sec-common-syn" />

In the verbose form, that target element takes a URI sub-element as its first child. After that initial URI element, a list of representation elements describing possible interactions with the URI may follow. If there are no representation elements, the target is assumed to be the URI itself. For example, a targets element which contained URI targets for the W3C Namespaces in XML Recommendation and the IETF's RFC 2396 might look like:

<targets>
  <target id="_xmlns">
     <URI>http://www.w3.org/TR/REC-xml-names</URI>
  </target>
  <target id="URIsyntax">
     <URI>http://www.ietf.org/rfc/rfc2396.txt</URI>
  </target>
</targets>

If, however, the _xmlns target needed to point to the application/xml representation of that resource, additional information could appear in the _xmlns target element:

<target id="_xmlns">
  <URI>http://www.w3.org/TR/REC-xml-names</URI>
  <representation xmlns:mime="http://simonstl.com/ns/ietf/MIME">
     <mime:content-type value="application/xml" />
  </representation>
</target>

If the developer was uncertain whether the server would respond to application/xml and wanted to ensure that the representation could fall back to text/xml, the target element might instead look like:

<target id="_xmlns">
  <URI>http://www.w3.org/TR/REC-xml-names</URI>
  <representation xmlns:mime="http://simonstl.com/ns/ietf/MIME">
     <mime:content-type value="application/xml" />
     <mime:content-type value="text/xml" />
  </representation>
</target>

Preference is indicated by sequence - the first mime:content-type element will have precedence. If that fails, the second will be tried. If neither works, the target identifies nothing, though applications may present the URI to the user and request guidance. Multiple levels of options may be represented through containment. For example, if those representations needed to be in UTF-16, they might look like:

<target id="_xmlns">
  <URI>http://www.w3.org/TR/REC-xml-names</URI>
  <representation xmlns:mime="http://simonstl.com/ns/ietf/MIME">
     <mime:content-type value="application/xml">
        <mime:charset value="utf-16" />
     </mime:content-type>
     <mime:content-type value="text/xml">
        <mime:charset value="utf-16" />
     </mime:content-type>
  </representation>
</target>

If those representations needed to be in UTF-16 or UTF-8, they might look like:

<target id="_xmlns">
  <URI>http://www.w3.org/TR/REC-xml-names</URI>
  <representation xmlns:mime="http://simonstl.com/ns/ietf/MIME">
     <mime:content-type value="application/xml">
        <mime:charset value="utf-16" />
        <mime:charset value="utf-8" />
     </mime:content-type>
     <mime:content-type value="text/xml">
        <mime:charset value="utf-16" />
        <mime:charset value="utf-8" />
     </mime:content-type>
  </representation>
</target>

These chains can grow quite long. While explicit chains may be useful in some cases, the "and" and "or" elements allow criteria to be combined without repetition:

<target id="_xmlns">
  <URI>http://www.w3.org/TR/REC-xml-names</URI>
  <representation xmlns:mime="http://simonstl.com/ns/ietf/MIME">
    <and>
      <or>
        <mime:content-type value="application/xml" />
        <mime:content-type value="text/xml" />
      </or>
      <or>
        <mime:charset value="utf-16" />
        <mime:charset value="utf-8" />
      </or>
    </and>
  </representation>
</target>

Applications may either generate q-values from the information set (in the case of HTTP/1.1 content negotiation) or conduct the negotiation as a series of explicit attempts to get particular representations.

[cover media features and other MIME headers, q-value calculation.]

Negotiating for a preferred representation is one part of defining a precise target; once that representation has been retrieved, the use of fragment identifiers may be appropriate. VELLUM's approach to fragment identifiers is based on the W3C XPointer Framework, but uses a different syntax in its application. The W3C does not define an id() scheme, but shorthand pointers need an identifier, so the element() scheme is used. Other schemes use their XPointer names, except for the xmlns scheme. The "namespace binding context" for these pointers is defined by the namespace values currently in use where the pointer appears.

A target that points at Section 2.1 of the XML recommendation in its XML form using the ID value might look like:

  <target id="_xml">
     <URI>http://www.w3.org/TR/REC-xml</URI>
     <representation xmlns:mime="http://simonstl.com/ns/ietf/MIME"
                     xmlns:w3c="http://simonstl.com/ns/w3c/xptr">
       <mime:content-type value="application/xml">
          <fragment><w3c:element>sec-well-formed</w3c:element></fragment>
       </mime:content-type>
     </representation>
  </target>

The same target, written in a different style with a child sequence in the element() scheme, might look like:

  <target id="_xml">
     <URI>http://www.w3.org/TR/REC-xml</URI>
     <representation xmlns:mime="http://simonstl.com/ns/ietf/MIME"
                     xmlns:w3c="http://simonstl.com/ns/w3c/xptr">
       <mime:content-type value="application/xml">
          <fragment><w3c:element>/2/2/1</w3c:element></fragment>
       </mime:content-type>
     </representation>
  </target>

A pointer using the full xpointer() scheme syntax might look like:

  <target id="_xml">
     <URI>http://www.w3.org/TR/REC-xml</URI>
     <representation xmlns:mime="http://simonstl.com/ns/ietf/MIME"
                     xmlns:w3c="http://simonstl.com/ns/w3c/xptr">
       <mime:content-type value="application/xml">
          <fragment>
<w3c:xpointer>id("sec-well-formed")/range-to(id("charsets"))</w3c:xpointer>
          </fragment>
       </mime:content-type>
     </representation>
  </target>

This approach also simplifies the mixing of schemes from other sources, even namespace-identified schemes:

  <target id="_xml">
     <URI>http://www.w3.org/TR/REC-xml</URI>
     <representation xmlns:mime="http://simonstl.com/ns/ietf/MIME"
                     xmlns:w3c="http://simonstl.com/ns/w3c/xptr">
       <mime:content-type value="application/xml">
          <fragment>
<ssl:xpath1>//div2[@id="sec-well-formed"]</ssl:xpath1>
          </fragment>
       </mime:content-type>
     </representation>
  </target>

Developers who want to type less can also use the short forms:

<target id="_xml" resource="http://www.w3.org/TR/REC-xml" />

or:

<target id="_xml" representation="http://www.w3.org/TR/REC-xml#sec-well-formed" />

Resources are treated as abstract opaque identifiers and should be URIs, while representations are explicitly pointers to a default representation. [issues with relative URI references to come]

Note that the IDs used for targets cannot conflict with each other or with the IDs used for roles, schemes, or connections.
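
To illustrate the constraint, here is a hypothetical piece that breaks it. The target URI is invented for the example; the point is only that the same id value appears on a role and on a target, which leaves any ref="index" elsewhere in the piece ambiguous:

<piece xmlns="http://simonstl.com/ns/vellum">
  <roles>
    <role id="index">http://example.com/semantics/index</role>
  </roles>
  <targets>
    <!-- not allowed: "index" is already taken by the role above -->
    <target id="index" resource="http://example.com/docs/index" />
  </targets>
</piece>

Renaming either one (for instance, giving the target an id of indexPage) would resolve the conflict.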

[targets can also define XSLT transformations, XInclude and external entity processing. To come.]

4.4 Connections

VELLUM provides two kinds of connections between resources. Developers can specify one-way connections from one target to another, and they can specify sets which are collections of resources. Sets can themselves be treated as targets, and can be connected to targets or other sets. Sets may be useful both for abbreviation in the creation of multi-ended links and for the description of resource groups which don't have traversal semantics.

Defining a set is done using the set element, which may contain one or more member elements. The ref attributes of these member elements must contain an IDREF pointing to IDs of elements which are themselves either set elements or target elements:

   <set id="_xmlSpecs">
     <member ref="_xml" />
     <member ref="_xmlns" />
   </set>
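
Because a member may point to another set as well as to a target, sets can be nested. A minimal sketch, assuming the _xml, _xmlns, and URIsyntax targets from Section 4.3 and a hypothetical outer set named coreSpecs:

<connections>
   <set id="_xmlSpecs">
     <member ref="_xml" />
     <member ref="_xmlns" />
   </set>
   <!-- coreSpecs contains one set and one individual target -->
   <set id="coreSpecs">
     <member ref="_xmlSpecs" />
     <member ref="URIsyntax" />
   </set>
</connections>

Whether a processor flattens coreSpecs into three targets or preserves the grouping is a question for the processing model, which has not yet been written.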

[Possible additional metadata for sets to come. May also support IDREFS for ref attribute, or switch to href="#ID".]

Alternatively, the member elements may have URI references contained in href attributes. While this approach has all the drawbacks that URI references themselves have, it may be appropriate in some cases where the precision of the target element is unnecessary. For example:

   <set id="_xmlSpecs">
     <member href="http://www.w3.org/TR/REC-xml" />
     <member href="http://www.w3.org/TR/REC-xml-names" />
   </set>

Sets are just sets, with no traversal semantics. Creating navigable links in VELLUM requires using the traverse element to establish one-way connections. Every traverse element must contain a from child element and a to child element. These elements have ref and href attributes (which, like their counterparts on the member element, are mutually exclusive) and together they define a one-way path:

   <traverse>
      <from ref="_xml" />
      <to ref="_xmlns" />
   </traverse>

Or, using href and URI references:

   <traverse>
     <from href="http://www.w3.org/TR/REC-xml" />
     <to href="http://www.w3.org/TR/REC-xml-names" />
   </traverse>

The from element and to element may also reference sets rather than individual resources:

   <traverse>
      <from ref="_xmlSpecs" />
      <to ref="URIspecs" />
   </traverse>

Creating traversal paths between sets may result in the creation of unwanted paths. VELLUM permits the blocking of paths through the use of sentry elements. Blocked connections have sentries on them. In this first version of VELLUM, the presence of a sentry element simply blocks the creation of a traversal from one named resource or set to another, even if such a path is specified elsewhere.

For example, to block a link from the XML spec to the namespaces spec:

   <traverse>
      <from ref="_xml" />
      <to ref="_xmlns" />
      <sentry />
   </traverse>

The currently empty sentry element may be used to introduce conditional links in future versions of VELLUM, but presently functions only as a blocker.
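
Putting sets and sentries together, the following sketch connects a set to a single target and then blocks one of the resulting paths. It assumes that a sentry on a path between two named targets also suppresses that path when it arises from a set, which is how the description above reads but is not yet confirmed by a processing model:

<piece xmlns="http://simonstl.com/ns/vellum">
  <targets>
    <target id="_xml" resource="http://www.w3.org/TR/REC-xml" />
    <target id="_xmlns" resource="http://www.w3.org/TR/REC-xml-names" />
    <target id="URIsyntax" resource="http://www.ietf.org/rfc/rfc2396.txt" />
  </targets>
  <connections>
    <set id="_xmlSpecs">
      <member ref="_xml" />
      <member ref="_xmlns" />
    </set>
    <!-- would connect both _xml and _xmlns to URIsyntax -->
    <traverse>
      <from ref="_xmlSpecs" />
      <to ref="URIsyntax" />
    </traverse>
    <!-- the sentry blocks the path from _xmlns to URIsyntax -->
    <traverse>
      <from ref="_xmlns" />
      <to ref="URIsyntax" />
      <sentry />
    </traverse>
  </connections>
</piece>

Under that assumption, the only traversal left is the one from _xml to URIsyntax.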

[add role attribute to set, traversal].

5. Processing model

[To come]

Appendix A: Why Not XLink/XPointer?

XLink and XPointer have an impressive vision, but the fundamental approach chosen by their designers creates complicated problems. XPointer in particular suffers from the cramped space provided by fragment identifiers within URI references, attempting to provide far more capability than is consistent with URI reference usage. XLink combines verbose values with a compressed syntax.

The only parts of the existing Web infrastructure that the XLink developers seem to have appreciated are HTTP GET and URIs. XLink's developers made the naive assumption that URI references could work as simply for XML as they had for HTML. XPointer has transformed into a horrible mash as a result, and claims that XPointer is "simple" seem to rely on a set of assumptions borrowed from HTML practice. What XLink got from HTML was URI references. What XLink forgot about HTML was everything else that made it work well.

XLink's use of attributes may make sense if one sees hypertext linking as merely a matter of adding metadata to existing markup structures. Unfortunately, this choice imposes serious constraints in contexts (especially linkbases) where the markup structures being created are in fact dedicated to hypertext linking. In particular, XLink's reliance on defaulted attributes ties it much more strongly to the old SGML universe and the unreliable case of XML parsers that actually read the DTDs. Its use of URIs to identify roles attempts to encapsulate huge amounts of information in attribute values, relying on the wide-open nature of URIs to convey complex messages that cannot normally fit within the small semantic space of an attribute value.

XPointer suffers even more severely from the attribute constraint. While XPointer's original goals were fairly ambitious, they have since been extended by the need to work with XML namespaces, leading to the creation of multiple schemes within the space of a single fragment identifier and the odd case of nesting URI references within URI references in order to make declarations so that those schemes can function. While XPointer hasn't gone to the effort that XQuery has in reinventing XML syntax, the namespace declarations and fallback processes suggest that XPointer is indeed a new case of compound (and element-like) structures being forced into attribute syntax.

VELLUM breaks out of the attribute annotation approach to provide hyperlinking with more room to express relationships precisely. While it is certainly more verbose than XLink and XPointer, it also allows a tighter fit with more Web mechanisms than those specifications permit.

Appendix B: Revisions

15 January 2003 - Changed title. (Thanks to Elliotte Rusty Harold.) Changed ID values starting with xml to _xml per XML 1.0 reservation. Clarified XPointer element() scheme by striking section about shorthand pointers. (Thanks to John Cowan.) Added mention of DDDS in scheme section.

VELLUM Copyright 2003 Simon St.Laurent.