Because Word now puts so much content into its (non-XML) HTML, it's relatively easy to convert that HTML to XML without losing information.
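As a rough illustration of what such a conversion involves, here is a minimal sketch in Python using only the standard library. It is not the method any particular tool uses. The names `HtmlToXml` and `html_to_xml` are hypothetical, and the sketch assumes the input has a single root element and declares any namespace prefixes it uses (as Word's `<html>` element normally does). It quotes attribute values, self-closes empty elements such as `<br>`, closes elements left open, and re-escapes text. Comments and declarations are simply dropped.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape, quoteattr
import xml.etree.ElementTree as ET

# HTML elements that never take a closing tag; XML needs them self-closed.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class HtmlToXml(HTMLParser):
    """Hypothetical helper: re-serializes tag-soup HTML as well-formed XML."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []        # pieces of the XML output
        self.open_tags = []  # stack of elements still waiting for a close

    def handle_starttag(self, tag, attrs):
        seen, parts = set(), [tag]
        for name, value in attrs:
            if name in seen:  # XML forbids duplicate attributes
                continue
            seen.add(name)
            # A valueless attribute (e.g. "disabled") gets its own name as value.
            parts.append(name + "=" + quoteattr(name if value is None else value))
        if tag in VOID_TAGS:
            self.out.append("<" + " ".join(parts) + "/>")
        else:
            self.out.append("<" + " ".join(parts) + ">")
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Ignore stray closes; otherwise close everything opened since `tag`.
        if tag in VOID_TAGS or tag not in self.open_tags:
            return
        while True:
            top = self.open_tags.pop()
            self.out.append("</" + top + ">")
            if top == tag:
                break

    def handle_data(self, data):
        self.out.append(escape(data))  # re-escape &, < and >

    def result(self):
        self.close()  # flush any buffered text
        while self.open_tags:  # close whatever the source left open
            self.out.append("</" + self.open_tags.pop() + ">")
        return "".join(self.out)


def html_to_xml(html):
    parser = HtmlToXml()
    parser.feed(html)
    return parser.result()


if __name__ == "__main__":
    xml = html_to_xml("<p class=MsoNormal>Fish &amp; chips<br><b>bold</p>")
    print(xml)
    print(ET.fromstring(xml).tag)  # parses only if the result is well-formed
```

Because the parser keeps every element, attribute, and text node it sees, the output carries the same content as the input in a form that an XML parser such as `xml.etree.ElementTree` will accept.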