As you've probably sussed by now, the idea behind the bioxml strategy is to create several small dtds which can be combined to attack larger problems. This tutorial steps through many of the bioxml dtds to build a relatively complex document. The parser I'll talk about is the Apache Group's Xerces, which is currently available in Java and C++, and soon in Perl.
NOTE: All of the bioxml dtds have their own namespace. These start with bx (short for bioxml). So there is a Bx-Seq namespace, the top element of which is bx-seq:seq.
NOTE: All of the dtds and example files in this tutorial are available at https://www.bioxml.org/dtds/ and https://www.bioxml.org/xml-samples/ (or will be very soon).
The first dtd we'll look at is Bx-Link:link. Its namespace is https://www.bioxml.org/dtds/bx-link/v0_1 . This dtd is used by most other bioxml dtds. It allows the linking of data:
- within the same xml document - ref_link.
- across the web to another xml data source - simple_xlink
- across the web to a non-xml data source - dbxref
This dtd's url is https://www.bioxml.org/dtds/current/link.dtd.
<!ELEMENT bx-link:link (
  bx-link:ref_link | bx-link:simple_xlink | bx-link:dbxref
)>
<!ATTLIST bx-link:link
  xmlns:bx-link CDATA #FIXED "/dtds/bxlink/v0_1/index.html"
>
All linking to xml datasources is done using IDREFs and/or XPointers. Within a document, you simply add an IDREF attribute which points at another element's ID attribute.
NOTE: The ref_link, simple_xlink and dbxref elements have optional xmlns attributes in case another dtd imports only that element and not the entire bx-link dtd.
<!ELEMENT bx-link:ref_link EMPTY>
<!ATTLIST bx-link:ref_link
  xmlns:bx-link CDATA "/dtds/v01/bxlink/index.html"
  bx-link:ref IDREF #REQUIRED
  bx-link:element_name CDATA #REQUIRED
>
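To make the ID/IDREF idea concrete, here is a minimal Java sketch that follows a bx-link:ref_link to its target using the JDK's built-in parser. The record and seq elements and the id attribute name are invented for illustration; only bx-link:ref_link and its two attributes come from the dtd above. It assumes the target's id attribute is declared with type ID, since that is what lets the parser find it.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class RefLinkDemo {
    // Hypothetical document: <record>, <seq> and "id" are made up for this sketch.
    static final String XML =
          "<!DOCTYPE record [\n"
        + "  <!ELEMENT record (seq, bx-link:ref_link)>\n"
        + "  <!ELEMENT seq (#PCDATA)>\n"
        + "  <!ATTLIST seq id ID #REQUIRED>\n"
        + "  <!ELEMENT bx-link:ref_link EMPTY>\n"
        + "  <!ATTLIST bx-link:ref_link\n"
        + "    bx-link:ref IDREF #REQUIRED\n"
        + "    bx-link:element_name CDATA #REQUIRED>\n"
        + "]>\n"
        + "<record><seq id=\"s1\">ACGT</seq>"
        + "<bx-link:ref_link bx-link:ref=\"s1\" bx-link:element_name=\"seq\"/></record>";

    // Follows the first ref_link in the document and describes its target.
    static String resolve(String xml) {
        try {
            // The factory is not namespace-aware by default, so prefixed
            // names are treated as plain names.
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // read the dtd so "id" is typed as an ID
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            Element link = (Element) doc.getElementsByTagName("bx-link:ref_link").item(0);
            Element target = doc.getElementById(link.getAttribute("bx-link:ref"));
            return target.getTagName() + ": " + target.getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not resolve ref_link", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(resolve(XML));
    }
}
```

The bx-link:element_name attribute is not needed for the lookup itself; it just tells a reader (or a program) what kind of element to expect at the other end.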
To go across the web to an XML resource, you need to use XLinks and XPointers. An XLink is similar to an html link, but you have much more control. It references an xml datasource. An XPointer extends the url of the XLink and tells the server which section of the referenced document to return.
XLinks and XPointers are still undersupported, unfortunately, but they're now both finished W3C recommendations, I believe, so hopefully we'll see some progress soon. In the meantime, I expect any bioxml dataservers will fudge an XPointer implementation. They only need to support id: they simply return the element (and its children) with that id.
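As a rough illustration of what such a fudged, id-only XPointer lookup might look like, here is a Java sketch. It is not taken from any bioxml server: the db, entry and name elements, the example.org url and all method names are hypothetical. It assumes the pointer is just a bare id after the # in the href.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.net.URI;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class IdPointerDemo {
    // Hypothetical data source; element and attribute names are made up.
    static final String XML =
          "<!DOCTYPE db [\n"
        + "  <!ELEMENT db (entry+)>\n"
        + "  <!ELEMENT entry (name)>\n"
        + "  <!ATTLIST entry id ID #REQUIRED>\n"
        + "  <!ELEMENT name (#PCDATA)>\n"
        + "]>\n"
        + "<db><entry id=\"e1\"><name>alpha</name></entry>"
        + "<entry id=\"e2\"><name>beta</name></entry></db>";

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // so the dtd's ID declarations are honoured
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
    }

    // Treats the part of the href after '#' as a bare id; returns null if
    // there is no fragment or no element with that id.
    static Element lookup(Document doc, String href) {
        String id = URI.create(href).getFragment();
        return id == null ? null : doc.getElementById(id);
    }

    // Serializes the element and its children, without an xml declaration.
    static String serialize(Element element) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(element), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not serialize element", e);
        }
    }

    public static void main(String[] args) {
        Element hit = lookup(parse(XML), "https://example.org/data.xml#e2");
        System.out.println(serialize(hit));
    }
}
```

A real server would also have to decide what to send back when the id is missing; this sketch just returns null and leaves that choice to the caller.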
For a good introduction to xlinks and xpointers, check out: https://www.brics.dk/~amoeller/XML/linking.html
<!ELEMENT bx-link:simple_xlink (#PCDATA)>
<!ATTLIST bx-link:simple_xlink
  xmlns:bx-link CDATA "/dtds/bxlink/v0_1/index.html"
  xmlns:xlink CDATA #FIXED "https://www.w3.org/1999/xlink"
  xlink:type CDATA #FIXED "simple"
  xlink:href CDATA #REQUIRED
  xlink:role CDATA #IMPLIED
  xlink:title CDATA #IMPLIED
  xlink:show (embed|replace|new) #IMPLIED
  xlink:actuate (auto|user) "user"
>
The other type of link is a more typical database cross-reference (dbxref). This just specifies a database name, url and unique id for the entry you are looking for.
<!ELEMENT bx-link:dbxref (
xmlns:bx-link CDATA #FIXED
<!ELEMENT bx-link:database (#PCDATA)>
bx-link:url CDATA #IMPLIED
<!ELEMENT bx-link:id (#PCDATA)>
bx-link:field CDATA #IMPLIED
Here is a sample xml file which shows a bx-link:simple_xlink. You can validate this file with the Java version of Xerces (assuming you're running linux/unix) with the command:
- java sax.SAXCount -Nwv https://www.bioxml.org/samples/link.xml
The -v flag means validate, -w means warm up the parser before timing, and -N means turn off namespaces so that fully qualified names don't give weird errors. This program doesn't do anything other than validate the document, count its tags, and time the parse.
<!DOCTYPE bx-link:link SYSTEM "/home/brad/tmp/dtds/link.dtd">
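If you don't have the Xerces samples to hand, roughly the same check (validate, then count the tags) can be sketched with the parser that ships in the JDK. This is not the SAXCount source. The cut-down dtd and the sample body below are stand-ins I made up so the example is self-contained, and countValidated is a hypothetical helper name. Namespace processing is switched off, which plays the same role as the -N flag.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class ValidateCountDemo {
    // Cut-down stand-in for link.dtd, kept inline so nothing is fetched.
    static final String DTD =
          "<!DOCTYPE bx-link:link [\n"
        + "  <!ELEMENT bx-link:link (bx-link:simple_xlink)>\n"
        + "  <!ELEMENT bx-link:simple_xlink (#PCDATA)>\n"
        + "  <!ATTLIST bx-link:simple_xlink\n"
        + "    xlink:type CDATA #FIXED \"simple\"\n"
        + "    xlink:href CDATA #REQUIRED>\n"
        + "]>\n";

    // Hypothetical sample body, not the real link.xml.
    static final String SAMPLE =
          "<bx-link:link><bx-link:simple_xlink xlink:href=\"https://www.bioxml.org/\">"
        + "bioxml home</bx-link:simple_xlink></bx-link:link>";

    // Returns the number of start tags seen, or -1 if the document is
    // malformed or fails validation against the inline dtd.
    static int countValidated(String body) {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                count[0]++;
            }
            @Override
            public void error(SAXParseException e) throws SAXParseException {
                throw e; // treat validity errors as fatal
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true);      // like -v
            factory.setNamespaceAware(false); // like -N
            factory.newSAXParser().parse(new InputSource(new StringReader(DTD + body)), handler);
            return count[0];
        } catch (Exception e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        System.out.println(countValidated(SAMPLE));
    }
}
```

Dropping the required xlink:href attribute from the sample should make the validation fail, which is a quick way to convince yourself the dtd is actually being applied.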
Got it? If not, you can probably pick it up as we go on. The bx-link:link elements are used several more times.
On to SeqTutorial.