BioXML Wiki   

XML is a rapidly developing set of technologies that can help us transfer and visualize data, particularly over the web. If you're looking for a general XML tutorial, a good resource is (surprise) If you know the basics, read on...

BioXML Basics

  • DTD-let.

    The most basic building block in BioXML is something called a DTD-let. We're looking for a better name for that, so if you have any suggestions... Anyway, a DTD-let is a small, useful, non-trivial DTD that solves one small (bio-data) problem and can easily be combined with other DTDs to attack larger problems. Yeah, that's a bit of a mouthful.

    So, an example DTD-let is the BioXML SeqDtd. It represents a bio-sequence. Bio-sequences are at the root of much of bioinformatics, so this is an important concept to build upon. The SeqDtd is general enough to represent any bio-sequence and doesn't try to do anything else. This is the essence of a DTD-let: it does a small, important job well and can act as a building block.

    For one final DTD-let example, check out the ComputationDtd. It uses a FeatureDtd, which in turn uses a DbxrefDtd.
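
    To make the layering concrete, here is a minimal sketch of how three DTD-lets could be chained with external parameter entities. The file names, element names, and content models are all hypothetical and are not taken from the actual BioXML DTDs; each commented section stands for a separate file.

    ```xml
    <!-- ===== dbxref.dtd (hypothetical): a database cross-reference ===== -->
    <!ELEMENT dbxref (database, accession)>
    <!ELEMENT database (#PCDATA)>
    <!ELEMENT accession (#PCDATA)>

    <!-- ===== feature.dtd (hypothetical): pulls in dbxref.dtd ===== -->
    <!ENTITY % dbxref-dtdlet SYSTEM "dbxref.dtd">
    %dbxref-dtdlet;
    <!ELEMENT feature (name, dbxref*)>
    <!ELEMENT name (#PCDATA)>

    <!-- ===== computation.dtd (hypothetical): pulls in feature.dtd ===== -->
    <!ENTITY % feature-dtdlet SYSTEM "feature.dtd">
    %feature-dtdlet;
    <!ELEMENT computation (program, feature*)>
    <!ELEMENT program (#PCDATA)>
    ```

    Each file declares only the elements that are new to it and leaves everything else to the DTD-let it includes, which is the building-block idea described above.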

  • IDs and IDREFs

    Typically, most XML information is embedded in nested tags. But the XML spec provides another way to link data points: elements can have ID attributes and IDREF attributes. An ID attribute gives an element a name that is unique within the document, and an IDREF attribute on another element refers to that name. We've found that linking data this way, rather than nesting tags, can often eliminate redundancies and simplify the data model. Keep an eye out for these connections, as they are an integral part of the data model as well as of BioXlinks.

    <xml><toad id="ae3"/><prince toad="ae3"/></xml>
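
    An attribute only behaves as an ID or IDREF if the DTD declares it that way. The following is a sketch of what such declarations might look like for the toad/prince example; the pond root element and the content models are assumptions added for illustration.

    ```xml
    <?xml version="1.0"?>
    <!DOCTYPE pond [
      <!-- "pond" is a hypothetical root element for this sketch -->
      <!ELEMENT pond (toad | prince)*>
      <!ELEMENT toad EMPTY>
      <!ELEMENT prince EMPTY>
      <!-- ID: the value must be unique among all ID values in the document -->
      <!ATTLIST toad id ID #REQUIRED>
      <!-- IDREF: the value must match the ID of some element in the document -->
      <!ATTLIST prince toad IDREF #REQUIRED>
    ]>
    <pond>
      <toad id="ae3"/>
      <prince toad="ae3"/>
    </pond>
    ```

    With these declarations, a validating parser should reject the document if two elements share an ID value or if the prince points at an ID that does not exist.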

  • Namespaces

    Now this is where the fun starts. Namespaces are essential for XML in general, and for BioXML in particular. If you count the DTD-lets in biology that require a "database" tag, you'll come up with pretty much all of them. But how do you know whether you want dbxref/database or computation/program/database?

    That is where namespaces come into play. You can prefix each tag name with a namespace prefix followed by a colon. So those become bx-dbxref:database and bx-computation:database.

    And oh yeah, while I'm at it, the prefix to most BioXML namespaces is bx. That's short for BioXML. Get it?
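
    As a rough illustration, a document that needs both kinds of "database" tag could look like the sketch below. The record wrapper element, the namespace URIs, and the element contents are placeholders invented for this example, not actual BioXML values.

    ```xml
    <?xml version="1.0"?>
    <!-- "record" and both example.org URIs are hypothetical placeholders -->
    <record xmlns:bx-dbxref="http://example.org/bx-dbxref"
            xmlns:bx-computation="http://example.org/bx-computation">
      <!-- the database a cross-reference points into -->
      <bx-dbxref:database>GenBank</bx-dbxref:database>
      <!-- the database a program was run against -->
      <bx-computation:database>nr</bx-computation:database>
    </record>
    ```

    A namespace-aware parser treats the two elements as different names because their prefixes map to different URIs, even though the local name is "database" in both cases.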

    So where does the problem lie? Namespaces were hacked onto XML after the XML 1.0 spec was released. And aye, there's the rub. DTD parsers don't recognize namespaces. They see bx-dbxref:database as the single name bx-dbxref:database instead of as database in the bx-dbxref namespace. The current solution is to 1) use the fully qualified prefix:tagname in your DTD AND in your XML document and 2) make all xmlns:bx-* attributes explicit in the DTD and the document.
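
    A sketch of that workaround, under assumed names, might look like this. The element set, the namespace URI, and the sample values are hypothetical; the point is only that the DTD spells out the prefixed names literally and declares the xmlns attribute like any other attribute.

    ```xml
    <?xml version="1.0"?>
    <!DOCTYPE bx-dbxref:dbxref [
      <!-- 1) every name carries its prefix, exactly as it appears in the document -->
      <!ELEMENT bx-dbxref:dbxref (bx-dbxref:database, bx-dbxref:accession)>
      <!ELEMENT bx-dbxref:database (#PCDATA)>
      <!ELEMENT bx-dbxref:accession (#PCDATA)>
      <!-- 2) the namespace declaration is an ordinary attribute to the DTD,
              so it has to be declared; the URI here is a placeholder -->
      <!ATTLIST bx-dbxref:dbxref
                xmlns:bx-dbxref CDATA #FIXED "http://example.org/bx-dbxref">
    ]>
    <bx-dbxref:dbxref xmlns:bx-dbxref="http://example.org/bx-dbxref">
      <bx-dbxref:database>GenBank</bx-dbxref:database>
      <bx-dbxref:accession>EXAMPLE0001</bx-dbxref:accession>
    </bx-dbxref:dbxref>
    ```

    The cost of this approach is that the prefix is effectively frozen: a document that binds the same URI to a different prefix would be fine for a namespace-aware parser but would fail validation against this DTD.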

OK, on to the BioXMLTutorial

This page last edited on 12 Sep 2000
Version: 1.23