bio.xml.org - Projects

bioxml.org

Distributed XML System (dxs)

Note: many of the details here are taken from Lincoln Stein's DAS.

RATIONALE:

The idea behind dxs is that biologists need basically two things: data, and
a means to analyze that data. It is also nice to be able to do both over the
web in an automated fashion. The current state of the art in integrating
biological data and analysis on the web is cutting and pasting your GenBank
files into the BLAST window. Clearly, a better approach would be useful.

APPROACH:

dxs is logically separated into two types of servers: data servers and
application servers. The data servers store and regurgitate data in the form
of XML files or XML fragments. The application servers take XML data, run
some analysis on it, and return another XML document. And that, in a
nutshell, is dxs.

All of this is done by calling URLs on these servers. Say Brad (that's me)
keeps his data at www.bradmarshall.com (which is remotely hosted) and knows
that he would like to use the BLAST server at www.we_blast_fast.com. He would
go to bradmarshall.com and log in to get a list of his available documents.
He would then select the appropriate document(s) and choose the application
he wants to run, either by typing www.we_blast_fast.com into a text box or by
picking it from a pull-down menu or whatever else the site offers.
bradmarshall.com would then POST the selected XML document(s) to
we_blast_fast.com. we_blast_fast.com would run the analysis and send an XML
BLAST document back to bradmarshall.com in another POST. At bradmarshall.com,
Brad could then choose either to use the document temporarily or to save it
for later.

Obviously that little scenario glosses over a lot of details but, I hope,
shows the potential for very generic integration of data and applications
over the web. The glue layers are HTTP and XML, two standards that are free
and ubiquitous (well, HTTP more so than XML). Essentially, all any user of
the system would need to know is 1) where their data is, 2) where the
application they would like to use is, and 3) possibly a username and password.
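
As a rough illustration of the hand-off in the scenario above, here is a
minimal Python sketch of how a data server might package one XML document as
an HTTP POST for an application server. The function name, the <seq> payload,
and the text/xml content type are assumptions made for the example; nothing
here is part of dxs itself. The request is only built, not sent.

```python
import urllib.request


def build_analysis_request(app_url, xml_document):
    """Build (but do not send) an HTTP POST carrying one XML document."""
    return urllib.request.Request(
        app_url,
        data=xml_document.encode("utf-8"),
        # Content type is an assumption; dxs does not specify one.
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


# Hypothetical payload, for illustration only.
request = build_analysis_request(
    "http://www.we_blast_fast.com/dxs/blast",
    "<seq>ACGT</seq>",
)
# urllib.request.urlopen(request) would send it; the response body would be
# the XML document produced by the application server.
```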

DETAILS:

So let's talk details. Every server that is part of the network will be
required to support one URL: /dxs. This URL is the gateway to that server's
services and returns, of course, an XML document. The document describes
1) which URLs on that server offer dxs services, 2) which options each URL
accepts, 3) which doctypes each URL accepts, and 4) which doctypes each URL
returns, if any. Here is a sample (very preliminary) document that would be
returned by http://www.we_blast_fast.com/dxs :

<?xml version='1.0'?>
<dxs>
    <service type="app">
        <name>blastn</name>
        <url>http://www.we_blast_fast.com/dxs/blast</url>
        <doctypes_accepted>seq</doctypes_accepted>
        <doctypes_returned>game</doctypes_returned>
    </service>
</dxs>
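
A client could read such a document with any XML parser. The sketch below
uses Python's standard xml.etree.ElementTree on the sample above. The function
name and the shape of the returned dictionaries are assumptions made for
illustration, and since the format is still preliminary the element names may
well change.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<?xml version='1.0'?>
<dxs>
    <service type="app">
        <name>blastn</name>
        <url>http://www.we_blast_fast.com/dxs/blast</url>
        <doctypes_accepted>seq</doctypes_accepted>
        <doctypes_returned>game</doctypes_returned>
    </service>
</dxs>"""


def list_services(dxs_xml):
    """Return one dictionary per <service> element in a /dxs document."""
    root = ET.fromstring(dxs_xml)
    services = []
    for service in root.findall("service"):
        services.append({
            "type": service.get("type"),
            "name": service.findtext("name"),
            "url": service.findtext("url"),
            # Collected as lists in case a service names several doctypes.
            "accepts": [e.text for e in service.findall("doctypes_accepted")],
            "returns": [e.text for e in service.findall("doctypes_returned")],
        })
    return services


services = list_services(SAMPLE)
```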

Let's step through more carefully how the transaction might take place. Brad
would log in to bradmarshall.com and specify that he wants to use the
services of www.we_blast_fast.com. The server at bradmarshall.com would
retrieve the dxs document from that site and present Brad with a list of the
services available (in this case just blastn), along with those of his
documents (or document fragments) that each service accepts. He would then
select the documents to run, click the blastn link, and be forwarded to
www.we_blast_fast.com, where he would be presented with whatever options are
available for a BLAST search. Finally he would run the search and be
presented with his results. He would also have the option of storing the
newly generated document at www.bradmarshall.com.
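
The matching step in this walk-through, where the data server pairs each
advertised service with the stored documents it can accept, might look
something like the following Python sketch. The document names and the
dictionary layout are hypothetical; only the blastn service and the seq and
game doctypes come from the sample above.

```python
def documents_per_service(services, documents):
    """Map each service name to the stored documents it can accept.

    services:  list of dicts with "name" and "accepts" (a list of doctypes)
    documents: dict mapping a document name to its doctype
    """
    return {
        service["name"]: [
            name for name, doctype in documents.items()
            if doctype in service["accepts"]
        ]
        for service in services
    }


# Hypothetical contents of Brad's account on the data server.
documents = {"clone42": "seq", "last_week_blast": "game"}
# What the data server learned from http://www.we_blast_fast.com/dxs
services = [{"name": "blastn", "accepts": ["seq"]}]

choices = documents_per_service(services, documents)
```

Here choices offers only clone42 for blastn, because last_week_blast is a
game document and blastn accepts seq.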

Got it? Alternatively, Brad could start his analysis at
www.we_blast_fast.com. In this case he would see their front page, which would
have an option to get data. He would enter the URL www.bradmarshall.com.
The we_blast_fast.com (WBF) server would, again, hit the /dxs URL:

http://www.bradmarshall.com/dxs?doctype=seq

and get this document:

<dxs>
    <dbservice type="get">
        <name>Retrieve document.</name>
        <url>http://www.bradmarshall.com/dxs/getdoc</url>
        <option required="true">username</option>
        <option required="true" hidden="true">password</option>
        <option>docname</option>
        <option type="integer">maxdocs</option>
        <doctype_returned>any</doctype_returned>
    </dbservice>
</dxs>

This document describes which URLs bradmarshall.com supports that can return
a seq doctype. WBF would present the user with a form to fill in their
username, password, and optionally docname and/or maxdocs (the maximum number
of documents to return). These would be passed as URL options to
bradmarshall.com :

http://www.bradmarshall.com/dxs/getdoc?username=brad&password=******&maxdocs=10
(although this should really be sent as a POST, since it carries a password)
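
To make the form-then-POST idea concrete, here is a hedged Python sketch that
reads the option list from the getdoc service description and builds a POST
request with the values in the request body instead of the URL. The function
names, the default option type of "string", and the placeholder password are
assumptions made for this example; the request is built but never sent.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

SERVICE_XML = """<dxs>
    <dbservice type="get">
        <name>Retrieve document.</name>
        <url>http://www.bradmarshall.com/dxs/getdoc</url>
        <option required="true">username</option>
        <option required="true" hidden="true">password</option>
        <option>docname</option>
        <option type="integer">maxdocs</option>
        <doctype_returned>any</doctype_returned>
    </dbservice>
</dxs>"""


def read_service(dxs_xml):
    """Return the service URL and a description of each option it accepts."""
    service = ET.fromstring(dxs_xml).find("dbservice")
    options = [
        {
            "name": option.text,
            "required": option.get("required") == "true",
            "hidden": option.get("hidden") == "true",  # e.g. a password box
            "type": option.get("type", "string"),      # default is assumed
        }
        for option in service.findall("option")
    ]
    return service.findtext("url"), options


def build_getdoc_request(url, options, values):
    """Check required options, then put the values in a POST body."""
    missing = [o["name"] for o in options
               if o["required"] and o["name"] not in values]
    if missing:
        raise ValueError("missing required options: " + ", ".join(missing))
    known = {o["name"] for o in options}
    body = urllib.parse.urlencode(
        {key: value for key, value in values.items() if key in known}
    )
    return urllib.request.Request(url, data=body.encode("ascii"), method="POST")


url, options = read_service(SERVICE_XML)
request = build_getdoc_request(
    url,
    options,
    {"username": "brad", "password": "not-a-real-password", "maxdocs": 10},
)
```

With the values in the body, the password stays out of the URL, which would
otherwise tend to end up in server logs and browser history.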

bradmarshall.com would return a list of the matching documents (up to 10 in
this case), which could then be blasted. Currently I'm thinking the best way
to return a list of documents might be something like:

<dxs>
    <doclist>
        <domain>www.we_blast_fast.com</domain>
        <doc>Your doc here as CDATA</doc>
        <doc>Your doc here as CDATA</doc>
    </doclist>
</dxs>
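
On the receiving side, unpacking such a list is straightforward, because an
XML parser hands back the contents of a CDATA section as plain text. The
Python sketch below assumes each <doc> wraps a complete XML document; the
<seq> payloads and the function name are made up for the example.

```python
import xml.etree.ElementTree as ET

RESPONSE = """<dxs>
    <doclist>
        <domain>www.we_blast_fast.com</domain>
        <doc><![CDATA[<seq>ACGT</seq>]]></doc>
        <doc><![CDATA[<seq>TTGA</seq>]]></doc>
    </doclist>
</dxs>"""


def extract_documents(doclist_xml):
    """Return the domain and the raw text of every <doc> in a doclist."""
    doclist = ET.fromstring(doclist_xml).find("doclist")
    domain = doclist.findtext("domain")
    # The CDATA wrapper is removed by the parser; each entry is a string
    # that can be parsed again as its own XML document.
    docs = [doc.text for doc in doclist.findall("doc")]
    return domain, docs


domain, docs = extract_documents(RESPONSE)
first_doc = ET.fromstring(docs[0])
```

One thing to watch with this layout: a document that itself contains the
sequence ]]> cannot be embedded in a single CDATA section without being split.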

This site is maintained by Brad Marshall.