bioxml.org



 

  Distributed XML System (dxs)

 Note: many of the details here are borrowed from Lincoln Stein's DAS (Distributed Annotation System).
 

 RATIONALE:

 The idea behind dxs is that biologists basically need two things: data, 
and a means to analyze that data.  It is also nice to be able to do that 
analysis over the web in an automated fashion.  At present, integrating 
biological data with web-based analysis means cutting and pasting your GenBank 
files into the BLAST window.  Clearly, a better approach would be useful. 

APPROACH:

 dxs is logically separated into two types of servers: data servers 
and application servers.  The data servers store and regurgitate data in the 
form of XML files or XML fragments.  The application servers take XML data, 
do some analysis on it, and return another XML document.  And that, in a 
nutshell, is dxs.
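To make the division of labour concrete, here is a toy Python sketch of the two roles as plain functions, with the HTTP layer left out. Everything in it is hypothetical: the in-memory STORE, the `<seq>` and `<result>` element names, and the "analysis" (a residue count standing in for something like BLAST) are illustrative assumptions, not part of the dxs design.

```python
import xml.etree.ElementTree as ET

# Hypothetical in-memory store for a data server: document name -> XML text.
STORE = {"myseq": "<seq>ACGTACGT</seq>"}


def data_server_get(name: str) -> str:
    """Data server role: return a stored XML document unchanged."""
    return STORE[name]


def app_server_run(xml_in: str) -> str:
    """Application server role: take XML in, analyse it, return new XML.

    The "analysis" is a stand-in that counts residues; a real service
    would run something like BLAST instead.
    """
    seq = (ET.fromstring(xml_in).text or "").strip()
    result = ET.Element("result")
    ET.SubElement(result, "length").text = str(len(seq))
    return ET.tostring(result, encoding="unicode")


# dxs in miniature: fetch from one server, analyse on another.
print(app_server_run(data_server_get("myseq")))
# <result><length>8</length></result>
```

The point is only the shape of the contract: data servers hand out XML, application servers map XML to XML, so any data server can in principle feed any application server that understands its doctype.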

 All of this is done by requesting URLs on these servers.  Say Brad 
(that's me) keeps his data at www.bradmarshall.com (which is remotely hosted) 
and wants to use the BLAST server at www.we_blast_fast.com.  He would go to 
bradmarshall.com and log in to get a list of his available documents.  He would 
then select the appropriate document(s) and choose the application he wants to 
run, either by typing www.we_blast_fast.com into a text box or by picking it 
from a pull-down menu or whatever else the site offers.  bradmarshall.com would 
then send an HTTP POST to we_blast_fast.com containing the selected XML 
document(s).  we_blast_fast.com would run the analysis and send an XML BLAST 
document back to bradmarshall.com in another POST.  At bradmarshall.com, Brad 
could then choose either to use the document temporarily or to save it for later.
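A rough Python sketch of the first leg of that exchange: bradmarshall.com packaging a selected XML document as an HTTP POST to the application server. The endpoint URL is borrowed from the sample /dxs document in the DETAILS section; the `build_analysis_post` helper and the text/xml content type are assumptions, not part of any dxs specification. The request is only constructed here, not sent.

```python
import urllib.request


def build_analysis_post(app_url: str, xml_doc: str) -> urllib.request.Request:
    """Wrap an XML document in an HTTP POST aimed at an application server.

    The request is only constructed, not sent.
    """
    return urllib.request.Request(
        app_url,
        data=xml_doc.encode("utf-8"),
        # Assumed content type; the dxs notes do not specify one.
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


req = build_analysis_post(
    "http://www.we_blast_fast.com/dxs/blast",  # endpoint from the sample /dxs doc
    "<seq>ACGTACGT</seq>",
)
print(req.get_method(), req.full_url)
# POST http://www.we_blast_fast.com/dxs/blast
```

Passing `req` to `urllib.request.urlopen()` would actually perform the POST. The return leg (the BLAST result coming back in another POST) would look the same with the roles of the two servers reversed.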

 Obviously that little scenario glosses over a lot of details, but I 
hope it shows the potential for very generic integration of data and 
applications over the web.  The glue layers are HTTP and XML, two open 
standards that are free and ubiquitous (well, HTTP more so than XML).  
Essentially, all any user of the system would need to know is 1) where their 
data is, 2) where the application they want to use is, and 3) possibly a 
username and password. 

DETAILS:

 So let's talk details.  Every server that is part of the network will 
be required to support one URL: /dxs.  This URL is the gateway to that 
server's services, and it returns, of course, an XML document.  The document 
describes 1) which URLs on that server offer dxs services, 2) which options 
each URL accepts, 3) which doctypes each URL accepts, and 4) which doctypes 
each URL returns, if any.  Here is a sample (very preliminary) document that 
would be returned by http://www.we_blast_fast.com/dxs :

<?xml version='1.0'?>
<dxs>
    <service type="app">
        <name>blastn</name>
        <url>http://www.we_blast_fast.com/dxs/blast</url>
        <doctypes_accepted>seq</doctypes_accepted>
        <doctypes_returned>game</doctypes_returned>
    </service>
</dxs>
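A client such as bradmarshall.com would need to turn this document into something it can build a menu from. Below is a minimal Python sketch using the standard library's ElementTree, assuming the element names in the preliminary document above. The `Service` record and `parse_discovery` function are hypothetical names, and treating a whitespace-separated list as several doctypes is only a guess at how multiple doctypes might be written.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# The sample document returned by http://www.we_blast_fast.com/dxs
DISCOVERY_DOC = """<?xml version='1.0'?>
<dxs>
    <service type="app">
        <name>blastn</name>
        <url>http://www.we_blast_fast.com/dxs/blast</url>
        <doctypes_accepted>seq</doctypes_accepted>
        <doctypes_returned>game</doctypes_returned>
    </service>
</dxs>"""


@dataclass
class Service:
    name: str
    kind: str             # the type attribute, e.g. "app"
    url: str
    accepts: list[str]    # doctypes the service takes as input
    returns: list[str]    # doctypes the service produces


def parse_discovery(xml_text: str) -> list[Service]:
    """Turn a /dxs discovery document into a list of Service records."""
    root = ET.fromstring(xml_text)
    services = []
    for svc in root.findall("service"):
        services.append(Service(
            name=svc.findtext("name", default=""),
            kind=svc.get("type", ""),
            url=svc.findtext("url", default=""),
            # Assumption: several doctypes would be whitespace-separated.
            accepts=(svc.findtext("doctypes_accepted") or "").split(),
            returns=(svc.findtext("doctypes_returned") or "").split(),
        ))
    return services


for s in parse_discovery(DISCOVERY_DOC):
    print(s.name, s.url, s.accepts, s.returns)
```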

 So let's step through the transaction more carefully.  Brad would log 
into bradmarshall.com and specify that he wants to use the services of 
www.we_blast_fast.com.  The server at bradmarshall.com would retrieve the dxs 
document from we_blast_fast.com and present Brad with a list of the services 
available, which in this case would be just blastn, along with the documents 
(or doc-fragments) he has that those services accept.  He would then select 
the documents to run, click the blastn hotlink and be forwarded to 
www.we_blast_fast.com, where he would be presented with whatever options are 
available for a BLAST search.  Finally he would run the search and be 
presented with his results.  He would also have the option to store the newly 
generated document at www.bradmarshall.com.
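The filtering step in that walkthrough, showing Brad only the documents a service accepts, boils down to a doctype match. A small Python sketch, assuming each stored document is tagged with a single doctype; `Doc`, `runnable_docs` and the document names are hypothetical, and treating `any` as a wildcard is an assumption borrowed from the `<doctype_returned>any</doctype_returned>` entry in the data server example below.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    name: str
    doctype: str  # hypothetical: each stored document carries one doctype


def runnable_docs(docs: list[Doc], accepted: list[str]) -> list[Doc]:
    """Keep only the documents whose doctype the chosen service accepts."""
    wanted = set(accepted)
    if "any" in wanted:  # assumption: "any" acts as a wildcard
        return list(docs)
    return [d for d in docs if d.doctype in wanted]


my_docs = [
    Doc("chr1_contig", "seq"),
    Doc("old_blast_run", "game"),
    Doc("primer_set", "seq"),
]
# blastn in the sample /dxs document accepts "seq" documents.
print([d.name for d in runnable_docs(my_docs, ["seq"])])
# ['chr1_contig', 'primer_set']
```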

 Got it?  Alternatively, Brad could start his analysis at 
www.we_blast_fast.com.  In this case he would see their front page, which would 
have an option to get data.  He would enter the URL www.bradmarshall.com, and 
the WBF (we_blast_fast) server would again hit the /dxs URL, this time asking 
only for services that return seq documents:

http://www.bradmarshall.com/dxs?doctype=seq

 and get this document:
 

<dxs>
    <dbservice type="get">
        <name>Retrieve document.</name>
        <url>http://www.bradmarshall.com/dxs/getdoc</url>
        <option required="true">username</option>
        <option required="true" hidden="true">password</option>
        <option>docname</option>
        <option type="integer">maxdocs</option>
        <doctype_returned>any</doctype_returned>
    </dbservice>
</dxs>

 This document describes the URLs at bradmarshall.com that will return a 
seq doctype.  WBF would present the user with a form to fill in their 
username, password and, optionally, docname and/or maxdocs (the maximum number 
of documents to return).  These values would be passed to bradmarshall.com as 
URL query parameters: 

http://www.bradmarshall.com/dxs/getdoc?username=brad&password=******&maxdocs=10
  - although this should be sent as a POST since it includes a password, which 
    would otherwise appear in the URL (and in server logs and browser history)
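A sketch of how WBF might turn the dbservice description into that request, sent as a POST, checking that required options are filled in and that `maxdocs` is an integer. The function name, the form encoding and the sample password are assumptions; the `hidden` attribute is left to the form that collects the values (for example, rendering it as a password field) and is not used here.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# The dbservice document returned by http://www.bradmarshall.com/dxs?doctype=seq
DBSERVICE_DOC = """<dxs>
    <dbservice type="get">
        <name>Retrieve document.</name>
        <url>http://www.bradmarshall.com/dxs/getdoc</url>
        <option required="true">username</option>
        <option required="true" hidden="true">password</option>
        <option>docname</option>
        <option type="integer">maxdocs</option>
        <doctype_returned>any</doctype_returned>
    </dbservice>
</dxs>"""


def build_getdoc_request(xml_text: str, values: dict) -> urllib.request.Request:
    """Build a form-encoded POST for the first <dbservice> in xml_text.

    Required options must be present in `values`; options typed as
    "integer" are checked with int(). Unknown keys in `values` are ignored.
    """
    svc = ET.fromstring(xml_text).find("dbservice")
    params = {}
    for opt in svc.findall("option"):
        key = (opt.text or "").strip()
        if key in values:
            value = values[key]
            if opt.get("type") == "integer":
                value = int(value)  # raises ValueError if not an integer
            params[key] = str(value)
        elif opt.get("required") == "true":
            raise ValueError(f"missing required option: {key}")
    return urllib.request.Request(
        svc.findtext("url"),
        data=urllib.parse.urlencode(params).encode("ascii"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


req = build_getdoc_request(
    DBSERVICE_DOC, {"username": "brad", "password": "secret", "maxdocs": 10}
)
print(req.full_url, req.data)
```

A POST keeps the password out of the URL, but over plain HTTP the request body still travels unencrypted.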

 bradmarshall.com would return the list of matching documents (up to 10 
in this case), which could then be blasted.  Currently I'm thinking the best 
way to return a list of documents might be something like:

<dxs>
    <doclist>
        <domain>www.we_blast_fast.com</domain>
        <doc>Your doc here as CDATA</doc>
        <doc>Your doc here as CDATA</doc>
    </doclist>
</dxs>
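On the receiving side, unpacking such a list is straightforward with ElementTree, because a CDATA section is simply read back as the element's text. A minimal Python sketch, assuming the doclist layout above; the two `<seq>` payloads and the `unpack_doclist` name are made up for illustration.

```python
import xml.etree.ElementTree as ET

# A doclist response with each document wrapped in CDATA, so that its own
# markup is carried as text rather than parsed as part of the dxs envelope.
DOCLIST_DOC = """<dxs>
    <doclist>
        <domain>www.we_blast_fast.com</domain>
        <doc><![CDATA[<seq>ACGTACGT</seq>]]></doc>
        <doc><![CDATA[<seq>TTGACA</seq>]]></doc>
    </doclist>
</dxs>"""


def unpack_doclist(xml_text: str) -> tuple[str, list[str]]:
    """Return the <domain> value and the raw text of each <doc>."""
    doclist = ET.fromstring(xml_text).find("doclist")
    domain = doclist.findtext("domain", default="")
    docs = [d.text or "" for d in doclist.findall("doc")]
    return domain, docs


domain, docs = unpack_doclist(DOCLIST_DOC)
# Each entry is itself an XML document and can be parsed on its own.
sequences = [ET.fromstring(d).text for d in docs]
print(domain, sequences)
# www.we_blast_fast.com ['ACGTACGT', 'TTGACA']
```

One catch with CDATA: a document that itself contains the sequence `]]>` cannot be wrapped in a single CDATA section, so it would need to be split across sections or escaped instead.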

 

 


This site is maintained by Brad Marshall ([email protected])