Distributed XML System (dxs)
Note: many of the details here are borrowed from Lincoln Stein's
DAS (Distributed Annotation System).
RATIONALE:
The idea behind dxs is that biologists basically need two things. The
first is data; the second is a means to analyze that data. It is also
nice to be able to do both over the web in an automated fashion. The
current state of integration between biological data and web-based
analysis tools is cutting and pasting your GenBank files into the
BLAST window. Clearly, a better approach would be useful.
APPROACH:
dxs is logically separated into two types of servers: data servers and
application servers. The data servers store and regurgitate data in
the form of XML files or XML fragments. The application servers take
XML data, do some analysis on it, and return another XML document. And
that, in a nutshell, is dxs.
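The application-server half of that split can be sketched as a plain function: XML in, XML out. This is a minimal Python sketch, not part of dxs itself; the `<seq>` input element, the `<analysis>` output element, and the GC-content calculation (a toy stand-in for a real analysis such as BLAST) are all hypothetical, and the HTTP layer is left out.

```python
import xml.etree.ElementTree as ET


def analyze(xml_in: str) -> str:
    """Hypothetical application-server core: take an XML document,
    run a toy analysis on it, and return another XML document."""
    root = ET.fromstring(xml_in)
    # Assumption: the root tag names the doctype ("seq" in the samples).
    if root.tag != "seq":
        raise ValueError(f"unsupported doctype: {root.tag}")

    # Collapse all text content, ignoring whitespace, into one sequence.
    residues = "".join("".join(root.itertext()).split()).upper()

    # Toy analysis standing in for BLAST: GC content of the sequence.
    gc = sum(1 for base in residues if base in "GC")
    fraction = gc / len(residues) if residues else 0.0

    result = ET.Element("analysis", {"type": "gc_content"})
    ET.SubElement(result, "length").text = str(len(residues))
    ET.SubElement(result, "gc_fraction").text = f"{fraction:.3f}"
    return ET.tostring(result, encoding="unicode")


print(analyze("<seq>ACGTGC</seq>"))
```

Because the function only ever sees and produces XML text, the same core could sit behind any web server; that is the property the dxs split relies on.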
All of this is done by calling URLs on these servers. Let's say Brad
(that's me) has his data at www.bradmarshall.com (which is remotely
hosted) and knows that he would like to use the BLAST server at
www.we_blast_fast.com. He would go to bradmarshall.com and log in to
get a list of his available documents. He would then select the
appropriate document(s) and choose the application he wants to run,
either by typing www.we_blast_fast.com into a text box or by picking
it from a pull-down menu or whatever else the site offers.
bradmarshall.com would then do an HTTP POST to we_blast_fast.com,
sending it the appropriate XML document(s). we_blast_fast.com would
run the analysis and send an XML BLAST document back to
bradmarshall.com in another POST. At bradmarshall.com, Brad could then
choose either to use the document temporarily or to save it for later.
Obviously, that little scenario glosses over a lot of details, but I
hope it shows the potential for very generic integration of data and
applications over the web. The glue layers are HTTP and XML, two
standards that are free and ubiquitous (well, HTTP more so than XML).
Essentially, all any user of the system would need to know is 1) where
their data is, 2) where the application they would like to use is, and
3) possibly a username and password.
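The POST exchange in the scenario needs nothing beyond plain HTTP. As a hedged illustration, here is how bradmarshall.com might build (without sending) the request that carries an XML document to the application server, using only Python's standard library. The helper name `build_analysis_request` and the `text/xml` content type are assumptions; dxs does not yet specify either.

```python
import urllib.request


def build_analysis_request(app_url: str, xml_doc: str) -> urllib.request.Request:
    """Build (but do not send) an HTTP POST that carries an XML document
    to a dxs application server. The text/xml content type is an
    assumption, not something dxs specifies."""
    return urllib.request.Request(
        app_url,
        data=xml_doc.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


req = build_analysis_request(
    "http://www.we_blast_fast.com/dxs/blast",
    "<?xml version='1.0'?><seq>ACGTACGT</seq>",
)
# Sending it would be urllib.request.urlopen(req). Per the scenario, the
# BLAST result would come back later in a separate POST to bradmarshall.com.
```

Since the result travels back as another POST, the same helper would work in the other direction, with bradmarshall.com's URL as the target.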
DETAILS:
So let's talk details. Every server that is part of the network will
be required to support one URL: /dxs. This URL is the gateway to that
server's services, and it will return, of course, an XML document. The
document will describe 1) which URLs the server has open with dxs
services, 2) which options each URL accepts, 3) which doctypes each
URL accepts, and 4) which doctypes each URL returns, if any. So let's
look at a sample (very preliminary) document that would be returned
from this URL:
http://www.we_blast_fast.com/dxs
<?xml version='1.0'?>
<dxs>
  <service type="app">
    <name>blastn</name>
    <url>http://www.we_blast_fast.com/dxs/blast</url>
    <doctypes_accepted>seq</doctypes_accepted>
    <doctypes_returned>game</doctypes_returned>
  </service>
</dxs>
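A client such as bradmarshall.com would need to turn this document into something it can work with. A minimal Python sketch using the standard-library ElementTree parser follows; the `Service` record and `parse_dxs` function are hypothetical names, and splitting `doctypes_accepted`/`doctypes_returned` on whitespace assumes several doctypes would be listed space-separated, which the preliminary format does not yet say.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List


@dataclass
class Service:
    kind: str           # the service's type attribute, e.g. "app"
    name: str
    url: str
    accepts: List[str]  # doctypes the URL accepts
    returns: List[str]  # doctypes the URL returns


def parse_dxs(xml_text: str) -> List[Service]:
    """Parse a (preliminary) /dxs document into Service records."""
    root = ET.fromstring(xml_text)
    services = []
    for svc in root.findall("service"):
        services.append(Service(
            kind=svc.get("type", ""),
            name=svc.findtext("name", "").strip(),
            url=svc.findtext("url", "").strip(),
            # Assumption: multiple doctypes would be space-separated.
            accepts=svc.findtext("doctypes_accepted", "").split(),
            returns=svc.findtext("doctypes_returned", "").split(),
        ))
    return services


SAMPLE = """<?xml version='1.0'?>
<dxs>
  <service type="app">
    <name>blastn</name>
    <url>http://www.we_blast_fast.com/dxs/blast</url>
    <doctypes_accepted>seq</doctypes_accepted>
    <doctypes_returned>game</doctypes_returned>
  </service>
</dxs>"""

for service in parse_dxs(SAMPLE):
    print(service.name, service.url, service.accepts, service.returns)
```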
So let's step through more carefully how the transaction might take
place. Brad would log into bradmarshall.com and specify that he wants
to use the services of www.we_blast_fast.com. The server at
bradmarshall.com would go out and retrieve the dxs document. It would
then present Brad with the list of available services (in this case
just blastn), along with the documents (or document fragments) he has
that those services accept. He would then select the documents to run,
click the blastn link, and be forwarded to www.we_blast_fast.com,
where he would be presented with whatever options are available for a
BLAST search. Finally, he would run the search and be presented with
his results. He would also have the option to store his newly
generated document at www.bradmarshall.com.
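The step where bradmarshall.com lists the documents "accepted by those services" is essentially a filter on doctype. A minimal Python sketch, assuming each stored document is tagged with a single doctype; `runnable_documents` and the document names are made up, and treating "any" as a wildcard on the accepting side is an assumption (the format so far only uses "any" for returned doctypes).

```python
from typing import Dict, List


def runnable_documents(
    services: Dict[str, List[str]],  # service name -> doctypes it accepts
    documents: Dict[str, str],       # document name -> its doctype
) -> Dict[str, List[str]]:
    """For each service, list (sorted) the user's documents it can accept."""
    matches = {}
    for service, accepted in services.items():
        matches[service] = sorted(
            name
            for name, doctype in documents.items()
            # Assumption: "any" acts as a wildcard doctype.
            if "any" in accepted or doctype in accepted
        )
    return matches


# Hypothetical example data.
services = {"blastn": ["seq"]}
brads_documents = {
    "chr2L_fragment": "seq",
    "old_blast_run": "game",
    "contig_17": "seq",
}
print(runnable_documents(services, brads_documents))
```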
Got it? Alternatively, Brad could start his analysis at
www.we_blast_fast.com. In this case he would see their front page,
which would have an option to get data. He would enter the URL
www.bradmarshall.com, and the WBF server would again hit the /dxs URL,
this time asking for services that return a seq doctype:
http://www.bradmarshall.com/dxs?doctype=seq
and would get back this document:
<dxs>
  <dbservice type="get">
    <name>Retrieve document.</name>
    <url>http://www.bradmarshall.com/dxs/getdoc</url>
    <option required="true">username</option>
    <option required="true" hidden="true">password</option>
    <option>docname</option>
    <option type="integer">maxdocs</option>
    <doctype_returned>any</doctype_returned>
  </dbservice>
</dxs>
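The attributes on each `<option>` carry enough information for WBF to generate its form automatically. Here is a hedged Python sketch: `FormField` and `form_fields` are hypothetical names, and treating an option with no `type` attribute as a plain string (and a missing `required` or `hidden` attribute as false) is an assumption the preliminary format does not yet pin down.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List


@dataclass
class FormField:
    name: str
    required: bool
    hidden: bool  # e.g. render as a password box rather than plain text
    kind: str     # value of the type attribute; "string" if absent (assumed)


def form_fields(dxs_xml: str) -> List[FormField]:
    """Turn every <option> in a /dxs document into a form-field description."""
    root = ET.fromstring(dxs_xml)
    return [
        FormField(
            name=(opt.text or "").strip(),
            required=opt.get("required") == "true",
            hidden=opt.get("hidden") == "true",
            kind=opt.get("type", "string"),
        )
        for opt in root.iter("option")
    ]


BRAD_DXS = """<dxs>
  <dbservice type="get">
    <name>Retrieve document.</name>
    <url>http://www.bradmarshall.com/dxs/getdoc</url>
    <option required="true">username</option>
    <option required="true" hidden="true">password</option>
    <option>docname</option>
    <option type="integer">maxdocs</option>
    <doctype_returned>any</doctype_returned>
  </dbservice>
</dxs>"""

for field in form_fields(BRAD_DXS):
    print(field)
```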
This document describes which URLs on bradmarshall.com will return a
seq doctype. A form would be presented to the user at WBF to fill in
their username and password, and optionally docname and/or maxdocs
(the maximum number of documents). These would be passed as URL
options to bradmarshall.com:
http://www.bradmarshall.com/dxs/getdoc?username=brad&password=******&maxdocs=10
(although, since a password is involved, this should really be handled
as a POST).
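Moving those options into a POST body is straightforward with Python's standard library. In this sketch `getdoc_body` is a hypothetical helper that produces an `application/x-www-form-urlencoded` body for the getdoc URL. This keeps the password out of URLs and server logs, but it does not encrypt it; over plain HTTP it still travels in cleartext.

```python
from typing import Optional
from urllib.parse import urlencode


def getdoc_body(username: str, password: str,
                docname: Optional[str] = None,
                maxdocs: Optional[int] = None) -> bytes:
    """Encode getdoc options as a form-encoded POST body.
    Optional options are only included when given."""
    params = {"username": username, "password": password}
    if docname is not None:
        params["docname"] = docname
    if maxdocs is not None:
        params["maxdocs"] = str(maxdocs)
    return urlencode(params).encode("ascii")


body = getdoc_body("brad", "secret", maxdocs=10)
# The body would then be sent with, e.g.:
#     urllib.request.Request("http://www.bradmarshall.com/dxs/getdoc",
#                            data=body, method="POST")
```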
bradmarshall.com would return the list of appropriate documents (up to
10 in this case) that could then be BLASTed. Currently I'm thinking
the best way to return a list of documents might be something like:
<dxs>
  <doclist>
    <domain>www.we_blast_fast.com</domain>
    <doc>Your doc here as CDATA</doc>
    <doc>Your doc here as CDATA</doc>
  </doclist>
</dxs>
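On the receiving end, WBF would need to pull the individual documents back out of such a list. This is a minimal Python sketch, assuming "as CDATA" means each document's markup is wrapped in a `<![CDATA[...]]>` section. ElementTree hands CDATA content back as ordinary text, so each document has to be parsed a second time. `parse_doclist` and the sample sequences are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def parse_doclist(xml_text: str) -> Tuple[str, List[str]]:
    """Return (domain, documents) from a dxs <doclist> response.
    Each document comes back as a raw XML string."""
    root = ET.fromstring(xml_text)
    doclist = root.find("doclist")
    if doclist is None:
        raise ValueError("no <doclist> element in response")
    domain = doclist.findtext("domain", "").strip()
    documents = [(doc.text or "").strip() for doc in doclist.findall("doc")]
    return domain, documents


RESPONSE = """<dxs>
  <doclist>
    <domain>www.we_blast_fast.com</domain>
    <doc><![CDATA[<seq>ACGTACGT</seq>]]></doc>
    <doc><![CDATA[<seq>GGCCTTAA</seq>]]></doc>
  </doclist>
</dxs>"""

domain, documents = parse_doclist(RESPONSE)
# Second parse: each CDATA payload is itself an XML document.
sequences = [ET.fromstring(doc).text for doc in documents]
```

One caveat with this encoding: a CDATA section cannot contain the string `]]>`, so a document that includes it would need to be escaped or split across sections.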