Difference between revisions of "Data Access"

From Dryad wiki
("New" Dryad API)
m (Removed "dx" from DOIs)
 
(42 intermediate revisions by 4 users not shown)
Line 1: Line 1:
== Data Access ==
+
== Web Browser User Interface ==
  
NOTICE: Dryad is in the process of phasing out the older Handle-style identifiers
+
Primary access to Dryad is through its web interface, where users most commonly search on authors, titles, subjects and other metadata elements. Data files archived by Dryad may be downloaded one-by-one from their Dryad data package Web pages.
(those that contain "10255"). We will only continue to support Handle identifiers
 
for navigation to a data package page, retaining the functionality of existing
 
citations in the scientific literature. All other types of data access will be updated
 
to use DOIs. As a result, many of the access mechanisms on this page will be
 
changed during 2011/2012.
 
 
 
=== Web Browser User Interface ===
 
 
 
Primary access to Dryad is through its web interface, where users most commonly search on authors, titles, subjects and other metadata elements. Data files archived by Dryad may be downloaded one-by-one from their Dryad data package Web pages.
 
  
 
Additionally, DSpace, the platform on which Dryad is built, supports several "hidden" ways to hack the system's URLs to get useful metadata from the Web interface.
 
 
+
<blockquote>Finding a data package page using the article DOI or PMID:
<blockquote>
+
*[http://datadryad.org/discover?query= http://datadryad.org/discover?query=]"doi:10.1111/j.1558-5646.2007.00022.x"
 +
*[http://datadryad.org/discover?query= http://datadryad.org/discover?query=]"PMID:17348941"
 
Viewing full metadata: add "?show=full" to the end of the URL
 
* http://datadryad.org/resource/doi:10.5061/dryad.20?show=full
+
*[http://datadryad.org/resource/doi:10.5061/dryad.20?show=full http://datadryad.org/resource/doi:10.5061/dryad.20?show=full]
 
 
 
Viewing the raw DSpace representation of a page: add "/DRI" to the end of the URL
* http://datadryad.org/resource/doi:10.5061/dryad.12/DRI
+
*[http://datadryad.org/resource/doi:10.5061/dryad.12/DRI http://datadryad.org/resource/doi:10.5061/dryad.12/DRI]
Another way to view the raw DSpace markup is to add "?XML" to the end of the URL. This is less useful than the above method, though, because the page's content won't contain the externalized i18n strings.
* http://datadryad.org/resource/doi:10.5061/dryad.12?XML
+
*[http://datadryad.org/resource/doi:10.5061/dryad.12?XML http://datadryad.org/resource/doi:10.5061/dryad.12?XML]
 
+
Viewing metadata in machine-readable (METS) format, using either a DOI or a (legacy) handle:
Viewing metadata in machine-readable (METS) format:
+
*[http://datadryad.org/resource/doi:10.5061/dryad.12/mets.xml http://datadryad.org/resource/doi:10.5061/dryad.12/mets.xml]
* http://datadryad.org/resource/doi:10.5061/dryad.12/mets.xml
+
*[http://datadryad.org/metadata/handle/10255/dryad.1080/mets.xml http://datadryad.org/metadata/handle/10255/dryad.1080/mets.xml]
 
</blockquote>
 
</blockquote>
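The URL patterns above can also be assembled programmatically. The following is a minimal sketch, not an official client: the helper names <code>discover_url</code> and <code>resource_url</code> are hypothetical, and percent-encoding the quoted search term is an assumption about what the search page accepts.

```python
from urllib.parse import quote

DISCOVER_BASE = "http://datadryad.org/discover"
RESOURCE_BASE = "http://datadryad.org/resource/doi:"

# URL suffixes taken from the examples on this page.
VIEWS = {
    "full": "?show=full",   # full metadata
    "dri": "/DRI",          # raw DSpace representation
    "xml": "?XML",          # raw markup without externalized i18n strings
    "mets": "/mets.xml",    # machine-readable METS
}


def discover_url(kind, value):
    """Build a search URL for an article DOI (kind='doi') or PubMed ID (kind='PMID')."""
    if kind not in ("doi", "PMID"):
        raise ValueError("kind must be 'doi' or 'PMID'")
    # The examples wrap the term in double quotes; quote() percent-encodes them.
    term = '"{}:{}"'.format(kind, value)
    return DISCOVER_BASE + "?query=" + quote(term, safe="")


def resource_url(doi, view):
    """Build a data package URL for one of the views listed in VIEWS."""
    return RESOURCE_BASE + doi + VIEWS[view]


print(discover_url("PMID", "17348941"))
print(resource_url("10.5061/dryad.12", "mets"))
```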
 +
== Programmatic Data Access ==
  
=== Programmatic Data Access ===
+
In addition to the web interface, Dryad can be accessed programmatically through several APIs.
 
 
In addition to the web interface, Dryad can be accessed programmatically through a [http://en.wikipedia.org/wiki/Sitemaps sitemap] or [http://www.openarchives.org/pmh/ OAI-PMH] interface.
 
 
 
====Sitemaps====
 
 
 
  warning: The sitemaps are not working as of 2011-10-13. Dryad staff are investigating.
 
 
 
The [http://datadryad.org/sitemap?map=0 Dryad sitemap] provides access to the links to all Dryad's data package and file pages, with the timestamp of their last update.  It is an XML formatted file that is [http://en.wikipedia.org/wiki/Gzip gzipped] for transmission.  An example snippet of the XML follows:
 
  
<blockquote><code>
+
=== Sitemaps ===
&lt;urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"&gt;
 
:&lt;url&gt;
 
::&lt;loc&gt;http://datadryad.org/handle/10255/dryad.721&lt;/loc&gt;
 
::&lt;lastmod&gt;2010-02-26T11:17:10Z&lt;/lastmod&gt;
 
:&lt;/url&gt;
 
&lt;/urlset&gt;
 
</code></blockquote>
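A sitemap in this format can be read with a standard XML parser. The sketch below assumes the XML has already been downloaded (and decompressed, if it was gzipped); <code>parse_sitemap</code> is a hypothetical helper name, and the sample document simply repeats the snippet shown above.

```python
import xml.etree.ElementTree as ET

# Namespace declared on the <urlset> element of the sitemap.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_sitemap(xml_text):
    """Return a list of (location, last-modified) pairs from sitemap XML."""
    root = ET.fromstring(xml_text)
    entries = []
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", namespaces=SITEMAP_NS)
        lastmod = url.findtext("sm:lastmod", namespaces=SITEMAP_NS)
        entries.append((loc, lastmod))
    return entries


SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://datadryad.org/handle/10255/dryad.721</loc>
    <lastmod>2010-02-26T11:17:10Z</lastmod>
  </url>
</urlset>"""

for loc, lastmod in parse_sitemap(SAMPLE):
    print(loc, lastmod)
```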
 
  
====OAI-PMH====
+
The [http://datadryad.org/htmlmap Dryad HTML sitemap] provides access to the links to all Dryad's data package and file pages, with the timestamp of their last update. There is also an [http://datadryad.org/sitemap XML formatted sitemap]. An example snippet of the XML follows:
  
OAI-PMH is a harvesting protocol that may be used to access Dryad's metadata.  The [http://www.openarchives.org/OAI/openarchivesprotocol.html specification] is available, as are [http://www.oaforum.org/tutorial/ online tutorials], but we include a couple of examples of its use here for illustrative purposes.
+
=== OAI-PMH ===
  
<blockquote>
+
[http://www.openarchives.org/pmh/ OAI-PMH] is a harvesting protocol that may be used to access Dryad's metadata. The [http://www.openarchives.org/OAI/openarchivesprotocol.html specification] is available, as are [http://www.oaforum.org/tutorial/ online tutorials], but we include a couple of examples of its use here for illustrative purposes.
'''Identify'''
+
<blockquote>'''Identify'''
 
:Used to learn about the service
 
:* http://www.datadryad.org/oai/request?verb=Identify
+
:*[http://www.datadryad.org/oai/request?verb=Identify http://www.datadryad.org/oai/request?verb=Identify]
 
 
 
'''ListSets'''
 
:Used to learn what sets of metadata are supported. Dryad offers a data package set and a data file set.
:* http://www.datadryad.org/oai/request?verb=ListSets
+
:*[http://www.datadryad.org/oai/request?verb=ListSets http://www.datadryad.org/oai/request?verb=ListSets]
 
 
 
'''ListMetadataFormats'''
 
:Used to learn what metadata formats can be returned by the service. Dryad currently offers [http://www.loc.gov/standards/mets/ METS]/[http://www.loc.gov/standards/mods/ MODS], [http://www.openarchives.org/OAI/openarchivesprotocol.html#dublincore OAI-DC] (Dublin Core), [http://www.openarchives.org/ore/ OAI-ORE]/[http://tools.ietf.org/html/rfc4287 Atom], and [http://www.w3.org/TR/rdf-syntax-grammar/ RDF/DC]. The amount of information mapped into each format varies. For now, we recommend using the OAI-DC metadata format.
:* http://www.datadryad.org/oai/request?verb=ListMetadataFormats
+
:*[http://www.datadryad.org/oai/request?verb=ListMetadataFormats http://www.datadryad.org/oai/request?verb=ListMetadataFormats]
 
 
 
'''ListIdentifiers'''
 
:Used to list Dryad's OAI identifiers. It requires <code>from</code> and <code>metadataPrefix</code> parameters to know what range of identifiers to return and what format the metadata should be in (from the options returned by the ListMetadataFormats verb). We may modify this to return DOIs in the future.
:* http://www.datadryad.org/oai/request?verb=ListIdentifiers&from=2010-01-01&metadataPrefix=oai_dc&set=hdl_10255_3
+
:*[http://www.datadryad.org/oai/request?verb=ListIdentifiers&from=2010-01-01&metadataPrefix=oai_dc&set=hdl_10255_3 http://www.datadryad.org/oai/request?verb=ListIdentifiers&amp;from=2010-01-01&amp;metadataPrefix=oai_dc&amp;set=hdl_10255_3]
 
:NOTE: It is highly recommended that you use this call in conjunction with the "set" parameter, so you retrieve only the records of interest. Otherwise, you may retrieve records that Dryad has harvested from other providers.
 
 
'''ListRecords'''
 
:Used to list Dryad records. It requires <code>from</code> and <code>metadataPrefix</code> parameters so it knows the range of records to return. The records will be returned in the format associated with the <code>metadataPrefix</code> requested. Available formats can be discovered by using the ListMetadataFormats verb.
:* http://www.datadryad.org/oai/request?verb=ListRecords&from=2010-01-01&metadataPrefix=oai_dc&set=hdl_10255_3
+
:*[http://www.datadryad.org/oai/request?verb=ListRecords&from=2010-01-01&metadataPrefix=oai_dc&set=hdl_10255_3 http://www.datadryad.org/oai/request?verb=ListRecords&amp;from=2010-01-01&amp;metadataPrefix=oai_dc&amp;set=hdl_10255_3]
 
:NOTE: It is highly recommended that you use this call in conjunction with the "set" parameter, so you retrieve only the records of interest. Otherwise, you may retrieve records that Dryad has harvested from other providers.
 
 
'''GetRecord'''
 
:Used to return a single record. It requires the OAI identifier of the record (the <code>identifier</code> parameter) and the format in which the record should be returned (the <code>metadataPrefix</code> parameter).
:* http://www.datadryad.org/oai/request?verb=GetRecord&identifier=oai:datadryad.org:10255/dryad.12&metadataPrefix=oai_dc
+
:*[http://www.datadryad.org/oai/request?verb=GetRecord&identifier=oai:datadryad.org:10255/dryad.12&metadataPrefix=oai_dc http://www.datadryad.org/oai/request?verb=GetRecord&amp;identifier=oai:datadryad.org:10255/dryad.12&amp;metadataPrefix=oai_dc]
 
 
 
</blockquote>
 
</blockquote>
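The request URLs for the verbs above all share one shape, so they can be generated from a verb and a set of parameters. This is a minimal sketch; <code>oai_url</code> is a hypothetical helper name, and leaving ":" and "/" unescaped is a choice made only to match the example URLs on this page.

```python
from urllib.parse import urlencode

OAI_BASE = "http://www.datadryad.org/oai/request"


def oai_url(verb, params=None):
    """Build an OAI-PMH request URL for the given verb and parameters."""
    query = [("verb", verb)]
    # A dict is used because "from" is a reserved word in Python
    # and cannot be passed as a keyword argument.
    query.extend((params or {}).items())
    return OAI_BASE + "?" + urlencode(query, safe=":/")


print(oai_url("Identify"))
print(oai_url("ListRecords", {
    "from": "2010-01-01",
    "metadataPrefix": "oai_dc",
    "set": "hdl_10255_3",  # data package set, as in the examples above
}))
print(oai_url("GetRecord", {
    "identifier": "oai:datadryad.org:10255/dryad.12",
    "metadataPrefix": "oai_dc",
}))
```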
 
+
==== Using resumptionTokens with OAI-PMH ====
===== Using resumptionTokens with OAI-PMH =====
 
  
 
OAI-PMH requests may result in partial results lists being returned. In these cases, the results list will contain a resumptionToken that can be used to retrieve the next page of results.
 
Line 85: Line 55:
 
For example, for a call like:
 
  
http://www.datadryad.org/oai/request?verb=ListRecords&from=2010-01-01&metadataPrefix=oai_dc&set=hdl_10255_3
+
[http://www.datadryad.org/oai/request?verb=ListRecords&from=2010-01-01&metadataPrefix=oai_dc&set=hdl_10255_3 http://www.datadryad.org/oai/request?verb=ListRecords&amp;from=2010-01-01&amp;metadataPrefix=oai_dc&amp;set=hdl_10255_3]
  
 
You will receive the first 100 records, ending with a resumptionToken of 2010-01-01T00:00:00Z/9999-12-31T23:59:59Z/hdl_10255_3/oai_dc/100
 
Line 91: Line 61:
 
You can then retrieve the next 100 records with:
 
  
http://www.datadryad.org/oai/request?verb=ListRecords&resumptionToken=2010-01-01T00:00:00Z/9999-12-31T23:59:59Z/hdl_10255_3/oai_dc/100
+
[http://www.datadryad.org/oai/request?verb=ListRecords&resumptionToken=2010-01-01T00:00:00Z/9999-12-31T23:59:59Z/hdl_10255_3/oai_dc/100 http://www.datadryad.org/oai/request?verb=ListRecords&amp;resumptionToken=2010-01-01T00:00:00Z/9999-12-31T23:59:59Z/hdl_10255_3/oai_dc/100]
  
 
 
Note that when using a resumptionToken, OAI expects you to only repeat the verb, not any of the other parameters that were part of the original request.
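The paging behaviour described above can be sketched as a loop that keeps requesting pages until no resumptionToken is returned. This is an illustration under stated assumptions, not a complete harvester: <code>harvest</code> and its <code>fetch</code> argument are hypothetical names, <code>fetch</code> may be any callable that takes a URL and returns the response XML as text, and error responses are simply treated as the end of the list.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_BASE = "http://www.datadryad.org/oai/request"
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"  # OAI-PMH 2.0 namespace


def harvest(fetch, verb, params):
    """Yield every result element of a list request, following resumptionTokens."""
    query = {"verb": verb}
    query.update(params)
    while True:
        url = OAI_BASE + "?" + urlencode(query, safe=":/")
        root = ET.fromstring(fetch(url))
        container = root.find(OAI_NS + verb)
        if container is None:
            return  # no result list (for example, an <error> response)
        for child in container:
            if child.tag != OAI_NS + "resumptionToken":
                yield child
        token = container.findtext(OAI_NS + "resumptionToken")
        if not token:
            return  # an absent or empty token marks the last page
        # Repeat only the verb together with the token.
        query = {"verb": verb, "resumptionToken": token}
```

To run it against the live service, <code>fetch</code> could be a small wrapper around <code>urllib.request.urlopen</code> that reads and decodes the response; passing the function in keeps the paging logic easy to test with canned responses.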
  
 +
=== DataONE API ===
 +
 +
As part of Dryad's participation in the [http://dataone.org DataONE] project, Dryad makes content available through a specialized API.
  
===== Programmatic access to individual data files using OAI-PMH =====
+
*[http://www.datadryad.org/mn/object List of objects]
 +
*[https://datadryad.org/mn/object/doi:10.5061/dryad.1850/1 Sample metadata call]
 +
*[https://datadryad.org/mn/object/doi:10.5061/dryad.1850/1/bitstream Sample file download] -- URLs of this format can be used to download Dryad data directly from an external site, without forcing users to go through the Dryad interface.
 +
*[[DataONE RESTful API|Technical documentation for the DataONE API]]
  
The process for a machine to locate and download a file from Dryad takes a few steps. We're working to streamline it, and to make it more compliant with standards (in particular OAI-ORE and RDF/Linked Data).
+
==== Programmatic access to data files using the DataONE API ====
  
# Obtain a Dryad Identifier in "short" form (e.g., 10255/dryad.1234)
+
#Obtain the DataONE ID of a Dryad object using the DataONE listObjects call: [http://www.datadryad.org/mn/object http://www.datadryad.org/mn/object] (e.g., dryad.1850/1)
#* To list all identifiers of data files, use the OAI-PMH interface and restrict the "set" to the Dryad data files, like:
+
#Retrieve the file: [http://www.datadryad.org/mn/object/doi:10.5061/dryad.1850/1/bitstream http://www.datadryad.org/mn/object/doi:10.5061/dryad.1850/1/bitstream]
#** http://www.datadryad.org/oai/request?verb=ListIdentifiers&from=2010-01-01&metadataPrefix=oai_dc&set=hdl_10255_2
+
#Retrieve system metadata about a file, including size and MIME type: [http://www.datadryad.org/mn/meta/doi:10.5061/dryad.1850/1/bitstream http://www.datadryad.org/mn/meta/doi:10.5061/dryad.1850/1/bitstream]
#* For recent additions only, use the RSS feed for data files (described above)
+
#Retrieve descriptive metadata about a file: [http://www.datadryad.org/mn/object/doi:10.5061/dryad.1850/1 http://www.datadryad.org/mn/object/doi:10.5061/dryad.1850/1]
# Obtain the METS metadata
 
#* http://datadryad.org/metadata/handle/INSERT_SHORT_ID_HERE/mets.xml
 
#* This also applies to data packages, if the identifier you have is for a data package. In the METS metadata for a data package, elements <tt>&lt;dim:field&gt;</tt> with attributes <tt>element="relation" qualifier="haspart" mdschema="dc"</tt> will have the data file identifier (as a DOI) as value of the element. Remove the "doi:" to obtain the "short" form of the Dryad identifier.
 
# (temporary step, while Dryad metadata is in transition) You will need to transform the DOI to a Handle-style identifier. Use the lookup service at http://datadryad.org/doi?lookup=INSERT_DOI_HERE
 
# Then obtain the METS metadata for the data file as above.
 
# Parse the METS metadata to locate the bitstream URL.
 
#* It is in the <tt>&lt;mets:FLocat/&gt;</tt> element in the <tt>xlink:href</tt> attribute. It will look like<br/> /bitstream/handle/SHORT_ID/FILE_NAME?sequence=1 (the sequence number may vary)
 
#* If you are interested only in files of a particular type, look for the <tt>&lt;mets:file/&gt;</tt> element and check the value of its attribute <tt>MIMETYPE</tt>. For example, for MS Excel files the value should be "application/vnd.ms-excel".
 
# Prepend http://datadryad.org and download the file using the bitstream URL. For example,
 
#* http://datadryad.org/bitstream/handle/10255/dryad.633/ApineCYTB.nexus?sequence=1
 
  
==== DataONE API ====
+
If you desire the full filename before downloading, obtain the METS document as described above. The filename is in the <tt>&lt;mets:FLocat/&gt;</tt> element in the <tt>xlink:href</tt> attribute.
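As a rough illustration of the DataONE calls and the METS lookup described in this section, the sketch below builds the three DataONE URLs for a file DOI and extracts bitstream paths from a METS document. The helper names and the sample METS fragment are hypothetical; only the URL patterns, the <tt>&lt;mets:FLocat/&gt;</tt> element, and the <tt>MIMETYPE</tt> attribute come from the text above.

```python
import xml.etree.ElementTree as ET

MN_BASE = "http://www.datadryad.org/mn"
METS_NS = "{http://www.loc.gov/METS/}"
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def dataone_urls(doi):
    """Return the DataONE URLs for one Dryad object, keyed by purpose."""
    return {
        "metadata": "{}/object/doi:{}".format(MN_BASE, doi),
        "file": "{}/object/doi:{}/bitstream".format(MN_BASE, doi),
        "system_metadata": "{}/meta/doi:{}/bitstream".format(MN_BASE, doi),
    }


def bitstream_hrefs(mets_xml):
    """Return (MIME type, xlink:href) pairs for every file in a METS document."""
    root = ET.fromstring(mets_xml)
    return [
        (file_el.get("MIMETYPE"), loc.get(XLINK_HREF))
        for file_el in root.iter(METS_NS + "file")
        for loc in file_el.iter(METS_NS + "FLocat")
    ]


# Hypothetical METS fragment, reduced to the elements used here.
SAMPLE_METS = (
    '<mets:mets xmlns:mets="http://www.loc.gov/METS/" '
    'xmlns:xlink="http://www.w3.org/1999/xlink"><mets:fileSec><mets:fileGrp>'
    '<mets:file MIMETYPE="text/plain"><mets:FLocat '
    'xlink:href="/bitstream/handle/10255/dryad.633/ApineCYTB.nexus?sequence=1"/>'
    '</mets:file></mets:fileGrp></mets:fileSec></mets:mets>'
)

print(dataone_urls("10.5061/dryad.1850/1")["file"])
for mime, href in bitstream_hrefs(SAMPLE_METS):
    print(mime, "http://datadryad.org" + href)
```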
  
WARNING: The DataONE API is not yet finalized. The format
+
=== Accessing Data Packages via Journal ISSN ===
for calls to this API may change in the near future.
 
  
As part of Dryad's participation in the [http://dataone.org DataONE] project, Dryad makes content available through a specialized API.
+
Journals and their ISSNs can be accessed through a GET command:
  
*[http://www.datadryad.org/mn/object List of objects]
+
<code>http://datadryad.org/api/v1/journals</code>
*[https://datadryad.org/mn/object/doi:10.5061/dryad.1850/1/dap Sample metadata call]
 
*[https://datadryad.org/mn/object/doi:10.5061/dryad.1850/1/nex Sample file download]
 
*[[DataONE RESTful API|Technical documentation for the DataONE API]]
 
  
===== Programmatic access to data files using the DataONE API =====
+
The corresponding ISSN can be used to get a list of packages in Dryad for that journal using the following GET command:
  
#Obtain the DataONE ID of a Dryad object using the DataONE listObjects call: [http://www.datadryad.org/mn/object http://www.datadryad.org/mn/object] (e.g., dryad.1850/1/nex)
+
<code>http://datadryad.org/api/v1/journals/{issn}/packages</code>
#Retrieve the file: [http://www.datadryad.org/mn/object/doi:10.5061/dryad.1850/1/nex http://www.datadryad.org/mn/object/doi:10.5061/dryad.1850/1/nex]
 
  
=== Links to Data Packages/Files ===
+
If multiple pages of results are returned, the next and previous page links can be accessed from the link headers with <code>rel=next</code> and <code>rel=prev</code>.
  
Dryad uses DOIs ([http://en.wikipedia.org/wiki/Digital_object_identifier Digital Object Identifiers]) to identify Dryad data packages and files. A few simple examples follow.
+
There are additional query parameters that can be used to modify the results returned.
 +
* <code>count</code> specifies the number of results per page.
 +
* <code>date_from</code> and <code>date_to</code> can filter results to packages released in a date range.
 +
* <code>cursor</code> can be used to specify the key used to start the results page.
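A request for a journal's packages, with the optional parameters listed above, might be assembled as follows. This is a sketch under assumptions: the helper names are hypothetical, the ISSN in the usage line is a placeholder, the expected date format is not stated on this page, and the link header is assumed to follow the usual <code>&lt;url&gt;; rel="next"</code> form.

```python
import re
from urllib.parse import urlencode

API_BASE = "http://datadryad.org/api/v1"


def journal_packages_url(issn, count=None, date_from=None, date_to=None, cursor=None):
    """Build the packages URL for a journal, adding only the parameters supplied."""
    optional = (("count", count), ("date_from", date_from),
                ("date_to", date_to), ("cursor", cursor))
    params = [(name, value) for name, value in optional if value is not None]
    url = "{}/journals/{}/packages".format(API_BASE, issn)
    return url + "?" + urlencode(params) if params else url


def parse_link_header(value):
    """Map each rel value (e.g. 'next', 'prev') in a Link header to its URL."""
    # Assumes rel is the first parameter after each <url>.
    pairs = re.findall(r'<([^>]*)>\s*;\s*rel="?([^",;]+)"?', value)
    return {rel: target for target, rel in pairs}


# "1234-5678" is a placeholder ISSN, not a real journal.
print(journal_packages_url("1234-5678", count=20))
```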
  
<blockquote>
+
=== Links to Data Packages/Files ===
'''Data packages'''
 
* [http://dx.doi.org/10.5061/dryad.1664 http://dx.doi.org/10.5061/dryad.1664]
 
* [http://dx.doi.org/10.5061/dryad.642 http://dx.doi.org/10.5061/dryad.642]
 
* [http://dx.doi.org/10.5061/dryad.1307 http://dx.doi.org/10.5061/dryad.1307]
 
  
 +
Dryad uses DOIs ([http://en.wikipedia.org/wiki/Digital_object_identifier Digital Object Identifiers]) to identify Dryad data packages and files. A few simple examples follow.
 +
<blockquote>'''Data packages'''
 +
*[http://doi.org/10.5061/dryad.1664 http://doi.org/10.5061/dryad.1664]
 +
*[http://doi.org/10.5061/dryad.642 http://doi.org/10.5061/dryad.642]
 +
*[http://doi.org/10.5061/dryad.1307 http://doi.org/10.5061/dryad.1307]
 
'''Data files'''
 
* [http://dx.doi.org/10.5061/dryad.1664/1 http://dx.doi.org/10.5061/dryad.1664/1]
+
*[http://doi.org/10.5061/dryad.1664/1 http://doi.org/10.5061/dryad.1664/1]
* [http://dx.doi.org/10.5061/dryad.642/1 http://dx.doi.org/10.5061/dryad.642/1]
+
*[http://doi.org/10.5061/dryad.642/1 http://doi.org/10.5061/dryad.642/1]
* [http://dx.doi.org/10.5061/dryad.1307/1 http://dx.doi.org/10.5061/dryad.1307/1]
+
*[http://doi.org/10.5061/dryad.1307/1 http://doi.org/10.5061/dryad.1307/1]
* [http://dx.doi.org/10.5061/dryad.1307/2 http://dx.doi.org/10.5061/dryad.1307/2]
+
*[http://doi.org/10.5061/dryad.1307/2 http://doi.org/10.5061/dryad.1307/2]
* [http://dx.doi.org/10.5061/dryad.1307/3 http://dx.doi.org/10.5061/dryad.1307/3]
+
*[http://doi.org/10.5061/dryad.1307/3 http://doi.org/10.5061/dryad.1307/3]
 
</blockquote>
 
</blockquote>
 +
=== RSS Feeds ===
  
=== RSS Feeds ===
+
There are a couple of feed options. Feeds are used by some browsers and all feed and news readers. They may also be used for programmatic access.
 +
<blockquote>'''Everything -- data packages, data files, and metadata harvested from partner repositories'''
 +
*[http://datadryad.org/feed/rss_2.0/site http://datadryad.org/feed/rss_2.0/site]
 +
'''Data packages only'''
 +
*[http://datadryad.org/feed/rss_2.0/10255/3 http://datadryad.org/feed/rss_2.0/10255/3]
 +
'''Data files only'''
 +
*[http://datadryad.org/feed/rss_2.0/10255/2 http://datadryad.org/feed/rss_2.0/10255/2]
 +
</blockquote>
 +
=== Twitter Feed ===
  
There are a couple of feed options.  Feeds are used by some browsers and all feed and news readers.  They may also be used for programmatic access.
+
'''Data packages'''
  
<blockquote>
+
*[http://twitter.com/datadryadnew http://twitter.com/datadryadnew]
'''Everything -- data packages, data files, and metadata harvested from partner repositories'''
 
* http://datadryad.org/feed/rss_2.0/site
 
  
'''Data packages only'''
+
'''Primary tweets from Dryad (typically not data)'''
* http://datadryad.org/feed/rss_2.0/10255/3
 
  
'''Data files only'''
+
*[http://twitter.com/datadryad http://twitter.com/datadryad]
* http://datadryad.org/feed/rss_2.0/10255/2
 
</blockquote>
 
  
 
== SOLR search access ==
 
Line 169: Line 138:
 
Dryad content can be searched using a [http://lucene.apache.org/solr/ SOLR] interface.
 
  
* Basic query: [http://datadryad.org/solr/search/select/?q=Galliard http://datadryad.org/solr/search/select/?q=Galliard]
+
*Basic query: [http://datadryad.org/solr/search/select/?q=Galliard http://datadryad.org/solr/search/select/?q=Galliard]
* Field-specific query: [http://datadryad.org/solr/search/select/?q=dwc.ScientificName:drosophila http://datadryad.org/solr/search/select/?q=dwc.ScientificName:drosophila]
+
*Field-specific query: [http://datadryad.org/solr/search/select/?q=dwc.ScientificName:drosophila http://datadryad.org/solr/search/select/?q=dwc.ScientificName:drosophila]
* Search all text for a string, but return only two specified fields: [http://datadryad.org/solr/search/select/?q=Galliard&fl=dc.title,dc.contributor.author http://datadryad.org/solr/search/select/?q=Galliard&fl=dc.title,dc.contributor.author]
+
*Search all text for a string, but return only two specified fields: [http://datadryad.org/solr/search/select/?q=Galliard&fl=dc.title,dc.contributor.author http://datadryad.org/solr/search/select/?q=Galliard&fl=dc.title,dc.contributor.author]
* Looks up Dryad data based on an article DOI: [http://datadryad.org/solr/search/select/?q=dc.relation.isreferencedby:10.1038/nature04863&fl=dc.identifier,dc.title_ac http://datadryad.org/solr/search/select/?q=dc.relation.isreferencedby:10.1038/nature04863&fl=dc.identifier,dc.title_ac]
+
*Dryad data based on an article DOI: [http://datadryad.org/solr/search/select/?q=dc.relation.isreferencedby:10.1038/nature04863+DSpaceStatus:Archived&fl=dc.identifier,dc.title_ac http://datadryad.org/solr/search/select/?q=dc.relation.isreferencedby:10.1038/nature04863+DSpaceStatus:Archived&fl=dc.identifier,dc.title_ac]
* Look up all terms in the dc.subject facet, along with their frequencies:
+
*All terms in the dc.subject facet, along with their frequencies:
 +
 
 
[http://datadryad.org/solr/search/select/?q=location:l2&facet=true&facet.field=dc.subject_filter&facet.minCount=1&facet.limit=5000&fl=nothing http://datadryad.org/solr/search/select/?q=location:l2&facet=true&facet.field=dc.subject_filter&facet.minCount=1&facet.limit=5000&fl=nothing]
 
* Look up article DOIs associated with all data published in Dryad over the past 90 days:
+
 
 +
*Article DOIs associated with all data published in Dryad over the past 90 days:
 +
 
 
[http://datadryad.org/solr/search/select/?q=dc.date.available_dt:%5BNOW-90DAY/DAY%20TO%20NOW%5D&fl=dc.relation.isreferencedby&rows=1000000 http://datadryad.org/solr/search/select/?q=dc.date.available_dt:%5BNOW-90DAY/DAY%20TO%20NOW%5D&fl=dc.relation.isreferencedby&rows=1000000]
 
  
== "New" Dryad API ==
+
*Data DOIs published in Dryad during January 2011, with results returned in JSON format:
 +
 
 +
[http://datadryad.org/solr/search/select/?q=location:l2+dc.date.available_dt:%5B2011-01-01T00:00:00Z%20TO%202011-01-31T23:59:59Z%5D&fl=dc.identifier&rows=1000000&wt=json http://datadryad.org/solr/search/select/?q=location:l2+dc.date.available_dt:%5B2011-01-01T00:00:00Z%20TO%202011-01-31T23:59:59Z%5D&fl=dc.identifier&rows=1000000&wt=json]
 +
 
 +
For more about using SOLR, see the [http://lucene.apache.org/solr/documentation.html Apache SOLR documentation].
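The example queries above differ only in their parameters, so they can be generated with a small helper. This is a minimal sketch: <code>solr_url</code> is a hypothetical name, only the <code>q</code>, <code>fl</code>, <code>rows</code> and <code>wt</code> parameters used on this page are covered, and spaces are encoded as "+" rather than "%20".

```python
from urllib.parse import urlencode

SOLR_BASE = "http://datadryad.org/solr/search/select/"


def solr_url(q, fl=None, rows=None, wt=None):
    """Build a SOLR select URL from a query string and optional parameters."""
    params = [("q", q)]
    if fl:
        params.append(("fl", ",".join(fl)))  # fields to return
    if rows is not None:
        params.append(("rows", rows))        # maximum number of results
    if wt:
        params.append(("wt", wt))            # response format, e.g. "json"
    # Keep ":", "/" and "," readable, as in the example URLs on this page.
    return SOLR_BASE + "?" + urlencode(params, safe=":/,")


# Dryad data for an article DOI, returning two fields.
print(solr_url("dc.relation.isreferencedby:10.1038/nature04863",
               fl=["dc.identifier", "dc.title_ac"]))

# Article DOIs for data published in the past 90 days.
print(solr_url("dc.date.available_dt:[NOW-90DAY/DAY TO NOW]",
               fl=["dc.relation.isreferencedby"], rows=1000000))
```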
 +
 
 +
== Widget API ==
 +
 
 +
The Widget API will become part of the "New" Dryad API (see below), but some of its components are already coming online. The Widget API provides simple images or dynamic iframes that link to content in Dryad and can be embedded into third-party sites.
 +
 
 +
*[[Widgets For Journals]]
 +
*[[Banner Image Widget API]]
 +
 
 +
== Dryad API 2 -- In development ==
  
 
We are in the process of designing a new API that will be easier to work with. It should be consistent, and subsume all of the other access mechanisms described above.
 
  
'''WARNING:''' The API described in this section does not currently exist.
+
'''WARNING:''' The API described in this section does not currently exist. It is documented here to enable broader discussion.
It is being designed here to enable broader discussion.
 
  
 
Use cases that must be met:
 
* start with data package DOI, retrieve the contents
+
 
* start with article DOI or PMID, retrieve data package DOI
+
*start with data package DOI, retrieve the contents
* (search) given author name, retrieve list of matching package DOIs
+
*start with article DOI or PMID, retrieve data package DOI
* (search) given article title, retrieve list of matching package DOIs
+
*(search) given author name, retrieve list of matching package DOIs
* (search) given a set of fields that are typically unique -- e.g., author name, article title, year -- retrieve the single matching package DOI
+
*(search) given article title, retrieve list of matching package DOIs
* (search) given a journal name or publisher name, retrieve a list of matching package DOIs
+
*(search) given a set of fields that are typically unique -- e.g., author name, article title, year -- retrieve the single matching package DOI
* machine metadata access: start with package/file DOI, get relevant metadata field (including file sizes and access statistics)
+
*(search) given a journal name or publisher name, retrieve a list of matching package DOIs -- case insensitive!
* harvest: Get all article DOIs. Get all data package DOIs.
+
*machine metadata access: start with package/file DOI, get relevant metadata field (including file sizes and access statistics)
 +
*harvest: Get all article DOIs. Get all data package DOIs.
 +
*retrieve content in Dryad-native XML format or JSON format
  
 
Proposed retrieval API:
 
* http://datadryad.org/api/object/
 
** Retrieves a list of data packages and data files available
 
** Each item in the list will contain a DOI, file type, file size, checksum, and modification date
 
** This is the same as for the DataONE protocol
 
* http://datadryad.org/api/object/''identifier''
 
** Retrieves a data package or data file, given its identifier
 
** If the identifier is a DOI, a metadata record will be returned
 
** If the identifier is a DOI with a file format appended, a bitstream will be returned
 
** This is the same as for the DataONE protocol
 
* http://datadryad.org/api/articlePackage/''article-identifier''
 
** Retrieves a data package associated with a given article
 
** although this could technically be combined with the /object, we don't want to dissociate /object from its close relationship with the no-argument form of /object
 
* http://datadryad.org/api/object/''identifier''/''fieldname''
 
** Retrieves the contents of a given metadata field
 
* http://datadryad.org/api/object/''identifier''/dap
 
** Retrieves the complete descriptive metadata for the given object, in Dryad Application Profile format
 
** This is the same as for the DataONE protocol
 
* http://datadryad.org/api/meta/''identifier''
 
** Retrieves system-level metadata (internal storage information -- not descriptive metadata) for the given object
 
** This is the same as for the DataONE protocol
 
  
 +
*[http://datadryad.org/api/object/ http://datadryad.org/api/object/]
 +
**Retrieves a list of data packages and data files available
 +
**Each item in the list will contain a DOI, file type, file size, checksum, and modification date
 +
**This is the same as for the DataONE protocol
 +
*[http://datadryad.org/api/object/ http://datadryad.org/api/object/]''identifier''
 +
**Retrieves a data package or data file, given its identifier
 +
**If the identifier is a DOI, a metadata record will be returned
 +
**If the identifier is a DOI with "/bitstream" appended, a data file (bitstream) will be returned
 +
**This is the same as for the DataONE protocol
 +
*[http://datadryad.org/api/object/ http://datadryad.org/api/object/]''identifier''/''fieldname''
 +
**Retrieves the contents of a given metadata field
 +
*[http://datadryad.org/api/articlePackage/ http://datadryad.org/api/articlePackage/]''article-identifier''
 +
**Retrieves a data package associated with a given article
 +
**Although this could technically be combined with /object, we want to preserve the meaning of /object as querying objects in Dryad. Articles are not in Dryad.
 +
*[http://datadryad.org/api/meta/ http://datadryad.org/api/meta/]''identifier''
 +
**Retrieves system-level metadata (internal storage information -- not descriptive metadata) for the given object
 +
**This is the same as for the DataONE protocol
 +
*[http://datadryad.org/api/stats/ http://datadryad.org/api/stats/]''identifier''
 +
**Retrieve usage statistics about a given item.
 +
 +
Proposed search API:
  
Proposed search API:
+
*[http://datadryad.org/api/search/******* http://datadryad.org/api/search/*******]
* http://datadryad.org/api/search/*******
+
**searching protocol will be the same as listed above for SOLR search
** searching protocol will be the same as listed above for SOLR search
 
  
 
Open questions:
 
Open questions:
# Should we deprecate /mn for DataONE access, and have everything go through /api? What would happen if the DataONE protocol changed in a manner incompatible with Dryad's API needs?
+
 
# Should /articlePackage return a metadata object, or just an identifier?
+
#Should requests for different formats (e.g., XML, JSON) be via a modifier in the base URL, or as a parameter? How is this handled by the the underlying system?
# Should requests for different formats (e.g., XML, JSON) be via a modifier in the base URL, or as a parameter? How do the underlying protocols do this?
 
# Should the "dap" metadata format include statistics information?
 
  
 
== Other access mechanisms ==
 
== Other access mechanisms ==
Line 232: Line 216:
 
If you know of other community-developed services that can search or retrieve content that are not listed here, please alert us at [mailto:help@datadryad.org help@datadryad.org]
 
If you know of other community-developed services that can search or retrieve content that are not listed here, please alert us at [mailto:help@datadryad.org help@datadryad.org]
  
* ROpenSci Dryad package for search and retrieval of Dryad data and metadata within R. [http://ropensci.org/tutorials/dryad-tutorial/ Tutorial]
+
*[http://ropensci.org/packages/dryad.html ROpenSci Dryad package] for search and retrieval of Dryad data and metadata within R.
  
 
== Suggest Alternatives ==
 
== Suggest Alternatives ==
  
We're interested in hearing what other forms of access people would like. If you have a suggestion for making Dryad's content more accessible, please let us know at [mailto:help@datadryad.org help@datadryad.org].
+
We're interested in hearing what other forms of access people would like. If you have a suggestion for making Dryad's content more accessible, please let us know at [mailto:help@datadryad.org help@datadryad.org].<br/><br/><br/><br/><br/><br/><br/><br/>
 
+
[[Category:Metadata]]<br/>[[Category:Software]]<br/>[[Category:Handshaking]]<br/>[[Category:Technical Documentation|Data Access]]
[[Category:Metadata]]
 
[[Category:Software]]
 
[[Category:Handshaking]]
 
[[Category:Technical Documentation]]
 

Latest revision as of 07:36, 20 November 2017

Web Browser User Interface

Primary access to Dryad is through its web interface, where users most commonly search on authors, titles, subjects and other metadata elements. Data files archived by Dryad may be downloaded one-by-one from their Dryad data package Web pages.

Additionally, DSpace, the platform on which Dryad is built, supports several "hidden" ways to hack the system's URLs to get useful metadata from the Web interface.

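Finding a data package page using the article DOI or PMID:

  • http://datadryad.org/discover?query="doi:10.1111/j.1558-5646.2007.00022.x"
  • http://datadryad.org/discover?query="PMID:17348941"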

Viewing full metadata: add "?show=full" to the end of the URL

Viewing the raw DSpace representation of a page: add "DRI" to the URL

Another way to view the raw DSpace markup is to add "?XML" to the end of the URL. This is less useful than the above method, though, because the page's content won't contain the externalized i18n strings.

Viewing metadata in machine-readable (METS) format. Can be performed using a DOI or a (legacy) handle:

Programmatic Data Access

In addition to the web interface, Dryad can be accessed programmatically through several APIs.

Sitemaps

The Dryad HTML sitemap provides access to the links to all Dryad's data package and file pages, with the timestamp of their last update. There is also an XML formatted sitemap. An example snippet of the XML follows:

OAI-PMH

OAI-PMH is a harvesting protocol that may be used to access Dryad's metadata. The specification is available, as are online tutorials, but we include a couple of examples of its use here for illustrative purposes.

Identify

Used to learn about the service.

ListSets

Used to learn what sets of metadata are supported. Dryad offers a data package set and a data file set.

ListMetadataFormats

Used to learn what metadata formats can be returned by the service. Dryad currently offers METS/MODS, OAI-DC (Dublin Core), OAI-ORE/Atom, and RDF/DC. The amount of information mapped into each format varies. For now, we recommend using the OAI-DC metadata format.

ListIdentifiers

Used to list Dryad's OAI identifiers. It requires the from and metadataPrefix parameters, which specify the range of identifiers to return and the format the metadata should be in (one of the options returned by the ListMetadataFormats verb). We may modify this to return DOIs in the future.
NOTE: It is highly recommended that you use this call in conjunction with the "set" parameter, so that you retrieve only the records of interest. Otherwise, you may retrieve records that Dryad has harvested from other providers.

ListRecords

Used to list Dryad records. It requires the from and metadataPrefix parameters, which specify the range of records to return. The records will be returned in the format associated with the requested metadataPrefix. Available formats can be discovered by using the ListMetadataFormats verb.
NOTE: It is highly recommended that you use this call in conjunction with the "set" parameter, so that you retrieve only the records of interest. Otherwise, you may retrieve records that Dryad has harvested from other providers.
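
As a rough illustration, a ListRecords request that follows the recommendation above could be assembled in Python as sketched here. The base URL, date, and set name are copied from the example call in the resumptionToken section of this page; the helper name build_list_records_url is hypothetical and not part of any Dryad library.

```python
from urllib.parse import urlencode

# Base URL taken from the example request elsewhere on this page.
OAI_BASE = "http://www.datadryad.org/oai/request"


def build_list_records_url(from_date, metadata_prefix="oai_dc", set_spec=None):
    """Build a ListRecords URL (hypothetical helper, for illustration only)."""
    params = [
        ("verb", "ListRecords"),
        ("from", from_date),
        ("metadataPrefix", metadata_prefix),
    ]
    # Passing a set restricts the harvest to the records of interest.
    if set_spec is not None:
        params.append(("set", set_spec))
    return OAI_BASE + "?" + urlencode(params)


url = build_list_records_url("2010-01-01", set_spec="hdl_10255_3")
print(url)
```

Leaving set_spec unset produces a request without the "set" parameter, which is the case the note above warns about.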

GetRecord

Used to return a single record. It requires the OAI identifier of the record (the identifier parameter) and the format in which the record should be returned (the metadataPrefix parameter).

Using resumptionTokens with OAI-PMH

OAI-PMH requests may result in partial results lists being returned. In these cases, the results list will contain a resumptionToken that can be used to retrieve the next page of results.

For example, for a call like:

http://www.datadryad.org/oai/request?verb=ListRecords&from=2010-01-01&metadataPrefix=oai_dc&set=hdl_10255_3

You will receive the first 100 records, ending with a resumptionToken of 2010-01-01T00:00:00Z/9999-12-31T23:59:59Z/hdl_10255_3/oai_dc/100

You can then retrieve the next 100 records with:

http://www.datadryad.org/oai/request?verb=ListRecords&resumptionToken=2010-01-01T00:00:00Z/9999-12-31T23:59:59Z/hdl_10255_3/oai_dc/100

Note that when using a resumptionToken, OAI expects you to only repeat the verb, not any of the other parameters that were part of the original request.
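
The paging rule above can be sketched in Python as follows. This is only an illustration: the sample response is a hand-written, heavily trimmed stand-in for a real OAI-PMH reply, and next_page_url is a hypothetical helper name. The namespace is the standard OAI-PMH 2.0 namespace.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_BASE = "http://www.datadryad.org/oai/request"
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

# Hand-written, trimmed stand-in for a real response (records omitted).
SAMPLE_RESPONSE = (
    '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">'
    "<ListRecords>"
    "<resumptionToken>2010-01-01T00:00:00Z/9999-12-31T23:59:59Z/"
    "hdl_10255_3/oai_dc/100</resumptionToken>"
    "</ListRecords>"
    "</OAI-PMH>"
)


def next_page_url(response_xml, verb="ListRecords"):
    """Return the URL for the next page, or None if there is no token."""
    root = ET.fromstring(response_xml)
    token = root.find(".//" + OAI_NS + "resumptionToken")
    if token is None or not (token.text or "").strip():
        return None
    # Only the verb and the token are sent; the original from,
    # metadataPrefix and set parameters are deliberately not repeated.
    query = urlencode(
        {"verb": verb, "resumptionToken": token.text.strip()}, safe="/:"
    )
    return OAI_BASE + "?" + query


print(next_page_url(SAMPLE_RESPONSE))
```

A harvester would typically call this in a loop, fetching each returned URL until the function yields None.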

DataONE API

As part of Dryad's participation in the DataONE project, Dryad makes content available through a specialized API.

Programmatic access to data files using the DataONE API

  1. Obtain the DataONE ID of a Dryad object using the DataONE listObjects call: http://www.datadryad.org/mn/object (e.g., dryad.1850/1)
  2. Retrieve the file: http://www.datadryad.org/mn/object/doi:10.5061/dryad.1850/1/bitstream
  3. Retrieve system metadata about a file, including size and MIME type: http://www.datadryad.org/mn/meta/doi:10.5061/dryad.1850/1/bitstream
  4. Retrieve descriptive metadata about a file: http://www.datadryad.org/mn/object/doi:10.5061/dryad.1850/1
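
Steps 2 to 4 differ only in the path built around the identifier, which a short Python sketch can make explicit. The function name dataone_urls and the dictionary keys are hypothetical; the URL patterns are the ones listed above.

```python
# Base of the DataONE member-node endpoints shown in the steps above.
MN_BASE = "http://www.datadryad.org/mn"


def dataone_urls(file_doi):
    """Return the URLs from steps 2-4 for an ID such as 'doi:10.5061/dryad.1850/1'."""
    return {
        # Step 2: the data file itself.
        "file": f"{MN_BASE}/object/{file_doi}/bitstream",
        # Step 3: system metadata (size, MIME type).
        "system_metadata": f"{MN_BASE}/meta/{file_doi}/bitstream",
        # Step 4: descriptive metadata about the file.
        "descriptive_metadata": f"{MN_BASE}/object/{file_doi}",
    }


urls = dataone_urls("doi:10.5061/dryad.1850/1")
for name, url in urls.items():
    print(name, url)
```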

If you desire the full filename before downloading, obtain the METS document as described above. The filename is in the <mets:FLocat/> element in the xlink:href attribute.
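
Reading that attribute from a METS document might look like the Python sketch below. The sample document is invented for illustration (real Dryad METS records are much larger, and "example.csv" is a placeholder value), and mets_file_locations is a hypothetical helper name. The METS and XLink namespaces are the standard ones.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"

# Invented, minimal METS fragment; "example.csv" is a placeholder.
SAMPLE_METS = (
    '<mets:METS xmlns:mets="http://www.loc.gov/METS/" '
    'xmlns:xlink="http://www.w3.org/1999/xlink">'
    "<mets:fileSec><mets:fileGrp><mets:file>"
    '<mets:FLocat LOCTYPE="URL" xlink:href="example.csv"/>'
    "</mets:file></mets:fileGrp></mets:fileSec>"
    "</mets:METS>"
)


def mets_file_locations(mets_xml):
    """Collect the xlink:href value of every mets:FLocat element."""
    root = ET.fromstring(mets_xml)
    hrefs = []
    for flocat in root.iter(f"{{{METS_NS}}}FLocat"):
        href = flocat.get(f"{{{XLINK_NS}}}href")
        if href is not None:
            hrefs.append(href)
    return hrefs


print(mets_file_locations(SAMPLE_METS))
```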

Accessing Data Packages via Journal ISSN

Journals and their ISSNs can be retrieved with a GET request:

http://datadryad.org/api/v1/journals

The corresponding ISSN can be used to get a list of packages in Dryad for that journal with the following GET request:

http://datadryad.org/api/v1/journals/{issn}/packages

If multiple pages of results are returned, the next and previous page links can be accessed from the link headers with `rel=next` and `rel=prev`.

There are additional query parameters that can be used to modify the results returned.

  • `count` specifies the number of results per page.
  • `date_from` and `date_to` can filter results to packages released in a date range.
  • `cursor` can be used to specify the key used to start the results page.
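
A client for these endpoints needs two small pieces: a way to build the packages URL with the optional parameters, and a way to read the `rel=next` and `rel=prev` links out of the Link header. The Python sketch below shows one possible approach. The function names are hypothetical, and the ISSN and cursor values in the usage lines are made up for illustration.

```python
import re
from urllib.parse import urlencode

API_BASE = "http://datadryad.org/api/v1"


def packages_url(issn, **params):
    """Build the packages URL for a journal, with optional query parameters."""
    url = f"{API_BASE}/journals/{issn}/packages"
    query = {key: value for key, value in params.items() if value is not None}
    return url + "?" + urlencode(query) if query else url


def parse_link_header(header):
    """Map each rel value (for example 'next' or 'prev') to its URL."""
    links = {}
    # Matches entries of the form <URL>; rel="name" (quotes optional).
    for match in re.finditer(r'<([^>]*)>\s*;\s*rel="?([^";,]+)"?', header):
        links[match.group(2).strip()] = match.group(1)
    return links


# Made-up ISSN and cursor values, for illustration only.
first_page = packages_url("1234-5678", count=20)
sample_header = (
    '<http://datadryad.org/api/v1/journals/1234-5678/packages?cursor=abc>; rel="next", '
    '<http://datadryad.org/api/v1/journals/1234-5678/packages?cursor=xyz>; rel="prev"'
)
links = parse_link_header(sample_header)
print(first_page)
print(links.get("next"))
```

Following the `next` link until it is absent walks through every page of results.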

Links to Data Packages/Files

Dryad uses DOIs (Digital Object Identifiers) to identify Dryad data packages and files. A few simple examples follow.

Data packages

Data files

RSS Feeds

There are a couple of feed options. Feeds are used by some browsers and all feed and news readers. They may also be used for programmatic access.

Everything -- data packages, data files, and metadata harvested from partner repositories

Data packages only

Data files only

Twitter Feed

Data packages

Primary tweets from Dryad (typically not data)

SOLR search access

Dryad content can be searched using a SOLR interface.

http://datadryad.org/solr/search/select/?q=location:l2&facet=true&facet.field=dc.subject_filter&facet.minCount=1&facet.limit=5000&fl=nothing

  • Article DOIs associated with all data published in Dryad over the past 90 days:

http://datadryad.org/solr/search/select/?q=dc.date.available_dt:%5BNOW-90DAY/DAY%20TO%20NOW%5D&fl=dc.relation.isreferencedby&rows=1000000

  • Data DOIs published in Dryad during January 2011, with results returned in JSON format:

http://datadryad.org/solr/search/select/?q=location:l2+dc.date.available_dt:%5B2011-01-01T00:00:00Z%20TO%202011-01-31T23:59:59Z%5D&fl=dc.identifier&rows=1000000&wt=json

For more about using SOLR, see the Apache SOLR documentation.
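
Hand-encoding brackets and spaces in these query URLs is error-prone, so it can help to let a library do it. The Python sketch below rebuilds the January 2011 query from above. The helper name solr_url is hypothetical. Note that urlencode also percent-encodes the colons, so the resulting string is not character-for-character identical to the example URL, although it decodes to the same parameters.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

SOLR_SELECT = "http://datadryad.org/solr/search/select/"


def solr_url(query, fields, rows=10, wt=None):
    """Build a SOLR select URL (hypothetical helper, for illustration only)."""
    params = {"q": query, "fl": fields, "rows": rows}
    if wt:
        params["wt"] = wt  # response format, e.g. "json"
    return SOLR_SELECT + "?" + urlencode(params)


# Same query as the January 2011 example, written without manual escaping.
q = "location:l2 dc.date.available_dt:[2011-01-01T00:00:00Z TO 2011-01-31T23:59:59Z]"
url = solr_url(q, "dc.identifier", rows=1000000, wt="json")
print(url)

# Decoding the URL again recovers the original query string.
decoded = parse_qs(urlsplit(url).query)
print(decoded["q"][0])
```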

Widget API

The Widget API will become part of the "New" Dryad API (see below), but some of its components are already coming online. The Widget API provides simple images or dynamic iframes that link to content in Dryad and can be embedded into third-party sites.

Dryad API 2 -- In development

We are in the process of designing a new API that will be easier to work with. It should be consistent, and subsume all of the other access mechanisms described above.

WARNING: The API described in this section does not currently exist. It is documented here to enable broader discussion.

Use cases that must be met:

  • start with data package DOI, retrieve the contents
  • start with article DOI or PMID, retrieve data package DOI
  • (search) given author name, retrieve list of matching package DOIs
  • (search) given article title, retrieve list of matching package DOIs
  • (search) given a set of fields that are typically unique -- e.g., author name, article title, year -- retrieve the single matching package DOI
  • (search) given a journal name or publisher name, retrieve a list of matching package DOIs -- case insensitive!
  • machine metadata access: start with package/file DOI, get relevant metadata field (including file sizes and access statistics)
  • harvest: Get all article DOIs. Get all data package DOIs.
  • retrieve content in Dryad-native XML format or JSON format

Proposed retrieval API:

  • http://datadryad.org/api/object/
    • Retrieves a list of data packages and data files available
    • Each item in the list will contain a DOI, file type, file size, checksum, and modification date
    • This is the same as for the DataONE protocol
  • http://datadryad.org/api/object/identifier
    • Retrieves a data package or data file, given its identifier
    • If the identifier is a DOI, a metadata record will be returned
    • If the identifier is a DOI with "/bitstream" appended, a data file (bitstream) will be returned
    • This is the same as for the DataONE protocol
  • http://datadryad.org/api/object/identifier/fieldname
    • Retrieves the contents of a given metadata field
  • http://datadryad.org/api/articlePackage/article-identifier
    • Retrieves a data package associated with a given article
    • Although this could technically be combined with /object, we want to preserve the meaning of /object as querying objects in Dryad. Articles are not in Dryad.
  • http://datadryad.org/api/meta/identifier
    • Retrieves system-level metadata (internal storage information -- not descriptive metadata) for the given object
    • This is the same as for the DataONE protocol
  • http://datadryad.org/api/stats/identifier
    • Retrieve usage statistics about a given item.


Proposed search API:

Open questions:

  1. Should requests for different formats (e.g., XML, JSON) be via a modifier in the base URL, or as a parameter? How is this handled by the underlying system?

Other access mechanisms

If you know of other community-developed services for searching or retrieving content that are not listed here, please alert us at help@datadryad.org.

Suggest Alternatives

We're interested in hearing what other forms of access people would like. If you have a suggestion for making Dryad's content more accessible, please let us know at help@datadryad.org.