Data Access
NOTICE: Dryad is in the process of phasing out the older Handle-style identifiers (those that contain "10255"). We will only continue to support Handle identifiers for navigation to a data package page, retaining the functionality of existing citations in the scientific literature. All other types of data access will be updated to use DOIs. As a result, many of the access mechanisms on this page will be changed during 2011/2012.
Web Browser User Interface
Primary access to Dryad is through its web interface, where users most commonly search on authors, titles, subjects and other metadata elements. Data files archived by Dryad may be downloaded one-by-one from their Dryad data package Web pages.
Additionally, DSpace, the platform on which Dryad is built, supports several "hidden" ways to hack the system's URLs to get useful metadata from the Web interface.
- Viewing full metadata: add "?show=full" to the end of the URL.
- Viewing the raw DSpace representation of a page: add "DRI" to the URL. Example: http://datadryad.org/DRI/handle/10255/dryad.12
- NOTE: With the 1.11 release, this URL will be changed to make use of the DOI: http://datadryad.org/resource/doi:10.5061/dryad.12/DRI
- Another way to view the raw DSpace markup is to add "?XML" to the end of the URL. This is less useful than the above method, though, because the page's content won't contain the externalized i18n strings.
- Viewing metadata in machine-readable (METS) format: use the mets.xml URL pattern given under "Programmatic access to individual data files using OAI-PMH".
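The URL tweaks above are easy to script. The following is a minimal Python sketch, not part of Dryad or DSpace: the helper names are hypothetical, and the URL patterns are inferred only from the Handle-style examples on this page, so they may need adjusting once the DOI-based URLs are in place.

```python
from urllib.parse import urlsplit, urlunsplit


def full_metadata_url(page_url):
    """Return the page URL with ?show=full (any existing query is replaced)."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "show=full", ""))


def dri_url(page_url):
    """Insert /DRI before the path, mirroring the Handle-style example above."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/DRI" + parts.path, "", ""))


page = "http://datadryad.org/handle/10255/dryad.12"
print(full_metadata_url(page))
print(dri_url(page))
```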
Programmatic Data Access
In addition to the web interface, Dryad can be accessed programmatically through a sitemap or OAI-PMH interface.
Sitemaps
WARNING: The sitemaps are not working as of 2011-10-13. Dryad staff are investigating.
The Dryad sitemap lists links to all of Dryad's data package and file pages, each with the timestamp of its last update. It is an XML-formatted file that is gzipped for transmission. An example snippet of the XML follows:
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://datadryad.org/handle/10255/dryad.721</loc>
    <lastmod>2010-02-26T11:17:10Z</lastmod>
  </url>
</urlset>
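As an illustration of consuming such a file, here is a small Python sketch that decompresses sitemap bytes and extracts each page link with its last-modified timestamp. The function name is hypothetical, and the sample data is just the snippet above gzipped locally, standing in for a real download (the sitemap URL itself is not given on this page).

```python
import gzip
import xml.etree.ElementTree as ET

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_sitemap(xml_bytes):
    """Return a list of (loc, lastmod) pairs from uncompressed sitemap XML."""
    root = ET.fromstring(xml_bytes)
    entries = []
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", default="", namespaces=SITEMAP_NS)
        lastmod = url.findtext("sm:lastmod", default="", namespaces=SITEMAP_NS)
        entries.append((loc.strip(), lastmod.strip()))
    return entries


# Stand-in for a downloaded sitemap: the example snippet, gzipped locally.
sample = b"""<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://datadryad.org/handle/10255/dryad.721</loc>
    <lastmod>2010-02-26T11:17:10Z</lastmod>
  </url>
</urlset>"""
compressed = gzip.compress(sample)

for loc, lastmod in parse_sitemap(gzip.decompress(compressed)):
    print(loc, lastmod)
```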
OAI-PMH
OAI-PMH is a harvesting protocol that may be used to access Dryad's metadata. The specification is available, as are online tutorials, but we include a couple of examples of its use here for illustrative purposes.
Identify
- Used to learn about the service.
ListSets
- Used to learn what sets of metadata are available. Dryad offers a data package set and a data file set.
ListMetadataFormats
- Used to learn what metadata formats can be returned by the service. Dryad currently offers METS/MODS, OAI-DC (Dublin Core), OAI-ORE/Atom, and RDF/DC. The amount of information mapped into each format varies. For now, we recommend using the OAI-DC metadata format.
ListIdentifiers
- Used to list Dryad's OAI identifiers. It takes from and metadataPrefix parameters to specify what range of identifiers to return and what format the metadata should be in (from the options returned by the ListMetadataFormats verb). We may modify this to return DOIs in the future.
- NOTE: It is highly recommended that you use this call in conjunction with the "set" parameter, so you retrieve only the records of interest. Otherwise, you may retrieve records that Dryad has harvested from other providers.
ListRecords
- Used to list Dryad records. It takes from and metadataPrefix parameters to specify the range of records to return. The records will be returned in the format associated with the metadataPrefix requested. Available formats can be discovered by using the ListMetadataFormats verb.
- NOTE: It is highly recommended that you use this call in conjunction with the "set" parameter, so you retrieve only the records of interest. Otherwise, you may retrieve records that Dryad has harvested from other providers.
GetRecord
- Used to return a single record. It requires the OAI identifier of the record (the identifier parameter) and the format in which the record should be returned (the metadataPrefix parameter).
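To make the verbs concrete, the sketch below builds OAI-PMH request URLs in Python. It is only an illustration: the endpoint is a placeholder, because Dryad's OAI-PMH base URL is not given on this page, and the helper name is hypothetical. The set name and date are borrowed from the resumptionToken example on this page, and oai_dc is the standard prefix for the OAI-DC format.

```python
from urllib.parse import urlencode

# Placeholder endpoint: substitute the repository's real OAI-PMH base URL.
OAI_BASE_URL = "http://example.org/oai/request"


def oai_request_url(verb, base_url=OAI_BASE_URL, **params):
    """Build an OAI-PMH request URL for the given verb and parameters."""
    query = {"verb": verb}
    # Skip parameters that were passed as None.
    query.update({k: v for k, v in params.items() if v is not None})
    return base_url + "?" + urlencode(query)


print(oai_request_url("Identify"))
print(oai_request_url("ListSets"))
print(oai_request_url("ListMetadataFormats"))

# "from" is a Python keyword, so it is passed through a dict.
print(oai_request_url(
    "ListRecords",
    **{"from": "2010-01-01T00:00:00Z", "metadataPrefix": "oai_dc", "set": "hdl_10255_3"}
))
```

Passing the parameters as keyword arguments keeps the verb separate from its arguments, which matches how the protocol itself is organized.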
Using resumptionTokens with OAI-PMH
OAI-PMH requests may result in partial results lists being returned. In these cases, the results list will contain a resumptionToken that can be used to retrieve the next page of results.
For example, for a call like:
You will receive the first 100 records, ending with a resumptionToken of 2010-01-01T00:00:00Z/9999-12-31T23:59:59Z/hdl_10255_3/oai_dc/100
You can then retrieve the next 100 records with:
Note that when using a resumptionToken, OAI expects you to only repeat the verb, not any of the other parameters that were part of the original request.
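The paging behaviour described above can be sketched as a loop in Python. This is an illustrative sketch, not Dryad code: the function names are hypothetical, the base URL must be supplied by the caller because the endpoint is not given on this page, and the fetch function is injectable so the loop can be exercised without a network connection.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def fetch(url):
    """Default fetcher: return the response body as bytes."""
    with urlopen(url) as response:
        return response.read()


def harvest_identifiers(base_url, first_params, fetch=fetch):
    """Yield every OAI identifier, following resumptionTokens until exhausted."""
    params = dict(first_params, verb="ListIdentifiers")
    while True:
        root = ET.fromstring(fetch(base_url + "?" + urlencode(params)))
        for header in root.iter(OAI_NS + "header"):
            yield header.findtext(OAI_NS + "identifier")
        token = root.find(".//" + OAI_NS + "resumptionToken")
        if token is None or not (token.text or "").strip():
            return  # no token, or an empty one, marks the last page
        # Follow-up requests repeat only the verb and carry the token.
        params = {"verb": "ListIdentifiers", "resumptionToken": token.text.strip()}


# Usage (base URL is a placeholder):
# for oai_id in harvest_identifiers("http://example.org/oai/request",
#                                   {"metadataPrefix": "oai_dc"}):
#     print(oai_id)
```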
Programmatic access to individual data files using OAI-PMH
The process for a machine to locate and download a file from Dryad takes a few steps. We're working to streamline it and to make it more compliant with standards (in particular OAI-ORE and RDF/Linked Data).
- Obtain a Dryad Identifier in "short" form (e.g., 10255/dryad.1234)
- To list all identifiers of data files, use the OAI-PMH interface and restrict the "set" to the Dryad data files, like:
- For recent additions only, use the RSS feed for data files (described under "RSS Feeds" below)
- Obtain the METS metadata
- http://datadryad.org/metadata/handle/INSERT_SHORT_ID_HERE/mets.xml
- This also applies to data packages, if the identifier you have is for a data package. In the METS metadata for a data package, elements <dim:field> with attributes element="relation" qualifier="haspart" mdschema="dc" will have the data file identifier (as a DOI) as value of the element. Remove the "doi:" to obtain the "short" form of the Dryad identifier.
- (temporary step, while Dryad metadata is in transition) You will need to transform the DOI to a Handle-style identifier. Use the lookup service at http://datadryad.org/doi?lookup=INSERT_DOI_HERE
- Then obtain the METS metadata for the data file as above.
- Parse the METS metadata to locate the bitstream URL.
- It is in the <mets:FLocat/> element in the xlink:href attribute. It will look like /bitstream/handle/SHORT_ID/FILE_NAME?sequence=1 (the sequence number may vary).
- If you are interested only in files of a particular type, look for the <mets:file/> element and check the value of its attribute MIMETYPE. For example, for MS Excel files the value should be "application/vnd.ms-excel".
- Prepend http://datadryad.org and download the file using the bitstream URL, which then has the form http://datadryad.org/bitstream/handle/SHORT_ID/FILE_NAME?sequence=1.
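The METS-related steps above can be sketched in Python as follows. Treat this as a rough illustration under stated assumptions: the function names are hypothetical, the METS namespace URI is the standard Library of Congress one and is assumed to match Dryad's output, and the sample record is hand-written for this example rather than copied from Dryad. The href attribute is matched by local name so the code does not depend on which xlink namespace URI the server emits.

```python
import xml.etree.ElementTree as ET

METS_NS = "{http://www.loc.gov/METS/}"  # assumed to match Dryad's METS output


def mets_url(short_id, host="http://datadryad.org"):
    """Build the METS metadata URL for a short identifier such as 10255/dryad.1234."""
    return host + "/metadata/handle/" + short_id + "/mets.xml"


def bitstream_urls(mets_xml, mimetype=None, host="http://datadryad.org"):
    """Return absolute bitstream URLs, optionally filtered by MIMETYPE."""
    root = ET.fromstring(mets_xml)
    urls = []
    for file_el in root.iter(METS_NS + "file"):
        if mimetype is not None and file_el.get("MIMETYPE") != mimetype:
            continue
        for flocat in file_el.iter(METS_NS + "FLocat"):
            # Match href by local name, whatever xlink namespace URI is used.
            for name, value in flocat.attrib.items():
                if name == "href" or name.endswith("}href"):
                    urls.append(host + value)
    return urls


# Hand-written sample for illustration; real records contain much more.
sample = """<mets:METS xmlns:mets="http://www.loc.gov/METS/"
                       xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:fileSec><mets:fileGrp USE="CONTENT">
    <mets:file MIMETYPE="application/vnd.ms-excel">
      <mets:FLocat LOCTYPE="URL"
                   xlink:href="/bitstream/handle/SHORT_ID/FILE_NAME?sequence=1"/>
    </mets:file>
  </mets:fileGrp></mets:fileSec>
</mets:METS>"""

print(mets_url("10255/dryad.1234"))
print(bitstream_urls(sample, mimetype="application/vnd.ms-excel"))
```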
DataONE API
WARNING: The DataONE API is not yet finalized. The format for calls to this API may change in the near future.
As part of Dryad's participation in the DataONE project, Dryad makes content available through a specialized API.
- List of objects
- Sample metadata call
- Sample file download
- Technical documentation for the DataONE API
Programmatic access to data files using the DataONE API
- Obtain the DataONE ID of a Dryad object using the DataONE listObjects call: http://www.datadryad.org/mn/object (e.g., dryad.1850/1/nex)
- Retrieve the file: http://www.datadryad.org/mn/object/doi:10.5061/dryad.1850/1/nex
Objects with a DataONE ID ending in "/dap" are metadata objects, which adhere to the Dryad Application Profile (DAP). If you have the identifier of a DAP metadata object and want to know which file object it describes, you can retrieve its DataONE metadata, using a URL such as: http://www.datadryad.org/mn/meta/doi:10.5061/dryad.1850/1/dap.
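For convenience, the DataONE URL patterns shown above can be wrapped in a few Python helpers. The helper names are hypothetical and the patterns are taken only from the example URLs on this page. Since the DataONE API is not yet finalized, this is a sketch that may stop matching the service.

```python
DATAONE_BASE = "http://www.datadryad.org/mn"


def object_url(dataone_id):
    """URL that retrieves the object (file or metadata) with this DataONE ID."""
    return DATAONE_BASE + "/object/" + dataone_id


def meta_url(dataone_id):
    """URL that retrieves the DataONE metadata for this DataONE ID."""
    return DATAONE_BASE + "/meta/" + dataone_id


def is_dap_metadata(dataone_id):
    """True if the ID names a Dryad Application Profile metadata object."""
    return dataone_id.endswith("/dap")


print(object_url("doi:10.5061/dryad.1850/1/nex"))
print(meta_url("doi:10.5061/dryad.1850/1/dap"))
print(is_dap_metadata("doi:10.5061/dryad.1850/1/dap"))
```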
Links to Data Packages/Files
Dryad uses DOIs (Digital Object Identifiers) to identify Dryad data packages and files. A few simple examples follow. These may be resolved against the DOI resolver at http://dx.doi.org (when you do, remove the "doi:" prefix).
Data packages Data files
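As a small illustration of resolving an identifier, the Python sketch below strips the "doi:" prefix and builds the resolver URL. The function name is hypothetical, and the DOI used is the one that appears in the DRI example on this page.

```python
def doi_resolver_url(identifier, resolver="http://dx.doi.org"):
    """Strip an optional 'doi:' prefix and build a resolver URL."""
    prefix = "doi:"
    doi = identifier[len(prefix):] if identifier.startswith(prefix) else identifier
    return resolver + "/" + doi


print(doi_resolver_url("doi:10.5061/dryad.12"))
```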
RSS Feeds
There are a couple of feed options. Feeds are used by some browsers and all feed and news readers. They may also be used for programmatic access.
- Everything: data packages, data files, and metadata harvested from partner repositories
- Data packages only
- Data files only
SOLR search access
Dryad content can be searched using a SOLR interface.
- Basic query: http://datadryad.org/solr/search/select/?q=Galliard
- Field-specific query: http://datadryad.org/solr/search/select/?q=dwc.ScientificName:drosophila
- Search all text for a string, but return only two specified fields for each result: http://datadryad.org/solr/search/select/?q=Galliard&fl=dc.title,dc.contributor.author
- Look up Dryad data based on an article DOI: http://datadryad.org/solr/search/select/?q=dc.relation.isreferencedby:10.1038/nature04863&fl=dc.identifier,dc.title_ac
- Look up all terms in the dc.subject facet, along with their frequencies:
- Look up article DOIs associated with all data published in Dryad over the past 90 days:
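The queries in this list that are shown with a complete URL can also be assembled programmatically. The Python sketch below uses a hypothetical helper name and only the q and fl parameters that appear in those URLs. It does not attempt the facet or date-range queries, whose URLs are not shown here.

```python
from urllib.parse import urlencode

SOLR_SELECT = "http://datadryad.org/solr/search/select/"


def solr_url(query, fields=None):
    """Build a select URL; 'fields' limits which fields each result returns."""
    params = {"q": query}
    if fields:
        params["fl"] = ",".join(fields)
    # Keep ':', '/' and ',' unescaped so the URLs match the examples above.
    return SOLR_SELECT + "?" + urlencode(params, safe=":/,")


print(solr_url("Galliard"))
print(solr_url("dwc.ScientificName:drosophila"))
print(solr_url("Galliard", ["dc.title", "dc.contributor.author"]))
print(solr_url("dc.relation.isreferencedby:10.1038/nature04863",
               ["dc.identifier", "dc.title_ac"]))
```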
"New" Dryad API
We are in the process of designing a new API that will be easier to work with. It should be consistent, and subsume all of the other access mechanisms described above.
WARNING: The API described in this section does not currently exist. It is being designed here to enable broader discussion.
Use cases that must be met:
- start with article DOI or PMID, get data package DOI
- start with data package DOI, get the contents
- (search) given author name, get list of matching package DOIs
- (search) given article title, get list of matching package DOIs
- (search) given a set of fields that are typically unique -- e.g., author name, article title, year -- return the single matching package DOI
- machine metadata access: start with package/file DOI, get relevant metadata field (including file sizes and access statistics)
- harvest: Get all article DOIs. Get all data package DOIs.
Proposed retrieval API:
- http://datadryad.org/api/object/
- Retrieves a list of data packages and data files available
- Each item in the list will contain a DOI, file type, file size, checksum, and modification date
- This is the same as for the DataONE protocol
- http://datadryad.org/api/object/identifier
- Retrieves a data package or data file, given its identifier
- If the identifier is a DOI, a metadata record will be returned
- If the identifier is a DOI with a file format appended, a bitstream will be returned
- This is the same as for the DataONE protocol
- http://datadryad.org/api/articlePackage/article-identifier
- Retrieves a data package associated with a given article
- Although this could technically be combined with /object, we want to keep /object/identifier closely tied to the no-argument form of /object
- http://datadryad.org/api/object/identifier/fieldname
- Retrieves the contents of a given metadata field
- http://datadryad.org/api/object/identifier/dap
- Retrieves the complete descriptive metadata for the given object, in Dryad Application Profile format
- This is the same as for the DataONE protocol
- http://datadryad.org/api/meta/identifier
- Retrieves system-level metadata (internal storage information -- not descriptive metadata) for the given object
- This is the same as for the DataONE protocol
Proposed search API:
- http://datadryad.org/api/search/*******
- searching protocol will be the same as listed above for SOLR search
Open questions:
- Should we deprecate /mn for DataONE access, and have everything go through /api? What would happen if the DataONE protocol changed in a manner incompatible with Dryad's API needs?
- Should /articlePackage return a metadata object, or just an identifier?
- Should requests for different formats (e.g., XML, JSON) be via a modifier in the base URL, or as a parameter? How do the underlying protocols do this?
- Should the "dap" metadata format include statistics information?
Other access mechanisms
If you know of other community-developed services that can search or retrieve content that are not listed here, please alert us at help@datadryad.org
- ROpenSci Dryad package for search and retrieval of Dryad data and metadata within R. Tutorial
Suggest Alternatives
We're interested in hearing what other forms of access people would like. If you have a suggestion for making Dryad's content more accessible, please let us know at help@datadryad.org.