From Dryad wiki
Revision as of 08:59, 27 September 2011 by Ryan Scherle


Dryad is a member node in the DataONE network. DataONE is an NSF-funded DataNet, a distributed organization that aims to provide persistent, robust, and secure access to well-described and easily discovered data from the genome to the ecosystem, including Earth observational data from atmospheric, ecological, hydrological, and oceanographic sources.


The standard URL prefix for interacting with Dryad's DataONE interface is "mn" (e.g., http://datadryad.org/mn/).

Current Usage

The table below includes the current usage and implementation status.

|| Method || REST Interface || Status ||
|| listObjects() || GET /object || Basic list works. objectFormat needs to follow the correct output type. Needs to respond to all possible parameters. Only supports the (required) XML response format. ||
|| get() || GET /object/<guid> || Implemented, but relationships between data files and data packages need to be improved. ||
|| describe() || HEAD /object/<guid> || Implemented. ||
|| create() || POST /object || ||
|| create() || PUT /object/<guid> || ||
|| delete() || DELETE /object/<guid> || ||
|| getSystemMetadata() || GET /meta/<guid> || Initial implementation works. ||
|| (none) || HEAD /meta/<guid> || ||
|| getChecksum() || GET /checksum/<guid> || Implemented. ||
|| isAuthorized() || GET /isAuthorized/<guid> || ||
|| getLogRecords() || GET /log || ||
|| listEvents() || GET /monitor/event || ||
|| ping() || || ||
|| getObjectStatistics() || || ||
|| getStatus() || || ||
|| login() || || ||
|| logout() || || ||
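
The mapping in the table can be sketched in Python as a lookup from method name to HTTP verb and path. This is only an illustration, not part of the Member Node code: the helper name `build_request` and the `ENDPOINTS` dictionary are hypothetical, and only the methods with a reported status are included. The identifier is percent-encoded so that the colon becomes `%3A` while slashes are kept, matching the example get() URL on this page.

```python
from urllib.parse import quote

# Base URL taken from the text above; "mn" is the DataONE interface prefix.
BASE_URL = "http://datadryad.org/mn"

# HTTP verb and path template for methods with a reported status in the table.
ENDPOINTS = {
    "listObjects": ("GET", "/object"),
    "get": ("GET", "/object/{guid}"),
    "describe": ("HEAD", "/object/{guid}"),
    "getSystemMetadata": ("GET", "/meta/{guid}"),
    "getChecksum": ("GET", "/checksum/{guid}"),
}


def build_request(method, guid=None, base_url=BASE_URL):
    """Return (http_verb, url) for a Member Node method (hypothetical helper)."""
    verb, template = ENDPOINTS[method]
    if "{guid}" in template:
        if guid is None:
            raise ValueError(f"{method}() requires an identifier")
        # Encode ':' as %3A but leave '/' untouched.
        template = template.replace("{guid}", quote(guid, safe="/"))
    return verb, base_url + template
```

For example, `build_request("describe", "doi:10.5061/dryad.104/1/txt")` yields a HEAD request against the `/object/` path for that identifier.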

Identifiers and Versioning

Dryad needs to keep a timestamp as part of the D1 identifier for metadata, since the Dryad metadata can change without changing the Dryad version number.

Since Dryad doesn't natively keep separate identifiers for data and metadata, the identifiers given to D1 will need to include something to distinguish them (e.g., "/metadata").

See the DOI Usage page for information on Dryad's native identifier handling.

When Dryad replicates data from other systems, it will store data and metadata in separate objects, using Dryad metadata for Dryad's internal tracking purposes.

Sample identifiers:

  • Currently used by prototype (metadata): hdl:10255/dryad.105/mets.xml
  • Currently used by prototype (data, just getting first file): hdl:10255/dryad.105/mets.xml_data
  • Metadata: doi:10.5061/dryad.104/1/dap
  • Data: doi:10.5061/dryad.104/1/txt
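
A minimal sketch of how the suffixed sample identifiers above could be composed and taken apart, assuming the distinguishing part is always the last path segment (as in the "dap" and "txt" samples). The function names are hypothetical, and the timestamp mentioned above is not modeled because its format is not specified here.

```python
def make_d1_identifier(dryad_doi, fmt):
    """Append a format suffix (e.g. 'dap' or 'txt') to a Dryad DOI."""
    return f"{dryad_doi}/{fmt}"


def split_d1_identifier(d1_id):
    """Split a D1 identifier into (Dryad DOI, format suffix) at the last slash."""
    dryad_doi, _, fmt = d1_id.rpartition("/")
    if not dryad_doi or not fmt:
        raise ValueError(f"not a suffixed identifier: {d1_id!r}")
    return dryad_doi, fmt
```

Splitting at the last slash only works while the suffix itself contains no slash. A MIME-type suffix such as `text/xml` would break it, which is one reason the suffix format is listed under the open issues.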


The Member Node module is built and deployed the same way as the regular DSpace implementation. Its code lives in (trunk)/dspace/modules/dataone-mn.

Open Issues

  1. DOI-like identifiers that aren't registered DOIs (since they have the format on the end)
    1. Move the format to the beginning? txt/doi%3A10.5061/dryad.104/1. This may be a problem if the format becomes complex.
    2. Use a code like "obj" or "meta" at the beginning? What to do with other bitstreams?
    3. Actually register these as DOIs? Introduces a maintenance burden.
    4. Change the prefix from "doi" to "Dryad"? (Ryan likes this) Downsides?
  2. System metadata: rightsholder. What are other people doing about this?
  3. The "txt" in "Data: doi%3A10.5061/dryad.104/1/txt" isn't great. It was intended to correspond to the objectFormat in the listObjects. Types.ObjectFormat is not yet fully defined, but it looks like it will be a MIME type or similar long string.
  4. Why text/xml instead of application/xml in http://mule1.dataone.org/ArchitectureDocs/REST_interface.html#get-meta-guid? We understand the reasons for text/xml when the XML is going into a browser, but would assume XML from /meta will be parsed and processed by an application.
  5. We're currently providing only data files (not packages), until DataONE decides on the packaging framework. When this happens, we will need to specify the relationships between package objects and file objects. This is different than within Dryad, because the D1 identifiers are slightly different.

For discussion in larger group:

  1. Checklist for API compliance? (what methods are included in Level 1, Level 2, etc.)
    1. What do the coordinating nodes currently expect?
    2. Is there a test suite to validate the functionality of a MN API?
  2. What is the process for adding a new type of science metadata to the system?
  3. /log returns something from the Python nodes, but this format hasn't been defined.
  4. Notifications when the API changes?
  5. Need to specify the search API -- possible options on the /object command
  6. For create() and update(), we typically do not assign an ID until curator approval. What should we return?


DSpace-native Member Node API Implementation

The DSpace-native implementation is built and deployed the same way as the regular DSpace implementation (see The Dryad HowTo).

An example get() call: https://datadryad.org/mn/object/doi%3A10.5061/dryad.20/1/dap

Relation to DSpace

  • We aim to make this work with DSpace as closely as possible, because it is likely that other institutions will want to become member nodes (e.g., UIUC, MIT)

TODO: Releasing Dryad's MN Implementation to General DSpace

  1. Move settings to config file
  2. Genericize the metadata output
  3. Ensure packaging information is not tightly tied into the MN code

Relation to DataONE

DataONE member node architecture documents

(Deprecated) Python Prototype Implementation

Roger Dahl created an initial implementation based on his generic Python member node, using metadata harvested from Dryad. It does not have current data, but it still serves as a reference implementation for many of the API methods.

An example get() call on the old prototype system: