Old: DOI Services Technology

From Dryad wiki
Revision as of 17:49, 17 December 2012 by Ryan Scherle (talk | contribs) (Implementation)


Overview

Dryad mints, manages, and registers Digital Object Identifiers (DOIs) for data packages and data files deposited into the Dryad Data Repository. This page documents the technical details of these DOI services.

General information about Dryad's DOI services can be found on the DOI Services page.

The structure of DOIs is described on the DOI Usage page.

Storage information (and Warning)

WARNING: DOIs are currently stored in two places: the doi.db file and the "dryad" Solr index. We need to fully document how these two locations are used and update the code to use only one of them (likely the Solr index).


Command-line features

Local DOI database

Dryad maintains a local database of DOIs. These are used for fast lookups within the search system.

The local DOI service can be managed using a command line call:

dryad/bin/dspace doi-util
-h              Help... prints this usage information
-s              Search for a known DOI and return it
-m [DOI] [URL]  Mints a new DOI and places it in the local database
-r [DOI] [URL]  Registers a DOI, minting if necessary
-p <FILE>       Prints the DOI database to an output stream
-c              Outputs the number of DOIs in the database

Database synchronization tool. This synchronizes the local DOI database with the objects in the main Dryad store.

./dspace dsrun org.dspace.identifier.DOIDbSync
-s: to synchronize + report
-r: to produce the report
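
The kind of comparison behind such a report can be sketched as a set difference between the DOIs in the local database and the DOIs attached to objects in the main store. This is only an illustration under assumptions, not the DOIDbSync implementation: the function name sync_report, the result keys, and the sample DOIs are all hypothetical.

```python
def sync_report(local_db_dois, store_dois):
    """Compare two collections of DOIs and report where they differ.

    Hypothetical helper; the actual DOIDbSync logic is not shown on this page.
    DOIs are compared in lowercase because DOI names are case-insensitive.
    """
    local = {doi.lower() for doi in local_db_dois}
    store = {doi.lower() for doi in store_dois}
    return {
        "missing_from_db": sorted(store - local),
        "missing_from_store": sorted(local - store),
        "in_sync": sorted(local & store),
    }


# Made-up sample data for illustration.
report = sync_report(
    ["10.5061/dryad.1234", "10.5061/DRYAD.2222"],
    ["10.5061/dryad.2222", "10.5061/dryad.9999"],
)
```

A synchronizing run would presumably act on the two "missing" lists, while a report-only run would just print them.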

EZID DOI database

EZID manages Dryad's DOIs and their registration with the DOI Federation.

To check the status of a DOI registered with EZID:

/opt/dryad/bin/dsrun org.dspace.doi.CDLDataCiteService 10.5061/DRYAD.2222

To update metadata for a DOI (pushing Dryad metadata to EZID):

/opt/dryad/bin/dsrun org.dspace.doi.CDLDataCiteService username password doi-to-update target-url update

Notes about the above command:

  • to register a new DOI, replace "update" with "register"

To update DataCite with metadata from all Dryad objects:

/opt/dryad/bin/dsrun org.dspace.doi.CDLDataCiteService username password syncall

The metadata transformation for DataCite is stored in DIM2DATACITE.xsl.

Workflow

For Submissions

  1. DOIs are minted at the point of submission to Dryad. When a data package is submitted, a call to mint a DOI (without registering it) is made to the DOI Service.
  2. The data package should contain data files. For a DOI to be registered for a data file, the data file's metadata must contain a link to the data package.
  3. The data package then goes on to be curated by the Dryad Librarian.
  4. If the data package is approved, the DOI is registered with DataCite through the EZID DOI registration service.
  5. If the package isn't approved, the DOI remains unregistered.
  6. Lastly, the registered DOI is emailed to the submitter so that it can be included in the article and used to reference the published data package.
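
The submission steps above can be read as a small state machine: a DOI starts out minted and only becomes registered if curation approves the package. The sketch below illustrates that reading. The class and attribute names (DoiState, DataPackage, notified) are hypothetical and do not correspond to Dryad classes, and the sample DOIs are made up.

```python
from enum import Enum


class DoiState(Enum):
    MINTED = "minted"          # exists locally, not yet registered
    REGISTERED = "registered"  # registered with DataCite through EZID


class DataPackage:
    """Hypothetical stand-in for a submitted data package."""

    def __init__(self, doi):
        self.doi = doi
        self.state = DoiState.MINTED  # step 1: minted at submission
        self.notified = False

    def curate(self, approved):
        """Steps 4-6: register and notify only when the curator approves."""
        if approved:
            self.state = DoiState.REGISTERED
            self.notified = True  # stands in for emailing the submitter
        return self.state


accepted = DataPackage("10.5061/dryad.1234")
accepted.curate(approved=True)

rejected = DataPackage("10.5061/dryad.5678")
rejected.curate(approved=False)
```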

For Citation Downloads

  1. DOIs are passed to the CitationServlet when a user requests a citation download or uses one of the sharing services that Dryad supports (Delicious, Digg, etc.)
  2. DOI Services resolve the DOI and extract the metadata from the record, making it available to be downloaded in RIS or BibTeX format.
  3. The CitationServlet currently uses Dryad's DOI Services. In the future it might use DSpace's _Identifier Services_ instead, if that becomes an official module with Dryad's DOI resolution built in.

For Identifier Services

Dryad's DOI Services are also used by the _Identifier Services_ DSpace module. Dryad's DOI Services serve as the local DOI resolver for these DSpace services. In the future, we may better integrate our DOI Services into this module.

When an identifier is used (created, modified, or resolved), the IdentifierServiceImpl looks through all of the available IdentifierProviders to see which one is capable of handling the associated identifier. Handling is then passed to the appropriate provider.

NOTE: The DOIIdentifierProvider is stored in Dryad's api module, while other IdentifierProviders are stored in the identifier-services module.
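
The dispatch described above (ask each provider whether it can handle an identifier, then delegate to the first one that can) might look roughly like the sketch below. It is written in Python for brevity and is not the DSpace code: the class names, the supports/resolve method names, and the "doi:" and "hdl:" prefixes used to pick a provider are all assumptions for illustration.

```python
class DoiProvider:
    """Hypothetical provider that accepts identifiers starting with 'doi:'."""

    def supports(self, identifier):
        return identifier.lower().startswith("doi:")

    def resolve(self, identifier):
        return "doi-provider resolved " + identifier


class HandleProvider:
    """Hypothetical provider that accepts identifiers starting with 'hdl:'."""

    def supports(self, identifier):
        return identifier.lower().startswith("hdl:")

    def resolve(self, identifier):
        return "handle-provider resolved " + identifier


class IdentifierService:
    """Asks each provider in turn and delegates to the first that accepts."""

    def __init__(self, providers):
        self.providers = list(providers)

    def resolve(self, identifier):
        for provider in self.providers:
            if provider.supports(identifier):
                return provider.resolve(identifier)
        raise LookupError("no provider supports " + identifier)


service = IdentifierService([HandleProvider(), DoiProvider()])
result = service.resolve("doi:10.5061/dryad.1234")  # made-up DOI
```

One consequence of this design is that adding a new identifier scheme only requires registering another provider; the service itself does not change.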

Configuration

Configuring the DOI Services module requires additional parameters to be set in the dspace.cfg configuration file. The Dryad project places these parameters in a Maven profile; they are then pulled into the dspace.cfg file when Dryad is built.

In the dspace.cfg file, the following parameters are used to configure the DOI services:

# URL that resolves DOIs
doi.hostname = http://dx.doi.org
# Base URL of Dryad used in registering DOIs
dryad.url = http://datadryad.org
# DOI prefix associated with Dryad
doi.prefix = ${default.doi.prefix}
# Directory where DOI minter files should be stored
doi.dir = ${dspace.dir}/doi-minter
# File system location of the DOI database
doi.db.fspath = ${doi.dir}/doi.db
# Username and password of the CDL DataCite Web service
doi.username = ${default.doi.username}
doi.password = ${default.doi.password}
# How long (# of chars) the DOI suffixes should be
doi.suffix.length = 5
# Local, static part of the suffix of the generated ID
doi.localpart.suffix = dryad.
# Whether the registration service should be used
doi.datacite.connected = ${default.doi.datacite.connected}
# URL for the DOI Services Web endpoint
doi.service.url=${default.doi.service}
# Indicates test mode for the Identifier Services connection to DOI Services
doi.service.testmode=false
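
To illustrate how doi.prefix, doi.localpart.suffix, and doi.suffix.length could combine into a complete DOI, here is a minimal sketch. It assumes the generated part of the suffix is a random string of lowercase letters and digits; the actual minter may use a different alphabet or a sequence. The function name mint_doi is hypothetical.

```python
import random
import string


def mint_doi(prefix, localpart, suffix_length, rng=random):
    """Build a DOI of the form <prefix>/<localpart><generated suffix>.

    Assumption: the generated suffix is random lowercase letters and
    digits; the real minter may work differently.
    """
    alphabet = string.ascii_lowercase + string.digits
    suffix = "".join(rng.choice(alphabet) for _ in range(suffix_length))
    return "{}/{}{}".format(prefix, localpart, suffix)


# Prefix 10.5061, local part "dryad." and suffix length 5, as configured
# on this page.
doi = mint_doi("10.5061", "dryad.", 5)
```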

These settings, put into your Maven profile (in most cases, the settings.xml file), are pulled into the dspace.cfg file when Dryad is built:

<!-- The real username and password of DOI registration service -->
<default.doi.username>USERNAME</default.doi.username>
<default.doi.password>PASSWORD</default.doi.password>
<!-- The DOI prefix for DOIs minted; for Dryad this is the value below -->
<default.doi.prefix>10.5061</default.doi.prefix>
<!-- Whether to rewrite URLs to use the local DOI resolver or the dx.doi.org one -->
<default.dryad.localize>true</default.dryad.localize>
<!-- Whether to register the DOIs minted or just pretend like you did -->
<default.doi.datacite.connected>false</default.doi.datacite.connected>
<!-- The actual endpoint of the DOI service -->
<default.doi.service>http://localhost:9999/doi</default.doi.service>
<!-- An index used for DataONE that works with the DOI registration process -->
<default.solr.dryad.server>http://localhost:9999/solr/dryad</default.solr.dryad.server>
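
The way ${...} placeholders in dspace.cfg end up with concrete values can be sketched as a simple substitution pass. This is a simplification: in practice some placeholders are filled in by Maven at build time and others by DSpace when it reads the configuration, while the sketch treats them alike. The resolve function and the /opt/dryad install directory are assumptions for illustration.

```python
import re

PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def resolve(value, properties, max_depth=10):
    """Replace ${name} tokens in `value` with entries from `properties`.

    Unknown names are left untouched. The pass repeats until nothing
    changes, so chained references (doi.dir -> dspace.dir) also resolve.
    """
    for _ in range(max_depth):
        expanded = PLACEHOLDER.sub(
            lambda m: properties.get(m.group(1), m.group(0)), value
        )
        if expanded == value:
            break
        value = expanded
    return value


properties = {
    "default.doi.prefix": "10.5061",
    "dspace.dir": "/opt/dryad",  # assumed install directory
    "doi.dir": "${dspace.dir}/doi-minter",
}

prefix = resolve("${default.doi.prefix}", properties)
db_path = resolve("${doi.dir}/doi.db", properties)
```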

Implementation

The core code is in dspace/modules/doi

Identifier services code is in dspace/modules/api/src/main/java/org/dspace/identifier/

DOIIdentifierProvider.mint() is the main method for modifying what a DOI means (to the local system).

Relation to DSpace

The Dryad DOI Services module relates to the Identifier Services module being developed for the DSpace community by @tmire.

Currently, the Dryad DOI Services module exists as a separate DSpace module, but in the future some of these services might be integrated into the Identifier Services module.

A related package in DSpace is the Handle server. Dryad's DOI Services replace our use of the DSpace Handle server, though the Handle server continues to serve links published before we moved to DOIs.

The DSpace PersistentIdentifiers proposal is also relevant.