Old: DOI Services Technology

From Dryad wiki
Revision as of 13:32, 21 June 2013 by DanLeehr (talk | contribs) (Configuration)


Overview

Dryad mints, manages, and registers Digital Object Identifiers (DOIs) for data packages and data files deposited into the Dryad Data Repository. This page documents the technical details of these DOI services.

General information about Dryad's DOI services can be found on the DOI Services page.

The structure of DOIs is described on the DOI Usage page.

Storage information (and Warning)

WARNING: DOIs are currently stored in three places:

  1. postgres database -- The preferred location. This is where all other metadata is managed, and there is no reason to treat DOIs differently.
  2. doi.db file -- Semi-deprecated. This database is still used for some actions. It is probably best to change code so it uses postgres instead. If you encounter a good reason for continuing to use doi.db, please add it here. Otherwise, it will eventually enter full "deprecated" status.
  3. "dryad" solr index -- DEPRECATED. The index is no longer being updated, so it does not include all Dryad records. Do not write any new code that uses this index. When you encounter code that uses this index, change it.

Command-line features

Local DOI database

Dryad maintains a local database of DOIs. These are used for fast lookups within the search system.

The local DOI service can be managed using a command line call:

dryad/bin/dspace doi-util
-h              Help... prints this usage information
-s              Search for a known DOI and return it
-m [DOI] [URL]  Mints a new DOI and places it in the local database
-r [DOI] [URL]  Registers a DOI, minting if necessary
-p <FILE>       Prints the DOI database to an output stream
-c              Outputs the number of DOIs in the database

Database synchronization tool. This synchronizes the local DOI database with the objects in the main Dryad store.

./dspace dsrun org.dspace.identifier.DOIDbSync
-s: to synchronize + report
-r: to produce the report

EZID DOI database

EZID manages Dryad's DOIs and their registration with the DOI Federation.

To check the status of a DOI registered with EZID:

/opt/dryad/bin/dsrun org.dspace.doi.CDLDataCiteService 10.5061/DRYAD.2222

To update metadata for a DOI (pushing Dryad metadata to EZID):

/opt/dryad/bin/dsrun org.dspace.doi.CDLDataCiteService username password doi-to-update target-url update

Notes about the above command:

  • to register a new DOI, replace "update" with "register"

To update DataCite with metadata from all Dryad objects:

/opt/dryad/bin/dsrun org.dspace.doi.CDLDataCiteService username password syncall

The metadata transformation for DataCite is stored in DIM2DATACITE.xsl.

Workflow

For Submissions

  1. DOIs are minted at the point of submission to Dryad. When a data package is submitted, a call to mint a DOI (without registering it) is made to the DOI Service.
  2. The data package should contain data files -- for a DOI to be registered for the data file, there must be a link in the metadata from the data file to the data package.
  3. The data package then goes on to be curated by the Dryad Librarian.
  4. If the data package is approved, the DOI is registered with DataCite through the EZID DOI registration service.
  5. If the package isn't approved, the DOI remains unregistered.
  6. Lastly, the registered DOI is emailed to the submitter so that it can be included in the article and used to reference the published data package.
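The distinction between minting and registering in the steps above can be pictured as a small state change. The following is only a sketch under assumptions: the `DoiState`, `Package`, and `curate` names are hypothetical and do not correspond to classes in the Dryad code base, and the second DOI is a made-up placeholder.

```python
from dataclasses import dataclass
from enum import Enum


class DoiState(Enum):
    MINTED = "minted"          # exists locally, not yet registered
    REGISTERED = "registered"  # registered with DataCite through EZID


@dataclass
class Package:
    doi: str
    state: DoiState = DoiState.MINTED  # a DOI is minted at submission time


def curate(package: Package, approved: bool) -> Package:
    """Register the DOI only if the curator approves the package."""
    if approved:
        package.state = DoiState.REGISTERED
    return package


# Hypothetical packages: one approved, one not approved.
accepted = curate(Package("10.5061/dryad.2222"), approved=True)
rejected = curate(Package("10.5061/dryad.3333"), approved=False)
print(accepted.state.value, rejected.state.value)
```

In this sketch a package that is not approved simply keeps its minted, unregistered DOI, which mirrors step 5.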

For Citation Downloads

  1. DOIs are passed to the CitationServlet when a user requests a citation download or uses one of the sharing services that Dryad supports (Delicious, Digg, etc.).
  2. DOI Services resolve the DOI and extract the metadata from the record, making it available for download in RIS or BibTeX format.
  3. The CitationServlet currently uses Dryad's DOI Services, but it might in the future use DSpace's Identifier Services if that becomes an official module with Dryad's DOI resolution natively built into it.
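As a rough illustration of the second step, the sketch below turns already-extracted metadata into a BibTeX entry. It is not the CitationServlet's code: the `to_bibtex` function, the choice of fields, the `@misc` entry type, and the sample title and authors are all assumptions made for the example.

```python
def to_bibtex(key, doi, title, authors, year):
    """Format a minimal BibTeX @misc entry from record metadata."""
    fields = {
        "title": title,
        "author": " and ".join(authors),  # BibTeX separates authors with "and"
        "year": str(year),
        "doi": doi,
    }
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields.items())
    return f"@misc{{{key},\n{body}\n}}"


# Placeholder metadata, not a real Dryad record.
entry = to_bibtex(
    key="dryad2222",
    doi="10.5061/dryad.2222",
    title="Example data package",
    authors=["Doe, J.", "Roe, R."],
    year=2013,
)
print(entry)
```

A RIS export would follow the same pattern with a different line format.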

For Identifier Services

Dryad's DOI Services are also used by the Identifier Services DSpace module. Dryad's DOI Services serve as the local DOI resolver for these DSpace services. In the future, we may better integrate our DOI Services into this module.

When an identifier is used (created, modified, or resolved), the IdentifierServiceImpl looks through all of the available IdentifierProviders to see which one is capable of handling the associated identifier. Handling is then passed to the appropriate provider.

NOTE: The DOIIdentifierProvider is stored in Dryad's api module, while other IdentifierProviders are stored in the identifier-services module.
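The provider lookup described above, in which the service asks each provider whether it can handle an identifier, can be sketched as follows. The class and method names here (`supports`, `resolve`, the `doi:` and `hdl:` prefixes) are illustrative assumptions and are not copied from the DSpace or Dryad sources.

```python
class IdentifierProvider:
    """Base class: each provider declares which identifiers it handles."""

    def supports(self, identifier: str) -> bool:
        raise NotImplementedError

    def resolve(self, identifier: str) -> str:
        raise NotImplementedError


class DoiProvider(IdentifierProvider):
    def supports(self, identifier):
        return identifier.startswith("doi:")

    def resolve(self, identifier):
        return "DOI provider handled " + identifier


class HandleProvider(IdentifierProvider):
    def supports(self, identifier):
        return identifier.startswith("hdl:")

    def resolve(self, identifier):
        return "Handle provider handled " + identifier


class IdentifierService:
    def __init__(self, providers):
        self.providers = list(providers)

    def resolve(self, identifier):
        # Hand the identifier to the first provider that claims it.
        for provider in self.providers:
            if provider.supports(identifier):
                return provider.resolve(identifier)
        raise LookupError("no provider for " + identifier)


service = IdentifierService([HandleProvider(), DoiProvider()])
print(service.resolve("doi:10.5061/dryad.2222"))
```

Because dispatch depends only on what each provider claims to support, several providers can coexist in one instance without the caller knowing which one is used.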

Configuration

Configuring the DOI Services module requires additional parameters to be set in the dspace.cfg configuration file. The Dryad project places these parameters in a Maven profile; they are then pulled into the dspace.cfg file when Dryad is built.

In the dspace.cfg file, the following parameters are used to configure the DOI services:

# URL that resolves DOIs
doi.hostname = http://dx.doi.org
# Base URL of Dryad used in registering DOIs
dryad.url = http://datadryad.org
# DOI prefix associated with Dryad
doi.prefix = ${default.doi.prefix}
# Directory where DOI minter files should be stored
doi.dir = ${dspace.dir}/doi-minter
# File system location of the DOI database
doi.db.fspath = ${doi.dir}/doi.db
# Username and password of the CDL DataCite Web service
doi.username = ${default.doi.username}
doi.password = ${default.doi.password}
# How long (# of chars) the DOI suffixes should be
doi.suffix.length = 5
# Local, static part of the suffix of the generated ID
doi.localpart.suffix = dryad.
# Whether the registration service should be used
doi.datacite.connected = ${default.doi.datacite.connected}
# URL for the DOI Services Web endpoint
doi.service.url=${default.doi.service}
# Indicates test mode for the Identifier Services connection to DOI Services
doi.service.testmode=false
# The prefix to use instead of doi.prefix when doi.service.testmode is true
doi.testprefix = ${default.doi.testprefix}
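To show how several of these parameters might combine, here is a sketch that assembles a DOI from the prefix, the static local part, and a generated suffix of the configured length, switching to the test prefix in test mode. The `mint_doi` function, the lowercase-alphanumeric alphabet, and the use of "/" between prefix and suffix are assumptions; the actual minter's character set and collision handling are not documented on this page.

```python
import random
import string

# Values mirror the parameters above, with Maven placeholders already filled in.
config = {
    "doi.prefix": "10.5061",
    "doi.testprefix": "10.5072/FK2/10.5061",
    "doi.localpart.suffix": "dryad.",
    "doi.suffix.length": 5,
    "doi.service.testmode": False,
}


def mint_doi(cfg, rng):
    """Build prefix + '/' + static local part + generated suffix."""
    prefix = cfg["doi.testprefix"] if cfg["doi.service.testmode"] else cfg["doi.prefix"]
    alphabet = string.ascii_lowercase + string.digits  # assumed character set
    suffix = "".join(rng.choice(alphabet) for _ in range(cfg["doi.suffix.length"]))
    return f"{prefix}/{cfg['doi.localpart.suffix']}{suffix}"


rng = random.Random(0)  # seeded only to make the example repeatable
doi = mint_doi(config, rng)
test_doi = mint_doi({**config, "doi.service.testmode": True}, rng)
print(doi)
print(test_doi)
```

A real minter would also have to check the generated suffix against the DOI database to avoid collisions, which this sketch omits.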

The following settings, placed in your Maven profile (in most cases, the settings.xml file), supply the values for the ${default...} placeholders in dspace.cfg when Dryad is built:

<!-- The real username and password of DOI registration service -->
<default.doi.username>USERNAME</default.doi.username>
<default.doi.password>PASSWORD</default.doi.password>
<!-- The DOI prefix for DOIs minted; for Dryad this is the value below -->
<default.doi.prefix>10.5061</default.doi.prefix>
<!-- Whether to rewrite URLs to use the local DOI resolver or the dx.doi.org one -->
<default.dryad.localize>true</default.dryad.localize>
<!-- Whether to register the DOIs minted or just pretend like you did -->
<default.doi.datacite.connected>false</default.doi.datacite.connected>
<!-- The actual endpoint of the DOI service -->
<default.doi.service>http://localhost:9999/doi</default.doi.service>
<!-- Test mode configuration.  Used instead of default.doi.prefix if test mode is true in dspace.cfg -->
<default.doi.testprefix>10.5072/FK2/10.5061</default.doi.testprefix>
<!-- An index used for DataONE that works with the DOI registration process -->
<default.solr.dryad.server>http://localhost:9999/solr/dryad</default.solr.dryad.server>
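The effect of the build on dspace.cfg can be imitated with a small placeholder substitution. This is a simplified stand-in for Maven's resource filtering, not a description of its implementation: the `resolve` function is hypothetical, it does a single pass, and it leaves unknown placeholders untouched.

```python
import re

PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")

# A few profile properties, as they would appear in settings.xml.
profile = {
    "default.doi.prefix": "10.5061",
    "default.doi.datacite.connected": "false",
    "default.doi.service": "http://localhost:9999/doi",
}


def resolve(value, properties):
    """Replace each ${name} with its property value, if one is defined."""
    def substitute(match):
        name = match.group(1)
        return properties.get(name, match.group(0))  # keep unknown names as-is
    return PLACEHOLDER.sub(substitute, value)


print(resolve("doi.prefix = ${default.doi.prefix}", profile))
print(resolve("doi.dir = ${dspace.dir}/doi-minter", profile))
```

In this sketch the second line is printed unchanged because `dspace.dir` is not defined in the example profile.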

Implementation

The core code is in dspace/modules/doi.

Identifier services code is in dspace/modules/api/src/main/java/org/dspace/identifier/.

DOIIdentifierProvider.mint() is the main method for modifying what a DOI means (to the local system).

Relation to DSpace

The Dryad DOI Services module relates to the Identifier Services module being developed for the DSpace community by Atmire.

Some notes:

  • IdentifierService provides an abstraction over many different IdentifierProviders. Multiple Providers may be used within a single DSpace instance.

Currently, the Dryad DOI Services exist as a separate DSpace module, but in the future some of these services might be integrated into the Identifier Services module.

A related package in DSpace is the Handle server. Dryad's DOI Services replace our use of the DSpace Handle server, though the Handle server continues to serve links published before we moved to DOIs.

Berlin Implementation

The Technical University of Berlin has implemented a DOI service that works with the API of the DataCite MDS system. It may be included in an upcoming release of DSpace: