Old: DOI Services Technology

From Dryad wiki
Revision as of 10:43, 14 April 2014 by DanLeehr (talk | contribs) (Storage information (and Warning))



Dryad mints, manages, and registers Digital Object Identifiers (DOIs) for data packages and data files deposited into the Dryad Data Repository. This page documents the technical details of these DOI services.

General information about Dryad's DOI services can be found on the DOI Services page.

The structure of DOIs is described on the DOI Usage page.

Storage information (and Warning)

This is currently under development. DOIs are moving from the doi.db file to a Postgres table.

WARNING: DOIs are currently stored in three places:

  1. Postgres doi table (see the SQL Schema). This is the authoritative location where DOIs are minted.
  2. Postgres metadatavalue table. Item DOIs are recorded in the dc.identifier metadata field. Not authoritative but used for relationships.
  3. "dryad" solr index -- DEPRECATED. The index is no longer being updated, so it does not include all Dryad records. Do not write any new code that uses this index. When you encounter code that uses this index, change it.

Prior to 2014-04-11, the authoritative location of DOIs was the doi.db file, which is written by the perst library. Access to this file was wrapped by DOIDatabase.java. There were problems with concurrent access to this file under heavy load, so we migrated to a Postgres table.

Command-line features

Local DOI database

Dryad maintains a local database of DOIs. These are used for fast lookups within the search system.

The local DOI service can be managed using a command line call:

dryad/bin/dspace doi-util
-h              Help... prints this usage information
-s              Search for a known DOI and return it
-p <FILE>       Prints the DOI database to an output stream
-c              Outputs the number of DOIs in the database

Database synchronization tool. This synchronizes the local DOI database with the objects in the main Dryad store.

./dspace dsrun org.dspace.identifier.DOIDbSync
-s: to synchronize + report
-r: to produce the report
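
The kind of comparison such a report involves can be pictured as a set difference between the DOIs in the local database and the DOIs found on stored objects. The following is a minimal Python sketch, not the DOIDbSync implementation; the function name `sync_report` and the shape of its inputs are assumptions made for illustration.

```python
def sync_report(local_dois, store_dois):
    """Compare DOIs in the local DOI database with DOIs found on stored objects.

    Both arguments are iterables of DOI strings. DOI names are
    case-insensitive, so they are lower-cased before comparison.
    """
    local = {d.lower() for d in local_dois}
    store = {d.lower() for d in store_dois}
    return {
        # Present on objects in the store, absent from the local database.
        "missing_from_local_db": sorted(store - local),
        # Present in the local database, absent from the store.
        "missing_from_store": sorted(local - store),
    }


# Hypothetical sample data for illustration only.
report = sync_report(
    ["doi:10.5061/dryad.1234", "doi:10.5061/DRYAD.2222"],
    ["doi:10.5061/dryad.2222", "doi:10.5061/dryad.9999"],
)
print(report)
```

A synchronizing run would then act on the two lists, while a report-only run would just print them.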

DOI Migration (one-time)

To migrate from the perst doi.db file to the Postgres table, use the DOIMigrator class:

/opt/dryad/bin/dspace dsrun org.dspace.doi.DOIMigrator

EZID DOI database

EZID manages Dryad's DOIs and their registration with the DOI Federation.

To check the status of a DOI registered with EZID:

/opt/dryad/bin/dsrun org.dspace.doi.CDLDataCiteService 10.5061/DRYAD.2222
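
EZID's HTTP API answers in ANVL, a line-oriented "name: value" format whose first line carries the status. If such a response ever needs to be read outside of CDLDataCiteService, a parser might look like the sketch below. The function name `parse_anvl` and the sample response body are hypothetical, and whether the command above prints the raw response is not confirmed here.

```python
from urllib.parse import unquote


def parse_anvl(body):
    """Parse an ANVL response body into (status_line, metadata_dict)."""
    lines = [ln for ln in body.splitlines() if ln.strip()]
    status = lines[0]
    metadata = {}
    for line in lines[1:]:
        # Split on the first colon only; values such as URLs contain colons.
        name, _, value = line.partition(":")
        # Names and values are percent-encoded where needed, so decode them.
        metadata[unquote(name.strip())] = unquote(value.strip())
    return status, metadata


# Hypothetical response body for illustration; real values will differ.
sample = (
    "success: doi:10.5061/DRYAD.2222\n"
    "_target: http://datadryad.org/resource/doi:10.5061/dryad.2222\n"
    "_status: public\n"
)
status, metadata = parse_anvl(sample)
print(status)
print(metadata["_status"])
```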

To update metadata for a DOI (pushing Dryad metadata to EZID):

/opt/dryad/bin/dsrun org.dspace.doi.CDLDataCiteService username password doi-to-update target-url update

Notes about the above command:

  • To register a new DOI, replace "update" with "register".

To update DataCite with metadata from all Dryad objects:

/opt/dryad/bin/dsrun org.dspace.doi.CDLDataCiteService username password syncall

The metadata transformation crosswalk for DataCite is stored in DIM2DATACITE.xsl. Items that are in publication blackout are transformed with DIM2DATACITE-BLACKOUT.xsl.

The determination of which crosswalk to use is made by checking the metadata for the item. When an item enters blackout, a provenance record is added that includes the phrase "Entered publication blackout". If this is the last provenance record at the time of registration, the blackout crosswalk is used.

When the publication blackout ends, the item is approved and the approval is added as provenance. When the registration is updated, "blackout" is no longer the last provenance record, so the item is registered with the standard metadata.
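
The selection rule described above can be sketched in a few lines of Python. This is an illustration of the rule only, not the Dryad code; the function name `choose_crosswalk` and the sample provenance strings (apart from the phrase "Entered publication blackout") are assumptions.

```python
STANDARD_CROSSWALK = "DIM2DATACITE.xsl"
BLACKOUT_CROSSWALK = "DIM2DATACITE-BLACKOUT.xsl"
BLACKOUT_PHRASE = "Entered publication blackout"


def choose_crosswalk(provenance_records):
    """Return the crosswalk file name for an item.

    provenance_records is a list of provenance strings, oldest first.
    Only the most recent record decides which crosswalk is used.
    """
    if provenance_records and BLACKOUT_PHRASE in provenance_records[-1]:
        return BLACKOUT_CROSSWALK
    return STANDARD_CROSSWALK


# Hypothetical provenance histories for illustration.
in_blackout = ["Submitted by an author", "Entered publication blackout on 2014-04-01"]
approved = in_blackout + ["Approved for entry into archive"]

print(choose_crosswalk(in_blackout))
print(choose_crosswalk(approved))
```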


For Submissions

  1. DOIs are minted at the point of submission to Dryad. When a data package is submitted, a call to mint a DOI (without registering it) is made to the DOI Service.
  2. The data package should contain data files -- for a DOI to be registered for the data file, there must be a link in the metadata from the data file to the data package.
  3. The data package then goes on to be curated by the Dryad Librarian.
  4. If the data package is approved, the DOI is registered with DataCite through the EZID DOI registration service.
  5. If the package isn't approved, the DOI remains unregistered.
  6. Lastly, the registered DOI is emailed to the submitter so that it can be included in the article and used to reference the published data package.
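
The lifecycle in the steps above can be modelled as a small state change: a DOI is minted at submission and becomes registered only after approval. The sketch below is a simplified Python model, not Dryad code; the class `DataPackage` and the functions `submit` and `curate` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class DataPackage:
    doi: str
    registered: bool = False          # registered with DataCite through EZID
    submitter_notified: bool = False  # registered DOI emailed to the submitter


def submit(doi):
    """Submission mints a DOI without registering it."""
    return DataPackage(doi=doi)


def curate(package, approved):
    """Curation registers the DOI and notifies the submitter only on approval."""
    if approved:
        package.registered = True
        package.submitter_notified = True
    return package


accepted = curate(submit("doi:10.5061/dryad.1234"), approved=True)
rejected = curate(submit("doi:10.5061/dryad.5678"), approved=False)
print(accepted)
print(rejected)
```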

For Citation Downloads

  1. DOIs are passed to the CitationServlet when a user requests a citation download or uses one of the sharing services that Dryad supports (Delicious, Digg, etc.)
  2. DOI Services resolve the DOI and extract the metadata from the record, making it available to be downloaded in RIS or BibTeX format.
  3. The CitationServlet currently uses Dryad's DOI Services, but might in the future use DSpace's Identifier Services if it becomes an official module and Dryad's DOI resolution is natively built into it.
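
To make the last step of a citation download concrete, here is a minimal Python sketch that renders extracted metadata as a BibTeX entry. It is not the CitationServlet's code; the function name `to_bibtex`, the `@misc` entry type, the field names, and the sample values are all assumptions chosen for illustration.

```python
def to_bibtex(key, metadata):
    """Render a metadata dict as a single BibTeX @misc entry."""
    fields = ",\n".join(
        f"  {name} = {{{value}}}" for name, value in metadata.items()
    )
    return f"@misc{{{key},\n{fields}\n}}"


# Hypothetical metadata for illustration only.
entry = to_bibtex(
    "dryad2222",
    {
        "title": "Data from: An example study",
        "year": "2014",
        "doi": "10.5061/dryad.2222",
    },
)
print(entry)
```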

For Identifier Services

Dryad's DOI Services are also used by the Identifier Services DSpace module. Dryad's DOI Services serve as the local DOI resolver for these DSpace services. In the future, we may better integrate our DOI Services into this module.

When an identifier is used (created, modified, or resolved), the IdentifierServiceImpl looks through all of the available IdentifierProviders to see which one is capable of handling the associated identifier. Handling is then passed to the appropriate provider.
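
The dispatch described above follows a familiar pattern: ask each provider in turn whether it supports the identifier, then delegate to the first one that does. The Python sketch below illustrates that pattern only; the class names, the `supports`/`resolve` method names as used here, and the prefix-based matching are assumptions rather than the DSpace API.

```python
class DoiProvider:
    def supports(self, identifier):
        return identifier.startswith("doi:")

    def resolve(self, identifier):
        return f"DOI provider handled {identifier}"


class HandleProvider:
    def supports(self, identifier):
        return identifier.startswith("hdl:")

    def resolve(self, identifier):
        return f"Handle provider handled {identifier}"


class IdentifierService:
    def __init__(self, providers):
        self.providers = list(providers)

    def resolve(self, identifier):
        # Delegate to the first provider that claims the identifier.
        for provider in self.providers:
            if provider.supports(identifier):
                return provider.resolve(identifier)
        raise LookupError(f"no provider supports {identifier!r}")


service = IdentifierService([DoiProvider(), HandleProvider()])
print(service.resolve("doi:10.5061/dryad.2222"))
print(service.resolve("hdl:10255/dryad.20"))
```

Because providers are tried in order, the ordering of the list matters whenever two providers could accept the same identifier.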

NOTE: The DOIIdentifierProvider is stored in Dryad's api module, while other IdentifierProviders are stored in the identifier-services module.


Configuring the DOI Services module requires additional parameters to be set in the dspace.cfg configuration file. The Dryad project places these parameters in a Maven profile; they are then pulled into the dspace.cfg file when Dryad is built.

In the dspace.cfg file, the following parameters are used to configure the DOI services:

# URL that resolves DOIs
doi.hostname = http://dx.doi.org
# Base URL of Dryad used in registering DOIs
dryad.url = http://datadryad.org
# DOI prefix associated with Dryad
doi.prefix = ${default.doi.prefix}
# Directory where DOI minter files should be stored
doi.dir = ${dspace.dir}/doi-minter
# File system location of the DOI database
doi.db.fspath = ${doi.dir}/doi.db
# Username and password of the CDL DataCite Web service
doi.username = ${default.doi.username}
doi.password = ${default.doi.password}
# How long (# of chars) the DOI suffixes should be
doi.suffix.length = 5
# Local, static part of the suffix of the generated ID
doi.localpart.suffix = dryad.
# Whether the registration service should be used
doi.datacite.connected = ${default.doi.datacite.connected}
# URL for the DOI Services Web endpoint
# Indicates test mode for the Identifier Services connection to DOI Services
# The prefix to use instead of doi.prefix when doi.service.testmode is true
doi.testprefix = ${default.doi.testprefix}

These settings, put into your Maven profile (in most cases, the settings.xml file), are pulled into the dspace.cfg file when Dryad is built:

<!-- The real username and password of DOI registration service -->
<!-- The DOI prefix for DOIs minted; for Dryad this is the value below -->
<!-- Whether to rewrite URLs to use the local DOI resolver or the dx.doi.org one -->
<!-- Whether to register the DOIs minted or just pretend like you did -->
<!-- The actual endpoint of the DOI service -->
<!-- Test mode configuration.  Used instead of default.doi.prefix if test mode is true in dspace.cfg -->
<!-- An index used for DataONE that works with the DOI registration process -->


The core code is in dspace/modules/doi


  • Communicates with the EZID API to register, update, and look up DOIs.
  • Provides a method to get the DataCite metadata (extractDataciteMetadata) for a registered DOI, so registration status can be relayed to a curator.


  • Provides a utility method to check whether an item is currently in publication blackout, for the purposes of DataCite metadata.

Identifier services code is in dspace/modules/api/src/main/java/org/dspace/identifier/

DOIIdentifierProvider.mint() is the main method for modifying what a DOI means (to the local system).

For more details on EZID, see:


Communication is enabled if the doi.datacite.connected property is true in the dspace.cfg file.

Test DOIs are minted and registered if doi.service.testmode is true in the dspace.cfg file.

  • The test mode prefix should begin with 10.5072/FK2 (e.g. 10.5072/FK2/10.5061 for Dryad).
  • EZID allows API consumers to use this prefix to test the API. Entries are created with EZID, but the DOIs are not pushed out to dx.doi.org and are later deleted.
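
How the test-mode switch might interact with the doi.prefix, doi.testprefix, doi.localpart.suffix, and doi.suffix.length settings can be sketched as follows. This is a Python illustration under stated assumptions, not the Dryad minter: the function names `effective_prefix` and `compose_doi`, the "/" separator between prefix and local part, and the suffix alphabet are hypothetical.

```python
import secrets
import string

# Assumption: suffix characters are drawn from lower-case letters and digits.
ALPHABET = string.ascii_lowercase + string.digits


def effective_prefix(config):
    """Use doi.testprefix when doi.service.testmode is true, else doi.prefix."""
    if config.get("doi.service.testmode", "false").lower() == "true":
        return config["doi.testprefix"]
    return config["doi.prefix"]


def compose_doi(config, suffix=None):
    """Build a DOI from the configured prefix, local part, and suffix length."""
    length = int(config["doi.suffix.length"])
    if suffix is None:
        suffix = "".join(secrets.choice(ALPHABET) for _ in range(length))
    local_part = config["doi.localpart.suffix"]
    return f"{effective_prefix(config)}/{local_part}{suffix}"


# Values taken from the examples on this page; the suffix is made up.
config = {
    "doi.prefix": "10.5061",
    "doi.testprefix": "10.5072/FK2/10.5061",
    "doi.localpart.suffix": "dryad.",
    "doi.suffix.length": "5",
    "doi.service.testmode": "true",
}
test_doi = compose_doi(config, suffix="abc12")
live_doi = compose_doi({**config, "doi.service.testmode": "false"}, suffix="abc12")
print(test_doi)
print(live_doi)
```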

Relation to DSpace

The Dryad DOI Services module is related to the Identifier Services module developed for the DSpace community by Atmire.

Some notes:

  • IdentifierService provides an abstraction over many different IdentifierProviders. Multiple Providers may be used within a single DSpace instance.

Currently, the Dryad DOI Services exist as a separate DSpace module, but in the future some of these services might be integrated into the Identifier Services module.

A related package in DSpace is the Handle server. Dryad's DOI Services replace our use of the DSpace Handle server, though the Handle server continues to serve links published before we moved to DOIs.

Berlin Implementation

The Technical University of Berlin has implemented a DOI service that works with the API of the DataCite MDS system. It may be included in an upcoming release of DSpace.

DSpace 4 DOI Module

DSpace 4 includes support for DOIs as identifiers, as well as DOI registration with EZID/DataCite.

In April 2014, dleehr reviewed the DSpace 4 implementation and compared it to Dryad's implementation:

  • DOIs are stored in item metadata as dc.identifier.uri in DSpace 4, and dc.identifier in Dryad.
  • DSpace 4 has a DOI class and a database table named "DOI". These names collide with Dryad classes/tables, and are not interface-compatible out of the box.
  • In the DSpace 4 implementation, DOI synchronization/registration is coupled with the identifier storage. The DOI table includes a registration status column.
  • DSpace 4 has a DOIConnector interface for reserving/registering DOIs, implemented by DataCiteConnector.
  • DSpace 4 also implements DOI registration through EZID with EZIDIdentifierProvider.
  • DSpace 4 synchronizes DOIs with the external provider using DOIOrganiser. Dryad has DOIDbSync.