DOI Services Technology

From Dryad wiki

Overview

Dryad mints, manages, and registers Digital Object Identifiers (DOIs) for data packages and data files deposited into the Dryad Data Repository. This page documents the technical details of these DOI services.

General information about Dryad's DOI services can be found on the DOI Services page.

The structure of DOIs is described on the DOI Usage page.

Storage information (and Warning)

WARNING: DOIs are currently stored in three places:

  1. Postgres doi table (see the SQL Schema). This is the authoritative location where DOIs are minted.
  2. Postgres metadatavalue table. Item DOIs are recorded in the dc.identifier metadata field. Not authoritative but used for relationships.
  3. "dryad" solr index -- DEPRECATED. The index is no longer being updated, so it does not include all Dryad records. Do not write any new code that uses this index. When you encounter code that uses this index, change it.

Prior to 2014-04-11, the authoritative location of DOIs was the doi.db file in /opt/dryad/doi-minter. This file is no longer used. It was written by the perst library, and access to it was wrapped by DOIDatabase.java. There were problems with concurrent access to this file under heavy load, so we migrated to a Postgres table (see the GitHub pull request to migrate doi.db to postgres).

Command-line features

Local DOI database

Dryad maintains a local database of DOIs. These are used for fast lookups within the search system.

The local DOI service can be managed using a command line call:

dryad/bin/dspace doi-util
Usage:
-h              Help... prints this usage information
-s              Search for a known DOI and return it
-p <FILE>       Prints the DOI database to an output stream
-c              Outputs the number of DOIs in the database

Database synchronization tool. This synchronizes the local DOI database with the objects in the main Dryad store.

./dspace dsrun org.dspace.identifier.DOIDbSync
-s: to synchronize + report
-r: to produce the report

DOI Migration (historical)

The DOIMigrator class was used to move DOIs from the perst doi.db file to the postgres DOI table. This class is now deprecated since the migration has been completed and doi.db is no longer used.

Run it like other DSpace command-line tools:

/opt/dryad/bin/dspace dsrun org.dspace.doi.DOIMigrator

Viewing DataCite Metadata

Metadata can be viewed through DataCite using a URL like

https://search.datacite.org/works?query=%22doi%3A10.5061%2Fdryad.20%22

or metadata can be generated from individual objects using a command like

xsltproc dspace/config/crosswalks/DIM2DATACITE.xsl http://localhost:9999/resource/doi:<your_DOI>/mets.xml >test_datacite.xml
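The search URL shown above is the DOI prefixed with doi:, wrapped in double quotes, and percent-encoded. A minimal Python sketch of building such a URL follows; the function name datacite_search_url is hypothetical and is not part of any Dryad tool.

```python
from urllib.parse import quote

DATACITE_SEARCH = "https://search.datacite.org/works"


def datacite_search_url(doi):
    """Build a DataCite search URL for a bare DOI such as '10.5061/dryad.20'."""
    # The query is the DOI with a 'doi:' prefix, wrapped in double quotes.
    query = '"doi:{}"'.format(doi)
    # safe="" makes quote() percent-encode '/' as well as ':' and '"'.
    return "{}?query={}".format(DATACITE_SEARCH, quote(query, safe=""))


print(datacite_search_url("10.5061/dryad.20"))
```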

DataCite DOI database

DataCite manages Dryad's DOIs and their registration with the DOI Federation.

Java code

Note: the Java class is named CDLDataCiteService, even though the actual service is no longer housed at CDL.

To check the status of a DOI's registration:

/opt/dryad/bin/dsrun org.dspace.doi.CDLDataCiteService 10.5061/DRYAD.2222

To update metadata for a DOI (pushing Dryad metadata to DataCite):

/opt/dryad/bin/dsrun org.dspace.doi.CDLDataCiteService username password doi:10.5061/dryad.xxxx https://datadryad.org/resource/doi:10.5061/dryad.xxxx update

Notes about the above command:

  • to register a new DOI, replace "update" with "register"
  • the DOI service defaults to the central DataCite system, but can be overridden with the environment variable DOI_SERVER

To update DataCite with metadata from all Dryad objects, there is a convenience script in dryad-utils:

dryad-utils/sync_provenance.py username password start_item
# start_item is the last item_id synced. If not specified, start from 0

The metadata transformation crosswalk for DataCite is stored in DIM2DATACITE.xsl. Items that are in publication blackout are transformed with DIM2DATACITE-BLACKOUT.xsl.

The determination of which crosswalk to use is made by checking the metadata for the item. When an item enters blackout, a provenance record is added that includes the phrase "Entered publication blackout". If this is the last provenance record at the time of registration, the blackout crosswalk is used.

When the publication blackout ends, the item is approved and the approval is added as provenance. When the registration is updated, "blackout" is no longer the last provenance record, so the item is registered with the standard metadata.
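The crosswalk selection rule described above can be sketched in Python. This is an illustration of the rule only, not the Java code Dryad runs; the function name choose_crosswalk and the representation of provenance as a list of strings (oldest first) are assumptions.

```python
BLACKOUT_PHRASE = "Entered publication blackout"
STANDARD_XSL = "DIM2DATACITE.xsl"
BLACKOUT_XSL = "DIM2DATACITE-BLACKOUT.xsl"


def choose_crosswalk(provenance_records):
    """Return the crosswalk file name to use when registering an item.

    provenance_records is assumed to be a list of strings, oldest first.
    Only the most recent record decides which crosswalk is used.
    """
    if provenance_records and BLACKOUT_PHRASE in provenance_records[-1]:
        return BLACKOUT_XSL
    return STANDARD_XSL


# Blackout is the latest record, so the blackout crosswalk applies.
in_blackout = ["Submitted", "Entered publication blackout by curator"]
# An approval was recorded after the blackout, so the standard one applies.
released = in_blackout + ["Approved for entry into archive"]

print(choose_crosswalk(in_blackout))
print(choose_crosswalk(released))
```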

Python code

The dryad-utils library contains doi_tool.py, which is a wrapper around the Python library from EZID. It can be used as follows:

doi_tool.py --username username --password password --action update --doi doi:10.5061/dryad.XXX

The username and password may be passed on the command line, or overridden with environment variables. Likewise, the default Dryad server and DOI server may be overridden with environment variables. The complete list of environment variables supported by the script is:

  • DOI_SERVER - server used for DOI services. Defaults to https://ez.datacite.org
  • DOI_USER - username for the DOI_SERVER
  • DOI_PASS - password for the DOI_SERVER
  • DRYAD_URL - Dryad server that is used to retrieve metadata when registering/updating a DOI. Defaults to https://datadryad.org
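The way these variables combine with command-line values can be sketched as follows. This is not the code of doi_tool.py; the function name resolve_settings is hypothetical, and the precedence (an environment variable, when set, wins over the command-line value) is an assumption based on the description above.

```python
import os

# Defaults as listed above.
DEFAULTS = {
    "DOI_SERVER": "https://ez.datacite.org",
    "DRYAD_URL": "https://datadryad.org",
}


def resolve_settings(cli_username=None, cli_password=None, environ=None):
    """Merge command-line credentials, environment variables, and defaults."""
    env = os.environ if environ is None else environ
    return {
        "doi_server": env.get("DOI_SERVER", DEFAULTS["DOI_SERVER"]),
        "dryad_url": env.get("DRYAD_URL", DEFAULTS["DRYAD_URL"]),
        # Assumed precedence: environment first, then the command line.
        "username": env.get("DOI_USER", cli_username),
        "password": env.get("DOI_PASS", cli_password),
    }


# Passing a plain dict instead of os.environ keeps the example repeatable.
settings = resolve_settings("cli-user", "cli-pass", environ={"DOI_USER": "env-user"})
print(settings["username"], settings["doi_server"])
```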

See also: Manually updating DOI metadata

Workflow

For Submissions

  1. DOIs are minted at the point of submission to Dryad. When a data package is submitted, a call to mint a DOI (without registering it) is made to the DOI Service.
  2. The data package should contain data files -- for a DOI to be registered for the data file, there must be a link in the metadata from the data file to the data package.
  3. The data package then goes on to be curated by the Dryad Librarian.
  4. If the data package is approved, the DOI is registered with DataCite.
  5. If the package isn't approved, the DOI remains unregistered.
  6. Lastly, the registered DOI is emailed to the submitter so that it can be included in the article and used to reference the published data package.
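The submission steps above amount to a small state model: a DOI is minted at submission and becomes registered only on approval. The Python sketch below illustrates that model under stated assumptions; the names DoiState, curate, and registrable_files, the "part_of" metadata key, and the file DOIs are all hypothetical placeholders rather than Dryad's actual code or metadata fields.

```python
from enum import Enum


class DoiState(Enum):
    MINTED = "minted"          # exists locally, not yet registered
    REGISTERED = "registered"  # registered with DataCite after approval


def curate(approved):
    """State of a data package DOI after the curator's decision."""
    return DoiState.REGISTERED if approved else DoiState.MINTED


def registrable_files(package_doi, files):
    """Return DOIs of data files whose metadata links back to the package."""
    return [f["doi"] for f in files if f.get("part_of") == package_doi]


package = "doi:10.5061/dryad.xxxx"
files = [
    {"doi": "doi:10.5061/dryad.xxxx/1", "part_of": package},
    {"doi": "doi:10.5061/dryad.xxxx/2"},  # no link back to the package
]

state = curate(approved=True)
if state is DoiState.REGISTERED:
    # Only a registered DOI is emailed to the submitter (step 6).
    print("register files:", registrable_files(package, files))
```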

For Citation Downloads

  1. DOIs are passed to the CitationServlet when a user requests a citation download or uses one of the sharing services that Dryad supports (Delicious, Digg, etc.)
  2. DOI Services resolve the DOI and extract the metadata from the record, making it available to be downloaded in RIS or BibTeX format.
  3. The CitationServlet uses Dryad's DOI Services currently, but might in the future use DSpace's _Identifier Services_ if it becomes an official module and Dryad's DOI resolution is natively built into it.

For Identifier Services

Dryad's DOI Services are also used by the _Identifier Services_ DSpace module. Dryad's DOI Services serve as the local DOI resolver for these DSpace services. In the future, we may better integrate our DOI Services into this module.

When an identifier is used (created, modified, or resolved), the IdentifierServiceImpl looks through all of the available IdentifierProviders to see which one is capable of handling the associated identifier. Handling is then passed to the appropriate provider.
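The dispatch described above can be sketched in Python. The classes below are illustrative stand-ins, not the Java IdentifierServiceImpl or IdentifierProvider interfaces; the method names supports and resolve, the prefix checks, and the return values are assumptions.

```python
class DoiProvider:
    """Stand-in for a provider that handles DOI identifiers."""

    def supports(self, identifier):
        return identifier.startswith("doi:")

    def resolve(self, identifier):
        return "resolved by DoiProvider: " + identifier


class HandleProvider:
    """Stand-in for a provider that handles Handle identifiers."""

    def supports(self, identifier):
        return identifier.startswith("hdl:")

    def resolve(self, identifier):
        return "resolved by HandleProvider: " + identifier


class IdentifierService:
    """Asks each provider in turn and delegates to the first that can handle it."""

    def __init__(self, providers):
        self.providers = list(providers)

    def resolve(self, identifier):
        for provider in self.providers:
            if provider.supports(identifier):
                return provider.resolve(identifier)
        raise LookupError("no provider can handle " + identifier)


service = IdentifierService([HandleProvider(), DoiProvider()])
print(service.resolve("doi:10.5061/dryad.20"))
```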

NOTE: The DOIIdentifierProvider is stored in Dryad's api module, while other IdentifierProviders are stored in the identifier-services module.

Configuration

Configuring the DOI Services module requires that additional parameters be set in the dspace.cfg configuration file. The Dryad project places these parameters in a Maven profile; they are then pulled into the dspace.cfg file when Dryad is built.

In the dspace.cfg file, the following parameters are used to configure the DOI services:

# URL that resolves DOIs
doi.hostname = http://dx.doi.org
# Base URL of Dryad used in registering DOIs
dryad.url = http://datadryad.org
# DOI prefix associated with Dryad
doi.prefix = ${default.doi.prefix}
# Directory where DOI minter files should be stored
doi.dir = ${dspace.dir}/doi-minter
# Username and password of the CDL DataCite Web service
doi.username = ${default.doi.username}
doi.password = ${default.doi.password}
# How long (# of chars) the DOI suffixes should be
doi.suffix.length = 5
# Local, static part of the suffix of the generated ID
doi.localpart.suffix = dryad.
# Whether the registration service should be used
doi.datacite.connected = ${default.doi.datacite.connected}
# URL for the DOI Services Web endpoint
doi.service.url=${default.doi.service}
# Indicates test mode for the Identifier Services connection to DOI Services
doi.service.testmode=false
# The prefix to use instead of doi.prefix when doi.service.testmode is true
doi.testprefix = ${default.doi.testprefix}
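The relationship between doi.prefix, doi.testprefix, and doi.service.testmode can be sketched as below. This is only an illustration of the rule stated in the comments above, not Dryad's Java code; parse_properties and effective_prefix are hypothetical helpers, and the sample values are the ones shown on this page with the Maven placeholders already filled in.

```python
def parse_properties(text):
    """Parse simple 'key = value' lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def effective_prefix(props):
    """Return doi.testprefix in test mode, otherwise doi.prefix."""
    test_mode = props.get("doi.service.testmode", "false").lower() == "true"
    return props["doi.testprefix"] if test_mode else props["doi.prefix"]


SAMPLE = """
# DOI prefix associated with Dryad
doi.prefix = 10.5061
doi.service.testmode=false
doi.testprefix = 10.5072/FK2/10.5061
"""

props = parse_properties(SAMPLE)
print(effective_prefix(props))
```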

These settings, put into your Maven profile (in most cases, the settings.xml file), are pulled into the dspace.cfg file when Dryad is built:

<!-- The real username and password of DOI registration service -->
<default.doi.username>USERNAME</default.doi.username>
<default.doi.password>PASSWORD</default.doi.password>
<!-- The DOI prefix for DOIs minted; for Dryad this is the value below -->
<default.doi.prefix>10.5061</default.doi.prefix>
<!-- Whether to rewrite URLs to use the local DOI resolver or the dx.doi.org one -->
<default.dryad.localize>true</default.dryad.localize>
<!-- Whether to register the DOIs minted or just pretend like you did -->
<default.doi.datacite.connected>false</default.doi.datacite.connected>
<!-- The actual endpoint of the DOI service -->
<default.doi.service>http://localhost:9999/doi</default.doi.service>
<!-- Test mode configuration.  Used instead of default.doi.prefix if test mode is true in dspace.cfg -->
<default.doi.testprefix>10.5072/FK2/10.5061</default.doi.testprefix>
<!-- An index used for DataONE that works with the DOI registration process -->
<default.solr.dryad.server>http://localhost:9999/solr/dryad</default.solr.dryad.server>
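As a rough picture of how the profile values end up in dspace.cfg, the sketch below substitutes ${...} placeholders from a dictionary of profile properties. It only approximates what the Maven build does; fill_placeholders is a hypothetical helper, and leaving unknown placeholders untouched is an assumption.

```python
import re

PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def fill_placeholders(template, profile):
    """Replace ${name} with profile[name]; leave unknown names as they are."""

    def replace(match):
        name = match.group(1)
        return profile.get(name, match.group(0))

    return PLACEHOLDER.sub(replace, template)


profile = {
    "default.doi.prefix": "10.5061",
    "default.doi.datacite.connected": "false",
}

print(fill_placeholders("doi.prefix = ${default.doi.prefix}", profile))
print(fill_placeholders("doi.dir = ${dspace.dir}/doi-minter", profile))
```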

Implementation

The core code is in dspace/modules/doi

CDLDataCiteService

  • Communicates with the EZID API (on the DataCite system) to register, update, and look up DOIs.
  • Provides a method to get the DataCite metadata (extractDataciteMetadata) for a registered DOI, so registration status can be relayed to a curator.

DryadDOIRegistrationHelper

  • provides a utility method to check if an item is currently in publication blackout for purposes of DataCite metadata

Identifier services code is in dspace/modules/api/src/main/java/org/dspace/identifier/

DOIIdentifierProvider.mint() is the main method for modifying what a DOI means (to the local system).

Configuration

Communication is enabled if the doi.datacite.connected property is true in the dspace.cfg file. Test DOIs are minted and registered if doi.service.testmode is true in the dspace.cfg file.

  • The test mode prefix should begin with 10.5072/FK2 (e.g. 10.5072/FK2dryad for Dryad)

Relation to DSpace

The Dryad DOI Services module is related to the Identifier Services module developed for the DSpace community by Atmire.

Some notes:

  • IdentifierService provides an abstraction over many different IdentifierProviders. Multiple Providers may be used within a single DSpace instance.

Currently, the Dryad DOI Services exist as a separate DSpace module, but in the future some of these services might be integrated into the Identifier Services module.

A related package in DSpace is the Handle server. Dryad's DOI Services replace our use of the DSpace Handle server, though the Handle server continues to serve links published before we moved to DOIs.

Berlin Implementation

The Technical University of Berlin has implemented a DOI service that works with the API of the DataCite MDS system. It may be included in an upcoming release of DSpace.

DSpace 4 DOI Module

DSpace 4 includes support for DOIs as identifiers, as well as DOI registration with EZID/DataCite.

In April 2014, dleehr reviewed the DSpace 4 implementation and compared it to Dryad's implementation:

  • DOIs are stored in item metadata as dc.identifier.uri in DSpace 4, and dc.identifier in Dryad.
  • There is a DOI class and database table "DOI". These names collide with Dryad classes/tables, and are not interface-compatible out of the box.
  • In the DSpace 4 implementation, DOI synchronization/registration is coupled with the identifier storage. The DOI table includes a registration status column.
  • DSpace 4 has a DOIConnector interface for reserving/registering DOIs, implemented by DataCiteConnector.
  • DSpace 4 also implements DOI registration through EZID with EZIDIdentifierProvider.
  • DSpace 4 synchronizes DOIs with the external provider using DOIOrganiser. Dryad has DOIDbSync.

See Also