ODIN CodeSprint

From Dryad wiki

Odin's Hamr

Full name: the ORCID/DataCite Integration Network's tool for Human/Authority Metadata Reconciliation

Participants: Ryan Scherle, John Kaye

Original Proposals

British Library Proposal

PROJECT 2: ORCID authorship monitoring for data centres

Proposed by: Tom Demeranville

Introduction: In order to complete the “virtuous circle” of updates to DOI metadata, data centres need easy access to the information stored against their DOIs in ORCID. A service will be developed that allows a set of DOIs to be registered and monitored for changes of authorship within ORCID. The authorship information will be intelligently “diffed” against the DOI metadata and the differences reported to the registrant, either as a batch job, through a query API, or on a human-readable website.
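To make the “diff” concrete, here is a minimal Java sketch. It assumes authorship for a single DOI has already been fetched from both ORCID and the DOI metadata and reduced to maps from normalized author name to ORCID iD (null where none is recorded). The class and method names are hypothetical, and no ORCID or DataCite API is called.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch: diff the authorship ORCID holds for one DOI against the DOI metadata.
// Keys are author names (assumed already normalized); values are ORCID iDs, or null if none.
final class AuthorshipDiff {

    static List<String> diff(Map<String, String> doiMetadata, Map<String, String> orcidClaims) {
        List<String> report = new ArrayList<>();
        for (Map.Entry<String, String> claim : orcidClaims.entrySet()) {
            String name = claim.getKey();
            if (!doiMetadata.containsKey(name)) {
                // Someone claimed this DOI in ORCID but is not listed in the DOI metadata.
                report.add("ONLY_IN_ORCID: " + name + " " + claim.getValue());
            } else if (!Objects.equals(doiMetadata.get(name), claim.getValue())) {
                // Same author, but the metadata lacks the ORCID iD or records a different one.
                report.add("ORCID_DIFFERS: " + name + " metadata=" + doiMetadata.get(name)
                        + " orcid=" + claim.getValue());
            }
        }
        for (String name : doiMetadata.keySet()) {
            if (!orcidClaims.containsKey(name)) {
                // Listed author who has not (yet) claimed this DOI in ORCID.
                report.add("ONLY_IN_METADATA: " + name);
            }
        }
        return report;
    }
}
```

The plain-string report keeps the sketch short. A real service would more likely return structured records, so the same result could feed a batch job, an API, or a web page.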

Estimated effort: 2 developers, 1 user

Prerequisites: Any of the following: Scripting or programming skills, experience with ORCID’s API, familiarity with DOI metadata, data-centre domain knowledge.

Dryad Proposal

(originally from ORCID Integration)

Requirements:

  • There should be an automated way to harvest previously claimed items (both articles and data packages) from ORCID and add the associated ORCIDs to Dryad metadata.
  • Note that anyone can claim any work in ORCID -- so self-claims cannot be used for managing co-author privileges. Self-claimed items should have a lower confidence score than ORCIDs attached by a submitter.
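One way to meet the confidence requirement is to record where each ORCID assertion came from and rank by that provenance. In the hypothetical sketch below, the two provenance categories come from the requirements above. The numeric scores (0.9 and 0.5) are arbitrary placeholders, chosen only so that submitter-attached ORCIDs outrank self-claims.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch: the scores are illustrative assumptions, not agreed values.
final class OrcidConfidence {

    enum Provenance {
        SUBMITTER_ATTACHED(0.9), // ORCID entered by the submitter during deposit
        SELF_CLAIMED(0.5);       // work claimed by a researcher in ORCID (anyone can claim any work)

        final double score;

        Provenance(double score) {
            this.score = score;
        }
    }

    record Assertion(String orcid, Provenance provenance) {}

    /** Returns the assertion with the highest confidence, if there is one. */
    static Optional<Assertion> best(List<Assertion> assertions) {
        return assertions.stream()
                .max(Comparator.comparingDouble(a -> a.provenance().score));
    }

    /** Self-claims alone must never grant co-author privileges. */
    static boolean mayGrantPrivileges(Assertion a) {
        return a.provenance() == Provenance.SUBMITTER_ATTACHED;
    }
}
```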

Inspired by the initial Hamr prototypes developed at Code4Lib 2011.

Current Design

[Figure: Current Design (Odin's Hamr context diagram)]

Create a tool to monitor author changes within ORCID for DOIs belonging to a given data centre or DataCite DOI allocator:

  • Query the ORCID API for DataCite DOIs under a given data centre prefix (e.g. 10.5061 for Dryad)
  • Return all authors and ORCIDs attached to each DOI
  • Match the results against the data centre's metadata
  • Allow import of names and ORCIDs from ORCID into the data centre's metadata
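The ORCID query itself is not shown, because the exact API call is not specified above. Whatever that call returns, the step of keeping only DOIs under a data centre's prefix can be sketched as follows. The helper name and the set of DOI presentation forms it strips are assumptions. DOI names are case-insensitive, so the comparison uses lower-cased values. The trailing slash stops 10.5061 from also matching a prefix such as 10.50611.

```java
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical helper: keep only DOIs registered under one data centre prefix.
final class DoiPrefixFilter {

    // Assumed presentation forms to strip; extend as needed.
    private static final List<String> PREFIXES = List.of(
            "https://doi.org/", "http://doi.org/", "https://dx.doi.org/", "http://dx.doi.org/", "doi:");

    /** Strips a leading resolver URL or "doi:" label and lower-cases the DOI. */
    static String normalize(String doi) {
        String d = doi.trim();
        for (String p : PREFIXES) {
            if (d.regionMatches(true, 0, p, 0, p.length())) {
                d = d.substring(p.length());
                break;
            }
        }
        return d.toLowerCase(Locale.ROOT);
    }

    /** Returns the normalized DOIs whose prefix is exactly the given one. */
    static List<String> withPrefix(List<String> dois, String prefix) {
        String wanted = prefix.toLowerCase(Locale.ROOT) + "/";
        return dois.stream()
                .map(DoiPrefixFilter::normalize)
                .filter(d -> d.startsWith(wanted))
                .collect(Collectors.toList());
    }
}
```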

Allow results to be exported in formats compatible with different systems and metadata standards.

Implementation

The initial implementation is as simple as possible -- a DSpace curation task that generates a CSV report of author names in Dryad matched with author names in ORCID.
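A report row in the spirit of this CSV output might be written as shown below. This is a sketch only. The column order is an assumption modelled on the sample rows in the Results section, and the unlabelled empty third column in those rows is left out. Fields are quoted following RFC 4180, so names such as "Vision, Todd J." survive intact.

```java
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch of one report row; the column set is an assumption.
final class HamrCsv {

    /** Quotes a field per RFC 4180 when it contains a comma, quote, or line break. */
    static String field(String value) {
        if (value == null) {
            return "";
        }
        if (value.contains(",") || value.contains("\"") || value.contains("\n") || value.contains("\r")) {
            return "\"" + value.replace("\"", "\"\"") + "\"";
        }
        return value;
    }

    /** Package DOI, article DOI, Dryad name, ORCID iD, ORCID name, match score. */
    static String row(String packageDoi, String articleDoi, String dryadName,
                      String orcid, String orcidName, double score) {
        // Stream.of tolerates null fields (e.g. no ORCID found); field() renders them empty.
        return Stream.of(packageDoi, articleDoi, dryadName, orcid, orcidName, Double.toString(score))
                .map(HamrCsv::field)
                .collect(Collectors.joining(","));
    }
}
```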

Code is currently available in the OdinsHamr.java file in the Dryad codebase.

[Figure: Odin's Hamr draft implementation]

To run Odin's Hamr for a single Dryad item:

/opt/dryad/bin/dspace curate -v -t odinshamr -i 10255/dryad.33537 -r -

To run Odin's Hamr for all Dryad data packages:

/opt/dryad/bin/dspace curate -v -t odinshamr -i 10255/3 -r -

Use cases

Broad use case: data repositories obtaining authors and ORCIDs from the ORCID registry by searching on a DataCite, CrossRef or other identifier.

Specific use cases from ODIN:

HSS UK Data Service:

  • Data with DataCite DOIs
  • Custom-built repository (not DSpace); public metadata available via an OAI feed in DC, MARC and DDI formats
  • OAI identifier example: oai:esds.ac.uk:ESDS/ESDSL/sn1858.xml
  • Author fields: DC <dc:creator>; MARC 720; DDI AuthEnty
  • No DOIs appear in this metadata; however, each DOI corresponds to an OAI identifier, e.g. 10.5255/UKDA-SN-1858-1 = oai:esds.ac.uk:ESDS/ESDSL/sn1858.xml
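Because the UK Data Service metadata carries no DOIs, records must be linked through their identifiers. The helper below is hypothetical. It generalizes from the single example pairing 10.5255/UKDA-SN-1858-1 with oai:esds.ac.uk:ESDS/ESDSL/sn1858.xml, and assumes every DOI of the form 10.5255/UKDA-SN-<study>-<version> maps the same way. The version suffix is dropped, so different DOI versions of one study would map to the same OAI record. That assumption needs checking against the service.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical: assumes all UK Data Service DOIs follow the one documented example.
final class UkdaIdentifiers {

    // 10.5255/UKDA-SN-<study>-<version>, matched case-insensitively (DOIs are case-insensitive).
    private static final Pattern DOI = Pattern.compile("(?i)10\\.5255/UKDA-SN-(\\d+)-(\\d+)");

    /** Maps a UKDA DOI to its assumed OAI identifier, or empty if the DOI does not fit the pattern. */
    static Optional<String> oaiIdentifierFor(String doi) {
        Matcher m = DOI.matcher(doi.trim());
        if (!m.matches()) {
            return Optional.empty();
        }
        // Group 2 (the version) is ignored: the example OAI identifier has no version part.
        return Optional.of("oai:esds.ac.uk:ESDS/ESDSL/sn" + m.group(1) + ".xml");
    }
}
```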

ICPSR also has an OAI service, but only in Dublin Core (DC).

Other HSS PoC study areas are currently considering publishing metadata in DDI via http://colectica.com/

Open question (Tom): is there an API for http://odin-discover.eu/?

See also

Results

Odin's Hamr showed that relatively simple techniques can produce high-quality matches for our metadata. By leveraging article-to-data relationships (researchers' ORCID claims on the articles associated with Dryad data packages) and adding simple string matching on author names, we get highly accurate mappings.
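The article-to-data step can be sketched as follows. The sketch assumes ORCID claims have already been harvested and grouped by article DOI, and that each Dryad data package knows the DOI of its associated article. All names are hypothetical. The token-overlap (Jaccard) scorer is a stand-in chosen for brevity, not necessarily the string matching OdinsHamr.java uses.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: match Dryad authors to ORCID claims made on the linked article.
final class ArticleBridge {

    record Claim(String orcid, String orcidName) {}

    record Match(String packageDoi, String dryadName, String orcid, double score) {}

    // Stand-in scorer: Jaccard overlap of lower-cased letter tokens.
    static double tokenJaccard(String a, String b) {
        Set<String> x = tokens(a);
        Set<String> y = tokens(b);
        if (x.isEmpty() && y.isEmpty()) {
            return 0.0;
        }
        Set<String> inter = new HashSet<>(x);
        inter.retainAll(y);
        Set<String> union = new HashSet<>(x);
        union.addAll(y);
        return (double) inter.size() / union.size();
    }

    private static Set<String> tokens(String s) {
        Set<String> t = new HashSet<>();
        for (String w : s.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!w.isEmpty()) {
                t.add(w);
            }
        }
        return t;
    }

    /** For each Dryad author, keeps the best-scoring claimant of the article if it meets the threshold. */
    static List<Match> match(String packageDoi, String articleDoi, List<String> dryadAuthors,
                             Map<String, List<Claim>> claimsByArticle, double threshold) {
        List<Match> out = new ArrayList<>();
        List<Claim> claims = claimsByArticle.getOrDefault(articleDoi, List.of());
        for (String author : dryadAuthors) {
            Claim best = null;
            double bestScore = -1;
            for (Claim c : claims) {
                double s = tokenJaccard(author, c.orcidName());
                if (s > bestScore) {
                    best = c;
                    bestScore = s;
                }
            }
            if (best != null && bestScore >= threshold) {
                out.add(new Match(packageDoi, author, best.orcid(), bestScore));
            }
        }
        return out;
    }
}
```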

It is reasonably robust to the presence or absence of initials and middle names:
doi:10.5061/dryad.162, doi:10.1093/molbev/msn130, , "Vision, Todd J.", 0000-0002-6133-2581, "Vision, Todd", 0.8
doi:10.5061/dryad.1611, doi:10.1600/036364410792495872, , "Cranston, Karen A.", 0000-0002-4798-9499, "Cranston, Karen Ann", 0.89473684
doi:10.5061/dryad.84r5q, doi:10.1126/science.1231707, , "Weber, Andreas P. M.", 0000-0003-0970-4672, "Weber, Andreas", 0.7

It is also robust to differences in diacritics:
doi:10.5061/dryad.50dc6, doi:10.1111/1755-0998.12085, , "Pérez-Porro, Alicia R.", 0000-0002-8873-1734, "R. Perez-Porro, Alicia", 0.6818181

At first glance this looks like an error, but it is actually a correct match:
doi:10.5061/dryad.mm54f, doi:10.1038/nature11241, , "Heslop-Harrison, Pat", 0000-0002-3105-2167, "Heslop-Harrison, JS", 0.85
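The scores in the examples above are consistent with a normalized Levenshtein similarity, 1 - distance / max(length). For instance, "Vision, Todd J." (15 characters) is 3 edits from "Vision, Todd", giving 1 - 3/15 = 0.8. This has not been checked against OdinsHamr.java, so the following Java version is an illustration rather than the tool's actual code. It reproduces the Vision, Cranston, Weber and Heslop-Harrison scores to the precision shown.

```java
// Illustrative normalized Levenshtein similarity: 1 - distance / max(length).
final class NameSimilarity {

    /** Classic two-row dynamic-programming edit distance (insert, delete, substitute). */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                // Compares UTF-16 chars, so 'é' and 'e' count as different characters.
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** 1.0 for identical strings, falling towards 0.0 as more edits are needed. */
    static double similarity(String a, String b) {
        int max = Math.max(a.length(), b.length());
        return max == 0 ? 1.0 : 1.0 - (double) levenshtein(a, b) / max;
    }
}
```

Because the distance is normalized by the longer name, a dropped middle initial costs little on a long name. This fits the tolerance for initials seen above.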


Relatively few Dryad items have been claimed in ORCID so far, but this tool can be run repeatedly to keep our links to ORCID up to date as more researchers claim their works.

  • Total authors represented in Dryad: 15491
  • Authors who have claimed their Dryad data in ORCID: 9
  • Authors matched by Odin's Hamr, leveraging claimed articles: 496 instances, 315 unique

These matches relate to 441 Dryad records (DOIs); the total number of records checked is still to be determined.

ORCID Union

An experimental ORCID/DataCite Integration Network tool for comparing ORCID metadata across repositories.

Goal: Starting with a DOI for dataset X in one metadata source, repository A (e.g. DataCite), retrieve from Okkam's Entity Name System (ENS) the ORCIDs (or other researcher IDs) available for each creator and contributor of dataset X, not only from repository A but from all other sources indexed within ENS. Report the score for each match to a secondary source.
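Here is a minimal sketch of the union step for one creator of dataset X. It assumes the ENS lookup results have already been retrieved as (source, ORCID, score) triples. No real ENS API is used, and the names are hypothetical. Assertions are grouped by ORCID iD, keeping the highest score each source reported, so the result lists every place an ORCID was asserted for that person.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical: models lookup results already retrieved for a single creator.
final class OrcidUnion {

    record Assertion(String source, String orcid, double score) {}

    /** ORCID iD -> (source -> best score reported by that source). */
    static Map<String, Map<String, Double>> bySource(List<Assertion> assertions) {
        Map<String, Map<String, Double>> union = new TreeMap<>();
        for (Assertion a : assertions) {
            union.computeIfAbsent(a.orcid(), k -> new TreeMap<>())
                 .merge(a.source(), a.score(), Double::max);
        }
        return union;
    }
}
```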

Justification: A data creator may have an ORCID or other researcher ID in the metadata of one system but not in another. A lookup will return all the places where an ORCID has been asserted for that person.

Participants: Stefano Bortoli, Jan Dvorak, Alejandra Gonzalez-Beltran, Todd Vision

See [1]

Additional Information