Curator Tools

Dryad curators will have tools for validating submissions. This page describes some of the desired tools; more detailed descriptions are still needed.

= Requirements =

Level 1 Curation Support
TO DO
 * Curator can generate reports within DSpace. See details on the Curator Reports page.
 * Curator can perform batch editing of metadata fields across the repository from within DSpace.
 * Curator can perform edits at the data package level and choose whether they are applied to data file records in that package.
 * Curator can view a list of suggested matches between newly published articles and content in Dryad, and can click to accept or reject the addition of article metadata to Dryad for each pairing. See details on Update of Publication Metadata.
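
As an illustration of how the list of suggested matches might be generated, here is a minimal Python sketch that pairs article titles with package titles by string similarity. The function names, the 0.8 threshold, and the use of titles alone are assumptions made for this example; the page does not specify how matches are to be found.

```python
from difflib import SequenceMatcher


def normalize(title):
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(title.lower().split())


def suggest_matches(article_titles, package_titles, threshold=0.8):
    """Return (article, package, score) tuples scoring at or above threshold.

    The threshold is a hypothetical starting point, not a tuned value.
    """
    suggestions = []
    for article in article_titles:
        for package in package_titles:
            score = SequenceMatcher(
                None, normalize(article), normalize(package)
            ).ratio()
            if score >= threshold:
                suggestions.append((article, package, round(score, 2)))
    # Highest scores first, so the curator reviews the likeliest pairs first.
    return sorted(suggestions, key=lambda s: s[2], reverse=True)
```

Nothing is applied automatically in this sketch: it only produces candidates, which the curator would then accept or reject.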

DONE (IN SOME FORM)
 * Spell checking (handled in browser, and low priority)
 * Virus scanning of depositor-uploaded files
 * Curator can detect/verify file formats (using some combination of JHOVE, PRONOM, and GDFR).
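
To make the idea of format detection and verification concrete, here is a small Python sketch that identifies a file from its leading bytes and checks the result against the file extension. It is a hand-rolled illustration of signature-based identification, not the API of JHOVE, PRONOM, or GDFR; the signature table and function names are assumptions for this example and cover only a few formats.

```python
import os

# A tiny, illustrative signature table (leading "magic" bytes -> format name).
SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"PK\x03\x04", "zip"),
]

# Formats that are ZIP containers and therefore share the ZIP signature.
ZIP_BASED = {"zip", "xlsx", "docx", "odt"}


def detect_format(data):
    """Return a format name based on the first bytes of the file."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return "unknown"


def extension_matches(filename, data):
    """Check whether the file extension agrees with the detected format."""
    ext = os.path.splitext(filename)[1].lower().lstrip(".")
    detected = detect_format(data)
    if detected == "zip":
        return ext in ZIP_BASED
    return ext == detected
```

A mismatch (for example a PDF uploaded with a .png extension) would be flagged for the curator rather than corrected automatically.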

Level 2 Curation Support

 * Integration of HIVE into the curator interface to suggest additional keywords based on author-supplied keywords, article title, and abstract. Curator would click to add; none would be added by default.
 * Tools for file format conversion. Some conversions could be done in batches (e.g., all MS Excel files to tab-delimited or comma-separated), but the tools will also need to work on single files. Curator should be able to list all files in the repository by format.
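
The listing and conversion steps could be sketched as below. This is a simplified Python illustration: it groups file names by extension (a stand-in for real format detection) and converts tab-delimited text to comma-separated text. Reading MS Excel files would need a spreadsheet library that is not shown here, and all function names are assumptions for this example.

```python
import csv
import io
import os
from collections import defaultdict


def group_by_format(filenames):
    """Group file names by lowercase extension, as a rough proxy for format."""
    groups = defaultdict(list)
    for name in filenames:
        ext = os.path.splitext(name)[1].lower().lstrip(".") or "(none)"
        groups[ext].append(name)
    return dict(groups)


def tsv_to_csv(tsv_text):
    """Convert tab-delimited text to comma-separated text, quoting as needed."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue()
```

A batch conversion would call the converter on every file in one group, while a single-file conversion calls it once; the same function serves both cases.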

Level 3 Curation Support

 * Integration of HIVE into the curator interface to suggest additional keywords based on article full text. Curator would click to add; none would be added by default.
 * Support for real name authority work, possibly linking with LCNAF.
 * There is a metadata extractor that looks for key phrases in the article PDF (e.g., "locations of the specimens") which may indicate datasets underlying the article.
 * The extractor can be configured to search for arbitrary phrases.
 * The extractor can be run by a curator pressing a button on the publication page. The results page contains a list of matching phrases in one column, with a list of dataset titles in the second column.
 * Basic implementation:
   * convert PDF to text
   * search within the text for phrases matching the list of target phrases
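
The search step of the basic implementation can be sketched in Python as follows. The sketch assumes the PDF has already been converted to plain text by some external tool (not shown). The phrase list is configurable; only "locations of the specimens" comes from this page, and the function names and the context width are assumptions for this example.

```python
import re

# Only the first phrase appears on this page; curators would configure the rest.
DEFAULT_PHRASES = ("locations of the specimens",)


def compile_phrase(phrase):
    """Build a case-insensitive pattern that tolerates line breaks between words."""
    words = [re.escape(word) for word in phrase.split()]
    return re.compile(r"\s+".join(words), re.IGNORECASE)


def find_phrases(text, phrases=DEFAULT_PHRASES, context=40):
    """Return (phrase, snippet) pairs for every occurrence of a target phrase."""
    hits = []
    for phrase in phrases:
        for match in compile_phrase(phrase).finditer(text):
            start = max(0, match.start() - context)
            end = min(len(text), match.end() + context)
            # Collapse whitespace so snippets read cleanly on a results page.
            snippet = " ".join(text[start:end].split())
            hits.append((phrase, snippet))
    return hits
```

Matching across whitespace matters because text extracted from a PDF often breaks a phrase over two lines. The returned pairs could fill the first column of the results page, with the snippet giving the curator enough context to judge each match.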

We need to look over the use cases from the Fedora Data Curation Working Group.