Old:Curation System Requirements

From Dryad wiki
STATUS: This page is no longer being maintained and is of historical interest only.

This outdated page describes desired features to assist Dryad curation processes. It is primarily of historical value.

July 2009

  1. Ways for the depositor to contact the curator.
  2. A more streamlined, less time-consuming way to ADD a data file to a publication that has already been published. Currently, a "fake" publication must be created, the data file uploaded to it, and the fake publication then deleted; the metadata (dc.relation) must then be changed manually to create the relationship between the new data file and the real publication.
  3. Please see the mockups page for some of the visualizations of the following features:
    1. An "in tray" with tasks and notifications - the tasks will be lists of submissions that are new and require curation/approval. Notifications will cover errors, notices of irregular processes, etc. - for example, someone trying to submit a dataset so large that it exceeds the storage limit.
    2. A batch edit view - this of course will be handled in part or completely by the new version of DSpace
      • Finding duplicate data objects - right now it is too difficult to find these in the system. It is possible, particularly in the current system, for a file to be uploaded more than once, in which case one copy has to be deleted. A helpful batch process would be to withdraw a number of data objects at once, with the option of withdrawing all the data objects associated with one publication, etc.
      • Add one README file for a number of data objects - this SHOULD also be an option for the author/depositor.
      • Right now, you have a list of data objects with links, and you have to open each individual data object in order to edit it. THEREFORE, we need batch editing based on metadata field.
        • Data set titles -> in a batch, add author names or part of the article title to a set of data objects, or whatever is decided to be appropriate for data object titles. If the dataset titles can be generated completely automatically, so much the better.
      • We need to be able to mimic the inheritance that takes place during deposition - if we make changes to the publication metadata, we should have the option to apply them to all associated datasets. The curator can be asked, "Do you want to apply these changes to associated datasets?", yes or no. This is essential - there will be some datasets that have 30-some files, and if we make a change to one, we should be able to cascade that change to the rest.
    3. Curators should be able to view lists of items that need additional attention:
      • articles that have no associated datasets
      • articles that do not have full bibliographic metadata (volume, number, DOI)
    4. Feature: integration with curator tools - ability to run JHOVE, for example, from the interface.
    5. List of high profile, high use datasets, updated continually -> these will require higher curation focus. These can be listed in the "in tray," and in a "reports" section.
    6. ...we need a section where reports based on stock queries can be displayed, or can be run and displayed at any time by the curator.
    7. Feature: I would like to see a feature like the one in ContentDM where, as new metadata is entered, it becomes part of a controlled vocabulary (CV) - the curator can have the option to add or delete items from this CV. This would be a very good interim solution until HIVE comes into play, and would help with building a name authority for authors. For example, a depositor adds "Ryan Scherle" as an author, and the curator sees that this name is in the queue to be added to the author CV. The curator approves this. The question then is how it is used/displayed to the user - when they are typing, for example, the name can appear as a suggestion, as with other CV terms.
  4. VERSIONING - here are some of my ideas/recommendations, which would require system support:
    • METADATA VERSIONING: The recommendation is to always keep the original version that is submitted by the depositor - keep all the metadata, etc. - so that it can always be reverted to and/or used as a reference. Further changes made to the metadata by the curator will not be tracked. Only the most "up to date" version will be displayed to the users, with the original version available via the curator interface. CONFIRM: is the original version the only one that we want to keep, or would the curator want to look at all the subsequent versions? Can this just be taken care of by statistics - i.e., is it better merely to KNOW how many times a record/field, etc. has been changed?
    • DATASET VERSIONING: When changes/corrections are made to the actual contents of a dataset, and a version has already been published in Dryad, the NEW version should be considered a new, unique entity and therefore be assigned a new unique identifier. The following Dublin Core elements should be used to relate the various versions of the dataset: dc.relation.replaces, dc.relation.isreplacedby, dc.relation.isversionof. The actual linking of the datasets via these elements will most likely be done manually by the curator, or at least be heavily supervised by the curator (?).
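
The dataset versioning recommendation could be sketched roughly as follows. This is only an illustration, not Dryad or DSpace code: the Dataset class, the link_versions helper, and the "example:" identifiers are all hypothetical. It also assumes that dc.relation.isversionof on the new version points at the version it replaces, which the requirement above does not settle.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Hypothetical stand-in for a data object; not a DSpace class."""
    identifier: str
    metadata: dict[str, list[str]] = field(default_factory=dict)

    def add(self, element: str, value: str) -> None:
        # Append a value to a metadata element, skipping exact repeats.
        values = self.metadata.setdefault(element, [])
        if value not in values:
            values.append(value)


def link_versions(old: Dataset, new: Dataset) -> None:
    """Record, in both directions, that `new` supersedes `old`."""
    if old.identifier == new.identifier:
        raise ValueError("a new version needs its own unique identifier")
    old.add("dc.relation.isreplacedby", new.identifier)
    new.add("dc.relation.replaces", old.identifier)
    # Assumption: isversionof points at the version being replaced.
    new.add("dc.relation.isversionof", old.identifier)


# Made-up identifiers, for illustration only.
v1 = Dataset("example:dataset-1")
v2 = Dataset("example:dataset-2")
link_versions(v1, v2)
```

Writing both directions of the relationship in one step is one way to reduce the manual linking work: the curator would only need to confirm which two records are versions of each other.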