The Dryad repository project team is developing a plan to hire a full-time curator who will curate the datasets hosted by the Dryad repository and supervise undergraduate assistants assigned to curatorial tasks. The team will develop a job description and a hiring schedule during the summer of 2009.
From the Digital Curation Centre:
- Digital curation can be defined as follows: 'The activity of managing the use of data from its point of creation to ensure it is available for discovery and re-use in the future.' Data curation can also include managing vast data sets for daily use, updating them to keep them readable, and so on. The term data curator therefore applies to a wide range of professional backgrounds, from minimal management of digital materials, to the addition of metadata, to managing institutional repositories.
Professional Data Curation Tasks
- Name authority/authority control for authors
- Quality control
- Clean up citation fields.
- View the contents of metadata fields across the repository, and enforce consistency.
- Maintain documentation of cataloging/curation policies.
- Spot check entries to make sure they have high-quality metadata.
- Spot check entries to make sure the files have the data they claim to have.
- Determine when files need to be migrated to new formats and supervise the migration process.
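Two of the tasks above, name authority control for authors and enforcing consistency across metadata fields, lend themselves to a small script. The following is a minimal sketch, not a description of any existing Dryad tool: the record layout (a dict with an "authors" list), the normalization rule, and the function names are all assumptions made for illustration.

```python
import re
import unicodedata
from collections import defaultdict


def normalize_name(name):
    """Reduce an author string to a crude comparison key.

    The rule used here (strip accents, drop periods, lowercase) is an
    assumption for illustration, not a Dryad cataloging policy.
    """
    # Strip accents so that "Müller" and "Muller" compare equal.
    ascii_name = (
        unicodedata.normalize("NFKD", name)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Replace periods with spaces, collapse whitespace, and lowercase.
    cleaned = re.sub(r"\.", " ", ascii_name)
    cleaned = re.sub(r"\s+", " ", cleaned).strip().lower()
    # Use a single space after each comma.
    return re.sub(r"\s*,\s*", ", ", cleaned)


def find_inconsistent_authors(records):
    """Return {normalized key: sorted spellings} for names with more than one spelling."""
    variants = defaultdict(set)
    for record in records:
        for author in record.get("authors", []):
            variants[normalize_name(author)].add(author)
    return {key: sorted(forms) for key, forms in variants.items() if len(forms) > 1}


# Hypothetical records; the identifiers and names are made up.
sample = [
    {"id": "rec-1", "authors": ["Smith, J. A.", "Müller, K."]},
    {"id": "rec-2", "authors": ["Smith,J.A.", "Muller, K"]},
    {"id": "rec-3", "authors": ["Lee, P."]},
]

for key, forms in find_inconsistent_authors(sample).items():
    print(key, "->", forms)
```

A report like this only flags candidates. Deciding which spelling becomes the authoritative form remains a curatorial judgment.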
- Working knowledge of metadata standards, application profiles, Semantic Web concepts
- Working knowledge of (applicable) vocabularies, ontologies, and mapping strategies
- Supervisory/managerial experience
- Basic computing skills
  - to verify file contents and perform file migrations
- Basic knowledge of relational databases
  - to perform simple batch updates and describe more complex updates to the developers
- A minimum of a BA/BS in a biological field
  - to identify taxon names, gene names, etc.
  - to recognize high-value data sets and give them more curatorial attention
- Excellent communication skills
  - to communicate with authors
  - to write documentation
  - to create Dryad tutorials for conferences
- "Microarray Data Curator Position Available," PLEXdb Curator
- "Scientific Data Curator," Phenoscape project @ NESCent
- "Scientific Data Curator," Harvard School of Public Health Bioinformatics Core (HBC)
- "Biological Data Curator," South African National Bioinformatics Institute, eVOC system, a controlled vocabulary to describe gene expression states (http://evocontology.org)
- "Scientific Database Curator (2 Positions)," for the curation of the UniProt Knowledgebase (UniProtKB).
- "Scientific Curator," The Jackson Laboratory
Summer 2009 Curation Project
Sarah Carrier produced two documents during summer 2009 that detail the curatorial management of data and metadata in Dryad and offer some ideas for overall policy and requirements. The first document covers the redesign of the Dryad interface to better accommodate curation tasks. The second is a manual that details the current (as of summer 2009) curation workflow. This manual will be used by a curator hired in fall 2009.
These two documents represent the latest information regarding curation. Other pages on the wiki include curation information used during the summer 2009 curation project:
- Curator Tools
- Summer 2009 Curation Workflow Specification
- Curation System Requirements and Mockups
- Cataloging Guidelines 2009
- A study was initiated to capture the "status" of metadata records, that is, to determine when they are ready to be published. Sarah Carrier pursued this work in the metadata class, established the "Dryad" namespace, and registered the "status" element for the Dryad application profile, following the Singapore Framework recommendation.
- A selected review of standards follows here:
- A-Core: Metadata about Content Metadata. Retrieved December 9, 2008, from http://metadata.net/admin/draft-iannella-admin-01.txt. - who, what, where, when.
- AC - Administrative Components - Dublin Core DCMI Administrative Metadata: Final Specification: http://www.bs.dk/standards/AdministrativeComponents.htm. (Validity date - start and/or end date of the validity of the metadata content.)
- IEEE LOM (Learning Object Metadata) - http://ltsc.ieee.org/wg12/files/LOM_1484_12_1_v1_Final_Draft.pdf. - "completion status or condition of this learning object" -- this describes the learning object, not the metadata record; also has version.
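To make the idea of a record-level "status" element concrete, here is a minimal sketch of how a curator's script might use it to pick out records that are ready to publish. Only the "Dryad" namespace and the "status" element come from the study described above. The key "dryad:status", the status values, the record layout, and the function names are hypothetical, since the registered vocabulary is not given here.

```python
from dataclasses import dataclass, field

# Hypothetical status vocabulary; the actual values are not listed in the source.
READY_STATUSES = {"ready"}


@dataclass
class MetadataRecord:
    """A metadata record with descriptive fields plus administrative ones."""

    identifier: str
    fields: dict = field(default_factory=dict)

    def status(self):
        # The status describes the metadata record itself, not the data files.
        return self.fields.get("dryad:status", "unknown")


def publishable(records):
    """Return identifiers of records whose status marks them as ready."""
    return [r.identifier for r in records if r.status() in READY_STATUSES]


# Made-up records for illustration.
records = [
    MetadataRecord("rec-1", {"dc:title": "Example A", "dryad:status": "ready"}),
    MetadataRecord("rec-2", {"dc:title": "Example B", "dryad:status": "in-review"}),
    MetadataRecord("rec-3", {"dc:title": "Example C"}),
]

print(publishable(records))
```

Treating a missing status as "unknown" rather than as ready is a deliberate choice in this sketch: a record is held back until a curator has explicitly marked it.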