Old:Cataloging Guidelines 2009

STATUS: This page is no longer maintained and is of historical interest only.

These guidelines were written to describe the process for cataloging data deposited in Dryad, and were derived from previous metadata documents. While some of the processes described here are still in use, this page is outdated and primarily of historical value.

Submission Process - June 2, 2009

Submitting data to Dryad consists of three simple steps:
 * Describe your publication
 * Upload and describe your datasets
 * Approve datasets for publication


 * 1) Select the journal in which the article appears using the dropdown menu.
 * 2) If the journal is a partner journal, enter the manuscript number. This will automatically prepopulate the metadata fields for the publication corresponding to the manuscript number.
 * 3) If the article is not from a partner journal, and "Other" is chosen, the next step is to describe the article in as much detail as possible. Use the publication guidelines below.
 * 4) Upload datasets and describe them. They can be uploaded individually, or all together as a zip file. Use the data object guidelines below.
 * 5) Edit the publication metadata, or choose to finalize the submission.

Curation Process - June 2, 2009
''Right now we are not going to be ADDING metadata to what has already been contributed, we are simply checking for accuracy and completeness. This is the current process, and not the long term workflow.'' For BOTH datasets and publications, it is important to double check if the right metadata has gone into the appropriate fields. Also, it is important to check for correct spelling and punctuation in all metadata fields.

PUBLICATIONS


 * 1) Find ORIGINAL article - go to publisher's website. This is hopefully as simple as copying the DOI and pasting it into the address bar. If there is no DOI, then that is a problem, and it needs to be found.
 * 2) Here you will find the majority of the information needed to check the accuracy and completeness of the PUBLICATION metadata. You are probably on the journal's website.
 * 3) Double check the authors' names - if only a first initial is given, consult other sources (for example, ISI Web of Science) to find the author's full name. In the future, all partner journals will be sending the full author names, so this would only apply to non-partner submissions.
 * 4) Double check the TITLE, the SUBJECT KEYWORDS, CITATION information (year, volume, issue, pages, etc.) - check for accuracy.
 * 5) THE CITATION STRING: we have not yet chosen a standard for this - in the meantime, we will be using the AMERICAN NATURALIST's citation format, for example: Belyea, L. R., and J. Lancaster. 1999. Assembly rules within a contingent ecology. Oikos 86(7):402–416. The name of the journal must be spelled out.
 * 6) ADD the publisher's name - this is not yet required metadata, but should be added.
 * 7) Is there a corresponding author? This information needs to be checked, and added if necessary - the name will be on the article or on the publisher's website.
 * 8) The DATE ISSUED should only be a year - is what is listed correct?
 * 9) NOTE: if you make any changes to this metadata for the publication, it will not automatically be inherited by the associated datasets. Any changes, then, must be made to the data object's metadata as well.
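
The interim citation string from step 5 can be assembled mechanically from the individual citation fields. The following is a minimal sketch, not part of the Dryad software: the function names and the `(initials, last_name)` input shape are assumptions made for illustration, and the handling of three or more authors is a guess extrapolated from the single two-author example above.

```python
def format_authors(authors):
    """Join (initials, last_name) pairs: first author inverted, the rest not."""
    if not authors:
        raise ValueError("at least one author is required")
    first_initials, first_last = authors[0]
    parts = [f"{first_last}, {first_initials}"]
    parts += [f"{initials} {last}" for initials, last in authors[1:]]
    if len(parts) == 1:
        return parts[0]
    # Assumption: a comma before "and", as in the two-author example.
    return ", ".join(parts[:-1]) + ", and " + parts[-1]


def format_citation(authors, year, title, journal, volume, issue, pages):
    """Build 'Authors. Year. Title. Journal Volume(Issue):Pages.'"""
    names = format_authors(authors)
    # Avoid a doubled period when the author string already ends in an initial.
    end = "" if names.endswith(".") else "."
    return (
        f"{names}{end} {year}. {title.rstrip('.')}. "
        f"{journal} {volume}({issue}):{pages}."
    )


print(format_citation(
    [("L. R.", "Belyea"), ("J.", "Lancaster")],
    1999, "Assembly rules within a contingent ecology", "Oikos", 86, 7, "402–416",
))
```

The journal name is passed in already spelled out; the sketch does not expand abbreviations.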

DATASETS


 * 1) For each dataset, double check that the same metadata from the publication has been inherited correctly.
 * 2) Double check the FORMAT of the file - is this correct? Can it be downloaded/opened/etc.?
 * 3) Right now we don't have a standard for the TITLES, but in the meantime, the standard will be: the first part of the article title (the first 3 or 4 words, whatever makes sense), then in parentheses the first author's name (with a plus if there are more authors) and the year, then the number of the dataset. If there is a unique name for the file, put it into the DESCRIPTION field. So, titles could look like this:
   * Traits, Habitats, and Clades (Mayfield+, 2009) 1
   * Traits, Habitats, and Clades (Mayfield+, 2009) 2
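
The interim dataset title convention above is regular enough to generate. The sketch below is illustrative only: `dataset_title` and its parameters are hypothetical names, the subtitle and co-author in the usage line are placeholders, and the word count still needs a curator's judgement ("whatever makes sense").

```python
def dataset_title(article_title, author_last_names, year, number, n_words=4):
    """Build '<first words of title> (<FirstAuthor>[+], <year>) <number>'."""
    if not author_last_names:
        raise ValueError("at least one author is required")
    # Keep the first few words and drop trailing punctuation such as ':'.
    short = " ".join(article_title.split()[:n_words]).rstrip(",:;.")
    # A plus sign marks that there are additional authors.
    plus = "+" if len(author_last_names) > 1 else ""
    return f"{short} ({author_last_names[0]}{plus}, {year}) {number}"


# Placeholder subtitle and co-author, used only to exercise the function.
article = "Traits, Habitats, and Clades: placeholder subtitle"
for n in (1, 2):
    print(dataset_title(article, ["Mayfield", "Coauthor"], 2009, n))
```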

Authors (dc.contributor.author)
Should be the same as the associated publication, unless a different set of authors is explicitly stated. Currently, authority control is manual.

(In the original application profile, dc:creator is used instead of dc:contributor)

Title (dc.title)
Human-readable description of the dataset. Should not be more than 100 characters. If the author does not provide any additional information, we will use the filename as the title (currently under discussion), and assume that the contents of the file are obvious to anyone who reads the associated article.

Date of Issue (dc.date.issued)
The official date of publication, inherited by the dataset from the publication.

Embargo (dc.date.embargoedUntil)
A date after which the dataset will be made public. This is only used for datasets under embargo. The length of embargo is still under discussion.
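
As a rough illustration of how an embargo date might be interpreted, the sketch below treats a dataset as public once the current date is later than the embargo date, and treats a missing date as "no embargo". Both the function name and that reading of a missing value are assumptions, not documented Dryad behavior.

```python
from datetime import date


def is_public(embargoed_until, today=None):
    """Return True if a dataset with this embargo date may be shown."""
    if embargoed_until is None:
        # Assumption: no embargo date means the dataset is not embargoed.
        return True
    if today is None:
        today = date.today()
    # "A date after which the dataset will be made public."
    return today > embargoed_until


print(is_public(date(2010, 6, 2), today=date(2009, 6, 2)))
print(is_public(date(2010, 6, 2), today=date(2010, 6, 3)))
```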

Type (dc.type)
Choose an appropriate type. The default is "Dataset", which is assigned automatically.

Subject Keywords (dc.subject)
Keywords from the publication will be attached to datasets. Other keywords may be manually applied to datasets.
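
A minimal sketch of how the two keyword sources could be combined for a dataset record: the publication's keywords come first, manually applied ones follow, and duplicates are dropped. The function name and the case-insensitive duplicate check are assumptions for illustration.

```python
def dataset_keywords(publication_keywords, manual_keywords=()):
    """Merge inherited and manually applied keywords, keeping first spellings."""
    seen = set()
    merged = []
    for keyword in list(publication_keywords) + list(manual_keywords):
        cleaned = keyword.strip()
        key = cleaned.casefold()  # compare without regard to case
        if cleaned and key not in seen:
            seen.add(key)
            merged.append(cleaned)
    return merged


print(dataset_keywords(["ecology", "Community assembly"], ["Ecology ", "traits"]))
```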

Description (dc.description)
Non-repeatable.

Human-readable description of the dataset. Can contain much more detail than the title.

Any description that seems too long to put in this element (e.g., more than one page of text) should be placed in a separate file, which will be a supplemental datastream of this object. It will be given a name of the form READMEx.yyy, where x is a sequence number (omitted if only one documentation file is submitted) and yyy is the file extension of the original (documentation) file.
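
The READMEx.yyy naming rule can be sketched as follows. The function name is hypothetical, and starting the sequence at 1 is an assumption; the guideline does not say where numbering begins.

```python
import os


def readme_names(original_filenames):
    """Map submitted documentation files to README names, keeping extensions."""
    single = len(original_filenames) == 1
    names = []
    for index, original in enumerate(original_filenames, start=1):
        extension = os.path.splitext(original)[1]  # includes the dot, or ""
        # The sequence number is omitted when there is only one file.
        sequence = "" if single else str(index)
        names.append(f"README{sequence}{extension}")
    return names


print(readme_names(["methods.txt"]))
print(readme_names(["methods.pdf", "codebook.doc"]))
```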

Described By Publication (dc.relation.ispartof)
Primary identifier (typically a DOI) of the publication in which this dataset is most fully described, or in which it first appeared. It is possible, though unlikely, for a dataset to have multiple primary articles. Datasets will not reference all articles they are used in. This would be redundant with information in the articles, and would cause unneeded updates to the dataset records.

Rights Statement (dc.rights.uri)
A short human-readable phrase describing the access rights, which may also be machine-readable. Applied automatically to all resources in Dryad. Can also include an original rights statement from the journal/publisher. Decision is that it will point to a Creative Commons license.

Geographic Areas (dc.coverage.spatial)
Textual description of the geographic area covered by the dataset. This value is automatically inherited by the datasets, with the option to edit.

Eventually, this must use a vocabulary through HIVE, like gaz.obo or TGN.

Geologic Timespans (dc.coverage.temporal)
Textual description of the timespan covered by the article. Currently entered manually for the publication, but inherited automatically by the datasets, with the option to edit.

File Metadata (bitstream format indicator)
Code indicating the type of file. This is automatically detected by DSpace, but can be modified manually.

(Both fields use dc:format with optional qualifiers 'medium' or 'extent')

Taxonomic Names (dwc.ScientificName)
Taxonomic names to which the article refers. Can be either scientific or common, as long as they are correct. Will be automatically inherited by data objects.

DSpace metadata automatically assigned
 * dc.date.accessioned
 * dc.date.available
 * dc.description.provenance
 * dc.identifier.uri -> this is the Dryad handle that is automatically assigned

Publication Cataloging Guidelines
The overall goal is to provide enough DC metadata that we can automatically generate the dc.identifier.citation and/or an openURL query.
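
To make that goal concrete, here is a hedged sketch of building an OpenURL query from a publication's metadata. It uses the key/value (KEV) journal format of OpenURL 1.0 (Z39.88-2004). The resolver address, the `FIELD_MAP` table, and the dictionary keys such as `start_page` are hypothetical; the actual mapping from Dryad's DC fields was not settled in these guidelines.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical resolver address; replace with a real link resolver.
RESOLVER = "https://resolver.example.org/openurl"

# (OpenURL KEV key, key in the illustrative metadata dict below)
FIELD_MAP = [
    ("rft.atitle", "title"),
    ("rft.jtitle", "journal"),
    ("rft.date", "year"),
    ("rft.volume", "volume"),
    ("rft.issue", "issue"),
    ("rft.spage", "start_page"),
    ("rft.aulast", "first_author_last_name"),
]


def openurl_query(meta, resolver=RESOLVER):
    """Return an OpenURL for a journal article, skipping missing fields."""
    params = [
        ("url_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),
        ("rft.genre", "article"),
    ]
    for openurl_key, field in FIELD_MAP:
        value = meta.get(field)
        if value not in (None, ""):
            params.append((openurl_key, str(value)))
    return f"{resolver}?{urlencode(params)}"


record = {
    "title": "Assembly rules within a contingent ecology",
    "journal": "Oikos",
    "year": 1999,
    "volume": 86,
    "issue": 7,
    "start_page": 402,
    "first_author_last_name": "Belyea",
}
url = openurl_query(record)
print(url)
print(parse_qs(urlsplit(url).query)["rft.jtitle"])
```

Fields that are absent from the record are simply left out of the query, which is why complete citation metadata (year, volume, issue, pages) matters for this goal.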

Authors (dc.contributor.author)
List the full names of authors. Do not just copy abbreviated names from a citation, try to find the actual names.

Currently, authority control is manual. Author/contributor names should be formatted as "Firstname MI Lastname", but the actual format depends on the text received from the publisher and will probably have to be changed by the curator. This is essential to note, because the author names will be inherited by the data object records. Initially, we won't track email addresses in normal metadata; DSpace will simply track the submitter in the provenance. WHAT ABOUT LATER??
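
Since name cleanup is manual, a small helper could at least reorder inverted names and flag those that still need a full-name lookup. This is a sketch under assumptions: the function names are hypothetical, publisher input is assumed to arrive as either "Lastname, Firstname MI" or "Firstname MI Lastname", and the name "Doe, Jane Q." is a made-up example.

```python
def to_first_last(name):
    """Turn 'Lastname, Firstname MI' into 'Firstname MI Lastname'."""
    name = " ".join(name.split())  # collapse stray whitespace
    if "," in name:
        last, _, rest = name.partition(",")
        name = f"{rest.strip()} {last.strip()}"
    return name


def needs_full_name_lookup(name):
    """True when the first token is only an initial, e.g. 'J. Lancaster'."""
    tokens = to_first_last(name).split()
    if not tokens:
        return True  # nothing usable; a curator has to look it up
    return len(tokens[0].rstrip(".")) == 1


print(to_first_last("Doe, Jane Q."))
print(needs_full_name_lookup("Belyea, L. R."))
```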

(The original application profile used dc:creator instead of dc:contributor)

Article Title (dc.title)
Title of the article.

(Please see 'dc.relation.isPartOfSeries' for journal title.)

Date of Issue (dc.date.issued)
The official date of publication. Year is required. Include month and day if possible. '''??? June 2 - doesn't seem to be accommodated by interface. Only "journal issue" -> "year" is offered.'''

Publisher (dc.publisher)
The original publisher of the article. Note: This should be a publishing company, which is normally different than the journal name. This is currently manually assigned.

Full Citation (dc.identifier.citation)
(listed as dcterms:bibliographicCitation in app profile)

A plain-text citation. Currently, copied from the publisher's site if available. Some attempt should be made to normalize case (don't include all caps). In the future, this may be automatically generated, to provide consistent formatting.

Journal (dc.relation.isPartOfSeries)
Name of journal. '''It would appear that volume and issue number are also going into this field. This is unclear.'''

DOI (dc.identifier.uri)
Select URI and enter the DOI of the publication in URL form, if available. Otherwise, use the most "permanent" URL available that represents the publication.
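
"DOI in URL form" means prefixing the bare DOI with a resolver address. The sketch below assumes the `https://doi.org/` resolver (older records commonly used `http://dx.doi.org/`), and the function name and the example DOI are placeholders. It does not percent-encode unusual characters, which some DOIs would need.

```python
KNOWN_PREFIXES = (
    "doi:",
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
)


def doi_to_url(doi, resolver="https://doi.org/"):
    """Normalize a DOI written in any common style to a resolver URL."""
    doi = doi.strip()
    for prefix in KNOWN_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):].strip()
            break
    # All DOIs begin with the directory indicator "10.".
    if not doi.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    return resolver + doi


# Placeholder DOI, not a real publication.
print(doi_to_url("doi:10.1234/example.5678"))
```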

Type (dc.type)
Choose "Article". Done automatically.

Subject Keywords (dc.subject)
Initially, only explicitly-stated keywords will be cataloged as such. In the future, we hope to perform more automatic keyword extraction.

Abstract (dc.description.abstract)
The abstract from the publication.

Rights Statement (dc.rights.uri)
A short human-readable phrase describing the access rights, which may also be machine-readable. Applied automatically to all resources in Dryad. Can also include an original rights statement from the journal/publisher. Decision is that it will point to a Creative Commons license.

Geographic Areas (dc.coverage.spatial)
Textual description of the geographic area covered by the publication.

Eventually, this must use a vocabulary through HIVE, like gaz.obo or TGN.

Geologic Timespans (dc.coverage.temporal)
Textual description of the timespan covered by the publication. Currently entered manually.

Primary Contact (dc.contributor.correspondingAuthor)
The corresponding author.

Manuscript Number (dc.identifier.manuscriptNumber)
If this is available, it will be automatic.

Taxonomic Names (dwc.ScientificName)
Taxonomic names to which the article refers. Can be either scientific or common, as long as they are correct.

DSpace metadata automatically assigned
 * dc.date.accessioned
 * dc.date.available
 * dc.description.provenance
 * dc.identifier.uri -> assigned by Dryad as a handle
 * dc.relation.haspart -> automatically assigned by DSpace when data objects are uploaded for the publication. Please note that this can be edited manually to create relationships AFTER a publication/data file is published.