Old:Cataloging Guidelines 2009

From Dryad wiki
STATUS: This page is no longer maintained and is of historical interest only.

These guidelines were written to describe the process for cataloging data deposited in Dryad, and were derived from previous metadata documents. While some of the processes described here are still in use, this page is outdated and primarily of historical value.

Submission Process - June 2, 2009

  • Submitting data to Dryad consists of three simple steps:
    1. Describe your publication
    2. Upload and describe your datasets
    3. Approve datasets for publication
  1. Select the journal in which the article appears using the dropdown menu.
  2. If the journal is a partner journal, enter the manuscript number. This automatically prepopulates the publication metadata fields for that manuscript.
  3. If the article is not from a partner journal and "Other" is chosen, describe the article in as much detail as possible, using the publication guidelines below.
  4. Upload datasets and describe them. They can be uploaded individually, or all together as a zip file. Use the data object guidelines below.
  5. Edit the publication metadata, or choose to finalize the submission.

Curation Process - June 2, 2009

Right now we are not ADDING metadata to what has already been contributed; we are only checking it for accuracy and completeness. This is the current process, not the long-term workflow. For BOTH datasets and publications, double-check that the right metadata has gone into the appropriate fields, and check the spelling and punctuation in all metadata fields.

PUBLICATIONS

  1. Find the ORIGINAL article on the publisher's website. Ideally this is as simple as resolving the DOI (for example, by pasting it after http://dx.doi.org/ in the address bar). If there is no DOI, that is a problem, and the DOI needs to be found.
    • This page, usually on the journal's website, holds most of the information needed to check the accuracy and completeness of the PUBLICATION metadata.
    1. Double-check the authors' names. If only a first initial is given, consult other sources (for example, ISI Web of Science) to find the author's full name. In the future, all partner journals will send full author names, so this step will only apply to non-partner submissions.
    2. Double-check the TITLE, the SUBJECT KEYWORDS, and the CITATION information (year, volume, issue, pages, etc.) for accuracy.
      • THE CITATION STRING: we have not yet chosen a standard for this. In the meantime, use the AMERICAN NATURALIST citation format, for example: Belyea, L. R., and J. Lancaster. 1999. Assembly rules within a contingent ecology. Oikos 86(7):402–416. The journal name must be spelled out in full.
    3. ADD the publisher's name. This is not yet required metadata, but it should be added.
    4. Is there a corresponding author? Check this information and add it if necessary; the name will be on the article or on the publisher's website.
    5. The DATE ISSUED should be only a year. Is the listed value correct?
    • NOTE: changes you make to the publication's metadata are not automatically inherited by the associated datasets, so every such change must also be made to the metadata of each data object.
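Because publication corrections have to be copied to each dataset by hand, a curator could flag datasets that are missing publication values with a check like the one below. This is a minimal sketch. It assumes each record is a plain Python dict that maps a metadata field name to a list of string values, which is a hypothetical representation and not the DSpace API. The list of inherited fields comes from the data file guidelines on this page. The check reports only publication values that are absent from the dataset, because dataset authors, keywords, and coverage may legitimately be edited or extended.

```python
# Fields that, per the data file guidelines, datasets inherit from the publication.
INHERITED_FIELDS = (
    "dc.contributor.author",
    "dc.date.issued",
    "dc.subject",
    "dc.coverage.spatial",
    "dc.coverage.temporal",
    "dwc.ScientificName",
)


def missing_inherited_values(publication, dataset, fields=INHERITED_FIELDS):
    """Return {field: [values]} for publication values the dataset lacks.

    Records are hypothetical dicts of field name -> list of strings.
    Extra values on the dataset (e.g. added keywords) are not reported.
    """
    missing = {}
    for field in fields:
        dataset_values = dataset.get(field, [])
        absent = [v for v in publication.get(field, []) if v not in dataset_values]
        if absent:
            missing[field] = absent
    return missing
```

For example, if the publication's date was corrected to 2009 but a dataset still says 2008, the result contains `{"dc.date.issued": ["2009"]}`, while an extra dataset keyword produces no report.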

DATASETS

  1. For each dataset, double-check that the metadata inherited from the publication is correct.
  2. Double-check the FORMAT of the file. Is it correct? Can the file be downloaded and opened?
  3. We don't have a standard for TITLES yet. In the meantime, use this format: the first part of the article title (the first 3 or 4 words, whatever makes sense), then, in parentheses, the first author's surname (followed by a plus sign if there are more authors) and the year, then the number of the dataset. If the file has a unique name, put it in the DESCRIPTION field. Titles would then look like this:
    • Traits, Habitats, and Clades (Mayfield+, 2009) 1
    • Traits, Habitats, and Clades (Mayfield+, 2009) 2
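The interim title convention is mechanical enough to sketch in code. The function below is a hypothetical helper, not part of Dryad. Where the title is cut still calls for the curator's judgment ("whatever makes sense"), so the sketch simply takes the first `n_words` words (default 4, an assumption) and strips trailing punctuation from the cut. It also enforces the 100-character limit given for dataset titles.

```python
MAX_TITLE_LENGTH = 100  # limit stated for dc.title on data files


def dataset_title(article_title, first_author_surname, n_authors, year, number, n_words=4):
    """Build an interim dataset title such as 'Traits, Habitats, and Clades (Mayfield+, 2009) 1'."""
    # Keep the first few words of the article title; drop punctuation left at the cut.
    prefix = " ".join(article_title.split()[:n_words]).rstrip(",:;.")
    # A plus sign after the surname signals that there are additional authors.
    plus = "+" if n_authors > 1 else ""
    title = f"{prefix} ({first_author_surname}{plus}, {year}) {number}"
    if len(title) > MAX_TITLE_LENGTH:
        raise ValueError(f"title exceeds {MAX_TITLE_LENGTH} characters: {title!r}")
    return title
```

With a placeholder article title such as `"Traits, Habitats, and Clades: placeholder subtitle"`, surname `"Mayfield"`, more than one author, year 2009, and dataset number 1, this reproduces the first example above.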

Data File Cataloging Guidelines

Authors (dc.contributor.author)

Should be the same as the associated publication, unless a different set of authors is explicitly stated. Currently, authority control is manual.

  • Field Label: Author
  • Formal Definition: Entity primarily responsible for making the content of the resource.
  • User Definition: The entity or entities responsible for the creation and development of the data set.
  • Requirement: Required
  • Cardinality: Repeatable
  • Generation: Inherited (after publication metadata is acquired or created, the names are inherited. List can be edited.)

(In the original application profile, dc:creator is used instead of dc:contributor)

Title (dc.title)

Human-readable description of the dataset, no more than 100 characters long. If the author does not provide any additional information, the filename will be used as the title (currently under discussion), on the assumption that the contents of the file are obvious to anyone who reads the associated article.

  • Field Label: Descriptive Title
  • Formal Definition: A name given to the resource.
  • User Definition: Descriptive title of the dataset.
  • Requirement: Required
  • Cardinality: Non-Repeatable
  • Generation: Generated automatically, with the option of editing (generation WILL become more standardized once a title format is chosen)

Date of Issue (dc.date.issued)

The official date of publication, inherited by dataset.

  • Field Label: Date of Issue
  • Formal Definition: Date of formal issuance (e.g., publication) of the resource.
  • User Definition: Date of publication.
  • Requirement: Required
  • Cardinality: Non-Repeatable
  • Generation: Automatically inherited.

Embargo (dc.date.embargoedUntil)

A date after which the dataset will be made public. This is only used for datasets under embargo. The length of embargo is still under discussion.

Type (dc.type)

Choose an appropriate type. The default, "Dataset", is assigned automatically.

Subject Keywords (dc.subject)

Keywords from the publication will be attached to datasets. Other keywords may be manually applied to datasets.

  • Field Label: Subject Keywords
  • Formal Definition: The topic of the resource.
  • User Definition: Dataset keywords.
  • Requirement: Required
  • Cardinality: Repeatable
  • Generation: Automatically inherited from publication, but can be edited and added to.

Description (dc.description)

Non-repeatable.

Human-readable description of the dataset. Can contain much more detail than the title.

Any description that seems too long for this element (e.g., more than one page of text) should be placed in a separate file, which will be a supplemental datastream of this object. The file will be named READMEx.yyy, where x is a sequence number (omitted if only one documentation file is submitted) and yyy is the file extension of the original documentation file.

  • Field Label: Description
  • Formal Definition: Description may include but is not limited to: an abstract, a table of contents, a graphical representation, or a free-text account of the resource.
  • User Definition: Description of dataset.
  • Requirement: Optional
  • Cardinality: Non-repeatable
  • Generation: Manual
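The READMEx.yyy naming rule for supplemental documentation files can be expressed as a short function. This is a sketch with a hypothetical helper name. It assumes the sequence numbers start at 1 and follow the order in which the documentation files were submitted, and it treats "extension" as whatever `os.path.splitext` returns, so `archive.tar.gz` yields `.gz`.

```python
import os


def readme_names(doc_filenames):
    """Map submitted documentation files to README names per the READMEx.yyy rule."""
    # splitext keeps the leading dot (".txt") and returns "" when there is no extension.
    extensions = [os.path.splitext(name)[1] for name in doc_filenames]
    if len(extensions) == 1:
        # A single documentation file gets no sequence number.
        return ["README" + extensions[0]]
    # Several files are numbered in submission order, starting at 1 (assumption).
    return [f"README{i}{ext}" for i, ext in enumerate(extensions, start=1)]
```

For instance, `readme_names(["methods.txt", "protocol.pdf"])` gives `["README1.txt", "README2.pdf"]`, and a lone `methods.txt` becomes `README.txt`.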

Described By Publication (dc.relation.ispartof)

Primary identifier (typically a DOI) of the publication in which this dataset is most fully described, or in which it first appeared. It is possible, though unlikely, for a dataset to have multiple primary articles. Datasets will not reference all articles they are used in. This would be redundant with information in the articles, and would cause unneeded updates to the dataset records.

  • Field Label: Described By Publication
  • Formal Definition: A related resource.
  • User Definition: Identifier of the published article with which dataset is associated.
  • Requirement: Required
  • Cardinality: Repeatable
  • Generation: Automatically inherited from publication metadata

Rights Statement (dc.rights.uri)

A short human-readable phrase describing the access rights, which may also be machine-readable. It is applied automatically to all resources in Dryad and can also include an original rights statement from the journal or publisher. It has been decided that it will point to a Creative Commons license.

  • Field Label: Rights Statement
  • Formal Definition: Information about rights held in and over the resource.
  • User Definition: Statement regarding rights held in and over the resource.
  • Requirement: Required
  • Cardinality: Repeatable
  • Generation: Automatic

Geographic Areas (dc.coverage.spatial)

Textual description of the geographic area covered by the dataset. It is inherited automatically from the publication, with the option to edit.

Eventually, this must use a vocabulary through HIVE, like gaz.obo or TGN.

  • Field Label: Geographic Areas
  • Formal Definition: Spatial topic may be a named place or a location specified by its geographic coordinates.
  • User Definition: The spatial description of the data set specified by a geographic description and geographic coordinates.
  • Requirement: Optional
  • Cardinality: Repeatable
  • Generation: Automatic

Geologic Timespans (dc.coverage.temporal)

Textual description of the timespan covered by the article. It is currently entered manually for the publication but inherited automatically by the datasets, with the option of editing.

  • Field Label: Geologic Timespans
  • Formal Definition: Temporal period may be a named period, date, or date range.
  • User Definition: The temporal description of the data set including start date and end date of the collection/creation of the data set.
  • Requirement: Optional
  • Cardinality: Repeatable
  • Generation: Automatic

File Metadata (bitstream format indicator)

Code indicating the type of file. This is automatically detected by DSpace, but can be modified manually.

  • Field Label: File Format
  • Formal Definition: The physical or digital manifestation of the resource.
  • User Definition: The format in which the data set is stored. Can also represent software format.
  • Requirement: Required
  • Cardinality: Repeatable
  • Generation: Automatic (CV: PRONOM - http://www.nationalarchives.gov.uk/pronom/)

  • Field Label: File Size
  • Formal Definition: The physical or digital manifestation of the resource.
  • User Definition: The size of the file storage.
  • Requirement: Required
  • Cardinality: Repeatable
  • Generation: Automatic

(Both fields use dc:format with optional qualifiers 'medium' or 'extent')

Taxonomic Names (dwc.ScientificName)

Taxonomic names to which the article refers. Can be either scientific or common, as long as they are correct. Will be automatically inherited by data objects.

  • Field Label: Taxonomic Names
  • Formal Definition: The full name of the lowest level taxon to which the cataloged item can be identified (e.g., genus name, specific epithet, subspecific epithet, etc.).
  • User Definition: The full name of the lowest level taxon to which the cataloged item can be identified (e.g., genus name, specific epithet, subspecific epithet, etc.).
  • Requirement: Optional
  • Cardinality: Repeatable
  • Generation: Currently manual, but will be enhanced by HIVE.

DSpace metadata automatically assigned

dc.date.accessioned
dc.date.available
dc.description.provenance
dc.identifier.uri -> this is the Dryad handle that is automatically assigned

Publication Cataloging Guidelines

The overall goal is to provide enough DC metadata that we can automatically generate the dc.identifier.citation and/or an openURL query.
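As an illustration of that goal, the function below builds an OpenURL 1.0 (ANSI/NISO Z39.88-2004) query in key/encoded-value form for a journal article from the DC fields described on this page. It is a sketch under several assumptions. The resolver base URL is a placeholder. The mapping from Dryad fields to OpenURL keys (`dc.title` to `rft.atitle`, `dc.relation.isPartOfSeries` to `rft.jtitle`, `dc.date.issued` to `rft.date`, each author to one `rft.au`, and the DOI to `rft_id` as `info:doi/...`) was chosen for illustration and is not a documented Dryad mapping. Volume, issue, and pages are left out because the guidelines do not yet fix where they are stored.

```python
from urllib.parse import urlencode

# Hypothetical mapping from Dryad DC fields to OpenURL journal-format keys.
FIELD_TO_KEY = (
    ("dc.title", "rft.atitle"),
    ("dc.relation.isPartOfSeries", "rft.jtitle"),
    ("dc.date.issued", "rft.date"),
    ("dc.contributor.author", "rft.au"),  # one rft.au per author
)


def openurl_query(resolver_base, record, doi=None):
    """Build a Z39.88-2004 KEV OpenURL for an article record (dict of field -> list of str)."""
    params = [
        ("ctx_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),
        ("rft.genre", "article"),
    ]
    for dc_field, key in FIELD_TO_KEY:
        for value in record.get(dc_field, []):
            params.append((key, value))
    if doi:
        # Identify the referent by its bare DOI, e.g. "10.1234/example".
        params.append(("rft_id", "info:doi/" + doi))
    # urlencode keeps repeated keys from the list of pairs and percent-encodes values.
    return resolver_base + "?" + urlencode(params)
```

Passing a list of pairs to `urlencode` preserves repeated keys, which matters for articles with several authors.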

Authors (dc.contributor.author)

List the full names of authors. Do not just copy abbreviated names from a citation; try to find the actual names.

Currently, authority control is manual. Author/contributor names should be formatted as "Firstname MI Lastname", but the actual form depends on the text received from the publisher and will probably have to be corrected by the curator. This matters because the author names are inherited by the data object records. Initially, we won't track email addresses in the regular metadata; DSpace records the submitter in the provenance. WHAT ABOUT LATER??

  • Field Label: Authors
  • Formal Definition: An entity primarily responsible for making the resource.
  • User Definition: Author(s) of the article.
  • Requirement: Required
  • Cardinality: Repeatable
  • Generation: Automatic for partner journals, manual for non-partners

(The original application profile used dc:creator instead of dc:contributor)
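A curator could pre-screen author names against the rules above with a check like the following. This is a hypothetical helper, not part of DSpace. It assumes that a comma means the name is in inverted "Lastname, Firstname" order, and that a first name made up only of capital initials (such as "L.", "LR", or "J.-P.") is abbreviated and needs to be looked up. It only flags names for review and does not rewrite them.

```python
import re

# A token made only of capital initials, optionally with periods or hyphens: "L.", "LR", "J.-P."
INITIALS_ONLY = re.compile(r"^(?:[A-Z]\.?-?)+$")


def name_issues(name):
    """Return a list of problems with an author name (empty list if it looks fine)."""
    issues = []
    if "," in name:
        issues.append("inverted order; expected 'Firstname MI Lastname'")
        given = name.partition(",")[2].split()
        first_given = given[0] if given else ""
    else:
        tokens = name.split()
        # A single token is treated as a surname with no first name.
        first_given = tokens[0] if len(tokens) > 1 else ""
    if not first_given or INITIALS_ONLY.match(first_given):
        issues.append("first name missing or abbreviated; look up the full name")
    return issues
```

For example, `name_issues("A. M. Example")` flags the abbreviated first name, while `name_issues("Ada M Example")` returns an empty list.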

Article Title (dc.title)

Title of the article.

  • Field Label: Article Title
  • Formal Definition: A name given to the resource.
  • User Definition: Title of the article.
  • Requirement: Required
  • Cardinality: Non-Repeatable
  • Generation: Automatic for partner journals, manual for non-partners

(Please see 'dc.relation.isPartOfSeries' for the journal title.)

Date of Issue (dc.date.issued)

The official date of publication. The year is required; include the month and day if possible. Open question (June 2): the interface does not seem to accommodate month and day, since only "journal issue" -> "year" is offered.

  • Field Label: Date of Issue
  • Formal Definition: Date of formal issuance (e.g., publication) of the resource.
  • User Definition: Date of publication.
  • Requirement: Required
  • Cardinality: Non-Repeatable
  • Generation: Automatically assigned for partner journals, manual for non-partners.

Publisher (dc.publisher)

The original publisher of the article. Note: this should be a publishing company, which is normally different from the journal name. This is currently assigned manually.

  • Field Label: Publisher
  • Formal Definition: An entity responsible for making the resource available.
  • User Definition: Journal publisher.
  • Requirement: Optional
  • Cardinality: Repeatable
  • Generation: Manual

Full Citation (dc.identifier.citation)

(listed as dcterms:bibliographicCitation in the application profile)

A plain-text citation. Currently, it is copied from the publisher's site if available. Some attempt should be made to normalize case (avoid all caps). In the future, the citation may be generated automatically to provide consistent formatting.

  • Field Label: Full Citation
  • Formal Definition: Details of the bibliographic item that contains the resource along with the position of the resource within it.
  • User Definition: The citation information for the journal article.
  • Requirement: Required
  • Cardinality: Repeatable
  • Generation: Automatic
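Automatic generation could follow the interim AMERICAN NATURALIST format used in the curation steps, whose example is: Belyea, L. R., and J. Lancaster. 1999. Assembly rules within a contingent ecology. Oikos 86(7):402–416. The function below is a hypothetical sketch inferred from that single example. It inverts the first author's name, writes the other authors as "Initials Surname", joins them with "and" before the last author, and expects the caller to pass the journal name already spelled out in full. The `Author` type is an illustrative assumption, and the handling of three or more authors is extrapolated from the style rather than taken from this page.

```python
from dataclasses import dataclass


@dataclass
class Author:
    surname: str
    initials: str  # e.g. "L. R."


def format_citation(authors, year, title, journal, volume, issue, pages):
    """Build an American Naturalist-style citation string (sketch based on one example)."""
    if not authors:
        raise ValueError("at least one author is required")
    # The first author is inverted ("Surname, I. I."); the others are in natural order.
    names = [f"{authors[0].surname}, {authors[0].initials}"]
    names += [f"{a.initials} {a.surname}" for a in authors[1:]]
    if len(names) == 1:
        author_part = names[0]
    else:
        author_part = ", ".join(names[:-1]) + ", and " + names[-1]
    # Don't double the period when the author list already ends with an initial.
    if not author_part.endswith("."):
        author_part += "."
    title_part = title if title[-1] in ".?!" else title + "."
    return f"{author_part} {year}. {title_part} {journal} {volume}({issue}):{pages}."
```

Called with the two authors, year, title, and journal details from the example, it reproduces that citation string exactly. The journal name is not abbreviated, so it must be supplied in full.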

Journal (dc.relation.isPartOfSeries)

Name of the journal. Volume and issue numbers also appear to be going into this field; whether they belong here is unclear.

DOI (dc.identifier.uri)

Select URI and enter the DOI of the publication in URL form, if available. Otherwise, use the most "permanent" URL available that represents the publication.

  • Field Label: DOI
  • Formal Definition: An unambiguous reference to the resource within a given context.
  • User Definition: The Digital Object Identifier of a journal article.
  • Requirement: Required
  • Cardinality: Non-Repeatable
  • Generation: Automatic, but for some articles the DOI will not be available and the curator must find it.
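Putting a DOI "in URL form" means prefixing the bare DOI with a resolver address. The sketch below normalizes common input forms (a bare DOI, a `doi:` prefix, or an existing resolver URL) to a single URL. The function name is hypothetical. The default resolver `https://doi.org/` is an assumption: it is the form the DOI system currently recommends, while records from 2009 typically used `http://dx.doi.org/`, which can be passed as `resolver` instead. The sketch does not percent-encode unusual characters, and its DOI check is deliberately loose (it only requires the `10.` prefix).

```python
DEFAULT_RESOLVER = "https://doi.org/"  # assumption; 2009-era records used http://dx.doi.org/

# Prefixes stripped before re-adding the chosen resolver (checked case-insensitively).
KNOWN_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def doi_to_url(doi, resolver=DEFAULT_RESOLVER):
    """Return the DOI in URL form, e.g. '10.1234/example' -> 'https://doi.org/10.1234/example'."""
    value = doi.strip()
    for prefix in KNOWN_PREFIXES:
        if value.lower().startswith(prefix):
            value = value[len(prefix):].strip()
            break
    # Every DOI name starts with the "10." directory indicator.
    if not value.startswith("10."):
        raise ValueError(f"does not look like a DOI: {doi!r}")
    return resolver + value
```

Normalizing every input to one resolver form keeps the dc.identifier.uri values consistent, which makes duplicate publication records easier to spot.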

Type (dc.type)

Set to "Article". This is done automatically.

Language (dc.language.iso)

Subject Keywords (dc.subject)

Initially, only explicitly-stated keywords will be cataloged as such. In the future, we hope to perform more automatic keyword extraction.

  • Field Label: Subject Keywords
  • Formal Definition: The topic of the resource.
  • User Definition: Article keywords.
  • Requirement: Required
  • Cardinality: Repeatable
  • Generation: Automatically assigned for partner journals, manual for non-partners

Abstract (dc.description.abstract)

The abstract from the publication.

  • Field Label: Abstract
  • Formal Definition: An account of the resource.
  • User Definition: Article abstract.
  • Requirement: Required
  • Cardinality: Non-Repeatable
  • Generation: Automatically assigned for partner journals, manual for non-partners

Rights Statement (dc.rights.uri)

A short human-readable phrase describing the access rights, which may also be machine-readable. It is applied automatically to all resources in Dryad and can also include an original rights statement from the journal or publisher. It has been decided that it will point to a Creative Commons license.

  • Field Label: Rights Statement
  • Formal Definition: Information about rights held in and over the resource.
  • User Definition: Statement regarding rights held in and over the resource.
  • Requirement: Required
  • Cardinality: Repeatable
  • Generation: Automatic

Geographic Areas (dc.coverage.spatial)

Textual description of the geographic area covered by the publication.

Eventually, this must use a vocabulary through HIVE, like gaz.obo or TGN.

  • Field Label: Geographic Areas
  • Formal Definition: Spatial topic may be a named place or a location specified by its geographic coordinates.
  • User Definition: The spatial description of the data set specified by a geographic description and geographic coordinates.
  • Requirement: Optional
  • Cardinality: Repeatable
  • Generation: Currently manual, later will be semi-automatic through use of HIVE

Geologic Timespans (dc.coverage.temporal)

Textual description of the timespan covered by the publication. Currently entered manually.

  • Field Label: Geologic Timespans
  • Formal Definition: Temporal period may be a named period, date, or date range.
  • User Definition: The temporal description of the data set including start date and end date of the collection/creation of the data set.
  • Requirement: Optional
  • Cardinality: Repeatable
  • Generation: Currently manual.

Primary Contact (dc.contributor.correspondingAuthor)

The corresponding author.

Manuscript Number (dc.identifier.manuscriptNumber)

If available, this is assigned automatically.

Taxonomic Names (dwc.ScientificName)

Taxonomic names to which the article refers. Can be either scientific or common, as long as they are correct.

  • Field Label: Taxonomic Names
  • Formal Definition: The full name of lowest level taxon to which the cataloged item can be identified (e.g., genus name, specific epithet, subspecific epithet, etc.).
  • User Definition: The full name of lowest level taxon to which the cataloged item can be identified (e.g., genus name, specific epithet, subspecific epithet, etc.).
  • Requirement: Optional
  • Cardinality: Repeatable
  • Generation: Currently manual, but will be enhanced by HIVE.

DSpace metadata automatically assigned

dc.date.accessioned
dc.date.available
dc.description.provenance
dc.identifier.uri -> assigned by Dryad as a handle
dc.relation.haspart - automatically assigned by DSpace when data objects are uploaded for the publication, but please note that this can be edited manually to create relationships AFTER a publication/data file is published