Old:Cataloging Guidelines

STATUS: This page is no longer maintained and is of historical interest only.

These guidelines were written to describe the process for cataloging data deposited in Dryad, and were derived from previous metadata documents. While many of the metadata elements described here remain in use, this page is outdated and primarily of historical value. This page contains some of the same content as another outdated page, Application Profile Cataloging Guidelines.

Cataloging Process
Unfortunately, the current process is a bit convoluted. This will become easier as we adapt DSpace to better suit our needs.


 * 1) Select an item to be cataloged from the Google Spreadsheet titled "Dryad Data Processing".
 * 2) In the Publications collection, catalog the publication, including all data that can be entered on default DSpace screens. Some fields aren't available on these screens, and will need to be added in the modification step.
    * a) Follow the Publication Cataloging Guidelines below.
    * b) Make sure to specify that it has been published before, so all of the entry fields are shown.
    * c) When DSpace requests a file upload, use a dummy file, like blank.html. The actual PDF should not be placed in the public repository.
    * d) Check whether the article already has supplemental materials available on the publisher's website. If so, and if they are not part of the current submission, ask the author's permission to include these materials in Dryad.
 * 3) In the Data collection, catalog the datasets (each in its own record), including all data that can be entered on default DSpace screens. Some fields aren't available on these screens, and will need to be added in the modification step.
    * a) Follow the Dataset Cataloging Guidelines below.
    * b) Specify whether it has been published before. If it has, enter the initial publication date for Date of Issue. If not, this will be filled in automatically.
    * c) When you upload the data file, if DSpace does not recognize the filetype, [mailto:rscherle@nescent.org tell Ryan] so he can update the list of acceptable filetypes. (It's ok to continue with the submission process while this happens.)
 * 4) Go to My Dryad and approve the submissions.
 * 5) Modify each dataset record:
    * a) Navigate to the record and click the Edit button.
    * b) Add the DOI for the publication as a dc.relation.ispartof.
    * c) Add the dwc.scientific name, if applicable.
    * d) Add the dc:coverage, if applicable.
    * e) Double-check the content with the Dataset Cataloging Guidelines below.
 * 6) Modify the publication record:
    * a) Navigate to the record and click the Edit button.
    * b) Add the handle for each dataset as a dc.relation.haspart.
    * c) Add the dwc.scientific name, if applicable.
    * d) Double-check the content with the Publication Cataloging Guidelines below.
    * e) Remove the dummy bitstream file.
 * 7) Enter a copy of the publication record in the private instance of Dryad, and upload a copy of the actual publication file.
    * a) It is not necessary to copy all of the fields. The title, authors, DOI, and PDF are sufficient.
 * 8) Update status of the submission in the Google spreadsheet.
 * 9) Notify the corresponding author that the item is available through Dryad.
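
The two "modify" steps above amount to cross-linking the publication record and its dataset records. The following is only a rough sketch of that linking, assuming each record is a plain Python dict that maps a field name to a list of values. That representation, the function name link_records, and the placeholder DOI and handles are all hypothetical; this is not the DSpace API.

```python
def link_records(publication, datasets, publication_doi):
    """Cross-link a publication record and its dataset records.

    Records are plain dicts mapping a metadata field name to a list of
    values. This is a hypothetical in-memory representation, not DSpace.
    """
    for dataset in datasets:
        # Each dataset points at the publication by DOI.
        dataset.setdefault("dc.relation.ispartof", []).append(publication_doi)
        # The publication points at each dataset by handle.
        publication.setdefault("dc.relation.haspart", []).append(dataset["handle"])
    return publication, datasets


# Placeholder identifiers, for illustration only.
publication = {"dc.title": ["Example article"]}
datasets = [{"handle": "hdl:0000/1"}, {"handle": "hdl:0000/2"}]
link_records(publication, datasets, "doi:10.0000/example")
```

After the call, the publication lists both dataset handles and each dataset lists the publication DOI, which mirrors what the cataloger enters by hand on the Edit screens.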

Key for Metadata Generation Method
 * A = Automatic
 * A(M) = Automatic with manual modification
 * D = Derived (after profile information is provided, the element values can be automatically generated)

Dataset Cataloging Guidelines

authors (dc.contributor.author)
Should be the same as the associated publication, unless a different set of authors is explicitly stated. Currently, authority control is manual.

(In the original application profile, dc:creator is used instead of dc:contributor)

title (dc.title)
Non-repeatable.

Human-readable description of the dataset. Should not be more than 100 characters. If the author does not provide any additional information, we will use the filename as the title, and assume that the contents of the file are obvious to anyone who reads the associated article.
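
The title rule can be sketched as a small helper. This is only an illustration: the function name choose_title is hypothetical, and raising an error for an over-long title (rather than truncating it) is an assumption, since the guideline only says the title should not exceed 100 characters.

```python
import os

MAX_TITLE_LENGTH = 100  # limit stated in the guideline


def choose_title(author_title, filename):
    """Return the author's title, or the bare filename when none was given."""
    title = (author_title or "").strip()
    if not title:
        # Fall back to the filename, without any directory part.
        title = os.path.basename(filename)
    if len(title) > MAX_TITLE_LENGTH:
        # Assumption: flag the problem for the cataloger instead of truncating.
        raise ValueError("title exceeds %d characters" % MAX_TITLE_LENGTH)
    return title
```

For example, choose_title(None, "data/counts.csv") falls back to "counts.csv".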

date of issue (dc.date.issued)
If you do not choose "this has been published before", this field is automatically filled with the current date. Otherwise, specify the date on which the dataset was previously published.

embargo (dc.date.embargoedUntil)
A date after which the dataset will be made public. This is only used for datasets under embargo.
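
As a rough illustration of how this date might be interpreted, the sketch below treats a dataset as public when it has no embargo date or when the current date is later than the embargo date. The function name is_public is hypothetical, and reading "after" as strictly later than the stored date is an assumption.

```python
from datetime import date


def is_public(embargoed_until, today):
    """Return True when the dataset is not, or is no longer, under embargo."""
    if embargoed_until is None:
        # No embargo date recorded: the dataset was never embargoed.
        return True
    # Assumption: the dataset becomes public the day after the embargo date.
    return today > embargoed_until
```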

type (dc.type)
Choose an appropriate type, most likely "Dataset" or "Image".

keywords (dc.subject)
Keywords from the publication will be attached to datasets only when it is obvious that they apply. Other keywords may be manually applied to datasets.

description (dc.description)
Non-repeatable.

Human-readable description of the dataset. Can contain much more detail than the title.

Any description that seems too long to put in this element (e.g., more than one page of text) should be placed in a separate file, which will be a supplemental datastream of this object. It will be given a name of the form READMEx.yyy, where x is a sequence number (omitted if only one documentation file is submitted) and yyy is the file extension of the original (documentation) file.
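
The READMEx.yyy naming rule can be sketched as follows. The function name readme_names is hypothetical, and starting the sequence at 1 is an assumption; the guideline does not say where numbering begins.

```python
import os


def readme_names(doc_filenames):
    """Name documentation files READMEx.yyy, keeping each original extension."""
    names = []
    single = len(doc_filenames) == 1
    for index, original in enumerate(doc_filenames, start=1):
        # splitext keeps the leading dot, or returns "" when there is no extension.
        extension = os.path.splitext(original)[1]
        # The sequence number is omitted when only one file is submitted.
        sequence = "" if single else str(index)
        names.append("README" + sequence + extension)
    return names
```

A single file notes.txt becomes README.txt, while notes.txt and methods.pdf submitted together become README1.txt and README2.pdf.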

(During recent discussions, this element 'description' was merged with 'title' to yield field name 'Descriptive Title.' Please see above.)

language (dc.language.iso)
If the data file includes human-readable text, choose an appropriate language.

primary publication (dc.relation.isReferencedBy)
Primary identifier (typically a DOI) of the publication in which this dataset is most fully described, or in which it first appeared. It is possible, though unlikely, for a dataset to have multiple primary articles. Datasets will not reference all articles they are used in. This would be redundant with information in the articles, and would cause unneeded updates to the dataset records.

(in the original application profile, dc.relation.isPartOf was considered)

rights statement (dc.rights)
A short human-readable phrase describing the access rights, which may also be machine-readable. For example:


 * Creative Commons license (CC-BY)
 * Public Domain
 * Copyright held by publisher

A blank value indicates that the dataset is a "normal" status item.

locality (dc.coverage.spatial)
Textual description of the geographic area covered by the dataset.

Eventually, this must use a controlled vocabulary, such as gaz.obo or TGN.

dates covered (dc.coverage.temporal)
Textual description of the timespan covered by the dataset.

software/file type (bitstream format indicator)
Code indicating the type of file. This is automatically detected by DSpace, but can be modified manually.

(Both fields use dc:format with optional qualifiers 'medium' or 'extent')

Publication Cataloging Guidelines
The overall goal is to provide enough DC metadata that we can automatically generate the dc.identifier.citation and/or an openURL query. We can use a lot of the information in the DC citation guidelines.

authors (dc.contributor.author)
List the full names of authors. Do not just copy abbreviated names from a citation; try to find the actual names.

Currently, authority control is manual. Author/contributor names will typically be formatted as "Lastname, Firstname" OR as "Lastname, A. B.", depending on the text received from the publisher. We will optimize for searches on lastname only, knowing that Firstname may often only be available as initials. We will store email addresses for disambiguation. Initially, we won't track email addresses in normal metadata, just letting DSpace track the submitter in the provenance.
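
Because both accepted name forms put the last name before the first comma, a search on last name only can be sketched as below. The function names and the sample names in the usage note are hypothetical, and exact case-insensitive matching is an assumption about how such a search would behave.

```python
def last_name(author):
    """Return the last name from "Lastname, Firstname" or "Lastname, A. B."."""
    return author.split(",", 1)[0].strip()


def search_by_last_name(authors, query):
    """Return the authors whose last name matches the query, ignoring case."""
    wanted = query.strip().casefold()
    return [a for a in authors if last_name(a).casefold() == wanted]
```

Searching ["Smith, Jane", "Smith, J. A.", "Jones, Ann"] for "smith" returns both Smith entries, regardless of whether the first name is spelled out or given as initials.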

(The original application profile used dc:creator instead of dc:contributor)

title (dc.title)
Title of the publication.

(Please see 'series' for journal title.)

date of issue (dc.date.issued)
The official date of publication. Year is required. Include month and day if possible.
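
A possible check for this field is sketched below. It assumes dates are written as YYYY, YYYY-MM, or YYYY-MM-DD; the guideline does not fix a format, so that layout and the function name is_valid_issue_date are assumptions.

```python
import re
from datetime import date

# Year is required; month and day are optional, in that order.
_ISSUED = re.compile(r"([0-9]{4})(?:-([0-9]{2})(?:-([0-9]{2}))?)?")


def is_valid_issue_date(value):
    """Accept YYYY, YYYY-MM, or YYYY-MM-DD when they name a real date."""
    match = _ISSUED.fullmatch(value)
    if match is None:
        return False
    year, month, day = match.groups()
    try:
        # Missing month or day defaults to 1 so the calendar check still runs.
        date(int(year), int(month or 1), int(day or 1))
    except ValueError:
        return False
    return True
```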

publisher (dc.publisher)
The original publisher of the article. Note: This should be a publishing company, which is normally different from the journal name.

citation (dc.identifier.citation)
(listed as dcterms:bibliographic citation in app profile)

A plain-text citation. Currently, copied from the publisher's site if available. Some attempt should be made to normalize case (don't include all caps). In the future, this may be automatically generated, to provide consistent formatting.

series/report no. (dc.relation.isPartOfSeries)
Series is the ISSN of the journal. Report number is all other information that identifies this item's location within the series (volume, number, pages).
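
The split between the two values can be sketched as below. The ISSN check uses the standard ISSN check-digit rule (weights 8 down to 2 over the first seven digits, modulo 11). The function names and the "vol. / no. / pp." layout of the report number are assumptions; the guideline does not prescribe how volume, number, and pages are written.

```python
import re

_ISSN = re.compile(r"([0-9]{4})-([0-9]{3})([0-9X])")


def is_valid_issn(issn):
    """Check the NNNN-NNNC form and the ISSN check digit."""
    match = _ISSN.fullmatch(issn)
    if match is None:
        return False
    digits = match.group(1) + match.group(2)
    # Weights run from 8 down to 2 across the first seven digits.
    total = sum(int(d) * w for d, w in zip(digits, range(8, 1, -1)))
    check = (11 - total % 11) % 11
    expected = "X" if check == 10 else str(check)
    return match.group(3) == expected


def series_and_report(issn, volume, number, pages):
    """Return (series, report number) for a journal article."""
    if not is_valid_issn(issn):
        raise ValueError("not a valid ISSN: %r" % issn)
    # Assumption: this layout is illustrative, not a Dryad convention.
    report = "vol. %s, no. %s, pp. %s" % (volume, number, pages)
    return issn, report
```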

identifiers (dc.identifier.uri)
Select URI and enter the DOI of the publication in URL form, if available. Otherwise, use the most "permanent" URL available that represents the publication.
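
Putting a DOI into URL form could look like the sketch below. The function name doi_to_url is hypothetical, the use of the https://doi.org/ resolver prefix is an assumption about which resolver form is wanted, and the DOI in the usage note is a made-up placeholder.

```python
def doi_to_url(doi):
    """Turn a bare DOI or a "doi:" form into a resolvable URL."""
    doi = doi.strip()
    if doi.lower().startswith(("http://", "https://")):
        # Already in URL form: leave it unchanged.
        return doi
    if doi.lower().startswith("doi:"):
        doi = doi[4:].strip()
    return "https://doi.org/" + doi
```

For example, both "doi:10.0000/example" and "10.0000/example" become "https://doi.org/10.0000/example".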

type (dc.type)
Choose "Article".

language (dc.language.iso)
Choose the most appropriate language.

subject keywords (dc.subject)
Initially, only explicitly stated keywords will be cataloged as such. In the future, we hope to perform more automatic keyword extraction.

Note: Species/taxa names will be cataloged as such, and will not be replicated as keywords (though they will be searchable as keywords).

abstract (dc.description.abstract)
The abstract from the publication.

rights statement (dc.rights)
A short human-readable phrase describing the access rights, which may also be machine-readable. For example:


 * Creative Commons license (CC-BY)
 * Public Domain
 * Copyright held by publisher

A blank entry indicates the copyright status is unknown.

Open Questions

 * 1) Do we need to track the publisher separately from the journal? What will we do with this information?