Old:Application Profile Cataloging Guidelines

STATUS: This page is no longer maintained and is of historical interest only.

These guidelines were written to describe the process for cataloging data deposited in Dryad, and were derived from previous metadata documents. While many of the metadata elements described here remain in use, this page is outdated and primarily of historical value. This page contains some of the same content as another outdated page, Cataloging Guidelines.

Key for Metadata Generation Method
 * A = Automatic
 * A(M) = Automatic with manual modification
 * D = Derived (after profile information is provided, the element values can be automatically generated)

authors (dc.contributor.author)
Should be the same as the authors of the associated publication, unless a different set of authors is explicitly stated. Currently, authority control is manual.

(In the original application profile, dc:creator was used instead of dc:contributor.)

title (dc.title)
Non-repeatable.

Human-readable description of the dataset. Should not be more than 100 characters. If the author does not provide any additional information, we will use the filename as the title, and assume that the contents of the file are obvious to anyone who reads the associated article.
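The filename fallback and the length limit can be sketched as follows. This is only an illustration of the rule above; the function names are hypothetical and are not part of Dryad or DSpace.

```python
import os

MAX_TITLE_LENGTH = 100  # limit stated in the guideline above


def dataset_title(author_title, filename):
    """Return the author's title if one was given, otherwise the bare filename."""
    title = (author_title or "").strip()
    if not title:
        # No description from the author: fall back to the filename.
        title = os.path.basename(filename)
    return title


def title_too_long(title):
    """Flag titles that exceed the recommended length so a curator can shorten them."""
    return len(title) > MAX_TITLE_LENGTH
```

The sketch only flags over-long titles rather than truncating them, since shortening a title sensibly is a manual decision.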

date of issue (dc.date.issued)
Filled automatically with the current date unless you choose "this has been published before". In that case, specify the date on which the dataset was previously published.
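As a rough sketch of this rule, assuming dates are stored as ISO 8601 strings (an assumption; the guideline does not state the storage format). The function and parameter names are hypothetical.

```python
from datetime import date


def date_issued(previously_published, published_on=None, today=None):
    """Pick the value for dc.date.issued.

    previously_published: True if "this has been published before" was chosen.
    published_on: the earlier publication date, required in that case.
    today: override for the current date (useful for testing).
    """
    if previously_published:
        if published_on is None:
            raise ValueError("a previous publication date is required")
        return published_on.isoformat()
    # Not previously published: default to the current date.
    return (today or date.today()).isoformat()
```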

embargo (dc.date.embargoedUntil)
A date after which the dataset will be made public. This is only used for datasets under embargo.

type (dc.type)
Choose an appropriate type, most likely "Dataset" or "Image".

keywords (dc.subject)
Keywords from the publication will be attached to datasets only when it is obvious that they apply. Other keywords may be manually applied to datasets.

description (dc.description)
Non-repeatable.

Human-readable description of the dataset. Can contain much more detail than the title.

Any description that seems too long to put in this element (e.g., more than one page of text) should be placed in a separate file, which will be a supplemental datastream of this object. It will be given a name of the form READMEx.yyy, where x is a sequence number (omitted if only one documentation file is submitted) and yyy is the file extension of the original (documentation) file.
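The READMEx.yyy naming scheme could be generated along these lines. This is a minimal sketch: it assumes sequence numbers start at 1, which the guideline does not specify, and the function name is hypothetical.

```python
import os


def readme_names(original_filenames):
    """Map submitted documentation files to READMEx.yyy names.

    The sequence number is omitted when only one file is submitted.
    Numbering from 1 is an assumption, not stated in the guidelines.
    """
    use_sequence = len(original_filenames) > 1
    names = []
    for index, original in enumerate(original_filenames, start=1):
        # splitext keeps the leading dot, or returns "" if there is no extension.
        extension = os.path.splitext(original)[1]
        sequence = str(index) if use_sequence else ""
        names.append(f"README{sequence}{extension}")
    return names
```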

(During recent discussions, this element 'description' was merged with 'title' to yield field name 'Descriptive Title.' Please see above.)

language (dc.language.iso)
If the data file includes human-readable text, choose an appropriate language.

primary publication (dc.relation.isReferencedBy)
Primary identifier (typically a DOI) of the publication in which this dataset is most fully described, or in which it first appeared. It is possible, though unlikely, for a dataset to have multiple primary articles. Datasets will not reference all articles they are used in. Doing so would be redundant with information in the articles, and would cause unneeded updates to the dataset records.

(In the original application profile, dc.relation.isPartOf was considered.)

rights statement (dc.rights)
A short human-readable phrase describing the access rights, which may also be machine-readable. For example:

 * Creative Commons license (CC-BY)
 * Public Domain
 * Copyright held by publisher

A blank value indicates that the dataset is a "normal" status item.

locality (dc.coverage.spatial)
Textual description of the geographic area covered by the dataset.

Eventually, this must use a controlled vocabulary, such as gaz.obo or TGN.

dates covered (dc.coverage.temporal)
Textual description of the timespan covered by the dataset.

software/file type (bitstream format indicator)
Code indicating the type of file. This is automatically detected by DSpace, but can be modified manually.

(Both fields use dc:format with optional qualifiers 'medium' or 'extent')

Publication Cataloging Guidelines
The overall goal is to provide enough DC metadata that we can automatically generate the dc.identifier.citation and/or an openURL query. We can use a lot of the information in the DC citation guidelines.

authors (dc.contributor.author)
List the full names of authors. Do not just copy abbreviated names from a citation; try to find the actual names.

Currently, authority control is manual. Author/contributor names will typically be formatted as "Lastname, Firstname" OR as "Lastname, A. B.", depending on the text received from the publisher. We will optimize for searches on lastname only, knowing that Firstname may often only be available as initials. We will store email addresses for disambiguation. Initially, we won't track email addresses in normal metadata, just letting DSpace track the submitter in the provenance.
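Because first names may be available only as initials, a last-name-only search key treats both name formats the same way. The following is a minimal sketch of that idea; the function name is hypothetical and this is not how DSpace indexes authors.

```python
def lastname_key(name):
    """Return a lowercase last-name key from 'Lastname, Firstname' or 'Lastname, A. B.'."""
    # Everything before the first comma is taken to be the last name.
    lastname = name.split(",", 1)[0]
    return lastname.strip().lower()
```

With this key, "Darwin, Charles" and "Darwin, C. R." both map to "darwin", so a last-name search finds either form. Distinguishing between different authors who share a last name would still need the manual authority control described above.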

(The original application profile used dc:creator instead of dc:contributor)

title (dc.title)
Title of the publication.

(Please see 'series' for journal title.)

date of issue (dc.date.issued)
The official date of publication. Year is required. Include month and day if possible.
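One way to check the "year required, month and day optional" rule is sketched below. It assumes dates are entered as YYYY, YYYY-MM, or YYYY-MM-DD; that format and the function name are assumptions for illustration, not something these guidelines specify.

```python
import re
from datetime import date

# Year is mandatory; month and day are optional (assumed ISO 8601 style).
_ISSUE_DATE = re.compile(r"(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?")


def is_valid_issue_date(value):
    """Accept 'YYYY', 'YYYY-MM', or 'YYYY-MM-DD' that name a real calendar date."""
    match = _ISSUE_DATE.fullmatch(value)
    if not match:
        return False
    year, month, day = match.groups()
    try:
        # Missing month or day defaults to 1 purely for validation.
        date(int(year), int(month or 1), int(day or 1))
    except ValueError:
        return False
    return True
```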

publisher (dc.publisher)
The original publisher of the article. Note: This should be a publishing company, which is normally different from the journal name.

citation (dc.identifier.citation)
(Listed as dcterms:bibliographicCitation in the application profile.)

A plain-text citation. Currently, copied from the publisher's site if available. Some attempt should be made to normalize case (don't include all caps). In the future, this may be automatically generated, to provide consistent formatting.
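If citations were generated automatically, the process might look roughly like the sketch below. The citation layout, the function names, and the sample values are all made up for illustration; the guidelines do not prescribe a format. The case normalization is deliberately crude: it only rewrites strings that are entirely upper case, and it will lowercase proper nouns in them.

```python
def normalize_case(text):
    """Convert an all-caps string to sentence case; leave mixed-case text alone."""
    if text.isupper():
        return text.capitalize()
    return text


def plain_citation(authors, year, title, journal, report_no):
    """Build a plain-text citation from the cataloged fields.

    authors: list of names as stored in dc.contributor.author
    journal, report_no: the two parts of dc.relation.isPartOfSeries
    The layout used here is a hypothetical example format.
    """
    author_part = "; ".join(authors)
    return f"{author_part} ({year}) {normalize_case(title)}. {journal} {report_no}."


# Made-up sample record, for illustration only.
example = plain_citation(
    ["Smith, J."], "2009", "A STUDY OF FINCHES", "Journal of Examples", "12(3): 45-67"
)
```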

series/report no. (dc.relation.isPartOfSeries)
Series is the name of the journal. Report number is all other information that identifies this item's location within the series (volume, number, pages).

identifiers (dc.identifier.uri)
Select URI and enter the DOI of the publication in URL form, if available. Otherwise, use the most "permanent" URL available that represents the publication.
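Putting a DOI into URL form can be sketched as follows. The resolver prefix used here (https://doi.org/) is an assumption, since these guidelines do not name one, and the function name and sample DOI are hypothetical.

```python
DOI_RESOLVER = "https://doi.org/"  # assumed resolver prefix; not named in the guidelines


def doi_to_url(doi):
    """Return a DOI in URL form, accepting a bare DOI, a 'doi:' prefix, or an existing URL."""
    doi = doi.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[4:].strip()
    if doi.startswith(("http://", "https://")):
        # Already a URL: leave it unchanged.
        return doi
    return DOI_RESOLVER + doi
```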

type (dc.type)
Choose "Article".

language (dc.language.iso)
Choose the most appropriate language.

subject keywords (dc.subject)
Initially, only explicitly-stated keywords will be cataloged as such. In the future, we hope to perform more automatic keyword extraction.

Note: Species/taxa names will be cataloged as such, and will not be replicated as keywords (though they will be searchable as keywords).

abstract (dc.description.abstract)
The abstract from the publication.

rights statement (dc.rights)
A short human-readable phrase describing the access rights, which may also be machine-readable. For example:

 * Creative Commons license (CC-BY)
 * Public Domain
 * Copyright held by publisher

A blank entry indicates the copyright status is unknown.