Old:Application Profile Cataloging Guidelines

From Dryad wiki

STATUS: This page is no longer being maintained and is of historical interest only.

These guidelines were written to describe the process for cataloging data deposited in Dryad, and were derived from previous metadata documents. While many of the metadata elements described here remain in use, this page is outdated and primarily of historical value. This page contains some of the same content as another outdated page, Cataloging Guidelines.

Key for Metadata Generation Method

A = Automatic
A(M) = Automatic with manual modification
D = Derived (after profile information is provided, the element values can be automatically generated)

Dataset Cataloging Guidelines

authors (dc.contributor.author)

Should be the same as the authors of the associated publication, unless a different set of authors is explicitly stated. Currently, authority control is manual.

Field Label: Author
Formal Definition: Entity primarily responsible for making the content of the resource.
User Definition: The entity or entities responsible for the creation and development of the data set.
Requirement: Required
Cardinality: Repeatable
Generation: D (after profile information is provided, the element values can be automatically generated)

(In the original application profile, dc:creator is used instead of dc:contributor)

title (dc.title)

Non-repeatable.

Human-readable description of the dataset. Should not be more than 100 characters. If the author does not provide any additional information, we will use the filename as the title, and assume that the contents of the file are obvious to anyone who reads the associated article.

Field Label: Descriptive Title
Formal Definition: A name given to the resource.
User Definition: Descriptive title of the dataset.
Requirement: Required
Cardinality: Non-Repeatable
Generation: A(M) (generated automatically, for example from figure or table titles in the publication, or 'Data supporting Figure 2' (suggestion by Hilmar), but can also be modified by a user)
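
The title rules above (fall back to the filename when the author supplies nothing, and stay within 100 characters) can be sketched as follows. This is only an illustration: the function names are hypothetical and not part of Dryad or DSpace, and treating a whitespace-only title as "not provided" is an assumption.

```python
from pathlib import PurePosixPath

MAX_TITLE_LENGTH = 100  # limit stated in the guideline


def dataset_title(author_title, filename):
    """Return the author-supplied title, or fall back to the bare filename.

    Assumption: an empty or whitespace-only title counts as "not provided".
    """
    title = (author_title or "").strip()
    if not title:
        # Use only the final path component, e.g. "fig2.csv".
        title = PurePosixPath(filename).name
    return title


def title_is_within_limit(title):
    """Check the title against the 100-character guideline."""
    return len(title) <= MAX_TITLE_LENGTH
```

A cataloger-facing tool might call title_is_within_limit to flag long titles for manual review rather than truncating them silently.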

date of issue (dc.date.issued)

If you do not choose "this has been published before", this field is automatically filled with the current date. Otherwise, specify the date on which the dataset was previously published.
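
The defaulting rule for this field can be sketched as below. The function name and the ISO 8601 output format are assumptions made for illustration; the guideline does not specify how the date is stored.

```python
from datetime import date


def date_issued(previously_published_on=None, today=None):
    """Return a value for dc.date.issued as an ISO 8601 string (assumed format).

    previously_published_on: a date if the submitter chose
        "this has been published before", otherwise None.
    today: injectable current date, so the default can be tested.
    """
    if previously_published_on is not None:
        return previously_published_on.isoformat()
    return (today or date.today()).isoformat()
```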

embargo (dc.date.embargoedUntil)

A date after which the dataset will be made public. This is only used for datasets under embargo.

type (dc.type)

Choose an appropriate type, most likely "Dataset" or "Image".

keywords (dc.subject)

Keywords from the publication will be attached to datasets only when it is obvious that they apply. Other keywords may be manually applied to datasets.

Field Label: Keywords
Formal Definition: The topic of the resource.
User Definition: Dataset keywords.
Requirement: Required
Cardinality: Repeatable
Generation: A(M) (could be pulled from article and also elaborated upon by user)

description (dc.description)

Non-repeatable.

Human-readable description of the dataset. Can contain much more detail than the title.

Any description that seems too long to put in this element (e.g., more than one page of text) should be placed in a separate file, which will be a supplemental datastream of this object. It will be given a name of the form READMEx.yyy, where x is a sequence number (omitted if only one documentation file is submitted) and yyy is the file extension of the original (documentation) file.
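
The READMEx.yyy naming rule can be illustrated with a short sketch. The function name is hypothetical, and starting the sequence at 1 is an assumption, since the guideline does not say where numbering begins.

```python
from pathlib import PurePosixPath


def readme_names(original_filenames):
    """Map submitted documentation files to names of the form READMEx.yyy.

    The sequence number x is omitted when only one file is submitted.
    Assumption: numbering starts at 1 and follows submission order.
    """
    single = len(original_filenames) == 1
    names = []
    for index, original in enumerate(original_filenames, start=1):
        # suffix keeps the leading dot, e.g. ".txt"; it is "" if there is no extension.
        extension = PurePosixPath(original).suffix
        sequence = "" if single else str(index)
        names.append(f"README{sequence}{extension}")
    return names
```

For example, a single submitted file notes.txt would become README.txt, while methods.pdf and codebook.doc submitted together would become README1.pdf and README2.doc.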

(During recent discussions, this element 'description' was merged with 'title' to yield field name 'Descriptive Title.' Please see above.)

language (dc.language.iso)

If the data file includes human-readable text, choose an appropriate language.

primary publication (dc.relation.isReferencedBy)

Primary identifier (typically a DOI) of the publication in which this dataset is most fully described, or in which it first appeared. It is possible, though unlikely, for a dataset to have multiple primary articles. Datasets will not reference all articles they are used in. This would be redundant with information in the articles, and would cause unneeded updates to the dataset records.

Field Label: Primary Publication
Formal Definition: A related resource.
User Definition: Digital Object Identifier of the published article with which dataset is associated.
Requirement: Required
Cardinality: Repeatable
Generation: A

(in the original application profile, dc.relation.isPartOf was considered)

rights statement (dc.rights)

A short human-readable phrase describing the access rights, which may also be machine-readable. For example:

  • CreativeCommons license (CC-BY)
  • Public Domain
  • Copyright held by publisher

A blank value indicates that the dataset is a "normal" status item.

Field Label: Rights Statement
Formal Definition: Information about rights held in and over the resource.
User Definition: Statement regarding rights held in and over the resource.
Requirement: Required
Cardinality: Repeatable
Generation: A(M)

locality (dc.coverage.spatial)

Textual description of the geographic area covered by the dataset.

Eventually, this must use a controlled vocabulary, such as gaz.obo or TGN.

Field Label: Locality
Formal Definition: Spatial topic may be a named place or a location specified by its geographic coordinates.
User Definition: The spatial description of the data set specified by a geographic description and geographic coordinates.
Requirement: Optional
Cardinality: Repeatable
Generation: A(M) (use of CVs, for example, [TGN] http://www.getty.edu/research/tools/vocabulary/tgn/index.html)

dates covered (dc.coverage.temporal)

Textual description of the timespan covered by the dataset.

Field Label: Dates Covered
Formal Definition: Temporal period may be a named period, date, or date range.
User Definition: The temporal description of the data set including start date and end date of the collection/creation of the data set.
Requirement: Optional
Cardinality: Repeatable
Generation: A(M)

software/file type (bitstream format indicator)

Code indicating the type of file. This is automatically detected by DSpace, but can be modified manually.

Field Label: File Format
Formal Definition: The physical or digital manifestation of the resource.
User Definition: The format in which the data set is stored. Can also represent software format.
Requirement: Required
Cardinality: Repeatable
Generation: A

Field Label: File Size
Formal Definition: The physical or digital manifestation of the resource.
User Definition: The size of the file storage.
Requirement: Required
Cardinality: Repeatable
Generation: A (CV: PRONOM - http://www.nationalarchives.gov.uk/pronom/)

(Both fields use dc:format with optional qualifiers 'medium' or 'extent')

Publication Cataloging Guidelines

The overall goal is to provide enough DC metadata that we can automatically generate the dc.identifier.citation and/or an openURL query. We can use a lot of the information in the DC citation guidelines.
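
As an illustration of that goal, the sketch below builds an OpenURL 1.0 query string (key/encoded-value format for journal articles) from a few of the fields described in this section. The record layout and the function name are hypothetical, and which fields a resolver actually needs is an assumption; this is not the query Dryad generated.

```python
from urllib.parse import urlencode


def openurl_query(record):
    """Build an OpenURL 1.0 KEV query string for a journal article.

    `record` is a hypothetical dict keyed by simplified field names:
    "title" (dc.title), "series" (journal name), "date_issued"
    (dc.date.issued), and optionally "doi" (bare DOI, no URL prefix).
    """
    params = {
        "url_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
        "rft.atitle": record["title"],
        "rft.jtitle": record["series"],
        "rft.date": record["date_issued"],
    }
    if record.get("doi"):
        params["rft_id"] = "info:doi/" + record["doi"]
    # urlencode percent-encodes the values for use in a URL query string.
    return urlencode(params)
```

The resulting string would be appended to the base URL of whichever link resolver is in use, which is outside the scope of this sketch.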

authors (dc.contributor.author)

List the full names of authors. Do not just copy abbreviated names from a citation; try to find the full names.

Currently, authority control is manual. Author/contributor names will typically be formatted as "Lastname, Firstname" OR as "Lastname, A. B.", depending on the text received from the publisher. We will optimize for searches on lastname only, knowing that Firstname may often only be available as initials. We will store email addresses for disambiguation. Initially, we won't track email addresses in normal metadata, just letting DSpace track the submitter in the provenance.

Field Label: Author
Formal Definition: An entity primarily responsible for making the resource.
User Definition: Author(s) of the article.
Requirement: Required
Cardinality: Repeatable
Generation: Automatic
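
Both name forms described above ("Lastname, Firstname" and "Lastname, A. B.") place the last name before the first comma, which is what makes a last-name-only search workable. A minimal sketch, assuming names are stored in one of those two forms; the function names are hypothetical and this is not the repository's actual search code.

```python
def last_name(author):
    """Return the part of a stored author name before the first comma."""
    return author.split(",", 1)[0].strip()


def matches_last_name(author, query):
    """Case-insensitive comparison on the last name only."""
    return last_name(author).casefold() == query.strip().casefold()
```

Because only the last name is compared, "Doe, Jane" and "Doe, J. Q." both match a search for "doe"; telling such authors apart is left to manual authority control.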

(The original application profile used dc:creator instead of dc:contributor)

title (dc.title)

Title of the publication.

Field Label: Title
Formal Definition: A name given to the resource.
User Definition: Title of the article.
Requirement: Required
Cardinality: Non-Repeatable
Generation: A

(Please see 'series' for journal title.)

date of issue (dc.date.issued)

The official date of publication. Year is required. Include month and day if possible.

Field Label: Date of Issue
Formal Definition: Date of formal issuance (e.g., publication) of the resource.
User Definition: Date of publication.
Requirement: Required
Cardinality: Non-Repeatable
Generation: Automatic

publisher (dc.publisher)

The original publisher of the article. Note: This should be a publishing company, which is normally different than the journal name.

Field Label: Publisher
Formal Definition: An entity responsible for making the resource available.
User Definition: Journal publisher.
Requirement: Required
Cardinality: Repeatable
Generation: Automatic

citation (dc.identifier.citation)

(Listed as dcterms:bibliographicCitation in the application profile.)

A plain-text citation. Currently, copied from the publisher's site if available. Some attempt should be made to normalize case (don't include all caps). In the future, this may be automatically generated, to provide consistent formatting.

Field Label: Citation
Formal Definition: Details of the bibliographic item that contains the resource along with the position of the resource within it.
User Definition: The citation information for the journal article.
Requirement: Required
Cardinality: Repeatable
Generation: A

series/report no. (dc.relation.isPartOfSeries)

Series is the name of the journal. Report number is all other information that identifies this item's location within the series (volume, number, pages).

identifiers (dc.identifier.uri)

Select URI and enter the DOI of the publication in URL form, if available. Otherwise, use the most "permanent" URL available that represents the publication.

Field Label: Digital Object Identifier
Formal Definition: An unambiguous reference to the resource within a given context.
User Definition: The Digital Object Identifier of a journal article.
Requirement: Required
Cardinality: Non-Repeatable
Generation: Automatic
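
Entering a DOI "in URL form" can be sketched as a small normalization step. The function name is hypothetical, and the choice of https://doi.org/ as the resolver prefix is an assumption; the guideline does not name a resolver.

```python
# Prefixes that are stripped before the resolver prefix is added.
_KNOWN_PREFIXES = (
    "doi:",
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
)

RESOLVER = "https://doi.org/"  # assumed resolver prefix


def doi_to_url(doi):
    """Return the DOI in URL form, accepting bare, doi:-prefixed, or URL input."""
    doi = doi.strip()
    for prefix in _KNOWN_PREFIXES:
        if doi.lower().startswith(prefix):
            # Slice the original string so the DOI's own case is preserved.
            doi = doi[len(prefix):]
            break
    return RESOLVER + doi
```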

type (dc.type)

Choose "Article".

language (dc.language.iso)

Choose the most appropriate language.

subject keywords (dc.subject)

Initially, only explicitly-stated keywords will be cataloged as such. In the future, we hope to perform more automatic keyword extraction.

Note: Species/taxa names will be cataloged as such, and will not be replicated as keywords (though they will be searchable as keywords).

Field Label: Subject Keywords
Formal Definition: The topic of the resource.
User Definition: Article keywords.
Requirement: Required
Cardinality: Repeatable
Generation: A(M)

abstract (dc.description.abstract)

The abstract from the publication.

Field Label: Abstract
Formal Definition: An account of the resource.
User Definition: Article abstract.
Requirement: Required
Cardinality: Non-Repeatable
Generation: A

rights statement (dc.rights)

A short human-readable phrase describing the access rights, which may also be machine-readable. For example:

  • CreativeCommons license (CC-BY)
  • Public Domain
  • Copyright held by publisher

A blank entry indicates the copyright status is unknown.

Field Label: Rights Statement
Formal Definition: Information about rights held in and over the resource.
User Definition: Statement regarding rights held in and over the resource.
Requirement: Required
Cardinality: Repeatable
Generation: A(M)