Old:Repository Policies

From Dryad wiki
STATUS: This page is no longer being maintained and is of historical interest only.

These early policies have either been incorporated into the design and documentation of the repository or abandoned as stakeholder needs have changed. This outdated page may have historical value.


  1. All data deposited must be under an open-access license.
  2. All authors must agree to the license at submission. One author is the primary contact (corresponding author). If several datasets are used in an article, each dataset may have a different corresponding author. The corresponding author remains the primary contact for questions about keywords, for cases where someone has trouble understanding or using the data, and for requests to use the data for commercial purposes.
  3. For a synthetic study, it may not be possible to obtain licenses for all of the data. In this case, as much data as possible should be deposited in the repository, and there should be a "dataset" that contains detailed references to all datasets that were used (and how they were obtained).
  4. If a dataset is deposited with an embargo period, the metadata will be made available immediately, so the dataset can be cited.
  5. Identifiers represent the abstract content of a dataset, not any specific bit-level representation. The bit-level representations may change as old formats become invalid (though the originally submitted package will always be retained).
  6. All data files must have a file extension that corresponds with the file content. There are no other restrictions on filenames, but the submission form will encourage good naming practices. The curator reserves the right to change filenames as needed.
  7. By default, each data file is cataloged in its own record. When files are highly dependent on each other (e.g., an HTML file and associated images), they may be stored in a single record. When a large number of files is deposited from a single publication, an aggregation record may be created as a single storage location for metadata. We are _not_ trying to provide incredibly fine-grained access, and aggregations will make data submission easier for authors. However, we do not want to lump many files into bitstreams of a single metadata record, because this would make migration to new formats more difficult to manage, and the individual files would no longer be citable.
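
As an illustration of policy 4 above (metadata for an embargoed dataset is public immediately, so the dataset can be cited), here is a minimal sketch. The `DatasetRecord` class, its field names, and the `visible_parts` function are hypothetical and are not taken from the repository software.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class DatasetRecord:
    # Hypothetical record shape; field names are illustrative only.
    title: str
    embargo_until: Optional[date] = None  # None means no embargo


def visible_parts(record: DatasetRecord, today: date) -> set:
    """Return which parts of a record a public user may see on `today`."""
    # Metadata is always public, so the dataset stays citable during an embargo.
    parts = {"metadata"}
    # Files become public once the embargo has ended (or if there never was one).
    if record.embargo_until is None or today >= record.embargo_until:
        parts.add("files")
    return parts
```

Treating the embargo end date as inclusive is an assumption of this sketch; the policy does not say on which day the files become available.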

External Repositories

  1. We will always assign a handle to content from other repositories, simply so we can track our local version. However, we need to note (in a status field) that ours is not the authoritative copy of the item.


  1. The main Dryad repository will contain metadata records for articles, but the actual articles will remain in a non-public repository. We will maintain the non-public copies for automatic processing and preservation purposes.
  2. End-users will access publications directly from the journal repositories, not from Dryad.
  3. Every publication must have a DOI or other long-term identifier. If it does not, we will work with the journal to create one.
  4. The DOI for a publication will always be stored in URL form.
  5. The public repository contains the authoritative metadata record. Metadata in the private repository is retained for disambiguation, but may become out of date. The DOI, which is used for linking records in the public and private repository, is the only metadata field that is considered authoritative.
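
The rule above that a publication DOI is always stored in URL form could be enforced with a small normalizer, sketched below. The policy does not name a resolver, so the `https://doi.org/` prefix is an assumption, as are the function name `doi_to_url` and the loose syntax check. The sketch does not percent-encode unusual characters in the DOI suffix.

```python
import re

# Assumption: the policy does not say which resolver prefix to use.
RESOLVER = "https://doi.org/"

# Common ways a DOI arrives: bare, "doi:" prefixed, or already a resolver URL.
_PREFIXES = re.compile(r"^(?:https?://(?:dx\.)?doi\.org/|doi:\s*)", re.IGNORECASE)


def doi_to_url(raw: str) -> str:
    """Normalize a DOI given in any common form to a single URL form."""
    doi = _PREFIXES.sub("", raw.strip())
    # Loose sanity check: "10.", a registrant code, a slash, and a suffix.
    if not re.match(r"^10\.\d{4,9}/\S+$", doi):
        raise ValueError(f"not a DOI: {raw!r}")
    return RESOLVER + doi
```

Because the DOI is the field that links the public and private records, storing one canonical form means the two records can be matched by simple string comparison.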


This section only contains information under discussion. For full details, see the Cataloging Guidelines.

  1. All metadata fields marked as Required should be interpreted as "Required when Available (or Known)".

Bibliographic Records (publications)

  1. Actual publication files will be stored in the private repository. There is no need to sync handles with items in the public repository, because we will simply reference the DOIs to match records. For the same reason, we will not need to put detailed metadata on publications in the private repository.
  2. If a publication has a more permissive license than the Dryad Submission License, we will record that license in the public repository rights statement.

Dataset Records

  1. The citation for a dataset will use the "issued" date, which is initially populated from the date of the associated article, but may be changed by the author to reflect the date on which the dataset first became available.

Resolved questions

(These need to be integrated with both the cataloging process and the Cataloging Guidelines.)

  1. How do we indicate the corresponding author? Use DDI (version 2.1) for primary contact information: ddi.contact, which can be qualified with .email and so on for data that appears as attributes in the DDI spec.
  2. How to store the journal name? We need this to be separate so it can drive branding (logo display). Currently putting in series (dc.relation.isPartOfSeries). Answer: keep in isPartOfSeries, but instead of a name, store the ISSN, which is more useful.
  3. How to store the other information that the guidelines put in dcterms:bibliographicCitation? (volume, number, pages). Format as specified by OpenURL (KEV). However, we should keep an eye on the Bibliographic Ontology Specification.
  4. Identifier granularity and citations. If a paper references 100 gene sequences, and each has its own ID, do we really want people to cite all 100 in their pubs? Or should we encourage uploading a list of the IDs used, and cite this list? Answer: leave to curator's discretion.
  5. Use JHOVE to generate technical metadata? There is no immediate need, so this can wait until we identify a need.
  6. Store gene names in a controlled field, separately from scientific name? Store, but postpone controlled vocabulary. Gene names are messier than species names, so a keyword-like field is the best immediate solution. SARAH/HOLLIE. Sarah - I looked around and found one schema that could be used for "gene" - MicroArray and Gene Expression Markup Language (MAGE-ML). Please see: http://www.mged.org/Workgroups/MAGE/mage-ml.html
  7. When we acquire content from other repositories, how do we indicate this? Do we promote our identifiers over the old ones? We will always assign our own identifiers, though we will provide searchers with links to equivalent entries in other locations. For our partner repositories, we will suggest that users cite the other repository's identifier.
  8. Publications without a DOI (or other persistent identifier) will be referenced by our handle.
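
Resolved question 3 above settles on the OpenURL key/encoded-value (KEV) format for volume, number, and pages. A minimal sketch of what such a string might look like follows. The keys (`rft.volume`, `rft.issue`, `rft.spage`, `rft.epage`) and the journal format identifier come from the OpenURL journal metadata format; the function name `citation_to_kev` and the choice to map "number" to `rft.issue` and to split pages into start and end are assumptions of this example, not decisions recorded on this page.

```python
from urllib.parse import urlencode


def citation_to_kev(volume, issue, start_page, end_page):
    """Encode citation details as an OpenURL KEV string, skipping empty fields."""
    fields = [
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),  # journal metadata format
        ("rft.volume", volume),
        ("rft.issue", issue),
        ("rft.spage", start_page),
        ("rft.epage", end_page),
    ]
    # urlencode percent-encodes the values and joins the pairs with "&".
    return urlencode([(key, value) for key, value in fields if value])


print(citation_to_kev("12", "3", "45", "67"))
```

A string of this shape could be stored in dcterms:bibliographicCitation and parsed back with `urllib.parse.parse_qs` when the individual parts are needed.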

Open questions


  1. What to do about taxon names that change over time? We can't really control this unless we enforce a vocabulary. The Encyclopedia of Life may solve this problem for us.
  2. Can/should we add author names to the LC name authority file? This is something for a future grant, or at least for the metadata curator. JANE: Yes. Could investigate participating in LC/NACO. Fits in with data curator solution.
  3. When subject headings come from a vocabulary, how should we indicate this? Include a qualifier we define, or limit ourselves to the official DC qualifiers? This will be answered by the HIVE project. JANE/ABBEY: We can use our own qualifiers from other schemes as long as they are registered namespaces.