Citing Data

From Dryad wiki
Revision as of 08:08, 14 July 2009 by (talk | contribs) (Comments on the proposed format)

Proposed format for Dryad

Suggested wording for answering the question "How should data in Dryad be cited?"

Revised 7/13/09:

When using a data package archived in Dryad, always cite the original paper associated with the data, using the normal format of the journal you are writing for.

Additionally, please cite the data package as follows:

Sidlauskas, B. (2007) Data from "Testing for unequal rates of morphological diversification in the absence of a detailed phylogeny: a case study from characiform fishes." Evolution 61:299–316. Dryad Digital Repository,

Furthermore, it may sometimes be useful to cite a specific data file directly, particularly if a data file is used that needs to be distinguished from the rest of the data package. When a specific data file is cited, please use a format such as the following, using the handle from the specific data file:

Sidlauskas, B. (2007) Landmark consensuses. Data file from "Testing for unequal rates of morphological diversification in the absence of a detailed phylogeny: a case study from characiform fishes." Evolution 61:299–316. Dryad Digital Repository,
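
For anyone generating the proposed Data Citation text automatically, the two formats above could be assembled roughly as follows. This is only a sketch of the wording proposed on this page, not Dryad's implementation: the `Article` class, the `cite_data` function, and the `<handle>` placeholder are all made up for illustration, and the real handle of the package or file would be passed in its place.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Article:
    """The published paper a data package belongs to (hypothetical structure)."""
    title: str
    journal: str
    volume: str
    pages: str


def cite_data(author: str, year: int, article: Article, handle: str,
              file_title: Optional[str] = None) -> str:
    """Build a data package citation, or a data file citation if file_title is given."""
    # The examples above put the closing period inside the quotation marks.
    title = article.title if article.title.endswith(".") else article.title + "."
    if file_title is None:
        source = f'Data from "{title}"'
    else:
        source = f'{file_title}. Data file from "{title}"'
    return (f"{author} ({year}) {source} {article.journal} "
            f"{article.volume}:{article.pages}. Dryad Digital Repository, {handle}")


paper = Article(
    title=("Testing for unequal rates of morphological diversification in the "
           "absence of a detailed phylogeny: a case study from characiform fishes"),
    journal="Evolution",
    volume="61",
    pages="299–316",
)

# "<handle>" is a placeholder, not a real identifier.
print(cite_data("Sidlauskas, B.", 2007, paper, "<handle>"))
print(cite_data("Sidlauskas, B.", 2007, paper, "<handle>",
                file_title="Landmark consensuses"))
```

Keeping the file title as an optional argument means the package and file citations share one template, so the two forms cannot drift apart if the wording is revised again.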

Comments on the proposed format

PS notes:

1. I put the quotes around the article title because I think that's what I heard us propose, but I don't really like the way it looks, although it does offset the article title from the rest of the citation. Are there other alternatives?

2. It's my understanding that we agreed:

  • data packages in Dryad are equivalent to the corresponding published article
  • data files can be cited individually by using the file name
  • authors should be encouraged to use meaningful and unique file names; if they upload files with non-unique file names the system will append a numerical identifier so that they are unique within the data package
    • we will continue to automatically populate the data "Title" field with the file name, with the option for the depositor to change it, but if they don't we will leave the default as is - we will not establish a standard data title naming convention for Dryad
  • the display of a record in Dryad should change as follows to model & encourage the desired citation behavior:
    • prefacing the publication title at the top of the page with Data from
    • changing the label for the article citation from Full Citation to Publication
    • adding a new field, labeled Data Citation with the appropriate citation for the item (data package or data file)
  • different versions of a data file will get different identifiers
  • we don't recommend that citations include an Access Date because data handles are persistent and unique, making Access Date information unnecessary
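
To make the file name point above concrete, here is one way the "append a numerical identifier" rule might work. The notes do not say what the suffix looks like or where it goes, so the `_2`, `_3` pattern placed before the extension, and the `uniquify` function itself, are assumptions for illustration only.

```python
from typing import Iterable, List


def uniquify(names: Iterable[str]) -> List[str]:
    """Return the names in order, adding _2, _3, ... to repeats so all are unique.

    The suffix format is an assumption; the notes only say that a numerical
    identifier is appended.
    """
    used = set()
    result = []
    for name in names:
        stem, dot, ext = name.rpartition(".")
        if not stem:
            # No extension (or a leading-dot name): suffix goes at the end.
            stem, dot, ext = name, "", ""
        candidate, n = name, 1
        # Keep counting until the candidate clashes with nothing seen so far,
        # including names that already look like "file_2.csv".
        while candidate in used:
            n += 1
            candidate = f"{stem}_{n}{dot}{ext}"
        used.add(candidate)
        result.append(candidate)
    return result


print(uniquify(["data.csv", "data.csv", "notes", "data.csv", "notes"]))
```

Uniqueness is only checked within the list passed in, which matches the requirement that names be unique within a single data package rather than across the repository.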

Mike Whitlock's Notes: Include the return after the first sentence to set it aside. It is the most important point of the whole text. Don’t include the DOI for the paper. Normal paper citations do not include the DOI, and it is confusing here whether it might allude to the paper or the data.

Ryan's thoughts:

  • The text indicates that this is a method for citing a data file, but the sample handle links to a data package, not a data file.
  • Should we provide examples for both packages and files?
  • Shouldn't the filename/title be included somewhere? If I want to cite two files from a package, are the handles the only way to distinguish the citations?

More comments (7/2) from Mike:

Just to clarify, when I drafted the citation policy, the main intent was to express that the paper be cited primarily. That is the current (but still unvoted) position of the executive.

For the way to cite the package or file itself, I had just copied something from the dryad website, and then tweaked it hurriedly after an e-mail from Peggy with some good ideas. So I don't have strong feelings about the ways to cite data files or packages.

I agree with Ryan that a package and a file should have different ways to cite.

I'm starting to think that the citation for the data shouldn't be so tied to the paper. After all, the paper ought to be cited anyway.

How about, for data packages:

Sidlauskas, B. 2007. Testing for Unequal Rates of Morphological Diversification in the Absence of a Detailed Phylogeny: A Case Study From Characiform Fishes. Dryad Digital Repository.

And for data files:

Sidlauskas, B. 2007. Landmark Consensuses. Dryad Digital Repository.

Scholarly articles

  • Peter Buneman's thoughts on making identifiers citable.
  • A Proposed Standard for the Scholarly Citation of Quantitative Data by Micah Altman and Gary King.
    • Summary of article: Citations to numerical data should include, at a minimum, six required components. The first three are traditional, directly paralleling citations of print documents: the author(s) of the data set, the date the data set was published or otherwise made public, and the data set title. The other three are designed to take advantage of the digital form of quantitative data: a unique global identifier, a universal numeric fingerprint, and a bridge service.
    • Sample citation based on minimum recommended components:
    • Useful comments on versioning: "We recommend versions of the same data set be given new identifiers and treated as separate data sets, with links back to the prior version kept in the metadata describing that data set. Forward links to new versions from the original are easily accomplished via a metadata search on the unique global identifier. New versions of very large data sets (relative to available storage capacity) can be kept by creating a new object that contains only differences from the original, and describing how to combine the differences with the original on the object's metadata description page. Version changes should be reflected by a change in the date, and may also be noted in the title, or by using the extended citation elements."
  • We Need Publishing Standards for Datasets and Data Tables - White paper from OECD publishing.
    • Advocates a slightly more verbose citation standard than Altman & King (includes a comparison table for the two standards).
    • In the new system being built by OECD, "All the DOIs for the datasets and tables will be deposited with CrossRef, ready for other publishers to use."
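
The versioning recommendation quoted from Altman & King above (each version gets a new identifier, with a back link to the prior version in its metadata, and forward links found by searching on the identifier) can be sketched as below. The `DatasetRecord` fields, the function names, and the `example:` identifiers are hypothetical and are not taken from the article or from Dryad.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class DatasetRecord:
    identifier: str                         # unique global identifier (made-up values below)
    title: str
    date: str
    previous_version: Optional[str] = None  # back link kept in the metadata


def newer_versions(catalog: Dict[str, DatasetRecord], identifier: str) -> List[str]:
    """Find forward links by searching the metadata for back links to identifier."""
    return [r.identifier for r in catalog.values()
            if r.previous_version == identifier]


def version_chain(catalog: Dict[str, DatasetRecord], identifier: str) -> List[str]:
    """Follow back links from a record to its earliest known version, newest first."""
    chain: List[str] = []
    seen = set()
    current: Optional[str] = identifier
    while current is not None and current not in seen:  # guard against cycles
        seen.add(current)
        chain.append(current)
        record = catalog.get(current)
        current = record.previous_version if record else None
    return chain


records = [
    DatasetRecord("example:1", "Landmark consensuses", "2007"),
    DatasetRecord("example:2", "Landmark consensuses", "2008", previous_version="example:1"),
    DatasetRecord("example:3", "Landmark consensuses", "2009", previous_version="example:2"),
]
catalog = {r.identifier: r for r in records}

print(newer_versions(catalog, "example:1"))
print(version_chain(catalog, "example:3"))
```

Only the back link is stored, so an existing record never has to be edited when a new version is deposited. That is consistent with the note on this page that different versions of a data file get different identifiers.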

Guides from other initiatives and institutions

  1. Interuniversity Consortium for Political and Social Research (ICPSR) - Citing Electronic Data Files
  2. National Center for Health Statistics (NCHS) - How to Cite Electronic Media
  3. MIT Libraries - Social Science Data Services
  4. Socioeconomic Data and Applications Center (SEDAC) Guide for Citing Data, Applications and Web Resources
  5. STD-DOI project from the German Science Foundation
    • Citation of Data
        • Nozawa, Toru (2004): IPCC-DDC_CCSRNIES_SRES_B2: 211 YEARS MONTHLY MEANS, National Institute for Environmental Studies and Center for Climate System Research Japan, WDCC. doi:10.1594/WDCC/CCSRNIES_SRES_B2
        • Kamm,H; Machon, L; Donner, S (2004): Gas Chromatography (KTB Field Lab), GFZ Potsdam. doi:10.1594/GFZ/ICDP/KTB/ktb-geoch-gaschr-p
        • Stein, R.; Fahl, K. (2003): Distribution of grain size and clay minerals in surface sediments of the Kara Sea, PANGAEA, doi:10.1594/PANGAEA.119754.

Other Repository Standards

  • ORNL DAAC users cite both the paper and the dataset.
  • Pangaea:
    • From the About page: Each dataset can be identified, shared, published and cited by using a Digital Object Identifier (DOI). Data are archived as supplements to publications or as citable data collections. Citations are available through the catalog of the German National Library of Science and Technology (TIBORDER).
    • There is a statement at the top of most pages: Always quote citation when using data! When looking at a record, the following is listed:
      • Citation: Barker, Peter F; Kennett, James P; Shipboard Scientific Party (2005): Core section summary of Hole 113-690C, doi:10.1594/PANGAEA.253771
      • Reference(s): Barker, Peter F; Kennett, James P; et al (1988): Proceedings of the Ocean Drilling Program, Initial Reports, College Station, Texas (Ocean Drilling Program), 113, 785 pp
        ODP/TAMU (2005): Janus Database (data copied from JANUS to PANGAEA February to June 2005), Ocean Drilling Program, Texas A&M University, College Station TX 77845-9547, USA,
    • Example: Stein, R.; Fahl, K. (2003): Distribution of grain size and clay minerals in surface sediments of the Kara Sea, PANGAEA, doi:10.1594/PANGAEA.119754.
  • Encyclopedia of Life citation recommendations
    • Listed example: Hancock, John. 2009. "Xysticus posti: Diagnostic description." Edited by David Shorthouse. In The Nearctic Spider Database. Accessed 15 January 2009, available from Encyclopedia of Life,
  • Treebase?
  • Fishbase?
  • NOAA Paleoclimatology Program - Data Citation
    • Example:
      • General form for citing published World Data Center for Paleoclimatology Data: Anderson, D.W., W.L. Prell, and N.J. Barratt. 1989. Estimates of sea surface temperature in the Coral Sea at the last glacial maximum. Paleoceanography 4(6):615-627. Data archived at the World Data Center for Paleoclimatology, Boulder, Colorado, USA.
    • Related: Notices when using National Climate Data Center data:
      • "Please acknowledge contributors and where appropriate, data cooperatives (e.g. International Tree-Ring Data Bank), when using these data."
      • Also offered by NCDC:
        • SUGGESTED DATA CITATION: Shen, G.T. and E.A. Boyle. 2004.
          Lead in Corals Data.
          IGBP PAGES/World Data Center for Paleoclimatology
          Data Contribution Series #2004-096.
          NOAA/NGDC Paleoclimatology Program, Boulder CO, USA.
        • ORIGINAL REFERENCE: Shen, G.T. and E.A. Boyle. 1987.
          Lead in corals: reconstruction of historical industrial fluxes to the surface ocean.
          Earth and Planetary Science Letters 82: 289-304.
  • U.S. Geological Survey's Earth Resources Observation and Science (EROS) Center/NASA's Land Processes Distributed Active Archive Center (LP DAAC)
    • Only request acknowledgement, i.e., "Data available from the U.S. Geological Survey" or "These data are distributed by the Land Processes Distributed Active Archive Center (LP DAAC), located at USGS/EROS, Sioux Falls, SD."
    • SIMILARLY: the Goddard Earth Science Data and Information Services Center requires a statement like: "The data used in this study were acquired as part of the activities of the NASA Earth-Sun System Division, and are archived and distributed by the Goddard Earth Sciences (GES) Data and Information Services Center (DISC)"
    • SIMILARLY: PO.DAAC - Crediting PO.DAAC Data Products, Images and Services
      • "Please provide acknowledgement of the use of PO.DAAC data products, images, and services in publications or presentations."