Old:Best practices for data archiving

From Dryad wiki

STATUS: This page is no longer maintained and is of historical interest only.

Guidance for authors on what to publish and how.

Dryad data archiving: what to deposit & recommendations for authors

[orig. from Mike Whitlock 5/25/09; updates by PS & SWC 6/09; & PS 8/09, 9/29/09]

    • see also Depositing Data to Dryad
      • Dryad strives to collect data supporting articles appearing in its partner journals.
      • All data files must be associated with a publication.
      • You should submit sufficient data that another researcher would be able to evaluate the findings described in the publication.
      • Data should be archived at the stage where it is ready for statistical analysis.
      • Archived data files may include, but are not limited to:
        • spreadsheets or other tables
        • images or maps
        • alignments
        • character matrices
      • You may also submit data that was originally collected for another publication, as long as that data is referenced by the current publication.
    • By submitting data to Dryad, you agree to release the data under Creative Commons Zero (CC0), which waives copyright and related rights so that others can use the data for any purpose. CC0 does not legally require attribution, although scholarly norms still call for users to cite you.
    • For more information, see the DRYAD COLLECTION POLICY
    • Please provide a ReadMe file for each data package you deposit so that the data can be correctly interpreted. Multiple ReadMe files may be submitted if it is necessary to document each data file separately.
    1. Write your ReadMe file in a plain text file
    2. Introductory information to include:
      • for each filename, a short description of what data it includes
      • who collected the data and whom to contact with questions
    3. Describe:
      • column headings for any tabular data
      • the units of measurement used
      • what symbols are used to record missing data
      • any specialized formats or abbreviations used
      • any additional related data collected that was not included in the current data package
    4. Use standardized formats and follow scientific conventions in both your ReadMe file and in your data files:
      • Dates and times: Use the ISO 8601 format (YYYYMMDD).
        • EXAMPLES: 20090824 is 24 August 2009. Times can be appended as YYYYMMDDThhmmss; 3 seconds after 1:05 pm on March 18, 2002 is 20020318T130503. Punctuation can be added to improve readability: 2009-08-24 or 2002-03-18T13:05:03.
      • Taxonomic names: Dryad provides access to multiple standard taxonomic sources [give link to list here?]
        • EXAMPLE: Torellia vallonia (the scientific name for the acorn hairysnail); example from ITIS (Integrated Taxonomic Information System) (http://www.itis.gov/servlet/SingleRpt/SingleRpt?search_topic=TSN&search_value=72592)
      • Geospatial references: Use (list resources & tools here)
        • EXAMPLE: Yardley Village (inhabited place); example from TGN (Getty Thesaurus of Geographic Names)
      • Geologic time spans
        • EXAMPLE: Middle Paleolithic, ?
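
The date and time formats above can be produced programmatically rather than typed by hand. The following is a minimal sketch in Python using only the standard library; the variable names are illustrative and are not part of any Dryad tooling.

```python
from datetime import datetime

# Illustrative timestamp: 13:05:03 on 18 March 2002 (24-hour clock).
observed = datetime(2002, 3, 18, 13, 5, 3)

# Compact form: YYYYMMDDThhmmss
basic = observed.strftime("%Y%m%dT%H%M%S")

# Punctuated form: YYYY-MM-DDThh:mm:ss
extended = observed.isoformat()

# Reading a compact date back in, then writing it with punctuation.
parsed = datetime.strptime("20090824", "%Y%m%d").date()

print(basic)               # 20020318T130503
print(extended)            # 2002-03-18T13:05:03
print(parsed.isoformat())  # 2009-08-24
```

Generating dates this way keeps every value in a file in the same format, which is harder to guarantee when dates are entered manually in a spreadsheet.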

Suggestions for Data Management

  1. Start preparing your data for archiving when it is collected and analyzed.
  2. Use non-proprietary file formats wherever possible; they are more likely to be readable in the future. For example, a plain text file has a longer life than a proprietary word processing format, and a file of comma- or tab-delimited values has a longer life than a proprietary spreadsheet file format. Consider using *text only* (ASCII) within spreadsheets and other documents; color, images, and other embedded objects are difficult to migrate.
  3. Use descriptive file names that reflect the contents of each file. Include enough information to uniquely identify each data file and distinguish it from the others. File names should be included in the ReadMe documentation.
  4. For spreadsheets, explain all column or row codes in the file and/or ReadMe document, and indicate what units are used.
  5. Be as explicit (and consistent) as possible about references to external resources (such as database identifiers, or controlled vocabularies).
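
As one possible way to apply the suggestions above, the sketch below writes a small table as comma-delimited plain text and builds a matching plain-text ReadMe that lists the file name, a contact, the missing-data symbol, and each column with its units. Everything in it is hypothetical: the column names, the records, the file name, the contact, and the choice of "NA" as the missing-data symbol are assumptions made for illustration, not Dryad requirements.

```python
import csv
import io

MISSING = "NA"  # assumed symbol for missing data; document whatever you use

# Hypothetical column documentation: (heading, units, description)
columns = [
    ("site_id", "none", "identifier of the sampling site"),
    ("date", "YYYY-MM-DD", "collection date"),
    ("mass_g", "grams", "body mass of the specimen"),
]

# Hypothetical records; None marks a value that was not recorded.
records = [
    ("A1", "2009-08-24", 12.5),
    ("A2", "2009-08-25", None),
]

def to_csv(columns, records, missing=MISSING):
    """Return the records as comma-delimited plain text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([name for name, _, _ in columns])
    for row in records:
        writer.writerow([missing if value is None else value for value in row])
    return buffer.getvalue()

def to_readme(filename, columns, contact, missing=MISSING):
    """Return a plain-text ReadMe describing one data file."""
    lines = [
        f"File: {filename}",
        f"Contact: {contact}",
        f"Missing data symbol: {missing}",
        "Columns:",
    ]
    for name, units, description in columns:
        lines.append(f"  {name} ({units}): {description}")
    return "\n".join(lines) + "\n"

csv_text = to_csv(columns, records)
readme_text = to_readme("site_masses_2009.csv", columns, "hypothetical contact name")
print(csv_text)
print(readme_text)
```

The sketch keeps both outputs as strings so it can be run without touching the file system; in practice each string would be written to its own plain-text file. Deriving the header row and the ReadMe from the same `columns` list is a design choice that keeps the documentation from drifting out of step with the data.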

Additional detail & examples of best practices for data management can be found here:

  • Some Simple Guidelines for Effective Data Management, Borer ET, Seabloom EW, Jones MB, Schildhauer M (2009). Bulletin of the Ecological Society of America 90(2), 205-214. doi:10.1890/0012-9623-90.2.205.
  • Managing and Sharing Data: a best practice guide for researchers. (2009) UK Data Centre.

A list of preferred file formats can be found here.