Preservation working group 2013

= Working Group Charge =

To develop a state-of-the-art preservation plan for Dryad, with an emphasis on recommendations that are immediately useful and feasible to implement.

= Working Group Members =

Co-chairs


 * Sara Mannheimer, Dryad curator, SILS/UNC-CH & MRC
 * Jane Greenberg, professor, SILS/UNC-CH, director, MRC

Members


 * Alex Ball, Research Officer, UKOLN, University of Bath
 * Robin Dale, Director of Digital and Preservation Services, LYRASIS
 * Michael Day, Digital Curation Centre, UKOLN, University of Bath
 * Ruth Duerr, Principal Investigator/Project Manager, Data Management and Cyberinfrastructure Manager, Data Stewardship, National Snow & Ice Data Center
 * Cal Lee, Associate Professor, SILS/UNC-CH
 * Robin Rice, Data Librarian, EDINA and Data Library, University of Edinburgh
 * Ayoung Yoon, SILS doctoral student
 * Elena Feinstein, former Dryad curator, SILS/UNC-CH & MRC

= Working Group Meetings =

Kickoff meeting, 26 Feb 2013, 11:00am EST/16:00GMT

 * Connection information sent via email

Agenda


 * 1) Introductions (brief)
 * 2) Overview of working group goals
 ** Creation of Dryad policy statement on preservation (strategic, long term)
 ** Plan for implementation (priorities, immediate actions and long term goals)
 * 3) Quick, preservation-related facts about Dryad
 ** size of collection: more than 7000 files and growing every day, see Dryad homepage for current stats
 ** daily ingest of new content
 ** heterogeneous file types, some non-preferred for preservation (e.g., proprietary formats), see this text file for a snapshot of formats in Dryad on Feb 5 2013
 ** backups and checksums (Ryan will describe)
 ** currently no real preservation activities beyond the basics
 ** in process of joining CLOCKSS
 * 4) Questions from the group
 * 5) Communication preferences: wiki? email list?
 * 6) Action items for group members
 ** Send us one to three examples of model preservation policy documents from other organizations that are relevant to Dryad
 ** Send us one to three other resources, such as templates, best practices documents, or preferred format documentation
 ** Send us your impression of our most immediate preservation need: what should we start doing right now?

Meeting Notes

Introduction

Jane:


 * Data in Dryad are associated with published articles, usually in scientific & medical publications.
 * The data are heterogeneous.
 * Dryad has some best practices and applications, but they are minimal, and it does not yet have a preservation plan.
 * Need to come up with a preservation policy statement, e.g., what immediate actions we can take and what the long-term goals are.

Elena:


 * Dryad focuses on small, individual data and deals with heterogeneous file types, some of which are not well suited to preservation.
 * We do some quality control, but generally not much. We do backups and checksums.

Ryan:


 * The production server is at NCSU, which does regular maintenance. There is a mirror server at Duke, updated every 5 minutes. The secondary system has its own backup practices. Dryad uses DSpace, which automatically records and checks checksums.
 * Replication of content: content is in the process of being replicated.
 * There are many copies of the content, but it is not well preserved.
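
A checksum (fixity) audit of the kind Ryan describes can be sketched roughly as follows. This is only an illustration, not DSpace's own checker: the function names, the manifest structure (a mapping from file path to stored digest), and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib
from pathlib import Path


def file_checksum(path, algorithm="sha256", chunk_size=65536):
    """Return the hex digest of a file, reading it in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit_fixity(manifest, algorithm="sha256"):
    """Compare stored checksums with freshly computed ones.

    manifest is a hypothetical mapping of file path -> expected hex digest.
    Returns a mapping of path -> "ok", "changed", or "missing".
    """
    results = {}
    for path, expected in manifest.items():
        if not Path(path).is_file():
            results[str(path)] = "missing"
        elif file_checksum(path, algorithm) == expected:
            results[str(path)] = "ok"
        else:
            results[str(path)] = "changed"
    return results
```

Running the same audit against both the production copy and the mirror would show whether the copies still match what was recorded at ingest, which is the minimum needed before extra copies count as preservation.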

Ruth:


 * How much effort goes into documentation?

Elena:


 * Very little; the model for curation effort is really light. Because the files are so heterogeneous, we can't handle all the different file types.
 * Not doing as much quality control as would be desirable.
 * Hope to educate scientific community about what to do.

Ruth:


 * A public display of preservation readiness could be one idea: use a 1-5 star scale rating how understandable the data are.
 * Some Dryad data are well described, but others are not clear.
 * File format is the other issue; not all formats will survive.

Ryan:


 * Have not gotten to the problem of file formats yet. For now, we are trying to get the scientific community into the practice of depositing, and we need to strike a balance because scientists are hesitant about data archiving.

Ruth:


 * Attitudes will change quickly.

Jane:


 * Prioritizing file formats might be needed, e.g., by which ones are at greater risk.

Ruth:


 * If you know that a file format will have to be changed for preservation at some point, you can change it shortly after ingest.
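
The prioritization Jane and Ruth discuss could start from a simple inventory of formats ranked by risk. The sketch below is illustrative only: the risk tiers in ASSUMED_RISK are hypothetical placeholders, not an assessment made by the working group, and the function name is invented for the example.

```python
from collections import Counter
from pathlib import PurePath

# Hypothetical risk tiers for illustration only (higher means riskier).
# A real table would come from a preferred-formats policy.
ASSUMED_RISK = {".csv": 1, ".txt": 1, ".pdf": 2, ".xlsx": 2, ".xls": 3, ".doc": 3}
UNKNOWN_RISK = 3  # assumption: unrecognized formats are treated as high risk


def rank_formats(filenames, risk=ASSUMED_RISK, unknown=UNKNOWN_RISK):
    """Count files per extension and sort by risk, then by file count."""
    counts = Counter(PurePath(name).suffix.lower() for name in filenames)
    rows = [(ext, n, risk.get(ext, unknown)) for ext, n in counts.items()]
    # Highest risk first; within a tier, the most common format first.
    return sorted(rows, key=lambda row: (-row[2], -row[1], row[0]))
```

Applied to a snapshot of file names in the repository, the top rows would suggest which formats to look at first for conversion shortly after ingest.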

Robin D:


 * If you don't do it early on, you may not have enough information.
 * We don't want to make people reluctant to deposit, but we need to make sure the data will be understandable.

Jane:


 * Interested in examples from other data repositories: how do they convey this with diverse datasets?

Robin D:


 * ICPSR is one example.

Robin R:


 * We encourage depositors to submit definitions of their datasets.
 * Licensing issues
 * Documentation: we can eyeball datasets and see whether the documentation is usable and reasonable.

Ruth:


 * We try to examine data as it comes in.
 * We serve very broad audiences; different communities need data in different formats. We reformat data for those communities.
 * How much reusable data is Dryad getting?

Elena:


 * Difficult to track
 * We do have some data that have been reused for publication

Jane:


 * We know the download rate, but a download doesn't necessarily mean the data are used. We also don't know whether users contact the authors directly.

Robin R:


 * Any ambition to become a Trusted Digital Repository?

Elena:


 * No current plan for an external audit, but we want to get close to that in the future.

Robin R:


 * The Data Seal of Approval is another example.

Action items


 * Send one to three examples of model preservation policy documents from other organizations that are relevant to Dryad
 * Send one to three other resources, such as templates, best practices documents, or preferred format documentation
 * Send your impression of our most immediate preservation need: what should we start doing right now?

Schedules


 * Board (membership) meeting in May
 * In late spring, a draft policy document will be ready for your feedback.

= Materials: links, suggestions, feedback =

Model preservation policies
Ruth:


 * [[media:030617_DMP.pdf|NSIDC's Data Management Policies document]] - which is quite old (all the technical information in it is outdated but for the rest it is still accurate)
 * [[media:Daac_data_policy_V70.pdf|The Data Acceptance Plan for the NSIDC DAAC]] - our NASA funded project
 * [[media:DC_CollectionPolicy_Draft7.pdf|The draft collection policy for the Data Conservancy]] - since this time, the DC has gone to an instance by instance basis; but the base policy is still sound
 * She says: While I am not sure that I would call anything from anybody a "model" document, I do think all of these were appropriate to their goals and audiences and faithful to the idea of data curation.

Ayoung:


 * ICPSR's Data management plans
 * Odum Institute preservation policies: http://www.irss.unc.edu/odum/contentSubpage.jsp?nodeid=629

Sara:


 * Purdue University Research Repository (PURR) Preservation Policy, which also lists several digital preservation policies that informed PURR's policy.
 * ICPSR Digital Preservation Policy Framework
 * HathiTrust, although a different kind of repository, has a nice, concise policy.

Other resources (templates, best practices documentation, etc.)
Ruth:


 * [[media:Nsidc los V2.pdf|The NSIDC Levels of Service document]] - defining the various levels of service we provide.

Ayoung:


 * ICPSR: Elements of data management plans and Framework for creating data management plan
 * MIT's Writing NSF data management plans
 * Research data management plan templates from University of Melbourne

Sara:


 * MetaArchive Digital Preservation Policy Template
 * ASERL webinar handout

Suggestions of our most immediate preservation need (What should we start doing right now?)
Ruth:


 * I guess my statement of your most immediate need is the issue about metadata - coming up with ways to characterize what you've got, improve it, and check what is being deposited - all the things we discussed [in the meeting].

Additional comments
Hilmar (Assistant Director for Informatics at NESCent):


 * Asks if the Digital Preservation Network accomplishes what we are trying to do with CLOCKSS, though notes that it's not clear how eligible Dryad would be to participate as an independent not-for-profit.