Old:DataCite Metadata

From Dryad wiki
 Status: This document is for historical purposes only. For current information about DOI registration, see DOI Services.


The canonical version of this document has now been moved offline. To include your comments, please contact Ryan directly.

Comments on the 1.1 version of the DataCite Metadata Kernel


Dryad uses DataCite to register DOIs for data. DataCite is in the process of specifying the metadata that will be associated with each DOI. This metadata will be widely used, both by tools that want to look up metadata for individual items and by other services that manage DOIs (e.g., CrossRef).

(Note: The documentation for this Metadata Kernel is located in the Dryad Dropbox at Dryad/dataCiteDOI/DataCite-MetadataKernel_v1.docx)

Overall, the Metadata Kernel is well designed and will meet Dryad's needs. There are few required elements, and they map well to metadata available in Dryad (with the exception of Element 6 Discipline, discussed below). The optional elements allow a great deal of detail and will be useful for capturing the most important metadata stored in Dryad. In particular, Element 13 RelatedIdentifier will be critical for expressing relationships between Dryad data packages, Dryad data files, and the journal articles associated with the data.
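To make the package/file/article relationships concrete, here is a minimal Python sketch of the RelatedIdentifier entries each record might carry. It is an illustration under assumptions: the `RelatedIdentifier` class and `describe_package` function are hypothetical, the DOIs are placeholders, the relation names (IsPartOf, IsCitedBy) follow the discussion on this page and may be spelled differently in the Kernel, and the right relationship for the article is still an open question (see the Element 13.2 comments).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelatedIdentifier:
    """One RelatedIdentifier entry (Element 13), simplified and hypothetical."""
    identifier: str       # identifier of the related resource
    identifier_type: str  # Element 13.1, e.g. "DOI"
    relation_type: str    # Element 13.2, relation from this record to the related one


def describe_package(package_doi: str, article_doi: str,
                     file_dois: list[str]) -> dict[str, list[RelatedIdentifier]]:
    """Return assumed RelatedIdentifier entries, keyed by the DOI of the record they belong to."""
    records: dict[str, list[RelatedIdentifier]] = {
        # Whether the article link should be IsCitedBy is still undecided.
        package_doi: [RelatedIdentifier(article_doi, "DOI", "IsCitedBy")],
    }
    for file_doi in file_dois:
        # Each data file points back to the package that contains it.
        records[file_doi] = [RelatedIdentifier(package_doi, "DOI", "IsPartOf")]
    return records


if __name__ == "__main__":
    links = describe_package(
        "10.xxxx/example.package",  # placeholder DOIs, not real identifiers
        "10.xxxx/example.article",
        ["10.xxxx/example.package/1", "10.xxxx/example.package/2"],
    )
    for doi, entries in links.items():
        for entry in entries:
            print(doi, entry.relation_type, entry.identifier)
```

Recording only the file-to-package direction sidesteps the question of whether the Kernel defines a converse relation (compare the "Aggregates" question under Element 13.2).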

Nicely articulated introduction. In Section 1.2, consider integrating the phrase “best practice” into this text and in connection with the optional elements. I understand that the Metadata WG may want to hold off on the phrase “best practice” until further implementation and evaluation, but the work is definitely headed in this direction, and the mandatory elements together with the relevant optional elements (chosen case by case) generally represent a best practice.--Janeg@ils.unc.edu 12:10, 12 September 2010 (EDT)

General comment: Has there been any discussion of the word “element” vs. “property”? The Dublin Core and linked data communities have been shifting from “element” to “property” in their documentation over the last 5 years, and the Dryad application profile uses the word “property”.--Janeg@ils.unc.edu 12:10, 12 September 2010 (EDT)

Specific Comments

Page 6:

  • In the proposed citation style, why is the DOI repeated with two different representations?
  • Section 2.2, Citation: Consider adding a recommendation on spacing between elements. This is not just a holdover from ISBD but a key aspect of citation practice. In most cases it will not affect parsing and interoperability, but it is still worth considering. --Janeg@ils.unc.edu 12:14, 12 September 2010 (EDT)

Required fields:

  • Element 2 Creator: It would be very useful for DataCite to specify more structure (e.g., whether the given name or the surname comes first, and keeping email addresses separate from personal names). This would head off many issues when combining metadata from separate sources. (Matt Jones also noted this.)
  • Element 2 Creator: Alternatively, one could specify name format, and allow one of the formats to be an ORCID-style ID.
  • Element 2: Section 2.3, Creator: The definition is not consistent with Dublin Core, and the phrase “working in the data” is somewhat ambiguous. [from jane: Brings up visions of scientists in a bath of data!] Generally, creator refers to the individuals (or organizations) chiefly responsible for the intellectual creation. For data, this is the actual gathering, production, or creation of the data object. --Janeg@ils.unc.edu 12:14, 12 September 2010 (EDT)
  • Element 4 Publisher: It would be helpful to control publisher names to avoid different forms of the same publisher being used for different records. A registry could include acceptable abbreviations and synonyms.
  • Element 5 Date: Should the citation really include the full timestamp, or just the year? [--Tjvision 17:05, 11 September 2010 (EDT)]
  • Element 6 Discipline: How will this information be used by other systems? Is it important enough to be required? We don't currently have a way to reliably assign DDCs, though we do have keywords with varying degrees of control (free text, MeSH terms, etc.), so our dc:subject will not map to DataCite Discipline. Also, some of the values most relevant to our content overlap (e.g., 570 Life sciences, biology; 580 Plants (Botany); 560 Fossils & prehistoric life). We might be forced simply to use the most general category for all data, in which case it would be more efficient to assign it to the publisher.
  • Element 6: Discipline. Consider being explicit here about the recommended best practice. Is the preferred practice to include both the DDC class notation (the 3 digits) and the term (the controlled term)? The phrase “controlled vocabulary” may confuse some, particularly with DDC, which is a classification system. Some may use just the term and not the class notation, or vice versa. Perhaps list acceptable secondary systems, with this element being repeatable. --Janeg@ils.unc.edu 12:14, 12 September 2010 (EDT)
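The extra structure requested in the Element 2 comments might look like the following minimal Python sketch. It is an assumption, not anything DataCite has specified: the `Creator` class, its field names, and `same_creator` are hypothetical. Separate family and given name fields fix the display order, keeping email out of the name stops contact details leaking into citations, and an optional ORCID-style identifier gives a more reliable key when merging records from separate sources.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Creator:
    """Hypothetical structured Creator (Element 2); not the Kernel's own definition."""
    family_name: str                       # surname, or the full name of an organization
    given_name: Optional[str] = None       # omitted for organizations
    name_identifier: Optional[str] = None  # e.g. an ORCID-style ID
    email: Optional[str] = None            # stored separately, never mixed into the name

    def citation_name(self) -> str:
        """Always render 'Family, Given', whatever order the source used."""
        if self.given_name:
            return f"{self.family_name}, {self.given_name}"
        return self.family_name


def same_creator(a: Creator, b: Creator) -> bool:
    """Match on identifiers when both records have one; otherwise compare names case-insensitively."""
    if a.name_identifier and b.name_identifier:
        return a.name_identifier == b.name_identifier
    return a.citation_name().casefold() == b.citation_name().casefold()


if __name__ == "__main__":
    # Placeholder person; the email address is illustrative only.
    a = Creator("Doe", "Jane", email="jane.doe@example.org")
    b = Creator("doe", "jane")
    print(a.citation_name(), same_creator(a, b))
```

Falling back to name comparison is deliberately crude; it shows why an identifier (or at least a fixed name order) makes merging far less error-prone.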

Optional fields:

  • Element 7 PublicationPlace: This doesn't have much meaning for digital resources that are widely replicated. We could enter the physical location of our primary copy, but I cannot imagine how someone would use this information (particularly since the values are uncontrolled).
  • Element 9.1 dateType: I don't understand the values "EndPublicationDate" and "StartPublicationDate". What do these mean?
  • Element 9.1 dateType: Is the dateType "available" expressing the same thing as the required PublicationDate? If not, what is the difference?
  • Element 12 ResourceIdentifier: What is the difference between specifying a ResourceIdentifier vs specifying a RelatedIdentifier with type isAlsoPublishedAs? These seem to be expressing the same relationship.
  • Element 13.1: Include LSID.
  • Element 13.2 relationType: These are great, and will be very useful for expressing our required relationships. There seems to be some overlap between isCompiledBy and isPartOf. Can these be combined? If not, I suggest isCompiledBy be changed to isAggregatedBy for consistency with OAI-ORE. [Is the converse relationship missing: Aggregates? --Tjvision 17:05, 11 September 2010 (EDT)]
  • Element 13.2: Within Dryad, we will need to figure out best practice for describing the relationship to the article that describes the initial collection of the data. Is it an instance of IsCitedBy, or should it have a different relationship type?
  • Element 15 Format: Is it possible to allow any valid mime type, rather than just this list? There are many useful types missing from the current list (e.g., application/vnd.openxmlformats-officedocument.spreadsheetml.sheet)
  • Element 15: It would be useful to distinguish the format of the resource from that of the landing page (and to know whether a landing page is in fact present). It would be even better to require enough information to support content negotiation, in which case Size (Element 14) might be ambiguous.
  • Element 17 Rights: The description indicates embargo information should be stored here. How is embargo expressed using the controlled values?
  • Element 17 Rights: We would like to add Creative Commons Zero to the listed license types, if that is different from CC-PD.
  • Element 18.1: I do not understand the meaning of 'DataSetSoftware'.
  • General comments: The tables on pp. 7-15 are concise, informative, and user-friendly. --Janeg@ils.unc.edu 12:14, 12 September 2010 (EDT)
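The Element 15 suggestion (accept any valid MIME type rather than a fixed list) could be enforced with a syntax check instead of a membership test. The Python sketch below is one way to do that, under the assumption that "valid" means syntactically well-formed per the RFC 6838 restricted-name rules; it does not check the IANA registry, and `is_valid_format` is a hypothetical helper name.

```python
import re

# Media type name syntax per RFC 6838, section 4.2 ("restricted-name"):
# a letter or digit, then up to 126 letters, digits, or ! # $ & - ^ _ . +
_RESTRICTED_NAME = r"[A-Za-z0-9][A-Za-z0-9!#$&^_.+-]{0,126}"
_MEDIA_TYPE = re.compile(rf"{_RESTRICTED_NAME}/{_RESTRICTED_NAME}")


def is_valid_format(value: str) -> bool:
    """Accept any syntactically valid type/subtype, ignoring parameters such as charset."""
    essence = value.split(";", 1)[0].strip()
    return _MEDIA_TYPE.fullmatch(essence) is not None


if __name__ == "__main__":
    for fmt in (
        "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
        "text/csv; charset=utf-8",
        "Excel spreadsheet",
    ):
        print(fmt, is_valid_format(fmt))
```

A syntax check admits new vendor types (like the spreadsheet type above) without anyone updating a list; a stricter variant could additionally compare the top-level type against the registered ones.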