Metacat OAI Provider

Status: Complete

UNM created an OAI-PMH provider as an add-on to Metacat. The core Metacat code is available from the Ecoinformatics SVN, and this work integrates directly into the Metacat development tree.

Project Final Report
The [[Media:Metacat-OAI-PMH-Project-Plan.pdf|project final report]] provides background and details on the implementation of the provider.

Data Provider

 * http://metacat.lternet.edu/knb/dataProvider?verb=GetRecord&metadataPrefix=eml-2.1.0&identifier=urn:lsid:knb.ecoinformatics.org:knb-lter-sgs:12
 * http://metacat.lternet.edu/knb/dataProvider?verb=GetRecord&metadataPrefix=oai_dc&identifier=urn:lsid:knb.ecoinformatics.org:knb-lter-gce:26
 * http://metacat.lternet.edu/knb/dataProvider?verb=Identify
 * http://metacat.lternet.edu/knb/dataProvider?verb=ListIdentifiers&metadataPrefix=eml-2.1.0&from=2001-01-01&until=2010-01-01
 * http://metacat.lternet.edu/knb/dataProvider?verb=ListIdentifiers&metadataPrefix=oai_dc&from=2001-01-01&until=2010-01-01
 * http://metacat.lternet.edu/knb/dataProvider?verb=ListMetadataFormats
 * http://metacat.lternet.edu/knb/dataProvider?verb=ListRecords&metadataPrefix=eml-2.1.0
 * http://metacat.lternet.edu/knb/dataProvider?verb=ListRecords&metadataPrefix=oai_dc
 * http://metacat.lternet.edu/knb/dataProvider?verb=ListSets
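The request URLs above all follow the same pattern: the provider base URL, a verb, and optional OAI-PMH arguments. A minimal sketch of building them in Python is shown here; the function name oai_request_url is hypothetical, and the sketch only constructs URLs, it does not contact the server.

```python
from urllib.parse import urlencode

# Base URL taken from the examples listed above.
BASE_URL = "http://metacat.lternet.edu/knb/dataProvider"


def oai_request_url(verb, params=None, base_url=BASE_URL):
    """Build an OAI-PMH request URL for a verb and optional arguments.

    params is a dict because "from" is a reserved word in Python and
    cannot be passed as a keyword argument.
    """
    query = {"verb": verb}
    # Skip arguments left as None so optional ones can be omitted.
    query.update({k: v for k, v in (params or {}).items() if v is not None})
    # safe=":" leaves the colons in LSIDs unescaped, as in the URLs above.
    return base_url + "?" + urlencode(query, safe=":")


if __name__ == "__main__":
    print(oai_request_url("Identify"))
    print(oai_request_url("ListIdentifiers", {
        "metadataPrefix": "eml-2.1.0",
        "from": "2001-01-01",
        "until": "2010-01-01",
    }))
```

Keeping the arguments in a plain dict preserves their order, so the generated URLs match the hand-written ones character for character.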

Harvester
(as of 2011-10-12, not functional)

Requirements

 * Our primary goal is to enhance access, not to provide a failover copy. It is OK for Dryad to store only the Dryad-format metadata and not the full EML.
 * Metacat exposes all metadata through an OAI-PMH provider.
 * The provider makes all data available in these formats:
   * Simple DC
   * Dryad application profile (qualified DC with extensions)
   * EML
 * Dryad harvests all metadata exposed by Metacat.
 * The record in Dryad must point to the record in Metacat, enabling users to find the actual datasets.
 * Metacat harvests metadata from the Dryad OAI-PMH provider, and makes it available alongside the native Metacat metadata.
 * The design of the OAI-PMH "adapter" should provide a more generalized interface that supports cross-walking other metadata standards to/from Metacat beyond just EML.
 * Not an actual requirement, but keep in mind that we need to eventually download the data files.
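One way to picture the "more generalized interface" requirement is a registry of crosswalk functions keyed by source and target format. The sketch below is only an illustration of that idea, not the design used in the provider: the names register, convert, and eml_to_simple_dc are hypothetical, and records are simplified to flat dictionaries rather than real EML or DC XML.

```python
from typing import Callable, Dict, Tuple

# A crosswalk maps a record in one format to a record in another.
Crosswalk = Callable[[dict], dict]

# Registry keyed by (source format, target format). Hypothetical design.
_REGISTRY: Dict[Tuple[str, str], Crosswalk] = {}


def register(source: str, target: str):
    """Decorator that registers a crosswalk for a format pair."""
    def decorator(fn: Crosswalk) -> Crosswalk:
        _REGISTRY[(source, target)] = fn
        return fn
    return decorator


def convert(record: dict, source: str, target: str) -> dict:
    """Convert a record, or raise LookupError if no crosswalk exists."""
    if source == target:
        return dict(record)
    try:
        crosswalk = _REGISTRY[(source, target)]
    except KeyError:
        raise LookupError(f"no crosswalk from {source} to {target}") from None
    return crosswalk(record)


@register("eml", "oai_dc")
def eml_to_simple_dc(record: dict) -> dict:
    # Assumption: only title and creator are carried over in this sketch.
    return {"title": record.get("title"), "creator": record.get("creator")}


if __name__ == "__main__":
    eml = {"title": "Lake Metabolism", "creator": "NTL LTER", "software": "n/a"}
    print(convert(eml, "eml", "oai_dc"))
```

Supporting another metadata standard then means registering one more function, without touching the code that serves OAI-PMH requests.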

Answered Questions

 * Should any "sets" be defined in the provider? Is there any natural breakdown of the Metacat data into categories?
   * Identity of data provider (which LTER site)
 * Is there any need to convert from simple DC to EML?
   * This is desirable in the long run, but may not be needed for the immediate project.
 * UNM can send us the Metacat XSL, so we can display EML "properly" if we want. Is there any reason Dryad should display the EML in a more Metacat-like manner?
   * Depends on Dryad's approach to displaying contents from other repositories. We probably don't want to display this directly in Dryad, just display the portion of the metadata that fits Dryad, and then link out to the "real" site.
 * How should harvested content be stored/displayed in Dryad?
   * The vast majority of datasets in Metacat do not have direct relationships with publications.
   * Metacat does have "aggregations" of data, but these could easily be represented as individual records (as we are doing with the DC records).
   * Hilmar suggests that for non-publication data, we create a separate section of search results listed as "related content in other repositories"
 * Do the rights statements on Metacat files affect Dryad's ability to make the metadata searchable?
   * Probably not, if we always redirect users to the Metacat item pages.

About Metacat
All Metacat items have an ID with the following parts:


 * scope (knb-lter, esa)
 * identifier
 * revision

Metacat supports "simple" URLs with the format: http://metacat.lternet.edu/knb/metacat/knb-lter-nin.24402/lter


 * knb-lter-nin.24402 is the ID
 * lter is the presentation format
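The ID parts and the "simple" URL format can be sketched together in Python. This is a rough illustration under assumptions: the dot separator between scope, identifier, and revision is inferred from the IDs on this page (for example knb-lter-nin.24402 and knb-lter-gce.247.9), and the names DocId, parse_docid, and simple_url are hypothetical.

```python
from typing import NamedTuple, Optional


class DocId(NamedTuple):
    scope: str
    identifier: int
    revision: Optional[int]  # None when the ID carries no revision


def parse_docid(docid: str) -> DocId:
    """Split an ID such as 'knb-lter-gce.247.9' into its parts.

    Assumption: parts are dot-separated and the revision is optional.
    """
    parts = docid.split(".")
    if len(parts) == 2:
        scope, identifier = parts
        return DocId(scope, int(identifier), None)
    if len(parts) == 3:
        scope, identifier, revision = parts
        return DocId(scope, int(identifier), int(revision))
    raise ValueError(f"unexpected ID format: {docid!r}")


def simple_url(docid: str, presentation: str = "lter",
               base: str = "http://metacat.lternet.edu/knb/metacat") -> str:
    """Build a 'simple' URL: base, then the ID, then the presentation format."""
    return f"{base}/{docid}/{presentation}"


if __name__ == "__main__":
    print(parse_docid("knb-lter-gce.247.9"))
    print(simple_url("knb-lter-nin.24402"))
```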

Metacat supports LSID with the format: urn:lsid:esa.org:esa:8:7


 * LSID resolving is "only local" (not remote Metacat servers?)
 * LSID is used as the accession number by ESA, but what do other sites use?
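An LSID has the general form urn:lsid:authority:namespace:object, with an optional trailing revision. For reference, a small Python sketch that splits the LSIDs used on this page into those fields follows; the function name parse_lsid is hypothetical, and reading the namespace as the Metacat scope is an assumption based on the examples here.

```python
def parse_lsid(lsid: str) -> dict:
    """Split an LSID into authority, namespace, object, and revision."""
    parts = lsid.split(":")
    if (len(parts) not in (5, 6)
            or parts[0].lower() != "urn"
            or parts[1].lower() != "lsid"):
        raise ValueError(f"not a valid LSID: {lsid!r}")
    authority, namespace, object_id = parts[2:5]
    return {
        "authority": authority,
        "namespace": namespace,   # assumed to correspond to the Metacat scope
        "object": object_id,
        "revision": parts[5] if len(parts) == 6 else None,
    }


if __name__ == "__main__":
    print(parse_lsid("urn:lsid:esa.org:esa:8:7"))
    print(parse_lsid("urn:lsid:knb.ecoinformatics.org:knb-lter-sgs:12"))
```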

About EML
The EML schema documents (EML 2.0.1) can be downloaded at http://knb.ecoinformatics.org/software/download.html#eml (note that there is a new revision, EML 2.1, of the schema due to be released this Spring).

EML examples:
 * Georgia Coastal Ecosystem LTER (knb-lter-gce.247.9.xml): Annual summaries of daily climatological observations from the National Weather Service weather station at Brunswick, Georgia for 1915 to 2004.
 * North Temperate Lake LTER (knb-lter-ntl.110.2): Lake Metabolism in North Temperate Lakes.
 * Andrews Experimental Forest LTER (knb-lter-and.3185.4.xml): Role of vegetation and coarse wood debris on soil processes and mycorrhizal mat distribution patterns at the Hi-15, Andrews Experimental Forest.

Data in EML that has no logical place in the Dryad Application Profile:
 * description of dataset size, encoding, table format, implementation details
 * details of fields within the dataset
 * geographic bounding boxes (Note: After further discussion, Ryan suggested that we should map the four bounding coordinates, each in its own coverage element. -- Duane Costa, 3/10/2009)
 * processing method
 * software
 * author/organization distinction
 * access control
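The note on geographic bounding boxes suggests mapping the four bounding coordinates, each in its own coverage element. A minimal Python sketch of that mapping follows. The EML element names (boundingCoordinates, westBoundingCoordinate, and so on) come from the EML schema; the sample coordinate values, the "side=value" text format, the use of the dc:coverage element from the DC 1.1 namespace, and the function name coverage_elements are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical EML fragment; the coordinate values are made up.
SAMPLE = """<geographicCoverage>
  <geographicDescription>Hypothetical study site</geographicDescription>
  <boundingCoordinates>
    <westBoundingCoordinate>-81.5</westBoundingCoordinate>
    <eastBoundingCoordinate>-81.0</eastBoundingCoordinate>
    <northBoundingCoordinate>31.5</northBoundingCoordinate>
    <southBoundingCoordinate>31.0</southBoundingCoordinate>
  </boundingCoordinates>
</geographicCoverage>"""

SIDES = ("west", "east", "north", "south")


def coverage_elements(geographic_coverage):
    """Return one dc:coverage element per bounding coordinate found."""
    elements = []
    for side in SIDES:
        node = geographic_coverage.find(
            f"boundingCoordinates/{side}BoundingCoordinate")
        if node is None or not (node.text or "").strip():
            continue  # skip coordinates that are missing or empty
        coverage = ET.Element(f"{{{DC_NS}}}coverage")
        # Assumed text format: "<side>=<value>"
        coverage.text = f"{side}={node.text.strip()}"
        elements.append(coverage)
    return elements


if __name__ == "__main__":
    for element in coverage_elements(ET.fromstring(SAMPLE)):
        print(ET.tostring(element, encoding="unicode"))
```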