Metacat OAI Provider

From Dryad wiki
Revision as of 14:09, 26 February 2009 by Ryan Scherle (talk | contribs) (About Metacat)

UNM is creating an OAI-PMH provider to add on to Metacat. The core Metacat code is available from the Ecoinformatics SVN. This work will integrate directly into the Metacat development tree.

Requirements

  • Our primary goal is to enhance access, not to provide a failover copy. It is OK for Dryad to store only the Dryad-format metadata rather than the full EML.
  • Metacat exposes all metadata through an OAI-PMH provider.
  • The provider makes all data available in these formats:
  • Dryad harvests all metadata exposed by Metacat.
    • The record in Dryad must point to the record in Metacat, enabling users to find the actual datasets.
  • Metacat harvests metadata from the Dryad OAI-PMH provider, and makes it available alongside the native Metacat metadata.
  • The design of the OAI-PMH "adapter" should be general enough to support crosswalking metadata standards other than EML to and from Metacat.
  • Not an actual requirement, but keep in mind that we will eventually need to download the data files.
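
The harvesting requirements above rest on standard OAI-PMH requests. As a rough illustration, the sketch below builds a ListRecords request and pulls record identifiers and the resumption token out of a response. The base URL is a placeholder (no provider endpoint exists yet), and the function names are hypothetical; only the verb, argument names, and response namespace come from the OAI-PMH 2.0 protocol.

```python
import urllib.parse
import xml.etree.ElementTree as ET

# Namespace of OAI-PMH 2.0 response elements.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build an OAI-PMH ListRecords request URL."""
    if resumption_token:
        # resumptionToken is an exclusive argument: send it with the verb only.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urllib.parse.urlencode(params)


def parse_list_records(xml_text):
    """Return (header identifiers, resumption token or None) from a response."""
    root = ET.fromstring(xml_text)
    # Only header identifiers live in the OAI namespace; dc:identifier does not.
    identifiers = [el.text for el in root.iter(OAI_NS + "identifier")]
    token_el = root.find(f"{OAI_NS}ListRecords/{OAI_NS}resumptionToken")
    token = token_el.text if token_el is not None and token_el.text else None
    return identifiers, token


# Placeholder endpoint, for illustration only.
print(list_records_url("http://metacat.example.org/oai"))
```

A harvester would call `list_records_url` repeatedly, passing back each resumption token until `parse_list_records` returns `None` for it.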

Tasks

  • Ryan/Mark: schedule a bi-weekly call to touch base on progress.
  • Ryan: discuss with Matt Jones (and Mark?) how to ensure that we're implementing things in a way that will work well for other Metacat installations.
  • Ryan: work with MRC to determine "ideal" Dryad records.
    • Initial versions are done, but Ryan needs to verify.
  • Duane: Create XSL to convert EML -> simple DC. (1 week)
  • Duane: Create XSL to convert EML -> Dryad application profile. (3 weeks)
  • Duane: Create XSL to convert Dryad application profile -> EML. (included in above)
  • Ryan: Evaluate quality of simple DC and Dryad application profile records, suggest modifications to XSL.
  • Duane: Complete/create OAI-PMH provider functionality in LTER Metacat. (2 months)
    • Depending on status of current Metacat functionality, either complete existing implementation, or integrate a new system such as OCLC's OAIcat, a UIUC provider, or the DLESE provider.
      • Note: There is no existing OAI-PMH implementation for Metacat (per Matt Jones).
  • Ryan: Configure harvest of Metacat metadata and test metadata availability in Dryad. (1 week)
  • Duane: Install an OAI-PMH harvester at LTER and configure to harvest from the Dryad provider. (1 month)
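
The crosswalk tasks above call for XSL stylesheets. Purely to illustrate the kind of mapping involved, here is a Python sketch (not the project's XSL) that copies a handful of EML fields into simple Dublin Core. The choice of fields, the name formatting, and the function name are assumptions; the real crosswalk will cover far more of EML.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"


def eml_to_simple_dc(eml_text):
    """Map a few EML fields onto an oai_dc:dc element (illustrative only)."""
    eml = ET.fromstring(eml_text)
    dc = ET.Element(f"{{{OAI_DC_NS}}}dc")

    def add(name, text):
        # Skip empty values and collapse runs of whitespace.
        if text and text.strip():
            ET.SubElement(dc, f"{{{DC_NS}}}{name}").text = " ".join(text.split())

    # Assumption: children of the eml root are not namespace-qualified.
    dataset = eml.find("dataset")
    if dataset is not None:
        add("title", dataset.findtext("title"))
        for creator in dataset.findall("creator"):
            given = creator.findtext("individualName/givenName") or ""
            sur = creator.findtext("individualName/surName") or ""
            person = f"{sur}, {given}".strip(", ")
            # Fall back to the organization when no person is named.
            add("creator", person or creator.findtext("organizationName"))
        abstract = dataset.find("abstract")
        if abstract is not None:
            add("description", "".join(abstract.itertext()))
    # The package ID lets the harvested record point back to Metacat.
    add("identifier", eml.get("packageId"))
    return dc
```

Calling `ET.tostring(eml_to_simple_dc(eml_text))` on an EML document yields the serialized record, which is one way to eyeball output when evaluating record quality.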

Open Questions

  • Should any "sets" be defined in the provider? Is there any natural breakdown of the Metacat data into categories?
    • Identity of data provider (which LTER site)
  • Is there any need to convert from simple DC to EML?
    • This is desirable in the long run, but may not be needed for the immediate project.
  • UNM can send us the Metacat XSL, so we can display EML "properly" if we want. Is there any reason Dryad should display the EML in a more Metacat-like manner?
    • Depends on Dryad's approach to displaying contents from other repositories. We probably don't want to display this directly in Dryad, just display the portion of the metadata that fits Dryad, and then link out to the "real" site.
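
On the first question above (sets by data provider), one possible approach is to derive an OAI-PMH setSpec from the scope part of each Metacat ID. The sketch below assumes that the suffix after "knb-lter-" names the LTER site, which has not been confirmed, and the function name is hypothetical. The validation pattern follows the setSpec syntax in the OAI-PMH 2.0 schema.

```python
import re

# setSpec syntax: colon-separated segments of URI "unreserved" characters.
SET_SPEC_RE = re.compile(r"^[A-Za-z0-9\-_.!~*'()]+(:[A-Za-z0-9\-_.!~*'()]+)*$")


def set_spec_for_scope(scope, network_prefix="knb-lter"):
    """Map a Metacat scope such as 'knb-lter-nin' to a hierarchical setSpec."""
    if scope.startswith(network_prefix + "-"):
        # Assumption: whatever follows the network prefix is the site code.
        site = scope[len(network_prefix) + 1:]
        spec = f"{network_prefix}:{site}"
    else:
        # Scopes outside the network become top-level sets.
        spec = scope
    if not SET_SPEC_RE.match(spec):
        raise ValueError(f"cannot form a valid setSpec from scope {scope!r}")
    return spec


print(set_spec_for_scope("knb-lter-nin"))  # prints knb-lter:nin
```

A hierarchical setSpec lets a harvester request either everything under the network set or a single site's records.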

About Metacat

All Metacat items have an ID with the following parts:

  • scope (knb-lter, esa)
  • identifier
  • revision

Metacat supports "simple" URLs with the format: http://metacat.lternet.edu/knb/metacat/knb-lter-nin.24402/lter

  • knb-lter-nin.24402 is the ID
  • lter is the presentation format
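
To make the ID and URL conventions above concrete, here is a small sketch that splits an ID into its parts and builds a "simple" URL. It assumes the parts are dot-separated, that the scope itself contains no dots, and that the revision may be omitted (as in the example URL); the names `DocId`, `parse_docid`, and `simple_url` are hypothetical.

```python
from typing import NamedTuple, Optional


class DocId(NamedTuple):
    scope: str
    identifier: str
    revision: Optional[str]


def parse_docid(docid):
    """Split 'scope.identifier[.revision]' into its parts."""
    parts = docid.split(".")
    if len(parts) == 2:
        return DocId(parts[0], parts[1], None)
    if len(parts) == 3:
        return DocId(parts[0], parts[1], parts[2])
    raise ValueError(f"unexpected ID format: {docid!r}")


def simple_url(docid, presentation, base="http://metacat.lternet.edu/knb/metacat"):
    """Build a simple Metacat URL: base/ID/presentation-format."""
    return f"{base}/{docid}/{presentation}"


print(parse_docid("knb-lter-nin.24402"))
print(simple_url("knb-lter-nin.24402", "lter"))
```

The second call reproduces the example URL given above.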

Metacat supports LSID with the format: urn:lsid:esa.org:esa:8:7

  • LSID resolution is "only local" (not remote Metacat servers?)
  • LSID is used as the accession number by ESA, but what do other sites use?
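
An LSID has the general shape urn:lsid:authority:namespace:object[:revision]. The sketch below splits the example above into those fields. The mapping from LSID fields to a Metacat ID (namespace as scope, object as identifier) is a guess based on that single example and should be checked against Metacat itself; the function names are hypothetical.

```python
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]


def parse_lsid(lsid):
    """Split 'urn:lsid:authority:namespace:object[:revision]' into fields."""
    parts = lsid.split(":")
    if len(parts) not in (5, 6) or parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {lsid!r}")
    revision = parts[5] if len(parts) == 6 else None
    return Lsid(parts[2], parts[3], parts[4], revision)


def lsid_to_docid(lsid):
    """Assumed mapping: namespace -> scope, object -> identifier."""
    parsed = parse_lsid(lsid)
    docid = f"{parsed.namespace}.{parsed.object_id}"
    return docid if parsed.revision is None else f"{docid}.{parsed.revision}"


print(parse_lsid("urn:lsid:esa.org:esa:8:7"))
print(lsid_to_docid("urn:lsid:esa.org:esa:8:7"))  # prints esa.8.7
```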

Mailing list is at http://mercury.nceas.ucsb.edu/ecoinformatics/mailman/listinfo/metacat-dev

About EML

The EML schema documents (EML 2.0.1) can be downloaded at http://knb.ecoinformatics.org/software/download.html#eml (note that a new revision of the schema, EML 2.1, is due to be released this spring).

EML examples:

station at Brunswick, Georgia for 1915 to 2004.

Data in EML that has no logical place in the Dryad Application Profile:

  • description of dataset size, encoding, table format, implementation details
  • details of fields within the dataset
  • geographic bounding boxes
  • processing method
  • software
  • author/organization distinction
  • access control