Difference between revisions of "Metacat OAI Provider"
Ryan Scherle (talk | contribs) (→Open Questions) |
Revision as of 13:39, 26 February 2009
UNM is creating an OAI-PMH provider to add on to Metacat. The core Metacat code is available from the Ecoinformatics SVN. This work will integrate directly into the Metacat development tree.
Requirements
- Our primary goal is to enhance access, not to provide a failover copy. It is ok for Dryad to only store the Dryad-format metadata, and not use the full EML.
- Metacat exposes all metadata through an OAI-PMH provider.
- The provider makes all data available in these formats:
- Simple DC
- Dryad application profile (qualified DC with extensions)
- EML
- Dryad harvests all metadata exposed by Metacat.
- The record in Dryad must point to the record in Metacat, enabling users to find the actual datasets.
- Metacat harvests metadata from the Dryad OAI-PMH provider, and makes it available alongside the native Metacat metadata.
- The design of the OAI-PMH "adapter" should provide a more generalized interface that supports cross-walking other metadata standards to/from Metacat beyond just EML.
- Not an actual requirement, but keep in mind that we need to eventually download the data files.
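To make the three-format requirement concrete, here is a minimal sketch of the OAI-PMH `ListRecords` requests a harvester such as Dryad might issue. The base URL is hypothetical (this page does not give one), and the `dryad` and `eml` metadata prefixes are placeholder names; only `oai_dc` is fixed by the OAI-PMH specification.

```python
from urllib.parse import urlencode

# Hypothetical provider endpoint; the real URL is not specified on this page.
BASE_URL = "http://example.org/metacat/oai"

# "oai_dc" is the prefix OAI-PMH mandates for simple DC. The other two
# prefixes are assumptions standing in for whatever names the provider adopts.
PREFIXES = {"simple_dc": "oai_dc", "dryad": "dryad", "eml": "eml"}


def list_records_url(fmt, base_url=BASE_URL, from_date=None):
    """Build a ListRecords request URL for one of the three formats."""
    params = {"verb": "ListRecords", "metadataPrefix": PREFIXES[fmt]}
    if from_date:
        # Optional selective harvesting: only records changed since this date.
        params["from"] = from_date
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    for fmt in PREFIXES:
        print(list_records_url(fmt))
```

A harvester would issue one such request per format it cares about and follow any resumption tokens returned by the provider.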
Tasks
- Ryan/Mark: schedule a bi-weekly call to touch base on progress.
- Ryan: discuss with Matt Jones (and Mark?) how to ensure that we're implementing things in a way that will work well for other Metacat installations.
- Ryan: work with MRC to determine "ideal" Dryad records.
- Initial versions are done, but Ryan needs to verify.
- Duane: Create XSL to convert EML -> simple DC. (1 week)
- Duane: Create XSL to convert EML -> Dryad application profile. (3 weeks)
- Duane: Create XSL to convert Dryad application profile -> EML. (included in above)
- Ryan: Evaluate quality of simple DC and Dryad application profile records, suggest modifications to XSL.
- Duane: Complete/create OAI-PMH provider functionality in LTER Metacat. (2 months)
- Depending on status of current Metacat functionality, either complete existing implementation, or integrate a new system such as OCLC's OAIcat, a UIUC provider, or the DLESE provider.
- Note: There is no existing OAI-PMH implementation for Metacat (per Matt Jones).
- Ryan: Configure harvest of Metacat metadata and test metadata availability in Dryad. (1 week)
- Duane: Install an OAI-PMH harvester at LTER and configure to harvest from the Dryad provider. (1 month)
- Possible implementations include OCLC's OAIHarvester2, the UIUC harvester, and the DLESE harvester.
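The EML -> simple DC crosswalk listed above is planned as XSL. Purely to illustrate the kind of mapping involved, here is a rough Python sketch. The choice of fields (title, creator, keyword, packageId) is an assumption for illustration, not the agreed mapping, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"


def eml_to_simple_dc(eml_xml):
    """Map a few EML dataset fields to an oai_dc element (illustrative only)."""
    root = ET.fromstring(eml_xml)
    # In EML the <dataset> child of the namespaced root is unqualified.
    dataset = root.find("dataset")
    if dataset is None:
        raise ValueError("no <dataset> element found")

    ET.register_namespace("dc", DC_NS)
    ET.register_namespace("oai_dc", OAI_DC_NS)
    dc = ET.Element("{%s}dc" % OAI_DC_NS)

    def add(name, text):
        # Skip empty values so the output holds only populated fields.
        if text and text.strip():
            ET.SubElement(dc, "{%s}%s" % (DC_NS, name)).text = text.strip()

    add("title", dataset.findtext("title"))
    # The package ID lets the harvested record point back to Metacat.
    add("identifier", root.get("packageId"))
    for creator in dataset.findall("creator"):
        given = creator.findtext("individualName/givenName") or ""
        sur = creator.findtext("individualName/surName") or ""
        org = creator.findtext("organizationName") or ""
        # Simple DC has no author/organization distinction, so both
        # kinds of creator collapse into dc:creator.
        add("creator", (sur + ", " + given).strip(", ") or org)
    for keyword in dataset.findall("keywordSet/keyword"):
        add("subject", keyword.text)
    return dc
```

Calling `ET.tostring(eml_to_simple_dc(xml_text), encoding="unicode")` would serialize the result for inspection when evaluating record quality.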
Open Questions
- Should any "sets" be defined in the provider? Is there any natural breakdown of the Metacat data into categories?
- Identity of data provider (which LTER site)
- Is there any need to convert from simple DC to EML?
- This is desirable in the long run, but may not be needed for the immediate project.
- UNM can send us the metacat XSL, so we can display EML "properly" if we want. Is there any reason Dryad should display the EML in a more Metacat-like manner?
- Depends on Dryad's approach to displaying contents from other repositories. We probably don't want to display this directly in Dryad, just display the portion of the metadata that fits Dryad, and then link out to the "real" site.
About Metacat
Metacat supports "simple" URLs with the format: http://metacat.lternet.edu/knb/metacat/knb-lter-nin.24402/lter
- knb-lter-nin.24402 is the ID
- lter is the presentation format
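As a small illustration of the URL layout just described, the following sketch splits a "simple" URL into its ID and presentation format and rebuilds it. The function names are hypothetical, and the sketch assumes the ID and format are always the last two path segments, as in the example above.

```python
from urllib.parse import urlsplit


def parse_metacat_url(url):
    """Return the document ID and presentation format from a simple URL."""
    segments = urlsplit(url).path.strip("/").split("/")
    if len(segments) < 2:
        raise ValueError("expected .../<id>/<format> in %r" % url)
    # Assumption: the last two path segments are the ID and the format.
    return {"docid": segments[-2], "format": segments[-1]}


def build_metacat_url(base, docid, fmt):
    """Join a Metacat base URL, document ID, and presentation format."""
    return "/".join([base.rstrip("/"), docid, fmt])


if __name__ == "__main__":
    url = "http://metacat.lternet.edu/knb/metacat/knb-lter-nin.24402/lter"
    print(parse_metacat_url(url))
```

A Dryad record could store the result of `build_metacat_url` as its link back to the Metacat record.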
Metacat supports LSID with the format: urn:lsid:esa.org:esa:8:7
- LSID resolving is "only local" (not remote metacat servers?)
- LSID is used as the accession number by ESA, but what do other sites use?
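For reference, an LSID has the general shape urn:lsid:authority:namespace:object, with an optional revision. A minimal parsing sketch follows; the class and function names are hypothetical, and how Metacat maps these parts onto its own document IDs is not covered here.

```python
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]


def parse_lsid(lsid):
    """Split an LSID into authority, namespace, object and optional revision."""
    parts = lsid.split(":")
    if (
        len(parts) not in (5, 6)
        or parts[0].lower() != "urn"
        or parts[1].lower() != "lsid"
    ):
        raise ValueError("not a well-formed LSID: %r" % lsid)
    # The revision component is optional in the LSID syntax.
    revision = parts[5] if len(parts) == 6 else None
    return Lsid(parts[2], parts[3], parts[4], revision)


if __name__ == "__main__":
    print(parse_lsid("urn:lsid:esa.org:esa:8:7"))
```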
Mailing list is at http://mercury.nceas.ucsb.edu/ecoinformatics/mailman/listinfo/metacat-dev
About EML
The EML schema documents (EML 2.0.1) can be downloaded at http://knb.ecoinformatics.org/software/download.html#eml (note that there is a new revision, EML 2.1, of the schema due to be released this Spring).
EML examples:
- Georgia Coastal Ecosystem LTER (knb-lter-gce.247.9.xml): Annual summaries of daily climatological observations from the National Weather Service weather station at Brunswick, Georgia for 1915 to 2004.
- North Temperate Lake LTER (knb-lter-ntl.110.2): Lake Metabolism in North Temperate Lakes.
- Andrews Experimental Forest LTER (knb-lter-and.3185.4.xml): Role of vegetation and coarse wood debris on soil processes and mycorrhizal mat distribution patterns at the Hi-15, Andrews Experimental Forest.
Data in EML that has no logical place in the Dryad Application Profile:
- description of dataset size, encoding, table format, implementation details
- details of fields within the dataset
- geographic bounding boxes
- processing method
- software
- author/organization distinction
- access control