Difference between revisions of "Metacat OAI Provider"

From Dryad wiki
(Open Questions)
 
(23 intermediate revisions by 3 users not shown)
Line 1: Line 1:
UNM is creating an OAI-PMH provider to add on to Metacat. The core Metacat code is available from the [https://code.ecoinformatics.org/code/ Ecoinformatics SVN]. This work will integrate directly into the Metacat development tree.  
+
'''Status: Complete'''
 +
 
 +
UNM is creating an OAI-PMH provider to add on to Metacat. The core Metacat code is available from the [https://code.ecoinformatics.org/code/ Ecoinformatics SVN]. This work will integrate directly into the Metacat development tree.
 +
 
 +
== Project Final Report ==
 +
 
 +
The [[Media:Metacat-OAI-PMH-Project-Plan.pdf‎|project final report]] provides background and details on the implementation of the provider.
 +
 
 +
== Demonstration links ==
 +
 
 +
=== Data Provider ===
 +
 
 +
* http://metacat.lternet.edu/knb/dataProvider?verb=GetRecord&metadataPrefix=eml-2.1.0&identifier=urn:lsid:knb.ecoinformatics.org:knb-lter-sgs:12
 +
* http://metacat.lternet.edu/knb/dataProvider?verb=GetRecord&metadataPrefix=oai_dc&identifier=urn:lsid:knb.ecoinformatics.org:knb-lter-gce:26
 +
* http://metacat.lternet.edu/knb/dataProvider?verb=Identify
 +
* http://metacat.lternet.edu/knb/dataProvider?verb=ListIdentifiers&metadataPrefix=eml-2.1.0&from=2001-01-01&until=2010-01-01
 +
* http://metacat.lternet.edu/knb/dataProvider?verb=ListIdentifiers&metadataPrefix=oai_dc&from=2001-01-01&until=2010-01-01
 +
* http://metacat.lternet.edu/knb/dataProvider?verb=ListMetadataFormats
 +
* http://metacat.lternet.edu/knb/dataProvider?verb=ListRecords&metadataPrefix=eml-2.1.0
 +
* http://metacat.lternet.edu/knb/dataProvider?verb=ListRecords&metadataPrefix=oai_dc
 +
* http://metacat.lternet.edu/knb/dataProvider?verb=ListSets
 +
 
 +
=== Harvester  ===
 +
 
 +
(as of 2011-10-12, not functional)
  
 
== Requirements ==
 
Line 7: Line 31:
 
* The provider makes all data available in these formats:
 
 
** Simple DC
 
** [[Level One Application Profile|Dryad application profile]] (qualified DC with extensions)
+
** [[Metadata Profile|Dryad application profile]] (qualified DC with extensions)
 
** EML
 
 
* Dryad harvests all metadata exposed by Metacat.
 
Line 15: Line 39:
 
* Not an actual requirement, but keep in mind that we need to eventually download the data files.
 
  
== Tasks ==
 
  
* Ryan/Mark: schedule a bi-weekly call to touch base on progress.
 
* Ryan: discuss with Matt Jones (and Mark?) how to ensure that we're implementing things in a way that will work well for other Metacat installations.
 
* Ryan: work with MRC to determine "ideal" Dryad records.
 
** Initial versions are done, but Ryan needs to verify.
 
* Duane: Create XSL to convert EML -> simple DC. (1 week)
 
* Duane: Create XSL to convert EML -> Dryad application profile. (3 weeks)
 
* Duane: Create XSL to convert Dryad application profile -> EML. (included in above)
 
* Ryan: Evaluate quality of simple DC and Dryad application profile records, suggest modifications to XSL.
 
* Duane: Complete/create OAI-PMH provider functionality in LTER Metacat. (2 months)
 
** Depending on status of current Metacat functionality, either complete existing implementation, or integrate a new system such as OCLC's [http://www.oclc.org/research/software/oai/cat.htm OAIcat], a [http://uilib-oai.sourceforge.net/ UIUC provider], or the [http://www.dlese.org/Metadata/tool/index.php DLESE provider].
 
*** Note: There is no existing OAI-PMH implementation for Metacat (per Matt Jones).
 
* Ryan: Configure harvest of Metacat metadata and test metadata availability in Dryad. (1 week)
 
* Duane: Install an OAI-PMH harvester at LTER and configure to harvest from the Dryad provider. (1 month)
 
** Possible implementations include [http://www.oclc.org/research/software/oai/harvester2.htm OCLC's OAIHarvester2], the [http://uilib-oai.sourceforge.net/ UIUC harvester], and the [http://www.dlese.org/Metadata/tool/index.php DLESE harvester].
 
  
== Open Questions ==
+
== Answered Questions ==
  
 
* Should any "sets" be defined in the provider? Is there any natural breakdown of the MetaCat data into categories?
 
Line 39: Line 48:
 
** This is desirable in the long run, but may not be needed for the immediate project.
 
 
* UNM can send us the metacat XSL, so we can display EML "properly" if we want. Is there any reason Dryad should display the EML in a more Metacat-like manner?
 
 +
** Depends on Dryad's approach to displaying contents from other repositories. We probably don't want to display this directly in Dryad, just display the portion of the metadata that fits Dryad, and then link out to the "real" site.
 +
* How should harvested content be stored/displayed in Dryad?
 +
** The vast majority of datasets in Metacat do not have direct relationships with publications.
 +
** Metacat does have "aggregations" of data, but these could easily be represented as individual records (as we are doing with the DC records).
 +
** Hilmar suggests that for non-publication data, we create a separate section of search results listed as "related content in other repositories"
 +
* Do the rights statements on Metacat files affect Dryad's ability to make the metadata searchable? Probably not, if we always redirect users to the Metacat item pages.
 +
 +
== About Metacat  ==
 +
 +
All Metacat items have an ID with the following parts:
 +
 +
*scope (knb-lter, esa)
 +
*identifier
 +
*revision
 +
 +
Metacat supports "simple" URLs with the format: http://metacat.lternet.edu/knb/metacat/knb-lter-nin.24402/lter
  
+
*knb-lter-nin.24402 is the ID
+
*lter is the presentation format
+
Metacat supports LSID with the format: urn:lsid:esa.org:esa:8:7
+
*LSID resolving is "only local" (not remote metacat servers?)
+
*LSID is used as the accession number by ESA, but what do other sites use?
+
<br>

== About Metacat ==

Metacat supports "simple" URLs with the format:
http://metacat.lternet.edu/knb/metacat/knb-lter-nin.24402/lter
* knb-lter-nin.24402 is the ID
* lter is the presentation format

Metacat supports LSID with the format:
urn:lsid:esa.org:esa:8:7
* LSID resolving is "only local" (not remote metacat servers?)
* LSID is used as the accession number by ESA, but what do other sites use?

Mailing list is at
http://mercury.nceas.ucsb.edu/ecoinformatics/mailman/listinfo/metacat-dev
 
  
 
== About EML ==
 
Line 71: Line 91:
 
* description of dataset size, encoding, table format, implementation details
 
 
* details of fields within the dataset
 
* geographic bounding boxes
+
* geographic bounding boxes (Note: After further discussion, Ryan suggested that we should map the four bounding coordinates, each in its own coverage element. -- Duane Costa, 3/10/2009)
 
* processing method
 
 
* software
 
Line 77: Line 97:
 
* access control
 
  
[[Category:Work Packages]]
 
 
[[Category:Software]]
 
 +
[[Category:Work Packages Complete]]
 +
[[Category:Handshaking]]

Latest revision as of 09:05, 12 October 2011

Status: Complete

UNM is creating an OAI-PMH provider to add on to Metacat. The core Metacat code is available from the Ecoinformatics SVN. This work will integrate directly into the Metacat development tree.

Project Final Report

The project final report provides background and details on the implementation of the provider.

Demonstration links

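Data Provider

  • http://metacat.lternet.edu/knb/dataProvider?verb=GetRecord&metadataPrefix=eml-2.1.0&identifier=urn:lsid:knb.ecoinformatics.org:knb-lter-sgs:12
  • http://metacat.lternet.edu/knb/dataProvider?verb=GetRecord&metadataPrefix=oai_dc&identifier=urn:lsid:knb.ecoinformatics.org:knb-lter-gce:26
  • http://metacat.lternet.edu/knb/dataProvider?verb=Identify
  • http://metacat.lternet.edu/knb/dataProvider?verb=ListIdentifiers&metadataPrefix=eml-2.1.0&from=2001-01-01&until=2010-01-01
  • http://metacat.lternet.edu/knb/dataProvider?verb=ListIdentifiers&metadataPrefix=oai_dc&from=2001-01-01&until=2010-01-01
  • http://metacat.lternet.edu/knb/dataProvider?verb=ListMetadataFormats
  • http://metacat.lternet.edu/knb/dataProvider?verb=ListRecords&metadataPrefix=eml-2.1.0
  • http://metacat.lternet.edu/knb/dataProvider?verb=ListRecords&metadataPrefix=oai_dc
  • http://metacat.lternet.edu/knb/dataProvider?verb=ListSets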

Harvester

(as of 2011-10-12, not functional)

Requirements

  • Our primary goal is to enhance access, not to provide a failover copy. It is ok for Dryad to only store the Dryad-format metadata, and not use the full EML.
  • Metacat exposes all metadata through an OAI-PMH provider.
  • The provider makes all data available in these formats:
    • Simple DC
    • Dryad application profile (qualified DC with extensions)
    • EML
  • Dryad harvests all metadata exposed by Metacat.
    • The record in Dryad must point to the record in Metacat, enabling users to find the actual datasets.
  • Metacat harvests metadata from the Dryad OAI-PMH provider, and makes it available alongside the native Metacat metadata.
  • The design of the OAI-PMH "adapter" should provide a more generalized interface that supports cross-walking other metadata standards to/from Metacat beyond just EML.
  • Not an actual requirement, but keep in mind that we need to eventually download the data files.
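
The harvesting requirements above come down to issuing standard OAI-PMH requests against the provider. As a rough sketch only, the following Python builds request URLs of the same shape as the demonstration links on this page. The helper name oai_request_url and its from_ keyword are hypothetical and are not part of Metacat; the base URL is the one used in the demonstration links.

```python
from urllib.parse import urlencode

# Base URL as it appears in the demonstration links on this page.
BASE_URL = "http://metacat.lternet.edu/knb/dataProvider"


def oai_request_url(verb, base_url=BASE_URL, **params):
    """Build an OAI-PMH request URL (hypothetical helper, not Metacat code).

    Keyword arguments become OAI-PMH request arguments. Trailing
    underscores are stripped so that `from_` can stand in for `from`,
    which is a reserved word in Python.
    """
    query = [("verb", verb)]
    for name, value in params.items():
        if value is not None:
            query.append((name.rstrip("_"), value))
    # Keep ':' unescaped so LSID identifiers stay readable in the URL.
    return base_url + "?" + urlencode(query, safe=":")


if __name__ == "__main__":
    print(oai_request_url("Identify"))
    print(oai_request_url("ListIdentifiers", metadataPrefix="oai_dc",
                          from_="2001-01-01", until="2010-01-01"))
```

A harvester would fetch each URL and, for the list verbs, follow any resumptionToken the provider returns; that part is left out of this sketch.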


Answered Questions

  • Should any "sets" be defined in the provider? Is there any natural breakdown of the Metacat data into categories?
    • Identity of data provider (which LTER site)
  • Is there any need to convert from simple DC to EML?
    • This is desirable in the long run, but may not be needed for the immediate project.
  • UNM can send us the Metacat XSL, so we can display EML "properly" if we want. Is there any reason Dryad should display the EML in a more Metacat-like manner?
    • Depends on Dryad's approach to displaying contents from other repositories. We probably don't want to display the full EML directly in Dryad; instead, display only the portion of the metadata that fits Dryad, and then link out to the "real" site.
  • How should harvested content be stored/displayed in Dryad?
    • The vast majority of datasets in Metacat do not have direct relationships with publications.
    • Metacat does have "aggregations" of data, but these could easily be represented as individual records (as we are doing with the DC records).
    • Hilmar suggests that for non-publication data, we create a separate section of search results listed as "related content in other repositories".
  • Do the rights statements on Metacat files affect Dryad's ability to make the metadata searchable? Probably not, if we always redirect users to the Metacat item pages.

About Metacat

All Metacat items have an ID with the following parts:

  • scope (knb-lter, esa)
  • identifier
  • revision

Metacat supports "simple" URLs with the format: http://metacat.lternet.edu/knb/metacat/knb-lter-nin.24402/lter

  • knb-lter-nin.24402 is the ID
  • lter is the presentation format

Metacat supports LSID with the format: urn:lsid:esa.org:esa:8:7

  • LSID resolution is "only local" (not across remote Metacat servers?)
  • LSID is used as the accession number by ESA, but what do other sites use?
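
To make the two identifier forms above concrete, here is a minimal sketch that splits an LSID into its parts and rebuilds a "simple" URL. It assumes the LSID namespace corresponds to the Metacat scope and the LSID object to the Metacat identifier; that mapping is an inference from the examples on this page, not something confirmed by the Metacat code. MetacatId, parse_lsid and simple_url are hypothetical names.

```python
from typing import NamedTuple, Optional, Tuple


class MetacatId(NamedTuple):
    """The three ID parts listed above (hypothetical container type)."""
    scope: str
    identifier: str
    revision: Optional[str]


def parse_lsid(lsid: str) -> Tuple[str, MetacatId]:
    """Split urn:lsid:<authority>:<namespace>:<object>[:<revision>].

    Assumption: namespace maps to the Metacat scope and object to the
    Metacat identifier. The revision part is optional.
    """
    parts = lsid.split(":")
    if len(parts) not in (5, 6) or [p.lower() for p in parts[:2]] != ["urn", "lsid"]:
        raise ValueError("not a recognizable LSID: %r" % lsid)
    authority, scope, identifier = parts[2:5]
    revision = parts[5] if len(parts) == 6 else None
    return authority, MetacatId(scope, identifier, revision)


def simple_url(doc_id: MetacatId, presentation: str,
               base: str = "http://metacat.lternet.edu/knb/metacat") -> str:
    """Build a "simple" URL of the form <base>/<scope>.<identifier>/<presentation>."""
    return "%s/%s.%s/%s" % (base, doc_id.scope, doc_id.identifier, presentation)


if __name__ == "__main__":
    print(parse_lsid("urn:lsid:esa.org:esa:8:7"))
    print(simple_url(MetacatId("knb-lter-nin", "24402", None), "lter"))
```

The simple URL shown on this page carries no revision, so simple_url leaves it out as well.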


About EML

The EML schema documents (EML 2.0.1) can be downloaded at http://knb.ecoinformatics.org/software/download.html#eml (note that there is a new revision, EML 2.1, of the schema due to be released this Spring).

EML examples:

station at Brunswick, Georgia for 1915 to 2004.

Data in EML that has no logical place in the Dryad Application Profile:

  • description of dataset size, encoding, table format, implementation details
  • details of fields within the dataset
  • geographic bounding boxes (Note: After further discussion, Ryan suggested that we should map the four bounding coordinates, each in its own coverage element. -- Duane Costa, 3/10/2009)
  • processing method
  • software
  • author/organization distinction
  • access control
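
The note on geographic bounding boxes above suggests mapping the four bounding coordinates, each in its own coverage element. The project did this in XSL; the Python below is only an illustration of the idea. The "name=value" text format of each coverage element is an assumption, as is the function name bounding_box_to_coverage, and the sample coordinates are made up for the example.

```python
import xml.etree.ElementTree as ET

# Dublin Core element set namespace.
DC_NS = "http://purl.org/dc/elements/1.1/"

# Child elements of EML's boundingCoordinates, in the order they are emitted.
BOUNDS = (
    "westBoundingCoordinate",
    "eastBoundingCoordinate",
    "northBoundingCoordinate",
    "southBoundingCoordinate",
)

# Illustrative fragment only; the values are not from a real record.
SAMPLE_EML = """
<dataset>
  <coverage>
    <geographicCoverage>
      <boundingCoordinates>
        <westBoundingCoordinate>-81.5</westBoundingCoordinate>
        <eastBoundingCoordinate>-81.4</eastBoundingCoordinate>
        <northBoundingCoordinate>31.2</northBoundingCoordinate>
        <southBoundingCoordinate>31.1</southBoundingCoordinate>
      </boundingCoordinates>
    </geographicCoverage>
  </coverage>
</dataset>
"""


def bounding_box_to_coverage(eml_root):
    """Return one dc:coverage element per bounding coordinate found.

    The "name=value" text is an assumed encoding, chosen for this sketch.
    """
    elements = []
    for box in eml_root.iter("boundingCoordinates"):
        for name in BOUNDS:
            value = box.findtext(name)
            if value is not None:
                coverage = ET.Element("{%s}coverage" % DC_NS)
                coverage.text = "%s=%s" % (name, value.strip())
                elements.append(coverage)
    return elements


if __name__ == "__main__":
    for element in bounding_box_to_coverage(ET.fromstring(SAMPLE_EML)):
        print(ET.tostring(element, encoding="unicode"))
```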