CLOCKSS Technology

From Dryad wiki
Latest revision as of 09:00, 12 November 2014

STATUS UPDATE, November 2014: Dryad's content is not currently being replicated through the CLOCKSS network. A CLOCKSS working group is developing a new and separate set of guidelines for databases; when these are in place, it may be possible for Dryad to enter into a new agreement with CLOCKSS.

________

Dryad's content will be replicated through the CLOCKSS network.

CLOCKSS will crawl the Dryad site and harvest publicly viewable content. Data files will be replicated in CLOCKSS only after any embargo on them has expired.

Manifest Pages

Each quarter, Dryad publishes a set of manifest pages that list content newly available in Dryad.

A command-line process generates each manifest page. The manifest page is a static page that contains links to all "new" data packages. A data package is new if it meets at least one of the following conditions:

  • The package was archived since the previous manifest page was generated.
  • The package, or one of its constituent files, was modified since the previous manifest page was generated.
  • At least one file in the package has come out of embargo status since the previous manifest page was generated.
  • A new version of the package has become available since the previous manifest page was generated.
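
The four conditions above can be read as a single check against the time the previous manifest page was generated. The following is a minimal Python sketch of that reading, not Dryad's actual implementation (which lives in the DSpace Java classes). The `Package` type and all of its field names are hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Package:
    """Hypothetical record; field names are assumptions, not Dryad's schema."""
    archived: Optional[datetime] = None        # when the package was archived
    modified: Optional[datetime] = None        # last change to the package itself
    new_version: Optional[datetime] = None     # when a new version became available
    file_modified: List[datetime] = field(default_factory=list)   # per-file changes
    embargo_lifted: List[datetime] = field(default_factory=list)  # per-file embargo ends


def is_new(pkg: Package, previous_manifest: datetime) -> bool:
    """True if any listed event happened after the previous manifest was generated."""
    events = [pkg.archived, pkg.modified, pkg.new_version]
    events += pkg.file_modified + pkg.embargo_lifted
    return any(t is not None and t > previous_manifest for t in events)
```

A package with no qualifying event after the previous manifest date is left out of the next manifest page; a single late embargo release on one file is enough to include it.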

To generate a new manifest page:

/opt/dryad/bin/generate-sitemaps

A summary manifest page is available from http://datadryad.org/htmlmap. This page contains a link to each of the individual manifest pages previously generated.

if-modified-since behavior

Dryad pages respond to HTTP requests that use the if-modified-since header. A summary of the possible request types and responses:

  • request for valid Dryad page, without if-modified-since header: 200 OK with page content
  • request for non-existent Dryad page, without if-modified-since header: 404 Not Found with possible page content
  • request for valid Dryad page, using if-modified-since header that has date earlier than page modification date: 200 OK with page content
  • request for valid Dryad page, using if-modified-since header that has date later than page modification date: 304 Not Modified without page content
  • request for non-existent Dryad page, using if-modified-since header (with any date): 404 Not Found with possible page content
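
The five cases above reduce to a small decision function. This is an illustrative Python sketch of the expected status codes, not the server code. The function name `expected_status` is hypothetical, and the handling of exactly equal dates is an assumption, since the list only covers strictly earlier and strictly later dates.

```python
from datetime import datetime
from typing import Optional


def expected_status(page_modified: Optional[datetime],
                    if_modified_since: Optional[datetime] = None) -> int:
    """Return the status code the list above describes.

    page_modified is None for a non-existent page;
    if_modified_since is None when the header is absent.
    """
    if page_modified is None:
        return 404  # the header, if present, makes no difference
    if if_modified_since is None:
        return 200
    # Assumption: equal timestamps count as "not modified" (usual HTTP semantics).
    if page_modified <= if_modified_since:
        return 304
    return 200
```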

To test the if-modified-since behavior, use a command like:

curl --silent --head --header 'If-Modified-Since: Tue, 15 Oct 2013 22:31:10 GMT' http://localhost:2222/resource/doi:10.5061/dryad.7d6k3
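
For scripted checks, an equivalent conditional request can be built with Python's standard library. This sketch only constructs the request and does not send it. `build_conditional_head` is a hypothetical helper name, and the URL is the local test address from the command above.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from urllib.request import Request


def build_conditional_head(url: str, since: datetime) -> Request:
    """Build a HEAD request carrying an If-Modified-Since header."""
    # format_datetime(usegmt=True) needs a timezone-aware UTC datetime.
    stamp = format_datetime(since.astimezone(timezone.utc), usegmt=True)
    return Request(url, method="HEAD", headers={"If-Modified-Since": stamp})


req = build_conditional_head(
    "http://localhost:2222/resource/doi:10.5061/dryad.7d6k3",
    datetime(2013, 10, 15, 22, 31, 10, tzinfo=timezone.utc),
)
```

Passing `req` to `urllib.request.urlopen` sends the request. Note that `urlopen` raises `urllib.error.HTTPError` for both 304 and 404 responses, so the status code has to be read from the exception's `code` attribute in those cases.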

Relation to DSpace

Dryad has implemented the CLOCKSS manifest pages as a modification to DSpace's normal sitemap/htmlmap feature. The relevant classes are:

  • dspace/modules/api/src/main/java/org/dspace/app/sitemap/HTMLSitemapGenerator.java
  • dspace/modules/api/src/main/java/org/dspace/app/sitemap/GenerateSitemaps.java