CLOCKSS Technology
Latest revision as of 09:00, 12 November 2014
STATUS UPDATE, November 2014: Dryad's content is not currently being replicated through the CLOCKSS network. A CLOCKSS working group is developing a new and separate set of guidelines for databases; when these are in place, it may be possible for Dryad to enter into a new agreement with CLOCKSS.
________
Dryad's content will be replicated through the CLOCKSS network.
CLOCKSS will crawl the Dryad site and harvest publicly viewable content. Data files are replicated in CLOCKSS only after any embargo has expired.
Manifest Pages
Each quarter, Dryad publishes a set of manifest pages that list content newly available in Dryad.
A command-line process generates each manifest page. The manifest page is static and contains links to all "new" data packages. A data package counts as new if it meets at least one of the following conditions:
- The package was archived since the previous manifest page was generated.
- The package, or one of its constituent files, was modified since the previous manifest page was generated.
- At least one file in the package has come out of embargo status since the previous manifest page was generated.
- A new version of the package has become available since the previous manifest page was generated.
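The four conditions above reduce to one test: did any relevant event happen after the previous manifest page was generated? The following is a minimal sketch of that test, not Dryad's actual implementation (which lives in the Java classes listed under Relation to DSpace). The function name is_new_package, its argument order, and the use of epoch-second timestamps with 0 meaning "never happened" are all assumptions made for illustration.

```shell
# Hypothetical helper, not part of Dryad or DSpace.
# Usage: is_new_package LAST_MANIFEST ARCHIVED MODIFIED EMBARGO_LIFTED NEW_VERSION
# All arguments are epoch seconds; pass 0 for an event that never happened.
# Returns 0 (true) if at least one event is later than LAST_MANIFEST.
is_new_package() {
  last_manifest=$1
  shift
  for event_time in "$@"; do
    # Any single qualifying event is enough to list the package.
    if [ "$event_time" -gt "$last_manifest" ]; then
      return 0
    fi
  done
  return 1
}
```

For example, is_new_package 1000 900 1100 0 0 succeeds because the package was modified (at 1100) after the previous manifest (at 1000), even though it was archived earlier.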
To generate a new manifest page:
/opt/dryad/bin/generate-sitemaps
A summary manifest page is available from http://datadryad.org/htmlmap. This page contains a link to each of the individual manifest pages previously generated.
if-modified-since behavior
Dryad pages honor the HTTP If-Modified-Since request header. The possible request types and responses are:
- request for valid Dryad page, without if-modified-since header: 200 OK with page content
- request for non-existent Dryad page, without if-modified-since header: 404 Not Found with possible page content
- request for valid Dryad page, using if-modified-since header that has date earlier than page modification date: 200 OK with page content
- request for valid Dryad page, using if-modified-since header that has date later than page modification date: 304 Not Modified without page content
- request for non-existent Dryad page, using if-modified-since header (with any date): 404 Not Found with possible page content
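The cases above can be condensed into a small decision function, which may be useful as a reference when checking responses. This is only a sketch of the listed behavior, not code from Dryad: the name expected_status and its arguments are hypothetical, and dates are passed as epoch seconds to avoid date parsing. The list does not say what happens when the two dates are equal; the sketch returns 304 in that case, following general HTTP semantics, which is an assumption about Dryad.

```shell
# Hypothetical helper, not part of Dryad.
# Usage: expected_status EXISTS [SINCE MODIFIED]
#   EXISTS    "yes" if the page exists, anything else otherwise
#   SINCE     date sent in If-Modified-Since, in epoch seconds (omit if no header)
#   MODIFIED  page modification date, in epoch seconds (required when SINCE is given)
expected_status() {
  exists=$1
  since=${2:-}
  modified=${3:-}
  if [ "$exists" != "yes" ]; then
    # Non-existent pages return 404 with or without the header.
    echo 404
  elif [ -z "$since" ]; then
    # No If-Modified-Since header: always send the page.
    echo 200
  elif [ "$modified" -gt "$since" ]; then
    # Page changed after the date in the header: send the page.
    echo 200
  else
    # Page unchanged since the header date (equal dates assumed to count as unchanged).
    echo 304
  fi
}
```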
To test the if-modified-since behavior, use a command like:
curl --silent --head --header 'If-Modified-Since: Tue, 15 Oct 2013 22:31:10 GMT' http://localhost:2222/resource/doi:10.5061/dryad.7d6k3
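When checking several cases in a row, it can be easier to print only the status code instead of the full response headers. The wrapper below is a sketch built on the command above; the function name http_status is hypothetical, and it relies on curl's --output and --write-out options to discard the headers and print the numeric status.

```shell
# Hypothetical wrapper around the curl command shown above.
# Usage: http_status URL [IF_MODIFIED_SINCE_DATE]
# Prints only the HTTP status code returned for a HEAD request.
http_status() {
  url=$1
  since=${2:-}
  if [ -n "$since" ]; then
    # Send the conditional header when a date is supplied.
    curl --silent --head --output /dev/null --write-out '%{http_code}' \
      --header "If-Modified-Since: $since" "$url"
  else
    # Plain request without the conditional header.
    curl --silent --head --output /dev/null --write-out '%{http_code}' "$url"
  fi
}
```

Calling http_status with the URL from the command above and no date should print 200 for a valid page. Adding a date later than the page's modification date as the second argument should print 304.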
Relation to DSpace
Dryad has implemented the CLOCKSS manifest pages as a modification to DSpace's normal sitemap/htmlmap feature. The relevant classes are:
- dspace/modules/api/src/main/java/org/dspace/app/sitemap/HTMLSitemapGenerator.java
- dspace/modules/api/src/main/java/org/dspace/app/sitemap/GenerateSitemaps.java