Embargo Technology

From Dryad wiki

Overview

The embargo system restricts access to bitstreams. It does not affect metadata. When an embargo is in effect, only curators and administrators can access bitstream content.

Functionality

When an item is under embargo, URLs that point to its bitstreams work only if the user is logged in to a privileged account. The web pages for data packages and data files do not display links to the bitstreams. However, metadata about the bitstreams may still be available through some mechanisms.

Embargo information is stored in metadata, but the embargo itself is enforced by the access control rules (resource policies). Because these are two separate records, the metadata and the access control rules can become out of sync.
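A minimal sketch of what "out of sync" can mean, comparing the metadata lift date with the state of the access control rules. The function name, the has_public_read flag, and the state labels are hypothetical; treating a date equal to today as already passed is also an assumption.

```python
from datetime import date
from typing import Optional


def embargo_state(embargoed_until: Optional[date], has_public_read: bool, today: date) -> str:
    """Classify an item by comparing metadata with access control.

    embargoed_until: parsed dc.date.embargoedUntil, or None if the field is empty.
    has_public_read: hypothetical flag, True if anonymous users hold a READ
    policy on the item's bitstreams.
    """
    if embargoed_until is None:
        # No lift date: normal for untilArticleAppears before a curator sets one.
        return "open" if has_public_read else "restricted-no-date"
    if embargoed_until > today:
        # Metadata says the embargo is still running.
        return "mismatch" if has_public_read else "embargoed"
    # The date has passed (assumed: a date equal to today counts as passed).
    return "open" if has_public_read else "pending-lift"
```

The "mismatch" case is the dangerous one: the bitstreams are readable even though the metadata promises an embargo.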

The embargo lifter process runs as a nightly cron job. It lifts the access control restrictions on any items whose embargoedUntil date has passed. However, it does *not* restore embargoes, so it cannot be used to correct an embargo that was lifted incorrectly.
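The one-directional behaviour of the lifter can be illustrated with a small model. This is a sketch, not the EmbargoManager code: the Item class, its restricted flag, and lift_expired are hypothetical stand-ins for items and their READ policies.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Item:
    handle: str
    embargoed_until: Optional[date]  # dc.date.embargoedUntil
    restricted: bool                 # True while READ policies are withheld


def lift_expired(items: List[Item], today: date) -> List[str]:
    """One pass of a hypothetical nightly lifter; returns the handles lifted.

    Access is only ever opened: an item whose date lies in the future is left
    alone even if it is currently unrestricted, mirroring the fact that the
    real lifter does not restore embargoes.
    """
    lifted = []
    for item in items:
        if item.restricted and item.embargoed_until is not None and item.embargoed_until <= today:
            item.restricted = False  # stands in for re-adding READ policies
            lifted.append(item.handle)
    return lifted
```

An item that was opened too early (future date, restricted=False) passes through untouched; fixing it requires a curator to set the embargo again.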

Workflow

In Submission System

When a user uploads a data file, they can select an embargo setting. This setting is stored only in metadata, in dc.type.embargo. Possible values are:

  • (blank = no embargo)
  • untilArticleAppears
  • oneyear
  • custom
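Code that reads dc.type.embargo has to handle exactly these values, including the blank one. A minimal sketch, assuming the values are stored verbatim as listed; the EmbargoType enum and parse_embargo_type are hypothetical names.

```python
from enum import Enum
from typing import Optional


class EmbargoType(Enum):
    NONE = ""                                     # blank = no embargo
    UNTIL_ARTICLE_APPEARS = "untilArticleAppears"
    ONE_YEAR = "oneyear"
    CUSTOM = "custom"


def parse_embargo_type(value: Optional[str]) -> EmbargoType:
    """Map a raw dc.type.embargo value to an EmbargoType.

    A missing or blank value means no embargo; anything unrecognised raises
    ValueError rather than being silently treated as open.
    """
    raw = (value or "").strip()
    try:
        return EmbargoType(raw)
    except ValueError:
        raise ValueError(f"unexpected dc.type.embargo value: {value!r}") from None
```

Failing loudly on unknown values is a deliberate choice here: treating a typo as "no embargo" would expose data that should be restricted.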

When the item is archived in the repository, the access control restrictions are set and any embargo dates are updated. (Usually, however, there is no lift date for the system to process at this point.)

When the associated article is published, curators go to the administrative Item Embargo page and set the correct ending date for the embargo.

Command-line (and cron) Tools

/opt/dryad/bin/dspace embargo-lifter

With no options, this command lifts any embargoes that need to be lifted. It uses the class org.dspace.embargo.EmbargoManager. More options for the embargo lifter are below:

usage: org.dspace.embargo.EmbargoManager
-c,--check         Function: ONLY check the state of embargoed Items, do
                   NOT lift any embargoes.
-h,--help          help
-i,--identifier    Process ONLY this Handle identifier(s), which must be
                   an Item.  Can be repeated.
-l,--lift          Function: ONLY lift embargoes, do NOT check the state
                   of any embargoed Items.
-n,--dryrun        Do not change anything in the data model, print
                   message instead.
-q,--quiet         Do not print anything except for errors.
-v,--verbose       Print a line describing action taken for each
                   embargoed Item found.
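When the lifter is driven from another script, it helps to build the command from the documented flags rather than a hand-typed string. A sketch, assuming the /opt/dryad/bin/dspace path shown on this page; lifter_command and run_lifter are hypothetical helpers, and the flags come only from the usage text above.

```python
import subprocess
from typing import List, Sequence

DSPACE = "/opt/dryad/bin/dspace"  # launcher path given on this page


def lifter_command(handles: Sequence[str] = (), check_only: bool = False,
                   dry_run: bool = False, verbose: bool = False) -> List[str]:
    """Build an argv list for the embargo lifter using only documented flags."""
    cmd = [DSPACE, "embargo-lifter"]
    if check_only:
        cmd.append("--check")
    if dry_run:
        cmd.append("--dryrun")
    if verbose:
        cmd.append("--verbose")
    for handle in handles:
        # -i/--identifier may be repeated, once per Handle.
        cmd.extend(["--identifier", handle])
    return cmd


def run_lifter(**kwargs) -> int:
    """Run the lifter and return its exit status (only works on a Dryad host)."""
    return subprocess.run(lifter_command(**kwargs)).returncode
```

For example, run_lifter(handles=["123456789/1"], check_only=True, dry_run=True, verbose=True) inspects a single (hypothetical) Handle without changing the data model.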

Embargo Validator Script

The embargo validator runs nightly as a Nagios process. To run it manually, log in to the dev server and run:

/home/dryad/embargo_validator/check_and_mail.sh

The source for the validator is available at https://github.com/datadryad/embargo-validator. It is a Python script and can be run manually without mailing any results.

Embargo Validator Process

When the embargo validator runs, it performs the following steps:

  1. Query Solr for all data files with an embargo date (dc.date.embargoedUntil) in the future and return their DOIs
  2. For each embargoed Data File:
    1. Fetch the DRI metadata
    2. Extract the METS metadata URL from the DRI
    3. Fetch the METS metadata
    4. Look for bitstream links and future embargo dates in the METS metadata
    5. Attempt to download any bitstream links for embargoed items
    6. Report on found links and download successes
  3. Fetch the Recently Published Data RSS feed to get DOIs of data packages
    1. Fetch the package DRI/METS metadata to get the METS URLs of its data files
      1. Check each data file as above. Note that these files should not be embargoed, but the steps are the same
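Steps 2.4 and 2.6 can be sketched with the standard library XML parser. This is not the validator's code: it assumes bitstream links appear as mets:FLocat/@xlink:href and that the embargo date is carried as a DSpace DIM field (dc.date.embargoedUntil) inside the METS document. It also simplifies step 2.5 by treating any link on an embargoed file as a finding instead of trying to download it. inspect_mets and report are hypothetical names.

```python
import xml.etree.ElementTree as ET
from datetime import date
from typing import List, Tuple

NS = {
    "mets": "http://www.loc.gov/METS/",
    "xlink": "http://www.w3.org/1999/xlink",
    "dim": "http://www.dspace.org/xmlns/dspace/dim",
}
HREF = "{http://www.w3.org/1999/xlink}href"


def inspect_mets(mets_xml: str, today: date) -> Tuple[List[str], List[date]]:
    """Return (bitstream links, future embargo dates) found in a METS document."""
    root = ET.fromstring(mets_xml)
    # Assumed location of bitstream links.
    links = [loc.get(HREF) for loc in root.iterfind(".//mets:FLocat", NS) if loc.get(HREF)]
    future = []
    for field in root.iterfind(".//dim:field", NS):
        key = (field.get("mdschema"), field.get("element"), field.get("qualifier"))
        if key != ("dc", "date", "embargoedUntil"):
            continue
        text = (field.text or "").strip()
        if not text:
            continue
        when = date.fromisoformat(text[:10])  # keep only YYYY-MM-DD
        if when > today:
            future.append(when)
    return links, future


def report(doi: str, links: List[str], future: List[date]) -> str:
    """Flag a file that is embargoed in metadata yet still exposes links."""
    if future and links:
        return f"LEAK {doi}: {len(links)} bitstream link(s) despite embargo until {min(future)}"
    return f"OK {doi}"
```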

On Administrative Pages

The form for editing embargoes is EditItemEmbargoForm.java.

  • The processing for this form occurs in FlowItemUtils.java

When an Embargo is Set/Updated

  1. Existing metadata is removed from dc.date.embargoedUntil
  2. The new lift date is entered in dc.date.embargoedUntil
  3. The embargo is set by the DSpace DefaultEmbargoSetter
    1. For each bundle and each bitstream, all READ policies are removed. At the database level, this is "DELETE FROM resourcepolicy WHERE resource_type_id=<<type of the item>> AND resource_id=<<item_id of the item>> AND action_id=0"
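The sequence above can be reproduced against a throwaway database to see which rows the DELETE touches. A sketch under stated assumptions: it uses an in-memory SQLite table reduced to the columns named in the query plus a hypothetical policy_id key, a plain dict stands in for the item's metadata, and the caller supplies the (resource_type_id, resource_id) pair for every bundle and bitstream. set_embargo is a hypothetical name, not a DSpace method.

```python
import sqlite3
from datetime import date
from typing import Dict, List, Tuple

READ = 0  # action_id used in the DELETE on this page


def set_embargo(conn: sqlite3.Connection, metadata: Dict[str, List[str]],
                lift_date: date, objects: List[Tuple[int, int]]) -> int:
    """Mimic the three steps above; returns the number of policies deleted."""
    # Steps 1-2: replacing the list clears the old value and enters the new one.
    metadata["dc.date.embargoedUntil"] = [lift_date.isoformat()]
    # Step 3: remove READ policies object by object; other actions survive.
    deleted = 0
    for type_id, res_id in objects:
        cur = conn.execute(
            "DELETE FROM resourcepolicy WHERE resource_type_id = ? "
            "AND resource_id = ? AND action_id = ?",
            (type_id, res_id, READ),
        )
        deleted += cur.rowcount
    conn.commit()
    return deleted
```

Note that only the metadata date and the READ rows change; nothing here schedules the lift, which is left to the nightly lifter.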

Configuration

Although the dspace.cfg contains configuration settings for embargo, these settings cannot be changed without careful analysis of the associated code. The Dryad-specific embargo code assumes that specific configuration settings are in place. If the configuration is changed without changing the associated code, embargoes will not work as expected.

Relation to DSpace

Dryad's implementation of embargo is a customization of the DSpace embargo system. See the Embargo section of the DSpace manual for more details.

Design Documents