Workflow State in Database

Overview
This page describes how the progress of a submitted item is reflected in the Postgres database tables. This happens in two phases: the item is in the workspace while the submitter is editing it, and in the workflow while a curator is reviewing the submission. It may be useful to refer to the DSpace database schema (the link goes to the page for DSpace 1.7). The Extreme Curation Techniques page has some information on how to manipulate these tables if a submission gets broken.

State of item in workspace
When the user starts a Dryad submission, a row in the workspaceitem table is created. The workspaceitem row specifies the...

Steps
Progress of a user submission through the workspace consists of a series of steps. These are defined in (TBD). As of Dryad 1.11, these are:


 * DescribePublicationStep
 * SelectPublicationStep
 * CompletedPublicationStep
 * CompletedDataStep

These are the database changes associated with each step.

CompletedDataStep
The item first appears when the submitter completes the publication metadata page.

At this point, the item is represented in the item and workspaceitem tables. The item row specifies the item_id and submitter_id, with in_archive and withdrawn both false; owning_collection is null. The workspaceitem row specifies the workspace_item_id, the item_id, and the collection_id (which appears to be a default). Both stage_reached and page_reached are set to 1. The multiple_titles, published_before, and multiple_files fields are null.

After the file specification page is completed (and the submission overview page is displayed), the stage_reached field in the workspaceitem row is set to 3, and page_reached remains 1.
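
To see the two documented workspace updates in isolation, they can be replayed against an in-memory SQLite table standing in for Postgres. This is a minimal sketch: it assumes a workspaceitem table with only the columns named above, and the INSERT/UPDATE statements only reproduce the resulting row state; they are not the SQL DSpace itself issues. The helper `workspace_progress` is hypothetical, and the IDs are taken from the workspace sample further down this page.

```python
import sqlite3

# In-memory stand-in for the Postgres workspaceitem table (only the columns discussed above).
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE workspaceitem (
           workspace_item_id INTEGER PRIMARY KEY,
           item_id           INTEGER NOT NULL,
           collection_id     INTEGER,
           multiple_titles   BOOLEAN,
           published_before  BOOLEAN,
           multiple_files    BOOLEAN,
           stage_reached     INTEGER,
           page_reached      INTEGER)"""
)


def workspace_progress(conn, workspace_item_id):
    """Return (stage_reached, page_reached) for one workspace item, or None if absent."""
    return conn.execute(
        "SELECT stage_reached, page_reached FROM workspaceitem WHERE workspace_item_id = ?",
        (workspace_item_id,),
    ).fetchone()


# Publication metadata page completed: stage 1, page 1; the three flags stay NULL.
conn.execute(
    "INSERT INTO workspaceitem (workspace_item_id, item_id, collection_id, stage_reached, page_reached)"
    " VALUES (?, ?, ?, 1, 1)",
    (49425, 48308, 1),
)
print(workspace_progress(conn, 49425))  # (1, 1)

# File specification page completed: stage_reached becomes 3, page_reached stays 1.
conn.execute("UPDATE workspaceitem SET stage_reached = 3 WHERE workspace_item_id = ?", (49425,))
print(workspace_progress(conn, 49425))  # (3, 1)
```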

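A workspaceitem row can also have a stage_reached of 4 (one appears in the workspace sample below); what that stage means is still an open question.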

When the submitter submits the item, its workspaceitem row is removed and a workflowitem row is added.

State of an item in the workflow
When an item is submitted for curation, a row in the workflowitem table is created. After the corresponding values are copied from the workspaceitem row, that row is deleted. Workflowitem rows are created for the package and for each file item that is part of the package. In addition, a row in the taskowner table is created for the package (but not for individual data files). The taskowner table, which is not part of the standard DSpace schema (and therefore does not appear in the schema diagram linked above), contains the following columns:
 * taskowner_id - row identifier
 * workflow_item_id - id of the workflow_item
 * step_id - string specifying the step (see below)
 * action_id - string specifying the action for the step (needs discussion)
 * workflow_id - string specifying the workflow (seems to be 'default' by default)
 * owner_id - id of the eperson owning the task (e.g., the curator who claimed it)
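
The hand-off from workspace to workflow can be sketched as a single transaction: copy the shared columns into workflowitem, then delete the workspaceitem row. This is a minimal SQLite illustration, assuming simplified versions of both tables; `move_to_workflow` is a hypothetical helper, not DSpace code. Defaulting the three NULL flags to false mirrors the sample workflowitem rows on this page but is an assumption about how DSpace fills them, and the taskowner row is left out.

```python
import sqlite3

# Simplified stand-ins for the two tables; column lists follow the sample rows on this page.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE workspaceitem (
        workspace_item_id INTEGER PRIMARY KEY, item_id INTEGER, collection_id INTEGER,
        multiple_titles BOOLEAN, published_before BOOLEAN, multiple_files BOOLEAN,
        stage_reached INTEGER, page_reached INTEGER);
    CREATE TABLE workflowitem (
        workflow_id INTEGER PRIMARY KEY, item_id INTEGER, collection_id INTEGER,
        state INTEGER, owner INTEGER,
        multiple_titles BOOLEAN, published_before BOOLEAN, multiple_files BOOLEAN);
    """
)


def move_to_workflow(conn, workspace_item_id):
    """Copy one workspaceitem row into workflowitem, then delete it, atomically."""
    with conn:  # commits on success, rolls back if anything raises
        cur = conn.execute(
            """INSERT INTO workflowitem
                   (item_id, collection_id, multiple_titles, published_before, multiple_files)
               SELECT item_id, collection_id,
                      COALESCE(multiple_titles, 0), COALESCE(published_before, 0),
                      COALESCE(multiple_files, 0)
               FROM workspaceitem WHERE workspace_item_id = ?""",
            (workspace_item_id,),
        )
        if cur.rowcount != 1:
            raise LookupError(f"no workspaceitem {workspace_item_id}")
        conn.execute("DELETE FROM workspaceitem WHERE workspace_item_id = ?", (workspace_item_id,))


conn.execute(
    "INSERT INTO workspaceitem (workspace_item_id, item_id, collection_id, stage_reached, page_reached)"
    " VALUES (49424, 48307, 2, 4, 1)"
)
move_to_workflow(conn, 49424)
print(conn.execute("SELECT item_id, collection_id, state, owner, multiple_files FROM workflowitem").fetchall())
# [(48307, 2, None, None, 0)]
```

state and owner stay NULL, matching the sample workflowitem rows.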

Another table, tasklistitem, holds many of the same fields as taskowner. However, tasklistitem is not used for items in the review stage; it appears to be used (only?) for items in the task pool that have not yet been claimed by a curator.

A final table, workflowitemrole, does not seem to be used in Dryad.

Steps
The workflow consists of a series of steps and actions. Steps are states defined by name in {dspace-dir}/config/workflow.xml. Current steps are:


 * requiresReviewStep
 * reviewStep
 * dryadAcceptEditReject
 * finalPaymentStep
 * reAuthorizationPaymentStep
 * pendingPublicationFinalPaymentStep
 * pendingPublicationReauthorizationPaymentStep
 * registerPendingPublicationStep
 * pendingPublicationStep
 * pendingDelete

Steps may have one or more actions. Actions are Java classes that do the work of the workflow system. Actions have outcomes (integer return values) that the step can use to route the item to an alternate step. Steps can also have a userSelectionMethod that defines how the step is activated (e.g., claimed by a user or triggered automatically).
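
The step/action/outcome relationship can be modelled in a few lines. This is a hypothetical in-memory model, not the structure of workflow.xml: the `Step` class, the `routes` mapping, the selection-method label, and the outcome numbers are invented for illustration, while the step and action names come from this page.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Step:
    """Hypothetical in-memory model of a step as declared in workflow.xml."""

    name: str
    actions: list[str]
    user_selection_method: str  # illustrative label, e.g. "claim" or "automatic"
    routes: dict[int, str] = field(default_factory=dict)  # action outcome -> next step

    def next_step(self, outcome: int) -> str | None:
        """Return the step an outcome routes to, or None if no alternate route is configured."""
        return self.routes.get(outcome)


# Outcome numbers below are made up; the step and action names come from this page.
ARCHIVE, BLACKOUT = 0, 1
accept_edit_reject = Step(
    name="dryadAcceptEditReject",
    actions=["dryadAcceptEditRejectAction"],
    user_selection_method="claim",
    routes={ARCHIVE: "finalPaymentStep", BLACKOUT: "pendingPublicationFinalPaymentStep"},
)
print(accept_edit_reject.next_step(BLACKOUT))  # pendingPublicationFinalPaymentStep
```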

These are the database changes associated with each step:

requiresReviewStep
The curator decides whether the submission requires review. This step reads and then deletes any metadata tagged with "workflow.submit.skipReviewStage".
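
The read-then-delete behaviour amounts to popping the field from the item's metadata. This sketch assumes metadata is modelled as a dict from field name to a list of values (in the database these are metadatavalue rows); both `take_skip_review_flag` and the value "true" are hypothetical.

```python
from __future__ import annotations

SKIP_REVIEW = "workflow.submit.skipReviewStage"


def take_skip_review_flag(metadata: dict[str, list[str]]) -> list[str]:
    """Return every value stored under SKIP_REVIEW and delete the field from the item."""
    return metadata.pop(SKIP_REVIEW, [])


# Metadata is modelled as field name -> list of values; "true" is an illustrative value.
item_metadata = {"dc.title": ["Example package"], SKIP_REVIEW: ["true"]}
print(take_skip_review_flag(item_metadata))  # ['true']
print(SKIP_REVIEW in item_metadata)  # False
```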

reviewStep
The activate action adds a reviewer key to the metadata (workflow.step.reviewerkey, data = <uuid>). It reads metadata from workflow.review.mailUsers to find reviewers, other than curators, who should review the submission.

The execute action checks for a workflow.step.approved metadata entry before passing the workflow item on to the next step.
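
The two reviewStep actions can be sketched against a dict-of-lists metadata model (an assumption; the real values live in metadatavalue rows). The function names are hypothetical, and the execute check only tests that a workflow.step.approved entry exists, since the page does not say what value it holds.

```python
from __future__ import annotations

import uuid

REVIEWER_KEY = "workflow.step.reviewerkey"
MAIL_USERS = "workflow.review.mailUsers"
APPROVED = "workflow.step.approved"


def activate_review(metadata: dict[str, list[str]]) -> tuple[str, list[str]]:
    """Activate: store a fresh UUID reviewer key and collect the non-curator reviewers."""
    key = str(uuid.uuid4())
    metadata.setdefault(REVIEWER_KEY, []).append(key)
    return key, list(metadata.get(MAIL_USERS, []))


def review_approved(metadata: dict[str, list[str]]) -> bool:
    """Execute: the item may move on once any workflow.step.approved entry exists."""
    return bool(metadata.get(APPROVED))


md = {MAIL_USERS: ["reviewer@example.org"]}
key, reviewers = activate_review(md)
print(reviewers, review_approved(md))  # ['reviewer@example.org'] False
md[APPROVED] = ["true"]
print(review_approved(md))  # True
```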

dryadAcceptEditReject
The curator can accept or reject the submission at this step. The dryadAcceptEditRejectAction is invoked to handle the curator's action.

The item can be accepted for archiving. Choosing the archive option adds a dc.description.provenance metadata record with data "Approved for entry into archive by <curator> on <time>" to the item's metadata (metadatavalue table). Upon acceptance, the item is sent to finalPaymentStep, which checks whether payment should be collected.

The item can instead be accepted and sent to publication blackout. This option adds a dc.description.provenance record with data "Entered publication blackout by <curator> on <time>". The item is then sent to pendingPublicationFinalPaymentStep.

If the item is rejected, a dc.description.provenance record with data "Rejected by <curator>, reason: <reason> on <date/time>" is added to the item. The item is returned to the workspace (a new workspaceitem row is created and filled from the workflowitem row, which is then deleted).
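
The three provenance messages follow fixed templates, which can be written as small formatters. The function names are hypothetical, and how DSpace renders <curator> (name or email) and the timestamp is not specified here, so both are passed in as plain strings.

```python
def approved_provenance(curator: str, when: str) -> str:
    """Text added when the curator chooses the archive option."""
    return f"Approved for entry into archive by {curator} on {when}"


def blackout_provenance(curator: str, when: str) -> str:
    """Text added when the curator sends the item to publication blackout."""
    return f"Entered publication blackout by {curator} on {when}"


def rejected_provenance(curator: str, reason: str, when: str) -> str:
    """Text added when the curator rejects the item."""
    return f"Rejected by {curator}, reason: {reason} on {when}"


stamp = "2013-06-18T19:45:25Z"  # illustrative timestamp format
print(rejected_provenance("Jane Curator", "missing README", stamp))
# Rejected by Jane Curator, reason: missing README on 2013-06-18T19:45:25Z
```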

If the journal is not configured for publication blackout, there are no more steps and the item is archived.

finalPaymentStep / pendingPublicationFinalPaymentStep
These steps both invoke the FinalPaymentAction to determine how payment will be handled for the submission. They are identical except that pendingPublicationFinalPaymentStep is part of the blackout workflow.

If the submission is attached to a waiver country, a voucher, or a journal with a subscription, the step is complete. If the submission will be paid by an individual, this action uses the PaypalService to complete the transaction.

If the transaction does not complete, the next step is reAuthorizationPaymentStep / pendingPublicationReauthorizationPaymentStep.
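
The FinalPaymentAction decision described above reduces to a short routing function. This is a sketch under assumptions: the `Submission` fields, `final_payment_outcome`, and the `charge_individual` callback (a stand-in for the PaypalService transaction) are all hypothetical, and returning "complete" simply means the step finishes without re-authorization.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable

STEP_COMPLETE = "complete"


@dataclass
class Submission:
    """Hypothetical view of the facts the payment step consults."""

    waiver_country: bool = False
    voucher: str | None = None
    journal_has_subscription: bool = False


def final_payment_outcome(
    sub: Submission, charge_individual: Callable[[Submission], bool], blackout: bool
) -> str:
    """Return STEP_COMPLETE, or the re-authorization step to route to if payment fails."""
    if sub.waiver_country or sub.voucher or sub.journal_has_subscription:
        return STEP_COMPLETE  # the charge is covered elsewhere; nothing to collect
    if charge_individual(sub):  # stand-in for the PaypalService transaction
        return STEP_COMPLETE
    if blackout:
        return "pendingPublicationReauthorizationPaymentStep"
    return "reAuthorizationPaymentStep"


def declined(_sub: Submission) -> bool:
    """Fake payment callback that always fails."""
    return False


print(final_payment_outcome(Submission(voucher="ABC123"), declined, blackout=False))  # complete
print(final_payment_outcome(Submission(), declined, blackout=True))
# pendingPublicationReauthorizationPaymentStep
```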

reAuthorizationPaymentStep / pendingPublicationReauthorizationPaymentStep
These steps both invoke the ReAuthorizationPaymentAction and use ReAuthorizationPaymentActionXMLUI to display a user interface for re-authorizing payment.

registerPendingPublicationStep
This step handles items that a curator approved with the blackout option. Items that enter this step are immediately processed by registerPendingPublicationAction. This action adds a dc.description.provenance metadata record and a dc.date.accessioned metadata record with the current time. The item's DOI is registered with limited metadata (since the item is in blackout), and the item moves on through the workflow to pendingPublicationStep.
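
The metadata half of registerPendingPublicationAction can be sketched with a dict-of-lists model of item metadata (an assumption). `register_pending_publication` is a hypothetical name, the provenance text is passed in because the page does not give it, the timestamp format is an assumed ISO 8601 UTC form, and DOI registration is not modelled.

```python
from __future__ import annotations

from datetime import datetime, timezone


def register_pending_publication(
    metadata: dict[str, list[str]], provenance_text: str, now: datetime | None = None
) -> None:
    """Append a provenance note and a dc.date.accessioned stamp for the current time."""
    now = now or datetime.now(timezone.utc)
    metadata.setdefault("dc.description.provenance", []).append(provenance_text)
    # ISO 8601 in UTC; the exact format DSpace stores is an assumption here.
    metadata.setdefault("dc.date.accessioned", []).append(
        now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    )


md = {}
register_pending_publication(
    md, "illustrative provenance note", datetime(2013, 6, 18, 19, 45, 25, tzinfo=timezone.utc)
)
print(md["dc.date.accessioned"])  # ['2013-06-18T19:45:25Z']
```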

pendingPublicationStep
This is the last step of the publication blackout workflow. Only items that have gone through registerPendingPublicationStep will arrive here. Items in this step are in Publication Blackout, waiting for a curator to claim them. This step uses the afterPublicationAction to display a UI, allowing the curator to move the item from blackout into the archive.

When the curator approves the item, a dc.description.provenance metadata record with data "Approved for entry into archive by <curator> on <time>" is added to the item's metadata (metadatavalue table). The item's DOI registration is updated with full metadata, and the item is archived.

pendingDelete
This step handles the case where the item was rejected by reviewers. It adds a dc.description.provenance metadata record with data "<startid> rejected by <curator or null> reason: rejected by reviewers on <date/time>". It then pushes the item back into the submitter's workspace.
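
Returning an item to the workspace is the reverse of submission: a workspaceitem row is rebuilt from the workflowitem row, which is then deleted. This is a minimal SQLite sketch, assuming simplified tables and a hypothetical `return_to_workspace` helper; stage_reached and page_reached are left NULL because the page does not say what values DSpace restores. The sample row reuses IDs from the claimed-workflow sample below.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE workspaceitem (
        workspace_item_id INTEGER PRIMARY KEY, item_id INTEGER, collection_id INTEGER,
        multiple_titles BOOLEAN, published_before BOOLEAN, multiple_files BOOLEAN,
        stage_reached INTEGER, page_reached INTEGER);
    CREATE TABLE workflowitem (
        workflow_id INTEGER PRIMARY KEY, item_id INTEGER, collection_id INTEGER,
        state INTEGER, owner INTEGER,
        multiple_titles BOOLEAN, published_before BOOLEAN, multiple_files BOOLEAN);
    INSERT INTO workflowitem VALUES (9252, 48307, 2, NULL, NULL, 0, 0, 0);
    """
)


def return_to_workspace(conn, workflow_id):
    """Recreate a workspaceitem row from a workflowitem row, then delete the latter."""
    with conn:  # commits on success, rolls back if anything raises
        cur = conn.execute(
            """INSERT INTO workspaceitem
                   (item_id, collection_id, multiple_titles, published_before, multiple_files)
               SELECT item_id, collection_id, multiple_titles, published_before, multiple_files
               FROM workflowitem WHERE workflow_id = ?""",
            (workflow_id,),
        )
        if cur.rowcount != 1:
            raise LookupError(f"no workflowitem {workflow_id}")
        conn.execute("DELETE FROM workflowitem WHERE workflow_id = ?", (workflow_id,))


return_to_workspace(conn, 9252)
print(conn.execute("SELECT item_id, collection_id, stage_reached FROM workspaceitem").fetchall())
# [(48307, 2, None)]
```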

General Discussion
The new workflowitem row has workflow_id, item_id, and collection_id specified. The fields multiple_titles, published_before, and multiple_files are set, while the state and owner fields are null. The item row appears to be unaltered at this point.

When the item is approved, the workflowitem row is deleted, and the item row is updated: in_archive is set true and the item's owning collection is set (it was previously null).
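
Approval can be sketched as one transaction that updates the item row and deletes the workflowitem row. This is a SQLite illustration with simplified tables and a hypothetical `approve` helper; it assumes the owning collection is taken from workflowitem.collection_id and stores true as 1, whereas Postgres uses a real boolean. The rows come from the unclaimed-workflow sample below.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE item (
        item_id INTEGER PRIMARY KEY, submitter_id INTEGER, in_archive BOOLEAN,
        withdrawn BOOLEAN, last_modified TEXT, owning_collection INTEGER);
    CREATE TABLE workflowitem (
        workflow_id INTEGER PRIMARY KEY, item_id INTEGER, collection_id INTEGER,
        state INTEGER, owner INTEGER,
        multiple_titles BOOLEAN, published_before BOOLEAN, multiple_files BOOLEAN);
    INSERT INTO item VALUES (59477, 4789, 0, 0, '2013-05-09 10:20:10.358-04', NULL);
    INSERT INTO workflowitem VALUES (16310, 59477, 2, NULL, NULL, 0, 0, 0);
    """
)


def approve(conn, workflow_id):
    """Archive the item behind a workflowitem row and drop that row, atomically."""
    with conn:  # commits on success, rolls back if anything raises
        row = conn.execute(
            "SELECT item_id, collection_id FROM workflowitem WHERE workflow_id = ?", (workflow_id,)
        ).fetchone()
        if row is None:
            raise LookupError(f"no workflowitem {workflow_id}")
        item_id, collection_id = row
        # Assumption: the owning collection is the workflowitem's collection_id.
        conn.execute(
            "UPDATE item SET in_archive = 1, owning_collection = ? WHERE item_id = ?",
            (collection_id, item_id),
        )
        conn.execute("DELETE FROM workflowitem WHERE workflow_id = ?", (workflow_id,))


approve(conn, 16310)
print(conn.execute("SELECT in_archive, owning_collection FROM item WHERE item_id = 59477").fetchone())
# (1, 2)
```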

Sample item -- in workspace
Database tables for a data package with one data file:

item =
 item_id | submitter_id | in_archive | withdrawn |       last_modified        | owning_collection
---------+--------------+------------+-----------+----------------------------+-------------------
   48308 |          137 | f          | f         | 2013-06-18 15:45:25.677-04 |
   48307 |          137 | f          | f         | 2013-06-18 15:45:25.674-04 |

workflowitem = none

workspaceitem =
 workspace_item_id | item_id | collection_id | multiple_titles | published_before | multiple_files | stage_reached | page_reached
-------------------+---------+---------------+-----------------+------------------+----------------+---------------+--------------
             49425 |   48308 |             1 |                 |                  |                |             3 |            1
             49424 |   48307 |             2 |                 |                  |                |             4 |            1

taskowner = none

Sample item -- in workflow (unclaimed)
item =
 item_id | submitter_id | in_archive | withdrawn |       last_modified        | owning_collection
---------+--------------+------------+-----------+----------------------------+-------------------
   59477 |         4789 | f          | f         | 2013-05-09 10:20:10.358-04 |
   59479 |         4789 | f          | f         | 2013-05-09 10:20:10.377-04 |

workflowitem =
 workflow_id | item_id | collection_id | state | owner | multiple_titles | published_before | multiple_files
-------------+---------+---------------+-------+-------+-----------------+------------------+----------------
       16310 |   59477 |             2 |       |       | f               | f                | f
       16311 |   59479 |             1 |       |       | f               | f                | f

workspaceitem = none

taskowner = none

Sample item -- in workflow (claimed)
Database tables for a data package with one data file:

item =
 item_id | submitter_id | in_archive | withdrawn |       last_modified        | owning_collection
---------+--------------+------------+-----------+----------------------------+-------------------
   48308 |          137 | f          | f         | 2013-06-18 15:13:41.291-04 |
   48307 |          137 | f          | f         | 2013-06-18 15:24:51.523-04 |

workflowitem =
 workflow_id | item_id | collection_id | state | owner | multiple_titles | published_before | multiple_files
-------------+---------+---------------+-------+-------+-----------------+------------------+----------------
        9253 |   48308 |             1 |       |       | f               | f                | f
        9252 |   48307 |             2 |       |       | f               | f                | f

workspaceitem = none

taskowner =
 taskowner_id | workflow_item_id |        step_id        |          action_id          | workflow_id | owner_id
--------------+------------------+-----------------------+-----------------------------+-------------+----------
         2739 |             9252 | dryadAcceptEditReject | dryadAcceptEditRejectAction | default     |        1

NOTE: taskowner *only* has an entry for the data package object.
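
The samples suggest a quick way to tell which phase an item is in: look for a workspaceitem row, then a workflowitem row and any taskowner row attached to it. This SQLite sketch loads the claimed sample above into trimmed-down tables; `item_phase` and its return strings are hypothetical, and it cannot tell an unclaimed package from a data file, since neither has a taskowner row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE item (item_id INTEGER PRIMARY KEY, in_archive BOOLEAN);
    CREATE TABLE workspaceitem (workspace_item_id INTEGER PRIMARY KEY, item_id INTEGER);
    CREATE TABLE workflowitem (workflow_id INTEGER PRIMARY KEY, item_id INTEGER);
    CREATE TABLE taskowner (
        taskowner_id INTEGER PRIMARY KEY, workflow_item_id INTEGER, step_id TEXT,
        action_id TEXT, workflow_id TEXT, owner_id INTEGER);
    -- Rows from the "in workflow (claimed)" sample above.
    INSERT INTO item VALUES (48308, 0), (48307, 0);
    INSERT INTO workflowitem VALUES (9253, 48308), (9252, 48307);
    INSERT INTO taskowner VALUES
        (2739, 9252, 'dryadAcceptEditReject', 'dryadAcceptEditRejectAction', 'default', 1);
    """
)


def item_phase(conn, item_id):
    """Classify an item as workspace, workflow (claimed or not), archived, or unknown."""
    if conn.execute("SELECT 1 FROM workspaceitem WHERE item_id = ?", (item_id,)).fetchone():
        return "workspace"
    task = conn.execute(
        """SELECT t.step_id, t.owner_id
           FROM workflowitem w LEFT JOIN taskowner t ON t.workflow_item_id = w.workflow_id
           WHERE w.item_id = ?""",
        (item_id,),
    ).fetchone()
    if task is not None:
        step_id, owner_id = task
        if step_id is None:
            return "workflow (no taskowner row)"
        return f"workflow ({step_id}, claimed by eperson {owner_id})"
    archived = conn.execute("SELECT in_archive FROM item WHERE item_id = ?", (item_id,)).fetchone()
    return "archived" if archived and archived[0] else "unknown"


print(item_phase(conn, 48307))  # workflow (dryadAcceptEditReject, claimed by eperson 1)
print(item_phase(conn, 48308))  # workflow (no taskowner row)
```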

Sample item -- accepted
????

Sample item -- in review
Package:
 * permissions:
    * item Read for COLLECTION_2_WORKFLOW_ROLE_curator
 * metadata:
    * workflow.step.reviewerKey = review code
    * workflow.archive.mailUsers = the proper list of email addresses (one in each occurrence of the field)
    * workflow.step.inProgressUsers = submitter's eperson ID
 * database:
    * taskowner has step_id=reviewStep, action_id=reviewAction, owner_id=submitter's eperson ID
    * workflowitem has normal workflow settings
    * For template SQL to add rows to these tables, see Extreme Curation Techniques.

File:
 * permissions:
    * item/bundle/bitstream Read for COLLECTION_2_WORKFLOW_ROLE_curator
 * metadata:
    * nothing special
 * database:
    * nothing special

Sample item -- in blackout
Package:
 * permissions: (none)
 * metadata:
    * dc.description.provenance includes "Entered publication blackout" but not "Approved for entry into archive"
 * database:
    * all the same as an unclaimed workflow item; the indication of blackout is stored in Solr
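
The provenance rule for blackout can be checked with a simple test over an item's dc.description.provenance values. This is a heuristic sketch with a hypothetical function name; since Solr holds the actual blackout indication, Solr rather than this check is authoritative.

```python
def is_in_blackout(provenance):
    """Heuristic from the provenance notes only; Solr holds the authoritative blackout flag."""
    entered = any("Entered publication blackout" in note for note in provenance)
    approved = any("Approved for entry into archive" in note for note in provenance)
    return entered and not approved


notes = ["Entered publication blackout by Jane Curator on 2013-06-18T19:45:25Z"]
print(is_in_blackout(notes))  # True
print(is_in_blackout(notes + ["Approved for entry into archive by Jane Curator on 2013-07-01T12:00:00Z"]))  # False
```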