Fixing items lost in approval
In some cases, data files and data packages aren't approved correctly, or they end up with inconsistent metadata or DOI registrations. These issues can usually be fixed by correcting the underlying data in the SQL database and then updating the SOLR index. It may also be necessary to update external services (e.g. EZID).
As with all cases of accessing the database directly, be very careful, make backups, and keep a log of all changes, either in a local text file or in a notes app.
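A minimal sketch of the change log mentioned above, assuming a plain-text file with one tab-separated entry per line. The function name log_change and the file layout are illustrative choices, not part of any Dryad tooling.

```python
from datetime import datetime, timezone
from pathlib import Path


def log_change(log_path, description, statement=""):
    """Append one timestamped entry to a plain-text change log and return it.

    log_path, description and statement are hypothetical names for this sketch.
    """
    # UTC timestamp so entries from different machines sort consistently.
    timestamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    entry = f"{timestamp}\t{description}"
    if statement:
        # Record the exact SQL or command that was run, if there was one.
        entry += f"\t{statement}"
    with Path(log_path).open("a", encoding="utf-8") as handle:
        handle.write(entry + "\n")
    return entry
```

Calling log_change once before each manual change (with the statement about to be run) keeps the log usable as a record of what to undo.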
Areas of Interest
- SQL Database
- Controls whether the item is archived or in workflow/workspace
- See Workflow State in Database
- Compare contents of item table and metadatavalue with a known good archive
- Since items start in workspace and move to workflow, finding something in workspace here would be rare.
- Other possible scenarios: Package item is archived but one or more data files are not.
- External DOI registration
- DOIs are updated or registered when an item is edited, so this can be an easy fix. Just edit some metadata and change it back.
- Permissions are dictated by the resourcepolicy table, which controls who has access to what. 0 is the anonymous user.
- Permissions are assigned on items, bundles, bitstreams.
- Look at the dryad_resourcepolicy.rb script from returning submissions.
- Also best to compare these to known good packages.
- SOLR indexing
- Index is built off the database, including the item, workflow, metadata, and permissions tables
- The curation system, user submissions, and search are all based on the SOLR index
- If something doesn't show up on the site, it's likely an indexing issue. Because the index is built from the database, the usual approach is to fix the database first and then reindex the item(s).
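The advice above to compare permissions against a known good package can be sketched as a set difference. This assumes the resourcepolicy rows for each item have already been fetched and reduced to tuples; the tuple shape (resource type, action, group) and the values in the usage note are illustrative assumptions, and diff_policies is a hypothetical helper name.

```python
def diff_policies(known_good, suspect):
    """Compare two collections of policy tuples.

    Returns (missing, extra): policies the known good item has that the
    suspect item lacks, and policies only the suspect item has.
    The tuple layout is an assumption made for this sketch.
    """
    good, bad = set(known_good), set(suspect)
    return good - bad, bad - good
```

For example, diff_policies({(2, 0, 0), (0, 0, 0)}, {(2, 0, 0)}) reports (0, 0, 0) as missing from the suspect item and nothing as extra.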
Typical issues (and workarounds)
Items lost on approval
An item gets lost when a curator approves it. dleehr suspects a timeout while waiting for EZID DOI registration, which keeps the process from finishing. The item(s) are still in the workflow (curation) but are no longer indexed in SOLR, so curators can't find them.
To verify: search for the item id in the workflowitem table. If it has a row there, it is in the workflow and has not been archived. Also check the item table for these items; the in_archive column should be 'f'.
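The verification rule above can be written down as a small decision function. This is a sketch only: it assumes the two facts (whether a workflowitem row exists, and the in_archive value as displayed by the database client) have already been looked up by hand, and the function names and returned labels are illustrative.

```python
def parse_pg_bool(value):
    """Convert 't'/'f' style text (or a real bool) into a Python bool."""
    if isinstance(value, bool):
        return value
    text = str(value).strip().lower()
    if text in ("t", "true"):
        return True
    if text in ("f", "false"):
        return False
    raise ValueError(f"not a boolean value: {value!r}")


def approval_state(has_workflow_row, in_archive):
    """Classify an item from its workflowitem row and in_archive flag."""
    archived = parse_pg_bool(in_archive)
    if has_workflow_row and not archived:
        # The case described above: still in curation, approval did not finish.
        return "in workflow, not archived"
    if not has_workflow_row and archived:
        return "archived"
    # Anything else (e.g. no workflow row and not archived, which may mean
    # the item is still in workspace) needs a closer look.
    return "inconsistent, inspect manually"
```

An item lost on approval should come back as "in workflow, not archived", for example from approval_state(True, 'f').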
- Reindex the item(s) with the command below. This allows them to be found in curation (workflow overview).
/opt/dryad/bin/dspace update-discovery-index -i <item id>
- Make sure the item is claimable. This script adds tasklistitem rows for all current curators: https://github.com/datadryad/dryad-utils/blob/master/fix_tasklistitems.py
- Claim the item as a curator and re-approve it. Or notify curators that it can be re-tried.
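When several items were lost at once, the reindex step above can be repeated per item. The sketch below only wraps the update-discovery-index command already shown; the helper names and the dry_run flag are illustrative, and ids are passed through unchanged in whatever form the -i option expects.

```python
import subprocess

# Path taken from the command shown in this page.
DSPACE = "/opt/dryad/bin/dspace"


def reindex_command(item_id):
    """Build the argument list for reindexing a single item."""
    return [DSPACE, "update-discovery-index", "-i", str(item_id)]


def reindex_items(item_ids, dry_run=True):
    """Reindex each item in turn; with dry_run=True only print the commands."""
    commands = [reindex_command(item_id) for item_id in item_ids]
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            # check=True stops at the first failing reindex.
            subprocess.run(command, check=True)
    return commands
```

Running with dry_run=True first gives a list of commands that can be pasted into the change log before anything is executed.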
Some files not registered and not embargoed
Curator approves a data package with multiple files. Some of the files are approved correctly and have DOI registrations and embargo settings as expected. Some files may be missing metadata, including embargo dates.
The cause needs further investigation, but I believe this happens partway through https://github.com/datadryad/dryad-repo/blob/dryad-master/dspace/modules/api/src/main/java/org/dspace/content/InstallItem.java while waiting for service.register() to finish. If service.register() throws an exception or times out, any files left to process will fail.
To verify: Look at the item table for the data file rows. If this is the case, there will be data package items that are in_archive=t but data file items (part of the package) that are in_archive=f.
To fix, run the partiallyinstalleddatafiles curation task:
/opt/dryad/bin/dspace curate -t partiallyinstalleddatafiles -i 10255/3 -r -
This finds data files whose packages are archived and installs them. It also logs the changes in CSV format to the console as a report.
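The condition checked in the verification step (package archived, some data files not) can be expressed as a short function for use when inspecting rows by hand. This is a sketch, not the curation task's implementation: how the data file ids for a package are gathered is not covered here, so the caller supplies them, and the function name is hypothetical.

```python
def partially_installed(package_archived, file_flags):
    """Return ids of data files that are not archived although their package is.

    package_archived: True if the package item has in_archive = t.
    file_flags: dict mapping data file item id -> in_archive as a bool.
    """
    if not package_archived:
        # The package itself is not archived, so this is a different problem.
        return []
    return sorted(fid for fid, archived in file_flags.items() if not archived)
```

For a package with in_archive = t and files {11: True, 12: False, 13: False}, the function returns [12, 13], the files the curation task would be expected to install.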