Extreme Curation Techniques

From Dryad wiki
Jump to: navigation, search

WARNING! The techniques described on this page are dangerous.

Use with caution, and ensure you have a backup in place.

See Also

Locating Content

Locating an item/bitstream, given a DOI

In the database:

select * from metadatavalue where text_value='TARGET_DOI';
select * from item2bundle where item_id=ITEM_ID;
select * from bundle2bitstream where bundle_id=BUNDLE_ID;
select * from bitstream where bitstream_id=BITSTREAM_ID;

Once you have the information from the bitstream table, you can look in the assetstore for the file with the appropriate internal_id.

Replacing a Bitstream

There are times when the interface won't let you directly replace a bitstream (e.g., certain places within the submission workflow).

  1. Locate the appropriate bitstream on disk (see above)
  2. Replace the file (ensure the filename stays the same).
  3. Update the metadata in the bitstream table of the database:
update bitstream set size_bytes=NEW_SIZE,name='NEW_NAME', source='NEW_NAME', checksum='NEW_MD5_CHECKSUM', bitstream_format_id=NEW_FORMAT_ID where bitstream_id=BITSTREAM_ID;

Updating the bitstream_format_id is critical, it dictates the content type. The Apache web server is configured to compress specific content types when sending them to browsers. If you replace a small text file with a large binary file, Apache will try (and fail) to compress the file when a user tries to download it.

Changing Bitstream Formats

  1. Create the appropriate format in the format registry
  2. Note the id number for the format
  3. Update the metadata in the bitstream table:
update bitstream set bitstream_format_id=43 where name like '%.zip';

Altering workflow tables

There have been a number of cases where submissions have gotten into broken states that render normal curation impossible. Workflow processing is described in detail at Workflow State in Database.

Adding rows to workflow tables

Adding a row to workflow tables (e.g., workflowitem, taskowner, tasklistitem) requires generating a new row id for the row (this is actually the case for most dryad tables). There is a function for doing this - getnextid('tablename') - where tablename is just the name of the table you're adding to. For example, the following statement adds a new task with associated workflow_item to the taskowner table and sets the processing to the initial step (and action) and specifies the owner (curator) assigned to this workflow:

 INSERT INTO taskowner(task_owner_id,workflow_item_id,step_id,action_id,workflow_id,owner_id) VALUES (getnextid('taskowner'),9999,dryadAcceptEditReject,dryadAcceptEditRejectAction,default,1001);

The following statement creates a new row in the workflowitem table (for a data file):

INSERT INTO workflowitem(workflow_id,item_id,collection_id,multiple_titles,published_before,multiple_files) VALUES (getnextid('workflowitem'),999999,1,'f','f','f');

Moving content between databases

To migrate content from one database to another:

  1. exported database rows for selected items:

psql -U postgres -d dryad_repo -c "copy (select * from metadatavalue where item_id in (43612,43611,43610,43604,43602,43600,43599,43598,43597,43596,43595,43594,43593,43591)) to stdout" >emMetadataValue.out

  1. copy the exported file to the location of the target database
  1. import tables to postgres

psql -U dryad_app -d dryad_repo -c "copy metadatavalue from stdin" < emMetadataValue.out

  1. update postgres sequences -- this is required so primary keys are generated properly!

SELECT setval('metadatavalue_seq', (SELECT MAX(metadata_value_id) FROM metadatavalue)+1);

Resetting sequences

Sometimes the database sequences get out of sync. When this happens, you will get errors like `ERROR: duplicate key value violates unique constraint`.

To fix these: run `psql -U dryad_app dryad_repo < ~dryad/dryad-repo/dspace/etc/postgres/update_sequence.sql`

Moving content to your control

Sometimes, it is useful to remove an item from the original submitter's control, and pretend it was originally submitted by you. To do this, update the item's owner in two tables:

  • item
  • taskowner

Database Table Replacement

If you manage to accidentally update an entire table instead of just a single row, or do something else horrible to a table in the database:

  1. Find the latest backup of the database at `/opt/dryad-data/databaseBackups`.
  2. Drop the messed-up table from the database: within psql, `drop table <bad_table> cascade;`
  3. Find the sql file in the code base that created the bad table: `grep bad_table dryad-repo/dspace/etc/postgres/*`
  4. Re-execute that file to recreate that table: `psql -U dryad_app dryad_repo < ~dryad/dryad-repo/dspace/etc/postgres/<the_sql_file>`
  5. Restore the table from the backup: `pg_restore -d dryad_repo -U dryad_app -a -c -t  <bad_table> <latest_backup.sql>`
  6. Make sure that the sequences are set correctly: run `psql -U dryad_app dryad_repo < ~dryad/dryad-repo/dspace/etc/postgres/update_sequence.sql`