Difference between revisions of "TreeBASE Submission Integration"

m (Relevant Text from the Grant Proposal)
((Disabled) Initial submission to Dryad)
 
(58 intermediate revisions by 6 users not shown)
Line 1: Line 1:
'''Status:''' NESCent and Yale are currently finalizing the design and beginning implementation.
+
== Overview  ==
  
== Overview ==
+
Authors who submit content to Dryad have the option to forward their Dryad submission to TreeBASE.  This saves the author time by automating the submission of data to multiple repositories, and creates an explicit link between the entries in the two repositories for easier data reuse.
  
Authors who submit content to TreeBASE or Dryad will have the option to make their content appear in both systems.
+
Alternatively, authors who initially submit content to TreeBASE may create a link from a Dryad data package to the relevant item in TreeBASE.
  
TreeBASE content will be searchable through Dryad, even if the author has not explicitly included the content in a Dryad data package.
+
== Instructions ==
  
This integration will be based on the following technologies:
+
=== Initial submission to TreeBASE ===
* [https://wiki.ucop.edu/display/Curation/BagIt BagIt] -- A lightweight format for packaging digital content and ensuring that it is transferred intact.
 
* [http://www.openarchives.org/pmh/ OAI-PMH] -- A protocol developed by the digital library community to allow harvesting of metadata from remote repositories.
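
As a rough illustration of the harvesting side, the sketch below builds an OAI-PMH ListRecords request and pulls record identifiers out of a response. The verb, the argument names (metadataPrefix, from, resumptionToken), and the XML namespace come from the OAI-PMH 2.0 specification; the base URL, the function names, and the choice of oai_dc as the default metadata format are assumptions made for this example and do not describe the actual TreeBASE provider.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace defined by the OAI-PMH 2.0 specification.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix="oai_dc",
                     from_date=None, resumption_token=None):
    """Build a ListRecords request URL for a (hypothetical) OAI-PMH endpoint."""
    if resumption_token:
        # resumptionToken is an exclusive argument: only the verb accompanies it.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date:
            params["from"] = from_date
    return base_url + "?" + urlencode(params)


def parse_identifiers(xml_text):
    """Return (record identifiers, resumption token or None) from a response."""
    root = ET.fromstring(xml_text)
    identifiers = [
        el.text for el in root.findall(".//oai:header/oai:identifier", OAI_NS)
    ]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    token = token_el.text if token_el is not None and token_el.text else None
    return identifiers, token
```

A harvester would call list_records_url, fetch the result, and keep requesting with the returned resumption token until none is returned.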
 
  
We are evaluating the [http://purl.org/net/sword/ SWORD] protocol to manage the transfer of BagIt packages, but we have not yet determined whether SWORD will be lightweight enough to justify its use.
+
Authors may choose to deposit their data with TreeBASE first, and link the submission to a Dryad deposit.
  
== Use cases ==
+
After the TreeBASE submission has been completed, the author will log in to Dryad and begin a Dryad submission.  At the second stage of the Dryad submission process, the author will be asked to provide the appropriate data file(s). At this point, rather than uploading a new copy of the file(s), the author may enter an identification number for the data and the name of the repository in which the data has been submitted.
  
=== User submits to Dryad first ===
 
  
# User submits a NEXUS file to Dryad, and presses the "send to TreeBASE" button.
+
[[Image:1.png]]
#* Button says "I want to deposit my tree(s) in TreeBASE and enhance the description there. I realize that any annotations I create in TreeBASE will also be released under the CC0 license."
 
# Dryad pushes object to TreeBASE. (This is before the object is curated in Dryad)
 
## citation data and all uploaded NEXUS files are packed into a BagIt package and pushed to TreeBASE
 
## TreeBASE has a PUT RESTful service for receiving data (later this may be reimplemented as a SWORD service)
 
## TreeBASE only accepts the PUT if the sender's IP is within the Dryad range
 
## TreeBASE responds by returning a URL
 
# Dryad emails the user to confirm "your content was forwarded to TreeBASE". The email includes the link, saying "click on this to finalize your submission in TreeBASE". (If the link can be received quickly enough, it is also displayed within the Dryad interface).
 
# User goes to TreeBASE and completes record.
 
## Clicking on the link takes the user to a special log-in page in TreeBASE; upon logging in, TreeBASE is triggered to unpack the BagIt package and create a submission based on its contents
 
# Dryad harvests TreeBASE content.
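
Step 2 of the list above says that TreeBASE accepts the PUT only if the sender's IP is within the Dryad range. A minimal sketch of such a check, using Python's standard ipaddress module, might look like the following. The network 192.0.2.0/24 is a reserved documentation range used here as a placeholder, and the names DRYAD_NETWORKS and is_allowed_sender are hypothetical; the real range and the way TreeBASE enforces it are not described on this page.

```python
import ipaddress

# Hypothetical allow-list. 192.0.2.0/24 is a reserved documentation range
# standing in for whatever address range Dryad actually uses.
DRYAD_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]


def is_allowed_sender(remote_addr, networks=DRYAD_NETWORKS):
    """Return True if remote_addr falls inside one of the allowed networks."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        # Anything that is not a valid IP address is rejected outright.
        return False
    return any(addr in net for net in networks)
```

A receiving service would call is_allowed_sender with the connection's remote address before reading the request body, and reject the PUT otherwise.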
 
  
=== User submits to TreeBASE first, links from Dryad record ===
 
  
# User submits/edits package in Dryad and includes a TB ID
+
If the data has already been uploaded to TreeBASE, the submitter does not need to upload it again. Instead, the submitter may enter the TreeBASE identifier and select TreeBASE as the remote repository from the dropdown menu in the submission form.  This will create a link between the Dryad data record and the data stored in the remote repository.
# Is it already in Dryad (via harvest)?
 
#* If so, create internal links in Dryad
 
#* If not, ask the user for their access code.
 
#* If that doesn't work, tell them to just upload their nexus file as a separate data file
 
# Dryad adds TB ID to the record, and TB will be able to check up on it.
 
# Also add an alert for Dryad curators to follow up.
 
  
=== User submits to TreeBASE, Dryad harvests only ===
 
  
* Treat as any other harvested content (second-class)
+
[[Image:2.png]]
  
 +
=== (Disabled) Initial submission to Dryad ===
  
 +
{{StatusBox|The workflow no longer supports initial submission to Dryad. This feature was disabled due to usability concerns. The documentation will remain here in case we decide to reinstate the feature.}}
  
== Process for completing a submission within TreeBASE ==
+
The submission is initially deposited with Dryad. Data files are then forwarded from Dryad to TreeBASE.
  
=== Minimum Requirements ===
+
At the second stage of the Dryad submission process, a submitter will see the option to "choose file" from their local machine.  This will upload the data into Dryad.  Once the upload completes, the submission form will show the size of the uploaded data file.
* nexus file
 
** at least one tree OR at least one matrix
 
** if there is a tree and a matrix, the taxon labels must match up.
 
** must be "understood" by Mesquite
 
* citation
 
* analysis info linking matrices and trees
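
The tree/matrix part of these requirements can be sketched as a small check over taxon labels that have already been extracted from the NEXUS file. The function name and its inputs are hypothetical, and "match up" is interpreted here as the two label sets being identical, which is an assumption; parsing NEXUS and the "understood by Mesquite" test are outside the scope of this example.

```python
def check_minimum_requirements(matrix_labels, tree_labels):
    """Return a list of problems; an empty list means the checks pass.

    matrix_labels and tree_labels are iterables of taxon labels, or None
    when the file contains no matrix or no tree respectively.
    """
    problems = []
    if matrix_labels is None and tree_labels is None:
        problems.append("need at least one tree or at least one matrix")
        return problems
    if matrix_labels is not None and tree_labels is not None:
        matrix_set, tree_set = set(matrix_labels), set(tree_labels)
        # Report labels that appear on one side only.
        only_matrix = sorted(matrix_set - tree_set)
        only_tree = sorted(tree_set - matrix_set)
        if only_matrix:
            problems.append("labels only in matrix: " + ", ".join(only_matrix))
        if only_tree:
            problems.append("labels only in tree: " + ", ".join(only_tree))
    return problems
```

Returning a list of problems rather than a boolean lets the caller tell the depositor exactly which labels need to be corrected.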
 
  
=== Detailed Process ===
+
[[Image:3.png]]
  
* create account
+
If a file has been uploaded through the "choose file" interface, the last stage of the submission process will give the author the option to forward that file to TreeBASE as well. Checking the checkbox and selecting TreeBASE from the repository dropdown will initiate the file's upload to TreeBASE.
* login
 
* create new submission
 
* type title
 
** the submission gets a PURL at this point
 
** the PURL can have a code added for reviewer access
 
* fill in citation
 
** minimum: year, title, journal name (or book/section title)
 
** journal names auto-suggest as you type
 
* add authors
 
** minimum: at least one author (with first name and last name)
 
** must always search for an existing author first, even if you know they're not in the system
 
** allows reordering or deleting authors while you're in the process
 
* upload file(s)
 
** minimum: must be nexus, as described above
 
* (optional) add notes
 
** this is a textarea, with a reasonable character limit (not enough for a readme file)
 
* (optional) edit details for matrices
 
* (optional) edit row segment template
 
** minimum: row ID, start index, end index
 
* (optional) provide more details for trees
 
* (optional) taxa
 
** match all named taxa against ubio or ncbi
 
** although the cleanup is optional, the TB editor may reject it if it's not cleaned up
 
* analysis
 
** minimum: create an analysis with at least one step. Typically, this will be a matrix that is processed to create one or more trees.
 
** minimum: otu labels must match in the analysis steps
 
* when the initial submission is complete, the user clicks "change to ready state"
 
** this triggers the curator to look at it
 
** user can leave items as "in progress" as long as they want -- this is a "poor man's embargo" system
 
  
== Open Questions ==
+
[[Image:4.png]]
  
# Can Dryad records be transferred immediately, or must they be approved by a Dryad curator first? If records are transferred before curator approval, when is the permanent ID assigned?
+
The author will receive two email messages:
# Is it possible to carry over authentication? (single sign-on) Can/should Dryad track user account info on other systems? (or will everyone move to DataONE authentication?)
+
# A confirmation of the Dryad submission, including the DOI that Dryad has assigned to the submission.
# Does the user have to press a button to submit to TreeBASE, or could it just be automatic? If we could link the user accounts, the submission could just show up when the user logs into TreeBASE.
+
# A notification that TreeBASE has received the forwarded data files. In this message, there will be a URL for accessing the submission within the TreeBASE system.
# Should TreeBASE have a "pull" method, where users logged in to TreeBASE can import content with a Dryad ID?
 
  
 +
The author will need to follow the URL from the second message. After logging in to TreeBASE, the author will be able to complete the TreeBASE description of the data files, entering information that was not already part of the Dryad submission.
  
== Random Notes ==
+
'''NOTE:''' If files are embargoed when they are deposited at Dryad, these embargo settings will not carry over to TreeBASE by default. The author must select appropriate embargo settings in each system.
* TB does not make content available until the associated article is published
 
* (new) TB only has one identifier, which is used all the way through the process
 
* TB has thousands of in-progress submissions, which are waiting for the publication to be accepted.
 
* Dryad often knows that an article has been accepted, and should tell TB about this
 
* TB may have an embargo process, which Dryad should use for embargoed items
 
  
== (old) Workflow ==
+
== Technical Documentation ==
  
'''NOTE:''' This section is outdated, and needs to be cleaned up. More details are available in the general [https://www.nescent.org/wg_dryad/Category:Handshaking Handshaking] pages.
 
  
[[Image:TreebaseDryadCoordination.JPG|thumb|right|512px|Whiteboard notes from the initial discussion, including integration with Dryad submissions.]]
+
More detail on the TreeBASE/BagIt handshaking can be found on the Dryad [[BagIt Handshaking]] page.
# User submits to Dryad (and completes the submission).
 
# User is presented with a button "Also submit this content to TreeBASE"
 
# When the button is pressed, all relevant Dryad data/metadata is forwarded to TreeBASE as a SWORD package (publication becomes a TreeBASE study, each tree & matrix becomes TreeBASE data).
 
# Items are in the TreeBASE submission system, waiting for the user to finish. The user can log in to TreeBASE at any time and complete the submission, adding additional information as necessary. (Or they may ignore it.)
 
# When the TreeBASE submission is complete, Dryad picks up the submission in its next OAI-PMH harvest (from the [[TreeBASE OAI Provider]]).
 
# Dryad matches the items to existing Dryad records. Typically the matching will rely on Dryad handles being present in the records that TreeBASE serves via OAI, but matching may also rely on publication DOI, titles, or other metadata.
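
The matching order described in the last step (Dryad handle first, then publication DOI, then title) can be sketched as follows. Records are represented as plain dictionaries with the hypothetical keys dryad_handle, doi, and title; the real harvested metadata format and Dryad's internal record model are not specified here, so this is only an illustration of the fallback logic.

```python
def normalize_title(title):
    """Lower-case a title and collapse runs of whitespace."""
    return " ".join(title.lower().split())


def match_harvested_record(harvested, dryad_records):
    """Return the Dryad record matching a harvested record, or None.

    Both sides are plain dicts with the hypothetical keys
    'dryad_handle', 'doi' and 'title'.
    """
    # Identifier matches are tried in order of preference.
    for key in ("dryad_handle", "doi"):
        value = harvested.get(key)
        if value:
            for record in dryad_records:
                if record.get(key) == value:
                    return record
    # Title matching is the weakest signal, so accept it only when unambiguous.
    title = harvested.get("title")
    if title:
        wanted = normalize_title(title)
        candidates = [
            r for r in dryad_records
            if normalize_title(r.get("title") or "") == wanted
        ]
        if len(candidates) == 1:
            return candidates[0]
    return None
```

Records that come back as None would be left for a curator to link by hand.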
 
  
== Relevant Text from the Grant Proposal ==
+
This integration is based on the following technologies:
  
* "[handshaking] so that, where required by the journal or requested by the author, data will simultaneously be deposited in Dryad and... TreeBASE."
+
*[https://wiki.ucop.edu/display/Curation/BagIt BagIt] -- A lightweight format for packaging digital content and ensuring that it is transferred intact.
* "Dryad will collect any metadata required by the target database that has not already been captured, submit the pertinent data to the target database using a non-interactive programmatic gateway, and obtain the submission status, accession numbers, or possible error messages from the target database."
+
*[http://www.openarchives.org/pmh/ OAI-PMH] -- A protocol developed by the digital library community to allow harvesting of metadata from remote repositories.
* "For TreeBASE, we will design and implement a robust, web-service based submission Application Programming Interface (API). An extensive redesign of TreeBASE by the CIPRES project (www.phylo.org) is scheduled for release in 2007. However, it currently lacks a submission API. The software to be added will include the automated data validation steps that are part of the new TreeBASE submission process (e.g. validating the NEXUS format, matching terminal taxa against the uBio NameBank). When TreeBASE rejects a submission, the depositor will be notified, advised how to correct the problem, and asked to resubmit. "
 
  
== Progress with Handshaking (as of 7-22-10) ==
+
We are evaluating the [http://purl.org/net/sword/ SWORD] protocol to manage the transfer of BagIt packages, but we have not yet determined whether SWORD will be lightweight enough to justify its use.
  
* "[handshaking] so that, where required by the journal or requested by the author, data will simultaneously be deposited in Dryad and... TreeBASE."
+
== Design History ==
* "Dryad will collect any metadata required by the target database that has not already been captured, submit the pertinent data to the target database using a non-interactive programmatic gateway, and obtain the submission status, accession numbers, or possible error messages from the target database."
 
* "For TreeBASE, we will design and implement a robust, web-service based submission Application Programming Interface (API). An extensive redesign of TreeBASE by the CIPRES project (www.phylo.org) is scheduled for release in 2007. However, it currently lacks a submission API. The software to be added will include the automated data validation steps that are part of the new TreeBASE submission process (e.g. validating the NEXUS format, matching terminal taxa against the uBio NameBank). When TreeBASE rejects a submission, the depositor will be notified, advised how to correct the problem, and asked to resubmit. "
 
  
  
See also: [[TreeBASE OAI Provider]]
+
For information on design decisions, see the [https://wiki.ucop.edu/display/Curation/BagIt BagIt] and [http://www.openarchives.org/pmh/ OAI-PMH] pages listed under Technical Documentation and the [[TreeBASE OAI Provider]] page on this wiki.  We also looked at [http://www.jisc.ac.uk/whatwedo/programmes/mrd/rdmi/admiral.aspx ADMIRAL: A data management infrastructure for research across the life sciences].
  
[[Category:Work Packages]]
+
[[Category:Other Repositories]]
 +
[[Category: Features]]
 +
[[Category:Help]]
 
[[Category:Handshaking]]
 

Latest revision as of 09:51, 16 August 2016

Overview

Authors who submit content to Dryad have the option to forward their Dryad submission to TreeBASE. This saves the author time by automating the submission of data to multiple repositories, and creates an explicit link between the entries in the two repositories for easier data reuse.

Alternatively, authors who initially submit content to TreeBASE may create a link from a Dryad data package to the relevant item in TreeBASE.

Instructions

Initial submission to TreeBASE

Authors may choose to deposit their data with TreeBASE first, and link the submission to a Dryad deposit.

After the TreeBASE submission has been completed, the author will log in to Dryad and begin a Dryad submission. At the second stage of the Dryad submission process, the author will be asked to provide the appropriate data file(s). At this point, rather than uploading a new copy of the file(s), the author may enter an identification number for the data and the name of the repository in which the data has been submitted.


[Image: 1.png]


If the data has already been uploaded to TreeBASE, the submitter does not need to upload it again. Instead, the submitter may enter the TreeBASE identifier and select TreeBASE as the remote repository from the dropdown menu in the submission form. This will create a link between the Dryad data record and the data stored in the remote repository.


[Image: 2.png]

(Disabled) Initial submission to Dryad

Status: The workflow no longer supports initial submission to Dryad. This feature was disabled due to usability concerns. The documentation will remain here in case we decide to reinstate the feature.

The submission is initially deposited with Dryad. Data files are then forwarded from Dryad to TreeBASE.

At the second stage of the Dryad submission process, a submitter will see the option to "choose file" from their local machine. This will upload the data into Dryad. Once the upload completes, the submission form will show the size of the uploaded data file.

[Image: 3.png]

If a file has been uploaded through the "choose file" interface, the last stage of the submission process will give the author the option to forward that file to TreeBASE as well. Checking the checkbox and selecting TreeBASE from the repository dropdown will initiate the file's upload to TreeBASE.

[Image: 4.png]

The author will receive two email messages:

  1. A confirmation of the Dryad submission, including the DOI that Dryad has assigned to the submission.
  2. A notification that TreeBASE has received the forwarded data files. In this message, there will be a URL for accessing the submission within the TreeBASE system.

The author will need to follow the URL from the second message. After logging in to TreeBASE, the author will be able to complete the TreeBASE description of the data files, entering information that was not already part of the Dryad submission.

NOTE: If files are embargoed when they are deposited at Dryad, these embargo settings will not carry over to TreeBASE by default. The author must select appropriate embargo settings in each system.

Technical Documentation

More detail on the TreeBASE/BagIt handshaking can be found on the Dryad BagIt Handshaking page.

This integration is based on the following technologies:

  • BagIt -- A lightweight format for packaging digital content and ensuring that it is transferred intact.
  • OAI-PMH -- A protocol developed by the digital library community to allow harvesting of metadata from remote repositories.
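
To make the "transferred intact" property concrete, here is a minimal sketch of writing and verifying a BagIt bag with the Python standard library. The layout (a bagit.txt declaration, a data/ payload directory, and a manifest-sha256.txt listing checksums) follows the BagIt specification; the function names, the choice of SHA-256, and the version string 0.97 are assumptions made for this example, and the real Dryad/TreeBASE packages may carry additional tag files.

```python
import hashlib
from pathlib import Path


def make_bag(bag_dir, payload):
    """Write a minimal BagIt bag; payload maps flat file names to bytes."""
    bag_dir = Path(bag_dir)
    data_dir = bag_dir / "data"
    data_dir.mkdir(parents=True, exist_ok=True)

    manifest_lines = []
    for name, content in sorted(payload.items()):
        (data_dir / name).write_bytes(content)
        digest = hashlib.sha256(content).hexdigest()
        # Manifest format: checksum, whitespace, path relative to the bag root.
        manifest_lines.append(f"{digest}  data/{name}")

    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 0.97\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
    (bag_dir / "manifest-sha256.txt").write_text(
        "\n".join(manifest_lines) + "\n", encoding="utf-8"
    )
    return bag_dir


def verify_bag(bag_dir):
    """Return True if every payload file matches its manifest checksum."""
    bag_dir = Path(bag_dir)
    manifest = (bag_dir / "manifest-sha256.txt").read_text(encoding="utf-8")
    for line in manifest.splitlines():
        digest, rel_path = line.split(maxsplit=1)
        actual = hashlib.sha256((bag_dir / rel_path).read_bytes()).hexdigest()
        if actual != digest:
            return False
    return True
```

The receiving repository would run a check like verify_bag after unpacking, so that a file corrupted in transit is detected before a submission is created from it.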

We are evaluating the SWORD protocol to manage the transfer of BagIt packages, but we have not yet determined whether SWORD will be lightweight enough to justify its use.

Design History

For information on design decisions, see the BagIt and OAI-PMH pages listed under Technical Documentation and the TreeBASE OAI Provider page on this wiki. We also looked at ADMIRAL: A data management infrastructure for research across the life sciences.