How To Install Dryad

From Dryad wiki

Dryad is a repository for data sets underlying scientific publications, with an initial focus on evolution, ecology, and related fields. It is built on top of DSpace, an open source digital repository software package. There are no official releases of Dryad yet, but you can check its source code out of the project's Git repository and build it yourself. This isn't difficult, but it does require that you have a few prerequisites installed.

Building a Virtual Machine with Vagrant

The preferred way to install Dryad is with the Vagrant+Ansible configuration on GitHub.

Vagrant is a tool that allows virtual machines to be created and customized by external scripts. Ansible is a tool that handles provisioning (installing software, running tasks, modifying files) of systems, including virtual machines.

The advantage of this approach is that the configuration is not baked into a virtual machine image; it is specified in the Vagrantfile and the Ansible vars. To change the Dryad install directory, the Postgres version, or the Postgres host, you need only change one line of this configuration.

The project, along with its full documentation, is available on GitHub: https://github.com/datadryad/vagrant-dryad

Installing Dryad "from scratch"

Installing the Prerequisite Software

Required software:

  • git client
  • Java (MUST BE Oracle version 5 or 6. Java 7 does not work yet.)
  • Maven (version 2.2.1 is recommended. 2.0.x has had problems. Maven 3 does not work at the present time, but may in the future)
  • Ant
  • PostgreSQL
  • Perl
  • Apache Tomcat

Other software that may be used, but is not required:

  • Apache Web Server (to proxy Tomcat)
  • Jenkins continuous integration platform

Since Dryad and DSpace are Java-based projects, they will run on a variety of operating systems. For the purposes of this guide, we will step through installing Dryad on an Ubuntu Linux system.

Installing on Ubuntu Linux (9.10)

Setting up Java

NOTE: As of 2013-1-24, Dryad still requires the Sun/Oracle Java. This cannot be OpenJDK, and cannot be the Oracle Java distributed through the apt-get system. It must be downloaded directly from Oracle.

Check that the newly installed Sun JDK/JRE is the default JDK/JRE for your Ubuntu system. We need to do this because Ubuntu comes with OpenJDK installed as the default JDK/JRE. Unfortunately, though, OpenJDK does not work with our version of Cocoon, the framework on which parts of the DSpace user interface are built. So, we need to check the system's default JDK/JRE and set it to be Sun's (if it isn't already).

Depending on how you installed java, you may be able to type:

sudo update-alternatives --config java
sudo update-alternatives --config javac

If either of these isn't set to use the Sun version of java or javac, select the number associated with the Sun version so that it will be used as the default.

However, if Dryad doesn't function properly, you may need to edit the startup script for Tomcat, forcing it to use the correct version of Java.
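One way to confirm which Java is active is to inspect the output of java -version. The sketch below is a hypothetical helper (classify_java is not part of Dryad). It assumes that OpenJDK builds identify themselves with the string "OpenJDK" in that output, and that Java 5 and 6 report version strings beginning with 1.5. and 1.6.

```shell
#!/bin/sh
# Hypothetical helper, not part of Dryad: classify the text printed by
# `java -version`, passed in as the first argument.
# Assumes OpenJDK builds mention "OpenJDK" and that Java 5/6 report
# version strings starting with 1.5. or 1.6.
classify_java() {
    case "$1" in
        *OpenJDK*) echo "openjdk"; return 1 ;;
    esac
    case "$1" in
        *'version "1.5.'*|*'version "1.6.'*) echo "ok" ;;
        *) echo "unsupported-version"; return 1 ;;
    esac
}
```

Since java -version writes to standard error, a typical call would be: classify_java "$(java -version 2>&1)"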

Other dependencies

The core prerequisites for the Dryad project can all be found in the Ubuntu software package management system. Aptitude may be used to install them:

sudo aptitude install git
sudo aptitude install maven2
sudo aptitude install ant
sudo aptitude install postgresql
sudo aptitude install perl
sudo aptitude install tomcat6
sudo aptitude install apache2
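After installing, it can be useful to confirm that the tools are actually on the PATH. The following is a minimal sketch; missing_tools is a hypothetical helper, and the binary names in the usage note (mvn, psql) are assumptions about what the packages above install.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every command in the argument
# list that cannot be found on PATH. Prints nothing if all are present.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}
```

For example, missing_tools git mvn ant psql perl prints one line per tool that still needs to be installed.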


Preparing the Local System

Typically, when running on a server, Dryad runs under the 'dryad' account. This is not required, but subsequent instructions on this page assume the 'dryad' user is present. Dryad will run out of a normal user account -- just make the appropriate adjustments in the commands below.

Creating a dryad user (Ubuntu)

First, we want to create a dryad user on the machine:

sudo useradd -m dryad
sudo passwd dryad

All following commands on this page should be run as the dryad user.

Setting up symlinks and scripts

Dryad installations run by the core Dryad team always use a standard set of directories and commands. Create symlinks and scripts so these standard directories/commands point to the actual installed locations on your machine. See Standard Server Configuration.

Installing Dryad on Linux

git clone http://github.com/datadryad/dryad-repo.git

Dryad uses Maven profiles to configure local instances of the Dryad application. Instead of editing the ${dspace.dir}/config/dspace.cfg file directly, as was done previously, create a Maven profile and put the configuration information there. The dspace.cfg file, as checked out of the repository, is set up to read its configuration values from this Maven profile.

Below is a sample Maven profile (the ${MAVEN_HOME}/conf/settings.xml file) that can be modified for custom use:

<?xml version="1.0" encoding="UTF-8"?>
<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.0.0 http://maven.apache.org/xsd/settings-1.0.0.xsd">
<profiles>
<profile>
<id>env-dev</id>
<properties>
<!-- This is where you want Dryad installed after the build -->
<default.dspace.dir>/opt/dryad</default.dspace.dir>
<default.dspace.hostname>localhost</default.dspace.hostname>
<default.dspace.port>9999</default.dspace.port>
<!--We suggest dryad_repo as the repo name and dryad_app as the db user-->
<default.db.url>jdbc:postgresql://localhost:5432/dryad_repo</default.db.url>
<default.db.username>dryad_app</default.db.username>
<default.db.password>mydbpassword</default.db.password>
<!-- The DataCite username/password that has permission to register DOIs -->
<default.doi.username>myusername</default.doi.username>
<default.doi.password>mypassword</default.doi.password>
<!-- DOI prefix; 10.5061 is the official Dryad prefix -->
<default.doi.prefix>10.5061</default.doi.prefix>
<!-- Whether URLs and DOIs should resolve locally; true for dev instance -->
<default.dryad.localize>true</default.dryad.localize>
<!-- Whether DOIs should be registered; false for a dev instance -->
<default.doi.datacite.connected>false</default.doi.datacite.connected>
<!-- Mail setup; for development, it's easiest to use a GMail account -->
<default.mail.server>smtp.gmail.com</default.mail.server>
<default.mail.server.username>mygmailaccount@gmail.com</default.mail.server.username>
<default.mail.server.password>mygmailpassword</default.mail.server.password>
<!-- A configuration for using GMail as the mail server -->
<default.mail.extraproperties>mail.smtp.socketFactory.port=465,mail.smtp.socketFactory.class=javax.net.ssl.SSLSocketFactory,mail.smtp.socketFactory.fallback=false</default.mail.extraproperties>
<!-- A configuration for using localhost: <default.mail.extraproperties>mail.smtp.localhost=SUBDOMAIN.HOSTNAME.TLD</default.mail.extraproperties> -->
<!-- In dev, these are usually the same, but may be different in production -->
<default.mail.admin>mygmailaccount@gmail.com</default.mail.admin>
<default.mail.help>mygmailaccount@gmail.com</default.mail.help>
<!-- The email of the DSpace account the harvester should be run as -->
<default.harvester.eperson>mygmailaccount@gmail.com</default.harvester.eperson>
<!-- The location of the Solr server that indexes Dryad content -->
<default.solr.search.server>http://localhost:9999/solr/search/</default.solr.search.server>
<default.solr.stats.server>http://localhost:9999/solr/statistics</default.solr.stats.server>
<default.solr.dryad.server>http://localhost:9999/solr/dryad</default.solr.dryad.server>
<default.solr.authority.server>http://localhost:9999/solr/authority</default.solr.authority.server>
<!-- BagIt -->
<default.bagit.executable>/users/dryad/bagit-3.6/bin/bag</default.bagit.executable>
<default.bagit.testing.mode>false</default.bagit.testing.mode>
<!-- Journal config -->
<default.submit.journal.config>/opt/dryad/config/DryadJournalSubmission.properties</default.submit.journal.config>
</properties>
</profile>
</profiles>
</settings>

The next step is to change into the directory that was created by the git clone command, and from there into its dspace directory:

cd dryad-repo/dspace

If you've used a directory for the dspace.dir value that doesn't already exist, you'll need to create the new directory; this can be done with the mkdir command. For instance:

sudo mkdir /opt/dryad

You will also need to change the ownership of this directory to the dryad user:

sudo chown dryad /opt/dryad

Once this is done, you are ready to build Dryad. For its build management system, Dryad uses Maven 2. To compile the Dryad package, you need to run a single command from the dspace directory (dryad-repo/dspace in this guide). Since you should still be in this directory, type:

mvn package -P env-dev

(Replace "env-dev" with the name of the maven profile you set above.) When this is done, Dryad has been successfully compiled. The next step is to set up the Postgresql database and deploy the Dryad application to a location from which it can be served by Tomcat.


Compiling Dryad on an OSX Machine

Same as above, but the default Maven location is: /usr/share/maven


Initializing Dryad's Postgresql Database

(see PostgreSQL for more about postgres usage)

While DSpace, and so Dryad, uses Maven 2 for compiling, it uses Ant for deploying the application to a Web server. The steps for doing this are very simple but there are a few additional configurations that need to take place first. Again, we'll walk through the steps for a Linux system, but they should be generalizable to a Windows or Mac OS X system as well.

Initializing the Database on an Ubuntu Linux Machine

First, we need to set up the Postgresql database so that it is ready to accept the DSpace/Dryad setup.

(Optional) edit the pg_hba.conf file to change how permissions are handled. The simplest way to set it up is to set the "local" connections to mode "trust". This will allow you to access postgres without using passwords when connecting from the local machine. For more information, see the Troubleshooting section below.

Create a dryad user in the database:

sudo -u postgres createuser -U postgres -d -P dryad_app

Next, we create the database into which Dryad will install itself:

sudo -u dryad createdb -U dryad_app -E UNICODE dryad_repo

Initializing database content

The easiest way to initialize database content is to copy it from an existing Dryad instance (see Updating Data from Existing Instance).

In the future, Dryad will provide a minimal database image that is more suitable for testing.

Installing with a blank database
NOTICE: Although DSpace will create a blank database, Dryad currently requires
at least one data package in the database. Until Dryad is updated, the above method
is preferred.

Once the database is ready, we can use the Ant script included in the DSpace/Dryad distribution to finish initializing everything. The Ant script uses the configuration that we modified in the config/dspace.cfg file (so will install Dryad to the location set in the dspace.dir variable). If you encounter any problems, check the settings in the dspace.cfg file to make sure that they are what you think they should be.

To run the Ant script, we need to change into the build directory (which was created as a result of running the Maven build script). To do this, type:

cd target/dspace-*.dir
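The wildcard in the cd command above only behaves as intended when exactly one directory matches. As a minimal sketch, a hypothetical helper (find_build_dir, not part of DSpace or Dryad) can make that assumption explicit before changing directories:

```shell
#!/bin/sh
# Hypothetical helper: print the single build directory matching
# <base>/target/dspace-*.dir, or fail if there are zero or several matches.
#   $1 = directory containing target/ (e.g. the dspace source directory)
find_build_dir() {
    # Replace the function's arguments with the glob matches.
    set -- "$1"/target/dspace-*.dir
    if [ "$#" -ne 1 ] || [ ! -d "$1" ]; then
        echo "expected exactly one build directory" >&2
        return 1
    fi
    printf '%s\n' "$1"
}
```

From the dspace source directory, this could be used as: cd "$(find_build_dir .)"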

Next, run Ant to initialize the database:

ant fresh_install

Lastly, you want to create an administrator for your Dryad instance; when you run the following script, you will be walked through this process (this should be run from the main dryad/dryad/dspace directory):

bin/create-administrator

Updating Dryad Prior to (Re)Deploying

After the initialization of the Dryad database, you want to run an update that will pull any changes you have made to the Dryad codebase into the instance that is to be deployed using Tomcat. There are a variety of parameters that can be used in the update that will indicate whether you want to update just the webapp, the webapp and the Java code, or the webapp, Java code, and configuration files. The first update command below will update the configuration and pick up on any changes that might have been made to the codebase.

From the target/dspace-*.dir directory, type:

ant -Doverwrite=true update

To just update the webapps (if there haven't been any changes to the Java code) type:

ant update_webapps

Dryad can also be built referencing a particular dspace.cfg file if needed:

ant -Dconfig=/home/dryad/dspace.cfg update

If you find changes that you are expecting to see are not showing up, you can always do a clean install of the codebase using "sudo -u dryad mvn clean package" prior to running the Ant task.

If you are updating a development installation of dryad, you probably want to invoke maven as

mvn clean package -P env-dev -U

If Maven complains about not recognizing the env-dev profile (the message will appear at the end of the Maven output), check the profiles defined in your Maven settings, either in {MAVEN_HOME}/conf/settings.xml or in {USER_HOME}/.m2/settings.xml. The -U option simply forces Maven to check remote repositories for updates.
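A quick way to see whether a settings file declares the profile is a text search for its id. The sketch below uses a hypothetical helper, has_profile; it is a plain grep rather than a real XML parse, so treat a match as a hint only.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the settings file ($1) contains an
# <id> element with the given profile id ($2).
# This is a text match, not an XML parse: it can be fooled by
# commented-out profiles or by <id> elements of other sections.
has_profile() {
    grep -q "<id>[[:space:]]*$2[[:space:]]*</id>" "$1"
}
```

For example: has_profile "$HOME/.m2/settings.xml" env-dev || echo "env-dev not found"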

Deploying Dryad to Tomcat

When running Dryad, the server needs to interact with the Web application in a way that gives it access to directories and databases. This can be accomplished by changing ownership of the deployed directory (in our case, /opt/dryad) to the user running Tomcat (often a tomcat or www-data user) or by having the Tomcat instance run as the dryad user. Which path you choose may be determined by what other webapps you intend to run in Tomcat (and what their needs are) and whether you're using a continuous integration server like Jenkins to manage your build processes.

To change the user that Tomcat runs as, edit the /etc/init.d/tomcat6 file using nano:

sudo nano /etc/init.d/tomcat6

When you edit, find the line that says "TOMCAT6_USER=tomcat6" and change it to "TOMCAT6_USER=dryad". You will then also need to change the ownership of the directories to which Tomcat needs to write.

Tomcat writes to the cache directory:

sudo chown -R dryad /var/cache/tomcat6

It writes to the lib directory:

sudo chown -R dryad /var/lib/tomcat6

It also writes to the log directory:

sudo chown -R dryad /var/log/tomcat6

You may need to adjust Tomcat's policy files so the Web application has permission to write to the file system. Newer versions of Tomcat on Ubuntu will require this. Other Linux distributions might not, but it doesn't hurt either. First, use nano to edit the "/etc/tomcat6/policy.d/50local.policy" file.

sudo nano /etc/tomcat6/policy.d/50local.policy

Add the below to the bottom of the file:

grant codeBase "file:///opt/dryad/webapps/-" {
permission java.security.AllPermission;
};
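If the policy file is managed by a script, the grant should be added only once. The following is a minimal sketch of an idempotent append; add_policy_grant is a hypothetical helper, and it must be run by a user with write access to the policy file.

```shell
#!/bin/sh
# Hypothetical helper: append an AllPermission grant for a webapps
# directory to a Tomcat policy file, unless it is already there.
#   $1 = policy file, $2 = absolute path of the webapps directory
add_policy_grant() {
    policy_file=$1
    codebase="file://$2/-"
    # Skip the append if a grant for this codeBase already exists.
    if grep -qF "grant codeBase \"$codebase\"" "$policy_file" 2>/dev/null; then
        return 0
    fi
    cat >> "$policy_file" <<EOF
grant codeBase "$codebase" {
permission java.security.AllPermission;
};
EOF
}
```

Called as add_policy_grant /etc/tomcat6/policy.d/50local.policy /opt/dryad/webapps, it writes the same three lines shown above, and a second call leaves the file unchanged.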

Next, you need to add the Dryad webapps to the Tomcat /etc/tomcat6/server.xml file (where webapps are statically configured). Replace the Host element in the file with the following:

<Host name="localhost" appBase="/opt/dryad/webapps" unpackWARs="true" autoDeploy="true" xmlValidation="false" xmlNamespaceAware="false">
<Context docBase="xmlui" path="" reloadable="true" cachingAllowed="false" allowLinking="true"/>
<Context docBase="solr" path="/solr" reloadable="true" cachingAllowed="false" allowLinking="true">
<Environment name="solr/home" type="java.lang.String" value="/opt/dryad/solr/" override="true" />
</Context>
<Context docBase="oai" path="/oai" reloadable="true" cachingAllowed="false" allowLinking="true"/>
<Context docBase="doi" path="/doi" reloadable="true" cachingAllowed="false" allowLinking="true"/>
</Host>

You will also want to change the port Tomcat runs at (to correspond to the port that we've configured/used in this HowTo); to do this replace:

<Connector port="8080" protocol="HTTP/1.1" connectionTimeout="20000" URIEncoding="UTF-8" redirectPort="8443" />

with:

<Connector port="9999" protocol="HTTP/1.1" connectionTimeout="20000" URIEncoding="UTF-8" redirectPort="8443" />
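The same replacement can be scripted with sed. This is a sketch only: set_connector_port is a hypothetical helper, and it assumes that port is the first attribute of the Connector element, as in the lines above.

```shell
#!/bin/sh
# Hypothetical helper: read a server.xml on stdin and write it to stdout
# with the connector port changed from $1 to $2.
# Only text of the form <Connector port="$1" is rewritten, so other
# attributes such as redirectPort are left alone.
set_connector_port() {
    sed "s/<Connector port=\"$1\"/<Connector port=\"$2\"/"
}
```

For example: set_connector_port 8080 9999 < /etc/tomcat6/server.xml > server.xml.new. Writing to a new file, rather than editing in place, makes it easy to review the result with diff before copying it back.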

It's worth noting that the port at which Dryad is run determines which Dryad logo is displayed at the top of the page. (The logo-switching logic is in the theme's Dryad.xsl.) Below is a list of ports and corresponding Dryad logos:

  • Port 9999 -- development logo
  • Port 8080 -- production logo
  • Port 8888 -- staging logo
  • Port 7777 -- demo logo
  • Port 6666 -- MRC logo

WARNING: Some web browsers will restrict access to port 6666 (and maybe others). You may need to change your browser configuration to access this port. Alternatively, you can verify that the server is running with a simple curl command, and then set up Apache to proxy the port (see below).

You should now be able to restart Tomcat and have it serve Dryad at the port and URL that you set in the dspace.cfg file.

sudo /etc/init.d/tomcat6 restart

Proxying Tomcat with Apache

For a user-facing Dryad instance, you will want to configure Apache to proxy Tomcat so that all requests come through port 80 and so that the server is listening to a particular domain. It's possible to do this in Tomcat, of course, but many people prefer to use Apache as the public face for their Web applications.

To proxy the Dryad webapp, just add the following to Apache's configuration file for the domain from which you want to serve Dryad.

ProxyPass / http://localhost:9999/
ProxyPassReverse / http://localhost:9999/

For your own development purposes, it is not required that you proxy Tomcat with Apache. Working with Tomcat running at another port (like 9999) works just fine.

Notes:

  1. The production instance of Dryad uses many more settings for proxies and redirects. It is best to copy a configuration file from an existing server.
  2. The default "htdocs" directory for apache should contain copies of the Dryad maintenance page, logo, and favicon.
  3. If you are running more than one Dryad server on a machine, you will need to maintain multiple instances of the VirtualHost section in the Apache configs, with each section declaring a different ServerName.
  4. See the Troubleshooting page for more information about possible problems with the Apache proxy.
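When several Dryad servers share one machine (note 3), each needs its own VirtualHost section. A minimal sketch of generating such a section follows. print_vhost is a hypothetical helper and the server name in the usage note is a placeholder; it assumes Apache listens on port 80 with mod_proxy and mod_proxy_http enabled, and it omits the extra proxy and redirect settings used in production (note 1).

```shell
#!/bin/sh
# Hypothetical helper: print an Apache VirtualHost section that proxies
# one Dryad instance.
#   $1 = ServerName for this instance, $2 = local Tomcat port
# Assumes Apache listens on port 80 and that mod_proxy and
# mod_proxy_http are enabled.
print_vhost() {
    cat <<EOF
<VirtualHost *:80>
    ServerName $1
    ProxyPass / http://localhost:$2/
    ProxyPassReverse / http://localhost:$2/
</VirtualHost>
EOF
}
```

For example, print_vhost dryad.example.org 9999 prints a section for a development instance, and a second call with a different name and port prints one for another instance.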

Configuring DSpace

To run correctly, Dryad's DSpace instance must be configured, and the relevant collections, communities, and metadata elements must be present in the postgres database. See DSpace Configuration for details.

Running Jenkins to Manage Deployment

Jenkins is a great tool to manage the build/deploy cycle of the code. It does take a bit of setup. Issues to remember:

  • Jenkins must run on a servlet container other than the Dryad Tomcat, because it needs to restart the Dryad Tomcat.
  • We normally use Jetty for the alternate servlet container.
  • Jetty must run on different ports than Tomcat, so they do not interfere with each other.

Scripts to Maintain Dryad

There are a series of DSpace and Dryad scripts that you might want to set in your machine's cron jobs. These keep Dryad running as expected. Below is an example crontab -l listing:

# Send out subscription e-mails at 01:00 every day
0 1 * * * /opt/dryad/bin/dspace sub-daily

# Generate sitemaps for Dryad at 01:30 every day (also necessary for DOI generation)
30 1 * * * /opt/dryad/bin/dspace generate-sitemaps

# Run the media filter at 02:00 every day
0 2 * * * /opt/dryad/bin/dspace filter-media

# Run the embargo bit updater at 02:30 every day
30 2 * * * /opt/dryad/bin/dspace embargo-lifter

# Run the checksum checker at 03:00
0 3 * * * /opt/dryad/bin/dspace checker -lpu

# Run the register-dois script at 03:30 to catch new submissions without DOIs
30 3 * * * /opt/dryad/bin/dspace register-dois

# Mail the checker's results to the sysadmin at 04:00
0 4 * * * /opt/dryad/bin/dspace checker-emailer

# Run the update-dois script at 04:30 to update DSpace with newly registered DOIs
30 4 * * * /opt/dryad/bin/dspace update-dois

# Run stat analyses
0 5 * * * /opt/dryad/bin/dspace stat-general
0 5 * * * /opt/dryad/bin/dspace stat-monthly
0 6 * * * /opt/dryad/bin/dspace stat-report-general
0 6 * * * /opt/dryad/bin/dspace stat-report-monthly

The root account also has a crontab that runs the web-check script. Web-check verifies that the tomcat server is running and restarts if it is not.

Useful resources

Dspace database schema

Final Notes and Troubleshooting

  • The Handle server should only be run on the production machine (not on development or staging machines).
  • To update a handle service, you need to re-run SimpleSetup, which will generate a new sitebndl file. That file will need to be sent to CNRI (hdladmin@cnri.reston.va.us) and will be used to update the prefix(es) in the Global Registry.
  • For the error "DSpace has failed to initialize, during stage 3. Error while attempting to read the XML UI configuration file", the solution is to check all paths for config files. Make sure Tomcat has the rights to read them. Note that the XMLUI webapp has a WEB-INF/web.xml that contains a path to the dspace.cfg file.
  • If you're trying to start the handle server and get an error about filename too long, this means that the start script is using a dsrun with an echo statement in it. Remove the echo.
  • For the error "java.net.MalformedURLException: unknown protocol: resource", modify the log4j.properties, so the rootLogger is not set to debug level. See DSpace bug 239.
  • If you get an error related to the bi_* tables on running bin/index-init or bin/index-update, it might be that the dryad_app user doesn't have permission to create tables in the database (which it needs). Grant that permission, then rerun index-init to create the tables.
  • If you get an error "java.lang.RuntimeException: Unable to aquire dispatcher named default", you may be calling one of the old DSpace shell scripts. Use the bin/dspace command instead.
  • If you get an error about a metadata field, ensure that all required metadata fields exist. Missing fields can be imported with "dsrun org.dspace.administer.MetadataImporter -f config/registries/FILENAME" (replace FILENAME with one of the files in the registries directory).
  • As of 2012-11-9, there may be an issue with a missing DOI database (e.g. /opt/dryad/doi-minter) on a fresh install, which stops the webapp from starting in tomcat.
  • Email contact for technical support: help@datadryad.org

Java permission issues

If you encounter strange issues like failures to write to Java Beans or other issues with Cocoon, Spring, Workflow, DSpace Service Manager, etc., it is likely that you are using a version of Java that is incompatible with Dryad. See the section above on setting up Java.

(These problems may be caused by default JVM security settings, rather than the actual version of Java, but we haven't investigated in detail.)

Installation issues with maven

As of 2012-8-1, there is a problem deploying a new installation of Dryad on a fresh machine, due to maven dependencies. The current workaround requires manually installing these dependencies to your local maven repository. (Once Dryad upgrades the underlying DSpace version, this will no longer be an issue.)

To install the jars manually, you'll need a copy of each jar on your local filesystem. The files are available from the dspace/etc/discoverySnapshot directory in the dryad-master branch. You can install them by running the following commands:

cd dspace/etc/discoverySnapshot
mvn install:install-file -Dfile=discovery-solr-provider-0.9.4-SNAPSHOT.jar  -DgroupId=org.dspace.discovery -DartifactId=discovery-solr-provider -Dversion=0.9.4-SNAPSHOT -Dpackaging=jar
mvn install:install-file -Dfile=dspace-solr-solrj-1.4.0.1-SNAPSHOT.jar  -DgroupId=org.dspace.dependencies.solr -DartifactId=dspace-solr-solrj -Dversion=1.4.0.1-SNAPSHOT -Dpackaging=jar
mvn install:install-file -Dfile=discovery-xmlui-block-0.9.4-SNAPSHOT.jar  -DgroupId=org.dspace.discovery -DartifactId=discovery-xmlui-block -Dversion=0.9.4-SNAPSHOT -Dpackaging=jar
mvn install:install-file -Dfile=carrot2-mini-3.1.0.jar -DgroupId=org.carrot2 -DartifactId=carrot2-mini -Dversion=3.1.0 -Dpackaging=jar
cd ../postgres
mvn install:install-file -DgroupId=postgresql -DartifactId=postgresql -Dversion=9.4-1206-jdbc41 -Dpackaging=jar -Dfile=postgresql-9.4-1206-jdbc41.jar

Database Authentication Issues

If you have trouble connecting as the database user after you have successfully created the dryad_app user, it might be that your postgresql database is set up for ident authentication instead of password authentication. To change this, edit your pg_hba.conf file (its location will depend on your system), changing ident to md5 (see the postgresql documentation for more details).
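To test the connection outside of Dryad, it helps to use the same host, port, and database that the application uses. The sketch below splits a JDBC URL, such as the default.db.url value in the Maven profile, into those parts. parse_jdbc_url is a hypothetical helper and only handles URLs that include an explicit port.

```shell
#!/bin/sh
# Hypothetical helper: split a PostgreSQL JDBC URL of the form
# jdbc:postgresql://HOST:PORT/DB and print "HOST PORT DB".
# URLs without an explicit port are not handled.
parse_jdbc_url() {
    rest=${1#jdbc:postgresql://}   # HOST:PORT/DB
    hostport=${rest%%/*}           # HOST:PORT
    db=${rest#*/}                  # DB
    echo "${hostport%%:*} ${hostport#*:} $db"
}
```

The three values map onto psql's -h, -p, and -d options, for example: psql -h localhost -p 5432 -U dryad_app -d dryad_repo -c 'select 1'. Connecting with -h goes over TCP, as the JDBC driver does, so it is governed by the "host" entries in pg_hba.conf rather than the "local" ones.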

Also, if you don't know the postgres admin password, but have sudo privileges on the machine, you can reset the database password with a couple of steps (if you don't need to reset the database's password, you can skip this step).

First, shut down the database:

sudo service postgresql stop

Next, edit the pg_hba.conf file using nano:

sudo nano /etc/postgresql/(version)/main/pg_hba.conf

Change the "local all postgres md5" line to "local all postgres trust" and restart the server:

sudo service postgresql start

You can then change the postgres password by typing:

psql -U postgres template1 -c "alter user postgres with password '[newpassword]';"

Once this is done, you need to shut down the server:

sudo service postgresql stop

You can then change back the pg_hba.conf file to use "md5" and restart the server:

sudo service postgresql start
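The two pg_hba.conf edits in this procedure are the same substitution with a different method name, so they can be sketched as one sed filter. set_postgres_local_auth is a hypothetical helper; it assumes the line has the form "local all postgres METHOD" with fields separated by whitespace.

```shell
#!/bin/sh
# Hypothetical helper: read pg_hba.conf on stdin and write it to stdout
# with the auth method of the "local all postgres" line set to $1
# (for example trust or md5). All other lines pass through unchanged.
set_postgres_local_auth() {
    sed -E "s/^(local[[:space:]]+all[[:space:]]+postgres[[:space:]]+)[[:alnum:]-]+/\\1$1/"
}
```

For example, set_postgres_local_auth trust < pg_hba.conf > pg_hba.conf.new produces the temporary configuration, and the same call with md5 restores password authentication afterwards.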

See Also