DNS and Failover

Dryad's DNS and failover system is managed by Amazon Route 53. During outages of the primary Dryad server, the failover system re-routes traffic to a secondary server.

Goals:

  • Provide one-way replication to a read-only copy of the primary datadryad.org server
  • Make replication as close to real time as possible
  • Make failover to the secondary server and failback to the primary server automatic

This page focuses on the process for failover. For details about the servers that serve as primary and secondary, see WG:Server Setup.

Basic DNS setup

The primary DNS records are managed at the University of California, but they delegate to nameservers at Amazon, so most DNS changes must be made through the Route 53 system.
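
To confirm that the delegation is in place, the authoritative nameservers can be checked with dig. This is a minimal sketch; the nameserver names in the comment are placeholders, not the actual delegation.

# Check which nameservers are authoritative for the domain
dig +short NS datadryad.org
# With the Route 53 delegation in place, the answers should be awsdns-* hosts
# (e.g. something like ns-NNN.awsdns-NN.org. -- placeholder names)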

Detecting Failure

Failover is based on tests made by Amazon Route 53. There is a health check called "Failover -- Primary Test", which combines the results of several other health checks to determine whether failover should occur. When this check fails, the Amazon DNS system starts delivering the IP address of the secondary server instead of the primary server.

Failover can be manually triggered (e.g. for system upgrades) using a process described on the WG:Emergency page.
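
The health checks can also be inspected from the command line with the AWS CLI. This is a sketch only; it assumes credentials for the account that owns the Route 53 zone, and <health-check-id> is a placeholder taken from the list-health-checks output.

# List all Route 53 health checks and locate "Failover -- Primary Test"
aws route53 list-health-checks
# Show the status currently reported by the Route 53 checkers for that check
aws route53 get-health-check-status --health-check-id <health-check-id>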

What happens during a failure

  • During a failure of the primary server, all datadryad.org requests go to the secondary server.
  • Apache is configured on the secondary server to disallow logins or submission of data.
    • The Dryad pages have login/submission features replaced with messages saying the feature is currently disabled.
    • If users try to access a login/submission URL directly (e.g., using a link from an integrated journal), a static HTML page is displayed. This page explains that submissions are currently disabled.

Secondary server

Apache mod_rewrite and mod_substitute are used to disable logins on this instance of Dryad.
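
A minimal sketch of the kind of mod_rewrite rule involved is below. The URL paths and the static page name are assumptions for illustration, not the actual production configuration.

<IfModule mod_rewrite.c>
    RewriteEngine On
    # Redirect login and submission URLs to a static "submissions disabled" page
    # (the matched paths and the target file are hypothetical)
    RewriteRule ^/(login|password-login|submit) /static/submissions-disabled.html [R=302,L]
</IfModule>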

Files are replicated with rsync; see "Keeping secondary server in sync" below.

Keeping secondary server in sync

Main rsync of data files

There is a cron job on the secondary server (root account) that performs the rsync every minute: /root/scripts/rsync-from-ncsu.sh
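
A sketch of what that crontab entry might look like is below. Only the script path is from this page; the flock guard and log file are assumptions added so that overlapping runs do not pile up.

# root crontab on the secondary: pull from the primary every minute
* * * * * /usr/bin/flock -n /var/run/rsync-from-ncsu.lock /root/scripts/rsync-from-ncsu.sh >> /var/log/rsync-from-ncsu.log 2>&1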

Solr replication

  • Solr master-slave replication is used for the Solr indexes, as per http://wiki.apache.org/solr/SolrReplication
    • Configuration is stored in /opt/dryad/solr/<index>/conf/solrconfig.xml, with different settings on the primary and secondary machines.
    • Some replication details can be seen at http://secundus.datadryad.org/solr/dryad/replication?command=details
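
The replication handler can also be queried or nudged manually with curl. The details URL is taken from the bullet above; fetchindex is the standard SolrReplication command to force the slave to pull the latest index, shown here as an assumption about how a stalled sync would be caught up.

# Show replication state on the secondary (slave) core
curl 'http://secundus.datadryad.org/solr/dryad/replication?command=details'
# Force the slave to pull the current index from the master if polling has fallen behind
curl 'http://secundus.datadryad.org/solr/dryad/replication?command=fetchindex'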

Database sync (bucardo)

If the sync email (a nightly cron job on this system that mails admin@datadryad.org) shows that PostgreSQL tables are out of sync, run these commands on the secondary server to resync all tables:

bucardo_ctl update sync dryad_delta_sync3 onetimecopy=1
bucardo_ctl reload dryad_delta_sync3
bucardo_ctl status dryad_delta_sync3
  • The secondary system runs Bucardo, which provides asynchronous replication of the dryad_repo database from the primary server at NCSU to the secondary server at Duke.

Sysadmin notes on Bucardo

Notes in this section need to be validated. They may not apply to Dryad, since they were originally used for other NESCent projects.

  • Bucardo does not add sequences by default; add them just like tables.
  • Asynchronous master-master replication: with a swap sync, whichever row was changed most recently is copied to the other database. Currently only two databases are supported.
  • The install instructions are good, except that the following settings are also needed:
#bucardo_ctl set default_email_from=bucardo@pikaia.nescent.org
#bucardo_ctl set default_email_to=sysadmin@nescent.org

[root@pikaia ~]# cat /etc/bucardorc
dbpass=thebucardopass
debugfile=0

  • Set up cron jobs! See http://bucardo.org/wiki/Bucardo/Cron
  • Set up replication on the slave server, where the database is bodysize_production and schema_info is a table without a primary key:
createdb -Upostgres bodysize_production
pg_restore -Upostgres -F c -d bodysize_production bodysize_production
bucardo_ctl add database bodysize_production name=bodysize_master host=hyneria.nescent.org user=postgres pass=thepassword
bucardo_ctl add database bodysize_production name=bodysize_slave host=localhost user=postgres pass=thepassword
# add all tables except schema_info which has no primary key to a delta sync
bucardo_ctl add all tables db=bodysize_master --herd=bodysize1 -T schema_info
# if you want to add sequences add the following
# bucardo_ctl add all sequences db=bodysize_master --herd=bodysize1
bucardo_ctl add sync bodysizedelta source=bodysize1 targetdb=bodysize_slave type=pushdelta
bucardo_ctl validate bodysizedelta
# add tables without primary keys to a separate fullcopy sync
bucardo_ctl add herd bodysize2 schema_info
bucardo_ctl add table schema_info db=bodysize_master --herd=bodysize2
bucardo_ctl add sync bodysizefull source=bodysize2 targetdb=bodysize_slave type=fullcopy
# to get the sync going: since the sync is already running, reload instead of using bucardo_ctl start
bucardo_ctl reload_config
bucardo_ctl kick bodysizefull
bucardo_ctl status
bucardo_ctl status bodysizefull
#force a full copy rather than a delta with the following two commands to catch up with changes that occurred between the pg_restore and pushdelta sync
bucardo_ctl update sync bodysizedelta onetimecopy=1
bucardo_ctl reload bodysizedelta

rubyrep can be used to double-check sync status:
./rubyrep scan -s -b -c bodysize.conf

  • More notes are on the Bucardo wiki: http://bucardo.org/wiki/Bucardo/Documentation/Overview

Configuration

(This section needs to be updated for Amazon)

Other notes

Ideas for improvement:

  • See the Trello card on cloud architecture: https://trello.com/c/YcZs6y1R
  • We need to have Solr run in a separate instance of Tomcat or Jetty, or upgrade to the latest Solr, before this will work. Jetty would probably use less memory.
  • If we don't want to depend on a third party like MCNC or want more extensive "health" checks, we could set up a virtual machine (or two) at a cloud host such as EC2 and use it for failover.  This would allow for more extensive testing of the primary site in order to trigger a failover.  I have used this in the past (the http-check expect directive: http://cbonte.github.com/haproxy-dconv/configuration-1.4.html#4-http-check), and it can trigger failover based on a string in the HTTP response, similar to our current Nagios health checks; see the haproxy sketch after this list.  This would also be inexpensive ($50-$100/month), as the virtual machines could be very small, such as EC2 micro instances.  Large data transfers could go directly to the primary server rather than through the load balancer and thus would not count against any bandwidth quotas.
  • Rather than rsync, we could use something like GlusterFS, csync, or unison for real-time two-way file replication.  This would require extensive testing and be much more complex, but GlusterFS is a mature and widely used technology - http://www.gluster.org/community/documentation/index.php/Gluster_3.2:_Managing_GlusterFS_Geo-replication.  I have been using GlusterFS on 8 old DSCR nodes we used for OpenSim.
  • If we want to stick with MCNC or another failover service using HTTP status for health checks, we could set up Nagios health checks of the production site that would shut down Apache and trigger a failover if a certain string is not on the website.
  • Use two-way database replication.  Bucardo supports this and the basics could be set up fairly easily, but it would require much testing.
  • Make the failover site read/write.  If we control the failover process, we could make the secondary server read/write.  Before switching back to the primary, we could sync the files and database back from the secondary to the primary.  This would involve some downtime and more complication, but it is doable.
  • SOLR 4 (in alpha as of 8/2012) should handle master-master replication.  Currently SOLR can only perform master-slave replication.
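
As referenced in the EC2/haproxy idea above, a minimal haproxy 1.4 health-check sketch might look like the following. The backend name, server hostnames, and the expected string are placeholders, not an actual configuration.

# Hypothetical haproxy backend: mark the primary down unless its homepage contains the expected string
backend dryad
    option httpchk GET /
    http-check expect string Dryad
    server primary datadryad.org:80 check inter 10s fall 3 rise 2
    server secondary secundus.datadryad.org:80 check backup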


Misc.

Current rsync settings:

nice -n 11 ionice -c2 -n7 rsync -ahW --progress --stats --delete --exclude 'solr/' --exclude 'access.log*' --exclude 'tivoli/' --exclude 'log/' /opt/ secundus.datadryad.org:/opt/


These items could be synced on a weekly basis or when needed:

rsync -ahW --progress --stats --delete --exclude 'largeFilesToDeposit/' --exclude 'memcpu_dump/' /home/ dryad-dev.nescent.org:/home/
#after changes to production configuration
rsync -ahW --progress --stats --delete --exclude 'access.log*' --exclude 'tivoli/' --exclude 'log/' /opt/ dryad-dev.nescent.org:/opt/
rsync -ahW --progress --stats --delete /usr/local/apache-tomcat-6.0.33/ dryad-dev.nescent.org:/usr/local/apache-tomcat-6.0.33/
rsync -ahW --progress --stats --delete --exclude='logs/' --exclude='temp/' --exclude='newrelic/' /var/tomcat/ dryad-dev.nescent.org:/var/tomcat/
rsync -ahW --progress --stats --delete /usr/java/ dryad-dev.nescent.org:/usr/java/
rsync -ahW --progress --stats --delete /var/www/dryad/ dryad-dev.nescent.org:/var/www/dryad/
#only run during setup
#rsync -avhW --progress --stats --delete /etc/httpd/ dryad-dev.nescent.org:/etc/httpd/