Installing Galaxy

Setting up a production version of GMOD Galaxy for general use.

This presents one way to create an optimized production Galaxy instance. Variations are certainly possible, and some of the choices presented were dictated by local culture. Certain settings may be more suitable for production or development environments. Nonetheless, this presents a start-to-finish process for installation and setup.

Note: this is a living document that will change over time, and it is occasionally terse or cryptic where I have yet to fill it out.

Assumptions

  • Running on Fedora or a similar Unix-like system
  • Galaxy will be running off a suburl (e.g. http://foobar/galaxy)
  • Superuser privileges may be required at some points
  • Apache is used as a frontend server.

Prepping environment

Create a user for Galaxy to run as and under. Galaxy will be installed into this user's account:

% /usr/sbin/useradd galaxy
% passwd galaxy

Note that we don't install Galaxy "inside" Apache, as this would expose all of Galaxy (including datasets) to anyone on the web.

Install virtualenv, so we can later create a sandboxed python interpreter:

% yum install python-virtualenv.noarch

See http://virtualenv.openplanning.org

If needed, install mercurial, so the Galaxy repository can be used for installation (and later updating):

% yum install mercurial

Change to the galaxy user and into its home directory. Clone the Galaxy repository:

% hg clone https://bitbucket.org/galaxy/galaxy-dist

This will create galaxy-dist in the home directory.

Create a local sand-boxed Python interpreter for the galaxy user. We'll install all local data in the "local" dir:

% virtualenv --no-site-packages local

Then activate this interpreter, which will modify $PATH so the sand-boxed python is used by galaxy:

% source ./local/bin/activate

Alternatively, an entirely separate Python interpreter could be installed for Galaxy. You could use the system interpreter, but either of these two schemes avoids library collisions and dueling versions.
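
A quick way to confirm that the sandboxed interpreter is really the one being picked up is to look at the VIRTUAL_ENV variable that the activate script sets, and at where "python" resolves on $PATH. The helper below is only a sketch; its name is made up for illustration.

```shell
# Hypothetical helper (not part of Galaxy or virtualenv): succeeds if a
# virtualenv is active and the first python on $PATH lives inside it.
python_is_sandboxed() {
    [ -n "$VIRTUAL_ENV" ] || return 1
    case "$(command -v python)" in
        "$VIRTUAL_ENV"/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example use, after sourcing ./local/bin/activate:
#   python_is_sandboxed && echo "using the sandboxed python"
```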

Check this installation by running Galaxy:

% cd galaxy-dist
% sh run.sh

It may report that numerous "eggs" (Python libraries) are being updated, before saying that the server is starting. At that point, it may also report a network error (socket error 98, "address already in use") if another application is using the default port (8080) or Galaxy is blocked from connecting to it.
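
If the socket error appears, it can help to check whether something is already listening on the port before starting Galaxy. This is a minimal sketch that assumes bash (it relies on bash's /dev/tcp pseudo-device, which some builds disable); the function name is hypothetical.

```shell
# Hypothetical helper: succeeds if something accepts TCP connections on
# HOST PORT. Uses bash's /dev/tcp pseudo-device, so run it under bash.
port_in_use() {
    (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null
}

if port_in_use 127.0.0.1 8080; then
    echo "Port 8080 is already taken; choose another port for Galaxy"
fi
```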

Adjusting network settings

Galaxy settings are edited in the file universe_wsgi.ini. Edit the port Galaxy will use:

port = 7070

Then set the addresses it will listen on. With the default settings, Galaxy only listens on localhost and is not accessible over the network; 0.0.0.0 makes it listen on all interfaces:

host = 0.0.0.0
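
These edits can also be scripted rather than made by hand. The following is a sketch only: set_ini_option is a made-up helper, not something Galaxy ships, and it assumes the option already appears in the file (possibly commented out with a leading "#").

```shell
# Hypothetical helper: set KEY to VALUE in an ini FILE, uncommenting the
# line if it is commented out. Fails if KEY is not present at all, since
# appending blindly could put it in the wrong [section].
# Assumes VALUE contains no '|' or '&' (both are special to sed here).
set_ini_option() {
    file=$1 key=$2 value=$3
    grep -Eq "^#?[[:space:]]*${key}[[:space:]]*=" "$file" || return 1
    sed -E "s|^#?[[:space:]]*${key}[[:space:]]*=.*|${key} = ${value}|" \
        "$file" > "$file.new" && mv "$file.new" "$file"
}

# Example use, run from the Galaxy directory:
#   set_ini_option universe_wsgi.ini port 7070
#   set_ini_option universe_wsgi.ini host 0.0.0.0
```

Keep a backup of the ini file before trying this, as the helper rewrites the file in place.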

Connecting to a real database

Using Postgres, or an equivalent real db, create a database for Galaxy's use:

CREATE DATABASE galaxy_prod;

Create a database user and give it permissions to create tables and so on:

CREATE USER galaxy_prod_user WITH PASSWORD 'foobar';
GRANT ALL PRIVILEGES ON DATABASE galaxy_prod TO galaxy_prod_user;

Edit the database connection in universe_wsgi.ini to give the connection as an SQLAlchemy URI string:

database_connection = postgres://galaxy_prod_user:foobar@123.45.67.89:5432/galaxy_prod
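
Since the layout of the URI is easy to get wrong, here is a small sketch that assembles it from its parts: user, password, host, port and database name, in that order. The function name is invented for illustration.

```shell
# Hypothetical helper: assemble the connection URI from its parts.
# Layout: postgres://USER:PASSWORD@HOST:PORT/DATABASE
galaxy_db_uri() {
    printf 'postgres://%s:%s@%s:%s/%s\n' "$1" "$2" "$3" "$4" "$5"
}

galaxy_db_uri galaxy_prod_user foobar 123.45.67.89 5432 galaxy_prod
```

A password containing characters such as "@" or "/" would need to be percent-encoded before being placed in the URI; the sketch does not do this.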

Note that the example given in the Galaxy documentation is wrong, or at least opaque.

Start up the system and see that it works. There will be an extended period of migrating tables.

Optimizing database use

Many database settings can be tweaked to speed up Galaxy. Some are:

Reduce connection overhead by using only one connection to the database per thread:

database_engine_option_strategy = threadlocal

Large queries or datasets may cause memory issues, so Postgres should use server-side cursors, which fetch rows as needed rather than loading whole result sets at once:

database_engine_option_server_side_cursors = True

If plagued by errors of insufficient database pool connections, uncomment these and raise the values:

#database_engine_option_pool_size = 5
#database_engine_option_max_overflow = 10

Setting up a proxy

Galaxy can run off its own internal webserver, but in production it is far preferable to use a proper server as a proxy. These instructions assume this is Apache and that Galaxy is to be accessed at a suburl such as http://fobarbaz.com/galaxy-inst (the rules below use the suburl /galaxy). See https://bitbucket.org/galaxy/galaxy-central/wiki/Config/ApacheProxy and http://docs.uabgrid.uab.edu/wiki/Galaxy#Apache_and_Postgres_Setup

Edit the httpd.conf file:

% vi /etc/httpd/conf/httpd.conf

Rewrite requests on the standard port and the suburl to Galaxy:

<VirtualHost *:80>
ServerName 158.119.147.41
RewriteEngine on
#RewriteLog "/etc/httpd/logs/rewrite_log"
#RewriteLogLevel 9
RewriteRule ^/galaxy$ /galaxy/ [R]
#RewriteRule ^/galaxy/static/style/(.*) /home/galaxy/galaxy_dist/static/june_2007_style/blue/$1 [L]
#RewriteRule ^/galaxy/static/scripts/(.*) /home/galaxy/galaxy_dist/static/scripts/packed/$1 [L]
#RewriteRule ^/galaxy/static/(.*) /home/galaxy/galaxy_dist/static/$1 [L]
#RewriteRule ^/galaxy/favicon.ico /home/galaxy/galaxy_dist/static/favicon.ico [L]
#RewriteRule ^/galaxy/robots.txt /home/galaxy/galaxy_dist/static/robots.txt [L]
RewriteRule ^/galaxy(.*) http://localhost:7070$1 [P]
</VirtualHost>

RewriteLog commands can be used to debug the rewrites. See below for other commented out lines.
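
To see what the final proxy rule does to a request path, it can be roughly emulated outside Apache with sed. This is an illustration of the pattern and substitution only, not of how mod_rewrite is implemented; the function name is made up.

```shell
# Rough emulation of the final proxy rule, for illustration only:
#   RewriteRule ^/galaxy(.*) http://localhost:7070$1 [P]
# Prints the backend URL a request path would be proxied to,
# or nothing if the rule does not match.
proxy_target() {
    printf '%s\n' "$1" | sed -n 's|^/galaxy\(.*\)|http://localhost:7070\1|p'
}

proxy_target /galaxy/static/favicon.ico   # http://localhost:7070/static/favicon.ico
proxy_target /other/page                  # (no output: not proxied)
```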

Restart the apache server:

% /etc/init.d/httpd restart

Ideally you'd like to serve static content (images, style sheets, etc.) straight through Apache to take the load off Galaxy. The commented lines above show failed attempts to do so. The error seems to lie in Apache rather than in Galaxy: none of the static content shows up, and the error log shows “(13) permission denied” errors. Things tried to diagnose and correct this:

  • Logged the rewrite calls to see they rewrite to the correct paths for the static content
  • Checked the proxy-filter and filter-with declarations
  • Checked the unix permissions on the static dir
  • Tested for non-existent or incorrect paths (which generate a different error)
  • Inserted directory declarations to “Allow from all” for the static dir
  • Checked for and tried .htaccess files
  • Checked SELinux is disabled
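
One further check that may be worth making: Apache needs search (execute) permission on every directory leading to a file, not only on the static dir itself, and home directories are often created without it for other users. Whether that was the cause here is unknown. The sketch below lists each directory on a path that lacks the "other" execute bit; it assumes the Apache user is neither the owner nor in the group of those directories, and the function name is hypothetical.

```shell
# Hypothetical helper: print every directory on the way to DIR (an
# absolute path) that users other than owner and group cannot traverse.
# Only looks at the "other" permission bits shown by ls -ld.
find_untraversable() {
    dir=$1
    case $dir in /*) ;; *) return 2 ;; esac   # absolute paths only
    while [ "$dir" != "/" ]; do
        mode=$(ls -ld "$dir" 2>/dev/null | cut -c1-10)
        case $mode in
            d????????[xt]) ;;                 # other-execute is set
            *) printf '%s\n' "$dir" ;;
        esac
        dir=$(dirname "$dir")
    done
}

# Example use:
#   find_untraversable /home/galaxy/galaxy_dist/static
```

Any directory it prints would stop Apache from reaching the files below it, whatever the permissions on the static dir are.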

Branding

In universe_wsgi.ini, edit the name of the site:

brand = HPA Bioinformatics

and the URL linked to by the logo:

logo_url = http://www.hpa-bioinfotools.org.uk

and the "email comments" address:

bugs_email = mailto:paul-michael.agapow@hpa.org.uk

Running Galaxy

You can run Galaxy as a detached daemon:

% sh ./run.sh --daemon
% sh ./run.sh --stop-daemon
% sh ./run.sh --status
% sh ./run.sh --monitor-restart # restart if stopped

Create a startup (init) script:

% vi /etc/init.d/galaxy

and write it as something like this:

#!/bin/bash
#
# /etc/rc.d/init.d/galaxy
#
# Manages the Galaxy webserver
# Based on http://www.sensi.org/~alec/unix/redhat/sysvinit.html
#
# chkconfig: 2345 80 20
# description: Manages the Galaxy webserver

# The chkconfig line gives run levels, start priority, stop priority. The last two should add to 100.
# You get an error/failure if you try to restart a stopped service.

# Source function library.
. /etc/rc.d/init.d/functions

GALAXY_USER=galaxy
GALAXY_DIST_HOME=/home/galaxy/galaxy_dist
GALAXY_RUN="${GALAXY_DIST_HOME}/run.sh"
GALAXY_PID="${GALAXY_DIST_HOME}/paster.pid"

case "$1" in
start)
echo -n "Starting galaxy services: "
daemon --user $GALAXY_USER "${GALAXY_RUN} --daemon --pid-file=${GALAXY_PID}"
touch /var/lock/subsys/galaxy
;;
stop)
echo -n "Shutting down galaxy services: "
daemon --user $GALAXY_USER "${GALAXY_RUN} --stop-daemon"
rm -f /var/lock/subsys/galaxy
;;
status)
daemon --user $GALAXY_USER "${GALAXY_RUN} --status"
;;
restart)
$0 stop; $0 start
;;
reload)
$0 stop; $0 start
;;
*)
echo "Usage: galaxy {start|stop|status|reload|restart}"
exit 1
;;
esac

Set the permissions to 755 and check the owner:

% chmod 755 /etc/init.d/galaxy
% ls -la /etc/init.d/galaxy

Add to the system services and check:

% /sbin/chkconfig --add galaxy
% /sbin/chkconfig --list galaxy

Start the service with:

% /etc/init.d/galaxy start
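
To confirm that the daemon actually came up, one option is to check the PID file that the init script names in GALAXY_PID. The following is a sketch under that assumption; the function name is made up.

```shell
# Hypothetical helper: succeeds if PIDFILE exists and names a live process.
# kill -0 sends no signal; it only tests whether the process can be
# signalled, so run this as the galaxy user or as root.
pidfile_alive() {
    [ -r "$1" ] || return 1
    pid=$(cat "$1")
    [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}

# The path follows GALAXY_PID in the init script.
if pidfile_alive /home/galaxy/galaxy_dist/paster.pid; then
    echo "Galaxy is running"
else
    echo "Galaxy is not running"
fi
```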

Misc

Set Galaxy to use a local area as temporary storage:

% vi ~/.bash_profile

then:

TEMP=$HOME/galaxy_dist/database/tmp
export TEMP

Don't forget:

% source ~/.bash_profile
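
The directory named in TEMP has to exist and be writable to be of any use. A minimal sketch for checking that follows; the function name is hypothetical.

```shell
# Hypothetical helper: create DIR if needed and confirm it is a
# writable directory.
ensure_tmp_dir() {
    mkdir -p "$1" 2>/dev/null && [ -d "$1" ] && [ -w "$1" ]
}

# Example use, after sourcing ~/.bash_profile:
#   ensure_tmp_dir "$TEMP" || echo "TEMP ($TEMP) is not usable" >&2
```

Note that Python's tempfile module consults TMPDIR before TEMP, so if TMPDIR is also set in the environment it will take precedence.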

The front page can be customized by editing static/welcome.html. You should at least edit out the "customize this page" message ...

Other style customizations are possible. Note that some may be cached by the system and take some time to show up.

Notes

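Some documentation refers to your installation dir as "galaxy_dist", others as "galaxy-dist". The clone command above creates "galaxy-dist", while several paths in this document use "galaxy_dist". Adjust them to match your installation, and look out for this causing errors.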
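
To find out which spelling an installation actually uses, something like the following sketch could be run; the function name is made up for illustration.

```shell
# Hypothetical helper: print whichever spelling of the install dir exists
# under BASE (default: $HOME); fail if neither does.
find_galaxy_dir() {
    base=${1:-$HOME}
    for name in galaxy-dist galaxy_dist; do
        if [ -d "$base/$name" ]; then
            printf '%s\n' "$base/$name"
            return 0
        fi
    done
    return 1
}

# Example use:
#   GALAXY_DIST_HOME=$(find_galaxy_dir /home/galaxy) || echo "Galaxy dir not found" >&2
```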