Posts categorized under: science

Academic job ad red flags

Words you don't want to read in a job description
"competitive salary":
Like several other terms on this list, this phrase is frequently used in a completely sincere manner. We pay decently. If we offer you the job, then we'll negotiate. Unfortunately, it is just as frequently used as a way of avoiding the subject of remuneration, in the ...

Bayesian stats in very plain language

My pass at explaining the often misunderstood.

Introduction

Some years ago, I got into an argument with someone about the relative merits of Bayesian versus Maximum Likelihood in phylogenetics. They asserted the two were basically the same or would come to the same answers. I countered that while they would often agree, they were measuring different things ...
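
A toy case may make the "different things" point concrete. This is only a minimal sketch, not the argument from the post itself: for a coin tossed n times with k heads, maximum likelihood reports the single value of p that makes the data most probable, whereas a Bayesian analysis reports a distribution over p given the data. The uniform Beta(1, 1) prior and the function names are assumptions chosen for illustration.

```python
from fractions import Fraction


def ml_estimate(heads, tosses):
    """Value of p that maximises the binomial likelihood: k / n."""
    return Fraction(heads, tosses)


def posterior_mean(heads, tosses, prior_a=1, prior_b=1):
    """Mean of the Beta posterior for p under a Beta(prior_a, prior_b) prior.

    The posterior is Beta(prior_a + k, prior_b + n - k). The uniform prior
    (a = b = 1) is an assumption made purely for this illustration.
    """
    a = prior_a + heads
    b = prior_b + (tosses - heads)
    return Fraction(a, a + b)


if __name__ == "__main__":
    for heads, tosses in [(3, 3), (30, 30), (52, 100)]:
        print(heads, tosses,
              float(ml_estimate(heads, tosses)),
              float(posterior_mean(heads, tosses)))
```

With three heads from three tosses the ML estimate is 1 while the posterior mean is 0.8. As the data grow the two numbers draw together, which is why the methods often agree despite answering different questions.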

What I done learned about REDCap

A few surprises

For those not in the know, REDCap is a platform for creating and editing databases through the web. And by and large, it works fine. It saves a lot of development effort. It provides good reporting tools for users. It's secure and robust. But there are some things to ...

(Re-)building databases with csvsql

Tips and traps when going from a database dump to a database

The scenario

You have a bunch of related CSV files.

Maybe they're the result of a raw database dump. Maybe they've been generated in some other way: experimental results, various public data sets, whatever. But the important thing is that you need to make a database from them ...
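
For a sense of what that job involves, here is a rough stand-in using only the Python standard library. It is not csvsql and does not reproduce its behaviour: the function name `load_csv` is hypothetical, every column is stored untyped, and header names are assumed to contain no double quotes.

```python
import csv
import sqlite3
from pathlib import Path


def load_csv(conn, csv_path, table=None):
    """Create a table named after the CSV file and insert every row.

    Columns are declared without types, so values are stored as text.
    Inferring types and keys is exactly the part this sketch leaves out.
    """
    csv_path = Path(csv_path)
    table = table or csv_path.stem
    with csv_path.open(newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader)  # first row supplies the column names
        columns = ", ".join(f'"{name}"' for name in header)
        placeholders = ", ".join("?" for _ in header)
        conn.execute(f'CREATE TABLE "{table}" ({columns})')
        conn.executemany(
            f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    conn.commit()
    return table
```

Calling `load_csv(sqlite3.connect("dump.db"), "samples.csv")` would create a `samples` table in `dump.db` (both file names are made up for the example). Relationships between files still have to be declared by hand afterwards.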

Words a bioinformatician never wants to hear

Based on hard-won experience.

(This first appeared on biocodershub.net courtesy of Rad, and has since popped up on coderscrowd.com. It enjoyed some moments of viral popularity, with many aggrieved practitioners chipping in on the comments of the article. Following the resurrection of my website, it's a good opportunity to bring this ...

Tools for data

How to store data, what to use

Prompted by a recent tweet asking what people used for storing and managing their data, I wrote down my own hard-won lessons on the topic. In rough order of preference and data complexity:

A hierarchical strategy

Use restructured text for documentation

Or markdown / asciidoc. The advantages of this being:

  • It ...

Writing knitr in restructured text

Swapping out Markdown for a different markup.

knitr is a useful R package/tool for documenting analysis. Basically, it allows the embedding of R code "chunks" within a simple text document. This document can then be "knitted", which means that the R code is interpreted and reinserted in the document along with the results of that code ...

Common tasks in Galaxy

It's all there in the documentation, but sometimes it's hard to find. This document gives you another place to look.

So how do I ...

... create admin users?

Curiously, the identity of admin users is hardcoded into the Galaxy configuration file. (Which makes it secure, I guess, but separate ...

Compiling Quickjoin and file formats

Problems with building qjoin and getting it to read stockholm files.

Quickjoin / qjoin is an excellent commandline program for rapid construction of neighbour-joining trees. However, while using it recently, I had a few problems getting it to read Stockholm files, the most accessible of the formats it can use.

The ...

Galaxy toolsheds

Relatively painless tool-sharing

This is a more recent innovation in Galaxy, which can make it a somewhat confused one: the concept of the toolshed has changed over its lifetime, the documentation is incomplete, and there's a slightly strange emphasis in the documentation that exists. So …

Mile-high description ...

Language Wars

"What language would you recommend to introduce programming to an audience of life science students at a bachelor level?"

(Originally published on BiocodersHub)

Following several lengthy and passionate discussions in different venues on what language to use for teaching bioinformatics, I've started cutting and pasting my reply. And here it is.


You'll get a lot of different opinions on this because:

  • It's a religious issue. That ...

Cleaning biosequences

A simple script to check and purge sequence files of possible problems.

Sometimes you need sequences that are unambiguous (i.e. only 'ACGT', lacking gaps), whether it's because of the limitations or assumptions of tools (like omegaMap) or just because you want to know where SNPs or sequencing ...
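
The check itself is simple to sketch. The following is an illustration of the idea, not the script the post describes: the function names are hypothetical, and "purging" is assumed here to mean dropping every character that is not A, C, G or T.

```python
import re

# Anything other than the four unambiguous bases counts as a problem:
# gaps ('-'), IUPAC ambiguity codes (N, R, Y, ...), whitespace and so on.
NON_ACGT = re.compile(r"[^ACGT]")


def is_unambiguous(seq):
    """True if seq is non-empty and contains only A, C, G or T."""
    return bool(seq) and NON_ACGT.search(seq.upper()) is None


def problem_positions(seq):
    """List (index, character) for every non-ACGT character in seq."""
    return [(m.start(), m.group()) for m in NON_ACGT.finditer(seq.upper())]


def purge(seq):
    """Return seq upper-cased with every non-ACGT character removed."""
    return NON_ACGT.sub("", seq.upper())
```

Note that removing characters from one sequence in an alignment shifts its columns, so reporting positions is often the safer first step.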

Coloring dendroscope files

How to programmatically label phylogenies.

The need had arisen for the tips of a large phylogeny to be labelled in a systematic way. Rather than "point and click" within Dendroscope, this script takes a .den/dendro file and colors the tips according to a "color description" file. This is a simple csv file with taxa ...

Consensus in BioRuby

Explaining the ill-explained ways to obtain a consensus sequence in BioRuby.

In BioRuby, alignments are equipped with several methods for obtaining consensus sequences. Unfortunately, these have terse descriptions which point you at the BioPerl documentation, with the added bonus of not quite working like the BioPerl equivalents.

First, let's create a very simple alignment, where everything agrees except the last ...

Galaxy miscellanea

Odds and ends and the surprising.

Redirects

If you are serving the installation with a proxy redirect (e.g. the galaxy server is running on port 7070 but is being redirected by Apache to appear at port 80 on /galaxy), while you can access Galaxy at both addresses, login will ...

More about MrBayes

Some (more) notes about the venerable Bayesian reconstruction program.

Error when setting parameter "Gap" (2)

When attempting to execute a Nexus file, MrBayes kept spitting back this cryptic error upon loading:

Executing file "c_vp1_nuc_seqs.nxs" [...]
Reading data block
Allocated matrix [...]
Data is Dna
Gap character matches matching or missing characters ...

Ross Crozier 1943-2009

The sudden death of Ross Crozier on the 12th of November was heralded largely by a slow ripple of email, phone calls and Facebook messages across the globe. I found out from an email that started with a short but singularly complete sentence:

Terrible news.

It is sobering to think ...

What works - NGS assemblers

A quick paper review on picking the best assembler

You could spend all day just keeping up with developments in next-generation sequencing. Companies announce new and revolutionary technologies seemingly every month, promising to do more, better and for less. Yet at the same time, it’s difficult to hack your way through the marketing talk and get hard figures ...

Drawing sequence logos

A very simple script to do a simple but tedious task.

Sequence logos are a common way of representing SNPs and diversity in groups of sequences. This script automates the task. It's a bit rough around the edges and serves mainly as a base for further hacking.

Usage is:

drawlogo.rb [options] FILE1 [FILE2 ...]

where options are:

-h, --help Display ...
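
As background to what a logo encodes, the usual convention scales each column by its information content in bits and each letter by its frequency within the column. The sketch below illustrates that convention only. It is not taken from drawlogo.rb, it assumes a DNA alphabet of four letters and equal-length sequences, and it omits the small-sample correction that some logo tools apply.

```python
from collections import Counter
from math import log2


def column_information(column, alphabet_size=4):
    """Information content in bits of one alignment column.

    Computed as log2(alphabet_size) minus the Shannon entropy of the
    observed letter frequencies (no small-sample correction).
    """
    counts = Counter(column)
    total = sum(counts.values())
    entropy = -sum((n / total) * log2(n / total) for n in counts.values())
    return log2(alphabet_size) - entropy


def logo_heights(sequences, alphabet_size=4):
    """Per-column letter heights: frequency times column information."""
    heights = []
    for column in zip(*sequences):  # assumes equal-length sequences
        info = column_information(column, alphabet_size)
        total = len(column)
        counts = Counter(column)
        heights.append({base: (n / total) * info
                        for base, n in counts.items()})
    return heights
```

A fully conserved DNA column scores 2 bits, while a column with all four bases in equal numbers scores 0 and so vanishes from the logo.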

Ordnance Survey locations

Converting between OS grid references and longitudes-latitudes.

The Ordnance Survey grid is a UK-peculiar geospatial format, ubiquitous via street atlases, hiking charts and (yes) farming and epidemiological maps. It is explained in great detail in several places, but here's a quick overview:

The OS grid is a set of 25 squares, 500 kilometers a side, arranged 5-by-5 ...
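
The letter arithmetic can be sketched as follows. This is a minimal illustration, not the code from the post: the function names are hypothetical, input is assumed to be a well-formed reference such as "TQ 300 800", and only the step from grid reference to eastings and northings in metres is covered. Getting from there to longitude and latitude needs a projection and datum conversion that is not attempted here.

```python
def grid_letters_to_origin(letters):
    """South-west corner, in metres, of the 100 km square named by two letters.

    The first letter picks a 500 km square and the second a 100 km square
    inside it. Both use a 5-by-5 layout of A-Z without 'I', read left to
    right from the top row down.
    """
    first, second = (ord(c) - ord("A") for c in letters.upper())
    # The grid alphabet omits 'I', so letters after it shift down by one.
    if first > 7:
        first -= 1
    if second > 7:
        second -= 1
    # Offsets place the false origin at the south-west corner of square SV.
    e100km = ((first - 2) % 5) * 5 + (second % 5)
    n100km = 19 - (first // 5) * 5 - (second // 5)
    return e100km * 100_000, n100km * 100_000


def parse_grid_ref(ref):
    """Convert e.g. 'TQ 300 800' to (easting, northing) in metres."""
    ref = ref.replace(" ", "")
    letters, digits = ref[:2], ref[2:]
    if len(digits) % 2 or len(digits) > 10:
        raise ValueError("expected an even number of digits, at most 10")
    half = len(digits) // 2
    scale = 10 ** (5 - half)  # 3 digits per axis means 100 m resolution
    east0, north0 = grid_letters_to_origin(letters)
    east = int(digits[:half]) * scale if half else 0
    north = int(digits[half:]) * scale if half else 0
    return east0 + east, north0 + north
```

For instance, `parse_grid_ref("TQ 300 800")` gives `(530000, 180000)`: square TQ starts 500 km east and 100 km north of the origin, and the digits add 30 km and 80 km.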

Error 1 for MrBayes

What happens when a make fails.

If this happens to you when trying to compile MrBayes:

% make
gcc -DUNIX_VERSION -DUSE_READLINE -O3 -Wall    -c -o mb.o mb.c
gcc -DUNIX_VERSION -DUSE_READLINE -O3 -Wall    -c -o mcmc.o mcmc.c
gcc -DUNIX_VERSION -DUSE_READLINE -O3 -Wall    -c -o bayes.o bayes.c
bayes.c:45:31: readline/readline ...

Fetch sequences from db

A simple script to grab bioseqs by accession.

This just wraps the BioRuby fetch functionality in a friendly commandline interface. In brief, it can accept accession ids on the commandline or from a piped file (one accession per line) and save the corresponding sequences from the db. Sequences may be downloaded via the bioruby or EBI servers. The ...

Installing Galaxy

Setting up a production version of GMOD Galaxy for general use.

This presents one way to create an optimized production Galaxy instance. Variations are certainly possible and some of the choices presented are/were dictated by local culture. Certain settings may be more suitable for production or development environments. Nonetheless, this presents a start-to-stop process for installation and setup.

Note: this ...

Parsing Dendroscope nodes

For when you have to do lots to a big tree.

Previously, I showed how Dendroscope files can be easily manipulated with brute-force regex, so you can write scripts to color a mass of nodes, rather than having to format them one-by-one in the GUI. However, more complex manipulations require ...