Post tagged: programming

Using AWS for research computing

Your own private computing cluster?

This is based on my reply to a question on reddit about experiences with using Amazon Web Services (including Elastic Compute Cloud, Glacier, etc.) for research.

I was part of a pan-European research consortium that used it for our shared computing infrastructure: databases, a few web apps and web sites, mailers …

What's in a (file)name?

Seeking a name for an obvious un-named thing

Here's a simple conundrum: Given a file, say:

/foo/bar/baz.exe
what are the names for the different parts of it?

The directory (part) is /foo/bar. The non-directory or file part is baz.exe. The extension is .exe.

So what is baz?

Frequently, I've written code that produces …
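For comparison, Python's standard pathlib module has its own vocabulary for these pieces. This is a minimal sketch using a path assembled from the parts named above (/foo/bar and baz.exe); "stem" is pathlib's term and not necessarily the name this post settles on:

```python
from pathlib import PurePosixPath

path = PurePosixPath("/foo/bar/baz.exe")

directory = str(path.parent)  # the directory part
filename = path.name          # the non-directory (file) part
extension = path.suffix       # the extension, including the dot
stem = path.stem              # the file part minus the extension

print(directory, filename, extension, stem)
```

PurePosixPath is used so the example behaves the same on any operating system.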

Can't find a variable that doesn't exist

Stoopid, stoopid me.

The symptoms

You have a Python script or package that is running perfectly. After you make some alterations, "all of a sudden" you can't run it because it can't find a variable or name:

AttributeError: 'module' object has no attribute 'foobar'

But if you search for that name, it doesn't …

The "verbose" Python

PYTHONPATH, PYTHONVERBOSE and other infuriations

The symptoms

“Suddenly” (as is traditional to commence these sorts of stories), I started having problems with some of the Python installations on my local machine. As is the way of things, I had three distinct sources of Python:

  • The native / builtin Python that came with my Mac
  • The Python …
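When several Pythons share one machine, a first step is usually to ask the running interpreter what it thinks it is. The following is a hedged sketch of such a check, not the diagnosis this post arrives at; the function name describe_interpreter is made up for illustration:

```python
import os
import sys


def describe_interpreter():
    """Collect the facts that distinguish one Python installation from another."""
    return {
        "executable": sys.executable,
        "version": sys.version.split()[0],
        "prefix": sys.prefix,
        "PYTHONPATH": os.environ.get("PYTHONPATH"),
        "PYTHONVERBOSE": os.environ.get("PYTHONVERBOSE"),
        "sys.path": list(sys.path),
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```

Running the same script under each installation and comparing the output shows which search path and environment settings each one actually picks up.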

IPython notebooks in Pelican

Adding an obvious feature to Pelican.

IPython notebooks are an incredibly useful way to do reproducible research and illustrate chunks of code. Obviously you'd like to put some of them on the web. Surely there must be an easy way to include them in a Pelican-based website. So why hasn't anyone done it yet?

They have …

Logging PHP errors

Somehow, this feels like a personal failing.

The scenario

I try to avoid PHP as much as possible (just not my thing) but I'd installed a PHP-based webapp (REDCap) and there was something occurring that seemed like it should have generated an error. However, I could not find any PHP errors logged anywhere on the machine. So …

Porting this site to Pelican

On making a simpler website in a slightly complicated way.


Way, way back, this site ran on various permutations of PHP-based frameworks. Getting sick of ugly URLs and having to vigilantly update the software so as to avoid hacks (and still periodically falling victim to various hacks), I went in search of a Python-based framework that I could customize …

Primitive R objects and S3

Objects are just lists and class is just an attribute.

Primordial R objects

The first thing to know is that nearly every object in R is really just a list with named elements. For example, the results returned from "summary" are returned in an object:

x <- summary(c(1:10))
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##    1 …

An introduction to objects in R

The various types of OOP in R, whatever that is.


Doing object-oriented programming (OOP) in R can be complicated. This is because when you look in R for the usual idioms and features of OOP:

  • some are missing
  • some are present but implemented in unusual albeit valid ways
  • some …

Reference classes

Also sometimes wittily referred to as S5 or R5.

A more recent development in R is Reference classes. These promise, at last, fully blown objects and classes like those of C++ and Java. Naturally, this innovation comes with some downsides:

  • Reference classes are still, as yet, lightly documented
  • They use their own form of syntax and idioms
  • The C …

Rdoc, the essentials

A short memory jogger.

= Top level title

== 2nd level / sub-title


A horizontal rule is at least 4 dashes


Text can be _italic_, *bold* or +monospaced typewriter+.

   Verbatim / literal text is indented. But look out
   if it's after a list, as they must be indented to a
   different extent


   * Indent your lists and use stars …

Unicode and HTML entities

In which we struggle with a cacophony of characters for the web.

Buried in the Python standard library, unicodedata contains most of the information needed to interrogate and translate unicode characters. Unfortunately, it's underdocumented. More accurately, the docs are a terse list of what it does, but not why you might want to use it or how you use it. Unfortunately it's also …
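To give a flavour of what the module offers, here is a small sketch (Python 3) that interrogates a character with unicodedata and translates it to an HTML entity using the standard html.entities table. The helper names describe and to_entity are invented for this example and are not taken from the post:

```python
import unicodedata
from html.entities import codepoint2name


def describe(char):
    """Return the code point, official name and general category of a character."""
    return ord(char), unicodedata.name(char, "<unnamed>"), unicodedata.category(char)


def to_entity(char):
    """Translate a character to a named HTML entity, or a numeric one if it has no name."""
    if ord(char) < 128:
        return char  # ASCII passes through; this sketch does not escape &, < or >
    name = codepoint2name.get(ord(char))
    return f"&{name};" if name else f"&#{ord(char)};"


print(describe("\u00e9"))
print("".join(to_entity(c) for c in "caf\u00e9"))
```

codepoint2name only covers the HTML 4 named entities, so characters outside that set fall back to the numeric form.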

A useful .irbrc file

Tweaking irb.

For better or worse, programming in Ruby means that you will be spending a lot of time in irb. You can customise its behaviour in your .irbrc file, usually found in your home directory. The tweaks in mine were gathered from various places across the 'net:

require "rubygems"
require "wirble …

Euler 035 - counting circular primes

How many circular primes (primes that when rotated are still primes) are there below one million?

See Problem 20.

Again, this rescues a lot of code from previous problems, but the solution is less than perfect. The Euler guidelines say that every problem should be computable within a minute, but despite much tweaking, this solution takes significantly longer.

For speed, primes should be generated only once …
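One way to generate the primes only once is to sieve them into a set up front and then test every rotation by membership. This is a sketch in Python, not the post's own solution; the function names are illustrative, and it assumes the limit is a power of ten so that every rotation stays below it:

```python
def primes_below(limit):
    """Sieve of Eratosthenes: the set of all primes below limit, built once."""
    sieve = bytearray([1]) * limit
    sieve[0:2] = b"\x00\x00"  # 0 and 1 are not prime
    for n in range(2, int(limit ** 0.5) + 1):
        if sieve[n]:
            # Cross out every multiple of n, starting from n squared.
            sieve[n * n::n] = bytearray(len(range(n * n, limit, n)))
    return {n for n, is_prime in enumerate(sieve) if is_prime}


def rotations(n):
    """All rotations of the decimal digits of n, including n itself."""
    digits = str(n)
    return [int(digits[i:] + digits[:i]) for i in range(len(digits))]


def circular_primes(limit):
    """Primes below limit whose every digit rotation is also prime."""
    primes = primes_below(limit)
    return sorted(p for p in primes if all(r in primes for r in rotations(p)))


print(circular_primes(100))
```

Set membership makes each rotation check constant time, so the cost is dominated by the single sieve pass.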

Language Wars

"What language would you recommend to introduce programming to an audience of life science students at a bachelor level?"

(Originally published on BiocodersHub)

Following several lengthy and passionate discussions in different venues on what language to use for teaching bioinformatics, I've started cutting and pasting my reply. And here it is.

You'll get a lot of different opinions on this because:

  • It's a religious issue. That is, it comes …

Django form fields

A pointer on how to process them.

Django's form generation machinery for the most part is fine and straightforward, but on a recent project I ran into problems with a form that accepted files. The documentation is all there, but changes in how things work and an emphasis on forms for models make things harder to understand …

Gotcha: map keys in Groovy

Key equality is tricky, let's go shopping.

Maps are great quick-and-dirty data structures or caches in many programming languages. But there's at least one way that Groovy maps aren't so friendly. Observe:

groovysh> m1 = [1:1]
===> {1=1}
groovysh> m1[1]
===> 1
groovysh> m1[(int) 1]
===> 1
groovysh> m1[(long) 1]
===> null

So, map keys are using …

Hitchhikers guide to BioPython: SeqRecords

For the novice, more-than-raw sequences.

(Previously published on BiocodersHub.)

Previously I'd spoken about how Biopython represents sequence data with the Seq class. But there is also the SeqRecord class:

  • A Seq is just raw sequence data and information about what type of sequence it is.
  • A SeqRecord is a Seq and all the other information …

Illogical or

'Or' or something else.

In which we consider the difference between or and || in Ruby. The secret is ... it depends. On context.


>> foo = nil or "foo"
=> "foo"
>> bar = nil || "bar"
=> "bar"

That makes sense. But!:

def testor (&block)
   foo = block or "foo"
   bar = block || "bar"
   return foo, bar, block

>> testor()
=> [nil, "bar …

Ruby 1.9, keyword arguments, WTF

Amongst the many entries Ruby has in the "you have got to be kidding me" stakes, this is a doozy.

Due to Ruby's lack of explicit support for keyword arguments, it's traditional to use a quirk of its argument parsing that pushes named arguments into a hash:

def foo (a …


In a way, tuples are one of the most magical parts of Python. How does the interpreter distinguish between a method call, a bracketed portion of a mathematical expression and an actual tuple?:

calc_midpoint (1, None, [2, 3], False)
x = w + (x * y)
a = (b, c)

The really smart thing …
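The three cases can be checked directly. This sketch may not be the point the post goes on to make, but it shows one well-known part of the answer: the comma builds the tuple, not the parentheses. Here calc_midpoint is a hypothetical stand-in that simply returns its arguments, and the variable values are arbitrary:

```python
def calc_midpoint(*args):
    """Hypothetical stand-in for the function named above; it just returns its arguments."""
    return args


w, x, y, b, c = 1, 2, 3, 4, 5

call_args = calc_midpoint(1, None, [2, 3], False)  # parentheses delimit a call's arguments
grouped = w + (x * y)   # parentheses group a sub-expression
pair = (b, c)           # the comma builds the tuple
bare_pair = b, c        # so the parentheses are optional here
not_a_tuple = (b)       # no comma: just the value of b
singleton = (b,)        # trailing comma: a one-element tuple

print(pair == bare_pair, type(not_a_tuple).__name__, singleton)
```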


Some problems, some solutions.

If you need to draw a custom graph type in Ruby, there are a few problems:

  1. There's only a modest number of graphing libraries
  2. You don't have to go too far for your graph type to not be covered by these

Which leaves you few choices. You could manually draw it …

Can't use print

Who would have thunk it? print is a reserved word everywhere.

Perhaps this is buried in some specification, but it seems that you can't use print as the name of a method of a class. (It makes sense you can't override the global name, but in an object? Perhaps it's …

Getting the name of something

From the Department of the Blindingly Obvious.

Recently, I needed to get an informative name from a range of Python objects to generate a useful error message. The problems started because these objects could include classes (new-style and old), functions, lambdas and built-in types. And here the logic started getting tricky.

  • To get the …
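A rough sketch of the general idea, for Python 3 (where every class is new-style): use the object's own __name__ if it has one, otherwise fall back to the name of its class. The function name informative_name is invented here and the post's own logic may well differ:

```python
def informative_name(obj):
    """Best-effort readable name for classes, functions, lambdas, built-ins and instances."""
    name = getattr(obj, "__name__", None)
    if name is None:
        # Instances have no name of their own, so fall back to their class.
        name = type(obj).__name__
    return name


class Foo:
    pass


for thing in (Foo, Foo(), len, int, lambda: None, 3):
    print(informative_name(thing))
```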

How to stop output and printing

The essential problem is how to not get output from a program.

Let me explain: Ete2, a Python module for representing phylogenies, has a number of dependencies (MySQLdb, Numpy, PyQt, etc.) that it doesn’t necessarily need and it can be installed without them. If you don’t use the associated functionality, you won’t need these dependencies. But, irritatingly, ete2 tries …
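In current Python (3.5 and later) one common way to silence a chatty call is to redirect the standard streams to the null device for its duration. This is a sketch of that technique rather than the approach taken in the post, which predates these helpers; noisy is a hypothetical stand-in for the offending code:

```python
import contextlib
import os


def noisy():
    """Hypothetical stand-in for code that prints warnings when it runs."""
    print("warning: optional dependency missing")
    return 42


# Discard anything written to sys.stdout or sys.stderr inside the block.
with open(os.devnull, "w") as devnull:
    with contextlib.redirect_stdout(devnull), contextlib.redirect_stderr(devnull):
        result = noisy()

print(result)
```

This only catches output written through Python's sys.stdout and sys.stderr; text written directly to the file descriptors by compiled extensions would need a lower-level redirect.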

Applescript via Python

Controlling Mac applications with appscript

Over the years, Apple has fallen in and out of love with Applescript, its "official" scripting language for MacOS. True, Applescript isn't going to go away any day now and true, it is a very simple language and easy to use. But if you don't want to have to learn …

SQLAlchemy merge and relations

In which an oddity in SQLAlchemy is spotted, and it turns out to be a bug not a misunderstanding.


The merge function in SQLAlchemy lets an object seem to get pushed to the database, but actually stores and returns a copy of the object. This is handy for when …

Tips & tricks

Some brutal experience gained while learning to love Zope.


Ah, archetypes. How I adore thee. Other ways of constructing Plone content are as nothing before me. You are quick to develop, flexible in your approach, sensible in your defaults. But cryptic in your errors.

A lesson or two on breaking archetypes: First, not all fields accept all widgets …


Notes on installing and using.

Matplotlib is cool. Did I say it was cool? I meant very cool. While principally a plotting library, it can be used for image manipulation and drawing.


Which is where our first problem occurs (as of the 0.8.4 version of matplotlib and possibly earlier). Should you try …

Arx cheatsheet

Setting up Arx

arx my-id 'A Cryptic Moniker <>'
    Set up your identity.
arx my-editor vi
    Set your default editor for changing logs etc.
arx make-archive ~/Documents/Arx/Commits
    Set up an archive.
arx my-default-archive
    Make an archive your default.
arx my-revision-library …

Installing git on RHEL3

The hard way


I had to install git on a Redhat server, without the benefit of a package manager, local repository or anything like a sensible configuration. Because "security".

Install a recent version of libcurl

Why? Because git calls curl_easy_strerror, which was only introduced in libcurl 7.12.0. RHEL3 has …


Simple utilities for Plone product initialisation and installation.



This library was developed for personal use during the bad old days of Plone development around version 2.5. It almost certainly does not work with Plone 3+ and is kept here for historical interest.

This library originated from a desire to do away with the large amount …

Implicit typename error

"Iterator is implicitly a typename ... warning: implicit typename is deprecated"

While compiling some code with gcc (that had previously compiled and run without complaint under CodeWarrior), the following warning was reported:

SblNumerics.h:334: warning: 'typename
std::iterator_traits<_Iterator>::value_type' is implicitly a typename
SblNumerics.h:334: warning: implicit typename is deprecated, please see
the documentation for details

The offending …

Parsing dates in Python

One of those weird things that always slips my mind and always seems less obvious than it should be.

Parsing a date (object) from a string representation in Python seems oddly neglected or cumbersome as compared to the rest of the standard library that surrounds it. (Witness the number of …
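As a reminder of the standard-library route, datetime.strptime will parse a string against an explicit format. The sketch below tries a short list of formats in turn; the helper name parse_date and the choice of formats are assumptions for illustration:

```python
from datetime import datetime


def parse_date(text, formats=("%Y-%m-%d", "%d/%m/%Y")):
    """Return a date parsed from text, trying each format in turn."""
    for fmt in formats:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue  # this format did not match, try the next one
    raise ValueError(f"unrecognised date: {text!r}")


print(parse_date("2009-03-14"))
print(parse_date("14/03/2009"))
```

Listing the formats explicitly keeps ambiguous inputs such as 03/04/2009 from being silently guessed.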

Non-class type

"... request for member ... which is of non-class type"? What?

After a long hiatus, we return with a traditionally opaque C++ error message:

error: request for member 'close' in 'out', which is of non-class
type 'std::ofstream*'

which like all the best C++ error messages is long, detailed and completely unhelpful. It could be triggered by a number of things …

Doxygen cheatsheet

Doxygen is a free tool for documenting code. With a single command it can generate cross-referenced HTML documentation from any C++ or Java code. Furthermore, if the code is commented in a particular style, Doxygen can leverage that to enhance the documentation. The below is a selection of the most …

Creating attributes on classes

Programmatic addition of methods.

I've gotten used to manipulating members of Python objects with getattr, setattr etc. But a recent similar problem had me stumped. I had a class that I wanted to create a large number of similar behaving properties on (they would all manipulate an internal dictionary):

>>> f = Foo()
>>> f.contributor = 'xyz' …
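One way to get this behaviour is to build each property with a small factory function and attach it to the class with setattr. This is a sketch under assumptions, not necessarily the solution the post reaches: the internal dictionary name _data and the field names other than contributor are invented:

```python
class Foo:
    def __init__(self):
        self._data = {}  # hypothetical name for the internal dictionary


def _dict_property(key):
    """Build a property that reads and writes self._data[key]."""
    def getter(self):
        return self._data.get(key)

    def setter(self, value):
        self._data[key] = value

    return property(getter, setter)


# Attach the properties to the class itself, not to an instance.
for field in ("contributor", "title", "publisher"):
    setattr(Foo, field, _dict_property(field))

f = Foo()
f.contributor = "xyz"
print(f.contributor, f._data)
```

The factory function matters: defining the getter and setter directly inside the loop would make every property share the last value of the loop variable.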

How to get a multiply-defined symbol

"symbol X is multiply-defined"

There are many reasons why the "multiply-defined" error can occur at link-time, most of them annoying but mundane: failure to include compilation guards on header files, the compiler finding more than one version of a file (perhaps a backup?) on the file-search path, lack of the extern qualifier and so …

Looser throw specifier error in C++

Looser? Virtual? What?

looser throw specifier for 'virtual Error::~Error()'
... overriding 'virtual std::exception::~exception() throw ()'

While compiling some code with gcc (that had previously compiled and run without complaint under CodeWarrior), the following fatal error was reported:

Error.h:83: looser throw specifier for `virtual Error::~Error()'
/usr/include/gcc/darwin/3 …

Diagnostic printf

A better basic debugging tool.

The crude "let's drop a print statement in" approach keeps being useful even with advanced dynamic languages. Here's a (slightly) improved version for Ruby that pretty prints any number of passed objects and (importantly) where the print call was made from:

def dbg (*args)
   print("dbg: #{caller()[0]}: ")
   args.each …


A new, and poorly explained, feature of Python classes.

Around version 2.2, Python rejigged its classes with some useful extensions. Unfortunately these enhancements have been explained so poorly that they appear in little published code.

One such enhancement is __slots__. An attribute of this name in a class restricts what attributes can be created in objects of that …
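A minimal illustration of that restriction, written for Python 3; the class Point and its attribute names are made up for the example:

```python
class Point:
    __slots__ = ("x", "y")  # only these attribute names may be set on instances

    def __init__(self, x, y):
        self.x = x
        self.y = y


p = Point(1, 2)
p.x = 10  # allowed: x is listed in __slots__

try:
    p.z = 3  # not listed, so this is rejected
except AttributeError as err:
    print("rejected:", err)
```

Instances of such a class also have no per-object __dict__, which is where the memory saving usually associated with __slots__ comes from.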