Parsing Dendroscope nodes

For when you have to do lots to a big tree.

Previously, I showed how Dendroscope files can be easily manipulated with brute-force regex, so you can write scripts to color a mass of nodes, rather than having to format them one by one in the GUI. However, more complex manipulations require more powerful approaches. Here is a more systematic way of parsing - and manipulating - Dendroscope nodes.

Each node occupies one line and looks like a named set of key-value pairs:

137: x=-6.9873 y=3.4021 lc=0 0 0 lb='A_ENG_303_2009'

There are at least a dozen possible keys, all cryptically named. You can work out what they do by editing a tree in the GUI and seeing what changes in the file. The good news is that the keys can appear in any order, and are consistently separated by spaces. The bad news is that values can be integers, floats, triplets of integers (for colors) or quoted strings. So with a little regex magic, we can make a class to parse these strings and stuff them in a hash-like structure:

# Store node information in a handy format
class DendroNodeInfo < Hash
        # Parses a line like "x=-6.9873343E-4 ... lc=0 0 0 lb='A_ENG_303_2009'"
        def self.from_field_str(fld_str)
                info = self.new()
                # a value is a triplet of integers, a quoted string,
                # or any run of non-space characters
                fld_str.scan(/(\w+)=(\d+ \d+ \d+|'[^']*'|\S+)/) { |m|
                        info[m[0]] = m[1]
                }
                return info
        end

        # Returns an appropriately formatted info string
        def to_field_str()
                # order isn't critical but we do this for neatness
                ordered_fields = %w[nh nw fg sh x y lx ly ll la lv lc lb]
                all_fields = (keys - ordered_fields) + ordered_fields
                pairs = []
                all_fields.each { |k|
                        if has_key?(k)
                                pairs << [k, fetch(k)]
                        end
                }
                # map (not each) so that we collect the formatted strings
                pair_strs = pairs.map() { |p|
                        "#{p[0]}=#{p[1]}"
                }
                return pair_strs.join(' ')
        end
end
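
To see what the pattern that from_field_str relies on actually captures, here is a small standalone sketch run against the sample line from earlier. The constant name FIELD_PATTERN is made up for this illustration and is not part of the class.

```ruby
# Illustrative only: FIELD_PATTERN is a name chosen for this sketch.
# A value is tried as a color triplet first, then as a quoted string,
# then as any run of non-space characters (integers and floats).
FIELD_PATTERN = /(\w+)=(\d+ \d+ \d+|'[^']*'|\S+)/

sample = "x=-6.9873 y=3.4021 lc=0 0 0 lb='A_ENG_303_2009'"

# scan returns one [key, value] array per match
pairs = sample.scan(FIELD_PATTERN)

pairs.each { |key, value| puts "#{key} -> #{value}" }
```

The order of the alternatives matters: if the catch-all for non-space characters came first, lc=0 0 0 would be captured as just 0. Note also that quoted values keep their quotes, which is convenient because they can be written straight back out.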

Users should feed the field string to the class method from_field_str:

field_str = "x=-6.9873 y=3.4021 lc=0 0 0 lb='A_ENG_303_2009'"
fields = DendroNodeInfo.from_field_str(field_str)

which can then be manipulated:

fields["lc"] = "255 50 120"

and used to produce a new field string:

print "135: #{fields.to_field_str()}"

Note that the class copes with new or unrecognised fields: these are written out first, followed by the known keys in a fixed order. The order doesn't matter to the parser, it just keeps the output tidy.
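
Putting the pieces together, here is a rough sketch of recoloring a node only if its label matches a pattern. It assumes a node line has the form "id: fields" as in the example above. The NODE_LINE pattern, the recolor_line helper and the compact re-definition of the class are all just for this sketch, and anything that doesn't look like a node line is passed through untouched.

```ruby
# Compact stand-in for the DendroNodeInfo class, repeated here so that
# this sketch runs on its own.
class DendroNodeInfo < Hash
  ORDERED_FIELDS = %w[nh nw fg sh x y lx ly ll la lv lc lb]

  def self.from_field_str(fld_str)
    info = new
    fld_str.scan(/(\w+)=(\d+ \d+ \d+|'[^']*'|\S+)/) { |k, v| info[k] = v }
    info
  end

  def to_field_str
    fields = (keys - ORDERED_FIELDS) + ORDERED_FIELDS
    fields.select { |k| key?(k) }.map { |k| "#{k}=#{self[k]}" }.join(' ')
  end
end

# Assumption: a node line is "<integer id>: <fields>", as in the example.
NODE_LINE = /\A(\d+): (.*)\z/

# Hypothetical helper: recolors a node line if its label matches label_rx,
# and returns any other line unchanged (minus its trailing newline).
def recolor_line(line, label_rx, color)
  line = line.chomp
  m = NODE_LINE.match(line)
  return line unless m
  info = DendroNodeInfo.from_field_str(m[2])
  return line unless info['lb'].to_s =~ label_rx
  info['lc'] = color
  "#{m[1]}: #{info.to_field_str}"
end

puts recolor_line("137: x=-6.9873 y=3.4021 lc=0 0 0 lb='A_ENG_303_2009'",
                  /_2009/, "255 50 120")
```

Run over every line of a file, a helper like this would leave the rest of the file untouched while recoloring the matching nodes. The label is matched with its quotes still attached, so anchor patterns accordingly.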