Manual
TreeMaker
Interactive construction of taxonomies and species richness data
| Author: | Paul-Michael Agapow |
|---|---|
| Contact: | treemaker@agapow.net |
| Date: | 2008/8/4 |
| Web site: | http://www.agapow.net/software/treemaker |
Introduction
Biodiversity assessment demands objective measures, because ultimately conservation is an issue of economics: prioritizing the use of limited resources for preserving taxa. The most general framework for such metrics is one that assesses evolutionary distinctiveness, judged by how much of a phylogeny is conserved. However, the applicability of such metrics is limited by the still small proportion of taxa that have been reliably placed in a phylogeny. Given that this is unlikely to be corrected soon, alternatives are needed. Taxonomy can be used as a reasonable surrogate for phylogeny. By combining this with searches for combinations of local sites containing maximal diversity, the efficacy of any conservation scheme can be determined from a taxonomy of the organisms involved and the abundance data at potential preservation sites.
To this end, TreeMaker is software that allows the interactive building and editing of a taxonomy and its conversion into a phylogeny for the above calculations. It also allows the editing of site abundance and species richness data. This data may be imported from and exported to a variety of formats for interoperability with other programs. While it is mainly intended for use in conservation and biodiversity, it can be used as a simple tool for building phylogenies.
Technical description
TreeMaker can be downloaded from http://www.agapow.net/software/treemaker. Several associated programs (like MeSA and Conserve) can be found on the same site at http://www.agapow.net/software/. TreeMaker is available as a standalone program for MacOS (as a Universal Binary), Windows and Linux. Across platforms, the differences are only cosmetic, not functional. Similarly, the data files TreeMaker produces may be used across platforms. There are no special memory or library requirements.
The TreeMaker distribution includes:
The TreeMaker application
A set of example files including:
- example.tree, a data file (in raw tree format)
- example.trmk, a data file (in TreeMaker format)
treemaker_manual, this manual
TreeMaker may be installed by simply copying it to an appropriate place on a local hard disk. To use the online help from within TreeMaker, the HTML manual file must be in the same directory as the application.
Typical use
To illustrate the use of TreeMaker, we'll follow the construction of a small taxonomy along with some abundance data. Minor details may differ depending on the version of TreeMaker used. First, we create a new TreeMaker document using New on the File menu. This presents a dialog that allows us to specify the initial number of taxonomic levels:

A new document is created with a taxonomy containing a single node, the root. We can now extend the taxonomy by selecting the root node and choosing New Daughter from the Tree-building menu:

We continue this for some time, adding nodes to the right places on the tree. Note how the list of the terminal taxa updates in the right-hand abundance pane as the taxonomy changes.
Of course, all the nodes have the default and cryptic names (indicated by being grayed out). We need to rename them to something meaningful. Select a node and choose Rename Node from the Tree-building menu:

Continuing on, we complete our taxonomy:

Now we want to add abundance data: how many times particular species of ants have been seen at particular sites. So first we add a site:

After adding another, we can directly edit the abundance data for each terminal taxon:

Now we can save the document for later use. Also we can export the data to another format for use in another program, using the Export option on the File menu. Parameters for how the data is exported can be found in the Settings option on the same menu:

The document

A TreeMaker document presents its data in two panes. On the left is the tree hierarchy. This presents a taxonomy in a semi-columnar format. Each column represents a taxonomic level, e.g. family, genus, species, and can be named as users desire. The number of levels can be defined at document creation or by later adding or deleting levels from the Tree-building menu. Note that level names are mainly cosmetic, and serve to help in laying out a taxonomy.
Below the level names is the taxonomy laid out as a staggered tree. Nodes in the same column are considered to be at the same taxonomic level. Any child (immediately descendant) nodes will be in the
On the right is the site data. This allows the association of terminal taxa in the taxonomy with abundance (or incidence) data at a series of discrete sites. Sites may be added to or deleted from data sets. If the terminal taxa of the taxonomy are changed (i.e. a tip is deleted or gains a daughter), the rows of site data are updated automatically. Sites may be selected for menu operations by selecting the column header. Individual site data can be edited directly by clicking on them.
The menus
File
- New
- Create a new TreeMaker document. The user will first be asked for the number of initial taxonomic levels and given a chance to name them.
- Open
- Open a previously saved TreeMaker document.
- Save
- Save the current data as a TreeMaker document. If you wish to save in other formats (e.g. for using with other programs), use the Export To option.
- Save As
- Save the current data as a TreeMaker document with a new name.
- Export To
- Save the data in a foreign format suitable for use with other programs.
- Settings
- This produces a dialog that allows the setting of various options controlling the presentation and export of data.
Edit
This presents the usual options for editing the text fields and boxes that are present in TreeMaker documents and dialog boxes.
Tree-building
- Rename Node
- Change the name of the selected node. Note that all nodes, not just terminal ones, may be named. If nodes are not given a specific name, a default is supplied but displayed in grayed out text.
- Rename Level
- Change the name of the level. Again, if a level name is not given, a default will be generated and displayed in light grey.
- Add Level
- Add another level to the taxonomy.
- Remove Level
- Delete the selected level from the taxonomy. For safety, a level cannot be removed unless it is empty. That is, level removal will not happen until the level contains no nodes.
- Shift Left / Delete
- Delete the selected node. If it has any children, make those children of their grandparent node. In effect this shifts a subtree to the left.
- Shift Right
- Insert a new node above the selected node, thus shifting a subtree to the right.
- New Daughter
- Add a child node to the currently selected node. Note that this operation is not available for nodes in the terminal level of the taxonomy, a safety measure to stop the accidental addition of levels.
- Flatten to Star Phylogeny
- Transform the tree so that all terminal taxa are immediate children of the root node.
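The effect of Shift Left / Delete and Flatten to Star Phylogeny on a tree can be sketched in a few lines of Python. This is only an illustration of the operations as described above, not TreeMaker's own code: the child-to-parent dictionary, the function names and the node names are all assumptions made for the example.

```python
# A tree held as a child -> parent mapping; the root has no entry.
# All node names here are hypothetical.
PARENTS = {
    "X": "root", "one": "X", "A": "one",
    "Y": "root", "two": "Y", "B": "two",
    "three": "Y", "C": "three", "D": "three",
}

def delete_node(parents, node):
    """Remove `node`, handing its children to its own parent.

    The root has no entry in `parents`, so asking to delete it
    raises KeyError in this sketch.
    """
    grandparent = parents[node]
    result = {}
    for child, parent in parents.items():
        if child == node:
            continue  # drop the deleted node itself
        # Children of the deleted node move up one step.
        result[child] = grandparent if parent == node else parent
    return result

def flatten_to_star(parents, root):
    """Keep only the tips, each attached directly to the root."""
    internal = set(parents.values())
    return {node: root for node in parents if node not in internal}

print(delete_node(PARENTS, "three")["C"])  # Y
print(flatten_to_star(PARENTS, "root"))
# {'A': 'root', 'B': 'root', 'C': 'root', 'D': 'root'}
```

Both functions return a new mapping rather than changing the original, which makes it easy to compare the tree before and after an operation.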
Abundance
- Rename Site
- If an abundance figure is selected for editing, change the name of the selected site.
- Add Site
- Add a new column to the site data pane, representing a new discrete site.
- Delete Site
- Delete the selected site and any associated abundance data.
Tree
This is a context-dependent menu to ease navigation around large trees. If a node is selected in the tree pane, then this menu changes its name to that of the node and gives a number of options relating to operations upon this node. These are New Daughter (which works like the New Daughter option on the Tree-building menu above) and Rename (which works like Rename Node). Attached below these are a series of cascading menus for the subtree headed by the selected node. This allows the same two operations on any of the descendant nodes.
Other menu choices
The exact position and style of these may differ based on platform and version.
- About TreeMaker
- Shows an information box with credits and the application version number.
- TreeMaker Help
- Opens the local copy of the help file (essentially this document) in a web browser. Note that for this to work, the help file must be in the same directory as the application.
- Go To Website
- Open a web browser pointing at the TreeMaker home page.
Settings & export
This leads to a dialog for several options that control how data is presented and exported, in particular how branch lengths are treated within the exported phylogenies. Note that these work on a per-document basis - changes in the settings for one document do not affect those in another. In combination, these options can create confusion, so a simple example taxonomy will be used to illustrate how trees are produced:

If translated directly into a phylogeny, in Newick format this tree would normally be represented as:
(((A)), ((B), (C, D)))
or, with the intermediate taxonomic node indicated:

The difficulty arises in the case of the tips A & B. These "singleton" nodes are the only child of their parent node. In theory, the Newick format permits such nodes. In practice, many programs do not, and expect any parent node to be at least bifurcating. This is prima facie reasonable: internal nodes in molecular phylogenies or cladograms are inferred from the presence of at least two child nodes. However, there are cases where such solitary nodes can arise. Our present case, where taxonomies are literally translated into phylogenies, is one: some families may contain only one genus, some genera only one species. Another case is where phyletic transformation has led to one species giving rise to a distinct and different one. Finally, extinction may cull the children of a speciation event so that a parent species gives rise to a single child species.
The obvious way to handle such singletons is to collapse them up into their parents until a valid (and at least bifurcating) tree is formed. TreeMaker provides a number of options for this.
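As a rough illustration of the direct translation and of what counts as a singleton, the example taxonomy can be written as nested (name, children) pairs in Python. This is a sketch under assumptions, not TreeMaker's implementation: the names "root", "X" and "Y" for the unnamed upper nodes are hypothetical, and the function names are invented for the example.

```python
# Each node is (name, [children]); tips have an empty child list.
# "root", "X" and "Y" are hypothetical names for the upper nodes.
TAXONOMY = ("root", [
    ("X", [("one", [("A", [])])]),
    ("Y", [("two", [("B", [])]),
           ("three", [("C", []), ("D", [])])]),
])

def to_newick(node):
    """Write the tree in Newick form, labelling only the tips."""
    name, children = node
    if not children:
        return name
    return "(" + ", ".join(to_newick(child) for child in children) + ")"

def singletons(node):
    """List the names of nodes that are the only child of their parent."""
    _, children = node
    found = [children[0][0]] if len(children) == 1 else []
    for child in children:
        found.extend(singletons(child))
    return found

print(to_newick(TAXONOMY))   # (((A)), ((B), (C, D)))
print(singletons(TAXONOMY))  # ['one', 'A', 'B']
```

Note that the internal node "one" is itself a singleton, which is why A ends up wrapped in two pairs of parentheses.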
Taxa names as binomials
If this option is checked, in the respective case (display or export), the names of terminal taxa are given as x y, where y is the node name and x is the name of their parent node, i.e. the penultimate taxonomic group. For example:
((('one A')), (('two B'), ('three C', 'three D')))
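The naming rule can be sketched in Python as follows. This is an illustrative guess at the behaviour described, with a hypothetical function name and hypothetical names ("root", "X", "Y") for the upper nodes; quoting the labels simply follows the example output above.

```python
# Each node is (name, [children]); "root", "X" and "Y" are hypothetical.
TAXONOMY = ("root", [
    ("X", [("one", [("A", [])])]),
    ("Y", [("two", [("B", [])]),
           ("three", [("C", []), ("D", [])])]),
])

def binomial_newick(node, parent_name=None):
    """Write Newick with each tip labelled 'parent-name tip-name'."""
    name, children = node
    if not children:
        # A lone root has no parent, so it keeps its plain name.
        return name if parent_name is None else f"'{parent_name} {name}'"
    parts = (binomial_newick(child, name) for child in children)
    return "(" + ", ".join(parts) + ")"

print(binomial_newick(TAXONOMY))
# ((('one A')), (('two B'), ('three C', 'three D')))
```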
Collapse singletons
If any tree nodes are singletons (as above) collapse them up into their parents in the exported tree. For example:
(A, (B, (C, D)))
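One way to picture the collapse is as a recursive pass that replaces any node with exactly one child by that child. The Python below is a minimal sketch of that idea, assuming the surviving node keeps the child's name (as the tips A and B do in the example); the function names and the upper node names are hypothetical.

```python
# Each node is (name, [children]); "root", "X" and "Y" are hypothetical.
TAXONOMY = ("root", [
    ("X", [("one", [("A", [])])]),
    ("Y", [("two", [("B", [])]),
           ("three", [("C", []), ("D", [])])]),
])

def collapse(node):
    """Return a copy of the tree with every singleton chain removed."""
    name, children = node
    children = [collapse(child) for child in children]
    if len(children) == 1:
        return children[0]  # the only child takes this node's place
    return (name, children)

def to_newick(node):
    """Write the tree in Newick form, labelling only the tips."""
    name, children = node
    if not children:
        return name
    return "(" + ", ".join(to_newick(child) for child in children) + ")"

print(to_newick(collapse(TAXONOMY)))  # (A, (B, (C, D)))
```

Because children are collapsed before their parent is examined, a chain of any length (such as X, one, A) reduces to its tip in a single pass.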
Branchlengths: None
No distances are written to the output tree. For example:
(((A)), ((B), (C, D)))
Branchlengths: As is
In the exported trees, use any branchlengths that were already in the tree when it was imported. Otherwise, set any branchlength to 1.0. For example, if the tree was imported from a file that marked C and D as joining their parents with branches of 0.5 in length:
(((A:1.0):1.0):1.0, ((B:1.0):1.0, (C:0.5, D:0.5):1.0):1.0)
Branchlengths: All inter-level distances equal
Every branch in the tree is set to the value given in the "Distance" field. So, if "Distance" is set to 0.7:
(((A:0.7):0.7):0.7, ((B:0.7):0.7, (C:0.7, D:0.7):0.7):0.7)
If the "collapse singletons" option is set, the result will be:
(A:0.7, (B:0.7, (C:0.7, D:0.7):0.7):0.7)
This has no effect if "None" or "As is" is used.
Distance is total
If set, "Distance" is interpreted as the maximum distance (root to tip) in the tree, and each branch length is set to this value divided by the number of branches on the longest root-to-tip path. So if "Distance" is set to 2.0:
(((A:0.67):0.67):0.67, ((B:0.67):0.67, (C:0.67, D:0.67):0.67):0.67)
That is, from the root to the furthest tips (all of them) is 2.0. If the "collapse singletons" option is set, the result will be:
(A:0.67, (B:0.67, (C:0.67, D:0.67):0.67):0.67)
That is, from the root to the furthest tips (C & D) is 2.0. Obviously this only applies with "All inter-level distances equal".
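The arithmetic behind this option can be checked with a short Python sketch: find the number of branches on the longest root-to-tip path, divide the total distance by it, and write that length on every branch. The function names and upper node names are assumptions for illustration, and rounding to two decimal places is chosen only to match the examples above.

```python
# Each node is (name, [children]); "root", "X" and "Y" are hypothetical.
TAXONOMY = ("root", [
    ("X", [("one", [("A", [])])]),
    ("Y", [("two", [("B", [])]),
           ("three", [("C", []), ("D", [])])]),
])

def max_depth(node):
    """Count the branches on the longest path from this node to a tip."""
    _, children = node
    if not children:
        return 0
    return 1 + max(max_depth(child) for child in children)

def to_newick(node, length, is_root=True):
    """Write Newick with the same length on every branch."""
    name, children = node
    if children:
        parts = (to_newick(child, length, False) for child in children)
        label = "(" + ", ".join(parts) + ")"
    else:
        label = name
    # The root has no branch above it, so it carries no length.
    return label if is_root else f"{label}:{length:.2f}"

total = 2.0
branch = total / max_depth(TAXONOMY)  # 2.0 / 3
print(to_newick(TAXONOMY, branch))
# (((A:0.67):0.67):0.67, ((B:0.67):0.67, (C:0.67, D:0.67):0.67):0.67)
```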
Singletons accumulate distances
If singletons are collapsed, they acquire the sum of the lengths of the branches that have been collapsed. That is, if M subtends N subtends O subtends P, with branches of 1.0 each, it collapses to M-P with a branch length of 3.0. This applies to "As is" and "All inter-level distances equal", and only when singletons are collapsed. So, if "Distance" is set to 0.7 and "Collapse singletons" is on, we get:
(A:2.1, (B:1.4, (C:0.7, D:0.7):0.7):0.7)
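The accumulation can be sketched in Python by carrying the length of the branch above each node down through any singleton chain. This is an illustration of the rule as described, assuming a uniform inter-level distance; the function names and upper node names are hypothetical.

```python
# Each node is (name, [children]); "root", "X" and "Y" are hypothetical.
TAXONOMY = ("root", [
    ("X", [("one", [("A", [])])]),
    ("Y", [("two", [("B", [])]),
           ("three", [("C", []), ("D", [])])]),
])

def collapse(node, step, above=0.0):
    """Collapse singletons, summing the branches that are merged.

    `step` is the uniform inter-level distance and `above` is the
    length accumulated so far on the branch leading to this node.
    Returns nodes as (name, branch_length, [children]).
    """
    name, children = node
    if len(children) == 1:
        # Merge into the only child and add one more step to its branch.
        return collapse(children[0], step, above + step)
    return (name, above, [collapse(child, step, step) for child in children])

def to_newick(node, is_root=True):
    """Write Newick using the length stored on each node."""
    name, length, children = node
    if children:
        label = "(" + ", ".join(to_newick(c, False) for c in children) + ")"
    else:
        label = name
    return label if is_root else f"{label}:{length:g}"

print(to_newick(collapse(TAXONOMY, 0.7)))
# (A:2.1, (B:1.4, (C:0.7, D:0.7):0.7):0.7)
```

Here A absorbs three branches (3 x 0.7) and B two (2 x 0.7), while C and D are untouched because their parent has two children.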
Data format
TreeMaker uses a simple plain-text format based on a subset of the YAML specification [3]. This is done so that if necessary users can hack at the data files, converting datasets to and from TreeMaker format where other methods fall short. The format is briefly described below, but more can be learnt by studying TreeMaker output and YAML documentation. Minor details may differ depending on the version of TreeMaker used.
TreeMaker documents begin with #TREEMAKER on the first line by itself. Following this is an optional comment section that is ignored until the formal start of the YAML document. This is indicated by a three-hyphen marker ---, again on a line by itself. The YAML elements that follow are line-based, with indentation used to indicate nesting of sections and ownership. The main types of document element are key:value pairs, lists, inline lists and inline key:value lists:
- key:value pair
- The key is an identifier associated with some following data, which can be a simple value, a list or entire indented section. The key name is followed by a colon :.
- list
- A list is a series of items, all indented to the same depth and prefixed by a hyphen -.
- inline list
- A list placed on a single line, with members separated by commas and flanked by square brackets.
- inline key:value list
- A series of key:value pairs, separated by commas and flanked by braces, e.g. { a: 1, b: 2, c: 3 }.
The TreeMaker data is therefore a series of sections given as key:value pairs. These sections are "version", "levels", "sites", "tree", "abundances" and "settings".
The version section indicates the format version used by the file and should not be changed.
The levels section gives an inline list of the level names. If a level has not been named, a blank is given.
Similarly, sites is a list of the site names.
The tree section specifies a phylogeny as a list of nodes. Each node is identified with a unique id and given an inline key:value list with information about the node - what its name is and what the id of its parent node is. Note that the root has no entry for a parent.
Abundances is a list of the above tree node ids, each followed by an inline list of the abundances at each site. These abundances are integers.
The settings section provides the values for the document options that control presentation and output. I'd recommend that you don't mess with this. Actually this section can be deleted without harm.
Note that at the moment the order of these sections is immutable, and that the indentation depth is set to 3 spaces. This is not strictly in line with the YAML specification but is done for simplicity. The type of line-breaks (eolns) used in the file is unimportant, although by convention TreeMaker saves using Unix line-endings.
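To make the data model concrete, the Python sketch below holds the tree, sites and abundances sections as plain dictionaries and lists, then rebuilds the tree from the parent links. It does not reproduce the file syntax itself, which is best learnt from TreeMaker's own output; the node ids, names, site names, counts and function names are all hypothetical.

```python
# Hypothetical in-memory form of the "tree", "sites" and "abundances"
# sections. Ids, names and counts are invented for illustration.
NODES = {
    "n1": {"name": "root"},                   # the root has no parent
    "n2": {"name": "genus1", "parent": "n1"},
    "n3": {"name": "sp1", "parent": "n2"},
    "n4": {"name": "sp2", "parent": "n2"},
    "n5": {"name": "genus2", "parent": "n1"},
    "n6": {"name": "sp3", "parent": "n5"},
}
SITES = ["site1", "site2"]
ABUNDANCES = {"n3": [4, 0], "n4": [1, 7], "n6": [0, 2]}

def children_by_parent(nodes):
    """Invert the parent links; return (root_id, {parent_id: [child_ids]})."""
    root, children = None, {}
    for node_id, info in nodes.items():
        if "parent" in info:
            children.setdefault(info["parent"], []).append(node_id)
        else:
            root = node_id
    return root, children

def to_newick(node_id, nodes, children):
    """Write the subtree below node_id in Newick form, naming only tips."""
    kids = children.get(node_id, [])
    if not kids:
        return nodes[node_id]["name"]
    return "(" + ", ".join(to_newick(k, nodes, children) for k in kids) + ")"

def richness(abundances, sites):
    """Count, per site, the terminal taxa with a non-zero abundance."""
    return {site: sum(1 for counts in abundances.values() if counts[i] > 0)
            for i, site in enumerate(sites)}

root, children = children_by_parent(NODES)
print(to_newick(root, NODES, children))  # ((sp1, sp2), (sp3))
print(richness(ABUNDANCES, SITES))       # {'site1': 2, 'site2': 2}
```

Storing only the parent id on each node, as the tree section does, keeps the file flat and easy to edit by hand; the child lists can always be rebuilt when needed.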
Q & A
- Can subtrees be cut from one part of the taxonomy and pasted onto another?
- No.
- Can TreeMaker version x open files created by TreeMaker version y?
- Generally any version of TreeMaker can open a file created by any previous version. However, there is no guarantee that it will be able to open a file created by any future version. This is because the capabilities of TreeMaker have expanded through time and the data file format has changed to accommodate these.
- Is there an Undo function to reverse mistakes?
- No.
- Can there be?
- No.
- Are there any limits on the types of trees produced?
- TreeMaker has been used (and abused) to make trees with hundreds of nodes. However, attention may have to be paid to how data is exported, as some external programs only read a subset of legal tree formats. For example, some programs may expect only strictly bifurcating trees (e.g. ((A,B),(C,D))) while others do not tolerate singleton nodes (where a node has only a single descendant, e.g. ((A))). (See the section on settings and export above.) Others will have restrictions about how nodes are named.
Credits
If you use TreeMaker in any research resulting in a publication, please cite [1]. TreeMaker was written in RealBasic [4] and developed under MacOSX. It was produced with the help of Ross Crozier and Lisa Dunnett of James Cook University, Australia. Lisa has also produced a program called TreeMaker with roughly equivalent functionality. It can be found at http://homes.jcu.edu.au/~jc125033/Treemaker.htm.
References
| [1] | Ross H Crozier, Lisa J Dunnett, Paul-Michael Agapow (2005). Phylogenetic biodiversity assessment based on systematic nomenclature. Evolutionary Bioinformatics Online 1:11-36. |
| [2] | MeSA - Macroevolutionary Simulation & Analysis. http://www.agapow.net/software/mesa |
| [3] | YAML - Yet Another Markup Language. http://www.yaml.org/ |
| [4] | RealBasic. http://www.realsoftware.com/ |
| [5] | Conserve. http://www.agapow.net/software/conserve |

