ClustToTree

If you prefer correct German to broken English: there is a German version of the SeqSimPresenter information.
The HyperCard program "ClustalV" aligns sequences (nucleic acids or proteins) for maximal similarity and can calculate phylogenetic trees from the alignments. Unfortunately, the tree data (.nj files) are in a little-used format. The HyperCard stack "ClustToTree" converts these files to the "New Hampshire Standard" format understood by several tree-drawing programs. It also extracts the sequence names (which are missing from the .nj files) from the corresponding .aln alignment files and offers the option to edit the names, which ClustalV truncates after 10 characters. ClustToTree does not draw the tree; for that I recommend Don Gilbert's HyperCard program "TreeDrawDeck". In ClustToTree.complete, I have merged the tree-drawing module from "TreeDrawDeck" with ClustToTree.
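For readers who have not seen a New Hampshire tree description, the following minimal Python sketch shows what such a string looks like. It is only an illustration and not the code used by ClustToTree: the nested-tuple tree layout, the function name to_new_hampshire, and the sequence names and branch lengths are all made up for this example.

```python
def to_new_hampshire(node):
    """Serialize one node as New Hampshire text (without the final ';').

    A node is a hypothetical (name, branch_length, children) tuple;
    this layout is an assumption made for the illustration.
    """
    name, length, children = node
    text = ""
    if children:
        # Children are listed in parentheses, separated by commas.
        text = "(" + ",".join(to_new_hampshire(c) for c in children) + ")"
    text += name
    if length is not None:
        # A colon introduces the branch length.
        text += ":" + format(length, "g")
    return text


# Made-up example tree with four sequences.
tree = ("", None, [
    ("Human", 0.1, []),
    ("Mouse", 0.2, []),
    ("", 0.05, [("Yeast", 0.3, []), ("E_coli", 0.4, [])]),
])

newick = to_new_hampshire(tree) + ";"  # a semicolon ends the description
print(newick)  # (Human:0.1,Mouse:0.2,(Yeast:0.3,E_coli:0.4):0.05);
```

The parentheses, commas, colons, and the semicolon carry the tree structure, which is why they cannot appear inside a sequence name.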

Hints for use

Produce an alignment in ClustalV using the default format (which yields a .aln file), then continue with the tree calculation (.nj file). ClustToTree requires both files to be in the same folder, which they are unless they have been moved. Click the "ClustToTree" button in ClustToTree, then specify the .aln file. The resulting list of sequence names can be altered almost ad libitum. Only comma, semicolon, colon, parentheses, square brackets, and space are not allowed in the names, because these characters structure the tree in the New Hampshire Standard format; use an underscore instead of a space, and it will be converted to a space. A name should also not exceed 29 characters if "TreeDrawDeck" or "ClustToTree.complete" will be used for drawing the tree, because these programs ignore additional characters. Clicking "Continue" produces the New Hampshire tree description and displays it in a text field. The description must be selected and copied manually. After switching to "TreeDrawDeck", choose the tree type and paste the description into the text field. "Draw Tree" displays the tree picture, which can be saved as a PICT file for further editing (in Canvas, MacDraw...).
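The naming rules above can be checked before pasting a list into ClustToTree. The following Python sketch is one possible way to do that; it is not part of ClustToTree, and the names check_name and display_name are hypothetical helpers invented for this example.

```python
# Characters that structure a New Hampshire tree, plus the space.
FORBIDDEN = set(",;:()[] ")
# TreeDrawDeck and ClustToTree.complete ignore characters beyond this length.
MAX_LENGTH = 29


def check_name(name):
    """Return a list of problems found in one sequence name (empty if fine)."""
    problems = []
    bad = sorted(set(name) & FORBIDDEN)
    if bad:
        problems.append("forbidden characters: " + " ".join(repr(c) for c in bad))
    if len(name) > MAX_LENGTH:
        problems.append(f"{len(name)} characters, more than {MAX_LENGTH}")
    return problems


def display_name(name):
    """Show how a name reads once underscores are converted to spaces."""
    return name.replace("_", " ")


# Made-up example names.
for candidate in ["Homo_sapiens_ADH1", "bad name;", "x" * 30]:
    print(candidate, "->", check_name(candidate) or "ok")
```

A name that passes the check can be typed with underscores, which appear as spaces in the drawn tree.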

Kai's Personal Tip

Editing the names can be time-consuming and boring, especially if trees of the same protein family are drawn repeatedly (e.g. to add newly described sequences). I collect the sequences of "my" protein superfamily in a regularly updated Word file in the "Pearson (Fasta)" format (sequence name with ">" in front, sequence starting on the next line). The names have the paragraph format "Header 2". This allows a fast overview and reordering of the sequences in the outline view after hiding the "body text" (= the sequences themselves, which are formatted as "Normal"). The command "Table of contents" then produces a list of sequence names; only the leading ">"s have to be erased (search and replace helps for long lists). The list can be copied (don't include the section break at the end) and pasted into the name list of ClustToTree. So name the sequences in the file as you want them labeled in the tree drawing (but don't use ",;:()[]" or space, and use no more than 29 characters). The table of contents/name list can be produced on the fly and should not be saved with the sequence file. To calculate the tree data, save the file as text only (don't overwrite the original); the best location is the folder containing ClustalV. If the formatting sounds too complicated, or if your word processor does not use paragraph formats, save the sequence names as a separate list and update it as necessary. But caution: this list must be kept in the same order as the file of sequences, otherwise the branches of the tree get the wrong labels.
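If the sequences are already saved as text only, the name list can also be pulled out with a few lines of Python instead of the table-of-contents trick. This is only a sketch under the assumption that every name line starts with ">"; the function names fasta_names and first_mismatch and the sample sequences are invented for the example.

```python
def fasta_names(lines):
    """Return the sequence names (text after '>') in file order."""
    return [line[1:].strip() for line in lines if line.startswith(">")]


def first_mismatch(names, labels):
    """Return the first index where the two lists differ, or None if identical."""
    for i, (a, b) in enumerate(zip(names, labels)):
        if a != b:
            return i
    if len(names) != len(labels):
        return min(len(names), len(labels))
    return None


# Made-up placeholder sequences in Pearson (Fasta) format.
sample = """>ADH1_human
MSTAGKVIKC
>ADH1_mouse
MSTAGKVIKC
>ADH_yeast
MSIPETQKGV
"""

names = fasta_names(sample.splitlines())
print("\n".join(names))  # one name per line, ready to paste into the name list
```

If a separate name list is maintained by hand, first_mismatch(names, labels) gives the first position where it has drifted out of order with the sequence file, which is the situation that would put wrong labels on the branches.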

OK, I'll take it!



Last edited: January 23, 1996 by KaiFr