ClustToTree.complete

The program "ClustalV" aligns sequences (DNA or protein) for maximal similarity and can calculate phylogenetic trees from the alignment. Its tree output (.nj files) is in a rather unusual format. The HyperCard program ClustToTree.complete converts these data to the "New Hampshire Standard" format. It also extracts the sequence names from the alignment files (.aln files) and lets you edit the names, which ClustalV truncates to 10 characters. It then draws rooted or unrooted trees from the data and can save them as PICT files for further modification. For more graphic output options, the converted tree data can also be copied and used in "TreeDrawDeck" by Don Gilbert.
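
For readers unfamiliar with the New Hampshire Standard format: a tree is written as nested parentheses, with the members of each group separated by commas, each name or group optionally followed by a colon and its branch length, and the whole description ending with a semicolon. The Python sketch below is not part of ClustToTree (which is a HyperCard stack) and does not read .nj files; it only illustrates the target format. The Node class and the function names are hypothetical, made up for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical tree node: leaves carry a name, inner nodes carry children."""
    name: str = ""
    length: Optional[float] = None          # branch length leading to this node
    children: List["Node"] = field(default_factory=list)


def _format(node: Node) -> str:
    # Spaces would break the format, so they are written as underscores.
    text = node.name.replace(" ", "_")
    if node.children:
        # A group is its members in parentheses, separated by commas.
        text = "(" + ",".join(_format(c) for c in node.children) + ")" + text
    if node.length is not None:
        text += f":{node.length:g}"         # ":length" follows the name or group
    return text


def to_newick(root: Node) -> str:
    """Serialize a tree as a New Hampshire string terminated by ';'."""
    return _format(root) + ";"


# Unrooted tree, written with a three-way split at the outermost level.
tree = Node(children=[
    Node("Human", 0.1),
    Node("Mouse", 0.2),
    Node(children=[Node("Fruit fly", 0.3), Node("Yeast", 0.4)], length=0.05),
])
print(to_newick(tree))  # (Human:0.1,Mouse:0.2,(Fruit_fly:0.3,Yeast:0.4):0.05);
```

A three-way split at the outermost level is a common way of writing an unrooted tree in this format; a rooted tree usually has just two groups there.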

Hints for use

Produce an alignment in ClustalV using the default output format (a .aln file), then continue with the tree calculation (a .nj file). ClustToTree requires both files to be in the same folder, which they are unless you move them.

Then either click the button "ClustToTree" in ClustToTree.complete and select the .aln file, or switch to an already saved list of sequence names (arrow buttons) and tell the program to use it (button "Use Namelist"); in the latter case the program asks which .nj file to use.

The list of sequence names extracted from the .aln file can be edited almost at will, with two restrictions. First, commas, semicolons, colons, parentheses, square brackets and spaces are not allowed in the names, because these characters structure trees in the New Hampshire Standard format; use an underscore instead of a space (it will be converted to a space). Second, names should have no more than 29 characters if "TreeDrawDeck" or "ClustToTree.complete" will be used to draw the tree, because these programs ignore any additional characters. The name list can be saved for repeated use.

Choose a rooted or unrooted tree; the tree is displayed after a short calculation and can be saved to disk as a PICT file, which can be edited further with Canvas, MacDraw, etc. The New Hampshire tree description can be copied with the button "Copy Tree Data" (don't hide the tree before copying its data!) for use in other tree drawing programs; "TreeDrawDeck" offers several more basic tree types and lots of parameters to play around with.
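
The naming rules above are easy to check before pasting a long list into ClustToTree. A small Python sketch, assuming the rules exactly as stated here; the function names are made up for this example and are not part of ClustToTree:

```python
from typing import List

# Characters that structure a New Hampshire tree, plus the space.
FORBIDDEN = set(",;:()[] ")
# TreeDrawDeck and ClustToTree.complete ignore characters beyond this length.
MAX_LEN = 29


def name_problems(name: str) -> List[str]:
    """Return the reasons why `name` is unsuitable (an empty list if it is fine)."""
    problems = []
    bad = sorted(set(name) & FORBIDDEN)
    if bad:
        problems.append("forbidden characters: " + " ".join(repr(c) for c in bad))
    if len(name) > MAX_LEN:
        problems.append(f"{len(name)} characters, more than {MAX_LEN}")
    return problems


def display_label(name: str) -> str:
    """Underscores stand for spaces and are shown as spaces in the drawing."""
    return name.replace("_", " ")


for name in ["Homo_sapiens_PKC_alpha", "H. sapiens (PKC)"]:
    print(name, name_problems(name) or "ok")
```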

ClustalW notes

The new ClustalW uses an improved pairwise alignment algorithm (which, however, is much slower) and a better algorithm for tree calculation, and its tree output is now in the New Hampshire Standard format (which you can copy and paste directly into "TreeDrawDeck"). ClustToTree is still helpful if you want to modify the sequence names (e.g. restore the full, untruncated names), especially with its option to save and reuse such a name list. To use it, you have to explicitly request output in the old ClustalV tree format before starting the tree calculation (the new default is the New Hampshire Standard format, a .ph file).
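
If all you need is to untruncate the names in a ClustalW .ph file, the leaf labels can also be swapped directly in the New Hampshire text. A Python sketch, assuming you already have a mapping from the truncated names to the full names and that no name contains the characters ",;:()[]" or a space; `relabel` is a hypothetical helper, not a ClustalW or ClustToTree function:

```python
import re
from typing import Dict

# A leaf label follows "(" or "," (possibly after a line break) and runs up to
# the next structural character. Text after ")" is an inner-node label and is
# deliberately left alone.
LEAF = re.compile(r"(?<=[(,])(\s*)([^(),:;\[\]\s]+)")


def relabel(newick: str, names: Dict[str, str]) -> str:
    """Replace leaf labels found in `names`; other labels are kept unchanged."""
    return LEAF.sub(lambda m: m.group(1) + names.get(m.group(2), m.group(2)), newick)


tree = "(\nHs_PKCalp:0.1,\nMm_PKCalp:0.2,\nDm_PKC53E:0.3);"
full = {"Hs_PKCalp": "Homo_sapiens_PKC_alpha", "Mm_PKCalp": "Mus_musculus_PKC_alpha"}
print(relabel(tree, full))
```

Matching by name rather than by position means a label missing from the mapping simply stays as it is instead of shifting all following labels.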

Kai's Personal Tip

Editing the names can be time-consuming and boring, especially if trees of the same protein family are drawn repeatedly (e.g. to add newly described sequences). I keep the sequences of "my" protein superfamily in a regularly updated word-processor file in "Pearson (Fasta)" format (sequence name preceded by ">", sequence starting on the next line). The names have the paragraph format "Heading 2", the sequences themselves the format "Normal". This gives a quick overview and allows the sequences to be reordered in outline view after hiding the body text (i.e. the sequences).

The command "Table of contents" then produces a list of the sequence names; only the leading ">" characters have to be deleted (search and replace helps for long lists). The list can be copied (don't include the section break at the end) and pasted into the name list of ClustToTree. So name the sequences in the file as you want them labeled in the tree drawing, but don't use ",;:()[]" or spaces (use an underscore instead; it will be converted to a space), and keep the names to 29 characters or fewer. The table of contents/name list can be produced on the fly and should not be saved with the sequence file.

To calculate the tree data, save the file as text only (don't overwrite the original); the best location is the folder containing ClustalV. If the formatting sounds too complicated, or if your word processor does not support paragraph formats, keep the sequence names as a separate list and update it as necessary. Caution: this list must be kept in exactly the same order as the file of sequences, otherwise the branches of the tree get the wrong labels.
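
The table-of-contents trick amounts to collecting the header lines of the Fasta file in their original order. The same can be done with a short Python sketch. It also builds a lookup from truncated names to full names, assuming that ClustalV keeps the first 10 characters of each name; both helper names are hypothetical:

```python
from typing import Dict, List


def fasta_names(text: str) -> List[str]:
    """Names of all sequences, in file order, without the leading '>'."""
    return [line[1:].strip() for line in text.splitlines() if line.startswith(">")]


def truncation_map(names: List[str], width: int = 10) -> Dict[str, str]:
    """Map the first `width` characters of each name back to the full name.

    Assumes ClustalV shortens a name by keeping its first `width` characters.
    Two different names that become identical after truncation cannot be told
    apart in the tree, so that case is reported as an error.
    """
    mapping: Dict[str, str] = {}
    for name in names:
        short = name[:width]
        if short in mapping and mapping[short] != name:
            raise ValueError(f"{mapping[short]!r} and {name!r} both become {short!r}")
        mapping[short] = name
    return mapping


fasta = ">Homo_sapiens_PKC_alpha\nACGTACGT\n>Mus_musculus_PKC_alpha\nACGTTCGT\n"
names = fasta_names(fasta)
print("\n".join(names))        # the name list, one name per line, in file order
print(truncation_map(names))
```

Because the list is generated from the sequence file itself, it cannot drift out of order, which is the pitfall described above.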

Gimme dat ding!


Last edited: January 23, 1996 by KaiFr