Only the conserved modules are suited for the construction of a phylogenetic tree whose branch lengths reflect the evolutionary distances of the proteins. For proteins with two such modules, the better conserved copy (aminoproximal module of the Secretion/Neurotransmission family, carboxyproximal module of the Peroxisomal family) was used. In the case of the CDC48p family, which is characterized by two well conserved modules, the aminoproximal copy was used. The sequences (in NBRF/PIR format) were aligned for optimal similarity, and the tree was calculated with Des Higgins' ClustalV (using a version which can cope with up to 100 sequences). The tree data were converted to the New Hampshire (nested parentheses) format with ClustToTree, and the tree was drawn with TreeDrawDeck from Don Gilbert. The new ClustToTree.complete allows conversion and drawing within one program (you still need ClustalV for alignment and tree calculation). All calculations were done on an Apple Macintosh.
The module sequences (except those submitted as confidential, see my request for sequence submission), some additional modules not used for alignment, and modules of some further relatives of the superfamily are available as "Text only" and as an MS Word 5 file. The formatted version allows an easy overview in Word's outliner mode and provides a fast method to overcome Clustal's limit on the length of sequence names (<11 characters).
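For readers without Word, the name-length limit can also be handled in a small script. The following is only a sketch and not the method used for this page: it assumes that labels of at most 10 characters without whitespace are acceptable to Clustal, and the function name `make_short_names` and the example sequence names are hypothetical. It truncates each full name, resolves collisions with a numeric suffix, and keeps a table for restoring the full names later.

```python
def make_short_names(full_names, max_len=10):
    """Return a dict mapping a unique short label to each full name.

    Assumption: labels of at most max_len characters without
    whitespace are acceptable to the alignment program.
    """
    short_to_full = {}
    for full in full_names:
        # Drop whitespace, then truncate to the allowed length.
        base = "".join(full.split())[:max_len]
        candidate = base
        counter = 1
        # On a collision, overwrite the tail with a running number.
        while candidate in short_to_full:
            suffix = str(counter)
            candidate = base[: max_len - len(suffix)] + suffix
            counter += 1
        short_to_full[candidate] = full
    return short_to_full


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    names = ["CDC48p_Saccharomyces", "CDC48p_Saccharomyces_2", "SEC18"]
    for short, full in make_short_names(names).items():
        print(short, "<-", full)
```

The returned table plays the same role as the Table of Contents from Word: it records which full name belongs to which shortened label.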
For tree construction, save a copy as "Text only" and run an alignment and a tree calculation in ClustalV with the default settings. Have Word produce a Table of Contents from the formatted version of the sequence file, using the Outline option. Convert the tree data with the Hypercard program ClustToTree. After the first step of the conversion, paste the Table of Contents into the text field that shows the extracted names of the sequences. Copy the resulting tree data, switch to TreeDrawDeck, choose a tree type and paste the data into the text field (in ClustToTree.complete, these steps are replaced by one mouseclick)*.
*With now far more than 100 known sequences, the AAA family has outgrown TreeDrawDeck. I now save the tree data as a text-only file (option in ClustToTree) and use TreeView (available for Mac and Windows) for tree drawing.
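The name substitution described above (replacing the shortened sequence names in the tree data with the full names) can be sketched for a tree that is already in New Hampshire format. This is an illustration under assumptions, not a description of what ClustToTree does internally: the function `restore_names`, the example tree, and the example names are hypothetical, and the input labels are assumed to be unquoted and free of whitespace.

```python
import re


def restore_names(newick, short_to_full):
    """Replace short leaf labels in a New Hampshire string by full names.

    Assumption: input labels are unquoted and contain no whitespace,
    so all whitespace (e.g. line breaks) can be removed first.
    """
    compact = "".join(newick.split())

    def swap(match):
        label = match.group(1)
        full = short_to_full.get(label, label)
        # Quote names containing characters reserved in the format.
        if re.search(r"[\s():,;\[\]']", full):
            full = "'" + full.replace("'", "''") + "'"
        return full

    # Leaf labels directly follow "(" or ","; branch lengths follow ":"
    # and are therefore left untouched.
    return re.sub(r"(?<=[(,])([^():,;]+)", swap, compact)


if __name__ == "__main__":
    # Hypothetical tree and names, for illustration only.
    tree = "((SEC18:0.1,\nCDC48p_Sac:0.2):0.05,PAS1:0.3);"
    table = {"CDC48p_Sac": "CDC48p S. cerevisiae"}
    print(restore_names(tree, table))
```

Labels missing from the table are kept as they are, so a partial table does not damage the tree.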
If you just want to check whether your sequence indeed belongs to the AAA family, and to which subfamily, align your sequence to a reduced set of sequences (available as text-only and MS Word file), which gives a faster alignment and a less cluttered tree. The set contains an AAA box of all AAA sequences from S. cerevisiae and from Methanococcus jannaschii, plus the odd A2126A from Mycobacterium leprae (representative of the ARC family), PHBI027 from Pyrococcus horikoshii (an archaebacterial member of the meiosis family), and K04G2.3 from C. elegans and MTCY22G10.32c from Mycobacterium tuberculosis, both representing a subfamily of the "Homotypic Fusion" family. The set represents all known families and subfamilies, plus some sequences I consider distant relatives of the family, although others may think differently: AFG1, BCS1, YB36, and PIM1 from S. cerevisiae, and "S8" of M. jannaschii.
A full set of the complete AAA sequences (except those submitted as confidential, see my request for sequence submission) and distant relatives in the same format is available both as an MS Word 5 file and in "Text only" format.
The AAA-Tree is available in an object-oriented (PICT) format to allow for high-quality printouts and easy modification (colors, names of genes, additions...). Use it freely, but please mention who constructed it.
Last edited: August 12, 1999 by KaiFr