Links to the phylogenetic trees and associated data I showed at the AAA meetings in San Diego and Kyoto
The phylogenetic tree of the AAA superfamily I presented at the 3rd AAA meeting is available as a GIF file or as the original layered Canvas file. The tree was constructed from almost 200 sequences. Name labels have been replaced by "letter number" codes (A1, A2,...A9, A0, B1...S9) to allow for maximal resolution of the tree (the tree drawing program fits name labels and tree on one page, so the shorter the names, the larger the tree). The tree still looks nicer without the 200 labels, so they can be hidden in the Canvas file (and have been omitted from the GIF file). The sequence names corresponding to the short labels are available as a text file.
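The labelling scheme above can be sketched in a few lines of Python. This is only an illustration of the "letter number" pattern, not the procedure actually used for the tree; the function name `short_labels` is hypothetical, and the count of 189 simply follows from running the pattern from A1 to S9.

```python
from string import ascii_uppercase


def short_labels(count):
    """Return `count` two-character labels: A1..A9, A0, B1..B9, B0, ..."""
    digits = "1234567890"  # 0 comes last within each letter (A9, A0, B1)
    labels = [letter + digit for letter in ascii_uppercase for digit in digits]
    if count > len(labels):
        raise ValueError("at most 260 labels are available")
    return labels[:count]


labels = short_labels(189)
print(labels[:3], labels[8:11], labels[-1])
```

Each label is two characters long regardless of the original sequence name, which is what lets the drawing program give more of the page to the tree itself.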
To construct the tree, I started with a text file of the AAA box sequences (in NBRF/PIR format), had Clustal W align the sequences and calculate the tree data, and produced the drawing using TreeView. The picture was modified in Canvas by moving the name labels to another drawing layer (to allow hiding them), and by adding a layer containing the names of families and subfamilies. The tree drawing was colored, but not altered otherwise.
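For readers unfamiliar with the input format, the following minimal sketch writes sequences as NBRF/PIR text: a `>P1;identifier` header, a free-text description line, and the residues terminated by `*`. The function name `to_pir` and the example entry are hypothetical placeholders, not data from the tree.

```python
def to_pir(records, width=60):
    """Format (identifier, description, sequence) tuples as NBRF/PIR text."""
    lines = []
    for identifier, description, sequence in records:
        lines.append(f">P1;{identifier}")  # P1 marks a complete protein
        lines.append(description)          # free-text line, may be empty
        body = sequence + "*"              # "*" terminates the sequence
        lines.extend(body[i:i + width] for i in range(0, len(body), width))
    return "\n".join(lines) + "\n"


# Placeholder entry for illustration only (not a real AAA box sequence).
print(to_pir([("A1", "placeholder entry", "ACDEFGHIKL")]))
```

A file in this form can then be handed to an alignment program that reads NBRF/PIR input, such as Clustal W.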
To view just a representative sample of the AAA proteins, have a look at a tree showing the complete set of AAA members of C. elegans (representing metazoan eukaryotes), S. cerevisiae (representing unicellular eukaryotes), E. coli (representing eubacteria), and M. jannaschii (representing archaea).
The trees I showed at the Kyoto meeting were derived from searches of the genome databases of C. elegans, D. melanogaster, H. sapiens, and A. thaliana with archaeal AAA boxes (Methanococcus proteasomal cap subunit and CDC48 D1).
The sequences returned by the BLAST search were extracted with a macro and used directly for alignment and tree construction, combined with a representative set of AAA boxes (all yeast sequences, a eubacterium, an archaeon). In a second round, duplicated sequences and distant relatives were omitted and the tree recalculated:
The color of the branches indicates the AAA (sub)family; the sequence names are colored according to the source organism: red for animals, green for the plant, brown for eubacteria, black for archaebacteria, and blue for yeast (Do you still remember the reason?).
The statistics of the search results (numbers of the AAA members in each of these species) and the Web addresses of the databases used are listed on a separate page.
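The removal of duplicated sequences in the second round could be sketched as below. This is an assumption-laden illustration, not the macro used for these trees: the function name `drop_contained` and the `skip` parameter are hypothetical, the test is plain substring containment, and the removal of distant relatives is not covered.

```python
def drop_contained(sequences, skip=0):
    """Keep only sequences that are not contained in a longer kept sequence.

    sequences: dict mapping name -> residue string.
    skip: number of leading residues ignored when testing containment
          (tolerates differences in the first residues).
    """
    kept = {}
    # Longest first, so a fragment is always compared against longer entries.
    for name, seq in sorted(sequences.items(), key=lambda kv: -len(kv[1])):
        core = seq[skip:]
        if any(core in other for other in kept.values()):
            continue  # identical to, or a fragment of, a kept sequence
        kept[name] = seq
    return kept


# Made-up residues for illustration only.
example = {"full": "MABCDEFGH", "frag": "XBCDEF", "other": "QRSTVW"}
print(sorted(drop_contained(example, skip=1)))
```

Sorting by length means that of two overlapping entries the longer one always survives; among exact duplicates, the one listed first is kept.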
C.e.: The search result did not differ from that of 1999, indicating either that the sequences were perfect from the beginning or that they have not been revised since. I only updated some sequence names.
H.s.: I added two human AAA genes, BAB14017.1 and BAB14482.1, from an EMBL database search; they are not (yet) represented in the "complete" genome. The tree shows both the "classical" name and the label from the human genome project for each human sequence. The BLAST search produced several pairs of sequences which were either identical, or in which one sequence was contained in the other (sometimes with a sequence difference in the first one or two residues, probably sequencing errors). In these cases, the shorter one is always named XP_..., and the complete one NP_... I assume that the XPs are more provisional annotations. With one exception (XP_004304.1), all XPs had an NP equivalent, so I omitted them.
A.t.: I updated the Arabidopsis sequence names of the Zn-protease subfamily after the meeting, using the new unified nomenclature which will be published by Zach Adam et al. in April 2001 in Plant Physiology. Subfamily members containing a Zn-binding consensus (HExxH) are now named FtsH1 to FtsH9.
I have put the names of those sequences which lack the consensus in brackets. These sequences still contain a well-conserved AAA box, so they could act as chaperones, or form mixed complexes with proteolytically active subunits, similar to the beta-subunits of the proteasomal core in eukaryotes.
Three sequences (K19M22.7 = BAB09632.1, F22G5.10 = AAF79577.1, and MFH8.11 = BAB08420.1) have not yet been included in the new nomenclature, though they contain the Zn-binding motif. I assume they will soon be renamed.
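The check for the Zn-binding consensus described above can be expressed as a simple pattern match. The sketch below is illustrative only: the function names are hypothetical, the use of square brackets for sequences lacking the motif is an assumption about the bracket style, and a match anywhere in the sequence is accepted without regard to its position.

```python
import re

# H-E-any-any-H, i.e. the HExxH consensus.
ZN_MOTIF = re.compile(r"HE..H")


def has_zn_motif(sequence):
    """Return True if the sequence contains an HExxH match anywhere."""
    return ZN_MOTIF.search(sequence.upper()) is not None


def display_name(name, sequence):
    """Bracket the names of sequences that lack the motif."""
    return name if has_zn_motif(sequence) else f"[{name}]"


# Made-up residues for illustration only.
print(display_name("with_motif", "AAHEAGHAA"))
print(display_name("without_motif", "AAHEAGAAA"))
```

Because a five-residue pattern can also occur by chance, a hit from such a search is a hint rather than proof of a functional Zn-binding site.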
The tree drawing showing all currently known AAA proteins and the corresponding data (sequences, alignment, bootstrapping data) can be accessed from a separate page. The tree is published as a poster in the Journal of Cell Science in the section Science at a Glance.
Last edited: February 22, 2002 by KaiFr
Dr. Kai-Uwe Fröhlich