Complete data set for the AAA family tree poster published in the Journal of Cell Science

Summary

The AAA (for ATPases Associated with various cellular Activities; suggested by Kunau, W.H. et al. 1993, Biochimie 75, 209-224) protein superfamily is characterized by a highly conserved module of approximately 230 amino acid residues (AAA box) including an ATP binding consensus, present in one or two copies in the AAA proteins. AAA proteins are found in all organisms (Archaea, Eubacteria, Eukaryota: Protista, Fungi, Plants, Animals) and are essential for, e.g., cell cycle functions, vesicular transport, mitochondrial functions, peroxisome assembly, and essential functions of proteolysis.

This unrooted phylogenetic tree of the AAA superfamily is derived from an alignment of 345 AAA boxes from 316 AAA proteins in 92 species. For proteins with a duplicate AAA box, the more conserved box was used for tree construction (amino proximal box of the secretion/neurotransmission proteins, carboxy proximal box of the peroxisomal proteins). As most of the members of the CDC48/homotypic fusion family contain two well conserved AAA boxes, both copies were included in the tree for these proteins. There are a few exceptions for which only one box was used (carboxy proximal AAA box of Ce K04G2.3, Myt MTCY22G10.32c, amino proximal AAA box of, e.g., Hs BAB14017.1 and Sc YTA7).
I have omitted some sequences that others have classified as AAA proteins, e.g. AFG1 and BCS1 from S. cerevisiae, or PRS2 from M. jannaschii. As with many classifications in biology, there is no clear-cut border between AAA proteins and non-members. All AAA sequences used in this tree form a distinct cluster and have clearly shorter branches than any of the more distant relatives.

The protein sequences of the respective AAA boxes were aligned for optimal similarity, and the tree data were calculated by Clustal W (Thompson, J.D., Higgins, D.G., and Gibson, T.J. 1994. Nucl. Acids Res. 22, 4673-4680) with default settings (gaps not excluded, Blosum matrix). The tree was drawn with TreeView (Page, R.D.M. 1996. Comput. Appl. Biosci. 12, 357-358). During further editing with Canvas (DENEBA Software), some crossing branches were untangled, but the branch lengths or branching points were left unchanged#. The reliability of the tree structure was verified by bootstrapping (1000 trials) with Clustal W. Internal branches which occur in >80% of the bootstrap samples have been marked by filled circles. While most of the proposed families and subfamilies (Pex1/Pex6, proteasome subunits) are confirmed by the bootstrapping, the boundaries of the CDC48/homotypic fusion family appear less well defined for both of its AAA boxes.
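
The >80% criterion can be illustrated with a minimal Python sketch. It is not part of the original workflow, which used Clustal W and manual editing. The function name and the example counts are hypothetical; the only values taken from the text are the 1000 trials and the 80% threshold.

```python
def well_supported(counts, trials=1000, percent=80):
    """Flag each internal branch whose bootstrap count exceeds `percent` of `trials`.

    Integer arithmetic is used so that a count of exactly 80% is not
    flagged: the text marks branches that occur in MORE than 80% of samples.
    """
    return [count * 100 > percent * trials for count in counts]


# Hypothetical bootstrap counts for four internal branches (not real data).
example_counts = [1000, 801, 800, 512]
flags = well_supported(example_counts)
print(flags)
```

A branch confirmed in exactly 800 of 1000 trials is not flagged here, because the criterion is strictly "more than 80%".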

The molecular cartoons illustrate the overall structure of some AAA proteins: homohexameric rings for the proteins with a duplicated AAA box (as the structure of the Pex proteins is still uncertain, a monomer is shown as an alternative) and for the ARC family; a ring of six different AAA proteins forming the base of the 19S regulatory particle on both sides of the proteasome core; and the YTA10/YTA12 heterohexamer and the YME1 homohexamer in the inner mitochondrial membrane, which differ in orientation because YME1 has lost a transmembrane domain.

Data

Sequences of the AAA boxes used for the alignment (NBRF/PIR-format)
Numbered AAA box sequences *
Numbered names corresponding to Numbered AAA box sequences *
Alignment produced with Clustal W
Tree file in New Hampshire (nested parentheses) format extracted with ClustToTree from the ".nj" tree file of Clustal W.
Original tree drawing by TreeView (PICT format)
Extract of the bootstrapping results (1000 cycles, random number generator seed = 932): branching order and branch lengths; the last number in each line is the number of bootstrap cycles that confirm the branch.
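
For readers who want to process the tree file programmatically, here is a minimal sketch of a reader for the New Hampshire (nested parentheses) format. It assumes unquoted labels and optional branch lengths after a colon. The function names and the example string are hypothetical, and the sketch is not the tool used to produce the files listed above.

```python
def parse_newick(text):
    """Parse a Newick string into nested (label, length, children) tuples."""
    text = "".join(text.split()).rstrip(";")  # drop whitespace and the final ";"
    pos = 0

    def node():
        nonlocal pos
        children = []
        if pos < len(text) and text[pos] == "(":
            pos += 1  # consume "("
            children.append(node())
            while pos < len(text) and text[pos] == ",":
                pos += 1  # consume ","
                children.append(node())
            if pos >= len(text) or text[pos] != ")":
                raise ValueError("unbalanced parentheses")
            pos += 1  # consume ")"
        start = pos
        while pos < len(text) and text[pos] not in "(),:":
            pos += 1
        label = text[start:pos]
        length = None
        if pos < len(text) and text[pos] == ":":
            pos += 1  # consume ":"
            start = pos
            while pos < len(text) and text[pos] not in "(),":
                pos += 1
            length = float(text[start:pos])
        return (label, length, children)

    return node()


def leaves(tree):
    """Return (label, branch length) for every leaf, in file order."""
    label, length, children = tree
    if not children:
        return [(label, length)]
    found = []
    for child in children:
        found.extend(leaves(child))
    return found


# Hypothetical four-leaf tree with numbered labels (not taken from the data).
example = "(1:0.1,(2:0.2,3:0.3):0.05,4:0.4);"
print(leaves(parse_newick(example)))
```

The leaves are returned in the order in which they appear in the file, which makes it easy to match them against a numbered name list.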

* To allow the tree branches maximal space in the primary drawing (reducing branch and label convolution), I shortened the branch labels by replacing them with numbers. In the final drawing, I manually inserted the names.
# "For practical reasons, the Pex/C-terminal Cdc48 family cluster is derived from a separate calculation. For bootstrapping, the complete dataset has been used."
Actually, I had constructed the tree without the C-terminal AAA boxes of the Cdc48 relatives (sequences numbered 401-441), and was simply too lazy to rearrange all the labels and branches when I added them (an addition that is nevertheless essential for representing the evolution of the domain). So I recalculated the tree with all sequences (resulting in this tree drawing), but grafted just its Pex/C-terminal Cdc48 family branch onto the old tree (replacing the original Pex branch).
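
The renumbering described in the first footnote (*) was reversed by hand in the final drawing. As an illustration only, the same substitution could be sketched in Python for a New Hampshire tree string. The function name and the number-to-name assignments below are hypothetical and do not reproduce the actual numbering of the data files.

```python
import re


def relabel(newick, names):
    """Replace numeric leaf labels in a Newick string with names.

    A leaf label is recognised as digits that follow "(" or "," and
    precede ":"; branch lengths follow ":" and are left untouched.
    Numbers without an entry in `names` are kept as they are.
    """
    def swap(match):
        prefix, label = match.group(1), match.group(2)
        return prefix + names.get(label, label)

    return re.sub(r"([(,])(\d+)(?=:)", swap, newick)


# Hypothetical assignments; names use underscores because unquoted
# Newick labels must not contain blanks.
example_names = {"1": "Sc_YTA7", "2": "Ce_K04G2.3"}
relabelled = relabel("(1:0.1,(2:0.2,3:0.3):0.05);", example_names)
print(relabelled)
```

Names containing parentheses, commas, or colons would need quoting, which this sketch does not handle.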

References and Links



I welcome corrections, additions (new sequences, new functions), suggestions, questions, requests, and praise.

Kai-Uwe Fröhlich
Last edited: February 22, 2002