Recurring local sequence motifs in proteins

Title: Recurring local sequence motifs in proteins
Publication Type: Journal Article
Year of Publication: 1995
Authors: Han, K. F., & Baker, D.
Journal: Journal of molecular biology
Date Published: 1995 Aug 4
Keywords: Cluster Analysis, Helix-Loop-Helix Motifs, Primary Publication, Protein Conformation, Proteins, Sequence Alignment, Sequence Homology, Amino Acid, Statistics as Topic

We describe a completely automated approach to identifying local sequence motifs that transcend protein family boundaries. Cluster analysis is used to identify recurring patterns of variation at single positions and in short segments of contiguous positions in multiple sequence alignments for a non-redundant set of protein families. Parallel experiments on simulated data sets constructed with the overall residue frequencies of proteins but not the inter-residue correlations show that naturally occurring protein sequences are significantly more clustered than the corresponding random sequences for window lengths ranging from one to 13 contiguous positions. The patterns of variation at single positions are not in general surprising: chemically similar amino acids tend to be grouped together. More interesting patterns emerge as the window length increases. The patterns of variation for longer window lengths are in part recognizable patterns of hydrophobic and hydrophilic residues, and in part less obvious combinations. A particularly interesting class of patterns features highly conserved glycine residues. The patterns provide a means to abstract the information contained in multiple sequence alignments and may be useful for comparison of distantly related sequences or sequence families and for protein structure prediction.
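The general idea of clustering patterns of variation in an alignment can be illustrated with a minimal sketch. This is not the authors' procedure: the abstract does not specify the distance measure, the clustering algorithm, or the number of clusters, so the k-means step with squared Euclidean distance, the toy alignment, and all function names below (`column_profiles`, `window_profiles`, `kmeans`) are assumptions made for illustration only.

```python
import random
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def column_profiles(alignment):
    """Return one 20-dimensional residue-frequency vector per alignment column.

    Gap characters and other non-standard symbols are ignored; a column with
    no standard residues gets an all-zero vector so positions stay contiguous.
    """
    profiles = []
    for column in zip(*alignment):
        counts = Counter(res for res in column if res in AMINO_ACIDS)
        total = sum(counts.values())
        if total == 0:
            profiles.append([0.0] * len(AMINO_ACIDS))
        else:
            profiles.append([counts[aa] / total for aa in AMINO_ACIDS])
    return profiles

def window_profiles(profiles, width):
    """Concatenate the profiles of `width` contiguous positions into one vector."""
    return [sum(profiles[i:i + width], [])
            for i in range(len(profiles) - width + 1)]

def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def kmeans(vectors, k, iterations=20, seed=0):
    """Plain k-means (an assumed stand-in for the paper's cluster analysis)."""
    rng = random.Random(seed)
    centroids = rng.sample(vectors, k)  # initial centroids: k distinct positions
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: nearest centroid for every vector.
        labels = [min(range(k), key=lambda c: squared_distance(v, centroids[c]))
                  for v in vectors]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(x) / len(members) for x in zip(*members)]
    return labels, centroids

# Hypothetical toy alignment: four aligned sequences, five columns.
alignment = ["GAVLK", "GAILR", "GSVMK", "GTLLR"]
profiles = column_profiles(alignment)       # single-position patterns
labels, centroids = kmeans(profiles, k=2)
windows = window_profiles(profiles, width=3)  # patterns over 3 contiguous positions
print(labels)
```

In this sketch each cluster centroid plays the role of a recurring pattern of variation: a centroid with high weight on G alone would correspond to a conserved-glycine position, while one spreading weight over V, I, L and M would correspond to a hydrophobic position. A control in the spirit of the abstract could draw random columns from the overall residue frequencies and compare how tightly they cluster, but that comparison is not implemented here.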

Alternate Journal: J. Mol. Biol.