Identifying which residues in a protein determine its specificity (for example, why the LacI repressor protein is regulated by lactose, whereas its homolog FruR is regulated by fructose) is an important problem in biology. If we could uncover from sequence alone which residues determine specificity, we could: 1) understand the effect or importance of mutations, 2) build predictions for ligands, or 3) understand the complexity of evolutionary divergence. In this work, we build on a previously developed idea from the sequence alignment field: a position that determines specificity is likely to be conserved within one specificity group, but conserved as a different residue in other specificity groups. In this software implementation, we make two important changes:
We relax the requirement that a position must determine specificity in all groups. This relaxation greatly increases our ability to identify positions of specificity, because degeneracy (non-conservation) at a position in some groups can be explained by relaxed use of that residue in a particular family.
We use ensemble alignments to build statistical distributions of specificity-determining positions (SDPs). As the number of sequences in an alignment increases, the quality of the alignment decreases. Therefore, we limit each individual alignment to a smaller number of sequences and resample from thousands of sequences to improve the estimation of SDPs.
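The two changes above can be illustrated with a minimal Python sketch. This is not the lab's implementation: the function names (`consensus`, `is_sdp`, `sdp_frequencies`), the 0.8 conservation threshold, the sample sizes, and the toy sequences are all assumptions chosen for illustration. The sketch also assumes the sequences are already aligned to a common length, whereas an ensemble approach as described would build a new alignment for each resampled subset.

```python
import random
from collections import Counter


def consensus(residues):
    """Return (most common residue, its fraction) for one group's column."""
    residue, count = Counter(residues).most_common(1)[0]
    return residue, count / len(residues)


def is_sdp(column_by_group, min_conservation=0.8, min_conserved_groups=2):
    """Relaxed test: a position qualifies if at least `min_conserved_groups`
    groups are conserved and the conserved groups do not all share one residue.
    Groups that are degenerate at this position are ignored, not disqualifying."""
    conserved = []
    for residues in column_by_group.values():
        residue, fraction = consensus(residues)
        if residue != "-" and fraction >= min_conservation:
            conserved.append(residue)
    return len(conserved) >= min_conserved_groups and len(set(conserved)) >= 2


def sdp_frequencies(groups, n_samples=200, sample_size=3, seed=0):
    """Resample a few sequences per group many times and return, for each
    position, the fraction of samples in which it was called an SDP.
    Assumption: sequences are pre-aligned and of equal length."""
    rng = random.Random(seed)
    length = len(next(iter(groups.values()))[0])
    hits = [0] * length
    for _ in range(n_samples):
        sample = {
            name: rng.sample(seqs, min(sample_size, len(seqs)))
            for name, seqs in groups.items()
        }
        for pos in range(length):
            column = {name: [s[pos] for s in seqs] for name, seqs in sample.items()}
            if is_sdp(column):
                hits[pos] += 1
    return [h / n_samples for h in hits]


# Hypothetical toy data: three specificity groups of aligned 4-residue sequences.
groups = {
    "lactose": ["MKAL", "MKAL", "MKAV", "MKAL"],
    "fructose": ["MRAL", "MRAI", "MRAL", "MRAV"],
    "degenerate": ["MQAL", "MEAV", "MTAI", "MSAL"],
}

if __name__ == "__main__":
    for position, frequency in enumerate(sdp_frequencies(groups)):
        print(position, frequency)
```

In this toy data the second position is conserved as K in one group and as R in another, while the third group is degenerate there. A test that required conservation in every group would reject that position; the relaxed test accepts it. Reporting a frequency across resamples, rather than a single yes/no call, is what turns the per-alignment result into a distribution.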
A foundation of our work is the ability to have proteome information at our fingertips. This includes the current knowledge of tyrosine phosphorylation, quantitative measurements made on those sites, and related protein annotations. In enabling this research for our own lab, we also construct tools that can be used by the broader research community, with a focus on extensibility and reproducibility.
A major piece of ongoing work in the lab is to develop methods that will allow us to identify which phosphotyrosines will be recognized by a binding domain. Specifically, we hope to extend this area of research so that we can predict the relative competition among domains for phosphotyrosine sequences and among phosphotyrosine sequences for domains. This information will enable us to begin to predict the consequences of context differences between cells responding to the same extracellular cue. We will consider this work a success when these predictions can be used to explain complex network phenomena.
A major barrier to the study of protein phosphorylation is the difficulty of creating phosphorylated proteins for in vitro study. The Naegle lab has been developing a cheap and fast method for producing phosphorylated proteins that capitalizes on observations of enzymatic specificity.