MCAM is the ensemble clustering method for unsupervised learning published here. Clustering seeks to group multidimensional data so that similar objects fall in the same group and dissimilar objects fall in different groups. However, what counts as similar in vector space changes if you alter 1) the distance metric, 2) the data transformation, 3) the algorithm, or 4) the number of clusters sought. Every solution is therefore a hypothesis about the structure of the underlying data, and a myriad of hypotheses are possible for the same data. In ensemble clustering, we take many (sometimes thousands of) possible solutions into account. This code was written for ensemble clustering in Matlab.
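To make the idea of an ensemble concrete, here is a minimal Python sketch. It is not the MCAM Matlab code. It assumes a plain k-means algorithm, two illustrative transformations (raw and z-scored data), and a small range of cluster numbers, none of which are taken from the published method. Each run yields one hypothesis about the data. The co-association matrix records how often each pair of objects ends up in the same cluster across all runs. All function names and parameters are hypothetical.

```python
import numpy as np


def kmeans_labels(data, k, rng, n_iter=50):
    """Plain Lloyd's k-means; returns one cluster label per row of data."""
    # Start from k distinct rows chosen at random.
    centers = data[rng.choice(len(data), size=k, replace=False)]
    labels = np.zeros(len(data), dtype=int)
    for _ in range(n_iter):
        # Squared Euclidean distance from every row to every center.
        dists = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = data[labels == j]
            if len(members) > 0:  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return labels


def zscore(data):
    """Scale each column to zero mean and unit variance."""
    std = data.std(axis=0)
    std[std == 0] = 1.0  # avoid dividing by zero for constant columns
    return (data - data.mean(axis=0)) / std


def co_association(data, ks=(2, 3, 4), n_restarts=10, seed=0):
    """Fraction of ensemble solutions in which each pair of rows co-clusters."""
    rng = np.random.default_rng(seed)
    transforms = [lambda x: x, zscore]  # assumed transformations, for illustration
    n = len(data)
    counts = np.zeros((n, n))
    n_solutions = 0
    for transform in transforms:
        transformed = transform(data)
        for k in ks:
            for _ in range(n_restarts):
                labels = kmeans_labels(transformed, k, rng)
                # Add 1 for every pair of rows sharing a label in this solution.
                counts += labels[:, None] == labels[None, :]
                n_solutions += 1
    return counts / n_solutions


# Toy data: two well-separated groups of ten points each.
demo_rng = np.random.default_rng(1)
blob_a = demo_rng.normal(loc=0.0, scale=0.3, size=(10, 2))
blob_b = demo_rng.normal(loc=5.0, scale=0.3, size=(10, 2))
toy = np.vstack([blob_a, blob_b])

coassoc = co_association(toy)
within = coassoc[:10, :10].mean()
between = coassoc[:10, 10:].mean()
print("mean co-association within a group:", within)
print("mean co-association between groups:", between)
```

Pairs that co-cluster in most solutions are robust to the choice of transformation and cluster number, while pairs that co-cluster only occasionally depend on a particular hypothesis.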
A foundation of our work is having proteome information at our fingertips. This includes the current knowledge of tyrosine phosphorylation, quantitative measurements made on those sites, and related protein annotations. In enabling this research for our own lab, we also build tools that can be used by the broader research community, with a focus on extensibility and reproducibility.
In her Ph.D. work, Kristen Naegle developed ensemble approaches to clustering biological data and demonstrated that one can infer the function of tyrosine phosphorylation from quantitative measurements of the dynamic changes in network phosphorylation in cells responding to growth factor stimulation. During her post-doctoral work, Dr. Naegle went on to show that robustness in clustering was predictive of protein interactions, and she inferred novel interactions in the epidermal growth factor receptor network.
A major piece of ongoing work in the lab is developing methods to identify which phosphotyrosines will be recognized by a binding domain. Specifically, we hope to push this research toward predicting the relative competition between domains for phosphotyrosine sequences and between phosphotyrosine sequences for domains. This information will enable us to begin predicting the consequences of context differences between cells responding to the same extracellular cue. We will feel we have succeeded when these predictions can be used to explain complex network phenomena.
A major barrier to the study of protein phosphorylation is the difficulty of creating phosphorylated proteins for in vitro study. The Naegle lab has been developing a cheap and fast method for producing phosphorylated proteins that capitalizes on observations of enzymatic specificity.