Inferring biological insight from high-dimensional data

Clustering and the unique opportunities that ensemble approaches afford.


Clustering is an unsupervised learning approach that identifies underlying structure in multidimensional data by exploring the data alone (i.e., without the labels used in supervised learning). Ultimately, the structure uncovered in a clustering solution is a hypothesis about relationships in the data that may carry meaning. That structure changes when the relationships in the data are perturbed, such as by transforming the data, or when alternate criteria are used in the process (such as a different measure of distance between points or a different algorithm). Therefore, one can cluster many times, making such perturbations each time, to explore the space of solutions.
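The idea of clustering many times under perturbation can be sketched in a few lines of Python. This is a minimal illustration, not the lab's published method: the toy data, the function names (`agglomerate`, `co_occurrence`), and the particular choice of transformations, distances, linkages, and cluster counts are all assumptions made for the example. Each combination yields one clustering solution, and the result records how often each pair of points lands in the same cluster across all solutions.

```python
import math
from itertools import product

# --- Perturbations of the data (illustrative choices, not from the source) ---
def identity(rows):
    return [list(r) for r in rows]

def log_transform(rows):
    return [[math.log1p(v) for v in r] for r in rows]

def zscore(rows):
    """Standardize each column to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, sds)] for r in rows]

# --- Alternate distance measures ---
def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def agglomerate(rows, k, dist, linkage):
    """Naive agglomerative clustering: merge the two closest clusters
    until k remain, then return one cluster label per row.
    `linkage` is min (single linkage) or max (complete linkage)."""
    clusters = [[i] for i in range(len(rows))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(dist(rows[i], rows[j])
                            for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters.pop(b))
    labels = [0] * len(rows)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels

def co_occurrence(data, transforms, distances, linkages, ks):
    """Fraction of clustering solutions in which each pair co-clusters."""
    n = len(data)
    counts = [[0] * n for _ in range(n)]
    runs = 0
    for transform, dist, linkage, k in product(transforms, distances, linkages, ks):
        labels = agglomerate(transform(data), k, dist, linkage)
        runs += 1
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / runs for c in row] for row in counts]

# Hypothetical toy data: six measured items, two measurements each.
data = [[1.0, 1.2], [0.9, 1.0], [1.1, 0.8],
        [9.0, 9.5], [10.0, 9.0], [9.5, 10.2]]

co = co_occurrence(
    data,
    transforms=(identity, log_transform, zscore),
    distances=(euclidean, manhattan),
    linkages=(min, max),
    ks=(2, 3),
)
for row in co:
    print(" ".join(f"{v:.2f}" for v in row))
```

Pairs with a co-occurrence near 1 stay together no matter which perturbation is applied, while pairs with intermediate values depend on the choices made. In practice a library implementation of the clustering step would replace the naive loop used here.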

During her Ph.D. work, Kristen Naegle developed ensemble approaches to clustering biological data and demonstrated that one can infer the function of tyrosine phosphorylation from quantitative measurements of the dynamic changes in network phosphorylation in cells responding to growth factor stimulation. During her postdoctoral work, Dr. Naegle went on to show that robustness in clustering was predictive of protein interactions, and she inferred novel interactions in the epidermal growth factor receptor network.

The Naegle lab has since applied these frameworks in collaborations with Valeria Cavalli, Linda Pike, and Paul Huang to explore a variety of biological problems, ranging from axonal degeneration to DDR2 signaling.

Additionally, team members Roman Sloutsky and Kristen Naegle proposed frameworks for incorporating the noise inherent in biological data into the clustering process, in order to understand how the relationships identified by clustering change when noise is considered. Team members Tom Ronan and Kristen Naegle wrote a review article on clustering and the unique opportunities that ensemble approaches afford. This review was among the most highly accessed articles across the Science journals and was featured on the Science home page.
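One simple way to see how measurement noise can alter clustering relationships is to redraw each measurement from a distribution that reflects its uncertainty, recluster, and count how often each pair of items still clusters together. The sketch below is an illustration only and is not the framework described above: the Gaussian noise model, the toy means and standard deviations, and the function names (`single_linkage`, `noisy_co_clustering`) are all assumptions.

```python
import math
import random

def single_linkage(rows, k):
    """Merge the two closest clusters (single linkage, Euclidean distance)
    until k clusters remain; return the clusters as lists of row indices."""
    clusters = [[i] for i in range(len(rows))]
    while len(clusters) > k:
        _, a, b = min(
            (min(math.dist(rows[i], rows[j])
                 for i in clusters[a] for j in clusters[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        clusters[a].extend(clusters.pop(b))
    return clusters

def noisy_co_clustering(means, sds, k, n_draws=200, seed=0):
    """Fraction of noisy redraws in which each pair of items co-clusters.
    Assumes independent Gaussian noise on every measurement."""
    rng = random.Random(seed)
    n = len(means)
    counts = [[0] * n for _ in range(n)]
    for _ in range(n_draws):
        draw = [[rng.gauss(m, s) for m, s in zip(mean_row, sd_row)]
                for mean_row, sd_row in zip(means, sds)]
        for members in single_linkage(draw, k):
            for i in members:
                for j in members:
                    counts[i][j] += 1
    return [[c / n_draws for c in row] for row in counts]

# Hypothetical data: four precisely measured items in two groups,
# plus one noisy item (index 4) that sits between the groups.
means = [[1.0, 1.0], [1.2, 0.9], [5.0, 5.0], [5.1, 4.8], [2.8, 3.1]]
sds = [[0.1, 0.1], [0.1, 0.1], [0.1, 0.1], [0.1, 0.1], [1.5, 1.5]]

co = noisy_co_clustering(means, sds, k=2)
for row in co:
    print(" ".join(f"{v:.2f}" for v in row))
```

In this toy setup, a single clustering of the mean values would assign the noisy item to one group with apparent certainty, whereas the resampled result shows that its membership is unstable once its measurement error is taken into account.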


Science Home Page banner the week our review article was published (June 16, 2016).

Our Research Areas

  • Databases and resources for proteome-level PTM information

    A foundation of our work is the ability to have proteome information at our fingertips. This includes the current knowledge of tyrosine phosphorylation, quantitative measurements made on those sites, and related protein annotations. In enabling this research for our own lab, we also build tools that can be used by the broader research community, with a focus on extensibility and reproducibility.

  • Inferring biological insight from high-dimensional data

    During her Ph.D. work, Kristen Naegle developed ensemble approaches to clustering biological data and demonstrated that one can infer the function of tyrosine phosphorylation from quantitative measurements of the dynamic changes in network phosphorylation in cells responding to growth factor stimulation. During her postdoctoral work, Dr. Naegle went on to show that robustness in clustering was predictive of protein interactions, and she inferred novel interactions in the epidermal growth factor receptor network.

  • SH2 domain binding

    A major piece of ongoing work in the lab is developing methods to identify which phosphotyrosines will be recognized by a given binding domain. Specifically, we hope to push this research toward predicting the relative competition among domains for phosphotyrosine sequences and among phosphotyrosine sequences for domains. This information will enable us to begin predicting how differences in cellular context change the response to the same extracellular cue. We will consider this work a success when these predictions can be used to explain complex network phenomena.

  • Engineering enzymatic interactions

    A major barrier to the study of protein phosphorylation is the difficulty of creating phosphorylated proteins for in vitro study. The Naegle lab has been developing a cheap and fast method for producing phosphorylated proteins that capitalizes on observations of enzymatic specificity.
