Naegle Lab Open Source Software


Ensemble clustering refers to the process of perturbing the data, the relationships between the data, or the metrics by which relationships are judged, and then clustering many times under these perturbations (please see our review for a fuller explanation). The lab has had great success in developing and applying ensemble clustering to biological data to gain insight into the underlying physical systems. Examples of biological understanding gained from such approaches include:

Identifying transient, phosphotyrosine-directed interactions in the EGFR network by robustly clustering dynamic tyrosine phosphorylation data with our collaborator Forest White (paper)

Identifying context-specific interaction differences in the EGFR/HER2 networks (paper).

Separating the role of HDAC in driving gene expression changes, versus axon-end acetylation, from gene expression data with our collaborator Valeria Cavalli (paper)

Uncovering novel roles of proteins in the DDR2 (discoidin domain receptor 2) network from phosphoproteomic data with our collaborator Paul Huang (paper)

Separating the responses of the EGF receptor system to different ligands and doses from luciferase complementation assay data with our collaborator Linda Pike (paper)

The OpenEnsembles project is our open source Python project that implements ensemble clustering to make the approach more accessible. The project is under ongoing development and incorporates work we initially developed in MATLAB (MCAM). The project and code are available on GitHub.
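As a rough illustration of the general idea (perturb, cluster many times, then ask which relationships persist), the following is a minimal, self-contained Python sketch. It is not the OpenEnsembles API or the lab's MCAM code. The function names (`kmeans`, `co_occurrence`), the toy two-group data, and the choice of random subsampling plus a varying number of clusters as the perturbations are all assumptions made purely for illustration.

```python
import random


def kmeans(points, k, rng, n_iter=20):
    """Plain Lloyd's k-means (illustrative helper); returns one label per point."""
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assign each point to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Move each center to the mean of its members; leave empty clusters in place.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels


def co_occurrence(points, k_values=(2, 3, 4), n_runs=30, subsample=0.8, seed=0):
    """Fraction of runs in which each pair of points lands in the same cluster.

    Assumed perturbations: a random subsample of the points and a randomly
    chosen number of clusters in every run.
    """
    rng = random.Random(seed)
    n = len(points)
    together = [[0] * n for _ in range(n)]  # runs where i and j shared a cluster
    sampled = [[0] * n for _ in range(n)]   # runs where i and j were both drawn
    for _ in range(n_runs):
        k = rng.choice(k_values)
        idx = rng.sample(range(n), max(k, int(subsample * n)))
        labels = kmeans([points[i] for i in idx], k, rng)
        for a, i in enumerate(idx):
            for b, j in enumerate(idx):
                sampled[i][j] += 1
                if labels[a] == labels[b]:
                    together[i][j] += 1
    return [
        [together[i][j] / sampled[i][j] if sampled[i][j] else 0.0 for j in range(n)]
        for i in range(n)
    ]


# Toy data (hypothetical): two well-separated groups of 10 points each.
data_rng = random.Random(1)
points = [(data_rng.gauss(0, 0.5), data_rng.gauss(0, 0.5)) for _ in range(10)]
points += [(data_rng.gauss(10, 0.5), data_rng.gauss(10, 0.5)) for _ in range(10)]

co = co_occurrence(points)
within = sum(co[i][j] for i in range(10) for j in range(10) if i != j) / 90
across = sum(co[i][j] for i in range(10) for j in range(10, 20)) / 100
print(f"mean within-group co-occurrence: {within:.2f}")
print(f"mean across-group co-occurrence: {across:.2f}")
```

Pairs that co-cluster across most perturbations are the robust relationships. In this sketch the co-occurrence matrix is the only ensemble summary computed, and how it should be thresholded or re-clustered is left open.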



Our Research Areas

  • Databases and resources for proteome-level PTM information

    A foundation of our work is the ability to have proteome information at our fingertips. This includes the current knowledge of tyrosine phosphorylation, quantitative measurements made on those sites, and related protein annotations. In enabling this research for our own lab, we also construct tools that can be used by the broader research community, with a focus on extensibility and reproducibility.

  • Inferring biological insight from high-dimensional data

    In her Ph.D. work, Kristen Naegle developed ensemble approaches to clustering biological data and demonstrated that one can infer the function of tyrosine phosphorylation from quantitative measurements of the dynamic changes in network phosphorylation in cells responding to growth factor stimulation. During her post-doctoral work, Dr. Naegle went on to show that robustness in clustering was predictive of protein interactions, and she inferred novel interactions in the epidermal growth factor receptor network.

  • SH2 domain binding

    A major piece of ongoing work in the lab is to develop methods that will allow us to identify which phosphotyrosines will be recognized by a binding domain. Specifically, we hope to push this area of research toward predicting the relative competition between domains for phosphotyrosine sequences and between phosphotyrosine sequences for domains. This information will enable us to begin to predict the consequences of context differences between cells responding to the same extracellular cue. We will feel we have succeeded when these predictions can be used to explain complex network phenomena.

  • Engineering enzymatic interactions

    A major barrier to the study of protein phosphorylation is the difficulty of creating phosphorylated proteins for in vitro study. The Naegle lab has been developing a fast, inexpensive method for producing phosphorylated proteins that capitalizes on observations made of enzymatic specificity.