Aidong Zhang adds machine learning expertise to biomedical data sciences at UVA

One measure of the success of UVA’s campaign to take its place as a national leader in the data sciences is its ability to attract highly regarded researchers to join the faculty. If Aidong Zhang’s decision to join UVA Engineering is any indication, the University is well on its way.

Zhang has been singled out for some of the most prestigious honors in her field. She is a Fellow of the Institute of Electrical and Electronics Engineers (IEEE) and a Fellow of the Association for Computing Machinery (ACM) for her contributions to bioinformatics, data mining and multimedia data indexing. Furthermore, she comes to Grounds from a three-year stint as a program director for the National Science Foundation’s Division of Information and Intelligent Systems. NSF program directorships are traditionally reserved for senior scholars chosen for their breadth of experience and vision for the future.

When she visited Charlottesville, Zhang met with Philip Bourne, director of the Data Science Institute and a professor of biomedical engineering, as well as faculty from both the computer science and biomedical engineering departments. “I was impressed by the support at the highest levels for the data sciences, the opportunities for interdisciplinary collaboration and, in particular, by the strong interest in applying machine learning to a variety of biomedical applications, which is my specialty,” Zhang said. “I thought this would be a great environment for me.”

As part of the University’s cluster hiring program, which seeks faculty members who work in a particular interdisciplinary field with the potential for broad social impact, Zhang joined UVA Engineering as a William Wulf Faculty Fellow and Professor of Computer Science with joint appointments in Biomedical Engineering and the Data Science Institute. The extent of this appointment reflects the range of her interests.

Adaptive Machine Learning

For instance, one of the underlying principles of deep learning is that a substantial number of samples is required to draw valid conclusions: the more parameters a model has, the more samples it needs.

“Biomedical researchers face a classic ‘Big P Small N’ problem,” Zhang said. “The data they collect tends to be highly multidimensional—the parameters they track might include genomic data, for instance, as well as clinical data—but the number of samples they study is typically small, covering less than 1,000 patients.” Zhang is designing adaptive machine learning to address this challenge.
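
To make the "Big P Small N" problem concrete, here is a minimal sketch of why many parameters and few samples are a risky combination. It is not Zhang's method. It uses purely random data, and the function names and the sizes (20 samples, 5 versus 2,000 features) are assumptions chosen for illustration. With enough unrelated features, at least one will appear strongly correlated with the outcome by chance alone.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def strongest_spurious_correlation(n_samples, n_features, seed=0):
    """Largest |correlation| between a random outcome and any random feature.

    Every value is independent noise, so any strong correlation found here
    is an artifact of searching many features with few samples.
    """
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(n_samples)]
    best = 0.0
    for _ in range(n_features):
        feature = [rng.gauss(0, 1) for _ in range(n_samples)]
        best = max(best, abs(pearson(feature, outcome)))
    return best


# Hypothetical sizes: 20 "patients", 5 versus 2,000 measured parameters.
few = strongest_spurious_correlation(n_samples=20, n_features=5)
many = strongest_spurious_correlation(n_samples=20, n_features=2000)
print(f"strongest chance correlation with 5 features:    {few:.2f}")
print(f"strongest chance correlation with 2000 features: {many:.2f}")
```

Because both runs share a seed, the 2,000-feature run includes the same first five features, so its strongest chance correlation can only be equal or larger. A model free to pick among thousands of parameters can therefore "explain" a small cohort without having learned anything real.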

Automated Indexing

Zhang is also applying her expertise in machine learning to make it easier for researchers to pull relevant information from the millions of publications and papers in the National Library of Medicine, which is maintained by the National Institutes of Health. Currently, these publications are indexed manually, a laborious process. Zhang and her students are participating in the ongoing BioASQ competition to develop ways to automate the indexing of these documents using the standard Medical Subject Headings (MeSH). Last year, Zhang’s program outperformed all competitors in accuracy. “Many universities from around the world are participating in this competition,” she said. “We are very excited by our results.”

Personalized Medicine

In another instance of her far-ranging research, Zhang is working to overcome a critical barrier to the introduction of personalized medicine. In the future, physicians might offer personalized care to an individual based on the outcomes derived from applying machine learning to a cohort of similar patients. Doing this accurately requires algorithms capable of learning how to ensure that the base cohort is always similar in relevant ways to the patient, a problem that is compounded as a patient ages or his or her disease progresses.

“Our work will have a direct impact on how to measure similarity and define the precise group, at any moment, from which to build a personalized model,” Zhang said.
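
As a rough illustration of what "defining the precise group at any moment" could involve, the sketch below picks the most similar patients using a weighted distance, then picks again after the patient's state changes. It is not Zhang's algorithm. The feature names (`age`, `stage`), the weights and the records are hypothetical, and the weights are fixed by hand here, whereas the research described above aims to learn how similarity should be measured.

```python
import math

# Hypothetical patient records; the features and values are illustrative only.
RECORDS = [
    {"id": "A", "features": {"age": 42, "stage": 1}},
    {"id": "B", "features": {"age": 37, "stage": 1}},
    {"id": "C", "features": {"age": 61, "stage": 3}},
    {"id": "D", "features": {"age": 58, "stage": 3}},
    {"id": "E", "features": {"age": 50, "stage": 2}},
]

# Hand-picked weights (an assumption): disease stage counts far more
# than a one-year difference in age.
WEIGHTS = {"age": 0.01, "stage": 1.0}


def distance(a, b, weights):
    """Weighted Euclidean distance between two feature dictionaries."""
    return math.sqrt(
        sum(w * (a[name] - b[name]) ** 2 for name, w in weights.items())
    )


def similar_cohort(patient, records, weights, k):
    """Return the ids of the k records closest to the patient's current state."""
    ranked = sorted(
        records, key=lambda r: distance(patient, r["features"], weights)
    )
    return [r["id"] for r in ranked[:k]]


# The same patient at two points in time: the cohort must be rebuilt
# as the patient ages and the disease progresses.
earlier = {"age": 40, "stage": 1}
later = {"age": 60, "stage": 3}
print(similar_cohort(earlier, RECORDS, WEIGHTS, k=2))
print(similar_cohort(later, RECORDS, WEIGHTS, k=2))
```

The two calls return different cohorts for the same person, which is the point of the example: a personalized model built from the earlier cohort would no longer rest on comparable patients later on.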

Did you know?

The department is home to two nationally funded student training programs in systems biology and biomedical data science.

NIH Training Grant in Biomedical Data Science, Jason Papin (PI)

A pre-doctoral training program focused on teaching scientists to work at the interface of computer science, statistics, big data and biomedicine.

NSF REU in Multiscale Systems Bioengineering, Timothy Allen (PI)

Each summer, UVA gives undergraduates from a variety of STEM backgrounds the skills, confidence, and mentorship they need for successful careers in the exciting field of systems bioengineering.