Aidong Zhang adds machine learning expertise to biomedical data sciences at UVA
charliefeigenoff@gmail.com
One measure of the success of UVA’s campaign to take its place as a national leader in the data sciences is its ability to attract highly regarded researchers to join the faculty. If Aidong Zhang’s decision to join UVA Engineering is any indication, the University is well on its way.
Zhang has been singled out for some of the most prestigious honors in her field. She is a Fellow of the Institute of Electrical and Electronics Engineers (IEEE) and a Fellow of the Association for Computing Machinery (ACM) for her contributions to bioinformatics, data mining and multimedia data indexing. Furthermore, she comes to Grounds from a three-year stint as a program director for the National Science Foundation’s Division of Information and Intelligent Systems. NSF program directorships are traditionally reserved for senior scholars chosen for their breadth of experience and vision for the future.
When she visited Charlottesville, Zhang met with Philip Bourne, director of the Data Science Institute and a professor of biomedical engineering, as well as faculty from both the computer science and biomedical engineering departments. “I was impressed by the support at the highest levels for the data sciences, the opportunities for interdisciplinary collaboration and, in particular, by the strong interest in applying machine learning to a variety of biomedical applications, which is my specialty,” Zhang said. “I thought this would be a great environment for me.”
As part of the University’s cluster hiring program, which seeks faculty members who work in a particular interdisciplinary field with the potential for broad social impact, Zhang joined UVA Engineering as a William Wulf Faculty Fellow and Professor of Computer Science with joint appointments in Biomedical Engineering and the Data Science Institute. The extent of this appointment reflects the range of her interests.
Adaptive Machine Learning
For instance, one of the underlying principles of deep learning is that drawing valid conclusions requires a substantial number of samples. The more parameters a model must account for, the more samples it needs.
“Biomedical researchers face a classic ‘Big P Small N’ problem,” Zhang said. “The data they collect tends to be highly multidimensional—the parameters they track might include genomic data, for instance, as well as clinical data—but the number of samples they study is typically small, covering less than 1,000 patients.” Zhang is designing adaptive machine learning to address this challenge.
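To see why many parameters and few samples are a risky combination, consider the minimal Python sketch below. It is an illustration only, not Zhang's method: it generates purely random "features" and a purely random "outcome" for 20 hypothetical samples, then reports the strongest correlation found by chance. The function names, sample sizes and seed are all assumptions chosen for the example.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def strongest_spurious_correlation(n_samples, n_features, seed=0):
    """Largest |correlation| between random features and a random outcome.

    Every value is noise, so any correlation found here is spurious.
    """
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(n_samples)]
    best = 0.0
    for _ in range(n_features):
        feature = [rng.gauss(0, 1) for _ in range(n_samples)]
        best = max(best, abs(pearson(feature, outcome)))
    return best


# Same 20 samples, but a growing number of tracked parameters.
few = strongest_spurious_correlation(n_samples=20, n_features=5)
many = strongest_spurious_correlation(n_samples=20, n_features=2000)
print(f"strongest chance correlation with 5 features:    {few:.2f}")
print(f"strongest chance correlation with 2000 features: {many:.2f}")
```

With only 20 samples, searching across thousands of parameters will almost always turn up at least one that appears strongly related to the outcome even though none of them is. A model with no safeguards can mistake that kind of coincidence for a real signal.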
Automated Indexing
Zhang is also applying her expertise in machine learning to make it easier for researchers to pull relevant information from the millions of publications and papers in the National Library of Medicine, which is part of the National Institutes of Health. Currently, these publications are indexed manually, a laborious process. Zhang and her students are participating in the ongoing BioASQ competition to develop ways to automate the indexing of these documents using the standard Medical Subject Headings (MeSH). Last year, Zhang’s program outperformed all competitors in accuracy. “Many universities from around the world are participating in this competition,” she said. “We are very excited by our results.”
Personalized Medicine
In another instance of her far-ranging research, Zhang is working to overcome a critical barrier to the introduction of personalized medicine. In the future, physicians might offer personalized care to an individual based on the outcomes derived from applying machine learning to a cohort of similar patients. Doing this accurately requires algorithms capable of learning how to ensure that the base cohort is always similar in relevant ways to the patient, a problem that is compounded as a patient ages or his or her disease progresses.
“Our work will have a direct impact on how to measure similarity and define the precise group, at any moment, from which to build a personalized model,” Zhang said.
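The idea of a cohort that shifts as the patient changes can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not Zhang's algorithm: it measures similarity with plain Euclidean distance, whereas her research aims to learn how similarity should be measured. The patient identifiers, the two features (age and an unnamed biomarker) and all values are made up for the example.

```python
import math


def distance(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def similar_cohort(patient, candidates, k):
    """Return the ids of the k candidates closest to the patient.

    `candidates` is a list of (patient_id, features) pairs. Using an
    unscaled Euclidean distance is an assumption made for simplicity.
    """
    ranked = sorted(candidates, key=lambda item: distance(patient, item[1]))
    return [patient_id for patient_id, _ in ranked[:k]]


# Hypothetical records: (age in years, biomarker level).
candidates = [
    ("A", (62.0, 7.1)),
    ("B", (35.0, 5.4)),
    ("C", (64.0, 7.4)),
    ("D", (70.0, 9.0)),
]

# The same hypothetical patient at two points in time.
early = similar_cohort((36.0, 5.5), candidates, k=2)
later = similar_cohort((66.0, 7.6), candidates, k=2)
print("cohort early in life:", early)
print("cohort decades later:", later)
```

In this toy data the two closest matches change as the patient ages and the biomarker rises, which is why the cohort has to be redefined at each moment rather than fixed once.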