All Invited
Committee Members:
- Tariq Iqbal (chair) - ESE
- Don Brown (advisor) - ESE/SDS
- Afsaneh Doryab - ESE
- Gary K. Owens - UVA School of Medicine
Zoom Meeting Information: Please email ese-programs@virginia.edu for the Zoom information.
Title: Unsupervised Domain Adaptation and Contrastive Learning for insufficiently labeled data
Abstract:
Recently, deep learning-based approaches have become more popular than traditional methods in many real-world applications. However, the success of these approaches relies on (1) access to massive amounts of labeled training data and (2) the assumption that the training and test datasets are independent and identically distributed (i.i.d.). In many applications, the target tasks are diverse, and collecting high-quality labeled data for them is expensive. The problem is even more challenging for semantic segmentation, where a label must be assigned to every pixel in the image. In practice, most available datasets are only partially labeled, or only a small fraction of their samples are labeled. The main goal of this research is to leverage advances in deep learning, transfer learning, and domain adaptation to develop robust deep models for situations where the target dataset is insufficiently labeled.
For preliminary research, I developed the Label-efficient Contrastive learning-based (LECL) model to detect and classify various types of nuclei in 3D immunofluorescent images using weak annotations (i.e., point annotations). Developing and training weakly supervised learning models for 3D images is challenging because these images contain multiple channels (z-axis) for nuclei and for the different markers separately, which makes training with point annotations difficult. Previous methods use Maximum Intensity Projection (MIP) to convert immunofluorescent images with multiple slices into 2D images, which can cause signals from different z-stacks to falsely appear associated with each other. To overcome this, we devised a novel approach called Extended Maximum Intensity Projection (EMIP) that addresses these issues with MIP. Furthermore, we apply Supervised Contrastive (SupCon) learning to the weakly supervised setting conservatively, restricting it to areas where the labels are reliable. Experiments indicate the effectiveness of the proposed framework on cardiovascular datasets. To further reduce the labeling cost for 3D immunofluorescent images, I plan to develop an efficient domain adaptation technique. We treat each cell type as an individual domain and plan to develop a novel domain adaptation technique so that a model trained on one specific marker achieves better segmentation/classification performance on other markers with different distributions, without requiring any additional labeling work.
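As a rough illustration of the projection issue described above, the following minimal sketch computes a plain Maximum Intensity Projection over a tiny synthetic z-stack. It is not the EMIP method, whose details are not given in this abstract. The `[z][y][x]` stack layout, the function name `max_intensity_projection`, and the toy intensity values are all assumptions made for illustration.

```python
def max_intensity_projection(stack):
    """Collapse a z-stack (list of 2D slices, indexed [z][y][x]) to 2D.

    Each output pixel keeps the maximum intensity found along z.
    """
    if not stack:
        raise ValueError("stack must contain at least one slice")
    height, width = len(stack[0]), len(stack[0][0])
    return [
        [max(z_slice[y][x] for z_slice in stack) for x in range(width)]
        for y in range(height)
    ]


# Toy stacks (assumed values): a nucleus signal in slice 0 and a marker
# signal in slice 2 share the same (y, x) position but not the same depth.
nucleus_channel = [
    [[0, 9], [0, 0]],  # z = 0: bright nucleus at (y=0, x=1)
    [[0, 0], [0, 0]],  # z = 1
    [[0, 0], [0, 0]],  # z = 2
]
marker_channel = [
    [[0, 0], [0, 0]],  # z = 0
    [[0, 0], [0, 0]],  # z = 1
    [[0, 7], [0, 0]],  # z = 2: bright marker at (y=0, x=1)
]

nucleus_2d = max_intensity_projection(nucleus_channel)
marker_2d = max_intensity_projection(marker_channel)

# After projection both signals land on the same 2D pixel, although they
# are two slices apart in depth.
overlap = nucleus_2d[0][1] > 0 and marker_2d[0][1] > 0
print(nucleus_2d, marker_2d, overlap)
```

In this toy case the projected nucleus and marker coincide at one pixel even though they sit in different slices, which is the kind of false association the abstract attributes to MIP.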
The secondary aim of this research is to focus on self-training Domain Adaptation (DA) techniques that improve the generalization ability of deep models on unlabeled or scarcely labeled target tasks by training the model on both label-scarce target data and label-rich source data. This study will address two challenges with these techniques. (1) Because the target data is unlabeled, these techniques use pseudo-labels to train the model on the target data. However, these pseudo-labels may be noisy and unreliable, and using them may lead to error accumulation. (2) Domain adaptation techniques are often based on the strong assumption that the annotation of the source domain is complete and accurate. However, this assumption is violated in most practical tasks, because full annotation is not only expensive and time-consuming but sometimes impossible. For example, noisy labels are inevitable in medical data analysis because of the subjectivity of domain experts, incomplete discriminative information, etc. Our goal is to develop a generalized DA model that addresses these challenges.
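One common way to limit noise in pseudo-labels during self-training is to keep only the predictions the model is confident about. The sketch below illustrates that generic idea and is not the method proposed in this work. The function names `softmax` and `select_pseudo_labels`, the 0.9 threshold, and the toy logits are assumptions chosen for illustration.

```python
import math


def softmax(logits):
    """Convert raw class scores to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def select_pseudo_labels(batch_logits, threshold=0.9):
    """Return (sample_index, predicted_class) pairs whose top class
    probability is at least `threshold`; other samples stay unlabeled."""
    selected = []
    for index, logits in enumerate(batch_logits):
        probs = softmax(logits)
        confidence = max(probs)
        if confidence >= threshold:
            selected.append((index, probs.index(confidence)))
    return selected


# Toy target-domain predictions (assumed values), three classes each.
target_logits = [
    [4.0, 0.5, 0.2],  # confident prediction for class 0
    [1.0, 1.1, 0.9],  # ambiguous prediction, expected to be skipped
    [0.1, 0.3, 5.0],  # confident prediction for class 2
]

pseudo_labels = select_pseudo_labels(target_logits, threshold=0.9)
print(pseudo_labels)
```

With these toy values only the first and third samples pass the threshold. A fixed threshold trades coverage for reliability: raising it discards more target samples but lowers the chance of training on wrong labels.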