Title: Real-Time Dynamic Physical Phenomenon Adapted Machine Learning Pipeline for Smart Acoustics
One vision for modern cyber-physical systems in smart environments is an underlying acoustic processing infrastructure that provides various services to the residents of a smart space. Years of research in this domain have produced commercial in-situ and smartphone-based systems that can hold spoken conversations, take spoken commands, recognize speakers, and detect agitated behavior, physiological sounds, disease sounds, and other body-specific sounds. Depending on the context, some of these systems are subject to dynamic physical phenomena that can degrade the performance of the underlying machine learning algorithms; addressing this has largely remained an open problem. The overarching goal of this research is to sense such dynamic physical phenomena in real time and adapt the underlying machine learning pipeline accordingly for better recognition of acoustic events.
To demonstrate our hypothesis, we have developed an acoustic event detection platform called LocoVocal, along with three other solutions, mLung, DeepLung, and SocialSense, as part of this thesis. LocoVocal is a novel in-situ human acoustic activity detection platform that localizes human subjects inside a room in real time, chooses an appropriate location-adapted acoustic model from a set of pre-computed models, and passes it to the application running on top of LocoVocal for improved performance. LocoVocal has been evaluated with three different applications, speech recognition, speaker identification, and distant emotion recognition, to demonstrate its versatility.

Two other novel solutions developed in this thesis, mLung and DeepLung, act as a longitudinal monitoring service on the smartphone for pulmonary patients and caregivers. mLung, and its deep learning counterpart DeepLung, are privacy-preserving mobile-cloud hybrid services for detecting anomalous lung sounds (such as cough and wheeze) using classifiers in the cloud. Both systems filter speech in-phone to preserve patient privacy, sense respiratory cycles in real time using the inertial sensors while the phone is held against the patient's chest, and window the audio dynamically based on the respiratory cycle before passing it to the in-cloud classifiers.

Finally, we present SocialSense, a smartphone application for social interaction monitoring that detects when a phone user is speaking, with whom the user is speaking by discovering neighboring phones (assuming every phone is mapped to a unique user), and the user's mood. Through periodic Bluetooth neighbor sensing and collaboration among phones, SocialSense detects when users join or leave a social interaction, and maintains a non-redundant classification model by keeping training instances only from people who are present in the interaction.
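To give a concrete sense of the location-adapted model selection idea, the sketch below picks the pre-computed acoustic model trained nearest to the subject's estimated position. The calibration points, model names, and nearest-neighbor rule are illustrative assumptions, not the actual LocoVocal implementation.

```python
import math

# Hypothetical setup: acoustic models pre-computed at fixed calibration
# points (x, y) inside the room, keyed by their training location.
PRECOMPUTED_MODELS = {
    (1.0, 1.0): "model_corner_nw",
    (1.0, 4.0): "model_corner_sw",
    (4.0, 1.0): "model_corner_ne",
    (4.0, 4.0): "model_corner_se",
    (2.5, 2.5): "model_center",
}

def select_model(subject_pos):
    """Return the pre-computed model whose training location is nearest
    (Euclidean distance) to the subject's localized position."""
    return min(
        PRECOMPUTED_MODELS.items(),
        key=lambda item: math.dist(item[0], subject_pos),
    )[1]

# A subject localized near the room center gets the center-trained model.
print(select_model((2.0, 2.0)))  # -> model_center
```

In a real deployment the selection could also interpolate between neighboring models or hold the current model briefly to avoid rapid switching as the subject moves, but a simple nearest-neighbor lookup captures the core mechanism.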
Committee: Jack Davidson (Chair); Jack Stankovic (Advisor); Gabriel Robins; Brad Campbell; Laura Barnes (Minor Representative)