Human Vocal Event Detection for Realistic Health-care Applications
Supported by rapid innovations in machine learning, signal processing, and Internet of Things technologies, passive sensing is reshaping almost every aspect of our lives. Developing novel, low-cost, and noninvasive sensing techniques to model and identify human events (e.g., emotions and mental disorders) has become a core research interest. Advances in passive sensing have made the development and operation of complex human health monitoring systems technically feasible. Automated, passive human event sensing can improve the assessment and treatment of mental disorders; support the monitoring and care of patients experiencing agitation or dementia, or undergoing stroke rehabilitation; substantially reduce the workload of caregivers; and provide more timely and accurate responses to crises. Sound is ubiquitous in the expression of human events and in their surrounding environment. According to multiple studies, sound as a modality conveys biomarkers of our mental and behavioral states or events. The major scopes of research on human audio event detection are detection of speech emotion, assessment of mental disorders, behavioral event detection, and ambient human event detection. Despite the rapid growth of interest in audio sensing for health applications in recent years, the accuracy of detecting or modeling human verbal events remains too low for practical use. This is due to several open challenges, such as distortion of acoustic features as the speaker-to-microphone distance varies, unavailability of strongly labeled audio data, expression of verbal events through a combination of prosody and speech context, ambiguity in lexical speech content, and the limited amount of available training data.
In this dissertation, I will present my recent and ongoing research to demonstrate that the development and application of novel, adaptive feature engineering approaches, such as adaptive feature selection, synthetic data generation, and effective feature representation generation, can address the open challenges of human vocal event detection in the scope of health monitoring. With this goal in mind, we have built four automated vocal event detection frameworks that address the open challenges in the four major scopes of interest. Finally, I will discuss the limitations of the presented solutions and lay out my plans for future improvements.
- John A. Stankovic (Advisor)
- Hongning Wang (Chair)
- Yanjun Qi
- Yuan Tian
- John Lach (Minor Representative)