Acoustic Pipeline for Speech-Based Emotion Detection
Many machine learning algorithms for speech-based emotion detection have been published. However, they are trained and tested only on datasets of audio clips in which the speaker emulates emotions such as anger, happiness, neutrality, and sadness. Despite the high accuracy these algorithms have achieved, they are not suitable for real-life deployment for two reasons. First, the datasets are often collected in laboratory environments where noise is minimal and the microphone is placed very close to the speaker. This is not representative of real-life environments, in which background noise is present and people cannot be expected to stay close to the acoustic sensor(s) at all times. Second, each audio clip is uttered by an actor and labeled with the emotion that the actor attempts to simulate. However, research shows a significant discrepancy between the acoustic features indicative of emotions in acted speech and those indicative of emotions in spontaneous speech. As a result, algorithms trained on acted speech may not achieve the same level of performance when deployed in real-life environments to detect emotions in people’s speech.
This thesis explores different approaches to the problem that high-performing machine learning classifiers for speech-based emotion recognition may not be fit for real-life deployment, and proposes an acoustic pipeline consisting of classifiers for emotion detection and for speaker identification. The pipeline is intended to be part of a smart healthcare system that monitors users’ emotions.
Chair – Alfred Weaver (CS)
Advisor – John Stankovic (CS)
John Lach (ECE)