Abstract: Culture refers to the cumulative knowledge, beliefs, values and concepts accepted by a group of people. Such information is shared and inherited from previous generations so that an individual can blend into and be accepted by a society. Each cultural group communicates in a distinct and unique way, so the underlying emotional content is interpreted more accurately by listeners from the same (homogeneous) group. However, universality of culturally influenced speech can be observed when speakers from different cultural groups interact with one another, especially with the advancement of communication technology. In this study, two culturally influenced speech datasets, representing American (NTU-American) and European (Netherland EmoSpeech) speakers, are employed to investigate their similarities and dissimilarities in terms of heterogeneous listeners' perception of the underlying emotional content. The Mel Frequency Cepstral Coefficient (MFCC) feature extraction method is coupled with a Multi-Layer Perceptron (MLP) classifier to recognize four emotions: anger, happiness, sadness and neutral, the last acting as the emotionless state. The experimental results show that the proposed approach yields an accuracy twice that of chance guessing. Moreover, the Netherland EmoSpeech dataset obtains accuracy comparable to that of the established NTU-American dataset, demonstrating that the data are satisfactory for speech emotion recognition purposes.
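The MFCC feature extraction stage named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no frame length, hop size, filter count, coefficient count or sampling rate, so every value below is an assumption chosen for illustration, and the function names are hypothetical. Only the standard library is used, with a naive DFT in place of an FFT. The resulting per-frame vectors are the kind of input an MLP classifier would consume; the classifier itself is not shown.

```python
import cmath
import math


def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame):
    """Power spectrum of one Hamming-windowed frame (naive DFT, O(n^2))."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose centres are evenly spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2)
    mel_points = [top * i / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mel_points]
    bank = []
    for j in range(1, n_filters + 1):
        left, centre, right = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(left, centre):          # rising edge
            row[k] = (k - left) / (centre - left)
        for k in range(centre, right):         # peak and falling edge
            row[k] = (right - k) / (right - centre)
        bank.append(row)
    return bank


def mfcc_frame(frame, sample_rate, n_filters=20, n_coeffs=13):
    """MFCC vector for one frame; 20 filters and 13 coefficients are assumed."""
    spec = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    # Small constant avoids log(0) for empty or silent bands.
    log_e = [math.log(sum(w * p for w, p in zip(row, spec)) + 1e-10)
             for row in bank]
    # Unnormalized DCT-II of the log filterbank energies.
    return [sum(e * math.cos(math.pi * c * (m + 0.5) / n_filters)
                for m, e in enumerate(log_e))
            for c in range(n_coeffs)]


def mfcc(signal, sample_rate, frame_len=256, hop=128):
    """Split the signal into overlapping frames (assumed sizes) and extract MFCCs."""
    starts = range(0, len(signal) - frame_len + 1, hop)
    return [mfcc_frame(signal[s:s + frame_len], sample_rate) for s in starts]


# Usage on a synthetic 440 Hz tone sampled at an assumed 8 kHz.
tone = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(1024)]
features = mfcc(tone, 8000)
print(len(features), len(features[0]))
```

Each row of `features` is one frame's coefficient vector. In a pipeline like the one the abstract describes, such vectors (or statistics pooled over an utterance) would be paired with the four emotion labels to train the classifier.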
Norhaslinda Kamaruddin, Abdul Wahab, Muhammad Jaliluddin Mazlan and Norul Ayny Norzilan, 2016. Universality and Diversity of Cultural-Influenced Speech Emotion Recognition System. The Social Sciences, 11: 3828-3832.