Facial expression recognition boosted by soft label with a diverse ensemble
PATTERN RECOGNITION LETTERS
Authors: Gan, Yanling; Chen, Jingying; Xu, Luhui
Facial expression recognition (FER) has recently attracted increasing attention owing to its growing applications in human-computer interaction and other fields. However, a convolutional neural network (CNN) model that performs well when trained with hard-label (single-emotion) supervision may still fall short in real-life applications, because captured facial images usually display a mixture of multiple emotions rather than a single one. To address this problem, this paper presents a novel FER framework that combines a CNN with soft labels, which associate multiple emotions with each expression. In this framework, the soft labels are obtained with a proposed constructor that involves two main steps: (1) training a CNN model on the training set with hard-label supervision; (2) fusing the latent label probability distribution predicted by the trained model to obtain soft labels. To improve the generalization performance of the ensemble classifier, we also propose a novel label-level perturbation strategy that trains multiple diverse base classifiers. Experiments were carried out on three publicly available databases: FER-2013, SFEW and RAF. The results indicate that our method achieves competitive or better performance than state-of-the-art methods (FER-2013: 73.73%, SFEW: 55.73%, RAF: 86.31%). © 2019 Published by Elsevier B.V.
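The two-step soft-label constructor and the ensemble can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not give the fusion rule, so `build_soft_label` assumes a convex combination of the one-hot hard label and the trained model's predicted distribution with a hypothetical weight `alpha`; the label-level perturbation is modeled, again hypothetically, by drawing a different `alpha` for each base classifier; and the ensemble simply averages the base classifiers' probability outputs. All function names and toy values are illustrative.

```python
import math
import random


def one_hot(label, num_classes):
    """Hard (single-emotion) label as a one-hot vector."""
    return [1.0 if c == label else 0.0 for c in range(num_classes)]


def build_soft_label(hard_label, predicted_probs, alpha):
    """Step 2 (assumed form): fuse the hard label with the distribution
    predicted by a CNN already trained on hard labels (step 1).

    Hypothetical rule: alpha * one_hot + (1 - alpha) * predicted_probs.
    """
    hard = one_hot(hard_label, len(predicted_probs))
    return [alpha * h + (1.0 - alpha) * p for h, p in zip(hard, predicted_probs)]


def soft_cross_entropy(logits, soft_label):
    """Cross-entropy between softmax(logits) and a soft target distribution."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return -sum(t * (x - log_z) for t, x in zip(soft_label, logits))


def perturbed_alphas(num_models, low, high, seed=0):
    """Hypothetical label-level perturbation: one fusion weight per base
    classifier, so each model is trained on slightly different soft labels."""
    rng = random.Random(seed)
    return [rng.uniform(low, high) for _ in range(num_models)]


def ensemble_predict(prob_lists):
    """Average the class-probability outputs of the base classifiers."""
    n = len(prob_lists)
    return [sum(col) / n for col in zip(*prob_lists)]


# Toy usage with 7 emotion classes (values are illustrative, not from the paper).
probs = [0.1, 0.05, 0.05, 0.6, 0.1, 0.05, 0.05]
soft = build_soft_label(3, probs, alpha=0.5)
alphas = perturbed_alphas(5, 0.3, 0.7)
targets = [build_soft_label(3, probs, a) for a in alphas]
avg = ensemble_predict([[0.2, 0.8], [0.4, 0.6]])
```

With a hard label of 3 and `alpha = 0.5`, the soft label keeps most of the mass (about 0.8) on class 3 while spreading the remainder over the other emotions in proportion to the model's predictions. Giving each base classifier its own `alpha` produces slightly different targets per model, which is one simple way to encourage ensemble diversity.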
Multiple Attention Network for Facial Expression Recognition
Authors: Gan, Yanling; Chen, Jingying; Yang, Zongkai; Xu, Luhui
One key challenge in facial expression recognition (FER) is extracting discriminative features from critical facial regions. Because of their promising ability to learn discriminative features, visual attention mechanisms are increasingly used to address pattern recognition problems. This paper presents a novel multiple attention network that simulates humans' coarse-to-fine visual attention to improve expression recognition performance. In the proposed network, a region-aware sub-net (RASnet) learns binary masks that locate expression-related critical regions at coarse-to-fine granularity levels, and an expression recognition sub-net (ERSnet) with a multiple attention (MA) block learns comprehensive discriminative features. Embedded in the convolutional layers, the MA block fuses diversified attention using the masks learned by the RASnet. The MA block contains a hybrid attention branch with a series of sub-branches, each of which provides region-specific attention. To exploit the complementary benefits of diversified attention, the MA block also has a weight learning branch that adaptively learns the contributions of the different critical regions. Experiments were carried out on two publicly available databases, RAF and CK+, and the reported accuracies are 85.69% and 96.28%, respectively. The results indicate that our method achieves competitive or better performance than state-of-the-art methods.
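The fusion performed by the MA block can be illustrated with a toy computation. This is a hedged sketch, not the network from the paper: the binary region masks are supplied by hand instead of being learned by the RASnet, each region-specific sub-branch is assumed to use a residual form F * (1 + M_k) so that regions outside the mask are kept rather than zeroed, and the weight learning branch is replaced by fixed logits passed through a softmax. The names `region_attention` and `multiple_attention` are hypothetical, and coarse-to-fine granularity is not modeled.

```python
import math


def softmax(xs):
    """Turn region logits into weights that sum to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def region_attention(feature, mask):
    """One region-specific sub-branch (assumed residual form F * (1 + M)).

    feature: C x H x W nested lists; mask: H x W binary region mask.
    """
    return [[[v * (1.0 + m) for v, m in zip(row, mrow)]
             for row, mrow in zip(channel, mask)]
            for channel in feature]


def multiple_attention(feature, masks, region_logits):
    """Weighted sum of the region-specific branches.

    region_logits stand in for the output of the weight learning branch.
    """
    weights = softmax(region_logits)
    branches = [region_attention(feature, mk) for mk in masks]
    C, H, W = len(feature), len(feature[0]), len(feature[0][0])
    fused = [[[0.0] * W for _ in range(H)] for _ in range(C)]
    for w, branch in zip(weights, branches):
        for c in range(C):
            for i in range(H):
                for j in range(W):
                    fused[c][i][j] += w * branch[c][i][j]
    return fused, weights


# Toy 1-channel 2x2 feature map with hand-made "eyes" (top row)
# and "mouth" (bottom row) masks.
feature = [[[1.0, 1.0], [1.0, 1.0]]]
eyes = [[1, 1], [0, 0]]
mouth = [[0, 0], [1, 1]]
fused_equal, w_equal = multiple_attention(feature, [eyes, mouth], [0.0, 0.0])
fused_eyes, w_eyes = multiple_attention(feature, [eyes, mouth], [2.0, 0.0])
```

With equal logits the eye and mouth branches contribute equally and the fused map is uniform; raising the first logit shifts the fused map toward the eye region, which is the kind of adaptive per-region weighting the weight learning branch is described as learning.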