Overview
The speech enhancement (SE) task is designed to allow researchers from various fields to take part in the challenge. The instructions given in the Instruction section were drawn up with the following two points in mind. Firstly, the applications of reverberant speech enhancement are diverse, ranging from hearing aids to automatic speech recognition. Secondly, a universally accepted set of objective quality measures for evaluating reverberant speech enhancement algorithms has not yet been established. We have therefore decided to perform both objective and subjective evaluation and to use several different objective measures. This means that the SE task is not intended to determine a champion. Rather, the goal is to reveal the relative merits and demerits of different approaches and to elucidate the characteristics of each objective quality measure, which will hopefully facilitate future research and development of reverberant speech enhancement algorithms.
The objective measures used and the subjective evaluation procedure are described below. Please refer to the Instruction section for details of the challenge regulations.
Objective evaluation
The objective measures are divided into mandatory and optional ones. The mandatory objective measures are as follows.
- Cepstrum distance: The cepstrum distance (CD) is based on the discrepancy between target and reference signals. For each test utterance, the corresponding clean signal is used as the reference. Thus, the CD is used only for SimData. The CD scores are calculated as per [1].
- Log likelihood ratio: The log likelihood ratio (LLR) is based on the discrepancy between target and reference signals. For each test utterance, the corresponding clean signal is used as the reference. Thus, the LLR is used only for SimData. The LLR scores are calculated as per [1].
- Frequency-weighted segmental SNR: The frequency-weighted segmental SNR (FWSegSNR) is based on the discrepancy between target and reference signals. For each test utterance, the corresponding clean signal is used as the reference. Thus, the FWSegSNR is used only for SimData. The FWSegSNR scores are calculated as per [1] (a rough illustration of this measure is sketched after this list).
- Speech-to-reverberation modulation energy ratio: The speech-to-reverberation modulation energy ratio (SRMR) can be calculated only from target signals. Thus, the SRMR scores are used for both SimData and RealData. The SRMR scores are calculated as per [2].
- Computational cost: The average computational cost incurred in processing the test utterances should be submitted along with the other quality measures, e.g., computational complexity, real-time factor, latency, and a description of the machine specification.
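To give a concrete idea of how the intrusive measures above compare an enhanced signal with its clean reference, the following is a minimal sketch of a frequency-weighted segmental SNR in the spirit of [1]. It is not the official implementation: the frame length, hop size, weighting exponent, SNR clipping range, and the uniform frequency bands (used here instead of critical bands) are all simplifying assumptions, so the released SE evaluation tool should be used for any scores that are actually submitted.

```python
import numpy as np

def fwsegsnr_sketch(clean, enhanced, fs, n_bands=23, frame_ms=30, hop_ms=7.5, gamma=0.2):
    """Rough frequency-weighted segmental SNR: per-frame, per-band SNRs between
    the clean reference and the enhanced signal, weighted by the clean band
    magnitude raised to gamma, then averaged over frames.  Uniform bands are a
    simplification of the critical-band analysis used in the reference measure."""
    frame = int(fs * frame_ms / 1000)
    hop = int(fs * hop_ms / 1000)
    win = np.hanning(frame)
    n = min(len(clean), len(enhanced))
    clean, enhanced = clean[:n], enhanced[:n]

    scores = []
    for start in range(0, n - frame + 1, hop):
        C = np.abs(np.fft.rfft(clean[start:start + frame] * win))
        E = np.abs(np.fft.rfft(enhanced[start:start + frame] * win))
        # group FFT bins into uniform bands (simplifying assumption)
        edges = np.linspace(0, len(C), n_bands + 1, dtype=int)
        Cb = np.array([np.sqrt(np.mean(C[a:b] ** 2)) for a, b in zip(edges[:-1], edges[1:])])
        Eb = np.array([np.sqrt(np.mean(E[a:b] ** 2)) for a, b in zip(edges[:-1], edges[1:])])
        w = Cb ** gamma                                        # clean-magnitude weights
        snr = 10 * np.log10(Cb ** 2 / ((Cb - Eb) ** 2 + 1e-12) + 1e-12)
        snr = np.clip(snr, -10, 35)                            # limit band SNRs as is customary
        scores.append(np.sum(w * snr) / (np.sum(w) + 1e-12))
    return float(np.mean(scores))
```

The function expects the clean and enhanced signals as one-dimensional float arrays sampled at the same rate (loaded, for example, with the `soundfile` package); higher values indicate a smaller discrepancy from the clean reference.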
In addition, we recommend submitting the results obtained with the following optional measures.
- Word error rate: The participants in the SE task are strongly encouraged to take part in the ASR task by using their enhancement algorithms as a front-end to the ASR baseline system provided by the challenge. Since the word error rate (WER) can be calculated without reference signals, the WER scores are used for both SimData and RealData (an illustration of the WER computation is sketched below).
- PESQ: The enhanced speech signals can also be evaluated in terms of PESQ (Perceptual Evaluation of Speech Quality), and the PESQ scores can be submitted along with the scores of the measures described above. Since PESQ requires reference signals, it can be used only for SimData.
Note: In general, you or your institution are required to hold a proper PESQ license when publishing research results obtained using the PESQ software, and this also applies to this Challenge. If you are planning to submit PESQ scores (this is not mandatory) to the Speech Enhancement Task, please make sure that you or your institution holds the license.
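For participants who are new to ASR scoring, the sketch below illustrates how a word error rate is conventionally computed: the word-level edit distance (substitutions, deletions, and insertions) between a reference transcription and the recognizer output, divided by the number of reference words. The function name and the toy sentences are only illustrative; official WER figures must be obtained with the baseline ASR system and scoring scripts provided by the challenge.

```python
def word_error_rate(reference, hypothesis):
    """Minimal WER sketch: word-level Levenshtein distance divided by the
    number of words in the reference transcription."""
    ref = reference.split()
    hyp = hypothesis.split()
    # dynamic-programming edit distance over words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

# toy example: one deleted word out of six reference words -> WER of about 0.167
print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```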
[1] Hu and Loizou, "Evaluation of objective quality measures for speech enhancement," IEEE T-ASLP, 16(1), 229-238, 2008
[2] Falk, et al., "A non-intrusive quality and intelligibility measure of reverberant and dereverberated speech," IEEE T-ASLP, 18(7), 1766-1774, 2010
Subjective evaluation
As part of this challenge, a web-based subjective evaluation will be conducted. Both challenge participants and other interested listeners are invited to take part. A MUSHRA test will be conducted, focusing on perceptual attributes such as perceived distance and overall speech quality. As a reference, the clean signal will be used for SimData and the close-talking recording for RealData. Listeners are asked to listen to the examples using headphones (the type of headphones used needs to be specified). Details will be announced when the final evaluation test set is released.
Speech enhancement evaluation tool
The SE evaluation tool is available at the Download section.