Overview
You may use any approach you like to achieve the best recognition results, as long as you respect the instructions. In particular, you may build your own ASR system, which may differ significantly from the baseline system: for example, it may use different features, acoustic models, training criteria, or decoding strategies, advanced multi-channel front-end processing, and so on.
Please refer to the Instruction section for details about the challenge regulations.
Evaluation metric
The evaluation metric is the word error rate (WER). We encourage you to report your results in a way that makes it possible to assess the contribution of each system component to the overall recognition accuracy. Here are some examples:
- If you use front-end processing to enhance the speech signal or the speech features, we encourage you to test the enhanced speech signals/speech features with the provided baseline recognition system.
- If your system consists of several blocks, we encourage you to also provide the results for subsystems in which some of the blocks are not active. This makes it possible to evaluate the contribution of each individual block to the overall recognition rate.
- If you use additional training data, we encourage you to also provide the results of the system that is based only on the provided training data.
- If you achieve interesting results by using information that should not be exploited, we encourage you to also provide these results, clearly stating which additional information has been exploited. Note, however, that these results will not be considered official. For example, if you achieve very good results by adapting the acoustic models to the considered rooms using the speaker identities, you can provide these results, but you should clearly indicate that you used the speaker identities.
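For reference when comparing such configurations, WER is commonly computed as the number of word substitutions, deletions, and insertions needed to turn the recognized text into the reference transcription, divided by the number of reference words. The following is a minimal Python sketch of that computation, not the official evaluation tool. The function names `edit_distance` and `word_error_rate` are hypothetical, and the sketch assumes plain whitespace tokenization with no further text normalization, so its numbers may differ slightly from those of the provided tool.

```python
def edit_distance(ref_words, hyp_words):
    """Word-level Levenshtein distance between two lists of words."""
    # prev[j] holds the distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]  # i deletions align the prefix with an empty hypothesis
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def word_error_rate(references, hypotheses):
    """Corpus-level WER: total word errors / total reference words."""
    total_errors = 0
    total_words = 0
    for ref, hyp in zip(references, hypotheses):
        ref_words = ref.split()  # assumption: whitespace tokenization
        hyp_words = hyp.split()
        total_errors += edit_distance(ref_words, hyp_words)
        total_words += len(ref_words)
    if total_words == 0:
        raise ValueError("the reference transcriptions contain no words")
    return total_errors / total_words


if __name__ == "__main__":
    refs = ["the cat sat on the mat"]
    hyps = ["the cat sit on mat"]
    # One substitution and one deletion over six reference words.
    print(f"WER = {100 * word_error_rate(refs, hyps):.2f}%")
```

Errors and reference words are accumulated over all utterances before dividing, so long utterances weigh more than short ones. Averaging per-utterance WERs instead would give a different figure.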
ASR evaluation tool (Baseline recognition system)
The ASR evaluation tool and the baseline recognition system are available in the Download section. To establish a common basis for comparing different approaches in terms of recognition performance, we provide a baseline speech recognition system based on the Hidden Markov Model Toolkit (HTK). This system also allows participants who want to concentrate on front-end processing to evaluate their algorithms in terms of word error rate. The baseline system uses mel-frequency cepstral coefficients (MFCCs), including delta and delta-delta coefficients, as features. As acoustic models, it employs tied-state HMMs with 10 Gaussian components per state, trained according to the maximum-likelihood criterion.
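To illustrate what the delta and delta-delta coefficients add to the static MFCCs, the following is a minimal Python sketch of the regression formula commonly used for this purpose, d_t = sum_k k (c_{t+k} - c_{t-k}) / (2 sum_k k^2). It is not the baseline's feature extraction code. The function names `delta` and `add_deltas` are hypothetical, and the window of 2 frames and the padding of edge frames by repetition are assumptions, since the text above does not state the baseline's settings.

```python
def delta(features, window=2):
    """Regression-based time derivative of a feature sequence.

    features: list of frames, each frame a list of floats.
    window:   number of frames used on each side (assumed value).
    """
    num_frames = len(features)
    denominator = 2 * sum(k * k for k in range(1, window + 1))
    result = []
    for t in range(num_frames):
        frame = []
        for dim in range(len(features[t])):
            acc = 0.0
            for k in range(1, window + 1):
                # Assumption: repeat the first/last frame beyond the edges.
                ahead = features[min(t + k, num_frames - 1)][dim]
                behind = features[max(t - k, 0)][dim]
                acc += k * (ahead - behind)
            frame.append(acc / denominator)
        result.append(frame)
    return result


def add_deltas(static, window=2):
    """Append delta and delta-delta coefficients to each static frame."""
    first = delta(static, window)
    second = delta(first, window)  # delta-delta = delta of the deltas
    return [s + d + a for s, d, a in zip(static, first, second)]


if __name__ == "__main__":
    # Toy one-dimensional "cepstral" track rising by 1 per frame.
    ramp = [[0.0], [1.0], [2.0], [3.0], [4.0]]
    for frame in add_deltas(ramp):
        print(frame)
```

Each output frame is three times as long as the static frame. With 13 static coefficients per frame, for example, this would yield 39-dimensional feature vectors.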
Please report any mistakes or questions to the organizers.