Authors: George Saon, Hong-Kwang J. Kuo, Steven Rennie, Michael Picheny
ArXiv: 1505.05899
Abstract URL: http://arxiv.org/abs/1505.05899v1
We describe the latest improvements to the IBM English conversational
telephone speech recognition system. Some of the techniques that were found
beneficial are: maxout networks with annealed dropout rates; networks with a
very large number of outputs trained on 2000 hours of data; joint modeling of
partially unfolded recurrent neural networks and convolutional nets by
combining the bottleneck and output layers and retraining the resulting model;
and lastly, sophisticated language model rescoring with exponential and neural
network LMs. Together, these techniques yield an 8.0% word error rate on the
Switchboard part of the Hub5-2000 evaluation test set, a 23% relative
improvement over our previous best published result.
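
To make the "23% relative" figure concrete, the sketch below works through the arithmetic. It assumes the usual convention that a relative improvement is the WER reduction divided by the baseline WER. The function names are illustrative only, and the baseline value is inferred from the two numbers quoted in the abstract, not stated there.

```python
def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction, expressed as a fraction of the baseline WER."""
    return (baseline_wer - new_wer) / baseline_wer


def implied_baseline(new_wer: float, relative_gain: float) -> float:
    """Baseline WER implied by a new WER and a relative gain (assumed convention)."""
    return new_wer / (1.0 - relative_gain)


# Numbers quoted in the abstract: 8.0% WER, 23% relative improvement.
baseline = implied_baseline(8.0, 0.23)
print(round(baseline, 1))  # 10.4, an inferred value, not one given in the abstract
print(round(relative_reduction(baseline, 8.0), 2))  # 0.23
```

Under that convention, the earlier system would have scored roughly 10.4% WER on the same test set, which corresponds to an absolute reduction of about 2.4 points.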