This lecture is a review of what is known about modeling human speech recognition (HSR). A model is proposed, and data are tested against the model.
There are many theories, or points of view, on how human speech recognition functions, yet few of them are comprehensive. What is needed is a set of models, supported by experimental observation, that characterize how human speech recognition really works. There is also the practical problem of building a machine recognizer. One approach is to base the machine recognizer on reverse engineering of human recognition. This has not been the traditional approach to automatic speech recognition (ASR).
Also needed is some insight into why there is such a large difference between human performance and present-day machine performance. Author Jont Allen addresses this and other questions.