Error-Analysis Phonology
Abstract:
Error recognition in phonology has not been studied to a large extent, possibly because of the general difficulty of recognizing speech. There has been only one large-scale project on this topic.
Only a few advanced systems have been developed that use NLP methods to recognize pronunciation errors.
In the ISLE project, a system was developed that can recognize mispronounced words and give precise feedback about the error (some publications, such as project reports, are available on the project homepage).
Modern commercial language-learning systems also include speech recognition modules.
However, these systems do not analyse the structure of the input and can therefore only report a percentage of correctness.
Usually no hint about the type of error can be given, since the method relies solely on some form of pattern matching (for example, comparing frequency spectra).
As a trivial example, the program TriplePlay Plus, which included a speech recognition module, presented a "ping" or a "boing" as feedback when the pronunciation was close or could not be recognized, respectively.
Other simple programs display a graph or frequency curve showing the differences between the stored reference data and the learner's recording. Interpreting the display, however, is left to the learner.
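To illustrate why pattern matching yields only a percentage, here is a minimal sketch. It assumes the learner recording and the stored reference have already been reduced to magnitude spectra with the same number of bins; cosine similarity is our own stand-in for "some form of pattern matching", and the names `percent_correct`, `coarse_feedback` and the 80% threshold are hypothetical. Note that the result is a single number: nothing in it says which sound was wrong.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length magnitude spectra."""
    if len(a) != len(b):
        raise ValueError("spectra must have the same number of bins")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def percent_correct(reference, learner):
    """Collapse the comparison into one percentage of 'correctness'."""
    # Magnitude spectra are non-negative, so the similarity lies in [0, 1].
    return round(100 * cosine_similarity(reference, learner), 1)


def coarse_feedback(score, threshold=80.0):
    """Hypothetical two-way feedback in the spirit of 'ping' / 'boing'."""
    return "ping" if score >= threshold else "boing"


score = percent_correct([1.0, 1.0], [1.0, 0.0])
print(score, coarse_feedback(score))  # 70.7 boing
```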
The recognizer in the ISLE prototype is a state-of-the-art HMM-based speech recognizer tuned to a restricted vocabulary; within that vocabulary it achieves high accuracy even on speech from language learners.
In a first step, the system tries to localize the error in the input.
Dividing the input into correct and incorrect regions in this way improves the subsequent error recognition.
The scores computed are based on three types of measurements: (1) the acoustic likelihood of the path, (2) the output probability of the most likely state in the model set, and (3) the acoustic likelihood of a background model.
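The text does not say how the three measurements are combined. The following is a minimal sketch under our own assumption: for each aligned segment, the path log-likelihood is compared, per frame, with (2) the best state's output log-probability and with (3) the background model, similar in spirit to common goodness-of-pronunciation ratios. The `Segment` class, `localize_errors` and both thresholds are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One aligned stretch of the learner's input (e.g. a phone)."""
    label: str
    frames: int               # number of acoustic frames in the segment
    path_loglik: float        # (1) log-likelihood of the aligned path
    best_state_loglik: float  # (2) summed log output prob. of the most likely state per frame
    background_loglik: float  # (3) log-likelihood under a background model


def segment_scores(seg):
    """Per-frame log-likelihood ratios; higher values mean a better match."""
    vs_best = (seg.path_loglik - seg.best_state_loglik) / seg.frames
    vs_background = (seg.path_loglik - seg.background_loglik) / seg.frames
    return vs_best, vs_background


def localize_errors(segments, best_threshold=-2.0, background_threshold=0.0):
    """Split segment labels into (correct, suspect) using hypothetical thresholds."""
    correct, suspect = [], []
    for seg in segments:
        vs_best, vs_background = segment_scores(seg)
        if vs_best < best_threshold or vs_background < background_threshold:
            suspect.append(seg.label)
        else:
            correct.append(seg.label)
    return correct, suspect


segments = [Segment("th", 10, -120.0, -90.0, -110.0),
            Segment("ih", 8, -64.0, -60.0, -80.0)]
print(localize_errors(segments))  # (['ih'], ['th'])
```

Only the "suspect" regions would then be passed on to the diagnosis step.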
The actual diagnosis follows in a second step.
To determine which form was actually recognized, letter-to-phone and phone-to-phone rules are applied to the orthographic and phonemic representations of the possible correct forms.
In general, the rules account for:
- articulatory difficulties in producing particular sounds of the target language (e.g. /th/ in English);
- receptive difficulties: learners cannot perceive, and therefore cannot reliably produce, the distinction between two sounds;
- orthographic carry-over from the mother tongue;
- orthographic difficulties of English.
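The rule-based diagnosis can be sketched as follows. This is a minimal illustration, not the ISLE rule set: the phone symbols are ARPAbet-style, the rules in `PHONE_RULES` and `LETTER_RULES` are invented examples (e.g. a German learner reading "w" as /v/), and only single substitutions are modelled. Each rule generates an erroneous variant of the correct form tagged with its error type; the recognized phone string is then looked up among these variants.

```python
# Hypothetical phone-to-phone rules: target phone -> [(substitute, error type)].
PHONE_RULES = {
    "TH": [("S", "articulatory"), ("T", "articulatory")],
    "IY": [("IH", "receptive")],
}

# Hypothetical letter-to-phone rules: letter -> (correct phone, carried-over phone, error type).
LETTER_RULES = {
    "w": ("W", "V", "orthographic carry-over"),
}


def variants(aligned):
    """Map every single-substitution pronunciation of a word to its diagnosis.

    `aligned` is the correct form as (grapheme, phone) pairs; the correct
    pronunciation itself maps to None.
    """
    correct = tuple(phone for _, phone in aligned)
    table = {correct: None}
    for i, (grapheme, phone) in enumerate(aligned):
        substitutes = list(PHONE_RULES.get(phone, []))
        letter_rule = LETTER_RULES.get(grapheme)
        if letter_rule and letter_rule[0] == phone:
            substitutes.append((letter_rule[1], letter_rule[2]))
        for sub, kind in substitutes:
            variant = correct[:i] + (sub,) + correct[i + 1:]
            table.setdefault(variant, f"{kind}: /{phone}/ -> /{sub}/ (letter '{grapheme}')")
    return table


def diagnose(recognized, aligned):
    """Match a recognized phone string against the generated variants."""
    table = variants(aligned)
    key = tuple(recognized)
    if key not in table:
        return "unknown"
    return table[key] or "correct"


THINK = [("th", "TH"), ("i", "IH"), ("n", "NG"), ("k", "K")]
WEST = [("w", "W"), ("e", "EH"), ("s", "S"), ("t", "T")]
print(diagnose(["S", "IH", "NG", "K"], THINK))  # articulatory: /TH/ -> /S/ (letter 'th')
print(diagnose(["V", "EH", "S", "T"], WEST))    # orthographic carry-over: /W/ -> /V/ (letter 'w')
```

Because every variant carries its error type, a match yields a specific diagnosis rather than a bare score.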
Finally, the system includes a word-stress detection mechanism.
This allows it to give feedback in the "simple" case where the stress has been placed on the wrong syllable.
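The text does not describe how stress is detected. A minimal sketch, assuming that duration, energy and pitch are the cues to prominence, that they are weighted equally, and that the expected stress position comes from a lexicon; `Syllable`, `stressed_index` and `stress_feedback` are hypothetical names.

```python
from dataclasses import dataclass

# Acoustic cues assumed (not confirmed by the source) to signal stress.
FEATURES = ("duration_ms", "energy", "pitch_hz")


@dataclass
class Syllable:
    text: str
    duration_ms: float
    energy: float    # mean energy, arbitrary positive units
    pitch_hz: float  # mean fundamental frequency


def stressed_index(syllables, weights=(1.0, 1.0, 1.0)):
    """Return the index of the most prominent syllable.

    Each cue is divided by its mean over the word, so the score measures
    prominence relative to the other syllables rather than absolute values.
    """
    means = [sum(getattr(s, f) for s in syllables) / len(syllables) for f in FEATURES]

    def score(s):
        return sum(w * getattr(s, f) / (m or 1.0) for w, f, m in zip(weights, FEATURES, means))

    return max(range(len(syllables)), key=lambda i: score(syllables[i]))


def stress_feedback(syllables, expected_index):
    """Compare detected stress with the lexicon's expected stress position."""
    found = stressed_index(syllables)
    if found == expected_index:
        return "stress correct"
    return (f"stress on '{syllables[found].text}', "
            f"expected on '{syllables[expected_index].text}'")


# The noun "record" is stressed on its first syllable; this learner stressed the second.
learner = [Syllable("re", 120, 0.5, 110), Syllable("cord", 260, 0.9, 140)]
print(stress_feedback(learner, expected_index=0))  # stress on 'cord', expected on 're'
```

Normalizing by the word mean keeps the comparison independent of the speaker's overall loudness and pitch range.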
This description shows that such a system needs, on the one hand, a very good speech recognizer and, on the other, a rule-based mechanism that matches the recognized string against the possible inputs.