Example System I: ALLP

Abstract:
Athena Language Learning Project
In this large-scale project, started in 1983, an intelligent language learning program was developed. Through this program, three new technologies were introduced into the classroom: (1) natural language processing, (2) speech processing, and (3) interactive video. The information in this section is mainly taken from Murray, Janet H. in: Holland, V.M. ; Kaplan, J.D. et al. (Ed.), 1995 and Felshin, Sue in: Holland, V.M. ; Kaplan, J.D. et al. (Ed.), 1995.
Three major exercise types were implemented in the ALLP framework. The first, called LINGO, was a conversation simulation with a poltergeist, which could be directed to clean up or mess up a room according to the learner's input. In this microworld, relatively complex dialogs were made possible by a complete dialog simulator, described below. The poltergeist could not only suggest what to do next but also kept track of which things were in focus, so that, for example, the most recently mentioned item could be referred to with a pronoun. The poltergeist's character could also be changed through language, so that it either helped the learner or hindered the learner's actions.
The second exercise was called No recuerdo. It consisted of an interactive video story about the adventures of an amnesiac Colombian scientist. The learner played the role of a journalist whose task was to "uncover the truth" by interviewing various characters in the plot. The characters' responses were either recorded and played from the video disk or generated by an NLP module. Neither scenario, the poltergeist or the scientist story, was tested with language learners because of hardware limitations, even though demo versions were produced.
The last main type was an Intelligent Workbook, which offered a series of interactive grammar exercises. A situation was presented together with some English paraphrases, and the learner had to produce a correct Spanish utterance, which in effect made this a translation task. Here, too, hardware limitations prevented the system from being implemented fully. According to the publications, however, "real" learners were able to use this part of the program.

The NLP-System

Figure 1: Design of the NLP-System in the ALLP
In the figure above, taken from Murray, Janet H. in: Holland, V.M. ; Kaplan, J.D. et al. (Ed.), 1995, one can identify all the major parts necessary for a complete dialog system. Square boxes mark code, whereas rounded boxes signify data. The shadowed boxes mark the modules that are language dependent.
The system was designed from the beginning to be language independent, with clearly defined modules for the individual languages. Tests were done with Spanish, French, German, English, and Russian. The system therefore relied heavily on a type of interlingua, which was used in the discourse module. Errors in the learner's input were detected in two ways: agreement errors were identified by feature relaxation, whereas word-order errors were recognized via additional rules in the grammar. The system was also able to identify very uncommon morphological or syntactic forms and to penalize them even though they were actually correct. A more common interpretation of the input was then preferred, even if this meant hypothesizing an error.
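The two ideas in this paragraph, feature relaxation for agreement errors and penalties for rare forms, can be illustrated with a minimal sketch. It is not taken from the ALLP code: the feature names, the cost values, and the names `agreement_mismatches`, `Analysis`, and `rank_analyses` are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical agreement features; the actual ALLP feature set is not
# given in the source.
AGREEMENT_FEATURES = ("number", "gender", "person")


def agreement_mismatches(left, right):
    """Return the agreement features on which two constituents conflict.

    A strict parser would reject the pair on any conflict. With feature
    relaxation the pair is accepted anyway and the conflicting features
    are recorded as learner errors.
    """
    return [
        f for f in AGREEMENT_FEATURES
        if f in left and f in right and left[f] != right[f]
    ]


@dataclass
class Analysis:
    description: str
    errors: list = field(default_factory=list)  # features that were relaxed
    rare_forms: int = 0                         # uncommon forms in this reading

    def penalty(self, error_cost=2.0, rarity_cost=3.0):
        # Assumed costs: one rare form weighs more than one hypothesized error.
        return error_cost * len(self.errors) + rarity_cost * self.rare_forms


def rank_analyses(analyses):
    """Order competing analyses from lowest to highest penalty."""
    return sorted(analyses, key=lambda a: a.penalty())


# Spanish "la libro": feminine article combined with a masculine noun.
article = {"number": "sg", "gender": "f"}
noun = {"number": "sg", "gender": "m"}
errors = agreement_mismatches(article, noun)

candidates = [
    Analysis("correct but very uncommon reading", rare_forms=1),
    Analysis("common reading with a gender error", errors=errors),
]
best = rank_analyses(candidates)[0]
print(errors, "->", best.description)
```

With these assumed costs the common reading wins even though it hypothesizes an error, which is the behaviour the paragraph describes; different cost values would shift that trade-off.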
The system contained a complete morphology generation subsystem, which was able to generate most surface word forms from underlying stems and affixes. The syntax was modelled on Government-and-Binding theory (Chomsky, Noam, 1981), using concepts such as S-structure, D-structure, and CF-structures. However, the parser could generate only one analysis tree at a time and was not able to perform movements. In order to retrieve semantic interpretations of the input, the lexicon and the grammar made heavy use of thematic roles and so-called case frames. The following example shows the frame for the English verb "tell".
Example 1: allplexikon
(:voice active
  (:thematic-role agent
    :syntax ((:case subject :type dp :required-p t)))
  (:thematic-role theme
    :syntax ((:case empty :type vmax :spec (or indicative infinitive) :required-p t)))
  (:thematic-role destination
    :syntax ((:case indirect-object :type dp :required-p t))))
This lexical entry also carries information about the structure of the complement of "tell": the theme is of type vmax and must be either indicative or infinitive.
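How such a case frame might be used to assign thematic roles can be sketched as follows. This is an illustrative reconstruction, not the ALLP implementation: the dictionary layout, the function `assign_roles`, and the hand-built `parsed` input are all assumptions; only the role names, cases, types, and the indicative/infinitive restriction come from the entry above.

```python
# Python rendering of the frame above; keys mirror the Lisp keywords.
TELL_ACTIVE = {
    "agent": {"case": "subject", "type": "dp", "required": True},
    "theme": {"case": "empty", "type": "vmax",
              "spec": {"indicative", "infinitive"}, "required": True},
    "destination": {"case": "indirect-object", "type": "dp", "required": True},
}


def assign_roles(frame, constituents):
    """Map parsed constituents to the thematic roles of a case frame.

    `constituents` is a list of dicts with "text", "case", "type" and
    optionally "spec". Returns (roles, missing): the filled roles and the
    required roles for which no matching constituent was found.
    """
    roles, missing = {}, []
    for role, slot in frame.items():
        match = next(
            (c for c in constituents
             if c["case"] == slot["case"]
             and c["type"] == slot["type"]
             and ("spec" not in slot or c.get("spec") in slot["spec"])),
            None,
        )
        if match is not None:
            roles[role] = match["text"]
        elif slot["required"]:
            missing.append(role)
    return roles, missing


# "Mary told John to leave", segmented by hand for this example.
parsed = [
    {"text": "Mary", "case": "subject", "type": "dp"},
    {"text": "John", "case": "indirect-object", "type": "dp"},
    {"text": "to leave", "case": "empty", "type": "vmax", "spec": "infinitive"},
]
roles, missing = assign_roles(TELL_ACTIVE, parsed)
print(roles, missing)
```

A complement with any other `spec` value would fail the theme slot and appear in `missing`, which is one way a frame of this kind could flag an ill-formed learner sentence.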
The parser used in the system was an LALR parser, which essentially followed the Tomita algorithm (Tomita, M., 1986). The parser's stack could be accessed, and this gave the system its flexibility. Some semantic processing was done during "syntactic" parsing in order to eliminate unlikely paths as early as possible. The parser was written in Common Lisp and C.
The system included different grammars for recognition and generation. Since the recognition grammars were also partly responsible for error recognition, they were about 10 times larger than the generation grammars, according to Felshin.
The only references to the ALLP project that I could find on the WWW are LINGO and No Recuerdo (last update 1997!).