Error-Analysis Syntax

Abstract:
As mentioned in the introduction, error recognition in morpho-syntax is the area that has received the most research. However, even though large-scale grammars are available, morpho-syntax remains a difficult field.
Two general strategies for morphosyntactic error handling can be distinguished. Firstly, there are so-called robust parsing methods, which try to continue parsing past a position that the grammar cannot handle, without considering the type and exact location of the error. The main purpose is to obtain a result for as much of the input as possible, which usually means yielding the largest possible chunks. A similar approach is used for analyzing spoken language, where, in addition, the so-called recognizer may detect erroneous structures such as the interruption or repetition of a phrase, corrections etc. Secondly, sensitive strategies are being developed, which specifically try to locate and analyze errors in the input. With the help of some type of correction method, the parsing process continues across the error position and yields a complete description of the input, usually including the position and the type of the error. In a system that aims at determining grammaticality according to a given grammar and at providing as much feedback about an error as possible, only the second type of parsing method can be adopted.
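The robust strategy can be illustrated with a minimal sketch. It assumes a handful of hypothetical chunk patterns over part-of-speech tags (the tag names and patterns are invented for this illustration) and covers the input greedily with the longest matching chunk at each position. A greedy cover is only one simple way to approximate "largest possible chunks"; it is not taken from any of the systems cited here.

```python
# Hypothetical chunk patterns over part-of-speech tags (assumed for this sketch).
CHUNK_PATTERNS = {
    "NP": [("DET", "ADJ", "N"), ("DET", "N"), ("N",)],
    "VP": [("V",)],
}


def largest_chunks(tags):
    """Greedily cover the tag sequence with the longest matching chunks.

    Positions that no pattern covers are skipped and reported with the
    label None; no attempt is made to diagnose what went wrong there.
    """
    result, i = [], 0
    while i < len(tags):
        best = None  # (label, length) of the longest pattern matching at i
        for label, patterns in CHUNK_PATTERNS.items():
            for pattern in patterns:
                n = len(pattern)
                if tuple(tags[i:i + n]) == pattern and (best is None or n > best[1]):
                    best = (label, n)
        if best is None:
            result.append((None, tags[i:i + 1]))  # uncovered token, parsing goes on
            i += 1
        else:
            result.append((best[0], tags[i:i + best[1]]))
            i += best[1]
    return result


if __name__ == "__main__":
    # The misplaced ADJ is simply left outside any chunk.
    for chunk in largest_chunks(["DET", "N", "ADJ", "V", "DET", "ADJ", "N"]):
        print(chunk)
```

The result contains as much structure as the patterns allow, but the uncovered adjective carries no information about the kind of error, which is why this type of method is unsuitable when feedback is the goal.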
Again, two strategies can be followed for identifying errors: either the parsing algorithm is changed to allow for recognition even though the grammar does not cover the input, or the grammar is extended with so-called mal-rules, which allow a description to be generated for erroneous input. The first concept can be referred to as "anticipation-free", whereas the second one is called "anticipation-based". The following discusses a few aspects of these two approaches to error recognition. An example from Schwind, Camilla in: Appelo, L. ; de Jong, F. (Ed.), 1994 demonstrates one major disadvantage of the second approach. The mal-rule in the phrase structure grammar is specifically designed to describe an error that a native speaker of French would make when learning German, namely positioning the adjective after the noun in a noun phrase (le maillot jaune vs. das gelbe Trikot). However, speakers with other mother tongues may produce other errors that this rule does not cover, so new rules would have to be added to the grammar for many other cases. A similar case is presented in Schneider, David ; McCoy, Kathleen F., 1998, where a grammar rule is designed specifically to recognize a number mismatch between the determiner and the noun in a noun phrase. Referring to agreement values in PS-rules aggravates the problem, because covering a substantial grammar fragment would cause the number of rules to explode. This in turn leads to enormous efficiency problems.
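A minimal sketch may make the mal-rule idea concrete. It assumes a toy rule format over part-of-speech tags; the tag names, the rule table and the error label are hypothetical and do not reproduce the grammar of any cited system. One ordinary rule set describes correct German noun phrases, and one mal-rule anticipates the word order of le maillot jaune.

```python
# Toy noun-phrase rules over part-of-speech tags. The rule format, tag names
# and error label are assumptions made for this illustration.
RULES = [
    {"lhs": "NP", "rhs": ("DET", "ADJ", "N"), "error": None},
    {"lhs": "NP", "rhs": ("DET", "N"), "error": None},
    # Mal-rule: anticipates an adjective placed after the noun.
    {"lhs": "NP", "rhs": ("DET", "N", "ADJ"), "error": "adjective after noun"},
]


def analyze_np(tags):
    """Return (status, error) for a tag sequence that should form one NP."""
    for rule in RULES:
        if tuple(tags) == rule["rhs"]:
            if rule["error"] is None:
                return ("grammatical", None)
            return ("ungrammatical", rule["error"])
    # Neither a regular rule nor a mal-rule applies.
    return ("unparsable", None)


if __name__ == "__main__":
    print(analyze_np(["DET", "ADJ", "N"]))  # order of "das gelbe Trikot"
    print(analyze_np(["DET", "N", "ADJ"]))  # order of "le maillot jaune"
    print(analyze_np(["ADJ", "DET", "N"]))  # an order nobody anticipated
```

The third call shows the limitation discussed above: a word order that no mal-rule anticipates receives no diagnosis at all, and every further error type requires another entry in the rule table.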
However, two advantages of the anticipation-based approach have to be noted. One is that the most efficient parsing algorithms can be chosen, since the grammars are usually not changed in their form but only extended. A second advantage is the possibility of distinguishing between ungrammatical input on the one hand and unparsable input, i.e. input that is not covered by the grammar, on the other. In most cases this means that feedback to the learner about the location and the type of the error can be given with more confidence.
The anticipation-free approach transfers the load of recognizing and handling errors in the input to the parsing mechanism. An example is the approach presented in Menzel, Wolfgang, 1992, where a "model-based" error diagnosis is characterized by its complexity. For example, in an agreement situation every feature of every lexical item is checked by an individual function in order to allow for a precise localization of an error. As another example of the load moved into the parsing mechanism, the approach taken by Mellish, Chris S., 1989 needs two parses in order to identify linearization errors with the help of a simple phrase structure grammar. One important advantage is the recognition of certain errors "anywhere", provided they can be identified at a single position. For example, an agreement error between the subject and the verb should be recognized not only if the subject is in the standard position but also if the subject is displaced for some reason, e.g. by topicalization. A second advantage is the chance to develop a grammar and a lexicon independently. They can be engineered so as to generate descriptions only for correct sentences of a language. This also allows the integration of "foreign" data, e.g. a large lexical database, which may have been developed in other contexts.
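The following sketch illustrates the general idea of moving error detection into the checking mechanism. It is not a reconstruction of the model-based diagnosis cited above. The lexicon entries and feature names are hypothetical and deliberately simplified (one reading per word form); the lexicon describes correct forms only, and the comparison function reports every clashing feature instead of failing on the first mismatch.

```python
# Hypothetical, simplified lexicon: correct feature values only, one reading
# per word form, no error information of any kind.
LEXICON = {
    "das": {"cat": "DET", "gender": "neut", "number": "sg"},
    "die": {"cat": "DET", "gender": "fem", "number": "sg"},
    "Trikot": {"cat": "N", "gender": "neut", "number": "sg"},
}

AGREEMENT_FEATURES = ("gender", "number")


def check_agreement(a, b, features=AGREEMENT_FEATURES):
    """Compare two feature structures feature by feature.

    Returns a list of (feature, value_a, value_b) for every clash, so that
    each mismatch can be localized individually.
    """
    clashes = []
    for feature in features:
        if feature in a and feature in b and a[feature] != b[feature]:
            clashes.append((feature, a[feature], b[feature]))
    return clashes


if __name__ == "__main__":
    print(check_agreement(LEXICON["das"], LEXICON["Trikot"]))  # no clash
    print(check_agreement(LEXICON["die"], LEXICON["Trikot"]))  # gender clash
```

Because the check operates on whichever two items the parser relates to each other, it does not depend on their linear position, and the lexicon could in principle be replaced by an externally developed one without adding any error-specific entries.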
In order to decrease the processing load, the search space for finding a solution may be reduced in anticipation-free concepts. The evaluation in Lee, Kong Joo ; Kweon, Cheol Jung et al., 1995 can be taken as an example of this problem (for a certain type of robust parsing): with a phrase structure grammar of only 192 rules, the parsing algorithm generates 12,000 items in the chart with heuristics turned on and as many as 25,000 items without. On the one hand, almost any sentence will be analyzed, but on the other hand, the efficiency is very low. In order to counter this general problem, Mellish introduces a number of heuristics, which refer to a variety of possible configurations of chart items. One of the main aims of the modified algorithm in Kato, Tsuneaki, 1994 is therefore to decrease the number of different heuristics while still improving the efficiency. A different approach is chosen in Schröder, Ingo ; Menzel, Wolfgang et al., 2000 and Fouvry, Frederik, 2003. In these two cases constraints are weighted, which allows robust parsing on the one hand and, on the other hand, still allows the solution with the "smallest" error measure to be determined. Additionally, in Schröder, Ingo ; Menzel, Wolfgang et al., 2000 constraints can be marked with a weight of 0, which effectively makes them so-called hard constraints. Solutions that violate a constraint of this kind are then not considered for further analysis. However, neither of these two approaches has been evaluated with respect to possible feedback to the learner about the error.
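How weighted constraints can combine robustness with a preference for the "smallest" error may be sketched as follows. The sketch assumes that the weights of all violated constraints are multiplied, so that a weight of 0 acts as a hard constraint and weights closer to 1 mark milder violations. The constraint names, weights and the flat dictionary representation of an analysis are hypothetical and serve the illustration only.

```python
from math import prod

# Hypothetical constraints: name -> (weight, predicate). A weight of 0 marks
# a hard constraint; weights closer to 1 mark milder violations.
CONSTRAINTS = {
    "has_finite_verb": (0.0, lambda a: a["finite_verb"]),
    "subj_verb_agree": (0.2, lambda a: a["subj_number"] == a["verb_number"]),
    "det_noun_agree": (0.5, lambda a: a["det_gender"] == a["noun_gender"]),
}


def score(analysis):
    """Return (score, names of violated constraints) for one candidate."""
    violated = [name for name, (_, holds) in CONSTRAINTS.items() if not holds(analysis)]
    # The product over an empty list is 1, i.e. a flawless analysis.
    return prod(CONSTRAINTS[name][0] for name in violated), violated


def best_analysis(candidates):
    """Pick the highest-scoring candidate, dropping hard-constraint clashes."""
    admissible = []
    for candidate in candidates:
        value, violated = score(candidate)
        if value > 0.0:  # a violated hard constraint yields a score of 0
            admissible.append((value, violated, candidate))
    if not admissible:
        return None
    return max(admissible, key=lambda item: item[0])


if __name__ == "__main__":
    candidates = [
        {"finite_verb": True, "subj_number": "sg", "verb_number": "pl",
         "det_gender": "neut", "noun_gender": "neut"},
        {"finite_verb": True, "subj_number": "sg", "verb_number": "sg",
         "det_gender": "fem", "noun_gender": "neut"},
        {"finite_verb": False, "subj_number": "sg", "verb_number": "sg",
         "det_gender": "neut", "noun_gender": "neut"},
    ]
    print(best_analysis(candidates))
```

In this toy setting the third candidate is discarded because it violates the hard constraint, and of the remaining two the one with the milder violation is selected. The function also returns the list of violated constraints, but whether such a list translates into useful learner feedback is exactly the question left open above.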