Analogy versus rules in Dutch past tenses
Harald Baayen
baayen@mpi.nl

This paper addresses the role of analogy in the formation of past-tense forms. Bybee and Slobin (1982) called attention to the existence of graded attractor sets among the irregular past-tense forms. Connectionist, subsymbolic work has since been remarkably successful in modeling the production of past-tense forms, both regular and irregular, without explicit rules. The explanatory potential of symbolic analogy-based models (Timbl, see, e.g., Daelemans et al., 1995; AML, Skousen, 1993), however, has not been explored in depth for this phenomenon. Eddington (2000) presents an analysis of English past-tense formation in which AML is used to predict past-tense classes. A problem that arises here is that the analyst has to define, a priori, a rigid set of outcome classes. For the broader range of outcome classes considered by Albright and Hayes (2002), Eddington's approach no longer works. To accommodate their experimental findings with English nonwords, Albright and Hayes argue that some 160 probabilistic regular subrules are operative in English. Such a proliferation of probabilistic rules, unfortunately, is rather implausible from a processing point of view.

For regular Dutch past-tense formation, analogical modeling of the allomorphy of the regular past-tense suffix (-te/-de) has been reasonably successful. The allomorph -te is selected for stems ending in an underlyingly voiceless obstruent, and -de elsewhere. Speakers of Dutch have clear intuitions about which allomorph is correct for nonwords, even though the final obstruent of a nonword has no underlying specification for voice. Analogical models approximate the experimental data on the basis of phonological similarity gangs (Ernestus and Baayen, 2001; Baayen, 2002).

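
The allomorphy rule and the analogical alternative for nonwords can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: stems are written orthographically rather than phonologically, the exemplar list is a small hand-picked set of real Dutch verbs, and the similarity measure (length of the shared word-final string) is a stand-in, not the metric used by Timbl or AML.

```python
from collections import Counter

# Underlyingly voiceless obstruents, approximated orthographically.
VOICELESS_OBSTRUENTS = ("ch", "p", "t", "k", "f", "s")


def regular_suffix(underlying_stem: str) -> str:
    """Rule: -te after an underlyingly voiceless obstruent, -de elsewhere."""
    return "-te" if underlying_stem.endswith(VOICELESS_OBSTRUENTS) else "-de"


# Toy exemplars: (surface stem, attested regular suffix). Because of final
# devoicing, the surface stem alone does not reveal the underlying voicing
# (leef/leefde versus blaf/blafte).
EXEMPLARS = [
    ("werk", "-te"), ("pak", "-te"), ("stop", "-te"),
    ("blaf", "-te"), ("kus", "-te"),
    ("leef", "-de"), ("reis", "-de"), ("krab", "-de"),
]


def shared_suffix_len(a: str, b: str) -> int:
    """Number of identical characters counted from the right edge."""
    n = 0
    for x, y in zip(reversed(a), reversed(b)):
        if x != y:
            break
        n += 1
    return n


def analogical_suffix(nonword: str, exemplars=EXEMPLARS) -> str:
    """Similarity-weighted vote over exemplars (hypothetical measure)."""
    votes = Counter()
    for stem, suffix in exemplars:
        votes[suffix] += shared_suffix_len(nonword, stem)
    return votes.most_common(1)[0][0]


print(regular_suffix("werk"), regular_suffix("leev"))        # underlying stems
print(analogical_suffix("deef"), analogical_suffix("daf"))   # nonwords
```

In this toy setting the nonword "deef" sides with "leef" and the nonword "daf" with "blaf", which is the kind of similarity-gang effect the analogical models exploit; a realistic model would use phonological features and a full lexicon.
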
Interestingly, predicting the past-tense form of a new word requires three decisions to be made simultaneously: whether the stem should undergo vowel alternation (without suffixation), whether its final obstruent (if present) should be voiced or voiceless, and whether the suffix should be -te or -de. The approach taken by Eddington would, in the case of Dutch, lead to an unattractive proliferation of outcome classes. To avoid having to determine the outcome classes a priori, I have developed a modification of the Timbl and AML approaches in which all features of the output form are predicted in parallel.

A simulation using leave-one-out cross-validation was applied to the 159 irregular verbs in a database of 915 monomorphemic verbs in all. Taking the maximum likelihood choice as the model's prediction, it produced the correct past-tense form for 47% of the 159 irregular verbs, the expected regular form for 34% of these verbs, and some other form in 20% of the cases. Among the latter are past-tense forms for which the voicing of the final obstruent was guessed incorrectly ("laadde" instead of "laatte" as a regularization of "liet"), as well as incorrect but understandable irregularizations ("schacht" as the past tense of "schenken", compare "denken"/"dacht").

These results show that symbolic analogical parallel pattern completion can be successful without a priori classification of outcome sets. The present approach combines the flexibility typical of artificial neural networks with immediate insight (unavailable in neural net approaches) into the lexical similarity structure leading to the choices of the model, while obviating the need to postulate probabilistic subrules for regular forms as in Albright and Hayes (2002).

References

Albright, A. and Hayes, B. (2002), Rules and analogy in English past tenses: a computational/experimental study. Manuscript.
Baayen, R. H. (2002), Probabilistic approaches to morphology. To appear in: Bod, R., Hay, J. and Jannedy, S. (Eds.), Probability theory in linguistics, The MIT Press, Cambridge (Mass.).
Bybee, J. L. and Slobin, D. I. (1982), Rules and schemas in the development and use of the English past tense, Language 58, 265-289.
Daelemans, W., Berck, P. and Gillis, S. (1995), Linguistics as data mining: Dutch diminutives, CLIN V, Papers from the 5th CLIN meeting, 59-72.
Eddington, D. (2000), Analogy and the dual-route model of morphology, Lingua 110, 281-298.
Ernestus, M. and Baayen, R. H. (2001), Choosing between the Dutch past-tense suffixes -te and -de, In: Van der Wouden, T. and De Hoop, H. (Eds.), Linguistics in the Netherlands, Benjamins, Amsterdam, 81-93.
Plunkett, K. and Juola, P. (1999), A connectionist model of English past tense and plural morphology, Cognitive Science 23, 463-490.
Skousen, R. (1993), Analogy and structure, Kluwer, Dordrecht.