Context Effects on Verb Frame Bias

Frank Keller
Division of Informatics, University of Edinburgh, UK

Christoph Scheepers, Stephanie Becker and Christine Foeldesi
Computational Linguistics, Saarland University, Germany

Roland & Jurafsky (1998) compared verb frame frequencies obtained from production experiments with those obtained from corpus studies and found significant differences. Moreover, verb frame frequencies varied significantly across different types of corpora. They attributed these effects to (a) discourse type (isolated sentences, continuous text, dialogue) and (b) semantic influences of the discourse context in which a verb occurs.

Thus, discourse context is predicted to influence frame frequencies not only in corpus studies but also in controlled production experiments (which so far have been conducted only with isolated sentences). We tested this prediction for German, which exhibits a verb frame ambiguity closely related to the widely studied NP/S ambiguity in English: certain verbs can take either an accusative NP complement or an infinitival VP complement. An example is untersagen 'disallow' in (1):

(1) a. Peter untersagte das Vorhaben sofort.
       Peter disallowed the plan immediately

    b. Peter untersagte das Vorhaben durchzufuehren.
       Peter disallowed the plan realize-INF

In a pretest, we established out-of-context frame biases for 98 verbs that exhibit the NP/VP ambiguity. In a questionnaire-based free production experiment, subjects generated a sentence for each verb. The responses were annotated as NP frame, VP frame, S frame, or other. A total of 24 verbs were attested in both the NP frame and the VP frame; twelve of these were classified as NP-biased and twelve as VP-biased.
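The pretest filtering step can be sketched as follows. This is a minimal illustration, not the procedure actually used: the verbs and response counts are toy placeholders, and the rule that a verb is NP-biased when it received more NP than VP responses is our assumption, since the classification criterion is not stated in the abstract.

```python
from collections import Counter

def classify_verbs(responses):
    """Keep verbs attested in both the NP and the VP frame and label their bias.

    responses: dict mapping a verb to a list of frame labels
    ("NP", "VP", "S", "other"), one label per elicited sentence.
    Assumption (hypothetical): a verb is NP-biased if it received more NP
    than VP responses, and VP-biased otherwise.
    """
    biases = {}
    for verb, frames in responses.items():
        counts = Counter(frames)  # missing labels count as 0
        if counts["NP"] == 0 or counts["VP"] == 0:
            continue  # not attested in both frames: excluded
        biases[verb] = "NP-biased" if counts["NP"] > counts["VP"] else "VP-biased"
    return biases

# Toy responses for three hypothetical verbs (not data from the study).
toy = {
    "verb_a": ["NP", "NP", "VP", "other"],
    "verb_b": ["VP", "VP", "NP", "S"],
    "verb_c": ["NP", "NP", "S"],  # never produced with a VP: dropped
}
print(classify_verbs(toy))
```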

The main experiment investigated the influence of discourse context on frame frequency. All instances of the 24 verbs were extracted from the Frankfurter Rundschau corpus (40 million words of newspaper text). For each verb, 100 instances were randomly sampled and manually annotated for verb frame. All twelve NP-biased verbs had to be discarded because they were rare or unattested in the other (VP) frame; three VP-biased verbs were discarded for the same reason, leaving nine verbs.
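The per-verb sampling step can be sketched as below, assuming the corpus instances of each verb have already been extracted into a list. The function names are hypothetical, and `min_count` is our stand-in for "rare or unattested", which the abstract does not quantify; it is set to 4 here only because four instances per frame are needed to build the materials.

```python
import random
from collections import Counter

def sample_instances(instances, k=100, seed=None):
    """Draw k instances uniformly at random without replacement
    (or all of them, if fewer than k exist)."""
    rng = random.Random(seed)
    return rng.sample(instances, min(k, len(instances)))

def keep_verb(frame_labels, min_count=4):
    """Hypothetical criterion: keep a verb only if both the NP and the VP
    frame occur at least min_count times in its annotated sample."""
    counts = Counter(frame_labels)
    return counts["NP"] >= min_count and counts["VP"] >= min_count

# Toy corpus of 250 instance IDs for one verb (placeholders, not real data).
corpus = [f"sentence_{i}" for i in range(250)]
sample = sample_instances(corpus, k=100, seed=1)
print(len(sample), len(set(sample)))  # 100 100: no instance drawn twice
```

In practice the frame labels passed to `keep_verb` would come from the manual annotation of each sample.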

Experimental materials were constructed for the remaining nine verbs by randomly selecting, for each verb, four instances in the NP frame and four in the VP frame from the corpus sample. For each instance, the sentence containing the verb plus one to three preceding sentences were retained as context, and all text following the verb was removed. The resulting 72 stimuli (9 verbs × 8 instances) were presented to two groups of subjects (combined N = 24) in a completion experiment administered over the internet. This generated the following completion frequencies (percentages in the NP context and VP context rows are relative to the row total; percentages in the Total row and Total column are relative to all 864 completions):
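Stimulus construction can be sketched as follows. The `Instance` fields (in particular `verb_end`, the character offset just after the verb) are hypothetical, and drawing the number of context sentences uniformly from one to three is our assumption; the abstract does not say how that number was chosen.

```python
import random
from dataclasses import dataclass

@dataclass
class Instance:
    verb: str
    frame: str        # "NP" or "VP", from the manual annotation
    preceding: list   # sentences before the target sentence, in order
    sentence: str     # the sentence containing the verb
    verb_end: int     # hypothetical: character offset just after the verb

def make_stimulus(inst, rng):
    """Context = 1-3 preceding sentences + target sentence truncated after the verb.
    Assumes at least one preceding sentence is available."""
    n_context = rng.randint(1, min(3, len(inst.preceding)))
    context = inst.preceding[-n_context:]
    fragment = inst.sentence[:inst.verb_end]  # all text after the verb removed
    return " ".join(context + [fragment])

def build_stimuli(annotated, n_per_frame=4, seed=0):
    """annotated: dict verb -> list of Instance.
    Randomly picks n_per_frame NP and n_per_frame VP instances per verb."""
    rng = random.Random(seed)
    stimuli = []
    for verb, instances in annotated.items():
        for frame in ("NP", "VP"):
            pool = [i for i in instances if i.frame == frame]
            for inst in rng.sample(pool, n_per_frame):
                stimuli.append((verb, frame, make_stimulus(inst, rng)))
    return stimuli

# Toy input: nine placeholder verbs with five instances per frame each.
toy = {
    f"verb_{v}": [
        Instance(f"verb_{v}", frame, ["Context one.", "Context two."],
                 "Peter untersagte das Vorhaben sofort.", 16)
        for frame in ("NP", "VP") for _ in range(5)
    ]
    for v in range(9)
}
stimuli = build_stimuli(toy)
print(len(stimuli))  # 72 = 9 verbs x (4 NP + 4 VP)
```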

              NP completion   VP completion   S completion   Other    Total
NP context    237 (55%)       175 (41%)       15 (3%)        5 (1%)   432 (50%)
VP context    155 (36%)       237 (55%)       38 (9%)        2 (0%)   432 (50%)
Total         392 (45%)       412 (48%)       53 (6%)        7 (1%)   864 (100%)
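The row percentages can be recomputed from the raw counts as a consistency check. The short Python sketch below copies the counts from the table; for example, NP completions in the NP context come out at 237/432, which rounds to 55%.

```python
# Completion counts: rows = context, columns = completion type.
COLUMNS = ("NP", "VP", "S", "Other")
counts = {
    "NP context": (237, 175, 15, 5),
    "VP context": (155, 237, 38, 2),
}

def row_percentages(row):
    """Percentage of each completion type within one context, rounded to integers."""
    total = sum(row)
    return [round(100 * c / total) for c in row]

for context, row in counts.items():
    print(context, sum(row), dict(zip(COLUMNS, row_percentages(row))))
```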

Hierarchical log-linear models including the factors Context (NP, VP), Completion (NP, VP, S), and either Subject (N = 24) or Item (N = 36) showed a significant interaction between Context and Completion (by subjects: likelihood-ratio χ² = 32.08, df = 2, p < .001; by items: likelihood-ratio χ² = 30.42, df = 2, p < .001). NP completions were significantly more frequent in the NP context than in the VP context (237 vs. 155). Conversely, VP completions were significantly less frequent in the NP context than in the VP context (175 vs. 237).
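To illustrate the statistic, the sketch below computes a likelihood-ratio chi-square (G²) test of independence between Context and Completion on the pooled 2 × 3 table (NP, VP and S completions; Other excluded). This is a deliberate simplification: it ignores the Subject and Item factors of the hierarchical log-linear models, so its value is not expected to match the reported statistics. For a chi-square distribution with an even number of degrees of freedom the upper-tail probability has a closed form, which keeps the sketch within the Python standard library.

```python
import math

def g_squared(table):
    """Likelihood-ratio chi-square (G^2) test of independence for a two-way table.

    Expected counts are row_total * col_total / N; cells with an observed
    count of zero contribute nothing. Returns (G^2, df).
    """
    row_totals = [sum(r) for r in table]
    col_totals = [sum(c) for c in zip(*table)]
    n = sum(row_totals)
    g2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            if observed > 0:
                expected = row_totals[i] * col_totals[j] / n
                g2 += observed * math.log(observed / expected)
    df = (len(table) - 1) * (len(table[0]) - 1)
    return 2 * g2, df

def chi2_sf_even_df(x, df):
    """P(X > x) for a chi-square variable with even df:
    exp(-x/2) * sum over i < df/2 of (x/2)^i / i!."""
    if df % 2:
        raise ValueError("closed form only valid for even df")
    half = x / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))

# Rows: NP context, VP context; columns: NP, VP, S completions (pooled counts).
pooled = [[237, 175, 15], [155, 237, 38]]
g2, df = g_squared(pooled)
p = chi2_sf_even_df(g2, df)
print(f"G2 = {g2:.2f}, df = {df}, p = {p:.1e}")
```

On the pooled counts this gives a G² of about 37 with df = 2 and p well below .001, in line with the direction of the reported result.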

The results confirm that the discourse context of a verb influences its frame frequency. This holds not only in corpus studies (as shown by Roland & Jurafsky 1998) but also in a controlled production experiment. Our study is also novel in that it does not use manually constructed materials; instead, its items were randomly sampled from a large corpus.

References

Roland, D., & Jurafsky, D. (1998). How verb subcategorization frequencies are affected by corpus choice. In Proceedings of the 17th International Conference on Computational Linguistics and 36th Annual Meeting of the Association for Computational Linguistics (pp. 1122-1128). Montreal.