Language Acquisition Based on Associative Mapping of Grammatical Structure to Visual Scene Structure

Peter Ford Dominey
CNRS Institut des Sciences Cognitives

Between birth and 24 months of age, the human infant acquires both language and perceptual scene analysis in a robust and effective manner. By 6-9 months of age, infants can analyze complex visual scenes to identify causal events and their agents and goals (Leslie & Keeble 1987, Woodward 1998). By 14 months of age, they have begun to construct the language-to-scene mapping capability that allows language and visual scene analysis to intersect in a common internal "conceptual scene" representation (Hirsh-Pasek & Golinkoff 1996). These findings suggest a highly productive, synergistic interaction between the acquisition of language and the acquisition of visual scene analysis, one that allows the infant to develop both capabilities rapidly and robustly. Based on these observations, we have developed a language learning system that performs perceptual analysis of visual scenes and constructs the mapping between natural-language narrations of those scenes and the internal representation of the analyzed scene. The system rests on two principles: (a) sentences are self-describing by virtue of their functional morphology and word order, and (b) the grammatical structure of a sentence corresponds to the thematic structure of the event it describes in the visual scene. To the extent that these principles hold, it should be possible to construct the mapping between grammatical structure in sentences and thematic structure in visual scenes (Dominey 2001).
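
The two principles can be illustrated with a minimal Python sketch. It assumes that closed-class (function) words and their positions identify a sentence's grammatical construction, and that a word-to-referent lexicon has already been learned. The closed-class set, the lexicon, the term "construction index" and the functions `construction_index`, `learn` and `interpret` are all hypothetical, chosen for illustration; this is not the system's actual implementation.

```python
# Sketch only: the closed-class word set and lexicon below are hypothetical
# examples, not the system's actual vocabulary.
CLOSED_CLASS = {"the", "was", "by", "to", "from"}

# Hypothetical word-to-referent lexicon (assumed to be already learned).
LEXICON = {
    "block": "block", "cylinder": "cylinder", "moon": "moon",
    "pushed": "push", "touched": "touch", "gave": "give", "took": "take",
}

# Learned mappings: construction index -> thematic role of each open-class slot.
CONSTRUCTIONS = {}


def construction_index(sentence):
    """Split a sentence into its construction index (closed-class words kept,
    open-class words replaced by '_') and the ordered open-class words."""
    index, open_words = [], []
    for token in sentence.lower().split():
        if token in CLOSED_CLASS:
            index.append(token)
        else:
            index.append("_")
            open_words.append(token)
    return tuple(index), open_words


def learn(sentence, event):
    """Store, under the sentence's construction index, which thematic role
    each open-class slot fills in the observed event (a role -> referent dict)."""
    index, open_words = construction_index(sentence)
    roles = []
    for word in open_words:
        referent = LEXICON[word]
        # First role whose referent matches; assumes referents are distinct.
        role = next(r for r, value in event.items() if value == referent)
        roles.append(role)
    CONSTRUCTIONS[index] = tuple(roles)


def interpret(sentence):
    """Retrieve the stored mapping by construction index and apply it to the
    open-class words of a new sentence."""
    index, open_words = construction_index(sentence)
    roles = CONSTRUCTIONS[index]  # KeyError if this construction is unlearned
    return {role: LEXICON[word] for role, word in zip(roles, open_words)}


# One active and one passive training sentence describing the same event.
push = {"action": "push", "agent": "block", "object": "cylinder"}
learn("the block pushed the cylinder", push)
learn("the cylinder was pushed by the block", push)

print(interpret("the moon touched the block"))
# {'agent': 'moon', 'action': 'touch', 'object': 'block'}
print(interpret("the block was touched by the moon"))
# {'object': 'block', 'action': 'touch', 'agent': 'moon'}
```

Because the active and passive sentences differ only in their closed-class markers ("was", "by") and word order, they retrieve different slot-to-role mappings, which is one way to read the claim that sentences are self-describing.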

In the current research, visual scenes consist of "touch", "push", "take" and "give" actions performed with colored toy blocks. The experimenter manipulates the objects while narrating the ongoing events. Video image analysis yields a time-ordered list of the physical contacts between objects, together with associated parameters, from which a higher-level "event(agent, object, recipient)" representation is constructed. The perceptual event representations and the corresponding narratives are provided as input to an associative structure-mapping algorithm that learns two mappings: between words and their referents in the scene, and between grammatical structures and their event-level interpretations. During post-learning sentence interpretation, the appropriate mapping from grammatical structure to scene structure is retrieved on the basis of the grammatical markers inherent to the sentence. The system learns a rich subset of English that includes complex hierarchical grammatical structure, and demonstrates that perceptual event structure is a rich and important source of the structure involved in language acquisition.
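
The word-to-referent part of this learning can be sketched with a simple cross-situational association. This is an assumed technique for illustration, not the algorithm used by the system: each narrated sentence is paired with the set of referents in the perceived event (an abstraction of the event(agent, object, recipient) representation), closed-class words are assumed to be identifiable and are excluded, and each remaining word is mapped to the referent with the highest Dice coefficient across all episodes. The function `learn_lexicon`, the closed-class set and the example episodes are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical closed-class words, assumed to be distinguishable from
# open-class words and therefore excluded from word-referent learning.
CLOSED_CLASS = {"the", "was", "by", "to", "from"}


def learn_lexicon(episodes):
    """Cross-situational association sketch: each episode pairs a narrated
    sentence with the set of referents in the perceived event. Each
    open-class word is mapped to the referent with the highest Dice
    coefficient, 2 * c(w, r) / (c(w) + c(r))."""
    word_count = Counter()           # episodes in which each word occurs
    referent_count = Counter()       # episodes in which each referent occurs
    pair_count = defaultdict(Counter)  # word -> referent co-occurrence counts
    for sentence, referents in episodes:
        words = {w for w in sentence.lower().split() if w not in CLOSED_CLASS}
        word_count.update(words)
        referent_count.update(referents)
        for w in words:
            pair_count[w].update(referents)
    lexicon = {}
    for w, counts in pair_count.items():
        lexicon[w] = max(
            counts,
            key=lambda r: 2 * counts[r] / (word_count[w] + referent_count[r]),
        )
    return lexicon


# Hypothetical narrated episodes: (sentence, referents present in the event).
episodes = [
    ("the block pushed the cylinder", {"push", "block", "cylinder"}),
    ("the moon touched the block", {"touch", "moon", "block"}),
    ("the cylinder pushed the moon", {"push", "cylinder", "moon"}),
    ("the block touched the cylinder", {"touch", "block", "cylinder"}),
]
print(sorted(learn_lexicon(episodes).items()))
# [('block', 'block'), ('cylinder', 'cylinder'), ('moon', 'moon'), ('pushed', 'push'), ('touched', 'touch')]
```

Normalizing by both counts, rather than using raw co-occurrence, keeps a frequently present object such as the block from capturing a word like "touched" that co-occurs with it as often as with the touch action itself.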

References

Dominey PF (2001) Conceptual grounding in simulation studies of language acquisition. Evolution of Communication (in press).

Hirsh-Pasek K, Golinkoff RM (1996) The origins of grammar. MIT Press.

Leslie AM, Keeble S (1987) Do six-month-olds perceive causality? Cognition 25, 265-288.

Woodward AL (1998) Infants selectively encode the goal object of an actor's reach. Cognition 69, 1-34.