Language Acquisition Based on Associative Mapping of Grammatical Structure to
Visual Scene Structure
Peter Ford Dominey
CNRS Institut des Sciences Cognitives
Between birth and 24 months of age, the human infant tackles language
acquisition and perceptual scene analysis in a robust and effective
manner. At 6-9 months of age, infants
can analyze complex visual scenes to identify causal events and their
agents and goals (Leslie & Keeble 1987, Woodward 1998). Likewise,
by 14 months of age, infants have begun to construct the language-to-scene
mapping capability that allows language and visual scene analysis to
intersect in a common internal "conceptual scene" representation
(Hirsh-Pasek & Golinkoff 1996). There appears to be a highly productive and
synergistic interaction between language acquisition and the acquisition
of visual scene analysis, one that allows the infant to develop
these two capabilities rapidly and robustly. Based on these
observations, we have developed a language learning system that performs
perceptual analysis of visual scenes and constructs the mapping between
the natural language narration of a scene and the internal representation
of the analyzed scene. The system is based on two principles: (a)
sentences are self-describing by virtue of functional morphology and word
order, and (b) the grammatical structure of sentences corresponds to
the thematic structure of events in the visual scene. To the extent that
these principles hold, it should be possible to construct the mapping
between grammatical structure in sentences and thematic structure in
visual scenes (Dominey 2001).
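As an illustration of how principles (a) and (b) could combine, the following minimal Python sketch reduces a sentence to its function words and word order, then uses that pattern as a key for the order in which open-class words fill event roles. It is not the system's actual implementation: the function-word list, the `(action, agent, object)` tuple layout, the pre-given `lexicon`, and all function names are assumptions made for this example.

```python
# Hypothetical closed-class (function word) inventory; an assumption of this sketch.
FUNCTION_WORDS = {"the", "was", "by", "to"}


def split_sentence(sentence):
    """Return (signature, open_class_words).

    The signature keeps function words in place and replaces every
    open-class word with "_", so it reflects only grammatical markers
    and word order (principle a).
    """
    signature, content = [], []
    for word in sentence.lower().split():
        if word in FUNCTION_WORDS:
            signature.append(word)
        else:
            signature.append("_")
            content.append(word)
    return " ".join(signature), content


def learn_construction(table, sentence, event, lexicon):
    """Record which event slot each open-class word fills (principle b).

    Assumes every open-class word is in the lexicon and that the
    elements of `event` are distinct, so `event.index` is unambiguous.
    """
    signature, content = split_sentence(sentence)
    table[signature] = [event.index(lexicon[word]) for word in content]


def interpret(table, sentence, lexicon):
    """Rebuild an event tuple for a new sentence with a known signature."""
    signature, content = split_sentence(sentence)
    slots = table[signature]
    event = [None] * len(slots)
    for word, slot in zip(content, slots):
        event[slot] = lexicon[word]
    return tuple(event)


# Toy word-to-referent lexicon, assumed to be already known here.
lexicon = {"block": "BLOCK", "triangle": "TRIANGLE", "moon": "MOON",
           "pushed": "PUSH", "touched": "TOUCH"}

table = {}
# Events are written as (action, agent, object).
learn_construction(table, "The block pushed the triangle",
                   ("PUSH", "BLOCK", "TRIANGLE"), lexicon)
learn_construction(table, "The triangle was pushed by the block",
                   ("PUSH", "BLOCK", "TRIANGLE"), lexicon)

# A passive sentence with new words reuses the learned passive mapping.
result = interpret(table, "The moon was touched by the triangle", lexicon)
```

In this toy setting, `result` is `("TOUCH", "TRIANGLE", "MOON")`: the passive pattern learned from one example assigns the agent role to the noun after "by", even though none of the open-class words were seen in that pattern before.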
In the current research, visual scenes consist of the actions
"touch", "push", "take" and "give",
performed with colored toy blocks. The experimenter manipulates the
objects while narrating the ongoing events. Video
image analysis yields a time-ordered list of the physical contacts between
objects, along with associated parameters, from which a higher-level
"event(agent, object, recipient)" representation is constructed. Perceptual event
representations and the corresponding narratives are provided as input
to an associative structure mapping algorithm that learns the mapping
between words and their referents in the scene, and between grammatical
structures and their event-level interpretations in the scene. During
post-learning sentence interpretation, the appropriate mapping of grammatical
structure to scene structure is retrieved based on grammatical markers
inherent to the sentence. The system learns a rich subset of English
that includes complex hierarchical grammatical structure, and demonstrates
that perceptual event structure provides a rich and important source
of structure for the acquisition of language.
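The word-to-referent part of the learning described above can be sketched with a simple cross-situational co-occurrence count over (narration, event) pairs. This is a hedged illustration in Python, not the associative algorithm used in the system: the scoring rule, the `threshold` value, the toy data, and the name `learn_lexicon` are all assumptions introduced for this example.

```python
from collections import Counter, defaultdict


def learn_lexicon(pairs, threshold=0.9):
    """Associate words with scene referents from (sentence, event) pairs.

    Each event is a tuple of referent symbols. A word is linked to the
    referent that maximizes P(referent | word) * P(word | referent),
    estimated from co-occurrence counts. Words whose best score stays
    below `threshold` (an assumed cutoff) receive no referent, which in
    practice filters out function words such as "the".
    """
    cooc = defaultdict(Counter)  # cooc[word][referent] = shared scenes
    word_count = Counter()       # number of scenes containing the word
    ref_count = Counter()        # number of scenes containing the referent

    for sentence, event in pairs:
        words = set(sentence.lower().split())
        referents = set(event)
        for referent in referents:
            ref_count[referent] += 1
        for word in words:
            word_count[word] += 1
            for referent in referents:
                cooc[word][referent] += 1

    lexicon = {}
    for word, counts in cooc.items():
        best, score = max(
            ((ref, c * c / (word_count[word] * ref_count[ref]))
             for ref, c in counts.items()),
            key=lambda item: item[1],
        )
        if score >= threshold:
            lexicon[word] = best
    return lexicon


# Toy narrated scenes; events are written as (action, agent, object).
pairs = [
    ("the block pushed the triangle", ("PUSH", "BLOCK", "TRIANGLE")),
    ("the triangle touched the moon", ("TOUCH", "TRIANGLE", "MOON")),
    ("the moon pushed the block", ("PUSH", "MOON", "BLOCK")),
    ("the block touched the moon", ("TOUCH", "BLOCK", "MOON")),
]

lexicon = learn_lexicon(pairs)
```

With these four scenes, each content word ends up paired with its own referent, while "the" co-occurs with everything and falls below the cutoff. A lexicon of this kind is what a structure-mapping step would need in order to align open-class words with event roles.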
References
Dominey PF (2001) Conceptual grounding in simulation studies of language
acquisition. Evolution of Communication (in press).
Hirsh-Pasek K, Golinkoff RM (1996) The Origins of Grammar.
MIT Press.
Leslie AM, Keeble S (1987) Do six-month-olds perceive causality? Cognition
25, 265-288.
Woodward AL (1998) Infants selectively encode the goal object of an
actor's reach. Cognition 69, 1-34.