Telling the future: How speakers time word preparation and articulation

Zenzi M. Griffin
Georgia Institute of Technology

Speakers must coordinate the processing of ideas, words, and movements over time, but they have a great deal of flexibility in how they do so. They can start utterances with minimal preparation of their words or after preparing and buffering all of their words in phonological or motor codes [1]. What information can speakers use to control the timing of speech and preparation? Two experiments demonstrate that speakers can use a correlate of word length to estimate how much time they will have for word preparation during speech, and so minimize the amount of preparation that precedes speech. They thereby minimize word buffering while maintaining fluency.

In Experiment 1, 20 speakers were asked to name 32 object pairs without pausing between the names (e.g., "scarf pipe"). Articulating "scarf" takes less than 600 ms, but preparing "pipe" can take ~900 ms [2]. If a speaker said "scarf" as soon as it was ready and began preparing "pipe" at that point, ~300 ms of silence (900 ms minus 600 ms) would remain before "pipe" was ready to follow. To avoid this pause, "scarf" must be buffered. When monosyllabic and multisyllabic object names like "scarf" and "skeleton" are matched on other dimensions, they take the same amount of time to prepare in mixed-length lists [3,4]. "Scarf pipe" and "skeleton pipe" should therefore take the same amount of time to prepare, so speech onset latencies should differ only if speakers take the length of the first name into account when timing speech. They did differ: speakers began saying "skeleton" earlier than "scarf." Speakers gazed at long- and short-named objects for equal amounts of time before speaking, but they gazed at the second object more before saying short first names. In contrast, with long first names, they gazed at the second object more during speech.
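The timing argument above can be made concrete with a minimal arithmetic sketch. It assumes, as a simplification not stated in the source, that preparation of the second name begins the moment the first name is ready and runs in parallel with articulating the first name. The 600 ms and 900 ms figures come from the text; the 800 ms articulation time for "skeleton" is a hypothetical value chosen only for illustration, and `required_buffer_ms` is a hypothetical helper name.

```python
def required_buffer_ms(first_word_duration_ms: float, second_word_prep_ms: float) -> float:
    """Hypothetical model: how long the prepared first word must be held
    before speech onset so that the second word is ready exactly when the
    first word ends.

    Assumption (not from the source): preparation of the second word starts
    when the first word is ready and continues while the first word is spoken.
    """
    # Silence that would occur if speech began as soon as the first word was ready.
    gap = float(second_word_prep_ms - first_word_duration_ms)
    # A negative gap means the second word is ready early, so no buffering is needed.
    return max(0.0, gap)


SECOND_PREP_MS = 900  # ~900 ms to prepare "pipe" (from the text)
SCARF_MS = 600        # articulating "scarf" takes under 600 ms (upper bound from the text)
SKELETON_MS = 800     # hypothetical articulation time for "skeleton"

for word, duration in [("scarf", SCARF_MS), ("skeleton", SKELETON_MS)]:
    print(word, required_buffer_ms(duration, SECOND_PREP_MS))
```

Under these assumptions the loop prints 300.0 for "scarf" and 100.0 for "skeleton": the shorter first name forces a longer hold before speaking, which matches the direction of the later onsets observed for "scarf."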

Adding words that require little preparation should provide more time to prepare the second name while speaking. In Experiment 2, when speakers said "next to" between the names, speech onset latencies for long and short first names were equal, and speech began significantly earlier than when no words intervened. When nothing intervened, these speakers replicated the results of Experiment 1. Similar timing occurs in speakers' gazes while describing scenes [5] and in sequences of arm movements [6]. These results suggest that people are sensitive to the amount of time it takes to prepare and perform an action. When speakers choose to, they can use this information to minimize advance preparation and buffering of words while speaking fluently. These findings have implications for theories of language production.
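The Experiment 2 prediction can be illustrated with the same kind of hypothetical model: a filler phrase that needs little preparation adds its own articulation time to the window available for preparing the second name. In this sketch, `onset_delay_ms` is a hypothetical helper, and the 400 ms duration for "next to" and 800 ms for "skeleton" are illustrative assumptions, not values reported in the source.

```python
def onset_delay_ms(first_word_ms: float, filler_ms: float, second_word_prep_ms: float) -> float:
    """Hypothetical model: delay between having the first name ready and
    starting to speak, when a filler phrase is spoken between the two names.

    Assumptions (not from the source): the filler needs negligible preparation,
    and the second name is prepared while the first name and the filler are spoken.
    """
    # Time available for preparing the second name once speech has started.
    available = first_word_ms + filler_ms
    # Any shortfall must be covered by delaying speech onset.
    return max(0.0, float(second_word_prep_ms - available))


SECOND_PREP_MS = 900  # ~900 ms to prepare "pipe" (from the text)
NEXT_TO_MS = 400      # hypothetical duration of "next to"

for word, duration in [("scarf", 600), ("skeleton", 800)]:
    with_filler = onset_delay_ms(duration, NEXT_TO_MS, SECOND_PREP_MS)
    without_filler = onset_delay_ms(duration, 0, SECOND_PREP_MS)
    print(word, with_filler, without_filler)
```

With the filler, the modeled delay is 0.0 for both names, so first-name length no longer matters and onsets are earlier than without the filler (300.0 and 100.0), mirroring the pattern described for Experiment 2.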

References

[1] Wheeldon, L., & Lahiri, A. (1997). Prosodic units in speech production. Journal of Memory and Language, 37, 356-381.

[2] Snodgrass, J. G., & Yuditsky, T. (1996). Naming times for the Snodgrass and Vanderwart pictures. Behavior Research Methods, Instruments, & Computers, 28, 516-536.

[3] Bachoud-Lévi, A.-C., Dupoux, E., Cohen, L., & Mehler, J. (1998). Where is the length effect? A cross-linguistic study of speech production. Journal of Memory and Language, 39, 331-346.

[4] Meyer, A. S., Roelofs, A., & Levelt, W. J. M. (in press). Word length effects in object naming: The role of a response criterion. Journal of Memory and Language.

[5] Griffin, Z. M., & Bock, K. (2000). What the eyes say about speaking. Psychological Science, 11, 274-279.

[6] Ketelaars, M. A. C., Garry, M. I., & Franks, I. M. (1997). On-line programming of simple movement sequences. Human Movement Science, 16, 461-483.