Disfluency in Speech and Gestures: Windows into Metacognitive Processes

Yilmaz, Beglim, Kaya, Emel Nur, Karakay, Sultan, Furman, Reyhan (ORCID: 0000-0001-6034-3820), Göksun, Tilbe and Eskenazi, Terry (2024) Disfluency in Speech and Gestures: Windows into Metacognitive Processes. In: 10th International Symposium on Brain and Cognitive Science, 1 June 2024, Middle East Technical University (METU), Ankara, Türkiye.

Full text not available from this repository.

Official URL: https://isbcs2024-ii.metu.edu.tr/


Speech disfluency refers to errors, pauses, or repetitions in speech production (Maclay & Osgood, 1959). Earlier studies suggest that speech disfluencies signal a speaker's certainty: speakers with lower confidence in their answers produce more disfluencies (Smith & Clark, 1993; Swerts & Krahmer, 2005). Language is also multimodal, involving nonverbal cues such as hand gestures (Bortfeld et al., 2001; Fröhlich et al., 2019). Co-speech gestures have been shown to precede their lexical affiliates (Ferré, 2010; Seyfeddinipur, 2006; ter Bekke et al., 2024) and to increase performance monitoring (Çapan et al., 2023), suggesting metacognitive involvement. However, to date no study has investigated the relationship among speech disfluencies, hand gestures, and metacognitive processes. Furthermore, speech disfluencies and co-speech gestures change as a function of listener visibility: people produce more gestures (Alibali et al., 2001; Cohen & Harrison, 1973; Krauss et al., 1995) and fewer disfluencies (Alibali et al., 2001; Kasl & Mahl, 1965; Oviatt, 1995; Rimé, 1982) when they can see the listener. Here, we ask (1) whether disfluencies and gestures act as metacognitive cues in speech and (2) whether they serve different functions in different communicative settings. Fifty participants (32 female; mean age = 21.16 years, SD = 1.46) answered trivia questions with either a visible or a nonvisible listener and elaborated on their answers aloud, during which we measured the frequency and type of disfluencies and co-speech gestures. Participants then rated their confidence in each answer (i.e., a metacognitive judgment) on a 4-point Likert scale. We predict that confidence ratings will vary as a function of the speech disfluencies and co-speech gestures participants produce. We also expect the rate of disfluencies and gestures to differ depending on the conversational setting.
To test these hypotheses, we will analyze the data using linear mixed-effects models, which account for variability arising both from different participants and from questions of varying difficulty. Preliminary analyses showed that, of the 40 questions, participants answered 16.9 correctly on average (SD = 4.01, range = 10–27), and the mean confidence rating was 2.08 (SD = 0.39, range = 1.25–2.85). Coding of speech disfluency and hand gesture rates is in progress. Our findings will contribute to understanding the multimodal nature of language and the role of metacognition in speech and gesture production.
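The abstract does not give the exact model specification, so the following is only one plausible sketch of how the two hypotheses could map onto linear mixed-effects models with crossed random intercepts for participants and questions. All symbols are illustrative assumptions, not the authors' notation: i indexes participants, j indexes questions, Conf is the 4-point confidence rating, Disfl and Gest are hypothetical per-trial disfluency and gesture measures, and Vis is a hypothetical indicator (1 = visible listener, 0 = nonvisible listener), assumed to vary between participants because each participant answered "either with a visible or a nonvisible listener".

```latex
% Hypothesis 1 and 2 (sketch): confidence predicted by disfluencies and gestures,
% with interactions testing whether their role differs by listener visibility.
\begin{aligned}
\text{Conf}_{ij} &= \beta_0 + \beta_1\,\text{Disfl}_{ij} + \beta_2\,\text{Gest}_{ij} + \beta_3\,\text{Vis}_i
  + \beta_4\,(\text{Disfl}_{ij} \times \text{Vis}_i) + \beta_5\,(\text{Gest}_{ij} \times \text{Vis}_i)
  + u_i + v_j + \varepsilon_{ij},\\[4pt]
% Setting effect (sketch): disfluency rate as a function of visibility;
% an analogous model would use Gest_{ij} as the outcome.
\text{Disfl}_{ij} &= \gamma_0 + \gamma_1\,\text{Vis}_i + u'_i + v'_j + e_{ij},\\[4pt]
% Random intercepts for participants (u) and questions (v), plus residual error.
u_i &\sim \mathcal{N}(0, \sigma_u^2), \quad v_j \sim \mathcal{N}(0, \sigma_v^2), \quad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma_\varepsilon^2).
\end{aligned}
```

In this sketch, the participant intercept u_i absorbs stable individual differences (for example, generally more confident or more fluent speakers), and the question intercept v_j absorbs differences in question difficulty, which is the variability the analysis is described as accounting for. The primed terms in the second model are assumed to follow analogous normal distributions. Because visibility is assumed to be a between-participant factor, β3 and γ1 would be estimated against the spread of the participant intercepts rather than against trial-level noise.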
