Haidee Joy Paterson1, Ben Lang2,3, Zainab Hermes4, Samantha Wray3,5, Osama Abdullah1, Alec Marantz3, and Hadi Zaatiti6
1Core Technology Platform, New York University Abu Dhabi, Abu Dhabi, United Arab Emirates, 2University of California San Diego, San Diego, CA, United States, 3New York University, New York, NY, United States, 4The University of Chicago, Chicago, IL, United States, 5Dartmouth College, Hanover, NH, United States, 6New York University Abu Dhabi, Abu Dhabi, United Arab Emirates
Synopsis
Motivation: This study seeks to overcome the challenges associated with characterizing vocal tract articulation during speech.
Goal(s): The primary objective of this study is to outline the technical configuration required to optimize real-time MRI with high temporal resolution while synchronizing it with audio recordings.
Approach: We employed a single slice FLASH sequence, an MRI-compatible optical microphone, and a signal generator. This setup enables precise synchronization of MRI image acquisition with audio recordings during speech production.
Results: Our findings showcase the practicality of our setup in studying Arabic speech articulation, both in letter pronunciation and the articulation of whole words, encompassing various dialects.
Impact: This research highlights the technical intricacies involved in integrating real-time MRI of the vocal tract with synchronized speech production, introducing an innovative application previously unexplored in linguistic research to address challenging linguistic problems.
Background
Studying the vocal tract during speech presents unique challenges for fully characterizing articulation and the production of phonetic elements such as pharyngeal sounds [1], [2]. One primary challenge is achieving a frame rate high enough to capture the rapid and intricate movements within the vocal tract during speech. Real-time MRI holds considerable promise in this regard, because it allows the dynamic processes within the vocal tract to be visualized. However, the dynamic nature of speech production and the need to precisely synchronize MRI acquisition with voice recordings introduce technical difficulties. Accurate synchronization ensures that dynamic MRI images align with the corresponding voice recordings, so that speech sounds can be correlated with vocal tract movements. This work describes a technical configuration for high-frame-rate real-time MRI with audio synchronization, using commercially available sequences and equipment commonly found in a physics lab, such as a signal generator.
Materials and Methods
All studies were conducted on a 3T Siemens Prisma scanner running X-numaris X31 software and equipped with a 64-channel head coil. To optimize real-time MRI acquisition, we compared the true-FISP sequence with traditional FLASH sequences in terms of image quality, artifacts, and temporal speed. Our results (Figure 1) supported the use of the FLASH sequence to acquire a single midline sagittal slice at approximately 10 frames per second (fps) for 10 seconds with the following parameters: repetition time (TR) of 104.6 ms, echo time (TE) of 1.33 ms, flip angle of 10 degrees, slice thickness of 10 mm, spatial resolution of 0.9 × 0.9 × 10 mm, FOV of 230 mm, acceleration factor (GRAPPA) of 3, smoothing filter turned on, and interpolation enabled. Audio was recorded inside the MRI scanner with an Optoacoustics optical microphone, as described in Figure 2. To synchronize MRI image acquisition with speech production, we used an Agilent waveform generator to trigger both the gradient echo sequence (with the external trigger option selected on the MRI console) and the optical microphone, which was positioned approximately 1-2 cm from the subject's mouth. Custom Python code was used to temporally align the onset of the first MRI image with the onset of speech production and to save a video file of the MRI in sync with the spoken audio. A MATLAB toolkit [3] was then used to automatically detect the contours of the vocal tract in each MRI frame.
Results and Conclusion
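The synchronized displays discussed in this section depend on the temporal alignment step described in Materials and Methods. The following is a minimal sketch of that idea, not the authors' code: it assumes the audio recording and the first MRI frame start on the same trigger, and that frames are evenly spaced by the reported TR of 104.6 ms. The function names, the `offset_s` parameter, and the 44.1 kHz sample rate are hypothetical and used only for illustration.

```python
import numpy as np

TR_S = 0.1046  # repetition time reported in the text (104.6 ms per frame)


def frame_sample_ranges(n_frames, sample_rate_hz, tr_s=TR_S, offset_s=0.0):
    """Return an (n_frames, 2) array of [start, stop) audio sample indices.

    Row i covers the audio recorded while MRI frame i was acquired.
    offset_s is an assumed delay (in seconds) between the start of the
    audio recording and the first frame; 0.0 means a shared trigger.
    """
    # Frame boundaries in seconds: n_frames + 1 edges for n_frames frames.
    edges_s = offset_s + np.arange(n_frames + 1) * tr_s
    edges = np.round(edges_s * sample_rate_hz).astype(int)
    return np.column_stack([edges[:-1], edges[1:]])


def frame_at_time(t_s, tr_s=TR_S, offset_s=0.0):
    """Index of the MRI frame being acquired at audio time t_s (seconds)."""
    return int(np.floor((t_s - offset_s) / tr_s))


# Illustrative usage: a 10 s acquisition and a hypothetical 44.1 kHz recording.
n_frames = int(10.0 // TR_S)  # number of complete frames in 10 s
ranges = frame_sample_ranges(n_frames, sample_rate_hz=44100)
audio = np.zeros(10 * 44100)  # placeholder for the recorded waveform
first_frame_audio = audio[ranges[0, 0]:ranges[0, 1]]
```

With such a mapping, an event marked on the audio waveform or spectrogram can be looked up with `frame_at_time` to find the corresponding MRI frame. In practice a measured trigger delay would replace the assumed `offset_s` of zero.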
In Figure 3, we present an example of synchronized audio and MRI acquisitions, showing a subject uttering three different Arabic syllables. The first row displays the audio recording and the second row shows the time-frequency analysis (i.e., spectrogram); two example MRI timepoints during syllable pronunciation are also shown. This capability enables linguists to identify key tongue positions in various tasks. Figure 4 depicts two pairs of similar Arabic letters (ta and tta, and ka and kaf). Note the differences in tongue position and shape, made discernible by the synchronization of real-time MRI and audio recording. Figure 5 shows that speakers of different Arabic dialects pronouncing identical words (/ʕiʒʒa/ and /ħiʒʒa/) implement similar overall constriction in the pharynx and larynx for /ħ/ and /ʕ/, regardless of dialect of origin. Importantly, constriction for the pharyngeal consonant does not appear to be isolated to the tongue root. Retraction-constriction is also present in the larynx during the utterance of whole words in various Arabic dialects. In summary, this study outlines the technical intricacies involved in integrating real-time MRI of the vocal tract with synchronized speech production, introducing an innovative application previously unexplored in linguistic research. We showcase the practicality of our synchronized real-time MRI and audio recording for addressing otherwise challenging linguistic problems.
Acknowledgements
The experiments described herein were conducted using the facilities of the NYUAD Brain Imaging Core Technology Platform.
References
[1] A. Toutios and S. Narayanan, “Advances in real-time magnetic resonance imaging of the vocal tract for speech science and technology research,” Physiol. Behav., vol. 176, no. 3, pp. 139–148, 2017.
[2] A. Niebergall et al., “Real-time MRI of speaking at a resolution of 33 ms: Undersampled radial FLASH with nonlinear inverse reconstruction,” Magn. Reson. Med., vol. 69, no. 2, pp. 477–485, 2013.
[3] M. Belyk, C. Carignan, and C. McGettigan, “An open-source toolbox for measuring vocal tract shape from real-time magnetic resonance images,” Behav. Res. Methods, 2023.