0940

Intermittently Tagged Real-Time MRI Reveals Internal Tongue Motion during Speech Production

Weiyi Chen¹, Dani Byrd², Shrikanth Narayanan¹, and Krishna S Nayak¹

¹Ming Hsieh Department of Electrical Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, CA, United States, ²Department of Linguistics, Dornsife College of Letters, Arts and Sciences, University of Southern California, Los Angeles, CA, United States

Synopsis

The tongue is arguably the most important articulator enabling human speech production. Current tagged MRI methods for studying internal tongue motion evolved from CINE cardiac techniques that rely on multiple repetitions with perfect synchronization. However, speech production, unlike cardiac motion, possesses great token, type and individual variability due to its voluntary, highly context-sensitive and information-encoding nature. In this work, we demonstrate tagged RT-MRI of speech production, without requiring any repetitions or synchronization for data re-binning. We demonstrate capture of several important tongue deformation patterns and their relative timing.

Introduction

The tongue is a complex biomechanical system composed of numerous intrinsic and extrinsic muscles [1], and it forms remarkably complex shapes during speech. Tagged CINE-MRI has been employed to analyze the motion of the internal tongue [2]–[4]. Current tagged MRI methods rely on repetition with perfect synchronization and have enabled nuanced analysis of cardiac motion [5], [6] during sinus rhythm, which is highly repeatable and can be synchronized to the ECG. Speech production differs from cardiac motion in important ways; notably, there is substantial token and type variability due to its voluntary, highly context-sensitive, and information-encoding nature.

In this work, we demonstrate a tagging method during RT-MRI for speech production without needing synchronization or repetition. We show that the proposed method can capture several unique motion patterns and their relative timing through measuring internal tongue deformation.

Method

Sequence: We utilize 1-3-3-1 SPAtial Modulation of Magnetization (SPAMM) tagging and rapid spiral GRE acquisition [7]. Figure 1 illustrates the pulse sequence timing, as implemented within a real-time imaging platform (HeartVista, Inc., Los Altos, CA, USA). Tagging is applied as a brief interruption to a continuous real-time spiral acquisition. It can be initiated manually by the operator, cued to the speech stimulus, or applied automatically at a fixed frequency. The standard 2D 1-3-3-1 SPAMM module places tags with 1 cm spacing in both in-plane directions and takes 5.66 ms. The imaging parameters were: FOV 20 cm, slice thickness 7 mm, readout duration 2.49 ms, TE/TR 0.71/5.58 ms, and 13 interleaves with bit-reversed view ordering. Tag persistence depends on the longitudinal relaxation time (T1) of the tongue muscle and on the imaging flip angle [8]. Tag persistence was simulated and experimentally measured.
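As an illustration of how a 1-3-3-1 SPAMM module produces a periodic tag pattern, the following minimal sketch applies ideal hard-pulse rotations along one in-plane direction. The total tagging flip angle of 90°, the instantaneous sub-pulses, and the neglect of relaxation and off-resonance are assumptions made for illustration and are not stated in this abstract; the function names are hypothetical.

```python
import math

def rot_x(m, angle):
    """Rotate magnetization vector m about the x axis (ideal RF sub-pulse)."""
    x, y, z = m
    c, s = math.cos(angle), math.sin(angle)
    return (x, c * y - s * z, s * y + c * z)

def rot_z(m, angle):
    """Rotate m about the z axis (phase accrued during a gradient blip)."""
    x, y, z = m
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)

def spamm_mz(x_cm, spacing_cm=1.0, total_flip_deg=90.0, weights=(1, 3, 3, 1)):
    """Longitudinal magnetization (relative to M0) right after the tag module.

    Assumed, not from the abstract: 90 degree total flip, hard pulses,
    and a gradient blip that adds 2*pi*x/spacing of phase between sub-pulses.
    """
    phase = 2.0 * math.pi * x_cm / spacing_cm
    total = sum(weights)
    m = (0.0, 0.0, 1.0)  # start at thermal equilibrium
    for i, w in enumerate(weights):
        m = rot_x(m, math.radians(total_flip_deg) * w / total)
        if i < len(weights) - 1:
            m = rot_z(m, phase)
    return m[2]

# Sample the profile across one tag period (1 cm spacing as in the abstract).
for x in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"x = {x:.2f} cm  Mz/M0 = {spamm_mz(x):.3f}")
```

In this idealized model the sub-pulses add coherently at multiples of the tag spacing, saturating the signal there, and cancel halfway between tags, leaving the magnetization untouched. Applying the same module along the second in-plane direction would give the 2D grid.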

Reconstruction: Gridding reconstruction with view-sharing was performed on-the-fly during data acquisition. A sliding window advancing by 5 TRs (27.9 ms) gave a frame rate of about 36 frames/sec. The end-to-end reconstruction latency was approximately 30 ms. This setup lets the operator observe the deformation of the tag lines in real time, to confirm that the subject completed the designed articulation task and that the tagging trigger was timed as intended.
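The timing figures above can be checked with a few lines of arithmetic. This sketch reads the 5-TR sliding window as the step between successive frames, with each frame drawing on all 13 interleaves through view-sharing; that reading is an interpretation of the stated parameters, and the variable names are hypothetical.

```python
TR_MS = 5.58          # imaging TR from the abstract
N_INTERLEAVES = 13    # interleaves needed to fully sample k-space
STEP_TRS = 5          # sliding-window step (assumed to be the frame spacing)

# Time between successive reconstructed frames.
frame_interval_ms = STEP_TRS * TR_MS

# Displayed frame rate implied by that spacing.
frame_rate_fps = 1000.0 / frame_interval_ms

# Span of data contributing to each fully sampled frame.
temporal_footprint_ms = N_INTERLEAVES * TR_MS

print(f"frame interval: {frame_interval_ms:.1f} ms")
print(f"frame rate: {frame_rate_fps:.1f} frames/sec")
print(f"temporal footprint: {temporal_footprint_ms:.2f} ms")
```

Under this reading, consecutive frames share most of their interleaves, so the frame rate is higher than the rate at which fully independent frames are acquired.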

Speech Experiments: We scanned 2 volunteers (27/M and 27/F), both native American English speakers, on a Signa Excite HD 1.5T scanner with a custom eight-channel upper-airway coil [9]. The American English diphthongs /aɪ/, /ɔɪ/ and /aʊ/ were studied because they involve substantial tongue movement when gliding from the initial to the final vowel position, and because the duration of these movements (~180 ms to 300 ms [10]) fits within the current imaging window. Images were evaluated qualitatively by visual assessment.

Results

Parameter selection: Figure 2 shows the trade-off between CNR-based tag persistence and image SNR when choosing the excitation flip angle. The dashed line in Figure 2(a) indicates the CNR-optimal flip angle, which delivers the longest threshold time. The Ernst angle for imaging the tongue is 6.2°, as shown in Figure 2(b). Figure 3 contains in-vivo tag persistence measurements in the human tongue. The measured signal agreed with the simulation for all imaging flip angles. At FA = 3° and 5°, the CNR stayed at or above the threshold level for more than 650 ms, with the latter giving 35% higher image SNR. Imaging with a very small flip angle was sensitive to B1 inhomogeneity, as the signal dropped dramatically when the flip angle was unintentionally reduced. Based on these considerations, we used a flip angle of 5° with an imaging window of 650-800 ms and an ending CNR of 5-6.
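The dependence of tag persistence on flip angle can be illustrated with a minimal sketch. It assumes an idealized spoiled GRE train in which the longitudinal difference between tagged and untagged tissue is scaled by E1·cos(α) at every TR, so the tag contrast decays with an apparent time constant shorter than T1. This is a textbook simplification and not necessarily the simulation used for Figure 2; the function names are hypothetical, and only TR = 5.58 ms and T1 = 850 ms are taken from this abstract.

```python
import math

T1_MS = 850.0  # tongue T1 at 1.5T, from the abstract
TR_MS = 5.58   # imaging TR, from the abstract

def effective_t1_ms(flip_deg, tr_ms=TR_MS, t1_ms=T1_MS):
    """Apparent decay constant of tag contrast under repeated excitation.

    Assumed model: the tagged/untagged Mz difference is multiplied by
    exp(-TR/T1) * cos(flip) at each TR.
    """
    alpha = math.radians(flip_deg)
    rate = 1.0 / t1_ms - math.log(math.cos(alpha)) / tr_ms
    return 1.0 / rate

def relative_tag_contrast(t_ms, flip_deg, tr_ms=TR_MS, t1_ms=T1_MS):
    """Tag signal contrast at time t, relative to M0, for fully saturated tags."""
    alpha = math.radians(flip_deg)
    e1 = math.exp(-tr_ms / t1_ms)
    n_excitations = t_ms / tr_ms
    return math.sin(alpha) * (e1 * math.cos(alpha)) ** n_excitations

# Compare a few flip angles at the start and at 650 ms after tagging.
for fa in (3, 5, 10):
    print(
        f"FA = {fa:2d} deg  T1_eff = {effective_t1_ms(fa):6.1f} ms  "
        f"contrast(0) = {relative_tag_contrast(0.0, fa):.4f}  "
        f"contrast(650 ms) = {relative_tag_contrast(650.0, fa):.4f}"
    )
```

Under this model a larger flip angle raises the initial contrast through sin(α) but shortens the apparent decay constant, which is the trade-off described above.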

Visualization of tongue deformation: Figure 4 reveals internal tongue movement during three example articulations of American English diphthongs. Several deformation patterns were observed in the images. For example, bent grid lines indicate curving/stretching of the tongue tip (green). Shear is identified by square grid cells deforming into parallelograms (cyan). Compression is recognized through deformation into bi-concave rectangles (tongue body magenta, tongue root yellow). These deformations occurred over the course of the diphthong articulation. Figure 5 shows a representative animated GIF of the diphthong articulation.
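The link between grid shape and deformation type can be made concrete with a small sketch. The images in this work were assessed visually, so the following is only a hypothetical way to quantify one tag cell. It assumes the cell deforms uniformly (straight edges), which would not hold for the bi-concave cells described above, and the function name and example edge vectors are made up for illustration.

```python
import math

TAG_SPACING_CM = 1.0  # tag spacing from the abstract

def cell_deformation(edge_x, edge_y, spacing=TAG_SPACING_CM):
    """Stretch ratios and shear angle of one tag cell.

    edge_x, edge_y: deformed edge vectors (cm) of a cell whose undeformed
    edges were (spacing, 0) and (0, spacing). Uniform deformation is assumed.
    Returns (stretch_x, stretch_y, shear_deg).
    """
    len_x = math.hypot(*edge_x)
    len_y = math.hypot(*edge_y)
    dot = edge_x[0] * edge_y[0] + edge_x[1] * edge_y[1]
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, dot / (len_x * len_y)))
    # Shear angle: how far the corner angle has closed from 90 degrees.
    shear_deg = 90.0 - math.degrees(math.acos(cos_angle))
    return len_x / spacing, len_y / spacing, shear_deg

# Hypothetical parallelogram: the top edge has slid 0.3 cm sideways.
print(cell_deformation((1.0, 0.0), (0.3, 1.0)))

# Hypothetical compressed cell: 20% shorter along y, corners still square.
print(cell_deformation((1.0, 0.0), (0.0, 0.8)))
```

A stretch ratio below 1 indicates compression along that edge, and a non-zero shear angle corresponds to a square cell turning into a parallelogram.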

Discussion

The proposed RT-MRI tagging method can substantially simplify data acquisition and avoid errors from the re-binning process. This new tool can provide insight into tongue function in vivo, for example in speech production by amyotrophic lateral sclerosis (ALS) patients [11] and by post-glossectomy patients compared to controls [12].

Conclusion

We demonstrate the feasibility of real-time tagged MRI of speech production to reveal the internal deformations of the tongue without requiring token repetitions. This approach is able to capture motion patterns, such as shear and compression, and their relative timing, as demonstrated using case examples of American English diphthong vowels.

Acknowledgements

This work was supported by NIH Grant R01DC007124 and NSF Grant 1514544. We thank Eric Peterson, William Overall and Juan Santos at HeartVista, Inc. for their support with the RTHawk Research system. We acknowledge the support and collaboration of the Speech Production and Articulation kNowledge (SPAN) group at the University of Southern California, Los Angeles, CA, USA.

References

[1] W. M. Kier and K. K. Smith, “Tongues, tentacles and trunks: the biomechanics of movement in muscular‐hydrostats,” Zool. J. Linn. Soc., vol. 83, no. 4, pp. 307–324, Apr. 1985.
[2] V. Parthasarathy, J. L. Prince, M. Stone, E. Z. Murano, and M. NessAiver, “Measuring tongue motion from tagged cine-MRI using harmonic phase (HARP) processing,” J. Acoust. Soc. Am., vol. 121, no. 1, pp. 491–504, Jan. 2007.
[3] J. Woo, M. Stone, Y. Suo, E. Z. Murano, and J. L. Prince, “Tissue-point motion tracking in the tongue from cine MRI and tagged MRI,” J. Speech. Lang. Hear. Res., vol. 57, no. 2, pp. S626-36, 2014.
[4] M. Stone et al., “Modeling the motion of the internal tongue from tagged cine-MRI images,” J. Acoust. Soc. Am., vol. 109, no. 6, pp. 2974–2982, Jun. 2001.
[5] M. L. Shehata, S. Cheng, N. F. Osman, D. A. Bluemke, and J. A. C. Lima, “Myocardial tissue tagging with cardiovascular magnetic resonance,” J. Cardiovasc. Magn. Reson., vol. 11, no. 1, p. 55, 2009.
[6] E.-S. H. Ibrahim, “Myocardial tagging by cardiovascular magnetic resonance: evolution of techniques--pulse sequences, analysis algorithms, and applications,” J. Cardiovasc. Magn. Reson., vol. 13, no. 1, p. 36, Jul. 2011.
[7] L. Axel and L. Dougherty, “Heart wall motion: improved method of spatial modulation of magnetization for MR imaging,” Radiology, vol. 172, no. 2, pp. 349–350, Aug. 1989.
[8] S. E. Fischer, G. C. McKinnon, S. E. Maier, and P. Boesiger, “Improved myocardial tagging contrast,” Magn. Reson. Med., vol. 30, no. 2, pp. 191–200, 1993.
[9] S. G. Lingala, Y. Zhu, Y.-C. Kim, A. Toutios, S. Narayanan, and K. S. Nayak, “A fast and flexible MRI system for the study of dynamic vocal tract shaping,” Magn. Reson. Med., vol. 77, no. 1, pp. 112–125, Jan. 2017.
[10] S. Lee, A. Potamianos, and S. Narayanan, “Developmental acoustic study of American English diphthongs,” J. Acoust. Soc. Am., 2014.
[11] E. Lee et al., “Magnetic resonance imaging based anatomical assessment of tongue impairment due to amyotrophic lateral sclerosis: A preliminary study,” J. Acoust. Soc. Am., vol. 143, no. 4, pp. EL248-EL254, Apr. 2018.
[12] M. Stone, J. Woo, J. Zhuo, H. Chen, and J. L. Prince, “Patterns of variance in /s/ during normal and glossectomy speech,” Comput. Methods Biomech. Biomed. Eng. Imaging Vis., 2014.

Figures

Figure 1. Speech RT-MRI with Intermittent Tagging. (a) Overall acquisition timing. Continuous imaging is performed using interleaved spiral GRE imaging (c, blue block) with view-sharing reconstruction. Thirteen interleaves with a bit-reversed interleaf order were used to fully sample k-space at each time frame. Tag placement is performed using two 1-3-3-1 SPAMM pulses along x and y (b, yellow block). Note that the second composite SPAMM pulse has a 90° relative phase shift and a slightly larger crusher to avoid stimulated echoes.

Figure 2. Simulation of tag persistence and steady-state signal as a function of imaging flip angle. Top: Threshold time is defined as the time span between the tag being placed and the tag CNR falling below the threshold value (shown for CNR cutoffs of 4, 5, 6, and 7). The dashed line marks the flip angles that will deliver the longest threshold time for each CNR threshold. Performance suffers quickly if the flip angle is too low, but less so if the flip angle is too high. Bottom: Steady state signal for the imaging TR=5.58ms and tongue T1=850ms at 1.5T. The Ernst angle is 6.2°.

Figure 3. Tag persistence in human tongue at 1.5T. Left: simulation (line) and measurement (symbol) of the tag line signal for the first 1.2 s after the tag module was applied. Right: contrast decay after the tag module was applied. Tongue T1=850ms was measured using an inversion recovery fast spin echo (IR-FSE) sequence with multiple inversion times. The signal and contrast were normalized by the standard deviation of the noise, measured in a separate scan with RF excitation turned off.

Figure 4. For the first time, tagged RT-MRI reveals internal tongue motion during American English diphthong articulation. The (a) American English Vowel Charts illustrate the tongue positions observed in the corresponding (b) representative frames. Arrows with different colors indicate the start of various motion patterns: (left to right) tongue tip deformation (green), shear (cyan), tongue body compression (magenta), and tongue root compression (yellow). Note that the relative timing among the motion patterns is also shown in (b): for example, deformation of the tongue tip (green) occurred earliest, followed by shear (cyan); compression of the tongue root (yellow) generally occurred last.

Figure 5 (Animated GIF). Tagged RT-MRI during the American English diphthongs /aɪ/, /ɔɪ/ and /aʊ/. /aɪ/ and /aʊ/ start with similar low and retracted tongue postures (note the pharyngeal narrowing); /aɪ/ and /ɔɪ/ end with similar postures of the tongue bunched up high in the palatal vault; and the starting posture of /ɔɪ/ is similar to the ending posture of /aʊ/, with the tongue high and retracted toward the velum (soft palate). Deformations are shown in detail in Figure 4.

Proc. Intl. Soc. Mag. Reson. Med. 27 (2019)
