5146

Real-time MRI of the larynx: detecting phonation contrasts
Sarah E Johnson1, Marissa Barlaz1, Shuju Shi1, Ryan K Shosted1, and Brad P Sutton2

1Linguistics, University of Illinois at Urbana-Champaign, Urbana, IL, United States, 2Bioengineering, University of Illinois at Urbana-Champaign, Urbana, IL, United States

Synopsis

The present study assesses the ability of rt-MRI to detect subtle laryngeal configuration changes during varying phonation contrasts. One subject lay supine within a 3T Siemens Trio scanner while producing a variety of phonation types including breathy, modal, and creaky voice. An analysis of axial and coronal slices of the larynx detected predictable changes at the ventricular folds, vocal folds and arytenoid cartilages. We conclude that rt-MRI of the larynx may have further application in the study of phonation in both research and clinical settings as a non-invasive measure of laryngeal function.

Introduction

Advances in rt-MRI allow researchers to image the posterior vocal tract, including the structures of the pharynx, epilarynx, and larynx. This method has advantages for speech and clinical research, including its non-invasive nature and the ability to image multiple regions of the vocal tract simultaneously. In the past, radial fast low-angle shot (FLASH) MRI has been used to obtain time-varying coronal images of the larynx. Using this method, researchers have reliably detected glottal adduction during swallowing1 and voiced consonants.2 However, it is not clear whether more subtle laryngeal configurations, such as phonation contrasts, can also be detected using this method. Accordingly, the present study explores whether rt-MR images can be used to differentiate glottal closure, breathy, creaky, and modal phonation.

Methods

The participant (a trained phonetician) lay supine within a 3T Siemens Trio scanner while producing a variety of test utterances containing phonation type contrasts, as well as complete glottal closure (glottal stop). rt-MR images were obtained using the partial separability model,3,4 yielding approximately 25 frames per second when acquiring 4 slices composed of 128 x 128 voxels at 2.2 mm x 2.2 mm x 8.0 mm (through-plane depth). Two 5-minute scans were collected. In the first scan, 4 axial slices transected the laryngeal region, including both the arytenoid cartilages and the vocal folds. In the second scan, 4 coronal slices transected the neck from anterior at the thyroid notch to posterior behind the cricoid cartilage. We analyzed three slices: an axial slice at the arytenoid cartilages, an axial slice at the glottis, and a coronal slice at the center of the anterior-posterior axis through the vocal folds (Figure 1).

Results

Visual inspection of the images revealed greater constriction at the vocal and ventricular folds and lower larynx during creaky phonation and glottal stop (Figure 2). Principal component analyses of pixel intensity2 corroborated these qualitative observations in both the coronal and axial orientations. A pixel intensity analysis of the entire laryngeal vestibule across phonation conditions revealed differences in the adduction of tissues over time (Figure 3). For this measure, brighter pixel intensity is a result of more matter (soft tissue) within a given region of interest. Results indicate the presence of more soft tissue within the glottal and ventricular regions, most likely due to medial adduction of the soft tissue surrounding the laryngeal lumen. When quantified, the greatest degree of adduction was observed in glottal stop, followed by creaky, modal, and breathy phonation in that order. A control set of images of quiet breathing showed the lowest pixel intensity, indicative of a wider laryngeal lumen.
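The region-of-interest measure described above can be sketched as follows. This is a minimal Python illustration on synthetic data, not the analysis code used in the study; the function names `roi_mean_intensity` and `normalize_time`, the ROI location, and the simulated adduction ramp are all assumptions made for the example.

```python
import numpy as np

def roi_mean_intensity(frames, roi_mask):
    """Mean pixel intensity inside a region of interest, one value per frame.

    frames:   array of shape (n_frames, height, width)
    roi_mask: boolean array of shape (height, width), True inside the ROI
    """
    frames = np.asarray(frames, dtype=float)
    return frames[:, roi_mask].mean(axis=1)

def normalize_time(n_frames):
    """Map the frame indices of one repetition onto the interval [0, 1]."""
    return np.linspace(0.0, 1.0, n_frames)

# Synthetic stand-in for one repetition (hypothetical data, not MR images):
# a dark lumen that gradually fills with bright soft tissue.
rng = np.random.default_rng(0)
n_frames, size = 25, 128
frames = rng.normal(0.1, 0.01, (n_frames, size, size))

roi = np.zeros((size, size), dtype=bool)
roi[60:68, 60:68] = True                      # hypothetical laryngeal ROI

closure = np.linspace(0.0, 1.0, n_frames)     # simulated adduction ramp
frames[:, roi] += closure[:, None]            # brighter = more tissue in the ROI

intensity = roi_mean_intensity(frames, roi)   # one value per frame
t = normalize_time(n_frames)                  # normalized time axis
```

Under these assumptions, `intensity` rises over normalized time as the simulated lumen closes, which mirrors the interpretation that brighter pixels reflect more soft tissue within the region of interest.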

Discussion

We interpret these results as increased laryngeal and ventricular approximation and lowered larynx during the production of glottal stop and creaky voice. These articulations are traditionally associated with a raised larynx.6 One explanation may be that the participant is a native Mandarin speaker. The low Mandarin tone is often produced with creaky phonation and sometimes a lowered larynx.7 The speaker may have extended the creaky phonation task to coincide with a low tone and hence a lowered larynx. The relatively nuanced articulatory contrast between breathy and modal phonation, which is traditionally characterized in part by relative abduction of the glottis, was automatically captured in our analysis using PCA of pixel intensity. These results demonstrate that current rt-MRI methods and technology can be used to extract features of fine articulatory distinctions in phonation type and in the degree of vocal and ventricular fold approximation.
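As an illustration of how PCA of pixel intensity can separate two laryngeal configurations, the sketch below treats each frame as a vector of pixel intensities and computes principal components with a plain SVD. The abstract does not specify the PCA procedure used in the study, so the function name `pca_pixel_intensity`, the image size, and the two synthetic conditions are assumptions for the example only.

```python
import numpy as np

def pca_pixel_intensity(frames, n_components=2):
    """PCA over frames, where each pixel is one variable.

    frames: array of shape (n_frames, height, width)
    Returns per-frame scores, component images, and explained-variance ratios.
    """
    n_frames = frames.shape[0]
    X = frames.reshape(n_frames, -1).astype(float)
    X = X - X.mean(axis=0)                           # center each pixel over time
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]  # projection of each frame
    components = Vt[:n_components].reshape((n_components,) + frames.shape[1:])
    explained = (S ** 2) / np.sum(S ** 2)
    return scores, components, explained[:n_components]

# Two hypothetical conditions: 20 frames with an open (dark) lumen and
# 20 frames with an adducted (bright) lumen, plus a little noise.
rng = np.random.default_rng(1)
size = 32
frames = rng.normal(0.1, 0.01, (40, size, size))
frames[20:, 12:20, 12:20] += 0.5                     # simulated adduction

scores, components, explained = pca_pixel_intensity(frames)
open_scores, closed_scores = scores[:20, 0], scores[20:, 0]
```

In this synthetic case the first component carries most of the variance and its scores fall into two non-overlapping groups, one per condition. The sign of a principal component is arbitrary, so only the separation between groups is meaningful, not which group scores higher.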

Conclusion

We find that rt-MRI can be used to accurately distinguish laryngeal configurations associated with differing phonation types and degrees of glottal closure. We also find that rt-MRI can be used to reliably detect differences in laryngeal height associated with phonation contrasts. We conclude that rt-MRI of the larynx can have further application in the study of phonation in both research and clinical settings. Our methods of data collection, reconstruction, and automatic machine learning of rt-MR image features may be extended to provide a non-invasive measure of laryngeal function in clinical settings for patients with dysphonia and other voice or resonance disorders.

Acknowledgements

This material is based upon work supported by the National Science Foundation Graduate Research Fellowship Program under Grant No. DGE - 1144245.

References

1. Zhang S, Olthoff A, Frahm J. Real-time magnetic resonance imaging of normal swallowing. J Magn Reson Imaging. 2012 Jun;35(6):1372-9.

2. Niebergall A, Zhang S, Kunay E, Keydana G, Job M, Uecker M, Frahm J. Real-time MRI of speaking at a resolution of 33 ms: undersampled radial FLASH with nonlinear inverse reconstruction. Magn Reson Med. 2013 Feb;69(2):477-85.

3. Fu M, Zhao B, Carignan C, Shosted RK, Perry JL, Kuehn DP, Liang ZP, Sutton BP. High-resolution dynamic speech imaging with joint low-rank and sparsity constraints. Magn Reson Med. 2015 May;73(5):1820-32.

4. Liang Z-P. Spatiotemporal imaging with partially separable functions. 4th IEEE International Symposium on Biomedical Imaging: From Nano to Macro; 2007; Arlington, Virginia. New York: Curran Associates, Inc; p. 988–91.

5. Carignan C, Shosted RK, Fu M, Liang Z-P, Sutton BP. A real-time MRI investigation of the role of lingual and pharyngeal articulation in the production of the nasal vowel system of French. J Phonetics. 2015;50:34-51.

6. Hardcastle WJ, Beck JM. A figure of speech: a Festschrift for John Laver. Mahwah, New Jersey: Erlbaum; 2005. Esling JH, Harris JG. States of the glottis: an articulatory phonetic model based on laryngoscopic observations; p. 347-83.

7. Moisik S, Lin H, Esling JH. A study of laryngeal gestures in Mandarin citation tones using simultaneous laryngoscopy and laryngeal ultrasound (SLLUS). J International Phonetic Association. 2014;44:21–58.

8. Otsu N. A threshold selection method from gray-level histograms. IEEE Transactions on Systems, Man, and Cybernetics. 1979;9(1):62-6.

Figures

Scan regions (from left): axial arytenoids, axial vocal folds, coronal medial vocal folds.

Pixel intensity during glottal stop (left) and breathy phonation (right). The images were post-processed with Matlab 2015b (im2bw), with thresholding determined using the Otsu8 method, which minimizes the intraclass variance of the white and black pixel classes.

Smoothing spline ANOVA (SSANOVA) of time-varying pixel intensity of glottal stop, breathy, creaky, modal phonations, and silent breathing. A higher pixel intensity value indicates more matter (soft tissue) within a frame. Normalized time is measured from the beginning to the end of each repetition at 25 fps.
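The Otsu thresholding mentioned in the caption for the glottal stop and breathy images can be sketched in a few lines. The figures themselves were produced in Matlab; the NumPy version below is an independent illustration, not the authors' code, and the function name `otsu_threshold`, the bin count, and the synthetic two-class image are assumptions. It scans candidate thresholds over a histogram and keeps the one that best separates dark and bright pixels.

```python
import numpy as np

def otsu_threshold(image, n_bins=256):
    """Return the gray level that best separates two pixel classes (Otsu, 1979)."""
    hist, edges = np.histogram(np.ravel(image), bins=n_bins)
    centers = (edges[:-1] + edges[1:]) / 2
    p = hist / hist.sum()                      # probability of each bin

    w0 = np.cumsum(p)                          # weight of the dark class
    w1 = 1.0 - w0                              # weight of the bright class
    mu_cum = np.cumsum(p * centers)            # cumulative mean
    mu_total = mu_cum[-1]

    # Between-class variance for every candidate split; maximizing it is
    # equivalent to minimizing the within-class variance.
    denom = w0 * w1
    between = np.zeros_like(denom)
    valid = denom > 0
    between[valid] = (mu_total * w0[valid] - mu_cum[valid]) ** 2 / denom[valid]

    k = int(np.argmax(between))
    return edges[k + 1]                        # upper edge of the last dark bin

# Hypothetical image: dark lumen (about 0.2) next to bright tissue (about 0.8).
rng = np.random.default_rng(2)
truth = np.zeros((64, 64), dtype=bool)
truth[:, 32:] = True                           # right half is "tissue"
image = np.where(truth, 0.8, 0.2) + rng.normal(0.0, 0.02, truth.shape)

threshold = otsu_threshold(image)
binary = image > threshold                     # True = white (tissue) pixels
```

On this well-separated synthetic image the binary mask recovers the two halves exactly. Real MR frames have overlapping intensity distributions, so the resulting mask would be noisier.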

Proc. Intl. Soc. Mag. Reson. Med. 26 (2018)