One-Second 3-D-Imaging of the Vocal Tract to Measure Dynamic Articulator Modifications

Michael Burdumy¹,², Matthias Echternach², Jan Gerrit Korvink³, Bernhard Richter², Jürgen Hennig¹, and Maxim Zaitsev¹

¹Medical Physics, University Medical Center Freiburg, Freiburg, Germany, ²Institute of Musicians' Medicine, University Medical Center Freiburg, Freiburg, Germany, ³Institute of Microstructure Technology, Karlsruhe Institute of Technology, Karlsruhe, Germany

Synopsis

To accelerate dynamic 3-D imaging of the vocal tract during articulation, a stack-of-stars sequence with golden-angle rotation and iterative reconstruction was implemented. Phase correction, peripheral under-sampling, and temporal and spatial regularization were applied to reach an acquisition time of 1.3 seconds per volume. The vocal tract modifications of one subject could be successfully analyzed at discrete time steps during phonation of a long note.

Purpose

This work aims to improve the temporal resolution of three-dimensional vocal tract (VT) imaging while retaining sufficient spatial resolution to analyze morphometric changes during speech or singing. The method enables dynamic speech or singing tasks in the MR system, where the recorded 3-D images can be analyzed via distance measurements, volumetric acoustic simulations, or rapid-prototyping print-outs of segmentations.

Methods

A radial stack-of-stars sequence with golden-angle rotation was modified such that dynamic changes in the VT could be depicted at high spatial and temporal resolution. A previously reported¹ radio-frequency-spoiled radial GRE sequence was extended to 3-D imaging by adding a phase-encoding gradient and phase spoilers in the sagittal direction. Sequence parameters of the RF-spoiled stack-of-stars sequence: TR = 2.9 ms, TE = 1.4 ms, FA = 6°, FOV 200 x 200 x 62 mm³, voxel size 1.56 x 1.56 x 1.3 mm³, BW 1524 Hz/pixel.

The inner loop of the sequence cycles through all phase encodes at different projection angles. Hence, complete coverage of all partitions is reached in a short time. In the outer loop, each projection is rotated with respect to the previous one of the same partition. Furthermore, the peripheral parts of k-space in the phase-encoding direction are sampled less frequently, with a linear decrease in the number of projections from the k-space center towards both peripheries (Figure 1). The measurement was split into temporally separated parts, such that 21 projections were measured in the center partition and 5 projections in the outermost partition, leading to a total of 456 projections and a temporal resolution of 1.322 s. Coil sensitivities were calculated from the data itself for each time frame, as described previously¹.
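The sampling scheme and its timing can be illustrated with a minimal sketch. The golden angle of about 111.25° is the value commonly used for golden-angle radial sampling; the function names and the 9-partition example are hypothetical, since the abstract does not state the number of partitions or how angles are ordered across partitions. The example therefore does not reproduce the 456 projections of the actual protocol; only the final timing check uses numbers from the text.

```python
import math

# Golden angle for radial sampling: 180 degrees divided by the golden ratio.
GOLDEN_ANGLE_DEG = 180.0 / ((1.0 + math.sqrt(5.0)) / 2.0)


def projections_per_partition(n_partitions, n_center, n_edge):
    """Projection count per partition, decreasing linearly from the
    center partition towards both edges (rounded to whole projections)."""
    center = (n_partitions - 1) / 2.0
    counts = []
    for p in range(n_partitions):
        # rel is 0 at the center partition and 1 at the outermost ones.
        rel = abs(p - center) / center if center > 0 else 0.0
        counts.append(round(n_center + (n_edge - n_center) * rel))
    return counts


def golden_angles(n_projections, start_deg=0.0):
    """Successive projection angles within one partition, wrapped to [0, 180)."""
    return [(start_deg + i * GOLDEN_ANGLE_DEG) % 180.0
            for i in range(n_projections)]


def frame_duration_s(total_projections, tr_ms):
    """Acquisition time of one frame: one TR per projection."""
    return total_projections * tr_ms / 1000.0


# Hypothetical 9-partition example with 21 center and 5 edge projections.
counts = projections_per_partition(9, n_center=21, n_edge=5)
angles = golden_angles(counts[4])

# Values from the text: 456 projections at TR = 2.9 ms.
duration = frame_duration_s(456, 2.9)
```

With the values from the text, `frame_duration_s(456, 2.9)` gives about 1.322 s, matching the stated temporal resolution.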

With the measured signal $$$S$$$, the original image $$$x$$$ can be calculated by minimizing a functional of the form $$f(x)=||Ax-S||_2+\lambda_1 TV_{temporal}(x)+\lambda_2 TV_{spatial}(x),$$ where $$$A$$$ is the forward encoding operator that includes coil sensitivities, Fourier encoding and projection onto a grid. The pixel-wise total variation operator $$$TV_{temporal}$$$ enforces sparsity in time², while $$$TV_{spatial}$$$ enforces sparsity in the spatial domain. The minimization problem was solved with the method of non-linear conjugate gradients in MATLAB (R2014b). $$$A$$$ was implemented using gpuNUFFT³, which is based on the Image Reconstruction Toolbox⁴.
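To make the structure of the functional concrete, the following toy sketch evaluates it for a short series of real-valued 1-D frames. It is not the reconstruction used in this work: the forward operator is passed in as a plain function (an identity in the example, whereas the real $$$A$$$ contains coil sensitivities, Fourier encoding and gridding), the total variation terms are assumed to be sums of absolute finite differences, and all names are hypothetical.

```python
import math


def tv_temporal(frames):
    """Sum of absolute pixel-wise differences between consecutive frames."""
    return sum(abs(b - a)
               for prev, nxt in zip(frames, frames[1:])
               for a, b in zip(prev, nxt))


def tv_spatial(frames):
    """Sum of absolute differences between neighbouring pixels in each frame."""
    return sum(abs(b - a)
               for frame in frames
               for a, b in zip(frame, frame[1:]))


def data_term(frames, measured, forward):
    """l2 norm of the residual A x - S, accumulated over all frames."""
    residual_sq = 0.0
    for frame, signal in zip(frames, measured):
        residual_sq += sum((p - q) ** 2
                           for p, q in zip(forward(frame), signal))
    return math.sqrt(residual_sq)


def cost(frames, measured, forward, lam_t, lam_s):
    """f(x) = ||A x - S||_2 + lam_t * TV_temporal(x) + lam_s * TV_spatial(x)."""
    return (data_term(frames, measured, forward)
            + lam_t * tv_temporal(frames)
            + lam_s * tv_spatial(frames))


def identity(frame):
    # Stand-in for the forward operator (assumption for this toy example).
    return list(frame)


frames = [[0.0, 1.0, 1.0], [0.0, 1.0, 2.0]]
value = cost(frames, frames, identity, lam_t=0.5, lam_s=0.1)
```

In this example the data term vanishes because the "measurement" equals the image, so the cost consists only of the two weighted regularization terms. Increasing `lam_t` penalizes frame-to-frame changes more strongly, which is why overly strong temporal regularization can suppress fast articulator movements.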

As an example of practical application, results of a 28-year-old female subject are shown. Data were acquired in the supine position in a 3T Prisma (Siemens, Erlangen, Germany) with the manufacturer’s 64-channel head/neck coil. The untrained singer was required to hold the note C5 for as long as possible and was explicitly instructed to sing past her comfortable resting expiratory level. In the reconstructed images, the larynx height was measured from a mid-sagittal slice, as described in Figure 2 and in reference¹. The air-filled VT was manually segmented for all time points using ITK-SNAP⁵, and the binary volume segmentations were added to each other to find regions of morphometric change.
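The addition of binary segmentations can be sketched as a voxel-wise occupancy count. This is an illustrative sketch only: masks are represented as flat lists of 0/1 values, and the function names are hypothetical rather than part of ITK-SNAP or of the authors' pipeline.

```python
def occupancy_map(masks):
    """Voxel-wise sum of binary masks (flat lists of 0/1) over all time steps."""
    n_voxels = len(masks[0])
    if any(len(m) != n_voxels for m in masks):
        raise ValueError("all masks must have the same number of voxels")
    return [sum(m[i] for m in masks) for i in range(n_voxels)]


def changing_voxels(masks):
    """Indices of voxels occupied in some, but not all, time steps."""
    n_steps = len(masks)
    return [i for i, count in enumerate(occupancy_map(masks))
            if 0 < count < n_steps]


# Three time steps of a four-voxel toy segmentation.
masks = [
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [1, 1, 0, 0],
]
occupancy = occupancy_map(masks)
changed = changing_voxels(masks)
```

Voxels whose count equals zero or the number of time steps never change and can be hidden, so that only regions of morphometric change remain visible.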

Results

The subject was able to hold the note for 20s, while the sequence ran for 25s. The correct pitch could be confirmed by a trained musician with the help of the scanner’s microphone system. Image reconstruction of this data-set was performed in about four hours on a single CPU and GPU. The contrast and artifact levels were sufficient to identify the landmarks for the larynx height. This parameter was approximately constant for about 12s, but then showed a decrease towards the end of phonation (Figure 3). A closing of the mouth, raising of the tongue and opening of the uvula could be seen at the end of the recording, when the subject started normal breathing. Regarding the segmentations (Figure 4), the addition of the binary models of all time steps showed changes in the region of the lips, frontal part of the tongue and in all three dimensions of the larynx region (Figure 5).

Discussion

It has been suspected that untrained singers modify the configuration of the VT when air runs out at the end of a long note, because of a lack of sub-glottic pressure. This study could confirm such modifications in a singing subject in all three dimensions, especially of the tongue and the larynx. Both the presented images and previous studies confirm that spatial resolutions finer than 2 mm are required to identify the small structures of the VT⁶. However, previous studies were limited to static or repetitive tasks because of their longer acquisition times.

Although the presented method enables high under-sampling factors, the singing or speech tasks and the regularization parameters must be chosen carefully; otherwise, fast modifications are suppressed.

Conclusion

With an under-sampling factor of 13 compared to full Cartesian sampling, the acquisition of one volume per second opens up new possibilities to research vocal tract acoustics and articulator modifications during dynamic tasks.
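The stated under-sampling factor can be checked with a short calculation. The matrix dimensions below are an assumption: they are inferred by rounding the FOV divided by the resolution listed in the Methods, since the abstract does not state them explicitly, and the function names are hypothetical.

```python
def matrix_size(fov_mm, resolution_mm):
    """Number of samples along one axis, inferred by rounding FOV / resolution."""
    return round(fov_mm / resolution_mm)


def undersampling_factor(n_lines_in_plane, n_partitions, n_projections):
    """Ratio of fully sampled Cartesian lines to acquired projections."""
    return n_lines_in_plane * n_partitions / n_projections


n_lines = matrix_size(200.0, 1.56)      # in-plane phase-encoding lines
n_partitions = matrix_size(62.0, 1.3)   # partitions in the sagittal direction
factor = undersampling_factor(n_lines, n_partitions, 456)
```

Under these assumptions a fully sampled Cartesian volume would need 128 x 48 = 6144 lines, which gives a factor of about 13.5 for 456 projections, consistent with the factor of 13 stated above.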

Acknowledgements

This work was supported by DFG grants ZA422/3-3 and RI1050/4-3.

References

¹Burdumy, M; Traser, L; Richter, B; Echternach, M; Korvink, JG; Hennig, J; Zaitsev, M. “Acceleration of MRI of the Vocal Tract Provides Additional Insight into Articulator Modifications.” Journal of Magnetic Resonance Imaging, 2015, doi:10.1002/jmri.24857.

²Feng, L; Grimm, R; Block, KT; Chandarana, H; Kim, S; Xu, J; Axel, L; Sodickson, DK; Otazo, R. “Golden-Angle Radial Sparse Parallel MRI: Combination of Compressed Sensing, Parallel Imaging, and Golden-Angle Radial Sampling for Fast and Flexible Dynamic Volumetric MRI: iGRASP: Iterative Golden-Angle RAdial Sparse Parallel MRI.” Magnetic Resonance in Medicine, 2013, doi:10.1002/mrm.24980.

³Knoll, F; Schwarzl, A; Diwoky, C; Sodickson, DK. “gpuNUFFT - An Open-Source GPU Library for 3D Gridding with Direct Matlab Interface.” Proc ISMRM p4297, 2014.

⁴Fessler, JA; Sutton, BP. “Nonuniform Fast Fourier Transforms Using Min-Max Interpolation.” IEEE Transactions on Signal Processing 51, no. 2, 2003: 560–74, doi:10.1109/TSP.2002.807005.

⁵Yushkevich, PA; Piven, J; Hazlett, HC; Smith, RG; Ho, S; Gee, JC; Gerig, G. “User-guided 3D active contour segmentation of anatomical structures: Significantly improved efficiency and reliability.” Neuroimage, 2006, 31(3):1116-28.

⁶Scott, AD; Wylezinska, M; Birch, MJ; Miquel, ME. “Speech MRI: Morphology and Function.” Physica Medica 30, no. 6: 604–18, 2014, doi:10.1016/j.ejmp.2014.05.001.

Figures

Figure 1: K-space coordinates in sagittal direction for one acquisition window and 456 projections. Sampling is most dense at the mid-time-point and under-sampled closer to the beginning and end.

Figure 2: Mid-sagittal slice of the vocal tract. The crosses depict the landmarks needed to calculate the larynx height parameter (from left to right): anterior commissure of the larynx, anterior tubercle of the atlas, highest frontal point of the sixth cervical vertebra.

Figure 3: Calculated larynx height parameter over time. Note that the y-axis is reversed, because a rise of the larynx causes a decrease in the larynx height parameter. The red box marks the rise of the larynx towards the end of phonation.

Figure 4: Surface model of a manually segmented vocal tract cavity (green). The corresponding (interpolated) anatomical data is displayed by one exemplary sagittal slice (gray).

Figure 5: The gray anatomical images depict three orthogonal slices of the vocal tract at the beginning of phonation. The locations of the image slices are marked by the dotted white lines. Colored regions show the location of the segmented vocal tract over all time steps. Low numbers mean that a region was rarely occupied; large numbers mark more frequent occupation. Constant occupation is not displayed.

Proc. Intl. Soc. Mag. Reson. Med. 24 (2016)

1327