1083

Spatiotemporal atlas driven reconstruction of dynamic speech imaging

Riwei Jin¹, Fangxu Xing², Imani Gilbert³, Jamie Perry³, Jonghye Woo², Ryan Shosted¹, Zhi-Pei Liang¹, and Brad Sutton¹
¹University of Illinois Urbana-Champaign, Champaign, IL, United States, ²Massachusetts General Hospital/Harvard Medical School, Boston, MA, United States, ³East Carolina University, Greenville, NC, United States

Synopsis

Keywords: Image Reconstruction, Atlas, speech imaging

Motivation: Individuals across a population typically exhibit similar articulatory movements when producing a given speech sample. In an imaging experiment, we want to represent how an individual’s speech behavior differs from this ‘standard’ motion, which can assist the preoperative planning of velopharyngeal surgery.

Goal(s): We aimed to visualize velopharyngeal variations between individual subjects and the population average.

Approach: We have integrated an atlas into a low-rank residual reconstruction framework to capture the distinctive motion variations unique to each subject.

Results: We demonstrated that the method can visualize velopharyngeal variations while also improving reconstruction quality.

Impact: By applying a spatiotemporal atlas-driven reconstruction method, we were able to visualize and analyze velopharyngeal variations between individuals and the population average, which will specifically benefit surgical planning for individual cleft palate patients.

Introduction

Dynamic MRI is becoming an essential tool for speech imaging because of its high spatiotemporal resolution and soft-tissue contrast. Fast dynamic speech MRI techniques have been shown to deliver linguistic and clinical insights into disordered speech in cleft palate patients and others ^1,2. However, imaging methods that offer both high-quality spatiotemporal reconstruction and accurate analysis of articulatory dynamics have been slow to develop. What is needed is an approach that delivers both, while directing the attention of linguists and speech pathologists to the components of speech in which the current patient differs from the ‘average’. We propose a novel low-rank residual reconstruction method that builds on three elements: recent advances in Partial-Separability (PS) model-based reconstruction ^3-5, construction of a high-quality spatiotemporal atlas from a group of subjects ^6,7, and accurate temporal alignment between subjects and atlas using recorded audio waveforms ^8,9. This enables us to visualize velopharyngeal variations between individual subjects and the population average, which will specifically benefit surgical planning for individual cleft palate patients.

Methods

The spatiotemporal atlas was created using an established method ⁷ from initial reconstructions obtained with a PS low-rank model-based method ^3-5. To initialize the residual reconstruction, the atlas must first be aligned to the subject in both time and space so that the population-average information is available in the subject space. For temporal alignment, we used the recorded audio waveforms to perform a 1D diffeomorphic demons registration to the reference waveform ⁸. We then used the symmetric image normalization (SyN) method to spatially register the temporally aligned atlas to the subject space. This step can be accomplished using routines available in the Advanced Normalization Tools (ANTs) open-source software library ¹⁰.
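As an illustration of how a temporal alignment result might be applied, the sketch below resamples an atlas time series onto a subject's frame timeline by linear interpolation. It assumes that a monotone mapping `tau` (a fractional atlas frame index for each subject frame) has already been estimated, for example from the audio-waveform registration. The function name `warp_atlas_in_time`, the array layout, and the choice of linear interpolation are assumptions made for this example and are not taken from the cited method.

```python
import numpy as np

def warp_atlas_in_time(atlas, tau):
    """Resample atlas frames onto the subject timeline.

    atlas: (N, M_atlas) array, one vectorized frame per column.
    tau:   (M_subj,) fractional atlas frame index for each subject frame
           (hypothetical output of a temporal registration step).
    """
    m_atlas = atlas.shape[1]
    tau = np.clip(tau, 0, m_atlas - 1)        # stay inside the atlas time range
    lo = np.floor(tau).astype(int)            # frame just before each tau
    hi = np.minimum(lo + 1, m_atlas - 1)      # frame just after (clamped at the end)
    w = tau - lo                              # interpolation weight toward `hi`
    return (1 - w) * atlas[:, lo] + w * atlas[:, hi]

# Toy check: an atlas whose intensity grows linearly with the frame index.
atlas = np.outer(np.arange(1, 4), np.arange(5.0))   # atlas[n, m] = (n + 1) * m
tau = np.array([0.0, 0.5, 1.5, 4.0])
warped = warp_atlas_in_time(atlas, tau)
```

Because the toy atlas is linear in time, the warped frames equal the atlas evaluated exactly at `tau`; for real data, linear interpolation is only an approximation between neighbouring frames.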
We assume the subject-specific spatiotemporal atlas to be given by $$$P(\mathbf{r}, \mathbf{t})$$$ and the subject image series by $$$I(\mathbf{r}, \mathbf{t})$$$, where $$$\mathbf{r}=\{\mathbf{r}_n\}_{n=1}^{N}$$$ and $$$\mathbf{t}=\{\mathbf{t}_m\}_{m=1}^{M}$$$. A sparse residual component $$$\mathbf{R}$$$ is defined as the difference between the original image series and the spatiotemporal atlas: $$$\mathbf{R}=\mathbf{I}-\mathbf{P}$$$. The residual raw data are obtained by applying the system operator to $$$\mathbf{R}$$$, which is equivalent to subtracting the simulated atlas data from the measured data: $$$\mathbf{d}_\mathbf{R}=\mathbf{\Omega}\odot\mathbf{F}\mathbf{S}\mathbf{R}=\mathbf{d}-\mathbf{\Omega}\odot\mathbf{F}\mathbf{S}\mathbf{P}$$$, where $$$\mathbf{F}$$$ is the DFT matrix, $$$\mathbf{S}$$$ is the coil-sensitivity weighting matrix, $$$\mathbf{\Omega}$$$ is the (k, t)-space sampling matrix, $$$\mathbf{d}$$$ is the acquired raw data, and $$$\odot$$$ denotes the Hadamard product. Based on PS-model theory ³, $$$\mathbf{R}$$$ can be represented as the product of a residual spatial subspace and a residual temporal subspace, $$$\mathbf{R}=\mathbf{U}_{\mathbf{R}}\mathbf{V}_{\mathbf{R}}^\mathrm{H}$$$ with $$$\mathbf{U}_{\mathbf{R}}\in\mathbb{C}^{N\times\textit{L}}, \mathbf{V}_{\mathbf{R}}^\mathrm{H}\in\mathbb{C}^{L\times\textit{M}}$$$, where $$$L$$$ is the approximate rank. The temporal subspace is predetermined from residual navigator data, obtained by removing the atlas-based temporal dynamics from a navigator acquisition. The spatial subspace is then determined through a least-squares estimation with total-variation regularization: $$\hat{\mathbf{U}}_\mathbf{R}=\arg\min_{\mathbf{U}_\mathbf{R}\in\mathbb{C}^{N\times\textit{L}}}[\|\mathbf{d}_\mathbf{R}-\mathbf{\Omega}\odot\mathbf{F}\mathbf{S}\mathbf{U}_{\mathbf{R}}\mathbf{V}_{\mathbf{R}}^\mathrm{H}\|_{2}^{2}+\beta\mathrm{TV}(\mathbf{U}_{\mathbf{R}})]$$ where $$$\beta$$$ is the regularization weight. We used an ADMM-based algorithm to solve this problem ¹¹. The final reconstructed image is given by: $$$\hat{\mathbf{I}}=\hat{\mathbf{U}}_\mathbf{R}\mathbf{V}_{\mathbf{R}}^\mathrm{H}+\mathbf{P}$$$
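The residual formulation above can be illustrated with a small synthetic sketch. It is not the implementation used in this work: it uses one spatial dimension, noiseless simulated data, a unitary FFT for $$$\mathbf{F}$$$, and a plain least-squares fit in place of the TV-regularized ADMM solver. Taking the temporal basis from an SVD of the residual navigator data is likewise an assumption of this example. All function names (`forward`, `residual_temporal_basis`, `fit_spatial_subspace`), sizes, and the sampling pattern are hypothetical.

```python
import numpy as np

def forward(X, S, mask):
    """Apply Omega ⊙ F S X: coil weighting, unitary DFT over space, (k, t) sampling."""
    coil_imgs = S[:, :, None] * X[None, :, :]              # (C, N, M)
    kspace = np.fft.fft(coil_imgs, axis=1, norm="ortho")   # DFT along the spatial axis
    return mask[None, :, :] * kspace

def residual_temporal_basis(nav, nav_atlas, L):
    """Top-L right singular vectors of the residual navigator data (assumed estimator)."""
    _, _, Vh = np.linalg.svd(nav - nav_atlas, full_matrices=False)
    return Vh[:L]                                          # (L, M), plays the role of V_R^H

def fit_spatial_subspace(d_R, V_H, S, mask):
    """Unregularized least-squares estimate of U_R (TV term omitted in this sketch)."""
    N, L = S.shape[1], V_H.shape[0]
    cols = []
    for j in range(N * L):                                 # probe the linear operator column by column
        e = np.zeros(N * L, dtype=complex)
        e[j] = 1.0
        cols.append(forward(e.reshape(N, L) @ V_H, S, mask).ravel())
    A = np.stack(cols, axis=1)
    u, *_ = np.linalg.lstsq(A, d_R.ravel(), rcond=None)
    return u.reshape(N, L)

# Synthetic toy problem (all values are made up for illustration).
rng = np.random.default_rng(0)
N, M, L, C = 16, 12, 2, 2
x, t, k = np.linspace(0, 1, N), np.arange(M), np.arange(N)
S = np.stack([np.exp(-(x - 0.25) ** 2 / 0.5) + 0j,
              np.exp(-(x - 0.75) ** 2 / 0.5) * np.exp(1j * x)])   # smooth coil maps
P = rng.standard_normal((N, M)) + 0j                               # atlas in subject space
U_true = rng.standard_normal((N, L)) + 1j * rng.standard_normal((N, L))
V_true = np.stack([np.cos(2 * np.pi * t / M), np.sin(2 * np.pi * t / M)])
I_true = P + U_true @ V_true                                       # I = P + R with rank-L R
mask = (k[:, None] + t[None, :]) % 3 == 0                          # 3x undersampled (k, t) pattern

d = forward(I_true, S, mask)                                       # measured data
d_R = d - forward(P, S, mask)                                      # residual data d_R

full = np.ones((N, M), dtype=bool)
nav_rows = [0, 1, N - 1]                                           # low-frequency navigator lines
nav = forward(I_true, S, full)[:, nav_rows, :].reshape(-1, M)
nav_atlas = forward(P, S, full)[:, nav_rows, :].reshape(-1, M)
V_H = residual_temporal_basis(nav, nav_atlas, L)

U_hat = fit_spatial_subspace(d_R, V_H, S, mask)
I_hat = U_hat @ V_H + P                                            # final image: residual + atlas
rel_err = np.linalg.norm(I_hat - I_true) / np.linalg.norm(I_true)
print(f"relative reconstruction error: {rel_err:.2e}")
```

In this idealized setting the residual is exactly rank $$$L$$$ and every k-space line is sampled at several time points, so the least-squares fit should recover the image series up to numerical precision. With noisy, more heavily undersampled data, a regularizer such as the TV term in the formulation above would be needed.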

Results

We created five average spatiotemporal atlases, one for each of five speech samples, from a group of 20 child participants between 5 and 8 years of age. The participants came from three sites: Champaign, Illinois; Greenville, North Carolina; and Boston, Massachusetts. We used a 10-slice protocol based on ⁴ with a spatial resolution of 2$$$\times$$$2$$$\times$$$6 mm, a temporal resolution of 40 frames per second, and a total scan time of 12.5 minutes, divided into five short scans, one per speech sample. Figure 1 shows the mid-sagittal slice at one time frame from one subject. Figure 2 shows both the mid-sagittal slice and the temporal profile of the sample ‘Buy baby a bib’ for two repetitions, labeled s1 and s2. Figures 1(a) and 2(a) show the average atlas in the subject space, Figures 1(b) and 2(b) show the PS model-based reconstruction of the subject, Figures 1(c) and 2(c) show the residual component image, and Figures 1(d) and 2(d) show the atlas-driven reconstruction. Red arrows in Figure 1(c) mark differences between the subject and the atlas that are highlighted by the residual component. Figure 1(d) shows improved SNR for the proposed method relative to Figure 1(b). Red arrows in Figures 2(b) and 2(c) indicate the occurrence of a non-closure event. Together, the two figures illustrate the two main benefits of atlas-based image reconstruction: 1) it reflects spatiotemporal variations between the individual and the population average, and 2) it has the potential to improve overall reconstructed image quality.

Discussion & Conclusion

By integrating the average spatiotemporal atlas, we were able to focus image reconstruction and analysis on the variations of individual subjects through a PS-model-based residual reconstruction method. The method could be improved further with better temporal alignment or a more robust spatial registration method.

Acknowledgements

Research reported in this publication was supported by the National Institute of Dental and Craniofacial Research of the National Institutes of Health under award number R01DE027989. This work was conducted in part at the Biomedical Imaging Center of the Beckman Institute for Advanced Science and Technology at the University of Illinois at Urbana-Champaign (UIUC-BI-BIC).

References

1. Lingala SG, Sutton BP, Miquel ME, Nayak KS. Recommendations for real-time speech MRI. J Magn Reson Imaging. 2016; 43: 28-44.

2. Perry JL, Kuehn DP, Sutton BP, Fang X. Velopharyngeal structural and functional assessment of speech in young children using dynamic magnetic resonance imaging. Cleft Palate Craniofac J. 2017; 54: 408-422.

3. Liang Z-P. Spatiotemporal imaging with partially separable functions. In Proceedings of IEEE International Symposium on Biomedical Imaging, Washington D.C., USA, 2007. pp. 988–991.

4. Jin R, Shosted RK, Xing F, et al. Enhancing linguistic research through 2-mm isotropic 3D dynamic speech MRI optimized by sparse temporal sampling and low-rank reconstruction. Magn Reson Med. 2023; 89: 652-664. doi:10.1002/mrm.29486

5. Jin R, Li Y, Shosted RK, et al. Optimization of 3D dynamic speech MRI: Poisson-disc undersampling and locally higher-rank reconstruction through partial separability model with regional optimized temporal basis. Magn Reson Med. 2023; 1-14. doi:10.1002/mrm.29812

6. Woo J, Lee J, Murano EZ, Xing F, Al-Talib M, Stone M, et al. A high-resolution atlas and statistical model of the vocal tract from structural MRI. Computer Methods in Biomechanics and Biomedical Engineering: Imaging & Visualization. 2015;3(1):47-60.

7. Woo J, Xing F, Lee J, Stone M, Prince JL. Construction of an unbiased spatio-temporal atlas of the tongue during speech. In: Information Processing in Medical Imaging: 24th International Conference, IPMI 2015, Sabhal Mor Ostaig, Isle of Skye, UK, June 28-July 3, 2015, Proceedings. Springer; 2015.

8. Xing F, Jin R, Gilbert IR, Perry JL, Sutton BP, Liu X, et al. 4D magnetic resonance imaging atlas construction using temporally aligned audio waveforms in speech. The Journal of the Acoustical Society of America. 2021;150(5):3500-8.

9. Xing F, Jin R, Gilbert I, El Fakhri G, Perry J, Sutton B, et al. Quantifying velopharyngeal motion variation in speech sound production using an audio-informed dynamic MRI atlas. In: Medical Imaging 2023: Image Processing. SPIE; 2023.

10. Avants BB, Epstein CL, Grossman M, Gee JC. Symmetric diffeomorphic image registration with cross-correlation: evaluating automated labeling of elderly and neurodegenerative brain. Medical image analysis. 2008;12(1):26-41.

11. Ramani S, Fessler JA. A splitting-based iterative algorithm for accelerated statistical X-ray CT reconstruction. IEEE Trans Med Imaging. 2012;31(3):677-88.

Figures

Figure 1. Mid-sagittal slice at the same time frame from one subject: (a) average spatiotemporal atlas in subject space, (b) subject’s image from PS reconstruction, (c) subject’s residual image, (d) subject’s image from atlas-driven residual reconstruction. Red arrows indicate differences between the subject and the average atlas, as reflected in (c).

Figure 2. Mid-sagittal slices and temporal strip plots, taken along the yellow dotted lines, for two ‘Buy baby a bib’ samples: (a) average spatiotemporal atlas in subject space, (b) subject’s image from PS reconstruction, (c) subject’s residual image, (d) subject’s image from atlas-driven residual reconstruction. The two samples are labeled s1 and s2. Red arrows indicate the occurrence of a non-closure event.

Proc. Intl. Soc. Mag. Reson. Med. 32 (2024)

DOI: https://doi.org/10.58530/2024/1083