Riwei Jin1, Fangxu Xing2, Imani Gilbert3, Jamie Perry3, Jonghye Woo2, Ryan Shosted1, Zhi-Pei Liang1, and Brad Sutton1
1University of Illinois Urbana-Champaign, Champaign, IL, United States, 2Massachusetts General Hospital/Harvard Medical School, Boston, MA, United States, 3East Carolina University, Greenville, NC, United States
Synopsis
Keywords: Image Reconstruction, Atlas, speech imaging
Motivation: Individuals across a population typically exhibit similar articulatory movements when producing specific speech samples. From an imaging experiment, we are interested in representing how an individual’s speech behavior differs from the ‘standard’ motion, which can assist the preoperative planning of velopharyngeal surgery.
Goal(s): We aimed to visualize velopharyngeal variations between individual subjects and the average population.
Approach: We have integrated an atlas into a low-rank residual reconstruction framework to capture the distinctive motion variations unique to each subject.
Results: We demonstrated that the method can visualize velopharyngeal variations and improve the quality of the reconstruction.
Impact: By applying a spatio-temporal atlas-driven reconstruction method, we were able to visualize and analyze velopharyngeal variations between individuals and the average population, which will specifically benefit the surgical planning of
individual cleft palate patients.
Introduction
Dynamic MRI is becoming an essential tool for speech imaging due
to its high spatiotemporal resolution and soft-tissue contrast. It has been
demonstrated that fast dynamic speech MR imaging techniques could deliver
linguistic and clinical insights into disordered speech in cleft palate
patients and others 1,2. However, the development of imaging methods that offer both high-quality
spatiotemporal reconstruction and accurate analysis of articulatory dynamics
has progressed at a slow pace. To address this, we need an approach that has the
potential to simultaneously deliver high-quality reconstructions and precise
analysis of articulatory dynamics while directing the attention of linguists and speech pathologists to the components of speech in which the current patient differs
from the ‘average’. We proposed a novel low-rank residual reconstruction method based on recent advances with the
Partial-Separability model-based reconstruction method 3-5, construction
of a high-quality spatiotemporal atlas from a group of subjects 6,7, and
methods for accurate temporal alignment between subjects and atlas through
recorded audio waveforms 8,9. This enables us to visualize velopharyngeal
variations between individual subjects and the average population, which will
specifically benefit the surgical planning of individual cleft palate patients.
Methods
The spatiotemporal atlas was created using an established method 7 from the initial reconstructions conducted through a PS low-rank model-based
method 3-5. To initialize the residual reconstruction, the atlas must first be
aligned spatiotemporally so that the atlas average is available
in the subject space. We used the recorded audio waveforms to
perform a 1D diffeomorphic demons registration to the reference waveform
for temporal alignment 8. We utilized the symmetric image normalization (SyN)
method to spatially register the temporally aligned atlas to the subject space.
This step can be accomplished using routines available in the Advanced
Normalization Tools (ANTs) open-source software library 10.
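As a rough illustration of what temporal alignment does to the atlas, the sketch below resamples a stack of atlas frames at the atlas times assigned to each subject frame, using linear interpolation between neighboring frames. It assumes the time mapping has already been estimated (in this work it comes from the audio-waveform registration, which is not reproduced here). The function name `resample_atlas_in_time` and its arguments are hypothetical, and the SyN spatial registration step is omitted.

```python
import numpy as np

def resample_atlas_in_time(atlas, atlas_times, mapped_times):
    """Linearly interpolate atlas frames at the requested time points.

    atlas:        array of shape (T, ...) holding T atlas frames
    atlas_times:  strictly increasing array of shape (T,), with T >= 2
    mapped_times: atlas time assigned to each subject frame (assumed given)
    """
    atlas = np.asarray(atlas, dtype=float)
    atlas_times = np.asarray(atlas_times, dtype=float)
    mapped_times = np.asarray(mapped_times, dtype=float)

    # Index of the atlas frame at or just before each requested time.
    idx = np.searchsorted(atlas_times, mapped_times, side="right") - 1
    idx = np.clip(idx, 0, len(atlas_times) - 2)

    # Interpolation weight in [0, 1]; times outside the atlas range are clamped.
    span = atlas_times[idx + 1] - atlas_times[idx]
    w = np.clip((mapped_times - atlas_times[idx]) / span, 0.0, 1.0)
    w = w.reshape((-1,) + (1,) * (atlas.ndim - 1))  # broadcast over image axes

    return (1.0 - w) * atlas[idx] + w * atlas[idx + 1]


# Toy check: three 2x2 frames whose pixel values equal the frame index.
toy_atlas = np.stack([np.full((2, 2), float(i)) for i in range(3)])
aligned = resample_atlas_in_time(toy_atlas, [0.0, 1.0, 2.0], [0.0, 0.5, 2.0])
```

Linear interpolation is only one possible choice here; it is used because it keeps the example short, not because the source specifies it.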
We assume the subject-specific spatiotemporal
atlas to be given by $$$P(\mathbf{r}, \mathbf{t})$$$ and the subject image series to be $$$I(\mathbf{r}, \mathbf{t})$$$, where $$$\mathbf{r}=\{\mathbf{r}_n\}_{n=1}^{N}$$$ and $$$\mathbf{t}=\{\mathbf{t}_m\}_{m=1}^{M}$$$. A sparse residual component $$$\mathbf{R}$$$ can be represented as the difference between
the original image series and the spatiotemporal atlas: $$$\mathbf{R}=\mathbf{I}-\mathbf{P}$$$. Here we can model
the residual raw data by applying the system operator to $$$\mathbf{R}$$$: $$$\mathbf{d}_\mathbf{R}=\mathbf{\Omega}\odot\mathbf{F}\mathbf{S}\mathbf{R}=\mathbf{d}-\mathbf{\Omega}\odot\mathbf{F}\mathbf{S}\mathbf{P}$$$, where $$$\mathbf{F}$$$ is the DFT matrix, $$$\mathbf{S}$$$ is the coil-sensitivity weighting matrix, $$$\mathbf{\Omega}$$$ is the (k, t)-space sampling matrix, $$$\mathbf{d}$$$ is the acquired raw data, and $$$\odot$$$ denotes the Hadamard product. Based on the PS-model
theory 3, $$$\mathbf{R}$$$ can be represented as the product of the residual spatial and temporal subspaces $$$\mathbf{U}_{\mathbf{R}}\in\mathbb{C}^{N\times L}$$$ and $$$\mathbf{V}_{\mathbf{R}}^\mathrm{H}\in\mathbb{C}^{L\times M}$$$, with approximate rank $$$L$$$. The temporal subspace is predetermined by removing
atlas-based temporal dynamics from a navigator acquisition, leaving the
residual navigator data. The spatial subspace can be determined through a least-squares
estimation with total-variation regularization: $$\hat{\mathbf{U}_\mathbf{R}}=\arg\min_{\mathbf{U}_\mathbf{R}\in\mathbb{C}^{N\times L}}[\|\mathbf{d}_\mathbf{R}-\mathbf{\Omega}\odot\mathbf{F}\mathbf{S}\mathbf{U}_{\mathbf{R}}\mathbf{V}_{\mathbf{R}}^\mathrm{H}\|_{2}^{2}+\beta\mathrm{TV}(\mathbf{U}_{\mathbf{R}})]$$ where $$$\beta$$$ is the regularization weight. We
used an ADMM-based algorithm to
solve this problem 11. The final reconstructed image can be represented as: $$$\hat{\mathbf{I}}=\hat{\mathbf{U}_\mathbf{R}}\mathbf{V}_{\mathbf{R}}^\mathrm{H}+\mathbf{P}$$$
Results
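As a toy illustration of the residual formulation described in the Methods, the NumPy sketch below forms the residual data by subtracting the atlas contribution from the measured data, fits the spatial coefficients for a fixed temporal subspace, and adds the atlas back. It rests on several simplifying assumptions that are not part of the proposed method: a single coil (so the sensitivity matrix is the identity), one spatial dimension, noise-free data, a temporal subspace taken from the true residual rather than from residual navigator data, and a plain unregularized least-squares fit in place of the TV-regularized ADMM solver. All variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, L = 64, 40, 3  # voxels, time frames, residual rank (toy sizes)

def A(X, mask):
    """Toy system operator: orthonormal FFT along space, then (k, t) masking."""
    return mask * np.fft.fft(X, axis=0, norm="ortho")

# Toy atlas P and a rank-L residual R; the subject series is I_sub = P + R.
P = np.outer(rng.standard_normal(N), 1.0 + 0.5 * np.cos(np.linspace(0, np.pi, M)))
U_true = rng.standard_normal((N, L)) + 1j * rng.standard_normal((N, L))
V_true = rng.standard_normal((L, M)) + 1j * rng.standard_normal((L, M))
R = U_true @ V_true
I_sub = P + R

# Interleaved sampling: each k-space row is measured on every other frame.
mask = (np.add.outer(np.arange(N), np.arange(M)) % 2 == 0).astype(float)

d = A(I_sub, mask)        # measured data
d_R = d - A(P, mask)      # residual data: measured data minus atlas contribution

# Temporal subspace (L x M). Assumption: taken from the true residual here;
# the abstract estimates it from residual navigator data instead.
Vh = np.linalg.svd(R, full_matrices=False)[2][:L]

# Unregularized least squares for the spatial subspace: with a single coil each
# k-space row decouples, so fit its L coefficients from the sampled frames.
C = np.zeros((N, L), dtype=complex)
for k in range(N):
    t = mask[k] > 0
    C[k] = np.linalg.lstsq(Vh[:, t].T, d_R[k, t], rcond=None)[0]
U_hat = np.fft.ifft(C, axis=0, norm="ortho")  # spatial subspace estimate

I_hat = U_hat @ Vh + P    # final image: low-rank residual plus atlas
rel_err = np.linalg.norm(I_hat - I_sub) / np.linalg.norm(I_sub)
```

In this idealized setting the residual is exactly rank L and the data are noise-free, so the fit recovers the subject series up to numerical precision. With multiple coils, noise, or the TV term, the rows no longer decouple and an iterative solver would be needed.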
We
created five average spatiotemporal atlases for five different speech samples
from a group of 20 child participants between 5 and 8 years of age. These
subjects came from three different locations: Champaign, Illinois;
Greenville, North Carolina; and Boston, Massachusetts. We used a 10-slice protocol based on 4, with 2$$$\times$$$2$$$\times$$$6 mm spatial
resolution, 40 FPS temporal resolution, and a total scan time of 12.5 minutes
divided into 5 short scans with 5 different samples.
Figure 1 shows the mid-sagittal slice of one time frame from one
subject. Figure 2 shows both the mid-sagittal slice and the temporal profile of the sample ‘Buy baby a bib’ for two repetitions. The two repetitions are temporally labeled s1 and s2. Figures 1(a) and 2(a) show the average atlas in the subject space, Figures 1(b) and 2(b) show the PS model-based reconstruction of the subject, Figures 1(c) and 2(c) show the
residual component image, and Figures 1(d) and 2(d) show the atlas-driven reconstruction
image. Red arrows in Figure 1(c) point out the differences between the subject and the atlas
which were highlighted through the residual component. Figure
1(d) shows an SNR improvement of the proposed method compared to Figure 1(b). Red
arrows in Figure 2(b) and Figure 2(c) indicate the occurrence of a non-closure
event. These two figures indicate the two main benefits of atlas-based image
reconstruction: 1) reflecting spatiotemporal variations between the individual
and the average population, and, 2) the potential for improving the overall
reconstructed image quality.
Discussion & Conclusion
Further improvements may be possible with better temporal alignment or a more robust spatial registration method. By
integrating the average spatiotemporal atlas, we were able to focus our image reconstruction
and analysis on the variations of individual subjects through a PS-model based
residual reconstruction method.
Acknowledgements
Research reported in this publication was supported by the
National Institute of Dental and Craniofacial Research of the National
Institutes of Health under award number R01DE027989. This work was
conducted in part at the Biomedical Imaging Center of the Beckman Institute for
Advanced Science and Technology at the University of Illinois at
Urbana-Champaign (UIUC-BI-BIC).
References
1. Lingala SG, Sutton BP, Miquel ME, Nayak KS. Recommendations
for real-time speech MRI. J Magn Reson Imaging. 2016; 43: 28-44.
2. Perry JL, Kuehn DP, Sutton BP, Fang X. Velopharyngeal
structural and functional assessment of speech in young children using dynamic
magnetic resonance imaging. Cleft Palate Craniofac J. 2017; 54: 408-422.
3. Liang Z-P. Spatiotemporal
imaging with partially separable functions. In Proceedings of IEEE
International Symposium on Biomedical Imaging, Washington D.C., USA, 2007. pp.
988–991.
4. Jin R, Shosted RK, Xing F, et al. Enhancing linguistic research through 2-mm isotropic 3D dynamic
speech MRI optimized by sparse temporal sampling and low-rank reconstruction. Magn Reson Med. 2023; 89:
652-664. doi:10.1002/mrm.29486
5. Jin R, Li Y, Shosted RK, et al. Optimization of 3D dynamic speech MRI: Poisson-disc undersampling
and locally higher-rank reconstruction through partial separability model with
regional optimized temporal basis. Magn Reson Med. 2023; 1-14. doi: 10.1002/mrm.29812
6. Woo J, Lee J, Murano EZ, Xing F,
Al-Talib M, Stone M, et al. A high-resolution atlas
and statistical model of the vocal tract from structural MRI. Computer Methods
in Biomechanics and Biomedical Engineering: Imaging & Visualization.
2015;3(1):47-60.
7. Woo J, Xing F, Lee J, Stone M, Prince JL, editors.
Construction of an unbiased spatio-temporal atlas of the tongue during speech.
Information Processing in Medical Imaging: 24th International Conference, IPMI
2015, Sabhal Mor Ostaig, Isle of Skye, UK, June 28-July 3, 2015, Proceedings
24; 2015: Springer.
8. Xing F, Jin R, Gilbert IR, Perry JL, Sutton BP, Liu X, et al.
4D magnetic resonance imaging atlas construction using temporally aligned audio
waveforms in speech. The Journal of the Acoustical Society of America.
2021;150(5):3500-8.
9. Xing F, Jin R, Gilbert I, El Fakhri G, Perry J, Sutton B, et
al., editors. Quantifying velopharyngeal motion variation in speech sound
production using an audio-informed dynamic MRI atlas. Medical Imaging 2023:
Image Processing; 2023: SPIE.
10. Avants BB, Epstein CL, Grossman M, Gee JC. Symmetric
diffeomorphic image registration with cross-correlation: evaluating automated
labeling of elderly and neurodegenerative brain. Medical image analysis.
2008;12(1):26-41.
11. Ramani
S, Fessler JA. A splitting-based iterative algorithm for accelerated
statistical X-ray CT reconstruction. IEEE Trans Med Imaging. 2012;31(3):677-88.