Pierre-André Vuissoz^{1,2}, Benjamin Elie^{3}, Freddy Odille^{2,4}, and Yves Laprie^{3}

MRI is becoming an important tool in the study of speech, in particular for understanding articulatory gestures. Distributed Compressed Sensing and Projection Onto Convex Sets are used to reconstruct dynamic sequences of vocal tract images at 33 frames per second with a spatial resolution sufficient to extract vocal tract contours. Spoiled gradient echo acquisitions lasting 15 seconds, with pseudo-random Cartesian sampling, were recorded while subjects repeated sentences. In total, 76 sentences were recorded, covering the majority of French phonemes. High frame rate dynamic vocal tract MRI will enable the study of coarticulation in French.

**METHODS:**

MRI experiments were performed on a 3T Signa HDxt MR system (GE Healthcare, Milwaukee, WI). Dynamic vocal tract MRI data were obtained from 3 healthy volunteers with written informed consent and approval of the local ethics committee. The data were collected with a 16-channel neurovascular coil array. The protocol consisted of a mid-sagittal vocal tract slice acquired with a custom-modified Spoiled Fast Gradient Echo sequence (FSPGR, TR 3.02 ms, TE 1.004 ms, partial Fourier 120 $$$k_x$$$ samples, line BW 125 kHz, flip angle 30°, matrix 192x192, 512 temporal frames). The sequence modification consisted in acquiring, for each temporal frame, only a randomized subset of lines ($$$n_{lpf}=10$$$) among the phase lines ($$$n_y=192$$$) of the k-space, resulting in an acquisition time of $$$3.02ms\times n_{lpf}$$$ per frame and a total acquisition time of $$$512\times 3.02\times n_{lpf}=15.5s$$$. During the acquisition protocol all subjects pronounced a set of 76 French sentences chosen to span the widest range of coarticulation contexts. Each sentence was pronounced several times during the 15.5s acquisition and the voice was recorded.
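The timing figures above follow directly from the protocol parameters. The short check below reproduces them: 10 lines at 3.02 ms give 30.2 ms per frame, hence about 33 frames per second, and 512 such frames last about 15.46 s. The variable names are illustrative only, and the undersampling factor is simply the ratio $$$n_y/n_{lpf}$$$.

```python
# Timing check for the undersampled FSPGR protocol (values taken from the text).
TR_MS = 3.02      # repetition time per phase-encode line, in ms
N_LPF = 10        # phase lines acquired per temporal frame
N_Y = 192         # phase lines in a fully sampled frame
N_FRAMES = 512    # temporal frames per acquisition

frame_time_ms = TR_MS * N_LPF                     # time to acquire one frame
frame_rate_hz = 1000.0 / frame_time_ms            # frames per second
total_time_s = N_FRAMES * frame_time_ms / 1000.0  # whole acquisition
undersampling = N_Y / N_LPF                       # phase-encode acceleration

print(f"{frame_time_ms:.1f} ms per frame, {frame_rate_hz:.1f} frames/s")
print(f"{total_time_s:.2f} s total, undersampling factor {undersampling:.1f}")
```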

A pseudo-random Cartesian sampling scheme is chosen in which central k-space lines are favoured. With $$$n_{lpf}=10$$$, the number of fully sampled centre lines is set to $$$n_{cl}=5$$$. For each temporal frame the $$$n_{cl}$$$ central lines are fully sampled and the remaining lines are randomly chosen following the probability function:

$$p\left(k_y,t\right)=\left|\frac{1}{\left(1-\left(k_y-\frac{n_y}{2}\right)\right)^{r(t)}}\right|$$

This distribution leads to a variable sampling density that decreases away from the centre k-space lines as displayed in Figure 1.
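A minimal sketch of such a sampling pattern is given below. It is not the sequence code used for the acquisitions. Two assumptions are made for illustration: the exponent is held at a constant value `r` (the time dependence $$$r(t)$$$ is not specified in the text), and the density is written as $$$1/(1+|k_y-n_y/2|)^r$$$ so that it is finite for every line. The function name `sampling_mask` is hypothetical.

```python
import numpy as np

def sampling_mask(n_y=192, n_frames=512, n_lpf=10, n_cl=5, r=2.0, seed=0):
    """Boolean mask (n_frames, n_y): True where a phase line is acquired."""
    rng = np.random.default_rng(seed)
    centre = n_y // 2
    # Block of n_cl central lines acquired in every frame.
    start = centre - n_cl // 2
    centre_lines = np.arange(start, start + n_cl)
    outer = np.setdiff1d(np.arange(n_y), centre_lines)
    # Assumed variable density: decays with distance from the centre line.
    weights = 1.0 / (1.0 + np.abs(outer - centre)) ** r
    prob = weights / weights.sum()
    mask = np.zeros((n_frames, n_y), dtype=bool)
    for t in range(n_frames):
        mask[t, centre_lines] = True
        # Draw the remaining lines without repetition within a frame.
        picked = rng.choice(outer, size=n_lpf - n_cl, replace=False, p=prob)
        mask[t, picked] = True
    return mask

mask = sampling_mask()
print(mask.shape, mask.sum(axis=1).min(), mask.sum(axis=1).max())
```

Summing the mask over time gives the number of times each phase line was acquired, which is the kind of density profile described for Figure 1.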

Since the acquisition uses partial Fourier in the $$$k_x$$$ direction, a first step performs iterative homodyne k-space filling for each coil using POCS (Projection Onto Convex Sets) (5) to recover the full k-space lines. The low-resolution images used as the phase-correction prior in the POCS algorithm are built from the temporal mean of the acquired kt-space over the whole acquisition.
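The POCS step can be sketched as follows for a single coil and a single frame. This is an illustrative version under stated assumptions, not the reconstruction code used here: k-space is assumed centred with the first `n_acq` columns acquired along $$$k_x$$$, the phase prior is taken from the symmetric low-frequency band of `phase_prior_kspace` (for example a temporal mean, or the frame itself when none is given), and the image-domain projection keeps the current magnitude while imposing the prior phase. All function and argument names are hypothetical.

```python
import numpy as np

def fft2c(x):
    """Centred 2D FFT."""
    return np.fft.fftshift(np.fft.fft2(np.fft.ifftshift(x)))

def ifft2c(k):
    """Centred 2D inverse FFT."""
    return np.fft.fftshift(np.fft.ifft2(np.fft.ifftshift(k)))

def pocs_partial_fourier(kspace, n_acq, phase_prior_kspace=None, n_iter=10):
    """Fill the unacquired k_x columns of a centred (n_y, n_x) k-space."""
    n_y, n_x = kspace.shape
    if n_acq <= n_x // 2:
        raise ValueError("n_acq must extend past the k-space centre")
    acquired = np.zeros(n_x, dtype=bool)
    acquired[:n_acq] = True
    # Symmetric low-frequency band around the centre gives the phase prior.
    half = n_acq - n_x // 2
    lo, hi = n_x // 2 - half, n_x // 2 + half
    prior = kspace if phase_prior_kspace is None else phase_prior_kspace
    low = np.zeros_like(prior)
    low[:, lo:hi] = prior[:, lo:hi]
    phase = np.exp(1j * np.angle(ifft2c(low)))
    k = kspace.copy()
    for _ in range(n_iter):
        # Image-domain constraint: keep the magnitude, impose the prior phase.
        img = np.abs(ifft2c(k)) * phase
        # Data consistency: only the unacquired columns are updated.
        k_est = fft2c(img)
        k[:, ~acquired] = k_est[:, ~acquired]
    return k
```

With the protocol values, `n_acq=120` and `n_x=192`, the phase prior is estimated from the 48 central columns.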

As presented in more detail in (6), the temporal Fourier transform of the image intensity is chosen as the sparsifying transform for compressed sensing. With parallel acquisition on multichannel coils, compressed sensing can exploit the strong correlation between channels through a joint sparsity constraint, which simultaneously minimizes the $$$l_1\rm{-norm}$$$ of the sparse representation of the signal in each coil and the number of non-zero coefficient locations across all coils. A Distributed Compressed Sensing (DCS) reconstruction (7) is therefore performed under these assumptions.

Using the complete k-space lines from the POCS iterative reconstruction as the observation matrix $$$\bf{B}$$$, the DCS reconstruction is performed by solving:

$$\bf{P}\it{=}\underset{{\bf\hat{P}}}{\operatorname{argmin}}\|{\bf{F}_{\it{t}}}{\bf{\hat{P}}}\|_{1,2}\qquad{s.t.}\qquad\|{\bf{\Phi}}{\bf{F}_{\it{sp}}}{\bf\hat{P}}-\bf{B}\|_{\it2,2}\leq\epsilon$$

where $$$\bf{P}$$$ is the matrix whose columns contain the image vectors of the $$$l$$$ coils, $$$\bf{F}_{\it{t}}$$$ is the temporal Fourier operator and $$$\bf{F}_{\it{sp}}$$$ is the spatial Fourier operator. $$$\bf{\Phi}$$$ is the incoherent acquisition matrix resulting from the pseudo-random Cartesian sampling. The $$$l_{1,2}\rm{-norm}$$$ is defined as the $$$l_1\rm{-norm}$$$ of the $$$l_2\rm{-norms}$$$ of the rows of $$$\bf{P}$$$. The SPGL1 solver (8,9) was used to solve this problem.
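To make the two terms of this problem concrete, the sketch below evaluates the joint-sparsity objective and the data-fidelity residual for a candidate solution with NumPy. It does not call SPGL1 and is not a solver. The array layout `(n_t, n_y, n_x, n_coils)`, the orthonormal FFT scaling and the function names are assumptions made for illustration. Rows of the sparse representation are indexed by temporal frequency and pixel, and columns by coil, so a row norm is small only when that coefficient is small in every coil.

```python
import numpy as np

def l12_norm(X):
    """Sum over rows of the l2-norm of each row (joint sparsity across columns)."""
    return float(np.sum(np.linalg.norm(X, axis=1)))

def dcs_terms(P_hat, B, mask):
    """Objective and residual of the DCS problem for a candidate P_hat.

    P_hat : complex array (n_t, n_y, n_x, n_coils), candidate coil images
    B     : complex array, same shape, observed k-space (zero where unsampled)
    mask  : bool array (n_t, n_y), acquired phase lines per frame
    """
    n_coils = P_hat.shape[-1]
    # F_t: Fourier transform along the time axis (sparsifying transform).
    sparse = np.fft.fft(P_hat, axis=0, norm="ortho")
    objective = l12_norm(sparse.reshape(-1, n_coils))
    # F_sp: spatial 2D Fourier transform of every frame and coil.
    kspace = np.fft.fft2(P_hat, axes=(1, 2), norm="ortho")
    # Phi: keep only the acquired phase lines of each frame.
    sampled = kspace * mask[:, :, None, None]
    # l_{2,2} norm of the misfit, i.e. the Frobenius norm over all entries.
    residual = float(np.linalg.norm(sampled - B))
    return objective, residual
```

A solver would search for the `P_hat` with the smallest `objective` among those with `residual` at most $$$\epsilon$$$.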

Finally, coil combination is performed for each frame to produce the final vocal tract movie.
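The combination method is not specified here. Purely as an illustration, the sketch below assumes a root-sum-of-squares combination, a common choice for multichannel magnitude images; the function name and array layout are hypothetical.

```python
import numpy as np

def rss_combine(coil_images):
    """Root-sum-of-squares over the last (coil) axis.

    coil_images : complex array (n_t, n_y, n_x, n_coils)
    returns     : real array (n_t, n_y, n_x), one magnitude image per frame
    """
    # Assumed combination rule; other schemes (e.g. sensitivity-weighted) exist.
    return np.sqrt(np.sum(np.abs(coil_images) ** 2, axis=-1))
```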

**RESULTS:**

**DISCUSSION:**

**CONCLUSION:**

1. Scott AD, Wylezinska M, Birch MJ, Miquel ME. Speech MRI: morphology and function. Phys. Medica 2014;30:604–618. doi: 10.1016/j.ejmp.2014.05.001.

2. Fu M, Zhao B, Carignan C, Shosted RK, Perry JL, Kuehn DP, Liang Z-P, Sutton BP. High-resolution dynamic speech imaging with joint low-rank and sparsity constraints. Magn. Reson. Med. 2015;73:1820–1832. doi: 10.1002/mrm.25302.

3. Lingala SG, Zhu Y, Kim Y-C, Toutios A, Narayanan S, Nayak KS. A fast and flexible MRI system for the study of dynamic vocal tract shaping. Magn. Reson. Med. 2016. doi: 10.1002/mrm.26090.

4. Donoho DL. Compressed sensing. IEEE Trans. Inf. Theory 2006;52:1289–1306. doi: 10.1109/TIT.2006.871582.

5. Youla D. Generalized Image Restoration by the Method of Alternating Orthogonal Projections. IEEE Trans. Circuits Syst. 1978;25:694–702. doi: 10.1109/TCS.1978.1084541.

6. Elie B, Laprie Y, Vuissoz P-A, Odille F. High spatiotemporal cineMRI films using compressed sensing for acquiring articulatory data. In: EUSIPCO2016. Budapest, Hungary; 2016.

7. Liang D, Liu B, Wang J, Ying L. Accelerating SENSE using compressed sensing. Magn. Reson. Med. 2009;62:1574–1584. doi: 10.1002/mrm.22161.

8. van den Berg E, Friedlander M. SPGL1: A solver for large-scale sparse reconstruction http://www.cs.ubc.ca/~mpf/spgl1/index.html.

9. van den Berg E, Friedlander M. Probing the Pareto Frontier for Basis Pursuit Solutions. SIAM J. Sci. Comput. 2008;31:890–912. doi: 10.1137/080714488.

Figure 1: Pseudo-random Cartesian sampling scheme for one acquisition: (a) the phase lines of the first 10 frames, (b) all the acquired phase lines, (c) k-space raw data of one coil for the 17^{th} frame, (d) homodyne k-space filling using 10 iterations of POCS for the same frame.

Figure 2: Time-motion display of a 15.5s reconstructed movie for one subject pronouncing “Le filou et la fripouille manipulent de l’acrylique antirides dilué sous le tipi” more than twice: (a) vertical section through the lips, (b) through the tongue tip, (c) backward motion of the tongue, (d) velum motion, (e) corresponding audio recording.

Figure 3: Four different positions of the articulators for the same subject.