Temporal point spread function interpretation of low rank, dictionary learning models in dynamic MRI

Sajan Goud Lingala¹, Sampada Bhave², Yinghua Zhu¹, Krishna Nayak¹, and Mathews Jacob²

¹Electrical Engineering, University of Southern California, Los Angeles, CA, United States, ²Electrical and Computer Engineering, University of Iowa, Iowa City, IA, United States

Synopsis

Several dynamic MRI applications have adopted data-driven models for efficient de-noising and reconstruction from under-sampled data. In this work, we develop a novel temporal point spread function interpretation of two data-driven models: low rank and dictionary learning. Through this interpretation, we show that (a) the low rank model performs spatially invariant non-local view-sharing, and (b) the dictionary-learning model performs spatially varying non-local view-sharing. Both models can be viewed as efficient data-driven retrospective binning techniques. We demonstrate this by de-noising real-time speech MRI data.

PURPOSE

Several dynamic MRI applications have adopted data-driven models for efficient de-noising and reconstruction from under-sampled data [1-6]. Data-driven models have been shown to be advantageous over models based on pre-determined transforms. In this work, we develop a novel temporal point spread function interpretation of the low rank [1-5] and dictionary-learning [6] models that are used for de-noising dynamic MRI. Through this interpretation, we show that (a) the low rank model performs spatially invariant non-local view-sharing, and (b) the dictionary-learning model performs spatially varying non-local view-sharing. Both models can be viewed as efficient data-driven retrospective binning techniques.

THEORY

Dynamic signal model: The dynamic pixel time profile $\gamma(\mathbf x,t)$ is modeled as a linear combination of temporal basis functions $v_{i}(t)$ derived from the data [1-2]: $\mathbf \Gamma_{p\times m} = \mathbf V_{p \times r} \mathbf U_{r \times m},$ where $\mathbf \Gamma$ is the Casorati matrix representation of the dynamic data, the columns of $\mathbf V$ are the temporal basis functions, and $\mathbf U$ is the spatial weight matrix; $p$, $r$, and $m$ are respectively the number of time frames, the number of basis functions, and the number of pixels in a time frame. The low rank model assumes a linear combination of a small number of orthogonal bases (i.e., $r<p$), while dictionary learning assumes a sparse linear combination of a large number of bases that are not necessarily orthogonal (i.e., $r>p$; $\|u_i\|_{0}\leq k$; $k<r$), where $u_i$ is the $i$-th column of $\mathbf U$ and $k$ is the sparsity level.
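As a minimal sketch of this notation, the following NumPy snippet forms a Casorati matrix from a synthetic image series and checks that data built as $\mathbf V\mathbf U$ has rank $r$. The helper name `casorati`, the array sizes, and the random data are illustrative assumptions, not part of the original work.

```python
import numpy as np

def casorati(frames):
    """Reshape a (p, ny, nx) image series into the p x m Casorati matrix."""
    p = frames.shape[0]
    return frames.reshape(p, -1)   # one row per time frame, one column per pixel

# Synthetic example with hypothetical sizes: p frames, r temporal bases, ny x nx pixels.
rng = np.random.default_rng(0)
p, r, ny, nx = 40, 3, 8, 8
V = rng.standard_normal((p, r))         # temporal basis functions as columns
U = rng.standard_normal((r, ny * nx))   # spatial weights, one column per pixel
frames = (V @ U).reshape(p, ny, nx)

Gamma = casorati(frames)                # p x m with m = ny * nx
rank = np.linalg.matrix_rank(Gamma)     # equals r for this noiseless construction
```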

Denoising using the low-rank model: Denoising with the low rank model can be formulated as $\min_{\mathbf U}\|\mathbf V\mathbf U-\mathbf \Gamma_{n}\|_{F}^{2},$ where $\mathbf \Gamma_{n}$ is the noisy dynamic data. $\mathbf V$ is estimated from the data itself via a rank-$r$ truncated SVD of $\mathbf \Gamma_n$ ($r<p$). The denoised solution is given by $\hat{\mathbf \Gamma}=\underbrace{\mathbf V(\mathbf V^{T}\mathbf V)^{-1}\mathbf V^{T}}_{\mathbf Q_{p\times p}}\mathbf \Gamma_n.$ This shows that every spatial frame in $\hat{\mathbf \Gamma}$ is a weighted linear combination of spatial frames from $\mathbf \Gamma_n$. The weights are given by the columns (or, equivalently, the rows) of the symmetric matrix $\mathbf Q$. We term the columns of this matrix temporal point spread functions (TPSFs), as they characterize averaging across time.
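The TPSF construction for the low rank model can be sketched numerically as follows. This is an illustration under stated assumptions (synthetic quasi-periodic data, a hypothetical helper `lowrank_tpsf`, arbitrary sizes and noise level), not the authors' implementation.

```python
import numpy as np

def lowrank_tpsf(Gamma_n, r):
    """Return Q (p x p) and the denoised data Q @ Gamma_n for a rank-r model."""
    # Left singular vectors of the p x m Casorati matrix span the temporal subspace.
    V_full, _, _ = np.linalg.svd(Gamma_n, full_matrices=False)
    V = V_full[:, :r]
    # V has orthonormal columns, so V (V^T V)^{-1} V^T reduces to V V^T.
    Q = V @ V.T
    return Q, Q @ Gamma_n

# Synthetic data (assumption): two temporal profiles with a period of 20 frames.
rng = np.random.default_rng(0)
p, m, r = 60, 200, 2
t = np.arange(p)
basis = np.stack([np.sin(2 * np.pi * t / 20), np.cos(2 * np.pi * t / 20)], axis=1)
clean = basis @ rng.standard_normal((r, m))
Gamma_n = clean + 0.1 * rng.standard_normal((p, m))

Q, Gamma_hat = lowrank_tpsf(Gamma_n, r)
tpsf = Q[:, 5]   # TPSF of frame 5: its weights over all p noisy frames
```

For quasi-periodic data of this kind, the weights in `tpsf` should be larger at frames in the same phase as frame 5 (such as frame 25) than at frames in the opposite phase (such as frame 15), which is the non-local view-sharing behavior described above.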

Denoising using the dictionary-learning model: The problem is formulated as the joint estimation of $\mathbf U$ and $\mathbf V$: $\min_{\mathbf U, \mathbf V}\|\mathbf V \mathbf U-\mathbf \Gamma_n\|_{F}^{2} \mbox{ such that } \|u_{i}\|_{0}\leq k; \|v_i\|_{2}^{2}\leq 1.$ This problem can be solved by dictionary learning algorithms such as k-SVD [7], with the resulting solution at a given pixel: $\hat{\mathbf \Gamma}=\underbrace{\mathbf V_{red}(\mathbf V_{red}^{T}\mathbf V_{red})^{-1}\mathbf V_{red}^{T}}_{\mathbf Q_{p\times p}}\mathbf \Gamma_n,$ where the columns of the matrix $\mathbf V_{red}$ are the temporal basis functions that are active at the specified pixel. Note that $\mathbf V_{red}$ is a reduced subset of the dictionary $\mathbf V$ and varies across spatial pixels. This implies that the TPSF is spatially varying.
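The spatial dependence of $\mathbf Q$ can be illustrated with a short NumPy sketch. It assumes a random unit-norm dictionary and two hand-picked supports in place of a dictionary and sparse codes learned by k-SVD; the helper `pixel_tpsf` and all sizes are hypothetical.

```python
import numpy as np

def pixel_tpsf(V, support):
    """Q for one pixel: projector onto the dictionary atoms active at that pixel."""
    V_red = V[:, support]   # p x k, columns are the active temporal basis functions
    # pinv(V_red) equals (V_red^T V_red)^{-1} V_red^T when V_red has full column rank.
    return V_red @ np.linalg.pinv(V_red)

rng = np.random.default_rng(1)
p, r, k = 30, 50, 3                  # overcomplete dictionary (r > p), sparsity k < r
V = rng.standard_normal((p, r))      # stand-in for a learned dictionary (assumption)
V /= np.linalg.norm(V, axis=0)       # unit-norm atoms, consistent with ||v_i||_2^2 <= 1

# Hypothetical supports, standing in for the nonzero entries of two columns of U.
support_a = [0, 7, 19]
support_b = [3, 7, 42]
Q_a = pixel_tpsf(V, support_a)
Q_b = pixel_tpsf(V, support_b)

# Denoising one pixel's time profile applies that pixel's own Q.
gamma_n = rng.standard_normal(p)
gamma_hat_a = Q_a @ gamma_n
```

Because the two pixels activate different atoms, `Q_a` and `Q_b` differ, so the same frame is averaged with different temporal weights at different spatial locations.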

METHODS

Dynamic data of the upper airway in the mid-sagittal plane were acquired on a GE 3T scanner with a head coil, using a fast gradient echo sequence (flip angle: 15°; TR: 3.28 ms; time resolution: 420 ms). The subject produced repeated utterances of the sound "za-na-za-na". The head coil has low sensitivity in the dynamic articulatory regions of interest. The low rank and dictionary-learning algorithms were applied to improve the signal-to-noise ratio in these regions. The rank and sparsity in the respective algorithms were chosen such that $\| \hat{\mathbf \Gamma} - \mathbf \Gamma_n \|_{F}^{2}$ lay within the noise level.
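One possible reading of this selection rule, for the low rank case, is sketched below. The abstract does not state how the noise level was quantified, so the sketch assumes a known noise standard deviation `sigma` and takes the noise level to be the expected noise energy $p\,m\,\sigma^{2}$; the helper `select_rank` and the synthetic data are hypothetical.

```python
import numpy as np

def select_rank(Gamma_n, sigma):
    """Smallest r whose rank-r truncation leaves a residual within the noise energy."""
    p, m = Gamma_n.shape
    s = np.linalg.svd(Gamma_n, compute_uv=False)
    # Residual of the best rank-r fit is the energy in the discarded singular values:
    # tail[r] = sum of s_i^2 for i >= r, with tail[len(s)] = 0.
    tail = np.concatenate([np.cumsum((s ** 2)[::-1])[::-1], [0.0]])
    budget = p * m * sigma ** 2        # assumed noise level: expected ||noise||_F^2
    r = int(np.argmax(tail <= budget)) # first (smallest) rank meeting the budget
    return r, tail[r]

# Synthetic test case (assumption): rank-3 data plus white Gaussian noise.
rng = np.random.default_rng(0)
p, m, r_true, sigma = 40, 300, 3, 0.1
clean = rng.standard_normal((p, r_true)) @ rng.standard_normal((r_true, m))
Gamma_n = clean + sigma * rng.standard_normal((p, m))

r_sel, resid = select_rank(Gamma_n, sigma)
```

An analogous search over the sparsity level $k$ could be used for the dictionary-learning model, at the cost of re-running the sparse coding step for each candidate $k$.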

RESULTS

Figure 1 shows representative TPSFs during low rank denoising. Two frames are picked that correspond to different motion states: velum touching the pharyngeal wall, and velum in resting position. The peaks of each TPSF mark the frames that have the most weight in the linear combination; they correspond to instances of similar motion states, which suggests implicit non-local view-sharing. Figure 2 shows representative spatially varying TPSFs during dictionary learning de-noising. The spatial locations are chosen at three different air-tissue interfaces, where the articulators have different rates of movement. Note how the characteristics of the TPSFs are correlated with the underlying denoised dynamic pixel time profiles, suggesting spatial adaptation of non-local view-sharing. Finally, Figure 3 shows the denoising results using the two algorithms.

DISCUSSION

We provide a novel interpretation of low rank and dictionary learning algorithms as retrospective data-rebinning techniques. The low rank model can be interpreted as a spatially invariant non-local view-sharing method. The dictionary learning model can be interpreted as a spatially varying non-local view-sharing method. The temporal point spread functions can be used to characterize the blurring introduced by these methods. Our analysis suggests that the data-driven models are very efficient in dynamic MRI applications with quasi-periodicity, such as dynamic lung imaging, free-breathing cardiac imaging, and repeated speech utterances.

References

[1] A. S. Gupta and Z. Liang, “Dynamic imaging by temporal modeling with principal component analysis,” 2001, p. 10.

[2] Z.-P. Liang, “Spatiotemporal imaging with partially separable functions,” in Noninvasive Functional Source Imaging of the Brain and Heart and the International Conference on Functional Biomedical Imaging, 2007. NFSI-ICFBI 2007. Joint Meeting of the 6th International Symposium on. IEEE, 2007, pp. 181–182.

[3] H. Jung, K. Sung, K. S. Nayak, E. Y. Kim, and J. C. Ye, “k-t FOCUSS: A general compressed sensing framework for high resolution dynamic MRI,” Magnetic Resonance in Medicine, vol. 61, no. 1, pp. 103–116, 2009.

[4] H. Pedersen, S. Kozerke, S. Ringgaard, K. Nehrke, and W. Y. Kim, “k-t PCA: Temporally constrained k-t BLAST reconstruction using principal component analysis,” Magnetic Resonance in Medicine, vol. 62, no. 3, pp. 706–716, 2009.

[5] S. G. Lingala, Y. Hu, E. DiBella, and M. Jacob, “Accelerated dynamic MRI exploiting sparsity and low-rank structure: k-t SLR,” IEEE Transactions on Medical Imaging, vol. 30, no. 5, pp. 1042–1054, 2011.

[6] S. G. Lingala and M. Jacob, “Blind compressive sensing dynamic MRI,” IEEE Transactions on Medical Imaging, vol. 32, no. 6, pp. 1132–1145, 2013.

[7] M. Aharon et al., “k-SVD: An algorithm for designing overcomplete dictionaries for sparse representation,” IEEE Transactions on Signal Processing, vol. 54, no. 11, pp. 4311–4322, 2006.

Figures

Low rank denoising as non-local view-sharing. The TPSFs are the entries of the $\mathbf Q$ matrix. Different frames have different TPSFs. Note how the peaks of the TPSFs correspond to frames with similar motion states, implying implicit non-local view-sharing.

Dictionary learning denoising as spatially varying non-local view-sharing. The TPSFs are the entries of the $\mathbf Q_{red}$ matrix, and are spatially varying. Note how the peaks of the TPSFs correspond to the peaks of the underlying dynamic pixel time profiles, implying implicit spatially varying non-local view-sharing.

Denoising results using data-driven algorithms: The non-local time averaging in the data-driven models enables robust denoising while preserving temporal fidelity. In this example, dictionary learning denoising has subtle gains in performance over low rank denoising, which is attributed to spatial variance of TPSFs.

Proc. Intl. Soc. Mag. Reson. Med. 24 (2016) 4233