3960

Exploring RF pulse design with deep reinforcement learning

Xiaodong Ma¹, Kamil Uğurbil¹, and Xiaoping Wu¹
¹Center for Magnetic Resonance Research, Radiology, Medical School, University of Minnesota, Minneapolis, MN, United States

Synopsis

In this study, we expand the application of a deep reinforcement learning (DRL) pulse design framework to designing four basic types of RF pulses and more complicated multi-band RF pulses. Our results showed that the DRL framework can be used to effectively design all types of RF pulses, improving slice profiles with reduced ripple levels in comparison to the conventional SLR algorithm.

Purpose

Recent studies have demonstrated the value of deep learning in RF pulse design¹⁻⁴. In particular, deep reinforcement learning (DRL)⁵ has been shown to be capable of predicting an RF pulse that meets specified design criteria³. Here, we extend its application to five pulse design scenarios: small-tip-angle excitation, 90-degree excitation, refocusing, inversion and multi-band 90-degree excitation.

Methods

Our DRL pulse design framework entailed an artificial agent learning from interactions with its environment (Fig. 1). The agent was a deep neural network and the environment involved a Bloch simulator.
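To make the role of the environment concrete, the following is a minimal sketch of a one-dimensional Bloch simulator in Python with NumPy. It is not the authors' Julia implementation: it assumes a hard-pulse approximation, no relaxation, a real-valued RF waveform applied about the x axis, and a constant slice-select gradient. The function name `bloch_simulate` and its arguments are hypothetical.

```python
import numpy as np

GAMMA_HZ_PER_T = 42.577478e6  # proton gyromagnetic ratio divided by 2*pi, in Hz/T


def bloch_simulate(rf_tesla, grad_t_per_m, dt_s, positions_m):
    """Hard-pulse Bloch simulation without relaxation (illustrative sketch).

    Starts from equilibrium M = (0, 0, 1) at every position and returns the
    complex transverse magnetization Mxy and the longitudinal magnetization Mz.
    """
    rf = np.asarray(rf_tesla, dtype=float)
    x = np.asarray(positions_m, dtype=float)
    mx = np.zeros_like(x)
    my = np.zeros_like(x)
    mz = np.ones_like(x)

    # Precession angle per time step caused by the constant gradient.
    phi = 2 * np.pi * GAMMA_HZ_PER_T * grad_t_per_m * x * dt_s
    c_g, s_g = np.cos(phi), np.sin(phi)

    for b1 in rf:
        # RF sample: rotation about the x axis by the sample's flip angle.
        theta = 2 * np.pi * GAMMA_HZ_PER_T * b1 * dt_s
        c, s = np.cos(theta), np.sin(theta)
        my, mz = c * my + s * mz, -s * my + c * mz
        # Gradient: rotation about the z axis by the position-dependent angle.
        mx, my = c_g * mx + s_g * my, -s_g * mx + c_g * my

    return mx + 1j * my, mz
```

Calling `bloch_simulate(rf, grad, dt, positions)` on a predicted waveform would give the produced slice profile that is compared against the target in step 3 of the training procedure.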
The training in each episode (i.e. iteration) proceeded as follows: 1) the agent made an observation by taking the target slice profile as input and took an action by predicting an RF pulse; 2) the predicted RF pulse was fed (along with a constant gradient) to the Bloch simulator to produce the corresponding slice profile; 3) the produced slice profile was used to evaluate a reward with which to update the neural network. The neural network was trained by minimizing the loss (instead of maximizing the reward) using the Adam algorithm⁶. The loss was calculated by
$$\mathrm{loss}=\lambda_{1}\cdot\mathrm{MSE\_M_{xy}}+\left(1-\lambda_{1}\right)\cdot\mathrm{MSE\_M_{z}}+\lambda_{2}\cdot\mathrm{R\_M_{xy}}+\lambda_{3}\cdot\mathrm{R\_M_{z}}$$
where MSE_M_xy and MSE_M_z are the mean squared errors (MSE) for the transverse and longitudinal components, respectively, used to quantify the deviation of the produced slice profile from the target; R_M_xy and R_M_z are the corresponding ripple levels, defined as the sum of the min-max magnitude differences in the passband and stopband; λ₁, λ₂ and λ₃ are three tunable hyperparameters that adjust the weightings of the individual terms.
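The loss defined above can be sketched in a few lines of Python with NumPy. This is an illustration under stated assumptions rather than the authors' code: the abstract does not specify whether the MSE is taken on complex or magnitude M_xy (complex is assumed here), the ripple is computed on |M_xy| and on M_z directly, and the passband and stopband are passed in as boolean masks. The names `ripple` and `pulse_loss` are hypothetical.

```python
import numpy as np


def ripple(values, passband, stopband):
    """Sum of the max-min spread inside the passband and inside the stopband."""
    pb = values[passband]
    sb = values[stopband]
    return (pb.max() - pb.min()) + (sb.max() - sb.min())


def pulse_loss(mxy, mz, mxy_target, mz_target, passband, stopband,
               lam1, lam2, lam3):
    """Weighted sum of profile MSE terms and ripple terms (illustrative)."""
    # Assumption: MSE on the complex transverse component.
    mse_mxy = np.mean(np.abs(mxy - mxy_target) ** 2)
    mse_mz = np.mean((mz - mz_target) ** 2)
    # Assumption: ripple measured on |Mxy| and on Mz itself.
    r_mxy = ripple(np.abs(mxy), passband, stopband)
    r_mz = ripple(mz, passband, stopband)
    return lam1 * mse_mxy + (1 - lam1) * mse_mz + lam2 * r_mxy + lam3 * r_mz
```

Setting `lam2 = lam3 = 0` reduces the expression to a pure MSE objective, while `lam1` shifts the emphasis between the transverse and longitudinal components.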
For ease of training, the target slice profile was relaxed to a trapezoid shape, with the transition band calculated by the SLR algorithm for 0.01 passband and stopband ripple levels⁷.
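A trapezoidal target of this kind could be generated as in the Python sketch below. The transition width is taken as an input here; in the study it was derived from the SLR parameter relations, which are not reproduced. Centering each transition band on the nominal slice edge is an assumption, and `trapezoid_profile` is a hypothetical helper name.

```python
import numpy as np


def trapezoid_profile(positions_mm, thickness_mm, transition_mm, amplitude=1.0):
    """Trapezoidal slice profile whose half-maximum width equals thickness_mm.

    transition_mm must be positive. Each transition band is assumed to be
    centered on the nominal slice edge at +/- thickness_mm / 2.
    """
    x = np.abs(np.asarray(positions_mm, dtype=float))
    inner = thickness_mm / 2 - transition_mm / 2  # end of the flat passband
    outer = thickness_mm / 2 + transition_mm / 2  # start of the stopband
    ramp = (outer - x) / (outer - inner)          # linear fall-off in between
    return amplitude * np.clip(ramp, 0.0, 1.0)
```

For a 3 mm slice, `trapezoid_profile(x, 3.0, w)` is flat within |x| ≤ 1.5 − w/2 mm and zero beyond |x| ≥ 1.5 + w/2 mm, where `w` is the transition width in mm.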
For each design scenario, an independent neural network was trained using a custom two-stage optimization. λ₁, λ₂ and λ₃ in the loss definition were adapted (Fig. 2) from stage to stage such that the first stage was devised to reduce both MSE and ripples, while the second stage was devised purely to minimize MSE. Further, λ₁ was set to a value close to 1 for excitation pulse designs (to consider mostly M_xy) and a value close to 0 for refocusing and inversion pulse designs (to consider mostly M_z).
Although the neural networks in all design scenarios shared the same architecture of 6 fully-connected layers (including 4 hidden layers each with ReLU activation and 256 neurons), they differed in the output layer. The output layer was designed to output the entire RF waveform for inversion pulse design, whereas it was devised to predict only the first half of the RF waveform (with the second half mirroring the first half) for all the other design scenarios requiring a linear-phase slice profile.
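The forward pass and the mirrored output described above might look roughly like the following NumPy sketch. The weights are random and untrained, the input and output sizes are hypothetical, and the reading of "6 fully-connected layers" as one input layer, four hidden layers and one output layer is an assumption. The authors' networks were built with Flux in Julia.

```python
import numpy as np


def init_mlp(n_in, n_out, hidden=256, n_hidden=4, seed=0):
    """Random (untrained) weights for an MLP with n_hidden hidden layers."""
    rng = np.random.default_rng(seed)
    sizes = [n_in] + [hidden] * n_hidden + [n_out]
    return [(rng.normal(0.0, np.sqrt(2.0 / m), (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def predict_rf(params, target_profile, symmetric=True):
    """Map a target slice profile to a real-valued RF waveform.

    With symmetric=True the network output is treated as the first half of the
    waveform and mirrored to form the second half; with symmetric=False the
    output is used as the entire waveform (the inversion case in the text).
    """
    h = np.asarray(target_profile, dtype=float)
    for w, b in params[:-1]:
        h = np.maximum(h @ w + b, 0.0)  # ReLU hidden layer
    w, b = params[-1]
    out = h @ w + b                     # linear output layer
    return np.concatenate([out, out[::-1]]) if symmetric else out
```

Mirroring halves the number of output neurons and guarantees a symmetric waveform by construction, so the network does not have to learn that constraint.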
For the small-tip-angle scenario, the single neural network was trained by randomly sampling a pool of target slice profiles prescribed for flip angles from 5 to 80 degrees (in steps of 5 degrees) to promote its generalizability. In all scenarios, real-valued RF pulses were predicted, the slice thickness was set to 3 mm, and a total of 400 episodes were carried out (300 for stage 1 and 100 for stage 2).
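The episode schedule and flip-angle sampling can be sketched as below in Python. The actual λ values are listed in Fig. 2 and are not reproduced in the text, so the stage-1 ripple weights used here are placeholders. Setting λ₂ = λ₃ = 0 in stage 2 is inferred from the statement that stage 2 purely minimizes MSE, and keeping λ₁ fixed across stages is an assumption. All names are hypothetical.

```python
import random

STAGE1_EPISODES = 300                    # MSE plus ripple terms
STAGE2_EPISODES = 100                    # MSE only
FLIP_ANGLES_DEG = list(range(5, 85, 5))  # 5, 10, ..., 80 degrees


def loss_weights(episode, lam1, stage1_ripple=(0.1, 0.1)):
    """Return (lam1, lam2, lam3) for a 0-based episode index.

    stage1_ripple holds placeholder values, not the ones used in the study.
    """
    if episode < STAGE1_EPISODES:
        return (lam1, *stage1_ripple)
    return (lam1, 0.0, 0.0)


def sample_flip_angle(rng):
    """Draw one training flip angle uniformly from the pool."""
    return rng.choice(FLIP_ANGLES_DEG)
```

A training loop would iterate `episode` over `range(STAGE1_EPISODES + STAGE2_EPISODES)`, draw a flip angle with `sample_flip_angle(random.Random(seed))` in the small-tip-angle scenario, and evaluate the loss with the weights returned by `loss_weights`.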
Our DRL framework was implemented using the Flux package in Julia⁸, and all networks were trained on a Linux workstation using a CPU without parallel computation. For comparison, pulses were also designed with the SLR algorithm⁷.

Results

Our DRL method substantially reduced the slice profile MSE for all pulse design scenarios (Fig. 3) when comparing the resultant slice profile against the target trapezoidal slice profile, with the largest decrease in MSE observed for multiband excitation pulse design (by as much as 86%). When comparing the resultant slice profile against the ideal rectangular slice profile, our DRL method outperformed SLR for flip angles up to 90 degrees, with the largest decrease in MSE observed for small-tip-angle excitation (by 15%), though the slice profile was moderately degraded (by <10%) for refocusing and slightly degraded (by <1%) for inversion.
The improvement in slice profile for small-tip-angle pulse design was further confirmed by inspecting the resultant slice profiles of representative flip angles (Fig. 4). For a wide range of flip angles, our DRL method gave rise to a slice profile closer to both the target and ideal slice profiles, with a flatter passband observed for relatively small tip angles and with fewer stopband ripples for relatively large tip angles.
Likewise, our DRL method improved slice profiles for single band and multiband excitation pulses with noticeable suppression of stopband ripples, while leading to visually comparable slice profiles for refocusing and inversion (Fig. 5).
The predicted RF pulses, however, are not as smooth as those from SLR and have increased peak B1.

Discussion and Conclusion

We have demonstrated a deep reinforcement learning framework to design various RF pulses. Our results showed that our DRL framework can effectively predict excitation, refocusing, inversion and multiband excitation pulses for a given target slice profile, with improved performance in ripple suppression as compared to the conventional SLR pulse design algorithm.
Part of our future work is to investigate how best to reduce peak B1 and to examine the utility of other neural network architectures (e.g., RNN⁹ and ResNet¹⁰).

Acknowledgements

This work was supported by NIH grants U01 EB025144 and P41 EB015894.

References

1. Vinding MS, Skyum B, Sangill R, et al. Ultrafast (milliseconds), multidimensional RF pulse design with deep learning. Magn Reson Med 2019;82:586-599

2. Shin D, Ji S, Lee D, et al. Deep Reinforcement Learning Designed Shinnar-Le Roux RF Pulse Using Root-Flipping: DeepRFSLR. IEEE Trans Med Imaging 2020;39:4391-4400

3. Shin D, Lee J. DeepRF: Designing an RF pulse using a self-learning machine. Proc Intl Soc Mag Reson Med 2020:0611

4. Zhang Y, Jiang K, Jiang W, et al. Multi-task convolutional neural network-based design of radio frequency pulse and the accompanying gradients for magnetic resonance imaging. NMR Biomed 2020:e4443

5. Sutton RS, Barto AG. Reinforcement Learning: An Introduction. 2020

6. Kingma DP, Ba J. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 2014

7. Pauly J, Le Roux P, Nishimura D, et al. Parameter relations for the Shinnar-Le Roux selective excitation pulse design algorithm [NMR imaging]. IEEE Trans Med Imaging 1991;10:53-65

8. Innes M. Flux: Elegant machine learning with Julia. Journal of Open Source Software 2018;3:602

9. Rumelhart DE, Hinton GE, Williams RJ. Learning Internal Representations by Error Propagation. In: Rumelhart DE, McClelland JL, eds. Parallel Distributed Processing: Explorations in the Microstructure of Cognition: Foundations: MIT Press; 1987:318-362

10. He KM, Zhang XY, Ren SQ, et al. Deep Residual Learning for Image Recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2016:770-778

Figures

Fig.1. The deep reinforcement learning (DRL) pulse design framework. In a forward pass, the neural network takes the target slice profile as input and outputs the predicted RF pulse. The Bloch simulator embedded in the system gives the produced slice profile, which in turn is used (along with the target slice profile) to evaluate the loss. In a backward pass, the gradient of the loss with respect to the weights of the neural network is calculated using backpropagation to update the neural network. The procedures described above are iterated during training.

Fig.2. Summary of pulse parameters and training details. For each of the five pulse design scenarios, a custom two-stage optimization strategy was employed to train the neural network, in which the three weighting factors (λ₁, λ₂ and λ₃) were changed in value from stage 1 to stage 2 to define a different loss function for each stage (see text for more details).

Fig.3. Quantitative comparison of RF pulses designed with conventional SLR vs our DRL method in terms of slice profile MSE. Note that for all pulse design scenarios, our DRL method led to a substantially reduced MSE when comparing the resultant slice profile against the target trapezoidal slice profile and that when comparing the resultant slice profile against the ideal rectangular slice profile, our DRL method outperformed SLR for flip angles up to 90 degrees.

Fig.4. Comparison of the SLR algorithm and our DRL method when used for small tip-angle excitation pulse design. For DRL, a single neural network trained using a range of flip angles (5° to 80° with a 5° step) was utilized to predict RF pulses of different flip angles including three flip angles seen during the training (A-C), and an unseen flip angle (D) for testing the generalizability of the neural network. Note that despite larger fluctuations observed in the predicted RF pulse, our DRL method led to better slice profiles, with smaller stopband ripples especially for larger flip angles.

Fig.5. Comparison of the SLR algorithm vs our DRL method when used to design (A) 90° excitation, (B) refocusing, (C) inversion and (D) 90° excitation multiband (MB=2, gap=50 mm) pulses. For each scenario a separate neural network was trained. Note that despite larger fluctuation observed in the predicted RF pulse, our DRL method led to visually comparable slice profiles for refocusing and inversion, while improving slice profiles for single band and multiband excitation pulses with noticeable suppression of stopband ripples.

Proc. Intl. Soc. Mag. Reson. Med. 29 (2021)
