In deep reinforcement learning (DRL), software agents based on deep neural networks explore an environment in order to maximise a reward (e.g. the score in a video game). Here, DRL was used to control a virtual MRI scanner and actively interpret the acquired data. An environment was constructed in which correctly determining the shape of a phantom was rewarded with a high score, with the reward diminishing as acquisition time increased. Following training, the algorithm had learnt to acquire sparse images, assigning TE, TR and flip angles that enabled it to act as an edge detector and deduce shape with 99.8% accuracy.
Deep reinforcement learning (DRL) is a subset of machine learning in which algorithms actively interact with a given environment, aiming to maximise a notional reward or score. This contrasts with more widely used supervised learning approaches, in which algorithms are trained to map input data (such as clinical images) to human expert interpretations (e.g. diagnoses or the location of tumours). DRL was recently brought to prominence by DeepMind (now owned by Google), whose algorithms learnt to play Atari computer games to expert level using only pixel data as input,1 and to beat world-class experts at the board game Go.2
Here, we report the results of a proof-of-concept study investigating the ability of deep reinforcement learning to perform real-time, adaptive control of a virtual MRI scanner. The problem was cast in the form of a game, in which an MRI scanner was simulated with the Bloch equations. The objective of the game was to learn to acquire MRI data in a manner that enabled the shape of a phantom (either circle or square) to be determined as quickly as possible, guided by partially reconstructed magnitude images.
Game environment: A simple, single-slice gradient echo sequence was simulated using the Bloch equations, coded in Python 3.5. In each round of the game, a virtual square or circular phantom was generated at a random position in a 32×32 2D matrix. Phantoms were assigned T1/T2 = 1300/20 ms. Background regions outside the phantom were assigned T1/T2 = 3000/2000 ms. Equilibrium magnetisation was set to unity and aligned with B0. Gaussian noise was added to the simulated data with a standard deviation of 0.05.
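As a rough illustration of this set-up, the sketch below generates a random circular or square phantom with the relaxation times described above. The helper names, phantom size and array layout are assumptions for illustration, not the original code:

```python
import numpy as np

MATRIX = 32
T1_PHANTOM, T2_PHANTOM = 1.300, 0.020   # s
T1_BG, T2_BG = 3.000, 2.000             # s

def make_phantom(rng=np.random):
    """Return (mask, shape_label, T1_map, T2_map) for one round of the game.
    Phantom radius/half-width is an assumed value."""
    shape = rng.choice(['circle', 'square'])
    radius = 6
    cx, cy = rng.randint(radius, MATRIX - radius, size=2)
    y, x = np.mgrid[0:MATRIX, 0:MATRIX]
    if shape == 'circle':
        mask = (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2
    else:
        mask = (np.abs(x - cx) <= radius) & (np.abs(y - cy) <= radius)
    T1 = np.where(mask, T1_PHANTOM, T1_BG)   # equilibrium magnetisation = 1 everywhere
    T2 = np.where(mask, T2_PHANTOM, T2_BG)
    return mask, shape, T1, T2
```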
At each step in the game, values for the TR, TE, flip angle and phase-encoding gradient magnitude could be assigned, followed by a readout gradient, during which the virtual signal was sampled and used to fill the appropriate line of k-space, with the image then reconstructed using a 2D Fourier transform. Also at each step, the shape of the phantom could be guessed, or left as ‘unknown’. A positive score was granted for a correct shape guess (G+, inversely weighted by total time taken), zero score for no guess (G0), and a large negative score for an incorrect guess (G-).
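A minimal sketch of one such step is given below. It substitutes the steady-state spoiled gradient-echo signal equation for the full Bloch simulation, uses an assumed simple timing model (one TR per acquired line) and assumed reward magnitudes; variable and key names are illustrative only:

```python
import numpy as np

def gre_signal(T1, T2, TR, TE, flip_deg, M0=1.0):
    """Steady-state spoiled gradient-echo signal; T2 stands in for T2*."""
    a = np.deg2rad(flip_deg)
    E1 = np.exp(-TR / T1)
    return M0 * np.sin(a) * (1 - E1) / (1 - np.cos(a) * E1) * np.exp(-TE / T2)

def step(state, action, rng=np.random):
    """One step of the acquisition game. `state` holds the phantom maps, the
    partially filled k-space and elapsed time; `action` = (pe, TE, TR, flip, guess)."""
    pe, TE_ms, TR_ms, flip, guess = action
    img = gre_signal(state['T1'], state['T2'], TR_ms / 1000.0, TE_ms / 1000.0, flip)
    kspace_full = np.fft.fftshift(np.fft.fft2(img))
    line = kspace_full[int(pe), :]
    line = line + 0.05 * (rng.standard_normal(line.shape)          # noise SD from the text
                          + 1j * rng.standard_normal(line.shape))
    state['kspace'][int(pe), :] = line                              # fill one phase-encoding line
    recon = np.abs(np.fft.ifft2(np.fft.ifftshift(state['kspace']))) # partial magnitude image
    state['time'] += TR_ms                                          # simple timing model (assumed)

    G_PLUS, G_MINUS = 100.0, 100.0                                  # assumed reward magnitudes
    if guess == 'unknown':
        reward, done = 0.0, False
    elif guess == state['shape']:
        reward, done = G_PLUS / state['time'], True                 # inversely weighted by total time
    else:
        reward, done = -G_MINUS, True
    return recon, reward, done
```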
Control with DRL: The deep deterministic policy gradient (DDPG) algorithm with actor-critic architecture3 was implemented in Keras (Python 3.5), with magnitude pixel data input to the network (see Figure 1). The actor network consisted of 6 2D convolutional layers (widths 32, 32, 64, 64, 128, 128) and 3 dense layers, with 20% dropout and ReLU outputs. The output layer contained 5 nodes corresponding to phase-encoding step (PE), TE, TR, flip angle and shape guess (G), which were scaled to the ranges PE=(0,31), TE=(5,20) ms, TR=(25,2000) ms, flip=(1,90)°. Training was undertaken on 6×10⁶ games, using an NVIDIA Titan X GPU. The resulting network was evaluated for 10,000 games.
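The following Keras sketch shows an actor network of this form. The layer counts and widths follow the description above, while the kernel sizes, dense-layer widths and the output scaling (including the numeric coding of the shape-guess node) are assumptions:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Action ranges from the text; the guess node is assumed to be discretised
# later into {unknown, circle, square}.
LOW  = np.array([0.0,  5.0,   25.0,  1.0, 0.0], dtype='float32')   # PE, TE (ms), TR (ms), flip (deg), G
HIGH = np.array([31.0, 20.0, 2000.0, 90.0, 2.0], dtype='float32')

def build_actor(input_shape=(32, 32, 1)):
    inp = keras.Input(shape=input_shape)                       # magnitude image input
    x = inp
    for width in (32, 32, 64, 64, 128, 128):                   # 6 convolutional layers
        x = layers.Conv2D(width, 3, padding='same', activation='relu')(x)
        x = layers.Dropout(0.2)(x)                              # 20% dropout
    x = layers.Flatten()(x)
    for units in (256, 128, 64):                                # 3 dense layers (widths assumed)
        x = layers.Dense(units, activation='relu')(x)
        x = layers.Dropout(0.2)(x)
    raw = layers.Dense(5, activation='sigmoid')(x)              # 5 action outputs in (0, 1)
    out = layers.Lambda(lambda y: LOW + (HIGH - LOW) * y)(raw)  # rescale to the action ranges
    return keras.Model(inp, out)

actor = build_actor()
actor.summary()
```

In the full DDPG agent, this actor would be paired with a critic network estimating the action value, with both trained from a replay buffer of past game steps.3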
1. Mnih V, et al. Nature 2015;518:529–533.
2. Silver D, et al. Nature 2016;529:484–489.
3. Lillicrap TP, et al. arXiv:1509.02971.