3632

Clinical diagnosis based on speech mechanics with dynamic (real time) MRI, functional MRI and acoustics of speech (spectrogram)
Sunita Gudwani1, S. Senthil Kumaran2, and Madhuri Behari3

1Department of Otorhinolaryngology & Head-Neck Surgery, AIIMS, New Delhi, India, 2AIIMS, New Delhi, India, 3Neurology, Fortis Vasant Kunj, New Delhi, India

Synopsis

In evidence-based clinical practice, mapping speech production problems is a challenge, and cognitive impairment in degenerative disease makes examination and assessment even more difficult. Visualization of the flexible vocal tract (dynamic MRI), cognitive planning (BOLD activation) and automated acoustic analysis (spectrogram) of speech production in motor degenerative diseases such as Parkinson’s Disease might reframe diagnostic evaluation. Dynamic MRI is an important technique for studying articulatory movements and, when combined with fMRI of cognitive planning and spectrogram analysis, may provide sensitive measures for clinical diagnosis. This pilot study was planned to assess the value of this combined approach.

Purpose

The vocal tract resembles an inverted L-shaped tube extending from the vocal folds to the lips, with an extension to the nasal cavity. Because of the wide range of movement and the flexibility of the articulators involved (particularly the tongue), it is difficult to examine and measure clinically. These skilled motor movements are programmed and controlled by the cerebral cortex and basal ganglia, whose function deteriorates gradually in degenerative diseases of the central nervous system such as Parkinson’s Disease (PD). Clinical diagnosis supplemented with dynamic MRI (spatio-temporal articulatory dynamics), fMRI (BOLD) and automated acoustic feature analysis with support vector machines (SVM) may be a promising instrument.

Introduction

Speech production is a complex mechanism involving articulatory movements, with considerable individual (within-speaker) and inter-speaker variability. In progressive degeneration, the coordination of agonist-antagonist muscle groups in the oral, nasal and respiratory motor programs deteriorates, leading to perceptual changes that can be observed in the spectrogram. Knowledge of applied bioacoustics, biosensing and computational modeling would optimize clinical diagnosis1-7.

Methods

The study compared natural-rate syllable production (velar and alveolar) in healthy controls (n=6) and subjects with Parkinson’s Disease (n=6), aged 45-70 years, using acoustic features (spectrogram), real-time dynamic MRI and fMRI. Assessments consisted of UPDRS-III, H&Y and Frenchay’s Dysarthria Scale, acoustic analysis of audio-recorded speech, fMRI and dynamic MRI. The speech task involved production of Hindi syllables (unaspirated, voiceless and voiced), which were presented as visual stimuli using SuperLab 4.2 (Cedrus Inc., USA) and MR-compatible binocular goggles; speech output was recorded with an MR-compatible microphone (NordicNeuroLab, Norway).
MR images were acquired on a 1.5T MR scanner (Avanto, Siemens, Germany) using a 12-channel head & neck coil. Dynamic MRI was acquired as a 2-D single slice with the ‘trufi’ sequence: flip angle 77°; TR 1.90 ms; TE 1 ms; echo-span 2.7 ms; bandwidth 1395 Hz/pixel; phase 40 freq/s; slice thickness 10 mm (10 mm × 241 mm × 360 mm); temporal resolution 5 frames per second (1 frame every 200 ms); acquisition time 80 s. A single-shot echo-planar imaging (EPI) sequence was used to observe cortical BOLD (blood oxygen level dependent) activity, with TR/TE 4000/44 ms, flip angle 90°, 31 slices (slice thickness 5 mm, no slice gap) and EPI factor 128. For anatomical overlay, T1-weighted 3D images were acquired with a magnetization-prepared rapid gradient-echo (MPRAGE) sequence.
Dynamic MRI was processed with DicomWorks and ImageJ. BOLD images were preprocessed and post-processed in SPM12, with second-level analysis performed as a two-sample comparison (uncorrected p<0.001, cluster threshold 5). After noise reduction (WavePad), spectral acoustic processing of the recorded verbal output (during MRI and outside the scanner) was done in WaveSurfer and Praat. The main variables observed were formants F1 and F2, F2 transition, voice-onset time, jitter, shimmer, pitch variation, loudness, harmonic distortion, etc.
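As an illustration of two of the acoustic variables listed above, the following minimal sketch computes local jitter and local shimmer from per-cycle measurements. It assumes the commonly used "local" definitions (mean absolute difference between consecutive cycles divided by the mean value); the function names and the input values are hypothetical and are not taken from the study data or from any software used by the authors.

```python
from statistics import mean


def local_perturbation(values):
    """Mean absolute difference between consecutive values, divided by the mean value."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return mean(diffs) / mean(values)


def jitter_local(periods_s):
    """Cycle-to-cycle variation of the glottal period (dimensionless ratio)."""
    return local_perturbation(periods_s)


def shimmer_local(peak_amplitudes):
    """Cycle-to-cycle variation of the peak amplitude (dimensionless ratio)."""
    return local_perturbation(peak_amplitudes)


# Hypothetical measurements for five consecutive glottal cycles
# (illustrative values only, NOT data from this study).
periods = [0.0100, 0.0102, 0.0099, 0.0101, 0.0100]   # seconds
amplitudes = [0.80, 0.78, 0.82, 0.79, 0.81]          # arbitrary units

jitter = jitter_local(periods)
shimmer = shimmer_local(amplitudes)
print(f"jitter (local):  {100 * jitter:.2f} %")
print(f"shimmer (local): {100 * shimmer:.2f} %")
```

Both measures are ratios, so they are usually reported as percentages; a perfectly periodic signal with constant amplitude gives zero for each.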
The variables measured in Praat were used for feature extraction, linear regression and classification with a support vector machine (SVM) for automated discrimination of the two groups (healthy controls and PD).
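The two-group classification step can be sketched as follows. This is a minimal linear SVM trained by batch subgradient descent on the regularized hinge loss, written in plain Python so it is self-contained. The abstract does not state which SVM implementation, kernel or features were used, so the two features (jitter % and shimmer %), all numeric values, the hyperparameters and every function name here are assumptions made purely for illustration.

```python
def fit_scaler(rows):
    """Return per-feature means and standard deviations of the training rows."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [(sum((v - m) ** 2 for v in c) / len(c)) ** 0.5 for c, m in zip(cols, means)]
    return means, [s if s > 0 else 1.0 for s in stds]


def scale(row, means, stds):
    """Standardize one feature vector with the training means and deviations."""
    return [(v - m) / s for v, m, s in zip(row, means, stds)]


def decision(w, b, x):
    """Signed score of x relative to the separating hyperplane."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=500):
    """Batch subgradient descent on lam/2 * ||w||^2 + mean hinge loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for xi, yi in zip(X, y):
            if yi * decision(w, b, xi) < 1:  # sample lies inside the margin
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Return +1 (PD) or -1 (control)."""
    return 1 if decision(w, b, x) >= 0 else -1


# Synthetic [jitter %, shimmer %] values, invented for this sketch only.
controls = [[0.4, 2.0], [0.5, 2.4], [0.6, 2.2], [0.5, 2.8], [0.7, 3.0], [0.6, 2.6]]
patients = [[1.2, 4.8], [1.5, 5.5], [1.1, 4.5], [1.4, 5.2], [1.6, 6.0], [1.3, 5.0]]

X_raw = controls + patients
y = [-1] * len(controls) + [1] * len(patients)

means, stds = fit_scaler(X_raw)
X = [scale(r, means, stds) for r in X_raw]
w, b = train_linear_svm(X, y)

train_pred = [predict(w, b, x) for x in X]
accuracy = sum(p == t for p, t in zip(train_pred, y)) / len(y)
print("training accuracy:", accuracy)
```

With only six subjects per group, training accuracy on its own says little; a held-out or leave-one-out estimate would be the more informative figure in practice.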

Results and Discussion

Dynamic MRI showed that affricate production in PD was posteriorized, with incomplete, imprecise velar constriction (PC ≥ 1 mm), less tongue-body grooving and hypernasality (VA ≥ 1 mm in PD). Pharyngeal and laryngeal shaping revealed vocal-cord vibration. Acoustic analysis showed greater intensity or frication (Table 1). The calculated rate of speech production was 4-5 syllables per second (sps) in both PD and healthy controls, consistent with the spectrogram analysis. BOLD activation was observed in the supplementary motor area (SMA), bilateral inferior frontal gyrus (IFG) and sensorimotor areas, indicating a greater effort to initiate speech in PD. The SVM classification of PD based on acoustic analysis had a significance of 6.8. Corroborating information from all three aspects (dynamic MRI, fMRI and automated computation of acoustic parameters) supplements the clinical examination. This pilot study suggests that dynamic MRI, BOLD activation and automated computation of acoustic features increase the efficacy and precision of clinical diagnosis.

Conclusion

In speech neuroscience, diagnosis augmented with dynamic MRI, fMRI and automated acoustic analysis holds clinical promise for the future.

Acknowledgements

No acknowledgement found.

References

1. Bresch E, Kim YC, Nayak K, Byrd D, Narayanan S. Seeing speech: capturing vocal tract shaping using real-time magnetic resonance imaging. IEEE Signal Process Mag. 2008;25(3):123-132. doi:10.1109/MSP.2008.918034.

2. Kim YC, Proctor MI, Narayanan SS, Nayak KS. Improved imaging of lingual articulation using real-time multislice MRI. J Magn Reson Imaging. 2012;35(4):943-948. doi:10.1002/jmri.23510.

3. Strohl KP, Butler JP, Malhotra A. Mechanical properties of the upper airway. Compr Physiol. 2012;2(3):1853-1872. doi:10.1002/cphy.c110053.

4. Toutios A, Narayanan SS. Advances in real-time magnetic resonance imaging of the vocal tract for speech science and technology research. APSIPA Trans Signal Inf Process. 2016;5. doi:10.1017/ATSIP.2016.5.

5. Ventura SM, Freitas DR, Tavares JM. Toward dynamic magnetic resonance imaging of the vocal tract during speech production. J Voice. 2011;25(4):511-518. doi:10.1016/j.jvoice.2010.01.014.

6. Vorperian HK, Kurtzweil SL, Fourakis M, Kent RD, Tillman KK, Austin D. Effect of body position on vocal tract acoustics: Acoustic pharyngometry and vowel formants. J Acoust Soc Am. 2015;138(2):833-845. doi:10.1121/1.4926563.

7. Vos RR, Murphy DT, Howard DM, Daffern H. Determining the relevant criteria for three-dimensional vocal tract characterization. J Voice. 2017. pii: S0892-1997(16)30212-0. doi:10.1016/j.jvoice.2017.04.001. [Epub ahead of print]

Figures

Vocal tract images, acoustic analysis and BOLD images during speech production [A: representative mid-sagittal images used to calculate the parameters; B: spectrogram of syllables ‘ka’ and ‘cha’ in the same subject; C: BOLD activation images during speech production overlaid on MPRAGE images (uncorrected p<0.001, cluster threshold 5)]

Comparison of parameters in healthy control and Parkinson’s Disease (PD) subjects [TTCL: tongue-tip contact location (mm); DGLL: distance of groove from lip aperture (mm); grooving: distance of groove base from tongue tip; PC: precision of contact, the distance of the tongue tip/dorsum from the palate; VA: velic aperture (for nasality)]

Comparison of healthy control and Parkinson’s Disease (PD) subjects [vocal tract images of 1: healthy control and 2: PD subject. Images show (A) baseline (neutral position) and CV production: (B) velar stop ‘ka’ contact; (C) ‘ka’ release; (D) alveolar affricate ‘dza’ contact; (E) ‘dza’ release; and (F) spectrogram of syllable ‘cha’ in the same subject]

Proc. Intl. Soc. Mag. Reson. Med. 26 (2018)