Steven Sourbron1, Joao Almeida e Sousa1, Alexander Daniel2, Charlotte Buchanan2, Ebony Gunwhy1, Eve Lennie1, Kevin Teh1, Steve Shillitoe1, David Morris3, Andrew Priest4, David Thomas5, and Susan Francis2
1University of Sheffield, Sheffield, United Kingdom, 2University of Nottingham, Nottingham, United Kingdom, 3University of Edinburgh, Edinburgh, Scotland, 4University of Cambridge, Cambridge, United Kingdom, 5University College London, London, United Kingdom
Synopsis
Keywords: Software Tools, Data Processing
DICOM is the universally recognized standard for medical imaging, but reading and writing from DICOM databases remains a challenging task for most data scientists. The dbdicom package was developed to provide an intuitive programming interface for reading and writing data from entire DICOM databases, and replaces confusing DICOM-native concepts with language and notation that will be more familiar to data scientists working in Python. The package is available under an open license but is in the early stages of development and is currently being rolled out in 3 multi-vendor, multi-center studies.
INTRODUCTION
DICOM is the universally recognized standard for medical imaging, but reading and writing DICOM data remains a challenging task for most data scientists. A typical image processing pipeline might use the excellent Python package pydicom to extract image arrays and any required header information from DICOM data, but will then write out the results in a more manageable format such as NIfTI. In the process most of the header information has to be discarded, which forms a major barrier to evaluating or deploying these processing methods in a real-world context.
dbdicom aims to provide an intuitive programming interface for reading and writing data from entire DICOM databases, and replaces confusing DICOM-native concepts with language and notation that will be more familiar to data scientists. dbdicom wraps around and extends pydicom, which is limited to reading and writing individual files and requires a deeper understanding of DICOM to ensure compliance with the standard.
METHODS
dbdicom was developed by the UK Renal Imaging Network (UKRIN) for the use case of the AFiRM study [1], a multi-centre, multi-vendor clinical trial collecting longitudinal, multiparametric and quantitative MRI studies in 500 patients with kidney disease. AFiRM uses dbdicom to read DICOM data stored on a dedicated XNAT platform, and save results for regions-of-interest and calculated parameters in DICOM to be uploaded back onto XNAT.
dbdicom was developed in an agile manner through integration in healthy volunteer and clinical pilot studies and continues to be shaped as the pipelines are deployed into production. Initial validation was performed by converting calculated DICOM images into NIfTI format and comparing the results against those derived from NIfTI data via the open-source package UKAT (UKRIN Kidney Analysis Toolbox) [2].
RESULTS
The dbdicom source code is freely available under an open Apache 2.0 license [3]. The package can also be installed directly from the Python Package Index [4] with the command `pip install dbdicom`, after which it can be imported in any Python script via `import dbdicom`. The README file documents the most important features currently available, while more structured documentation is under development. Currently, `dbdicom` unit tests cover 71% of the code.
dbdicom has powerful functionality for reading and browsing DICOM databases. A folder `MyData` containing any number of DICOM files can be opened via `dcm = dbdicom.database(MyData)`, and subsequently printed using `dcm.print()`. A list of patients, studies, series or images in the folder can be retrieved via commands such as `dcm.series()`, and the search can be narrowed down to images with any given header value. For instance, `dcm.studies(ReferringPhysician = 'Dr No', StudyDate = '20190101')` will only return studies performed on 1 January 2019 on patients referred by Dr No.
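The idea of narrowing a search by header values can be illustrated with a minimal, self-contained sketch. It is not dbdicom's implementation: the record list `IMAGES` and the helpers `select` and `unique_values` are hypothetical names introduced only for illustration, and plain dictionaries stand in for DICOM headers.

```python
# Hypothetical stand-in for an indexed DICOM database: one dict per image,
# keyed by DICOM keywords. A real database would read these from file headers.
IMAGES = [
    {"PatientName": "A", "StudyDate": "20190101", "SeriesDescription": "localizer"},
    {"PatientName": "A", "StudyDate": "20190101", "SeriesDescription": "T1 map"},
    {"PatientName": "A", "StudyDate": "20200101", "SeriesDescription": "localizer"},
    {"PatientName": "B", "StudyDate": "20190101", "SeriesDescription": "localizer"},
]


def select(records, **filters):
    """Return the records whose header values match every keyword filter."""
    return [
        record
        for record in records
        if all(record.get(key) == value for key, value in filters.items())
    ]


def unique_values(records, keyword):
    """Return the distinct values of one keyword, in order of first appearance."""
    seen = []
    for record in records:
        value = record.get(keyword)
        if value not in seen:
            seen.append(value)
    return seen


# Narrow the search to one patient on one date, then list the series found.
matches = select(IMAGES, PatientName="A", StudyDate="20190101")
series_found = unique_values(matches, "SeriesDescription")
```

Passing the filters as keyword arguments mirrors the call style quoted above, so that any header keyword can serve as a search criterion without a dedicated function per attribute.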
Beyond reading, dbdicom also has functionality for modifying DICOM databases in a way that is similar to handling files in folders. For instance, `localizer.copy_to(follow_up)` will copy the DICOM series `localizer` to a DICOM study called `follow_up`. DICOM objects can be moved in a similar way by calling `move_to()`, copied by calling `copy()`, exported by `export_as_dicom(another_database)`, or imported by `import_dicom(files)`. dbdicom will ensure that all required DICOM header fields are set correctly. Export of entire patients, studies or series to other formats is similar, e.g. `export_to_nifti(path)`, `export_to_csv(path)` or `export_to_png(path)`. New (empty) DICOM objects can be created in a similarly intuitive manner; for instance, to create a new series called `synthetic` in a study called `follow_up`, call `follow_up.new_series(SeriesDescription='synthetic')`.
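To make concrete why copying a series is more than duplicating files, the sketch below shows the kind of header bookkeeping involved. It is an illustration under stated assumptions, not dbdicom's code: the helpers `new_uid` and `copy_series_to_study` and the tuple `STUDY_LEVEL` are hypothetical, dictionaries stand in for DICOM headers, and only a handful of attributes are handled. The UID form `2.25.<decimal UUID>` is one that the DICOM standard permits.

```python
import uuid

# Attributes that a copied image inherits from its new parent study
# (a small illustrative subset, not the full list the standard requires).
STUDY_LEVEL = ("PatientID", "PatientName", "StudyInstanceUID", "StudyDescription")


def new_uid():
    """Generate a UID of the UUID-derived form '2.25.<decimal UUID>'."""
    return "2.25." + str(uuid.uuid4().int)


def copy_series_to_study(series_images, target_study):
    """Return copies of the images, re-parented under the target study."""
    series_uid = new_uid()  # one new series identifier shared by all copies
    copies = []
    for image in series_images:
        duplicate = dict(image)
        for keyword in STUDY_LEVEL:
            duplicate[keyword] = target_study[keyword]
        duplicate["SeriesInstanceUID"] = series_uid
        duplicate["SOPInstanceUID"] = new_uid()  # every image needs its own UID
        copies.append(duplicate)
    return copies


follow_up = {
    "PatientID": "001",
    "PatientName": "Anonymous",
    "StudyInstanceUID": new_uid(),
    "StudyDescription": "follow_up",
}
old_series_uid = new_uid()
localizer = [
    {
        "PatientID": "001",
        "PatientName": "Anonymous",
        "StudyInstanceUID": "1.2.3",
        "StudyDescription": "baseline",
        "SeriesInstanceUID": old_series_uid,
        "SeriesDescription": "localizer",
        "SOPInstanceUID": new_uid(),
    }
    for _ in range(3)
]
copied = copy_series_to_study(localizer, follow_up)
```

The originals are left untouched, and the copies carry fresh identifiers so that both series can coexist in the same database.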
Reading and writing DICOM attributes works as in pydicom, but dbdicom can also perform these operations on entire series, studies, patients, or even the complete database. To get a list of all patient names in a database, simply call `database.PatientName`. Equally, to list all series in a study, call `study.SeriesDescription`. Any valid DICOM keyword can be retrieved in this way. Values can be set in the same way; for instance, to anonymise the patient name of all patients in a study, call `study.PatientName = 'Anonymous'`. An important feature is that all changes made to the database are temporary and reversible until the user either saves the changes or restores the last saved state.
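The save-or-restore behaviour described here can be sketched with a small toy class. This is only an illustration of the general pattern (a working copy alongside a last-saved copy) and makes no claim about how dbdicom implements it; the class `StagedRecords` and its methods are hypothetical names.

```python
import copy


class StagedRecords:
    """Toy container whose edits stay provisional until save() is called."""

    def __init__(self, records):
        self._saved = copy.deepcopy(records)    # last saved state
        self._working = copy.deepcopy(records)  # state that edits act on

    def get(self, keyword):
        """Return the value of one keyword for every record."""
        return [record.get(keyword) for record in self._working]

    def set(self, keyword, value):
        """Set one keyword to the same value on every record."""
        for record in self._working:
            record[keyword] = value

    def save(self):
        """Make all pending edits permanent."""
        self._saved = copy.deepcopy(self._working)

    def restore(self):
        """Discard pending edits and return to the last saved state."""
        self._working = copy.deepcopy(self._saved)


study = StagedRecords([{"PatientName": "Jane Doe"}, {"PatientName": "John Doe"}])

study.set("PatientName", "Anonymous")
provisional = study.get("PatientName")  # edit is visible but not yet saved

study.restore()
restored = study.get("PatientName")     # back to the original names

study.set("PatientName", "Anonymous")
study.save()
study.restore()
committed = study.get("PatientName")    # restore no longer undoes the edit
```

Keeping the two states separate means a failed or unwanted batch edit, such as an anonymisation applied to the wrong study, can be rolled back without touching the saved data.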
dbdicom also has dedicated functionality for reading and writing DICOM series. For instance, `series.array(['SliceLocation','FlipAngle'])` will return a 5-dimensional array sorted by slice location, flip angle and image number. For writing, `series.set_array(arr)` will write the numpy array `arr` into an existing DICOM series. There is also a numpy-like interface for creating DICOM series from scratch; for instance, `dbdicom.zeros((10, 128, 128), dtype='mri')` will create a fully DICOM-compliant series with blank images, saved in the DICOM format `MRImage`.
CONCLUSION
dbdicom is available as open source, but it is a work in progress and has not yet been applied widely. Beyond the AFiRM study, dbdicom is currently being rolled out in other studies, including the multi-vendor studies iBEAt [5] and TRISTAN [6]. Images of any DICOM type can be read, including multiframe, but a major current limitation is that only images of type MRImage can be written. However, the code is structured in a modular manner to allow extension to other classes in the future, and the functionality will grow alongside the needs of the projects where it is applied.
Acknowledgements
dbdicom development is funded by the Medical Research Council, project reference MR/R02264X/1 - UK Renal Imaging Network (UKRIN): Enabling clinical translation of functional MRI for kidney disease.
References
[1] https://www.uhdb.nhs.uk/afirm-study/
[2] https://github.com/UKRIN-MAPS/ukat
[3] https://github.com/QIB-Sheffield/dbdicom
[4] https://pypi.org/project/dbdicom
[5] https://bmcnephrol.biomedcentral.com/articles/10.1186/s12882-020-01901-x
[6] https://www.imi-tristan.eu