We have created a MongoDB database for storing DICOM data. Data is stored in chunks using GridFS and can be queried on any field of the DICOM header, as well as on other pre-defined metadata fields. When data is imported, a custom Python plugin extracts metadata from the DICOM header of each file and uses it to populate the database. In addition to the UID values stored in the DICOM header, an MD5 checksum is computed for each file and recorded in the database so that duplicate files are not stored. The metadata is tracked using a local instance of MongoDB Charts.
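The import step can be illustrated with a minimal sketch. This is not the plugin itself: the function names (`md5_hex`, `extract_header`, `import_bytes`), the choice of header fields, and the layout of the `metadata` document are assumptions made for illustration. The `fs` argument is assumed to behave like a PyMongo `gridfs.GridFS` bucket, of which only `exists` and `put` are used.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the hex MD5 digest used here as the duplicate-detection key."""
    return hashlib.md5(data).hexdigest()


def extract_header(path):
    """Read a few DICOM header fields without loading pixel data.

    The selected fields are illustrative; the real plugin may store more.
    """
    import pydicom  # third-party; imported lazily so the rest runs without it

    ds = pydicom.dcmread(path, stop_before_pixels=True)
    return {
        "SOPInstanceUID": str(ds.SOPInstanceUID),
        "StudyInstanceUID": str(ds.StudyInstanceUID),
        "Modality": str(ds.get("Modality", "")),
    }


def import_bytes(fs, data: bytes, filename: str, header: dict):
    """Store one file in a GridFS-like bucket unless its checksum is known.

    Returns the new file id, or None if the file was skipped as a duplicate.
    """
    digest = md5_hex(data)
    # Assumed layout: the checksum lives under metadata.md5 of the file document.
    if fs.exists({"metadata.md5": digest}):
        return None
    metadata = dict(header, md5=digest)
    return fs.put(data, filename=filename, metadata=metadata)


# Hypothetical usage against a live server (requires pymongo):
#   from pymongo import MongoClient
#   import gridfs
#   fs = gridfs.GridFS(MongoClient("mongodb://localhost:27017")["dicom"])
#   with open("scan.dcm", "rb") as f:
#       import_bytes(fs, f.read(), "scan.dcm", extract_header("scan.dcm"))
```

Keeping the checksum in the file's metadata document means the duplicate check is an ordinary query on the GridFS files collection. In practice an index on that field would likely be needed for large collections.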
Data is replicated across two NVMe servers connected by 100 Gbps Ethernet for fast replication. The two servers are configured as a MongoDB replica set, with an arbiter on a third server that votes in elections to decide which data-bearing node acts as the primary, depending on availability. The database is accessible in two ways: through a custom PyTorch connector, which links it over the 100 Gbps network to an HPC cluster for deep learning applications, and through a web interface on the local network used to download or upload data.
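A two-node replica set with an arbiter might be described as in the sketch below. The replica set name, host names, and helper functions are hypothetical and not taken from our deployment. The `arbiterOnly` member field and the `replicaSet` URI option are standard MongoDB settings.

```python
def replica_set_config(name, data_hosts, arbiter_host):
    """Build a replica set configuration document.

    Each data-bearing host becomes a normal member; the arbiter is added as
    a voting member that holds no data (arbiterOnly).
    """
    members = [{"_id": i, "host": host} for i, host in enumerate(data_hosts)]
    members.append(
        {"_id": len(members), "host": arbiter_host, "arbiterOnly": True}
    )
    return {"_id": name, "members": members}


def replica_set_uri(name, data_hosts):
    """Build a connection string listing the data-bearing nodes as seeds."""
    return "mongodb://" + ",".join(data_hosts) + "/?replicaSet=" + name


# Hypothetical usage (requires pymongo and running mongod instances):
#   from pymongo import MongoClient
#   hosts = ["nvme1.example:27017", "nvme2.example:27017"]
#   config = replica_set_config("rs0", hosts, "arbiter.example:27017")
#   MongoClient(hosts[0], directConnection=True).admin.command(
#       "replSetInitiate", config
#   )
#   client = MongoClient(replica_set_uri("rs0", hosts))
```

A client that connects with the replica set URI discovers the current primary on its own, so neither the PyTorch connector nor the web interface would need to know which server holds that role at a given moment.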
The database is used to store SPECT medical imaging data. It improves our ability to catalog collaboration data, identify data relevant to deep learning studies, and perform data mining on the existing dataset.