DICOM file understanding Updated 2020¶
Note: This document heavily references the DICOM Standard maintained and published by NEMA.
OnkoDICOM, in its version 1.0 state, is a DICOM file viewer. It gives medical practitioners the ability to view medical images and to identify dose rates, dose plans, and dose structures for any given patient. Previously identified Regions of Interest (ROIs) can also be viewed through the program. OnkoDICOM can also produce Dose-Volume Histogram (DVH), Radiomics and Clinical Data CSV files for clinical research or treatment purposes.
Glossary of terms¶
DICOM - Digital Imaging and Communications in Medicine, the standard format for medical images.
X-Ray - a photographic or digital image of the internal composition of something, especially a part of the body, produced by X-rays being passed through it and being absorbed to different degrees by different materials.
Ultrasound - Medical ultrasound is a diagnostic imaging technique, or a therapeutic application of ultrasound. It is used to create images of internal body structures such as tendons, muscles, joints, blood vessels, and internal organs.
Computed Tomography - A computed tomography (CT) scan is a medical imaging procedure that uses computer-processed combinations of many X-ray measurements taken from different angles to produce cross-sectional images of specific areas of a scanned object, allowing the user to see inside the object without cutting.
Magnetic Resonance Imaging - Magnetic resonance imaging (MRI) is a medical imaging technique used in radiology to form pictures of the anatomy and the physiological processes of the body. MRI scanners use strong magnetic fields, magnetic field gradients, and radio waves to generate images of the organs in the body.
Positron Emission Tomography - Positron emission tomography (PET) is an imaging technique that uses radioactive substances to visualize and measure metabolic processes in the body.
Service Object Pair (SOP) - A SOP Class defines a certain functionality consisting of the combination of a Service and an Object, which make up the Pair; e.g. CT Image Storage is a SOP Class.
Information Object Definition (IOD) - an object-oriented abstract data model used to specify information about Real-World Objects. An IOD provides communicating Application Entities with a common view of the information to be exchanged.
Medicolegal - involving both medical and legal aspects, as in medical jurisprudence (the application of medical knowledge to legal questions).
Anonymisation - a type of information sanitization whose intent is privacy protection. It is the process of either encrypting or removing personally identifiable information from data sets, so that the people whom the data describe remain anonymous.
DICOM defined¶
A DICOM file is a file saved in the Digital Imaging and Communications in Medicine (DICOM) format. It contains an image from a medical scan, such as an X-ray (XR), Ultrasound (US), Computed Tomography (CT), Magnetic Resonance Imaging (MRI), or Positron Emission Tomography (PET) scan. DICOM files may also include identification data for patients so that the image is linked to a specific individual.
DICOM is a standard for handling, storing, printing, and transmitting information in medical imaging, and DICOM files are typically saved with the .DCM or .DCM30 file extension. The DICOM File Format provides a means to encapsulate in a file the Data Set representing a SOP (Service Object Pair) Instance related to a DICOM IOD (Information Object Definition).
In addition to the DICOM format, radiologists routinely encounter images in several other file formats such as JPEG, TIFF, GIF, and PNG. Each format has its own advantages and disadvantages, which must be taken into consideration when images are archived, used in teaching files, or submitted for publication. The disadvantages of the DICOM format include its large file sizes and the need for special software to view the files on personal computers.
DICOM file structure¶
A DICOM file consists of a header and image data sets packed into a single file. By extracting data from the header's tags, one can access important information such as patient demographics and study parameters. In the interest of patient confidentiality, all information that can be used to identify the patient should be removed before DICOM images are transmitted over a network for educational or other purposes.
The header information is encoded within the DICOM file so that it cannot be accidentally separated from the image data. If the header were separated from the image data, the computer would not know which imaging study had been done or to whom it belonged, and it would not be able to display the image correctly, potentially leading to a medicolegal situation.
The information within the header is organized as a constant and standardized series of tags. These tags are organized into groups of data elements. For example, the group 0010 contains patient information: the patient's name in the tag (0010,0010), the patient's identification number in the tag (0010,0020), the birth date in the tag (0010,0030), and so on. The total length of a group depends on the values it holds, so it varies from file to file.
Figure 1 shows an example of DICOM tags extracted from an image:
Figure 1
The structure of a DICOM file can be visualised in Figure 2.
Figure 2
Removing Patient Information from DICOM file¶
For security reasons, we want to remove confidential information before sending a DICOM file over the web.
The common tags that indicate the patient identity include the patient's name, age, sex, birth date, hospital identity number, ethnic group, occupation, referring physician, institution name, study date, and DICOM Unique Identifiers (UIDs).
A simple method of ensuring this is to convert and export the DICOM file into another image format such as JPEG or TIFF. The header information is lost, so patient identity cannot be obtained from the resulting image.
Another method is "anonymisation", whereby patient information is removed from the DICOM header. This can be achieved with software tools. Specifically, the identifying tags within groups 0008 (study information) and 0010 (patient information) of the DICOM header should be removed or replaced during anonymisation. The Basic Application Level Confidentiality Profile in DICOM PS3.15 lists the attributes concerned and the action to take for each.
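As a rough illustration of the anonymisation step, the sketch below models a data set as a plain Python dict mapping `(group, element)` tuples to string values. That data model and the `anonymise` function are hypothetical simplifications for this page, not OnkoDICOM's implementation or any library's API. The identifying tags are the ones named in the list of patient-identity attributes above; the DICOM UIDs in that list are left out, because replacing them consistently across a whole study needs more care than this sketch shows.

```python
from typing import Dict, Tuple

Tag = Tuple[int, int]  # (Group Number, Element Number)

# Identifying attributes from the list above; tag numbers as in DICOM PS3.6.
IDENTIFYING_TAGS: Dict[Tag, str] = {
    (0x0008, 0x0020): "StudyDate",
    (0x0008, 0x0080): "InstitutionName",
    (0x0008, 0x0090): "ReferringPhysicianName",
    (0x0010, 0x0010): "PatientName",
    (0x0010, 0x0020): "PatientID",
    (0x0010, 0x0030): "PatientBirthDate",
    (0x0010, 0x0040): "PatientSex",
    (0x0010, 0x1010): "PatientAge",
    (0x0010, 0x2160): "EthnicGroup",
    (0x0010, 0x2180): "Occupation",
}


def anonymise(dataset: Dict[Tag, str]) -> Dict[Tag, str]:
    """Return a de-identified copy of a toy data set (tag -> value)."""
    cleaned: Dict[Tag, str] = {}
    for tag, value in dataset.items():
        if tag in IDENTIFYING_TAGS:
            # Keep the element but give it a zero-length value.
            cleaned[tag] = ""
        elif tag[0] == 0x0010:
            # Drop any other element from the patient information group.
            continue
        else:
            # Non-identifying elements such as Modality are kept unchanged.
            cleaned[tag] = value
    return cleaned


record = {(0x0010, 0x0010): "DOE^JANE", (0x0010, 0x1000): "MRN-123", (0x0008, 0x0060): "CT"}
print(anonymise(record))  # {(16, 16): '', (8, 96): 'CT'}
```

Blanking rather than deleting the named elements keeps attributes that an IOD may require to be present, while still removing their content.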
The Modality (0008, 0060) attribute identifies what type of DICOM file it is.
Its value is a plain string such as "RTDOSE", "RTSTRUCT" or "CT".
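A viewer can branch on the Modality string to decide how to handle a file. The sketch below is a hypothetical helper (the `describe_modality` name and the description texts are illustrative assumptions); the modality codes themselves (CT, MR, PT, RTDOSE, RTPLAN, RTSTRUCT) are standard DICOM values. Raw values read straight from a file may carry a trailing space used as padding, so the helper strips it first.

```python
# Descriptions for a few Modality (0008,0060) values relevant to OnkoDICOM.
MODALITY_DESCRIPTIONS = {
    "CT": "Computed Tomography image",
    "MR": "Magnetic Resonance image",
    "PT": "Positron Emission Tomography image",
    "RTDOSE": "Radiotherapy dose",
    "RTPLAN": "Radiotherapy plan",
    "RTSTRUCT": "Radiotherapy structure set",
}


def describe_modality(raw_value: str) -> str:
    """Map a raw Modality value to a human-readable description.

    Trailing space or NUL padding (added to reach an even length) is removed first.
    """
    modality = raw_value.rstrip(" \x00")
    return MODALITY_DESCRIPTIONS.get(modality, f"Other modality ({modality})")


print(describe_modality("RTSTRUCT"))  # Radiotherapy structure set
print(describe_modality("RTIMAGE "))  # Other modality (RTIMAGE)
```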
DICOM Attributes¶
A DICOM attribute or data element is composed of:
A tag - A DICOM file consists of a sequence of Data Elements (the Data Set) which describe instances of real-world information. Data Elements are uniquely identified by a tag consisting of two parts: the Group Number and the Element Number. Although similar or related Data Elements often have the same Group Number, a Data Group does not convey any semantic meaning. A given Data Element occurs at most once in a Data Set, although the same tag may appear again inside a nested Data Set (an item of a Sequence). A tag is represented by two 16-bit unsigned integers, the Group Number followed by the Element Number; for example, the Data Element Modality is represented by the tag (0008,0060).
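Because a tag is just two 16-bit numbers, it is often convenient to pack it into a single 32-bit integer (group in the high half, element in the low half). This is a common convention in DICOM toolkits rather than something the Standard requires, and the helper names below are hypothetical. A useful side effect is that the packed integers sort in the same order as `(group, element)` pairs, which matches the Standard's rule that Data Elements appear in increasing tag order.

```python
from typing import Tuple


def make_tag(group: int, element: int) -> int:
    """Combine a Group Number and an Element Number into one 32-bit value."""
    if not (0 <= group <= 0xFFFF and 0 <= element <= 0xFFFF):
        raise ValueError("group and element must be 16-bit unsigned integers")
    return (group << 16) | element


def split_tag(tag: int) -> Tuple[int, int]:
    """Recover (group, element) from a 32-bit tag value."""
    return tag >> 16, tag & 0xFFFF


def format_tag(tag: int) -> str:
    """Render a tag in the conventional (gggg,eeee) hexadecimal notation."""
    group, element = split_tag(tag)
    return f"({group:04X},{element:04X})"


modality = make_tag(0x0008, 0x0060)
print(hex(modality))         # 0x80060
print(format_tag(modality))  # (0008,0060)
```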
A Value Representation (VR) - Each Data Element has a Value Representation (VR) which describes the data type and format of the Data Element's values. A VR constrains the length of the Data Element's value and which characters are permitted in it. VRs are encoded as two uppercase letters from the DICOM default character set (i.e. A-Z). A list of all VRs and their specifications can be found in the DICOM Standard PS3.5 Section 6.2 [1].
A Transfer Syntax - Although it is not part of an individual Data Element, every DICOM file has a Transfer Syntax which determines how its Data Elements are encoded. The default Transfer Syntax provided by the Standard is DICOM Implicit VR Little Endian Transfer Syntax.
The first attribute of this Transfer Syntax is Implicit VR, which means that in each Data Element outside the File Meta Information header (which is always encoded as Explicit VR Little Endian), it is not necessary to declare the VR, because every tag already has an implicit VR defined in the Standard's data dictionary. In an Explicit VR Transfer Syntax, each Data Element's VR has to be declared within the Data Element itself.
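A file announces its Transfer Syntax through a UID stored in the File Meta Information element Transfer Syntax UID (0002,0010). The sketch below maps the three uncompressed Transfer Syntax UIDs from PS3.5 to the two properties discussed here; the `Encoding` type and `encoding_for` function are hypothetical names, and compressed Transfer Syntaxes (JPEG and others) are deliberately not covered.

```python
from typing import NamedTuple


class Encoding(NamedTuple):
    explicit_vr: bool
    little_endian: bool


# Uncompressed Transfer Syntax UIDs defined in DICOM PS3.5.
TRANSFER_SYNTAXES = {
    "1.2.840.10008.1.2": Encoding(explicit_vr=False, little_endian=True),    # Implicit VR Little Endian (default)
    "1.2.840.10008.1.2.1": Encoding(explicit_vr=True, little_endian=True),   # Explicit VR Little Endian
    "1.2.840.10008.1.2.2": Encoding(explicit_vr=True, little_endian=False),  # Explicit VR Big Endian (retired)
}


def encoding_for(uid: str) -> Encoding:
    """Look up how Data Elements are encoded for a given Transfer Syntax UID."""
    # UID values are padded with a trailing NUL byte when their length is odd.
    uid = uid.rstrip("\x00")
    try:
        return TRANSFER_SYNTAXES[uid]
    except KeyError:
        raise ValueError(f"unsupported transfer syntax: {uid}") from None


print(encoding_for("1.2.840.10008.1.2\x00"))  # Encoding(explicit_vr=False, little_endian=True)
```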
The second attribute of the Transfer Syntax is Little Endian. Endianness describes the order in which the bytes of a multi-byte value are stored. DICOM Standard PS3.5 Section 7.3 [2] defines Little Endian Byte Ordering as follows:
- In a binary number consisting of multiple bytes (e.g., a 32-bit unsigned integer value, the Group Number, the Element Number, etc.), the least significant byte shall be encoded first; with the remaining bytes encoded in increasing order of significance.
- In a character string consisting of multiple 8-bit single byte codes, the characters will be encoded in the order of occurrence in the string (left to right).
The Data Element Tag for Modality shows how Little Endian Byte Ordering works in practice. Consider the following bytes, written in hexadecimal:
08 00 60 00
Suppose we already know that these four bytes represent the tag. We established above that a Data Element's tag consists of two 16-bit unsigned integers, and two hexadecimal digits represent one byte (8 bits). The four bytes above therefore hold two 16-bit numbers: 08 00 and 60 00. Little Endian Byte Ordering dictates that the least significant byte of each number is encoded first, so each pair of bytes must be read in reverse order. This gives 0x0008 and 0x0060, which correspond to the Group Number and Element Number (0008, 0060): the DICOM's Modality.
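The same decoding can be reproduced with Python's standard `struct` module, where `<` selects little-endian order and `H` an unsigned 16-bit integer. This is only an illustration of the byte ordering described above; reading the bytes as big-endian instead produces a tag that does not exist in the data dictionary.

```python
import struct

raw = bytes.fromhex("08 00 60 00")

# "<" = little-endian, "HH" = two unsigned 16-bit integers (Group, Element).
group, element = struct.unpack("<HH", raw)
print(f"({group:04X},{element:04X})")  # (0008,0060)

# Interpreting the same bytes as big-endian gives the wrong tag.
wrong_group, wrong_element = struct.unpack(">HH", raw)
print(f"({wrong_group:04X},{wrong_element:04X})")  # (0800,6000)
```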
Big Endian byte ordering is also defined in the DICOM Standard; however, the only Big Endian Transfer Syntax it defined (Explicit VR Big Endian) has since been retired.
A Value length and Value field - With the information above we can begin to look at the raw binary data of an encoded DICOM Data Set. A Data Element can be easily encoded and decoded once the rules of how they are structured are understood. Let's take a look at this example:
08 00 60 00 08 00 00 00 52 54 53 54 52 55 43 54
First there is the tag (decoded in the previous example), which consists of 2 bytes representing the Group Number followed by 2 bytes representing the Element Number. This decodes to (0008, 0060).
Decoding the next set of bytes depends on the Transfer Syntax established for this particular DICOM File. In this instance we are using the default, DICOM Implicit VR Little Endian Transfer Syntax. This means that the next 4 bytes are dedicated to declaring the Value Length (32-bit unsigned integer) which is a number stating how long the Data Element's Value is in bytes. The example above decodes to 8 bytes.
If we were using an Explicit VR Transfer Syntax instead, the next 2 bytes would declare the VR. For most VRs, the 2 bytes after that hold the Value Length, which can therefore only be a 16-bit unsigned integer. For a few VRs (such as OB, OW, SQ, UN and UT), the VR is followed by 2 reserved bytes and then a 32-bit Value Length.
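The two Explicit VR header layouts can be sketched as a small parser. This assumes Explicit VR Little Endian; the `read_explicit_vr_header` name is hypothetical, and the set of VRs with a 32-bit length is transcribed from PS3.5 Section 7.1.2 (worth re-checking against the current edition). The first test input re-encodes the Modality example with its VR "CS" spelled out.

```python
import struct
from typing import Tuple

# VRs whose Explicit VR header has 2 reserved bytes and a 32-bit Value Length.
LONG_LENGTH_VRS = {"OB", "OD", "OF", "OL", "OV", "OW", "SQ", "SV", "UC", "UN", "UR", "UT", "UV"}


def read_explicit_vr_header(data: bytes, offset: int = 0) -> Tuple[int, int, str, int, int]:
    """Parse one Explicit VR Little Endian element header.

    Returns (group, element, vr, value_length, value_offset).
    """
    group, element = struct.unpack_from("<HH", data, offset)
    vr = data[offset + 4:offset + 6].decode("ascii")
    if vr in LONG_LENGTH_VRS:
        # 2 reserved bytes (0x0000), then a 32-bit Value Length.
        (length,) = struct.unpack_from("<I", data, offset + 8)
        return group, element, vr, length, offset + 12
    # Otherwise a 16-bit Value Length follows the VR directly.
    (length,) = struct.unpack_from("<H", data, offset + 6)
    return group, element, vr, length, offset + 8


modality = bytes.fromhex("08 00 60 00 43 53 08 00") + b"RTSTRUCT"
print(read_explicit_vr_header(modality))  # (8, 96, 'CS', 8, 8)
```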
Note that a Value Length must be an even number.
Following the tag and the VR and/or Value Length fields (8 bytes in this example) is the Data Element's value itself, called the Value Field. How the bytes of the Value Field are interpreted is determined by the Value Length and the Data Element's VR. Because the Value Length must always be an even number, the Value Field must also contain an even number of bytes. When a value would otherwise have an odd length, the Value Field is padded at the end with a null byte (0x00) or, for VRs that require it, with 0x20, the ASCII space character. The ASCII characters in these 8 bytes decode as RTSTRUCT.
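The padding rule can be sketched as follows. The `pad_value` helper is hypothetical, and the set of space-padded VRs is a summary of the character-string VRs in PS3.5 Section 6.2; other VRs such as UI (unique identifiers) and OB (binary data) are padded with 0x00.

```python
# Character-string VRs padded with a trailing space (0x20).
SPACE_PADDED_VRS = {"AE", "AS", "CS", "DA", "DS", "DT", "IS", "LO", "LT",
                    "PN", "SH", "ST", "TM", "UC", "UR", "UT"}


def pad_value(value: bytes, vr: str) -> bytes:
    """Pad a Value Field to an even number of bytes.

    Space-padded string VRs receive 0x20; other VRs (e.g. UI, OB) receive 0x00.
    """
    if len(value) % 2 == 0:
        return value
    pad = b" " if vr in SPACE_PADDED_VRS else b"\x00"
    return value + pad


print(pad_value(b"RTSTRUCT", "CS"))           # b'RTSTRUCT'
print(pad_value(b"RTIMAGE", "CS"))            # b'RTIMAGE '
print(pad_value(b"1.2.840.10008.1.2", "UI"))  # b'1.2.840.10008.1.2\x00'
```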
To break the example down more concisely:
First 2 bytes: Group Number (0x0008)
Next 2 bytes: Element Number (0x0060)
Next 4 bytes: Value Length (0x00000008)
Next 8 bytes: Value Field (RTSTRUCT)
And in plain language:
Tag: (0008, 0060), Modality
Length: 8 bytes
Value: RTSTRUCT
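Putting the steps together, the sketch below decodes one Data Element in Implicit VR Little Endian from the example bytes. The function name is hypothetical. It returns the Value Field as raw bytes because, in an Implicit VR file, the VR is only known from the data dictionary; decoding the value as ASCII works here because Modality has the VR CS. Elements with the undefined length 0xFFFFFFFF (used by Sequences) are out of scope for this sketch.

```python
import struct
from typing import Tuple


def read_implicit_vr_element(data: bytes, offset: int = 0) -> Tuple[Tuple[int, int], bytes, int]:
    """Decode one Data Element encoded in Implicit VR Little Endian.

    Returns ((group, element), value_bytes, offset_of_next_element).
    """
    # Tag: two little-endian 16-bit unsigned integers.
    group, element = struct.unpack_from("<HH", data, offset)
    # Value Length: one little-endian 32-bit unsigned integer.
    (length,) = struct.unpack_from("<I", data, offset + 4)
    if length == 0xFFFFFFFF:
        raise NotImplementedError("undefined-length elements are not handled")
    start = offset + 8
    value = data[start:start + length]
    if len(value) != length:
        raise ValueError("data ends before the Value Field does")
    return (group, element), value, start + length


example = bytes.fromhex("08 00 60 00 08 00 00 00 52 54 53 54 52 55 43 54")
tag, value, next_offset = read_implicit_vr_element(example)
print(tag, value.decode("ascii"), next_offset)  # (8, 96) RTSTRUCT 16
```

Calling the function repeatedly with the returned offset walks through a whole (non-nested) Data Set one element at a time.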
The basic attribute structure is shown in the following figure:
Figure 3
To see all tags - https://www.dicomlibrary.com/dicom/dicom-tags/
To see all Attribute names, Value Representations and Tags use this link:
https://northstar-www.dartmouth.edu/doc/idl/html_6.2/DICOM_Attributes.html
To see all the Value Representations explained: https://northstar-www.dartmouth.edu/doc/idl/html_6.2/Value_Representations.html#wp1023393
DICOM Data Format Layer¶
The DICOM Data Format Layer includes the following elements of specification:
- DICOM Media Storage SOP Classes and associated IODs;
- The DICOM File Format;
- The Secure DICOM File Format;
- The DICOM Media Storage Directory SOP Class;
- DICOM Media Storage Application Profiles;
- DICOM Security Profiles for Media Storage.
RT DOSE defined¶
The RT Dose Module [3] is used to convey 2D or 3D radiation dose data generated from treatment planning systems or similar devices.
RT PLAN defined¶
The RT Plan IOD addresses the requirements for transferring treatment plans generated by manual entry, a virtual simulation system, or a treatment planning system before or during a course of treatment.
RT STRUCT defined¶
The “radiotherapy structure set” (RTSTRUCT) object of the DICOM standard is used for the transfer of patient structures and related data between the devices found within and outside the radiotherapy department. It mainly contains information on regions of interest (ROIs) and points of interest (e.g., dose reference points).
DICOM Security Considerations¶
The DICOM File Format has a potential security vulnerability when the 128-byte File Preamble contains malicious executable content. Such malicious executable content may also refer to other malicious content in the file hidden within Data Elements of the File Meta Information or the Data Set. Possible mitigations include:
- Sanitize the preamble, such as by:
  - Verifying that the preamble:
    - is all zeroes, or
    - begins with a valid magic number for recognized dual-format content (e.g., TIFF or BigTIFF), or
    - contains other known safe content.
  - Clearing the preamble regardless of its content (may not work without)
  - Testing explicitly for executable preamble contents.
- Test explicitly for executable content anywhere within the DICOM File.
- Validate that the DICOM values, structures and content comply with the standard encoding rules and the IOD of the specified SOP Class, including Private Data Elements.
- Validate that the contents are of the appropriate SOP Classes.
- Validate that DICOM File Format files created for HTTP requests and responses do not contain such malicious content.
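The preamble checks in the list above can be sketched with the file layout alone: a 128-byte preamble followed by the four characters "DICM". The function names and result strings are hypothetical; the recognised magic numbers are the standard TIFF and BigTIFF signatures. Anything else, for example a preamble starting with the Windows executable signature "MZ", is reported as suspicious rather than proven safe.

```python
PREAMBLE_LENGTH = 128
DICM_PREFIX = b"DICM"

# TIFF/BigTIFF signatures ("II" = little-endian, "MM" = big-endian).
KNOWN_DUAL_FORMAT_MAGIC = (
    b"II\x2a\x00",  # TIFF, little-endian
    b"MM\x00\x2a",  # TIFF, big-endian
    b"II\x2b\x00",  # BigTIFF, little-endian
    b"MM\x00\x2b",  # BigTIFF, big-endian
)


def check_preamble(header: bytes) -> str:
    """Classify the 128-byte File Preamble of a DICOM file."""
    prefix = header[PREAMBLE_LENGTH:PREAMBLE_LENGTH + 4]
    if prefix != DICM_PREFIX:
        raise ValueError("'DICM' prefix missing after the preamble")
    preamble = header[:PREAMBLE_LENGTH]
    if preamble == bytes(PREAMBLE_LENGTH):
        return "all zeroes"
    if preamble.startswith(KNOWN_DUAL_FORMAT_MAGIC):
        return "dual-format (TIFF/BigTIFF)"
    return "unrecognised: treat as suspicious"


def clear_preamble(contents: bytes) -> bytes:
    """Return the file contents with the preamble overwritten by zeroes."""
    return bytes(PREAMBLE_LENGTH) + contents[PREAMBLE_LENGTH:]


suspicious = b"MZ" + bytes(126) + b"DICM"
print(check_preamble(suspicious))                  # unrecognised: treat as suspicious
print(check_preamble(clear_preamble(suspicious)))  # all zeroes
```

Clearing the preamble of a genuine dual-format file would also destroy its TIFF view, which is why the check is done before deciding whether to clear.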
DICOM Image Display¶
To promote identical grayscale image display on different monitors and consistent hard-copy images from various printers, the DICOM committee developed a lookup table to display digitally assigned pixel values. To use the DICOM Grayscale Standard Display Function (GSDF), defined in DICOM PS3.14, images must be viewed (or printed) on devices that have this lookup curve or on devices that have been calibrated to the GSDF curve.
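The GSDF curve itself is given by a formula that maps a Just-Noticeable-Difference (JND) index j, from 1 to 1023, to a luminance between roughly 0.05 and 4000 cd/m². The sketch below evaluates that formula; the coefficients are transcribed from PS3.14 and should be checked against the Standard before any calibration use, and the `gsdf_luminance` name is hypothetical.

```python
import math

# GSDF coefficients as transcribed from DICOM PS3.14 (verify before use).
A, B, C, D = -1.3011877, -2.5840191e-2, 8.0242636e-2, -1.0320229e-1
E, F, G, H = 1.3646699e-1, 2.8745620e-2, -2.5468404e-2, -3.1978977e-3
K, M = 1.2992634e-4, 1.3635334e-3


def gsdf_luminance(j: float) -> float:
    """Luminance in cd/m^2 for JND index j (1 <= j <= 1023)."""
    if not 1 <= j <= 1023:
        raise ValueError("JND index must lie in [1, 1023]")
    x = math.log(j)  # natural logarithm of the JND index
    numerator = A + C * x + E * x**2 + G * x**3 + M * x**4
    denominator = 1 + B * x + D * x**2 + F * x**3 + H * x**4 + K * x**5
    # The formula gives log10 of the luminance.
    return 10 ** (numerator / denominator)


print(round(gsdf_luminance(1), 3))  # 0.05
print(round(gsdf_luminance(1023)))
```

A calibration tool would tabulate this function over the display's luminance range to build the lookup curve the paragraph above refers to.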
Value Multiplicity¶
In addition to a Value Representation, each attribute also has a Value Multiplicity (VM), which indicates the number of values that can be encoded in the attribute's Value Field. For character string Value Representations, when more than one value is encoded, successive values are separated by the backslash character “\”.
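For example, Image Type (0008,0008) usually holds several CS values and Pixel Spacing (0028,0030) holds two DS values. The hypothetical helper below splits such a multi-valued string on the backslash after removing the trailing padding; it is a sketch for VRs that permit multiple values, not a complete parser.

```python
from typing import List


def split_values(raw: str) -> List[str]:
    """Split a multi-valued character string Value Field on backslashes.

    Trailing space/NUL padding is removed first, then each value is stripped
    of surrounding spaces.
    """
    trimmed = raw.rstrip(" \x00")
    if not trimmed:
        return []
    return [value.strip(" ") for value in trimmed.split("\\")]


image_type = split_values("ORIGINAL\\PRIMARY\\AXIAL ")
print(image_type, len(image_type))  # ['ORIGINAL', 'PRIMARY', 'AXIAL'] 3
pixel_spacing = [float(v) for v in split_values("0.976562\\0.976562")]
print(pixel_spacing)  # [0.976562, 0.976562]
```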
References:
[1] - http://dicom.nema.org/medical/dicom/current/output/chtml/part05/sect_6.2.html
[2] - http://dicom.nema.org/medical/dicom/current/output/chtml/part05/sect_7.3.html
[3] - http://dicom.nema.org/medical/dicom/2016e/output/chtml/part03/sect_C.8.8.3.html
Sources:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3354356/
http://dicom.nema.org/medical/dicom/current/output/chtml/part10/chapter_7.html
https://www.dicomstandard.org/concepts/
Existing DICOM viewers:
dicomlibrary.com/meddream
https://beta.jackimaging.com/demo
https://www.radiantviewer.com/
https://viewmyscans.com/
GIMP
https://www.microdicom.com/
------------- Information from 2019 -------------¶
The DICOM file structure¶
AE title (Application Entity Title)
DIMSE (DICOM Message Service Element): Patient and Network
ER (Entity Relationship)
IOD (Information Object Definition): two main modules (patient and study) and other modules (series, equipment, and images).
SOP (Service Object Pair): It is defined by the union of an Information Object Definition (IOD) and a DICOM Message Service Element (DIMSE) Service Group. It contains the rules and semantics which may restrict the use of the services in the DIMSE Service Group or the Attributes of the IOD.
UID (Unique ID): Unique identifier.
VR (Value Representation): It describes the data type and format of the attribute value.
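SCU / SCP (Service Class User / Service Class Provider): A Service Class Specification defines a group of one or more SOP Classes related to a specific function that is to be accomplished by communicating Application Entities. The SCU is the role of the Application Entity that invokes the services of a Service Class, and the SCP is the role of the Application Entity that performs them.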
Dicompyler Structures:¶
GTVp (Gross Tumour Volume, primary)
CTVp (Clinical Target Volume, primary)
ITV (Internal Target Volume)
From https://www.dicomlibrary.com/ and http://dicom.nema.org/medical/dicom/current/output/pdf/part03.pdf.
Updated by Peter Qian about 4 years ago · 1 revisions