Peter Qian, 01/04/2021 09:23 AM

h1. DICOM file understanding *Updated 2020*

_Note: This document heavily references the DICOM Standard maintained and published by NEMA._

_OnkoDICOM in its version 1.0 state is a DICOM file viewer. It gives medical practitioners the ability to view medical images and to identify dose rates, dose plans, and dose structures for any given patient. Previously identified Regions of Interest (ROIs) can also be viewed through the program. OnkoDICOM can also produce DVH, Radiomics and Clinical Data CSV files for clinical research or treatment purposes._

h2. +*Glossary of terms*+

+DICOM+ - Digital Imaging and Communications in Medicine ( *DICOM* ), a standard and file format for medical images.

+X-Ray+ - a photographic or digital image of the internal composition of something, especially a part of the body, produced by X-rays being passed through it and being absorbed to different degrees by different materials.

+Ultrasound+ - a diagnostic imaging technique, or a therapeutic application of ultrasound. It is used to create an image of internal body structures such as tendons, muscles, joints, blood vessels, and internal organs.

+Computed Tomography+ - a medical imaging procedure that uses computer-processed combinations of many X-ray measurements taken from different angles to produce cross-sectional images of specific areas of a scanned object, allowing the user to see inside the object without cutting.

+Magnetic Resonance Imaging+ - a medical imaging technique used in radiology to form pictures of the anatomy and the physiological processes of the body. MRI scanners use strong magnetic fields, magnetic field gradients, and radio waves to generate images of the organs in the body.

+Positron Emission Tomography+ - an imaging technique that uses radioactive substances to visualize and measure metabolic processes in the body.

+Service Object Pair+ - a Service Object Pair (SOP) Class defines a certain functionality consisting of a combination of a Service and an Object, which together make up the Pair. CT Image Storage is an example of an SOP Class.

+Information Object Definition+ - an object-oriented abstract data model used to specify information about Real-World Objects. An IOD provides communicating Application Entities with a common view of the information to be exchanged.

+Medicolegal+ - involving both medical and legal aspects, mainly used in relation to medical jurisprudence, a branch of medicine.

+Anonymisation+ - a type of information sanitization whose intent is privacy protection. It is the process of either encrypting or removing personally identifiable information from data sets, so that the people whom the data describe remain anonymous.

h2. *+DICOM defined+*

A *DICOM* file is an image saved in the Digital Imaging and Communications in Medicine ( *DICOM* ) format. It contains an image from a medical scan, such as an X-ray (XR), Ultrasound (US), Computed Tomography (CT), Magnetic Resonance Imaging (MRI), or Positron Emission Tomography (PET) scan. *DICOM files* may also include identification data for patients so that the image is linked to a specific individual.

DICOM is a standard for *handling*, *storing*, *printing*, and *transmitting information* in medical imaging, and its files are typically saved in the *.DCM* or *.DCM30 (DICOM 3.0)* file format. This file format provides a means to encapsulate in a file the Data Set representing a SOP (Service Object Pair) Instance related to a DICOM IOD (Information Object Definition).

In addition to the DICOM format, the radiologist routinely encounters images in several other file formats such as JPEG, TIFF, GIF, and PNG. Each format has its own unique advantages and disadvantages, which must be taken into consideration when images are archived, used in teaching files, or submitted for publication. The disadvantages of these file types include file size being quite large, and special software being required for viewing on personal computers.

h2. +*DICOM file structure*+

A DICOM file consists of a *header* and *image data sets* packed into a single file. By extracting data from the tags in the header one can access important information regarding the patient demographics, study parameters, etc. In the interest of patient confidentiality, all information that can be used to identify the patient should be removed before DICOM images are transmitted over a network for educational or other purposes.

The *header information is encoded within the DICOM file* so that it *cannot* be accidentally *separated* from the image data. If the header were *separated* from the image data, the computer would not know which imaging study had been done or to whom it belonged, and it would *not be able to display the image correctly*, leading to a potential *medicolegal situation*.

The information within the *header* is organized as a *constant* and *standardized series of tags*. These tags are organized into groups of *data elements*. For example, the group “0010” contains patient information and is 92 bits in length. It contains the patient's name in the tag “0010–0010,” the patient's identification number in the tag “0010–0020,” birth date in the tag “0010–0030,” and so on.
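
The group and element structure of a tag can be sketched in a few lines of Python. This is only an illustration: the names @PATIENT_TAGS@, @parse_tag@, @format_tag@ and @describe@ are hypothetical, and the lookup table is limited to the three patient tags named above rather than the full DICOM data dictionary.

```python
# Hypothetical lookup table limited to the three tags named in the text.
PATIENT_TAGS = {
    (0x0010, 0x0010): "Patient's Name",
    (0x0010, 0x0020): "Patient ID",
    (0x0010, 0x0030): "Patient's Birth Date",
}


def parse_tag(text):
    """Turn a tag written as '0010-0020' or '(0010, 0020)' into (group, element)."""
    # Accept a hyphen, an en dash or a comma between the two hexadecimal numbers.
    cleaned = text.strip("() ").replace("–", ",").replace("-", ",")
    group, element = (int(part, 16) for part in cleaned.split(","))
    return group, element


def format_tag(group, element):
    """Write a tag in the usual (gggg,eeee) hexadecimal notation."""
    return f"({group:04X},{element:04X})"


def describe(text):
    """Return the formatted tag followed by its name, if this sketch knows it."""
    tag = parse_tag(text)
    name = PATIENT_TAGS.get(tag, "unknown to this sketch")
    return f"{format_tag(*tag)} {name}"


print(describe("0010-0010"))
print(describe("(0010, 0030)"))
```

Keeping the tag as a pair of integers, rather than as a string, makes it easy to select every element that belongs to a given group.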

Figure 1 shows an example of DICOM tags extracted from an image:

!fig1.jpg!

Figure 1

The structure of a DICOM file can be visualised in Figure 2:

!fig2.jpg!

Figure 2

h2. +*Removing Patient Information from DICOM file*+

For security reasons, we want to remove confidential information before sending a DICOM file over the web.

Common tags that can reveal the patient's identity include the patient's name, age, sex, birth date, hospital identity number, ethnic group, occupation, referring physician, institution name, study date, and DICOM Unique Identifiers (UIDs).

A simple method of removing this information is *converting* and *exporting* the *DICOM* file into *other image formats* such as *JPEG* or *TIFF*. The *header information is lost* and the patient's identity cannot be obtained from the resultant image.

Another method is " *anonymisation* ", whereby all patient information is removed from the *DICOM header*. This can be achieved with software tools. Specifically, all tags contained in groups *“0008” (study information)* and *“0010” (patient information)* of the DICOM header should be *removed* and *replaced* during *anonymisation*.
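
As a rough sketch of the group-based rule described above, the header can be modelled as a Python dictionary keyed by (group, element). The function @anonymise@, the constant @ANONYMISED_GROUPS@, the placeholder text and the sample values are all assumptions made for illustration; they are not taken from OnkoDICOM or from any particular anonymisation tool.

```python
# Groups named in the text: 0008 (study information) and 0010 (patient information).
ANONYMISED_GROUPS = {0x0008, 0x0010}


def anonymise(elements, placeholder="ANONYMISED"):
    """Return a copy of the header with every value in the listed groups replaced."""
    cleaned = {}
    for (group, element), value in elements.items():
        if group in ANONYMISED_GROUPS:
            cleaned[(group, element)] = placeholder
        else:
            cleaned[(group, element)] = value
    return cleaned


# Made-up sample header for the demonstration.
header = {
    (0x0008, 0x0020): "20200101",  # Study Date
    (0x0010, 0x0010): "DOE^JANE",  # Patient's Name
    (0x0010, 0x0020): "123456",    # Patient ID
    (0x0028, 0x0010): 512,         # Rows (not identifying, left untouched)
}

anonymised = anonymise(header)
for tag, value in anonymised.items():
    print(f"({tag[0]:04X},{tag[1]:04X}) -> {value}")
```

A blanket replacement like this would also overwrite Modality (0008, 0060), which sits in group 0008, so a practical tool would probably decide tag by tag which values to blank, replace or keep.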

The Modality (0008, 0060) attribute states what type of DICOM object the file contains.
It holds a plain string such as "RTDOSE", "RTSTRUCT", "CT", etc.

h2. +*DICOM Attributes*+

A DICOM *attribute* or *data element* is *composed* of:

A *tag* - A DICOM file consists of a sequence of Data Elements (the Data Set) which describe instances of real world information. Data Elements are uniquely identified by a tag consisting of two parts: the Group Number and the Element Number. Although similar or related Data Elements often have the same Group Number, a Data Group does not convey any semantic meaning. Some Data Elements may occur more than once in a DICOM file's Data Set. A tag is represented by two 16-bit unsigned integers representing the Group Number followed by the Element Number; for example, the Data Element Modality is represented by the tag (0008,0060).

A *Value Representation (VR)* - Each Data Element has a Value Representation (VR) which describes the data type and format of the Data Element's values. A VR determines the length of the Data Element's value and which characters are permitted in the value. VRs are encoded with two uppercase letters from the DICOM default character set (i.e. A - Z). A list of all VRs and their specifications can be found in the DICOM Standard PS3.5 Section 6.2[1].

A *Transfer Syntax* - Every DICOM file has a Transfer Syntax which communicates how the subsequent data is encoded. The default Transfer Syntax provided by the Standard is DICOM Implicit VR Little Endian Transfer Syntax.
The first attribute of this Transfer Syntax is Implicit VR, which means that in each Data Element outside the File Meta Information header (more on this later), it is not necessary to declare the VR, as every tag already has an implicit VR defined in the Standard. In an Explicit VR Transfer Syntax, the VR has to be declared within each Data Element.

The second attribute of the Transfer Syntax is Little Endian. Endianness describes the order in which bytes are interpreted. DICOM Standard PS3.5 Section 7.3[2] defines Little Endian Byte Ordering as follows:

* In a binary number consisting of multiple bytes (e.g., a 32-bit unsigned integer value, the Group Number, the Element Number, etc.), the least significant byte shall be encoded first; with the remaining bytes encoded in increasing order of significance.
* In a character string consisting of multiple 8-bit single byte codes, the characters will be encoded in the order of occurrence in the string (left to right).

The Data Element Tag for Modality shows how Little Endian Byte Ordering works. Consider the bytes represented in hexadecimal:
08 00 60 00

Let's say we already know that these four bytes represent the tag. We also know (as established in the description of the tag above) that a Data Element's tag consists of two 16-bit unsigned integers. A pair of hexadecimal digits represents one byte (8 bits), so the above represents 4 bytes, which means there are two 16-bit numbers present: 08 00 and 60 00. Little Endian Byte Ordering dictates that these numbers were encoded with the least significant byte first. Essentially this means we read the bytes of each 16-bit number from right to left. This results in the numbers being interpreted as 0x0008 and 0x0060, which corresponds to the Group Number and Element Number (0008, 0060) representing the DICOM Modality.
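
The same decoding can be reproduced with Python's standard @struct@ module, where the format @<HH@ means two little-endian 16-bit unsigned integers. The variable names are chosen for illustration only, and the Big Endian reading is included purely for contrast.

```python
import struct

# The four tag bytes from the example above.
raw = bytes.fromhex("08 00 60 00")

# "<" selects Little Endian, each "H" is one 16-bit unsigned integer.
group, element = struct.unpack("<HH", raw)
tag_text = f"({group:04X}, {element:04X})"
print(tag_text)

# Reading the same bytes as Big Endian gives different numbers.
big_group, big_element = struct.unpack(">HH", raw)
print(f"({big_group:04X}, {big_element:04X})")
```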

Big Endian is also defined in the DICOM Standard; however, the only Big Endian Transfer Syntax defined has been retired by the Standard.

A *Value length* and *Value field* - With the information above we can begin to look at the raw binary data of an encoded DICOM Data Set. A Data Element can be easily encoded and decoded once the rules for how it is structured are understood. Let's take a look at this example:
08 00 60 00 08 00 00 00 52 54 53 54 52 55 43 54

First there is the tag (which we saw in the Little Endian example above), consisting of 2 bytes representing the Group Number followed by 2 bytes representing the Element Number. This decodes to (0008, 0060).

Decoding the next set of bytes depends on the Transfer Syntax established for this particular DICOM File. In this instance we are using the default, DICOM Implicit VR Little Endian Transfer Syntax. This means that the next 4 bytes are dedicated to declaring the Value Length (a 32-bit unsigned integer), which is a number stating how long the Data Element's Value is in bytes. In the example above, the Value Length decodes to 8 bytes.

If we were using an Explicit VR Transfer Syntax instead, the next 2 bytes would declare the VR, and for most VRs the 2 bytes after that would represent the Value Length (as such, this only allows for the length to be a 16-bit unsigned integer). A few VRs, such as OB, OW, SQ and UN, instead follow the VR with 2 reserved bytes and a 32-bit Value Length.
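
To make the Explicit VR layout concrete, here is a minimal sketch that encodes and decodes the Modality element using the 16-bit length form. The function names @encode_explicit_short@ and @decode_explicit_short@ are hypothetical, and the sketch deliberately ignores the VRs that use reserved bytes and a 32-bit length. Modality has the VR CS (Code String).

```python
import struct


def encode_explicit_short(group, element, vr, value):
    """Encode one element as Explicit VR Little Endian with a 16-bit Value Length."""
    if len(value) % 2:
        raise ValueError("value must already be padded to an even length")
    # Layout: group (2 bytes), element (2 bytes), VR (2 ASCII letters), length (2 bytes).
    header = struct.pack("<HH2sH", group, element, vr.encode("ascii"), len(value))
    return header + value


def decode_explicit_short(data):
    """Decode one element that uses the 16-bit Value Length layout."""
    group, element, vr, length = struct.unpack_from("<HH2sH", data, 0)
    value = data[8:8 + length]
    return (group, element), vr.decode("ascii"), length, value


encoded = encode_explicit_short(0x0008, 0x0060, "CS", b"RTSTRUCT")
print(encoded.hex())
print(decode_explicit_short(encoded))
```

Compared with the Implicit VR bytes shown earlier, the only difference is that the 4-byte length is replaced by the two letters of the VR followed by a 2-byte length.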

_Note that a Value Length must be an even number._

Following the 8 bytes representing the tag and the VR and/or Value Length is the Data Element's value itself, called the Value Field. The content of these bytes is determined by the Value Length and the Data Element's VR. Note that as the Value Length must always be an even number, the Value Field must also contain an even number of bytes. If a value falls short of an even length, the Value Field will be padded at the end with a null byte (0x00) or, for the VRs that require it, with 0x20, which is the ASCII 'Space' character. In the example, the ASCII characters in these 8 bytes decode as RTSTRUCT.
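
The padding rule can be sketched as a small helper. The name @pad_value@ and the set @NULL_PADDED_VRS@ are assumptions for illustration; the set lists only UI and OB as null-padded examples, and the authoritative rule for each VR is in PS3.5 Section 6.2[1].

```python
# Illustrative subset: VRs padded with a null byte; others here are padded with a space.
NULL_PADDED_VRS = {"UI", "OB"}


def pad_value(value, vr):
    """Pad a value (as bytes) to an even length, as the Value Field requires."""
    if len(value) % 2 == 0:
        return value  # already even, nothing to add
    pad = b"\x00" if vr in NULL_PADDED_VRS else b" "
    return value + pad


print(pad_value(b"RTSTRUCT", "CS"))  # 8 bytes, unchanged
print(pad_value(b"DOE", "PN"))       # 3 bytes, padded with a space
print(pad_value(b"1.2.840", "UI"))   # 7 bytes, padded with a null byte
```

When reading a value back, the trailing pad character has to be stripped again before the value is compared or displayed.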

To break the example down more concisely:
First 2 bytes: Group Number (0x0008)
Next 2 bytes: Element Number (0x0060)
Next 4 bytes: Value Length (0x00000008)
Next 8 bytes: Value Field (RTSTRUCT)
And in plain language:
Tag: (0008, 0060), Modality
Length: 8 bytes
Value: RTSTRUCT
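
The breakdown above can be checked with a short Python sketch that decodes the 16 example bytes under the Implicit VR Little Endian rules. The function name @decode_implicit@ is hypothetical, and the sketch assumes a plain element with an explicit length; it does not handle sequences or undefined lengths.

```python
import struct


def decode_implicit(data, offset=0):
    """Decode one Implicit VR Little Endian element starting at offset."""
    # Group (2 bytes), element (2 bytes), Value Length (4 bytes), all Little Endian.
    group, element, length = struct.unpack_from("<HHI", data, offset)
    start = offset + 8
    value = data[start:start + length]
    # The last item is the offset at which the next element would begin.
    return (group, element), length, value, start + length


raw = bytes.fromhex("08 00 60 00 08 00 00 00 52 54 53 54 52 55 43 54")
tag, length, value, next_offset = decode_implicit(raw)

# Strip any trailing padding before using the string.
modality = value.decode("ascii").rstrip(" \x00")
print(f"Tag: ({tag[0]:04X}, {tag[1]:04X})")
print(f"Length: {length} bytes")
print(f"Value: {modality}")
```

Returning the next offset makes it straightforward to call the function in a loop and walk through consecutive Data Elements.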

The basic attribute structure is shown in the following figure:

!fig3.jpg!

Figure 3

To see all tags - https://www.dicomlibrary.com/dicom/dicom-tags/

To see all Attribute names, Value Representations and Tags use this link:
https://northstar-www.dartmouth.edu/doc/idl/html_6.2/DICOM_Attributes.html

To see all the *Value Representations* explained: https://northstar-www.dartmouth.edu/doc/idl/html_6.2/Value_Representations.html#wp1023393

h2. +*DICOM Data Format Layer*+

The DICOM Data Format Layer includes the following elements of specification:

* *DICOM Media Storage SOP Classes* and associated IODs;
* The *DICOM File Format* ;
* The *Secure DICOM File Format* ;
* The *DICOM Media Storage Directory* SOP Class;
* DICOM Media Storage *Application Profiles* ;
* DICOM *Security Profiles* for Media Storage.

h2. +*RT DOSE defined*+

The RT Dose Module[3] is used to convey 2D or 3D radiation dose data generated from treatment planning systems or similar devices.

h2. +*RT PLAN defined*+

RT PLAN addresses the requirements for the transfer of treatment plans generated by manual entry, a virtual simulation system, or a treatment planning system before or during a course of treatment.

h2. +*RT STRUCT defined*+

The “radiotherapy structure set” (RTSTRUCT) object of the DICOM standard is used for the transfer of patient structures and related data between the devices found within and outside the radiotherapy department. It mainly contains the information for regions of interest (ROIs) and points of interest (e.g., dose reference points).

h2. +*DICOM Security Considerations*+

The DICOM File Format has a potential security vulnerability when the 128-byte File Preamble contains malicious executable content. Such malicious executable content may also refer to other malicious content in the file hidden within Data Elements of the File Meta Information or the Data Set.

To reduce this risk, software that reads or receives DICOM files can:

* *Sanitize* the *preamble*, such as by:
** Verifying that the preamble:
*** is *all zeroes*, or
*** *begins* with a *valid magic number* for recognized dual format content (e.g., TIFF or BigTIFF), or
*** contains other known *safe content*.
** *Clearing* the *preamble* regardless of its content (dual format content may not work without it)
** Testing explicitly for *executable preamble contents*.
* *Test explicitly* for *executable content anywhere* within the *DICOM* File
* *Validate* that the DICOM *values, structures and content* comply with the *standard encoding rules* and the IOD of the specified SOP Class, including Private Data Elements.
* *Validate* that the contents are of the *appropriate SOP Classes*.
* *Validate* that DICOM File Format files created for *HTTP requests* and responses do *not contain such malicious content*.
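
The preamble checks in the list above can be sketched with plain byte comparisons. This is a minimal illustration, not a complete defence: the names @classify_preamble@ and @clear_preamble@ and the returned labels are hypothetical, the list of magic numbers only covers TIFF and BigTIFF, and the sketch relies on the fact that the 128-byte preamble is followed by the 4-byte prefix "DICM".

```python
PREAMBLE_LENGTH = 128
PREFIX = b"DICM"

# Magic numbers for TIFF and BigTIFF, in both byte orders.
DUAL_FORMAT_MAGIC = (b"II*\x00", b"MM\x00*", b"II+\x00", b"MM\x00+")


def classify_preamble(data):
    """Classify the File Preamble of a DICOM file held in memory as bytes."""
    end = PREAMBLE_LENGTH + len(PREFIX)
    if len(data) < end or data[PREAMBLE_LENGTH:end] != PREFIX:
        return "not-dicom"
    preamble = data[:PREAMBLE_LENGTH]
    if preamble == bytes(PREAMBLE_LENGTH):
        return "zeroes"
    if preamble.startswith(DUAL_FORMAT_MAGIC):
        return "dual-format"
    return "suspicious"


def clear_preamble(data):
    """Return a copy of the file with the preamble overwritten by zeroes."""
    return bytes(PREAMBLE_LENGTH) + data[PREAMBLE_LENGTH:]


# Made-up sample: a preamble starting with "MZ" (the start of a Windows executable).
sample = b"MZ" + bytes(126) + b"DICM" + b"...data set..."
print(classify_preamble(sample))
print(classify_preamble(clear_preamble(sample)))
```

Checking the magic number only shows that the preamble looks like a known format; the remaining items in the list, such as validating the Data Set against its IOD, still need separate checks.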

h2. +*DICOM Image Display*+

To promote identical grayscale image display on different monitors and consistent hard-copy images from various printers, the DICOM committee developed a lookup table to display digitally assigned pixel values. To use the *DICOM grayscale standard display function (GSDF)*, images must be viewed (or printed) on devices that have this lookup curve or on devices that have been calibrated to the GSDF curve.

h2. +*Value Multiplicity*+

In addition to a Value Representation, each attribute also has a Value Multiplicity to indicate the number of values contained in the attribute. For character string value representations, if more than one value is being encoded, the successive values are separated by the backslash character “\”.
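
Splitting a multi-valued character string on the backslash can be sketched as follows. The function name @split_values@ is hypothetical and the sample bytes are made up to resemble a multi-valued attribute such as Image Type (0008,0008); the sketch assumes the default ASCII character repertoire.

```python
def split_values(value_field):
    """Split a character-string Value Field into its individual values."""
    # Remove trailing padding first, then split on the backslash delimiter.
    text = value_field.decode("ascii").rstrip(" \x00")
    return text.split("\\")


image_type = split_values(b"ORIGINAL\\PRIMARY\\AXIAL")
print(image_type)
print(len(image_type))  # the number of values actually present

single = split_values(b"CT")
print(single)
```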

*References:*
[1] - http://dicom.nema.org/medical/dicom/current/output/chtml/part05/sect_6.2.html
[2] - http://dicom.nema.org/medical/dicom/current/output/chtml/part05/sect_7.3.html
[3] - http://dicom.nema.org/medical/dicom/2016e/output/chtml/part03/sect_C.8.8.3.html

*Sources:*
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3354356/
http://dicom.nema.org/medical/dicom/current/output/chtml/part10/chapter_7.html
https://www.dicomstandard.org/concepts/

*Already existing DICOM viewers:*
dicomlibrary.com/meddream
https://beta.jackimaging.com/demo
https://www.radiantviewer.com/
https://viewmyscans.com/
GIMP
https://www.microdicom.com/

h1. ------------- *Information from 2019* -------------

h2. The DICOM file structure

*AE title* (Application Entity Title)

*DIMSE* (DICOM Message Service Element): Patient and Network

*ER* (Entity Relationship)

*IOD* (Information Object Definition): Two main modules (patient and study) and other modules (series, equipment, and images).
!iod.jpg!

*SOP* (Service Object Pair): It is defined by the union of an Information Object Definition (IOD) and a set of DICOM Message Service Elements (a DIMSE Service Group). It contains the rules and semantics which may restrict the use of the services in the DIMSE Service Group or the Attributes of the IOD.

*UID* (Unique ID): Unique identifier.

*VR* (Value Representation): It describes the data type and format of the attribute value.

*SCU / SCP* (Service Class User / Service Class Provider): A Service Class Specification defines a group of one or more SOP Classes related to a specific function that is to be accomplished by communicating Application Entities.

h2. Dicompyler Structures

!Volumes.jpg!

*GTVp* (gross tumour volume, primary)
*CTVp* (clinical target volume)
*ITV* (internal target volume)

From https://www.dicomlibrary.com/ and http://dicom.nema.org/medical/dicom/current/output/pdf/part03.pdf.