Convolutional Neural Networks for Medical Imaging
Deep learning is a fast-evolving field with far-reaching implications for medical imaging.
Currently, medical images are interpreted by radiologists, but this interpretation can be very subjective. After years of looking at ultrasound images, my co-workers and I still get into arguments about whether we are actually seeing a tumor in a scan. Radiologists often have to look through large volumes of these images, which can cause fatigue and lead to mistakes. So there is a need for automation.
Machine learning algorithms such as support vector machines are often used to detect and classify tumors. But they are often limited by the assumptions we make when we define features. This results in reduced sensitivity. However, deep learning could be an ideal solution because these algorithms are able to learn features from raw image data.
One major challenge in applying deep learning models such as a convolutional neural network (CNN) is the vast amount of computational resources they require. It would be nice to have large hard-drive space, a faster processor and a fancy NVIDIA GPU. I have none of that. I did have the foresight to upgrade my laptop to 16 GB RAM some time during grad school. But it is entirely possible to train a CNN to do a simple classification task in a reasonable amount of time. For more complicated tasks, you can use a pre-trained model, for example one trained on the ImageNet database.
In this post, I attempt to build a CNN model, train it and have it solve a classification task: detecting lung nodules. I used the data from the Lung Image Database Consortium and Image Database Resource Initiative (LIDC/IDRI) database. As the database was larger (124 GB) than the space available on my laptop, I ended up using the reformatted version available for LUNA16. This dataset consisted of 888 CT scans with annotations describing coordinates and ground truth labels. The first step was to create an image database for training.
Creating an image database
The images were formatted as .mhd and .raw files. The header data was contained in the .mhd files and the multidimensional image data was stored in the .raw files. I used the SimpleITK library to read the .mhd files. Each CT scan consisted of n axial slices of 512 x 512 pixels, with n around 200.
There were a total of 551065 annotations. Of these, 1351 were labeled as nodules and the rest were labeled negative, so the class imbalance is huge. The easy way to deal with it is to undersample the majority class and augment the minority class by rotating images.
We could potentially train the CNN on all the pixels, but that would increase the computational cost and training time. So instead I decided to crop the images around the coordinates provided in the annotations. The annotations were given in Cartesian (world) coordinates, so they had to be converted to voxel coordinates. Also, the image intensity was defined on the Hounsfield scale, so it had to be rescaled for image processing purposes.
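A minimal sketch of the undersampling step, assuming labels are stored as 0 (negative) and 1 (nodule). The function name `undersample_negatives` and the default of five negatives per positive are illustrative choices, not taken from the original script.

```python
import numpy as np

def undersample_negatives(labels, neg_per_pos=5, seed=0):
    """Return sorted indices that keep every positive and a random subset of negatives."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    pos_idx = np.flatnonzero(labels == 1)
    neg_idx = np.flatnonzero(labels == 0)
    # Never ask for more negatives than actually exist.
    n_keep = min(len(neg_idx), neg_per_pos * len(pos_idx))
    kept_neg = rng.choice(neg_idx, size=n_keep, replace=False)
    return np.sort(np.concatenate([pos_idx, kept_neg]))
```

Fixing the seed keeps the subset reproducible, which helps when comparing training runs.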
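The two conversions can be sketched with NumPy alone. This assumes an axis-aligned scan (identity direction matrix) and that the origin and spacing are read from the .mhd header, for instance via SimpleITK's `GetOrigin()` and `GetSpacing()`. The function names and the intensity window of -1000 to 400 HU are assumptions for illustration, not values from the original script.

```python
import numpy as np

def world_to_voxel(world_xyz, origin_xyz, spacing_xyz):
    """Convert world coordinates (mm) to integer voxel indices in the same axis order."""
    world = np.asarray(world_xyz, dtype=float)
    origin = np.asarray(origin_xyz, dtype=float)
    spacing = np.asarray(spacing_xyz, dtype=float)
    return np.rint((world - origin) / spacing).astype(int)

def normalize_hu(image, hu_min=-1000.0, hu_max=400.0):
    """Clip Hounsfield units to [hu_min, hu_max] and rescale linearly to [0, 1]."""
    image = np.asarray(image, dtype=float)
    scaled = (image - hu_min) / (hu_max - hu_min)
    return np.clip(scaled, 0.0, 1.0)
```

One thing to watch: SimpleITK reports origin and spacing in (x, y, z) order, while the array returned by `GetArrayFromImage` is indexed (z, y, x), so the voxel index has to be reversed before it is used to index the array.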
The script below would generate 50 x 50 grayscale images for training, testing and validating a CNN.
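As a rough sketch of the cropping step only (not the full script), assuming a single axial slice as a 2D NumPy array and a nodule center already converted to row and column indices. `crop_patch` is a hypothetical helper, and zero-padding at the borders is one possible way to handle candidates near the edge of the slice.

```python
import numpy as np

def crop_patch(slice_2d, center_row, center_col, size=50):
    """Crop a size x size patch around (center_row, center_col), zero-padding at the edges."""
    half = size // 2
    padded = np.pad(slice_2d, half, mode="constant", constant_values=0)
    # After padding by `half`, original pixel (r, c) sits at (r + half, c + half),
    # so a window starting at (r, c) in padded coordinates places it at patch[half, half].
    r0, c0 = int(center_row), int(center_col)
    return padded[r0:r0 + size, c0:c0 + size]
```

Padding first means every patch comes out with the same shape, so the patches can be stacked into a single array for training.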
Although the script above under-sampled the negative class so that 1 in every 6 images had a nodule, the data set was still heavily imbalanced for training. I decided to augment my training set by rotating images. The script below does just that.
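A minimal sketch of rotation-based augmentation with NumPy, assuming each training image is a 2D array. `rotated_copies` is a hypothetical name; `np.rot90` rotates counterclockwise in 90-degree steps, so no interpolation is involved and pixel values are unchanged.

```python
import numpy as np

def rotated_copies(image):
    """Return the 90-degree and 180-degree rotations of a 2D image."""
    return [np.rot90(image, k=1), np.rot90(image, k=2)]
```

Applied to the nodule class only, this triples the number of positive examples: the original plus two rotated copies.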
So for an original image, my script would create these two images:
[Figure: original image, 90 degree rotation, 180 degree rotation]
Augmentation resulted in an 80-20 class distribution, which was not entirely ideal. But I also did not want to augment the minority class too much because it might result in a minority class with little variation.
Building a CNN model
Now we are ready to build a CNN model. After dabbling a bit with TensorFlow, I decided it was way too much work for something incredibly simple, so I used TFLearn instead. TFLearn is a high-level API wrapper around TensorFlow, and it made coding a lot more palatable. The approach I used was similar to this. I used 3 convolutional layers in my architecture. You will find the schematic of my CNN architecture below.
My CNN model is defined in a class as shown in the script below.
Training the model
Because the data required to train a CNN is large, it is often desirable to train the model in batches. Loading all the training data into memory is not always possible because of hardware limitations. I had a total of 6878 images in my training set and I was working on a 2012 MacBook Pro. So I decided to store all the images in an HDF5 dataset using the h5py library, which lets the training loop read them from disk as needed. You can find the script I used to do that here.
Once I had the training data in an HDF5 dataset, I trained the model using this script.
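The idea of reading one batch at a time can be sketched as a small generator. This is an illustration rather than the linked script: `iter_batches` is a hypothetical helper, and it is written against anything that supports `len()` and slicing, which covers NumPy arrays as well as h5py datasets.

```python
import numpy as np

def iter_batches(images, labels, batch_size=64):
    """Yield (images, labels) batches; only the current slice is materialized in memory."""
    n = len(images)
    for start in range(0, n, batch_size):
        stop = min(start + batch_size, n)
        # Slicing an on-disk dataset reads just this range of rows.
        yield np.asarray(images[start:stop]), np.asarray(labels[start:stop])
```

With 6878 training images and a batch size of 64, this yields 108 batches per epoch, the last one holding the remaining 30 images.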
The training took a couple of hours on my laptop. Like any engineer, I wanted to see what goes on under the hood. Here is a visual of the weights in the first convolutional layer:
As expected these images of 5x5 convolutional filters won’t tell you much. But the feature maps produced after convolutions might.
So if I pass this image
through the first convolutional layer, which contains 32 filters of size 5 x 5, it generates a set of feature maps that looks like this:
While it is still difficult to see what these filters are doing, it is apparent that some seem to pick up edges, some prioritize solid shapes and some seem to isolate the nodule from other structures present in the lungs. The max pooling layer following the first layer down-sampled the feature maps by a factor of 2. Passing the down-sampled feature maps into the second convolutional layer of 64 5x5 filters produces the following set of feature maps:
The features generated in the second layer seem to be doing a better job of isolating nodules.
The feature maps generated by the third convolutional layer containing 64 3x3 filters:
The feature maps generated by the last convolutional layer seem to isolate the nodules entirely.
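To keep track of the shapes involved, the layers described above can be traced with a little arithmetic. This sketch assumes stride-1 convolutions with 'same' padding, a single-channel 50 x 50 input, and only the one pooling layer mentioned in the text; any further pooling or fully connected layers in the actual model are not covered. `trace_network` and `LAYERS` are illustrative names.

```python
import math

def trace_network(layers, input_size=50, in_channels=1):
    """Report (kind, output size, channels, parameter count) for each layer in order."""
    size, channels = input_size, in_channels
    report = []
    for layer in layers:
        if layer[0] == "conv":
            _, kernel, filters = layer
            # One kernel x kernel x channels weight block plus one bias per filter.
            params = kernel * kernel * channels * filters + filters
            channels = filters  # spatial size is unchanged under 'same' padding
        elif layer[0] == "pool":
            _, window = layer
            params = 0
            size = math.ceil(size / window)
        else:
            raise ValueError(f"unknown layer type: {layer[0]}")
        report.append((layer[0], size, channels, params))
    return report

# Layers as described in the text: 32 5x5 filters, 2x pooling, 64 5x5 filters, 64 3x3 filters.
LAYERS = [("conv", 5, 32), ("pool", 2), ("conv", 5, 64), ("conv", 3, 64)]

for kind, size, channels, params in trace_network(LAYERS):
    print(f"{kind}: {size}x{size}x{channels}, {params} parameters")
```

Under these assumptions the first layer has only 832 parameters, while the second has 51264, because each of its 64 filters spans all 32 input channels.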
How accurate are my predictions?
I tested my CNN model on 1623 images. Here is the confusion matrix.
I had a validation accuracy of 93%. My model has a precision of 89.3% and a recall of 71.2%. The model has a specificity of 98.2%. So what do these numbers mean?
Accuracy alone won’t tell you much. For relaying a diagnostic result, it is important to know with what certainty a positive result is actually positive.
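Precision, or positive predictive value, tells us what fraction of the images the model flagged as nodules actually contain a nodule. With a precision of 89.3%, roughly 9 out of 10 positive predictions are correct.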
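All four metrics follow directly from the confusion matrix counts. The sketch below uses the standard definitions; the counts in the usage example are made-up round numbers for illustration and are not the confusion matrix from this post.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and specificity from confusion matrix counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),    # of predicted positives, how many are real
        "recall": tp / (tp + fn),       # of real positives, how many were found
        "specificity": tn / (tn + fp),  # of real negatives, how many were cleared
    }

# Hypothetical counts, chosen only to show the calculation.
metrics = diagnostic_metrics(tp=80, fp=20, fn=20, tn=880)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Even in this toy example accuracy is 96% while recall is only 80%: with many more negatives than positives, accuracy is dominated by the true negatives.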
I looked deeper into the sort of predictions the model made:
[Figures: example false negative, false positive, true negative and true positive predictions]
Not enough labeled data
One challenge in implementing these algorithms is the scarcity of labeled medical image data. While this is a limitation for all applications of deep learning, it is more so for medical image data because of patient confidentiality concerns.