Image Dataset Augmentation – Part One

Grump the tortoise, the subject of our image testing

Introduction and Transformations

Using images in deep learning for computer vision and classification problems has become a compelling way to impress friends and family, influence strangers, and demonstrate the power and adaptability of artificial neural networks and convolutional networks. It can also help researchers and systems sort, segment, and identify or classify data, moving mundane tasks into the realm of automation and saving graduate students and TAs countless late nights of sorting, indexing, checking, and eventually spilling coffee all over the entirety of a project.

Image classification relies on a sample set of images that have been ‘labeled’ with a desired output: pictures of cats labeled ‘cat’, and dogs labeled ‘dog’. A neural network trained to classify these will learn the features that cat images share, and the other features that identify dog images as dogs. But this requires a large quantity of labeled data, and it can go wrong. What if all the cat images have white backgrounds? The network may never learn to identify features of a cat at all, and instead learn that any image with a white background is a cat.

Using image transformations and some know-how, we can augment an existing dataset, expanding the effective size of the training data pool, and hopefully pushing the network to learn more meaningful features.
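To make "expanding the effective size" a bit more concrete, here is a minimal sketch, assuming a made-up dataset of four tiny random images and a left-to-right mirror as the transformation. Both are illustrative choices, not necessarily the ones this series will end up using.

```python
import numpy as np

# A hypothetical "dataset": four tiny RGB images, 2 rows by 3 columns each.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(4, 2, 3, 3), dtype=np.uint8)

# Mirror every image left-to-right by reversing the column axis (axis 2 here).
flipped = images[:, :, ::-1, :]

# Keep the originals and add the mirrored copies: twice the samples,
# and each copy keeps the label of the image it came from.
augmented = np.concatenate([images, flipped], axis=0)
print(images.shape, '->', augmented.shape)
```

A mirrored cat is still a cat, so the labels carry over unchanged, which is what makes this kind of transformation cheap to apply.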

Let’s begin by looking at the basics of an imaging package, scikit-image.

In [2]:
import numpy as np #numpy represents complex dimensional data as arrays

import matplotlib.pyplot as plt #some standard python plotting

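import skimage #our imaging library
import skimage.io, skimage.color, skimage.exposure, skimage.util #load the submodules used below explicitly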
In [3]:
grump = skimage.io.imread('turtle.jpg')

Using scikit-image’s io module and its imread function, we read an image in as an n-dimensional NumPy array, filled with numbers that represent the color values at each pixel location.

In [5]:

print(type(grump))
print('our image, as a numpy array, is shaped', grump.shape)

plt.figure(figsize = (10, 8))
plt.imshow(grump) #display the image
<class 'imageio.core.util.Array'>
our image, as a numpy array, is shaped (667, 1000, 3)

So Scikit-Image’s io.imread function puts pixel positions in the first two dimensions, grump.shape[0] and grump.shape[1], and the RGB color channels in the third dimension, grump.shape[2].

So to represent the image, we have a set of three values for each pixel ‘location’.

This means our image is 667 pixels tall: the first dimension is the number of rows, which might be counter-intuitive for you (it is for me). The number of rows corresponds to the Y axis of the image.

The second dimension is the X axis, 1000 pixels wide here, and the final dimension, grump.shape[2], contains the values for the color channels: red, green and blue, or RGB. Scikit-Image supports multiple color spaces, including RGBA, HSV, XYZ, CIE, and grayscale, and converts between them easily.
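If the row-first ordering still feels backwards, a tiny hand-built array can help. This is a sketch using a made-up 4 by 6 image rather than Grump, so every number is small enough to read at a glance.

```python
import numpy as np

# A hypothetical RGB image with 4 rows (y) and 6 columns (x), all black to start.
height, width = 4, 6
img = np.zeros((height, width, 3), dtype=np.uint8)

# Indexing is [row (y), column (x), channel], so this paints the pixel
# at y = 1, x = 4 pure red.
img[1, 4] = [255, 0, 0]

print(img.shape)     # (rows, columns, channels)
print(img[1, 4])     # the RGB triple stored at y = 1, x = 4
print(img[:, :, 0])  # the red channel alone, as a 2-D array
```

The single 255 in the printed red channel sits in the second row and fifth column, which is the y-then-x order that the shape reports.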

So let’s just look at some of those specific pixel values.

In [17]:
print(grump[50, 50, :]) #look at all RGB values for a specific array index, or one pixel's RGB value.
print("these three numbers in brackets represent one pixel's values for a specific location in the image array.")

#and let's index into a specific section of the image array
print("let's look at the first 5 pixels in the x and y axis")
plt.imshow(grump[:5, :5, :])
print(grump[:5, :5, :])
[81 76 72]
these three numbers in brackets represent one pixel's values for a specific location in the image array.
let's look at the first 5 pixels in the x and y axis
[[[72 64 62]
  [75 67 65]
  [76 68 65]
  [79 71 68]
  [82 73 68]]

 [[68 59 60]
  [71 62 63]
  [69 60 61]
  [72 64 62]
  [72 62 61]]

 [[60 55 52]
  [61 56 53]
  [64 59 56]
  [63 55 53]
  [64 56 54]]

 [[54 51 46]
  [53 50 45]
  [54 51 46]
  [55 50 46]
  [56 51 47]]

 [[46 45 41]
  [51 47 46]
  [49 45 42]
  [48 44 41]
  [48 44 41]]]

Let’s take a closer look at our shell-buddy.

In [10]:
plt.figure(figsize = (10, 8))
plt.imshow(grump)
plt.title("Grumpy's X and Y")
plt.xlabel('x : image[1]')
plt.ylabel('y : image[0]')

#And now let's take a look at the individual channels of the image.
plt.figure(figsize = (20, 15))
plt.subplot(1, 3, 1) #lay the three channels out side by side
plt.imshow(grump[:, :, 0], cmap = 'Reds')
plt.title('First Color Channel')
plt.subplot(1, 3, 2)
plt.imshow(grump[:, :, 1], cmap = 'Greens')
plt.title('Second Color Channel')
plt.subplot(1, 3, 3)
plt.imshow(grump[:, :, 2], cmap = 'Blues')
plt.title('Third Color Channel')

Ok, well, maybe the color channels would be better represented with a different image. But I do love this tortoise-friend. Let’s use a much more illustrative, albeit inferior, non-testudinidae-friendly image.

In [14]:
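test_img = skimage.io.imread('trial.jpg')
chan = test_img.astype(np.int16) #signed copy, so the subtractions below can't wrap around the way uint8 would

plt.figure(figsize = (18, 8))
plt.subplot(1, 4, 1) #original image plus one panel per channel
plt.imshow(test_img)
plt.title("Testing Image")
plt.subplot(1, 4, 2)
plt.imshow(np.clip(chan[:, :, 0] - (chan[:, :, 1] + chan[:, :, 2]), 0, 255), cmap = 'Reds')
plt.title("Red Channel")
plt.subplot(1, 4, 3)
plt.imshow(np.clip(chan[:, :, 1] - (chan[:, :, 0] + chan[:, :, 2]), 0, 255), cmap = 'Greens')
plt.title("Green Channel")
plt.subplot(1, 4, 4)
plt.imshow(np.clip(chan[:, :, 2] - (chan[:, :, 0] + chan[:, :, 1]), 0, 255), cmap = 'Blues')
plt.title("Blue Channel")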

Ok, another point worth delving into: why did I have to subtract some channels from the others to get something to display in the above example? Well, in this image we’re working in RGB, where each color is represented by a set of three integer or float values, one per channel: a red value of 0 to 255, a green value of 0 to 255, and a blue value of 0 to 255. A white pixel is represented across the three channels by [255, 255, 255], and a black one by [0, 0, 0].

So when we go to plot, say, the red channel test_img[:, :, 0], a fully (or very nearly fully) red object like the one in this test image is represented by a value of 255, and a fully white pixel is also represented, in the red channel, by 255. So our red pixels would appear identical to our white pixels! Something to keep in mind if you hit some weird display bumps in the road while dealing with color images!
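Here is that red-versus-white collision in miniature. It is a sketch on a made-up three-pixel image (one red, one white, one black). The cast to a signed integer type before subtracting is an assumption added here, because unsigned 8-bit arithmetic wraps around instead of going negative.

```python
import numpy as np

# Hypothetical 1x3 image: one red, one white, and one black pixel.
img = np.array([[[255, 0, 0], [255, 255, 255], [0, 0, 0]]], dtype=np.uint8)

red_channel = img[:, :, 0]
print(red_channel)  # the red and the white pixel both read 255

# Subtract the other two channels in a signed type, then clip at zero,
# so only pixels that are red and NOT also green or blue stay bright.
signed = img.astype(np.int16)
red_only = np.clip(signed[:, :, 0] - (signed[:, :, 1] + signed[:, :, 2]), 0, 255)
print(red_only)
```

After the subtraction, the white pixel drops to 0 while the red pixel keeps its 255, so the two no longer look alike when plotted.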

Back to Turtle, errr, Tortoise… Power!

Color is great and all, very nice, but for many purposes we can get by with grayscale values. Converting eliminates two thirds of the array’s values: instead of holding 3 color values, it holds only a single value per pixel.
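A quick way to see the saving is to count values before and after conversion. This sketch uses a small random array as a stand-in for a photo (an assumption, so it runs without any image file) and scikit-image’s rgb2gray.

```python
import numpy as np
from skimage import color

# Hypothetical stand-in for a photo: a 4x6 RGB image of random uint8 values.
rng = np.random.default_rng(0)
rgb = rng.integers(0, 256, size=(4, 6, 3), dtype=np.uint8)

gray = color.rgb2gray(rgb)

print(rgb.shape, rgb.size)    # three values per pixel
print(gray.shape, gray.size)  # one value per pixel
print(rgb.dtype, '->', gray.dtype)
```

Note that rgb2gray hands back floating point values between 0 and 1, so the count of values drops to a third, but the memory footprint depends on which dtype you keep the result in.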

Another aspect to note – before we get away from color channels – these don’t have to hold different colors. You could put different image processing techniques or different images entirely into different channels. Anything you can turn into an array, you can assign to a channel, since in NumPy it’s just another dimension.
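As a rough sketch of that idea, the snippet below stacks a grayscale image and two processed versions of it into one three-"channel" array. The gradient image and the two processing steps (inversion and a 0.5 threshold) are arbitrary picks for illustration.

```python
import numpy as np

# Hypothetical grayscale image: a 4x6 horizontal gradient running from 0 to 1.
gray = np.tile(np.linspace(0.0, 1.0, 6), (4, 1))

# Two arbitrary "processed" versions of the same image (illustrative only).
inverted = 1.0 - gray
thresholded = (gray > 0.5).astype(gray.dtype)

# Stack the three 2-D arrays along a new last axis: shape (rows, cols, 3).
stacked = np.stack([gray, inverted, thresholded], axis=-1)
print(stacked.shape)
print(stacked[0, 5])  # the three "channel" values for one pixel
```

The result has the same layout as an RGB image, so anything that accepts a three-channel array will take it, even though no channel holds a color.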

Matplotlib’s pyplot uses value maps called colormaps to display an image or data in different ways, and these are worth examining whenever you’re doing any data or image representation. Plotting a single-channel (grayscale) image with pyplot’s default colormap will give you some crazy high-contrast colors, which can be really nice, but be sure to use the cmap argument if you want a tamer or more interpretable representation.

In [20]:
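grump_bw = skimage.color.rgb2gray(grump)
grump_bw = skimage.util.img_as_float32(grump_bw)

plt.figure(figsize = (20, 8))
plt.subplot(1, 3, 1) #three panels side by side
plt.imshow(grump)
plt.xlabel('Original RGB')
plt.subplot(1, 3, 2)
plt.imshow(grump_bw) #no cmap given, so pyplot falls back to its default colormap
plt.xlabel('Grayscale with default colormap')
plt.subplot(1, 3, 3)
plt.imshow(grump_bw, cmap = 'gray')
plt.xlabel('Grayscale in 32-bit')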

So why convert to float32? Weren’t we working in 64-bit floats? Or in int values for the original image? Yes. Yes we were. Some image operations and processes require specific array value types, and some output different types, so knowing what you’re working in is important.

Use numpy’s dtype property!

In [8]:
print('grump.dtype = ', grump.dtype)
print('grump_bw.dtype = ', grump_bw.dtype)
grump.dtype =  uint8
grump_bw.dtype =  float32
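Converting between types also rescales the values, which is easy to miss. Here is a minimal sketch with three hand-picked intensities, using scikit-image’s img_as_float32 and img_as_ubyte helpers.

```python
import numpy as np
from skimage import util

# Three sample 8-bit intensities: black, mid-gray, and white.
pixels = np.array([[0, 128, 255]], dtype=np.uint8)

as_float = util.img_as_float32(pixels)  # rescales 0..255 to 0.0..1.0
back = util.img_as_ubyte(as_float)      # rescales 0.0..1.0 back to 0..255

print(pixels.dtype, pixels)
print(as_float.dtype, as_float)
print(back.dtype, back)
```

A plain astype(np.float32) would keep the numbers at 0, 128 and 255 rather than mapping them into the 0 to 1 range, so it matters which conversion you reach for.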

Congratulations, we have some of the basics of the basics covered, so we can move on to the cool stuff that Scikit-Image is capable of! Let’s expose that tortoise! No… no, that came out wrong. We’re going to maximize exposure… no… no, again. Scikit-Image’s exposure module contains a bunch of useful image processes that will really get us into fighting shape.


In [22]:
rescale = skimage.exposure.rescale_intensity(grump_bw)
#stretches the values in the image, setting the highest values to maximum allowed by datatype, and lowest to minimum

eqhist = skimage.exposure.equalize_hist(grump_bw)
#histogram equalization will generally 'squash' intensity values to bump up the contrast over the whole image

adapthist = skimage.exposure.equalize_adapthist(grump_bw, clip_limit = .5)
#adaptive version of the eqhist - this will use the same 'squash and bump', but across sections of the image

sigmoid = skimage.exposure.adjust_sigmoid(grump_bw)
#sigmoidal contrast adjustment to rescale intensity values

log = skimage.exposure.adjust_log(grump_bw)
#Logarithmic contrast adjustment to rescale intensity values

plt.figure(figsize = (20, 8))
plt.subplot(2, 3, 1) #two rows of three panels
plt.imshow(grump_bw, cmap = 'gray')
plt.title('OG Grump in 32-bit glory')

plt.subplot(2, 3, 2)
plt.imshow(eqhist, cmap = 'gray')
plt.title('Equalize Histogram')

plt.subplot(2, 3, 3)
plt.imshow(adapthist, cmap = 'gray')
plt.title('Equalize Adaptive Histogram')

plt.subplot(2, 3, 4)
plt.imshow(rescale, cmap = 'gray')
plt.title('Rescale Intensity')

plt.subplot(2, 3, 5)
plt.imshow(log, cmap = 'gray')
plt.title('Logarithmic Gamma Adjustment')

plt.subplot(2, 3, 6)
plt.imshow(sigmoid, cmap = 'gray')
plt.title('Sigmoid Gamma Adjustment')

Each of these processes has a place in different applications, but for scientific purposes, rescaling intensities, histogram equalization, and its adaptive version tend to have a leg up in a lot of applications, including many machine learning processes. By stretching the contrast across the image, or across sections of it, we get a much more obvious representation of potential features. I mean, just look at Grump’s beautiful, shimmering shell detail in the adaptive histogram equalization, top right above.
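To get a feel for what intensity rescaling and histogram equalization are doing under the hood, here is a simplified NumPy-only sketch. The helper names rescale_minmax and equalize are hypothetical, and this is not scikit-image’s implementation, just the core idea of each on a made-up low-contrast image.

```python
import numpy as np

def rescale_minmax(img):
    """Linearly stretch intensities so the minimum maps to 0 and the maximum to 1.
    Assumes the image is not perfectly flat (max > min)."""
    lo, hi = img.min(), img.max()
    return (img - lo) / (hi - lo)

def equalize(img, nbins=256):
    """Map each intensity to its cumulative frequency in the image histogram."""
    hist, bin_edges = np.histogram(img.ravel(), bins=nbins)
    bin_centers = (bin_edges[:-1] + bin_edges[1:]) / 2
    cdf = hist.cumsum() / hist.sum()
    return np.interp(img.ravel(), bin_centers, cdf).reshape(img.shape)

# Hypothetical low-contrast image: every value squeezed into 0.4..0.6.
rng = np.random.default_rng(0)
dull = 0.4 + 0.2 * rng.random((64, 64))

stretched = rescale_minmax(dull)
equalized = equalize(dull)
print(dull.min(), dull.max())
print(stretched.min(), stretched.max())
print(equalized.min(), equalized.max())
```

Both versions spread the narrow band of input values over nearly the whole 0 to 1 range. The difference is that the linear stretch keeps the relative spacing of intensities, while equalization reshuffles them so each brightness level is used about equally often.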

Next up we will go over image transformations, matrices and enhancements that will be useful when we begin to augment our dataset!