Debugging an Image Classification Model by Clustering Misclassifications


I was training an image classification model as a technical exercise and wanted to remove “noisy” image samples from my training data. By “noisy” I mean blurry images, for instance, or any other defect that makes a misclassification forgivable. With those samples removed, I can judge classification performance more objectively. I followed the image clustering tutorial in [1], which uses VGG16, a generic image classifier available pre-trained in TensorFlow. The tutorial is about image clustering, not noise removal, so I had to add the noisy samples myself. My hope was that the system would group the added noisy samples into a cluster of their own, and it did!

This is the code I used to add blur, copied from [2]:

def blur_image(image_name):
    # Import the Image and ImageFilter modules from PIL
    from PIL import Image, ImageFilter

    # Open the image file
    im = Image.open(image_name)

    # Blur the image with a box blur of radius 4
    im1 = im.filter(ImageFilter.BoxBlur(4))

    # image_name already includes its extension, so the saved
    # file name ends in ".png.png" (see Appendix 2)
    im1.save("blur_{}.png".format(image_name))

Sample blurred image created, named blur_0168.png
Original image, named 0168.png

I used only a small sample of image files, together with their blurred versions, and set the number of clusters to 10 in K-means. The system placed the blurred images in two clusters (0 and 5, see Appendix 3), separate from the good (non-noisy) image samples.

Generated cluster of blurred images

Finally, thanks for reading. You can find the full implementation as a Google Colab notebook in my GitHub repo [3].

References

[1] https://towardsdatascience.com/how-to-cluster-images-based-on-visual-similarity-cd6e7209fe34

[2] https://www.geeksforgeeks.org/python-pillow-blur-an-image/

[3] https://github.com/Eezzeldin/ImageClustering.git

Appendices

Appendix 1: Summary of the approach used for clustering

The tutorial clustered flower images based on the feature map produced by the model.predict function. The feature map was extracted by removing the last prediction layer from the model. The feature map array was then reduced in dimensionality with scikit-learn PCA and clustered with scikit-learn KMeans.
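As a rough sketch of these steps (not the exact code from the tutorial or from my repo), the pipeline could look like the block below. The function names, the 224x224 input size, the number of PCA components, and the random seed are all assumptions made for illustration; only the overall sequence (VGG16 without its last layer, then PCA, then KMeans with 10 clusters) comes from the description above.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA


def extract_features(image_paths):
    """Return one VGG16 feature vector per image (prediction layer removed)."""
    # Imported here so the clustering step below can run without TensorFlow.
    from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
    from tensorflow.keras.models import Model
    from tensorflow.keras.utils import img_to_array, load_img

    base = VGG16()  # ImageNet weights, downloaded on first use
    # Use the second-to-last layer as output, dropping the final prediction layer.
    model = Model(inputs=base.inputs, outputs=base.layers[-2].output)

    # Assumption: images are resized to the 224x224 input VGG16 expects.
    batch = np.stack([
        img_to_array(load_img(path, target_size=(224, 224)))
        for path in image_paths
    ])
    return model.predict(preprocess_input(batch))


def cluster_features(features, n_clusters=10, n_components=10, seed=0):
    """Reduce the features with PCA, then assign each row a KMeans cluster id."""
    # PCA cannot keep more components than there are samples or features.
    n_components = min(n_components, *features.shape)
    reduced = PCA(n_components=n_components, random_state=seed).fit_transform(features)
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    return kmeans.fit_predict(reduced)
```

With a list of file names such as the one in Appendix 2, `cluster_features(extract_features(file_names))` would return one cluster id per image, which can then be grouped by id as in Appendix 3. The cluster ids themselves are arbitrary and may differ from run to run.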

Appendix 2: Image file samples used

['0156.png', '0166.png', '0168.png', '0169.png', '0170.png', '0167.png', '0165.png', '0164.png', '0155.png', '0160.png', '0163.png', '0162.png', '0157.png', '0161.png', '0158.png', '0159.png', 'blur.png', 'blur_0157.png.png', 'blur_0156.png.png', 'blur_0166.png.png', 'blur_0168.png.png', 'blur_0169.png.png', 'blur_0170.png.png', 'blur_0167.png.png', 'blur_0165.png.png', 'blur_0164.png.png', 'blur_0155.png.png', 'blur_0160.png.png', 'blur_0163.png.png', 'blur_0162.png.png', 'blur_0161.png.png', 'blur_0158.png.png', 'blur_0159.png.png']

Appendix 3: Clusters Generated

Cluster Groups (0–9)

{6: ['0156.png'], 4: ['0166.png'], 1: ['0168.png', '0169.png', '0155.png', '0162.png'], 8: ['0170.png'], 2: ['0167.png', '0164.png', '0160.png', '0161.png'], 7: ['0165.png'], 3: ['0163.png', '0157.png', '0159.png'], 9: ['0158.png'],

5: ['blur.png', 'blur_0168.png.png', 'blur_0169.png.png', 'blur_0155.png.png', 'blur_0162.png.png'],

0: ['blur_0157.png.png', 'blur_0156.png.png', 'blur_0166.png.png', 'blur_0170.png.png', 'blur_0167.png.png', 'blur_0165.png.png', 'blur_0164.png.png', 'blur_0160.png.png', 'blur_0163.png.png', 'blur_0161.png.png', 'blur_0158.png.png', 'blur_0159.png.png']}
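The cluster groups above can be checked programmatically. The sketch below finds the clusters that contain only blurred files and then drops them to get a cleaned file list. The helper names are hypothetical, and it assumes a blurred copy can be recognised by its "blur" file name prefix, which only holds here because I added the noise myself.

```python
# Cluster groups copied from Appendix 3 (cluster id -> file names).
groups = {
    6: ["0156.png"],
    4: ["0166.png"],
    1: ["0168.png", "0169.png", "0155.png", "0162.png"],
    8: ["0170.png"],
    2: ["0167.png", "0164.png", "0160.png", "0161.png"],
    7: ["0165.png"],
    3: ["0163.png", "0157.png", "0159.png"],
    9: ["0158.png"],
    5: ["blur.png", "blur_0168.png.png", "blur_0169.png.png",
        "blur_0155.png.png", "blur_0162.png.png"],
    0: ["blur_0157.png.png", "blur_0156.png.png", "blur_0166.png.png",
        "blur_0170.png.png", "blur_0167.png.png", "blur_0165.png.png",
        "blur_0164.png.png", "blur_0160.png.png", "blur_0163.png.png",
        "blur_0161.png.png", "blur_0158.png.png", "blur_0159.png.png"],
}


def is_blurred(file_name):
    # Assumption: blurred copies are identified by their "blur" prefix.
    return file_name.startswith("blur")


def blurred_only_clusters(groups):
    """Return the ids of clusters whose files are all blurred copies."""
    return sorted(
        cluster_id
        for cluster_id, files in groups.items()
        if files and all(is_blurred(name) for name in files)
    )


def drop_clusters(groups, noisy_ids):
    """Return every file that is not in one of the noisy clusters."""
    return sorted(
        name
        for cluster_id, files in groups.items()
        if cluster_id not in noisy_ids
        for name in files
    )


noisy = blurred_only_clusters(groups)
clean_files = drop_clusters(groups, noisy)
print(noisy)             # [0, 5]
print(len(clean_files))  # 16
```

On real training data there is no file name prefix to rely on, so the noisy clusters would have to be picked by looking at the images in each cluster.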


Emad Ezzeldin ,Sr. DataScientist@UnitedHealthGroup

5 years Data Scientist and a MSc from George Mason University in Data Analytics. I enjoy experimenting with Data Science tools. emad.ezzeldin4@gmail.com
