Debugging an Image Classification Model by Clustering Misclassifications
I was training an image classification model as a technical exercise and wanted to remove "noisy" samples from my training data, "noisy" meaning blurry or affected by any other issue that would excuse a misclassification. Removing them lets me judge classification performance more objectively. I followed the tutorial in [1] for image clustering, which uses VGG16, a generic pre-trained TensorFlow image classifier. Since the tutorial covers image clustering rather than noise removal, I had to add noisy data samples myself. My hope was that the system would group the added noisy data into its own cluster, and it did!
This is the code I used to add blur, adapted from [2]:
def blur_image(image_name):
    # Import the Image and ImageFilter classes from the PIL (Pillow) module
    from PIL import Image, ImageFilter
    # Open the image in RGB mode
    im = Image.open(r"{}".format(image_name))
    # Blur the image with a box blur of radius 4
    im1 = im.filter(ImageFilter.BoxBlur(4))
    # image_name already ends in ".png", so the saved copies end in ".png.png" (see Appendix 2)
    im1.save("blur_{}.png".format(image_name))
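For completeness, here is a minimal sketch of how the function can be run over the original sample files; the glob pattern is an assumption of mine, and the actual file names are listed in Appendix 2.

# Sketch: create a blurred copy of every original .png sample in the working directory.
# Assumes the original (non-blurred) images from Appendix 2 sit in the current folder.
import glob

for name in sorted(glob.glob("0*.png")):
    blur_image(name)  # writes e.g. "blur_0155.png.png" next to the original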
Only a small sample of image files was used, together with a blurred copy of each (Appendix 2). I set the number of clusters in KMeans to 10. The blurred images almost all ended up in two clusters (0 and 5, see Appendix 3), separate from the good (non-noisy) image samples.
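The cluster groups listed in Appendix 3 can be reconstructed by collecting filenames under their KMeans label. The snippet below is one way to do that, assuming a fitted scikit-learn KMeans object named kmeans and a filenames list in the same row order as the feature matrix (both come from the pipeline sketched in Appendix 1).

# Sketch: collect filenames under their cluster label (shape of the dict in Appendix 3).
# Assumes `kmeans` is a fitted sklearn KMeans and `filenames` matches the feature rows.
groups = {}
for name, label in zip(filenames, kmeans.labels_):
    groups.setdefault(int(label), []).append(name)
# Clusters dominated by "blur_" files (0 and 5 in my run) are the noisy-sample candidates.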
Finally, thanks for reading. You can find the full implementation code as a Google Colab notebook in my GitHub repo [3].
References
[1] https://towardsdatascience.com/how-to-cluster-images-based-on-visual-similarity-cd6e7209fe34
[2] https://www.geeksforgeeks.org/python-pillow-blur-an-image/
[3] https://github.com/Eezzeldin/ImageClustering.git
Appendices
Appendix 1: Summary of the approach used for clustering
The tutorial clustered flower images based on the feature map produced by the model.predict function. The feature map was extracted by removing the last prediction layer from the model output. The feature map array then went through dimensionality reduction with scikit-learn PCA and was clustered with scikit-learn KMeans.
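A condensed sketch of that pipeline is below. It follows the tutorial's structure, but the layer choice ("fc2"), the PCA size, the random seeds, and the variable names are my own assumptions, not a copy of the tutorial or of the notebook in [3].

import glob
import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
from tensorflow.keras.preprocessing.image import load_img, img_to_array
from tensorflow.keras.models import Model
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

# VGG16 with the final prediction layer removed: the "fc2" layer output is used
# as a 4096-dimensional feature map for each image.
base = VGG16()
feature_model = Model(inputs=base.inputs, outputs=base.get_layer("fc2").output)

def extract_features(path):
    img = load_img(path, target_size=(224, 224))  # VGG16 expects 224x224 RGB input
    x = preprocess_input(img_to_array(img).reshape(1, 224, 224, 3))
    return feature_model.predict(x)[0]

filenames = sorted(glob.glob("*.png"))  # originals plus their "blur_" copies
features = np.stack([extract_features(f) for f in filenames])

# Dimensionality reduction; the component count must not exceed the number of images
pca = PCA(n_components=min(len(filenames), 100), random_state=22)
reduced = pca.fit_transform(features)

# Cluster the reduced feature maps into the 10 groups reported in Appendix 3
kmeans = KMeans(n_clusters=10, random_state=22).fit(reduced)

With only a few dozen images the PCA step does very little, but I kept it to mirror the tutorial's pipeline.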
Appendix 2: Image file samples used
['0156.png', '0166.png', '0168.png', '0169.png', '0170.png', '0167.png', '0165.png', '0164.png', '0155.png', '0160.png', '0163.png', '0162.png', '0157.png', '0161.png', '0158.png', '0159.png', 'blur.png', 'blur_0157.png.png', 'blur_0156.png.png', 'blur_0166.png.png', 'blur_0168.png.png', 'blur_0169.png.png', 'blur_0170.png.png', 'blur_0167.png.png', 'blur_0165.png.png', 'blur_0164.png.png', 'blur_0155.png.png', 'blur_0160.png.png', 'blur_0163.png.png', 'blur_0162.png.png', 'blur_0161.png.png', 'blur_0158.png.png', 'blur_0159.png.png']
Appendix 3: Clusters Generated
Cluster groups (0–9)
{6: ['0156.png'], 4: ['0166.png'], 1: ['0168.png', '0169.png', '0155.png', '0162.png'], 8: ['0170.png'], 2: ['0167.png', '0164.png', '0160.png', '0161.png'], 7: ['0165.png'], 3: ['0163.png', '0157.png', '0159.png'], 9: ['0158.png'],
5: ['blur.png', 'blur_0168.png.png', 'blur_0169.png.png', 'blur_0155.png.png', 'blur_0162.png.png'],
0: ['blur_0157.png.png', 'blur_0156.png.png', 'blur_0166.png.png', 'blur_0170.png.png', 'blur_0167.png.png', 'blur_0165.png.png', 'blur_0164.png.png', 'blur_0160.png.png', 'blur_0163.png.png', 'blur_0161.png.png', 'blur_0158.png.png', 'blur_0159.png.png']}