Debugging an Image Classification Model by Clustering Misclassifications
I was training an image classification model as a technical exercise and wanted to remove "noisy" samples from my training data, "noisy" meaning blurry or affected by any other issue that would excuse a misclassification. Removing them lets me judge classification performance more objectively. I followed the tutorial in [1] for image clustering, which uses VGG16, a generic pre-trained TensorFlow image classifier. Since the tutorial covers image clustering rather than noise removal, I had to add noisy data samples myself. My hope was that the system would group the added noisy data into its own cluster, and it did!
This is the code I used to add blur, adapted from [2]:
def blur_image(image_name):
    # Import the Image and ImageFilter classes from the PIL (Pillow) module
    from PIL import Image, ImageFilter
    # Open the image in RGB mode
    im = Image.open(r"{}".format(image_name))
    # Blur the image with a box blur of radius 4
    im1 = im.filter(ImageFilter.BoxBlur(4))
    # image_name already ends in ".png", so the saved copies end in ".png.png" (see Appendix 2)
    im1.save("blur_{}.png".format(image_name))
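For completeness, here is a minimal sketch of how the function can be run over the original sample files; the glob pattern is an assumption of mine, and the actual file names are listed in Appendix 2.

# Sketch: create a blurred copy of every original .png sample in the working directory.
# Assumes the original (non-blurred) images from Appendix 2 sit in the current folder.
import glob

for name in sorted(glob.glob("0*.png")):
    blur_image(name)  # writes e.g. "blur_0155.png.png" next to the original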
Only a small sample of image files was used, together with a blurred copy of each (Appendix 2). I set the number of clusters in KMeans to 10. The blurred images almost all ended up in two clusters (0 and 5, see Appendix 3), separate from the good (non-noisy) image samples.
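The cluster groups listed in Appendix 3 can be reconstructed by collecting filenames under their KMeans label. The snippet below is one way to do that, assuming a fitted scikit-learn KMeans object named kmeans and a filenames list in the same row order as the feature matrix (both come from the pipeline sketched in Appendix 1).

# Sketch: collect filenames under their cluster label (shape of the dict in Appendix 3).
# Assumes `kmeans` is a fitted sklearn KMeans and `filenames` matches the feature rows.
groups = {}
for name, label in zip(filenames, kmeans.labels_):
    groups.setdefault(int(label), []).append(name)
# Clusters dominated by "blur_" files (0 and 5 in my run) are the noisy-sample candidates.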
Finally, thanks for reading. You can find the full implementation code as a Google Colab notebook in my GitHub repo [3].
References
[1] https://towardsdatascience.com/how-to-cluster-images-based-on-visual-similarity-cd6e7209fe34
[2] https://www.geeksforgeeks.org/python-pillow-blur-an-image/
[3] https://github.com/Eezzeldin/ImageClustering.git
Appendices
Appendix 1: Summary of the approach used for clustering
The tutorial clustered flower images based on the feature map produced by the model.predict function. The feature map was extracted by removing the last prediction layer from the model output. The feature map array then went through dimensionality reduction with scikit-learn PCA and was clustered with scikit-learn KMeans.
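A condensed sketch of that pipeline is below. It follows the tutorial's structure, but the layer choice ("fc2"), the PCA size, the random seeds, and the variable names are my own assumptions, not a copy of the tutorial or of the notebook in [3].

import glob
import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
from tensorflow.keras.preprocessing.image import load_img, img_to_array
from tensorflow.keras.models import Model
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

# VGG16 with the final prediction layer removed: the "fc2" layer output is used
# as a 4096-dimensional feature map for each image.
base = VGG16()
feature_model = Model(inputs=base.inputs, outputs=base.get_layer("fc2").output)

def extract_features(path):
    img = load_img(path, target_size=(224, 224))  # VGG16 expects 224x224 RGB input
    x = preprocess_input(img_to_array(img).reshape(1, 224, 224, 3))
    return feature_model.predict(x)[0]

filenames = sorted(glob.glob("*.png"))  # originals plus their "blur_" copies
features = np.stack([extract_features(f) for f in filenames])

# Dimensionality reduction; the component count must not exceed the number of images
pca = PCA(n_components=min(len(filenames), 100), random_state=22)
reduced = pca.fit_transform(features)

# Cluster the reduced feature maps into the 10 groups reported in Appendix 3
kmeans = KMeans(n_clusters=10, random_state=22).fit(reduced)

With only a few dozen images the PCA step does very little, but I kept it to mirror the tutorial's pipeline.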
Appendix 2: Image file samples used
['0156.png', '0166.png', '0168.png', '0169.png', '0170.png', '0167.png', '0165.png', '0164.png', '0155.png', '0160.png', '0163.png', '0162.png', '0157.png', '0161.png', '0158.png', '0159.png', 'blur.png', 'blur_0157.png.png', 'blur_0156.png.png', 'blur_0166.png.png', 'blur_0168.png.png', 'blur_0169.png.png', 'blur_0170.png.png', 'blur_0167.png.png', 'blur_0165.png.png', 'blur_0164.png.png', 'blur_0155.png.png', 'blur_0160.png.png', 'blur_0163.png.png', 'blur_0162.png.png', 'blur_0161.png.png', 'blur_0158.png.png', 'blur_0159.png.png']
Appendix 3: Clusters Generated
Cluster groups (0–9)
{6: ['0156.png'], 4: ['0166.png'], 1: ['0168.png', '0169.png', '0155.png', '0162.png'], 8: ['0170.png'], 2: ['0167.png', '0164.png', '0160.png', '0161.png'], 7: ['0165.png'], 3: ['0163.png', '0157.png', '0159.png'], 9: ['0158.png'],
5: ['blur.png', 'blur_0168.png.png', 'blur_0169.png.png', 'blur_0155.png.png', 'blur_0162.png.png'],
0: ['blur_0157.png.png', 'blur_0156.png.png', 'blur_0166.png.png', 'blur_0170.png.png', 'blur_0167.png.png', 'blur_0165.png.png', 'blur_0164.png.png', 'blur_0160.png.png', 'blur_0163.png.png', 'blur_0161.png.png', 'blur_0158.png.png', 'blur_0159.png.png']}