Debugging Computer Vision Image Classification: Why is your model failing in production?!

Emad Ezzeldin, Sr. Data Scientist @ UnitedHealthGroup

Feb 24, 2023

Computer vision models can perform well on the training, validation, and test sets yet fail in production. This article discusses some common patterns that cause a model to fail in production.

Error Pattern 1: Variant Classifiers (looking in the wrong place)

Ph.D. candidate Mr. Saur in [1] best explained this error pattern, in which a model bases its prediction on spurious features, such as the environment, instead of on the object of interest. For instance, consider the example below, in which a pre-trained VGG16 model makes predictions on images and LIME explains those predictions. The images show a cougar and some flying insects. While VGG16 predicted the cougar accurately, it failed on the flying insect. The heat maps show that, for the cougar, VGG16 focused on the invariant features, namely the body of the animal. For the insect, however, it did not focus on the body; instead, its heat map is scattered across the whole image. VGG16 based that prediction on spurious features such as the background and environment of the object of interest. Therefore, for the flying insect species, VGG16 is a Variant Classifier.
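To make "looking in the wrong place" measurable, one option is to check how much of an explanation's positive attribution falls on the object itself. The sketch below assumes the explanation is available as (segment_id, weight) pairs, one per superpixel (the shape LIME's `ImageExplanation.local_exp[label]` uses), and that you know which superpixel ids overlap the object, for example from a hand-drawn mask. The helper `object_focus_ratio` and all weights are hypothetical and purely illustrative; they are not read from the article's heat maps.

```python
from typing import Iterable, List, Tuple


def object_focus_ratio(
    segment_weights: List[Tuple[int, float]],
    object_segments: Iterable[int],
) -> float:
    """Share of the positive attribution mass that lands on the object of interest.

    segment_weights: (superpixel id, weight) pairs from a LIME-style explanation
        for the predicted label (assumed format).
    object_segments: ids of the superpixels that overlap the object (assumed known).
    """
    object_ids = set(object_segments)
    positive = [(seg, w) for seg, w in segment_weights if w > 0]
    total = sum(w for _, w in positive)
    if total == 0:
        return 0.0  # no superpixel supports the prediction at all
    on_object = sum(w for seg, w in positive if seg in object_ids)
    return on_object / total


# Hypothetical weights: the "cougar" evidence sits on the body (segments 0 and 1),
# while the "insect" evidence is scattered over background segments 1 to 4.
cougar = [(0, 0.40), (1, 0.35), (2, 0.05), (3, -0.10)]
insect = [(0, 0.05), (1, 0.10), (2, 0.20), (3, 0.25), (4, 0.20)]

cougar_ratio = object_focus_ratio(cougar, {0, 1})
insect_ratio = object_focus_ratio(insect, {0})
print(f"cougar: {cougar_ratio:.2f}, insect: {insect_ratio:.2f}")
```

A ratio close to 1 means the supporting evidence sits on the object; a low ratio, as in the made-up insect numbers, is the variant-classifier symptom. Where to draw the line is a judgment call for your own data.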

Error Pattern 2: Noisy Observations in Production (hiding the lead)

I extracted the images below from the iNaturalist dataset using an image extractor that I built [2]. All of these images are considered noisy because their main feature patterns are not obvious. Under data challenges, [3] identifies the main sources of noise as illumination, motion blur, occlusion, a small region of interest, and perspective.
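One way to test a model against these noise patterns before it reaches production is to corrupt clean validation images synthetically and compare the predictions. The pure-Python sketch below approximates four of the five patterns on a grayscale image stored as a list of rows; perspective is left out because it needs a proper geometric warp. The function names, the brightness factor, blur radius, patch size, and scale are hypothetical choices for illustration, not taken from [3] or from any library.

```python
from typing import Dict, List

Image = List[List[float]]  # grayscale image as rows of pixel values in [0, 255]


def change_illumination(img: Image, factor: float) -> Image:
    """Illumination: scale brightness (factor < 1 darkens), clipped to [0, 255]."""
    return [[min(255.0, max(0.0, p * factor)) for p in row] for row in img]


def motion_blur(img: Image, radius: int) -> Image:
    """Horizontal motion blur: average each pixel with up to `radius` row neighbours per side."""
    out = []
    for row in img:
        new_row = []
        for x in range(len(row)):
            window = row[max(0, x - radius): x + radius + 1]
            new_row.append(sum(window) / len(window))
        out.append(new_row)
    return out


def occlude(img: Image, top: int, left: int, size: int, value: float = 0.0) -> Image:
    """Occlusion: paint a size x size square over part of the image."""
    out = [row[:] for row in img]
    for y in range(top, min(top + size, len(img))):
        for x in range(left, min(left + size, len(img[0]))):
            out[y][x] = value
    return out


def shrink_roi(img: Image, scale: int, background: float = 0.0) -> Image:
    """Small region of interest: downscale by an integer factor (nearest neighbour)
    and paste the result into the top-left corner of an otherwise blank canvas."""
    h, w = len(img), len(img[0])
    out = [[background] * w for _ in range(h)]
    for y in range(h // scale):
        for x in range(w // scale):
            out[y][x] = img[y * scale][x * scale]
    return out


# A synthetic 8x8 gradient stands in for a real validation image.
img = [[float(10 * (x + y)) for x in range(8)] for y in range(8)]
variants: Dict[str, Image] = {
    "dark": change_illumination(img, 0.3),
    "blurred": motion_blur(img, 2),
    "occluded": occlude(img, top=2, left=2, size=4),
    "small_roi": shrink_roi(img, scale=2),
}
```

In practice the same idea would be applied with an image library such as Pillow or OpenCV, and each variant fed to the classifier; a prediction that flips between the clean and corrupted versions points to a noise pattern the model has not learned to handle.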

Error Pattern 3: Pareidolia (seeing something that isn’t there!)

VGG16 predicted the flying insect to be a harvestman, a spider-like arachnid.

Insect image path in iNaturalist: 2021_valid/02409_Animalia_Arthropoda_Insecta_Odonata_Corduliidae_Cordulia_shurtleffii/a468a7a7-1e95-4c02-a867-97c4593a8cf7.jpg

VGG16 top-5 predictions: [('n01770081', 'harvestman', 0.06967397), ('n01537544', 'indigo_bunting', 0.06922152), ('n01773157', 'black_and_gold_garden_spider', 0.05304367), ('n01773797', 'garden_spider', 0.046315435), ('n02231487', 'walking_stick', 0.04343839)]

VGG16 appears to see spider patterns that are not there, much like the human cognitive phenomenon of pareidolia. In fact, pareidolia is known to occur in computer vision models, as covered on Wikipedia [4].
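The top-5 list above already hints that something is off: the winning label scores only about 0.07 and beats the runner-up by less than 0.001. A cheap production guard is to flag predictions whose top-1 probability or top-1/top-2 margin is too small and route them for review instead of trusting them. The sketch assumes predictions in the (WordNet ID, label, probability) tuple format that Keras's `decode_predictions` returns; `flag_low_confidence` and its default thresholds are hypothetical placeholders to be tuned on validation data.

```python
from typing import List, Tuple

Prediction = Tuple[str, str, float]  # (WordNet ID, label, probability)


def flag_low_confidence(
    top_k: List[Prediction],
    min_top1: float = 0.5,
    min_margin: float = 0.1,
) -> bool:
    """Return True when the top-1 prediction is weak or barely beats the runner-up.

    Both thresholds are arbitrary placeholders; tune them on validation data.
    """
    probs = sorted((p for _, _, p in top_k), reverse=True)
    top1 = probs[0]
    margin = top1 - probs[1] if len(probs) > 1 else top1
    return top1 < min_top1 or margin < min_margin


# The insect predictions reported above.
insect_top5 = [
    ("n01770081", "harvestman", 0.06967397),
    ("n01537544", "indigo_bunting", 0.06922152),
    ("n01773157", "black_and_gold_garden_spider", 0.05304367),
    ("n01773797", "garden_spider", 0.046315435),
    ("n02231487", "walking_stick", 0.04343839),
]
# The first two dog predictions from the overfitting example later in the article.
dog_top2 = [
    ("n02102318", "cocker_spaniel", 0.99758947),
    ("n02102480", "Sussex_spaniel", 0.0022108615),
]

print(flag_low_confidence(insect_top5))  # True: top-1 is ~0.07, margin ~0.0005
print(flag_low_confidence(dog_top2))     # False
```

Note that this guard only catches uncertain predictions: the confidently correct dog passes, even though its explanation may still be suspicious.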

Error Pattern 4: Wrong Invariant Features (anomalous images!)

Unlike pareidolia, here the object of interest really does contain patterns from two or more different animals or objects. A classifier trained only on cats and dogs would predict this image to be a cat unless it were also trained on a third, “unknown” class.
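A softmax output always sums to 1 over the known classes, so a cat/dog model must pick one of the two even for an image that is neither. Training an explicit "unknown" class, as suggested above, is one remedy; a simpler, commonly used alternative (swapped in here, not the author's method) is to threshold the maximum softmax probability and answer "unknown" when it is low. The logits, labels, `predict_or_unknown` helper, and the 0.8 threshold below are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_or_unknown(
    logits: Sequence[float], labels: Sequence[str], threshold: float = 0.8
) -> str:
    """Return the arg-max label, or "unknown" if its probability is below `threshold`."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=lambda i: probs[i])
    return labels[best] if probs[best] >= threshold else "unknown"


classes = ["cat", "dog"]
print(predict_or_unknown([4.0, 0.0], classes))  # cat (p ≈ 0.98)
print(predict_or_unknown([2.0, 1.8], classes))  # unknown (p ≈ 0.55)
```

Without the threshold, the ambiguous input would simply be labeled "cat". The heuristic cannot catch an anomalous image that the model scores confidently, which is why an explicit unknown class or held-out anomalous examples remain worth having.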

Error Pattern 5: Very small differences

Reference [5] describes the differences between the African buffalo and the water buffalo. The model predicted this African buffalo (Syncerus caffer) to be a water buffalo, but the visual differences between the two are actually very small!

iNaturalist path: 2021_valid/04639_Animalia_Chordata_Mammalia_Artiodactyla_Bovidae_Syncerus_caffer/d80a238c-b0d0-4acc-bae0-363ca1e15611.jpg

Top-5 predictions: [('n02408429', 'water_buffalo', 0.42585105), ('n02397096', 'warthog', 0.14604077), ('n02396427', 'wild_boar', 0.109195), ('n02412080', 'ram', 0.056828406), ('n02403003', 'ox', 0.038692415)]
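Visually similar pairs such as African and water buffalo are where a classifier's mistakes are likely to cluster. A quick way to find such pairs in your own model is to rank class pairs by how often they are confused in a validation confusion matrix; the top pairs are candidates for more targeted training images or a dedicated fine-grained model. The labels and counts below are hypothetical, and `most_confused_pairs` is an illustrative helper, not a library function.

```python
from itertools import combinations
from typing import List, Tuple


def most_confused_pairs(
    labels: List[str], confusion: List[List[int]], top_n: int = 3
) -> List[Tuple[str, str, int]]:
    """Rank class pairs by how often they are mistaken for each other.

    confusion[i][j] counts images whose true class is labels[i] and whose
    predicted class is labels[j]; both directions are summed for each pair.
    """
    pairs = []
    for i, j in combinations(range(len(labels)), 2):
        count = confusion[i][j] + confusion[j][i]
        if count:
            pairs.append((labels[i], labels[j], count))
    pairs.sort(key=lambda t: t[2], reverse=True)
    return pairs[:top_n]


# Hypothetical validation counts, for illustration only (rows = true, columns = predicted).
labels = ["african_buffalo", "water_buffalo", "ox"]
confusion = [
    [40, 9, 1],
    [7, 42, 1],
    [2, 0, 48],
]
print(most_confused_pairs(labels, confusion))
# [('african_buffalo', 'water_buffalo', 16), ('african_buffalo', 'ox', 3), ('water_buffalo', 'ox', 1)]
```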

Error Pattern 6: Overfitting (looking only at one area!)

The model is almost 100% confident about this dog's breed, and it is right. However, the superpixels show that it relies almost entirely on the dog’s nose, which suggests it is overfitted!

Top-5 predictions: [('n02102318', 'cocker_spaniel', 0.99758947), ('n02102480', 'Sussex_spaniel', 0.0022108615), ('n02101556', 'clumber', 0.00015731655), ('n02100877', 'Irish_setter', 1.9174751e-05), ('n02099601', 'golden_retriever', 9.502533e-06)]
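One rough way to quantify "looking only at one area" is to measure how concentrated the positive attribution is: the share held by the strongest superpixel, and how many superpixels it takes to cover most of the evidence. The sketch assumes (segment_id, weight) pairs as in a LIME-style explanation; the helper names, the 0.8 coverage fraction, and the example weights are hypothetical, chosen only to contrast a nose-dominated explanation with a spread-out one.

```python
from typing import List, Tuple

SegmentWeights = List[Tuple[int, float]]  # (superpixel id, attribution weight)


def top_segment_share(segment_weights: SegmentWeights) -> float:
    """Fraction of the total positive attribution held by the single strongest superpixel."""
    positive = [w for _, w in segment_weights if w > 0]
    if not positive:
        return 0.0
    return max(positive) / sum(positive)


def segments_to_cover(segment_weights: SegmentWeights, fraction: float = 0.8) -> int:
    """Number of superpixels, strongest first, needed to reach `fraction` of the positive attribution."""
    positive = sorted((w for _, w in segment_weights if w > 0), reverse=True)
    target = fraction * sum(positive)
    running = 0.0
    for count, w in enumerate(positive, start=1):
        running += w
        if running >= target:
            return count
    return len(positive)


# Hypothetical weights: one dominant "nose" superpixel vs. evidence spread over the body.
nose_dominated = [(0, 0.90), (1, 0.05), (2, 0.03), (3, -0.20)]
spread_out = [(0, 0.25), (1, 0.25), (2, 0.25), (3, 0.25)]

print(segments_to_cover(nose_dominated), segments_to_cover(spread_out))  # 1 4
```

What counts as too concentrated depends on the object and on how finely the image is segmented, so any threshold on these numbers should be calibrated rather than taken as fixed.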

Error Pattern 7: Underfitting (not looking at all the important areas!)

Conclusion

I believe variant classifiers, noisy observations, pareidolia, and wrong invariant features are the main error patterns that can cause a model to fail in production despite performing well on the train/validation/test sets.