How to build your own OOD and Domain Adaptation test set?

Emad Ezzeldin, Sr. Data Scientist @ UnitedHealthGroup

7 min read · Mar 13, 2023

There is no OOD (Out-Of-Distribution) benchmark for wildlife datasets such as iNaturalist, iWildCam, and the like. As a result, image classifiers can take shortcuts by learning environments instead of animals [1]. CIFAR-10 vs. CIFAR-100 is a common benchmark pairing for OOD performance evaluation. This article presents a re-design of iNaturalist analogous to CIFAR-10 vs. CIFAR-100. All data and code are available on GitHub [12].

1- Data

2- Experiment Challenges

2.1- Domain Challenges

2.2- Semantic Shifts (Bad Annotations)

2.3- Out-Of-Distribution (OOD)

3- Data Pre-Processing

3.1- Accounting for Bad Annotations

3.2- Accounting for Domain Challenges

3.3- Accounting for OOD Challenges

4- Image Classification Experiment Setup

4.1- Train Setup

4.1.1- Target Variables

4.1.2- Class 1 Observations

4.1.3- Class 2 Observations

4.1.4- Evaluation Metric

4.2- OOD Only Test Set

4.3- Domain Adaptation Test Set

4.4- OOD and Domain Adaptation Test Set

5- Conclusion

5.1- Data and Target Class

5.2- Clustering and Domain Discovery

5.3- OOD

5.4- OOD and Domain Adaptation

1- Data

The iNaturalist 2021 validation set (iNaturalist_validation_2021) was downloaded via PyTorch (torchvision) [2] and its files were scanned. Only animal images were required for this experiment, so although the code traverses every image, only the .jpg files under “Animalia” folders were kept. That came to a little over fifty thousand animal pictures.

import os

image_files = []
img_path = "/gdrive/My Drive/Colab Notebooks/ComputerVision/ImageClustering/animals/2021_valid/"

def traverseDir(folderPath):
    # Walk the tree bottom-up: files inside sub-folders are visited before the top-level folder names.
    for subFolderRoot, foldersWithinSubFolder, files in os.walk(folderPath, topdown=False):
        for fileName in files:
            filePath = os.path.join(subFolderRoot, fileName)
            # Keep only files whose path mentions the Animalia kingdom.
            if "Animalia" in filePath:
                image_files.append(filePath)
        # Print each folder; its name encodes the full taxonomy of the images inside it.
        for folderNameWithinSubFolder in foldersWithinSubFolder:
            print(os.path.join(subFolderRoot, folderNameWithinSubFolder))

traverseDir(img_path)
/gdrive/My Drive/Colab Notebooks/ComputerVision/ImageClustering/animals/2021_valid/00602_Animalia_Arthropoda_Insecta_Hemiptera_Miridae_Poecilocapsus_lineatus
/gdrive/My Drive/Colab Notebooks/ComputerVision/ImageClustering/animals/2021_valid/00971_Animalia_Arthropoda_Insecta_Lepidoptera_Erebidae_Apantesis_phalerata
/gdrive/My Drive/Colab Notebooks/ComputerVision/ImageClustering/animals/2021_valid/08550_Plantae_Tracheophyta_Magnoliopsida_Lamiales_Oleaceae_Fraxinus_pennsylvanica
/gdrive/My Drive/Colab Notebooks/ComputerVision/ImageClustering/animals/2021_valid/08226_Plantae_Tracheophyta_Magnoliopsida_Garryales_Garryaceae_Garrya_elliptica
/gdrive/My Drive/Colab Notebooks/ComputerVision/ImageClustering/animals/2021_valid/05177_Animalia_Cnidaria_Anthozoa_Actiniaria_Actiniidae_Anthopleura_sola
/gdrive/My Drive/Colab Notebooks/ComputerVision/ImageClustering/animals/2021_valid/05751_Plantae_Bryophyta_Bryopsida_Hypnales_Lembophyllaceae_Isothecium_stoloniferum
/gdrive/My Drive/Colab Notebooks/ComputerVision/ImageClustering/animals/2021_valid/05616_Fungi_Basidiomycota_Agaricomycetes_Boletales_Suillaceae_Suillus_granulatus
/gdrive/My Drive/Colab Notebooks/ComputerVision/ImageClustering/animals/2021_valid/00648_Animalia_Arthropoda_Insecta_Hemiptera_Reduviidae_Apiomerus_spissipes
/gdrive/My Drive/Colab Notebooks/ComputerVision/ImageClustering/animals/2021_valid/09206_Plantae_Tracheophyta_Magnoliopsida_Ranunculales_Ranunculaceae_Anemone_cylindrica
/gdrive/My Drive/Colab Notebooks/ComputerVision/ImageClustering/animals/2021_valid/03126_Animalia_Chordata_Aves_Accipitriformes_Accipitridae_Buteo_buteo
.
.
.
/gdrive/My Drive/Colab Notebooks/ComputerVision/ImageClustering/animals/2021_valid/09243_Plantae_Tracheophyta_Magnoliopsida_Ranunculales_Ranunculaceae_Delphinium_nuttallianum

import os
import pandas as pd

# One row per .jpg image; the parent folder name encodes the taxonomy.
animals_df = pd.DataFrame({"file": [f for f in image_files if f.endswith(".jpg")]})
# Folder names look like 00602_Animalia_Arthropoda_Insecta_...; field 2 is the phylum
# (e.g. Arthropoda, Chordata), stored here under the column name "species".
animals_df["species"] = animals_df["file"].apply(lambda x: os.path.basename(os.path.dirname(x)).split("_")[2])
animals_df.groupby("species").agg("count")
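
Each folder name in the listing above packs the whole taxonomy into one underscore-separated string. As a minimal sketch, assuming every folder follows the eight-field pattern visible in that output (numeric id, kingdom, phylum, class, order, family, genus, species epithet), a small helper can unpack it. The helper name `parse_taxonomy`, the field names and the file name `img.jpg` are illustrative choices, not part of the original notebook.

```python
import os
from typing import Dict

# Field order assumed from the folder names printed above.
TAXONOMY_FIELDS = ["id", "kingdom", "phylum", "class", "order", "family", "genus", "epithet"]

def parse_taxonomy(image_path: str) -> Dict[str, str]:
    """Split the parent folder name of an image into named taxonomy fields."""
    folder = os.path.basename(os.path.dirname(image_path))
    # maxsplit keeps any extra underscores inside the last field
    parts = folder.split("_", len(TAXONOMY_FIELDS) - 1)
    if len(parts) != len(TAXONOMY_FIELDS):
        raise ValueError(f"unexpected folder name: {folder!r}")
    return dict(zip(TAXONOMY_FIELDS, parts))

example = parse_taxonomy(
    "2021_valid/03126_Animalia_Chordata_Aves_Accipitriformes_Accipitridae_Buteo_buteo/img.jpg"
)
print(example["phylum"], example["class"], example["order"])
```

Naming the fields makes it harder to confuse the phylum (Chordata), the class (Aves) and the order (Accipitriformes) when filtering later on.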

Only Chordata was used in this experiment for OOD performance evaluation.

In addition, a clustered version of the Chordata data is used in this experiment for Domain Adaptation performance evaluation. It can be downloaded from my personal Kaggle profile [8] under the name clustered_iNaturalist_Valid2021.csv.

2- Experiment Challenges

2.1- Domain Challenges

Aves are essentially the only flying animals among the Chordata images, so the model can develop a spurious correlation between Aves and the sky.

The sky is a normal environment for Aves to be seen in!

Aves species could also exist in a Green Domain!

2.2- Semantic Shifts (Bad Annotations)

In some images the animal was photographed from very far away, which could affect annotation accuracy. That could potentially result in semantic shift issues [3].

Some Aves were annotated from very far distances!

2.3- Out-Of-Distribution (OOD)

Aves has a dominant class in its observation samples: “Passeriformes” (taxonomically an order within Aves). A model trained only on “Passeriformes” must also be able to classify all the other Aves orders as “Aves”. In that sense the other orders are treated as negative examples to “Passeriformes”.

3- Data Pre-Processing

3.1- Accounting for Potentially Bad Annotations

The methodology for extracting noisy observations is explained in a previous post [4]. About 500 images were identified and removed from the main observation set.

3.2- Accounting for Domain Challenges

Ten domains (environments, or clusters) were identified for Aves and the other Chordata classes using the clustering methodology described in [5] and [6]. Each cluster was constrained to a minimum size of 2000 images using k-means-constrained [7]. Ten clusters were chosen for computational efficiency.

Amphibia and Actinopterygii each have their own distinct domains and are thus vulnerable to spurious correlations. Aves and Mammalia, however, share common domains (Clusters 2 and 7). These two clusters provide enough samples for training at a reasonable computational cost, leaving Clusters 8, 9, 6, 4, 3 and 1 as possible validation and test sets.
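
One way to make the choice of training clusters concrete is to score how evenly each cluster mixes Aves with everything else. The sketch below is an assumption about how such a check could look, not the procedure used in this article: the balance ratio, the function name `cluster_balance` and the toy (cluster, class) rows are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

def cluster_balance(rows: Iterable[Tuple[int, str]], target: str = "Aves") -> Dict[int, float]:
    """Return min(target, other) / max(target, other) per cluster (1.0 = perfectly balanced)."""
    counts = defaultdict(lambda: [0, 0])  # cluster -> [target count, other count]
    for cluster, label in rows:
        counts[cluster][0 if label == target else 1] += 1
    return {c: min(t, o) / max(t, o) for c, (t, o) in counts.items()}

# Toy rows (cluster id, class); NOT the real iNaturalist counts.
toy_rows = (
    [(2, "Aves")] * 9 + [(2, "Mammalia")] * 6
    + [(7, "Aves")] * 4 + [(7, "Mammalia")] * 8
    + [(5, "Amphibia")] * 10 + [(5, "Aves")] * 1
)
scores = cluster_balance(toy_rows)
# Clusters sorted from most to least balanced
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

A cluster dominated by a single class scores close to 0, which flags it as a domain where the classifier could learn the environment instead of the animal.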

3.3- Accounting for OOD Challenges

From the previous pre-processing step, only data from Clusters 2 and 7 were selected. In the OOD challenges section above, Passeriformes was identified as the dominant order in Aves. Thus Passeriformes alone will be used for training, while the other Aves orders will serve as negative examples in the test and validation sets.

4- Image Classification Experiment Setup

The experiment is designed for a Binary Image Classifier :

4.1- Train Setup

4.1.1- Target Variables : 1 = Aves, 0 = Not Aves (other species)

4.1.2- Class 1 Observations :

Cluster 7 : 304 Passeriformes Aves
Cluster 2 : 915 Passeriformes Aves
Total : 1219 Aves

4.1.3- Class 2 Observations :

Cluster 2 : 526 Observations
175 Amphibia, Anura
163 Mammalia, Carnivora
188 Mammalia, Rodentia
Cluster 7 : 870 Observations
357 Mammalia, Artiodactyla
240 Mammalia, Carnivora
273 Mammalia, Rodentia
Total : 1396 Not Aves Observations
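
The training set described in 4.1 can be expressed as a filter on the clustered table. This is a minimal sketch that assumes the column names used elsewhere in this article (`cluster`, `species` for the class, `species1` for the order); the helper `build_train_set`, the `label` column and the toy rows are hypothetical.

```python
import pandas as pd

TRAIN_CLUSTERS = [2, 7]

def build_train_set(data: pd.DataFrame) -> pd.DataFrame:
    """Keep Passeriformes (label 1) and non-Aves rows (label 0) from the training clusters."""
    in_domain = data["cluster"].isin(TRAIN_CLUSTERS)
    is_aves = data["species"] == "Aves"
    is_passeriformes = is_aves & (data["species1"] == "Passeriformes")
    # Aves of other orders are held out: they are the OOD examples.
    train = data[in_domain & (is_passeriformes | ~is_aves)].copy()
    train["label"] = (train["species"] == "Aves").astype(int)
    return train

# Toy rows for illustration only
toy = pd.DataFrame({
    "file": ["a.jpg", "b.jpg", "c.jpg", "d.jpg", "e.jpg"],
    "cluster": [2, 7, 2, 7, 4],
    "species": ["Aves", "Mammalia", "Aves", "Amphibia", "Aves"],
    "species1": ["Passeriformes", "Rodentia", "Pelecaniformes", "Anura", "Passeriformes"],
})
train = build_train_set(toy)
print(train[["file", "label"]])
```

In the toy table, `c.jpg` is dropped because it is an Aves image of another order, and `e.jpg` is dropped because it lies outside Clusters 2 and 7.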

4.1.4- Evaluation Metric : F1 Score
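
For reference, F1 is the harmonic mean of precision and recall, which simplifies to F1 = 2TP / (2TP + FP + FN). A minimal pure-Python sketch follows; the helper name `f1_score_binary` and the toy labels are illustrative only.

```python
from typing import Sequence

def f1_score_binary(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 for the positive class (1 = Aves); returns 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)

# Toy labels for illustration only
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1]
f1 = f1_score_binary(y_true, y_pred)
print(round(f1, 3))
```

On a test set that contains only Aves images there can be no false positives, so the score is driven entirely by how many Aves the model misses.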

4.2- OOD Only Test Data

The OOD test data should come from the same training environments (Clusters 2 and 7) but from different orders (not Passeriformes).

Cluster 2 :

Aves,Charadriiformes,147

Aves,Pelecaniformes,101

Total : 248 OOD Aves Test Observations

If the number of test samples is too small, more can be borrowed from the environments closest to Clusters 2 and 7.
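
The OOD-only selection can be written in the same style as the filters used for the other test sets in this article. This is a minimal sketch, assuming the same column names (`cluster`, `species`, `species1`); the helper `select_ood_only` and the toy rows are hypothetical.

```python
import pandas as pd

def select_ood_only(data: pd.DataFrame) -> pd.DataFrame:
    """Aves from the training clusters (2 and 7) whose order is not Passeriformes."""
    c1 = data["cluster"].isin([2, 7])
    c2 = data["species"] == "Aves"
    c3 = data["species1"] != "Passeriformes"
    return data[c1 & c2 & c3].copy()

# Toy rows for illustration only
toy = pd.DataFrame({
    "file": ["a.jpg", "b.jpg", "c.jpg", "d.jpg", "e.jpg"],
    "cluster": [2, 2, 7, 4, 2],
    "species": ["Aves", "Aves", "Aves", "Aves", "Mammalia"],
    "species1": ["Charadriiformes", "Passeriformes", "Pelecaniformes", "Charadriiformes", "Rodentia"],
})
ood = select_ood_only(toy)
print(ood.groupby(["cluster", "species1"])["file"].count())
```

Only the environment stays fixed here; the order changes, which is what separates this set from the Domain Adaptation sets.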

4.3- Domain Adaptation Only Test

The Domain Adaptation test set is readily available: the Passeriformes observations in Clusters 8, 9, 6, 4 and 3 are all suitable for it.

# data: the clustered DataFrame, e.g. loaded from clustered_iNaturalist_Valid2021.csv [8]
# "species" holds the class (Aves) and "species1" the order (Passeriformes)
c1 = data["cluster"] != 2
c2 = data["cluster"] != 7
c3 = data["species"] == "Aves"
c4 = data["species1"] == "Passeriformes"
c = c1 & c2 & c3 & c4
data_da = data[c].copy()  # da : for Domain Adaptation

# Count observations per (cluster, species, species1) and keep groups with more than 100 images
counts = data_da[["file", "cluster", "species", "species1"]].groupby(["cluster", "species", "species1"]).agg("count")
counts[counts["file"] > 100]

4.4- Domain Adaptation and OOD Test

# Same filter as in 4.3, except that the order must NOT be Passeriformes
c1 = data["cluster"] != 2
c2 = data["cluster"] != 7
c3 = data["species"] == "Aves"
c4 = data["species1"] != "Passeriformes"  # the only change from 4.3
c = c1 & c2 & c3 & c4
data_da = data[c].copy()  # da : for Domain Adaptation (here combined with OOD)

# Count observations per (cluster, species, species1) and keep groups with more than 100 images
counts = data_da[["file", "cluster", "species", "species1"]].groupby(["cluster", "species", "species1"]).agg("count")
counts[counts["file"] > 100]

5- Conclusion

5.1- Data and Target Class

There are some known benchmark animal datasets for Domain Adaptation performance evaluation, such as VLCS and Terra Incognita, used in Facebook research [9]. There are also known benchmark datasets for OOD performance evaluation, such as CIFAR-10 vs. CIFAR-100 [10], used in Google’s 2022 publication [3]. However, there are no benchmark datasets specifically for animal data that evaluate OOD performance, or OOD and Domain Adaptation together. In pursuit of creating one, the iNaturalist 2021 validation data was downloaded and the Chordata [11] images were extracted. “Aves” was identified as the class with the highest observation count and was thus selected as the target in this work.

5.2- Domain Discovery

Chordata images were clustered following the methodologies discussed in [5] and [6]. Ten clusters, each with a minimum size of 2000 observations, were requested from the k-means-constrained algorithm [7]. Clusters 2 and 7 were selected for training because they have the best balance of samples between “Aves” and the other classes. The other clusters were left for Domain Adaptation performance evaluation. The clustered data is available for download [8], and a Kaggle notebook with the clustering code will follow later. Clustering quality was validated by visually comparing the top 20 images (those closest to the cluster mean) of each cluster.
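
The visual check described above needs, for each cluster, the images whose feature vectors lie closest to the cluster mean. Below is a minimal sketch of that lookup, assuming each image has already been mapped to a feature vector. The function name `closest_to_mean`, the tiny 2-D vectors and the file names are hypothetical, and the example asks for the top 2 rather than the top 20 to keep it short.

```python
import math
from typing import Dict, List, Sequence

def closest_to_mean(features: Dict[str, Sequence[float]], k: int = 20) -> List[str]:
    """Return the k file names whose vectors are nearest (Euclidean) to the mean vector."""
    vectors = list(features.values())
    dim = len(vectors[0])
    mean = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    ranked = sorted(features, key=lambda name: math.dist(features[name], mean))
    return ranked[:k]

# Toy feature vectors for the images of one cluster (illustration only)
cluster_features = {
    "img_a.jpg": [0.0, 0.0],
    "img_b.jpg": [1.0, 1.0],
    "img_c.jpg": [1.2, 0.9],
    "img_d.jpg": [4.0, 4.0],
}
top = closest_to_mean(cluster_features, k=2)
print(top)
```

Looking at the most central images first gives a quick impression of what environment a cluster represents before inspecting its outliers.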

5.3- OOD

Passeriformes was found to be the order with the most observations within Aves. Thus Passeriformes was selected as the main class in the train set, while the other Aves orders in Clusters 2 and 7 were kept for OOD performance evaluation.

5.4- OOD and Domain Adaptation

Aves orders other than Passeriformes were selected from all clusters other than 2 and 7 for simultaneous OOD and Domain Adaptation performance evaluation.

[1] https://autonomousvision.github.io/cgn/

[2] https://pytorch.org/vision/stable/generated/torchvision.datasets.INaturalist.html#torchvision.datasets.INaturalist

[3] https://ai.googleblog.com/2022/07/towards-reliability-in-deep-learning.html

[4] https://medium.com/@emad-ezzeldin4/debugging-computer-vision-image-classification-removing-noisy-images-2fb7c518930b

[5] https://towardsdatascience.com/how-to-cluster-images-based-on-visual-similarity-cd6e7209fe34

[6] https://medium.com/@emad-ezzeldin4/discovering-different-environments-in-animal-camera-traps-f157df07f9c8

[7] https://github.com/joshlk/k-means-constrained

[9] https://arxiv.org/pdf/2007.01434.pdf

[10] https://complexity.cecs.ucf.edu/cifar-10-cifar-100/

[11] https://en.wikipedia.org/wiki/Chordate

[12] https://github.com/Eezzeldin/iWildOOD