
Simplify passing PIL images #54

Closed
johnbradley opened this issue Oct 16, 2024 · 24 comments

@johnbradley
Collaborator

Currently, to create predictions for multiple PIL images instead of image paths, users must call three functions.
Something like the following:

img_features = classifier.create_image_features(images)
for probs in classifier.create_probabilities(img_features, classifier.txt_features):
    for pred in classifier.format_species_probs(None, probs, k=5):
        print(pred['species'], pred['common_name'], pred['score'])
@johnbradley
Collaborator Author

We already have a central location that loads the image path.

https://github.com/Imageomics/pybioclip/blob/44818f0a9aaf71cb97a6ac4545c85fb8e2afd2f1/src/bioclip/predict.py#L186C9-L188

I think the original intention was for this function to support str or PIL. Outside of this we would just need to update the type hints.
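A minimal sketch of what that central loader could look like; the helper name `ensure_rgb_image` is made up for illustration and is not pybioclip's actual API:

```python
def ensure_rgb_image(item):
    """Return an RGB PIL image whether given a path string or a PIL image.

    Hypothetical helper: sketches how a single load function could accept
    either type, so callers never need to branch themselves.
    """
    if isinstance(item, str):
        from PIL import Image  # imported lazily; only needed for the path case
        return Image.open(item).convert("RGB")
    # assume anything else is an already-opened PIL-like image
    return item.convert("RGB")
```

With this in place, the type hints upstream could simply widen to `str | PIL.Image.Image`.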

@quitmeyer

So, to throw another wrench into your system: not only do I need to specify rank, I am also using a CustomLabelsClassifier, which unfortunately gives the error

AttributeError: 'CustomLabelsClassifier' object has no attribute 'format_grouped_probs'

Here you can see the main code I'm trying to run. Basically I have YOLO detections for multiple insects on a flat plane in a high-resolution image. I have cropped and rotated sub-images, each containing a single insect, that I want to feed to pybioclip. Before, I could do it with the bucket-o-bugs project, but we had to save all the cropped images to disk first and lose their location in the larger original image. I want to feed pybioclip these PIL images directly.


        cv_image_cropped = warp_rotation(cv_image, coordinates)

        pil_image = Image.fromarray(cv_image_cropped)

        # create a PIL image array
        pil_image_ar = [pil_image]
        img_features = classifier.create_image_features(pil_image_ar)
        idx = 0
        for probs in classifier.create_probabilities(img_features, classifier.txt_features):
            name = f"image{idx}"
            for pred in classifier.format_grouped_probs(name, probs, rank=Rank.ORDER, min_prob=1e-9, k=5):
                print(pred['file_name'], pred['genus'], pred['score'])
            idx += 1

@johnbradley
Collaborator Author

@quitmeyer CustomLabelsClassifier doesn't work with Rank. With CustomLabelsClassifier you can supply labels that aren't even part of taxonomy. What are some of the labels you are using with CustomLabelsClassifier in this scenario?

@quitmeyer

@johnbradley here's our old example from beetlepalooza
https://github.com/Digital-Naturalism-Laboratories/bucket-o-bugs
So we load up a species list from GBIF so that our detections are narrowed to a specific country (Panama) and type of critter (class Insecta).

@quitmeyer

quitmeyer commented Oct 16, 2024

I have also been trying to trick bioclip by making a temporary file, but when pybioclip tries to open it I get a "Permission Denied" error (even when running as admin).

with tempfile.NamedTemporaryFile(suffix='.jpg', dir=temp_dir) as temp:
    pil_image.save(temp, format='JPEG')
    temp.seek(0)
    crop_path = Path(temp.name)
    print(crop_path)
    results = classifier.predict(str(crop_path))
    print(results)

EDIT: I got it to work by setting delete=False:

with tempfile.NamedTemporaryFile(suffix='.jpg', dir=temp_dir, delete=False) as temp:

but then I'm back at my original problem of just saving a ton of files to disk that I later have to clean up manually.
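One workaround that avoids both the Windows PermissionError and the manual cleanup is to close the temp-file handle before handing the path to the classifier, then delete the file yourself. A sketch (the helper name is mine, not part of pybioclip):

```python
import os
import tempfile
from contextlib import contextmanager

@contextmanager
def temp_image_path(pil_image, suffix=".jpg"):
    # On Windows, NamedTemporaryFile keeps an open handle, so a second open
    # by another library raises PermissionError. Closing the handle first and
    # deleting manually in the finally block sidesteps both problems.
    tmp = tempfile.NamedTemporaryFile(suffix=suffix, delete=False)
    tmp.close()
    try:
        pil_image.save(tmp.name, format="JPEG")
        yield tmp.name
    finally:
        os.unlink(tmp.name)
```

Usage would be `with temp_image_path(pil_image) as path: results = classifier.predict(path)`; the file is removed as soon as the block exits, so nothing piles up on disk.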

@johnbradley
Collaborator Author

@quitmeyer In that code the classifier isn't making use of rank (that I can tell). Perhaps you manually grouped the items.

It sounds like you have custom labels that are species, but you want to group them by order (or another taxon rank).

We have a new feature called binning that might be what you are looking for. See these two doc links:
https://github.com/Imageomics/pybioclip?tab=readme-ov-file#predict-from-a-list-of-classes-with-binning
https://github.com/Imageomics/pybioclip?tab=readme-ov-file#predict-from-a-binning-csv

The feature hasn't been released yet, so you may need to install it like so:

pip uninstall pybioclip
pip install git+https://github.com/Imageomics/pybioclip

The above documentation shows how to use binning with image paths.
Here is some example code that uses a PIL image:

from bioclip import CustomLabelsBinningClassifier
import PIL.Image

# create a PIL image array
img = PIL.Image.open("Ursus-arctos.jpeg").convert("RGB")
images = [img]

# replace the dictionary with your own mapping
# from species -> order
classifier = CustomLabelsBinningClassifier(cls_to_bin={
  'dog': 'small',
  'fish': 'small',
  'bear': 'big',
})

img_features = classifier.create_image_features(images)
for probs in classifier.create_probabilities(img_features, classifier.txt_features):
    result = classifier.group_probs("image", probs, k=3)
    print(result)

@johnbradley
Collaborator Author

Just to provide all the options: you can use CustomLabelsClassifier with a PIL image like so:

from bioclip import CustomLabelsClassifier
import PIL.Image

# create a PIL image array
img = PIL.Image.open("Ursus-arctos.jpeg").convert("RGB")
images = [img]

classifier = CustomLabelsClassifier(["duck","fish","bear","dog","wolf"])
img_features = classifier.create_image_features(images)
for probs in classifier.create_probabilities(img_features, classifier.txt_features):
    topk = probs.topk(k=5)
    for i, prob in zip(topk.indices, topk.values):
        print(classifier.classes[i], prob.item())

@egrace479
Member

@johnbradley, @quitmeyer, for bucket-o-bugs we did just use CustomLabelsClassifier with the list of relevant orders and then appended circle and hole to the list before passing it to pybioclip.

johnbradley added a commit that referenced this issue Oct 16, 2024
@quitmeyer

quitmeyer commented Oct 16, 2024

# create a PIL image array
img = PIL.Image.open("Ursus-arctos.jpeg").convert("RGB")
images = [img]

classifier = CustomLabelsClassifier(["duck","fish","bear","dog","wolf"])
img_features = classifier.create_image_features(images)
for probs in classifier.create_probabilities(img_features, classifier.txt_features):
    topk = probs.topk(k=5)
    for i, prob in zip(topk.indices, topk.values):
        print(classifier.classes[i], prob.item())

Hey! Awesome! This is exactly what I was looking for! Got it going!

Now there's just a weird problem where it REALLY REALLY thinks a big clear image of a butterfly is an earwig (even though the bioclip online demo nails it as Lepidoptera)? But if that keeps being a problem I'll post a different discussion :)

Being able to use a PIL image like this is exactly what I was looking for!

@egrace479
Member

Hey! Awesome! This is exactly what I was looking for! Got it going!

Now there's just a weird problem where it REALLY REALLY thinks a big clear image of a butterfly is an earwig? (even though the bioclip online demo nails it as lepidopter) but if that keeps being a prob ill post a different discussion :)
...
Being able to use a PIL image like this is exactly what I was looking for!

There can be differences in performance based on image processing. I saw this with another project where the interpolation method was different between the demo and local running: one opened images with tf.keras.utils.load_img default at the desired size, the other resized with PIL and it changed the predictions.

@quitmeyer

There can be differences in performance based on image processing. I saw this with another project where the interpolation method was different between the demo and local running: one opened images with tf.keras.utils.load_img default at the desired size, the other resized with PIL and it changed the predictions.

Interesting @egrace479, it seems to only mess up really badly on this butterfly, which is also about 10-20x larger than most of the other images (small moths and beetles). Do you think there's a chance that having too big an image screws things up? Is there a way to specify the interpolation method?

@quitmeyer

Do you think there's a chance having too big of an image screws things up? Is there a way to specify the interpolation method?

BTW I just ran a quick experiment running the same image at full, 1/4, and 1/8 resolution, and they ended up with the same results, so that might not be a problem.

@johnbradley
Collaborator Author

The code scales images to 224x224 as part of an image preprocessing step when using the default model:

preprocess_img = transforms.Compose(
    [
        transforms.ToTensor(),
        transforms.Resize((224, 224), antialias=True),
        transforms.Normalize(
            mean=(0.48145466, 0.4578275, 0.40821073),
            std=(0.26862954, 0.26130258, 0.27577711),
        ),
    ]
)


To match what is done in the Open-Ended part of the bioclip demo you need to use the TreeOfLifeClassifier. To match the Zero-Shot option in the bioclip demo you should use the CustomLabelsClassifier.


Are you passing the order as the label to CustomLabelsClassifier? i.e. CustomLabelsClassifier(["Lepidoptera","Dermaptera","Thysanoptera"]).

@quitmeyer

Are you passing the order as the label to CustomLabelsClassifier? i.e. CustomLabelsClassifier(["Lepidoptera","Dermaptera","Thysanoptera"]).

Here's what we get up to (condensed the code a bit):

from bioclip import CustomLabelsClassifier, Rank

def get_bioclip_prediction_PILimg(img, classifier):
  # create a PIL image array
  images = [img]
  winner = ""
  winnerprob = ""

  img_features = classifier.create_image_features(images)
  for probs in classifier.create_probabilities(img_features, classifier.txt_features):
      topk = probs.topk(k=5)
      for rank_idx, (i, prob) in enumerate(zip(topk.indices, topk.values)):
          if rank_idx == 0:  # the first topk entry is the highest-scoring class
            winner = classifier.classes[i]
            winnerprob = prob.item()
          print(classifier.classes[i], prob.item())

  # Print the winner
  print(f"  This is the winner: {winner} with a score of {winnerprob}")
  return winner, winnerprob
import polars as pl

def load_taxon_keys(taxa_path, taxa_cols, taxon_rank = "order", flag_holes = True):
  '''
  Loads taxon keys from a tab-delimited CSV file into a list.

  Args:
    taxa_path: String. Path to the taxa CSV file.
    taxa_cols: List of strings. Taxonomic columns in taxa CSV to load (default: ["kingdom", "phylum", "class", "order", "family", "genus", "species"]).
    taxon_rank: String. Taxonomic rank to which to classify images (must be present as column in the taxa csv at file_path). Default: "order".
    flag_holes: Boolean. Whether to flag holes and smudges (adds "hole" and "circle" to taxon_keys). Default: True.

  Returns:
    taxon_keys: List. A list of taxon keys to feed to the CustomClassifier for bioCLIP classification.
  '''
  df = pl.read_csv(taxa_path, low_memory = False).select(taxa_cols).filter(pl.col(taxon_rank).is_not_null())
  taxon_keys = pl.Series(df.select(pl.col(taxon_rank)).unique()).to_list()
  
  if flag_holes:
    taxon_keys.append("circle")
    taxon_keys.append("hole")
  
  return taxon_keys

  #load up the Pybioclip stuff
  taxon_keys_list = load_taxon_keys(taxa_path = taxa_path, taxa_cols = taxa_cols, taxon_rank = taxon_rank.lower(), flag_holes = flag_holes)
  print(f"We are predicting from the following {len(taxon_keys_list)} taxon keys: {taxon_keys_list}")

  print("Loading CustomLabelsClassifier...")
  classifier = CustomLabelsClassifier(taxon_keys_list, device = device)
 # Next process each pair and generate temporary files for the ROI of each detection in each image
  # Iterate through image-JSON pairs
  for image_path, json_path in matching_pairs_img_json_detections:
    # Load JSON file and extract rotated rectangle coordinates for each detection
    
    #coordinates_of_detections_list = get_rotated_rect_coordinates(json_path)
    coordinates_of_detections_list = get_rotated_rect_raw_coordinates(json_path)
    print(len(coordinates_of_detections_list)," detections in "+json_path)
    if coordinates_of_detections_list:
      for coordinates in coordinates_of_detections_list:
        print(coordinates)
        image = Image.open(image_path)
        cv_image = np.array(image)
        cv_image = cv_image[:, :, ::-1]  # Reverse the channels (RGB to BGR for OpenCV)

        cv_image_cropped = warp_rotation(cv_image,coordinates)

        pil_image = Image.fromarray(cv_image_cropped[:, :, ::-1])  # reverse channels back to RGB for PIL

        pred, conf = get_bioclip_prediction_PILimg(pil_image,classifier)

@johnbradley
Collaborator Author

@quitmeyer Based on your code I think you are creating a CustomLabelsClassifier like this:

classifier = CustomLabelsClassifier(
['Lepidoptera', 'Embioptera', 'Strepsiptera', 'Odonata', 'Plecoptera', 'Zygentoma', 'Cnemidolestodea', 'Orthoptera', 'Megaloptera', 'Trichoptera', 'Mantodea', 'Ephemeroptera', 'Hymenoptera', 'Siphonaptera', 'Hemiptera', 'Dermaptera', 'Thysanoptera', 'Diptera', 'Psocodea', 'Mecoptera', 'Neuroptera', 'Blattodea', 'Zoraptera', 'Phasmida', 'Coleoptera', 'hole', 'circle'], device = device)

My understanding is bioclip was trained on species names. I think you will get better results if you pass the list of species to CustomLabelsClassifier, then take the results and sum up the scores at the taxonomic rank you are targeting (order in this case). This is how the TreeOfLifeClassifier handles ranks other than species.
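The summing step could look like this; `species_to_group` is a mapping you would build yourself (e.g. from a GBIF CSV), and the function name is just for illustration:

```python
from collections import defaultdict

def sum_probs_by_rank(species_probs, species_to_group):
    """Aggregate per-species probabilities up to a coarser rank (e.g. order).

    species_probs: dict mapping species label -> probability
    species_to_group: dict mapping species label -> order (or other rank)
    """
    group_scores = defaultdict(float)
    for species, prob in species_probs.items():
        group_scores[species_to_group[species]] += prob
    return dict(group_scores)
```

You would feed it the per-species scores that CustomLabelsClassifier returns, then rank the resulting order totals.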

@quitmeyer

I think you will get better results if you pass the list species to CustomLabelsClassifier. Then take the results and sum up the scores based on the taxa level you are targeting (order in this case).

Ahhh ok, I think we might have mistakenly thought that's how it was functioning when we passed in ranks other than species.

@johnbradley
Collaborator Author

johnbradley commented Oct 17, 2024

If you would like to use the TreeOfLifeClassifier but filter to the target classes and add "hole" and "smudge" I could work up some code to do that.

@johnbradley
Collaborator Author

Otherwise the new binning feature is probably your best bet, but generating 13K text embeddings is going to take quite some time. We have an issue with some rough notes on saving embeddings: #17
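Until that lands, a generic cache-to-disk wrapper is one way to avoid recomputing the 13K embeddings on every run. This is a sketch assuming the embeddings are picklable; for torch tensors, torch.save/torch.load would be the more natural choice:

```python
import os
import pickle

def load_or_compute(cache_path, compute_fn):
    # Reuse a previously saved result if the cache file exists; otherwise
    # compute it once and save it for the next run.
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    value = compute_fn()
    with open(cache_path, "wb") as f:
        pickle.dump(value, f)
    return value
```

So the expensive text-embedding step only runs the first time; later runs load the cached file in seconds.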

@quitmeyer

If you would like to use the TreeOfLifeClassifier but filter to the target classes and add "hole" and "smudge" I could work up some code to do that.

The main key for us is being able to filter to specific regions and taxa only (in our case Panama and insects), so that even if it gets some things wrong it doesn't suggest something totally impossible (like a polar bear in Panama), and so that we have a reproducible way of setting up our script for other regions (like for collaborators in the US or Peru) by feeding it a different GBIF download.

Here's the latest script I've been hacking on, for a better picture:
https://github.com/Digital-Naturalism-Laboratories/Mothbox/blob/main/AI/Mothbot_ID.py

You can see I got it successfully working doing IDs on ROIs, thanks to you!

@quitmeyer

, but generating 13K text embeddings is going to take quite some time. We have an issue with some rough notes on saving embeddings: #17

Ohhh and yeah, if I can save embeddings that would be amazing, because that's the thing that takes the longest for us right now, I think!

@quitmeyer

If you would like to use the TreeOfLifeClassifier but filter to the target classes and add "hole" and "smudge" I could work up some code to do that.

that would be incredible!

@johnbradley
Collaborator Author

@quitmeyer I pushed a notebook to a branch that shows how to filter the TOL classifier: https://github.com/Imageomics/pybioclip/blob/54-pil/FilterTOLExample.ipynb
It uses polars to read the CSV to match your example code. Right now the notebook filters based on order values in the CSV; because of this it may find species outside of your list. If you change taxon_rank to species, it will filter to embeddings for species names that match between TOL and your CSV.

@quitmeyer

Thanks so much! I'll check it out and let you know how it goes!

@quitmeyer

SO cool we can use PIL images now!
