
Simplify passing PIL images #54

Closed
johnbradley opened this issue Oct 16, 2024 · 24 comments

@johnbradley
Collaborator

Currently, to create predictions for multiple PIL images instead of image paths, users must call three functions.
Something like the following:

img_features = classifier.create_image_features(images)
for probs in classifier.create_probabilities(img_features, classifier.txt_features):
    for pred in classifier.format_species_probs(None, probs, k=5):
        print(pred['species'], pred['common_name'], pred['score'])
@johnbradley
Collaborator Author

We already have a central location that loads the image path.

https://github.com/Imageomics/pybioclip/blob/44818f0a9aaf71cb97a6ac4545c85fb8e2afd2f1/src/bioclip/predict.py#L186C9-L188

I think the original intention was for this function to support str or PIL. Outside of this we would just need to update the type hints.
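A minimal sketch of what that central loader could look like; the helper name `ensure_rgb_image` is made up for illustration and is not pybioclip's actual API:

```python
def ensure_rgb_image(item):
    """Return an RGB PIL image whether given a path string or a PIL image.

    Hypothetical helper: sketches how a single load function could accept
    either type, so callers never need to branch themselves.
    """
    if isinstance(item, str):
        from PIL import Image  # imported lazily; only needed for the path case
        return Image.open(item).convert("RGB")
    # assume anything else is an already-opened PIL-like image
    return item.convert("RGB")
```

With this in place, the type hints upstream could simply widen to `str | PIL.Image.Image`.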

@quitmeyer

So, to throw another wrench into your system: not only do I need to specify rank, I am also using a CustomLabelsClassifier, which unfortunately gives the error

AttributeError: 'CustomLabelsClassifier' object has no attribute 'format_grouped_probs'

Here you can see the main code I'm trying to run. Basically I have YOLO detections for multiple insects on a flat plane in a high-resolution image. I have cropped and rotated sub-images, each containing a single insect, that I want to feed to pybioclip. Before, I could do it with the bucket-o-bugs project, but we had to save all the cropped images to disk first and lose their location in the larger original image. I want to feed pybioclip these PIL images directly.


        cv_image_cropped = warp_rotation(cv_image, coordinates)

        pil_image = Image.fromarray(cv_image_cropped)

        # create a PIL image array
        pil_image_ar = [pil_image]
        img_features = classifier.create_image_features(pil_image_ar)
        idx = 0
        for probs in classifier.create_probabilities(img_features, classifier.txt_features):
            name = f"image{idx}"
            for pred in classifier.format_grouped_probs(name, probs, rank=Rank.ORDER, min_prob=1e-9, k=5):
                print(pred['file_name'], pred['genus'], pred['score'])
            idx += 1

@johnbradley
Collaborator Author

@quitmeyer CustomLabelsClassifier doesn't work with Rank. With CustomLabelsClassifier you can supply labels that aren't even part of taxonomy. What are some of the labels you are using with CustomLabelsClassifier in this scenario?

@quitmeyer

@johnbradley here's our old example from beetlepalooza
https://github.com/Digital-Naturalism-Laboratories/bucket-o-bugs
So we load up a species list from GBIF so that our detections are narrowed to a specific country (Panama) and type of critter (class Insecta).

@quitmeyer

quitmeyer commented Oct 16, 2024

I have also been trying to trick bioclip by making a temporary file, but when pybioclip tries to open it I get a "Permission Denied" error (even when running as admin).

with tempfile.NamedTemporaryFile(suffix='.jpg', dir=temp_dir) as temp:
    pil_image.save(temp, format='JPEG')
    temp.seek(0)
    crop_path = Path(temp.name)
    print(crop_path)
    results = classifier.predict(str(crop_path))
    print(results)

EDIT: I got it to work by setting delete=False:

with tempfile.NamedTemporaryFile(suffix='.jpg', dir=temp_dir, delete=False) as temp:

but then I'm back at my original problem of just saving a ton of files to disk that I later have to clean up manually.
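One workaround that avoids both the Windows PermissionError and the manual cleanup is to close the temp-file handle before handing the path to the classifier, then delete the file yourself. A sketch (the helper name is mine, not part of pybioclip):

```python
import os
import tempfile
from contextlib import contextmanager

@contextmanager
def temp_image_path(pil_image, suffix=".jpg"):
    # On Windows, NamedTemporaryFile keeps an open handle, so a second open
    # by another library raises PermissionError. Closing the handle first and
    # deleting manually in the finally block sidesteps both problems.
    tmp = tempfile.NamedTemporaryFile(suffix=suffix, delete=False)
    tmp.close()
    try:
        pil_image.save(tmp.name, format="JPEG")
        yield tmp.name
    finally:
        os.unlink(tmp.name)
```

Usage would be `with temp_image_path(pil_image) as path: results = classifier.predict(path)`; the file is removed as soon as the block exits, so nothing piles up on disk.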

@johnbradley
Collaborator Author

@quitmeyer In that code the classifier isn't making use of rank (that I can tell). Perhaps you manually grouped the items.

It sounds like you have custom labels that are species, but you want to group them by order (or another taxon rank).

We have a new feature called binning that might be what you are looking for. See these two doc links:
https://github.com/Imageomics/pybioclip?tab=readme-ov-file#predict-from-a-list-of-classes-with-binning
https://github.com/Imageomics/pybioclip?tab=readme-ov-file#predict-from-a-binning-csv

The feature hasn't been released yet, so you may need to install it like so:

pip uninstall pybioclip
pip install git+https://github.com/Imageomics/pybioclip

The above documentation shows how to use binning with image paths.
Here is some example code that uses a PIL image:

from bioclip import CustomLabelsBinningClassifier
import PIL.Image

# create a PIL image array
img = PIL.Image.open("Ursus-arctos.jpeg").convert("RGB")
images = [img]

# replace the dictionary with your own mapping
# from species -> order
classifier = CustomLabelsBinningClassifier(cls_to_bin={
  'dog': 'small',
  'fish': 'small',
  'bear': 'big',
})

img_features = classifier.create_image_features(images)
for probs in classifier.create_probabilities(img_features, classifier.txt_features):
    result = classifier.group_probs("image", probs, k=3)
    print(result)

@johnbradley
Collaborator Author

Just to provide all the options: you can use CustomLabelsClassifier with a PIL image like so:

from bioclip import CustomLabelsClassifier
import PIL.Image

# create a PIL image array
img = PIL.Image.open("Ursus-arctos.jpeg").convert("RGB")
images = [img]

classifier = CustomLabelsClassifier(["duck","fish","bear","dog","wolf"])
img_features = classifier.create_image_features(images)
for probs in classifier.create_probabilities(img_features, classifier.txt_features):
    topk = probs.topk(k=5)
    for i, prob in zip(topk.indices, topk.values):
        print(classifier.classes[i], prob.item())

@egrace479
Member

@johnbradley, @quitmeyer, for bucket-o-bugs we did just use CustomLabelsClassifier with the list of relevant orders and then appended circle and hole to the list before passing it to pybioclip.

johnbradley added a commit that referenced this issue Oct 16, 2024
@quitmeyer

quitmeyer commented Oct 16, 2024

# create a PIL image array
img = PIL.Image.open("Ursus-arctos.jpeg").convert("RGB")
images = [img]

classifier = CustomLabelsClassifier(["duck","fish","bear","dog","wolf"])
img_features = classifier.create_image_features(images)
for probs in classifier.create_probabilities(img_features, classifier.txt_features):
    topk = probs.topk(k=5)
    for i, prob in zip(topk.indices, topk.values):
        print(classifier.classes[i], prob.item())

Hey! Awesome! This is exactly what I was looking for! Got it going!

Now there's just a weird problem where it REALLY REALLY thinks a big clear image of a butterfly is an earwig (even though the bioclip online demo nails it as Lepidoptera)? But if that keeps being a problem I'll post a different discussion :)

Being able to use a PIL image like this is exactly what I was looking for!

@egrace479
Member

Hey! Awesome! This is exactly what I was looking for! Got it going!

Now there's just a weird problem where it REALLY REALLY thinks a big clear image of a butterfly is an earwig? (even though the bioclip online demo nails it as lepidopter) but if that keeps being a prob ill post a different discussion :)
...
Being able to use a PIL image like this is exactly what I was looking for!

There can be differences in performance based on image processing. I saw this with another project where the interpolation method was different between the demo and local running: one opened images with tf.keras.utils.load_img default at the desired size, the other resized with PIL and it changed the predictions.

@quitmeyer

There can be differences in performance based on image processing. I saw this with another project where the interpolation method was different between the demo and local running: one opened images with tf.keras.utils.load_img default at the desired size, the other resized with PIL and it changed the predictions.

Interesting @egrace479, it seems to only mess up really badly on this butterfly, which is also about 10-20x larger than most of the other images (small moths and beetles). Do you think there's a chance that having too big an image screws things up? Is there a way to specify the interpolation method?

@quitmeyer

Do you think there's a chance having too big of an image screws things up? Is there a way to specify the interpolation method?

BTW I just ran a quick experiment running the same image at full, 1/4, and 1/8 resolution, and they ended up with the same results, so that might not be a problem.

@johnbradley
Collaborator Author

The code scales images to 224x224 as part of an image preprocessing step when using the default model:

preprocess_img = transforms.Compose(
    [
        transforms.ToTensor(),
        transforms.Resize((224, 224), antialias=True),
        transforms.Normalize(
            mean=(0.48145466, 0.4578275, 0.40821073),
            std=(0.26862954, 0.26130258, 0.27577711),
        ),
    ]
)


To match what is done in the Open-Ended part of the bioclip demo you need to use the TreeOfLifeClassifier. To match the Zero-Shot option in the bioclip demo you should use the CustomLabelsClassifier.


Are you passing the order as the label to CustomLabelsClassifier? i.e. CustomLabelsClassifier(["Lepidoptera","Dermaptera","Thysanoptera"]).

@quitmeyer

Are you passing the order as the label to CustomLabelsClassifier? i.e. CustomLabelsClassifier(["Lepidoptera","Dermaptera","Thysanoptera"]).

Here's what we get up to (condensed the code a bit):

from bioclip import CustomLabelsClassifier, Rank

def get_bioclip_prediction_PILimg(img, classifier):
  # create a PIL image array
  images = [img]
  winner = ""
  winnerprob = ""

  img_features = classifier.create_image_features(images)
  for probs in classifier.create_probabilities(img_features, classifier.txt_features):
      topk = probs.topk(k=5)
      for rank_idx, (i, prob) in enumerate(zip(topk.indices, topk.values)):
          if rank_idx == 0:  # the first topk entry is the highest-scoring class
            winner = classifier.classes[i]
            winnerprob = prob.item()
          print(classifier.classes[i], prob.item())

  # Print the winner
  print(f"  This is the winner: {winner} with a score of {winnerprob}")
  return winner, winnerprob
import polars as pl

def load_taxon_keys(taxa_path, taxa_cols, taxon_rank = "order", flag_holes = True):
  '''
  Loads taxon keys from a tab-delimited CSV file into a list.

  Args:
    taxa_path: String. Path to the taxa CSV file.
    taxa_cols: List of strings. Taxonomic columns in taxa CSV to load (default: ["kingdom", "phylum", "class", "order", "family", "genus", "species"]).
    taxon_rank: String. Taxonomic rank to which to classify images (must be present as column in the taxa csv at file_path). Default: "order".
    flag_holes: Boolean. Whether to flag holes and smudges (adds "hole" and "circle" to taxon_keys). Default: True.

  Returns:
    taxon_keys: List. A list of taxon keys to feed to the CustomClassifier for bioCLIP classification.
  '''
  df = pl.read_csv(taxa_path, low_memory = False).select(taxa_cols).filter(pl.col(taxon_rank).is_not_null())
  taxon_keys = pl.Series(df.select(pl.col(taxon_rank)).unique()).to_list()
  
  if flag_holes:
    taxon_keys.append("circle")
    taxon_keys.append("hole")
  
  return taxon_keys

  #load up the Pybioclip stuff
  taxon_keys_list = load_taxon_keys(taxa_path = taxa_path, taxa_cols = taxa_cols, taxon_rank = taxon_rank.lower(), flag_holes = flag_holes)
  print(f"We are predicting from the following {len(taxon_keys_list)} taxon keys: {taxon_keys_list}")

  print("Loading CustomLabelsClassifier...")
  classifier = CustomLabelsClassifier(taxon_keys_list, device = device)
 # Next process each pair and generate temporary files for the ROI of each detection in each image
  # Iterate through image-JSON pairs
  for image_path, json_path in matching_pairs_img_json_detections:
    # Load JSON file and extract rotated rectangle coordinates for each detection
    
    #coordinates_of_detections_list = get_rotated_rect_coordinates(json_path)
    coordinates_of_detections_list = get_rotated_rect_raw_coordinates(json_path)
    print(len(coordinates_of_detections_list)," detections in "+json_path)
    if coordinates_of_detections_list:
      for coordinates in coordinates_of_detections_list:
        print(coordinates)
        image = Image.open(image_path)
        cv_image = np.array(image)
        cv_image = cv_image[:, :, ::-1]  # Reverse the channels (RGB to BGR for OpenCV)

        cv_image_cropped = warp_rotation(cv_image,coordinates)

        pil_image = Image.fromarray(cv_image_cropped[:, :, ::-1])  # reverse channels back to RGB for PIL

        pred, conf = get_bioclip_prediction_PILimg(pil_image,classifier)

@johnbradley
Collaborator Author

@quitmeyer Based on your code I think you are creating a CustomLabelsClassifier like this:

classifier = CustomLabelsClassifier(
['Lepidoptera', 'Embioptera', 'Strepsiptera', 'Odonata', 'Plecoptera', 'Zygentoma', 'Cnemidolestodea', 'Orthoptera', 'Megaloptera', 'Trichoptera', 'Mantodea', 'Ephemeroptera', 'Hymenoptera', 'Siphonaptera', 'Hemiptera', 'Dermaptera', 'Thysanoptera', 'Diptera', 'Psocodea', 'Mecoptera', 'Neuroptera', 'Blattodea', 'Zoraptera', 'Phasmida', 'Coleoptera', 'hole', 'circle'], device = device)

My understanding is bioclip was trained on species names. I think you will get better results if you pass the list of species to CustomLabelsClassifier, then take the results and sum up the scores at the taxonomic rank you are targeting (order in this case). This is how the TreeOfLifeClassifier handles ranks other than species.
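The summing step could look like this; `species_to_group` is a mapping you would build yourself (e.g. from a GBIF CSV), and the function name is just for illustration:

```python
from collections import defaultdict

def sum_probs_by_rank(species_probs, species_to_group):
    """Aggregate per-species probabilities up to a coarser rank (e.g. order).

    species_probs: dict mapping species label -> probability
    species_to_group: dict mapping species label -> order (or other rank)
    """
    group_scores = defaultdict(float)
    for species, prob in species_probs.items():
        group_scores[species_to_group[species]] += prob
    return dict(group_scores)
```

You would feed it the per-species scores that CustomLabelsClassifier returns, then rank the resulting order totals.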

@quitmeyer

I think you will get better results if you pass the list species to CustomLabelsClassifier. Then take the results and sum up the scores based on the taxa level you are targeting (order in this case).

Ahhh ok, I think we might have mistakenly thought that's how it was functioning when we passed in ranks other than species.

@johnbradley
Collaborator Author

johnbradley commented Oct 17, 2024

If you would like to use the TreeOfLifeClassifier but filter to the target classes and add "hole" and "smudge" I could work up some code to do that.

@johnbradley
Collaborator Author

Otherwise the new binning feature is probably your best bet, but generating 13K text embeddings is going to take quite some time. We have an issue with some rough notes on saving embeddings: #17
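Until that lands, a generic cache-to-disk wrapper is one way to avoid recomputing the 13K embeddings on every run. This is a sketch assuming the embeddings are picklable; for torch tensors, torch.save/torch.load would be the more natural choice:

```python
import os
import pickle

def load_or_compute(cache_path, compute_fn):
    # Reuse a previously saved result if the cache file exists; otherwise
    # compute it once and save it for the next run.
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    value = compute_fn()
    with open(cache_path, "wb") as f:
        pickle.dump(value, f)
    return value
```

So the expensive text-embedding step only runs the first time; later runs load the cached file in seconds.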

@quitmeyer

If you would like to use the TreeOfLifeClassifier but filter to the target classes and add "hole" and "smudge" I could work up some code to do that.

The main key for us is being able to filter to specific regions and taxa only (in our case Panama and insects), so that even if it gets some things wrong it doesn't suggest something totally impossible (like a polar bear in Panama), and so that we have a reproducible way of setting up our script for other regions (like for collaborators in the US or Peru) by feeding it a different GBIF download.

Here's the latest script I've been hacking on, for a better picture:
https://github.com/Digital-Naturalism-Laboratories/Mothbox/blob/main/AI/Mothbot_ID.py

You can see I got it successfully working doing IDs on ROIs, thanks to you!

@quitmeyer

, but generating 13K text embeddings is going to take quite some time. We have an issue with some rough notes on saving embeddings: #17

Ohhh and yeah, if I can save embeddings that would be amazing, because that's the thing that takes the longest for us right now, I think!

@quitmeyer

If you would like to use the TreeOfLifeClassifier but filter to the target classes and add "hole" and "smudge" I could work up some code to do that.

that would be incredible!

@johnbradley
Collaborator Author

@quitmeyer I pushed a notebook to a branch that shows how to filter the TOL classifier: https://github.com/Imageomics/pybioclip/blob/54-pil/FilterTOLExample.ipynb
It uses polars to read the CSV to match your example code. Right now the notebook filters based on order values in the CSV; because of this it may find species outside of your list. If you change taxon_rank to species, it will filter to embeddings for species names that match between TOL and your CSV.

@quitmeyer

Thanks so much! I'll check it out and let you know how it goes!

@quitmeyer

SO cool we can use PIL images now!
