Fix Owlv2 Performance #1065

balthazur · 2025-03-07T11:24:38Z

Description

Remove background compilation, which was failing (fix for Inference Internal)
Improve the SerializedOwlV2 class to keep a reference to the base OwlV2 instance so it can be reused (fix for slow serialization/training)

Type of change

Please delete options that are not relevant.

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
This change requires a documentation update

How has this change been tested, please provide a testcase or example of how you tested the change?

locally and on staging

Any specific deployment considerations

For example, documentation changes, usability, usage/costs, secrets, etc.

Docs

Docs updated? What were the changes:

isaacrob-roboflow

would love to better understand why we have both an owlv2 singleton and a reference dict, as it seems like those are intended to serve similar functions and the fact that both are necessary seems like I'm either missing something or there's a hidden bug

isaacrob-roboflow · 2025-03-10T17:19:29Z

inference/models/owlv2/owlv2.py

+    # Cache of OwlV2 instances to avoid creating new ones for each serialize_training_data call
+    # This improves performance by reusing model instances across serialization operations
+    _base_owlv2_instances = {}
+


it seems like this dict serves a redundant purpose with the singleton .. might be simpler to maintain this long term if we either take out the singleton or fix the singleton such that this dict doesn't have to exist? or am I misunderstanding what's happening here

This dict is only used by the SerializedOwlV2 class, which is another wrapper around the OwlV2 class and has the classmethod serialize_training_data. That method creates an OwlV2 instance via owlv2 = OwlV2(model_id=roboflow_id) on every call (that is, every time a training job comes in), so a new instance is created each time rather than the Singleton being reused.

I'll try to think of a better way, but I thought it would be fine because it's a wrapper class for the serialization process.

def save_embeddings(training_data, savedirprefix, previous_embeddings_file=None):
    from inference.models.owlv2.owlv2 import SerializedOwlV2
    from inference.core.env import OWLV2_VERSION_ID
    from adapters import logging_adapter

    save_dir = f"/tmp/{savedirprefix}/embeddings"
    os.makedirs(save_dir, exist_ok=True)

    total_images = len(training_data)
    logging_adapter.info(f"Starting embedding generation for {total_images} images")
    start_time = time.time()

    embeddings_pt = SerializedOwlV2.serialize_training_data(
        training_data=training_data,
        hf_id=f"google/{OWLV2_VERSION_ID}",
        save_dir=save_dir,
        previous_embeddings_file=previous_embeddings_file,
    )

The function above calls serialize_training_data every time, and each call was creating a new OWLv2 instance with a different context and call stack.

@isaacrob-roboflow Any suggestions how to unify the Singleton with the dict in Inference?

balthazur · 2025-03-11T13:27:43Z

@isaacrob-roboflow I investigated a bit more and tried a few different options. I used ChatGPT to put my messy notes in order and produce this short summary:

The Problem

The issue is that the Owlv2Singleton class uses a weakref.WeakValueDictionary() to store model instances. This means that when there's no strong reference to the singleton object, it can be garbage collected.

When serialize_training_data() is called multiple times (for different training jobs), each call creates an OWLv2 instance like:

owlv2 = OwlV2(model_id=roboflow_id)

After the method completes, this instance is no longer referenced and gets garbage collected, along with its caches and potentially the singleton model if nothing else holds a reference to it.
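To make the garbage-collection behaviour concrete, here is a minimal sketch. It does not use the real Owlv2Singleton; `HeavyModel`, `registry`, `get_model` and `one_job` are hypothetical stand-ins that only mimic a weak-value registry.

```python
import gc
import weakref


class HeavyModel:
    """Hypothetical stand-in for the expensive Hugging Face model."""

    def __init__(self, name):
        self.name = name


# Weak-value registry: an entry lives only while something else holds the value.
registry = weakref.WeakValueDictionary()


def get_model(name):
    model = registry.get(name)
    if model is None:
        model = HeavyModel(name)  # the expensive load in the real code
        registry[name] = model
    return model


def one_job():
    model = get_model("owlv2")
    return model.name  # the only strong reference dies when this returns


one_job()
gc.collect()  # make collection explicit on interpreters without refcounting
entry_survived = "owlv2" in registry  # the model was collected with the job

keeper = get_model("owlv2")  # a strong reference held outside the job
gc.collect()
entry_kept = "owlv2" in registry  # stays registered while `keeper` exists
```

Under these assumptions, every job that lets its only reference go pays the full load cost again on the next call.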

Why Our Solution Works

The class dictionary _base_owlv2_instances solves this by maintaining strong references to the OWLv2 instances:

if roboflow_id in cls._base_owlv2_instances:
    owlv2 = cls._base_owlv2_instances[roboflow_id]
else:
    owlv2 = OwlV2(model_id=roboflow_id)
    cls._base_owlv2_instances[roboflow_id] = owlv2

This ensures that between calls to serialize_training_data(), we reuse the same OWLv2 instance with its caches intact, improving performance significantly.
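The effect can be sketched in isolation. The classes below are hypothetical stand-ins (`FakeOwlV2`, `FakeSerializer`), not the real OwlV2 or SerializedOwlV2, and the string "embeddings" are placeholders for the real tensors.

```python
class FakeOwlV2:
    """Hypothetical stand-in for OwlV2; counts how often it is constructed."""

    constructed = 0

    def __init__(self, model_id):
        FakeOwlV2.constructed += 1
        self.model_id = model_id
        self.embedding_cache = {}  # per-instance cache

    def embed(self, image_key):
        # Compute only on a cache miss; the real model does far heavier work.
        if image_key not in self.embedding_cache:
            self.embedding_cache[image_key] = f"embedding-of-{image_key}"
        return self.embedding_cache[image_key]


class FakeSerializer:
    """Hypothetical stand-in for SerializedOwlV2."""

    # Strong references keep each instance and its caches alive between calls.
    _base_instances = {}

    @classmethod
    def serialize_training_data(cls, model_id, image_keys):
        owlv2 = cls._base_instances.get(model_id)
        if owlv2 is None:
            owlv2 = FakeOwlV2(model_id=model_id)
            cls._base_instances[model_id] = owlv2
        return [owlv2.embed(key) for key in image_keys]


# Two "training jobs" with overlapping images.
FakeSerializer.serialize_training_data("owlv2/large", ["a.jpg", "b.jpg"])
FakeSerializer.serialize_training_data("owlv2/large", ["b.jpg", "c.jpg"])

instance = FakeSerializer._base_instances["owlv2/large"]
```

After both calls only one instance has been constructed, and its cache holds the entries from both jobs.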

Why Not Just Fix the Singleton?

We could change the singleton to use a regular dictionary instead of a weak dictionary, and that would help maintain the model. However, there's still an issue:

The singleton only holds the heavy model (from Hugging Face).
The important caches (image embeddings, etc.) are stored on the OWLv2 instance itself, not in the singleton.
We need to maintain a reference to the complete OWLv2 instance to keep those caches.
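A small sketch of that distinction, using hypothetical names (`ModelSingleton`, `Wrapper`, `image_embed_cache`) rather than the real classes: even with a strong-reference registry for the model, a freshly constructed wrapper starts with empty caches.

```python
class ModelSingleton:
    """Hypothetical strong-reference registry that holds only the heavy model."""

    _models = {}

    @classmethod
    def get(cls, model_id):
        if model_id not in cls._models:
            cls._models[model_id] = object()  # placeholder for the loaded model
        return cls._models[model_id]


class Wrapper:
    """Hypothetical OwlV2-like wrapper: shared model, private caches."""

    def __init__(self, model_id):
        self.model = ModelSingleton.get(model_id)
        self.image_embed_cache = {}


first = Wrapper("owlv2")
first.image_embed_cache["img-1"] = "embedding"

second = Wrapper("owlv2")  # what a fresh call would construct

same_model = first.model is second.model  # the model itself is shared
cache_carried_over = "img-1" in second.image_embed_cache  # the cache is not
```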

Current Implementation

The current implementation is straightforward and effective:

SerializedOwlV2._base_owlv2_instances maintains OWLv2 instances by model ID.
This is specifically used for the serialization service, not affecting the base OWLv2 class's general inference usage.
It maintains a persistent OWLv2 instance with all its caches across multiple calls.
While the dictionary structure is simple (typically just containing one entry like {'owlv2/owlv2-large-patch14-ensemble': <OwlV2 object>}), using a dictionary gives us flexibility to handle different OWLv2 versions if needed in the future.

For now, this is a reasonable solution that solves the issue without requiring major refactoring. The dictionary is simple but effective at maintaining state between calls.

If you'd like to refactor this later to unify the caching approach, that would be great, but the current implementation works well for our immediate needs. Also, if you have ideas for a better fix please feel free! Happy to apply them over mine.

balthazur added 15 commits March 7, 2025 12:23

test with many logs

eac961f

handle embeddings case

f3ff7ff

Merge branch 'main' into owlv2-perf

7fedaf7

compile flag

c7b6061

fix

3c044d4

class method

3a092ba

remove unused vars

a571166

deletes

94d04a6

cleanup

3023bcd

cleanup

01ae70f

cleanup

816880d

cleanup

a5712ff

cleanup

e1abb80

format

7a0fd6d

cleanup

6c14795

balthazur changed the title ~~Draft: Investigage Owlv2 performance~~ Fix Owlv2 Performance Mar 10, 2025

balthazur marked this pull request as ready for review March 10, 2025 13:01

balthazur requested review from PawelPeczek-Roboflow, grzegorz-roboflow, yeldarby, probicheaux and hansent as code owners March 10, 2025 13:01

balthazur self-assigned this Mar 10, 2025

isaacrob-roboflow reviewed Mar 10, 2025

move into if

ae07c3b

balthazur and others added 4 commits March 11, 2025 20:23

class method and use for inference as well

c1ffee9

Merge branch 'main' into owlv2-perf

ec2f1cf

black

b5f9ec2

Merge branch 'owlv2-perf' of github.com:roboflow/inference into owlv2-perf

900ccac

probicheaux approved these changes Mar 11, 2025
