Support for model preloading across multiple inferences

I've been working with your ChartVLM model and noticed that the current implementation loads the model before each inference call. This is inefficient for my use case, where I need to run many inferences in a row.

Looking at the code in `ChartVLM.py`, each call to `infer_ChartVLM()` seems to reload the adapter and decoders. Is there an alternative implementation that loads the model components once and reuses them across multiple inferences?

This would significantly improve performance for batch processing scenarios. 
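
To illustrate the kind of load-once pattern I have in mind, here is a minimal sketch. It does not use the actual ChartVLM API: `PreloadedModel`, `LoadedComponents`, `fake_loader`, and `fake_run` are hypothetical names, and the loader simply stands in for whatever loading code currently runs inside `infer_ChartVLM()`.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class LoadedComponents:
    """Hypothetical container for the components needed at inference time."""
    adapter: object
    decoder: object


class PreloadedModel:
    """Loads the components on first use and reuses them for later calls."""

    def __init__(
        self,
        loader: Callable[[], LoadedComponents],
        run: Callable[[LoadedComponents, str], str],
    ) -> None:
        self._loader = loader
        self._run = run
        self._components: Optional[LoadedComponents] = None
        self.load_count = 0  # illustration only: counts how often loading ran

    def _get_components(self) -> LoadedComponents:
        # Run the expensive loading step only if nothing is cached yet.
        if self._components is None:
            self._components = self._loader()
            self.load_count += 1
        return self._components

    def infer(self, image_path: str) -> str:
        return self._run(self._get_components(), image_path)


def fake_loader() -> LoadedComponents:
    # Stand-in for the real (assumed expensive) adapter/decoder loading.
    return LoadedComponents(adapter="adapter-weights", decoder="decoder-weights")


def fake_run(components: LoadedComponents, image_path: str) -> str:
    # Stand-in for the real inference step on one chart image.
    return f"result for {image_path}"


model = PreloadedModel(fake_loader, fake_run)
results = [model.infer(path) for path in ["a.png", "b.png", "c.png"]]
```

With something along these lines, the loading cost would be paid on the first call only, and every later call would reuse the same in-memory components.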

Thank you for your time and for sharing this excellent work.


support of model preloading for multiple inferences #18