High token loop hallucination rate, high no response rate

Problems:
The model often hallucinates, producing sections of looping (endlessly repeated) tokens.
The model often returns empty responses.

Reproduction:
Scanned texts in Russian and Kazakh.

Possible solutions:
1. Provide more detailed guidance on how to use the model, including recommended settings and preprocessing steps.
2. Fine-tune the model on a synthetic dataset, since I'm not aware of any readily available annotated Kazakh/Russian OCR datasets.

I've tried integrating the chandra repo code into my pipeline with the exact vLLM settings, request settings, prompts, and preprocessing, but I still see a high error and hallucination rate, which makes the model barely usable in production for my language set. My test set has ~60 documents; on average, 5-6 of them have token loop hallucinations, and 1 or 2 documents consistently have missing pages.
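A rough way to count both failure modes over a test set could look like the sketch below. It is an illustration only: `has_token_loop`, `classify`, `failure_counts`, and the thresholds are hypothetical names and values, not part of the chandra repo, and whitespace-separated words are used as a stand-in for model tokens, so loops inside a single unbroken string would be missed.

```python
from collections import Counter


def has_token_loop(text: str, max_period: int = 20, min_repeats: int = 5) -> bool:
    """Heuristic: True if a block of up to `max_period` words repeats
    back to back at least `min_repeats` times (thresholds are assumptions)."""
    tokens = text.split()
    for period in range(1, max_period + 1):
        run = 0  # consecutive positions where a word equals the one `period` back
        for i in range(period, len(tokens)):
            if tokens[i] == tokens[i - period]:
                run += 1
                if run >= period * (min_repeats - 1):
                    return True
            else:
                run = 0
    return False


def classify(text: str) -> str:
    """Label one model response as 'empty', 'loop', or 'ok'."""
    if not text or not text.strip():
        return "empty"
    if has_token_loop(text):
        return "loop"
    return "ok"


def failure_counts(responses: list[str]) -> Counter:
    """Count how many responses fall into each label."""
    return Counter(classify(r) for r in responses)
```

Running `failure_counts` over the per-page outputs of each run would give comparable numbers across settings, which may make it easier to tell whether a change in preprocessing or sampling parameters actually helps.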

The scans I use are of decent quality, with clearly recognizable text, but the model still fails at a relatively high rate.

I also tried sending the PDF images directly, without intermediate conversion steps, hoping that a lossless pipeline would help, but with no success.

I can't share the exact documents because they are private.
Any help would be appreciated.

High token loop hallucination rate, high no response rate #51
