High token loop hallucination rate, high no response rate #51

@iDynbek

Problems:
The model often hallucinates, producing token loops (runs of repeated text).
The model often returns empty responses.

Reproduction:
Scanned texts in Russian and Kazakh.

Possible solutions:

  1. Provide more detailed guidance on how to use the model, including settings and preprocessing steps.
  2. Fine-tune the model on a synthetic dataset, since I am not aware of any readily available annotated Kazakh or Russian OCR datasets.

I've tried integrating the chandra repo code into my pipeline with the exact vLLM settings, request settings, prompts, and preprocessing, but I still see a high error and hallucination rate, which makes the model barely usable in production for my languages. My test set has ~60 documents; on average, 5-6 of them end up with a token loop hallucination, and 1 or 2 documents consistently have missing pages.
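To make these failure rates easier to measure and compare across settings, here is a minimal sketch of how each page's output could be labeled automatically. It is not part of the chandra repo: `has_token_loop`, `classify_page`, and the `ngram` / `min_repeats` thresholds are hypothetical names and values chosen for illustration. It only catches loops at the word level, and legitimately repetitive pages (for example, tables with identical rows) may be flagged by mistake.

```python
from collections import Counter


def has_token_loop(text: str, ngram: int = 8, min_repeats: int = 5) -> bool:
    """Return True if any word n-gram occurs at least `min_repeats` times.

    Hypothetical heuristic: thresholds are illustrative, not tuned.
    """
    words = text.split()
    # Too short to contain `min_repeats` copies of an n-gram.
    if len(words) < ngram * min_repeats:
        return False
    counts = Counter(
        tuple(words[i:i + ngram]) for i in range(len(words) - ngram + 1)
    )
    return max(counts.values()) >= min_repeats


def classify_page(text: str) -> str:
    """Label one page of OCR output as 'empty', 'loop', or 'ok'."""
    if not text.strip():
        return "empty"
    if has_token_loop(text):
        return "loop"
    return "ok"
```

Counting the labels over the whole test set gives a per-run number for both problems, and pages labeled `empty` or `loop` could be retried or routed to manual review.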

The scans I use are of decent quality with clearly legible text, but the model still fails at a relatively high rate.

I also tried sending the PDF images directly, without intermediate conversion steps, hoping that a lossless pipeline would help, but with no success.

I can't share the exact documents because they are private.
Any help would be appreciated.
