
Long ExampleData causes extract to hang on _fuzzy_align_extraction #277

@hcdonker-code

Description

Hi,

First of all, thanks for open sourcing this nice package. :-)

I am trying to use langextract to post-process interviews by tagging (sometimes long) quotes in a text.
The problem I run into is that text extraction takes exceedingly long to complete.
For example, it takes about 17 minutes (~1,000 seconds) to analyze a text of 355 characters (59 words) using an examples list containing a single ExampleData with:

  • A text of ~2,100 words (~13,000 characters) and 10 Extractions with extraction_text sizes of:
    • 6 characters
    • 7 characters
    • 7 characters
    • 115 characters
    • 496 characters
    • 207 characters
    • 84 characters
    • 139 characters
    • 36 characters
    • 334 characters

(Unfortunately, I cannot share the actual contents of the text for privacy reasons.)
Most of the time is spent before any output is generated. Whenever I interrupt the program, it is always stuck in _fuzzy_align_extraction. Once it starts emitting output like this:

WARNING:absl:Prompt alignment: non-exact match:

the program quickly finishes. Here is the corresponding output generated after the long silent period:

LangExtract: model=gemini-2.5-flash, current=358 chars, processed=358 chars: [00:09]
✓ Extraction processing complete
INFO:absl:Finalizing annotation for document ID .
INFO:absl:Document annotation completed.
✓ Extracted 3 entities (1 unique types)
• Time: 9.76s
• Speed: 37 chars/sec
• Chunks: 1
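For what it's worth, the timing pattern is consistent with window-by-window fuzzy matching: the cost of sliding a fuzzy comparison over the source text grows roughly with the window length squared, so the long extraction_text snippets (496 and 334 characters) would dominate the runtime. I don't know langextract's actual _fuzzy_align_extraction internals; this is just an illustrative stdlib sketch of that cost model, with the function name and difflib usage being my own:

```python
import difflib


def fuzzy_align(needle: str, haystack: str, step: int = 1) -> tuple[int, float]:
    """Slide a window of len(needle) over haystack and return the
    (offset, similarity) of the best fuzzy match.

    Each SequenceMatcher.ratio() call is roughly quadratic in the
    window length, so total cost scales like
    O(len(haystack) * len(needle)**2) -- long snippets get expensive fast.
    """
    best_off, best_ratio = 0, 0.0
    n = len(needle)
    for off in range(0, max(1, len(haystack) - n + 1), step):
        window = haystack[off:off + n]
        ratio = difflib.SequenceMatcher(None, needle, window).ratio()
        if ratio > best_ratio:
            best_off, best_ratio = off, ratio
    return best_off, best_ratio


text = "the quick brown fox jumps over the lazy dog"
offset, score = fuzzy_align("brown fox", text)
print(offset, score)  # exact match found at offset 10 with ratio 1.0
```

If the alignment really does something like this, a larger step size or a length cap on fuzzily-aligned snippets might be the knob to look for.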

Any suggestions on how to speed up the extract function?

Thanks in advance,

Hylke
