Description
Hi,
First of all, thanks for open sourcing this nice package. :-)
I am trying to use langextract to post-process interviews by tagging (sometimes long) quotes in a text.
The problem I run into is that the text extraction takes exceedingly long to complete.
For example, it takes 17 minutes (~1000 seconds) to analyze a text of 355 characters (59 words) using an examples list that contains a single ExampleData with:
- A `text` of length ~2,100 words (~13,000 characters), with 10 `Extraction`s whose `extraction_text` sizes are:
  - 6 characters
  - 7 characters
  - 7 characters
  - 115 characters
  - 496 characters
  - 207 characters
  - 84 characters
  - 139 characters
  - 36 characters
  - 334 characters
(Unfortunately, I can not share the actual contents of the text for privacy reasons.)
Most of the time is spent before any output is generated. When I interrupt the program, it is always stuck in `_fuzzy_align_extraction`. Once it starts emitting output like this:
```
WARNING:absl:Prompt alignment: non-exact match:
```
the program quickly finishes. This is the corresponding output generated after the long silent period:
```
LangExtract: model=gemini-2.5-flash, current=358 chars, processed=358 chars: [00:09]
✓ Extraction processing complete
INFO:absl:Finalizing annotation for document ID .
INFO:absl:Document annotation completed.
✓ Extracted 3 entities (1 unique types)
• Time: 9.76s
• Speed: 37 chars/sec
• Chunks: 1
```
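For context, here is a minimal sketch (not langextract's actual implementation) of why fuzzy alignment can dominate runtime: sliding a window of the extraction's length across the source and scoring each position with a sequence matcher makes the cost grow with both the source length and the extraction length, so long example texts combined with long `extraction_text`s multiply the work.

```python
# Hypothetical illustration of sliding-window fuzzy alignment.
# This is NOT langextract's code; it only shows the cost pattern:
# one SequenceMatcher.ratio() call per candidate start position.
import difflib


def fuzzy_align(source: str, extraction: str) -> tuple[int, float]:
    """Return (best start index, best similarity ratio) by sliding a
    window of len(extraction) over the source text."""
    window = len(extraction)
    best_pos, best_ratio = 0, 0.0
    # One similarity computation per candidate position in the source.
    for start in range(max(1, len(source) - window + 1)):
        ratio = difflib.SequenceMatcher(
            None, extraction, source[start:start + window]
        ).ratio()
        if ratio > best_ratio:
            best_pos, best_ratio = start, ratio
    return best_pos, best_ratio


source = "the quick brown fox jumps over the lazy dog " * 20
pos, ratio = fuzzy_align(source, "brown fox jumps")
```

If the real bottleneck behaves like this, shortening the example `text` and the longer `extraction_text`s (for example, splitting the 496- and 334-character quotes) should reduce alignment time noticeably.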
Any suggestions on how to speed up the extract function?
Thanks in advance,
Hylke