Llm extraction #450
Coverage Report
Files without new missing coverage
281 files skipped due to complete coverage. Coverage success: total of 97.94% is above 97.90% 🎉
Really nice work here! A few thoughts and questions:
Some broader remarks (that might be better addressed in a future PR):
@marconaguib thank you for your review! All good points, I'll fix that :)
Are you referring to fine-tuning (where it would make sense to have a trainable component, although our lib might not be as optimized as other work out there like unsloth), or prompt search (more akin to tuning)? For the latter, yes, it would be great! I wonder if there is some way we could offer a unified interface using
That would be great!
Yes, I was thinking prompt search indeed, and the

Description
- `DocToMarkupConverter` to convert documents to markdown, and improved `MarkupToDocConverter` to allow overlapping markup annotations (e.g., `This is a <a>text <b>with</a> overlapping</b> tags`).
- `edsnlp.utils.fuzzy_alignment.align` to map the entities of an annotated document to another document with similar but not identical text (e.g., after some text normalization or minor edits).
- `span_getter="sents"` to apply various pipes on sentences instead of entities or spans.
- `eds.llm_markup_extractor`, that can be used to extract entities using a large language model served through an OpenAPI-style API.

Checklist
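For reviewers who want to see what "overlapping markup annotations" means in terms of character offsets, here is a minimal standalone sketch using the example string from the description. It does not use or reproduce `MarkupToDocConverter`; the function name `parse_overlapping_markup` and the simple `<label>` / `</label>` tag syntax are assumptions made only for illustration.

```python
import re

# Hypothetical tag syntax for this sketch: <label> opens, </label> closes.
TAG = re.compile(r"<(/?)(\w+)>")


def parse_overlapping_markup(markup):
    """Return (plain_text, spans), where spans are (start, end, label) offsets.

    Each label keeps its own stack of open positions, so tags may overlap
    instead of having to nest properly.
    """
    parts = []        # plain-text chunks between tags
    length = 0        # length of the plain text emitted so far
    open_tags = {}    # label -> stack of start offsets
    spans = []
    pos = 0
    for match in TAG.finditer(markup):
        chunk = markup[pos:match.start()]
        parts.append(chunk)
        length += len(chunk)
        closing, label = match.group(1), match.group(2)
        if closing:
            if not open_tags.get(label):
                raise ValueError(f"closing tag </{label}> has no matching opening tag")
            spans.append((open_tags[label].pop(), length, label))
        else:
            open_tags.setdefault(label, []).append(length)
        pos = match.end()
    parts.append(markup[pos:])
    return "".join(parts), sorted(spans)


text, spans = parse_overlapping_markup(
    "This is a <a>text <b>with</a> overlapping</b> tags"
)
for start, end, label in spans:
    print(label, repr(text[start:end]))
```

On the example string, span `a` covers "text with" and span `b` covers "with overlapping", so the two spans share the word "with" without one containing the other.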