Relation extraction llama abstraction #528
Conversation
* CU-8698ek477: Add TODO to MetaCAT ML utils regarding AdamW import
* CU-8698ek477: Fix AdamW import (trf->torch)
…ut a vector (#524)
* CU-8698f8fgc: Add new test to check that the negative sampling indices do not include non-vectored indices
* CU-8698f8fgc: Add fix for negative sampling including indices for words without a vector
* CU-8698f8fgc: Update tests to make sure index frequencies are respected
* CU-8698f8fgc: Add 3.9-friendly counter totalling method
* CU-8698gqumv: Add tests for Vocab upon regression testing
* CU-8698gqumv: Fix regression time vocab data
* CU-86983ruw9: Fix train-test splitter leaving train set empty for smaller datasets
* CU-86983ruw9: Add additional optional arguments to test-train splitting for minimum concept count and maximum test fraction
* CU-86983ruw9: Add a few tests for test-train splitting
* CU-8698hfkch: Add eval method to deid model
* CU-8698hfkch: lint checks

Co-authored-by: Tom Searle <[email protected]>
Ran it through the GHA workflow on my fork as well, just in case.
Possible to merge the BERT and ModernBERT model implementations?
from medcat.utils.relation_extraction.models import Base_RelationExtraction

class LlamaModel_RelationExtraction(Base_RelationExtraction):
Why the underscore in the name? Wouldn't just BaseRelationExtraction and LlamaRelationExtraction look fine?
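For illustration, the rename suggested above could be made without breaking callers by keeping aliases for the old names. This is a hypothetical sketch, not the actual MedCAT class bodies (the real classes live in medcat.utils.relation_extraction.models and subclass torch.nn.Module; plain classes are used here to keep it minimal):

```python
# Hypothetical sketch of the PEP 8-style CapWords names proposed in the review.

class BaseRelationExtraction:
    """Suggested spelling of Base_RelationExtraction."""


class LlamaRelationExtraction(BaseRelationExtraction):
    """Suggested spelling of LlamaModel_RelationExtraction."""


# Aliases preserving the underscore convention used since #173,
# so existing imports keep working during a deprecation window:
Base_RelationExtraction = BaseRelationExtraction
LlamaModel_RelationExtraction = LlamaRelationExtraction
```

The aliases make the rename a pure readability change rather than an API break.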
I just moved the classes around - didn't change the names. That's what the convention has been since #173
ModernBERT should be pretty much a drop-in replacement for BERT if you use AutoTokenizer, AutoModel, and AutoConfig to pull the relevant pretrained_model_name_or_path = "answerdotai/ModernBERT-base". There are some differences in positional embedding representations etc., but it shouldn't need the two extra modules and 200 or so extra lines.
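A minimal sketch of the Auto* loading pattern the comment describes (the checkpoint name comes from the comment; load_backbone is a hypothetical helper, not MedCAT code, and a transformers release with ModernBERT support is assumed):

```python
# Sketch: pull a ModernBERT backbone via the Auto* classes instead of
# a model-specific module. Assumes transformers with ModernBERT support.
MODEL_NAME = "answerdotai/ModernBERT-base"  # checkpoint named in the comment


def load_backbone(name: str = MODEL_NAME):
    """Load config, tokenizer, and encoder weights for a checkpoint.

    The same three calls work for BERT, ModernBERT, Llama, etc., which is
    what makes the shared abstraction possible.
    """
    from transformers import AutoConfig, AutoModel, AutoTokenizer

    config = AutoConfig.from_pretrained(name)
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModel.from_pretrained(name, config=config)
    return config, tokenizer, model
```

Because the Auto* classes dispatch on the checkpoint's config, swapping backbones becomes a change of the name string rather than a new module.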
There's quite a bit of stuff that's really similar, but some things aren't exactly the same. When we discussed this with Vlad the other week, the plan was that I'd make some abstraction changes and he'd try to address some of the code duplication.
Just an FYI as well: I should probably have just updated this PR, but the next one is probably more relevant:
Improve abstraction for RelCAT tokenizer and models: