Relation extraction llama abstraction #528

mart-r · 2025-04-02T11:21:10Z

Improve abstraction for RelCAT tokenizer and models:

* Rely only on abstractions within rel_cat
* Move implementations to new packages and modules for each type
* Import implementations dynamically to avoid loading them unless necessary
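A minimal sketch of the dynamic-import idea from the list above, not the code in this PR. The registry name `_IMPLEMENTATIONS`, the function `load_implementation`, and the registry keys are hypothetical, and standard-library classes stand in for the per-model packages so that the example runs without MedCAT installed.

```python
import importlib
from typing import Dict, Tuple

# Hypothetical registry: model type -> (module path, class name).
# Standard-library stand-ins keep this sketch runnable on its own; in a real
# setup the entries would point at the per-model implementation modules.
_IMPLEMENTATIONS: Dict[str, Tuple[str, str]] = {
    "ordered": ("collections", "OrderedDict"),
    "fraction": ("fractions", "Fraction"),
}


def load_implementation(model_type: str) -> type:
    """Import and return the class for model_type only when it is requested."""
    try:
        module_path, class_name = _IMPLEMENTATIONS[model_type]
    except KeyError:
        raise ValueError(f"Unknown model type: {model_type!r}") from None
    # The import happens here, at call time, rather than at module import time.
    module = importlib.import_module(module_path)
    return getattr(module, class_name)
```

With this shape, the calling code depends only on the registry and the shared base abstraction, and a heavy implementation module is imported only for the model type that is actually requested.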

* CU-8698ek477: Add TODO to MetaCAT ML utils regarding AdamW import
* CU-8698ek477: Fix AdamW import (trf->torch)

…ut a vector (#524)
* CU-8698f8fgc: Add new test to check that the negative sampling indices do not include non-vectored indices
* CU-8698f8fgc: Add fix for negative sampling including indices for words without a vector
* CU-8698f8fgc: Update tests to make sure index frequencies are respected
* CU-8698f8fgc: Add 3.9-friendly counter totalling method

… a model save (#525)

* CU-8698gqumv: Add tests for Vocab upon regression testing
* CU-8698gqumv: Fix regression time vocab data

* CU-86983ruw9: Fix train-test splitter leaving train set empty for smaller datasets
* CU-86983ruw9: Add additional optional arguments to test-train splitting for minimum concept count and maximum test fraction
* CU-86983ruw9: Add a few tests for test-train splitting

* CU-8698hfkch: Add eval method to deid model
* CU-8698hfkch: lint checks
---------
Co-authored-by: Tom Searle <[email protected]>

mart-r · 2025-04-02T12:18:25Z

Ran it through GHA workflow on my fork as well, just in case:
mart-r#52

tomolopolis

Is it possible to merge the BERT and ModernBERT model implementations?

tomolopolis · 2025-04-09T16:18:33Z

medcat/utils/relation_extraction/llama/model.py

+from medcat.utils.relation_extraction.models import Base_RelationExtraction
+
+
+class LlamaModel_RelationExtraction(Base_RelationExtraction):


Why the underscore in the name? Just BaseRelationExtraction and LlamaRelationExtraction would look fine.

mart-r · Apr 10, 2025

I just moved the classes around; I didn't change the names. That has been the convention since #173.

tomolopolis · 2025-04-09T16:38:57Z

medcat/utils/relation_extraction/modernbert/model.py

ModernBERT should be pretty much a drop-in replacement for BERT if AutoTokenizer, AutoModel, and AutoConfig are used to pull the relevant pretrained_model_name_or_path = "answerdotai/ModernBERT-base"?

There are some differences in positional embedding representations etc., but that shouldn't need the 2 modules and 200 or so extra lines.

mart-r · Apr 10, 2025

There's quite a bit of stuff that's really similar, but some things aren't exactly the same.
But what we agreed when we discussed this with Vlad the other week was that I'd make some abstraction changes and he'd try to address some of the code duplication.
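For reference, a minimal sketch of the drop-in approach raised in this thread, assuming a transformers release that includes ModernBERT. `EncoderSpec` and `load_encoder` are hypothetical names that do not exist in MedCAT, and `bert-base-uncased` is an assumed example checkpoint; only `answerdotai/ModernBERT-base` comes from the comment above. The sketch covers loading only and does not handle the positional-embedding differences mentioned.

```python
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass(frozen=True)
class EncoderSpec:
    """Hypothetical container naming which pretrained encoder to load."""
    pretrained_model_name_or_path: str


def load_encoder(spec: EncoderSpec) -> Tuple[Any, Any, Any]:
    """Load config, tokenizer and model through the transformers Auto* classes."""
    # Imported lazily so that this module can be imported without transformers.
    from transformers import AutoConfig, AutoModel, AutoTokenizer

    name = spec.pretrained_model_name_or_path
    config = AutoConfig.from_pretrained(name)
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModel.from_pretrained(name, config=config)
    return config, tokenizer, model


# Both encoders go through the same code path; only the checkpoint name differs.
BERT = EncoderSpec("bert-base-uncased")  # assumed example checkpoint
MODERN_BERT = EncoderSpec("answerdotai/ModernBERT-base")
```

Calling `load_encoder(MODERN_BERT)` downloads the checkpoint on first use, so it needs network access or a populated local cache.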

mart-r · 2025-04-10T09:20:59Z

Just an FYI as well, I should have probably just updated this PR, but the next one is probably more relevant:
#530

mart-r and others added 15 commits March 24, 2025 11:36

CU-8698ek477: Fix AdamW import from tranformers to torch (#523)

c3d4254

* CU-8698ek477: Add TODO to MetaCAT ML utils regarding AdamW import
* CU-8698ek477: Fix AdamW import (trf->torch)

CU-8698gkrqa: Add argument to allow specifying the changes warrenting…

291ac3a

… a model save (#525)

CU-8698gqumv: Fix regression test vocab vector sizes (#526)

47e37ea

* CU-8698gqumv: Add tests for Vocab upon regression testing
* CU-8698gqumv: Fix regression time vocab data

CU-8698hfkch: Add eval method to deid model (#527)

602b4e7

* CU-8698hfkch: Add eval method to deid model
* CU-8698hfkch: lint checks
---------
Co-authored-by: Tom Searle <[email protected]>

Merge branch 'master' into relation_extraction_llama-MR-abstraction

a491e14

Update RelCAT stuff for improved abstraction

08975c0

Move separate model implementations to separate packages

2b0cfb5

Some minor abstraction changes

a399cd2

Remove accidentally copied abstract method decorator

afb4f74

Fix import in test

8540a3e

Fix RelCAT impport in pipe tests

4eed873

Merge branch 'master' into relation_extraction_llama-MR-abstraction

6db55a8

Update base relcat model implementation to include config

255334d

tomolopolis suggested changes Apr 9, 2025


vladd-bit merged commit 255334d into relation_extraction_llama Apr 16, 2025
