RAM-efficient Retrieval #23


Merged 7 commits into main from lowmem_retrieval on Feb 19, 2025

Conversation

yasamanparhizkar
Collaborator

PR Type

Feature

Short Description

Reduces RAM usage of the retrieval pipeline by:

  • Creating large matrices in mmlearn/modules/metrics/retrieval_recall.py in batches instead of all at once.
  • Using a single MetricCollection in mmlearn/tasks/zero_shot_retrieval.py for all modality pairs, avoiding duplicate tensor creation.
  • Adding a progress bar during the calculation of retrieval recall@k, which is helpful because this calculation takes a considerable amount of time.
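The batching idea behind the first bullet can be sketched in miniature. This is a minimal NumPy illustration with hypothetical names (the actual implementation in retrieval_recall.py operates on torch tensors): instead of building the full (num_queries × num_targets) similarity matrix at once, only one batch of rows is materialized at a time.

```python
import numpy as np

def recall_at_k_batched(query_emb, target_emb, positive_idx, k=10, batch_size=256):
    """Compute recall@k without materializing the full
    (num_queries x num_targets) similarity matrix at once.

    Hypothetical sketch; not the actual mmlearn API.
    """
    n = query_emb.shape[0]
    hits = 0
    for start in range(0, n, batch_size):
        stop = min(start + batch_size, n)
        # Similarity scores for this batch of queries only: (batch, num_targets).
        sims = query_emb[start:stop] @ target_emb.T
        # Unordered indices of the k largest similarities per row.
        topk = np.argpartition(-sims, kth=k - 1, axis=1)[:, :k]
        # A query counts as a hit if its positive target appears in its top-k.
        hits += np.any(topk == positive_idx[start:stop, None], axis=1).sum()
    return hits / n
```

Peak memory for the similarity matrix drops from O(num_queries × num_targets) to O(batch_size × num_targets), at the cost of a Python-level loop.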

TODOs:

  • This RAM efficiency might come at the cost of longer runtimes; more investigation is needed to determine whether that is actually the case.
  • This code directly changes functions in retrieval_recall.py and zero_shot_retrieval.py without providing an option to use the previous implementation. That option still needs to be added.

Tests Added

You can run retrieval by:

mmlearn_run 'hydra.searchpath=[pkg://projects.med_benchmarking.configs]' \
  +experiment=baseline \
  experiment_name=test_eval \
  job_type=eval \
  [email protected]=ROCO \
  datasets.test.split=test \
  +datasets/[email protected]_fn.batch_processors.text=HFCLIPTokenizer \
  +datasets/[email protected]=med_clip_vision_transform \
  datasets.test.transform.job_type=eval \
  dataloader.test.batch_size=32 \
  dataloader.test.num_workers=4

However, W&B does not log RAM usage during the recall@k calculation. You need to manually add logging lines and compare the RAM usage of this implementation with that of the previous one.
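One low-dependency way to add such logging lines is to record the process's peak resident set size before and after the recall@k call. The sketch below uses only the standard-library resource module (Unix-only); the function name is hypothetical, and the value could just as easily be passed to a W&B logger instead of printed.

```python
import resource
import sys

def log_peak_ram(tag: str) -> float:
    """Print and return the peak resident set size of this process in MiB.

    Hypothetical helper; ru_maxrss is reported in KiB on Linux
    and in bytes on macOS.
    """
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    peak_mib = peak / (1024 * 1024) if sys.platform == "darwin" else peak / 1024
    print(f"[{tag}] peak RSS: {peak_mib:.1f} MiB")
    return peak_mib
```

Calling `log_peak_ram("before recall@k")` and `log_peak_ram("after recall@k")` around the metric computation gives a rough, dependency-free comparison between the two implementations.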

@yasamanparhizkar yasamanparhizkar marked this pull request as draft October 5, 2024 01:50
@yasamanparhizkar yasamanparhizkar self-assigned this Oct 5, 2024
yasamanparhizkar and others added 3 commits October 4, 2024 21:54
- Added concurrent processing for batch computations using ThreadPoolExecutor.
- Updated documentation.
- Indexed `positive_pairs` directly instead of creating a one-hot tensor.
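The memory win from the last commit can be illustrated in miniature: gathering each query's positive score by direct fancy indexing gives the same result as masking with a one-hot tensor, without ever allocating the (num_queries × num_targets) mask. A NumPy sketch with made-up shapes:

```python
import numpy as np

rng = np.random.default_rng(0)
num_queries, num_targets = 4, 6
positive_idx = np.array([0, 2, 2, 5])          # positive target per query
scores = rng.random((num_queries, num_targets))

# One-hot approach: allocates a full (num_queries, num_targets) mask.
one_hot = np.zeros((num_queries, num_targets), dtype=bool)
one_hot[np.arange(num_queries), positive_idx] = True
pos_scores_masked = scores[one_hot]

# Direct indexing: same result, no mask allocation at all.
pos_scores_direct = scores[np.arange(num_queries), positive_idx]

assert np.array_equal(pos_scores_masked, pos_scores_direct)
```

At real retrieval scales (tens of thousands of queries and targets) the avoided mask is the dominant saving.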
@fcogidi fcogidi marked this pull request as ready for review February 13, 2025 21:43
@fcogidi fcogidi self-assigned this Feb 13, 2025
@fcogidi fcogidi merged commit 96c0b8a into main Feb 19, 2025
6 checks passed
@fcogidi fcogidi deleted the lowmem_retrieval branch February 19, 2025 21:42