
MedHELM V1 #3403

Merged
merged 219 commits into from
Mar 19, 2025

MiguelAFH
Collaborator

In this PR, we add the 31 scenarios that are part of the first release of MedHELM, along with the model deployments used to run all benchmarks. Changes checklist:

  • 31 scenarios under src/helm/benchmark/scenarios
  • 11 model deployments added
  • 31 run specs added to the new file medhelm_run_specs.py

MiguelAFH and others added 30 commits November 23, 2024 03:23
…ine, change bertscore backbone model to fit on 40GB GPU
Comment on lines +33 to +34
GITHUB_DIR_URL = "https://github.com/raulista1997/benchmarkdata/tree/main/mtsamples_processed"
RAW_BASE_URL = "https://raw.githubusercontent.com/raulista1997/benchmarkdata/refs/heads/main/mtsamples_processed/"
Collaborator

Pin these URLs to a specific Git commit hash instead of main.
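A minimal sketch of what pinning could look like, assuming the data stays in raulista1997/benchmarkdata. DATA_COMMIT is a hypothetical placeholder, not a real commit of that repository, and raw_file_url is an illustrative helper name. Both github.com tree URLs and raw.githubusercontent.com URLs accept a commit SHA where the branch name would otherwise go.

```python
# Hypothetical placeholder: replace with a real commit SHA from raulista1997/benchmarkdata.
DATA_COMMIT = "0123456789abcdef0123456789abcdef01234567"

# Pinned to a fixed commit instead of the moving "main" branch.
GITHUB_DIR_URL = f"https://github.com/raulista1997/benchmarkdata/tree/{DATA_COMMIT}/mtsamples_processed"
RAW_BASE_URL = f"https://raw.githubusercontent.com/raulista1997/benchmarkdata/{DATA_COMMIT}/mtsamples_processed/"


def raw_file_url(filename: str) -> str:
    """Return the pinned raw-content URL for one file in mtsamples_processed (illustrative helper)."""
    return RAW_BASE_URL + filename
```

Pinning this way keeps the scenario's inputs fixed even if the upstream branch is later rewritten or updated.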

Comment on lines +49 to +56
soup = BeautifulSoup(response.text, "html.parser")
file_links = [
    link.text
    for link in soup.find_all(
        "a", {"href": re.compile(r"/raulista1997/benchmarkdata/blob/main/mtsamples_processed/.*\.txt$")}
    )
]
return file_links
Collaborator

Use the GitHub API instead of scraping the HTML page, and pin the commit hash (same comments as for mtsamples_procedures_scenario).
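One way to follow this suggestion, sketched under assumptions: list the directory through the GitHub REST API's repository contents endpoint (GET /repos/{owner}/{repo}/contents/{path}, with a ref query parameter) rather than parsing HTML with BeautifulSoup. DATA_COMMIT is again a hypothetical placeholder, and the helper names are illustrative. The filtering step is kept separate from the network call so it can be checked offline.

```python
import json
import urllib.request
from typing import Any, Dict, List

OWNER = "raulista1997"
REPO = "benchmarkdata"
DATA_DIR = "mtsamples_processed"
# Hypothetical placeholder: replace with a real commit SHA to pin the listing.
DATA_COMMIT = "0123456789abcdef0123456789abcdef01234567"


def contents_api_url(path: str, ref: str) -> str:
    """Build the REST URL that lists a repository directory at a specific ref."""
    return f"https://api.github.com/repos/{OWNER}/{REPO}/contents/{path}?ref={ref}"


def select_txt_files(entries: List[Dict[str, Any]]) -> List[str]:
    """Keep only regular .txt files from a contents-API directory listing."""
    return [
        e["name"]
        for e in entries
        if e.get("type") == "file" and e.get("name", "").endswith(".txt")
    ]


def list_txt_files(ref: str = DATA_COMMIT) -> List[str]:
    """Fetch the directory listing at `ref` and return the .txt file names."""
    request = urllib.request.Request(
        contents_api_url(DATA_DIR, ref),
        headers={"Accept": "application/vnd.github+json"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        entries = json.load(response)
    return select_txt_files(entries)
```

The contents endpoint caps directory listings at 1,000 entries; a larger directory would need the Git Trees API instead.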

@@ -1958,6 +1976,15 @@ models:
num_parameters: 14000000000
release_date: 2024-05-21
tags: [TEXT_MODEL_TAG, LIMITED_FUNCTIONALITY_TEXT_MODEL_TAG, INSTRUCTION_FOLLOWING_MODEL_TAG]

- name: microsoft/phi-3.5-mini-instruct
Collaborator

microsoft/phi-3.5-mini-instruct already exists; remove.

release_date: 2024-09-25
tags: [TEXT_MODEL_TAG, LIMITED_FUNCTIONALITY_TEXT_MODEL_TAG, INSTRUCTION_FOLLOWING_MODEL_TAG]

- name: meta/llama-3.1-8b-instruct
Collaborator

meta/llama-3.1-8b-instruct already exists; delete.
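A quick check that would catch duplicate entries like these before review, sketched under the assumption that the metadata file parses to a dict with a top-level models list whose entries each carry a name key (as the "models:" hunk headers suggest). find_duplicate_model_names is a hypothetical helper, not part of HELM, and the file path in the usage comment is an assumption.

```python
from collections import Counter
from typing import Any, Dict, List


def find_duplicate_model_names(metadata: Dict[str, Any]) -> List[str]:
    """Return model names that appear more than once under the top-level `models` key."""
    counts = Counter(entry["name"] for entry in metadata.get("models", []))
    return sorted(name for name, count in counts.items() if count > 1)


# Usage with a parsed YAML file (requires PyYAML; path is an assumption):
#   import yaml
#   with open("src/helm/config/model_metadata.yaml") as f:
#       print(find_duplicate_model_names(yaml.safe_load(f)))
```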

@@ -1530,6 +1530,24 @@ models:
release_date: 2022-12-22
tags: [] # TODO: add tags

- name: meta/llama-3.2-1b-instruct
Collaborator

Move this to right before the entry for meta/llama-3.2-3b-instruct-turbo.

Comment on lines 13 to 30
get_instructions,
extract_patient_id_from_fname,
get_ehrs,
get_tokenizer,
tag_rgx_expression,
fetch_nodes_with_tag,
cast_dtype,
check_condition,
check_all_conditions,
remove_node,
query_xml_str,
filter_events,
retrieve_most_relevant_visits,
get_prompt_template,
pack_and_trim_prompts,
preprocess_prompts,
add_reference_responses,
return_dataset_dataframe,
Collaborator

I think you only need to import return_dataset_dataframe.

# get the patient EHR selected for this instruction
pt_id: Union[str, int] = instruction_dict["patient_id"]
relevant_ehr = ehrs[pt_id] # type: ignore
prompt = PassageQuestionInput(passage="", question=question)
Collaborator

Ping.

@staticmethod
def get_date_of_note(patient: Dict[str, Any], note_idx: int) -> str:
    """Get date of note for patient"""
    if not isinstance(note_idx, int):
Collaborator

eval() is insecure. Please either use int() instead or remove this if block.
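A minimal sketch of the int() route, assuming note_idx may arrive as either an int or a numeric string. coerce_note_idx is a hypothetical helper name. Unlike eval(), int() only parses an integer literal and raises ValueError on anything else, so an arbitrary expression in the input is never executed.

```python
from typing import Union


def coerce_note_idx(note_idx: Union[int, str]) -> int:
    """Convert note_idx to int without eval(); non-integer strings raise ValueError."""
    if isinstance(note_idx, int):
        return note_idx
    # int() parses only an integer literal (surrounding whitespace allowed), never code.
    return int(note_idx)
```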

Comment on lines +46 to +53
cursor.execute(ground_truth_sql)
fetched_result = cursor.fetchone()
if fetched_result:
    # Convert extra_values to match SQLite's expected types
    converted_values = [
        type(fetched_result[i])(extra_values[i]) for i in range(len(extra_values))
    ]
    ground_truth_result = converted_values
Collaborator

Does this actually work? If we're in this block, then cursor.fetchall() returned a falsy value or the query failed, so re-running the query should also fail. I'm fine with just using extra_values as is (i.e. the original version).
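A sketch of the simpler behavior suggested here, assuming the ground-truth query runs against a sqlite3 connection and only its first row is compared. get_ground_truth_result is a hypothetical helper, not the scenario's actual code: if the query raises or returns no rows, extra_values is used unchanged instead of re-running the query to infer types.

```python
import sqlite3
from typing import Any, List, Sequence


def get_ground_truth_result(
    conn: sqlite3.Connection, ground_truth_sql: str, extra_values: Sequence[Any]
) -> List[Any]:
    """Return the first row of the ground-truth query, or extra_values as-is on failure."""
    try:
        rows = conn.execute(ground_truth_sql).fetchall()
    except sqlite3.Error:
        rows = []
    if rows:
        return list(rows[0])
    # Query failed or returned nothing: re-running it cannot recover types, so use extra_values unchanged.
    return list(extra_values)
```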

Collaborator

yifanmai left a comment

LGTM. Thank you all!

yifanmai merged commit 87cd4d8 into main on Mar 19, 2025
8 checks passed
yifanmai deleted the med-helm branch on March 19, 2025 01:01

9 participants