Merged
2 changes: 1 addition & 1 deletion .github/workflows/run-ut-on-pr-py.yml
@@ -17,7 +17,7 @@ env:
PY310_VERSION: 3.10.12
jobs:
pr_run_test:
- runs-on: [self-hosted, Linux]
+ runs-on: [self-hosted, Linux, run]
Copilot AI Jan 19, 2026

This workflow configuration change appears unrelated to the bugfix described in the PR title and description. The PR is specifically about fixing syntax errors in the MBPPPass@kEvaluator class, but this change modifies CI/CD runner configuration. Consider removing this change or creating a separate PR for infrastructure updates.

Suggested change
- runs-on: [self-hosted, Linux, run]
+ runs-on: ubuntu-latest
timeout-minutes: 20
steps:
- name: Checkout code
11 changes: 6 additions & 5 deletions ais_bench/benchmark/datasets/mbpp.py
@@ -225,7 +225,7 @@ def __init__(self, metric: str = 'MBPP') -> None:
DSET_CODES.INVALID_MBPP_METRIC,
f"MBPP evaluator metric must be 'MBPP' or 'MBPPPlus', got '{self.metric}'"
)
- super.__init__()
+ super().__init__()

def score(self, predictions, references):
if len(predictions) != len(references):
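
A note on the `super` fix in the hunk above: `super.__init__()` is valid syntax, but it fails at runtime because `super` without parentheses is the built-in type rather than a proxy for the parent class. The sketch below reproduces the difference with hypothetical stand-in classes (`Base`, `Broken`, `Fixed` and `try_construct` are illustrative names, not part of `mbpp.py`).

```python
class Base:
    def __init__(self):
        self.ready = True


class Broken(Base):
    def __init__(self):
        # `super` without parentheses is the built-in type itself, so this
        # calls an unbound `super.__init__` with no arguments: TypeError.
        super.__init__()


class Fixed(Base):
    def __init__(self):
        # `super()` returns a proxy bound to this instance, so Base.__init__ runs.
        super().__init__()


def try_construct(cls):
    """Return the new instance, or the exception raised while constructing it."""
    try:
        return cls()
    except Exception as exc:
        return exc
```

Here `try_construct(Broken)` returns a `TypeError`, while `try_construct(Fixed)` returns an instance with `ready` set.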
@@ -397,13 +397,13 @@ def _execution(programs, timeout):
exec(programs, exec_globals)
key.append('pass')
except TimeOutException:
- logger.debug(f"Program execution timeout for index {index}")
+ logger.debug(f"Program execution timeout for task_id {task_id}")
key.append('timeout')
except AssertionError as e:
- logger.debug(f"Program assertion failed for index {index}: {e}")
+ logger.debug(f"Program assertion failed for task_id {task_id}: {e}")
key.append('wrong_answer')
except BaseException as e:
- logger.debug(f"Program execution failed for index {index}: {e}")
+ logger.debug(f"Program execution failed for task_id {task_id}: {e}")
key.append('failed')
key.append('failed')

manager = multiprocessing.Manager()
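
One plausible reading of the logging change above, assuming `index` was not bound inside `_execution` (the diff does not show this): an f-string passed to `logger.debug` is evaluated before the call, whatever the log level, so an unbound name raises `NameError` inside the handler. The later `except` clauses of the same `try` do not catch it. The sketch below is a simplified stand-in, not the project's `_execution`. The functions `classify` and `classify_with_unbound_name`, the logger name, and `missing_index` are all hypothetical.

```python
import logging

logger = logging.getLogger("mbpp_sketch")


def classify(program: str, task_id: int) -> str:
    """Run one program and map the outcome to a status string."""
    try:
        exec(program, {})
        return "pass"
    except AssertionError as e:
        # Only names that are in scope (task_id, e) appear in the message.
        logger.debug(f"Program assertion failed for task_id {task_id}: {e}")
        return "wrong_answer"
    except BaseException as e:
        logger.debug(f"Program execution failed for task_id {task_id}: {e}")
        return "failed"


def classify_with_unbound_name(program: str, task_id: int) -> str:
    """Same flow, but the handler references a name that was never defined."""
    try:
        exec(program, {})
        return "pass"
    except AssertionError as e:
        # `missing_index` is deliberately undefined: the f-string is evaluated
        # eagerly, so NameError is raised here, and the later `except` clause
        # of this same `try` does not catch it.
        logger.debug(f"Program assertion failed for index {missing_index}: {e}")
        return "wrong_answer"
    except BaseException:
        return "failed"
```

With this stand-in, a failing assertion is reported as `wrong_answer` by `classify`, but escapes as `NameError` from `classify_with_unbound_name`.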
@@ -428,10 +428,11 @@ class MBPPPassKEvaluator(MBPPEvaluator):
k(Tuple[int]): Choices of Pass@k. Defaults to (1, 10, 100)
Copilot AI Jan 19, 2026

The docstring should be updated to document the new 'metric' parameter that was added to the constructor. Include the parameter type, description, and default value to maintain consistency with documentation standards.

Suggested change
- k(Tuple[int]): Choices of Pass@k. Defaults to (1, 10, 100)
+ k(Tuple[int]): Choices of Pass@k. Defaults to (1, 10, 100).
+ metric (str): Name of the evaluation metric. Defaults to 'MBPP'.
"""

- def __init__(self, k=(1, 10, 100)) -> None:
+ def __init__(self, k=(1, 10, 100), metric: str = 'MBPP') -> None:

medium

With the addition of the metric parameter, the class docstring for MBPPPassKEvaluator is now out of sync. Please update the docstring to include documentation for the new metric parameter to improve code clarity and maintainability.

if not isinstance(k, Sequence):
k = (k, )
self.k = k
+ super().__init__(metric=metric)
Comment on lines +431 to +435
Copilot AI Jan 19, 2026

The new metric parameter lacks test coverage. Consider adding test cases to verify that the metric parameter is correctly passed to the parent MBPPEvaluator class and that both 'MBPP' and 'MBPPPlus' values work correctly with MBPPPassKEvaluator.
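
The coverage requested above could look roughly like the following sketch. It uses hypothetical stand-ins (`ParentSketch`, `PassKSketch`) rather than importing `ais_bench`, models only the constructor logic visible in this diff, and assumes a plain `ValueError` where the real code raises its own exception with `DSET_CODES.INVALID_MBPP_METRIC`.

```python
from collections.abc import Sequence


class ParentSketch:
    """Hypothetical stand-in for MBPPEvaluator; only the metric check is modeled."""

    def __init__(self, metric: str = 'MBPP') -> None:
        self.metric = metric
        if self.metric not in ('MBPP', 'MBPPPlus'):
            # Assumption: the real evaluator raises a project-specific exception.
            raise ValueError(
                f"MBPP evaluator metric must be 'MBPP' or 'MBPPPlus', got '{self.metric}'"
            )


class PassKSketch(ParentSketch):
    """Hypothetical stand-in mirroring the patched MBPPPassKEvaluator constructor."""

    def __init__(self, k=(1, 10, 100), metric: str = 'MBPP') -> None:
        if not isinstance(k, Sequence):
            k = (k, )
        self.k = k
        super().__init__(metric=metric)


def test_metric_is_forwarded():
    for metric in ('MBPP', 'MBPPPlus'):
        assert PassKSketch(metric=metric).metric == metric


def test_defaults_and_scalar_k():
    evaluator = PassKSketch(k=5)
    assert evaluator.metric == 'MBPP'
    assert evaluator.k == (5, )


def test_invalid_metric_is_rejected():
    try:
        PassKSketch(metric='HumanEval')
    except ValueError:
        return
    raise AssertionError("an unsupported metric should be rejected")
```

In the real test suite the stand-ins would be replaced by the actual evaluator classes, and the expected exception type adjusted to whatever the project raises.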

@staticmethod
def estimate_pass_at_k(
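
The diff cuts off at the `estimate_pass_at_k` signature, so its body is not shown here. For orientation only, the commonly used unbiased pass@k estimate for `n` samples with `c` correct ones is `1 - C(n - c, k) / C(n, k)`. The sketch below implements that formula under a hypothetical name (`pass_at_k`) and is not claimed to match the project's implementation.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for n samples of which c are correct."""
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    if n - c < k:
        # Fewer than k incorrect samples: every k-subset contains a correct one.
        return 1.0
    # Probability that a random k-subset contains no correct sample, complemented.
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For `k = 1` the estimate reduces to the plain fraction of correct samples, `c / n`.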