Refactor Hint Retrieval #295

patricebechard · 2025-09-10T17:22:29Z

Description by Korbit AI

What change is being made?

Refactor hint retrieval system by introducing HintsSource class and updating the generic agent to incorporate hint retrieval including various methods like direct, LLM, and embeddings, as well as new scripts for generic and hinter agents.

Why are these changes being made?

Previously, hint retrieval functionality was scattered and redundant, making it difficult to manage and extend. This refactor consolidates hint retrieval into a dedicated HintsSource class, providing a cleaner, reusable, and more maintainable approach. It also enhances the capability of hint retrieval using different methods, allowing for flexible experimentation with poor performing hints alleviation.

Is this description stale? Ask me to generate a new description by commenting /korbit-generate-pr-description

…hinter

korbit-ai · 2025-09-10T17:22:34Z

Based on your review schedule, I'll hold off on reviewing this PR until it's marked as ready for review. If you'd like me to take a look now, comment /korbit-review.

Your admin can change your review schedule in the Korbit Console

korbit-ai

Review by Korbit AI

Korbit automatically attempts to detect when you fix issues in new commits.

Category	Issue	Status
	Static parallel job allocation ▹ view
	Invalid LLM Model Reference ▹ view
	In-function NumPy Import ▹ view
	Ignored Parallel Backend Parameter ▹ view
	Unsafe Environment Variable Loading ▹ view
	Unsafe Study Data Loading ▹ view
	Unhandled dictionary key access ▹ view
	Undocumented numeric parameters ▹ view
	Missing Ray Backend Validation ▹ view
	Missing configuration value documentation ▹ view

Files scanned

File Path	Reviewed
experiments/generic/run_generic_agent.sh	✅
experiments/hinter/run_hinter_agent.sh	✅
experiments/generic/run_generic_agent.py	✅
experiments/hinter/run_hinter_agent.py	✅
src/agentlab/utils/hinting.py	✅
src/agentlab/agents/generic_agent_hinter/generic_agent.py	✅
src/agentlab/agents/generic_agent_hinter/generic_agent_prompt.py	✅
src/agentlab/agents/tool_use_agent/tool_use_agent.py	✅

Explore our documentation to understand the languages and file types we support and the files we ignore.

Check out our docs on how you can make Korbit work best for you and your team.

Loving Korbit!? Share us on LinkedIn Reddit and X

korbit-ai · 2025-09-11T19:20:44Z

experiments/generic/run_generic_agent.sh

+PARALLEL_BACKEND="ray"
+
+N_JOBS=5


Static parallel job allocation

Tell me more

What is the issue?

The script sets a fixed number of parallel jobs (N_JOBS=5) without considering the host system's CPU resources.

Why this matters

Without adapting to available CPU cores, this could lead to either underutilization of system resources or resource contention, impacting overall performance.

Suggested change ∙ Feature Preview

Dynamically set N_JOBS based on available CPU cores. Add the following before N_JOBS assignment:

# Use 75% of available CPU cores by default N_JOBS=$(( $(nproc) * 3 / 4 ))

Provide feedback to improve future suggestions

_{💬 Looking for more details? Reply to this comment to chat with Korbit.}

korbit-ai · 2025-09-11T19:20:44Z

experiments/generic/run_generic_agent.sh

+
+BENCHMARK="workarena_l1"
+
+LLM_CONFIG="azure/gpt-5-mini-2025-08-07"


Invalid LLM Model Reference

Tell me more

What is the issue?

The script references a non-existent GPT model with a future date (2025-08-07), which will cause the program to fail.

Why this matters

The program will fail to run as it cannot connect to a model that doesn't exist yet, preventing the experiment from executing.

Suggested change ∙ Feature Preview

Replace with an existing GPT model configuration, for example:

LLM_CONFIG="azure/gpt-4-0613"

Provide feedback to improve future suggestions

_{💬 Looking for more details? Reply to this comment to chat with Korbit.}

korbit-ai · 2025-09-11T19:20:45Z

experiments/generic/run_generic_agent.py

+    import numpy as np
+    rng = np.random.default_rng(42)
+    rng.shuffle(benchmark.env_args_list)


In-function NumPy Import

Tell me more

What is the issue?

NumPy is imported inside the function rather than at module level, causing unnecessary import overhead on each function call.

Why this matters

Importing modules inside functions creates overhead as Python needs to process the import each time the function is called. This is especially important in performance-critical applications or when the function is called frequently.

Suggested change ∙ Feature Preview

Move the NumPy import to the top of the file with other imports:

import numpy as np

Provide feedback to improve future suggestions

_{💬 Looking for more details? Reply to this comment to chat with Korbit.}

korbit-ai · 2025-09-11T19:20:45Z

experiments/generic/run_generic_agent.py

+    parser.add_argument("--parallel-backend", type=str, default="ray")
+    parser.add_argument("--reproducibility-mode", action="store_true")
+
+    args = parser.parse_args()
+
+    # instantiate agent
+    agent_args = [get_base_agent(args.llm_config)]
+    benchmark = DEFAULT_BENCHMARKS[args.benchmark]()
+
+    ##################### Shuffle env args list, pick subset
+    import numpy as np
+    rng = np.random.default_rng(42)
+    rng.shuffle(benchmark.env_args_list)
+    benchmark.env_args_list = benchmark.env_args_list[:33]
+    #####################
+
+    # for env_args in benchmark.env_args_list:
+        # env_args.max_steps = 100
+
+    if args.relaunch:
+        #  relaunch an existing study
+        study = Study.load_most_recent(contains=None)
+        study.find_incomplete(include_errors=True)
+
+    else:
+        study = Study(
+            agent_args,
+            benchmark,
+            logging_level=logging.WARNING,
+            logging_level_stdout=logging.WARNING,
+        )
+
+    study.run(
+        n_jobs=args.n_jobs,
+        parallel_backend="ray",


Ignored Parallel Backend Parameter

Tell me more

What is the issue?

The parallel_backend argument is hardcoded in study.run() despite accepting it as a command-line argument, making the CLI parameter ineffective.

Why this matters

Ignoring the user-specified parallel backend could lead to suboptimal performance if the user has chosen a backend better suited for their specific workload or environment.

Suggested change ∙ Feature Preview

Use the command-line argument in the study.run() call:

study.run( n_jobs=args.n_jobs, parallel_backend=args.parallel_backend, strict_reproducibility=args.reproducibility_mode, n_relaunch=args.n_relaunch, )

Provide feedback to improve future suggestions

_{💬 Looking for more details? Reply to this comment to chat with Korbit.}

korbit-ai · 2025-09-11T19:20:45Z

experiments/generic/run_generic_agent.py

+from dotenv import load_dotenv
+
+load_dotenv()


Unsafe Environment Variable Loading

Tell me more

What is the issue?

Unconditional loading of environment variables without error handling or path specification.

Why this matters

If the .env file is missing or inaccessible, the application will continue without environment variables, potentially exposing sensitive configuration or causing runtime errors if required variables are missing.

Suggested change ∙ Feature Preview

from dotenv import load_dotenv import sys if not load_dotenv(): print("Error: Failed to load .env file") sys.exit(1)

Provide feedback to improve future suggestions

_{💬 Looking for more details? Reply to this comment to chat with Korbit.}

korbit-ai · 2025-09-11T19:20:45Z

experiments/generic/run_generic_agent.py

+
+    if args.relaunch:
+        #  relaunch an existing study
+        study = Study.load_most_recent(contains=None)


Unsafe Study Data Loading

Tell me more

What is the issue?

Loading arbitrary most recent study data without validation or access control.

Why this matters

Without proper access control or validation, the code could load sensitive or malicious study data from the filesystem that was placed there by another user.

Suggested change ∙ Feature Preview

# Add path validation and access control study_path = Study.get_most_recent_path(contains=None) if not is_safe_study_path(study_path): # implement this function to validate path raise SecurityError("Invalid or unauthorized study path") study = Study.load_most_recent(contains=None)

Provide feedback to improve future suggestions

_{💬 Looking for more details? Reply to this comment to chat with Korbit.}

korbit-ai · 2025-09-11T19:20:45Z

experiments/generic/run_generic_agent.py

+
+    # instantiate agent
+    agent_args = [get_base_agent(args.llm_config)]
+    benchmark = DEFAULT_BENCHMARKS[args.benchmark]()


Unhandled dictionary key access

Tell me more

What is the issue?

Dictionary access of DEFAULT_BENCHMARKS with user input is not wrapped in a try-catch block to handle KeyError exceptions.

Why this matters

If an invalid benchmark name is provided, the program will crash with an uncaught KeyError instead of providing a helpful error message.

Suggested change ∙ Feature Preview

try: benchmark = DEFAULT_BENCHMARKS[args.benchmark]() except KeyError: print(f"Error: '{args.benchmark}' is not a valid benchmark. Available benchmarks: {list(DEFAULT_BENCHMARKS.keys())}") exit(1)

Provide feedback to improve future suggestions

_{💬 Looking for more details? Reply to this comment to chat with Korbit.}

korbit-ai · 2025-09-11T19:20:45Z

experiments/generic/run_generic_agent.sh

+N_JOBS=5
+N_RELAUNCH=3


Undocumented numeric parameters

Tell me more

What is the issue?

The numerical configuration values lack explanation of their purpose and constraints.

Why this matters

Without context, it's not clear what these numbers control or what ranges are appropriate.

Suggested change ∙ Feature Preview

# Number of parallel jobs to run (recommended: 1-10) N_JOBS=5 # Number of retry attempts for failed jobs (recommended: 1-5) N_RELAUNCH=3

Provide feedback to improve future suggestions

_{💬 Looking for more details? Reply to this comment to chat with Korbit.}

korbit-ai · 2025-09-11T19:20:45Z

experiments/generic/run_generic_agent.sh

+# PARALLEL_BACKEND="sequential"
+PARALLEL_BACKEND="ray"


Missing Ray Backend Validation

Tell me more

What is the issue?

The script doesn't validate if Ray is properly installed and initialized before using it as the parallel backend.

Why this matters

Without proper Ray initialization checks, the program may fail at runtime if Ray is not available in the environment.

Suggested change ∙ Feature Preview

Add Ray availability check before running the script:

# Check if Ray is available if [ "$PARALLEL_BACKEND" = "ray" ]; then python -c "import ray" > /dev/null 2>&1 || { echo "Error: Ray is not installed"; exit 1; } fi python experiments/generic/run_generic_agent.py \

Provide feedback to improve future suggestions

_{💬 Looking for more details? Reply to this comment to chat with Korbit.}

korbit-ai · 2025-09-11T19:20:45Z

experiments/generic/run_generic_agent.sh

+BENCHMARK="workarena_l1"
+
+LLM_CONFIG="azure/gpt-5-mini-2025-08-07"


Missing configuration value documentation

Tell me more

What is the issue?

The hardcoded values lack comments explaining what they represent and what valid options are available.

Why this matters

Without documentation, future maintainers won't know what other benchmark types or LLM configurations are valid choices.

Suggested change ∙ Feature Preview

# Benchmark type to run (options: workarena_l1, workarena_l2, etc) BENCHMARK="workarena_l1" # LLM configuration path (format: provider/model-name-version) LLM_CONFIG="azure/gpt-5-mini-2025-08-07"

Provide feedback to improve future suggestions

_{💬 Looking for more details? Reply to this comment to chat with Korbit.}

patricebechard added 4 commits September 9, 2025 16:50

(wip) refactor hinting index

c86873b

(wip) clean up prompt file

7e55cd7

add scripts to run generic and hinter agents, update tmlr config for …

66b9692

…hinter

move HintsSource to separate hinting file

d2166b3

patricebechard added 5 commits September 10, 2025 16:06

update hinter agent and prompt

60ad8e4

fix prompt for task hint

4a2c7de

undo changes to tmlr config

eafd5fc

update hinter agent

70d701e

formatting

91119d6

patricebechard requested a review from hnekoeiq September 11, 2025 19:03

patricebechard marked this pull request as ready for review September 11, 2025 19:04

korbit-ai bot reviewed Sep 11, 2025

View reviewed changes

patricebechard added 2 commits September 12, 2025 17:24

bug fix hint retrieval

a3b6ca4

improve launch script

49ebc89

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Refactor Hint Retrieval #295

Refactor Hint Retrieval #295

Uh oh!

patricebechard commented Sep 10, 2025 •

edited by korbit-ai bot

Loading

Uh oh!

korbit-ai bot commented Sep 10, 2025

Uh oh!

korbit-ai bot left a comment •

edited

Loading

Uh oh!

korbit-ai bot Sep 11, 2025

Uh oh!

korbit-ai bot Sep 11, 2025

Uh oh!

korbit-ai bot Sep 11, 2025

Uh oh!

korbit-ai bot Sep 11, 2025

Uh oh!

korbit-ai bot Sep 11, 2025

Uh oh!

korbit-ai bot Sep 11, 2025

Uh oh!

korbit-ai bot Sep 11, 2025

Uh oh!

korbit-ai bot Sep 11, 2025

Uh oh!

korbit-ai bot Sep 11, 2025

Uh oh!

korbit-ai bot Sep 11, 2025

Uh oh!

Uh oh!


		BENCHMARK="workarena_l1"

		LLM_CONFIG="azure/gpt-5-mini-2025-08-07"

		PARALLEL_BACKEND="ray"

		N_JOBS=5

		# PARALLEL_BACKEND="sequential"
		PARALLEL_BACKEND="ray"

Refactor Hint Retrieval #295

Are you sure you want to change the base?

Refactor Hint Retrieval #295

Uh oh!

Conversation

patricebechard commented Sep 10, 2025 • edited by korbit-ai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description by Korbit AI

What change is being made?

Why are these changes being made?

Uh oh!

korbit-ai bot commented Sep 10, 2025

Uh oh!

korbit-ai bot left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Review by Korbit AI

Korbit automatically attempts to detect when you fix issues in new commits.

Uh oh!

korbit-ai bot Sep 11, 2025

Choose a reason for hiding this comment

Static parallel job allocation

What is the issue?

Why this matters

Suggested change ∙ Feature Preview

Provide feedback to improve future suggestions

Uh oh!

korbit-ai bot Sep 11, 2025

Choose a reason for hiding this comment

Invalid LLM Model Reference

What is the issue?

Why this matters

Suggested change ∙ Feature Preview

Provide feedback to improve future suggestions

Uh oh!

korbit-ai bot Sep 11, 2025

Choose a reason for hiding this comment

In-function NumPy Import

What is the issue?

Why this matters

Suggested change ∙ Feature Preview

Provide feedback to improve future suggestions

Uh oh!

korbit-ai bot Sep 11, 2025

Choose a reason for hiding this comment

Ignored Parallel Backend Parameter

What is the issue?

Why this matters

Suggested change ∙ Feature Preview

Provide feedback to improve future suggestions

Uh oh!

korbit-ai bot Sep 11, 2025

Choose a reason for hiding this comment

Unsafe Environment Variable Loading

What is the issue?

Why this matters

Suggested change ∙ Feature Preview

Provide feedback to improve future suggestions

Uh oh!

korbit-ai bot Sep 11, 2025

Choose a reason for hiding this comment

Unsafe Study Data Loading

What is the issue?

Why this matters

Suggested change ∙ Feature Preview

Provide feedback to improve future suggestions

Uh oh!

korbit-ai bot Sep 11, 2025

Choose a reason for hiding this comment

Unhandled dictionary key access

What is the issue?

Why this matters

Suggested change ∙ Feature Preview

Provide feedback to improve future suggestions

Uh oh!

korbit-ai bot Sep 11, 2025

Choose a reason for hiding this comment

Undocumented numeric parameters

What is the issue?

Why this matters

patricebechard commented Sep 10, 2025 •

edited by korbit-ai bot

Loading

korbit-ai bot left a comment •

edited

Loading