Report Generation Agent: Adding Trajectory Evaluation #27

Merged: lotif merged 8 commits into main from marcelo/trajectory-eval on Feb 2, 2026

Collaborator

@lotif lotif commented Feb 2, 2026

Summary

Adds an agent for trajectory evaluation, plus some minor refactors.

For an example of what the evals look like, see:
https://us.cloud.langfuse.com/project/cmkwsswke005dad07gxujnipq/datasets/cml5fp1y305luad06kocwn33q/runs/4a3bd97b-99ed-41ee-ad9e-01ec6d3f179e

Clickup Ticket(s): NA

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✨ New feature (non-breaking change that adds functionality)
  • 💥 Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • 📝 Documentation update
  • 🔧 Refactoring (no functional changes)
  • ⚡ Performance improvement
  • 🧪 Test improvements
  • 🔒 Security fix

Changes Made

  • Moved all prompt templates to a prompts.py
  • Added trajectory info to the ground truth dataset
  • Added the trajectory evaluator to the evaluate.py script
  • Updated README.md
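
For readers new to trajectory evaluation, here is a minimal sketch of the idea: compare the sequence of tool calls the agent made against the expected sequence stored with each ground-truth item. This is not the evaluator added in this PR, which is agent-based; it is a deterministic stand-in that uses a longest-common-subsequence match. The function name, the result type, and the tool names are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrajectoryScore:
    """Result of comparing an agent's tool calls with the expected ones."""

    in_order_matches: int  # expected steps found, in order, in the actual run
    expected_steps: int  # number of steps in the expected trajectory
    score: float  # in_order_matches / expected_steps (1.0 if nothing expected)


def score_trajectory(expected: list[str], actual: list[str]) -> TrajectoryScore:
    """Score how much of the expected trajectory appears, in order, in the actual one.

    Uses the longest common subsequence, so extra or repeated calls in
    ``actual`` are tolerated while missing or reordered steps lower the score.
    """
    rows, cols = len(expected), len(actual)
    # lcs[i][j] holds the LCS length of expected[:i] and actual[:j].
    lcs = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            if expected[i - 1] == actual[j - 1]:
                lcs[i][j] = lcs[i - 1][j - 1] + 1
            else:
                lcs[i][j] = max(lcs[i - 1][j], lcs[i][j - 1])
    matches = lcs[rows][cols]
    score = matches / rows if rows else 1.0
    return TrajectoryScore(matches, rows, score)


if __name__ == "__main__":
    # Hypothetical tool names, for illustration only.
    expected = ["get_schema", "run_sql_query", "write_xlsx_report"]
    actual = ["get_schema", "run_sql_query", "run_sql_query", "write_xlsx_report"]
    print(score_trajectory(expected, actual))
```

A subsequence match is only one possible design; a stricter check could require exact equality of the two lists, and an LLM-based judge can additionally weigh whether a deviation was reasonable.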

Testing

  • Tests pass locally (uv run pytest tests/)
  • Type checking passes (uv run mypy <src_dir>)
  • Linting passes (uv run ruff check <src_dir>/)
  • Manual testing performed (describe below)

Manual testing details:
Ran the data import and evaluation scripts.

Checklist

  • Code follows the project's style guidelines
  • Self-review of code completed
  • Documentation updated (if applicable)
  • No sensitive information (API keys, credentials) exposed

lotif added 8 commits January 29, 2026 17:15
commit b4e124d
Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai>
Date:   Thu Jan 29 17:00:58 2026 -0500

    Small fixes, additional logging and updated ground truth

commit 7d59004
Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai>
Date:   Wed Jan 28 16:30:23 2026 -0500

    Upgrading python-multipart + small improvements

commit 2906b36
Merge: 285591b bba7326
Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai>
Date:   Wed Jan 28 16:12:03 2026 -0500

    Merge branch 'main' into marcelo/langfuse-integration

commit 285591b
Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai>
Date:   Wed Jan 28 16:09:16 2026 -0500

    Adding readme instructions

commit 37348c0
Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai>
Date:   Wed Jan 28 15:53:47 2026 -0500

    Minor improvements

commit 9fdc71d
Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai>
Date:   Wed Jan 28 15:46:25 2026 -0500

    Adding evaluator and retry mechanism

commit 5af7152
Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai>
Date:   Wed Jan 28 14:41:36 2026 -0500

    Using langfuse to upload a dataset and run the evaluation

commit c1980fe
Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai>
Date:   Wed Jan 28 12:39:33 2026 -0500

    Adding the eval dataset and making changes to the eval script. Adding tenacity for retrying mechanism

commit 02c3ac5
Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai>
Date:   Tue Jan 27 16:57:37 2026 -0500

    Added code comments

commit da9b0c9
Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai>
Date:   Tue Jan 27 16:50:06 2026 -0500

    Finished using LLMs to evaluate result

commit f0af403
Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai>
Date:   Tue Jan 27 13:51:06 2026 -0500

    Moving forward with the evaluation script + some more refactorings

commit 93ee157
Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai>
Date:   Tue Jan 27 11:36:52 2026 -0500

    Reporting to langfuse and removed clutter

commit d029285
Merge: a39ac1d 9549395
Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai>
Date:   Tue Jan 27 11:00:28 2026 -0500

    Merge branch 'main' into marcelo/langfuse-integration

commit a39ac1d
Merge: cdf0647 efd80cb
Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai>
Date:   Mon Jan 26 17:09:09 2026 -0500

    Merge branch 'marcelo/report-agent' into marcelo/langfuse-integration

commit efd80cb
Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai>
Date:   Mon Jan 26 17:03:37 2026 -0500

    CR by Franklin

commit 7a2a57f
Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai>
Date:   Mon Jan 26 16:31:49 2026 -0500

    CR by Franklin

commit cdf0647
Merge: 53d0589 534f8e5
Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai>
Date:   Mon Jan 26 16:25:19 2026 -0500

    Merge branch 'marcelo/report-agent' into marcelo/langfuse-integration

commit 534f8e5
Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai>
Date:   Mon Jan 26 16:19:30 2026 -0500

    CR by Franklin

commit 53d0589
Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai>
Date:   Mon Jan 26 16:07:33 2026 -0500

    Some more langfuse things

commit 40dfc6f
Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai>
Date:   Mon Jan 26 13:42:41 2026 -0500

    Parsing client responses into langfuse traces

commit 20e4ec5
Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai>
Date:   Mon Jan 26 11:42:38 2026 -0500

    Small refactor

commit ee8b854
Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai>
Date:   Mon Jan 26 11:36:14 2026 -0500

    Moving env and logging config to the top of the file

commit 66a4494
Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai>
Date:   Mon Jan 26 11:13:05 2026 -0500

    CR by Amrit

commit f9d7862
Merge: dc02ff2 9042ace
Author: Marcelo Lotif <lotif@users.noreply.github.com>
Date:   Mon Jan 26 11:12:42 2026 -0500

    Merge branch 'main' into marcelo/report-agent

commit dc02ff2
Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai>
Date:   Fri Jan 23 12:56:56 2026 -0500

    Grammar fixes

commit 530360e
Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai>
Date:   Fri Jan 23 12:42:50 2026 -0500

    Adding a couple more vulnerabilities to the skip list

commit 7bb081f
Merge: 6e3c4c2 bd34ef0
Author: Marcelo Lotif <lotif@users.noreply.github.com>
Date:   Fri Jan 23 12:37:19 2026 -0500

    Merge branch 'main' into marcelo/report-agent

commit 6e3c4c2
Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai>
Date:   Fri Jan 23 12:35:08 2026 -0500

    One more readme paragraph

commit 37b4000
Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai>
Date:   Fri Jan 23 12:27:23 2026 -0500

    Moving files around, adding the ddl file and the import script

commit 3458565
Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai>
Date:   Thu Jan 22 14:39:47 2026 -0500

    Generating xlsx reports

commit 22fc569
Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai>
Date:   Thu Jan 22 12:28:40 2026 -0500

    Adding more report examples

commit 6592a1c
Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai>
Date:   Thu Jan 22 11:55:41 2026 -0500

    Deleting weaviate stuff, using Online Retail dataset instead

commit 0098f7d
Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai>
Date:   Thu Jan 22 11:37:51 2026 -0500

    Weaviate local and remote scripts

commit 9e6ce2e
Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai>
Date:   Wed Jan 21 11:47:00 2026 -0500

    Adding data import for the online retail dataset and some more instructions

commit a77a60f
Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai>
Date:   Fri Jan 16 17:56:15 2026 -0500

    WIP trying to make it work

commit 4507d52
Merge: b4e124d 412298a
Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai>
Date:   Fri Jan 30 16:58:09 2026 -0500

    Merge branch 'main' into marcelo/langfuse-integration

commit 412298a
Author: Amrit Krishnan <amrit110@gmail.com>
Date:   Fri Jan 30 13:29:20 2026 -0500

    Feature/knowledge agent (#18)

    * Add initial working implementation using search grounding

    * [pre-commit.ci] Add auto fixes from pre-commit.com hooks

    for more information, see https://pre-commit.ci

    * Remove example implementation

    * Fix GHSA-wp53-j4wj-2cfg, pin python-multipart version

    * Update agent to ReAct, fix grounding tool

    * Update README.md

    * Add tracing to langfuse

    * Clear notebook cells

    * Remove python-multipart as direct dependency and only update it

    * Remove D103 and E402 from being ignored in pre-commit check and fix notebooks

    * Move imports to top of the file

    * Simplify tracing module to just read directly from env variables

    * Rename async client manager for agent, reuse existing async client manager for tracing

    * Clarify optional dataset variable in docstring

    * Fix format_response_with_citations

    * Return results instead of modifying input params

    * Use pydantic native desc docstring instead of numpy style

    * Unify config to use same across agents

    * Use ADK's session management, remove custom implementation

    * Remove weaviate from client manager

    ---------

    Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

Member

@amrit110 amrit110 left a comment

Looks good to me. Haven't tried it but the code looks fine.

@lotif lotif merged commit fcf41a9 into main Feb 2, 2026
3 checks passed
@lotif lotif deleted the marcelo/trajectory-eval branch February 2, 2026 21:13