Benchmarking for prompting strategies with no Human-in-the-Loop #247
Conversation
benchmarks/prompt-bench/README.md (Outdated)
## Project Folder Structure
This section describes the main folder structure.
```plaintext
llm-cai-project/   # Root directory of the project
```
@cristobalvch I think this path is wrong
**Fully Automated (No HITL):**
The pipeline is designed to be **fully automated, with no Human-in-the-Loop (HITL)**. When the agent attempts to solve the challenge labs, **no human interaction with the model is required**; all decisions, iterations, and actions are executed autonomously according to the experiment’s configuration and the prompt templates.
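To make "no HITL" concrete, here is a minimal sketch of what such a control loop could look like. All names in it (`ExperimentConfig`, `render_prompt`, `query_llm`, `run_lab`) are hypothetical placeholders rather than CAI's actual API; the point is only that every decision is driven by the configuration and prompt templates, never by interactive input:

```python
# Minimal sketch of a fully automated (no-HITL) lab run.
# Every name here is an illustrative placeholder, not CAI's real API.
from dataclasses import dataclass

@dataclass
class ExperimentConfig:
    model: str           # LLM identifier taken from the experiment config
    strategy: str        # prompting-strategy template name
    max_iterations: int  # hard budget so every run terminates on its own

def render_prompt(strategy: str, lab_url: str, history: list) -> str:
    # Placeholder: a real template would encode the chosen strategy.
    return f"[{strategy}] target={lab_url} prior_steps={len(history)}"

def query_llm(model: str, prompt: str) -> str:
    # Placeholder for the actual model call; deterministic stub here.
    return f"{model} proposes an action for: {prompt}"

def run_lab(config: ExperimentConfig, lab_url: str) -> bool:
    """Attempt one challenge lab with zero human input."""
    history: list[str] = []
    for _ in range(config.max_iterations):
        prompt = render_prompt(config.strategy, lab_url, history)
        action = query_llm(config.model, prompt)
        history.append(action)
        if "lab solved" in action.lower():  # success is checked programmatically
            return True
    return False  # budget exhausted: recorded as a failure, no human rescue
```

The hard iteration budget is what lets a run terminate and be scored without a human ever stepping in.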
@cristobalvch could you please add a "results" section here, with images and a summary of the results obtained? That'd be very interesting.
I really liked the implementation! I tested it and it works correctly.
Also, if you could add a table with all the PortSwigger challenges, that would be awesome 🙌 Thank you very much for your collaboration @cristobalvch
I'm on it!
@cristobalvch ping us whenever this is ready for another review, and thanks for the contrib!
yes, thanks for waiting! It's almost complete. I just have to re-run some evaluations to add the results section with better performance metrics :)
Benchmark designed to evaluate a fully automated, no-HITL (Human-in-the-Loop) integration of LLMs (Large Language Models) into web application attack scenarios using CAI (Cybersecurity AI). Its goal is to test various prompting strategies and different LLMs to assess their effectiveness at identifying vulnerabilities in web applications.
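As a rough illustration of how such a benchmark could be driven (a sketch, not the project's actual harness), the snippet below sweeps every model/strategy pair over a set of labs and tabulates success rates; all model, strategy, and lab identifiers are invented placeholders:

```python
# Hypothetical sketch of the benchmark sweep: each (LLM, strategy) pair
# is run against every lab and a success rate is tabulated. Model,
# strategy, and lab names below are invented placeholders.
from itertools import product

MODELS = ["model-a", "model-b"]
STRATEGIES = ["zero-shot", "chain-of-thought"]
LABS = ["lab-sqli-01", "lab-xss-01", "lab-csrf-01"]

def run_lab(model: str, strategy: str, lab: str) -> bool:
    # Stand-in for the automated no-HITL run described above.
    return hash((model, strategy, lab)) % 2 == 0

for model, strategy in product(MODELS, STRATEGIES):
    solved = sum(run_lab(model, strategy, lab) for lab in LABS)
    print(f"{model:10s} {strategy:18s} {solved}/{len(LABS)} labs solved")
```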