This repository uses safety-tooling as a submodule and showcases how to use the LLM API, experiment utils, prompt utils and environment setup. The repo is set up to have core code in `examples/` and lightweight scripts that call that code in `experiments/<exp_name>`. We recommend forking this repository to get started with your project since it already has the pre-commit hooks and submodule set up. Even if you think you won't need any of this tooling, we suggest forking anyway since it will be easy for you to add your own tooling to safety-tooling in the future if you would like.
- First pull the submodules:

  ```bash
  git submodule update --init --recursive
  ```
- Follow the instructions in safety-tooling/README.md to set up the environment. Make sure to create a `.env` file with your API keys. This `.env` file can be in the root of the repo or in the safety-tooling submodule (`safetytooling.utils.setup_environment()` will check both places; see the usage sketch below).
- Install the requirements:

  ```bash
  uv pip install -e safety-tooling
  uv pip install -r safety-tooling/requirements_dev.txt
  uv pip install -r requirements.txt  # once you have added any other dependencies
  ```

To run tests, cd into the safety-tooling directory and run:
```bash
python -m pytest -n 6
```

The submodule must be in your pythonpath for you to be able to use it. There are a few ways to do this:
- Recommended: Install the submodule with `uv pip install -e safety-tooling`.
- Add the submodule to your pythonpath in the `<main_module>/__init__.py` of your main module (e.g. see `examples/__init__.py`). You then have to call your code like `python -m <main_module>.<other_module>`.
- Add the submodule to your pythonpath manually. E.g. in a notebook, you can use this function at the top of your notebooks:
  ```python
  import os
  import pathlib
  import sys

  def put_submodule_in_python_path(submodule_name: str):
      repo_root = pathlib.Path(os.getcwd())
      submodule_path = repo_root / submodule_name
      if submodule_path.exists():
          sys.path.append(str(submodule_path))

  put_submodule_in_python_path("safety-tooling")
  ```
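Once the `.env` file is in place and the submodule is importable, a script or notebook typically just loads the keys before making any API calls. A minimal sketch, assuming the helper is exposed at the path referenced above:

```python
# Load API keys from .env before making any API calls.
# Assumption: the helper lives at safetytooling.utils.setup_environment, as
# referenced above; it checks both the repo root and the submodule for a .env file.
import safetytooling.utils

safetytooling.utils.setup_environment()
```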
The repo has 4 examples:
- Getting responses from an LLM to harmful direct requests and classifying them with another LLM. This involves chaining two scripts together: `experiments/examples/241223_running_examples/run_get_responses.sh`
    - Stage 1: Gets LLM responses with `examples.inference.get_responses`.
        - This adds a new column to the input jsonl (`experiments/examples/241223_running_examples/direct_request.jsonl`) and saves it as a new output file. This workflow is nice because you get to keep all metadata along with the model output.
    - Stage 2: Gets the classifier output using the HarmBench grader with `examples.inference.run_classifier`.
        - This also adds another column for the classifier decision and outputs a new jsonl.
    - Once finished you can now analyse the results in a notebook quickly with `pandas`, as in the sketch below.
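  For example, once both stages have run you can load the output jsonl with `pandas`. A minimal sketch (the output filename and column name here are illustrative placeholders; check the files the scripts actually write):

  ```python
  import pandas as pd

  # Hypothetical output path and column name: inspect what run_get_responses.sh /
  # run_classifier actually produced before relying on these.
  df = pd.read_json(
      "experiments/examples/241223_running_examples/direct_request_classified.jsonl",
      lines=True,
  )
  print(df.columns.tolist())                       # original metadata + response + classifier columns
  print(df["classifier_decision"].value_counts())  # how many responses were flagged as harmful
  ```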
- Best of N jailbreaking: `experiments/examples/241223_running_examples/run_bon_jailbreaking.sh`
    - Repeated sampling of new augmented harmful queries until one jailbreaks the model.
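  Conceptually, the loop looks something like this (an illustrative sketch, not the repo's implementation; the callables are hypothetical placeholders):

  ```python
  def best_of_n(query: str, n: int, augment, get_response, is_jailbroken) -> str | None:
      """Sample up to n augmented variants of `query`; return the first that jailbreaks the model."""
      for _ in range(n):
          candidate = augment(query)          # e.g. random capitalisation / character noise
          response = get_response(candidate)  # query the target model
          if is_jailbroken(response):         # e.g. a HarmBench-style grader
              return candidate
      return None
  ```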
- PAIR attack: `experiments/examples/241223_running_examples/run_pair.sh`
    - Use an attacker LLM to adapt and rewrite harmful prompts until it jailbreaks the model.
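  The high-level loop is roughly the following (an illustrative sketch, not the repo's implementation; `attacker`, `target` and `judge` are hypothetical callables):

  ```python
  def pair_attack(goal: str, n_rounds: int, attacker, target, judge) -> str | None:
      """Attacker LLM iteratively rewrites the prompt, using the target's last response as feedback."""
      prompt, feedback = goal, ""
      for _ in range(n_rounds):
          prompt = attacker(goal=goal, previous_prompt=prompt, feedback=feedback)
          response = target(prompt)
          if judge(goal, response):  # judge decides whether the response fulfils the harmful goal
              return prompt
          feedback = response
      return None
  ```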
- Multiple-choice benchmarks: `experiments/examples/241223_running_examples/run_capability_evals.sh`
    - Run benchmarks like MMLU and find the accuracy.
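  Scoring boils down to comparing the model's chosen option against the answer key (an illustrative sketch; the repo's eval code may structure this differently):

  ```python
  def accuracy(predictions: list[str], answers: list[str]) -> float:
      """Fraction of questions where the predicted letter (e.g. "A"-"D") matches the answer key."""
      assert len(predictions) == len(answers)
      correct = sum(p.strip().upper() == a.strip().upper() for p, a in zip(predictions, answers))
      return correct / len(answers)

  print(accuracy(["A", "C", "B"], ["A", "B", "B"]))  # prints 0.666... (2 of 3 correct)
  ```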
If you use this repo in your work, please cite it as follows:
```bibtex
@misc{safety_examples_2025,
  author    = {John Hughes and safety-research},
  title     = {safety-research/safety-examples: v1.0.0},
  year      = {2025},
  publisher = {Zenodo},
  version   = {v1.0.0},
  doi       = {10.5281/zenodo.15363605},
  url       = {https://doi.org/10.5281/zenodo.15363605}
}
```