TrustEval: A modular and extensible toolkit for comprehensive trust evaluation of generative foundation models (GenFMs)

Video Tutorials

Watch step-by-step tutorials on our YouTube channel.


πŸ“ Overview

TrustEval-toolkit is a dynamic and comprehensive framework for evaluating the trustworthiness of Generative Foundation Models (GenFMs) across dimensions such as safety, fairness, robustness, privacy, and more.


👾 Features

  • Dynamic Dataset Generation: Automatically generate datasets tailored for evaluation tasks.
  • Multi-Model Compatibility: Evaluate LLMs, VLMs, T2I models, and more.
  • Customizable Metrics: Configure workflows with flexible metrics and evaluation methods.
  • Metadata-Driven Pipelines: Design and execute test cases efficiently using metadata.
  • Comprehensive Dimensions: Evaluate models across safety, fairness, robustness, privacy, and truthfulness.
  • Optimized Inference: Faster evaluations through efficient inference pipelines.
  • Detailed Reports: Generate interactive, easy-to-interpret evaluation reports.

🚀 Getting Started

⚙️ Installation

To install the TrustEval-toolkit, follow these steps:

1. Clone the Repository

git clone https://github.com/nauyisu022/TrustEval-toolkit.git
cd TrustEval-toolkit

2. Set Up a Conda Environment

Create and activate a new environment with Python 3.10:

conda create -n trusteval_env python=3.10
conda activate trusteval_env

3. Install Dependencies

Install the package and its dependencies:

pip install .
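
After installation, a quick import check confirms the package is available on your Python path (a minimal sanity check, nothing project-specific):

# Run inside a Python shell to verify the install
import trusteval
print(trusteval.__file__)  # prints where the installed package lives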

🤖 Usage

Configure API Keys

Run the configuration script to set up your API keys:

python trusteval/src/configuration.py


Quick Start

The following example demonstrates an Advanced AI Risk Evaluation workflow.

Step 0: Set Your Project Base Directory

import os
base_dir = os.path.join(os.getcwd(), 'advanced_ai_risk')

Step 1: Download Metadata
from trusteval import download_metadata

download_metadata(
    section='advanced_ai_risk',
    output_path=base_dir
)

Step 2: Generate Datasets Dynamically
from trusteval.dimension.ai_risk import dynamic_dataset_generator

dynamic_dataset_generator(
    base_dir=base_dir,
)

Step 3: Apply Contextual Variations
from trusteval import contextual_variator_cli

contextual_variator_cli(
    dataset_folder=base_dir
)

Step 4: Generate Model Responses
from trusteval import generate_responses

request_type = ['llm']  # Options: 'llm', 'vlm', 't2i'
async_list = ['your_async_model']
sync_list = ['your_sync_model']

await generate_responses(
    data_folder=base_dir,
    request_type=request_type,
    async_list=async_list,
    sync_list=sync_list,
)
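
Note that generate_responses is awaited, so the snippet above runs as-is in a Jupyter notebook or another async context. From a plain Python script, you can wrap the call with asyncio.run; a minimal sketch using the same placeholder model names:

import asyncio
from trusteval import generate_responses

# Drive the coroutine from a synchronous script
asyncio.run(generate_responses(
    data_folder=base_dir,
    request_type=['llm'],            # Options: 'llm', 'vlm', 't2i'
    async_list=['your_async_model'],
    sync_list=['your_sync_model'],
))
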
Step 5: Evaluate and Generate Reports
  1. Judge the Responses

    from trusteval import judge_responses
    
    target_models = ['your_target_model1', 'your_target_model2']
    judge_type = 'llm'  # Options: 'llm', 'vlm', 't2i'
    judge_key = 'your_judge_key'
    async_judge_model = ['your_async_model']
    
    await judge_responses(
        data_folder=base_dir,
        async_judge_model=async_judge_model,
        target_models=target_models,
        judge_type=judge_type,
    )
  2. Generate Evaluation Metrics

    from trusteval import lm_metric
    
    lm_metric(
        base_dir=base_dir,
        aspect='ai_risk',
        model_list=target_models,
    )
  3. Generate Final Report

    from trusteval import report_generator
    
    report_generator(
        base_dir=base_dir,
        aspect='ai_risk',
        model_list=target_models,
    )

Your report.html will be saved in the base_dir folder. For additional examples, check the examples folder.
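
For reference, the whole Quick Start can also be chained in one script. The sketch below simply strings together the calls shown above (model names are placeholders to replace with your own) and uses asyncio.run for the asynchronous steps:

import asyncio
import os

from trusteval import (
    contextual_variator_cli,
    download_metadata,
    generate_responses,
    judge_responses,
    lm_metric,
    report_generator,
)
from trusteval.dimension.ai_risk import dynamic_dataset_generator

async def main():
    base_dir = os.path.join(os.getcwd(), 'advanced_ai_risk')

    # Steps 1-3: metadata, dynamic dataset generation, contextual variations
    download_metadata(section='advanced_ai_risk', output_path=base_dir)
    dynamic_dataset_generator(base_dir=base_dir)
    contextual_variator_cli(dataset_folder=base_dir)

    # Step 4: collect model responses (placeholder model names)
    await generate_responses(
        data_folder=base_dir,
        request_type=['llm'],
        async_list=['your_async_model'],
        sync_list=['your_sync_model'],
    )

    # Step 5: judge responses, compute metrics, and build the report
    target_models = ['your_target_model1', 'your_target_model2']
    await judge_responses(
        data_folder=base_dir,
        async_judge_model=['your_async_model'],
        target_models=target_models,
        judge_type='llm',
    )
    lm_metric(base_dir=base_dir, aspect='ai_risk', model_list=target_models)
    report_generator(base_dir=base_dir, aspect='ai_risk', model_list=target_models)
    # report.html is written to base_dir

asyncio.run(main())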

Trustworthiness Report

A detailed trustworthiness evaluation report is generated for each dimension. The reports are presented as interactive web pages that can be opened in a browser to explore the results. Each report includes the following sections:

The data shown in the images below is simulated and does not reflect actual results.

Test Model Results

Displays the evaluation scores for each model, with a breakdown of average scores across evaluation dimensions.

Model Performance Summary

Summarizes the model's performance in the evaluated dimension using LLM-generated summaries, highlighting comparisons with other models.

Error Case Study

Presents error cases for the evaluated dimension, including input/output examples and detailed judgments.

Leaderboard

Shows the evaluation results for all models, along with visualized comparisons to previous versions (e.g., our v1.0 results).

Contributing

We welcome contributions from the community! To contribute:

  1. Fork the repository.
  2. Create a feature branch (git checkout -b feature-name).
  3. Commit your changes (git commit -m 'Add feature').
  4. Push to your branch (git push origin feature-name).
  5. Open a pull request.

License

This project is licensed under the MIT License.
