FreeEval Visualizer is a web-based tool designed to help researchers and practitioners visualize and analyze evaluation results for large language models. It provides an intuitive interface for exploring evaluation data, conducting human evaluations, and gaining insights into model performance.
- Dashboard: Get an overview of evaluation results with interactive charts and summary statistics.
- Analysis: Dive deep into the data with detailed visualizations and correlation analysis.
- Case Browser: Easily search and filter through individual evaluation cases.
- Human Evaluation: Create and manage human evaluation sessions for more nuanced assessments.
- Multi-mode Support: Compatible with various evaluation types including pairwise comparisons, direct scoring, and matching evaluations.
- Clone the repository:

  ```bash
  git clone https://github.com/WisdomShell/FreeEval.git
  cd FreeEval
  ```
- Create a virtual environment and activate it:

  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
  ```
- Install the required packages:

  ```bash
  pip install -r requirements.txt
  ```
- Run an evaluation with FreeEval. The results for visualization will be saved to a JSON file; its path is shown in the console output.
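  Before launching the visualizer, you may want to confirm that the file parses as valid JSON. A minimal sanity check (the path below is a placeholder; use the one printed by FreeEval):

  ```bash
  # Replace path/to/results.json with the path from FreeEval's console output.
  python -m json.tool path/to/results.json > /dev/null && echo "results file is valid JSON"
  ```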
- Start the Flask development server:

  ```bash
  python visualizer/app.py --mode [evaluation-mode] --result-path [path-to-results-json] --port [port-number] --addr [address]
  ```
  Replace `[evaluation-mode]` with either `pairwise-comparison`, `direct-scoring`, or `matching`.
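  For example, to browse pairwise-comparison results on a local port (the result path, port, and address below are illustrative values, not defaults):

  ```bash
  python visualizer/app.py --mode pairwise-comparison --result-path outputs/results.json --port 5000 --addr 127.0.0.1
  ```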
- Open a web browser and navigate to `http://localhost:[port-number]` (replace `[port-number]` with the actual port number you specified).
- Use the sidebar navigation to explore the different features of the visualizer.
To conduct human evaluations:
- Click on "Human Evaluation" in the sidebar.
- Create a new evaluation session or load an existing one.
- Follow the on-screen instructions to annotate cases.
- Use the progress bar to track your annotation progress.
Contributions to FreeEval Visualizer are welcome! Please feel free to submit a Pull Request.
- This project is part of the FreeEval framework for evaluating large language models.
- Built with Flask, Tailwind CSS, and Flowbite components.