GenAI Performance Analyzer

A streamlined tool for analyzing and visualizing performance metrics of Large Language Models (LLMs) across different configurations and runtime environments. Compatible with performance data generated by NVIDIA's GenAI-Perf tool.

Features

  • Interactive Visualization: Dynamic plots with:
    • Latency distributions shown as box plots with throughput on x-axis
    • Performance metrics plotted against request throughput
    • Statistical indicators (mean, quartiles, P90, P99)
  • Configuration Comparison: Compare different model configurations side by side with:
    • Overlaid box plots for latency distributions
    • Trend lines showing metric variations with throughput
    • Color-coded model configurations for easy differentiation
  • Model Configuration Display:
    • Compact model selection format: model_name (CLOUD | INSTANCE | GPU | ENGINE | GPU_CONFIG | PARALLEL)
    • Detailed configuration panel showing:
      • Hardware details (Cloud, Instance, GPU)
      • Software configuration (Engine, GPU Config, Parallelism)
      • Optimization strategy
  • Metric Analysis: Analyze various performance metrics including:
    • Request Latency
    • Time to First Token
    • Inter-token Latency
    • Request Throughput
    • Output Token Throughput
    • Output Token Throughput per Request
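The statistical indicators listed above (mean, quartiles, P90, P99) can be computed directly from raw latency samples. A minimal sketch with NumPy (the function name is illustrative, not part of the tool's API):

```python
import numpy as np

def latency_stats(samples_ms):
    """Summarize request latencies (ms) with the indicators
    shown in the box plots: mean, quartiles, P90, P99."""
    a = np.asarray(samples_ms, dtype=float)
    return {
        "avg": float(a.mean()),
        "p25": float(np.percentile(a, 25)),
        "p50": float(np.percentile(a, 50)),
        "p75": float(np.percentile(a, 75)),
        "p90": float(np.percentile(a, 90)),
        "p99": float(np.percentile(a, 99)),
    }
```

In practice GenAI-Perf already emits these aggregates per metric, so the analyzer reads them from JSON rather than recomputing them.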

Architecture

The application follows a modular architecture with clear separation of concerns. View the detailed Architecture Documentation for:

  • System Components Overview
  • Data Flow Diagram
  • Component Descriptions
  • Technology Stack Details

Interface Preview

Latency Distribution View

Analyze latency distributions across different token configurations with interactive box plots and statistical insights.

Model Comparison View

Compare performance metrics between different model configurations and concurrency levels.

Overall Interface

Complete interface with sidebar controls and metric visualization panels.

Installation

  1. Clone the repository:

```bash
git clone https://github.com/yourusername/genai-perf-analyzer.git
cd genai-perf-analyzer
```

  2. Create and activate a virtual environment:

```bash
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

  3. Install dependencies:

```bash
pip install -r requirements.txt
```

Usage

  1. Place your performance data in the `data/` directory following the expected format:

```
data/
├── meta_llama-3.1-8b_aws_p5.48xlarge_h100_tensorrt_llm-h100-fp8-tp1-pp1-throughput/
│   ├── meta_llama-3.1-8b-instruct-openai-chat-concurrency1/
│   │   ├── 200_200_genai_perf.json    # Input 200, Output 200 tokens
│   │   ├── 200_5_genai_perf.json      # Input 200, Output 5 tokens
│   │   └── 1000_200_genai_perf.json   # Input 1000, Output 200 tokens
│   └── meta_llama-3.1-8b-instruct-openai-chat-concurrency2/
│       └── ...
└── meta_llama-3.1-8b_aws_g5.12xlarge_a10g_tensorrt_llm-a10g-bf16-tp2-latency/
    └── meta_llama-3.1-8b-instruct-openai-chat-concurrency1/
        └── ...
```

Directory naming convention:

  • Top level: {model_name}_{cloud_provider}_{instance_type}_{gpu_type}_{model_profile} where model_profile contains:
    • Engine type (e.g., tensorrt_llm, vllm)
    • GPU name (e.g., h100, a10g)
    • Precision (e.g., fp8, bf16)
    • Tensor parallelism (e.g., tp1, tp2)
    • Pipeline parallelism (e.g., pp1)
    • Optimization target (throughput/latency)
  • Test runs: {model_name}-{api_type}-concurrency{N}
  • Results: {input_tokens}_{output_tokens}_genai_perf.json
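Note that underscores appear both inside model names (`meta_llama-3.1-8b`) and inside engine names (`tensorrt_llm`), so a naive split on `_` is ambiguous. One way to resolve this is to anchor on the cloud-provider token; a sketch, assuming a fixed set of known providers (`parse_run_dir` and `CLOUDS` are illustrative names, not the tool's actual code):

```python
CLOUDS = {"aws", "gcp", "azure"}  # assumed set of recognized providers

def parse_run_dir(name):
    """Split a top-level run directory name into its fields by
    locating the cloud-provider token, since model and engine
    names may themselves contain underscores."""
    parts = name.split("_")
    for i, part in enumerate(parts):
        if part in CLOUDS:
            return {
                "model": "_".join(parts[:i]),
                "cloud": part,
                "instance": parts[i + 1],
                "gpu": parts[i + 2],
                "profile": "_".join(parts[i + 3:]),
            }
    raise ValueError(f"unrecognized directory name: {name}")
```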

The model profile information is displayed in a consistent format throughout the application:

  • Selection dropdown: meta_llama-3.1-8b (AWS | p5.48xlarge | H100 | TRT | H100-FP8 | TP1-PP1)
  • Detailed view shows:
    • Hardware: Cloud provider, Instance type, GPU model
    • Software: Engine type, GPU configuration, Parallelism strategy
    • Additional: Optimization target
  2. Run the Streamlit application:

```bash
streamlit run app/main.py
```

  3. Access the web interface at http://localhost:8501
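The compact dropdown label described above can be rendered from parsed configuration fields; a sketch (the function name and parameter names are illustrative, not the tool's actual code):

```python
def dropdown_label(model, cloud, instance, gpu, engine, gpu_config, parallel):
    """Render the compact selection format:
    model_name (CLOUD | INSTANCE | GPU | ENGINE | GPU_CONFIG | PARALLEL)."""
    return (
        f"{model} ({cloud.upper()} | {instance} | {gpu.upper()} | "
        f"{engine} | {gpu_config} | {parallel})"
    )
```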

Data Format

The analyzer expects performance data in JSON format compatible with NVIDIA's GenAI-Perf output structure:

```json
{
  "request_latency": {
    "unit": "ms",
    "avg": 100.0,
    "p50": 95.0,
    "p90": 150.0,
    "p95": 180.0,
    "p99": 200.0,
    "min": 50.0,
    "max": 250.0,
    "std": 30.0
  }
}
```

Other metrics follow the same per-metric structure.

For information about generating performance data, refer to the GenAI-Perf documentation.
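A file in this format can be loaded with the standard library alone; a minimal sketch that keeps only entries carrying the statistics shown above (`load_metrics` and `EXPECTED_STATS` are illustrative names, not the analyzer's actual API):

```python
import json
from pathlib import Path

EXPECTED_STATS = {"avg", "p50", "p90", "p95", "p99", "min", "max", "std"}

def load_metrics(path):
    """Load a *_genai_perf.json file and return only the metric
    entries that contain the expected statistical fields."""
    data = json.loads(Path(path).read_text())
    return {
        name: entry
        for name, entry in data.items()
        if isinstance(entry, dict) and EXPECTED_STATS <= entry.keys()
    }
```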

Contributing

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add some amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.
