A streamlined tool for analyzing and visualizing performance metrics of Large Language Models (LLMs) across different configurations and runtime environments. Compatible with performance data generated by NVIDIA's GenAI-Perf tool.
- Interactive Visualization: Dynamic plots with:
  - Latency distributions shown as box plots with throughput on the x-axis
  - Performance metrics plotted against request throughput
  - Statistical indicators (mean, quartiles, P90, P99)
- Configuration Comparison: Compare different model configurations side by side with:
  - Overlaid box plots for latency distributions
  - Trend lines showing how metrics vary with throughput
  - Color-coded model configurations for easy differentiation
- Model Configuration Display:
  - Compact model selection format:
    `model_name (CLOUD | INSTANCE | GPU | ENGINE | GPU_CONFIG | PARALLEL)`
  - Detailed configuration panel showing:
    - Hardware details (Cloud, Instance, GPU)
    - Software configuration (Engine, GPU Config, Parallelism)
    - Optimization strategy
- Metric Analysis: Analyze various performance metrics including:
  - Request Latency
  - Time to First Token
  - Inter-token Latency
  - Request Throughput
  - Output Token Throughput
  - Output Token Throughput per Request
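The statistical indicators listed above (mean, quartiles, tail percentiles) can be computed from raw latency samples with the standard library alone. A minimal sketch, assuming raw per-request samples are available (the helper name is hypothetical; in practice GenAI-Perf result files already ship precomputed percentiles that can be read directly):

```python
from statistics import mean, quantiles

def box_stats(samples):
    """Summary statistics for one latency distribution: the values a
    box plot needs (quartiles) plus the tail percentiles P90 and P99."""
    qs = quantiles(samples, n=100, method="inclusive")  # 99 cut points
    return {
        "mean": mean(samples),
        "p25": qs[24], "p50": qs[49], "p75": qs[74],
        "p90": qs[89], "p99": qs[98],
    }
```

For example, `box_stats(latencies_ms)["p99"]` gives the P99 latency of one run in the same unit as the samples.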
The application follows a modular architecture with clear separation of concerns. View the detailed Architecture Documentation for:
- System Components Overview
- Data Flow Diagram
- Component Descriptions
- Technology Stack Details
Interface screenshots in the repository illustrate:
- Latency distributions across different token configurations, with interactive box plots and statistical insights
- Performance comparisons between model configurations and concurrency levels
- The complete interface, with sidebar controls and metric visualization panels
- Clone the repository:

  ```bash
  git clone https://github.com/yourusername/genai-perf-analyzer.git
  cd genai-perf-analyzer
  ```

- Create and activate a virtual environment:

  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Place your performance data in the `data/` directory following the expected format:
```
data/
├── meta_llama-3.1-8b_aws_p5.48xlarge_h100_tensorrt_llm-h100-fp8-tp1-pp1-throughput/
│   ├── meta_llama-3.1-8b-instruct-openai-chat-concurrency1/
│   │   ├── 200_200_genai_perf.json    # Input 200, Output 200 tokens
│   │   ├── 200_5_genai_perf.json      # Input 200, Output 5 tokens
│   │   └── 1000_200_genai_perf.json   # Input 1000, Output 200 tokens
│   └── meta_llama-3.1-8b-instruct-openai-chat-concurrency2/
│       └── ...
└── meta_llama-3.1-8b_aws_g5.12xlarge_a10g_tensorrt_llm-a10g-bf16-tp2-latency/
    └── meta_llama-3.1-8b-instruct-openai-chat-concurrency1/
        └── ...
```
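A tree like the one above can be enumerated with a simple glob. A sketch (the function name is hypothetical) that yields one tuple per results file:

```python
from pathlib import Path

def discover_runs(data_dir="data"):
    """Walk data/<config>/<run>/<in>_<out>_genai_perf.json and yield
    (config_dir, run_dir, results_filename) for each results file."""
    for path in sorted(Path(data_dir).glob("*/*/*_genai_perf.json")):
        config, run, filename = path.parts[-3:]
        yield config, run, filename
```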
Directory naming convention:
- Top level: `{model_name}_{cloud_provider}_{instance_type}_{gpu_type}_{model_profile}`, where `model_profile` contains:
  - Engine type (e.g., tensorrt_llm, vllm)
  - GPU name (e.g., h100, a10g)
  - Precision (e.g., fp8, bf16)
  - Tensor parallelism (e.g., tp1, tp2)
  - Pipeline parallelism (e.g., pp1)
  - Optimization target (throughput/latency)
- Test runs: `{model_name}-{api_type}-concurrency{N}`
- Results: `{input_tokens}_{output_tokens}_genai_perf.json`
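The convention above can be decoded in code. A sketch (helper names are hypothetical) that splits the model profile from the right, so that an engine name containing underscores such as tensorrt_llm parses cleanly, and reads token counts out of a results filename:

```python
import re

def parse_model_profile(profile):
    """Split e.g. 'tensorrt_llm-h100-fp8-tp1-pp1-throughput' into its
    six documented fields; assumes the engine name contains no '-'."""
    engine, gpu, precision, tp, pp, target = profile.rsplit("-", 5)
    return {"engine": engine, "gpu": gpu, "precision": precision,
            "tp": int(tp[2:]), "pp": int(pp[2:]), "target": target}

def parse_results_filename(name):
    """Extract (input_tokens, output_tokens) from a filename such as
    '200_200_genai_perf.json'."""
    m = re.fullmatch(r"(\d+)_(\d+)_genai_perf\.json", name)
    if m is None:
        raise ValueError(f"unexpected results filename: {name}")
    return int(m.group(1)), int(m.group(2))
```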
The model profile information is displayed in a consistent format throughout the application:
- Selection dropdown: `meta_llama-3.1-8b (AWS | p5.48xlarge | H100 | TRT | H100-FP8 | TP1-PP1)`
- Detailed view shows:
  - Hardware: Cloud provider, Instance type, GPU model
  - Software: Engine type, GPU configuration, Parallelism strategy
  - Additional: Optimization target
- Run the Streamlit application:

  ```bash
  streamlit run app/main.py
  ```

- Access the web interface at http://localhost:8501
The analyzer expects performance data in JSON format compatible with NVIDIA's GenAI-Perf output structure:

```json
{
  "request_latency": {
    "unit": "ms",
    "avg": 100.0,
    "p50": 95.0,
    "p90": 150.0,
    "p95": 180.0,
    "p99": 200.0,
    "min": 50.0,
    "max": 250.0,
    "std": 30.0
  }
}
```

Other metrics follow the same structure. For information about generating performance data, refer to the GenAI-Perf documentation.
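Reading one of these files reduces to a `json.load` plus a key lookup. A minimal sketch (the helper name is hypothetical):

```python
import json

def load_metric(path, metric="request_latency"):
    """Return the statistics dict (unit, avg, p50, ..., std) for one
    metric from a GenAI-Perf results file."""
    with open(path) as f:
        data = json.load(f)
    if metric not in data:
        raise KeyError(f"{metric!r} not found in {path}")
    return data[metric]
```

For example, `load_metric(path)["p99"]` returns the P99 request latency in the file's declared unit.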
- Fork the repository
- Create your feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add some amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.