CUD2V/llmops_devstack

LLM Operations Development Stack

A secure, localhost-only infrastructure for LLM development and experimentation, featuring automated setup and management of MLflow for experiment tracking and Forgejo for version control and CI/CD. Includes convenient shell integration with enhanced prompts showing git branches, Python environments, and job status.

🎯 Purpose

This repository provides scripts and configuration for managing a local LLM development environment with:

  • MLflow: Experiment tracking, model versioning, and artifact management
  • Forgejo: Self-hosted Git service with CI/CD capabilities
  • Private & Secure: All services run locally with no external dependencies
  • Security First: Localhost-only access, secure configurations, and proper file permissions

🏗️ Architecture

The stack creates a self-contained development environment:

┌────────────────────────────────────────────────────────┐
│                LLM DevStack                            │
├────────────────────────────────────────────────────────┤
│  Forgejo (localhost:3000)     MLflow (localhost:5000)  │
│  ├─ Git repositories          ├─ Experiment tracking   │
│  ├─ CI/CD pipelines           ├─ Model registry        │
│  └─ Issue tracking            └─ Artifact storage      │
├────────────────────────────────────────────────────────┤
│              Local File System                         │
│  ├─ SQLite databases                                   │
│  ├─ Git repositories                                   │
│  ├─ ML artifacts                                       │
│  └─ Configuration files                                │
└────────────────────────────────────────────────────────┘

🚀 Quick Start

Prerequisites

  • macOS or Linux
  • Python 3.7+ for MLflow virtual environment
  • curl for downloading Forgejo binary
  • openssl for generating security keys
  • Homebrew (macOS only) - for Forgejo installation

Installation

  1. Clone the repository:

    git clone <repository-url>
    cd llmops_devstack
  2. Configure installation paths:

    cp config.env.example config.env
    # Edit config.env with your preferred installation directories (optional)
  3. Run the setup script:

    ./scripts/setup.sh
  4. Configure shell (optional):

    ./scripts/configure_shell.sh
    source ~/.bashrc    # or ~/.zshrc if you use zsh

    Adds convenient aliases and enhanced prompt - see Shell Integration for details

  5. Start the services:

    ./scripts/start_services.sh
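  6. Access the services (the start script displays the actual URLs; with the default ports they are):
    • Forgejo: http://localhost:3000
    • MLflow: http://localhost:5000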

📋 Configuration

Default Configuration

The system uses sensible defaults that work out-of-the-box:

  • Installation Directory: $HOME/llmops_services
  • Network Binding: 127.0.0.1 (localhost only)
  • Default Ports: 3000 (Forgejo), 5000 (MLflow)
  • Auto Port Detection: Finds available ports if defaults are busy
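The auto port detection described above can be pictured with a small sketch. This is not the code used by the setup or start scripts; the function name `find_free_port` and the `max_tries` limit are assumptions made for illustration. It simply tries to bind each candidate port on the loopback address and returns the first one that works.

```python
import socket


def find_free_port(start: int, host: str = "127.0.0.1", max_tries: int = 20) -> int:
    """Return the first port >= start that can currently be bound on host."""
    for port in range(start, min(start + max_tries, 65536)):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                continue  # port is busy (or not bindable), try the next one
            return port
    raise RuntimeError(f"no free port found starting at {start}")
```

Calling `find_free_port(3000)` would return 3000 when nothing is listening there, and a higher port otherwise. A port found this way can still be taken by another process before the service binds it, so a real script would need to handle that race.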

Customization

Edit config.env to customize:

# Base installation directory
BASE_DIR="$HOME/llmops_services"

# Network security
FORGEJO_SERVER_HOST="127.0.0.1"    # Localhost only
MLFLOW_SERVER_HOST="127.0.0.1"     # Localhost only

# Service timeouts
GRACEFUL_SHUTDOWN_TIMEOUT=10        # Seconds
STATUS_CHECK_TIMEOUT=2              # Seconds

# Default ports (auto-detected if busy)
DEFAULT_FORGEJO_PORT=3000
DEFAULT_MLFLOW_PORT=5000
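If you want to read these settings from Python (for example in a notebook), a minimal sketch follows. It assumes config.env contains only simple KEY=value lines like the ones above; the helper name `load_env_file` is hypothetical and is not part of this repository. Anything more elaborate than quoting, trailing comments, and `$VAR` references is outside what this sketch handles.

```python
import os
import shlex


def load_env_file(text: str) -> dict:
    """Parse simple KEY=value lines, honoring quotes and trailing # comments."""
    settings = {}
    for line in text.splitlines():
        # shlex strips quotes and, with comments=True, drops everything after '#'
        tokens = shlex.split(line, comments=True)
        if len(tokens) != 1 or "=" not in tokens[0]:
            continue  # blank line, comment, or syntax this sketch does not handle
        key, value = tokens[0].split("=", 1)
        settings[key] = os.path.expandvars(value)  # expands $HOME and similar
    return settings
```

Typical use would be `load_env_file(open("config.env").read())`. All values come back as strings, so ports and timeouts need an explicit `int(...)`.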

🔧 Usage

Service Management

# Start all services
./scripts/start_services.sh

# Check service status
./scripts/status_services.sh

# Stop all services
./scripts/stop_services.sh
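The GRACEFUL_SHUTDOWN_TIMEOUT setting suggests a "ask politely, then force" shutdown pattern. The sketch below shows that general pattern in Python; it is an assumption about the approach, not a transcription of stop_services.sh, and the function name `stop_process` is made up for this example.

```python
import os
import signal
import time


def stop_process(pid: int, timeout: float = 10.0, poll: float = 0.1) -> str:
    """Send SIGTERM, wait up to `timeout` seconds, then fall back to SIGKILL."""
    try:
        os.kill(pid, signal.SIGTERM)
    except ProcessLookupError:
        return "not running"
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            os.kill(pid, 0)  # signal 0 sends nothing, it only checks existence
        except ProcessLookupError:
            return "terminated"
        time.sleep(poll)
    try:
        os.kill(pid, signal.SIGKILL)  # the process ignored SIGTERM
    except ProcessLookupError:
        return "terminated"
    return "killed"
```

This is intended for PIDs read from a PID file. A child process of the caller stays visible to the signal-0 check until it has been waited on, so the function is not a good fit for processes you spawned yourself without reaping them.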

Forgejo Setup

Initial Setup (first time only):

  1. Start services: ./scripts/start_services.sh
  2. Visit Forgejo web interface (URL will be displayed)
  3. Complete the installation wizard:
    • Database Type: Select SQLite3 (file-based, no server required)
    • Leave other database settings as default
    • Scroll down to the bottom of the configuration page
    • Administrator Account: Fill out username and password for your admin user
    • Click "Install Forgejo" to complete setup
  4. Registration is disabled by default for security

Environment Activation

# Activate the development environment
# (BASE_DIR is the installation directory from config.env, default $HOME/llmops_services)
source "$BASE_DIR/activate.sh"

# Now you can use MLflow CLI directly
mlflow --help

📁 Directory Structure

$HOME/llmops_services/
├── forgejo/
│   ├── bin/forgejo                 # Forgejo binary
│   ├── data/gitea/                 # Database and repositories
│   └── logs/                       # Application logs
├── mlflow/
│   ├── tracking/                   # MLflow tracking database
│   ├── artifacts/                  # Model artifacts
│   ├── logs/                       # Application logs
│   └── mlflow.db                   # SQLite database
├── venv/                           # Python virtual environment
├── logs/                           # Service startup logs
└── activate.sh                     # Environment activation script

# Runtime files in project root:
├── .forgejo.pid                    # Process ID (for shutdown)
├── .forgejo.port                   # Port number (for access)
├── .mlflow.pid                     # Process ID (for shutdown)
└── .mlflow.port                    # Port number (for access)
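Because the ports can change when the defaults are busy, client code may prefer to read the runtime port files rather than hard-code 3000 or 5000. The sketch below assumes each `.port` file holds just the port number as plain text, which the listing above implies but does not state; `service_url` is a hypothetical helper name.

```python
from pathlib import Path


def service_url(project_root: str, service: str, host: str = "127.0.0.1") -> str:
    """Build the URL for `service` ("forgejo" or "mlflow") from its .port file."""
    port_file = Path(project_root) / f".{service}.port"
    port = int(port_file.read_text().strip())  # assumed format: a bare number
    return f"http://{host}:{port}"
```

For example, `service_url("/path/to/llmops_devstack", "mlflow")` would give a value suitable for `mlflow.set_tracking_uri(...)`. The call raises `FileNotFoundError` when the service has not been started.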

🛠️ Advanced Usage

Shell Integration

Automated Setup (Recommended):

# Configure shell with aliases and enhanced prompt
./scripts/configure_shell.sh

# Apply changes
source ~/.bashrc    # or ~/.zshrc if you use zsh

Manual Setup:

# Add aliases to ~/.bashrc or ~/.zshrc
alias llmops-start="/path/to/llmops_devstack/scripts/start_services.sh && source ~/llmops_services/.mlflow_env"
alias llmops-stop="/path/to/llmops_devstack/scripts/stop_services.sh"
alias llmops-status="/path/to/llmops_devstack/scripts/status_services.sh"

The configure_shell.sh script automatically:

  • Detects your shell (bash/zsh)
  • Creates backup of existing configuration
  • Adds LLM DevStack aliases with absolute paths
  • Configures enhanced prompt with git branch, Python environment, and SLURM job info
  • Safely updates existing configuration if run again

MLflow Integration

import mlflow

# Point the client at the local tracking server
# (use the port reported by ./scripts/status_services.sh if 5000 was busy)
mlflow.set_tracking_uri("http://localhost:5000")

# Track experiments
with mlflow.start_run():
    mlflow.log_param("learning_rate", 0.01)
    mlflow.log_metric("accuracy", 0.95)
    mlflow.log_artifact("model.pkl")  # model.pkl must already exist on disk
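Before starting a long run, it can help to confirm that the tracking server is actually reachable. The following sketch uses only the standard library. It assumes the server answers a plain GET on `/health` with HTTP 200, which MLflow's tracking server is understood to do; treat the path as an assumption and adjust the `path` argument if your version differs. The name `tracking_server_ready` is hypothetical.

```python
import urllib.request


def tracking_server_ready(base_url: str, path: str = "/health", timeout: float = 2.0) -> bool:
    """Return True if GET base_url + path answers with HTTP 200 within timeout."""
    # An empty ProxyHandler bypasses any configured HTTP proxy: the stack is localhost-only.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(base_url.rstrip("/") + path, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # covers URLError, HTTPError, refused connections, timeouts
        return False
```

A training script could call `tracking_server_ready("http://localhost:5000")` and exit early with a clear message instead of failing partway through a run.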

Forgejo Integration

# Clone repositories
git clone http://localhost:3000/username/repository.git

# Set up CI/CD in .forgejo/workflows/
# Push changes to trigger builds

🔍 Troubleshooting

Common Issues

Services won't start:

# Check if ports are available
netstat -an | grep :3000
netstat -an | grep :5000

# Check logs
tail -f $BASE_DIR/logs/forgejo.log
tail -f $BASE_DIR/logs/mlflow.log

Permission errors:

# Fix file permissions
chmod 700 $BASE_DIR/forgejo/data
chmod 700 $BASE_DIR/mlflow
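To check whether a directory already has owner-only permissions before changing anything, a short Python sketch can inspect the mode bits. The helper name `is_owner_only` is made up for this example; it reports True for modes such as 700 or 600, where group and others have no access.

```python
import os
import stat


def is_owner_only(path: str) -> bool:
    """True if neither group nor others have any permission bits set on path."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & 0o077 == 0
```

For instance, `is_owner_only(os.path.expanduser("~/llmops_services/mlflow"))` returning False indicates that the chmod commands above are worth running.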

Can't access services:

# Verify services are running
./scripts/status_services.sh

# Check network binding
ps aux | grep forgejo
ps aux | grep mlflow

Log Locations

  • Service startup logs: $BASE_DIR/logs/
  • Forgejo logs: $BASE_DIR/forgejo/logs/
  • MLflow logs: $BASE_DIR/mlflow/logs/

📄 License

MIT License - see LICENSE file for details.

Data Backup

Create Backup:

# Backup all data with timestamped archive
./scripts/backup_data.sh

# Backups are stored in $BASE_DIR/backups/YYYYMMDD_HHMMSS/
# Includes manifest file with detailed inventory
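The idea of a timestamped backup with a manifest can be sketched in a few lines of Python. This is an illustration of the layout described above, not the logic of backup_data.sh: the function name `create_backup`, the choice of directories, and the manifest file name `MANIFEST.txt` are all assumptions. As with a restore, it is safest to stop the services first so the SQLite files are not copied mid-write.

```python
import shutil
import time
from pathlib import Path


def create_backup(base_dir: str, names=("mlflow", "forgejo")) -> Path:
    """Copy each existing directory in `names` to backups/<timestamp>/ and list the files."""
    base = Path(base_dir)
    target = base / "backups" / time.strftime("%Y%m%d_%H%M%S")
    target.mkdir(parents=True)
    for name in names:
        source = base / name
        if source.is_dir():
            shutil.copytree(source, target / name)
    # Collect the inventory before writing the manifest so it does not list itself.
    files = sorted(p.relative_to(target).as_posix() for p in target.rglob("*") if p.is_file())
    (target / "MANIFEST.txt").write_text("\n".join(files) + "\n")  # hypothetical file name
    return target
```

The returned path points at the new backup directory, for example `.../backups/20240101_120000`.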

Restore from Backup:

# Stop services first
./scripts/stop_services.sh

# Copy data back from backup directory
cp -r $BASE_DIR/backups/20240101_120000/mlflow/* $BASE_DIR/mlflow/
cp -r $BASE_DIR/backups/20240101_120000/forgejo/* $BASE_DIR/forgejo/data/

# Restart services
./scripts/start_services.sh

Process Management

Clean Up Orphaned Processes:

# Interactive cleanup of MLflow processes
./scripts/cleanup_mlflow.sh

# Useful if services weren't stopped properly

📚 Examples

The examples/ directory contains sample scripts demonstrating integration with HPC environments and common ML workflows:

  • gpu_check.sh - SLURM job script for GPU availability checking with MLflow logging
  • See examples/README.md for detailed usage instructions

🔧 Script Reference

| Script | Purpose | Usage |
|---|---|---|
| setup.sh | Initial installation and configuration | ./scripts/setup.sh |
| start_services.sh | Start MLflow and Forgejo services | ./scripts/start_services.sh |
| stop_services.sh | Gracefully stop all services | ./scripts/stop_services.sh |
| status_services.sh | Check service status and URLs | ./scripts/status_services.sh |
| configure_shell.sh | Configure shell aliases and enhanced prompt | ./scripts/configure_shell.sh |
| backup_data.sh | Create timestamped backup of all data | ./scripts/backup_data.sh |
| cleanup_mlflow.sh | Clean up orphaned MLflow processes | ./scripts/cleanup_mlflow.sh |
