Testing Sampling Methods with curl

Updated Endpoint Usage

The monitoring server now supports different sampling methods through the /generate endpoint.

Basic Structure

curl -X POST http://localhost:8080/generate \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Your prompt here",
    "max_tokens": 100,
    "sampling": {
      "method": "METHOD_NAME",
      "parameter1": value1,
      "parameter2": value2
    }
  }'
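
The placeholders in the structure above can be filled in by a small shell helper. The following is a minimal sketch, not part of the server: `build_payload` and `generate` are hypothetical names, and the helper assumes the prompt contains no double quotes or backslashes and that every extra sampling value is numeric.

```shell
# Hypothetical helper: build a /generate request body.
# Usage: build_payload PROMPT MAX_TOKENS METHOD [KEY=VALUE ...]
# Assumption: PROMPT has no double quotes or backslashes (no JSON escaping is done)
# and every extra VALUE is numeric, so it is emitted unquoted.
build_payload() {
  prompt=$1; max_tokens=$2; method=$3
  shift 3
  sampling="\"method\": \"$method\""
  for kv in "$@"; do
    # Split KEY=VALUE at the first "=".
    sampling="$sampling, \"${kv%%=*}\": ${kv#*=}"
  done
  printf '{"prompt": "%s", "max_tokens": %s, "sampling": {%s}}\n' \
    "$prompt" "$max_tokens" "$sampling"
}

# POST the body to the server; the URL is the one used in the examples here.
generate() {
  build_payload "$@" | curl -s -X POST http://localhost:8080/generate \
    -H "Content-Type: application/json" -d @-
}
```

With the server running, `generate "Once upon a time" 100 top-k top_k=40 temperature=0.8` would send the same kind of request as the hand-written commands in this document.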

Available Sampling Methods

1. Greedy Sampling (Default)

curl -X POST http://localhost:8080/generate \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "The capital of France is",
    "max_tokens": 50,
    "sampling": {
      "method": "greedy"
    }
  }'

2. Top-K Sampling

curl -X POST http://localhost:8080/generate \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Once upon a time in a magical forest",
    "max_tokens": 100,
    "sampling": {
      "method": "top-k",
      "top_k": 40,
      "temperature": 0.8
    }
  }'

Top-K with different K values:

# Conservative (K=10)
curl -X POST http://localhost:8080/generate \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "The future of artificial intelligence is",
    "max_tokens": 80,
    "sampling": {
      "method": "top-k",
      "top_k": 10,
      "temperature": 0.7
    }
  }'

# Creative (K=100)
curl -X POST http://localhost:8080/generate \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Write a creative story about",
    "max_tokens": 120,
    "sampling": {
      "method": "top-k",
      "top_k": 100,
      "temperature": 1.0
    }
  }'

3. Top-P (Nucleus) Sampling

curl -X POST http://localhost:8080/generate \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "In the year 2050, technology will",
    "max_tokens": 100,
    "sampling": {
      "method": "top-p",
      "top_p": 0.9,
      "temperature": 0.8
    }
  }'

Top-P with different P values:

# Conservative (P=0.5)
curl -X POST http://localhost:8080/generate \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Climate change solutions include",
    "max_tokens": 80,
    "sampling": {
      "method": "top-p",
      "top_p": 0.5,
      "temperature": 0.6
    }
  }'

# Creative (P=0.95)
curl -X POST http://localhost:8080/generate \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Imagine a world where",
    "max_tokens": 120,
    "sampling": {
      "method": "top-p",
      "top_p": 0.95,
      "temperature": 1.1
    }
  }'

4. Temperature Scaling

curl -X POST http://localhost:8080/generate \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "The most important lesson in life is",
    "max_tokens": 100,
    "sampling": {
      "method": "temperature",
      "temperature": 0.8
    }
  }'

Temperature with different values:

# Conservative (T=0.3)
curl -X POST http://localhost:8080/generate \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "The scientific method involves",
    "max_tokens": 80,
    "sampling": {
      "method": "temperature",
      "temperature": 0.3
    }
  }'

# Balanced (T=0.7)
curl -X POST http://localhost:8080/generate \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "My favorite hobby is",
    "max_tokens": 100,
    "sampling": {
      "method": "temperature",
      "temperature": 0.7
    }
  }'

# Creative (T=1.2)
curl -X POST http://localhost:8080/generate \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "In a parallel universe",
    "max_tokens": 120,
    "sampling": {
      "method": "temperature",
      "temperature": 1.2
    }
  }'
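
The three temperature examples above use different prompts, which makes their outputs hard to compare. The sketch below holds the prompt fixed and prints one request body per temperature; `PROMPT`, `MAX_TOKENS`, and `temperature_body` are illustrative names and values, not server defaults.

```shell
# Fixed prompt and length so that only the temperature changes between runs.
# Both values are illustrative choices.
PROMPT="In a parallel universe"
MAX_TOKENS=80

# Print a request body for the "temperature" method with the given value.
temperature_body() {
  printf '{"prompt": "%s", "max_tokens": %s, "sampling": {"method": "temperature", "temperature": %s}}\n' \
    "$PROMPT" "$MAX_TOKENS" "$1"
}

# Same values as the Conservative, Balanced, and Creative examples.
for t in 0.3 0.7 1.2; do
  echo "# T=$t"
  temperature_body "$t"
done
```

To send each body instead of printing it, pipe it to curl, for example `temperature_body 0.7 | curl -s -X POST http://localhost:8080/generate -H "Content-Type: application/json" -d @-`.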

Advanced Combinations

Combined Top-K + Top-P

curl -X POST http://localhost:8080/generate \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "The best programming language for beginners is",
    "max_tokens": 100,
    "sampling": {
      "method": "top-k",
      "top_k": 50,
      "top_p": 0.9,
      "temperature": 0.8
    }
  }'

Long-form Generation

curl -X POST http://localhost:8080/generate \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Write a short essay about the impact of social media:",
    "max_tokens": 200,
    "sampling": {
      "method": "top-p",
      "top_p": 0.92,
      "temperature": 0.85,
      "repetition_penalty": 1.1
    }
  }'

Testing Animation Features

The frontend now provides customized animations for different sampling methods:

Greedy: Standard green theme with simple token selection
Top-K: Purple theme with filtering animations showing eliminated tokens
Temperature: Orange theme with flowing gradients and probability shifts

To test the animations:

1. Start the monitoring server: ./llama-monitoring-server
2. Open the frontend: http://localhost:3000
3. Use any of the sampling configurations above
4. Click "View Animation" to see method-specific visualizations

Response Format

The /generate endpoint returns:

{
  "response": "Generated text...",
  "session_id": "unique_session_id",
  "sampling_logs": [/* Detailed sampling information */]
}
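
When scripting against this response, it is often enough to pull out a single field. The sketch below does that with sed on a sample response in the shape shown above; `json_string_field` is a hypothetical helper that only handles top-level string values without escaped double quotes.

```shell
# Sample response in the documented shape (values are placeholders).
sample_response='{
  "response": "Generated text...",
  "session_id": "unique_session_id",
  "sampling_logs": []
}'

# Print the value of a string field read from stdin.
# Assumption: the value contains no escaped double quotes.
json_string_field() {
  sed -n 's/.*"'"$1"'"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

printf '%s\n' "$sample_response" | json_string_field session_id
printf '%s\n' "$sample_response" | json_string_field response
```

A real JSON parser is more robust than this pattern match; if jq is installed, `jq -r '.session_id'` does the same job.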

Health Check

curl http://localhost:8080/health

Returns:

{
  "status": "healthy",
  "model_loaded": true,
  "server_version": "1.0.0"
}
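
Before running the sampling tests, a script can wait for the health endpoint to answer. This is a minimal sketch: `wait_for_healthy` is a hypothetical helper, and the attempt count and delay are illustrative defaults. It checks only that the request succeeds, not the contents of the JSON body.

```shell
# Poll a health URL until it answers with HTTP success, or give up.
# Usage: wait_for_healthy [URL] [ATTEMPTS] [DELAY_SECONDS]
# Only the default URL comes from the example above; the other defaults are illustrative.
wait_for_healthy() {
  url=${1:-http://localhost:8080/health}
  attempts=${2:-30}
  delay=${3:-1}
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -f makes curl exit non-zero on HTTP 4xx/5xx; -s hides progress output.
    if curl -sf --max-time 2 "$url" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

For example, `wait_for_healthy && echo "server is up"` prints the message once the server responds, or prints nothing if the attempts run out.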
