diff --git a/docs/docs.md b/docs/docs.md
index 7dc8c86d..ed418e12 100644
--- a/docs/docs.md
+++ b/docs/docs.md
@@ -43,12 +43,16 @@ docs/
 │   ├── distributed-testing.md
 │   └── playbook-reference.md
 │
+├── proposals/               # Design proposals
+│   ├── README.md            # Proposal workflow
+│   └── results-directory-structure.md
+│
 └── reference/               # Reference documentation
     ├── model-yaml-schema.md
     ├── test-yaml-schema.md
     ├── matrix-yaml-schema.md
     └── cli-reference.md
-```text
+```
 
 ## Documentation by Topic
 
@@ -93,6 +97,14 @@ docs/
 2. [Test Scenario Schema](reference/test-yaml-schema.md) - Test YAML format
 3. [CLI Reference](reference/cli-reference.md) - Command-line tool reference
 
+### Design Proposals
+
+Design proposals for changes to the framework:
+
+1. [Proposals Overview](proposals/README.md) - Proposal workflow and guidelines
+2. [Results Directory Structure](proposals/results-directory-structure.md) -
+   Unified results structure and custom test run names
+
 ## Contributing to Documentation
 
 Documentation is written in Markdown and follows these conventions:
@@ -118,6 +130,7 @@ pre-commit run --all-files
 | methodology/metrics.md | ✅ Complete | 2024-02-08 |
 | methodology/reporting.md | ✅ Complete | 2024-02-08 |
 | platform-setup/x86/intel/deterministic-benchmarking.md | ✅ Complete | (current) |
+| proposals/results-directory-structure.md | 🚧 In Review | 2024-03-30 |
 | containers/* | 📝 Planned | - |
 | ansible/* | 📝 Planned | - |
 | getting-started/* | 📝 Planned | - |
diff --git a/docs/proposals/README.md b/docs/proposals/README.md
new file mode 100644
index 00000000..c8edc8ca
--- /dev/null
+++ b/docs/proposals/README.md
@@ -0,0 +1,26 @@
+# Proposals
+
+This directory contains design proposals for the vLLM CPU Performance Evaluation framework.
+
+## Active Proposals
+
+- [**Results Directory Structure**](results-directory-structure.md) - Unified results directory structure and custom test run names (addresses [Issue #73](https://github.com/redhat-et/vllm-cpu-perf-eval/issues/73))
+
+## Proposal Workflow
+
+1. **Draft**: Create a proposal document in this directory
+2. **Review**: Share with team for feedback and discussion
+3. **Approved**: Update implementation checklist and create tracking issues
+4. **Implemented**: Mark as completed and link to PRs
+
+## Template
+
+New proposals should include:
+
+- **Summary**: What problem does this solve?
+- **Current State**: What's the problem with the current approach?
+- **Proposed Solution**: Detailed design with examples
+- **Migration Strategy**: How to transition (if breaking change)
+- **Implementation Checklist**: Specific tasks required
+- **Questions for Review**: Open questions for team discussion
+- **Alternatives Considered**: What other approaches were evaluated?
diff --git a/docs/proposals/results-directory-structure.md b/docs/proposals/results-directory-structure.md
new file mode 100644
index 00000000..8b3d09ec
--- /dev/null
+++ b/docs/proposals/results-directory-structure.md
@@ -0,0 +1,586 @@
+# Results Directory Structure Proposal
+
+## Executive Summary
+
+This document proposes a unified, intuitive results directory structure across all test types in the repository, along with support for custom test run names.
+
+**Related GitHub Issue:** [#73 - RFE: Support User Specified Testrun Names](https://github.com/redhat-et/vllm-cpu-perf-eval/issues/73)
+
+This proposal addresses:
+1. Inconsistent results directory structures across test types
+2. Missing support for custom test run names (Issue #73)
+3. Results overwrites in bash scripts (no test_run_id)
+4. Future Streamlit visualization filtering by test run name
+
+## Current State Analysis
+
+### Current Directory Structures
+
+#### 1. LLM Tests (Ansible - Single Test)
+```
+results/llm/{model}/{workload}-{test_run_id}/{core_config_name}/
+└── Example: results/llm/TinyLlama__TinyLlama-1.1B-Chat-v1.0/chat-20240315-143022/4c-tp1/
+```
+
+#### 2. LLM Tests (Ansible - Core Sweep)
+```
+results/llm/{model}/{workload}-{test_run_id}/cores_{N}/
+└── Example: results/llm/TinyLlama__TinyLlama-1.1B-Chat-v1.0/chat-20240315-143022/cores_8/
+```
+
+#### 3. Embedding Tests (Ansible)
+```
+results/embedding/{model}/{test_type}-{test_run_id}/
+└── Example: results/embedding/ibm-granite__granite-embedding-278m-multilingual/baseline-20240315-143022/
+```
+
+#### 4. Embedding Tests (Bash Scripts)
+```
+results/embedding-models/{model_basename}/{test_type}/
+└── Example: results/embedding-models/granite-embedding-278m-multilingual/baseline/
+└── NO test_run_id - results get overwritten!
+```
+
+### Identified Issues
+
+1. **Inconsistent top-level directories**: `llm/` vs `embedding/` vs `embedding-models/`
+2. **Missing test_run_id**: Bash scripts don't use test run IDs, causing result overwrites
+3. **No custom naming**: Auto-generated timestamps only (e.g., `20240315-143022`)
+4. **Different structures**: Core sweep uses `cores_{N}`, single tests use `{core_config_name}`
+5. **Confusing workload prefix**: `{workload}-{test_run_id}` mixes semantics
+6. **Model name escaping**: Inconsistent handling of `/` in model names
+
+## Proposed Unified Structure
+
+### Design Principles
+
+1. **Consistency**: Same structure across all test types
+2. **Clarity**: Clear semantic naming at each level
+3. **Flexibility**: Support both auto-generated and custom run names
+4. **No Overwrites**: Every test run gets a unique directory
+5. **Discoverability**: Easy to find and navigate results
+
+### Proposed Directory Structure
+
+```
+results/
+├── llm/                                        # Top-level: model type
+│   └── {test_suite}/                          # Test suite (concurrent-load, scalability, etc.)
+│       └── {model_safe}/                      # Model (slashes replaced with __)
+│           └── {workload}/                    # Workload (chat, code, summarization, rag)
+│               └── {run_name}/                # Run name (custom or auto-timestamp)
+│                   └── {config}/              # Configuration (cores_8, 4c-tp1, etc.)
+│                       ├── benchmarks.json
+│                       ├── benchmarks.csv
+│                       ├── guidellm.log
+│                       ├── test-metadata.json
+│                       ├── vllm-server.log
+│                       └── system-metrics.log
+│
+└── embedding/                                  # Top-level: model type
+    └── {scenario}/                            # Test scenario (baseline, latency, concurrent)
+        └── {model_safe}/                      # Model (slashes replaced with __)
+            └── {run_name}/                    # Run name (custom or auto-timestamp)
+                └── {config}/                  # Configuration (if applicable)
+                    ├── benchmarks.json
+                    ├── test-metadata.json
+                    └── ...
+```
+
+### Hierarchy Levels Explained
+
+#### LLM Tests (6 levels)
+1. **Model Type** (`llm/`) - Top-level categorization
+2. **Test Suite** - Which test methodology (concurrent-load, scalability, resource-contention, etc.)
+3. **Model** - Specific model being tested
+4. **Workload** - Test scenario (chat, code, summarization, rag)
+5. **Run Name** - Unique identifier (custom or timestamp)
+6. **Config** - Hardware/software configuration (cores_8, 4c-tp1, etc.)
+
+#### Embedding Tests (5 levels)
+1. **Model Type** (`embedding/`) - Top-level categorization
+2. **Scenario** - Test scenario (baseline, latency, concurrent)
+3. **Model** - Specific model being tested
+4. **Run Name** - Unique identifier (custom or timestamp)
+5. **Config** - Configuration (if applicable - often just files at run_name level)
+```
+
+### Path Examples
+
+#### LLM Examples
+
+```text
+Concurrent load test with auto-generated timestamp:
+results/llm/concurrent-load/TinyLlama__TinyLlama-1.1B-Chat-v1.0/chat/20240315-143022/4c-tp1/
+
+Concurrent load test with custom run name:
+results/llm/concurrent-load/TinyLlama__TinyLlama-1.1B-Chat-v1.0/chat/baseline-comparison/4c-tp1/
+
+Scalability core sweep with auto-generated timestamp:
+results/llm/scalability/TinyLlama__TinyLlama-1.1B-Chat-v1.0/chat/20240315-143022/cores_8/
+results/llm/scalability/TinyLlama__TinyLlama-1.1B-Chat-v1.0/chat/20240315-143022/cores_16/
+
+Scalability core sweep with custom run name:
+results/llm/scalability/meta-llama__Llama-3.2-3B-Instruct/summarization/prod-validation-v2/cores_8/
+results/llm/scalability/meta-llama__Llama-3.2-3B-Instruct/summarization/prod-validation-v2/cores_16/
+
+Resource contention test:
+results/llm/resource-contention/Qwen__Qwen2.5-3B-Instruct/code/multi-tenant-test/shared-cores/
+```
+
+#### Embedding Examples
+
+```text
+Baseline test with auto-generated timestamp:
+results/embedding/baseline/ibm-granite__granite-embedding-278m-multilingual/20240315-143022/sweep-inf.json
+results/embedding/baseline/ibm-granite__granite-embedding-278m-multilingual/20240315-143022/sweep-25pct.json
+
+Baseline test with custom run name:
+results/embedding/baseline/ibm-granite__granite-embedding-278m-multilingual/prod-release-candidate/sweep-inf.json
+
+Latency test with different concurrency levels:
+results/embedding/latency/ibm-granite__granite-embedding-english-r2/20240315-154530/concurrent-16.json
+results/embedding/latency/ibm-granite__granite-embedding-english-r2/20240315-154530/concurrent-32.json
+results/embedding/latency/ibm-granite__granite-embedding-english-r2/20240315-154530/concurrent-64.json
+```
+
+### How Test Suite is Determined
+
+**For LLM Tests:**
+- Test suite can be passed as a parameter: `-e "test_suite=concurrent-load"`
+- Or inferred from the playbook name:
+  - `llm-benchmark-concurrent-load.yml` → `concurrent-load`
+  - `llm-core-sweep-auto.yml` → `scalability`
+  - Future: `llm-resource-contention.yml` → `resource-contention`
+- Default: `scalability` (for backwards compatibility with core sweeps)
+
+**For Embedding Tests:**
+- Test scenario is already part of the test definition (baseline, latency)
+- No additional parameter needed
+
+### Structure Benefits
+
+1. **Hierarchical Organization**: Natural grouping by test type → test suite → model → scenario → run
+2. **No Overwrites**: Each run gets its own directory under the run name
+3. **Easy Comparison**: All runs for a model/scenario in one place
+4. **Clear Semantics**: Each level has a clear meaning
+5. **Test Suite Isolation**: Different test methodologies don't mix results
+6. **Custom Naming**: Users can specify meaningful run names
+7. **Backwards Compatible**: Can coexist with old structure during migration
+
+## Custom Test Run Names
+
+### Implementation Approach
+
+#### 1. Add Optional Parameter to All Test Scripts
+
+**Variable Name**: `test_run_name` (optional)
+
+**GitHub Issue #73 Requirements:**
+- Users can specify arbitrary test run names at invocation time
+- Example from issue: `test_run_name=cores_4_8_12-rag-chat-code_PI34`
+- This enables grouping related test runs together
+- Supports test tracking and organization (e.g., by project, sprint, or experiment)
+
+**Behavior**:
+- If `test_run_name` is provided: Use it as-is (with validation/sanitization)
+- If `test_run_name` is NOT provided: Auto-generate timestamp `YYYYMMDD-HHMMSS`
+- Environment variable support: `TEST_RUN_NAME=my-test ./run-test.sh`
+
+#### 2. Name Validation
+
+Allowed characters: `a-z`, `A-Z`, `0-9`, `-`, `_`
+
+Invalid characters replaced with `_` (sanitization approach - more user-friendly than rejection)
+
+**Examples of sanitization:**
+- Input: `"my test run!"` → Output: `"my_test_run_"`
+- Input: `"cores_4_8_12-rag-chat-code_PI34"` → Output: `"cores_4_8_12-rag-chat-code_PI34"` (no change)
+
+#### Benefits of Custom Naming (Issue #73)
+
+1. **Human-readable organization** - No need to remember what `20240315-143022` was testing
+2. **Project tracking** - Link results to project initiatives (e.g., `PI34`, `JIRA-1234`)
+3. **Easy comparison** - Compare `baseline-v1` vs `optimized-v1` without checking timestamps
+4. **Streamlit filtering** - Filter/search by meaningful names in visualization
+5. **Collaboration** - Team members can identify test purposes without documentation
+6. **Automation-friendly** - CI/CD can use build numbers or commit SHAs as run names
+
+#### 3. Parameter Examples
+
+**Ansible Playbooks:**
+```bash
+# Auto-generated timestamp (current behavior)
+ansible-playbook llm-benchmark-auto.yml \
+  -e "test_model=TinyLlama/TinyLlama-1.1B-Chat-v1.0" \
+  -e "workload_type=chat" \
+  -e "requested_cores=16"
+
+# Custom run name
+ansible-playbook llm-benchmark-auto.yml \
+  -e "test_model=TinyLlama/TinyLlama-1.1B-Chat-v1.0" \
+  -e "workload_type=chat" \
+  -e "requested_cores=16" \
+  -e "test_run_name=baseline-v1"
+
+# Custom run name + specify test suite
+ansible-playbook llm-benchmark-auto.yml \
+  -e "test_model=TinyLlama/TinyLlama-1.1B-Chat-v1.0" \
+  -e "workload_type=chat" \
+  -e "requested_cores=16" \
+  -e "test_suite=concurrent-load" \
+  -e "test_run_name=prod-validation"
+```
+
+**Bash Scripts:**
+```bash
+# Auto-generated timestamp
+./run-baseline.sh ibm-granite/granite-embedding-278m-multilingual
+
+# Custom run name
+./run-baseline.sh ibm-granite/granite-embedding-278m-multilingual \
+  --run-name prod-release-candidate
+```
+
+**Core Sweep Script:**
+```bash
+# Auto-generated timestamp
+./scripts/run-core-sweep.sh TinyLlama/TinyLlama-1.1B-Chat-v1.0 chat "8,16,32"
+
+# Custom run name
+./scripts/run-core-sweep.sh TinyLlama/TinyLlama-1.1B-Chat-v1.0 chat "8,16,32" \
+  --run-name scalability-test-march
+
+# Example from Issue #73 - project-specific naming
+./scripts/run-core-sweep.sh meta-llama/Llama-3.2-3B-Instruct chat "4,8,12" \
+  --run-name cores_4_8_12-rag-chat-code_PI34
+```
+
+**Environment Variable Support:**
+```bash
+# Set via environment variable
+export TEST_RUN_NAME=my-baseline-test
+./scripts/run-core-sweep.sh TinyLlama/TinyLlama-1.1B-Chat-v1.0 chat "8,16,32"
+
+# Or inline
+TEST_RUN_NAME=Q1-validation ./run-baseline.sh ibm-granite/granite-embedding-278m-multilingual
+```
+
+## Migration Strategy
+
+### Phase 1: Add New Structure Support (Non-Breaking)
+1. Update all playbooks to support the new structure
+2. Keep old structure as default
+3. Add `use_new_results_structure` flag (default: false)
+
+### Phase 2: Parallel Support
+1. Document both structures
+2. Allow users to opt into new structure
+3. Update examples to use new structure
+
+### Phase 3: Switch Default (Breaking Change)
+1. Make new structure the default
+2. Update all documentation
+3. Provide migration script for existing results
+
+### Phase 4: Deprecate Old Structure
+1. Remove old structure support after N releases
+2. Keep migration script available
+
+## Streamlit Visualization Support
+
+As mentioned in [Issue #73](https://github.com/redhat-et/vllm-cpu-perf-eval/issues/73), the Streamlit visualization tool needs to support filtering by test run name.
+
+### Required Changes
+
+1. **Update results discovery logic** to handle new directory structure
+2. **Add test run name filter** in the UI
+3. **Support test suite filtering** (concurrent-load, scalability, etc.)
+4. **Parse test-metadata.json** to extract run name if needed
+
+### Example Streamlit Filter UI
+
+```python
+# Streamlit sidebar filters
+test_suite = st.sidebar.selectbox("Test Suite", ["All", "concurrent-load", "scalability", "resource-contention"])
+test_run_name = st.sidebar.text_input("Test Run Name (filter)", "")
+model = st.sidebar.selectbox("Model", available_models)
+workload = st.sidebar.selectbox("Workload", available_workloads)
+```
+
+### Benefits for Visualization
+
+- **Group related runs**: Filter by project name (e.g., `PI34`, `Q1-baseline`)
+- **Compare experiments**: View all runs matching a pattern (e.g., `cores_*`)
+- **Test suite isolation**: View only concurrent-load or scalability results
+- **Time-based filtering**: Still possible with timestamp-based run names
+
+**Note:** Streamlit implementation details should be tracked in a separate issue/PR once the core directory structure is finalized.
+
+## Use Cases for Custom Test Run Names
+
+### 1. Project/Sprint Tracking (Issue #73 Example)
+```bash
+# Group all tests for Project Initiative 34
+TEST_RUN_NAME=cores_4_8_12-rag-chat-code_PI34 ./run-core-sweep.sh ...
+```
+Result: `results/llm/scalability/{model}/{workload}/cores_4_8_12-rag-chat-code_PI34/`
+
+### 2. Experiment Comparison
+```bash
+# Baseline before optimization
+TEST_RUN_NAME=baseline-v1 ./llm-benchmark-auto.yml ...
+
+# After optimization
+TEST_RUN_NAME=optimized-v1 ./llm-benchmark-auto.yml ...
+
+# Compare results:
+# results/llm/concurrent-load/{model}/chat/baseline-v1/
+# results/llm/concurrent-load/{model}/chat/optimized-v1/
+```
+
+### 3. Release Validation
+```bash
+# Pre-release testing
+TEST_RUN_NAME=v1.2.0-rc1 ./run-all-tests.sh
+
+# Production validation
+TEST_RUN_NAME=v1.2.0-prod ./run-all-tests.sh
+```
+
+### 4. Hardware Configuration Testing
+```bash
+# Test different hardware configs with descriptive names
+TEST_RUN_NAME=spr-96c-512gb ./run-core-sweep.sh ...
+TEST_RUN_NAME=icx-64c-256gb ./run-core-sweep.sh ...
+```
+
+### 5. Date-based Organization (Still Supported)
+```bash
+# Users can still use timestamps manually
+TEST_RUN_NAME=2024-03-15_baseline ./run-tests.sh
+# Or let the system auto-generate: 20240315-143022
+```
+
+### 6. Multi-User Testing Environment
+```bash
+# Each user can prefix with their name
+TEST_RUN_NAME=alice-feature-test ./run-tests.sh
+TEST_RUN_NAME=bob-performance-test ./run-tests.sh
+```
+
+## Implementation Checklist
+
+### Core Changes
+
+- [ ] Define `test_run_name` variable in all playbooks
+- [ ] Define `test_suite` variable in LLM playbooks (with default)
+- [ ] Add name validation/sanitization function/filter
+- [ ] Update path construction in all playbooks:
+  - [ ] `llm-benchmark.yml`
+  - [ ] `llm-benchmark-auto.yml`
+  - [ ] `llm-core-sweep-auto.yml`
+  - [ ] `embedding-benchmark.yml`
+- [ ] Update bash scripts:
+  - [ ] `run-baseline.sh`
+  - [ ] `run-latency.sh`
+  - [ ] `run-all.sh`
+  - [ ] `run-core-sweep.sh`
+- [ ] Update results collection tasks
+- [ ] Update metadata generation to include:
+  - [ ] `test_run_name`
+  - [ ] `test_suite` (for LLM tests)
+
+### Documentation Updates
+
+- [ ] Update [automation/test-execution/ansible/ansible.md](../automation/test-execution/ansible/ansible.md)
+- [ ] Update [README.md](../README.md)
+- [ ] Update [results/results.md](../results/results.md) (if exists)
+- [ ] Add migration guide
+- [ ] Update all example commands in documentation
+
+### Streamlit Visualization Updates (Issue #73)
+
+- [ ] Update results directory scanning to support new structure
+- [ ] Add test suite filter/selector
+- [ ] Add test run name filter (text input or dropdown)
+- [ ] Update path parsing to extract all hierarchy levels
+- [ ] Test filtering by custom run names
+- [ ] Document new filtering capabilities
+
+### Testing
+
+- [ ] Test single LLM run with auto name
+- [ ] Test single LLM run with custom name
+- [ ] Test single LLM run with custom name via env var
+- [ ] Test core sweep with auto name
+- [ ] Test core sweep with custom name
+- [ ] Test embedding baseline with auto name
+- [ ] Test embedding baseline with custom name
+- [ ] Test embedding latency with auto name
+- [ ] Test embedding latency with custom name
+- [ ] Verify no overwrites occur
+- [ ] Verify invalid names are sanitized correctly
+- [ ] Verify special characters in names are handled
+- [ ] Test example from Issue #73: `cores_4_8_12-rag-chat-code_PI34`
+
+## Example Implementation Code
+
+### Ansible Variable Setup
+
+```yaml
+# In playbook vars section
+vars:
+  # Use custom name if provided, otherwise generate timestamp
+  test_run_name_raw: "{{ test_run_name | default(lookup('pipe', 'date +%Y%m%d-%H%M%S')) }}"
+
+  # Sanitize the name (replace invalid chars with _)
+  test_run_name_safe: "{{ test_run_name_raw | regex_replace('[^a-zA-Z0-9_-]', '_') }}"
+
+  # Build new-style path
+  results_base_dir: "{{ playbook_dir }}/../../../results"
+  model_safe: "{{ test_model | replace('/', '__') }}"
+
+  # For LLM tests
+  test_suite: "concurrent-load"  # or "scalability", "resource-contention"
+  results_path: "{{ results_base_dir }}/llm/{{ test_suite }}/{{ model_safe }}/{{ workload_type }}/{{ test_run_name_safe }}/{{ core_configuration.name }}"
+
+  # For embedding tests
+  test_scenario: "baseline"  # or "latency"
+  results_path: "{{ results_base_dir }}/embedding/{{ test_scenario }}/{{ model_safe }}/{{ test_run_name_safe }}"
+```
+
+### Bash Script Parameter Parsing
+
+```bash
+# Default to auto-generated timestamp
+TEST_RUN_NAME="${TEST_RUN_NAME:-$(date +%Y%m%d-%H%M%S)}"
+
+# Parse --run-name argument
+while [[ $# -gt 0 ]]; do
+  case $1 in
+    --run-name)
+      TEST_RUN_NAME="$2"
+      shift 2
+      ;;
+    # ... other args
+  esac
+done
+
+# Sanitize name
+TEST_RUN_NAME=$(echo "$TEST_RUN_NAME" | sed 's/[^a-zA-Z0-9_-]/_/g')
+
+# Build path
+# For embedding tests: results/embedding/{scenario}/{model}/{run_name}/
+RESULT_PATH="${RESULTS_DIR}/${TEST_SCENARIO}/${MODEL_BASENAME}/${TEST_RUN_NAME}"
+```
+
+## Alternatives Considered
+
+### Alternative 1: Flat Structure with Longer Names
+```
+results/llm__TinyLlama__chat__20240315-143022__4c-tp1/
+```
+**Rejected**: Hard to browse, no logical grouping
+
+### Alternative 2: Date-based Directory Structure
+```
+results/2024/03/15/llm/TinyLlama/...
+```
+**Rejected**: Prioritizes date over test type/model, hard to find related tests
+
+### Alternative 3: Keep Current Structure, Add Run Name to Timestamp
+```
+results/llm/{model}/{workload}-baseline-20240315-143022/
+```
+**Rejected**: Still mixes semantics, doesn't fully solve organization issues
+
+## Questions for Review
+
+### Directory Structure
+1. Should we support subdirectories in custom run names? (e.g., `2024-Q1/baseline`)
+2. Should we add a top-level timestamp directory for archival? (e.g., `results/2024-03-15/llm/...`)
+3. Should embedding tests have a config level like LLM tests? (probably not needed currently)
+
+### Custom Run Names (Issue #73)
+
+1. Should custom names be validated (reject invalid) or sanitized (auto-fix)?
+   - **Current proposal:** Sanitize (more user-friendly)
+2. Should we enforce a maximum length for run names? (e.g., 100 characters)
+3. Should we support environment variable `TEST_RUN_NAME` in addition to command-line args?
+   - **Current proposal:** Yes
+
+### Metadata and Indexing
+
+1. Should we add a `results.json` index file at the workload/scenario level for faster discovery?
+2. Should test-metadata.json include the full path structure for easier parsing?
+
+### Streamlit Integration
+
+1. Should Streamlit changes be part of this PR or a separate follow-up issue?
+   - **Recommendation:** Separate issue once directory structure is finalized
+2. Should we add a `--list-runs` flag to CLI tools to show available test run names?
+
+## Recommendations
+
+1. **Start with Phase 1**: Implement new structure support without breaking existing workflows
+2. **Document thoroughly**: Clear examples and migration guides
+3. **Get feedback early**: Test with a few users before making it default
+4. **Keep it simple**: Don't over-engineer, the proposed structure handles 95% of use cases
+5. **Plan for growth**: Structure should accommodate future test types easily
+
+## Summary: What This Proposal Delivers
+
+### For Issue #73 - Custom Test Run Names
+✅ **User-specified test run names** via command-line args or environment variables
+✅ **Arbitrary naming** with sanitization (e.g., `cores_4_8_12-rag-chat-code_PI34`)
+✅ **Backwards compatible** - auto-generates timestamps if not specified
+✅ **Streamlit filtering support** - foundation for visualization filtering
+
+### For Repository Organization
+✅ **Unified structure** - consistent across LLM and embedding tests
+✅ **Test suite isolation** - concurrent-load vs scalability vs resource-contention
+✅ **No overwrites** - every test run gets unique directory
+✅ **Clear hierarchy** - intuitive navigation from test type → suite → model → scenario → run → config
+
+### Implementation Phases
+1. **Phase 1 (Non-Breaking)**: Add new structure support with opt-in flag
+2. **Phase 2 (Parallel)**: Document both structures, update examples
+3. **Phase 3 (Breaking)**: Switch default to new structure
+4. **Phase 4 (Cleanup)**: Deprecate old structure support
+
+### Example: Issue #73 Use Case
+
+**Before (Current):**
+```bash
+# No way to specify custom name
+./run-core-sweep.sh meta-llama/Llama-3.2-3B-Instruct rag "4,8,12"
+# Results: ./results/llm/meta-llama__Llama-3.2-3B-Instruct/rag-20240315-143022/cores_4/
+#          Hard to identify what this test was for!
+```
+
+**After (With This Proposal):**
+```bash
+# Custom name as requested in Issue #73
+TEST_RUN_NAME=cores_4_8_12-rag-chat-code_PI34 \
+  ./run-core-sweep.sh meta-llama/Llama-3.2-3B-Instruct rag "4,8,12"
+
+# Results: ./results/llm/scalability/meta-llama__Llama-3.2-3B-Instruct/rag/cores_4_8_12-rag-chat-code_PI34/cores_4/
+#          Clear project tracking! Can filter in Streamlit by "PI34"
+```
+
+### Follow-up Work
+- **Streamlit visualization**: Update to support new structure and filtering (separate PR)
+- **Migration script**: Convert existing results to new structure (optional)
+- **Documentation**: Update all guides and examples
+
+## Next Steps
+
+1. **Review this proposal** with the team
+2. **Answer open questions** (see Questions for Review section)
+3. **Get feedback on Issue #73** - does this meet the requirements?
+4. **Create implementation tasks** (break down checklist into GitHub issues)
+5. **Start with Ansible playbooks** (higher priority than bash scripts)
+6. **Write tests** to verify behavior
+7. **Update documentation** as changes are made
+8. **Create follow-up issue for Streamlit** once core structure is merged