@jeffreyaven

  • Error detection: Eliminates need for external apps to parse errors
  • Markdown-KV: 60.7% LLM accuracy (vs 44.3% CSV) - research-backed
  • Backward compatible: No breaking changes
  • Version: 3.8.2 ready for release

This commit implements centralized error detection to move error handling
logic from external applications (like stackql-deploy) into pystackql itself.

Changes:
- Add errors.yaml configuration file with error patterns
  - Fuzzy matches for HTTP 4xx/5xx status codes
  - Exact matches for error prefixes
  - StackQL-specific error patterns (disparity, missing operations)

- Implement ErrorDetector class (pystackql/core/error_detector.py)
  - Loads error patterns from errors.yaml at initialization
  - Supports fuzzy (case-insensitive substring) matching
  - Supports exact (prefix) matching
  - Provides is_error() and extract_error_info() methods

- Integrate error detection into OutputFormatter
  - Check raw data strings for error patterns
  - Check parsed JSON data recursively for errors
  - Move detected errors to 'error' field instead of 'data'
  - Return empty list for data when error is detected
  - Apply detection to both query and statement results

- Add PyYAML>=5.4.0 dependency
  - Updated requirements.txt
  - Updated pyproject.toml dependencies

- Add MANIFEST.in to include errors.yaml in package distribution

- Add comprehensive test suite (tests/test_error_detection.py)
  - Tests for ErrorDetector class
  - Tests for OutputFormatter integration
  - Tests for specific homebrew provider 404 error scenario
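The detection flow listed above can be sketched roughly as follows. This is a minimal illustration, not the code in pystackql/core/error_detector.py: the `PATTERNS` dict stands in for errors.yaml, and its key names, the sample patterns, and the helpers `find_error()` and `format_result()` are assumptions made for the example.

```python
from typing import Any, Dict, List, Optional

# Hypothetical stand-in for the contents of errors.yaml (key names and
# pattern strings are assumptions, not the shipped configuration).
PATTERNS: Dict[str, List[str]] = {
    "fuzzy_matches": ["http response status code: 404", "disparity in fields"],
    "exact_matches": ["error:"],
}


class ErrorDetector:
    """Sketch of a detector with fuzzy (substring) and exact (prefix) matching."""

    def __init__(self, patterns: Dict[str, List[str]]) -> None:
        # Fuzzy patterns are lowercased once so matching is case-insensitive.
        self.fuzzy = [p.lower() for p in patterns.get("fuzzy_matches", [])]
        self.exact = list(patterns.get("exact_matches", []))

    def is_error(self, message: Any) -> bool:
        if not isinstance(message, str):
            return False
        lowered = message.lower()
        if any(p in lowered for p in self.fuzzy):
            return True
        return any(message.startswith(p) for p in self.exact)


def find_error(value: Any, detector: ErrorDetector) -> Optional[str]:
    """Recursively search parsed JSON data for the first error string."""
    if isinstance(value, str):
        return value if detector.is_error(value) else None
    if isinstance(value, dict):
        children = list(value.values())
    elif isinstance(value, list):
        children = value
    else:
        return None
    for child in children:
        found = find_error(child, detector)
        if found is not None:
            return found
    return None


def format_result(parsed: Any, detector: ErrorDetector) -> Dict[str, Any]:
    """Move a detected error into 'error' and return an empty 'data' list."""
    message = find_error(parsed, detector)
    if message is not None:
        return {"data": [], "error": message}
    return {"data": parsed}
```

With this sketch, `format_result([{"message": "http response status code: 404"}], ErrorDetector(PATTERNS))` puts the message under `error` and leaves `data` empty, while rows without a matching string pass through unchanged.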

This centralizes error detection so external applications no longer need
to parse stdout messages to identify error conditions. When StackQL
returns error messages in stdout (instead of stderr), they are now
automatically detected and properly formatted as errors.

This commit extends the error detection system with regex pattern matching,
enabling complex error patterns with variable parts (URLs, IPs, hostnames).

Changes:
- Add regex_matches section to errors.yaml
  - DNS lookup errors: 'dial tcp:.*no such host'
  - Connection refused errors
  - Timeout errors (context deadline, i/o timeout, net/http timeout)
  - Handles user's example: Get "https://fred.brew.sh/...": dial tcp: lookup fred.brew.sh on 8.8.8.8:53: no such host

- Update ErrorDetector class
  - Add regex_patterns list to store compiled regex objects
  - Compile patterns with re.IGNORECASE flag for case-insensitive matching
  - Check messages against regex patterns in is_error() method
  - Update extract_error_info() to return pattern_type ("fuzzy", "exact", or "regex")

- Extend test suite with regex pattern tests
  - Test regex pattern loading and compilation
  - Test DNS lookup error detection (user's example)
  - Test connection refused errors
  - Test timeout errors
  - Test case-insensitive regex matching
  - Test error info extraction with pattern_type

Now supports three pattern types:
- Fuzzy: Fast substring matching for simple patterns
- Exact: Precise prefix/exact matching
- Regex: Flexible pattern matching for complex errors with variable parts
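As an illustration of the regex pattern type, the sketch below compiles patterns with re.IGNORECASE and reports a pattern_type of "regex" on a match. Only the DNS pattern is taken from the commit message; the other two pattern strings, the `match_regex()` helper, and the shape of the returned dict are assumptions for the example.

```python
import re
from typing import Dict, List, Optional

# The first pattern is quoted from the commit message; the other two are
# hypothetical stand-ins for the connection-refused and timeout entries.
REGEX_MATCHES: List[str] = [
    r"dial tcp:.*no such host",
    r"connection refused",
    r"i/o timeout",
]

# Compile once up front, case-insensitively.
COMPILED = [re.compile(pattern, re.IGNORECASE) for pattern in REGEX_MATCHES]


def match_regex(message: str) -> Optional[Dict[str, str]]:
    """Return error info for the first regex that matches, else None."""
    for pattern in COMPILED:
        if pattern.search(message):
            return {
                "error": message,
                "pattern_type": "regex",
                "pattern": pattern.pattern,
            }
    return None


example = (
    'Get "https://fred.brew.sh/...": dial tcp: lookup fred.brew.sh '
    "on 8.8.8.8:53: no such host"
)
info = match_regex(example)
```

Here `info["pattern"]` is the DNS lookup pattern, because the variable parts of the message (hostname, resolver address) fall inside the `.*` portion.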

Tested with the user's DNS error example, which is detected successfully.

This commit adds a new output format optimized for LLM understanding
and updates the package version to 3.8.2.

New Feature: Markdown-KV Output Format
- Add 'markdownkv' as a new output format option
- Optimized for LLM understanding (60.7% accuracy vs 44.3% for CSV)
- Based on research: https://www.empiricalagents.com/blog/which-table-format-do-llms-understand-best
- Hierarchical structure with markdown headers and code blocks
- Ideal for RAG pipelines and AI systems processing tabular data

Implementation:
- Update OutputFormatter class to support markdownkv
  - Add _format_markdownkv() for query results
  - Add _format_markdownkv_error() for error formatting
  - Add _format_markdownkv_statement() for statement results
  - Format: "# Query Results" + "## Record N" + code blocks with key: value pairs

- Update StackQL class for server mode support
  - Handle markdownkv in execute() for queries
  - Handle markdownkv in executeStmt() for statements

- Add comprehensive test suite
  - tests/test_markdownkv_format.py
  - Tests for simple data, null values, errors, statements
  - Tests for LLM-friendly structure validation
  - Tests for server mode compatibility
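The Markdown-KV layout described above ("# Query Results", one "## Record N" heading per row, and a code block of key: value pairs) could be produced along these lines. This is a sketch, not the actual _format_markdownkv() implementation: the function name `format_markdownkv()`, the blank-line placement, and rendering None as "null" are assumptions.

```python
from typing import Any, Dict, List

# Built from single backticks so this sketch can itself sit inside a
# fenced code block.
FENCE = "`" * 3


def format_markdownkv(rows: List[Dict[str, Any]]) -> str:
    """Render rows as a heading per record plus a fenced key: value block."""
    lines = ["# Query Results", ""]
    for index, row in enumerate(rows, start=1):
        lines.append(f"## Record {index}")
        lines.append("")
        lines.append(FENCE)
        for key, value in row.items():
            # Assumption: missing values are rendered as the literal "null".
            rendered = "null" if value is None else value
            lines.append(f"{key}: {rendered}")
        lines.append(FENCE)
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


sample = format_markdownkv([{"name": "stackql", "stars": None}])
```

Each record becomes its own section, so a model reading the output sees every field name next to its value rather than having to line up columns as in CSV.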

Version & Documentation:
- Bump version from 3.8.1 to 3.8.2 in pyproject.toml
- Update CHANGELOG.md with:
  - Centralized error detection feature
  - Markdown-KV output format feature
  - New dependencies (PyYAML)
  - New test suites

This release includes both the error detection feature (previous commits)
and the new Markdown-KV format, making pystackql more powerful for
AI/LLM use cases and production deployments.
@jeffreyaven jeffreyaven merged commit 44e69f6 into main Nov 9, 2025
16 of 31 checks passed

3 participants