[Feature] Implement Native LLM Structured Outputs for the Debugging Engine #900

## Problem Statement

Currently, the `analyze_code_structured` method in `backend/app/services/llm_analysis.py` relies on prompt engineering (e.g., instructing the model to "respond ONLY JSON with this shape") and manual string parsing (`_extract_json`) to produce structured code analysis. This approach is brittle. If the LLM wraps its output in malformed Markdown fences, omits a bracket, or returns a string instead of an integer for a debugging line number, the manual parser raises an `invalid_json_payload` exception. This breaks the API response and can also crash downstream consumers, such as the VS Code extension, that rely on strict types to render line markers.

## Proposed Solution

Migrate the LLM calls to native Structured Outputs (responses constrained by a JSON Schema), which most modern LLM providers support.

1. Define a strict schema (e.g., via Pydantic) representing the exact required shape for the code analysis payload (including `explanation`, `debugging`, `suggestions`, `complexity`, and `optimized_version`).
2. Pass this schema directly into the LLM provider's structured output parameter within `LLMAnalysisClient._chat_completion`.
3. Remove the legacy `_extract_json` string-stripping fallback logic entirely, as the API will natively guarantee the JSON shape and data types.
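
As a rough illustration of steps 1 and 2, the sketch below builds a strict JSON Schema by hand and wraps it in the `response_format` shape that OpenAI-style Chat Completions endpoints accept for structured outputs. Only the five top-level field names, the integer `line`, and the enum `severity` come from this issue. The field types, the `message` field, the severity values, and the `build_response_format` helper are assumptions made for the example, and how `_chat_completion` would actually forward this parameter depends on the existing client code. A Pydantic model could generate an equivalent schema instead of writing it by hand.

```python
import json

# Hypothetical severity levels; the issue only says `severity` is a strict enum.
SEVERITY_LEVELS = ["info", "warning", "error"]

# Assumed shape of one `debugging` entry. `line` and `severity` are named in
# the issue; `message` is an illustrative extra field.
DEBUG_ITEM_SCHEMA = {
    "type": "object",
    "properties": {
        "line": {"type": "integer"},
        "severity": {"type": "string", "enum": SEVERITY_LEVELS},
        "message": {"type": "string"},
    },
    "required": ["line", "severity", "message"],
    "additionalProperties": False,
}

# Top-level payload. The field names come from the issue; the types are guesses.
ANALYSIS_SCHEMA = {
    "type": "object",
    "properties": {
        "explanation": {"type": "string"},
        "debugging": {"type": "array", "items": DEBUG_ITEM_SCHEMA},
        "suggestions": {"type": "array", "items": {"type": "string"}},
        "complexity": {"type": "string"},
        "optimized_version": {"type": "string"},
    },
    # Strict mode expects every property to be listed as required
    # and additional properties to be disallowed.
    "required": [
        "explanation",
        "debugging",
        "suggestions",
        "complexity",
        "optimized_version",
    ],
    "additionalProperties": False,
}


def build_response_format(name: str = "code_analysis") -> dict:
    """Wrap the schema in an OpenAI-style `response_format` value (hypothetical helper)."""
    return {
        "type": "json_schema",
        "json_schema": {"name": name, "strict": True, "schema": ANALYSIS_SCHEMA},
    }


if __name__ == "__main__":
    print(json.dumps(build_response_format(), indent=2))
```

Even with a schema enforced by the provider, it is probably worth validating the parsed response once more on the backend, since a response can still be refused or cut off by a token limit.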

## Alternatives Considered

* **Enhancing the Regex in `_extract_json`:** We could write more robust regex to handle edge cases in the LLM's text output, but this does not solve the core issue of type enforcement (e.g., ensuring `line` is always an integer and `severity` is a strict enum).
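
To make the type-enforcement gap concrete, here is a small sketch. The `extract_json_block` function is a hypothetical stand-in for a regex-based extractor, not the project's actual `_extract_json`, and the allowed severity values are assumed. It shows that a reply can be extracted and parsed successfully while still carrying the wrong types.

```python
import json
import re

# Hypothetical enum values; the real set is not specified in this issue.
ALLOWED_SEVERITIES = {"info", "warning", "error"}


def extract_json_block(text: str) -> dict:
    """Grab the outermost {...} span from free-form LLM text and parse it."""
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found")
    return json.loads(match.group(0))


def type_errors(item: dict) -> list:
    """Return the type problems that extraction alone does not catch."""
    problems = []
    line = item.get("line")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(line, int) or isinstance(line, bool):
        problems.append("line must be an integer, got %r" % (line,))
    if item.get("severity") not in ALLOWED_SEVERITIES:
        problems.append("severity is not an allowed value: %r" % (item.get("severity"),))
    return problems


# A reply that is syntactically valid JSON but has the wrong types.
raw = 'Here is the result:\n```json\n{"line": "12", "severity": "critical-ish"}\n```'
payload = extract_json_block(raw)  # parsing succeeds
problems = type_errors(payload)    # yet both fields are invalid

if __name__ == "__main__":
    for problem in problems:
        print(problem)
```

However robust the regex becomes, a separate validation step like `type_errors` would still be needed, which is the work a schema-constrained response is meant to take over.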
* **Using Third-Party Wrappers (e.g., Instructor or Guardrails):** While these enforce schemas, they introduce unnecessary dependencies and latency. Using the native structured output parameters of the underlying models (like `gpt-4o-mini`) is cleaner and more performant.

## Additional Context

* **Target File:** `backend/app/services/llm_analysis.py`
* **Target Method:** `analyze_code_structured`
* As a contributor participating in GSSoC 2026, I would love to be assigned to this issue so I can implement this architectural stability enhancement for the backend!
