
fix(rag): handle malformed PDF parser failures #976

Open
eshaanag wants to merge 2 commits into SdSarthak:main from eshaanag:fix/rag-ingest-parser-errors

Conversation

eshaanag commented Jun 4, 2026

Contributor

Summary

Closes #975

Handles malformed or encrypted PDF parser failures as sanitized client errors, and maps unexpected document-loader failures to logged, generic service-unavailable responses. Adds focused regression coverage for both failure paths so parser and infrastructure details are not exposed to API clients.
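
The two failure paths described above can be sketched without any web framework. This is a minimal illustration, not the code in `backend/app/api/v1/rag.py`: `PdfParseError`, `ApiError`, and `load_document_safely` are hypothetical names, and the real route presumably raises its framework's own HTTP exception instead of `ApiError`. The sketch assumes that parser failures are the client's fault (400) and that anything else is an infrastructure problem (503), with internal details going only to the log.

```python
import logging

logger = logging.getLogger("rag.ingest")  # hypothetical logger name


class PdfParseError(Exception):
    """Hypothetical stand-in for a PDF library's malformed/encrypted-file error."""


class ApiError(Exception):
    """Hypothetical stand-in for a framework HTTP exception."""

    def __init__(self, status_code: int, detail: str) -> None:
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


def load_document_safely(loader, data: bytes):
    """Run `loader(data)` and translate failures into sanitized API errors."""
    try:
        return loader(data)
    except PdfParseError:
        # The upload itself is bad. Report a fixed message and never echo
        # the parser's own text back to the client.
        raise ApiError(
            400,
            "The uploaded PDF could not be parsed. It may be malformed or encrypted.",
        ) from None
    except Exception:
        # Anything else is unexpected. Keep the traceback in the server log
        # and return a generic service-unavailable response.
        logger.exception("Unexpected document loader failure")
        raise ApiError(
            503, "Document processing is temporarily unavailable."
        ) from None
```

Ordering the `except` clauses from specific to general is what keeps a parser error from being reported as a 503. Raising with `from None` drops the original exception from the chained context, so a handler that serializes the error cannot leak it by accident.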

Type of Change

  • Bug fix
  • New feature
  • Documentation update
  • Refactor
  • Tests
  • Infra / CI

Checklist

  • I have read CONTRIBUTING.md
  • My code follows the project style (PEP 8 for Python, ESLint for TS)
  • I have added/updated tests where relevant
  • pytest backend/tests/ passes locally (local checkout does not have pytest installed)
  • I have not committed .env or any secrets
  • I have updated documentation if needed

Testing

  • git diff --check
  • python3 -m compileall backend/app/api/v1/rag.py backend/tests/test_rag_ingest.py
  • python3 -m pytest backend/tests/test_rag_ingest.py -q (not run: local Python environment has no pytest module)

Screenshots (if UI change)

Not applicable.

eshaanag added 2 commits June 4, 2026 22:18
Signed-off-by: Eshaan Agrawal <agrawaleshaan12@gmail.com>
Signed-off-by: Eshaan Agrawal <agrawaleshaan12@gmail.com>

eshaanag commented Jun 4, 2026

Contributor Author

CI baseline note after the latest push:

  • Backend Tests: the two new tests pass explicitly (test_pdf_parser_failure_returns_400 and test_unexpected_loader_failure_returns_503); final result is 28 failed, 374 passed, 1 skipped.
  • Pytest (auto-discover): 28 failed, 374 passed, 1 skipped.
  • Latest main Backend Tests result is 28 failed, 372 passed, 1 skipped, with the same existing RAG route-registration failures: https://github.com/SdSarthak/AegisAI/actions/runs/26838482535
  • Frontend lint also matches the existing main failure (25 errors in untouched frontend files).

Current PR runs: CI · Pytest

eshaanag commented Jun 8, 2026

Contributor Author

This PR is tied to #975, which I raised for GSSoC 2026 around the RAG PDF parser failure path. Could you please add the relevant GSSoC label if it fits the program rules?


Development

Successfully merging this pull request may close these issues.

Bug: malformed RAG PDFs escape validation as unhandled 500s

1 participant