
[Bug]: Cross-tenant document download via GET /api/v1/documents/<document_id> (IDOR / Broken Object-Level Authorization) #133

@enjoyandlove

Self Checks

  • I have searched for existing issues, including closed ones.
  • I confirm that I am using English to submit this report (Language Policy).
  • Non-English title submissions will be closed directly (Language Policy).
  • Please do not modify this template :) and fill in all the required fields.

RAGFlow workspace code commit ID

8995662ee

RAGFlow image version

nightly (vulnerable code introduced in commit 58819f5d3, PR #14927, merged 2026-05-15; still present at 8995662ee).

Other environment information

  • Hardware parameters: N/A (deployment-independent)
  • OS type: N/A
  • Others: Any multi-tenant RAGFlow deployment (more than one registered user / issued API key). Storage-engine independent (Elasticsearch or Infinity) — the defect is in the API authorization layer, not in storage.

Actual behavior

Severity: Critical — cross-tenant confidential data exposure (CWE-639: Authorization Bypass Through User-Controlled Key / IDOR / Broken Object-Level Authorization).

The download_document handler at api/apps/sdk/doc.py#L115-L170 (route GET /api/v1/documents/<document_id>) looks documents up by ID alone and streams the raw file with no ownership / tenant check:

@manager.route("/documents/<document_id>", methods=["GET"])
@login_required
async def download_document(document_id):
    if not document_id:
        return get_error_data_result(message="Specify document_id please.")
    doc = DocumentService.query(id=document_id)          # ← queries by ID ONLY, no tenant scope
    if not doc:
        return get_error_data_result(message=f"The dataset not own the document {document_id}.")
    doc_id, doc_location = File2DocumentService.get_storage_address(doc_id=document_id)
    file_stream = settings.STORAGE_IMPL.get(doc_id, doc_location)   # ← streams the raw file
    ...
    return await send_file(file, as_attachment=True, attachment_filename=doc[0].name, ...)

@login_required only verifies that the caller is some authenticated principal. Its current_user resolver (_load_user, api/apps/__init__.py#L129-L188) accepts a session cookie, a JWT, or any valid API token; the APIToken.query(token=...) branch (lines 170–186) authenticates the request as the token's owning tenant. The handler then never references the authenticated user, tenant, or owning knowledge base. DocumentService.query applies no implicit tenant filter: BaseService.query (api/db/services/common_service.py#L90) just forwards **kwargs as equality filters.
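
To illustrate why an ID-only lookup is tenant-blind, here is a minimal in-memory sketch of a query helper that forwards keyword arguments as equality filters. It is not RAGFlow's code: the Doc class, the DOCS table, and the query function are hypothetical stand-ins for the behavior described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    id: str
    kb_id: str
    name: str


# Hypothetical in-memory table standing in for the document table.
DOCS = [
    Doc(id="doc-a", kb_id="kb-tenant-a", name="a-contract.pdf"),
    Doc(id="doc-b", kb_id="kb-tenant-b", name="b-notes.txt"),
]


def query(**filters):
    """Return rows whose attributes equal every supplied keyword filter."""
    return [d for d in DOCS if all(getattr(d, k) == v for k, v in filters.items())]


# Filtering by ID alone matches the row no matter which tenant is asking.
unscoped = query(id="doc-a")

# Adding a knowledge-base filter, as the dataset-scoped sibling does, narrows it.
scoped_wrong_kb = query(id="doc-a", kb_id="kb-tenant-b")
scoped_right_kb = query(id="doc-a", kb_id="kb-tenant-a")
```

Nothing in such a helper knows about the caller, so any scoping has to be supplied by the handler, either as an extra filter or as a separate access check.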

Result: any logged-in user — or any holder of any issued API key, from any tenant — can download the raw file of any document belonging to any other tenant in the deployment.

Why this is a real, distinct bug (not a false positive, not a duplicate)

  1. Every other document-access path scopes the lookup by ownership; this one is the lone outlier.

    • The sibling endpoint directly above it, download (api/apps/sdk/doc.py#L94) — DocumentService.query(kb_id=dataset_id, id=document_id) (scopes by KB).
    • api/apps/sdk/doc.py#L211, #L302, #L430: KnowledgebaseService.accessible(kb_id=..., user_id=tenant_id).
    • api/apps/restful_apis/document_api.py:271, 317, 719, … — same accessible(...) guard.
    • Most tellingly, there is a parallel "return the raw file bytes" endpoint, get(doc_id) at api/apps/restful_apis/document_api.py:1838-1846, whose own docstring states it blocks "cross-tenant ID enumeration" and does exactly:
      if not DocumentService.accessible(doc_id, current_user.id):
          return get_data_error_result(message="Document not found!")

    The new SDK download_document is the only raw-file path that dropped the ownership check.

  2. It is a freshly introduced regression, not a pre-existing/known issue. The route was added in commit 58819f5d3 (PR #14927, "fix: add document download endpoint and refactor existing download function", 2026-05-15). The refactor split the dataset-scoped download from a new dataset-less download_document and, in doing so, omitted the guard.

  3. It is not the tenant_id vs user_id "confusion" false-positive pattern seen elsewhere. RAGFlow intentionally sets tenant.id == user.id for the account owner (api/db/joint_services/user_account_service.py:65, 75-78 — the tenant row gets "id": user_id and the OWNER UserTenant row gets "tenant_id": user_id), so KnowledgebaseService.accessible, which compares kb.tenant_id == user_id (api/db/services/knowledgebase_service.py#L485-L499), is correct. This endpoint is a genuine missing-check defect, not a mislabeled-argument one.
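
The convention in point 3 can be sketched as follows. This is a simplified model, not the project's code: Tenant, UserTenant, KnowledgeBase, create_account, and kb_accessible are hypothetical names, and the real accessible check may consult more than this single comparison.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tenant:
    id: str


@dataclass(frozen=True)
class UserTenant:
    user_id: str
    tenant_id: str
    role: str


@dataclass(frozen=True)
class KnowledgeBase:
    id: str
    tenant_id: str


def create_account(user_id):
    """Model the described convention: the owner's tenant reuses the user's ID."""
    tenant = Tenant(id=user_id)
    link = UserTenant(user_id=user_id, tenant_id=user_id, role="owner")
    return tenant, link


def kb_accessible(kb, user_id):
    """Simplified check: the KB's tenant ID must equal the caller's user ID."""
    return kb.tenant_id == user_id


tenant_a, link_a = create_account("user-a")
tenant_b, link_b = create_account("user-b")
kb_a = KnowledgeBase(id="kb-1", tenant_id=tenant_a.id)

owner_allowed = kb_accessible(kb_a, "user-a")
stranger_allowed = kb_accessible(kb_a, "user-b")
```

Because the owner's tenant ID and user ID are the same value, passing a user ID where a tenant ID is expected still gives the right answer for the owner, so the existing guards behave correctly.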

Impact

In a multi-tenant RAG product the uploaded documents are the confidential asset. Any free / low-privilege account, or any issued API key, can exfiltrate every other tenant's source files. Document IDs are 32-char hex but leak constantly — in GET /api/v1/datasets/<id>/documents responses, chat citation / reference payloads, shared chatbot links, and logs — so this is practically exploitable, not merely theoretical. No special privileges, no relationship to the victim tenant, and no brute force are required once an ID is observed.

Expected behavior

The handler should enforce the same authorization check used by every other document-access path: return the file only if the authenticated user can access the document's knowledge base. A request for a document the caller does not own must return an authorization / not-found error, not the file (mirroring the get(doc_id) endpoint, which returns "Document not found!" to avoid cross-tenant ID enumeration).

Steps to reproduce

  1. Deploy RAGFlow with at least two tenants (Tenant A and Tenant B), each with their own login / API key.
  2. As Tenant A, upload a document and note its document_id (visible in GET /api/v1/datasets/<id>/documents, citations, etc.). Example: 9f8c1a2b3d4e5f60718293a4b5c6d7e8.
  3. As Tenant B (a completely separate, low-privilege account with no relationship to Tenant A), call the new endpoint with Tenant B's own valid credentials:
    curl -L -H "Authorization: Bearer <TENANT_B_API_KEY>" \
         "http://<host>/api/v1/documents/9f8c1a2b3d4e5f60718293a4b5c6d7e8" \
         -o leaked.bin
  4. Observed: Tenant B receives Tenant A's raw document file (leaked.bin).
    Expected: an authorization / not-found error (e.g. You don't own the document <id>.).
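
For scripted verification, step 3 can be expressed in Python with only the standard library. This is a sketch under assumptions: the host name and API key are placeholders, build_download_request and fetch are hypothetical helper names, and the report does not state whether a denial arrives as a non-200 status or as a JSON error body, so the response is returned unparsed.

```python
import urllib.request


def build_download_request(host, document_id, api_key):
    """Build the GET request from step 3 using Tenant B's own credentials."""
    url = f"http://{host}/api/v1/documents/{document_id}"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def fetch(request, timeout=10.0):
    """Send the request; urlopen follows redirects by default, like curl -L."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, response.read()


# Placeholder host and key; substitute values from a test deployment.
req = build_download_request(
    "ragflow.example",
    "9f8c1a2b3d4e5f60718293a4b5c6d7e8",
    "TENANT_B_API_KEY",
)
# status, body = fetch(req)  # run only against a deployment you are authorized to test
```

If the returned body is Tenant A's file rather than an error payload, the deployment is affected.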

Additional information

Suggested fix

Add the same ownership guard the rest of the codebase uses. Because this handler uses @login_required (no injected tenant_id), use current_user.id, which is the established pattern for login_required handlers (e.g. api/apps/restful_apis/document_api.py:1846). The cleanest option reuses DocumentService.accessible (api/db/services/document_service.py#L766), which is itself just KnowledgebaseService.accessible(doc.kb_id, user_id):

from api.apps import current_user, login_required  # add current_user to the existing import

@manager.route("/documents/<document_id>", methods=["GET"])
@login_required
async def download_document(document_id):
    if not document_id:
        return get_error_data_result(message="Specify document_id please.")
    if not DocumentService.accessible(document_id, current_user.id):
        return get_error_data_result(message=f"You don't own the document {document_id}.")
    doc = DocumentService.query(id=document_id)
    if not doc:
        return get_error_data_result(message=f"Document {document_id} not found.")
    ...

Equivalently, query first then call KnowledgebaseService.accessible(kb_id=doc[0].kb_id, user_id=current_user.id), matching the guard at api/apps/sdk/doc.py#L211 / #L302.
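
A regression test for the fix could assert the behavior modeled below. The sketch is a self-contained, in-memory stand-in rather than the real handler: STORE, accessible, download_unguarded, and download_guarded are hypothetical, the owner_id field collapses the document, knowledge base, and tenant chain into one value, and the error payload shape is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    id: str
    owner_id: str  # hypothetical: collapses doc -> kb -> tenant into one field
    content: bytes


STORE = {"doc-a": Doc(id="doc-a", owner_id="user-a", content=b"tenant A secret")}

# Assumed error shape; the same payload is used for missing and foreign documents.
NOT_FOUND = {"error": "Document not found!"}


def accessible(doc_id, user_id):
    """True only if the document exists and belongs to the caller."""
    doc = STORE.get(doc_id)
    return doc is not None and doc.owner_id == user_id


def download_unguarded(doc_id, user_id):
    """Models the reported behavior: the caller's identity is never consulted."""
    doc = STORE.get(doc_id)
    return doc.content if doc is not None else NOT_FOUND


def download_guarded(doc_id, user_id):
    """Models the suggested fix: check access before touching storage."""
    if not accessible(doc_id, user_id):
        return NOT_FOUND
    return STORE[doc_id].content


leaked = download_unguarded("doc-a", "user-b")
denied = download_guarded("doc-a", "user-b")
missing = download_guarded("no-such-doc", "user-b")
owner_copy = download_guarded("doc-a", "user-a")
```

Returning the same error for a foreign document and a nonexistent one keeps the endpoint from confirming which IDs exist, which is the property the get(doc_id) endpoint is described as preserving.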

Summary

  • File: api/apps/sdk/doc.py
  • Function: download_document (lines 115–170)
  • Route: GET /api/v1/documents/<document_id>
  • Introduced by: commit 58819f5d3 (PR #14927, 2026-05-15)
  • Class: CWE-639 — Authorization Bypass Through User-Controlled Key (IDOR / Broken Object-Level Authorization)
