Self Checks
RAGFlow workspace code commit ID
8995662ee
RAGFlow image version
nightly (vulnerable code introduced in commit 58819f5d3, PR #14927, merged 2026-05-15; still present at 8995662ee).
Other environment information
- Hardware parameters: N/A (deployment-independent)
- OS type: N/A
- Others: Any multi-tenant RAGFlow deployment (more than one registered user / issued API key). Storage-engine independent (Elasticsearch or Infinity) — the defect is in the API authorization layer, not in storage.
Actual behavior
Severity: Critical — cross-tenant confidential data exposure (CWE-639: Authorization Bypass Through User-Controlled Key / IDOR / Broken Object-Level Authorization).
The download_document handler at api/apps/sdk/doc.py#L115-L170 (route GET /api/v1/documents/<document_id>) looks documents up by ID alone and streams the raw file with no ownership / tenant check:
    @manager.route("/documents/<document_id>", methods=["GET"])
    @login_required
    async def download_document(document_id):
        if not document_id:
            return get_error_data_result(message="Specify document_id please.")
        doc = DocumentService.query(id=document_id)  # ← queries by ID ONLY, no tenant scope
        if not doc:
            return get_error_data_result(message=f"The dataset not own the document {document_id}.")
        doc_id, doc_location = File2DocumentService.get_storage_address(doc_id=document_id)
        file_stream = settings.STORAGE_IMPL.get(doc_id, doc_location)  # ← streams the raw file
        ...
        return await send_file(file, as_attachment=True, attachment_filename=doc[0].name, ...)
@login_required only verifies that the caller is some authenticated principal. Its current_user resolver (_load_user, api/apps/__init__.py#L129-L188) accepts a session cookie, a JWT, or any valid API token; the APIToken.query(token=...) branch (lines 170–186) authenticates the request as the token's owning tenant. The handler then never references the authenticated user, tenant, or owning knowledge base. DocumentService.query applies no implicit tenant filter: BaseService.query (api/db/services/common_service.py#L90) just forwards **kwargs as equality filters.
Result: any logged-in user — or any holder of any issued API key, from any tenant — can download the raw file of any document belonging to any other tenant in the deployment.
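The effect of an unscoped lookup can be illustrated with a small self-contained model. This is a sketch only, not RAGFlow code: the Doc record, the in-memory DOCS table, and the query helper are hypothetical stand-ins that merely mimic "forward keyword arguments as equality filters".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    id: str
    kb_id: str
    tenant_id: str  # hypothetical field: owner of the knowledge base
    name: str


# Hypothetical in-memory table standing in for the document store.
DOCS = [
    Doc(id="doc-a", kb_id="kb-a", tenant_id="tenant-a", name="a-secret.pdf"),
    Doc(id="doc-b", kb_id="kb-b", tenant_id="tenant-b", name="b-notes.txt"),
]


def query(**filters):
    """Return rows whose attributes equal every keyword filter (no implicit scope)."""
    return [d for d in DOCS if all(getattr(d, k) == v for k, v in filters.items())]


# Tenant B asks for Tenant A's document by ID alone: the row comes back.
unscoped = query(id="doc-a")

# Adding an ownership-related filter (here, Tenant B's own KB) matches nothing.
scoped = query(id="doc-a", kb_id="kb-b")
```

In this model the only thing separating the two results is the extra filter the caller's code chooses to pass, which is the check the report says the new handler omits.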
Why this is a real, distinct bug (not a false positive, not a duplicate)
- Every other document-access path scopes the lookup by ownership; this one is the lone outlier.
  - The sibling endpoint directly above it, download (api/apps/sdk/doc.py#L94): DocumentService.query(kb_id=dataset_id, id=document_id) (scopes by KB).
  - api/apps/sdk/doc.py#L211, #L302, #L430: KnowledgebaseService.accessible(kb_id=..., user_id=tenant_id).
  - api/apps/restful_apis/document_api.py:271, 317, 719, …: same accessible(...) guard.
  - Most tellingly, there is a parallel "return the raw file bytes" endpoint, get(doc_id) at api/apps/restful_apis/document_api.py:1838-1846, whose own docstring states it blocks "cross-tenant ID enumeration" and does exactly:

        if not DocumentService.accessible(doc_id, current_user.id):
            return get_data_error_result(message="Document not found!")

  The new SDK download_document is the only raw-file path that dropped the ownership check.
- It is a freshly introduced regression, not a pre-existing/known issue. The route was added in commit 58819f5d3 (PR #14927, "fix: add document download endpoint and refactor existing download function", 2026-05-15). The refactor split the dataset-scoped download from a new dataset-less download_document and, in doing so, omitted the guard.
- It is not the tenant_id vs user_id "confusion" false-positive pattern seen elsewhere. RAGFlow intentionally sets tenant.id == user.id for the account owner (api/db/joint_services/user_account_service.py:65, 75-78: the tenant row gets "id": user_id and the OWNER UserTenant row gets "tenant_id": user_id), so KnowledgebaseService.accessible, which compares kb.tenant_id == user_id (api/db/services/knowledgebase_service.py#L485-L499), is correct. This endpoint is a genuine missing-check defect, not a mislabeled-argument one.
Impact
In a multi-tenant RAG product the uploaded documents are the confidential asset. Any free / low-privilege account, or any issued API key, can exfiltrate every other tenant's source files. Document IDs are 32-char hex but leak constantly — in GET /api/v1/datasets/<id>/documents responses, chat citation / reference payloads, shared chatbot links, and logs — so this is practically exploitable, not merely theoretical. No special privileges, no relationship to the victim tenant, and no brute force are required once an ID is observed.
Expected behavior
The handler should enforce the same authorization check used by every other document-access path: return the file only if the authenticated user can access the document's knowledge base. A request for a document the caller does not own must return an authorization / not-found error, not the file (mirroring the get(doc_id) endpoint, which returns "Document not found!" to avoid cross-tenant ID enumeration).
Steps to reproduce
- Deploy RAGFlow with at least two tenants (Tenant A and Tenant B), each with their own login / API key.
- As Tenant A, upload a document and note its document_id (visible in GET /api/v1/datasets/<id>/documents, citations, etc.). Example: 9f8c1a2b3d4e5f60718293a4b5c6d7e8.
- As Tenant B (a completely separate, low-privilege account with no relationship to Tenant A), call the new endpoint with Tenant B's own valid credentials:

      curl -L -H "Authorization: Bearer <TENANT_B_API_KEY>" \
        "http://<host>/api/v1/documents/9f8c1a2b3d4e5f60718293a4b5c6d7e8" \
        -o leaked.bin

- Observed: Tenant B receives Tenant A's raw document file (leaked.bin).
  Expected: an authorization / not-found error (e.g. You don't own the document <id>.).
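For scripted verification, the same request can be built with the Python standard library. This is a minimal sketch: build_download_request and fetch are hypothetical helper names, and the host, document ID, and API key are placeholders to be supplied by the tester. How the server signals a refusal (HTTP status versus a JSON error body) is not stated in this report, so the sketch simply saves whatever is returned for inspection.

```python
import urllib.request


def build_download_request(host: str, document_id: str, api_key: str) -> urllib.request.Request:
    """Build the GET request from the steps above without sending it."""
    url = f"http://{host}/api/v1/documents/{document_id}"
    headers = {"Authorization": f"Bearer {api_key}"}
    return urllib.request.Request(url, headers=headers, method="GET")


def fetch(request: urllib.request.Request, out_path: str) -> int:
    """Send the request, write the response body to out_path, return the HTTP status."""
    with urllib.request.urlopen(request) as resp, open(out_path, "wb") as out:
        out.write(resp.read())
        return resp.status


# Usage (performs a network call, so it is left commented out):
# req = build_download_request("<host>", "<TENANT_A_DOCUMENT_ID>", "<TENANT_B_API_KEY>")
# status = fetch(req, "leaked.bin")
```

Running it once with Tenant B's key against Tenant A's document ID, before and after a fix, gives a simple way to compare the saved output.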
Additional information
Suggested fix
Add the same ownership guard the rest of the codebase uses. Because this handler uses @login_required (no injected tenant_id), use current_user.id, which is the established pattern for login_required handlers (e.g. api/apps/restful_apis/document_api.py:1846). The cleanest option reuses DocumentService.accessible (api/db/services/document_service.py#L766), which is itself just KnowledgebaseService.accessible(doc.kb_id, user_id):
    from api.apps import current_user, login_required  # add current_user to the existing import

    @manager.route("/documents/<document_id>", methods=["GET"])
    @login_required
    async def download_document(document_id):
        if not document_id:
            return get_error_data_result(message="Specify document_id please.")
        if not DocumentService.accessible(document_id, current_user.id):
            return get_error_data_result(message=f"You don't own the document {document_id}.")
        doc = DocumentService.query(id=document_id)
        if not doc:
            return get_error_data_result(message=f"Document {document_id} not found.")
        ...
Equivalently, query first then call KnowledgebaseService.accessible(kb_id=doc[0].kb_id, user_id=current_user.id), matching the guard at api/apps/sdk/doc.py#L211 / #L302.
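A regression check for the guard might look like the following self-contained sketch. It does not import RAGFlow: STORE, accessible, and download_document here are hypothetical toy stand-ins, and the ownership rule (caller ID equals owner ID) is an assumption made for illustration. The point is the ordering: the access check runs before any lookup, so a foreign ID and a nonexistent ID are refused the same way.

```python
# Hypothetical stand-in: document ID -> (owning user/tenant ID, file bytes).
STORE = {
    "doc-a": ("tenant-a", b"tenant A confidential bytes"),
}


def accessible(doc_id: str, user_id: str) -> bool:
    """Toy ownership check: True only if the document exists and the caller owns it."""
    entry = STORE.get(doc_id)
    return entry is not None and entry[0] == user_id


def download_document(doc_id: str, user_id: str) -> dict:
    """Guard first, then fetch, mirroring the order in the suggested fix."""
    if not accessible(doc_id, user_id):
        return {"ok": False, "message": f"You don't own the document {doc_id}."}
    return {"ok": True, "data": STORE[doc_id][1]}


owner = download_document("doc-a", "tenant-a")    # legitimate access
foreign = download_document("doc-a", "tenant-b")  # cross-tenant attempt
missing = download_document("doc-x", "tenant-b")  # ID that does not exist
```

Because the refusal is identical in shape for the foreign and the missing ID, the response in this model does not reveal whether a given ID exists, which matches the enumeration concern raised for the get(doc_id) endpoint.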
Summary
- File: api/apps/sdk/doc.py
- Function: download_document (lines 115–170)
- Route: GET /api/v1/documents/<document_id>
- Introduced by: commit 58819f5d3 (PR #14927, 2026-05-15)
- Class: CWE-639 — Authorization Bypass Through User-Controlled Key (IDOR / Broken Object-Level Authorization)