-
Notifications
You must be signed in to change notification settings - Fork 63
Description
Problem Description
When using the local vLLM backend to generate QA pairs (e.g. atomic / aggregated QA), inputs that contain Markdown-formatted content (such as #, ``` , *, `>`, `---`) may cause QA parsing failures.
Reproduction Conditions
-
Use local vLLM backend
-
Input text is Markdown-formatted and contains one or more of:
- Headings (
#,##) - Separators (
---,***) - Code blocks (```)
- Comment-style or annotation-heavy text
- Headings (
-
Generate and parse QA pairs (e.g. atomic / aggregated QA)
Bad Case Example
Below is two real input example that reliably triggers the issue.
Although the overall request is valid JSON, the content field contains a large amount of Markdown separators and annotation-style text:
Observed Behavior
When processed by local vLLM, this input may result in:
- Markdown separators such as
---being interpreted as semantic or structural boundaries - Question and Answer delimiters being duplicated, shifted, or merged
- QA extraction logic failing to reliably identify the true Q/A boundaries
- Final QA outputs becoming malformed or unparseable
This behavior is especially prominent in prompts that contain annotation-heavy Markdown content.
Root Cause Analysis
1. Markdown Special Characters Are Not Handled During Input Construction
In VLLMWrapper._build_inputs, conversation history and prompts are constructed via plain string concatenation:
@staticmethod
def _build_inputs(prompt: str, history: Optional[List[str]] = None) -> str:
msgs = history or []
lines = []
for m in msgs:
if isinstance(m, dict):
role = m.get("role", "")
content = m.get("content", "")
lines.append(f"{role}: {content}")
else:
lines.append(str(m))
lines.append(prompt)
return "\n".join(lines)This implementation:
- Does not escape or normalize Markdown structural symbols
- Directly injects Markdown syntax into the model context
As a result, the model may misinterpret Markdown markers as semantic or QA boundaries, leading to unstable QA generation and parsing failures.