Skip to content

Markdown to PDF

Frederik Beimgraben edited this page Jun 4, 2026 · 2 revisions

Markdown to PDF

pytex_markdown converts Markdown to native TeX nodes (via marko). Headings, emphasis, inline/fenced code, lists, links, images, GFM tables, block quotes and thematic breaks all map to the standard library, and text is LaTeX-escaped. You can use it two ways: from the CLI on a .md file, or as a factory inside a Python document.

from pytex_markdown import Markdown, IncludeMarkdown

body = Markdown("# Title\n\nText with **bold**, `code`, [a link](https://x).")
body = IncludeMarkdown("notes.md", base_level=-1)   # base_level=-1: # -> \chapter

Both factories are registered, so they also work inside \iffalse{pytex(...)}\fi markers in a .py.tex source.

CLI variants

When the pytex command renders a .md file it wraps the converted nodes in a document chosen by --variant:

Variant Document
plain a bare Document (default class article); #\section.
report an HSRT report with title page and table of contents; #\chapter.
report-makers a report branded with the MAKERS logo (title page + footer).
protocol-asta an AStA meeting protocol (HSRT report, AStA logos).
protocol-stupa a StuPa meeting protocol (HSRT report, StuPa logos).
pytex notes.md --build                      # auto-detected style
pytex notes.md --variant report --build     # forced

Without --variant, protocol frontmatter (gremium: or typ: protokoll) picks a protocol style; everything else falls back to plain.

Frontmatter

A YAML --- block at the top of the file feeds document-class parameters. The parser is a small hand-rolled subset — scalars, inline flow lists ([a, b]), block lists (- item), and block scalars (| literal / > folded, with +/- chomping). That's enough for everything below; you don't need a full YAML engine.

---
title: Quarterly Report
author: Jane Doe
abstract: A short summary.
keywords: [latex, python, reports]
datalines:
  - "Course: Typesetting 101"
  - "Date: 2026-06-04"
logos: [INF, MAKERS]
bibliography: refs.bib
---

# Introduction
...

Keys the report styles read:

  • title — title-page title. If absent, the first # heading is used (and then not also rendered as a chapter).
  • author, abstract, keywords.
  • datalines — a list of "Label: value" entries for the title page.
  • logos — title-page logos: vendored names (INF, MAKERS) and/or paths to custom image files.
  • bibliography (aliases literatur, bibliografie) — see below.
  • abstract_heading / keywords_heading — rename the default "Abstract" / "Keywords" sections.

--config overrides the frontmatter with a JSON object, handy for one-off tweaks without editing the file:

pytex notes.md --variant plain \
  --config '{"documentclass": "scrartcl", "classoptions": ["11pt", "twocolumn"]}'

classoptions accepts a list ("twocolumn", "DIV=12") or a {key: value} object.

GitHub-style callouts

A blockquote opened with a callout marker becomes a colored box (from pytex_components):

> [!NOTE]
> Worth knowing.
Marker Box
[!NOTE], [!INFO] InfoBox
[!TIP], [!HINT], [!SUCCESS] SuccessBox
[!IMPORTANT] ImportantBox
[!WARNING], [!CAUTION] WarningBox

Citations and bibliography

Citations use Pandoc syntax and register the biblatex requirement automatically:

  • [@key] and [@key, p. 5]\autocite
  • [@a; @b] → a combined \autocite{a,b}
  • a narrative @key\textcite

Keys may contain internal punctuation but not trailing, so a sentence-ending period is never swallowed; code spans and e-mail addresses are left alone.

The bibliography comes from the bibliography: frontmatter key, which is either a path to a .bib file or inline BibTeX as a | block scalar:

---
bibliography: |
  @book{knuth1984,
    author = {Knuth, Donald E.},
    title  = {The TeXbook},
  }
---

As @knuth1984 showed, ...

Reports embed the bibliography via filecontents + \addbibresource (so the document stays self-contained) and emit a numbered \printbibliography.

Typographic extras

A few conveniences on top of plain Markdown:

  • ASCII math arrows->, =>, <-> and friends become inline math arrows (\rightarrow, …).
  • Tables get a little vertical breathing room (\addvspace) above and below.

Unicode glyph handling

The PDF is compiled with tectonic (XeTeX), which does no font fallback: a character the active text font lacks renders as a blank "tofu" box. The bundled DIN text font is missing a handful of common symbols, so the converter rewrites them to font-independent constructs before they reach the page. The rewrite only touches prose — code spans and code blocks are left exactly as written.

Char Becomes Renders via
\euro{} eurosym (ships its own glyph)
$\rightarrow$ math font
$\leftrightarrow$ math font
$\leq$ math font
$\geq$ math font
· $\cdot$ math font

The arrow targets are the same macros the ASCII arrows use, so and -> typeset identically. · is treated as the multiplication dot (\cdot) rather than \textperiodcentered, because the latter is itself a text-font glyph and would tofu under DIN; the math \cdot is always available.

Markdown("Pay 5€ — quality ≥ 90% — see A → B.").rendered
# ... 5\euro{} ... $\geq$ ... $\rightarrow$ ...

This works at every trust level. eurosym is on the package allowlist for UNTRUSTED and SANDBOXED builds, so a in untrusted input via the Blob API renders instead of being rejected; the math targets pull no package at all.

Missing glyphs

A character that is neither in the table above nor present in every bundled DIN weight is genuinely unrenderable. Instead of emitting silent tofu, the converter replaces it with a visible \texttt{[missing glyph]} placeholder and raises a MissingGlyphWarning naming the character and its code point:

import warnings
from pytex_markdown import Markdown
from pytex_markdown.glyphs import MissingGlyphWarning

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always", MissingGlyphWarning)
    out = Markdown("price 中 tag").rendered

# out contains:  price \texttt{[missing glyph]} tag
# caught[0].message:  no glyph for '中' (U+4E2D) in the bundled DIN font ...

DIN coverage is determined by parsing the bundled fonts' cmap tables directly (no extra dependency). The rule is deliberately conservative: a character counts as renderable only if it is present in every bundled DIN weight, since a glyph missing from a single weight (e.g. bold) would tofu wherever that weight is used. Common Latin text — including German diacritics (äöüß), dashes and ellipsis — is fully covered and passes through unchanged.

Clone this wiki locally