-
Notifications
You must be signed in to change notification settings - Fork 0
Markdown to PDF
pytex_markdown converts Markdown to native TeX nodes (via
marko). Headings, emphasis, inline/fenced
code, lists, links, images, GFM tables, block quotes and thematic breaks all map
to the standard library, and text is LaTeX-escaped. You can use it two ways: from
the CLI on a .md file, or as a factory inside a Python document.
from pytex_markdown import Markdown, IncludeMarkdown
body = Markdown("# Title\n\nText with **bold**, `code`, [a link](https://x).")
body = IncludeMarkdown("notes.md", base_level=-1) # base_level=-1: # -> \chapterBoth factories are registered, so they also work inside \iffalse{pytex(...)}\fi
markers in a .py.tex source.
When the pytex command renders a .md file it wraps the converted nodes in a
document chosen by --variant:
| Variant | Document |
|---|---|
plain |
a bare Document (default class article); # → \section. |
report |
an HSRT report with title page and table of contents; # → \chapter. |
report-makers |
a report branded with the MAKERS logo (title page + footer). |
protocol-asta |
an AStA meeting protocol (HSRT report, AStA logos). |
protocol-stupa |
a StuPa meeting protocol (HSRT report, StuPa logos). |
pytex notes.md --build # auto-detected style
pytex notes.md --variant report --build # forcedWithout --variant, protocol frontmatter (gremium: or typ: protokoll) picks
a protocol style; everything else falls back to plain.
A YAML --- block at the top of the file feeds document-class parameters. The
parser is a small hand-rolled subset — scalars, inline flow lists ([a, b]),
block lists (- item), and block scalars (| literal / > folded, with +/-
chomping). That's enough for everything below; you don't need a full YAML
engine.
---
title: Quarterly Report
author: Jane Doe
abstract: A short summary.
keywords: [latex, python, reports]
datalines:
- "Course: Typesetting 101"
- "Date: 2026-06-04"
logos: [INF, MAKERS]
bibliography: refs.bib
---
# Introduction
...Keys the report styles read:
-
title— title-page title. If absent, the first#heading is used (and then not also rendered as a chapter). -
author,abstract,keywords. -
datalines— a list of"Label: value"entries for the title page. -
logos— title-page logos: vendored names (INF,MAKERS) and/or paths to custom image files. -
bibliography(aliasesliteratur,bibliografie) — see below. -
abstract_heading/keywords_heading— rename the default "Abstract" / "Keywords" sections.
--config overrides the frontmatter with a JSON object, handy for one-off
tweaks without editing the file:
pytex notes.md --variant plain \
--config '{"documentclass": "scrartcl", "classoptions": ["11pt", "twocolumn"]}'classoptions accepts a list ("twocolumn", "DIV=12") or a {key: value}
object.
A blockquote opened with a callout marker becomes a colored box (from
pytex_components):
> [!NOTE]
> Worth knowing.| Marker | Box |
|---|---|
[!NOTE], [!INFO]
|
InfoBox |
[!TIP], [!HINT], [!SUCCESS]
|
SuccessBox |
[!IMPORTANT] |
ImportantBox |
[!WARNING], [!CAUTION]
|
WarningBox |
Citations use Pandoc syntax and register the biblatex requirement
automatically:
-
[@key]and[@key, p. 5]→\autocite -
[@a; @b]→ a combined\autocite{a,b} - a narrative
@key→\textcite
Keys may contain internal punctuation but not trailing, so a sentence-ending period is never swallowed; code spans and e-mail addresses are left alone.
The bibliography comes from the bibliography: frontmatter key, which is either
a path to a .bib file or inline BibTeX as a | block scalar:
---
bibliography: |
@book{knuth1984,
author = {Knuth, Donald E.},
title = {The TeXbook},
}
---
As @knuth1984 showed, ...Reports embed the bibliography via filecontents + \addbibresource (so the
document stays self-contained) and emit a numbered \printbibliography.
A few conveniences on top of plain Markdown:
-
ASCII math arrows —
->,=>,<->and friends become inline math arrows (\rightarrow, …). -
Tables get a little vertical breathing room (
\addvspace) above and below.
The PDF is compiled with tectonic (XeTeX), which does no font fallback: a character the active text font lacks renders as a blank "tofu" box. The bundled DIN text font is missing a handful of common symbols, so the converter rewrites them to font-independent constructs before they reach the page. The rewrite only touches prose — code spans and code blocks are left exactly as written.
| Char | Becomes | Renders via |
|---|---|---|
€ |
\euro{} |
eurosym (ships its own glyph) |
→ |
$\rightarrow$ |
math font |
↔ |
$\leftrightarrow$ |
math font |
≤ |
$\leq$ |
math font |
≥ |
$\geq$ |
math font |
· |
$\cdot$ |
math font |
The arrow targets are the same macros the ASCII arrows use, so → and ->
typeset identically. · is treated as the multiplication dot (\cdot) rather
than \textperiodcentered, because the latter is itself a text-font glyph and
would tofu under DIN; the math \cdot is always available.
Markdown("Pay 5€ — quality ≥ 90% — see A → B.").rendered
# ... 5\euro{} ... $\geq$ ... $\rightarrow$ ...This works at every trust level. eurosym is on the package allowlist for
UNTRUSTED and SANDBOXED builds, so a € in untrusted input via the
Blob API renders instead of being rejected; the math targets pull no
package at all.
A character that is neither in the table above nor present in every
bundled DIN weight is genuinely unrenderable. Instead of emitting silent tofu,
the converter replaces it with a visible \texttt{[missing glyph]} placeholder
and raises a MissingGlyphWarning naming the character and its code point:
import warnings
from pytex_markdown import Markdown
from pytex_markdown.glyphs import MissingGlyphWarning
with warnings.catch_warnings(record=True) as caught:
warnings.simplefilter("always", MissingGlyphWarning)
out = Markdown("price 中 tag").rendered
# out contains: price \texttt{[missing glyph]} tag
# caught[0].message: no glyph for '中' (U+4E2D) in the bundled DIN font ...DIN coverage is determined by parsing the bundled fonts' cmap tables directly
(no extra dependency). The rule is deliberately conservative: a character counts
as renderable only if it is present in every bundled DIN weight, since a glyph
missing from a single weight (e.g. bold) would tofu wherever that weight is used.
Common Latin text — including German diacritics (äöüß), dashes and ellipsis —
is fully covered and passes through unchanged.