diff options
Diffstat (limited to 'docs/handbook/cmark/latex-renderer.md')
| -rw-r--r-- | docs/handbook/cmark/latex-renderer.md | 320 |
1 files changed, 320 insertions, 0 deletions
diff --git a/docs/handbook/cmark/latex-renderer.md b/docs/handbook/cmark/latex-renderer.md new file mode 100644 index 0000000000..d7a492d580 --- /dev/null +++ b/docs/handbook/cmark/latex-renderer.md @@ -0,0 +1,320 @@ +# cmark — LaTeX Renderer + +## Overview + +The LaTeX renderer (`latex.c`) converts a `cmark_node` AST into LaTeX source, suitable for compilation with `pdflatex`, `xelatex`, or `lualatex`. It uses the generic render framework from `render.c`, operating through a per-character output callback (`outc`) and a per-node render callback (`S_render_node`). + +## Entry Point + +```c +char *cmark_render_latex(cmark_node *root, int options, int width); +``` + +- `root` — AST root node +- `options` — Option flags (`CMARK_OPT_SOURCEPOS`, `CMARK_OPT_HARDBREAKS`, `CMARK_OPT_NOBREAKS`, `CMARK_OPT_UNSAFE`) +- `width` — Target line width for hard-wrapping; 0 disables wrapping + +## Character Escaping (`outc`) + +The `outc` function handles per-character output decisions. It is the most complex part of the LaTeX renderer, with different behavior for three escaping modes: + +```c +static void outc(cmark_renderer *renderer, cmark_escaping escape, + int32_t c, unsigned char nextc); +``` + +### LITERAL Mode +Pass-through: all characters are output unchanged. + +### NORMAL Mode +Extensive special-character handling: + +| Character | LaTeX Output | Purpose | +|-----------|-------------|---------| +| `$` | `\$` | Math mode delimiter | +| `%` | `\%` | Comment character | +| `&` | `\&` | Table column separator | +| `_` | `\_` | Subscript operator | +| `#` | `\#` | Parameter reference | +| `^` | `\^{}` | Superscript operator | +| `{` | `\{` | Group open | +| `}` | `\}` | Group close | +| `~` | `\textasciitilde{}` | Non-breaking space | +| `[` | `{[}` | Optional argument bracket | +| `]` | `{]}` | Optional argument bracket | +| `\` | `\textbackslash{}` | Escape character | +| `|` | `\textbar{}` | Pipe | +| `'` | `\textquotesingle{}` | Straight single quote | +| `"` | `\textquotedbl{}` | Straight double quote | +| `` ` `` | `\textasciigrave{}` | Backtick | +| `\xA0` (NBSP) | `~` | LaTeX non-breaking space | +| `\x2014` (—) | `---` | Em dash | +| `\x2013` (–) | `--` | En dash | +| `\x2018` (') | `` ` `` | Left single quote | +| `\x2019` (') | `'` | Right single quote | +| `\x201C` (") | ` `` ` | Left double quote | +| `\x201D` (") | `''` | Right double quote | + +### URL Mode +Only these characters are escaped: +- `$` → `\$` +- `%` → `\%` +- `&` → `\&` +- `_` → `\_` +- `#` → `\#` +- `{` → `\{` +- `}` → `\}` + +All other characters pass through unchanged. + +## Link Type Classification + +The renderer classifies links into five categories: + +```c +typedef enum { + NO_LINK, + URL_AUTOLINK, + EMAIL_AUTOLINK, + NORMAL_LINK, + INTERNAL_LINK, +} link_type; +``` + +### `get_link_type()` + +```c +static link_type get_link_type(cmark_node *node) { + // 1. "mailto:" links where text matches url + // 2. "http[s]:" links where text matches url (with or without protocol) + // 3. Links starting with '#' → INTERNAL_LINK + // 4. Everything else → NORMAL_LINK +} +``` + +Detection logic: +1. **URL_AUTOLINK**: The `url` starts with `http://` or `https://`, the link has exactly one text child, and that child's content matches the URL (or matches the URL minus the protocol prefix). +2. **EMAIL_AUTOLINK**: The `url` starts with `mailto:`, the link has exactly one text child, and that child's content matches the URL after `mailto:`. +3. **INTERNAL_LINK**: The `url` starts with `#`. +4. **NORMAL_LINK**: Everything else. + +## Enumeration Level + +For nested ordered lists, the renderer selects the appropriate LaTeX counter style: + +```c +static int S_get_enumlevel(cmark_node *node) { + int enumlevel = 0; + cmark_node *tmp = node; + while (tmp) { + if (tmp->type == CMARK_NODE_LIST && + cmark_node_get_list_type(tmp) == CMARK_ORDERED_LIST) { + enumlevel++; + } + tmp = tmp->parent; + } + return enumlevel; +} +``` + +This walks up the tree, counting ordered list ancestors. LaTeX ordered lists cycle through: `enumi` (arabic), `enumii` (alpha), `enumiii` (roman), `enumiv` (Alpha). + +## Node Rendering (`S_render_node`) + +### Block Nodes + +#### Document +No output. + +#### Block Quote +``` +ENTER: \begin{quote}\n +EXIT: \end{quote}\n +``` + +#### List +``` +ENTER (bullet): \begin{itemize}\n +ENTER (ordered): \begin{enumerate}\n + \def\labelenumI{COUNTER}\n (if start != 1) + \setcounter{enumI}{START-1}\n +EXIT: \end{itemize}\n or \end{enumerate}\n +``` + +The counter is formatted based on enumeration level: +- Level 1: `\arabic{enumi}.` +- Level 2: `\alph{enumii}.` (surrounded by `(`) +- Level 3: `\roman{enumiii}.` +- Level 4: `\Alph{enumiv}.` + +Period delimiters use `.`, parenthesis delimiters use `)`. + +#### Item +``` +ENTER: \item{} (empty braces prevent ligatures with following content) +EXIT: \n +``` + +#### Heading +``` +ENTER: \section{ or \subsection{ or \subsubsection{ or \paragraph{ or \subparagraph{ +EXIT: }\n +``` + +Mapping: level 1 → `\section`, level 2 → `\subsection`, level 3 → `\subsubsection`, level 4 → `\paragraph`, level 5 → `\subparagraph`. + +#### Code Block +```latex +\begin{verbatim} +LITERAL CONTENT +\end{verbatim} +``` + +The content is output in `LITERAL` escape mode (no character escaping). Info strings are ignored. + +#### HTML Block +``` +ENTER: % raw HTML omitted\n (as a LaTeX comment) +``` + +Raw HTML is always omitted in LaTeX output, regardless of `CMARK_OPT_UNSAFE`. + +#### Thematic Break +``` +\begin{center}\rule{0.5\linewidth}{\linethickness}\end{center}\n +``` + +#### Paragraph +Same tight-list check as the HTML renderer: +```c +parent = cmark_node_parent(node); +grandparent = cmark_node_parent(parent); +tight = (grandparent && grandparent->type == CMARK_NODE_LIST) ? + grandparent->as.list.tight : false; +``` +- Normal: newline before and after +- Tight: no leading/trailing blank lines + +### Inline Nodes + +#### Text +Output with NORMAL escaping. + +#### Soft Break +Depends on options: +- `CMARK_OPT_HARDBREAKS`: `\\\\\n` +- `CMARK_OPT_NOBREAKS`: space +- Default: newline + +#### Line Break +``` +\\\\\n +``` + +#### Code (inline) +``` +\texttt{ESCAPED CONTENT} +``` + +Special handling: Code content is output character-by-character with inline-code escaping. Special characters (`\`, `{`, `}`, `$`, `%`, `&`, `_`, `#`, `^`, `~`) are escaped. + +#### Emphasis +``` +ENTER: \emph{ +EXIT: } +``` + +#### Strong +``` +ENTER: \textbf{ +EXIT: } +``` + +#### Link +Rendering depends on link type: + +**NORMAL_LINK:** +``` +ENTER: \href{URL}{ +EXIT: } +``` + +**URL_AUTOLINK:** +``` +ENTER: \url{URL} +(children are skipped — no EXIT rendering needed) +``` + +**EMAIL_AUTOLINK:** +``` +ENTER: \href{URL}{\nolinkurl{ +EXIT: }} +``` + +**INTERNAL_LINK:** +``` +ENTER: (nothing — rendered as plain text) +EXIT: (~\ref{LABEL}) +``` + +Where `LABEL` is the URL with the leading `#` stripped. + +**NO_LINK:** +No output. + +#### Image +``` +ENTER: \protect\includegraphics{URL} +``` + +Image children (alt text) are skipped. If `CMARK_OPT_UNSAFE` is not set and the URL matches `_scan_dangerous_url()`, the URL is omitted. + +#### HTML Inline +``` +% raw HTML omitted +``` + +Always omitted, regardless of `CMARK_OPT_UNSAFE`. + +## Source Position Comments + +When `CMARK_OPT_SOURCEPOS` is set, the renderer adds LaTeX comments before block elements: + +```c +snprintf(buffer, BUFFER_SIZE, "%% %d:%d-%d:%d\n", + cmark_node_get_start_line(node), cmark_node_get_start_column(node), + cmark_node_get_end_line(node), cmark_node_get_end_column(node)); +``` + +## Example Output + +Markdown input: +```markdown +# Hello World + +A paragraph with *emphasis* and **bold**. + +- Item 1 +- Item 2 +``` + +LaTeX output: +```latex +\section{Hello World} + +A paragraph with \emph{emphasis} and \textbf{bold}. + +\begin{itemize} +\item{}Item 1 + +\item{}Item 2 + +\end{itemize} +``` + +## Cross-References + +- [latex.c](../../cmark/src/latex.c) — Full implementation +- [render-framework.md](render-framework.md) — Generic render framework (`cmark_render()`, `cmark_renderer`) +- [public-api.md](public-api.md) — `cmark_render_latex()` API docs +- [html-renderer.md](html-renderer.md) — Contrast with direct buffer renderer |
