summaryrefslogtreecommitdiff
path: root/docs/handbook/cmark/latex-renderer.md
diff options
context:
space:
mode:
Diffstat (limited to 'docs/handbook/cmark/latex-renderer.md')
-rw-r--r--docs/handbook/cmark/latex-renderer.md320
1 files changed, 320 insertions, 0 deletions
diff --git a/docs/handbook/cmark/latex-renderer.md b/docs/handbook/cmark/latex-renderer.md
new file mode 100644
index 0000000000..d7a492d580
--- /dev/null
+++ b/docs/handbook/cmark/latex-renderer.md
@@ -0,0 +1,320 @@
+# cmark — LaTeX Renderer
+
+## Overview
+
+The LaTeX renderer (`latex.c`) converts a `cmark_node` AST into LaTeX source, suitable for compilation with `pdflatex`, `xelatex`, or `lualatex`. It uses the generic render framework from `render.c`, operating through a per-character output callback (`outc`) and a per-node render callback (`S_render_node`).
+
+## Entry Point
+
+```c
+char *cmark_render_latex(cmark_node *root, int options, int width);
+```
+
+- `root` — AST root node
+- `options` — Option flags (`CMARK_OPT_SOURCEPOS`, `CMARK_OPT_HARDBREAKS`, `CMARK_OPT_NOBREAKS`, `CMARK_OPT_UNSAFE`)
+- `width` — Target line width for hard-wrapping; 0 disables wrapping
+
+## Character Escaping (`outc`)
+
+The `outc` function handles per-character output decisions. It is the most complex part of the LaTeX renderer, with different behavior for three escaping modes:
+
+```c
+static void outc(cmark_renderer *renderer, cmark_escaping escape,
+ int32_t c, unsigned char nextc);
+```
+
+### LITERAL Mode
+Pass-through: all characters are output unchanged.
+
+### NORMAL Mode
+Extensive special-character handling:
+
+| Character | LaTeX Output | Purpose |
+|-----------|-------------|---------|
+| `$` | `\$` | Math mode delimiter |
+| `%` | `\%` | Comment character |
+| `&` | `\&` | Table column separator |
+| `_` | `\_` | Subscript operator |
+| `#` | `\#` | Parameter reference |
+| `^` | `\^{}` | Superscript operator |
+| `{` | `\{` | Group open |
+| `}` | `\}` | Group close |
+| `~` | `\textasciitilde{}` | Non-breaking space |
+| `[` | `{[}` | Optional argument bracket |
+| `]` | `{]}` | Optional argument bracket |
+| `\` | `\textbackslash{}` | Escape character |
+| `|` | `\textbar{}` | Pipe |
+| `'` | `\textquotesingle{}` | Straight single quote |
+| `"` | `\textquotedbl{}` | Straight double quote |
+| `` ` `` | `\textasciigrave{}` | Backtick |
+| `\xA0` (NBSP) | `~` | LaTeX non-breaking space |
+| `\x2014` (—) | `---` | Em dash |
+| `\x2013` (–) | `--` | En dash |
+| `\x2018` (') | `` ` `` | Left single quote |
+| `\x2019` (') | `'` | Right single quote |
+| `\x201C` (") | ` `` ` | Left double quote |
+| `\x201D` (") | `''` | Right double quote |
+
+### URL Mode
+Only these characters are escaped:
+- `$` → `\$`
+- `%` → `\%`
+- `&` → `\&`
+- `_` → `\_`
+- `#` → `\#`
+- `{` → `\{`
+- `}` → `\}`
+
+All other characters pass through unchanged.
+
+## Link Type Classification
+
+The renderer classifies links into five categories:
+
+```c
+typedef enum {
+ NO_LINK,
+ URL_AUTOLINK,
+ EMAIL_AUTOLINK,
+ NORMAL_LINK,
+ INTERNAL_LINK,
+} link_type;
+```
+
+### `get_link_type()`
+
+```c
+static link_type get_link_type(cmark_node *node) {
+ // 1. "mailto:" links where text matches url
+ // 2. "http[s]:" links where text matches url (with or without protocol)
+ // 3. Links starting with '#' → INTERNAL_LINK
+ // 4. Everything else → NORMAL_LINK
+}
+```
+
+Detection logic:
+1. **URL_AUTOLINK**: The `url` starts with `http://` or `https://`, the link has exactly one text child, and that child's content matches the URL (or matches the URL minus the protocol prefix).
+2. **EMAIL_AUTOLINK**: The `url` starts with `mailto:`, the link has exactly one text child, and that child's content matches the URL after `mailto:`.
+3. **INTERNAL_LINK**: The `url` starts with `#`.
+4. **NORMAL_LINK**: Everything else.
+
+## Enumeration Level
+
+For nested ordered lists, the renderer selects the appropriate LaTeX counter style:
+
+```c
+static int S_get_enumlevel(cmark_node *node) {
+ int enumlevel = 0;
+ cmark_node *tmp = node;
+ while (tmp) {
+ if (tmp->type == CMARK_NODE_LIST &&
+ cmark_node_get_list_type(tmp) == CMARK_ORDERED_LIST) {
+ enumlevel++;
+ }
+ tmp = tmp->parent;
+ }
+ return enumlevel;
+}
+```
+
+This walks up the tree, counting ordered list ancestors. LaTeX ordered lists cycle through: `enumi` (arabic), `enumii` (alpha), `enumiii` (roman), `enumiv` (Alpha).
+
+## Node Rendering (`S_render_node`)
+
+### Block Nodes
+
+#### Document
+No output.
+
+#### Block Quote
+```
+ENTER: \begin{quote}\n
+EXIT: \end{quote}\n
+```
+
+#### List
+```
+ENTER (bullet): \begin{itemize}\n
+ENTER (ordered): \begin{enumerate}\n
+ \def\labelenumI{COUNTER}\n (if start != 1)
+ \setcounter{enumI}{START-1}\n
+EXIT: \end{itemize}\n or \end{enumerate}\n
+```
+
+The counter is formatted based on enumeration level:
+- Level 1: `\arabic{enumi}.`
+- Level 2: `\alph{enumii}.` (surrounded by `(`)
+- Level 3: `\roman{enumiii}.`
+- Level 4: `\Alph{enumiv}.`
+
+Period delimiters use `.`, parenthesis delimiters use `)`.
+
+#### Item
+```
+ENTER: \item{} (empty braces prevent ligatures with following content)
+EXIT: \n
+```
+
+#### Heading
+```
+ENTER: \section{ or \subsection{ or \subsubsection{ or \paragraph{ or \subparagraph{
+EXIT: }\n
+```
+
+Mapping: level 1 → `\section`, level 2 → `\subsection`, level 3 → `\subsubsection`, level 4 → `\paragraph`, level 5 → `\subparagraph`.
+
+#### Code Block
+```latex
+\begin{verbatim}
+LITERAL CONTENT
+\end{verbatim}
+```
+
+The content is output in `LITERAL` escape mode (no character escaping). Info strings are ignored.
+
+#### HTML Block
+```
+ENTER: % raw HTML omitted\n (as a LaTeX comment)
+```
+
+Raw HTML is always omitted in LaTeX output, regardless of `CMARK_OPT_UNSAFE`.
+
+#### Thematic Break
+```
+\begin{center}\rule{0.5\linewidth}{\linethickness}\end{center}\n
+```
+
+#### Paragraph
+Same tight-list check as the HTML renderer:
+```c
+parent = cmark_node_parent(node);
+grandparent = cmark_node_parent(parent);
+tight = (grandparent && grandparent->type == CMARK_NODE_LIST) ?
+ grandparent->as.list.tight : false;
+```
+- Normal: newline before and after
+- Tight: no leading/trailing blank lines
+
+### Inline Nodes
+
+#### Text
+Output with NORMAL escaping.
+
+#### Soft Break
+Depends on options:
+- `CMARK_OPT_HARDBREAKS`: `\\\\\n`
+- `CMARK_OPT_NOBREAKS`: space
+- Default: newline
+
+#### Line Break
+```
+\\\\\n
+```
+
+#### Code (inline)
+```
+\texttt{ESCAPED CONTENT}
+```
+
+Special handling: Code content is output character-by-character with inline-code escaping. Special characters (`\`, `{`, `}`, `$`, `%`, `&`, `_`, `#`, `^`, `~`) are escaped.
+
+#### Emphasis
+```
+ENTER: \emph{
+EXIT: }
+```
+
+#### Strong
+```
+ENTER: \textbf{
+EXIT: }
+```
+
+#### Link
+Rendering depends on link type:
+
+**NORMAL_LINK:**
+```
+ENTER: \href{URL}{
+EXIT: }
+```
+
+**URL_AUTOLINK:**
+```
+ENTER: \url{URL}
+(children are skipped — no EXIT rendering needed)
+```
+
+**EMAIL_AUTOLINK:**
+```
+ENTER: \href{URL}{\nolinkurl{
+EXIT: }}
+```
+
+**INTERNAL_LINK:**
+```
+ENTER: (nothing — rendered as plain text)
+EXIT: (~\ref{LABEL})
+```
+
+Where `LABEL` is the URL with the leading `#` stripped.
+
+**NO_LINK:**
+No output.
+
+#### Image
+```
+ENTER: \protect\includegraphics{URL}
+```
+
+Image children (alt text) are skipped. If `CMARK_OPT_UNSAFE` is not set and the URL matches `_scan_dangerous_url()`, the URL is omitted.
+
+#### HTML Inline
+```
+% raw HTML omitted
+```
+
+Always omitted, regardless of `CMARK_OPT_UNSAFE`.
+
+## Source Position Comments
+
+When `CMARK_OPT_SOURCEPOS` is set, the renderer adds LaTeX comments before block elements:
+
+```c
+snprintf(buffer, BUFFER_SIZE, "%% %d:%d-%d:%d\n",
+ cmark_node_get_start_line(node), cmark_node_get_start_column(node),
+ cmark_node_get_end_line(node), cmark_node_get_end_column(node));
+```
+
+## Example Output
+
+Markdown input:
+```markdown
+# Hello World
+
+A paragraph with *emphasis* and **bold**.
+
+- Item 1
+- Item 2
+```
+
+LaTeX output:
+```latex
+\section{Hello World}
+
+A paragraph with \emph{emphasis} and \textbf{bold}.
+
+\begin{itemize}
+\item{}Item 1
+
+\item{}Item 2
+
+\end{itemize}
+```
+
+## Cross-References
+
+- [latex.c](../../cmark/src/latex.c) — Full implementation
+- [render-framework.md](render-framework.md) — Generic render framework (`cmark_render()`, `cmark_renderer`)
+- [public-api.md](public-api.md) — `cmark_render_latex()` API docs
+- [html-renderer.md](html-renderer.md) — Contrast with direct buffer renderer