1 files changed, 317 insertions, 0 deletions
diff --git a/docs/handbook/cmark/inline-parsing.md b/docs/handbook/cmark/inline-parsing.md
new file mode 100644
index 0000000000..4485017305
--- /dev/null
+++ b/docs/handbook/cmark/inline-parsing.md
@@ -0,0 +1,317 @@
+# cmark — Inline Parsing
+
+## Overview
+
+Inline parsing is Phase 2 of cmark's pipeline. Implemented in `inlines.c`, it processes the text content of paragraph and heading nodes, recognizing emphasis (`*`, `_`), code spans (`` ` ``), links (`[text](url)`), images (`![alt](url)`), autolinks (`<url>`), inline raw HTML, hard and soft line breaks, and smart punctuation.
+
+The entry point is `cmark_parse_inlines()`, called from `process_inlines()` in `blocks.c` after all block structure has been finalized.
+
+## The `subject` Struct
+
+All inline parsing state is tracked in the `subject` struct:
+
+```c
+typedef struct {
+  cmark_mem *mem;                          // Memory allocator
+  cmark_chunk input;                       // The text being parsed
+  unsigned flags;                          // Skip flags for HTML constructs
+  int line;                                // Source line number
+  bufsize_t pos;                           // Current byte position in input
+  int block_offset;                        // Column offset of the containing block
+  int column_offset;                       // Adjustment for multi-line source position tracking
+  cmark_reference_map *refmap;             // Link reference definitions
+  delimiter *last_delim;                   // Top of delimiter stack (linked list, newest first)
+  bracket *last_bracket;                   // Top of bracket stack (linked list, newest first)
+  bufsize_t backticks[MAXBACKTICKS + 1];   // Cached positions of backtick sequences
+  bool scanned_for_backticks;              // Whether the full input has been scanned for backticks
+  bool no_link_openers;                    // Optimization: set when no link openers remain
+} subject;
+```
+
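+`MAXBACKTICKS` is defined as 1000. For each run length up to `MAXBACKTICKS`, the `backticks` array records the position of the most recently seen backtick run of that length. Once the input has been fully scanned (`scanned_for_backticks` is true), a search for a closing run that cannot succeed is rejected in O(1) from this cache instead of rescanning the text.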
+
+### Skip Flags
+
+The `flags` field uses bit flags to track which HTML constructs have been confirmed absent:
+
+```c
+#define FLAG_SKIP_HTML_CDATA        (1u << 0)
+#define FLAG_SKIP_HTML_DECLARATION  (1u << 1)
+#define FLAG_SKIP_HTML_PI           (1u << 2)
+#define FLAG_SKIP_HTML_COMMENT      (1u << 3)
+```
+
+Once a scan for a particular HTML construct fails, the corresponding flag is set so that later `<` characters in the same subject do not trigger another scan for that construct.
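+
+The pattern can be sketched in isolation. The helper `has_comment_end()` and the `scans_performed` counter below are hypothetical names used only for illustration; they are not part of `inlines.c`. The sketch relies on one observation: if no terminator exists to the right of the current position, none exists to the right of any later position either.
+
+```c
+#include <stdbool.h>
+#include <stddef.h>
+#include <string.h>
+
+#define FLAG_SKIP_HTML_COMMENT (1u << 3)
+
+/* Counts how many times the text was actually scanned (illustration only). */
+static int scans_performed = 0;
+
+/* Hypothetical helper: true if "-->" occurs in `text` at or after `pos`.
+   After one failed scan the flag short-circuits every later call. */
+static bool has_comment_end(const char *text, size_t pos, unsigned *flags) {
+  if (*flags & FLAG_SKIP_HTML_COMMENT) {
+    return false; /* an earlier scan already failed; skip the rescan */
+  }
+  scans_performed++;
+  if (pos <= strlen(text) && strstr(text + pos, "-->") != NULL) {
+    return true;
+  }
+  *flags |= FLAG_SKIP_HTML_COMMENT; /* nothing to the right can match */
+  return false;
+}
+```
+
+With this shape, an input containing many unterminated `<!--` openers costs one full scan rather than one scan per opener.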
+
+## The Delimiter Stack
+
+Emphasis and smart punctuation use a delimiter stack. Each entry is:
+
+```c
+typedef struct delimiter {
+  struct delimiter *previous;    // Link to older delimiter
+  struct delimiter *next;        // Link to newer delimiter (towards top)
+  cmark_node *inl_text;          // The text node created for this delimiter run
+  bufsize_t position;            // Position in the input
+  bufsize_t length;              // Number of delimiter characters remaining
+  unsigned char delim_char;      // '*', '_', '\'', or '"'
+  bool can_open;                 // Whether this run can open emphasis
+  bool can_close;                // Whether this run can close emphasis
+} delimiter;
+```
+
+The stack is a doubly-linked list with `last_delim` pointing to the newest entry.
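+
+A minimal sketch of the push and unlink operations on such a list follows. `mini_delim`, `push_delim()`, and `remove_delim()` are simplified stand-ins invented for this example, and plain `calloc`/`free` replace the `cmark_mem` allocator.
+
+```c
+#include <stddef.h>
+#include <stdlib.h>
+
+/* Simplified stand-in for `delimiter`: only the links and the character. */
+typedef struct mini_delim {
+  struct mini_delim *previous;
+  struct mini_delim *next;
+  unsigned char delim_char;
+} mini_delim;
+
+/* Push a new entry on top; `*top` plays the role of `last_delim`. */
+static mini_delim *push_delim(mini_delim **top, unsigned char c) {
+  mini_delim *d = calloc(1, sizeof(*d));
+  if (d == NULL) {
+    return NULL;
+  }
+  d->delim_char = c;
+  d->previous = *top;
+  if (*top != NULL) {
+    (*top)->next = d;
+  }
+  *top = d;
+  return d;
+}
+
+/* Unlink an entry from anywhere in the stack and free it. */
+static void remove_delim(mini_delim **top, mini_delim *d) {
+  if (d == NULL) {
+    return;
+  }
+  if (d->next == NULL) {
+    *top = d->previous; /* d was the newest entry */
+  } else {
+    d->next->previous = d->previous;
+  }
+  if (d->previous != NULL) {
+    d->previous->next = d->next;
+  }
+  free(d);
+}
+```
+
+The `next` pointer is what makes removal from the middle cheap, which emphasis resolution needs when it discards delimiters between a matched opener and closer.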
+
+## The Bracket Stack
+
+Links and images use a separate bracket stack:
+
+```c
+typedef struct bracket {
+  struct bracket *previous;      // Link to older bracket
+  cmark_node *inl_text;          // The text node for '[' or '!['
+  bufsize_t position;            // Position in the input
+  bool image;                    // Whether this is an image opener '!['
+  bool active;                   // Can still match (set to false when deactivated)
+  bool bracket_after;            // Whether a '[' appeared after this bracket
+} bracket;
+```
+
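+Brackets stop being candidates for matching in two situations:
+- A `]` fails to produce a valid link or image: the opener it was paired with is removed from the bracket stack, so the same `[` is not retried.
+- A link is formed: earlier `[` openers are deactivated (`active = false`), because the spec does not allow links inside links. Image openers (`![`) are not affected by this rule.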
+
+## Emphasis Flanking Rules: `scan_delims()`
+
+```c
+static int scan_delims(subject *subj, unsigned char c, bool *can_open,
+                       bool *can_close);
+```
+
+This function determines whether a run of `*`, `_`, `'`, or `"` characters can open and/or close emphasis, following the CommonMark spec's Unicode-aware flanking rules:
+
+1. The function looks at the character **before** the run and the character **after** the run.
+2. It uses `cmark_utf8proc_iterate()` to decode the surrounding characters as full Unicode code points.
+3. It classifies them using `cmark_utf8proc_is_space()` and `cmark_utf8proc_is_punctuation_or_symbol()`.
+
+The flanking rules:
+- **Left-flanking**: numdelims > 0, character after is not a space, AND (character after is not punctuation OR character before is a space or punctuation)
+- **Right-flanking**: numdelims > 0, character before is not a space, AND (character before is not punctuation OR character after is a space or punctuation)
+
+For `*`: `can_open = left_flanking`, `can_close = right_flanking`
+
+For `_`:
+```c
+*can_open = left_flanking &&
+            (!right_flanking || cmark_utf8proc_is_punctuation_or_symbol(before_char));
+*can_close = right_flanking &&
+             (!left_flanking || cmark_utf8proc_is_punctuation_or_symbol(after_char));
+```
+
+For `'` and `"` (smart punctuation):
+```c
+*can_open = left_flanking &&
+     (!right_flanking || before_char == '(' || before_char == '[') &&
+     before_char != ']' && before_char != ')';
+*can_close = right_flanking;
+```
+
+The function advances `subj->pos` past the delimiter run and returns the number of delimiter characters consumed. For quotes, only 1 delimiter is consumed regardless of how many appear.
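+
+The `*` case of the rules above can be sketched with ASCII-only classification. This is a simplification: `star_flanking()` is a hypothetical name, `isspace()`/`ispunct()` stand in for the Unicode-aware `cmark_utf8proc_*` helpers, and the start and end of input are treated as a line ending.
+
+```c
+#include <ctype.h>
+#include <stdbool.h>
+#include <stddef.h>
+
+/* Classify the delimiter run occupying text[start, end) for '*'.
+   ASCII-only sketch; the real code decodes full code points first. */
+static void star_flanking(const char *text, size_t len, size_t start,
+                          size_t end, bool *can_open, bool *can_close) {
+  unsigned char before = start == 0 ? '\n' : (unsigned char)text[start - 1];
+  unsigned char after = end >= len ? '\n' : (unsigned char)text[end];
+  bool nonempty = end > start;
+
+  bool left_flanking =
+      nonempty && !isspace(after) &&
+      (!ispunct(after) || isspace(before) || ispunct(before));
+  bool right_flanking =
+      nonempty && !isspace(before) &&
+      (!ispunct(before) || isspace(after) || ispunct(after));
+
+  *can_open = left_flanking;
+  *can_close = right_flanking;
+}
+```
+
+For `*foo*`, the first run can only open and the second can only close; for `a * b`, the run can do neither, so the asterisk stays literal text.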
+
+## Emphasis Resolution: `S_insert_emph()`
+
+```c
+static delimiter *S_insert_emph(subject *subj, delimiter *opener,
+                                delimiter *closer);
+```
+
+When a closing delimiter is found that matches an opener on the stack, this function creates emphasis nodes:
+
+1. If both the opener and the closer have length >= 2, create a `CMARK_NODE_STRONG` node (consuming 2 characters from each).
+2. Otherwise, create a `CMARK_NODE_EMPH` node (consuming 1 character from each).
+3. All inline nodes between the opener and closer are moved to become children of the new emphasis node.
+4. Any delimiters between the opener and closer are removed from the stack.
+5. If the opener is exhausted (`length == 0`), it's removed from the stack.
+6. If the closer is exhausted, it's removed too; otherwise it stays on the stack so its remaining characters can be matched against an earlier opener.
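+
+The choice between strong and regular emphasis in steps 1 and 2 can be modeled on run lengths alone. `delims_to_use()` and `resolve_runs()` are hypothetical helpers for this sketch; they ignore node creation and any additional matching rules applied before `S_insert_emph()` is called.
+
+```c
+/* Number of delimiter characters one match consumes from each side. */
+static int delims_to_use(int opener_len, int closer_len) {
+  return (opener_len >= 2 && closer_len >= 2) ? 2 : 1;
+}
+
+/* Repeatedly match one opener run against one closer run, writing 'S'
+   (strong) or 'E' (emph) for each node created, innermost first.
+   Returns the number of nodes written; `out` holds `cap` bytes. */
+static int resolve_runs(int opener_len, int closer_len, char *out, int cap) {
+  int n = 0;
+  if (cap < 1) {
+    return 0;
+  }
+  while (opener_len > 0 && closer_len > 0 && n < cap - 1) {
+    int use = delims_to_use(opener_len, closer_len);
+    out[n++] = (use == 2) ? 'S' : 'E';
+    opener_len -= use;
+    closer_len -= use;
+  }
+  out[n] = '\0';
+  return n;
+}
+```
+
+For `***a***` (runs of 3 and 3) this yields a strong node first and an emphasis node around it, matching the nesting `<em><strong>a</strong></em>`.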
+
+## Code Span Parsing: `handle_backticks()`
+
+```c
+static cmark_node *handle_backticks(subject *subj, int options);
+```
+
+When a backtick is encountered:
+
+1. `take_while(subj, isbacktick)` consumes the opening backtick run and records its length.
+2. `scan_to_closing_backticks()` searches forward for a matching backtick run of the same length.
+
+The scanning function uses the `subj->backticks[]` array to cache positions of backtick sequences. If `subj->scanned_for_backticks` is true and the cached position for the needed length is at or before the current position, no closing run of that length can exist further on, so it immediately returns 0 (no match).
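+
+A reduced model of this cached scan is sketched below. `mini_subject`, `MINI_MAXBACKTICKS`, and `scan_to_closer()` are illustrative names, the limit is shrunk to 8, and the sketch restores `pos` itself on failure, which is an assumption made to keep the example self-contained.
+
+```c
+#include <stdbool.h>
+#include <stddef.h>
+
+#define MINI_MAXBACKTICKS 8 /* small limit chosen for illustration */
+
+typedef struct {
+  const char *data;
+  size_t len;
+  size_t pos; /* sits just past the opening backtick run */
+  size_t backticks[MINI_MAXBACKTICKS + 1];
+  bool scanned_for_backticks;
+} mini_subject;
+
+/* Returns the position just past a closing run of exactly `want`
+   backticks, or 0 if there is none. */
+static size_t scan_to_closer(mini_subject *s, size_t want) {
+  size_t start = s->pos;
+  if (want > MINI_MAXBACKTICKS) {
+    return 0;
+  }
+  if (s->scanned_for_backticks && s->backticks[want] <= s->pos) {
+    return 0; /* cache says no run of this length lies ahead */
+  }
+  while (s->pos < s->len) {
+    while (s->pos < s->len && s->data[s->pos] != '`') {
+      s->pos++;
+    }
+    size_t run = 0;
+    while (s->pos < s->len && s->data[s->pos] == '`') {
+      s->pos++;
+      run++;
+    }
+    if (run == 0) {
+      break; /* reached the end without another run */
+    }
+    if (run <= MINI_MAXBACKTICKS) {
+      s->backticks[run] = s->pos - run; /* remember where this run starts */
+    }
+    if (run == want) {
+      return s->pos;
+    }
+  }
+  s->scanned_for_backticks = true; /* whole input has now been seen */
+  s->pos = start;
+  return 0;
+}
+```
+
+After one failed full scan, every later opening run whose length has no cached position ahead of it is rejected without touching the input again.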
+
+If no closing backticks are found, the opening run is emitted as literal text. If a closing run is found, the content between the two runs is extracted and normalized via `S_normalize_code()`:
+
+```c
+static void S_normalize_code(cmark_strbuf *s) {
+  // 1. Convert each \r\n pair or lone \r to a single space
+  // 2. Convert each \n to a space
+  // 3. If content begins and ends with a space and contains non-space chars,
+  //    strip one leading and one trailing space
+}
+```
+
+## Link Parsing
+
+When `]` is encountered after an opener on the bracket stack:
+
+### Inline Links: `[text](url "title")`
+
+The parser looks for `(` immediately after `]`, then:
+1. Skips optional whitespace
+2. Tries to parse a link destination (URL)
+3. Skips optional whitespace
+4. Optionally parses a link title (in single quotes, double quotes, or parentheses)
+5. Expects `)`
+
+### Reference Links: `[text][ref]` or `[text][]` or `[text]`
+
+If the inline link syntax doesn't match, the parser tries:
+1. `[text][ref]` — explicit reference
+2. `[text][]` — collapsed reference (label = text)
+3. `[text]` — shortcut reference (label = text)
+
+Reference lookup uses `cmark_reference_lookup()` against the parser's `refmap`.
+
+### URL Cleaning
+
+```c
+unsigned char *cmark_clean_url(cmark_mem *mem, cmark_chunk *url);
+```
+
+`cmark_clean_url()` trims surrounding whitespace from the URL, unescapes HTML entities, and handles angle-bracket-delimited URLs. It returns a newly allocated string.
+
+### Autolinks
+
+```c
+static inline cmark_node *make_autolink(subject *subj, int start_column,
+                                        int end_column, cmark_chunk url,
+                                        int is_email);
+```
+
+Autolinks (`<http://example.com>` or `<user@example.com>`) are detected via the `scan_autolink_uri()` and `scan_autolink_email()` scanner functions. Email autolinks have `mailto:` prepended to the URL automatically:
+
+```c
+static unsigned char *cmark_clean_autolink(cmark_mem *mem, cmark_chunk *url,
+                                           int is_email) {
+  cmark_strbuf buf = CMARK_BUF_INIT(mem);
+  cmark_chunk_trim(url);
+  if (is_email)
+    cmark_strbuf_puts(&buf, "mailto:");
+  houdini_unescape_html_f(&buf, url->data, url->len);
+  return cmark_strbuf_detach(&buf);
+}
+```
+
+## Smart Punctuation
+
+When `CMARK_OPT_SMART` is enabled, the inline parser replaces ASCII punctuation with typographic characters, stored as UTF-8 byte sequences:
+
+```c
+static const char *EMDASH = "\xE2\x80\x94";           // —
+static const char *ENDASH = "\xE2\x80\x93";           // –
+static const char *ELLIPSES = "\xE2\x80\xA6";         // …
+static const char *LEFTDOUBLEQUOTE = "\xE2\x80\x9C";  // “
+static const char *RIGHTDOUBLEQUOTE = "\xE2\x80\x9D"; // ”
+static const char *LEFTSINGLEQUOTE = "\xE2\x80\x98";  // ‘
+static const char *RIGHTSINGLEQUOTE = "\xE2\x80\x99"; // ’
+```
+
+- `---` becomes em dash (—)
+- `--` becomes en dash (–)
+- `...` becomes ellipsis (…)
+- `'` and `"` are converted to curly quotes using the delimiter stack (open/close logic)
+
+## Hard and Soft Line Breaks
+
+- **Hard line break**: Two or more spaces before a line ending, or a backslash before a line ending. Creates a `CMARK_NODE_LINEBREAK` node.
+- **Soft line break**: A line ending not preceded by spaces or backslash. Creates a `CMARK_NODE_SOFTBREAK` node.
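+
+The distinction can be sketched as a check on the bytes just before the line ending. `classify_line_break()` is a hypothetical helper; the real parser reaches the same outcome through its backslash and newline handlers rather than by looking backwards. An even number of trailing backslashes means the last one is itself escaped, so the sketch counts them.
+
+```c
+#include <stddef.h>
+
+typedef enum { BREAK_SOFT, BREAK_HARD } break_kind;
+
+/* `nl` is the index of the line ending character within `text`. */
+static break_kind classify_line_break(const char *text, size_t nl) {
+  /* Count consecutive backslashes directly before the line ending. */
+  size_t slashes = 0;
+  while (slashes < nl && text[nl - 1 - slashes] == '\\') {
+    slashes++;
+  }
+  if (slashes % 2 == 1) {
+    return BREAK_HARD; /* an unescaped backslash ends the line */
+  }
+  if (nl >= 2 && text[nl - 1] == ' ' && text[nl - 2] == ' ') {
+    return BREAK_HARD; /* two or more trailing spaces */
+  }
+  return BREAK_SOFT;
+}
+```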
+
+## Special Character Dispatch
+
+```c
+static bufsize_t subject_find_special_char(subject *subj, int options);
+```
+
+This function scans forward from `subj->pos` looking for the next special character that needs inline processing. Special characters include:
+- Line endings (`\r`, `\n`)
+- Backtick (`` ` ``)
+- Backslash (`\`)
+- Ampersand (`&`)
+- Less-than (`<`)
+- Open bracket (`[`)
+- Close bracket (`]`)
+- Exclamation mark (`!`)
+- Emphasis characters (`*`, `_`)
+
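+When `CMARK_OPT_SMART` is set, the smart punctuation characters (`'`, `"`, `-`, `.`) are treated as special as well, which is why the function takes `options`.
+
+Any text between special characters is collected as a `CMARK_NODE_TEXT` node.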
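+
+One common way to implement such a scan is a 256-entry lookup table indexed by byte value. The sketch below assumes that approach; `special`, `init_special_table()`, and `find_special()` are illustrative names, and only the default character set listed above is included.
+
+```c
+#include <stdbool.h>
+#include <stddef.h>
+
+/* One flag per byte value; true means the byte needs inline handling. */
+static bool special[256];
+
+static void init_special_table(void) {
+  const char *chars = "\r\n`\\&<[]!*_";
+  for (const char *p = chars; *p != '\0'; p++) {
+    special[(unsigned char)*p] = true;
+  }
+}
+
+/* Returns the index of the next special byte at or after `pos`,
+   or `len` if the rest of the text is plain. */
+static size_t find_special(const char *text, size_t len, size_t pos) {
+  while (pos < len && !special[(unsigned char)text[pos]]) {
+    pos++;
+  }
+  return pos;
+}
+```
+
+Everything between the starting position and the returned index can then be emitted as a single text node.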
+
+## Source Position Tracking
+
+```c
+static void adjust_subj_node_newlines(subject *subj, cmark_node *node,
+                                     int matchlen, int extra, int options);
+```
+
+When `CMARK_OPT_SOURCEPOS` is enabled, this function adjusts source positions for multi-line inline constructs. It counts newlines in the just-matched span and updates:
+- `subj->line` — incremented by the number of newlines
+- `node->end_line` — adjusted for multi-line spans
+- `node->end_column` — set to characters after the last newline
+- `subj->column_offset` — adjusted for correct subsequent position calculations
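+
+The line and column part of this bookkeeping can be sketched as follows. The struct types and `adjust_for_newlines()` are simplified stand-ins invented for the example, and the `column_offset` update is left out because its exact formula is not given here.
+
+```c
+#include <stddef.h>
+
+/* Simplified stand-ins for the fields named above. */
+typedef struct { int line; } mini_subject_pos;
+typedef struct { int end_line; int end_column; } mini_node_pos;
+
+/* Counts '\n' bytes in text[from, from + len). When at least one is
+   found, stores the number of bytes after the last one. */
+static int count_newlines(const char *text, size_t from, size_t len,
+                          int *since_newline) {
+  int newlines = 0;
+  int since = 0;
+  for (size_t i = from; i < from + len; i++) {
+    if (text[i] == '\n') {
+      newlines++;
+      since = 0;
+    } else {
+      since++;
+    }
+  }
+  if (newlines > 0) {
+    *since_newline = since;
+  }
+  return newlines;
+}
+
+/* Apply the adjustments for a span that was just matched. */
+static void adjust_for_newlines(mini_subject_pos *subj, mini_node_pos *node,
+                                const char *text, size_t from, size_t len) {
+  int since = 0;
+  int newlines = count_newlines(text, from, len, &since);
+  if (newlines > 0) {
+    subj->line += newlines;
+    node->end_line += newlines;
+    node->end_column = since;
+  }
+}
+```
+
+Spans without a newline leave every field untouched, so single-line constructs pay only for the count.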
+
+## Inline Node Factory Functions
+
+The inline parser creates nodes through lightweight factory macros and attaches them with a fast internal append helper:
+
+```c
+// Macros for simple nodes
+#define make_linebreak(mem) make_simple(mem, CMARK_NODE_LINEBREAK)
+#define make_softbreak(mem) make_simple(mem, CMARK_NODE_SOFTBREAK)
+#define make_emph(mem) make_simple(mem, CMARK_NODE_EMPH)
+#define make_strong(mem) make_simple(mem, CMARK_NODE_STRONG)
+```
+
+```c
+// Fast child appending (bypasses S_can_contain validation)
+static void append_child(cmark_node *node, cmark_node *child) {
+  cmark_node *old_last_child = node->last_child;
+  child->next = NULL;
+  child->prev = old_last_child;
+  child->parent = node;
+  node->last_child = child;
+  if (old_last_child) {
+    old_last_child->next = child;
+  } else {
+    node->first_child = child;
+  }
+}
+```
+
+This `append_child()` is a simplified version of the public `cmark_node_append_child()`. It skips the containment check because the inline parser only ever produces valid parent/child combinations.
+
+## The Main Parse Loop
+
+```c
+void cmark_parse_inlines(cmark_mem *mem, cmark_node *parent,
+                         cmark_reference_map *refmap, int options);
+```
+
+This function initializes a `subject` from the parent node's `data` field, then repeatedly calls `parse_inline()` until the input is exhausted. Each call to `parse_inline()` finds the next special character, emits any preceding text as a `CMARK_NODE_TEXT`, and dispatches to the appropriate handler.
+
+After all characters are processed, `process_emphasis()` runs over the remaining delimiter stack to resolve any outstanding emphasis, and the leftover delimiter and bracket entries are then freed.
+
+## Cross-References
+
+- [inlines.c](../../cmark/src/inlines.c) — Full implementation
+- [inlines.h](../../cmark/src/inlines.h) — Internal API declarations
+- [block-parsing.md](block-parsing.md) — Phase 1 that produces the input for inline parsing
+- [reference-system.md](reference-system.md) — How link references are stored and looked up
+- [scanner-system.md](scanner-system.md) — Scanner functions for HTML tags, autolinks, etc.
+- [utf8-handling.md](utf8-handling.md) — Unicode character classification for flanking rules