summaryrefslogtreecommitdiff
path: root/docs/handbook/cmark/code-style.md
diff options
context:
space:
mode:
Diffstat (limited to 'docs/handbook/cmark/code-style.md')
-rw-r--r--docs/handbook/cmark/code-style.md293
1 files changed, 293 insertions, 0 deletions
diff --git a/docs/handbook/cmark/code-style.md b/docs/handbook/cmark/code-style.md
new file mode 100644
index 0000000000..0ac2af2def
--- /dev/null
+++ b/docs/handbook/cmark/code-style.md
@@ -0,0 +1,293 @@
+# cmark — Code Style and Conventions
+
+## Overview
+
+This document describes the coding conventions and patterns used throughout the cmark codebase. Understanding these conventions makes the source code easier to navigate.
+
+## Naming Conventions
+
+### Public API Functions
+
+All public functions use the `cmark_` prefix:
+```c
+cmark_node *cmark_node_new(cmark_node_type type);
+cmark_parser *cmark_parser_new(int options);
+char *cmark_render_html(cmark_node *root, int options);
+```
+
+### Internal (Static) Functions
+
+File-local static functions use the `S_` prefix:
+```c
+static void S_render_node(cmark_node *node, cmark_event_type ev_type,
+ struct render_state *state, int options);
+static cmark_node *S_node_new(cmark_node_type type, cmark_mem *mem);
+static void S_free_nodes(cmark_node *e);
+static bool S_is_leaf(cmark_node *node);
+static int S_get_enumlevel(cmark_node *node);
+```
+
+This convention makes it immediately clear whether a function has file-local scope.
+
+### Internal (Non-Static) Functions
+
+Functions that are internal to the library but shared across translation units use:
+- `cmark_` prefix (same as public) — declared in private headers (e.g., `parser.h`, `node.h`)
+- No `S_` prefix
+
+Examples:
+```c
+// In node.h (private header):
+void cmark_node_set_type(cmark_node *node, cmark_node_type type);
+cmark_node *make_block(cmark_mem *mem, cmark_node_type type,
+ int start_line, int start_column);
+```
+
+### Struct Members
+
+No prefix convention — struct members use plain names:
+```c
+struct cmark_node {
+ cmark_mem *mem;
+ cmark_node *next;
+ cmark_node *prev;
+ cmark_node *parent;
+ cmark_node *first_child;
+ cmark_node *last_child;
+ // ...
+};
+```
+
+### Type Names
+
+Typedefs use the `cmark_` prefix:
+```c
+typedef struct cmark_node cmark_node;
+typedef struct cmark_parser cmark_parser;
+typedef struct cmark_iter cmark_iter;
+typedef int32_t bufsize_t; // Exception: no cmark_ prefix
+```
+
+### Enum Values
+
+Enum constants use the `CMARK_` prefix with UPPER_CASE:
+```c
+typedef enum {
+ CMARK_NODE_NONE,
+ CMARK_NODE_DOCUMENT,
+ CMARK_NODE_BLOCK_QUOTE,
+ // ...
+} cmark_node_type;
+```
+
+### Preprocessor Macros
+
+Macros use UPPER_CASE, sometimes with `CMARK_` prefix:
+```c
+#define CMARK_OPT_SOURCEPOS (1 << 1)
+#define CMARK_BUF_INIT(mem) { mem, cmark_strbuf__initbuf, 0, 0 }
+#define MAX_LINK_LABEL_LENGTH 999
+#define CODE_INDENT 4
+```
+
+## Error Handling Patterns
+
+### Allocation Failure
+
+The default allocator (`xcalloc`, `xrealloc`) aborts on failure:
+```c
+static void *xcalloc(size_t nmemb, size_t size) {
+ void *ptr = calloc(nmemb, size);
+ if (!ptr) abort();
+ return ptr;
+}
+```
+
+Functions that allocate never return NULL — they either succeed or terminate. This eliminates NULL-check boilerplate throughout the codebase.
+
+### Invalid Input
+
+Functions that receive invalid arguments typically:
+1. Return 0/false/NULL for queries
+2. Do nothing for mutations
+3. Never crash
+
+Example from `node.c`:
+```c
+int cmark_node_set_heading_level(cmark_node *node, int level) {
+ if (node == NULL || node->type != CMARK_NODE_HEADING) return 0;
+ if (level < 1 || level > 6) return 0;
+ node->as.heading.level = level;
+ return 1;
+}
+```
+
+### Return Conventions
+
+- **0/1 for success/failure**: Setter functions return 1 on success, 0 on failure
+- **NULL for not found**: Lookup functions return NULL when the item doesn't exist
+- **Assertion for invariants**: Internal invariants use `assert()`:
+ ```c
+ assert(googled_node->type == CMARK_NODE_DOCUMENT);
+ ```
+
+## Header Guard Style
+
+```c
+#ifndef CMARK_NODE_H
+#define CMARK_NODE_H
+// ...
+#endif
+```
+
+Guards use `CMARK_` prefix + uppercase filename + `_H`.
+
+## Include Patterns
+
+### Public Headers
+```c
+#include "cmark.h" // Always first — provides all public types
+```
+
+### Private Headers
+```c
+#include "node.h" // Internal node definitions
+#include "parser.h" // Parser internals
+#include "buffer.h" // cmark_strbuf
+#include "chunk.h" // cmark_chunk
+#include "references.h" // Reference map
+#include "utf8.h" // UTF-8 utilities
+#include "scanners.h" // re2c-generated scanners
+```
+
+### System Headers
+```c
+#include <stdlib.h>
+#include <string.h>
+#include <assert.h>
+#include <stdio.h>
+```
+
+## Inline Functions
+
+The `CMARK_INLINE` macro abstracts compiler-specific inline syntax:
+```c
+#ifdef _MSC_VER
+#define CMARK_INLINE __forceinline
+#else
+#define CMARK_INLINE __inline__
+#endif
+```
+
+Used for small, hot-path functions in headers:
+```c
+static CMARK_INLINE void cmark_chunk_free(cmark_mem *mem, cmark_chunk *c) { ... }
+static CMARK_INLINE cmark_chunk cmark_chunk_dup(...) { ... }
+```
+
+## Memory Ownership Patterns
+
+### Owning vs Non-Owning
+
+The `cmark_chunk` type makes ownership explicit:
+- `alloc > 0` → the chunk owns the memory and must free it
+- `alloc == 0` → the chunk borrows memory from elsewhere
+
+### Transfer of Ownership
+
+`cmark_strbuf_detach()` transfers ownership from a strbuf to the caller:
+```c
+unsigned char *data = cmark_strbuf_detach(&buf);
+// Caller now owns 'data' and must free it
+```
+
+### Consistent Cleanup
+
+Free functions null out pointers after freeing:
+```c
+static CMARK_INLINE void cmark_chunk_free(cmark_mem *mem, cmark_chunk *c) {
+ if (c->alloc)
+ mem->free((void *)c->data);
+ c->data = NULL; // NULL after free
+ c->alloc = 0;
+ c->len = 0;
+}
+```
+
+## Iterative vs Recursive Patterns
+
+The codebase avoids recursion for tree operations to prevent stack overflow on deeply nested input:
+
+### Iterative Tree Destruction
+`S_free_nodes()` uses sibling-list splicing instead of recursion:
+```c
+// Splice children into sibling chain
+if (e->first_child) {
+ cmark_node *last = e->last_child;
+ last->next = e->next;
+ e->next = e->first_child;
+}
+```
+
+### Iterator-Based Traversal
+All rendering uses `cmark_iter` instead of recursive `render_children()`:
+```c
+while ((ev_type = cmark_iter_next(iter)) != CMARK_EVENT_DONE) {
+ cur = cmark_iter_get_node(iter);
+ S_render_node(cur, ev_type, &state, options);
+}
+```
+
+## Type Size Definitions
+
+```c
+typedef int32_t bufsize_t;
+```
+
+Buffer sizes use `int32_t` (not `size_t`) to:
+1. Allow negative values for error signaling
+2. Keep node structs compact (32-bit vs 64-bit on LP64)
+3. Limit maximum allocation to 2GB (adequate for text processing)
+
+## Bitmask Patterns
+
+Option flags use single-bit constants:
+```c
+#define CMARK_OPT_SOURCEPOS (1 << 1)
+#define CMARK_OPT_HARDBREAKS (1 << 2)
+#define CMARK_OPT_UNSAFE (1 << 17)
+#define CMARK_OPT_NOBREAKS (1 << 4)
+#define CMARK_OPT_VALIDATE_UTF8 (1 << 9)
+#define CMARK_OPT_SMART (1 << 10)
+```
+
+Tested with bitwise AND:
+```c
+if (options & CMARK_OPT_SOURCEPOS) { ... }
+```
+
+Combined with bitwise OR:
+```c
+int options = CMARK_OPT_SOURCEPOS | CMARK_OPT_SMART;
+```
+
+## Leaf Mask Pattern
+
+`S_is_leaf()` in `iterator.c` uses a bitmask for O(1) node-type classification:
+```c
+static const int S_leaf_mask =
+ (1 << CMARK_NODE_HTML_BLOCK) | (1 << CMARK_NODE_THEMATIC_BREAK) |
+ (1 << CMARK_NODE_CODE_BLOCK) | (1 << CMARK_NODE_TEXT) | ...;
+
+static bool S_is_leaf(cmark_node *node) {
+ return ((1 << node->type) & S_leaf_mask) != 0;
+}
+```
+
+This is more efficient than a switch statement for a simple boolean classification.
+
+## Cross-References
+
+- [architecture.md](architecture.md) — Design decisions
+- [memory-management.md](memory-management.md) — Allocator patterns
+- [public-api.md](public-api.md) — Public API naming