# cmark — Memory Management
## Overview
cmark's memory management is built around three concepts:
1. **Pluggable allocator** (`cmark_mem`) — a function-pointer table for calloc/realloc/free
2. **Owning buffer** (`cmark_strbuf`) — a growable byte buffer that owns its memory
3. **Non-owning slice** (`cmark_chunk`) — a view into either a `cmark_strbuf` or external memory
## Pluggable Allocator
### `cmark_mem` Structure
```c
typedef struct cmark_mem {
  void *(*calloc)(size_t, size_t);
  void *(*realloc)(void *, size_t);
  void (*free)(void *);
} cmark_mem;
```
All allocation throughout cmark respects this interface. Every node, buffer, parser, and iterator receives a `cmark_mem *` and uses it for all allocations.
### Default Allocator
```c
static void *xcalloc(size_t nmemb, size_t size) {
  void *ptr = calloc(nmemb, size);
  if (!ptr) abort();
  return ptr;
}

static void *xrealloc(void *ptr, size_t size) {
  void *new_ptr = realloc(ptr, size);
  if (!new_ptr) abort();
  return new_ptr;
}

cmark_mem DEFAULT_MEM_ALLOCATOR = {xcalloc, xrealloc, free};
```
The default allocator wraps the standard `calloc`, `realloc`, and `free`, calling `abort()` when an allocation fails. With this allocator, cmark code never sees a NULL result from an allocation: the process terminates on out-of-memory instead.
### Getting the Default Allocator
```c
cmark_mem *cmark_get_default_mem_allocator(void) {
  return &DEFAULT_MEM_ALLOCATOR;
}
```
### Custom Allocator Usage
Users can provide custom allocators (arena allocators, debug allocators, etc.) via:
```c
cmark_parser *cmark_parser_new_with_mem(int options, cmark_mem *mem);
cmark_node *cmark_node_new_with_mem(cmark_node_type type, cmark_mem *mem);
```
The allocator propagates: nodes created by the parser inherit the parser's allocator. Iterators use the root node's allocator.
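As an illustration of the kind of debug allocator mentioned above, here is a minimal sketch of a counting allocator. To keep it compilable without cmark headers, it uses a stand-in struct `demo_mem` with the same three function pointers as `cmark_mem`; `demo_mem`, `live_blocks`, `COUNTING_MEM`, and `counting_demo` are hypothetical names introduced for this example only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in with the same shape as cmark_mem (assumption: used here only so
 * the sketch compiles without cmark headers). */
typedef struct demo_mem {
  void *(*calloc)(size_t, size_t);
  void *(*realloc)(void *, size_t);
  void (*free)(void *);
} demo_mem;

/* Hypothetical bookkeeping: number of blocks currently outstanding. */
static long live_blocks = 0;

static void *counting_calloc(size_t nmemb, size_t size) {
  void *ptr = calloc(nmemb, size);
  if (!ptr) abort();         /* same abort-on-failure policy as the default */
  live_blocks++;
  return ptr;
}

static void *counting_realloc(void *ptr, size_t size) {
  void *new_ptr = realloc(ptr, size);
  if (!new_ptr) abort();
  if (!ptr) live_blocks++;   /* realloc(NULL, n) creates a new block */
  return new_ptr;
}

static void counting_free(void *ptr) {
  if (ptr) live_blocks--;
  free(ptr);
}

static demo_mem COUNTING_MEM = {counting_calloc, counting_realloc, counting_free};

/* Allocate, grow, and release one block; returns the blocks still live. */
long counting_demo(void) {
  void *p = COUNTING_MEM.calloc(4, 8);
  assert(live_blocks == 1);
  p = COUNTING_MEM.realloc(p, 64);
  assert(live_blocks == 1);  /* resizing does not change the count */
  COUNTING_MEM.free(p);
  return live_blocks;
}
```

With the real library, a table of this shape declared as `cmark_mem` would presumably be passed to `cmark_parser_new_with_mem()`; a nonzero `live_blocks` after freeing the parser and the AST would then point at a leak.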
## Growable Buffer (`cmark_strbuf`)
### Structure
```c
struct cmark_strbuf {
  cmark_mem *mem;
  unsigned char *ptr;
  bufsize_t asize; // allocated size
  bufsize_t size;  // used size (excluding NUL terminator)
};
```
### Initialization
```c
#define CMARK_BUF_INIT(mem) { mem, cmark_strbuf__initbuf, 0, 0 }
```
`cmark_strbuf__initbuf` is a static empty buffer that avoids allocating for empty strings:
```c
unsigned char cmark_strbuf__initbuf[1] = {0};
```
As a result, a buffer set up with `CMARK_BUF_INIT` that has not yet allocated anything points to a shared static empty string rather than NULL, so code that reads `buf->ptr` needs no NULL check.
### Growth Strategy
```c
void cmark_strbuf_grow(cmark_strbuf *buf, bufsize_t target_size) {
  // Start from the 8-byte minimum, or from the current capacity if the
  // buffer already has an allocation
  bufsize_t new_size = 8;
  if (buf->asize) {
    new_size = buf->asize;
  }
  // Double until the capacity covers the requested size
  while (new_size < target_size) {
    new_size *= 2;
  }
  // The shared static init buffer is not heap memory, so it must never be
  // passed to realloc; the first real allocation uses calloc instead
  if (buf->ptr == cmark_strbuf__initbuf) {
    buf->ptr = (unsigned char *)buf->mem->calloc(new_size, 1);
  } else {
    buf->ptr = (unsigned char *)buf->mem->realloc(buf->ptr, new_size);
  }
  buf->asize = new_size;
}
```
The growth strategy doubles the capacity each time, ensuring amortized O(1) appends. Minimum capacity is 8 bytes.
When the buffer transitions from the shared static init to a real allocation, `calloc` is used (zero-initialized). Subsequent growths use `realloc`.
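To make the amortized cost concrete, the sketch below isolates the doubling rule as described above and counts how many reallocations a run of one-byte appends would trigger. `next_capacity`, `count_growths`, and `growth_demo` are hypothetical helpers, and the `bufsize_t` typedef is an assumption (a signed 32-bit type); very large targets could overflow the doubling and are not handled here.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t bufsize_t; /* assumption: a signed 32-bit size type */

/* Capacity the doubling rule picks for a request of target_size bytes. */
bufsize_t next_capacity(bufsize_t current, bufsize_t target_size) {
  bufsize_t new_size = current ? current : 8; /* 8-byte minimum */
  while (new_size < target_size)
    new_size *= 2;
  return new_size;
}

/* Number of growth steps needed for n one-byte appends, starting empty. */
int count_growths(bufsize_t n) {
  bufsize_t cap = 0;
  int growths = 0;
  for (bufsize_t size = 1; size <= n; size++) {
    if (size > cap) {
      cap = next_capacity(cap, size);
      growths++;
    }
  }
  return growths;
}

void growth_demo(void) {
  assert(next_capacity(0, 1) == 8);    /* first allocation: the minimum */
  assert(next_capacity(8, 9) == 16);   /* one doubling */
  assert(count_growths(1000) == 8);    /* capacities 8, 16, ..., 1024 */
}
```

A thousand appends cost only eight growth steps under this rule, which is the logarithmic reallocation count behind the amortized O(1) claim.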
### Key Operations
```c
// Appending
void cmark_strbuf_put(cmark_strbuf *buf, const unsigned char *data, bufsize_t len);
void cmark_strbuf_puts(cmark_strbuf *buf, const char *string);
void cmark_strbuf_putc(cmark_strbuf *buf, int c);
// Printf-style
void cmark_strbuf_printf(cmark_strbuf *buf, const char *fmt, ...);
void cmark_strbuf_vprintf(cmark_strbuf *buf, const char *fmt, va_list ap);
// Manipulation
void cmark_strbuf_clear(cmark_strbuf *buf); // Reset size to 0, keep allocation
void cmark_strbuf_set(cmark_strbuf *buf, const unsigned char *data, bufsize_t len);
void cmark_strbuf_sets(cmark_strbuf *buf, const char *string);
void cmark_strbuf_copy_cstr(char *data, bufsize_t datasize, const cmark_strbuf *buf);
void cmark_strbuf_swap(cmark_strbuf *a, cmark_strbuf *b);
// Whitespace
void cmark_strbuf_trim(cmark_strbuf *buf); // Trim leading and trailing whitespace
void cmark_strbuf_normalize_whitespace(cmark_strbuf *buf); // Collapse runs to single space
void cmark_strbuf_unescape(cmark_strbuf *buf); // Process backslash escapes
// Lifecycle
unsigned char *cmark_strbuf_detach(cmark_strbuf *buf); // Return ptr, reset buf to init
void cmark_strbuf_free(cmark_strbuf *buf); // Free memory, reset to init
```
### `cmark_strbuf_detach()`
```c
unsigned char *cmark_strbuf_detach(cmark_strbuf *buf) {
  unsigned char *data = buf->ptr;
  if (buf->asize == 0) {
    // Never allocated — return a new empty string
    data = (unsigned char *)buf->mem->calloc(1, 1);
  }
  // Reset buffer to initial state
  buf->ptr = cmark_strbuf__initbuf;
  buf->asize = 0;
  buf->size = 0;
  return data;
}
```
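Transfers ownership of the buffer's memory to the caller. The buffer is reset to the empty init state. The caller must release the returned pointer with the `free` function of the same allocator the buffer used (`buf->mem->free`), which is the standard `free()` only when the default allocator is in effect. A never-allocated buffer yields a freshly allocated empty string, so the result can always be freed.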
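The detach contract can be sketched with a stripped-down buffer. This is an illustration rather than cmark's code: `mini_buf`, `empty_sentinel`, and the `mini_buf_*` functions are hypothetical, and the sketch calls the C library allocator directly instead of going through a `cmark_mem` table.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Plays the role of cmark_strbuf__initbuf: a shared, static empty string. */
static unsigned char empty_sentinel[1] = {0};

typedef struct {
  unsigned char *ptr;
  int asize; /* 0 means "still pointing at the sentinel" */
  int size;
} mini_buf;

void mini_buf_init(mini_buf *b) {
  b->ptr = empty_sentinel;
  b->asize = 0;
  b->size = 0;
}

/* Append len bytes, keeping the content NUL-terminated. */
void mini_buf_put(mini_buf *b, const char *data, int len) {
  int needed = b->size + len + 1;
  if (needed > b->asize) {
    int cap = b->asize ? b->asize : 8;
    while (cap < needed) cap *= 2;
    /* Never pass the sentinel to realloc: it is not heap memory. */
    unsigned char *p = realloc(b->asize ? b->ptr : NULL, (size_t)cap);
    if (!p) abort();
    b->ptr = p;
    b->asize = cap;
  }
  memcpy(b->ptr + b->size, data, (size_t)len);
  b->size += len;
  b->ptr[b->size] = '\0';
}

/* Hand the heap block to the caller and reset the buffer to the sentinel. */
unsigned char *mini_buf_detach(mini_buf *b) {
  unsigned char *data = b->ptr;
  if (b->asize == 0) {
    data = calloc(1, 1); /* the caller will free the result, so never
                            hand out the static sentinel */
    if (!data) abort();
  }
  mini_buf_init(b);
  return data;
}

void detach_demo(void) {
  mini_buf b;
  mini_buf_init(&b);
  mini_buf_put(&b, "hello", 5);
  unsigned char *s = mini_buf_detach(&b);
  assert(strcmp((char *)s, "hello") == 0);
  assert(b.ptr == empty_sentinel && b.size == 0); /* buffer is reusable */
  free(s);                                        /* caller owns s */
}
```

After the detach the buffer and the returned pointer are independent: the buffer can be reused immediately, and freeing the string does not touch it.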
### Whitespace Normalization
```c
void cmark_strbuf_normalize_whitespace(cmark_strbuf *s) {
  bool last_char_was_space = false;
  bufsize_t r, w;
  for (r = 0, w = 0; r < s->size; r++) {
    if (cmark_isspace(s->ptr[r])) {
      if (!last_char_was_space) {
        s->ptr[w++] = ' ';
        last_char_was_space = true;
      }
    } else {
      s->ptr[w++] = s->ptr[r];
      last_char_was_space = false;
    }
  }
  cmark_strbuf_truncate(s, w);
}
```
Collapses consecutive whitespace into a single space. Uses an in-place read/write cursor technique.
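The read/write cursor technique works on any mutable byte array, as the following standalone sketch shows. It substitutes the standard `isspace` for `cmark_isspace` (an assumption: the two may classify some bytes differently), and `collapse_whitespace` and `collapse_demo` are hypothetical names.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Collapse each whitespace run in s[0..len) to a single space, in place.
 * The caller's array must hold len + 1 bytes. Returns the new length. */
size_t collapse_whitespace(char *s, size_t len) {
  size_t r, w = 0;        /* r reads ahead, w trails behind and writes */
  int last_was_space = 0;
  for (r = 0; r < len; r++) {
    if (isspace((unsigned char)s[r])) {
      if (!last_was_space) {
        s[w++] = ' ';     /* first whitespace of a run becomes one space */
        last_was_space = 1;
      }
    } else {
      s[w++] = s[r];
      last_was_space = 0;
    }
  }
  s[w] = '\0';            /* w <= len, so this stays inside the array */
  return w;
}

void collapse_demo(void) {
  char s[] = "a  \t b\n\nc";
  size_t n = collapse_whitespace(s, strlen(s));
  assert(n == 5);
  assert(strcmp(s, "a b c") == 0);
}
```

Because `w` never overtakes `r`, no byte is overwritten before it has been read, which is what makes the in-place rewrite safe. Leading and trailing runs are collapsed to one space each, not removed; trimming is a separate operation.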
### Backslash Unescape
```c
void cmark_strbuf_unescape(cmark_strbuf *buf) {
  bufsize_t r, w;
  for (r = 0, w = 0; r < buf->size; r++) {
    if (buf->ptr[r] == '\\' && cmark_ispunct(buf->ptr[r + 1]))
      r++;
    buf->ptr[w++] = buf->ptr[r];
  }
  cmark_strbuf_truncate(buf, w);
}
```
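Removes the backslash from each backslash-plus-punctuation escape, in place. When the last content byte is a backslash, the lookahead `buf->ptr[r + 1]` reads one position past the content; this relies on the buffer keeping a NUL terminator after its `size` bytes, which is not punctuation, so the trailing backslash is kept as-is.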
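The same loop can be tried on a plain C string. The sketch below uses the standard `ispunct` in place of `cmark_ispunct` (an assumption: in the default "C" locale it covers the ASCII punctuation characters), and `unescape_in_place` and `unescape_demo` are hypothetical names.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Drop each backslash that precedes a punctuation character, in place.
 * s must be NUL-terminated: the lookahead s[r + 1] may read the terminator.
 * Returns the new length. */
size_t unescape_in_place(char *s) {
  size_t len = strlen(s), r, w = 0;
  for (r = 0; r < len; r++) {
    if (s[r] == '\\' && ispunct((unsigned char)s[r + 1]))
      r++;          /* skip the backslash, keep the escaped character */
    s[w++] = s[r];
  }
  s[w] = '\0';
  return w;
}

void unescape_demo(void) {
  char a[] = "a\\*b\\c";   /* the six characters  a \ * b \ c  */
  unescape_in_place(a);
  assert(strcmp(a, "a*b\\c") == 0); /* \* unescaped, \c left alone */

  char b[] = "x\\";        /* trailing backslash: lookahead sees the NUL */
  unescape_in_place(b);
  assert(strcmp(b, "x\\") == 0);
}
```

A backslash before a letter is preserved, and an escaped backslash (`\\` in the input text) collapses to a single backslash because the backslash is itself punctuation.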
## Non-Owning Slice (`cmark_chunk`)
### Structure
```c
typedef struct {
  const unsigned char *data;
  bufsize_t len;
  bufsize_t alloc; // 0 if non-owning, > 0 if owning
} cmark_chunk;
```
A `cmark_chunk` is either:
- **Non-owning** (`alloc == 0`): Points into someone else's memory (e.g., the parser's input buffer)
- **Owning** (`alloc > 0`): Owns its `data` pointer and must free it
### Key Operations
```c
// Create an owning chunk (takes over the buffer's memory)
static CMARK_INLINE cmark_chunk cmark_chunk_buf_detach(cmark_strbuf *buf);
// Create a non-owning reference
static CMARK_INLINE cmark_chunk cmark_chunk_literal(const char *data);
static CMARK_INLINE cmark_chunk cmark_chunk_dup(const cmark_chunk *ch, bufsize_t pos, bufsize_t len);
// Free (only if owning)
static CMARK_INLINE void cmark_chunk_free(cmark_mem *mem, cmark_chunk *c) {
  if (c->alloc)
    mem->free((void *)c->data);
  c->data = NULL;
  c->alloc = 0;
  c->len = 0;
}
```
### Ownership Transfer
`cmark_chunk_buf_detach()` takes ownership of a `cmark_strbuf`'s memory:
```c
static CMARK_INLINE cmark_chunk cmark_chunk_buf_detach(cmark_strbuf *buf) {
  cmark_chunk c;
  c.len = buf->size;
  c.data = cmark_strbuf_detach(buf);
  c.alloc = 1; // Now owns the data
  return c;
}
```
### Non-Owning References
`cmark_chunk_dup()` creates a non-owning view into existing memory:
```c
static CMARK_INLINE cmark_chunk cmark_chunk_dup(const cmark_chunk *ch,
                                                bufsize_t pos, bufsize_t len) {
  cmark_chunk c = {ch->data + pos, len, 0}; // alloc = 0: non-owning
  return c;
}
```
This is used extensively during parsing to avoid copying strings. For example, text node content during inline parsing initially points into the parser's line buffer. The data needs to be copied only when the node must outlive the memory it points into.
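The borrow-first, copy-later pattern can be sketched with a self-contained slice type. This is an illustration under assumptions, not cmark's implementation: `slice`, `slice_view`, `slice_make_owned`, `slice_free`, and `slice_demo` are hypothetical, and the sketch uses `malloc`/`free` directly where cmark would go through a `cmark_mem`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
  const unsigned char *data;
  int len;
  int alloc; /* 0: borrowed view, nonzero: owns data */
} slice;

/* Borrow len bytes starting at pos; no copy is made. */
slice slice_view(const unsigned char *base, int pos, int len) {
  slice s = {base + pos, len, 0};
  return s;
}

/* Promote a borrowed view to an owned, NUL-terminated copy. */
void slice_make_owned(slice *s) {
  if (s->alloc) return; /* already owning: nothing to do */
  unsigned char *copy = malloc((size_t)s->len + 1);
  if (!copy) abort();
  memcpy(copy, s->data, (size_t)s->len);
  copy[s->len] = '\0';
  s->data = copy;
  s->alloc = 1;
}

/* Release only memory the slice owns, then reset it. */
void slice_free(slice *s) {
  if (s->alloc) free((void *)s->data);
  s->data = NULL;
  s->len = 0;
  s->alloc = 0;
}

/* Returns 1 if the owned copy survives a change to the original line. */
int slice_demo(void) {
  unsigned char line[] = "hello *world*";
  slice word = slice_view(line, 7, 5);   /* points at "world" inside line */
  assert(word.data == line + 7 && word.alloc == 0);
  slice_make_owned(&word);               /* now independent of line */
  line[7] = 'W';
  int unchanged = memcmp(word.data, "world", 5) == 0;
  slice_free(&word);
  return unchanged;
}
```

While the slice is borrowed it costs nothing beyond a pointer and a length, but it is only valid as long as `line` is; the promotion step is the point at which that lifetime dependency is cut.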
## Node Memory Management
### Node Allocation
```c
static cmark_node *S_node_new(cmark_node_type type, cmark_mem *mem) {
  cmark_node *node = (cmark_node *)mem->calloc(1, sizeof(*node));
  cmark_strbuf_init(mem, &node->content, 0);
  node->type = (uint16_t)type;
  node->mem = mem;
  return node;
}
```
Nodes are zero-initialized via `calloc`. The `mem` pointer is stored on the node for later freeing.
### Node Deallocation
```c
static void S_free_nodes(cmark_node *e) {
  cmark_node *next;
  while (e != NULL) {
    // Free type-specific data
    switch (e->type) {
    case CMARK_NODE_CODE_BLOCK:
      cmark_chunk_free(e->mem, &e->as.code.info);
      cmark_chunk_free(e->mem, &e->as.literal);
      break;
    case CMARK_NODE_LINK:
    case CMARK_NODE_IMAGE:
      e->mem->free(e->as.link.url);
      e->mem->free(e->as.link.title);
      break;
    // ... other types
    }
    // Splice children into the list right after e, so they are freed next
    if (e->first_child) {
      cmark_node *last = e->last_child;
      last->next = e->next;
      e->next = e->first_child;
    }
    // Advance and free
    next = e->next;
    e->mem->free(e);
    e = next;
  }
}
```
This is an iterative (non-recursive) destructor that avoids stack overflow on deeply nested ASTs. The key technique is **sibling-list splicing**: a node's children are linked into the `next` chain immediately after the node itself, ahead of its remaining siblings, which converts the tree traversal into a linear list traversal.
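The splicing technique is easier to follow on a node type that carries nothing but links. The sketch below is a simplified model, not cmark's node: `tnode`, `tnode_add`, `tnode_free_all`, and `tnode_demo` are hypothetical, and the standard allocator is used directly.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct tnode {
  struct tnode *next, *first_child, *last_child;
} tnode;

/* Append a new child to parent, or create a root when parent is NULL. */
tnode *tnode_add(tnode *parent) {
  tnode *n = calloc(1, sizeof(*n));
  if (!n) abort();
  if (parent) {
    if (parent->last_child) parent->last_child->next = n;
    else parent->first_child = n;
    parent->last_child = n;
  }
  return n;
}

/* Free e, its descendants, and any following siblings without recursion.
 * Returns the number of nodes released. */
int tnode_free_all(tnode *e) {
  int freed = 0;
  while (e != NULL) {
    if (e->first_child) {
      e->last_child->next = e->next; /* last child links to e's siblings */
      e->next = e->first_child;      /* children are visited next */
    }
    tnode *next = e->next;
    free(e);
    freed++;
    e = next;
  }
  return freed;
}

/* Build a chain depth levels deep with two children per level, free it,
 * and return how many nodes were released (1 + 2 * depth). */
int tnode_demo(int depth) {
  tnode *root = tnode_add(NULL);
  tnode *cur = root;
  for (int i = 0; i < depth; i++) {
    tnode_add(cur);       /* a leaf sibling */
    cur = tnode_add(cur); /* the chain continues through the second child */
  }
  return tnode_free_all(root);
}
```

Calling `tnode_demo(100000)` releases 200001 nodes with constant stack usage, whereas a recursive destructor would need one stack frame per nesting level. Note that the loop also follows the starting node's own `next` pointer, so a subtree should be unlinked from its siblings before it is passed in.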
### What Gets Freed Per Node Type
| Node Type | Freed Data |
|-----------|-----------|
| `CODE_BLOCK` | `as.code.info` chunk, `as.literal` chunk |
| `TEXT`, `HTML_BLOCK`, `HTML_INLINE`, `CODE` | `as.literal` chunk |
| `LINK`, `IMAGE` | `as.link.url`, `as.link.title` |
| `CUSTOM_BLOCK`, `CUSTOM_INLINE` | `as.custom.on_enter`, `as.custom.on_exit` |
| `HEADING` | `as.heading.setext_content` (if chunk) |
| All nodes | `content` strbuf |
## Parser Memory
The parser allocates:
- A `cmark_parser` struct
- A `cmark_strbuf` for the current line (`linebuf`)
- A `cmark_strbuf` for collected content (`content`)
- A `cmark_reference_map` for link references
- Individual `cmark_node` objects for the AST
When `cmark_parser_free()` is called, only the parser's own resources are freed — the AST is NOT freed (the user owns it). To free the AST, call `cmark_node_free()` on the root.
## Memory Safety Patterns
1. **No NULL returns**: The default allocator aborts on failure. User allocators should do the same or handle errors externally.
2. **Init buffers**: `cmark_strbuf__initbuf` prevents NULL pointer dereferences on empty buffers.
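3. **Owning vs non-owning**: The `cmark_chunk.alloc` field tells `cmark_chunk_free()` whether the chunk owns its data, so non-owning references are never passed to `free`. Because the chunk is reset after freeing, calling `cmark_chunk_free()` on it again is harmless.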
4. **Iterative destruction**: `S_free_nodes()` avoids stack overflow on deep trees.
## Cross-References
- [buffer.c](../../cmark/src/buffer.c), [buffer.h](../../cmark/src/buffer.h) — `cmark_strbuf` implementation
- [chunk.h](../../cmark/src/chunk.h) — `cmark_chunk` definition
- [cmark.c](../../cmark/src/cmark.c) — Default allocator, `cmark_get_default_mem_allocator()`
- [node.c](../../cmark/src/node.c) — Node allocation and deallocation
- [ast-node-system.md](ast-node-system.md) — Node structure and lifecycle