docs/handbook/cmark/testing.md


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281

# cmark — Testing

## Overview

cmark has a multi-layered testing infrastructure: C API unit tests, spec conformance tests, pathological input tests, fuzz testing, and memory sanitizers. The build system integrates all of these through CMake and CTest.

## Test Infrastructure (CMakeLists.txt)

### API Tests

```cmake
add_executable(api_test api_test/main.c)
target_link_libraries(api_test libcmark_static)
add_test(NAME api_test COMMAND api_test)
```

The API test executable links against the static library and tests the public C API directly.

### Spec Tests

```cmake
add_test(NAME spec_test
         COMMAND ${PYTHON_EXECUTABLE} test/spec_tests.py
                 --spec test/spec.txt
                 --program $<TARGET_FILE:cmark>)
```

Spec tests run the `cmark` binary against the CommonMark specification. The Python script `spec_tests.py` parses the spec file, extracts input/output examples, runs `cmark` on each input, and compares the output.

### Pathological Tests

```cmake
add_test(NAME pathological_test
         COMMAND ${PYTHON_EXECUTABLE} test/pathological_tests.py
                 --program $<TARGET_FILE:cmark>)
```

These tests verify that cmark handles pathological inputs (deeply nested structures, long runs of special characters) without excessive time or memory usage.

### Smart Punctuation Tests

```cmake
add_test(NAME smart_punct_test
         COMMAND ${PYTHON_EXECUTABLE} test/spec_tests.py
                 --spec test/smart_punct.txt
                 --program $<TARGET_FILE:cmark>
                 --extensions "")
```

Tests for the `CMARK_OPT_SMART` option (curly quotes, em/en dashes, ellipses).

### Roundtrip Tests

```cmake
add_test(NAME roundtrip_test
         COMMAND ${PYTHON_EXECUTABLE} test/roundtrip_tests.py
                 --spec test/spec.txt
                 --program $<TARGET_FILE:cmark>)
```

Roundtrip tests verify that `cmark -t commonmark | cmark -t html` produces the same HTML as direct `cmark -t html`.

### Entity Tests

```cmake
add_test(NAME entity_test
         COMMAND ${PYTHON_EXECUTABLE} test/spec_tests.py
                 --spec test/entity.txt
                 --program $<TARGET_FILE:cmark>)
```

Tests HTML entity handling.

### Regression Tests

```cmake
add_test(NAME regression_test
         COMMAND ${PYTHON_EXECUTABLE} test/spec_tests.py
                 --spec test/regression.txt
                 --program $<TARGET_FILE:cmark>)
```

Regression tests cover previously discovered bugs.

## API Test (`api_test/main.c`)

The API test file is a single C source file with test functions covering every public API function. Test patterns used:

### Test Macros

```c
#define OK(test, msg) \
  if (test) { passes++; } \
  else { failures++; fprintf(stderr, "FAIL: %s\n  %s\n", __func__, msg); }

#define INT_EQ(actual, expected, msg) \
  if ((actual) == (expected)) { passes++; } \
  else { failures++; fprintf(stderr, "FAIL: %s\n  Expected %d got %d: %s\n", \
         __func__, expected, actual, msg); }

#define STR_EQ(actual, expected, msg) \
  if (strcmp(actual, expected) == 0) { passes++; } \
  else { failures++; fprintf(stderr, "FAIL: %s\n  Expected \"%s\" got \"%s\": %s\n", \
         __func__, expected, actual, msg); }
```

### Test Categories

1. **Version tests**: Verify `cmark_version()` and `cmark_version_string()` return correct values
2. **Constructor tests**: `cmark_node_new()` for each node type
3. **Accessor tests**: Get/set for heading level, list type, list tight, content, etc.
4. **Tree manipulation tests**: `cmark_node_append_child()`, `cmark_node_insert_before()`, etc.
5. **Parser tests**: `cmark_parse_document()`, streaming `cmark_parser_feed()` + `cmark_parser_finish()`
6. **Renderer tests**: Verify HTML, XML, man, LaTeX, CommonMark output for known inputs
7. **Iterator tests**: `cmark_iter_new()`, traversal order, `cmark_iter_reset()`
8. **Memory tests**: Custom allocator, `cmark_node_free()`, no leaks

### Example Test Function

```c
static void test_md_to_html(const char *markdown, const char *expected_html,
                            const char *msg) {
  char *html = cmark_markdown_to_html(markdown, strlen(markdown),
                                      CMARK_OPT_DEFAULT);
  STR_EQ(html, expected_html, msg);
  free(html);
}
```

## Spec Test Format

The spec file (`test/spec.txt`) uses a specific format:

```
```````````````````````````````` example
Markdown input here
.
<p>Expected HTML output here</p>
````````````````````````````````
```

Each example is delimited by `example` markers. The `.` on a line by itself separates input from expected output.

The Python test runner (`test/spec_tests.py`):
1. Parses the spec file to extract examples
2. For each example, runs the `cmark` binary with the input
3. Compares the actual output with the expected output
4. Reports pass/fail for each example

## Pathological Input Tests

The pathological test file (`test/pathological_tests.py`) generates adversarial inputs designed to trigger worst-case behavior:

- Deeply nested block quotes (`> > > > > ...`)
- Deeply nested lists
- Long runs of backticks
- Many consecutive closing brackets `]]]]]...]`
- Long emphasis delimiter runs `***...***`
- Repeated link definitions

Each test verifies that cmark completes within a reasonable time bound (not quadratic or exponential).

## Fuzzing

### LibFuzzer

```cmake
if(CMARK_LIB_FUZZER)
  add_executable(cmark_fuzz fuzz/cmark_fuzz.c)
  target_link_libraries(cmark_fuzz libcmark_static)
  target_compile_options(cmark_fuzz PRIVATE -fsanitize=fuzzer)
  target_link_options(cmark_fuzz PRIVATE -fsanitize=fuzzer)
endif()
```

The fuzzer entry point (`fuzz/cmark_fuzz.c`) implements:
```c
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  // Parse data as CommonMark
  // Render to all formats
  // Free everything
  // Return 0
}
```

This subjects all parsers and renderers to random input.

### Building with Fuzzing

```bash
cmake -DCMARK_LIB_FUZZER=ON \
      -DCMAKE_C_COMPILER=clang \
      -DCMAKE_C_FLAGS="-fsanitize=fuzzer-no-link,address" \
      ..
make
./cmark_fuzz corpus/
```

## Sanitizer Builds

### Address Sanitizer (ASan)

```bash
cmake -DCMAKE_BUILD_TYPE=Asan ..
```

Sets flags: `-fsanitize=address -fno-omit-frame-pointer`

Detects:
- Buffer overflows (stack, heap, global)
- Use-after-free
- Double-free
- Memory leaks (LSAN)

### Undefined Behavior Sanitizer (UBSan)

```bash
cmake -DCMAKE_BUILD_TYPE=Ubsan ..
```

Sets flags: `-fsanitize=undefined`

Detects:
- Signed integer overflow
- Null pointer dereference
- Misaligned access
- Invalid shift
- Out-of-bounds array access

## Running Tests

### Full Test Suite

```bash
mkdir build && cd build
cmake ..
make
ctest
```

### Verbose Output

```bash
ctest --verbose
```

### Single Test

```bash
ctest -R api_test
ctest -R spec_test
```

### With ASan

```bash
mkdir build-asan && cd build-asan
cmake -DCMAKE_BUILD_TYPE=Asan ..
make
ctest
```

## Test Data Files

| File | Purpose |
|------|---------|
| `test/spec.txt` | CommonMark specification with examples |
| `test/smart_punct.txt` | Smart punctuation examples |
| `test/entity.txt` | HTML entity test cases |
| `test/regression.txt` | Regression test cases |
| `test/spec_tests.py` | Spec test runner script |
| `test/pathological_tests.py` | Pathological input tests |
| `test/roundtrip_tests.py` | CommonMark roundtrip tests |
| `api_test/main.c` | C API unit tests |
| `fuzz/cmark_fuzz.c` | LibFuzzer entry point |

## Cross-References

- [building.md](building.md) — Build configurations including test builds
- [public-api.md](public-api.md) — API functions tested by `api_test`
- [cli-usage.md](cli-usage.md) — The `cmark` binary tested by spec tests