# Neozip Overview

## What Is Neozip?

Neozip is Project Tick's fork of **zlib-ng**, which is itself a modernized,
performance-oriented fork of the venerable zlib compression library. Neozip
provides a drop-in replacement for zlib with higher throughput on modern
hardware while retaining API and format compatibility with zlib 1.3.1.

The library implements the **DEFLATE** compressed data format (RFC 1951),
wrapped in either the **zlib** container (RFC 1950) or the **gzip** container
(RFC 1952). It also exposes raw deflate streams without any wrapper.

Neozip tracks upstream zlib-ng closely. At the time of writing, the embedded
version strings are:

```c
#define ZLIBNG_VERSION "2.3.90"
#define ZLIB_VERSION   "1.3.1.zlib-ng"
```

---

## Why Neozip Exists

The original zlib library was written in the early 1990s when CPUs had very
different performance characteristics. While zlib is extremely portable and
well-tested, it leaves significant performance on the table on modern
processors because:

1. **No SIMD utilisation** — zlib's inner loops (match finding, checksumming,
   sliding the hash window) are scalar C targeting 32-bit architectures.
2. **Conservative data structures** — hash chain lengths, buffer sizes, and
   alignment are tuned for machines with tiny caches.
3. **No runtime CPU feature detection** — the same compiled binary cannot
   select between SSE2 and AVX-512 code paths at runtime.

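Neozip (via zlib-ng) addresses each of these issues while remaining
API-compatible with zlib (when the `ZLIB_COMPAT` build option is enabled) and
emitting standard DEFLATE streams that any conforming decompressor can read.
The compressed bytes themselves may differ from zlib's for the same
parameters, because strategies such as `deflate_quick` and `deflate_medium`
make different matching decisions.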

---

## Feature List

### Core Compression and Decompression

| Feature | Description |
|---|---|
| DEFLATE compression (RFC 1951) | Full implementation of LZ77 + Huffman coding |
| DEFLATE decompression | State-machine inflater with optimised fast paths |
| zlib wrapper (RFC 1950) | Adler-32 integrity, two-byte header |
| gzip wrapper (RFC 1952) | CRC-32 integrity, file metadata header |
| Raw deflate | No wrapper, caller handles integrity |
| Compression levels 0–9 | From stored (level 0) through maximum compression (level 9) |
| Multiple strategies | `Z_DEFAULT_STRATEGY`, `Z_FILTERED`, `Z_HUFFMAN_ONLY`, `Z_RLE`, `Z_FIXED` |
| Streaming API | Process data in arbitrarily-sized chunks via `deflate()` / `inflate()` |
| One-shot API | `compress()` / `uncompress()` for simple in-memory use |
| gzip file I/O | `gzopen()`, `gzread()`, `gzwrite()`, `gzprintf()`, etc. |
| Dictionary support | Pre-seed compression / decompression with a shared dictionary |

### Performance Optimisations

| Optimisation | Details |
|---|---|
| Runtime CPU detection | `cpu_features.c` queries CPUID (x86), `/proc/cpuinfo` (ARM), etc. |
| Function dispatch table | `functable.c` selects the best implementation for each hot function |
| x86 SSE2 | `slide_hash`, `compare256`, `chunkset`, `inflate_fast`, CRC-32 Chorba |
| x86 SSSE3 | `adler32`, `chunkset`, `inflate_fast` |
| x86 SSE4.1 | CRC-32 Chorba SSE4.1 variant |
| x86 SSE4.2 | `adler32_copy` |
| x86 PCLMULQDQ | Carryless-multiply CRC-32 |
| x86 AVX2 | `adler32`, `compare256`, `chunkset`, `slide_hash`, `inflate_fast`, `longest_match` |
| x86 AVX-512 | `adler32`, `compare256`, `chunkset`, `inflate_fast`, `longest_match` |
| x86 AVX-512 VNNI | `adler32` using VPDPBUSD |
| x86 VPCLMULQDQ | Vectorised CRC-32 with AVX2 and AVX-512 widths |
| ARM NEON | `adler32`, `compare256`, `chunkset`, `slide_hash`, `inflate_fast`, `longest_match` |
| ARM CRC32 extension | Hardware CRC-32 instructions |
| ARM PMULL+EOR3 | Polynomial multiply CRC-32 with SHA3 three-way XOR |
| ARMv6 SIMD | `slide_hash` for 32-bit ARM |
| PowerPC VMX/VSX | `adler32`, `slide_hash`, `chunkset`, `inflate_fast` |
| POWER8/9 | Optimised Adler-32, CRC-32, compare256 |
| RISC-V RVV | Vector extensions for core loops |
| RISC-V Zbc | Bit-manipulation CRC-32 |
| IBM z/Architecture DFLTCC | Hardware deflate/inflate in a single instruction |
| LoongArch LSX/LASX | SIMD for CRC-32 and general loops |
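
Most of the rows above name the same handful of hot functions. As a rough illustration of what the SIMD variants speed up, a portable scalar routine in the spirit of `compare256` could look like the sketch below. The name `compare256_sketch` and its contract (number of equal leading bytes, capped at 256) are assumptions made for this example, not the library's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar fallback: count how many leading bytes of two
 * 256-byte regions are equal. Both pointers must address at least
 * 256 readable bytes. */
static uint32_t compare256_sketch(const uint8_t *a, const uint8_t *b) {
    assert(a != NULL && b != NULL);
    uint32_t len = 0;
    while (len < 256 && a[len] == b[len]) {
        len++;
    }
    return len;
}
```

A vectorised variant answers the same question but compares a whole register of bytes per step, which is why the match finder benefits so much from wider SIMD.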

### Algorithmic Improvements

| Improvement | Details |
|---|---|
| Quick deflate (level 1) | Intel-designed single-pass strategy (`deflate_quick.c`) |
| Medium deflate (levels 3-6) | Intel-designed strategy bridging fast and slow (`deflate_medium.c`) |
| Chorba CRC-32 | Modern CRC algorithm by Kadatch & Jenkins with braided and SIMD variants |
| 64-bit bit buffer | `bi_buf` is `uint64_t` instead of zlib's 16-bit `ush`, reducing flush frequency |
| Unified memory allocation | Single `zalloc` call for all deflate/inflate buffers, cache-line aligned |
| LIT_MEM mode | Separate distance/length buffers for platforms without fast unaligned access |
| Rolling hash for level 9 | `insert_string_roll` for better match quality at maximum compression |
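
To make the 64-bit bit buffer row concrete, here is a minimal sketch of an LSB-first bit writer with a `uint64_t` accumulator. The struct and function names (`bit_writer`, `put_bits`, `flush_bits`) are hypothetical; only the idea of draining the accumulator eight bytes at a time reflects the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit writer; bi_buf / bi_valid loosely mirror the real fields. */
typedef struct {
    uint64_t bi_buf;    /* pending bits, least-significant bit first */
    unsigned bi_valid;  /* number of valid bits in bi_buf (0..63) */
    uint8_t *out;       /* caller-provided output buffer, assumed large enough */
    size_t   out_len;   /* bytes written so far */
    size_t   flushes;   /* how many times the accumulator was drained */
} bit_writer;

/* Append the low nbits bits of value (1 <= nbits <= 32). */
static void put_bits(bit_writer *w, uint32_t value, unsigned nbits) {
    assert(w != NULL && nbits >= 1 && nbits <= 32);
    uint64_t v = (uint64_t)value & ((UINT64_C(1) << nbits) - 1);
    if (w->bi_valid + nbits < 64) {
        w->bi_buf |= v << w->bi_valid;
        w->bi_valid += nbits;
        return;
    }
    /* Accumulator is full: top it up, write 8 bytes, keep the leftover bits. */
    unsigned room = 64 - w->bi_valid;          /* 1..32 in this branch */
    w->bi_buf |= v << w->bi_valid;
    for (int i = 0; i < 8; i++) {
        w->out[w->out_len++] = (uint8_t)(w->bi_buf >> (8 * i));
    }
    w->flushes++;
    w->bi_buf = v >> room;
    w->bi_valid = nbits - room;
}

/* Write out any remaining bits, padding the last byte with zeros. */
static void flush_bits(bit_writer *w) {
    while (w->bi_valid > 0) {
        w->out[w->out_len++] = (uint8_t)w->bi_buf;
        w->bi_buf >>= 8;
        w->bi_valid = (w->bi_valid > 8) ? w->bi_valid - 8 : 0;
    }
}
```

With a 16-bit accumulator the drain branch would run roughly four times as often for the same stream of codes, which is the cost the wider buffer avoids.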

### Build System

| Feature | Details |
|---|---|
| CMake (≥ 3.14) | Primary build system with extensive option detection |
| C11 standard | Default; C99, C17, C23 also supported |
| zlib-compat mode | `ZLIB_COMPAT=ON` produces a drop-in `libz` replacement |
| Native mode | `ZLIB_COMPAT=OFF` produces `libz-ng` with `zng_` prefixed API |
| Static and shared libraries | Both targets generated |
| Google Test suite | Comprehensive C++ test suite under `test/` |
| Fuzz targets | Under `test/fuzz/` for OSS-Fuzz integration |
| Benchmark suite | Google Benchmark harnesses under `test/benchmarks/` |
| Sanitizer support | ASan, MSan, TSan, UBSan integration via `WITH_SANITIZER` |
| Code coverage | `WITH_CODE_COVERAGE` for lcov/gcov |

---

## Repository Structure

The neozip source tree is organised as follows:

```
neozip/
├── CMakeLists.txt          # Top-level build configuration
├── deflate.c / deflate.h   # Core compression engine
├── deflate_fast.c          # Level 1-2 (or 2-3) fast strategy
├── deflate_medium.c        # Level 3-6 medium strategy (Intel)
├── deflate_slow.c          # Level 7-9 lazy/slow strategy
├── deflate_quick.c         # Level 1 quick strategy (Intel)
├── deflate_stored.c        # Level 0 stored (no compression)
├── deflate_huff.c          # Huffman-only strategy
├── deflate_rle.c           # Run-length encoding strategy
├── deflate_p.h             # Private deflate inline helpers
├── inflate.c / inflate.h   # Decompression state machine
├── inflate_p.h             # Private inflate inline helpers
├── infback.c               # Inflate with caller-provided window
├── inftrees.c / inftrees.h # Huffman code table builder for inflate
├── inffast_tpl.h           # Fast inflate inner loop template
├── inffixed_tbl.h          # Fixed Huffman tables for inflate
├── trees.c / trees.h       # Huffman tree construction for deflate
├── trees_emit.h            # Bit emission macros for tree output
├── trees_tbl.h             # Static Huffman tree tables
├── adler32.c               # Adler-32 checksum entry points
├── adler32_p.h             # Scalar Adler-32 implementation
├── crc32.c                 # CRC-32 checksum entry points
├── crc32_braid_p.h         # Braided CRC-32 configuration
├── crc32_braid_comb.c      # CRC-32 combine logic
├── crc32_chorba_p.h        # Chorba CRC-32 algorithm
├── compress.c              # One-shot compress()
├── uncompr.c               # One-shot uncompress()
├── gzlib.c                 # gzip file I/O common code
├── gzread.c                # gzip file reading
├── gzwrite.c               # gzip file writing
├── gzguts.h                # gzip internal definitions
├── zlib.h.in               # Public API header (zlib-compat mode)
├── zlib-ng.h.in            # Public API header (native mode)
├── zbuild.h                # Build-system defines, compiler abstraction
├── zutil.h / zutil.c       # Internal utility functions
├── zutil_p.h               # Private utility helpers
├── zendian.h               # Endianness detection and byte-swap macros
├── zmemory.h               # Aligned memory read/write helpers
├── zarch.h                 # Architecture detection macros
├── cpu_features.c / .h     # Runtime CPU feature detection dispatch
├── functable.c / .h        # Runtime function pointer dispatch table
├── arch_functions.h        # Architecture-specific function declarations
├── arch_natives.h          # Native (compile-time) arch selection
├── insert_string.c         # Hash table insert implementations
├── insert_string_p.h       # Private insert_string helpers
├── insert_string_tpl.h     # Insert string template macros
├── match_tpl.h             # Longest-match template (compare256 based)
├── chunkset_tpl.h          # Chunk memory-set template
├── compare256_rle.h        # RLE-optimised compare256
├── arch/                   # Architecture-specific SIMD implementations
│   ├── generic/            # Portable C fallbacks
│   ├── x86/                # SSE2, SSSE3, SSE4, AVX2, AVX-512, PCLMULQDQ
│   ├── arm/                # NEON, CRC32 extension, PMULL
│   ├── power/              # VMX, VSX, POWER8, POWER9
│   ├── s390/               # IBM z DFLTCC
│   ├── riscv/              # RVV, Zbc
│   └── loongarch/          # LSX, LASX
├── test/                   # GTest test suite, fuzz targets, benchmarks
├── cmake/                  # CMake modules (intrinsic detection, etc.)
├── doc/                    # Upstream documentation
├── tools/                  # Utility scripts
└── win32/                  # Windows-specific files
```

---

## Data Formats

Neozip processes three stream formats, all built on the same DEFLATE
compressed data representation: raw deflate and two containers that wrap it.

### Raw Deflate (RFC 1951)

A sequence of DEFLATE blocks with no framing. The caller is responsible for
any integrity checking. Selected by passing a negative `windowBits` value
(e.g., `-15`) to `deflateInit2()` / `inflateInit2()`.
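
The `windowBits` convention can be captured in a small helper. The sketch below assumes the standard zlib rules: a negative value selects raw deflate (as stated above), a plain value selects the zlib wrapper, and adding 16 selects gzip. The `+16` rule comes from the zlib manual rather than from this page, and the enum and function names are hypothetical.

```c
#include <assert.h>

/* Hypothetical enum for illustration; not part of the library API. */
typedef enum { FMT_RAW, FMT_ZLIB, FMT_GZIP } stream_format;

/* Map a format and a window size (log2, 9..15) to a windowBits argument
 * suitable for deflateInit2() / inflateInit2(). */
static int window_bits_for(stream_format fmt, int log2_window) {
    assert(log2_window >= 9 && log2_window <= 15);
    switch (fmt) {
    case FMT_RAW:
        return -log2_window;       /* negative: no header or trailer */
    case FMT_GZIP:
        return log2_window + 16;   /* +16: gzip header and CRC-32 trailer */
    case FMT_ZLIB:
    default:
        return log2_window;        /* plain: zlib header and Adler-32 trailer */
    }
}
```

For a 32 KiB window this yields `-15`, `15`, and `31` respectively.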

### zlib Format (RFC 1950)

```
+---+---+  +-------+  +---+---+---+---+
|CMF|FLG|  | DATA  |  |    ADLER-32   |
+---+---+  +-------+  +---+---+---+---+
```

- **CMF** (Compression Method and Flags): method = 8 (deflate), window size
- **FLG**: check bits, optional preset dictionary flag (`FDICT`)
- **DATA**: raw deflate blocks
- **ADLER-32**: checksum of uncompressed data (big-endian)

Overhead: 6 bytes (`ZLIB_WRAPLEN`).
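
The two header bytes can be decoded with a few shifts and masks. The following is a minimal sketch based on RFC 1950, not the library's own parser; `zlib_header` and `parse_zlib_header` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    unsigned method;       /* CM: low 4 bits of CMF, 8 = deflate */
    unsigned window_size;  /* 1 << (CINFO + 8), CINFO = high 4 bits of CMF */
    bool     has_dict;     /* FDICT: bit 5 of FLG */
} zlib_header;

/* Returns true and fills *h when (cmf, flg) is a valid deflate zlib header. */
static bool parse_zlib_header(uint8_t cmf, uint8_t flg, zlib_header *h) {
    assert(h != NULL);
    /* RFC 1950: CMF * 256 + FLG must be a multiple of 31 (the check bits). */
    if (((unsigned)cmf * 256u + flg) % 31u != 0) {
        return false;
    }
    unsigned cinfo = cmf >> 4;
    if ((cmf & 0x0f) != 8 || cinfo > 7) {
        return false;  /* only deflate with a window of at most 32 KiB is defined */
    }
    h->method = cmf & 0x0f;
    h->window_size = 1u << (cinfo + 8);
    h->has_dict = (flg & 0x20) != 0;
    return true;
}
```

The familiar `78 9C` prefix, for instance, decodes to method 8 with a 32768-byte window and no preset dictionary.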

### gzip Format (RFC 1952)

```
+---+---+---+---+---+---+---+---+---+---+  +-------+  +---+---+---+---+---+---+---+---+
|ID1|ID2| CM|FLG|     MTIME     |XFL|OS |  | DATA  |  |     CRC-32    |    ISIZE      |
+---+---+---+---+---+---+---+---+---+---+  +-------+  +---+---+---+---+---+---+---+---+
```

- **ID1, ID2**: Magic bytes `0x1f`, `0x8b`
- **CM**: Compression method (8 = deflate)
- **FLG**: Flags for text, CRC, extra, name, comment
- **MTIME**: Modification time (Unix epoch)
- **XFL**: Extra flags (2 = best compression, 4 = fastest)
- **OS**: Operating system code
- **DATA**: Raw deflate blocks
- **CRC-32**: CRC of uncompressed data
- **ISIZE**: Uncompressed size mod 2^32

Overhead: 18 bytes (`GZIP_WRAPLEN`).
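
The fixed 10-byte part of the header is simple to validate by hand. The sketch below follows RFC 1952 (multi-byte fields are little-endian) and ignores the optional extra, name, and comment fields; the type and function names are hypothetical and do not come from the library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* FLG bits as defined by RFC 1952. */
enum { GZ_FTEXT = 1, GZ_FHCRC = 2, GZ_FEXTRA = 4, GZ_FNAME = 8, GZ_FCOMMENT = 16 };

typedef struct {
    uint8_t  flags;  /* FLG */
    uint32_t mtime;  /* MTIME, seconds since the Unix epoch */
    uint8_t  xfl;    /* XFL */
    uint8_t  os;     /* OS */
} gzip_header;

/* Parse the fixed part of a gzip member header; returns false if invalid. */
static bool parse_gzip_header(const uint8_t *p, size_t len, gzip_header *h) {
    assert(p != NULL && h != NULL);
    if (len < 10) return false;                      /* fixed part is 10 bytes */
    if (p[0] != 0x1f || p[1] != 0x8b) return false;  /* ID1, ID2 magic */
    if (p[2] != 8) return false;                     /* CM must be deflate */
    if (p[3] & 0xe0) return false;                   /* reserved FLG bits set */
    h->flags = p[3];
    h->mtime = (uint32_t)p[4] | ((uint32_t)p[5] << 8) |
               ((uint32_t)p[6] << 16) | ((uint32_t)p[7] << 24);
    h->xfl = p[8];
    h->os = p[9];
    return true;
}
```

Together with the 8-byte trailer (CRC-32 and ISIZE), these 10 bytes account for the 18 bytes of overhead quoted above.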

---

## Compilation Modes

### zlib-Compatible Mode (`ZLIB_COMPAT=ON`)

When built with `-DZLIB_COMPAT=ON`:

- The library is named `libz` (no suffix).
- All public symbols use standard zlib names: `deflateInit`, `inflate`, `crc32`, etc.
- The `z_stream` structure uses `unsigned long` for `total_in` / `total_out`.
- Header file is `zlib.h`.
- Symbol prefix macro `PREFIX()` expands to `z_` (via mangling headers).
- The `ZLIB_COMPAT` preprocessor macro is defined.
- gzip file operations (`WITH_GZFILEOP`) are forced on.

This is the mode to use when neozip must be a transparent drop-in replacement
for system zlib.

### Native Mode (`ZLIB_COMPAT=OFF`)

When built with `-DZLIB_COMPAT=OFF` (the default):

- The library is named `libz-ng`.
- All public symbols use `zng_` prefixed names: `zng_deflateInit`, `zng_inflate`, etc.
- The `zng_stream` structure uses fixed-width types (`uint32_t`).
- Header file is `zlib-ng.h`.
- Symbol prefix macro `PREFIX()` expands to `zng_`.
- No `ZLIB_COMPAT` macro is defined.

Native mode is recommended for new code. Types are cleaner and there is no
ambiguity about which zlib implementation is in use.

---

## Compression Levels and Strategies

### Compression Levels

Neozip maps each compression level (0–9) to a specific **strategy function**
and a set of tuning parameters defined in the `configuration_table` in
`deflate.c`:

```c
static const config configuration_table[10] = {
/*      good lazy nice chain */
/* 0 */ {0,    0,  0,    0, deflate_stored},  /* store only */
/* 1 */ {0,    0,  0,    0, deflate_quick},   /* quick strategy */
/* 2 */ {4,    4,  8,    4, deflate_fast},
/* 3 */ {4,    6, 16,    6, deflate_medium},
/* 4 */ {4,   12, 32,   24, deflate_medium},
/* 5 */ {8,   16, 32,   32, deflate_medium},
/* 6 */ {8,   16, 128, 128, deflate_medium},
/* 7 */ {8,   32, 128, 256, deflate_slow},
/* 8 */ {32, 128, 258, 1024, deflate_slow},
/* 9 */ {32, 258, 258, 4096, deflate_slow},
};
```

The `config` fields are:
- **good_length** — reduce lazy search above this match length
- **max_lazy** — do not perform lazy search above this match length
- **nice_length** — quit search above this match length
- **max_chain** — maximum hash chain length to traverse
- **func** — pointer to the strategy function
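
How the four numeric fields steer a search can be sketched as three small predicates. This is modelled on the behaviour of classic zlib's match finder (for example, cutting the chain budget to a quarter once a "good" match is in hand) and is an assumption about neozip rather than a copy of its code; all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mirror of the numeric config fields. */
typedef struct {
    unsigned good_length;
    unsigned max_lazy;
    unsigned nice_length;
    unsigned max_chain;
} tuning;

/* Hash-chain budget for one search, given the match length already found. */
static unsigned chain_budget(const tuning *t, unsigned prev_length) {
    assert(t != NULL);
    unsigned chain = t->max_chain;
    if (prev_length >= t->good_length) {
        chain >>= 2;  /* already have a good match: search less hard */
    }
    return chain;
}

/* Lazy evaluation is only attempted while the current match is short. */
static int should_try_lazy(const tuning *t, unsigned prev_length) {
    assert(t != NULL);
    return prev_length < t->max_lazy;
}

/* A match this long ends the search immediately. */
static int is_nice_enough(const tuning *t, unsigned match_length) {
    assert(t != NULL);
    return match_length >= t->nice_length;
}
```

With the level 6 row (`{8, 16, 128, 128}`), a search that starts from a 3-byte match may walk up to 128 chain entries, while one that starts from an 8-byte match is limited to 32.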

### Strategy Functions

| Strategy | Levels | Source File | Description |
|---|---|---|---|
| `deflate_stored` | 0 | `deflate_stored.c` | No compression; copies input as stored blocks |
| `deflate_quick` | 1 | `deflate_quick.c` | Fastest compression; static Huffman, minimal match search |
| `deflate_fast` | 2 (or 1–3 without quick) | `deflate_fast.c` | Greedy matching, no lazy evaluation |
| `deflate_medium` | 3–6 | `deflate_medium.c` | Balanced: limited lazy evaluation, match merging |
| `deflate_slow` | 7–9 | `deflate_slow.c` | Full lazy evaluation, deepest hash chain search |
| `deflate_huff` | (Z_HUFFMAN_ONLY) | `deflate_huff.c` | Huffman-only, no LZ77 matching |
| `deflate_rle` | (Z_RLE) | `deflate_rle.c` | Run-length encoding, distance always 1 |

### Explicit Strategies

The `strategy` parameter to `deflateInit2()` can override the default level-based
selection:

- **`Z_DEFAULT_STRATEGY` (0)** — Normal deflate; level selects the function.
- **`Z_FILTERED` (1)** — Optimised for data produced by a filter (e.g., PNG predictors).
  Uses `deflate_slow` with short match rejection.
- **`Z_HUFFMAN_ONLY` (2)** — No LZ77; every byte is a literal.
- **`Z_RLE` (3)** — Only find runs of identical bytes (distance = 1).
- **`Z_FIXED` (4)** — Use fixed Huffman codes instead of dynamic trees.
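
To illustrate what "distance = 1" means for `Z_RLE`, the sketch below measures a run the way a run-length matcher would: it counts how many bytes starting at `pos` repeat the byte just before it, up to DEFLATE's maximum match length of 258. The function name is hypothetical and the code is a simplified illustration, not the contents of `deflate_rle.c`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_MATCH_SKETCH 258  /* longest match DEFLATE can encode (RFC 1951) */

/* Length of the run at buf[pos..] that repeats buf[pos - 1]. A result of
 * 3 or more could be emitted as a (length, distance = 1) pair. */
static unsigned rle_run_length(const uint8_t *buf, size_t len, size_t pos) {
    assert(buf != NULL && pos >= 1 && pos <= len);
    unsigned run = 0;
    while (pos + run < len && run < MAX_MATCH_SKETCH &&
           buf[pos + run] == buf[pos - 1]) {
        run++;
    }
    return run;
}
```

For the input `abbbbbc`, the run measured at the second `b` is 4, so the encoder could emit the literals `a` and `b` followed by one match of length 4 at distance 1.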

---

## Memory Layout

Neozip uses a **single-allocation** strategy for both deflate and inflate
states. The function `alloc_deflate()` in `deflate.c` computes the total
buffer size required and calls `zalloc` exactly once, then partitions the
returned memory into:

1. **Window buffer** — Aligned to `WINDOW_PAD_SIZE` (64 or 4096 bytes depending
   on architecture). Size: `2 * (1 << windowBits)`.
2. **prev array** — `Pos` (uint16_t) array of size `1 << windowBits`. Aligned to 64 bytes.
3. **head array** — `Pos` array of size `HASH_SIZE` (65536). Aligned to 64 bytes.
4. **pending_buf** — Output bit buffer of size `lit_bufsize * LIT_BUFS + 1`. Aligned to 64 bytes.
5. **deflate_state** — The `internal_state` struct itself. Aligned to 64 bytes
   (cache-line aligned via `ALIGNED_(64)`).
6. **deflate_allocs** — Book-keeping struct tracking the original allocation pointer.

The `inflate_state` uses an analogous scheme via `alloc_inflate()`:

1. **Window buffer** — `(1 << MAX_WBITS) + 64` bytes with `WINDOW_PAD_SIZE` alignment.
2. **inflate_state** — The state struct, 64-byte aligned.
3. **inflate_allocs** — Book-keeping.

This approach minimises the number of `malloc` calls, improves cache locality,
and simplifies cleanup (a single `zfree` releases everything).
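
The partitioning idea can be sketched with plain `malloc`. The example below carves only three of the regions listed above (window, `prev`, `head`) out of one block and aligns each to 64 bytes; the struct, macro, and function names are hypothetical, and the real `alloc_deflate()` goes through `zalloc` and handles more regions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define ALIGN_UP(x, a) (((x) + ((size_t)(a) - 1)) & ~((size_t)(a) - 1))

typedef struct {
    void     *base;    /* raw pointer from malloc; the only one passed to free */
    uint8_t  *window;  /* 2 * window size bytes */
    uint16_t *prev;    /* window size entries */
    uint16_t *head;    /* 65536 entries */
} arena_sketch;

/* Allocate once, then hand out 64-byte aligned sub-regions. */
static int arena_init(arena_sketch *a, unsigned window_bits) {
    assert(a != NULL && window_bits >= 9 && window_bits <= 15);
    size_t wsize = (size_t)1 << window_bits;

    /* Lay the regions out back to back, rounding each start up to 64 bytes. */
    size_t off_prev = ALIGN_UP(2 * wsize, 64);
    size_t off_head = ALIGN_UP(off_prev + wsize * sizeof(uint16_t), 64);
    size_t total    = off_head + 65536 * sizeof(uint16_t);

    a->base = malloc(total + 63);  /* 63 bytes of slack to align the start */
    if (a->base == NULL) {
        return -1;
    }
    uint8_t *start = (uint8_t *)ALIGN_UP((uintptr_t)a->base, 64);
    a->window = start;
    a->prev   = (uint16_t *)(void *)(start + off_prev);
    a->head   = (uint16_t *)(void *)(start + off_head);
    return 0;
}

/* One free releases every region. */
static void arena_free(arena_sketch *a) {
    free(a->base);
    a->base = NULL;
}
```

Keeping the original pointer in `base` plays the role that the book-keeping structs play in the lists above: the aligned pointers cannot be passed to the deallocator directly.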

---

## Thread Safety

Neozip is thread-safe under the following conditions:

1. Each `z_stream` instance is accessed from only one thread at a time.
2. The `zalloc` and `zfree` callbacks are thread-safe (the defaults use
   `malloc` / `free`, which are thread-safe on all supported platforms).

The function dispatch table (`functable`) uses atomic stores during
initialisation:

```c
#define FUNCTABLE_ASSIGN(VAR, FUNC_NAME) \
    __atomic_store(&(functable.FUNC_NAME), &(VAR.FUNC_NAME), __ATOMIC_SEQ_CST)
#define FUNCTABLE_BARRIER() __atomic_thread_fence(__ATOMIC_SEQ_CST)
```

This ensures that even if multiple threads call `deflateInit` / `inflateInit`
concurrently, the function table is initialised safely.
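
The self-initialising dispatch pattern can be sketched with C11 atomics. In this simplified, hypothetical version a single table entry starts out pointing at a stub; the first call through it selects an implementation and publishes it with an atomic store. The names and the trivial "checksum" are invented for the example, and real feature detection is replaced by a comment.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t (*sum_fn)(const uint8_t *buf, size_t len);

/* Portable implementation; stands in for a generic C fallback. */
static uint32_t sum_generic(const uint8_t *buf, size_t len) {
    uint32_t s = 0;
    for (size_t i = 0; i < len; i++) {
        s += buf[i];
    }
    return s;
}

static uint32_t sum_stub(const uint8_t *buf, size_t len);

/* The table entry initially points at the stub. */
static _Atomic(sum_fn) dispatch_sum = sum_stub;

/* First call lands here: choose an implementation and publish it. Racing
 * threads may both run this, but they store the same pointer, so the
 * outcome is identical. */
static uint32_t sum_stub(const uint8_t *buf, size_t len) {
    /* A real library would query CPU features here. */
    atomic_store_explicit(&dispatch_sum, sum_generic, memory_order_seq_cst);
    return sum_generic(buf, len);
}

/* Public entry point: always call through the table. */
static uint32_t checksum(const uint8_t *buf, size_t len) {
    assert(buf != NULL || len == 0);
    sum_fn fn = atomic_load_explicit(&dispatch_sum, memory_order_acquire);
    return fn(buf, len);
}
```

After the first call, every later call pays only for one atomic load and an indirect jump.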

---

## Version Identification

The library provides several ways to query version information:

```c
const char *zlibVersion(void);         // Returns "1.3.1.zlib-ng" in compat mode
const char *zlibng_version(void);      // Returns "2.3.90"

// Compile-time constants
#define ZLIBNG_VERSION  "2.3.90"
#define ZLIBNG_VERNUM   0x02039000L
#define ZLIB_VERSION    "1.3.1.zlib-ng"
#define ZLIB_VERNUM     0x131f
```

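In compat mode, `deflateInit` and `inflateInit` check the caller's header
against the library to catch gross ABI mismatches. The check compares only the
first character of the version string (the major version) and the size of the
stream structure: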

```c
#define CHECK_VER_STSIZE(version, stream_size) \
    (version == NULL || version[0] != ZLIB_VERSION[0] || \
     stream_size != (int32_t)sizeof(PREFIX3(stream)))
```
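
Written as a function, the same test reads as follows. This is a sketch under stated assumptions: `stream_sketch` is a hypothetical stand-in for the real stream structure, and `SKETCH_VERSION` simply reuses the version string quoted above.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_VERSION "1.3.1.zlib-ng"  /* same string as ZLIB_VERSION above */

/* Hypothetical stand-in for the real stream structure. */
typedef struct { int placeholder[4]; } stream_sketch;

/* The size travels through the API as an int, so it must fit in one. */
static_assert(sizeof(stream_sketch) <= 0x7fffffff, "stream size must fit in int");

/* Returns nonzero when the caller's header looks incompatible with this
 * build: no version string, a different major version, or a different
 * struct size. */
static int version_mismatch(const char *version, int stream_size) {
    return version == NULL
        || version[0] != SKETCH_VERSION[0]
        || stream_size != (int)sizeof(stream_sketch);
}
```

Because only the first character is compared, a caller compiled against headers reporting `"1.2.11"` passes this check, while one reporting `"2.0"` or passing a differently sized struct does not.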

---

## Licensing

Neozip inherits the zlib/libpng license from both zlib and zlib-ng:

> This software is provided 'as-is', without any express or implied warranty.
> Permission is granted to anyone to use this software for any purpose,
> including commercial applications, and to alter it and redistribute it freely,
> subject to the following restrictions: [...]

See `LICENSE.md` in the neozip source tree for the full text.

---

## Key Differences from Upstream zlib

| Area | zlib 1.3.1 | Neozip (zlib-ng) |
|---|---|---|
| Bit buffer width | 16-bit `ush` (`bi_buf`) | 64-bit `uint64_t` |
| Hash table size | 32768 entries (15 bits) | 65536 entries (16 bits) |
| Match buffer format | Overlaid `sym_buf` only | `LIT_MEM` option for separate `d_buf`/`l_buf` |
| Hash function | Three-byte rolling | Four-byte CRC-based or multiplicative |
| SIMD acceleration | None | Extensive (see Performance Optimisations) |
| CPU detection | None (compile-time only) | Runtime `cpuid` / feature detection |
| Memory allocation | Multiple `zalloc` calls | Single allocation, cache-aligned |
| Minimum match length | 3 (`STD_MIN_MATCH`) | Internally uses `WANT_MIN_MATCH = 4` for speed |
| Quick strategy | None | `deflate_quick` for level 1 |
| Medium strategy | None | `deflate_medium` for levels 3–6 |
| Data structure alignment | None | `ALIGNED_(64)` on key structs |
| Build system | Makefile / CMake | CMake primary with full feature detection |

---

## Quick Start

### Building

```bash
cd neozip
mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
cmake --build . -j$(nproc)
```

### Using in a CMake Project

```cmake
find_package(zlib-ng CONFIG REQUIRED)
target_link_libraries(myapp PRIVATE zlib-ng::zlib-ng)
```

Or in zlib-compat mode:

```cmake
find_package(ZLIB CONFIG REQUIRED)
target_link_libraries(myapp PRIVATE ZLIB::ZLIB)
```

### Minimal Compression Example

```c
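#include <zlib-ng.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    const char *source = "Hello, Neozip! This is a test of compression.";
    size_t source_len = strlen(source);

    /* Worst-case compressed size for source_len input bytes. */
    size_t dest_len = zng_compressBound(source_len);
    unsigned char *dest = malloc(dest_len);
    unsigned char *recovered = malloc(source_len + 1);
    if (dest == NULL || recovered == NULL) {
        free(dest);
        free(recovered);
        return 1;
    }

    int ret = zng_compress(dest, &dest_len, (const unsigned char *)source, source_len);
    if (ret != Z_OK) {
        fprintf(stderr, "zng_compress failed: %d\n", ret);
        free(dest);
        free(recovered);
        return 1;
    }
    printf("Compressed %zu bytes to %zu bytes\n", source_len, dest_len);

    /* On input recovered_len is the buffer capacity; on output, bytes written. */
    size_t recovered_len = source_len;
    ret = zng_uncompress(recovered, &recovered_len, dest, dest_len);
    if (ret == Z_OK) {
        recovered[recovered_len] = '\0';
        printf("Recovered: %s\n", recovered);
    } else {
        fprintf(stderr, "zng_uncompress failed: %d\n", ret);
    }

    free(dest);
    free(recovered);
    return ret == Z_OK ? 0 : 1;
}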
```

---

## Further Reading

- [Architecture](architecture.md) — Module-by-module breakdown of the source
- [Building](building.md) — Complete CMake option reference
- [Deflate Algorithms](deflate-algorithms.md) — LZ77 match finding and strategies
- [Inflate Engine](inflate-engine.md) — Decompression state machine
- [Huffman Coding](huffman-coding.md) — Tree construction and bit emission
- [Checksum Algorithms](checksum-algorithms.md) — CRC-32 and Adler-32 details
- [Hardware Acceleration](hardware-acceleration.md) — CPU detection and dispatch
- [x86 Optimizations](x86-optimizations.md) — SSE/AVX/PCLMULQDQ implementations
- [ARM Optimizations](arm-optimizations.md) — NEON and CRC32 extension
- [Gzip Support](gzip-support.md) — gzip file I/O layer
- [API Reference](api-reference.md) — Full public API documentation
- [Performance Tuning](performance-tuning.md) — Benchmarking and tuning guide
- [Testing](testing.md) — Test suite reference
- [Code Style](code-style.md) — Coding conventions