diff options
Diffstat (limited to 'docs/handbook/neozip/deflate-algorithms.md')
| -rw-r--r-- | docs/handbook/neozip/deflate-algorithms.md | 797 |
1 files changed, 797 insertions, 0 deletions
diff --git a/docs/handbook/neozip/deflate-algorithms.md b/docs/handbook/neozip/deflate-algorithms.md new file mode 100644 index 0000000000..4d3eeb03c9 --- /dev/null +++ b/docs/handbook/neozip/deflate-algorithms.md @@ -0,0 +1,797 @@ +# Deflate Algorithms + +## Overview + +The DEFLATE algorithm (RFC 1951) combines **LZ77** sliding-window compression +with **Huffman coding**. Neozip implements DEFLATE through a modular strategy +system where each compression level maps to a specific strategy function with +tuned parameters. This document covers every aspect of the compression +pipeline: hash chains, match finding, lazy evaluation, and the strategy +functions. + +--- + +## The DEFLATE Compression Pipeline + +At a high level, DEFLATE works as follows: + +1. **Sliding window**: Input is processed through a 32KB (configurable) + sliding window. Each position is hashed and inserted into a hash table. +2. **LZ77 match finding**: For each position, the hash table is consulted to + find previous occurrences of the same byte sequence. The longest match + within the window distance is selected. +3. **Symbol emission**: Either a **literal** byte (no match found) or a + **length/distance pair** (match found) is recorded. +4. **Huffman coding**: Accumulated symbols are encoded with Huffman codes + (static, dynamic, or the block is stored uncompressed) and output as a + DEFLATE block. + +``` +Input bytes + │ + ▼ +┌──────────┐ ┌──────────────┐ ┌──────────────┐ ┌────────────┐ +│ Sliding │ ──▶ │ Hash Insert │ ──▶ │ Match Finding│ ──▶ │ Symbol │ +│ Window │ │ (head/prev) │ │ (longest_ │ │ Buffer │ +│ (32K×2) │ │ │ │ match) │ │ (sym_buf) │ +└──────────┘ └──────────────┘ └──────────────┘ └─────┬──────┘ + │ + ▼ + ┌────────────┐ + │ Huffman │ + │ Encoding │ + │ (trees.c) │ + └─────┬──────┘ + │ + ▼ + Compressed + Output +``` + +--- + +## The Sliding Window + +The sliding window is a byte array of size `2 * w_size`, where `w_size` +defaults to 32768 (32KB, corresponding to `windowBits = 15`): + +```c +unsigned char *window; // Size: 2 * w_size bytes +unsigned int window_size; // = 2 * w_size +unsigned int w_size; // = 1 << windowBits (default 32768) +``` + +Input is read into the **upper half** of the window (positions `w_size` to +`2*w_size - 1`). When this half fills up, the lower half is discarded, the +upper half is moved to the lower half (`memcpy` or `memmove`), and new input +fills the upper half again. This is the "slide" operation. + +### Window Sliding + +The `fill_window()` function in `deflate.c` performs the slide: + +1. If more data is needed and `strstart >= w_size + MAX_DIST(s)`: + - Copy bytes `[w_size, 2*w_size)` to `[0, w_size)` + - Adjust `match_start`, `strstart`, `block_start` by `-w_size` + - Call `FUNCTABLE_CALL(slide_hash)(s)` to update hash table entries +2. Read new input from `strm->next_in` into the free space +3. Maintain the `high_water` mark for memory-check safety + +### `slide_hash()` + +When the window slides, all hash table entries (`head[]` and `prev[]`) must be +decremented by `w_size`. Entries that would become negative (pointing before +the new window start) are set to zero. + +The generic C implementation: +```c +void slide_hash_c(deflate_state *s) { + Pos *p; + unsigned n = HASH_SIZE; + p = &s->head[n]; + do { + unsigned m = *--p; + *p = (Pos)(m >= w_size ? m - w_size : 0); + } while (--n); + // Same for prev[] +} +``` + +SIMD implementations process 8 or 16 entries at a time using vector +subtraction and saturation. + +--- + +## Hash Table Structure + +The hash table maps byte sequences to window positions for fast match lookup. +It consists of two arrays: + +### `head[HASH_SIZE]` + +An array of `Pos` (uint16_t) values, indexed by hash value. Each entry +contains the most recent window position that hashed to that index. + +```c +#define HASH_BITS 16u +#define HASH_SIZE 65536u // 2^16 +#define HASH_MASK (HASH_SIZE - 1u) + +Pos *head; // head[HASH_SIZE]: most recent position for each hash +``` + +Note: zlib-ng uses a 16-bit hash (65536 entries) compared to original zlib's +15-bit hash (32768 entries), providing better distribution and fewer collisions. + +### `prev[w_size]` + +An array maintaining the hash **chain** — a linked list of all positions +that share the same hash value: + +```c +Pos *prev; // prev[w_size]: chain links, indexed by (position & w_mask) +``` + +When a new position `str` is inserted for hash value `h`: +```c +prev[str & w_mask] = head[h]; // Link to previous chain head +head[h] = str; // New chain head +``` + +Walking the chain from `head[h]` through `prev[]` yields all previous +positions with the same hash, in reverse chronological order. + +--- + +## Hash Function + +Neozip hashes 4 bytes at the current position (compared to zlib's 3 bytes). +The hash function is defined in `insert_string_tpl.h`: + +```c +Z_FORCEINLINE static uint32_t UPDATE_HASH(uint32_t h, uint32_t val) { + HASH_CALC(h, val); // Architecture-specific hash computation + return h & HASH_CALC_MASK; // Mask to HASH_SIZE +} +``` + +The `HASH_CALC` macro uses a fast multiplicative hash or CRC-based hash +depending on the configuration. Reading 4 bytes at once (`zng_memread_4`) +provides better hash distribution than 3-byte hashing. + +### Hash Insert Operations + +Several insert variants exist: + +#### `quick_insert_value()` + +Used by `deflate_fast` and `deflate_quick`. Inserts a single position +with a pre-read 4-byte value: + +```c +Z_FORCEINLINE static uint32_t QUICK_INSERT_VALUE( + deflate_state *const s, uint32_t str, uint32_t val) { + uint32_t hm, head; + HASH_CALC_VAR_INIT; + HASH_CALC(HASH_CALC_VAR, val); + HASH_CALC_VAR &= HASH_CALC_MASK; + hm = HASH_CALC_VAR; + head = s->head[hm]; + if (LIKELY(head != str)) { + s->prev[str & W_MASK(s)] = (Pos)head; + s->head[hm] = (Pos)str; + } + return head; +} +``` + +#### `quick_insert_string()` + +Like `quick_insert_value()` but reads the 4 bytes itself: + +```c +Z_FORCEINLINE static uint32_t QUICK_INSERT_STRING( + deflate_state *const s, uint32_t str) { + uint8_t *strstart = s->window + str + HASH_CALC_OFFSET; + uint32_t val, hm, head; + HASH_CALC_VAR_INIT; + HASH_CALC_READ; // val = Z_U32_FROM_LE(zng_memread_4(strstart)) + HASH_CALC(HASH_CALC_VAR, val); + // ... insert ... + return head; +} +``` + +#### `insert_string()` + +Batch insert. Inserts `count` consecutive positions, used after a match +to insert all strings within the matched region: + +```c +void insert_string(deflate_state *const s, uint32_t str, uint32_t count); +``` + +#### `insert_string_roll()` / `quick_insert_string_roll()` + +Rolling hash insert for level 9 (`deflate_slow` with `LONGEST_MATCH_SLOW`). +Uses a different hash update that considers the full string context for +better match quality at the cost of speed. + +--- + +## Match Finding + +### `longest_match()` (Standard) + +Defined via the `match_tpl.h` template. This is the hot inner loop of +compression. + +**Algorithm**: + +1. Set `best_len` to `prev_length` (or `STD_MIN_MATCH - 1`) +2. Read the first 8 bytes at `scan` and at `scan + offset` for fast + comparison (where `offset = best_len - 1`, adjusted for word boundaries) +3. Set `limit` to prevent matches beyond `MAX_DIST(s)` +4. If `best_len >= good_match`, halve `chain_length` to speed up +5. Walk the hash chain via `prev[]`: + a. For each candidate `cur_match`: + - Quick reject: compare 8 bytes at match end position (`mbase_end + cur_match`) + against `scan_end` + - Quick reject: compare 8 bytes at match start against `scan_start` + - If both match: call `compare256()` for exact length + - If new length > `best_len`: update `best_len` and `match_start` + - If `best_len >= nice_match`: return immediately + - Update `scan_end` for the new best length + b. Follow `prev[cur_match & wmask]` to next chain entry + c. Stop when `chain_length` exhausted or `cur_match <= limit` + +```c +Z_INTERNAL uint32_t LONGEST_MATCH(deflate_state *const s, uint32_t cur_match) { + // Early exit: if chain becomes too deep for poor matches + int32_t early_exit = s->level < EARLY_EXIT_TRIGGER_LEVEL; + + while (--chain_length) { + match = mbase_start + cur_match; + if (zng_memread_8(mbase_end + cur_match) == scan_end && + zng_memread_8(match) == scan_start) { + len = FUNCTABLE_CALL(compare256)(scan + 2, match + 2) + 2; + if (len > best_len) { + s->match_start = cur_match; + best_len = len; + if (best_len >= nice_match) return best_len; + offset = best_len - 1; + // Update scan_end + } + } + cur_match = prev[cur_match & wmask]; + if (cur_match <= limit) return best_len; + } +} +``` + +**Key parameters** that control match-finding behaviour: + +| Parameter | Field | Effect | +|---|---|---| +| `max_chain_length` | `s->max_chain_length` | Maximum hash chain entries to check | +| `good_match` | `s->good_match` | Halve chain search above this best length | +| `nice_match` | `s->nice_match` | Stop searching above this length | +| `max_lazy_match` | `s->max_lazy_match` | Don't bother with lazy match above this | + +### `longest_match_slow()` (Level 9) + +The slow variant (`LONGEST_MATCH_SLOW`) adds chain re-rooting for better +match quality: + +1. When continuing a lazy evaluation search, it doesn't just follow the + chain from the hash of the current position +2. Instead, it finds the most distant chain starting from positions + `scan[1], scan[2], ..., scan[best_len]` using `update_hash_roll()` +3. This effectively searches multiple hash chains to find matches that + the standard algorithm would miss + +```c +// Re-root: find a more distant chain start +hash = update_hash_roll(0, scan[1]); +hash = update_hash_roll(hash, scan[2]); +for (uint32_t i = 3; i <= best_len; i++) { + hash = update_hash_roll(hash, scan[i]); + pos = s->head[hash]; + if (pos < cur_match) { + cur_match = pos; // Found a more distant starting point + } +} +``` + +### `compare256()` + +Compares up to 256 bytes between two pointers, returning the number of +matching bytes. Architecture-specific implementations: + +| Implementation | File | Method | +|---|---|---| +| `compare256_c` | `arch/generic/compare256_c.c` | 8-byte word comparison | +| `compare256_sse2` | `arch/x86/compare256_sse2.c` | 16-byte SSE2 comparison | +| `compare256_avx2` | `arch/x86/compare256_avx2.c` | 32-byte AVX2 comparison | +| `compare256_avx512` | `arch/x86/compare256_avx512.c` | 64-byte AVX-512 comparison | +| `compare256_neon` | `arch/arm/compare256_neon.c` | 16-byte NEON comparison | + +The compare function uses `FUNCTABLE_CALL(compare256)` for dispatch. + +--- + +## Configuration Table + +Each compression level has a set of tuning parameters defined in +`deflate.c`: + +```c +typedef struct config_s { + uint16_t good_length; // Reduce lazy search above this match length + uint16_t max_lazy; // Do not perform lazy search above this length + uint16_t nice_length; // Quit search above this match length + uint16_t max_chain; // Maximum hash chain length + compress_func func; // Strategy function pointer +} config; +``` + +The full table: + +| Level | good | lazy | nice | chain | Strategy | +|---|---|---|---|---|---| +| 0 | 0 | 0 | 0 | 0 | `deflate_stored` | +| 1 | 0 | 0 | 0 | 0 | `deflate_quick` | +| 2 | 4 | 4 | 8 | 4 | `deflate_fast` | +| 3 | 4 | 6 | 16 | 6 | `deflate_medium` | +| 4 | 4 | 12 | 32 | 24 | `deflate_medium` | +| 5 | 8 | 16 | 32 | 32 | `deflate_medium` | +| 6 | 8 | 16 | 128 | 128 | `deflate_medium` | +| 7 | 8 | 32 | 128 | 256 | `deflate_slow` | +| 8 | 32 | 128 | 258 | 1024 | `deflate_slow` | +| 9 | 32 | 258 | 258 | 4096 | `deflate_slow` | + +When `NO_QUICK_STRATEGY` is defined, level 1 uses `deflate_fast` and level 2 +shifts accordingly. When `NO_MEDIUM_STRATEGY` is defined, levels 3–6 use +`deflate_fast` or `deflate_slow`. + +--- + +## Strategy Functions in Detail + +### `deflate_stored` (Level 0) + +No compression. Input is emitted as stored blocks (type 0 in DEFLATE): + +- Each stored block has a 5-byte header: BFINAL (1 bit), BTYPE (2 bits = 00), + padding to byte boundary, LEN (16 bits), NLEN (16 bits, one's complement of LEN) +- Maximum stored block length: 65535 bytes (`MAX_STORED`) +- Directly copies from `next_in` to `next_out` when possible +- Falls back to buffering through the window when direct copy isn't feasible + +```c +Z_INTERNAL block_state deflate_stored(deflate_state *s, int flush) { + unsigned min_block = MIN(s->pending_buf_size - 5, w_size); + // ... + len = MAX_STORED; + have = (s->bi_valid + 42) >> 3; // Header overhead + // Copy blocks directly when possible +} +``` + +### `deflate_quick` (Level 1) + +The fastest compression strategy, designed by Intel: + +- Uses **static Huffman trees** only (no dynamic tree construction) +- Single-pass greedy matching with no lazy evaluation +- Emits blocks via `zng_tr_emit_tree(s, STATIC_TREES, last)` and + `zng_tr_emit_end_block(s, static_ltree, last)` +- Tracks block state: `block_open` = 0 (closed), 1 (open), 2 (open + last) +- Flushes when `pending` approaches `pending_buf_size` + +```c +Z_INTERNAL block_state deflate_quick(deflate_state *s, int flush) { + // Start static block + quick_start_block(s, last); + + for (;;) { + // Flush if pending buffer nearly full + if (s->pending + ((BIT_BUF_SIZE + 7) >> 3) >= s->pending_buf_size) { + PREFIX(flush_pending)(s->strm); + // ... + } + + // Hash insert + uint32_t hash_head = quick_insert_value(s, s->strstart, str_val); + + // Try to find a match + if (dist <= MAX_DIST(s) && dist > 0) { + // Quick match comparison... + if (match_val == str_val) { + // Full match via longest_match or inline compare + zng_tr_emit_dist(s, static_ltree, static_dtree, ...); + } + } + // No match: emit literal + zng_tr_emit_lit(s, static_ltree, lc); + } +} +``` + +### `deflate_fast` (Level 2, or 1–3 without quick) + +Greedy matching without lazy evaluation: + +1. `fill_window()` if `lookahead < MIN_LOOKAHEAD` +2. Hash insert via `quick_insert_value()` +3. If match found (within `MAX_DIST`, length ≥ `WANT_MIN_MATCH`): + - Record match via `zng_tr_tally_dist()` + - Insert all strings within the match region + - Advance `strstart` by `match_length` +4. If no match: + - Record literal via `zng_tr_tally_lit()` + - Advance `strstart` by 1 +5. If tally function returns true (buffer full): flush block + +```c +Z_INTERNAL block_state deflate_fast(deflate_state *s, int flush) { + for (;;) { + if (s->lookahead < MIN_LOOKAHEAD) { + PREFIX(fill_window)(s); + // ... + } + if (s->lookahead >= WANT_MIN_MATCH) { + uint32_t hash_head = quick_insert_value(s, s->strstart, str_val); + if (dist <= MAX_DIST(s) && dist > 0 && hash_head != 0) { + match_len = FUNCTABLE_CALL(longest_match)(s, hash_head); + } + } + if (match_len >= WANT_MIN_MATCH) { + bflush = zng_tr_tally_dist(s, s->strstart - s->match_start, + match_len - STD_MIN_MATCH); + s->lookahead -= match_len; + // Insert strings within match + s->strstart += match_len; + } else { + bflush = zng_tr_tally_lit(s, lc); + s->lookahead--; + s->strstart++; + } + if (UNLIKELY(bflush)) + FLUSH_BLOCK(s, 0); + } +} +``` + +### `deflate_medium` (Levels 3–6) + +Intel's balanced strategy that bridges fast and slow: + +Uses a `struct match` to track match attributes: +```c +struct match { + uint16_t match_start; + uint16_t match_length; + uint16_t strstart; + uint16_t orgstart; +}; +``` + +Key helper functions: +- **`find_best_match()`** — Calls `longest_match()` and returns a `struct match` +- **`emit_match()`** — Emits literals for short matches (< `WANT_MIN_MATCH`) + or a length/distance pair for longer ones +- **`insert_match()`** — Inserts hash entries for matched positions + +The algorithm maintains a two-position lookahead: + +1. Find best match at current position +2. Find best match at next position +3. Compare: if next match is better by a meaningful margin, emit current + position as a literal and adopt the next match +4. Otherwise, emit the current match + +```c +Z_INTERNAL block_state deflate_medium(deflate_state *s, int flush) { + struct match current_match, next_match; + + for (;;) { + current_match = find_best_match(s, hash_head); + + if (current_match.match_length >= WANT_MIN_MATCH) { + // Check if next position has a better match + next_match = find_best_match(s, next_hash_head); + if (next_match.match_length > current_match.match_length + 1) { + // Skip current, use next + emit_match(s, literal_match); + insert_match(s, literal_match); + continue; + } + } + emit_match(s, current_match); + insert_match(s, current_match); + } +} +``` + +### `deflate_slow` (Levels 7–9) + +Full lazy match evaluation — the traditional approach for maximum compression: + +1. Find longest match at current position +2. **Do not emit it yet** — set `match_available = 1` +3. Advance to next position, find another match +4. If the new match is **not better** than the previous one: + - Emit the previous match (the "lazy" match) + - Insert all strings within the matched region + - Skip past the match +5. If the new match **is better**: + - Emit the previous position as a literal + - Record the new match as `match_available` + - Continue from step 3 + +```c +Z_INTERNAL block_state deflate_slow(deflate_state *s, int flush) { + // Level ≥ 9: use slow match finder and rolling insert + if (level >= 9) { + longest_match = FUNCTABLE_FPTR(longest_match_slow); + insert_string_func = insert_string_roll; + } + + for (;;) { + // Find match + if (dist <= MAX_DIST(s) && s->prev_length < s->max_lazy_match) { + match_len = longest_match(s, hash_head); + } + + // Lazy evaluation + if (s->prev_length >= STD_MIN_MATCH && match_len <= s->prev_length) { + // Previous match was better — emit it + bflush = zng_tr_tally_dist(s, s->strstart - 1 - s->prev_match, + s->prev_length - STD_MIN_MATCH); + // Insert strings within match + s->prev_length -= 1; + s->lookahead -= s->prev_length; + s->strstart += s->prev_length; + s->prev_length = 0; + s->match_available = 0; + } else if (s->match_available) { + // Previous position has no good match — emit as literal + bflush = zng_tr_tally_lit(s, s->window[s->strstart - 1]); + } else { + s->match_available = 1; + } + + s->prev_length = match_len; + s->prev_match = s->match_start; + } +} +``` + +**Level 9 enhancements**: +- Uses `longest_match_slow` which re-roots hash chains for deeper search +- Uses `insert_string_roll` with a rolling hash for better distribution +- `max_chain = 4096` provides the deepest chain traversal +- `nice_match = 258` (maximum match length) means it never gives up early + +### `deflate_huff` (Z_HUFFMAN_ONLY) + +Huffman-only compression — every byte is emitted as a literal, no LZ77: + +```c +Z_INTERNAL block_state deflate_huff(deflate_state *s, int flush) { + for (;;) { + if (s->lookahead == 0) { + PREFIX(fill_window)(s); + if (s->lookahead == 0) { + if (flush == Z_NO_FLUSH) return need_more; + break; + } + } + bflush = zng_tr_tally_lit(s, s->window[s->strstart]); + s->lookahead--; + s->strstart++; + if (bflush) FLUSH_BLOCK(s, 0); + } +} +``` + +This forces construction of a Huffman tree that encodes only literal +frequencies. Useful when the data is already compressed or random. + +### `deflate_rle` (Z_RLE) + +Run-length encoding — only finds runs of identical bytes (distance = 1): + +```c +Z_INTERNAL block_state deflate_rle(deflate_state *s, int flush) { + for (;;) { + // Check for a run: scan[-1] == scan[0] == scan[1] + if (s->lookahead >= STD_MIN_MATCH && s->strstart > 0) { + scan = s->window + s->strstart - 1; + if (scan[0] == scan[1] && scan[1] == scan[2]) { + match_len = compare256_rle(scan, scan + 3) + 2; + match_len = MIN(match_len, STD_MAX_MATCH); + } + } + + if (match_len >= STD_MIN_MATCH) { + bflush = zng_tr_tally_dist(s, 1, match_len - STD_MIN_MATCH); + } else { + bflush = zng_tr_tally_lit(s, s->window[s->strstart]); + } + } +} +``` + +The `compare256_rle()` function is optimised for the RLE case where +all bytes in the run are identical: + +```c +// compare256_rle.h +static inline uint32_t compare256_rle_64(const uint8_t *src0, const uint8_t *src1) { + // Read 8-byte words from src0, replicate src1[0] across 8 bytes + // XOR and count trailing zeros to find first difference +} +``` + +--- + +## Symbol Buffer + +Matches and literals are stored in a symbol buffer before being emitted: + +### Overlaid Format (`sym_buf`, default) + +Three bytes per symbol: +- `sym_buf[i+0]`, `sym_buf[i+1]` — distance (0 for literal) +- `sym_buf[i+2]` — literal byte or match length + +```c +// zng_tr_tally_dist: +zng_memwrite_4(&s->sym_buf[sym_next], Z_U32_TO_LE(dist | ((uint32_t)len << 16))); +s->sym_next = sym_next + 3; + +// zng_tr_tally_lit: +zng_memwrite_4(&s->sym_buf[sym_next], Z_U32_TO_LE((uint32_t)c << 16)); +s->sym_next = sym_next + 3; +``` + +### Separate Format (`LIT_MEM`) + +When `LIT_MEM` is defined (automatic when `OPTIMAL_CMP < 32`): +```c +uint16_t *d_buf; // Distance buffer +unsigned char *l_buf; // Literal/length buffer + +// zng_tr_tally_dist: +s->d_buf[sym_next] = (uint16_t)dist; +s->l_buf[sym_next] = (uint8_t)len; +s->sym_next = sym_next + 1; +``` + +This uses ~20% more memory but is 1–2% faster on platforms without fast +unaligned access. + +The buffer size is `lit_bufsize` entries. When `sym_next` reaches `sym_end`, +the block is flushed. The constant `LIT_BUFS` determines the buffer +multiplier: 4 (overlaid) or 5 (separate). + +--- + +## Block Flushing + +`zng_tr_flush_block()` in `trees.c` decides how to emit the accumulated +symbols: + +1. **Compute tree statistics**: Build dynamic Huffman trees, compute + `opt_len` (dynamic tree bit cost) and `static_len` (static tree bit cost) +2. **Compute stored cost**: Raw data length + 5 bytes overhead per block +3. **Choose the best**: + - If stored cost ≤ `opt_len` and stored cost ≤ `static_len`: emit stored block + - Else if `static_len` ≤ `opt_len + 10`: emit with static trees + - Else: emit with dynamic trees + +```c +void zng_tr_flush_block(deflate_state *s, char *buf, uint32_t stored_len, int last) { + build_tree(s, &s->l_desc); + build_tree(s, &s->d_desc); + + if (stored_len + 4 <= opt_len && buf != NULL) { + // Stored block + } else if (s->strategy == Z_FIXED || static_len == opt_len) { + // Static trees + compress_block(s, static_ltree, static_dtree); + } else { + // Dynamic trees + send_all_trees(s, lcodes, dcodes, blcodes); + compress_block(s, s->dyn_ltree, s->dyn_dtree); + } + init_block(s); // Reset for next block +} +``` + +--- + +## Match Length Constraints + +| Constant | Value | Purpose | +|---|---|---| +| `STD_MIN_MATCH` | 3 | Minimum match length per DEFLATE spec | +| `STD_MAX_MATCH` | 258 | Maximum match length per DEFLATE spec | +| `WANT_MIN_MATCH` | 4 | Internal minimum for actual match output | +| `MIN_LOOKAHEAD` | `STD_MAX_MATCH + WANT_MIN_MATCH + 1` | Minimum lookahead before refilling window | +| `MAX_DIST(s)` | `s->w_size - MIN_LOOKAHEAD` | Maximum back-reference distance | + +`WANT_MIN_MATCH = 4` is a performance optimisation: 3-byte matches provide +minimal compression benefit but cost significant CPU time to find. By +requiring 4-byte matches, the hash function can read a full 32-bit word +and the match finder can use 32-bit comparisons for faster rejection. + +--- + +## Block Types in Deflate Output + +Every DEFLATE block starts with a 3-bit header: +- Bit 0: `BFINAL` — 1 if this is the last block +- Bits 1–2: `BTYPE` — block type (00=stored, 01=fixed, 10=dynamic, 11=reserved) + +### Stored Block (BTYPE=00) + +``` +BFINAL BTYPE pad LEN NLEN DATA + 1 00 0-7 16 16 LEN bytes +``` + +### Fixed Huffman Block (BTYPE=01) + +Uses predefined static trees (`static_ltree`, `static_dtree`). No tree +data in the block — the decoder knows the fixed codes. + +### Dynamic Huffman Block (BTYPE=10) + +Contains the tree definition before the data: +``` +BFINAL BTYPE HLIT HDIST HCLEN [Code lengths for code length alphabet] +[Encoded literal/length code lengths] [Encoded distance code lengths] +[Compressed data using the defined trees] +``` + +--- + +## Flush Modes + +The `flush` parameter to `deflate()` controls output behaviour: + +| Value | Name | Effect | +|---|---|---| +| 0 | `Z_NO_FLUSH` | Normal operation; deflate decides when to emit output | +| 1 | `Z_PARTIAL_FLUSH` | Flush output, emit empty fixed code block (10 bits) | +| 2 | `Z_SYNC_FLUSH` | Flush output, align to byte, emit stored empty block (00 00 FF FF) | +| 3 | `Z_FULL_FLUSH` | Like `Z_SYNC_FLUSH` but also resets compression state for random access | +| 4 | `Z_FINISH` | Complete the stream. Returns `Z_STREAM_END` when done | +| 5 | `Z_BLOCK` | Stop at next block boundary | +| 6 | `Z_TREES` | Like `Z_BLOCK` but also emit trees (for `Z_TREES` flush) | + +Flush priority is ordered via the `RANK()` macro: +```c +#define RANK(f) (((f) * 2) - ((f) > 4 ? 9 : 0)) +``` + +--- + +## The `block_state` Return Values + +Each strategy function returns a `block_state`: + +```c +typedef enum { + need_more, // Need more input or more output space + block_done, // Block was flushed + finish_started, // Z_FINISH started, need more output calls + finish_done // Z_FINISH complete +} block_state; +``` + +`deflate()` uses these to control its outer loop and determine when to +return to the caller. |
