# Deflate Algorithms ## Overview The DEFLATE algorithm (RFC 1951) combines **LZ77** sliding-window compression with **Huffman coding**. Neozip implements DEFLATE through a modular strategy system where each compression level maps to a specific strategy function with tuned parameters. This document covers every aspect of the compression pipeline: hash chains, match finding, lazy evaluation, and the strategy functions. --- ## The DEFLATE Compression Pipeline At a high level, DEFLATE works as follows: 1. **Sliding window**: Input is processed through a 32KB (configurable) sliding window. Each position is hashed and inserted into a hash table. 2. **LZ77 match finding**: For each position, the hash table is consulted to find previous occurrences of the same byte sequence. The longest match within the window distance is selected. 3. **Symbol emission**: Either a **literal** byte (no match found) or a **length/distance pair** (match found) is recorded. 4. **Huffman coding**: Accumulated symbols are encoded with Huffman codes (static, dynamic, or the block is stored uncompressed) and output as a DEFLATE block. ``` Input bytes │ ▼ ┌──────────┐ ┌──────────────┐ ┌──────────────┐ ┌────────────┐ │ Sliding │ ──▶ │ Hash Insert │ ──▶ │ Match Finding│ ──▶ │ Symbol │ │ Window │ │ (head/prev) │ │ (longest_ │ │ Buffer │ │ (32K×2) │ │ │ │ match) │ │ (sym_buf) │ └──────────┘ └──────────────┘ └──────────────┘ └─────┬──────┘ │ ▼ ┌────────────┐ │ Huffman │ │ Encoding │ │ (trees.c) │ └─────┬──────┘ │ ▼ Compressed Output ``` --- ## The Sliding Window The sliding window is a byte array of size `2 * w_size`, where `w_size` defaults to 32768 (32KB, corresponding to `windowBits = 15`): ```c unsigned char *window; // Size: 2 * w_size bytes unsigned int window_size; // = 2 * w_size unsigned int w_size; // = 1 << windowBits (default 32768) ``` Input is read into the **upper half** of the window (positions `w_size` to `2*w_size - 1`). When this half fills up, the lower half is discarded, the upper half is moved to the lower half (`memcpy` or `memmove`), and new input fills the upper half again. This is the "slide" operation. ### Window Sliding The `fill_window()` function in `deflate.c` performs the slide: 1. If more data is needed and `strstart >= w_size + MAX_DIST(s)`: - Copy bytes `[w_size, 2*w_size)` to `[0, w_size)` - Adjust `match_start`, `strstart`, `block_start` by `-w_size` - Call `FUNCTABLE_CALL(slide_hash)(s)` to update hash table entries 2. Read new input from `strm->next_in` into the free space 3. Maintain the `high_water` mark for memory-check safety ### `slide_hash()` When the window slides, all hash table entries (`head[]` and `prev[]`) must be decremented by `w_size`. Entries that would become negative (pointing before the new window start) are set to zero. The generic C implementation: ```c void slide_hash_c(deflate_state *s) { Pos *p; unsigned n = HASH_SIZE; p = &s->head[n]; do { unsigned m = *--p; *p = (Pos)(m >= w_size ? m - w_size : 0); } while (--n); // Same for prev[] } ``` SIMD implementations process 8 or 16 entries at a time using vector subtraction and saturation. --- ## Hash Table Structure The hash table maps byte sequences to window positions for fast match lookup. It consists of two arrays: ### `head[HASH_SIZE]` An array of `Pos` (uint16_t) values, indexed by hash value. Each entry contains the most recent window position that hashed to that index. ```c #define HASH_BITS 16u #define HASH_SIZE 65536u // 2^16 #define HASH_MASK (HASH_SIZE - 1u) Pos *head; // head[HASH_SIZE]: most recent position for each hash ``` Note: zlib-ng uses a 16-bit hash (65536 entries) compared to original zlib's 15-bit hash (32768 entries), providing better distribution and fewer collisions. ### `prev[w_size]` An array maintaining the hash **chain** — a linked list of all positions that share the same hash value: ```c Pos *prev; // prev[w_size]: chain links, indexed by (position & w_mask) ``` When a new position `str` is inserted for hash value `h`: ```c prev[str & w_mask] = head[h]; // Link to previous chain head head[h] = str; // New chain head ``` Walking the chain from `head[h]` through `prev[]` yields all previous positions with the same hash, in reverse chronological order. --- ## Hash Function Neozip hashes 4 bytes at the current position (compared to zlib's 3 bytes). The hash function is defined in `insert_string_tpl.h`: ```c Z_FORCEINLINE static uint32_t UPDATE_HASH(uint32_t h, uint32_t val) { HASH_CALC(h, val); // Architecture-specific hash computation return h & HASH_CALC_MASK; // Mask to HASH_SIZE } ``` The `HASH_CALC` macro uses a fast multiplicative hash or CRC-based hash depending on the configuration. Reading 4 bytes at once (`zng_memread_4`) provides better hash distribution than 3-byte hashing. ### Hash Insert Operations Several insert variants exist: #### `quick_insert_value()` Used by `deflate_fast` and `deflate_quick`. Inserts a single position with a pre-read 4-byte value: ```c Z_FORCEINLINE static uint32_t QUICK_INSERT_VALUE( deflate_state *const s, uint32_t str, uint32_t val) { uint32_t hm, head; HASH_CALC_VAR_INIT; HASH_CALC(HASH_CALC_VAR, val); HASH_CALC_VAR &= HASH_CALC_MASK; hm = HASH_CALC_VAR; head = s->head[hm]; if (LIKELY(head != str)) { s->prev[str & W_MASK(s)] = (Pos)head; s->head[hm] = (Pos)str; } return head; } ``` #### `quick_insert_string()` Like `quick_insert_value()` but reads the 4 bytes itself: ```c Z_FORCEINLINE static uint32_t QUICK_INSERT_STRING( deflate_state *const s, uint32_t str) { uint8_t *strstart = s->window + str + HASH_CALC_OFFSET; uint32_t val, hm, head; HASH_CALC_VAR_INIT; HASH_CALC_READ; // val = Z_U32_FROM_LE(zng_memread_4(strstart)) HASH_CALC(HASH_CALC_VAR, val); // ... insert ... return head; } ``` #### `insert_string()` Batch insert. Inserts `count` consecutive positions, used after a match to insert all strings within the matched region: ```c void insert_string(deflate_state *const s, uint32_t str, uint32_t count); ``` #### `insert_string_roll()` / `quick_insert_string_roll()` Rolling hash insert for level 9 (`deflate_slow` with `LONGEST_MATCH_SLOW`). Uses a different hash update that considers the full string context for better match quality at the cost of speed. --- ## Match Finding ### `longest_match()` (Standard) Defined via the `match_tpl.h` template. This is the hot inner loop of compression. **Algorithm**: 1. Set `best_len` to `prev_length` (or `STD_MIN_MATCH - 1`) 2. Read the first 8 bytes at `scan` and at `scan + offset` for fast comparison (where `offset = best_len - 1`, adjusted for word boundaries) 3. Set `limit` to prevent matches beyond `MAX_DIST(s)` 4. If `best_len >= good_match`, halve `chain_length` to speed up 5. Walk the hash chain via `prev[]`: a. For each candidate `cur_match`: - Quick reject: compare 8 bytes at match end position (`mbase_end + cur_match`) against `scan_end` - Quick reject: compare 8 bytes at match start against `scan_start` - If both match: call `compare256()` for exact length - If new length > `best_len`: update `best_len` and `match_start` - If `best_len >= nice_match`: return immediately - Update `scan_end` for the new best length b. Follow `prev[cur_match & wmask]` to next chain entry c. Stop when `chain_length` exhausted or `cur_match <= limit` ```c Z_INTERNAL uint32_t LONGEST_MATCH(deflate_state *const s, uint32_t cur_match) { // Early exit: if chain becomes too deep for poor matches int32_t early_exit = s->level < EARLY_EXIT_TRIGGER_LEVEL; while (--chain_length) { match = mbase_start + cur_match; if (zng_memread_8(mbase_end + cur_match) == scan_end && zng_memread_8(match) == scan_start) { len = FUNCTABLE_CALL(compare256)(scan + 2, match + 2) + 2; if (len > best_len) { s->match_start = cur_match; best_len = len; if (best_len >= nice_match) return best_len; offset = best_len - 1; // Update scan_end } } cur_match = prev[cur_match & wmask]; if (cur_match <= limit) return best_len; } } ``` **Key parameters** that control match-finding behaviour: | Parameter | Field | Effect | |---|---|---| | `max_chain_length` | `s->max_chain_length` | Maximum hash chain entries to check | | `good_match` | `s->good_match` | Halve chain search above this best length | | `nice_match` | `s->nice_match` | Stop searching above this length | | `max_lazy_match` | `s->max_lazy_match` | Don't bother with lazy match above this | ### `longest_match_slow()` (Level 9) The slow variant (`LONGEST_MATCH_SLOW`) adds chain re-rooting for better match quality: 1. When continuing a lazy evaluation search, it doesn't just follow the chain from the hash of the current position 2. Instead, it finds the most distant chain starting from positions `scan[1], scan[2], ..., scan[best_len]` using `update_hash_roll()` 3. This effectively searches multiple hash chains to find matches that the standard algorithm would miss ```c // Re-root: find a more distant chain start hash = update_hash_roll(0, scan[1]); hash = update_hash_roll(hash, scan[2]); for (uint32_t i = 3; i <= best_len; i++) { hash = update_hash_roll(hash, scan[i]); pos = s->head[hash]; if (pos < cur_match) { cur_match = pos; // Found a more distant starting point } } ``` ### `compare256()` Compares up to 256 bytes between two pointers, returning the number of matching bytes. Architecture-specific implementations: | Implementation | File | Method | |---|---|---| | `compare256_c` | `arch/generic/compare256_c.c` | 8-byte word comparison | | `compare256_sse2` | `arch/x86/compare256_sse2.c` | 16-byte SSE2 comparison | | `compare256_avx2` | `arch/x86/compare256_avx2.c` | 32-byte AVX2 comparison | | `compare256_avx512` | `arch/x86/compare256_avx512.c` | 64-byte AVX-512 comparison | | `compare256_neon` | `arch/arm/compare256_neon.c` | 16-byte NEON comparison | The compare function uses `FUNCTABLE_CALL(compare256)` for dispatch. --- ## Configuration Table Each compression level has a set of tuning parameters defined in `deflate.c`: ```c typedef struct config_s { uint16_t good_length; // Reduce lazy search above this match length uint16_t max_lazy; // Do not perform lazy search above this length uint16_t nice_length; // Quit search above this match length uint16_t max_chain; // Maximum hash chain length compress_func func; // Strategy function pointer } config; ``` The full table: | Level | good | lazy | nice | chain | Strategy | |---|---|---|---|---|---| | 0 | 0 | 0 | 0 | 0 | `deflate_stored` | | 1 | 0 | 0 | 0 | 0 | `deflate_quick` | | 2 | 4 | 4 | 8 | 4 | `deflate_fast` | | 3 | 4 | 6 | 16 | 6 | `deflate_medium` | | 4 | 4 | 12 | 32 | 24 | `deflate_medium` | | 5 | 8 | 16 | 32 | 32 | `deflate_medium` | | 6 | 8 | 16 | 128 | 128 | `deflate_medium` | | 7 | 8 | 32 | 128 | 256 | `deflate_slow` | | 8 | 32 | 128 | 258 | 1024 | `deflate_slow` | | 9 | 32 | 258 | 258 | 4096 | `deflate_slow` | When `NO_QUICK_STRATEGY` is defined, level 1 uses `deflate_fast` and level 2 shifts accordingly. When `NO_MEDIUM_STRATEGY` is defined, levels 3–6 use `deflate_fast` or `deflate_slow`. --- ## Strategy Functions in Detail ### `deflate_stored` (Level 0) No compression. Input is emitted as stored blocks (type 0 in DEFLATE): - Each stored block has a 5-byte header: BFINAL (1 bit), BTYPE (2 bits = 00), padding to byte boundary, LEN (16 bits), NLEN (16 bits, one's complement of LEN) - Maximum stored block length: 65535 bytes (`MAX_STORED`) - Directly copies from `next_in` to `next_out` when possible - Falls back to buffering through the window when direct copy isn't feasible ```c Z_INTERNAL block_state deflate_stored(deflate_state *s, int flush) { unsigned min_block = MIN(s->pending_buf_size - 5, w_size); // ... len = MAX_STORED; have = (s->bi_valid + 42) >> 3; // Header overhead // Copy blocks directly when possible } ``` ### `deflate_quick` (Level 1) The fastest compression strategy, designed by Intel: - Uses **static Huffman trees** only (no dynamic tree construction) - Single-pass greedy matching with no lazy evaluation - Emits blocks via `zng_tr_emit_tree(s, STATIC_TREES, last)` and `zng_tr_emit_end_block(s, static_ltree, last)` - Tracks block state: `block_open` = 0 (closed), 1 (open), 2 (open + last) - Flushes when `pending` approaches `pending_buf_size` ```c Z_INTERNAL block_state deflate_quick(deflate_state *s, int flush) { // Start static block quick_start_block(s, last); for (;;) { // Flush if pending buffer nearly full if (s->pending + ((BIT_BUF_SIZE + 7) >> 3) >= s->pending_buf_size) { PREFIX(flush_pending)(s->strm); // ... } // Hash insert uint32_t hash_head = quick_insert_value(s, s->strstart, str_val); // Try to find a match if (dist <= MAX_DIST(s) && dist > 0) { // Quick match comparison... if (match_val == str_val) { // Full match via longest_match or inline compare zng_tr_emit_dist(s, static_ltree, static_dtree, ...); } } // No match: emit literal zng_tr_emit_lit(s, static_ltree, lc); } } ``` ### `deflate_fast` (Level 2, or 1–3 without quick) Greedy matching without lazy evaluation: 1. `fill_window()` if `lookahead < MIN_LOOKAHEAD` 2. Hash insert via `quick_insert_value()` 3. If match found (within `MAX_DIST`, length ≥ `WANT_MIN_MATCH`): - Record match via `zng_tr_tally_dist()` - Insert all strings within the match region - Advance `strstart` by `match_length` 4. If no match: - Record literal via `zng_tr_tally_lit()` - Advance `strstart` by 1 5. If tally function returns true (buffer full): flush block ```c Z_INTERNAL block_state deflate_fast(deflate_state *s, int flush) { for (;;) { if (s->lookahead < MIN_LOOKAHEAD) { PREFIX(fill_window)(s); // ... } if (s->lookahead >= WANT_MIN_MATCH) { uint32_t hash_head = quick_insert_value(s, s->strstart, str_val); if (dist <= MAX_DIST(s) && dist > 0 && hash_head != 0) { match_len = FUNCTABLE_CALL(longest_match)(s, hash_head); } } if (match_len >= WANT_MIN_MATCH) { bflush = zng_tr_tally_dist(s, s->strstart - s->match_start, match_len - STD_MIN_MATCH); s->lookahead -= match_len; // Insert strings within match s->strstart += match_len; } else { bflush = zng_tr_tally_lit(s, lc); s->lookahead--; s->strstart++; } if (UNLIKELY(bflush)) FLUSH_BLOCK(s, 0); } } ``` ### `deflate_medium` (Levels 3–6) Intel's balanced strategy that bridges fast and slow: Uses a `struct match` to track match attributes: ```c struct match { uint16_t match_start; uint16_t match_length; uint16_t strstart; uint16_t orgstart; }; ``` Key helper functions: - **`find_best_match()`** — Calls `longest_match()` and returns a `struct match` - **`emit_match()`** — Emits literals for short matches (< `WANT_MIN_MATCH`) or a length/distance pair for longer ones - **`insert_match()`** — Inserts hash entries for matched positions The algorithm maintains a two-position lookahead: 1. Find best match at current position 2. Find best match at next position 3. Compare: if next match is better by a meaningful margin, emit current position as a literal and adopt the next match 4. Otherwise, emit the current match ```c Z_INTERNAL block_state deflate_medium(deflate_state *s, int flush) { struct match current_match, next_match; for (;;) { current_match = find_best_match(s, hash_head); if (current_match.match_length >= WANT_MIN_MATCH) { // Check if next position has a better match next_match = find_best_match(s, next_hash_head); if (next_match.match_length > current_match.match_length + 1) { // Skip current, use next emit_match(s, literal_match); insert_match(s, literal_match); continue; } } emit_match(s, current_match); insert_match(s, current_match); } } ``` ### `deflate_slow` (Levels 7–9) Full lazy match evaluation — the traditional approach for maximum compression: 1. Find longest match at current position 2. **Do not emit it yet** — set `match_available = 1` 3. Advance to next position, find another match 4. If the new match is **not better** than the previous one: - Emit the previous match (the "lazy" match) - Insert all strings within the matched region - Skip past the match 5. If the new match **is better**: - Emit the previous position as a literal - Record the new match as `match_available` - Continue from step 3 ```c Z_INTERNAL block_state deflate_slow(deflate_state *s, int flush) { // Level ≥ 9: use slow match finder and rolling insert if (level >= 9) { longest_match = FUNCTABLE_FPTR(longest_match_slow); insert_string_func = insert_string_roll; } for (;;) { // Find match if (dist <= MAX_DIST(s) && s->prev_length < s->max_lazy_match) { match_len = longest_match(s, hash_head); } // Lazy evaluation if (s->prev_length >= STD_MIN_MATCH && match_len <= s->prev_length) { // Previous match was better — emit it bflush = zng_tr_tally_dist(s, s->strstart - 1 - s->prev_match, s->prev_length - STD_MIN_MATCH); // Insert strings within match s->prev_length -= 1; s->lookahead -= s->prev_length; s->strstart += s->prev_length; s->prev_length = 0; s->match_available = 0; } else if (s->match_available) { // Previous position has no good match — emit as literal bflush = zng_tr_tally_lit(s, s->window[s->strstart - 1]); } else { s->match_available = 1; } s->prev_length = match_len; s->prev_match = s->match_start; } } ``` **Level 9 enhancements**: - Uses `longest_match_slow` which re-roots hash chains for deeper search - Uses `insert_string_roll` with a rolling hash for better distribution - `max_chain = 4096` provides the deepest chain traversal - `nice_match = 258` (maximum match length) means it never gives up early ### `deflate_huff` (Z_HUFFMAN_ONLY) Huffman-only compression — every byte is emitted as a literal, no LZ77: ```c Z_INTERNAL block_state deflate_huff(deflate_state *s, int flush) { for (;;) { if (s->lookahead == 0) { PREFIX(fill_window)(s); if (s->lookahead == 0) { if (flush == Z_NO_FLUSH) return need_more; break; } } bflush = zng_tr_tally_lit(s, s->window[s->strstart]); s->lookahead--; s->strstart++; if (bflush) FLUSH_BLOCK(s, 0); } } ``` This forces construction of a Huffman tree that encodes only literal frequencies. Useful when the data is already compressed or random. ### `deflate_rle` (Z_RLE) Run-length encoding — only finds runs of identical bytes (distance = 1): ```c Z_INTERNAL block_state deflate_rle(deflate_state *s, int flush) { for (;;) { // Check for a run: scan[-1] == scan[0] == scan[1] if (s->lookahead >= STD_MIN_MATCH && s->strstart > 0) { scan = s->window + s->strstart - 1; if (scan[0] == scan[1] && scan[1] == scan[2]) { match_len = compare256_rle(scan, scan + 3) + 2; match_len = MIN(match_len, STD_MAX_MATCH); } } if (match_len >= STD_MIN_MATCH) { bflush = zng_tr_tally_dist(s, 1, match_len - STD_MIN_MATCH); } else { bflush = zng_tr_tally_lit(s, s->window[s->strstart]); } } } ``` The `compare256_rle()` function is optimised for the RLE case where all bytes in the run are identical: ```c // compare256_rle.h static inline uint32_t compare256_rle_64(const uint8_t *src0, const uint8_t *src1) { // Read 8-byte words from src0, replicate src1[0] across 8 bytes // XOR and count trailing zeros to find first difference } ``` --- ## Symbol Buffer Matches and literals are stored in a symbol buffer before being emitted: ### Overlaid Format (`sym_buf`, default) Three bytes per symbol: - `sym_buf[i+0]`, `sym_buf[i+1]` — distance (0 for literal) - `sym_buf[i+2]` — literal byte or match length ```c // zng_tr_tally_dist: zng_memwrite_4(&s->sym_buf[sym_next], Z_U32_TO_LE(dist | ((uint32_t)len << 16))); s->sym_next = sym_next + 3; // zng_tr_tally_lit: zng_memwrite_4(&s->sym_buf[sym_next], Z_U32_TO_LE((uint32_t)c << 16)); s->sym_next = sym_next + 3; ``` ### Separate Format (`LIT_MEM`) When `LIT_MEM` is defined (automatic when `OPTIMAL_CMP < 32`): ```c uint16_t *d_buf; // Distance buffer unsigned char *l_buf; // Literal/length buffer // zng_tr_tally_dist: s->d_buf[sym_next] = (uint16_t)dist; s->l_buf[sym_next] = (uint8_t)len; s->sym_next = sym_next + 1; ``` This uses ~20% more memory but is 1–2% faster on platforms without fast unaligned access. The buffer size is `lit_bufsize` entries. When `sym_next` reaches `sym_end`, the block is flushed. The constant `LIT_BUFS` determines the buffer multiplier: 4 (overlaid) or 5 (separate). --- ## Block Flushing `zng_tr_flush_block()` in `trees.c` decides how to emit the accumulated symbols: 1. **Compute tree statistics**: Build dynamic Huffman trees, compute `opt_len` (dynamic tree bit cost) and `static_len` (static tree bit cost) 2. **Compute stored cost**: Raw data length + 5 bytes overhead per block 3. **Choose the best**: - If stored cost ≤ `opt_len` and stored cost ≤ `static_len`: emit stored block - Else if `static_len` ≤ `opt_len + 10`: emit with static trees - Else: emit with dynamic trees ```c void zng_tr_flush_block(deflate_state *s, char *buf, uint32_t stored_len, int last) { build_tree(s, &s->l_desc); build_tree(s, &s->d_desc); if (stored_len + 4 <= opt_len && buf != NULL) { // Stored block } else if (s->strategy == Z_FIXED || static_len == opt_len) { // Static trees compress_block(s, static_ltree, static_dtree); } else { // Dynamic trees send_all_trees(s, lcodes, dcodes, blcodes); compress_block(s, s->dyn_ltree, s->dyn_dtree); } init_block(s); // Reset for next block } ``` --- ## Match Length Constraints | Constant | Value | Purpose | |---|---|---| | `STD_MIN_MATCH` | 3 | Minimum match length per DEFLATE spec | | `STD_MAX_MATCH` | 258 | Maximum match length per DEFLATE spec | | `WANT_MIN_MATCH` | 4 | Internal minimum for actual match output | | `MIN_LOOKAHEAD` | `STD_MAX_MATCH + WANT_MIN_MATCH + 1` | Minimum lookahead before refilling window | | `MAX_DIST(s)` | `s->w_size - MIN_LOOKAHEAD` | Maximum back-reference distance | `WANT_MIN_MATCH = 4` is a performance optimisation: 3-byte matches provide minimal compression benefit but cost significant CPU time to find. By requiring 4-byte matches, the hash function can read a full 32-bit word and the match finder can use 32-bit comparisons for faster rejection. --- ## Block Types in Deflate Output Every DEFLATE block starts with a 3-bit header: - Bit 0: `BFINAL` — 1 if this is the last block - Bits 1–2: `BTYPE` — block type (00=stored, 01=fixed, 10=dynamic, 11=reserved) ### Stored Block (BTYPE=00) ``` BFINAL BTYPE pad LEN NLEN DATA 1 00 0-7 16 16 LEN bytes ``` ### Fixed Huffman Block (BTYPE=01) Uses predefined static trees (`static_ltree`, `static_dtree`). No tree data in the block — the decoder knows the fixed codes. ### Dynamic Huffman Block (BTYPE=10) Contains the tree definition before the data: ``` BFINAL BTYPE HLIT HDIST HCLEN [Code lengths for code length alphabet] [Encoded literal/length code lengths] [Encoded distance code lengths] [Compressed data using the defined trees] ``` --- ## Flush Modes The `flush` parameter to `deflate()` controls output behaviour: | Value | Name | Effect | |---|---|---| | 0 | `Z_NO_FLUSH` | Normal operation; deflate decides when to emit output | | 1 | `Z_PARTIAL_FLUSH` | Flush output, emit empty fixed code block (10 bits) | | 2 | `Z_SYNC_FLUSH` | Flush output, align to byte, emit stored empty block (00 00 FF FF) | | 3 | `Z_FULL_FLUSH` | Like `Z_SYNC_FLUSH` but also resets compression state for random access | | 4 | `Z_FINISH` | Complete the stream. Returns `Z_STREAM_END` when done | | 5 | `Z_BLOCK` | Stop at next block boundary | | 6 | `Z_TREES` | Like `Z_BLOCK` but also emit trees (for `Z_TREES` flush) | Flush priority is ordered via the `RANK()` macro: ```c #define RANK(f) (((f) * 2) - ((f) > 4 ? 9 : 0)) ``` --- ## The `block_state` Return Values Each strategy function returns a `block_state`: ```c typedef enum { need_more, // Need more input or more output space block_done, // Block was flushed finish_started, // Z_FINISH started, need more output calls finish_done // Z_FINISH complete } block_state; ``` `deflate()` uses these to control its outer loop and determine when to return to the caller.