# Inflate Engine ## Overview The inflate engine decompresses DEFLATE-format data (RFC 1951) wrapped in either a zlib container (RFC 1950), a gzip container (RFC 1952), or with no wrapper (raw deflate). It is implemented as a state machine in `inflate.c`, with a hot inner loop in `inffast_tpl.h` and table-building in `inftrees.c`. The engine processes input byte-by-byte through a 64-bit bit accumulator (`hold`), transitioning between states as it parses headers, decodes Huffman codes, copies back-referenced data, and verifies integrity checksums. --- ## State Machine The `inflate_mode` enum defines all possible states: ```c typedef enum { HEAD = 16180, // Waiting for magic header FLAGS, // Waiting for method and flags (gzip) TIME, // Waiting for modification time (gzip) OS, // Waiting for extra flags and OS (gzip) EXLEN, // Waiting for extra length (gzip) EXTRA, // Waiting for extra bytes (gzip) NAME, // Waiting for end of file name (gzip) COMMENT, // Waiting for end of comment (gzip) HCRC, // Waiting for header CRC (gzip) DICTID, // Waiting for dictionary check value DICT, // Waiting for inflateSetDictionary() call TYPE, // Waiting for type bits (including last-flag bit) TYPEDO, // Same as TYPE but skip check to exit on new block STORED, // Waiting for stored size (length and complement) COPY_, // Waiting for input or output to copy stored block (first time) COPY, // Waiting for input or output to copy stored block TABLE, // Waiting for dynamic block table lengths LENLENS, // Waiting for code length code lengths CODELENS, // Waiting for length/lit and distance code lengths LEN_, // Waiting for length/lit/eob code (first time) LEN, // Waiting for length/lit/eob code LENEXT, // Waiting for length extra bits DIST, // Waiting for distance code DISTEXT, // Waiting for distance extra bits MATCH, // Waiting for output space to copy string LIT, // Waiting for output space to write literal CHECK, // Waiting for 32-bit check value LENGTH, // Waiting for 32-bit length (gzip) DONE, // Finished check, remain here until reset BAD, // Got a data error, remain here until reset SYNC // Looking for synchronization bytes } inflate_mode; ``` The starting value `HEAD = 16180` (a non-zero, distinctive value) helps catch uninitialised state errors. --- ## State Transitions ### Header Processing ``` HEAD ─┬─ (gzip header detected) ──▶ FLAGS ─▶ TIME ─▶ OS ─▶ EXLEN ─▶ EXTRA │ ─▶ NAME ─▶ COMMENT ─▶ HCRC ─▶ TYPE │ ├─ (zlib header detected) ──▶ DICTID ─▶ DICT ─▶ TYPE │ or │ ──▶ TYPE (no dictionary) │ └─ (raw deflate) ──────────▶ TYPEDO ``` ### Block Processing ``` TYPE ──▶ TYPEDO ─┬─ (stored block) ──▶ STORED ─▶ COPY_ ─▶ COPY ──▶ TYPE │ ├─ (dynamic block) ──▶ TABLE ─▶ LENLENS ─▶ CODELENS ─▶ LEN_ │ ├─ (fixed block) ──▶ LEN_ │ └─ (last block) ──▶ CHECK ``` ### Data Decoding ``` LEN_ ──▶ LEN ─┬─ (literal) ──▶ LIT ──▶ LEN │ ├─ (length code) ──▶ LENEXT ──▶ DIST ──▶ DISTEXT ──▶ MATCH ──▶ LEN │ └─ (end-of-block) ──▶ TYPE (or CHECK if last block) ``` ### Trailer Processing ``` CHECK ──▶ LENGTH (gzip only) ──▶ DONE ``` --- ## The `inflate_state` Structure ```c struct ALIGNED_(64) inflate_state { // Stream back-pointer PREFIX3(stream) *strm; // State machine inflate_mode mode; // Current state int last; // true if processing last block int wrap; // bit 0: zlib, bit 1: gzip, bit 2: validate check int havedict; // Dictionary provided? int flags; // gzip header flags (-1 = no header yet, 0 = zlib) // Integrity unsigned was; // Initial match length for inflateMark unsigned long check; // Running checksum (Adler-32 or CRC-32) unsigned long total; // Running output byte count PREFIX(gz_headerp) head;// Where to save gzip header info // Sliding window unsigned wbits; // log2(requested window size) uint32_t wsize; // Window size (or 0 if not allocated) uint32_t wbufsize; // Real allocated window size including padding uint32_t whave; // Valid bytes in window uint32_t wnext; // Window write index unsigned char *window; // Sliding window buffer // Bit accumulator uint64_t hold; // 64-bit input bit accumulator unsigned bits; // Bits currently held // Code tables unsigned lenbits; // Root bits for length code table code const *lencode; // Length/literal code table pointer code const *distcode; // Distance code table pointer unsigned distbits; // Root bits for distance code table // Current match state uint32_t length; // Literal value or copy length unsigned offset; // Copy distance unsigned extra; // Extra bits to read // Dynamic table building unsigned ncode; // Code length code lengths count unsigned nlen; // Length code count unsigned ndist; // Distance code count uint32_t have; // Code lengths decoded so far code *next; // Next free slot in codes[] // Working storage uint16_t lens[320]; // Code lengths (max 286 + 30 + padding) uint16_t work[288]; // Work area for inflate_table code codes[ENOUGH]; // Code tables (ENOUGH = 1924 entries) inflate_allocs *alloc_bufs; }; ``` ### Key Dimensions | Constant | Value | Purpose | |---|---|---| | `ENOUGH` | 1924 | Maximum total code table entries | | `ENOUGH_LENS` | 1332 | Maximum literal/length table entries | | `ENOUGH_DISTS` | 592 | Maximum distance table entries | | `MAX_WBITS` | 15 | Maximum window bits (32KB window) | | `MIN_WBITS` | 8 | Minimum window bits (256-byte window) | --- ## Bit Accumulator The inflate engine uses a 64-bit bit buffer for efficient bit extraction: ```c uint64_t hold; // Accumulated bits (LSB first) unsigned bits; // Number of valid bits in hold ``` Bits are read from `hold` from the least significant end: ```c // Load bits from input #define PULLBYTE() do { \ hold += (uint64_t)(*next++) << bits; \ bits += 8; \ } while (0) // Ensure at least n bits are available #define NEEDBITS(n) do { \ while (bits < (unsigned)(n)) \ PULLBYTE(); \ } while (0) // Read n bits from hold #define BITS(n) ((unsigned)hold & ((1U << (n)) - 1)) // Drop n bits from hold #define DROPBITS(n) do { \ hold >>= (n); \ bits -= (unsigned)(n); \ } while (0) ``` The 64-bit width allows accumulating up to 8 bytes before overflow, reducing the frequency of input reads. --- ## Huffman Code Decoding ### The `code` Structure Each entry in a decoding table is a `code` structure: ```c typedef struct { unsigned char bits; // Bits to consume from input unsigned char op; // Operation type + extra bits uint16_t val; // Output value or table offset } code; ``` The `op` field encodes the meaning: | `op` Value | Meaning | |---|---| | `0x00` | Literal: `val` is the literal byte | | `0x0t` (t ≠ 0) | Table link: `t` is the number of additional index bits | | `0x1e` | Length or distance: `e` extra bits, `val` is the base value | | `0x60` | End of block | | `0x40` | Invalid code | ### Table Building via `zng_inflate_table()` `inftrees.c` builds two-level Huffman decoding tables: ```c int zng_inflate_table(codetype type, uint16_t *lens, unsigned codes, code **table, unsigned *bits, uint16_t *work); ``` Parameters: - `type` — `CODES` (code lengths), `LENS` (literal/length), or `DISTS` (distances) - `lens` — Array of code lengths for each symbol - `codes` — Number of symbols - `table` — Output: pointer to the root table - `bits` — Input: requested root table bits; Output: actual root table bits - `work` — Scratch space The algorithm: 1. Count code lengths → `count[]` 2. Compute offsets for sorting → `offs[]` 3. Sort symbols by code length → `work[]` 4. Fill the primary table (root bits = `*bits`) 5. For codes longer than root bits, create sub-tables Root table sizes: - Literal/length: 10 bits (1024 entries) - Distance: 9 bits (512 entries) Sub-tables handle codes longer than the root bits, linked from the primary table via the `op` field. ### Fixed Huffman Tables For fixed-code blocks (BTYPE=01), predefined tables are stored in `inffixed_tbl.h`. These are computed once and reused. The `fixedtables()` function in `inflate.c` sets up the fixed tables: ```c state->lencode = lenfix; // Predefined literal/length table state->lenbits = 9; // Root bits for fixed literal/length codes state->distcode = distfix; // Predefined distance table state->distbits = 5; // Root bits for fixed distance codes ``` --- ## The Inflate Loop The main `inflate()` function is a single large `for` loop with a `switch` on `state->mode`: ```c int32_t Z_EXPORT PREFIX(inflate)(PREFIX3(stream) *strm, int32_t flush) { // Input/output tracking unsigned char *put = strm->next_out; unsigned char *next = strm->next_in; uint64_t hold = state->hold; unsigned bits = state->bits; for (;;) switch (state->mode) { case HEAD: // Detect zlib/gzip/raw header NEEDBITS(16); if (state->wrap == 0) { // Raw deflate state->mode = TYPEDO; break; } if ((hold & 0xff) == 0x1f && ((hold >> 8) & 0xff) == 0x8b) { // gzip header state->mode = FLAGS; } else { // zlib header (CMF + FLG) state->mode = DICTID; // or TYPE } break; case TYPE: // Read block header NEEDBITS(3); state->last = BITS(1); DROPBITS(1); switch (BITS(2)) { case 0: state->mode = STORED; break; // Stored block case 1: fixedtables(state); // Fixed Huffman state->mode = LEN_; break; case 2: state->mode = TABLE; break; // Dynamic Huffman case 3: state->mode = BAD; break; // Reserved (error) } DROPBITS(2); break; case LEN: // Decode literal/length code here = state->lencode[BITS(state->lenbits)]; if (here.op == 0) { // Literal *put++ = (unsigned char)(here.val); DROPBITS(here.bits); state->mode = LEN; } else if (here.op & 16) { // Length code state->length = here.val; state->extra = here.op & 15; state->mode = LENEXT; } else if (here.op == 96) { // End of block state->mode = TYPE; } break; case DIST: // Decode distance code here = state->distcode[BITS(state->distbits)]; // ... similar pattern ... break; case MATCH: // Copy from window // Copy state->length bytes from position (out - state->offset) // Uses chunkmemset_safe() for SIMD-accelerated copying break; case CHECK: // Verify checksum // Compare computed check with stored value break; case DONE: ret = Z_STREAM_END; goto inf_leave; } inf_leave: // Save state state->hold = hold; state->bits = bits; return ret; } ``` --- ## Fast Inflate Path When sufficient input and output are available, `inflate()` calls the fast inner loop: ```c case LEN_: // Check if we can use the fast path if (have >= 6 && left >= 258) { FUNCTABLE_CALL(inflate_fast)(strm, out); // Reload local state state->mode = LEN; break; } state->mode = LEN; break; ``` `inflate_fast()` (defined via `inffast_tpl.h`) processes codes without returning to the main switch loop: 1. Pre-load bits into the 64-bit accumulator 2. Loop while input ≥ 6 bytes and output ≥ 258 bytes: - Decode literal/length from `lencode` - If literal: output directly - If length: decode distance from `distcode`, copy from window - If EOB: exit loop 3. Handle sub-table traversal for codes longer than root bits SIMD variants (SSE2, AVX2, AVX-512, NEON) accelerate the copy operation via `chunkmemset_safe()`, which can copy overlapping regions efficiently using vector loads/stores. --- ## Window Management The inflate sliding window stores recent output for back-reference copying: ```c unsigned char *window; // Circular buffer uint32_t wsize; // Window size uint32_t whave; // Valid bytes in window uint32_t wnext; // Write position (circular) ``` ### `updatewindow()` Called after each inflate pass to copy decompressed output into the window: ```c static void updatewindow(PREFIX3(stream) *strm, const uint8_t *end, uint32_t len, int32_t cksum) { struct inflate_state *state = (struct inflate_state *)strm->state; // First time: set up window size if (state->wsize == 0) { state->wsize = 1U << state->wbits; state->wnext = 0; state->whave = 0; } // Copy output to window (handles wraparound) if (len >= state->wsize) { // Copy last wsize bytes memcpy(state->window, end - state->wsize, state->wsize); state->wnext = 0; state->whave = state->wsize; } else { // Copy len bytes, wrapping around if necessary uint32_t dist = MIN(state->wsize - state->wnext, len); memcpy(state->window + state->wnext, end - len, dist); len -= dist; if (len) { memcpy(state->window, end - len, len); state->wnext = len; state->whave = state->wsize; } else { state->wnext += dist; if (state->wnext == state->wsize) state->wnext = 0; if (state->whave < state->wsize) state->whave += dist; } } } ``` ### Back-Reference Copying When a length/distance pair is decoded, the engine copies `length` bytes from position `(output_position - distance)`: ```c case MATCH: // Copy from window from = put - state->offset; if (state->offset > (put - state->window)) { // Source is in the circular window buffer // Handle wrap-around via window[] } // Copy length bytes to put, potentially overlapping FUNCTABLE_CALL(chunkmemset_safe)(put, from, state->length, left); ``` The `chunkmemset_safe()` function handles the case where `distance < length` (overlapping copy), which occurs in runs of repeated patterns. SIMD implementations can vectorise even overlapping copies by replicating the pattern. --- ## Checksum Handling During decompression, a running checksum is computed: - **zlib format**: Adler-32 of the uncompressed data - **gzip format**: CRC-32 of the uncompressed data The selection is based on `state->flags`: ```c static inline void inf_chksum_cpy(PREFIX3(stream) *strm, uint8_t *dst, const uint8_t *src, uint32_t copy) { struct inflate_state *state = (struct inflate_state *)strm->state; if (state->flags) strm->adler = state->check = FUNCTABLE_CALL(crc32_copy)(state->check, dst, src, copy); else strm->adler = state->check = FUNCTABLE_CALL(adler32_copy)(state->check, dst, src, copy); } ``` The `_copy` variants compute the checksum while simultaneously copying data, avoiding a second pass over the output. After all blocks are decompressed, the stored checksum is read and compared: ```c case CHECK: NEEDBITS(32); if (ZSWAP32((unsigned long)hold) != state->check) { strm->msg = "incorrect data check"; state->mode = BAD; break; } state->mode = LENGTH; // gzip: also check ISIZE break; ``` --- ## Error Handling The inflate engine detects and reports several error conditions: | Error | Detection | Recovery | |---|---|---| | Invalid block type (11) | `TYPE` state | `state->mode = BAD` | | Invalid stored block length | `STORED` state: `LEN != ~NLEN` | `BAD` | | Invalid code length repeat | `CODELENS` state | `BAD` | | Invalid literal/length code | Decoding | `BAD` | | Invalid distance code | Decoding | `BAD` | | Distance too far back | `MATCH` state | `BAD` (or zero-fill if `INFLATE_ALLOW_INVALID_DIST`) | | Checksum mismatch | `CHECK` state | `Z_DATA_ERROR` | | Header version mismatch | `HEAD` state | `Z_STREAM_ERROR` | The `inflateSync()` function searches for a sync point (four bytes `00 00 FF FF`) to recover from data corruption. `inflateMark()` reports the current decompression position for partial recovery. --- ## Inflate API Functions ### Core Functions ```c int inflateInit(z_stream *strm); int inflateInit2(z_stream *strm, int windowBits); int inflate(z_stream *strm, int flush); int inflateEnd(z_stream *strm); ``` ### Window Bits Parameter The `windowBits` parameter to `inflateInit2()` controls both the window size and the stream format: | windowBits | Format | Window Size | |---|---|---| | 8..15 | zlib (auto-detect dictionary) | 2^windowBits | | -8..-15 | Raw deflate (no wrapper) | 2^|windowBits| | | 24..31 (8..15 + 16) | gzip only | 2^(windowBits-16) | | 40..47 (8..15 + 32) | Auto-detect zlib or gzip | 2^(windowBits-32) | ### Reset Functions ```c int inflateReset(z_stream *strm); // Full reset int inflateResetKeep(z_stream *strm); // Reset but keep window int inflateReset2(z_stream *strm, int windowBits); // Reset with new windowBits ``` ### Dictionary Support ```c int inflateSetDictionary(z_stream *strm, const unsigned char *dictionary, unsigned dictLength); int inflateGetDictionary(z_stream *strm, unsigned char *dictionary, unsigned *dictLength); ``` When the zlib header indicates a preset dictionary (`FDICT` flag), `inflate()` returns `Z_NEED_DICT`. The application must then call `inflateSetDictionary()` with the correct dictionary before continuing. ### Sync and Recovery ```c int inflateSync(z_stream *strm); // Search for sync point long inflateMark(z_stream *strm); // Report progress int inflatePrime(z_stream *strm, int bits, int value); // Prime bit buffer ``` ### Header Access ```c int inflateGetHeader(z_stream *strm, gz_header *head); // Get gzip header info ``` ### Copy ```c int inflateCopy(z_stream *dest, z_stream *source); // Deep copy ``` --- ## Memory Allocation Inflate uses a single-allocation strategy via `alloc_inflate()`: ```c inflate_allocs* alloc_inflate(PREFIX3(stream) *strm) { int window_size = INFLATE_ADJUST_WINDOW_SIZE((1 << MAX_WBITS) + 64); int state_size = sizeof(inflate_state); int alloc_size = sizeof(inflate_allocs); // Calculate positions with alignment padding int window_pos = PAD_WINDOW(0); int state_pos = PAD_64(window_pos + window_size); int alloc_pos = PAD_16(state_pos + state_size); int total_size = PAD_64(alloc_pos + alloc_size + (WINDOW_PAD_SIZE - 1)); char *buf = strm->zalloc(strm->opaque, 1, total_size); // Partition buf into window, state, alloc_bufs return alloc_bufs; } ``` A single `zfree()` call releases everything: ```c void free_inflate(PREFIX3(stream) *strm) { inflate_allocs *alloc_bufs = state->alloc_bufs; alloc_bufs->zfree(strm->opaque, alloc_bufs->buf_start); strm->state = NULL; } ``` --- ## `infback.c` — Callback-Based Inflate An alternative inflate interface where the application provides input and output callbacks: ```c int inflateBack(z_stream *strm, in_func in, void *in_desc, out_func out, void *out_desc); ``` The caller provides: - `in(in_desc, &buf)` — Returns available input and sets `buf` - `out(out_desc, buf, len)` — Consumes `len` bytes of output from `buf` This avoids buffer management complexity for applications that can provide data on demand.