diff options
Diffstat (limited to 'docs/handbook/neozip/inflate-engine.md')
| -rw-r--r-- | docs/handbook/neozip/inflate-engine.md | 665 |
1 files changed, 665 insertions, 0 deletions
diff --git a/docs/handbook/neozip/inflate-engine.md b/docs/handbook/neozip/inflate-engine.md new file mode 100644 index 0000000000..6467cff487 --- /dev/null +++ b/docs/handbook/neozip/inflate-engine.md @@ -0,0 +1,665 @@ +# Inflate Engine + +## Overview + +The inflate engine decompresses DEFLATE-format data (RFC 1951) wrapped in +either a zlib container (RFC 1950), a gzip container (RFC 1952), or with no +wrapper (raw deflate). It is implemented as a state machine in `inflate.c`, +with a hot inner loop in `inffast_tpl.h` and table-building in `inftrees.c`. + +The engine processes input byte-by-byte through a 64-bit bit accumulator +(`hold`), transitioning between states as it parses headers, decodes Huffman +codes, copies back-referenced data, and verifies integrity checksums. + +--- + +## State Machine + +The `inflate_mode` enum defines all possible states: + +```c +typedef enum { + HEAD = 16180, // Waiting for magic header + FLAGS, // Waiting for method and flags (gzip) + TIME, // Waiting for modification time (gzip) + OS, // Waiting for extra flags and OS (gzip) + EXLEN, // Waiting for extra length (gzip) + EXTRA, // Waiting for extra bytes (gzip) + NAME, // Waiting for end of file name (gzip) + COMMENT, // Waiting for end of comment (gzip) + HCRC, // Waiting for header CRC (gzip) + DICTID, // Waiting for dictionary check value + DICT, // Waiting for inflateSetDictionary() call + TYPE, // Waiting for type bits (including last-flag bit) + TYPEDO, // Same as TYPE but skip check to exit on new block + STORED, // Waiting for stored size (length and complement) + COPY_, // Waiting for input or output to copy stored block (first time) + COPY, // Waiting for input or output to copy stored block + TABLE, // Waiting for dynamic block table lengths + LENLENS, // Waiting for code length code lengths + CODELENS, // Waiting for length/lit and distance code lengths + LEN_, // Waiting for length/lit/eob code (first time) + LEN, // Waiting for length/lit/eob code + LENEXT, // Waiting for length extra bits + DIST, // Waiting for distance code + DISTEXT, // Waiting for distance extra bits + MATCH, // Waiting for output space to copy string + LIT, // Waiting for output space to write literal + CHECK, // Waiting for 32-bit check value + LENGTH, // Waiting for 32-bit length (gzip) + DONE, // Finished check, remain here until reset + BAD, // Got a data error, remain here until reset + SYNC // Looking for synchronization bytes +} inflate_mode; +``` + +The starting value `HEAD = 16180` (a non-zero, distinctive value) helps catch +uninitialised state errors. + +--- + +## State Transitions + +### Header Processing + +``` +HEAD ─┬─ (gzip header detected) ──▶ FLAGS ─▶ TIME ─▶ OS ─▶ EXLEN ─▶ EXTRA + │ ─▶ NAME ─▶ COMMENT ─▶ HCRC ─▶ TYPE + │ + ├─ (zlib header detected) ──▶ DICTID ─▶ DICT ─▶ TYPE + │ or + │ ──▶ TYPE (no dictionary) + │ + └─ (raw deflate) ──────────▶ TYPEDO +``` + +### Block Processing + +``` +TYPE ──▶ TYPEDO ─┬─ (stored block) ──▶ STORED ─▶ COPY_ ─▶ COPY ──▶ TYPE + │ + ├─ (dynamic block) ──▶ TABLE ─▶ LENLENS ─▶ CODELENS ─▶ LEN_ + │ + ├─ (fixed block) ──▶ LEN_ + │ + └─ (last block) ──▶ CHECK +``` + +### Data Decoding + +``` +LEN_ ──▶ LEN ─┬─ (literal) ──▶ LIT ──▶ LEN + │ + ├─ (length code) ──▶ LENEXT ──▶ DIST ──▶ DISTEXT ──▶ MATCH ──▶ LEN + │ + └─ (end-of-block) ──▶ TYPE (or CHECK if last block) +``` + +### Trailer Processing + +``` +CHECK ──▶ LENGTH (gzip only) ──▶ DONE +``` + +--- + +## The `inflate_state` Structure + +```c +struct ALIGNED_(64) inflate_state { + // Stream back-pointer + PREFIX3(stream) *strm; + + // State machine + inflate_mode mode; // Current state + int last; // true if processing last block + int wrap; // bit 0: zlib, bit 1: gzip, bit 2: validate check + int havedict; // Dictionary provided? + int flags; // gzip header flags (-1 = no header yet, 0 = zlib) + + // Integrity + unsigned was; // Initial match length for inflateMark + unsigned long check; // Running checksum (Adler-32 or CRC-32) + unsigned long total; // Running output byte count + PREFIX(gz_headerp) head;// Where to save gzip header info + + // Sliding window + unsigned wbits; // log2(requested window size) + uint32_t wsize; // Window size (or 0 if not allocated) + uint32_t wbufsize; // Real allocated window size including padding + uint32_t whave; // Valid bytes in window + uint32_t wnext; // Window write index + unsigned char *window; // Sliding window buffer + + // Bit accumulator + uint64_t hold; // 64-bit input bit accumulator + unsigned bits; // Bits currently held + + // Code tables + unsigned lenbits; // Root bits for length code table + code const *lencode; // Length/literal code table pointer + code const *distcode; // Distance code table pointer + unsigned distbits; // Root bits for distance code table + + // Current match state + uint32_t length; // Literal value or copy length + unsigned offset; // Copy distance + unsigned extra; // Extra bits to read + + // Dynamic table building + unsigned ncode; // Code length code lengths count + unsigned nlen; // Length code count + unsigned ndist; // Distance code count + uint32_t have; // Code lengths decoded so far + code *next; // Next free slot in codes[] + + // Working storage + uint16_t lens[320]; // Code lengths (max 286 + 30 + padding) + uint16_t work[288]; // Work area for inflate_table + code codes[ENOUGH]; // Code tables (ENOUGH = 1924 entries) + + inflate_allocs *alloc_bufs; +}; +``` + +### Key Dimensions + +| Constant | Value | Purpose | +|---|---|---| +| `ENOUGH` | 1924 | Maximum total code table entries | +| `ENOUGH_LENS` | 1332 | Maximum literal/length table entries | +| `ENOUGH_DISTS` | 592 | Maximum distance table entries | +| `MAX_WBITS` | 15 | Maximum window bits (32KB window) | +| `MIN_WBITS` | 8 | Minimum window bits (256-byte window) | + +--- + +## Bit Accumulator + +The inflate engine uses a 64-bit bit buffer for efficient bit extraction: + +```c +uint64_t hold; // Accumulated bits (LSB first) +unsigned bits; // Number of valid bits in hold +``` + +Bits are read from `hold` from the least significant end: + +```c +// Load bits from input +#define PULLBYTE() do { \ + hold += (uint64_t)(*next++) << bits; \ + bits += 8; \ +} while (0) + +// Ensure at least n bits are available +#define NEEDBITS(n) do { \ + while (bits < (unsigned)(n)) \ + PULLBYTE(); \ +} while (0) + +// Read n bits from hold +#define BITS(n) ((unsigned)hold & ((1U << (n)) - 1)) + +// Drop n bits from hold +#define DROPBITS(n) do { \ + hold >>= (n); \ + bits -= (unsigned)(n); \ +} while (0) +``` + +The 64-bit width allows accumulating up to 8 bytes before overflow, reducing +the frequency of input reads. + +--- + +## Huffman Code Decoding + +### The `code` Structure + +Each entry in a decoding table is a `code` structure: + +```c +typedef struct { + unsigned char bits; // Bits to consume from input + unsigned char op; // Operation type + extra bits + uint16_t val; // Output value or table offset +} code; +``` + +The `op` field encodes the meaning: + +| `op` Value | Meaning | +|---|---| +| `0x00` | Literal: `val` is the literal byte | +| `0x0t` (t ≠ 0) | Table link: `t` is the number of additional index bits | +| `0x1e` | Length or distance: `e` extra bits, `val` is the base value | +| `0x60` | End of block | +| `0x40` | Invalid code | + +### Table Building via `zng_inflate_table()` + +`inftrees.c` builds two-level Huffman decoding tables: + +```c +int zng_inflate_table(codetype type, uint16_t *lens, unsigned codes, + code **table, unsigned *bits, uint16_t *work); +``` + +Parameters: +- `type` — `CODES` (code lengths), `LENS` (literal/length), or `DISTS` (distances) +- `lens` — Array of code lengths for each symbol +- `codes` — Number of symbols +- `table` — Output: pointer to the root table +- `bits` — Input: requested root table bits; Output: actual root table bits +- `work` — Scratch space + +The algorithm: +1. Count code lengths → `count[]` +2. Compute offsets for sorting → `offs[]` +3. Sort symbols by code length → `work[]` +4. Fill the primary table (root bits = `*bits`) +5. For codes longer than root bits, create sub-tables + +Root table sizes: +- Literal/length: 10 bits (1024 entries) +- Distance: 9 bits (512 entries) + +Sub-tables handle codes longer than the root bits, linked from the primary +table via the `op` field. + +### Fixed Huffman Tables + +For fixed-code blocks (BTYPE=01), predefined tables are stored in +`inffixed_tbl.h`. These are computed once and reused. + +The `fixedtables()` function in `inflate.c` sets up the fixed tables: +```c +state->lencode = lenfix; // Predefined literal/length table +state->lenbits = 9; // Root bits for fixed literal/length codes +state->distcode = distfix; // Predefined distance table +state->distbits = 5; // Root bits for fixed distance codes +``` + +--- + +## The Inflate Loop + +The main `inflate()` function is a single large `for` loop with a `switch` +on `state->mode`: + +```c +int32_t Z_EXPORT PREFIX(inflate)(PREFIX3(stream) *strm, int32_t flush) { + // Input/output tracking + unsigned char *put = strm->next_out; + unsigned char *next = strm->next_in; + uint64_t hold = state->hold; + unsigned bits = state->bits; + + for (;;) switch (state->mode) { + case HEAD: + // Detect zlib/gzip/raw header + NEEDBITS(16); + if (state->wrap == 0) { + // Raw deflate + state->mode = TYPEDO; + break; + } + if ((hold & 0xff) == 0x1f && ((hold >> 8) & 0xff) == 0x8b) { + // gzip header + state->mode = FLAGS; + } else { + // zlib header (CMF + FLG) + state->mode = DICTID; // or TYPE + } + break; + + case TYPE: + // Read block header + NEEDBITS(3); + state->last = BITS(1); + DROPBITS(1); + switch (BITS(2)) { + case 0: state->mode = STORED; break; // Stored block + case 1: fixedtables(state); // Fixed Huffman + state->mode = LEN_; break; + case 2: state->mode = TABLE; break; // Dynamic Huffman + case 3: state->mode = BAD; break; // Reserved (error) + } + DROPBITS(2); + break; + + case LEN: + // Decode literal/length code + here = state->lencode[BITS(state->lenbits)]; + if (here.op == 0) { + // Literal + *put++ = (unsigned char)(here.val); + DROPBITS(here.bits); + state->mode = LEN; + } else if (here.op & 16) { + // Length code + state->length = here.val; + state->extra = here.op & 15; + state->mode = LENEXT; + } else if (here.op == 96) { + // End of block + state->mode = TYPE; + } + break; + + case DIST: + // Decode distance code + here = state->distcode[BITS(state->distbits)]; + // ... similar pattern ... + break; + + case MATCH: + // Copy from window + // Copy state->length bytes from position (out - state->offset) + // Uses chunkmemset_safe() for SIMD-accelerated copying + break; + + case CHECK: + // Verify checksum + // Compare computed check with stored value + break; + + case DONE: + ret = Z_STREAM_END; + goto inf_leave; + } + +inf_leave: + // Save state + state->hold = hold; + state->bits = bits; + return ret; +} +``` + +--- + +## Fast Inflate Path + +When sufficient input and output are available, `inflate()` calls the fast +inner loop: + +```c +case LEN_: + // Check if we can use the fast path + if (have >= 6 && left >= 258) { + FUNCTABLE_CALL(inflate_fast)(strm, out); + // Reload local state + state->mode = LEN; + break; + } + state->mode = LEN; + break; +``` + +`inflate_fast()` (defined via `inffast_tpl.h`) processes codes without +returning to the main switch loop: + +1. Pre-load bits into the 64-bit accumulator +2. Loop while input ≥ 6 bytes and output ≥ 258 bytes: + - Decode literal/length from `lencode` + - If literal: output directly + - If length: decode distance from `distcode`, copy from window + - If EOB: exit loop +3. Handle sub-table traversal for codes longer than root bits + +SIMD variants (SSE2, AVX2, AVX-512, NEON) accelerate the copy operation +via `chunkmemset_safe()`, which can copy overlapping regions efficiently +using vector loads/stores. + +--- + +## Window Management + +The inflate sliding window stores recent output for back-reference copying: + +```c +unsigned char *window; // Circular buffer +uint32_t wsize; // Window size +uint32_t whave; // Valid bytes in window +uint32_t wnext; // Write position (circular) +``` + +### `updatewindow()` + +Called after each inflate pass to copy decompressed output into the window: + +```c +static void updatewindow(PREFIX3(stream) *strm, const uint8_t *end, + uint32_t len, int32_t cksum) { + struct inflate_state *state = (struct inflate_state *)strm->state; + + // First time: set up window size + if (state->wsize == 0) { + state->wsize = 1U << state->wbits; + state->wnext = 0; + state->whave = 0; + } + + // Copy output to window (handles wraparound) + if (len >= state->wsize) { + // Copy last wsize bytes + memcpy(state->window, end - state->wsize, state->wsize); + state->wnext = 0; + state->whave = state->wsize; + } else { + // Copy len bytes, wrapping around if necessary + uint32_t dist = MIN(state->wsize - state->wnext, len); + memcpy(state->window + state->wnext, end - len, dist); + len -= dist; + if (len) { + memcpy(state->window, end - len, len); + state->wnext = len; + state->whave = state->wsize; + } else { + state->wnext += dist; + if (state->wnext == state->wsize) state->wnext = 0; + if (state->whave < state->wsize) state->whave += dist; + } + } +} +``` + +### Back-Reference Copying + +When a length/distance pair is decoded, the engine copies `length` bytes +from position `(output_position - distance)`: + +```c +case MATCH: + // Copy from window + from = put - state->offset; + if (state->offset > (put - state->window)) { + // Source is in the circular window buffer + // Handle wrap-around via window[] + } + // Copy length bytes to put, potentially overlapping + FUNCTABLE_CALL(chunkmemset_safe)(put, from, state->length, left); +``` + +The `chunkmemset_safe()` function handles the case where `distance < length` +(overlapping copy), which occurs in runs of repeated patterns. SIMD +implementations can vectorise even overlapping copies by replicating the +pattern. + +--- + +## Checksum Handling + +During decompression, a running checksum is computed: + +- **zlib format**: Adler-32 of the uncompressed data +- **gzip format**: CRC-32 of the uncompressed data + +The selection is based on `state->flags`: +```c +static inline void inf_chksum_cpy(PREFIX3(stream) *strm, uint8_t *dst, + const uint8_t *src, uint32_t copy) { + struct inflate_state *state = (struct inflate_state *)strm->state; + if (state->flags) + strm->adler = state->check = FUNCTABLE_CALL(crc32_copy)(state->check, dst, src, copy); + else + strm->adler = state->check = FUNCTABLE_CALL(adler32_copy)(state->check, dst, src, copy); +} +``` + +The `_copy` variants compute the checksum while simultaneously copying data, +avoiding a second pass over the output. + +After all blocks are decompressed, the stored checksum is read and compared: + +```c +case CHECK: + NEEDBITS(32); + if (ZSWAP32((unsigned long)hold) != state->check) { + strm->msg = "incorrect data check"; + state->mode = BAD; + break; + } + state->mode = LENGTH; // gzip: also check ISIZE + break; +``` + +--- + +## Error Handling + +The inflate engine detects and reports several error conditions: + +| Error | Detection | Recovery | +|---|---|---| +| Invalid block type (11) | `TYPE` state | `state->mode = BAD` | +| Invalid stored block length | `STORED` state: `LEN != ~NLEN` | `BAD` | +| Invalid code length repeat | `CODELENS` state | `BAD` | +| Invalid literal/length code | Decoding | `BAD` | +| Invalid distance code | Decoding | `BAD` | +| Distance too far back | `MATCH` state | `BAD` (or zero-fill if `INFLATE_ALLOW_INVALID_DIST`) | +| Checksum mismatch | `CHECK` state | `Z_DATA_ERROR` | +| Header version mismatch | `HEAD` state | `Z_STREAM_ERROR` | + +The `inflateSync()` function searches for a sync point (four bytes `00 00 FF FF`) +to recover from data corruption. `inflateMark()` reports the current +decompression position for partial recovery. + +--- + +## Inflate API Functions + +### Core Functions + +```c +int inflateInit(z_stream *strm); +int inflateInit2(z_stream *strm, int windowBits); +int inflate(z_stream *strm, int flush); +int inflateEnd(z_stream *strm); +``` + +### Window Bits Parameter + +The `windowBits` parameter to `inflateInit2()` controls both the window size +and the stream format: + +| windowBits | Format | Window Size | +|---|---|---| +| 8..15 | zlib (auto-detect dictionary) | 2^windowBits | +| -8..-15 | Raw deflate (no wrapper) | 2^|windowBits| | +| 24..31 (8..15 + 16) | gzip only | 2^(windowBits-16) | +| 40..47 (8..15 + 32) | Auto-detect zlib or gzip | 2^(windowBits-32) | + +### Reset Functions + +```c +int inflateReset(z_stream *strm); // Full reset +int inflateResetKeep(z_stream *strm); // Reset but keep window +int inflateReset2(z_stream *strm, int windowBits); // Reset with new windowBits +``` + +### Dictionary Support + +```c +int inflateSetDictionary(z_stream *strm, const unsigned char *dictionary, unsigned dictLength); +int inflateGetDictionary(z_stream *strm, unsigned char *dictionary, unsigned *dictLength); +``` + +When the zlib header indicates a preset dictionary (`FDICT` flag), `inflate()` +returns `Z_NEED_DICT`. The application must then call `inflateSetDictionary()` +with the correct dictionary before continuing. + +### Sync and Recovery + +```c +int inflateSync(z_stream *strm); // Search for sync point +long inflateMark(z_stream *strm); // Report progress +int inflatePrime(z_stream *strm, int bits, int value); // Prime bit buffer +``` + +### Header Access + +```c +int inflateGetHeader(z_stream *strm, gz_header *head); // Get gzip header info +``` + +### Copy + +```c +int inflateCopy(z_stream *dest, z_stream *source); // Deep copy +``` + +--- + +## Memory Allocation + +Inflate uses a single-allocation strategy via `alloc_inflate()`: + +```c +inflate_allocs* alloc_inflate(PREFIX3(stream) *strm) { + int window_size = INFLATE_ADJUST_WINDOW_SIZE((1 << MAX_WBITS) + 64); + int state_size = sizeof(inflate_state); + int alloc_size = sizeof(inflate_allocs); + + // Calculate positions with alignment padding + int window_pos = PAD_WINDOW(0); + int state_pos = PAD_64(window_pos + window_size); + int alloc_pos = PAD_16(state_pos + state_size); + int total_size = PAD_64(alloc_pos + alloc_size + (WINDOW_PAD_SIZE - 1)); + + char *buf = strm->zalloc(strm->opaque, 1, total_size); + // Partition buf into window, state, alloc_bufs + return alloc_bufs; +} +``` + +A single `zfree()` call releases everything: +```c +void free_inflate(PREFIX3(stream) *strm) { + inflate_allocs *alloc_bufs = state->alloc_bufs; + alloc_bufs->zfree(strm->opaque, alloc_bufs->buf_start); + strm->state = NULL; +} +``` + +--- + +## `infback.c` — Callback-Based Inflate + +An alternative inflate interface where the application provides input and +output callbacks: + +```c +int inflateBack(z_stream *strm, + in_func in, void *in_desc, + out_func out, void *out_desc); +``` + +The caller provides: +- `in(in_desc, &buf)` — Returns available input and sets `buf` +- `out(out_desc, buf, len)` — Consumes `len` bytes of output from `buf` + +This avoids buffer management complexity for applications that can provide +data on demand. |
