summaryrefslogtreecommitdiff
path: root/docs/handbook/neozip/inflate-engine.md
blob: 6467cff4877c892213bbf52eec0c5404520ac15f (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
# Inflate Engine

## Overview

The inflate engine decompresses DEFLATE-format data (RFC 1951) wrapped in
either a zlib container (RFC 1950), a gzip container (RFC 1952), or with no
wrapper (raw deflate). It is implemented as a state machine in `inflate.c`,
with a hot inner loop in `inffast_tpl.h` and table-building in `inftrees.c`.

The engine processes input byte-by-byte through a 64-bit bit accumulator
(`hold`), transitioning between states as it parses headers, decodes Huffman
codes, copies back-referenced data, and verifies integrity checksums.

---

## State Machine

The `inflate_mode` enum defines all possible states:

```c
typedef enum {
    HEAD = 16180,   // Waiting for magic header
    FLAGS,          // Waiting for method and flags (gzip)
    TIME,           // Waiting for modification time (gzip)
    OS,             // Waiting for extra flags and OS (gzip)
    EXLEN,          // Waiting for extra length (gzip)
    EXTRA,          // Waiting for extra bytes (gzip)
    NAME,           // Waiting for end of file name (gzip)
    COMMENT,        // Waiting for end of comment (gzip)
    HCRC,           // Waiting for header CRC (gzip)
    DICTID,         // Waiting for dictionary check value
    DICT,           // Waiting for inflateSetDictionary() call
    TYPE,           // Waiting for type bits (including last-flag bit)
    TYPEDO,         // Same as TYPE but skip check to exit on new block
    STORED,         // Waiting for stored size (length and complement)
    COPY_,          // Waiting for input or output to copy stored block (first time)
    COPY,           // Waiting for input or output to copy stored block
    TABLE,          // Waiting for dynamic block table lengths
    LENLENS,        // Waiting for code length code lengths
    CODELENS,       // Waiting for length/lit and distance code lengths
    LEN_,           // Waiting for length/lit/eob code (first time)
    LEN,            // Waiting for length/lit/eob code
    LENEXT,         // Waiting for length extra bits
    DIST,           // Waiting for distance code
    DISTEXT,        // Waiting for distance extra bits
    MATCH,          // Waiting for output space to copy string
    LIT,            // Waiting for output space to write literal
    CHECK,          // Waiting for 32-bit check value
    LENGTH,         // Waiting for 32-bit length (gzip)
    DONE,           // Finished check, remain here until reset
    BAD,            // Got a data error, remain here until reset
    SYNC            // Looking for synchronization bytes
} inflate_mode;
```

The starting value `HEAD = 16180` (a non-zero, distinctive value) helps catch
uninitialised state errors.

---

## State Transitions

### Header Processing

```
HEAD ─┬─ (gzip header detected) ──▶ FLAGS ─▶ TIME ─▶ OS ─▶ EXLEN ─▶ EXTRA
      │                              ─▶ NAME ─▶ COMMENT ─▶ HCRC ─▶ TYPE
      │
      ├─ (zlib header detected) ──▶ DICTID ─▶ DICT ─▶ TYPE
      │                             or
      │                            ──▶ TYPE (no dictionary)
      │
      └─ (raw deflate) ──────────▶ TYPEDO
```

### Block Processing

```
TYPE ──▶ TYPEDO ─┬─ (stored block)  ──▶ STORED ─▶ COPY_ ─▶ COPY ──▶ TYPE
                 │
                 ├─ (dynamic block) ──▶ TABLE ─▶ LENLENS ─▶ CODELENS ─▶ LEN_
                 │
                 ├─ (fixed block)   ──▶ LEN_
                 │
                 └─ (last block)    ──▶ CHECK
```

### Data Decoding

```
LEN_ ──▶ LEN ─┬─ (literal)        ──▶ LIT ──▶ LEN
               │
               ├─ (length code)    ──▶ LENEXT ──▶ DIST ──▶ DISTEXT ──▶ MATCH ──▶ LEN
               │
               └─ (end-of-block)   ──▶ TYPE (or CHECK if last block)
```

### Trailer Processing

```
CHECK ──▶ LENGTH (gzip only) ──▶ DONE
```

---

## The `inflate_state` Structure

```c
struct ALIGNED_(64) inflate_state {
    // Stream back-pointer
    PREFIX3(stream) *strm;

    // State machine
    inflate_mode mode;      // Current state
    int last;               // true if processing last block
    int wrap;               // bit 0: zlib, bit 1: gzip, bit 2: validate check
    int havedict;           // Dictionary provided?
    int flags;              // gzip header flags (-1 = no header yet, 0 = zlib)

    // Integrity
    unsigned was;           // Initial match length for inflateMark
    unsigned long check;    // Running checksum (Adler-32 or CRC-32)
    unsigned long total;    // Running output byte count
    PREFIX(gz_headerp) head;// Where to save gzip header info

    // Sliding window
    unsigned wbits;         // log2(requested window size)
    uint32_t wsize;         // Window size (or 0 if not allocated)
    uint32_t wbufsize;      // Real allocated window size including padding
    uint32_t whave;         // Valid bytes in window
    uint32_t wnext;         // Window write index
    unsigned char *window;  // Sliding window buffer

    // Bit accumulator
    uint64_t hold;          // 64-bit input bit accumulator
    unsigned bits;          // Bits currently held

    // Code tables
    unsigned lenbits;       // Root bits for length code table
    code const *lencode;    // Length/literal code table pointer
    code const *distcode;   // Distance code table pointer
    unsigned distbits;      // Root bits for distance code table

    // Current match state
    uint32_t length;        // Literal value or copy length
    unsigned offset;        // Copy distance
    unsigned extra;         // Extra bits to read

    // Dynamic table building
    unsigned ncode;         // Code length code lengths count
    unsigned nlen;          // Length code count
    unsigned ndist;         // Distance code count
    uint32_t have;          // Code lengths decoded so far
    code *next;             // Next free slot in codes[]

    // Working storage
    uint16_t lens[320];     // Code lengths (max 286 + 30 + padding)
    uint16_t work[288];     // Work area for inflate_table
    code codes[ENOUGH];     // Code tables (ENOUGH = 1924 entries)

    inflate_allocs *alloc_bufs;
};
```

### Key Dimensions

| Constant | Value | Purpose |
|---|---|---|
| `ENOUGH` | 1924 | Maximum total code table entries |
| `ENOUGH_LENS` | 1332 | Maximum literal/length table entries |
| `ENOUGH_DISTS` | 592 | Maximum distance table entries |
| `MAX_WBITS` | 15 | Maximum window bits (32KB window) |
| `MIN_WBITS` | 8 | Minimum window bits (256-byte window) |

---

## Bit Accumulator

The inflate engine uses a 64-bit bit buffer for efficient bit extraction:

```c
uint64_t hold;    // Accumulated bits (LSB first)
unsigned bits;    // Number of valid bits in hold
```

Bits are read from `hold` from the least significant end:

```c
// Load bits from input
#define PULLBYTE() do { \
    hold += (uint64_t)(*next++) << bits; \
    bits += 8; \
} while (0)

// Ensure at least n bits are available
#define NEEDBITS(n) do { \
    while (bits < (unsigned)(n)) \
        PULLBYTE(); \
} while (0)

// Read n bits from hold
#define BITS(n) ((unsigned)hold & ((1U << (n)) - 1))

// Drop n bits from hold
#define DROPBITS(n) do { \
    hold >>= (n); \
    bits -= (unsigned)(n); \
} while (0)
```

The 64-bit width allows accumulating up to 8 bytes before overflow, reducing
the frequency of input reads.

---

## Huffman Code Decoding

### The `code` Structure

Each entry in a decoding table is a `code` structure:

```c
typedef struct {
    unsigned char bits;   // Bits to consume from input
    unsigned char op;     // Operation type + extra bits
    uint16_t val;         // Output value or table offset
} code;
```

The `op` field encodes the meaning:

| `op` Value | Meaning |
|---|---|
| `0x00` | Literal: `val` is the literal byte |
| `0x0t` (t ≠ 0) | Table link: `t` is the number of additional index bits |
| `0x1e` | Length or distance: `e` extra bits, `val` is the base value |
| `0x60` | End of block |
| `0x40` | Invalid code |

### Table Building via `zng_inflate_table()`

`inftrees.c` builds two-level Huffman decoding tables:

```c
int zng_inflate_table(codetype type, uint16_t *lens, unsigned codes,
                      code **table, unsigned *bits, uint16_t *work);
```

Parameters:
- `type` — `CODES` (code lengths), `LENS` (literal/length), or `DISTS` (distances)
- `lens` — Array of code lengths for each symbol
- `codes` — Number of symbols
- `table` — Output: pointer to the root table
- `bits` — Input: requested root table bits; Output: actual root table bits
- `work` — Scratch space

The algorithm:
1. Count code lengths → `count[]`
2. Compute offsets for sorting → `offs[]`
3. Sort symbols by code length → `work[]`
4. Fill the primary table (root bits = `*bits`)
5. For codes longer than root bits, create sub-tables

Root table sizes:
- Literal/length: 10 bits (1024 entries)
- Distance: 9 bits (512 entries)

Sub-tables handle codes longer than the root bits, linked from the primary
table via the `op` field.

### Fixed Huffman Tables

For fixed-code blocks (BTYPE=01), predefined tables are stored in
`inffixed_tbl.h`. These are computed once and reused.

The `fixedtables()` function in `inflate.c` sets up the fixed tables:
```c
state->lencode = lenfix;     // Predefined literal/length table
state->lenbits = 9;          // Root bits for fixed literal/length codes
state->distcode = distfix;   // Predefined distance table
state->distbits = 5;         // Root bits for fixed distance codes
```

---

## The Inflate Loop

The main `inflate()` function is a single large `for` loop with a `switch`
on `state->mode`:

```c
int32_t Z_EXPORT PREFIX(inflate)(PREFIX3(stream) *strm, int32_t flush) {
    // Input/output tracking
    unsigned char *put = strm->next_out;
    unsigned char *next = strm->next_in;
    uint64_t hold = state->hold;
    unsigned bits = state->bits;

    for (;;) switch (state->mode) {
    case HEAD:
        // Detect zlib/gzip/raw header
        NEEDBITS(16);
        if (state->wrap == 0) {
            // Raw deflate
            state->mode = TYPEDO;
            break;
        }
        if ((hold & 0xff) == 0x1f && ((hold >> 8) & 0xff) == 0x8b) {
            // gzip header
            state->mode = FLAGS;
        } else {
            // zlib header (CMF + FLG)
            state->mode = DICTID; // or TYPE
        }
        break;

    case TYPE:
        // Read block header
        NEEDBITS(3);
        state->last = BITS(1);
        DROPBITS(1);
        switch (BITS(2)) {
        case 0: state->mode = STORED; break;  // Stored block
        case 1: fixedtables(state);            // Fixed Huffman
                state->mode = LEN_; break;
        case 2: state->mode = TABLE; break;    // Dynamic Huffman
        case 3: state->mode = BAD; break;      // Reserved (error)
        }
        DROPBITS(2);
        break;

    case LEN:
        // Decode literal/length code
        here = state->lencode[BITS(state->lenbits)];
        if (here.op == 0) {
            // Literal
            *put++ = (unsigned char)(here.val);
            DROPBITS(here.bits);
            state->mode = LEN;
        } else if (here.op & 16) {
            // Length code
            state->length = here.val;
            state->extra = here.op & 15;
            state->mode = LENEXT;
        } else if (here.op == 96) {
            // End of block
            state->mode = TYPE;
        }
        break;

    case DIST:
        // Decode distance code
        here = state->distcode[BITS(state->distbits)];
        // ... similar pattern ...
        break;

    case MATCH:
        // Copy from window
        // Copy state->length bytes from position (out - state->offset)
        // Uses chunkmemset_safe() for SIMD-accelerated copying
        break;

    case CHECK:
        // Verify checksum
        // Compare computed check with stored value
        break;

    case DONE:
        ret = Z_STREAM_END;
        goto inf_leave;
    }

inf_leave:
    // Save state
    state->hold = hold;
    state->bits = bits;
    return ret;
}
```

---

## Fast Inflate Path

When sufficient input and output are available, `inflate()` calls the fast
inner loop:

```c
case LEN_:
    // Check if we can use the fast path
    if (have >= 6 && left >= 258) {
        FUNCTABLE_CALL(inflate_fast)(strm, out);
        // Reload local state
        state->mode = LEN;
        break;
    }
    state->mode = LEN;
    break;
```

`inflate_fast()` (defined via `inffast_tpl.h`) processes codes without
returning to the main switch loop:

1. Pre-load bits into the 64-bit accumulator
2. Loop while input ≥ 6 bytes and output ≥ 258 bytes:
   - Decode literal/length from `lencode`
   - If literal: output directly
   - If length: decode distance from `distcode`, copy from window
   - If EOB: exit loop
3. Handle sub-table traversal for codes longer than root bits

SIMD variants (SSE2, AVX2, AVX-512, NEON) accelerate the copy operation
via `chunkmemset_safe()`, which can copy overlapping regions efficiently
using vector loads/stores.

---

## Window Management

The inflate sliding window stores recent output for back-reference copying:

```c
unsigned char *window;   // Circular buffer
uint32_t wsize;          // Window size
uint32_t whave;          // Valid bytes in window
uint32_t wnext;          // Write position (circular)
```

### `updatewindow()`

Called after each inflate pass to copy decompressed output into the window:

```c
static void updatewindow(PREFIX3(stream) *strm, const uint8_t *end,
                          uint32_t len, int32_t cksum) {
    struct inflate_state *state = (struct inflate_state *)strm->state;

    // First time: set up window size
    if (state->wsize == 0) {
        state->wsize = 1U << state->wbits;
        state->wnext = 0;
        state->whave = 0;
    }

    // Copy output to window (handles wraparound)
    if (len >= state->wsize) {
        // Copy last wsize bytes
        memcpy(state->window, end - state->wsize, state->wsize);
        state->wnext = 0;
        state->whave = state->wsize;
    } else {
        // Copy len bytes, wrapping around if necessary
        uint32_t dist = MIN(state->wsize - state->wnext, len);
        memcpy(state->window + state->wnext, end - len, dist);
        len -= dist;
        if (len) {
            memcpy(state->window, end - len, len);
            state->wnext = len;
            state->whave = state->wsize;
        } else {
            state->wnext += dist;
            if (state->wnext == state->wsize) state->wnext = 0;
            if (state->whave < state->wsize) state->whave += dist;
        }
    }
}
```

### Back-Reference Copying

When a length/distance pair is decoded, the engine copies `length` bytes
from position `(output_position - distance)`:

```c
case MATCH:
    // Copy from window
    from = put - state->offset;
    if (state->offset > (put - state->window)) {
        // Source is in the circular window buffer
        // Handle wrap-around via window[]
    }
    // Copy length bytes to put, potentially overlapping
    FUNCTABLE_CALL(chunkmemset_safe)(put, from, state->length, left);
```

The `chunkmemset_safe()` function handles the case where `distance < length`
(overlapping copy), which occurs in runs of repeated patterns. SIMD
implementations can vectorise even overlapping copies by replicating the
pattern.

---

## Checksum Handling

During decompression, a running checksum is computed:

- **zlib format**: Adler-32 of the uncompressed data
- **gzip format**: CRC-32 of the uncompressed data

The selection is based on `state->flags`:
```c
static inline void inf_chksum_cpy(PREFIX3(stream) *strm, uint8_t *dst,
                                   const uint8_t *src, uint32_t copy) {
    struct inflate_state *state = (struct inflate_state *)strm->state;
    if (state->flags)
        strm->adler = state->check = FUNCTABLE_CALL(crc32_copy)(state->check, dst, src, copy);
    else
        strm->adler = state->check = FUNCTABLE_CALL(adler32_copy)(state->check, dst, src, copy);
}
```

The `_copy` variants compute the checksum while simultaneously copying data,
avoiding a second pass over the output.

After all blocks are decompressed, the stored checksum is read and compared:

```c
case CHECK:
    NEEDBITS(32);
    if (ZSWAP32((unsigned long)hold) != state->check) {
        strm->msg = "incorrect data check";
        state->mode = BAD;
        break;
    }
    state->mode = LENGTH;  // gzip: also check ISIZE
    break;
```

---

## Error Handling

The inflate engine detects and reports several error conditions:

| Error | Detection | Recovery |
|---|---|---|
| Invalid block type (11) | `TYPE` state | `state->mode = BAD` |
| Invalid stored block length | `STORED` state: `LEN != ~NLEN` | `BAD` |
| Invalid code length repeat | `CODELENS` state | `BAD` |
| Invalid literal/length code | Decoding | `BAD` |
| Invalid distance code | Decoding | `BAD` |
| Distance too far back | `MATCH` state | `BAD` (or zero-fill if `INFLATE_ALLOW_INVALID_DIST`) |
| Checksum mismatch | `CHECK` state | `Z_DATA_ERROR` |
| Header version mismatch | `HEAD` state | `Z_STREAM_ERROR` |

The `inflateSync()` function searches for a sync point (four bytes `00 00 FF FF`)
to recover from data corruption. `inflateMark()` reports the current
decompression position for partial recovery.

---

## Inflate API Functions

### Core Functions

```c
int inflateInit(z_stream *strm);
int inflateInit2(z_stream *strm, int windowBits);
int inflate(z_stream *strm, int flush);
int inflateEnd(z_stream *strm);
```

### Window Bits Parameter

The `windowBits` parameter to `inflateInit2()` controls both the window size
and the stream format:

| windowBits | Format | Window Size |
|---|---|---|
| 8..15 | zlib (auto-detect dictionary) | 2^windowBits |
| -8..-15 | Raw deflate (no wrapper) | 2^|windowBits| |
| 24..31 (8..15 + 16) | gzip only | 2^(windowBits-16) |
| 40..47 (8..15 + 32) | Auto-detect zlib or gzip | 2^(windowBits-32) |

### Reset Functions

```c
int inflateReset(z_stream *strm);      // Full reset
int inflateResetKeep(z_stream *strm);  // Reset but keep window
int inflateReset2(z_stream *strm, int windowBits);  // Reset with new windowBits
```

### Dictionary Support

```c
int inflateSetDictionary(z_stream *strm, const unsigned char *dictionary, unsigned dictLength);
int inflateGetDictionary(z_stream *strm, unsigned char *dictionary, unsigned *dictLength);
```

When the zlib header indicates a preset dictionary (`FDICT` flag), `inflate()`
returns `Z_NEED_DICT`. The application must then call `inflateSetDictionary()`
with the correct dictionary before continuing.

### Sync and Recovery

```c
int inflateSync(z_stream *strm);       // Search for sync point
long inflateMark(z_stream *strm);      // Report progress
int inflatePrime(z_stream *strm, int bits, int value);  // Prime bit buffer
```

### Header Access

```c
int inflateGetHeader(z_stream *strm, gz_header *head);  // Get gzip header info
```

### Copy

```c
int inflateCopy(z_stream *dest, z_stream *source);  // Deep copy
```

---

## Memory Allocation

Inflate uses a single-allocation strategy via `alloc_inflate()`:

```c
inflate_allocs* alloc_inflate(PREFIX3(stream) *strm) {
    int window_size = INFLATE_ADJUST_WINDOW_SIZE((1 << MAX_WBITS) + 64);
    int state_size = sizeof(inflate_state);
    int alloc_size = sizeof(inflate_allocs);

    // Calculate positions with alignment padding
    int window_pos = PAD_WINDOW(0);
    int state_pos = PAD_64(window_pos + window_size);
    int alloc_pos = PAD_16(state_pos + state_size);
    int total_size = PAD_64(alloc_pos + alloc_size + (WINDOW_PAD_SIZE - 1));

    char *buf = strm->zalloc(strm->opaque, 1, total_size);
    // Partition buf into window, state, alloc_bufs
    return alloc_bufs;
}
```

A single `zfree()` call releases everything:
```c
void free_inflate(PREFIX3(stream) *strm) {
    inflate_allocs *alloc_bufs = state->alloc_bufs;
    alloc_bufs->zfree(strm->opaque, alloc_bufs->buf_start);
    strm->state = NULL;
}
```

---

## `infback.c` — Callback-Based Inflate

An alternative inflate interface where the application provides input and
output callbacks:

```c
int inflateBack(z_stream *strm,
                in_func in, void *in_desc,
                out_func out, void *out_desc);
```

The caller provides:
- `in(in_desc, &buf)` — Returns available input and sets `buf`
- `out(out_desc, buf, len)` — Consumes `len` bytes of output from `buf`

This avoids buffer management complexity for applications that can provide
data on demand.