# Gzip Support
## Overview
Neozip provides a complete gzip file I/O layer on top of the core
deflate/inflate engine. This layer is implemented in three source files,
with its internal definitions in a shared header:
- `gzlib.c` — Shared state management, file open/close, seeking
- `gzread.c` — Gzip file reading (decompression)
- `gzwrite.c` — Gzip file writing (compression)
- `gzguts.h` — Internal structures and constants
The gzip API is enabled by the `WITH_GZFILEOP` CMake option (ON by default).

---
## The `gz_state` Structure
From `gzguts.h`:
```c
typedef struct {
    // Identification
    PREFIX3(stream) strm;  // Inflate/deflate stream
    int mode;              // GZ_READ or GZ_WRITE
    int fd;                // File descriptor
    char *path;            // Path for error messages
    unsigned size;         // Allocated buffer size (0 until allocated)
    // Buffering
    unsigned want;         // Requested buffer size (default GZBUFSIZE)
    unsigned char *in;     // Input buffer (read mode)
    unsigned char *out;    // Output buffer
    int direct;            // 0=compressed, 1=passthrough (not gzip)
    // Position tracking
    z_off64_t start;       // File offset where the gzip data begins (rewind target)
    z_off64_t raw;         // Raw (compressed) file position
    z_off64_t pos;         // Uncompressed data position
    int eof;               // End of input file reached
    int past;              // Read past end of input
    // Error tracking
    int err;               // Error code
    char *msg;             // Error message (or NULL)
    // Read mode
    int how;               // 0=look for header, 1=copy, 2=decompress
    // Write mode
    int level;             // Compression level
    int strategy;          // Compression strategy
    int reset;             // true if deflateReset needed
    // Seeking
    z_off64_t skip;        // Bytes to skip during next read
    int seek;              // Seek request pending
} gz_state;
```
### Constants
```c
#define GZBUFSIZE 131072 // Default buffer size (128 KB)
#define GZ_READ 7247 // Sentinel for read mode
#define GZ_WRITE 31153 // Sentinel for write mode
#define GZ_APPEND 1 // Mode flag for append
```
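The sentinel values `GZ_READ` and `GZ_WRITE` are deliberately arbitrary
integers rather than small values such as 0 or 1, so a zero-filled,
uninitialised, or corrupted `gz_state` is unlikely to pass a mode check.
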
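
A minimal sketch of the kind of mode check these sentinels make reliable. `demo_state` and `gz_check_read()` are hypothetical stand-ins rather than neozip code; only the two constant values come from the table above. A zero-filled state, a write-mode state, and a `NULL` pointer are all rejected.

```c
#include <stddef.h>

// Constant values taken from the table above
#define GZ_READ  7247
#define GZ_WRITE 31153

// Hypothetical stand-in for gz_state: only the mode field matters here
typedef struct {
    int mode;
} demo_state;

// Returns 1 only for a live read-mode state. Zero-filled memory or a
// write-mode state fails the comparison against the sentinel.
static int gz_check_read(const demo_state *s) {
    return s != NULL && s->mode == GZ_READ;
}
```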
---
## File Open (`gzlib.c`)
### `gzopen()` / `gzdopen()`
```c
gzFile PREFIX(gzopen)(const char *path, const char *mode);
gzFile PREFIX(gzdopen)(int fd, const char *mode);
gzFile PREFIX(gzopen64)(const char *path, const char *mode);
```
The mode string supports:
- `r` — Read (decompress)
- `w` — Write (compress)
- `a` — Append (compress, append to existing file)
- `0-9` — Compression level
- `f` — `Z_FILTERED` strategy
- `h` — `Z_HUFFMAN_ONLY` strategy
- `R` — `Z_RLE` strategy
- `F` — `Z_FIXED` strategy
- `T` — Direct/transparent (no compression)
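
The mode rules above can be sketched as a small parser. This is an illustration under stated assumptions, not neozip's own open code: the `demo_mode` struct and `parse_gz_mode()` are hypothetical, the last digit wins when several are given, and unrecognised characters such as `b` are ignored. The strategy constants are defined locally with the same values as in `zlib.h`.

```c
#include <stddef.h>

// Values as defined in zlib.h
#define Z_DEFAULT_COMPRESSION (-1)
#define Z_DEFAULT_STRATEGY 0
#define Z_FILTERED 1
#define Z_HUFFMAN_ONLY 2
#define Z_RLE 3
#define Z_FIXED 4

// Hypothetical result type for this sketch (not part of neozip)
typedef struct {
    char kind;     // 'r', 'w' or 'a'; 0 if none was given
    int level;     // 0-9, or Z_DEFAULT_COMPRESSION
    int strategy;  // One of the Z_* strategies above
    int direct;    // 1 if 'T' (transparent) was requested
} demo_mode;

// Parse a gzopen()-style mode string.
// Returns 0 on success, -1 if no r/w/a character was given.
static int parse_gz_mode(const char *mode, demo_mode *out) {
    out->kind = 0;
    out->level = Z_DEFAULT_COMPRESSION;
    out->strategy = Z_DEFAULT_STRATEGY;
    out->direct = 0;
    for (; *mode != '\0'; mode++) {
        char c = *mode;
        if (c >= '0' && c <= '9') {
            out->level = c - '0';  // Last digit wins
            continue;
        }
        switch (c) {
        case 'r': case 'w': case 'a': out->kind = c; break;
        case 'f': out->strategy = Z_FILTERED; break;
        case 'h': out->strategy = Z_HUFFMAN_ONLY; break;
        case 'R': out->strategy = Z_RLE; break;
        case 'F': out->strategy = Z_FIXED; break;
        case 'T': out->direct = 1; break;
        default: break;            // e.g. 'b' is accepted and ignored
        }
    }
    return out->kind != 0 ? 0 : -1;
}
```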
### `gz_state_init()`
```c
static void gz_state_init(gz_state *state) {
    state->size = 0;           // No buffers allocated yet
    state->want = GZBUFSIZE;   // Default buffer size request
    state->in = NULL;
    state->out = NULL;
    state->direct = 0;
    state->err = Z_OK;
    state->pos = 0;
    state->strm.avail_in = 0;  // No input buffered yet
}
```
### `gz_buffer_alloc()`
Allocates I/O buffers:
```c
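static int gz_buffer_alloc(gz_state *state) {
    unsigned size = state->want;
    if (state->mode == GZ_READ) {
        // Read: input buffer = size, output buffer = size * 2
        state->in = malloc(size);
        state->out = malloc(size << 1);
        if (state->in == NULL || state->out == NULL) {
            free(state->in);
            free(state->out);
            state->in = state->out = NULL;
            state->err = Z_MEM_ERROR;
            return -1;
        }
    } else {
        // Write: output buffer = size (no input buffer)
        state->in = NULL;
        state->out = malloc(size);
        if (state->out == NULL) {
            state->err = Z_MEM_ERROR;
            return -1;
        }
    }
    state->size = size;  // Record the size only once allocation succeeded
    return 0;
}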
```
In read mode, the output buffer is doubled to handle cases where
decompression expands data significantly within a single call.

---
## Reading (`gzread.c`)
### Read Pipeline
```
gz_read() → gz_fetch() → gz_decomp() → inflate()
              ↘ gz_look()  (header detection)
```
### `gz_look()` — Header Detection
Determines if the file is gzip-compressed or raw:
```c
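static int gz_look(gz_state *state) {
    PREFIX3(stream) *strm = &state->strm;
    // First call (size == 0): allocate buffers and set up inflate once
    if (state->size == 0) {
        if (gz_buffer_alloc(state) == -1)
            return -1;
        strm->zalloc = NULL;  // Use the default allocators
        strm->zfree = NULL;
        strm->opaque = NULL;
        strm->next_in = state->in;
        if (PREFIX(inflateInit2)(strm, 15 + 16) != Z_OK)  // windowBits + 16 = gzip
            return -1;
    }
    // Need at least two bytes to test for the gzip magic number
    if (strm->avail_in < 2) {
        if (strm->avail_in)                 // Keep a single leftover byte
            state->in[0] = strm->next_in[0];
        ssize_t got = read(state->fd, state->in + strm->avail_in,
                           state->size - strm->avail_in);
        if (got < 0) {
            state->err = Z_ERRNO;
            return -1;
        }
        if (got == 0)
            state->eof = 1;
        strm->avail_in += (unsigned)got;
        strm->next_in = state->in;
        if (strm->avail_in == 0)
            return 0;                       // No input left: how stays 0 (EOF)
    }
    // Check for gzip magic (1f 8b) at the current input position
    if (strm->avail_in >= 2 && strm->next_in[0] == 0x1f && strm->next_in[1] == 0x8b) {
        PREFIX(inflateReset)(strm);         // Start decoding a new member
        state->how = 2;                     // Decompress mode
    } else {
        state->direct = 1;                  // Not gzip: pass through directly
        state->how = 1;                     // Copy mode
    }
    return 0;
}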
```
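
The magic-number test at the heart of `gz_look()` can be isolated into a pure function, which makes the decision easy to unit-test without a file descriptor. The `classify_input()` name and `input_kind` enum are hypothetical; the mapping to `how` values follows the code above.

```c
#include <stddef.h>

// Hypothetical result codes for this sketch
enum input_kind { INPUT_NEED_MORE, INPUT_GZIP, INPUT_RAW };

// Decide how to treat the bytes at the current input position:
// 1f 8b means gzip, anything else is raw data to copy through.
static enum input_kind classify_input(const unsigned char *buf, size_t len) {
    if (len < 2)
        return INPUT_NEED_MORE;  // Too few bytes to test the magic number
    if (buf[0] == 0x1f && buf[1] == 0x8b)
        return INPUT_GZIP;       // gz_look() sets how = 2 (decompress)
    return INPUT_RAW;            // gz_look() sets how = 1 (copy)
}
```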
### `gz_decomp()` — Decompression
```c
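static int gz_decomp(gz_state *state) {
    int ret;
    unsigned had = state->strm.avail_out;
    // Decompress as much as the available input and output space allow
    ret = PREFIX(inflate)(&state->strm, Z_NO_FLUSH);
    if (ret == Z_STREAM_ERROR || ret == Z_NEED_DICT) {
        state->err = Z_STREAM_ERROR;  // Inflate state is corrupt
        return -1;
    }
    if (ret == Z_MEM_ERROR || ret == Z_DATA_ERROR) {
        state->err = ret;             // Out of memory, or corrupt gzip data
        return -1;
    }
    // Count the uncompressed bytes inflate just produced
    state->pos += had - state->strm.avail_out;
    if (ret == Z_STREAM_END) {
        // End of gzip member; another may follow (concatenated gzip)
        PREFIX(inflateReset)(&state->strm);
        state->how = 0;               // Look for the next member header
    }
    return 0;
}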
```
### `gz_fetch()` — Fetch More Data
```c
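static int gz_fetch(gz_state *state) {
    unsigned got;
    do {
        switch (state->how) {
        case 0: // Look for a gzip header (or detect raw data)
            if (gz_look(state) == -1) return -1;
            if (state->how == 0) return 0; // EOF: no more input
            break;
        case 1: // Copy raw data straight into the output buffer
            if (gz_load(state, state->out, state->size << 1, &got) == -1)
                return -1;
            state->pos += got;
            return 0;
        case 2: // Decompress
            if (state->strm.avail_in == 0) {
                // Refill the input buffer
                ssize_t n = read(state->fd, state->in, state->size);
                if (n < 0) {
                    state->err = Z_ERRNO;
                    return -1;
                }
                if (n == 0) {
                    // Input ended inside a gzip member: report a truncated
                    // stream but keep what was decompressed so far
                    state->eof = 1;
                    state->err = Z_BUF_ERROR;
                    return 0;
                }
                state->strm.avail_in = (unsigned)n;
                state->strm.next_in = state->in;
            }
            if (gz_decomp(state) == -1) return -1;
            break;
        }
    } while (state->strm.avail_out && (!state->eof || state->strm.avail_in));
    return 0;
}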
```
### Public Read API
```c
int PREFIX(gzread)(gzFile file, void *buf, unsigned len);
int PREFIX(gzgetc)(gzFile file); // Read single character
char *PREFIX(gzgets)(gzFile file, char *buf, int len); // Read line
int PREFIX(gzungetc)(int c, gzFile file); // Push back character
int PREFIX(gzdirect)(gzFile file); // Check if raw
```
---
## Writing (`gzwrite.c`)
### Write Pipeline
```
gz_write() → gz_comp() → deflate()
```
### `gz_write_init()` — Lazy Initialisation
```c
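static int gz_write_init(gz_state *state) {
    // Allocate output buffer
    if (gz_buffer_alloc(state) == -1)
        return -1;
    // Initialize deflate with the default allocators and gzip wrapping
    state->strm.zalloc = NULL;
    state->strm.zfree = NULL;
    state->strm.opaque = NULL;
    int ret = PREFIX(deflateInit2)(&state->strm,
                                   state->level, Z_DEFLATED,
                                   15 + 16,  // windowBits + 16 = gzip wrapping
                                   DEF_MEM_LEVEL, state->strategy);
    if (ret != Z_OK) {
        free(state->out);
        state->out = NULL;
        state->err = ret;  // Z_MEM_ERROR or Z_STREAM_ERROR
        return -1;
    }
    // Point deflate at the empty output buffer
    state->strm.next_out = state->out;
    state->strm.avail_out = state->size;
    return 0;
}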
```
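
The `15 + 16` argument encodes both the window size and the wrapper format. Per the zlib manual, values 8..15 select a zlib wrapper, -8..-15 select raw deflate, adding 16 selects a gzip wrapper, and adding 32 (inflate only) requests automatic zlib/gzip detection. This hypothetical decoder spells the convention out; it deliberately ignores the special `windowBits = 0` case that inflate also accepts.

```c
// Hypothetical result of decoding a zlib-style windowBits argument
enum wrapper { WRAP_INVALID, WRAP_RAW, WRAP_ZLIB, WRAP_GZIP, WRAP_AUTO };

// Split windowBits into a wrapper type and log2 of the window size (8..15)
static enum wrapper decode_wbits(int wbits, int *log2_window) {
    enum wrapper w;
    if (wbits >= -15 && wbits <= -8) {
        w = WRAP_RAW;   // Negative: raw deflate, no wrapper
        wbits = -wbits;
    } else if (wbits >= 8 && wbits <= 15) {
        w = WRAP_ZLIB;  // Plain value: zlib wrapper
    } else if (wbits >= 8 + 16 && wbits <= 15 + 16) {
        w = WRAP_GZIP;  // +16: gzip wrapper
        wbits -= 16;
    } else if (wbits >= 8 + 32 && wbits <= 15 + 32) {
        w = WRAP_AUTO;  // +32: detect zlib or gzip (inflate only)
        wbits -= 32;
    } else {
        return WRAP_INVALID;
    }
    *log2_window = wbits;
    return w;
}
```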
### `gz_comp()` — Compress Buffered Data
```c
static int gz_comp(gz_state *state, int flush) {
    int ret;
    unsigned have;
    // Deflate until the input is consumed (and, when flushing, the flush completes)
    do {
        if (state->strm.avail_out == 0) {
            // Output buffer full: write it to the file and start over
            have = state->size;
            if (write(state->fd, state->out, have) != (ssize_t)have) {
                state->err = Z_ERRNO;
                return -1;
            }
            state->strm.next_out = state->out;
            state->strm.avail_out = state->size;
        }
        ret = PREFIX(deflate)(&state->strm, flush);
        if (ret == Z_STREAM_ERROR) {
            state->err = Z_STREAM_ERROR;  // Deflate state is corrupt
            return -1;
        }
    } while (ret == Z_OK && state->strm.avail_out == 0);
    if (flush != Z_NO_FLUSH) {
        // Any flush (including Z_FINISH): write out whatever is buffered
        have = state->size - state->strm.avail_out;
        if (have && write(state->fd, state->out, have) != (ssize_t)have) {
            state->err = Z_ERRNO;
            return -1;
        }
        state->strm.next_out = state->out;
        state->strm.avail_out = state->size;
    }
    return 0;
}
```
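
The buffer handling in `gz_comp()` follows a common pattern: produce into a fixed buffer, write the whole buffer out whenever it is full, and on a flush write only the bytes actually produced. This sketch reproduces that pattern with a hypothetical in-memory `sink` standing in for both `deflate()` and the file descriptor, and a 4-byte buffer so the "buffer full" path runs often; a failed `write()` is modelled as running out of room in the destination. Putting 10 bytes through it yields two full-buffer writes followed by one final 2-byte write.

```c
#include <stddef.h>
#include <string.h>

#define OUT_SIZE 4  // Tiny buffer so the "buffer full" path runs often

// Hypothetical stand-in for the gz_comp() buffering: out[] plays the role
// of state->out and file[] the role of the file descriptor.
typedef struct {
    unsigned char out[OUT_SIZE];
    unsigned avail_out;      // Free space left in out[]
    unsigned char file[64];  // Destination "file"
    size_t file_len;
    unsigned writes;         // Number of write() calls made
} sink;

static void sink_init(sink *s) {
    s->avail_out = OUT_SIZE;
    s->file_len = 0;
    s->writes = 0;
}

// Write the first `have` bytes of out[] to the file and empty the buffer.
// Returns -1 (like a short write()) if the destination is full.
static int sink_write(sink *s, unsigned have) {
    if (s->file_len + have > sizeof s->file)
        return -1;
    memcpy(s->file + s->file_len, s->out, have);
    s->file_len += have;
    s->writes++;
    s->avail_out = OUT_SIZE;
    return 0;
}

// Produce bytes into out[], draining it first whenever it is full
static int sink_put(sink *s, const unsigned char *data, size_t len) {
    for (size_t i = 0; i < len; i++) {
        if (s->avail_out == 0 && sink_write(s, OUT_SIZE) == -1)
            return -1;
        s->out[OUT_SIZE - s->avail_out] = data[i];
        s->avail_out--;
    }
    return 0;
}

// Flush: write only the bytes actually buffered
static int sink_finish(sink *s) {
    unsigned have = OUT_SIZE - s->avail_out;
    return have ? sink_write(s, have) : 0;
}
```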
### Public Write API
```c
int PREFIX(gzwrite)(gzFile file, const void *buf, unsigned len);
int PREFIX(gzputc)(gzFile file, int c);
int PREFIX(gzputs)(gzFile file, const char *s);
int PREFIX(gzprintf)(gzFile file, const char *format, ...);
int PREFIX(gzflush)(gzFile file, int flush);
int PREFIX(gzsetparams)(gzFile file, int level, int strategy);
```
---
## Seeking and Position
```c
z_off64_t PREFIX(gzseek64)(gzFile file, z_off64_t offset, int whence);
z_off64_t PREFIX(gztell64)(gzFile file);
z_off64_t PREFIX(gzoffset64)(gzFile file);
int PREFIX(gzrewind)(gzFile file);
int PREFIX(gzeof)(gzFile file);
```
### Forward Seeking
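For read mode, seeking forward decompresses and discards data up to the
new position. For write mode, where only forward seeks are possible, the gap
is filled by compressing a run of zero bytes.
```c
// In gzseek: forward seek (offset already made relative to state->pos)
state->seek = 1;       // Seek request pending
state->skip = offset;  // Consumed by the next read (or write)
```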
### Backward Seeking
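Backward seeking is only possible in read mode. It requires a full rewind
and re-decompression from the start of the gzip data:
```c
// Convert the (negative) relative offset into an absolute target
offset += state->pos;
if (offset < 0 || PREFIX(gzrewind)(file) == -1)
    return -1;          // Target before the start, or rewind failed
state->seek = 1;
state->skip = offset;   // Decompress and discard up to the target
```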
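
The offset arithmetic from both cases can be collected into one planning function. This is a hypothetical sketch: `seek_plan` and `plan_seek()` are illustrative names, not neozip API, and the function follows the zlib conventions that `SEEK_END` is not supported and that a write stream cannot seek backwards.

```c
#include <stdint.h>
#include <stdio.h>  // SEEK_SET, SEEK_CUR, SEEK_END

// Hypothetical plan computed by a gzseek()-like function
typedef struct {
    int rewind;    // 1 if the stream must be rewound first
    int64_t skip;  // Uncompressed bytes to decompress and discard
} seek_plan;

// pos is the current uncompressed position. Returns 0 on success, -1 for
// a request that cannot be honoured.
static int plan_seek(int64_t pos, int64_t offset, int whence, int reading,
                     seek_plan *plan) {
    if (whence == SEEK_SET)
        offset -= pos;        // Make the offset relative
    else if (whence != SEEK_CUR)
        return -1;            // SEEK_END: uncompressed length is unknown
    plan->rewind = 0;
    if (offset < 0) {         // Backward seek
        if (!reading)
            return -1;        // A write stream cannot go backwards
        offset += pos;        // Absolute target from the start
        if (offset < 0)
            return -1;        // Before the start of the data
        plan->rewind = 1;
    }
    plan->skip = offset;
    return 0;
}
```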
---
## Gzip Format
A gzip file (RFC 1952) consists of:
```
┌──────────────────────────────────┐
│ Header (10+ bytes)               │
│   1F 8B   magic number           │
│   08      compression method     │
│   FLG     flags                  │
│   MTIME   modification time      │
│   XFL     extra flags            │
│   OS      operating system       │
│   [EXTRA] [NAME] [COMMENT] [HCRC]│
├──────────────────────────────────┤
│ Compressed data (deflate)        │
├──────────────────────────────────┤
│ Trailer (8 bytes)                │
│   CRC32   CRC of original data   │
│   ISIZE   original size mod 2^32 │
└──────────────────────────────────┘
```
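
The two trailer fields can be computed as in this sketch. It uses a bitwise CRC-32 (reflected polynomial `0xEDB88320`) for brevity; neozip's own CRC code is not shown here, and `crc32_bitwise()` and `gzip_trailer()` are hypothetical names. Both fields are stored little-endian, as RFC 1952 requires for multi-byte values.

```c
#include <stddef.h>
#include <stdint.h>

// Bitwise CRC-32 as used by gzip. Short but slow; production code uses
// lookup tables or SIMD instead.
static uint32_t crc32_bitwise(const unsigned char *buf, size_t len) {
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int k = 0; k < 8; k++)
            crc = (crc & 1) ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
    }
    return crc ^ 0xFFFFFFFFu;
}

// Fill the 8-byte gzip trailer: CRC32 then ISIZE, both little-endian.
// Converting len to uint32_t keeps it modulo 2^32, as ISIZE requires.
static void gzip_trailer(const unsigned char *data, size_t len,
                         unsigned char out[8]) {
    uint32_t crc = crc32_bitwise(data, len);
    uint32_t isize = (uint32_t)len;
    for (int i = 0; i < 4; i++) {
        out[i] = (unsigned char)(crc >> (8 * i));
        out[4 + i] = (unsigned char)(isize >> (8 * i));
    }
}
```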
FLG bits:
- `FTEXT` (0x01) — Text mode hint
- `FHCRC` (0x02) — Header CRC present
- `FEXTRA` (0x04) — Extra field present
- `FNAME` (0x08) — Original filename present
- `FCOMMENT` (0x10) — Comment present
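
The flag bits determine how long the header is. This hypothetical `gzip_header_len()` walks the optional fields in RFC 1952 order (EXTRA, NAME, COMMENT, HCRC) and reports how many bytes the header occupies; it is a sketch for illustration, not neozip's header parser, and it skips the header CRC without verifying it. Per RFC 1952, the reserved FLG bits 5-7 must be zero.

```c
#include <stddef.h>

// FLG bits from RFC 1952
#define FTEXT    0x01
#define FHCRC    0x02
#define FEXTRA   0x04
#define FNAME    0x08
#define FCOMMENT 0x10

// Advance *pos past a zero-terminated field; 0 if the terminator is missing
static int skip_cstring(const unsigned char *buf, size_t len, size_t *pos) {
    while (*pos < len && buf[*pos] != 0)
        (*pos)++;
    if (*pos >= len)
        return 0;
    (*pos)++;  // Step over the terminating zero byte
    return 1;
}

// Return the total header length, 0 if more bytes are needed, or -1 if
// the bytes are not a valid gzip header.
static long gzip_header_len(const unsigned char *buf, size_t len) {
    if (len < 10)
        return 0;
    if (buf[0] != 0x1f || buf[1] != 0x8b || buf[2] != 8)
        return -1;                 // Bad magic, or method is not deflate
    unsigned flg = buf[3];
    if (flg & 0xe0)
        return -1;                 // Reserved bits 5-7 must be zero
    size_t pos = 10;               // Past MTIME, XFL and OS
    if (flg & FEXTRA) {
        if (len < pos + 2)
            return 0;
        size_t xlen = buf[pos] | ((size_t)buf[pos + 1] << 8);  // Little-endian
        pos += 2 + xlen;
    }
    if ((flg & FNAME) && !skip_cstring(buf, len, &pos))
        return 0;
    if ((flg & FCOMMENT) && !skip_cstring(buf, len, &pos))
        return 0;
    if (flg & FHCRC)
        pos += 2;                  // CRC16 of the header (not checked here)
    return pos <= len ? (long)pos : 0;
}
```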
### Concatenated Gzip
Multiple gzip members can be concatenated. `gzread()` transparently
decompresses all members in sequence, resetting the inflate state at
each `Z_STREAM_END` boundary.

---
## Error Handling
```c
const char *PREFIX(gzerror)(gzFile file, int *errnum); // Get error message
void PREFIX(gzclearerr)(gzFile file); // Clear error state
```
The `gz_state.err` field tracks errors:
- `Z_OK` — No error
- `Z_ERRNO` — System I/O error (check `errno`)
- `Z_STREAM_ERROR` — Invalid state
- `Z_DATA_ERROR` — Corrupted gzip data
- `Z_MEM_ERROR` — Memory allocation failure
- `Z_BUF_ERROR` — Input ended in the middle of a gzip stream (truncated file)

---
## Close
```c
int PREFIX(gzclose)(gzFile file);
int PREFIX(gzclose_r)(gzFile file); // Close read-mode file
int PREFIX(gzclose_w)(gzFile file); // Close write-mode file
```
`gzclose_w()` flushes pending output with `Z_FINISH`, writes the
remaining compressed data, then calls `deflateEnd()`.
`gzclose_r()` calls `inflateEnd()` and frees buffers.
Both close the underlying file descriptor, including one passed to
`gzdopen()`. To keep using a descriptor after `gzclose()`, pass `gzdopen()`
a duplicate made with `dup()`.