
# cgit — Caching System

## Overview

cgit implements a file-based output cache that stores the fully rendered
HTML/binary response for each unique request.  The cache avoids regenerating
pages for repeated identical requests.  When caching is disabled
(`cache-size=0`, the default), all output is written directly to `stdout`.

Source files: `cache.c`, `cache.h`.

## Cache Slot Structure

Each cached response is represented by a `cache_slot`:

```c
struct cache_slot {
    const char *key;       /* request identifier (URL-based) */
    int keylen;            /* strlen(key) */
    int ttl;               /* time-to-live in minutes */
    cache_fill_fn fn;      /* callback to regenerate content */
    void *cbdata;          /* opaque data passed to fn */
    int cache_fd;          /* fd for the cache file */
    int lock_fd;           /* fd for the .lock file */
    const char *cache_name;/* path: cache_root/hash(key) */
    const char *lock_name; /* path: cache_name + ".lock" */
    int match;             /* 1 if cache file matches key */
    struct stat cache_st;  /* stat of the cache file */
    int bufsize;           /* size of the header buffer */
    char buf[1024 + 4 * 20]; /* header: key + timestamps */
};
```

The `cache_fill_fn` typedef:

```c
typedef void (*cache_fill_fn)(void *cbdata);
```

This callback is invoked to produce the page content when the cache needs
filling.  The callback writes directly to `stdout`, which is redirected to the
lock file while cache filling is in progress.
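As a minimal sketch of what such a callback can look like: `page_ctx` and `render_page` below are hypothetical names chosen for illustration, not cgit identifiers. The callback only writes to `stdout` and does not need to know whether that currently points at the client or at a lock file.

```c
#include <assert.h>
#include <stdio.h>

typedef void (*cache_fill_fn)(void *cbdata);

/* Hypothetical per-request data; not a cgit type. */
struct page_ctx {
    const char *title;
    int calls;              /* number of times the page was generated */
};

/* Writes the page to stdout; the cache decides where stdout points. */
static void render_page(void *cbdata)
{
    struct page_ctx *ctx = cbdata;

    assert(ctx != NULL);
    ctx->calls++;
    printf("<h1>%s</h1>\n", ctx->title);
}
```

A caller would pass `render_page` as `fn` and a pointer to a `page_ctx` as `cbdata`.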

## Hash Function

Cache file names are derived from the request key using the FNV-1 hash
(multiply, then XOR) with the 32-bit FNV offset basis and prime:

```c
unsigned long hash_str(const char *str)
{
    unsigned long h = 0x811c9dc5;
    unsigned char *s = (unsigned char *)str;
    while (*s) {
        h *= 0x01000193;
        h ^= (unsigned long)*s++;
    }
    return h;
}
```

The resulting hash is formatted as `%lx` and joined with the configured
`cache-root` directory to produce the cache file path.  The lock file is
the same path with `.lock` appended.
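The path construction described above can be sketched as follows. `slot_paths` is a hypothetical helper written for this page, and the buffer handling is an assumption; only `hash_str()` and the `%lx` format come from the text.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* FNV-1 as listed above. */
static unsigned long hash_str(const char *str)
{
    unsigned long h = 0x811c9dc5;
    const unsigned char *s = (const unsigned char *)str;

    while (*s) {
        h *= 0x01000193;
        h ^= (unsigned long)*s++;
    }
    return h;
}

/* Hypothetical helper: writes "<root>/<hash>" and "<root>/<hash>.lock"
 * into two buffers of `len` bytes.  Returns 0, or -1 if they are too small. */
static int slot_paths(const char *root, const char *key,
                      char *cache_name, char *lock_name, size_t len)
{
    int n;

    assert(root && key && cache_name && lock_name);
    n = snprintf(cache_name, len, "%s/%lx", root, hash_str(key));
    if (n < 0 || (size_t)n + sizeof(".lock") > len)
        return -1;
    memcpy(lock_name, cache_name, (size_t)n);
    memcpy(lock_name + n, ".lock", sizeof(".lock"));  /* copies the NUL too */
    return 0;
}
```

With an empty key the loop never runs, so the file name is simply the offset basis, `811c9dc5`.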

## Slot Lifecycle

A cache request goes through these phases, managed by `process_slot()`:

### 1. Open (`open_slot`)

Opens the cache file and reads the header.  The header contains the original
key followed by creation and expiry timestamps.  If the stored key matches the
current request key, `slot->match` is set to 1.
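The key comparison can be sketched like this, assuming the key is stored NUL-terminated at the very start of the header (an assumption that is consistent with the `keylen + 1` offset used when printing). `header_matches` is a hypothetical name.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if the header begins with `key` followed by a NUL byte. */
static int header_matches(const char *buf, size_t buflen, const char *key)
{
    size_t keylen;

    assert(buf && key);
    keylen = strlen(key);
    if (buflen < keylen + 1)
        return 0;                       /* too short to hold the key */
    return memcmp(buf, key, keylen + 1) == 0;
}
```

Comparing `keylen + 1` bytes includes the terminator, so a stored key that merely starts with the requested key does not count as a match. This matters because two different keys can hash to the same file name.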

```c
static int open_slot(struct cache_slot *slot)
{
    slot->cache_fd = open(slot->cache_name, O_RDONLY);
    if (slot->cache_fd == -1)
        return errno;
    if (fstat(slot->cache_fd, &slot->cache_st))
        return errno;
    /* read header into slot->buf */
    return 0;
}
```

### 2. Check Match

If the file exists and the key matches, the code checks whether the entry
has expired based on the TTL:

```c
static int is_expired(struct cache_slot *slot)
{
    if (slot->ttl < 0)
        return 0;       /* negative TTL = never expires */
    return slot->cache_st.st_mtime + slot->ttl * 60 < time(NULL);
}
```

Any negative TTL means the entry never expires; `cache-static-ttl` uses `-1`
by default.
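To make the boundary behaviour concrete, here is the same rule as a pure function with the clock passed in. `expired_at` is a hypothetical variant written for this page; the cast to `time_t` is an addition that keeps the multiplication out of `int` arithmetic.

```c
#include <assert.h>
#include <time.h>

/* Same rule as is_expired(), with mtime and "now" supplied by the caller. */
static int expired_at(time_t mtime, int ttl_minutes, time_t now)
{
    if (ttl_minutes < 0)
        return 0;                           /* negative TTL: never expires */
    return mtime + (time_t)ttl_minutes * 60 < now;
}

/* Worked examples for an entry written at t = 1000 seconds. */
static void expiry_examples(void)
{
    assert(!expired_at(1000, 5, 1300));     /* exactly 5 minutes old: fresh */
    assert(expired_at(1000, 5, 1301));      /* one second later: expired */
    assert(!expired_at(1000, -1, 999999));  /* never expires */
}
```

Because the comparison is strict, an entry is still served at the exact second its TTL runs out.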

### 3. Lock (`lock_slot`)

Creates the `.lock` file with `O_WRONLY | O_CREAT | O_EXCL` and writes the
header containing the key and timestamps.  If locking fails (another process
holds the lock), stale cached content is served when a matching entry exists;
otherwise the page is generated directly to `stdout` without caching.

```c
static int lock_slot(struct cache_slot *slot)
{
    slot->lock_fd = open(slot->lock_name,
        O_WRONLY | O_CREAT | O_EXCL, S_IRUSR | S_IWUSR);
    if (slot->lock_fd == -1)
        return errno;
    /* write header: key + creation timestamp */
    return 0;
}
```
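The non-blocking nature of this lock can be seen in isolation with a small sketch. `try_lock` and `abandon_lock` are hypothetical helpers, not cgit functions; the `open()` flags are the ones shown above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns an open fd on success, or -1 with errno set.  errno is EEXIST
 * when the lock file already exists; the call never waits. */
static int try_lock(const char *lock_name)
{
    assert(lock_name != NULL);
    return open(lock_name, O_WRONLY | O_CREAT | O_EXCL, S_IRUSR | S_IWUSR);
}

/* Gives the lock up without publishing anything. */
static void abandon_lock(const char *lock_name, int fd)
{
    close(fd);
    unlink(lock_name);
}
```

A second `try_lock()` on the same path fails immediately with `EEXIST` until the first holder removes or renames the file.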

### 4. Fill (`fill_slot`)

Redirects `stdout` to the lock file using `dup2()`, invokes the
`cache_fill_fn` callback to generate the page content, then restores `stdout`:

```c
static int fill_slot(struct cache_slot *slot)
{
    /* save original stdout */
    /* dup2(slot->lock_fd, STDOUT_FILENO) */
    slot->fn(slot->cbdata);
    /* restore original stdout */
    return 0;
}
```
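The commented-out steps can be fleshed out roughly as follows. This is a sketch of the general `dup()`/`dup2()` technique under the assumption that the callback writes through stdio; `run_redirected` and `write_greeting` are hypothetical names, and the actual cgit code may differ in its error handling.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

typedef void (*cache_fill_fn)(void *cbdata);

/* Runs fn with stdout pointing at fd, then restores the original stdout.
 * Returns 0 on success or an errno value. */
static int run_redirected(int fd, cache_fill_fn fn, void *cbdata)
{
    int saved, err = 0;

    assert(fn != NULL);
    fflush(stdout);                 /* keep earlier buffered output out of fd */
    saved = dup(STDOUT_FILENO);     /* save original stdout */
    if (saved == -1)
        return errno;
    if (dup2(fd, STDOUT_FILENO) == -1) {
        err = errno;
        close(saved);
        return err;
    }
    fn(cbdata);
    fflush(stdout);                 /* push stdio buffers into fd */
    if (dup2(saved, STDOUT_FILENO) == -1)   /* restore original stdout */
        err = errno;
    close(saved);
    return err;
}

/* Example callback: prints the string passed as cbdata. */
static void write_greeting(void *cbdata)
{
    fputs((const char *)cbdata, stdout);
}
```

The two `fflush()` calls matter because stdio buffers are per-stream, not per-descriptor: without them, output could land on whichever file happens to be behind descriptor 1 when the buffer is finally flushed.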

### 5. Close and Rename

After filling, the lock file is atomically renamed to the cache file:

```c
if (rename(slot->lock_name, slot->cache_name))
    return errno;
```

This ensures readers never see a partially-written file.
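The write-then-rename pattern can be tried on its own with the sketch below. `replace_file` is a hypothetical helper that writes a whole string in one call, which is a simplification; cgit fills the file through the redirected `stdout` instead.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Writes data to tmp_name, then renames it over final_name.
 * Returns 0 on success or an errno value. */
static int replace_file(const char *tmp_name, const char *final_name,
                        const char *data)
{
    size_t len;
    ssize_t n;
    int fd, err = 0;

    assert(tmp_name && final_name && data);
    len = strlen(data);
    fd = open(tmp_name, O_WRONLY | O_CREAT | O_EXCL, S_IRUSR | S_IWUSR);
    if (fd == -1)
        return errno;
    n = write(fd, data, len);
    if (n == -1)
        err = errno;
    else if ((size_t)n != len)
        err = EIO;                  /* treat a short write as failure */
    if (close(fd) == -1 && !err)
        err = errno;
    if (!err && rename(tmp_name, final_name) == -1)
        err = errno;
    if (err)
        unlink(tmp_name);           /* never publish a partial file */
    return err;
}
```

A reader that opened the old cache file before the rename keeps reading the old content from its descriptor, while any later `open()` sees the complete new file.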

### 6. Print (`print_slot`)

The cache file content (minus the header) is sent to `stdout`.  When cgit is
built with `HAVE_LINUX_SENDFILE`, `sendfile()` is used for zero-copy output;
otherwise a `read()`/`write()` loop copies the data:

```c
static int print_slot(struct cache_slot *slot)
{
#ifdef HAVE_LINUX_SENDFILE
    off_t start = slot->keylen + 1;  /* skip header */
    sendfile(STDOUT_FILENO, slot->cache_fd, &start,
             slot->cache_st.st_size - start);
#else
    /* fallback: read()/write() loop */
#endif
    return 0;
}
```
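The fallback branch is only hinted at in the listing. One possible shape for it is sketched below; `copy_body` is a hypothetical function that opens the file by name so the example stands alone, whereas `print_slot()` works on the already open `slot->cache_fd`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Copies cache_name to out_fd, skipping the first `skip` bytes (the header).
 * Returns 0 on success or an errno value. */
static int copy_body(const char *cache_name, int out_fd, off_t skip)
{
    char buf[4096];
    ssize_t n = 0, w = 0, off;
    int err = 0, fd;

    assert(cache_name && skip >= 0);
    fd = open(cache_name, O_RDONLY);
    if (fd == -1)
        return errno;
    if (lseek(fd, skip, SEEK_SET) == (off_t)-1)
        err = errno;
    while (!err && (n = read(fd, buf, sizeof buf)) != 0) {
        if (n == -1) {
            if (errno != EINTR)
                err = errno;        /* real error: the loop condition stops us */
            continue;               /* EINTR: just read again */
        }
        for (off = 0; off < n && !err; off += w) {
            w = write(out_fd, buf + off, (size_t)(n - off));
            if (w == -1) {
                if (errno != EINTR)
                    err = errno;
                w = 0;              /* nothing written on this attempt */
            }
        }
    }
    close(fd);
    return err;
}
```

The inner loop is needed because `write()` may accept fewer bytes than requested, for example when `stdout` is a pipe to the web server.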

## Process Slot State Machine

`process_slot()` implements a state machine combining all phases:

```
START → open_slot()
  ├── success + key match + not expired → print_slot() → DONE
  ├── success + key match + expired → lock_slot()
  │     ├── lock acquired → fill_slot() → close_slot() → open_slot() → print_slot()
  │     └── lock failed → print_slot() (serve stale)
  ├── success + key mismatch → lock_slot()
  │     ├── lock acquired → fill_slot() → close_slot() → open_slot() → print_slot()
  │     └── lock failed → fill_slot() (direct to stdout)
  └── open failed → lock_slot()
        ├── lock acquired → fill_slot() → close_slot() → open_slot() → print_slot()
        └── lock failed → fill_slot() (direct to stdout, no cache)
```
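The same decision tree can be written as a pure function, which makes the four possible outcomes easier to see. This is a sketch of the diagram only: `slot_action` and `decide` are hypothetical names, and the `locked` argument stands for the result of `lock_slot()` in the branches where it is attempted.

```c
#include <assert.h>

enum slot_action {
    SERVE_CACHED,        /* print_slot() on the existing file */
    REFILL_THEN_SERVE,   /* fill_slot(), rename, then print_slot() */
    SERVE_STALE,         /* expired entry, lock held by someone else */
    GENERATE_UNCACHED    /* call the fill function straight to stdout */
};

/* All arguments are 0/1 flags.  `locked` is ignored for a fresh entry. */
static enum slot_action decide(int opened, int match, int expired, int locked)
{
    if (opened && match) {
        if (!expired)
            return SERVE_CACHED;
        return locked ? REFILL_THEN_SERVE : SERVE_STALE;
    }
    /* Missing file or key mismatch: there is nothing usable to fall back on. */
    return locked ? REFILL_THEN_SERVE : GENERATE_UNCACHED;
}

/* A few rows of the diagram as checks. */
static void decide_examples(void)
{
    assert(decide(1, 1, 0, 0) == SERVE_CACHED);
    assert(decide(1, 1, 1, 0) == SERVE_STALE);
    assert(decide(0, 0, 0, 0) == GENERATE_UNCACHED);
}
```

Stale content is only ever served when the key matches; a mismatching file belongs to a different request and is never printed.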

## Public API

```c
/* Process a request through the cache */
extern int cache_process(int size, const char *path, const char *key,
                         int ttl, cache_fill_fn fn, void *cbdata);

/* List all cache entries (for debugging/administration) */
extern int cache_ls(const char *path);

/* Hash a string using FNV-1 */
extern unsigned long hash_str(const char *str);
```

### `cache_process()`

Parameters:
- `size` — Maximum number of cache entries (from `cache-size`).  If `0`,
  caching is bypassed and `fn` is called directly.
- `path` — Cache root directory.
- `key` — Request identifier (derived from full URL + query string).
- `ttl` — Time-to-live in minutes.
- `fn` — Callback function that generates the page content.
- `cbdata` — Opaque data passed to the callback.

### `cache_ls()`

Scans the cache root directory and prints information about each cache entry
to `stdout`.  Used for administrative inspection.

## TTL Configuration Mapping

Different page types have different TTLs:

| Page Type | Config Directive | Default | Applied When |
|-----------|-----------------|---------|--------------|
| Repository list | `cache-root-ttl` | 5 min | `cmd->want_repo == 0` |
| Repo pages | `cache-repo-ttl` | 5 min | `cmd->want_repo == 1` and dynamic |
| Dynamic pages | `cache-dynamic-ttl` | 5 min | `cmd->want_vpath == 1` |
| Static content | `cache-static-ttl` | -1 (never) | SHA-referenced content |
| About pages | `cache-about-ttl` | 15 min | About/readme view |
| Snapshots | `cache-snapshot-ttl` | 5 min | Snapshot downloads |
| Scan results | `cache-scanrc-ttl` | 15 min | scan-path results |

Static content uses a TTL of `-1` because SHA-addressed content is
immutable — a given commit/tree/blob hash always refers to the same data.

## Cache Key Generation

The cache key is built from the complete query context in `cgit.c`:

```c
static const char *cache_key(void)
{
    return fmt("%s?%s?%s?%s?%s",
        ctx.qry.raw, ctx.env.http_host,
        ctx.env.https ? "1" : "0",
        ctx.env.authenticated ? "1" : "0",
        ctx.env.http_cookie ? ctx.env.http_cookie : "");
}
```

The key captures: raw query string, hostname, HTTPS state, authentication
state, and cookies.  This ensures that authenticated users get different
cache entries than unauthenticated users.
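To see how the authentication flag and cookie separate entries, the format string can be exercised outside cgit. The `request` struct, `build_key`, and the host name `git.example.org` below are hypothetical stand-ins for the `ctx` fields; only the format string is taken from the listing.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the ctx.qry / ctx.env fields used by the key. */
struct request {
    const char *raw_query;
    const char *http_host;
    const char *http_cookie;    /* may be NULL */
    int https;
    int authenticated;
};

/* Formats the key into out.  Returns 0, or -1 if out is too small. */
static int build_key(const struct request *r, char *out, size_t len)
{
    int n;

    assert(r && out);
    n = snprintf(out, len, "%s?%s?%s?%s?%s",
                 r->raw_query, r->http_host,
                 r->https ? "1" : "0",
                 r->authenticated ? "1" : "0",
                 r->http_cookie ? r->http_cookie : "");
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Two requests for the same page that differ only in `authenticated` or in the cookie produce different keys, so they hash to (almost always) different files, and the stored-key comparison catches the rare collision.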

## Concurrency

The cache supports concurrent access from multiple CGI processes:

1. **Atomic writes**: Content is written to a `.lock` file first, then
   atomically renamed to the cache file.  Readers never see partial content.
2. **Non-blocking locks**: If a lock is already held, the process either
   serves stale cached content (if available) or generates content directly
   to stdout without caching.
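3. **No deadlocks**: Lock files are created with `O_EXCL` rather than held
   with `flock()`, so no process ever blocks waiting for a lock.  If a process
   crashes while holding a lock, the stale `.lock` file remains.  Because
   `O_EXCL` creation fails while that file exists, requests for the slot fall
   back to the non-locking paths above until the file is removed.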

## Cache Directory Management

The cache root directory (`cache-root`, default `/var/cache/cgit`) must be
writable by the web server user.  Cache files are created with mode `0600`
(`S_IRUSR | S_IWUSR`).

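There is no built-in cache eviction.  Old cache files persist until a new
request with the same hash replaces them.  Administrators should set up
periodic cleanup (e.g., a cron job).  The example below deletes every file
under the cache root that has not been modified for more than 60 minutes,
regardless of its TTL, so never-expiring static entries are removed as well: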

```bash
find /var/cache/cgit -type f -mmin +60 -delete
```

## Disabling the Cache

Set `cache-size=0` (the default).  When `size` is 0, `cache_process()` calls
the fill function directly, writing to stdout with no file I/O overhead:

```c
if (!size) {
    fn(cbdata);
    return 0;
}
```