# cgit — Architecture
## High-Level Component Map
```
┌──────────────────────────────────────────────────────────────┐
│ cgit.c │
│ constructor_environment() [__attribute__((constructor))] │
│ prepare_context() → config_cb() → querystring_cb() │
│ authenticate_cookie() → process_request() → main() │
├──────────────────────────────────────────────────────────────┤
│ Command Dispatcher │
│ cmd.c │
│ cgit_get_cmd() → static cmds[] table (23 entries) │
│ struct cgit_cmd { name, fn, want_repo, want_vpath, is_clone }│
├──────────┬───────────┬───────────┬───────────────────────────┤
│ UI Layer │ Caching │ Filters │ HTTP Clone │
│ ui-*.c │ cache.c │ filter.c │ ui-clone.c │
│ (17 mods)│ cache.h │ │ │
├──────────┴───────────┴───────────┴───────────────────────────┤
│ Core Utilities │
│ shared.c — global vars, repo mgmt, diff wrappers │
│ parsing.c — cgit_parse_commit(), cgit_parse_tag(), │
│ cgit_parse_url() │
│ html.c — entity escaping, URL encoding, form helpers │
│ configfile.c — line-oriented name=value parser │
│ scan-tree.c — filesystem repository discovery │
├──────────────────────────────────────────────────────────────┤
│ Vendored git library │
│ git/ — full Git 2.46.0 source; linked via cgit.mk │
│ Provides: object store, diff engine (xdiff), refs, revwalk, │
│ archive, notes, commit graph, blame, packfile │
└──────────────────────────────────────────────────────────────┘
```
## Global State
cgit keeps all state in two global variables: the repository list and the current request context.
```c
/* shared.c */
struct cgit_repolist cgit_repolist; /* Array of all known repositories */
struct cgit_context ctx; /* Current request context */
```
### `struct cgit_context`
```c
struct cgit_context {
struct cgit_environment env; /* CGI env vars (HTTP_HOST, QUERY_STRING, etc.) */
struct cgit_query qry; /* Parsed URL/query parameters */
struct cgit_config cfg; /* All global config directives */
struct cgit_repo *repo; /* Pointer into cgit_repolist.repos[] or NULL */
struct cgit_page page; /* HTTP response metadata (mimetype, status, etag) */
};
```
### `struct cgit_environment`
Populated by `prepare_context()` via `getenv()`:
```c
struct cgit_environment {
const char *cgit_config; /* $CGIT_CONFIG (default: /etc/cgitrc) */
const char *http_host; /* $HTTP_HOST */
const char *https; /* $HTTPS ("on" if TLS) */
const char *no_http; /* $NO_HTTP (non-NULL → CLI mode) */
const char *path_info; /* $PATH_INFO */
const char *query_string; /* $QUERY_STRING */
const char *request_method; /* $REQUEST_METHOD */
const char *script_name; /* $SCRIPT_NAME */
const char *server_name; /* $SERVER_NAME */
const char *server_port; /* $SERVER_PORT */
const char *http_cookie; /* $HTTP_COOKIE */
const char *http_referer; /* $HTTP_REFERER */
unsigned int content_length; /* $CONTENT_LENGTH */
int authenticated; /* Set by auth filter (0 or 1) */
};
```
### `struct cgit_page`
Controls HTTP response headers:
```c
struct cgit_page {
time_t modified; /* Last-Modified header */
time_t expires; /* Expires header */
size_t size; /* Content-Length (0 = omit) */
const char *mimetype; /* Content-Type (default: "text/html") */
const char *charset; /* charset param (default: "UTF-8") */
const char *filename; /* Content-Disposition filename */
const char *etag; /* ETag header value */
const char *title; /* HTML <title> */
int status; /* HTTP status code (0 = 200) */
const char *statusmsg; /* HTTP status message */
};
```
## Request Lifecycle — Detailed
### Phase 1: Pre-main Initialization
```c
__attribute__((constructor))
static void constructor_environment()
{
setenv("GIT_CONFIG_NOSYSTEM", "1", 1);
setenv("GIT_ATTR_NOSYSTEM", "1", 1);
unsetenv("HOME");
unsetenv("XDG_CONFIG_HOME");
}
```
This runs before `main()` on every invocation. It prevents Git from loading
`/etc/gitconfig`, `~/.gitconfig`, or any `$XDG_CONFIG_HOME/git/config`, ensuring
complete isolation from the host system's Git configuration.
### Phase 2: Context Preparation
`prepare_context()` zero-initializes `ctx` and sets every configuration field
to its default value. Key defaults:
| Field | Default |
|-------|---------|
| `cfg.cache_size` | `0` (disabled) |
| `cfg.cache_root` | `CGIT_CACHE_ROOT` (`/var/cache/cgit`) |
| `cfg.cache_repo_ttl` | `5` minutes |
| `cfg.cache_root_ttl` | `5` minutes |
| `cfg.cache_static_ttl` | `-1` (never expires) |
| `cfg.max_repo_count` | `50` |
| `cfg.max_commit_count` | `50` |
| `cfg.max_msg_len` | `80` |
| `cfg.max_repodesc_len` | `80` |
| `cfg.enable_http_clone` | `1` |
| `cfg.enable_index_owner` | `1` |
| `cfg.enable_tree_linenumbers` | `1` |
| `cfg.summary_branches` | `10` |
| `cfg.summary_log` | `10` |
| `cfg.summary_tags` | `10` |
| `cfg.difftype` | `DIFF_UNIFIED` |
| `cfg.robots` | `"index, nofollow"` |
| `cfg.root_title` | `"Git repository browser"` |
The function also reads all CGI environment variables and sets
`page.mimetype = "text/html"`, `page.charset = PAGE_ENCODING` (`"UTF-8"`).
### Phase 3: Configuration Parsing
```c
parse_configfile(ctx.env.cgit_config, config_cb);
```
`parse_configfile()` (in `configfile.c`) opens the file, reads lines of the
form `name=value`, skips comments (`#` and `;`), and calls the callback for each
directive. It supports recursive `include=` directives up to 8 levels deep.
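The line-oriented parsing described above can be sketched as follows. This is a minimal illustration, not the code in `configfile.c`: it parses an in-memory buffer instead of a file, omits `include=` handling, and the whitespace trimming is an assumption. The names `parse_config_text()`, `trim()`, and `remember_cb()` are hypothetical.

```c
#include <stdio.h>
#include <string.h>

typedef void (*configfn)(const char *name, const char *value);

/* Example callback (hypothetical): remembers the most recent directive. */
char last_name[64], last_value[64];

void remember_cb(const char *name, const char *value)
{
    snprintf(last_name, sizeof(last_name), "%s", name);
    snprintf(last_value, sizeof(last_value), "%s", value);
}

/* Trim spaces, tabs and '\r' in place; returns the trimmed start. */
static char *trim(char *s)
{
    char *end;

    while (*s == ' ' || *s == '\t')
        s++;
    end = s + strlen(s);
    while (end > s && (end[-1] == ' ' || end[-1] == '\t' || end[-1] == '\r'))
        end--;
    *end = '\0';
    return s;
}

/*
 * Parse name=value lines from a writable buffer and call fn for each
 * directive. Returns the number of directives delivered to fn.
 */
int parse_config_text(char *text, configfn fn)
{
    int count = 0;
    char *line = text;

    while (line && *line) {
        char *next = strchr(line, '\n');
        char *name, *eq;

        if (next)
            *next++ = '\0';
        name = trim(line);
        /* Skip blank lines and comments starting with '#' or ';'. */
        if (*name && *name != '#' && *name != ';') {
            eq = strchr(name, '=');
            if (eq) {
                *eq = '\0';
                fn(trim(name), trim(eq + 1));
                count++;
            }
        }
        line = next;
    }
    return count;
}
```

Passing the callback as a `configfn` pointer keeps the parser independent of what the directives mean, which is the same split the document describes between `parse_configfile()` and `config_cb()`.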
`config_cb()` (in `cgit.c`) is a ~200-line chain of `if/else if` blocks that
maps directive names to `ctx.cfg.*` fields. When `repo.url=` is encountered,
`cgit_add_repo()` allocates a new repository entry; subsequent `repo.*`
directives configure that entry via `repo_config()`.
Special directive: `scan-path=` triggers immediate filesystem scanning via
`scan_tree()` or `scan_projects()`, or via a cached repolist file if
`cache-size > 0`.
### Phase 4: Query String Parsing
```c
http_parse_querystring(ctx.qry.raw, querystring_cb);
```
`querystring_cb()` maps short parameter names to `ctx.qry.*` fields:
| Parameter | Field | Purpose |
|-----------|-------|---------|
| `r` | `qry.repo` | Repository URL |
| `p` | `qry.page` | Page name |
| `url` | `qry.url` | Combined repo/page/path |
| `h` | `qry.head` | Branch/ref |
| `id` | `qry.oid` | Object ID |
| `id2` | `qry.oid2` | Second object ID (for diffs) |
| `ofs` | `qry.ofs` | Pagination offset |
| `q` | `qry.search` | Search query |
| `qt` | `qry.grep` | Search type |
| `path` | `qry.path` | File path |
| `name` | `qry.name` | Snapshot filename |
| `dt` | `qry.difftype` | Diff type (0/1/2) |
| `context` | `qry.context` | Diff context lines |
| `ignorews` | `qry.ignorews` | Ignore whitespace |
| `follow` | `qry.follow` | Follow renames |
| `showmsg` | `qry.showmsg` | Show full messages |
| `s` | `qry.sort` | Sort order |
| `period` | `qry.period` | Stats period |
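
The mapping in the table can be sketched as a callback with a chain of name comparisons. This is a reduced illustration under stated assumptions: `struct query` holds only five of the fields, `parse_query()` is a hypothetical stand-in for `http_parse_querystring()`, and percent-decoding and value copying are left out.

```c
#include <stdlib.h>
#include <string.h>

/* Reduced, hypothetical stand-in for a few of the query fields. */
struct query {
    const char *repo;   /* r   */
    const char *page;   /* p   */
    const char *head;   /* h   */
    const char *search; /* q   */
    int ofs;            /* ofs */
};

struct query qry;

/* Map one short parameter name to its field. */
void query_cb(const char *name, const char *value)
{
    if (!strcmp(name, "r"))
        qry.repo = value;
    else if (!strcmp(name, "p"))
        qry.page = value;
    else if (!strcmp(name, "h"))
        qry.head = value;
    else if (!strcmp(name, "q"))
        qry.search = value;
    else if (!strcmp(name, "ofs"))
        qry.ofs = atoi(value);
    /* Unknown parameters are silently ignored. */
}

/* Split "a=1&b=2" in place and call fn once per name=value pair. */
void parse_query(char *qs, void (*fn)(const char *, const char *))
{
    while (qs && *qs) {
        char *next = strchr(qs, '&');
        char *eq;

        if (next)
            *next++ = '\0';
        eq = strchr(qs, '=');
        if (eq) {
            *eq = '\0';
            fn(qs, eq + 1);
        }
        qs = next;
    }
}
```

The field pointers in this sketch alias the caller's buffer, so the buffer must outlive `qry`.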
The `url=` parameter receives special processing via `cgit_parse_url()` (in
`parsing.c`), which iteratively splits the URL at `/` characters, looking for
the longest prefix that matches a known repository URL.
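The longest-prefix search can be sketched like this. It is a simplified illustration rather than the code in `parsing.c`: the repository table `known_repos[]` and the helpers `is_known_repo()` and `longest_repo_prefix()` are hypothetical, and the sketch returns only the prefix length instead of filling in `ctx`.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical repository URLs standing in for cgit_repolist. */
static const char *known_repos[] = {
    "tools", "tools/cgit.git", "libs/zlib.git"
};

/* Return 1 if the first len bytes of url exactly name a known repo. */
static int is_known_repo(const char *url, size_t len)
{
    size_t i;

    for (i = 0; i < sizeof(known_repos) / sizeof(known_repos[0]); i++)
        if (strlen(known_repos[i]) == len &&
            !strncmp(known_repos[i], url, len))
            return 1;
    return 0;
}

/*
 * Try every prefix of url that ends at a '/' (and the whole url),
 * keeping the longest one that matches a repository.
 * Returns the prefix length, or 0 if nothing matches.
 */
size_t longest_repo_prefix(const char *url)
{
    size_t best = 0;
    const char *c = url;

    for (;;) {
        size_t len;

        c = strchr(c, '/');
        len = c ? (size_t)(c - url) : strlen(url);
        if (len > 0 && is_known_repo(url, len))
            best = len; /* later matches are longer, so they win */
        if (!c)
            break;
        c++;
    }
    return best;
}
```

For `"tools/cgit.git/log/ui-log.c"` the prefix `"tools"` matches first, but the later and longer `"tools/cgit.git"` replaces it; the remainder of the URL would then supply the page and path.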
### Phase 5: Authentication
`authenticate_cookie()` checks three cases:
1. **No auth filter** → set `ctx.env.authenticated = 1` and return.
2. **POST to login page** → call `authenticate_post()`, which reads up to
`MAX_AUTHENTICATION_POST_BYTES` (4096) from stdin, pipes it to the auth
filter with function `"authenticate-post"`, and exits.
3. **Normal request** → invoke auth filter with function
`"authenticate-cookie"`. The filter's exit code becomes
`ctx.env.authenticated`.
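
The three cases can be condensed into a small decision function. This is a sketch only: `enum auth_action` and `choose_auth_action()` are hypothetical names, and the page name `"login"` is an assumption about how the login page is identified.

```c
#include <stddef.h>
#include <string.h>

enum auth_action {
    AUTH_SKIP,   /* no auth filter: treat the request as authenticated */
    AUTH_POST,   /* login form submission: hand the body to the filter */
    AUTH_COOKIE  /* ordinary request: let the filter check the cookie */
};

/* Decide which of the three authentication paths a request takes. */
enum auth_action choose_auth_action(int have_filter, const char *method,
                                    const char *page)
{
    if (!have_filter)
        return AUTH_SKIP;
    if (method && page && !strcmp(method, "POST") && !strcmp(page, "login"))
        return AUTH_POST;
    return AUTH_COOKIE;
}
```

Checking for the missing filter first means installations without an auth filter never touch the cookie or POST handling at all.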
The auth filter receives 12 arguments:
```
function, cookie, method, query_string, http_referer,
path_info, http_host, https, repo, page, fullurl, loginurl
```
### Phase 6: Cache Envelope
If `ctx.cfg.cache_size > 0`, the request is wrapped in `cache_process()`:
```c
cache_process(ctx.cfg.cache_size, ctx.cfg.cache_root,
cache_key, ttl, fill_fn);
```
This constructs a filename from the FNV-1 hash of the cache key, attempts to
open an existing slot, verifies the key matches, checks expiry, and either
serves cached content or locks and fills a new slot. See the Caching System
document for full details.
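For reference, FNV-1 multiplies by the prime before XOR-ing in each byte (FNV-1a does it the other way round). The sketch below uses the standard 32-bit FNV constants; the fixed `uint32_t` width, the helper `cache_slot_path()`, and the "hash modulo slot count" filename scheme are assumptions made for illustration, not a statement of what `cache.c` does.

```c
#include <stdint.h>
#include <stdio.h>

#define FNV_OFFSET 0x811c9dc5u
#define FNV_PRIME  0x01000193u

/* 32-bit FNV-1 hash of a NUL-terminated string. */
uint32_t fnv1_hash(const char *key)
{
    uint32_t h = FNV_OFFSET;
    const unsigned char *p = (const unsigned char *)key;

    while (*p) {
        h *= FNV_PRIME; /* FNV-1: multiply first ... */
        h ^= *p++;      /* ... then XOR in the next byte */
    }
    return h;
}

/*
 * Hypothetical helper: build a slot filename from the hash reduced
 * modulo the number of slots. size must be greater than 0.
 * Returns the snprintf() result.
 */
int cache_slot_path(char *out, size_t outlen, const char *root,
                    const char *key, int size)
{
    unsigned long slot = fnv1_hash(key) % (uint32_t)size;

    return snprintf(out, outlen, "%s/%08lx", root, slot);
}
```

Because different keys can land in the same slot, verifying the stored key against the requested one (as described above) is what prevents a collision from serving the wrong page.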
### Phase 7: Command Dispatch
```c
cmd = cgit_get_cmd();
```
`cgit_get_cmd()` (in `cmd.c`) performs a linear scan of the static `cmds[]`
table:
```c
static struct cgit_cmd cmds[] = {
def_cmd(HEAD, 1, 0, 1),
def_cmd(atom, 1, 0, 0),
def_cmd(about, 0, 0, 0),
def_cmd(blame, 1, 1, 0),
def_cmd(blob, 1, 0, 0),
def_cmd(cla, 0, 0, 0),
def_cmd(commit, 1, 1, 0),
def_cmd(coc, 0, 0, 0),
def_cmd(diff, 1, 1, 0),
def_cmd(info, 1, 0, 1),
def_cmd(log, 1, 1, 0),
def_cmd(ls_cache, 0, 0, 0),
def_cmd(objects, 1, 0, 1),
def_cmd(patch, 1, 1, 0),
def_cmd(plain, 1, 0, 0),
def_cmd(rawdiff, 1, 1, 0),
def_cmd(refs, 1, 0, 0),
def_cmd(repolist, 0, 0, 0),
def_cmd(snapshot, 1, 0, 0),
def_cmd(stats, 1, 1, 0),
def_cmd(summary, 1, 0, 0),
def_cmd(tag, 1, 0, 0),
def_cmd(tree, 1, 1, 0),
};
```
The `def_cmd` macro expands to `{#name, name##_fn, want_repo, want_vpath, is_clone}`.
When the request names no page, the default is:
- With a repository → `"summary"`
- Without a repository → `"repolist"`
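
The table lookup and the default-page rule can be sketched together. This is a cut-down illustration: the table has three entries, the handlers only record their name in the hypothetical `last_handler` variable, and `get_cmd()` takes its inputs as parameters instead of reading the global `ctx`.

```c
#include <stddef.h>
#include <string.h>

typedef void (*cmd_fn)(void);

struct cgit_cmd {
    const char *name;
    cmd_fn fn;
    int want_repo, want_vpath, is_clone;
};

/* Hypothetical marker so callers can see which handler ran. */
const char *last_handler;

static void log_fn(void)      { last_handler = "log"; }
static void repolist_fn(void) { last_handler = "repolist"; }
static void summary_fn(void)  { last_handler = "summary"; }

/* #name stringifies the command; name##_fn pastes the handler name. */
#define def_cmd(name, want_repo, want_vpath, is_clone) \
    { #name, name##_fn, want_repo, want_vpath, is_clone }

static struct cgit_cmd cmds[] = {
    def_cmd(log, 1, 1, 0),
    def_cmd(repolist, 0, 0, 0),
    def_cmd(summary, 1, 0, 0),
};

/* Linear scan of cmds[]; a NULL page selects the default. */
struct cgit_cmd *get_cmd(const char *page, int have_repo)
{
    size_t i;

    if (!page)
        page = have_repo ? "summary" : "repolist";
    for (i = 0; i < sizeof(cmds) / sizeof(cmds[0]); i++)
        if (!strcmp(page, cmds[i].name))
            return &cmds[i];
    return NULL; /* unknown page */
}
```

With a table this small a linear scan is cheap, and the macro keeps each entry to one line, so adding a command means adding one handler and one row.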
### Phase 8: Repository Preparation
If `cmd->want_repo` is set:
1. `prepare_repo_env()` calls `setenv("GIT_DIR", ctx.repo->path, 1)`,
`setup_git_directory_gently()`, and `load_display_notes()`.
2. `prepare_repo_cmd()` resolves the default branch (via `guess_defbranch()`
which checks `HEAD` → `refs/heads/*`), resolves the requested head to an OID,
sorts submodules, chooses the README file, and sets the page title.
### Phase 9: Page Rendering
The handler function (`cmd->fn()`) is called. Most handlers follow this
pattern:
```c
cgit_print_layout_start(); /* HTTP headers + HTML doctype + header + tabs */
/* ... page-specific content ... */
cgit_print_layout_end(); /* footer + closing tags */
```
`cgit_print_layout_start()` calls:
- `cgit_print_http_headers()` — Content-Type, Last-Modified, Expires, ETag
- `cgit_print_docstart()` — doctype, `<html>` and `<head>` opening tags, CSS/JS includes
- `cgit_print_pageheader()` — header table, navigation tabs, breadcrumbs
## Module Dependency Graph
```
cgit.c ──→ cmd.c ──→ ui-*.c (all modules)
│ │
│ └──→ cache.c
│
├──→ configfile.c
├──→ scan-tree.c ──→ configfile.c
├──→ ui-shared.c ──→ html.c
├──→ ui-stats.c
├──→ ui-blob.c
├──→ ui-summary.c
└──→ filter.c
ui-commit.c ──→ ui-diff.c ──→ ui-ssdiff.c
ui-summary.c ──→ ui-log.c, ui-refs.c, ui-blob.c, ui-plain.c
ui-log.c ──→ ui-shared.c
All ui-*.c ──→ html.c, ui-shared.c
```
## The `struct cgit_cmd` Pattern
Each command in `cmd.c` is defined as a static function that wraps the
corresponding UI module:
```c
static void log_fn(void)
{
cgit_print_log(ctx.qry.oid, ctx.qry.ofs, ctx.cfg.max_commit_count,
ctx.qry.grep, ctx.qry.search, ctx.qry.path, 1,
ctx.repo->enable_commit_graph,
ctx.repo->commit_sort);
}
```
The thin wrapper pattern means all context is accessed via the global `ctx`
struct, and the wrapper simply extracts the relevant fields and passes them to
the module function.
## Repository List Management
The `cgit_repolist` global is a dynamically-growing array:
```c
struct cgit_repolist {
int length; /* Allocated capacity */
int count; /* Number of repos */
struct cgit_repo *repos; /* Array */
};
```
`cgit_add_repo()` doubles the array capacity when needed (starting from 8).
Each new repo inherits defaults from `ctx.cfg.*` (snapshots, feature flags,
filters, etc.).
`cgit_get_repoinfo()` performs a linear scan (O(n)) to find a repo by URL.
Ignored repos (`repo->ignore == 1`) are skipped.
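The growth and lookup behavior can be sketched as follows. The structs here are reduced to the fields the sketch needs, and the names `struct repo`, `add_repo()`, and `get_repoinfo()` are simplified stand-ins; inheriting defaults from `ctx.cfg.*` is left out.

```c
#include <stdlib.h>
#include <string.h>

struct repo {
    const char *url;
    int ignore;
};

struct repolist {
    int length;         /* allocated capacity */
    int count;          /* number of repos in use */
    struct repo *repos; /* array */
};

struct repolist repolist;

/* Append a repo, growing the array 8 -> 16 -> 32 ... as needed. */
struct repo *add_repo(const char *url)
{
    struct repo *r;

    if (repolist.count + 1 > repolist.length) {
        int new_len = repolist.length ? repolist.length * 2 : 8;
        struct repo *grown = realloc(repolist.repos,
                                     new_len * sizeof(*grown));

        if (!grown)
            return NULL;
        repolist.repos = grown;
        repolist.length = new_len;
    }
    r = &repolist.repos[repolist.count++];
    r->url = url;
    r->ignore = 0;
    return r;
}

/* O(n) lookup by URL; ignored repos are skipped. */
struct repo *get_repoinfo(const char *url)
{
    int i;

    for (i = 0; i < repolist.count; i++) {
        struct repo *r = &repolist.repos[i];

        if (r->ignore)
            continue;
        if (!strcmp(r->url, url))
            return r;
    }
    return NULL;
}
```

One consequence of the doubling strategy is that a pointer returned by an earlier `add_repo()` call may dangle after a later call reallocates the array.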
## Build System
The build works in two stages:
1. **Git build** — `make` in the top-level `cgit/` directory delegates to
`make -C git -f ../cgit.mk` which includes Git's own `Makefile`.
2. **cgit link** — `cgit.mk` lists all cgit object files (`CGIT_OBJ_NAMES`),
compiles them with `CGIT_CFLAGS` (which embeds `CGIT_CONFIG`,
`CGIT_SCRIPT_NAME`, `CGIT_CACHE_ROOT` as string literals), and links them
against Git's `libgit.a`.
Lua support is auto-detected via `pkg-config` (checking `luajit`, `lua`,
`lua5.2`, `lua5.1` in order). Define `NO_LUA=1` to build without Lua.
Linux systems get `HAVE_LINUX_SENDFILE` which enables the `sendfile()` syscall
in the cache layer.
## Thread Safety
cgit runs as a **single-process CGI** — one process per HTTP request. There is
no multi-threading. All global state (`ctx`, `cgit_repolist`, the static
`diffbuf` in `shared.c`, the static format buffers in `html.c`) is safe because
each process is fully isolated.
The `fmt()` function in `html.c` uses a ring buffer of 8 static buffers
(`static char buf[8][1024]`) to allow up to 8 nested `fmt()` calls in a single
expression. The `bufidx` rotates via `bufidx = (bufidx + 1) & 7`.
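A minimal sketch of the rotating-buffer idea follows. The name `fmt_sketch()` is hypothetical, and the overflow handling is simplified: this version lets `vsnprintf()` truncate silently, which may differ from what `html.c` does.

```c
#include <stdarg.h>
#include <stdio.h>

/*
 * Format into one of 8 static buffers, rotating on every call.
 * The returned pointer stays valid until 8 further calls have been
 * made, at which point its buffer is reused.
 */
char *fmt_sketch(const char *format, ...)
{
    static char buf[8][1024];
    static int bufidx;
    va_list args;

    bufidx = (bufidx + 1) & 7; /* wraps 7 -> 0 without a branch */
    va_start(args, format);
    vsnprintf(buf[bufidx], sizeof(buf[bufidx]), format, args);
    va_end(args);
    return buf[bufidx];
}
```

The mask works because 8 is a power of two. The trade-off is that callers never free anything, but a result held across more than seven later calls is overwritten.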
## Error Handling
The codebase uses three assertion-style helpers from `shared.c`:
```c
int chk_zero(int result, char *msg); /* die if result != 0 */
int chk_positive(int result, char *msg); /* die if result <= 0 */
int chk_non_negative(int result, char *msg); /* die if result < 0 */
```
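A minimal sketch of helpers with these signatures is shown here. The `die_msg()` function is a hypothetical stand-in that prints to stderr and exits; the real helpers may report errors differently.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for a fatal-error routine. */
static void die_msg(const char *msg)
{
    fprintf(stderr, "fatal: %s\n", msg);
    exit(1);
}

/* Each helper passes the result through when the check succeeds. */
int chk_zero(int result, char *msg)
{
    if (result != 0)
        die_msg(msg);
    return result;
}

int chk_positive(int result, char *msg)
{
    if (result <= 0)
        die_msg(msg);
    return result;
}

int chk_non_negative(int result, char *msg)
{
    if (result < 0)
        die_msg(msg);
    return result;
}
```

Returning the checked value lets a call be wrapped in place, for example `fd = chk_non_negative(open(path, O_RDONLY), "open failed");`, where `path` is whatever file the caller needs.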
For user-facing errors, `cgit_print_error_page()` sets HTTP status, prints
headers, renders the page skeleton, and displays the error message.
## Type System
cgit defines the following enums in `cgit.h`:
```c
typedef enum {
DIFF_UNIFIED, DIFF_SSDIFF, DIFF_STATONLY
} diff_type;
typedef enum {
ABOUT, COMMIT, SOURCE, EMAIL, AUTH, OWNER
} filter_type;
```
And three function pointer typedefs:
```c
typedef void (*configfn)(const char *name, const char *value);
typedef void (*filepair_fn)(struct diff_filepair *pair);
typedef void (*linediff_fn)(char *line, int len);
```