# cat — Concatenate and Display Files

## Overview

`cat` reads files sequentially and writes their contents to standard output.
It supports line numbering, non-printable character visualization, blank
line squeezing, and efficient in-kernel copying.

**Source**: `cat/cat.c` (single file)
**Origin**: BSD 4.4, University of California, Berkeley
**License**: BSD-3-Clause

## Synopsis

```
cat [-belnstuv] [file ...]
```

## Options

| Flag | Long Form | Description |
|------|-----------|-------------|
| `-b` | — | Number non-blank output lines (starting at 1) |
| `-e` | — | Display `$` at end of each line (implies `-v`) |
| `-l` | — | Set exclusive advisory lock on stdout via `flock(2)` |
| `-n` | — | Number all output lines |
| `-s` | — | Squeeze multiple adjacent blank lines into one |
| `-t` | — | Display TAB as `^I` (implies `-v`) |
| `-u` | — | Disable output buffering (write immediately) |
| `-v` | — | Visualize non-printing characters using `^X` and `M-X` notation |
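
How `-n`, `-b`, and `-s` interact is easier to see in code. The following is a minimal sketch, not the code from `cat.c`: the helper `number_and_squeeze()` and its buffer-based interface are hypothetical, and the `%6d` plus TAB number format is an assumption modelled on traditional BSD `cat` output.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not part of cat.c): apply -n/-b/-s style rules to a
 * NUL-terminated string and store the result in `out` (always NUL-terminated
 * when outsz > 0). Returns the number of bytes stored, excluding the NUL.
 */
static size_t
number_and_squeeze(const char *in, char *out, size_t outsz,
    int nflag, int bflag, int sflag)
{
    size_t used = 0;
    int line = 0, gobble = 0;
    char prev = '\n';           /* pretend a newline precedes the input */

    if (outsz == 0)
        return 0;
    for (; *in != '\0'; prev = *in++) {
        char ch = *in;

        if (prev == '\n') {     /* we are at the start of a line */
            if (sflag) {
                if (ch == '\n') {
                    if (gobble)
                        continue;   /* drop repeated blank line */
                    gobble = 1;
                } else
                    gobble = 0;
            }
            /* -n numbers every line; -b skips blank ones. */
            if (nflag && (!bflag || ch != '\n')) {
                int n = snprintf(out + used, outsz - used, "%6d\t", ++line);
                if (n < 0 || (size_t)n >= outsz - used)
                    break;          /* output buffer too small */
                used += (size_t)n;
            }
        }
        if (used + 1 >= outsz)
            break;                  /* keep room for the terminator */
        out[used++] = ch;
    }
    out[used] = '\0';
    return used;
}
```

With all three flags set, the input `"a\n\n\n\nb\n"` keeps a single unnumbered blank line between the two numbered lines.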

## Source Analysis

### Functions

| Function | Purpose |
|----------|---------|
| `main()` | Option parsing via `getopt(3)`, dispatch to `scanfiles()` |
| `usage()` | Print usage message and exit |
| `scanfiles()` | Iterate over file arguments, handle `-` as stdin |
| `cook_cat()` | Process input with formatting (numbering, visualization, squeezing) |
| `raw_cat()` | Fast buffer-based copy without formatting |
| `in_kernel_copy()` | Zero-copy via `copy_file_range(2)` syscall |
| `init_casper()` | BSD Capsicum sandbox init (disabled on Linux) |
| `init_casper_net()` | BSD Casper network service (disabled on Linux) |
| `udom_open()` | Unix domain socket support (disabled on Linux) |

### Option Processing

```c
while ((ch = getopt(argc, argv, "belnstuv")) != -1)
    switch (ch) {
    case 'b': bflag = nflag = 1; break;  /* implies -n */
    case 'e': eflag = vflag = 1; break;  /* implies -v */
    case 'l': lflag = 1; break;
    case 'n': nflag = 1; break;
    case 's': sflag = 1; break;
    case 't': tflag = vflag = 1; break;  /* implies -v */
    case 'u': setbuf(stdout, NULL); break;
    case 'v': vflag = 1; break;
    default:  usage();
    }
```

### I/O Strategy: Three Modes

`cat` selects among three output strategies based on which flags are active:

1. **`in_kernel_copy()`** — When no formatting flags are set and the output
   supports `copy_file_range(2)`, data moves directly between file
   descriptors inside the kernel, never entering user space.

2. **`raw_cat()`** — When no formatting is needed but `copy_file_range` is
   unavailable (e.g., stdin is a pipe). Uses an adaptive read buffer.

3. **`cook_cat()`** — When any formatting flag (`-b`, `-e`, `-n`, `-s`,
   `-t`, `-v`) is active. Processes each character individually.

### Adaptive Buffer Sizing

`raw_cat()` dynamically sizes its read buffer based on available physical
memory:

```c
#define PHYSPAGES_THRESHOLD (32*1024)
#define BUFSIZE_MAX         (2*1024*1024)   /* 2 MB */
#define BUFSIZE_SMALL       (128*1024)      /* 128 KB */

if (sysconf(_SC_PHYS_PAGES) > PHYSPAGES_THRESHOLD)
    bsize = MIN(BUFSIZE_MAX, MAXPHYS * 8);
else
    bsize = BUFSIZE_SMALL;
```

On systems with more than 32 Ki physical pages (128 MiB of RAM, assuming
4 KiB pages), `cat` uses a buffer of `MAXPHYS * 8` bytes, capped at 2 MB.
On constrained systems, it falls back to 128 KB.
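
The arithmetic behind these figures can be checked with a short sketch. The helpers `threshold_bytes()` and `pick_bufsize()` are hypothetical, and `MAXPHYS` is passed in as a parameter because its value is platform-dependent and not stated here.

```c
#include <stddef.h>

#define PHYSPAGES_THRESHOLD (32 * 1024)
#define BUFSIZE_MAX         (2 * 1024 * 1024)   /* 2 MB */
#define BUFSIZE_SMALL       (128 * 1024)        /* 128 KB */

/* RAM size at which the larger buffer kicks in, for a given page size. */
static unsigned long long
threshold_bytes(long page_size)
{
    return (unsigned long long)PHYSPAGES_THRESHOLD *
        (unsigned long long)page_size;
}

/* Mirrors the selection shown above; `maxphys` stands in for MAXPHYS. */
static size_t
pick_bufsize(long phys_pages, size_t maxphys)
{
    if (phys_pages > PHYSPAGES_THRESHOLD) {
        size_t want = maxphys * 8;

        return want < (size_t)BUFSIZE_MAX ? want : (size_t)BUFSIZE_MAX;
    }
    return (size_t)BUFSIZE_SMALL;
}
```

Note that the 2 MB cap is only reached when `MAXPHYS` is at least 256 KB; with a 128 KB `MAXPHYS`, for example, the large-memory buffer would be 1 MB.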

### In-Kernel Copy

When possible, `cat` uses the Linux `copy_file_range(2)` syscall for
zero-copy I/O:

```c
static int
in_kernel_copy(int from_fd, int to_fd)
{
    ssize_t ret;

    do {
        ret = copy_file_range(from_fd, NULL, to_fd, NULL, SSIZE_MAX, 0);
    } while (ret > 0);

    return (ret == 0) ? 0 : -1;
}
```

This avoids copying every block across the kernel/user boundary twice
(kernel→user for `read`, user→kernel for `write`), which can be
significantly faster for large files.
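
For contrast, a conventional user-space copy loop, the pattern that `copy_file_range(2)` makes unnecessary, might look like the sketch below. The function `copy_loop()` is hypothetical and is not the `raw_cat()` implementation; it only illustrates the read-then-write round trip, including handling of short writes and `EINTR`.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical sketch: copy everything from from_fd to to_fd through a
 * caller-supplied buffer. Returns 0 on EOF, -1 on error (errno is set).
 */
static int
copy_loop(int from_fd, int to_fd, char *buf, size_t bsize)
{
    ssize_t nr;

    while ((nr = read(from_fd, buf, bsize)) != 0) {
        if (nr < 0) {
            if (errno == EINTR)
                continue;           /* interrupted, retry the read */
            return -1;
        }
        /* write(2) may accept fewer bytes than requested. */
        for (ssize_t off = 0; off < nr; ) {
            ssize_t nw = write(to_fd, buf + off, (size_t)(nr - off));

            if (nw < 0) {
                if (errno == EINTR)
                    continue;       /* interrupted, retry the write */
                return -1;
            }
            off += nw;
        }
    }
    return 0;
}
```

Every block passes through `buf` in user space here, which is exactly the extra copying the in-kernel path skips.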

### Character Visualization

When `-v` is active, `cook_cat()` renders non-printable characters:

| Character Range | Rendering | Example |
|----------------|-----------|---------|
| `0x00–0x1F`    | `^@` to `^_` | `^C` for ETX |
| `0x7F`         | `^?`      | DEL character |
| `0x80–0x9F`    | `M-^@` to `M-^_` | Meta-control |
| `0xA0–0xFE`    | `M- ` to `M-~` | Meta-printable |
| `0xFF`         | `M-^?`    | Meta-DEL |
| TAB (`0x09`)   | `^I` (with `-t`) or literal | |
| Newline        | `$\n` (with `-e`) or `\n` | |
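
The byte-to-notation mapping in the table can be expressed compactly. The sketch below assumes the C locale and plain bytes; `visualize_byte()` is a hypothetical helper, not a function from `cat.c`. It treats TAB like any other control character (the `-t` behavior) and leaves newline handling to the caller.

```c
#include <stddef.h>

/*
 * Hypothetical helper: render one byte in ^X / M-X notation into `out`,
 * which must hold at least 5 chars ("M-^?" plus NUL). Returns the length.
 */
static size_t
visualize_byte(unsigned char ch, char out[5])
{
    size_t n = 0;

    if (ch >= 0x80) {           /* high bit set: meta prefix */
        out[n++] = 'M';
        out[n++] = '-';
        ch &= 0x7f;             /* continue with the low 7 bits */
    }
    if (ch < 0x20 || ch == 0x7f) {
        out[n++] = '^';
        /* DEL maps to '?'; other controls map to '@'..'_' */
        out[n++] = (ch == 0x7f) ? '?' : (char)(ch | 0x40);
    } else {
        out[n++] = (char)ch;    /* printable ASCII passes through */
    }
    out[n] = '\0';
    return n;
}
```

For example, `0x03` renders as `^C`, `0x80` as `M-^@`, and `0xFF` as `M-^?`, matching the rows above.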

### Locale Support

`cat` calls `setlocale(LC_CTYPE, "")` for wide character handling. In
multibyte locales, the `-v` flag considers locale-specific printability
via `iswprint(3)`. In the C locale, only ASCII printable characters
pass through unmodified.

### Lock Mode

The `-l` flag acquires an exclusive advisory lock on stdout before
writing:

```c
if (lflag)
    flock(STDOUT_FILENO, LOCK_EX);
```

Because the lock is advisory, this prevents interleaved output only among
cooperating processes that also take the lock, such as multiple `cat -l`
invocations writing to the same file simultaneously. The lock is held for
the entire duration of the program.
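
The excerpt above ignores the return value of `flock(2)`. A more defensive variant, shown here as a sketch rather than as what `cat.c` does, retries on `EINTR` and reports other failures to the caller. The name `lock_fd_exclusive()` is hypothetical.

```c
#include <errno.h>
#include <sys/file.h>

/*
 * Hypothetical helper: block until an exclusive advisory lock is held on
 * fd. Returns 0 on success, -1 on failure with errno set by flock(2).
 */
static int
lock_fd_exclusive(int fd)
{
    while (flock(fd, LOCK_EX) != 0) {
        if (errno != EINTR)
            return -1;          /* e.g. EBADF for an invalid descriptor */
        /* interrupted by a signal before the lock was granted: retry */
    }
    return 0;
}
```

A caller would invoke it as `lock_fd_exclusive(STDOUT_FILENO)` before the first write and treat a `-1` return as a fatal error.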

## System Calls Used

| Syscall | Purpose |
|---------|---------|
| `open(2)` | Open input files |
| `read(2)` | Read file data into buffer |
| `write(2)` | Write processed data to stdout |
| `copy_file_range(2)` | Zero-copy kernel-to-kernel transfer |
| `flock(2)` | Advisory locking (with `-l`) |
| `fstat(2)` | Get file type for I/O strategy selection |
| `sysconf(3)` | Query physical page count for buffer sizing (a library function rather than a system call) |

## BSD Features Disabled on Linux

Several BSD-specific features are compiled out on Linux:

- **Capsicum sandbox** (`cap_enter`, `cap_rights_limit`): The
  `init_casper()` function is a no-op stub on Linux.
- **Unix domain socket reading** (`udom_open()`): BSD `cat` can read
  from Unix sockets via `connect(2)`. Disabled on Linux.
- **`O_RESOLVE_BENEATH`**: BSD sandbox path resolution flag. Defined to 0
  on Linux.

## Examples

```sh
# Concatenate files
cat file1.txt file2.txt > combined.txt

# Number non-blank lines
cat -b source.c

# Show invisible characters
cat -vet binary_file

# Squeeze blank lines and number
cat -sn logfile.txt

# Hold an advisory lock on stdout while appending
cat -l data.csv >> shared_output.csv
```

## Exit Codes

| Code | Meaning |
|------|---------|
| 0    | All files read successfully |
| 1    | Error opening or reading a file |

## Edge Cases

- Reading from stdin: When no files are specified or `-` is given, cat
  reads from standard input.
- Empty files: Produce no output but are not errors.
- Binary files: Processed byte-by-byte; `-v` makes them viewable.
- Named pipes and devices: `raw_cat()` handles them with buffered reads.
  `copy_file_range` is not attempted on non-regular files.