# Corebinutils — Architecture
## Repository Layout
The corebinutils tree follows a straightforward directory-per-utility layout
with a top-level orchestrator build system:
```
corebinutils/
├── configure               # POSIX sh configure script
├── README.md               # Top-level build instructions
├── .gitattributes
├── .gitignore
│
├── config.mk               # [generated] feature detection results
├── GNUmakefile             # [generated] top-level build orchestrator
│
├── build/                  # [generated] intermediate object files
│   ├── configure/          # Configure test artifacts and logs
│   ├── cat/                # Per-utility build intermediates
│   ├── chmod/
│   ├── ...
│   └── sh/
│
├── out/                    # [generated] final binaries
│   └── bin/                # Staged executables (after `make stage`)
│
├── contrib/                # Shared library sources
│   ├── libc-vis/           # vis(3)/unvis(3) implementation
│   ├── libedit/            # editline(3) library
│   └── printf/             # Shared printf format helpers
│
├── cat/                    # Utility: cat
│   ├── cat.c               # Main source
│   ├── cat.1               # Manual page (groff)
│   ├── GNUmakefile         # Per-utility build rules
│   └── README.md           # Port notes and differences
│
├── chmod/                  # Utility: chmod
│   ├── chmod.c             # Main implementation
│   ├── mode.c              # Mode parsing library (shared with mkdir)
│   ├── mode.h              # Mode parsing header
│   ├── GNUmakefile
│   └── chmod.1
│
├── dd/                     # Utility: dd (multi-file)
│   ├── dd.c                # Main control flow
│   ├── dd.h                # Shared types (IO, STAT, flags)
│   ├── extern.h            # Function declarations
│   ├── args.c              # JCL argument parser
│   ├── conv.c              # Conversion functions (block/unblock/def)
│   ├── conv_tab.c          # ASCII/EBCDIC conversion tables
│   ├── gen.c               # Signal handling helpers
│   ├── misc.c              # Summary, progress, timing
│   ├── position.c          # Input/output seek positioning
│   └── GNUmakefile
│
├── ed/                     # Utility: ed (multi-file)
│   ├── main.c              # Command dispatch and main loop
│   ├── ed.h                # Types (line_t, undo_t, constants)
│   ├── compat.c / compat.h # Portability shims
│   ├── buf.c               # Buffer management (scratch file)
│   ├── glbl.c              # Global command (g/re/cmd)
│   ├── io.c                # File I/O (read_file, write_file)
│   ├── re.c                # Regular expression handling
│   ├── sub.c               # Substitution command
│   └── undo.c              # Undo stack management
│
├── ls/                     # Utility: ls (multi-file)
│   ├── ls.c                # Main logic, option parsing, directory traversal
│   ├── ls.h                # Types (entry, context, enums)
│   ├── extern.h            # Cross-module declarations
│   ├── print.c             # Output formatting (columns, long, stream)
│   ├── cmp.c               # Sort comparison functions
│   └── util.c              # Helper functions
│
├── ps/                     # Utility: ps (multi-file)
│   ├── ps.c                # Main logic, /proc scanning
│   ├── ps.h                # Types (kinfo_proc, KINFO, VAR)
│   ├── extern.h            # Cross-module declarations
│   ├── fmt.c               # Format string parsing
│   ├── keyword.c           # Output keyword definitions
│   ├── print.c             # Field value formatting
│   └── nlist.c             # Name list handling
│
└── sh/                     # Utility: POSIX shell
    ├── main.c              # Shell entry point
    ├── parser.c / parser.h # Command parser
    ├── eval.c              # Command evaluator
    ├── exec.c              # Command execution
    ├── jobs.c              # Job control
    ├── var.c               # Variable management
    ├── trap.c              # Signal/trap handling
    ├── expand.c            # Parameter expansion
    ├── redir.c             # I/O redirection
    └── ...                 # (60+ additional files)
```
## Build System Architecture
### Two-Level Build Organization
The build system has two distinct levels:
1. **Top-level orchestrator** — Generated `GNUmakefile` and `config.mk` that
coordinate all subdirectories.
2. **Per-utility `GNUmakefile`** — Each utility directory has its own build
rules. These are the source of truth and are never overwritten by
`configure`.
The top-level `GNUmakefile` invokes subdirectory builds via recursive make:
```makefile
build-%: prepare-%
	+env CPPFLAGS="$(CPPFLAGS)" CFLAGS="$(CFLAGS)" LDFLAGS="$(LDFLAGS)" \
		$(MAKE) -C "$*" -f GNUmakefile $(SUBMAKE_OVERRIDES) all
```
### Shared Output Directories
All utilities share centralized output directories to simplify packaging:
```
build/ # Object files, organized per-utility: build/cat/, build/chmod/, ...
out/ # Final linked binaries
out/bin/ # Staged binaries (after `make stage`)
```
Subdirectories get symbolic links (`build -> ../build/<util>`,
`out -> ../out`) created by the `prepare-%` target:
```makefile
prepare-%:
	@mkdir -p "$(MONO_BUILDDIR)/$*" "$(MONO_OUTDIR)"
	@ln -sfn "../build/$*" "$*/build"
	@ln -sfn "../out" "$*/out"
```
### Variable Propagation
The top-level `GNUmakefile` passes the detected toolchain, library, and
install-path variables to subdirectory builds as command-line assignments
collected in `SUBMAKE_OVERRIDES`:
```makefile
SUBMAKE_OVERRIDES = \
    CC="$(CC)" \
    AR="$(AR)" \
    AWK="$(AWK)" \
    RANLIB="$(RANLIB)" \
    NM="$(NM)" \
    SH="$(SH)" \
    CRYPTO_LIBS="$(CRYPTO_LIBS)" \
    EDITLINE_CPPFLAGS="$(EDITLINE_CPPFLAGS)" \
    EDITLINE_LIBS="$(EDITLINE_LIBS)" \
    PREFIX="$(PREFIX)" \
    BINDIR="$(BINDIR)" \
    DESTDIR="$(DESTDIR)" \
    CROSS_COMPILING="$(CROSS_COMPILING)" \
    EXEEXT="$(EXEEXT)"
```
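Combined with the `CPPFLAGS`, `CFLAGS`, and `LDFLAGS` exported through `env`
in the `build-%` rule, this ensures every utility builds with the same
compiler, flags, and library configuration. Because the flags arrive through
the environment rather than the command line, a per-utility `?=` default does
not replace them, while a per-utility `+=` line appends to them.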
### Generated vs. Maintained Files
| File | Generated? | Purpose |
|------------------|------------|--------------------------------------|
| `configure` | No | POSIX sh configure script |
| `config.mk` | Yes | Feature detection macros |
| `GNUmakefile` | Yes | Top-level orchestrator |
| `*/GNUmakefile` | No | Per-utility build rules |
| `build/` | Yes | Object file directory tree |
| `out/` | Yes | Binary output directory |
## Configure Script Architecture
### Script Structure
The `configure` script is a single POSIX shell file (no autoconf) organized
into these phases:
```
1. Initialization — Set defaults, parse CLI arguments
2. Compiler Detection — Find musl-first C compiler
3. Tool Detection — Find make, ar, ranlib, nm, awk, sh, pkg-config
4. Libc Identification — Determine musl vs glibc via binary inspection
5. Header Probing — Check for ~40 system headers
6. Function Probing — Check for ~20 C library functions
7. Library Probing     — Check for optional libraries (crypt, dl, pthread, rt, util, attr, selinux)
8. File Generation — Write config.mk and GNUmakefile
```
### Compiler Probing
The compiler detection uses three progressive tests:
```sh
# Can it compile a simple program?
can_compile_with() { ... }
# Can it compile AND run? (native builds only)
can_run_with() { ... }
# Does it support C11 stdatomic.h?
can_compile_stdatomic_with() { ... }
```
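All three must pass for a compiler to be accepted, except that the run test
is skipped when cross-compiling (`--host` differs from `--build`), since the
resulting binaries cannot execute on the build machine.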
### Feature Detection Pattern
Headers and functions are probed with a consistent pattern that records
results as Make variables and C preprocessor defines:
```sh
check_header() {
    hdr=$1
    macro="HAVE_$(to_macro "$hdr")"   # e.g., HAVE_SYS_ACL_H
    if try_cc "#include <$hdr>
int main(void) { return 0; }"; then
        record_cpp_define "$macro" 1
    else
        record_cpp_define "$macro" 0
    fi
}

check_func() {
    func=$1
    includes=$2
    macro="HAVE_$(to_macro "$func")"  # e.g., HAVE_COPY_FILE_RANGE
    # <stdint.h> supplies uintptr_t even when $includes does not
    if try_cc "#include <stdint.h>
$includes
int main(void) { void *p = (void *)(uintptr_t)&$func; return p == 0; }"; then
        record_cpp_define "$macro" 1
    else
        record_cpp_define "$macro" 0
    fi
}
```
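Both probes derive their macro name the same way: uppercase the probed name and turn every character that is not a letter or digit into `_`, which is how `sys/acl.h` becomes `HAVE_SYS_ACL_H`. The shell body of `to_macro` is not shown above, so the C function below is only a hypothetical restatement of that mapping, written to match the two examples in the comments. It reports failure rather than truncating when the output buffer is too small.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical C rendering of configure's to_macro helper:
 * "sys/acl.h" -> "SYS_ACL_H", "copy_file_range" -> "COPY_FILE_RANGE".
 * Returns 0 on success, or -1 if the result (plus NUL) exceeds outsz.
 */
static int to_macro(const char *name, char *out, size_t outsz)
{
    size_t i;

    for (i = 0; name[i] != '\0'; i++) {
        unsigned char c = (unsigned char)name[i];

        if (i + 1 >= outsz)
            return -1;          /* no room for this char plus the NUL */
        out[i] = isalnum(c) ? (char)toupper(c) : '_';
    }
    if (outsz == 0)
        return -1;
    out[i] = '\0';
    return 0;
}
```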
### Headers Probed
The configure script checks for the following headers:
```
stdlib.h stdio.h stdint.h inttypes.h stdbool.h stddef.h
string.h strings.h unistd.h errno.h fcntl.h signal.h
sys/types.h sys/stat.h sys/time.h sys/resource.h sys/wait.h
sys/select.h sys/ioctl.h sys/param.h sys/socket.h netdb.h
poll.h sys/poll.h termios.h stropts.h pthread.h
sys/event.h sys/timerfd.h sys/acl.h attr/xattr.h linux/xattr.h
dlfcn.h langinfo.h locale.h wchar.h wctype.h
```
### Functions Probed
```
getcwd realpath fchdir fstatat openat copy_file_range
memmove strlcpy strlcat explicit_bzero getline getentropy
posix_spawn clock_gettime poll kqueue timerfd_create
pipe2 closefrom getrandom
```
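On the C side, each probe result surfaces as a `HAVE_*` define. A common way to use one is to compile a fallback only when the C library lacks the function. The sketch below assumes `HAVE_STRLCPY` arrives as `0` or `1` (as `record_cpp_define` suggests) and defaults it to `0` when unset; `compat_strlcpy` is a hypothetical name, and the tree's actual fallbacks, if any, may be organized differently.

```c
#include <string.h>

#ifndef HAVE_STRLCPY
#define HAVE_STRLCPY 0          /* assumption: treat as absent if unset */
#endif

#if !HAVE_STRLCPY
/*
 * Hypothetical fallback with BSD strlcpy(3) semantics: copy at most
 * size - 1 bytes, always NUL-terminate when size > 0, and return
 * strlen(src) so callers can detect truncation (result >= size).
 */
static size_t compat_strlcpy(char *dst, const char *src, size_t size)
{
    size_t srclen = strlen(src);

    if (size > 0) {
        size_t n = srclen < size - 1 ? srclen : size - 1;

        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return srclen;
}
#define strlcpy compat_strlcpy
#endif
```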
### Libraries Probed
| Library | Symbol | Usage |
|----------|---------------------|------------------------------------|
| crypt | `crypt()` | Password hashing (`ed -x` legacy) |
| dl | `dlopen()` | Dynamic loading |
| pthread | `pthread_create()` | Threading support |
| rt | `clock_gettime()` | High-resolution timing |
| util | `openpty()` | Pseudo-terminal support |
| attr | `setxattr()` | Extended attributes (`mv`, `cp`) |
| selinux | `is_selinux_enabled()` | SELinux label support |
## Code Organization Patterns
### Single-File Utility Pattern
Most simple utilities follow this structure:
```c
/* SPDX license header */
#include <stdbool.h>
#include <unistd.h>
/* ... other system headers the utility needs ... */

struct options {
    bool force;
    /* ... */
};

static const char *progname;

static const char *program_name(const char *);
static int process(const char *, const struct options *);
static void usage(void) __attribute__((__noreturn__));
static void error_errno(const char *, ...);
static void error_msg(const char *, ...);

int main(int argc, char *argv[])
{
    struct options opt = { 0 };     /* all options off by default */
    int ch, exitval = 0;

    progname = program_name(argv[0]);
    while ((ch = getopt(argc, argv, "f")) != -1) {
        switch (ch) {
        case 'f':
            opt.force = true;
            break;
        /* ... */
        default:
            usage();
        }
    }
    argc -= optind;
    argv += optind;

    /* Perform the main operation on each operand; remember any failure */
    for (int i = 0; i < argc; i++) {
        if (process(argv[i], &opt) != 0)
            exitval = 1;
    }
    return exitval;
}
```
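The skeleton calls `program_name()` without showing it. One plausible definition, given here as an assumption rather than the tree's code, returns the part of `argv[0]` after the last `/`, so diagnostics read `cat: ...` rather than `/usr/bin/cat: ...`. The `"utility"` fallback for an empty `argv[0]` is likewise a placeholder.

```c
#include <string.h>

/*
 * Hypothetical program_name(): return the basename of argv[0] without
 * modifying it, falling back to a fixed name if argv[0] is missing or empty.
 */
static const char *program_name(const char *argv0)
{
    const char *slash;

    if (argv0 == NULL || *argv0 == '\0')
        return "utility";       /* placeholder fallback name */
    slash = strrchr(argv0, '/');
    return (slash != NULL && slash[1] != '\0') ? slash + 1 : argv0;
}
```

Unlike `basename(3)`, which POSIX allows to modify its argument, this version only returns a pointer into the original string.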
### Multi-File Utility Pattern
Complex utilities split across files with a shared header:
```
utility/
├── utility.c # main(), option parsing, top-level dispatch
├── utility.h # Shared types, constants, macros
├── extern.h # Function declarations for cross-module calls
├── sub1.c # Functional subsystem (e.g., args.c, conv.c)
├── sub2.c # Another subsystem (e.g., print.c, fmt.c)
└── GNUmakefile # Build rules listing all .c files
```
### Header Guard Convention
Headers use the BSD `_FILENAME_H_` pattern:
```c
#ifndef _PS_H_
#define _PS_H_
/* ... */
#endif
```
### Portability Macros
Common compatibility macros appear across multiple utilities:
```c
#ifndef __unused
#define __unused __attribute__((__unused__))
#endif
#ifndef __dead2
#define __dead2 __attribute__((__noreturn__))
#endif
#ifndef nitems
#define nitems(array) (sizeof(array) / sizeof((array)[0]))
#endif
#ifndef MIN
#define MIN(a, b) ((a) < (b) ? (a) : (b))
#endif
#ifndef MAX
#define MAX(a, b) ((a) > (b) ? (a) : (b))
#endif
```
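Two caveats apply when using these macros. `nitems()` gives the element count only for a true array in scope; applied to a pointer (for example an array parameter) it silently computes `sizeof(pointer) / sizeof(element)`. `MIN` and `MAX` evaluate one of their arguments twice, so arguments with side effects such as `i++` should be avoided. The `widths` table and `width_at()` helper below are hypothetical and show the macros clamping an index into range.

```c
#include <stddef.h>

#ifndef nitems
#define nitems(array) (sizeof(array) / sizeof((array)[0]))
#endif
#ifndef MIN
#define MIN(a, b) ((a) < (b) ? (a) : (b))
#endif
#ifndef MAX
#define MAX(a, b) ((a) > (b) ? (a) : (b))
#endif

static const int widths[] = { 4, 8, 16, 32, 64 };

/* Hypothetical helper: clamp idx into [0, nitems(widths) - 1] and look up. */
static int width_at(long idx)
{
    long last = (long)nitems(widths) - 1;

    return widths[MIN(MAX(idx, 0L), last)];
}
```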
### POSIX Feature Test Macros
Many utilities define feature test macros at the top of their main source
file:
```c
#define _POSIX_C_SOURCE 200809L
```
Or rely on the configure-injected flags:
```
-D_DEFAULT_SOURCE -D_XOPEN_SOURCE=700
```
## Shared Code Reuse
### `mode.c` / `mode.h`
The mode parsing library is shared between `chmod` and `mkdir`. It provides:
- `mode_compile()` — Parse a mode string (numeric or symbolic) into a
compiled command array (`bitcmd_t`)
- `mode_apply()` — Apply a compiled mode to an existing `mode_t`
- `mode_free()` — Release compiled mode memory
- `strmode()` — Convert `mode_t` to display string like `"drwxr-xr-x "`
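The `strmode()` string has a fixed layout: one file-type character, three `rwx` triplets in which the setuid, setgid, and sticky bits take over the execute slot (`s`/`S`, `t`/`T`, lowercase when execute is also set), and a trailing eleventh character, shown as a space above. `format_mode()` below is a hypothetical re-implementation for illustration, not the code in `mode.c`; it assumes that trailing character is always a space.

```c
#define _XOPEN_SOURCE 700       /* for S_IFSOCK, S_ISVTX and friends */
#include <sys/stat.h>

/*
 * Hypothetical strmode()-style formatter. buf must hold 12 bytes:
 * 10 mode characters, one trailing space, and the terminating NUL.
 */
static void format_mode(mode_t mode, char buf[12])
{
    char *p = buf;

    switch (mode & S_IFMT) {
    case S_IFDIR:  *p++ = 'd'; break;
    case S_IFCHR:  *p++ = 'c'; break;
    case S_IFBLK:  *p++ = 'b'; break;
    case S_IFREG:  *p++ = '-'; break;
    case S_IFLNK:  *p++ = 'l'; break;
    case S_IFSOCK: *p++ = 's'; break;
    case S_IFIFO:  *p++ = 'p'; break;
    default:       *p++ = '?'; break;
    }
    /* Owner: setuid shows as s (with x) or S (without x). */
    *p++ = (mode & S_IRUSR) ? 'r' : '-';
    *p++ = (mode & S_IWUSR) ? 'w' : '-';
    *p++ = (mode & S_ISUID) ? ((mode & S_IXUSR) ? 's' : 'S')
                            : ((mode & S_IXUSR) ? 'x' : '-');
    /* Group: setgid uses the same s/S convention. */
    *p++ = (mode & S_IRGRP) ? 'r' : '-';
    *p++ = (mode & S_IWGRP) ? 'w' : '-';
    *p++ = (mode & S_ISGID) ? ((mode & S_IXGRP) ? 's' : 'S')
                            : ((mode & S_IXGRP) ? 'x' : '-');
    /* Other: the sticky bit shows as t (with x) or T (without x). */
    *p++ = (mode & S_IROTH) ? 'r' : '-';
    *p++ = (mode & S_IWOTH) ? 'w' : '-';
    *p++ = (mode & S_ISVTX) ? ((mode & S_IXOTH) ? 't' : 'T')
                            : ((mode & S_IXOTH) ? 'x' : '-');
    *p++ = ' ';
    *p = '\0';
}
```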
### `fts.c` / `fts.h`
An in-tree implementation of the BSD `fts(3)` file-hierarchy traversal
interface, used by `cp`, `chflags`, and other utilities that perform recursive
directory traversal. This avoids depending on glibc's `fts` implementation
(musl does not provide one) or falling back to `nftw(3)`.
### `contrib/libc-vis/`
BSD `vis(3)` / `unvis(3)` character encoding used by `ls` for safe
display of filenames containing control characters or non-printable bytes.
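Without such encoding, a filename containing a newline or a terminal escape sequence would break column output or drive the terminal. The function below is a deliberately simplified, hypothetical stand-in, not `vis(3)`: it passes printable bytes through, doubles backslashes, and writes every other byte as a three-digit octal escape, roughly the form `vis(3)` produces with its `VIS_OCTAL` flag. `isprint()` is locale-dependent, so in the default C locale every byte above 0x7f is escaped.

```c
#include <ctype.h>
#include <stdio.h>

/*
 * Simplified, hypothetical escaper: printable bytes pass through, a
 * backslash becomes "\\", and anything else becomes a 3-digit octal
 * escape. dst needs room for 4 * strlen(src) + 1 bytes in the worst case.
 */
static void escape_name(char *dst, const char *src)
{
    for (; *src != '\0'; src++) {
        unsigned char c = (unsigned char)*src;

        if (c == '\\') {
            *dst++ = '\\';
            *dst++ = '\\';
        } else if (isprint(c)) {
            *dst++ = (char)c;
        } else {
            /* e.g. '\n' (octal 012) -> "\012" */
            dst += sprintf(dst, "\\%03o", (unsigned int)c);
        }
    }
    *dst = '\0';
}
```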
### Signal Name Tables
`kill` and `timeout` both maintain identical `struct signal_entry` tables
mapping signal names to numbers:
```c
struct signal_entry {
    const char *name;
    int number;
};

#define SIGNAL_ENTRY(name) { #name, SIG##name }

static const struct signal_entry canonical_signals[] = {
    SIGNAL_ENTRY(HUP),
    SIGNAL_ENTRY(INT),
    SIGNAL_ENTRY(QUIT),
    /* ... ~30 standard signals ... */
};
```
Both also share the same `normalize_signal_name()` function pattern that
strips "SIG" prefixes and uppercases input.
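The table and the normalization step combine into a name-to-number lookup. The sketch below is a hypothetical version with an abbreviated table; it uppercases first and then drops an optional `SIG` prefix, so `hup`, `SIGHUP`, and `sigHup` all resolve to `SIGHUP`. The `signal_lookup()` name, the buffer size, and the `-1` not-found convention are assumptions.

```c
#define _POSIX_C_SOURCE 200809L /* expose POSIX signal names */
#include <ctype.h>
#include <signal.h>
#include <string.h>

struct signal_entry {
    const char *name;
    int number;
};

#define SIGNAL_ENTRY(name) { #name, SIG##name }

/* Abbreviated table; the real one covers ~30 standard signals. */
static const struct signal_entry canonical_signals[] = {
    SIGNAL_ENTRY(HUP),
    SIGNAL_ENTRY(INT),
    SIGNAL_ENTRY(QUIT),
    SIGNAL_ENTRY(KILL),
    SIGNAL_ENTRY(TERM),
};

/* Uppercase into buf, then skip a leading "SIG"; NULL if input too long. */
static const char *normalize_signal_name(const char *in, char *buf, size_t bufsz)
{
    size_t i;

    for (i = 0; in[i] != '\0'; i++) {
        if (i + 1 >= bufsz)
            return NULL;
        buf[i] = (char)toupper((unsigned char)in[i]);
    }
    if (bufsz == 0)
        return NULL;
    buf[i] = '\0';
    return strncmp(buf, "SIG", 3) == 0 ? buf + 3 : buf;
}

/* Hypothetical lookup: returns the signal number, or -1 if unknown. */
static int signal_lookup(const char *in)
{
    char buf[32];
    const char *name = normalize_signal_name(in, buf, sizeof buf);

    if (name == NULL)
        return -1;
    for (size_t i = 0; i < sizeof canonical_signals / sizeof canonical_signals[0]; i++) {
        if (strcmp(canonical_signals[i].name, name) == 0)
            return canonical_signals[i].number;
    }
    return -1;
}
```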
## Data Structures
### Process Information (`ps`)
The `ps` utility defines a Linux-compatible replacement for FreeBSD's
`kinfo_proc`:
```c
struct kinfo_proc {
    pid_t ki_pid, ki_ppid, ki_pgid, ki_sid;
    dev_t ki_tdev;
    uid_t ki_uid, ki_ruid, ki_svuid;
    gid_t ki_groups[KI_NGROUPS];
    char ki_comm[COMMLEN];      // 256 bytes
    struct timeval ki_start;
    uint64_t ki_runtime;        // microseconds
    uint64_t ki_size;           // VSZ in bytes
    uint64_t ki_rssize;         // RSS in pages
    int ki_nice;
    char ki_stat;               // BSD-like state (S,R,T,Z,D)
    int ki_numthreads;
    struct rusage ki_rusage;
    /* ... */
};
```
This struct is populated by reading `/proc/[pid]/stat` and
`/proc/[pid]/status` files.
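One detail of `/proc/[pid]/stat` trips up naive parsers: the second field is the command name in parentheses, and that name may itself contain spaces or `)`. Splitting on whitespace therefore fails; the reliable approach is to take everything up to the last `)` as the name and scan the fixed fields after it. The helper below illustrates that technique for four fields only; `struct stat_fields` and `parse_proc_stat()` are hypothetical, not the parser in `ps.c`.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

struct stat_fields {            /* hypothetical subset of kinfo_proc */
    pid_t pid;
    pid_t ppid;
    char state;
    char comm[256];
};

/* Parse "pid (comm) state ppid ..." from one /proc/[pid]/stat line. */
static int parse_proc_stat(const char *line, struct stat_fields *out)
{
    const char *lparen = strchr(line, '(');
    const char *rparen = strrchr(line, ')');   /* last ')' ends comm */
    int pid, ppid;
    size_t len;

    if (lparen == NULL || rparen == NULL || rparen < lparen)
        return -1;
    if (sscanf(line, "%d", &pid) != 1)
        return -1;
    len = (size_t)(rparen - lparen - 1);
    if (len >= sizeof out->comm)
        len = sizeof out->comm - 1;            /* truncate oversized names */
    memcpy(out->comm, lparen + 1, len);
    out->comm[len] = '\0';
    if (sscanf(rparen + 1, " %c %d", &out->state, &ppid) != 2)
        return -1;
    out->pid = (pid_t)pid;
    out->ppid = (pid_t)ppid;
    return 0;
}
```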
### I/O State (`dd`)
The `dd` utility uses two key structures for its I/O engine:
```c
typedef struct {
    u_char *db;             // Buffer address
    u_char *dbp;            // Current buffer I/O position
    ssize_t dbcnt;          // Current byte count in buffer
    ssize_t dbrcnt;         // Last read byte count
    ssize_t dbsz;           // Block size
    u_int flags;            // ISCHR | ISPIPE | ISTAPE | ISSEEK | NOREAD | ISTRUNC
    const char *name;       // Filename
    int fd;                 // File descriptor
    off_t offset;           // Block count to skip
    off_t seek_offset;      // Sparse output seek offset
} IO;

typedef struct {
    uintmax_t in_full, in_part;     // Full/partial input blocks
    uintmax_t out_full, out_part;   // Full/partial output blocks
    uintmax_t trunc;                // Truncated records
    uintmax_t swab;                 // Odd-length swab blocks
    uintmax_t bytes;                // Total bytes written
    struct timespec start;          // Start timestamp
} STAT;
```
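The full/partial counters feed dd's `N+M records in` summary line: a read that returns exactly one block counts as full, and any shorter non-empty read counts as partial. The sketch below shows that bookkeeping with a hypothetical trimmed-down struct and hypothetical function names rather than dd's own.

```c
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>

struct in_stats {               /* hypothetical subset of STAT */
    uintmax_t in_full, in_part;
};

/* Classify one read() result against the input block size. */
static void count_input_block(struct in_stats *st, ssize_t n, ssize_t dbsz)
{
    if (n <= 0)
        return;                 /* EOF or error: nothing to count */
    if (n == dbsz)
        st->in_full++;
    else
        st->in_part++;
}

/* Format the first summary line, e.g. "2+1 records in". */
static int format_records_in(char *buf, size_t size, const struct in_stats *st)
{
    return snprintf(buf, size, "%ju+%ju records in", st->in_full, st->in_part);
}
```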
### Line Buffer (`ed`)
The `ed` editor uses a doubly-linked list of line nodes with a scratch
file backing store:
```c
typedef struct line {
    struct line *q_forw;    // Next line
    struct line *q_back;    // Previous line
    off_t seek;             // Offset in scratch file
    int len;                // Line length
} line_t;
```
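The nodes hold no text: each one records where its bytes start in the scratch file (`seek`) and how many there are (`len`). The sketch below illustrates that split under a few assumptions: a sentinel head node that makes the list circular, a stdio `tmpfile()` as the scratch file, and hypothetical `put_line()`/`get_line()` names. It illustrates the data structure only and is not the code in `buf.c`.

```c
#define _POSIX_C_SOURCE 200809L /* for fseeko/ftello */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

typedef struct line {
    struct line *q_forw;        /* next line */
    struct line *q_back;        /* previous line */
    off_t seek;                 /* offset of the text in the scratch file */
    int len;                    /* length of the text */
} line_t;

/* An empty buffer is a sentinel that points at itself both ways. */
static void init_buffer(line_t *head)
{
    head->q_forw = head->q_back = head;
}

/* Append text to the scratch file and link a new node at the tail. */
static line_t *put_line(FILE *scratch, line_t *head, const char *text)
{
    line_t *lp = malloc(sizeof *lp);
    size_t n = strlen(text);

    if (lp == NULL)
        return NULL;
    if (fseeko(scratch, 0, SEEK_END) != 0 ||
        (lp->seek = ftello(scratch)) < 0 ||
        fwrite(text, 1, n, scratch) != n) {
        free(lp);
        return NULL;
    }
    lp->len = (int)n;
    lp->q_back = head->q_back;
    lp->q_forw = head;
    head->q_back->q_forw = lp;
    head->q_back = lp;
    return lp;
}

/* Copy a line's text back out; buf must hold lp->len + 1 bytes. */
static int get_line(FILE *scratch, const line_t *lp, char *buf)
{
    if (fseeko(scratch, lp->seek, SEEK_SET) != 0 ||
        fread(buf, 1, (size_t)lp->len, scratch) != (size_t)lp->len)
        return -1;
    buf[lp->len] = '\0';
    return 0;
}
```

The seek before every read or write also satisfies the C rule that a stream must be repositioned when switching between output and input.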
### File Entry (`ls`)
The `ls` utility represents each directory entry with:
```c
struct entry {
    struct stat sb;
    struct file_time btime; // Birth time (via statx)
    char *name;             // Display name
    char *link_target;      // Symlink target (if applicable)
    /* color, type classification, etc. */
};
```
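`cmp.c` sorts arrays of these entries with comparison functions. As a hedged example of the shape such a function takes, the comparator below orders entries newest-first by modification time and falls back to the name on ties, in the spirit of `ls -t`. The trimmed `struct entry`, the whole-second granularity, and the use of `strcmp()` rather than locale-aware `strcoll()` are simplifications.

```c
#include <string.h>
#include <sys/stat.h>

struct entry {                  /* trimmed, hypothetical version */
    struct stat sb;
    char *name;
};

/* qsort(3) comparator: newest mtime first, then by name ascending. */
static int cmp_mtime_desc(const void *a, const void *b)
{
    const struct entry *ea = a, *eb = b;

    if (ea->sb.st_mtime != eb->sb.st_mtime)
        return ea->sb.st_mtime < eb->sb.st_mtime ? 1 : -1;
    return strcmp(ea->name, eb->name);
}
```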
## Makefile Targets Reference
### Top-Level Targets
| Target | Description |
|--------------------|-------------------------------------------------------|
| `all` | Build all utilities |
| `clean` | Remove `build/` and `out/` directories |
| `distclean` | `clean` + remove generated `GNUmakefile`, `config.mk` |
| `rebuild` | `clean` then `all` |
| `reconfigure` | Re-run `./configure` |
| `check` / `test` | Run all utility test suites |
| `stage` | Copy binaries to `out/bin/` |
| `install` | Copy binaries to `$DESTDIR$BINDIR` |
| `status` | Show `out/` directory contents |
| `list` | Print all subdirectory names |
| `print-config` | Show active compiler and flags |
| `help` | List available targets |
### Per-Utility Targets
Individual utilities can be built, cleaned, or tested:
```sh
make -f GNUmakefile build-cat # Build only cat
make -f GNUmakefile clean-cat # Clean only cat
make -f GNUmakefile check-cat # Test only cat
make -f GNUmakefile cat # Alias for build-cat
```
### Target Dependencies
```
all
└── build-<util> (for each utility)
    └── prepare-<util>
        ├── mkdir -p build/<util> out/
        ├── ln -sfn ../build/<util> <util>/build
        └── ln -sfn ../out <util>/out

stage
├── all
└── copy executables to out/bin/

install
├── stage
└── copy out/bin/* to $DESTDIR$BINDIR/

distclean
├── clean
│   └── remove build/ out/
├── unprepare
│   └── remove build/out symlinks from subdirs
└── remove GNUmakefile config.mk
```
## Cross-Compilation Support
The configure script supports cross-compilation via `--host` and `--build`
triples:
```sh
./configure --host=aarch64-linux-musl --build=x86_64-linux-musl \
    --cc=aarch64-linux-musl-gcc
```
When `--host` differs from `--build`:
- The executable run test (`can_run_with`) is skipped
- `CROSS_COMPILING=1` is recorded in `config.mk`
- The value propagates to all subdirectory builds
## Typical Per-Utility GNUmakefile
Each utility has a `GNUmakefile` following this general pattern:
```makefile
# cat/GNUmakefile
PROG = cat
SRCS = cat.c

BUILDDIR ?= build
OUTDIR ?= out

CC ?= cc
CPPFLAGS += -D_DEFAULT_SOURCE -D_XOPEN_SOURCE=700
CFLAGS ?= -O2 -g -pipe
LDFLAGS ?=

# One object per source, placed under BUILDDIR (e.g. build/cat.o)
OBJS := $(addprefix $(BUILDDIR)/,$(SRCS:.c=.o))

all: $(OUTDIR)/$(PROG)

$(OUTDIR)/$(PROG): $(OBJS)
	@mkdir -p $(dir $@)
	$(CC) $(LDFLAGS) -o $@ $(OBJS) $(LDLIBS)

$(BUILDDIR)/%.o: %.c
	@mkdir -p $(dir $@)
	$(CC) $(CPPFLAGS) $(CFLAGS) -c -o $@ $<

clean:
	rm -f $(OBJS) $(OUTDIR)/$(PROG)

test:
	@echo "SKIP: no tests for $(PROG)"

.PHONY: all clean test
```
Multi-file utilities list all sources in `SRCS` and may link additional
libraries:
```makefile
# dd/GNUmakefile
SRCS = dd.c args.c conv.c conv_tab.c gen.c misc.c position.c
LDLIBS += -lm # For dd's speed calculations
```
## Security Considerations
### Input Validation Boundaries
- **File paths**: Validated against `PATH_MAX` limits. Utilities like `rm`
explicitly reject `/`, `.`, and `..` as arguments.
- **Numeric arguments**: Parsed with `strtoimax()` or `strtol()` with
explicit overflow checking.
- **Signal numbers**: Validated against the compiled signal table, not
unchecked `atoi()`.
- **Mode strings**: `mode_compile()` validates syntax before any filesystem
modification occurs.
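For numeric arguments, explicit overflow checking takes more than calling `strtoimax()`: the caller has to clear `errno` first, treat `ERANGE` as failure, and confirm the whole string was consumed. A minimal sketch of that pattern follows; `parse_intmax()` is a hypothetical name, and `strtoimax()` itself accepts leading whitespace and a sign, which a given utility may or may not want.

```c
#include <errno.h>
#include <inttypes.h>
#include <stdbool.h>

/*
 * Hypothetical helper: parse a base-10 integer, rejecting empty input,
 * trailing junk, and values outside intmax_t. Returns true on success.
 */
static bool parse_intmax(const char *s, intmax_t *out)
{
    char *end;
    intmax_t v;

    errno = 0;
    v = strtoimax(s, &end, 10);
    if (end == s || *end != '\0')
        return false;           /* no digits, or trailing characters */
    if (errno == ERANGE)
        return false;           /* value out of range for intmax_t */
    *out = v;
    return true;
}
```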
### Privilege Handling
- `hostname` and `domainname` require root privileges (on Linux,
  `CAP_SYS_ADMIN`) for set operations; before attempting the change, they
  validate the name's length against the kernel's UTS namespace limit.
- `rm` refuses to delete `/` unless explicitly overridden.
- `chmod -R` includes cycle detection to prevent infinite loops from symlink
chains.
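For the hostname check, `struct utsname` itself provides a convenient bound: its `nodename` array includes room for the terminating NUL, so a valid name must be strictly shorter than the array. The sketch below assumes that is the limit in question; `hostname_len_ok()` is hypothetical. On Linux with glibc or musl the array is 65 bytes, which allows names of up to 64 characters.

```c
#include <stdbool.h>
#include <string.h>
#include <sys/utsname.h>

/* Hypothetical pre-check run before calling sethostname(). */
static bool hostname_len_ok(const char *name)
{
    struct utsname u;           /* used only for sizeof */
    size_t len = strlen(name);

    return len > 0 && len < sizeof u.nodename;
}
```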
### Temporary File Safety
- `ed` creates temporary scratch files in `$TMPDIR` (or `/tmp`) using
`mkstemp(3)`.
- `dd` does not create temporary files — it operates on explicit input/output
file descriptors.
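A minimal sketch of the `$TMPDIR`-with-fallback pattern described for `ed` follows. The `ed.XXXXXX` template and `open_scratch_file()` are hypothetical, and the immediate `unlink()` is a common hardening choice (the name disappears from the filesystem, but the data stays reachable through the descriptor until `close()`), not something confirmed for `ed`.

```c
#define _POSIX_C_SOURCE 200809L /* for mkstemp and PATH_MAX */
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a private scratch file in $TMPDIR (or /tmp)
 * and return its descriptor, or -1 on failure.
 */
static int open_scratch_file(void)
{
    char path[PATH_MAX];
    const char *dir = getenv("TMPDIR");
    int fd, n;

    if (dir == NULL || *dir == '\0')
        dir = "/tmp";
    n = snprintf(path, sizeof path, "%s/ed.XXXXXX", dir);
    if (n < 0 || (size_t)n >= sizeof path)
        return -1;              /* $TMPDIR too long */
    fd = mkstemp(path);         /* creates the file with mode 0600 */
    if (fd == -1)
        return -1;
    unlink(path);               /* drop the name; data lives until close() */
    return fd;
}
```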