| author | Mehmet Samet Duman <yongdohyun@projecttick.org> | 2026-04-05 17:37:54 +0300 |
|---|---|---|
| committer | Mehmet Samet Duman <yongdohyun@projecttick.org> | 2026-04-05 17:37:54 +0300 |
| commit | 32f5f761bc8e960293b4f4feaf973dd0da26d0f8 (patch) | |
| tree | 8d0436fdd093d5255c3b75e45f9741882b22e2e4 /docs/handbook/corebinutils | |
| parent | 64f4ddfa97c19f371fe1847b20bd26803f0a25d5 (diff) | |
NOISSUE Project Tick Handbook is Released!
Assisted-by: Claude:Opus-4.6-High
Signed-off-by: Mehmet Samet Duman <yongdohyun@projecttick.org>
Diffstat (limited to 'docs/handbook/corebinutils')
27 files changed, 7579 insertions, 0 deletions
diff --git a/docs/handbook/corebinutils/architecture.md b/docs/handbook/corebinutils/architecture.md new file mode 100644 index 0000000000..7f6342c9f0 --- /dev/null +++ b/docs/handbook/corebinutils/architecture.md @@ -0,0 +1,665 @@ +# Corebinutils — Architecture + +## Repository Layout + +The corebinutils tree follows a straightforward directory-per-utility layout +with a top-level orchestrator build system: + +``` +corebinutils/ +├── configure # POSIX sh configure script +├── README.md # Top-level build instructions +├── .gitattributes +├── .gitignore +│ +├── config.mk # [generated] feature detection results +├── GNUmakefile # [generated] top-level build orchestrator +│ +├── build/ # [generated] intermediate object files +│ ├── configure/ # Configure test artifacts and logs +│ ├── cat/ # Per-utility build intermediates +│ ├── chmod/ +│ ├── ... +│ └── sh/ +│ +├── out/ # [generated] final binaries +│ └── bin/ # Staged executables (after `make stage`) +│ +├── contrib/ # Shared library sources +│ ├── libc-vis/ # vis(3)/unvis(3) implementation +│ ├── libedit/ # editline(3) library +│ └── printf/ # Shared printf format helpers +│ +├── cat/ # Utility: cat +│ ├── cat.c # Main source +│ ├── cat.1 # Manual page (groff) +│ ├── GNUmakefile # Per-utility build rules +│ └── README.md # Port notes and differences +│ +├── chmod/ # Utility: chmod +│ ├── chmod.c # Main implementation +│ ├── mode.c # Mode parsing library (shared with mkdir) +│ ├── mode.h # Mode parsing header +│ ├── GNUmakefile +│ └── chmod.1 +│ +├── dd/ # Utility: dd (multi-file) +│ ├── dd.c # Main control flow +│ ├── dd.h # Shared types (IO, STAT, flags) +│ ├── extern.h # Function declarations +│ ├── args.c # JCL argument parser +│ ├── conv.c # Conversion functions (block/unblock/def) +│ ├── conv_tab.c # ASCII/EBCDIC conversion tables +│ ├── gen.c # Signal handling helpers +│ ├── misc.c # Summary, progress, timing +│ ├── position.c # Input/output seek positioning +│ └── GNUmakefile +│ +├── ed/ # Utility: ed 
(multi-file) +│ ├── main.c # Command dispatch and main loop +│ ├── ed.h # Types (line_t, undo_t, constants) +│ ├── compat.c / compat.h # Portability shims +│ ├── buf.c # Buffer management (scratch file) +│ ├── glbl.c # Global command (g/re/cmd) +│ ├── io.c # File I/O (read_file, write_file) +│ ├── re.c # Regular expression handling +│ ├── sub.c # Substitution command +│ └── undo.c # Undo stack management +│ +├── ls/ # Utility: ls (multi-file) +│ ├── ls.c # Main logic, option parsing, directory traversal +│ ├── ls.h # Types (entry, context, enums) +│ ├── extern.h # Cross-module declarations +│ ├── print.c # Output formatting (columns, long, stream) +│ ├── cmp.c # Sort comparison functions +│ └── util.c # Helper functions +│ +├── ps/ # Utility: ps (multi-file) +│ ├── ps.c # Main logic, /proc scanning +│ ├── ps.h # Types (kinfo_proc, KINFO, VAR) +│ ├── extern.h # Cross-module declarations +│ ├── fmt.c # Format string parsing +│ ├── keyword.c # Output keyword definitions +│ ├── print.c # Field value formatting +│ └── nlist.c # Name list handling +│ +└── sh/ # Utility: POSIX shell + ├── main.c # Shell entry point + ├── parser.c / parser.h # Command parser + ├── eval.c # Command evaluator + ├── exec.c # Command execution + ├── jobs.c # Job control + ├── var.c # Variable management + ├── trap.c # Signal/trap handling + ├── expand.c # Parameter expansion + ├── redir.c # I/O redirection + └── ... # (60+ additional files) +``` + +## Build System Architecture + +### Two-Level Build Organization + +The build system has two distinct levels: + +1. **Top-level orchestrator** — Generated `GNUmakefile` and `config.mk` that + coordinate all subdirectories. +2. **Per-utility `GNUmakefile`** — Each utility directory has its own build + rules. These are the source of truth and are never overwritten by + `configure`. 
+ +The top-level `GNUmakefile` invokes subdirectory builds via recursive make: + +```makefile +build-%: prepare-% + +env CPPFLAGS="$(CPPFLAGS)" CFLAGS="$(CFLAGS)" LDFLAGS="$(LDFLAGS)" \ + $(MAKE) -C "$*" -f GNUmakefile $(SUBMAKE_OVERRIDES) all +``` + +### Shared Output Directories + +All utilities share centralized output directories to simplify packaging: + +``` +build/ # Object files, organized per-utility: build/cat/, build/chmod/, ... +out/ # Final linked binaries +out/bin/ # Staged binaries (after `make stage`) +``` + +Subdirectories get symbolic links (`build -> ../build/<util>`, +`out -> ../out`) created by the `prepare-%` target: + +```makefile +prepare-%: + @mkdir -p "$(MONO_BUILDDIR)/$*" "$(MONO_OUTDIR)" + @ln -sfn "../build/$*" "$*/build" + @ln -sfn "../out" "$*/out" +``` + +### Variable Propagation + +The top-level Makefile passes all detected toolchain variables to +subdirectory builds via `SUBMAKE_OVERRIDES`: + +```makefile +SUBMAKE_OVERRIDES = \ + CC="$(CC)" \ + AR="$(AR)" \ + AWK="$(AWK)" \ + RANLIB="$(RANLIB)" \ + NM="$(NM)" \ + SH="$(SH)" \ + CRYPTO_LIBS="$(CRYPTO_LIBS)" \ + EDITLINE_CPPFLAGS="$(EDITLINE_CPPFLAGS)" \ + EDITLINE_LIBS="$(EDITLINE_LIBS)" \ + PREFIX="$(PREFIX)" \ + BINDIR="$(BINDIR)" \ + DESTDIR="$(DESTDIR)" \ + CROSS_COMPILING="$(CROSS_COMPILING)" \ + EXEEXT="$(EXEEXT)" +``` + +This ensures every utility builds with the same compiler, flags, and +library configuration. + +### Generated vs. Maintained Files + +| File | Generated? 
| Purpose | +|------------------|------------|--------------------------------------| +| `configure` | No | POSIX sh configure script | +| `config.mk` | Yes | Feature detection macros | +| `GNUmakefile` | Yes | Top-level orchestrator | +| `*/GNUmakefile` | No | Per-utility build rules | +| `build/` | Yes | Object file directory tree | +| `out/` | Yes | Binary output directory | + +## Configure Script Architecture + +### Script Structure + +The `configure` script is a single POSIX shell file (no autoconf) organized +into these phases: + +``` +1. Initialization — Set defaults, parse CLI arguments +2. Compiler Detection — Find musl-first C compiler +3. Tool Detection — Find make, ar, ranlib, nm, awk, sh, pkg-config +4. Libc Identification — Determine musl vs glibc via binary inspection +5. Header Probing — Check for ~40 system headers +6. Function Probing — Check for ~20 C library functions +7. Library Probing — Check for optional libraries (crypt, dl, pthread, rt) +8. File Generation — Write config.mk and GNUmakefile +``` + +### Compiler Probing + +The compiler detection uses three progressive tests: + +```sh +# Can it compile a simple program? +can_compile_with() { ... } + +# Can it compile AND run? (native builds only) +can_run_with() { ... } + +# Does it support C11 stdatomic.h? +can_compile_stdatomic_with() { ... } +``` + +All three must pass. For cross-compilation (`--host != --build`), the +run test is skipped. 
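The bodies of the three probe functions are elided in the excerpt above, so the exact test programs are not known from this document. As a rough illustration only, a C11 atomics probe of the kind `can_compile_stdatomic_with()` might hand to the compiler could look like the sketch below. The function name `probe_stdatomic` and its contents are assumptions made for this example, not code taken from the configure script.

```c
#include <stdatomic.h>

/*
 * Hypothetical probe body (not from the configure script): it touches
 * <stdatomic.h> types and operations so that a toolchain without C11
 * atomics support fails at compile or link time.
 */
int
probe_stdatomic(void)
{
    atomic_int counter = 0;

    atomic_fetch_add(&counter, 2);   /* counter becomes 2 */
    atomic_fetch_sub(&counter, 1);   /* counter becomes 1 */
    return atomic_load(&counter);    /* 1 when atomics behave as expected */
}
```

On a native build a probe like this could also be executed and its result checked; on a cross build only the compile and link steps would be meaningful, which matches the skipped run test described above.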
+ +### Feature Detection Pattern + +Headers and functions are probed with a consistent pattern that records +results as Make variables and C preprocessor defines: + +```sh +check_header() { + hdr=$1 + macro="HAVE_$(to_macro "$hdr")" # e.g., HAVE_SYS_ACL_H + if try_cc "#include <$hdr> + int main(void) { return 0; }"; then + record_cpp_define "$macro" 1 + else + record_cpp_define "$macro" 0 + fi +} + +check_func() { + func=$1 + includes=$2 + macro="HAVE_$(to_macro "$func")" # e.g., HAVE_COPY_FILE_RANGE + if try_cc "$includes + int main(void) { void *p = (void *)(uintptr_t)&$func; return p == 0; }"; then + record_cpp_define "$macro" 1 + else + record_cpp_define "$macro" 0 + fi +} +``` + +### Headers Probed + +The configure script checks for the following headers: + +``` +stdlib.h stdio.h stdint.h inttypes.h stdbool.h stddef.h +string.h strings.h unistd.h errno.h fcntl.h signal.h +sys/types.h sys/stat.h sys/time.h sys/resource.h sys/wait.h +sys/select.h sys/ioctl.h sys/param.h sys/socket.h netdb.h +poll.h sys/poll.h termios.h stropts.h pthread.h +sys/event.h sys/timerfd.h sys/acl.h attr/xattr.h linux/xattr.h +dlfcn.h langinfo.h locale.h wchar.h wctype.h +``` + +### Functions Probed + +``` +getcwd realpath fchdir fstatat openat copy_file_range +memmove strlcpy strlcat explicit_bzero getline getentropy +posix_spawn clock_gettime poll kqueue timerfd_create +pipe2 closefrom getrandom +``` + +### Libraries Probed + +| Library | Symbol | Usage | +|----------|---------------------|------------------------------------| +| crypt | `crypt()` | Password hashing (`ed -x` legacy) | +| dl | `dlopen()` | Dynamic loading | +| pthread | `pthread_create()` | Threading support | +| rt | `clock_gettime()` | High-resolution timing | +| util | `openpty()` | Pseudo-terminal support | +| attr | `setxattr()` | Extended attributes (`mv`, `cp`) | +| selinux | `is_selinux_enabled()` | SELinux label support | + +## Code Organization Patterns + +### Single-File Utility Pattern + +Most simple 
utilities follow this structure: + +```c +/* SPDX license header */ + +#include <system-headers.h> + +struct options { ... }; + +static const char *progname; + +static void usage(void) __attribute__((__noreturn__)); +static void error_errno(const char *, ...); +static void error_msg(const char *, ...); + +int main(int argc, char *argv[]) +{ + struct options opt = {0}; + int ch, exitval = 0; + + progname = program_name(argv[0]); + + while ((ch = getopt(argc, argv, "...")) != -1) { + switch (ch) { + case 'f': opt.force = true; break; + /* ... */ + default: usage(); + } + } + argc -= optind; + argv += optind; + + /* Perform main operation */ + for (int i = 0; i < argc; i++) { + if (process(argv[i], &opt) != 0) + exitval = 1; + } + return exitval; +} +``` + +### Multi-File Utility Pattern + +Complex utilities are split across files with a shared header: + +``` +utility/ +├── utility.c # main(), option parsing, top-level dispatch +├── utility.h # Shared types, constants, macros +├── extern.h # Function declarations for cross-module calls +├── sub1.c # Functional subsystem (e.g., args.c, conv.c) +├── sub2.c # Another subsystem (e.g., print.c, fmt.c) +└── GNUmakefile # Build rules listing all .c files +``` + +### Header Guard Convention + +Headers use the BSD `_FILENAME_H_` pattern: + +```c +#ifndef _PS_H_ +#define _PS_H_ +/* ... */ +#endif +``` + +### Portability Macros + +Common compatibility macros appear across multiple utilities: + +```c +#ifndef __unused +#define __unused __attribute__((__unused__)) +#endif + +#ifndef __dead2 +#define __dead2 __attribute__((__noreturn__)) +#endif + +#ifndef nitems +#define nitems(array) (sizeof(array) / sizeof((array)[0])) +#endif + +#ifndef MIN +#define MIN(a, b) ((a) < (b) ? (a) : (b)) +#endif + +#ifndef MAX +#define MAX(a, b) ((a) > (b) ? 
(a) : (b)) +#endif +``` + +### POSIX Feature Test Macros + +Many utilities define feature test macros at the top of their main source +file: + +```c +#define _POSIX_C_SOURCE 200809L +``` + +Or rely on the configure-injected flags: + +``` +-D_DEFAULT_SOURCE -D_XOPEN_SOURCE=700 +``` + +## Shared Code Reuse + +### `mode.c` / `mode.h` + +The mode parsing library is shared between `chmod` and `mkdir`. It provides: + +- `mode_compile()` — Parse a mode string (numeric or symbolic) into a + compiled command array (`bitcmd_t`) +- `mode_apply()` — Apply a compiled mode to an existing `mode_t` +- `mode_free()` — Release compiled mode memory +- `strmode()` — Convert `mode_t` to display string like `"drwxr-xr-x "` + +### `fts.c` / `fts.h` + +An in-tree FTS (File Tree Walk) implementation used by `cp`, `chflags`, and +other utilities that do recursive directory traversal. This avoids depending +on glibc's FTS implementation or `nftw(3)`. + +### `contrib/libc-vis/` + +BSD `vis(3)` / `unvis(3)` character encoding used by `ls` for safe +display of filenames containing control characters or non-printable bytes. + +### Signal Name Tables + +`kill` and `timeout` both maintain identical `struct signal_entry` tables +mapping signal names to numbers: + +```c +struct signal_entry { + const char *name; + int number; +}; + +#define SIGNAL_ENTRY(name) { #name, SIG##name } + +static const struct signal_entry canonical_signals[] = { + SIGNAL_ENTRY(HUP), + SIGNAL_ENTRY(INT), + SIGNAL_ENTRY(QUIT), + /* ... ~30 standard signals ... */ +}; +``` + +Both also share the same `normalize_signal_name()` function pattern that +strips "SIG" prefixes and uppercases input. 
+ +## Data Structures + +### Process Information (`ps`) + +The `ps` utility defines a Linux-compatible replacement for FreeBSD's +`kinfo_proc`: + +```c +struct kinfo_proc { + pid_t ki_pid, ki_ppid, ki_pgid, ki_sid; + dev_t ki_tdev; + uid_t ki_uid, ki_ruid, ki_svuid; + gid_t ki_groups[KI_NGROUPS]; + char ki_comm[COMMLEN]; // 256 bytes + struct timeval ki_start; + uint64_t ki_runtime; // microseconds + uint64_t ki_size; // VSZ in bytes + uint64_t ki_rssize; // RSS in pages + int ki_nice; + char ki_stat; // BSD-like state (S,R,T,Z,D) + int ki_numthreads; + struct rusage ki_rusage; + /* ... */ +}; +``` + +This struct is populated by reading `/proc/[pid]/stat` and +`/proc/[pid]/status` files. + +### I/O State (`dd`) + +The `dd` utility uses two key structures for its I/O engine: + +```c +typedef struct { + u_char *db; // Buffer address + u_char *dbp; // Current buffer I/O position + ssize_t dbcnt; // Current byte count in buffer + ssize_t dbrcnt; // Last read byte count + ssize_t dbsz; // Block size + u_int flags; // ISCHR | ISPIPE | ISTAPE | ISSEEK | NOREAD | ISTRUNC + const char *name; // Filename + int fd; // File descriptor + off_t offset; // Block count to skip + off_t seek_offset;// Sparse output seek offset +} IO; + +typedef struct { + uintmax_t in_full, in_part; // Full/partial input blocks + uintmax_t out_full, out_part; // Full/partial output blocks + uintmax_t trunc; // Truncated records + uintmax_t swab; // Odd-length swab blocks + uintmax_t bytes; // Total bytes written + struct timespec start; // Start timestamp +} STAT; +``` + +### Line Buffer (`ed`) + +The `ed` editor uses a doubly-linked list of line nodes with a scratch +file backing store: + +```c +typedef struct line { + struct line *q_forw; // Next line + struct line *q_back; // Previous line + off_t seek; // Offset in scratch file + int len; // Line length +} line_t; +``` + +### File Entry (`ls`) + +The `ls` utility represents each directory entry with: + +```c +struct entry { + struct stat sb; + 
struct file_time btime; // Birth time (via statx) + char *name; // Display name + char *link_target; // Symlink target (if applicable) + /* color, type classification, etc. */ +}; +``` + +## Makefile Targets Reference + +### Top-Level Targets + +| Target | Description | +|--------------------|-------------------------------------------------------| +| `all` | Build all utilities | +| `clean` | Remove `build/` and `out/` directories | +| `distclean` | `clean` + remove generated `GNUmakefile`, `config.mk` | +| `rebuild` | `clean` then `all` | +| `reconfigure` | Re-run `./configure` | +| `check` / `test` | Run all utility test suites | +| `stage` | Copy binaries to `out/bin/` | +| `install` | Copy binaries to `$DESTDIR$BINDIR` | +| `status` | Show `out/` directory contents | +| `list` | Print all subdirectory names | +| `print-config` | Show active compiler and flags | +| `help` | List available targets | + +### Per-Utility Targets + +Individual utilities can be built, cleaned, or tested: + +```sh +make -f GNUmakefile build-cat # Build only cat +make -f GNUmakefile clean-cat # Clean only cat +make -f GNUmakefile check-cat # Test only cat +make -f GNUmakefile cat # Alias for build-cat +``` + +### Target Dependencies + +``` +all + └── build-<util> (for each utility) + └── prepare-<util> + ├── mkdir -p build/<util> out/ + ├── ln -sfn ../build/<util> <util>/build + └── ln -sfn ../out <util>/out + +stage + └── all + └── copy executables to out/bin/ + +install + └── stage + └── copy out/bin/* to $DESTDIR$BINDIR/ + +distclean + └── clean + └── remove build/ out/ + └── unprepare + └── remove build/out symlinks from subdirs + └── remove GNUmakefile config.mk +``` + +## Cross-Compilation Support + +The configure script supports cross-compilation via `--host` and `--build` +triples: + +```sh +./configure --host=aarch64-linux-musl --build=x86_64-linux-musl \ + --cc=aarch64-linux-musl-gcc +``` + +When `--host` differs from `--build`: +- The executable run test (`can_run_with`) is 
skipped +- `CROSS_COMPILING=1` is recorded in `config.mk` +- The value propagates to all subdirectory builds + +## Typical Per-Utility GNUmakefile + +Each utility has a `GNUmakefile` following this general pattern: + +```makefile +# cat/GNUmakefile + +PROG = cat +SRCS = cat.c + +BUILDDIR ?= build +OUTDIR ?= out + +CC ?= cc +CPPFLAGS += -D_DEFAULT_SOURCE -D_XOPEN_SOURCE=700 +CFLAGS ?= -O2 -g -pipe +LDFLAGS ?= + +OBJS = $(SRCS:.c=.o) +OBJS := $(addprefix $(BUILDDIR)/,$(OBJS)) + +all: $(OUTDIR)/$(PROG) + +$(OUTDIR)/$(PROG): $(OBJS) + $(CC) $(LDFLAGS) -o $@ $(OBJS) $(LDLIBS) + +$(BUILDDIR)/%.o: %.c + @mkdir -p $(dir $@) + $(CC) $(CPPFLAGS) $(CFLAGS) -c -o $@ $< + +clean: + rm -f $(OBJS) $(OUTDIR)/$(PROG) + +test: + @echo "SKIP: no tests for $(PROG)" + +.PHONY: all clean test +``` + +Multi-file utilities list all sources in `SRCS` and may link additional +libraries: + +```makefile +# dd/GNUmakefile +SRCS = dd.c args.c conv.c conv_tab.c gen.c misc.c position.c +LDLIBS += -lm # For dd's speed calculations +``` + +## Security Considerations + +### Input Validation Boundaries + +- **File paths**: Validated against `PATH_MAX` limits. Utilities like `rm` + explicitly reject `/`, `.`, and `..` as arguments. +- **Numeric arguments**: Parsed with `strtoimax()` or `strtol()` with + explicit overflow checking. +- **Signal numbers**: Validated against the compiled signal table, not + unchecked `atoi()`. +- **Mode strings**: `mode_compile()` validates syntax before any filesystem + modification occurs. + +### Privilege Handling + +- `hostname` and `domainname` require root for set operations; they validate + the hostname length against the kernel's UTS namespace limit first. +- `rm` refuses to delete `/` unless explicitly overridden. +- `chmod -R` includes cycle detection to prevent infinite loops from symlink + chains. + +### Temporary File Safety + +- `ed` creates temporary scratch files in `$TMPDIR` (or `/tmp`) using + `mkstemp(3)`. 
+- `dd` does not create temporary files — it operates on explicit input/output + file descriptors. diff --git a/docs/handbook/corebinutils/building.md b/docs/handbook/corebinutils/building.md new file mode 100644 index 0000000000..48ad098712 --- /dev/null +++ b/docs/handbook/corebinutils/building.md @@ -0,0 +1,429 @@ +# Corebinutils — Building + +## Prerequisites + +### Required Tools + +| Tool | Minimum Version | Purpose | +|------------|----------------|-----------------------------------------| +| C compiler | C11 support | Must support `<stdatomic.h>` | +| `make` | GNU Make 4.x | Build orchestration | +| `ar` | Any | Archive tool for static libraries | +| `ranlib` | Any | Library index generation | +| `awk` | POSIX | Build-time text processing | +| `sh` | POSIX | Shell for scripts and tests | + +### Preferred Compiler: musl-based + +The configure script searches for musl-based compilers in this priority +order: + +1. `musl-clang` — musl's Clang wrapper +2. `clang --target=<arch>-linux-musl` — Clang targeting musl +3. `clang --target=<arch>-unknown-linux-musl` — Clang with full triple +4. `musl-gcc` — musl's GCC wrapper +5. `clang` — Generic Clang (libc detected from output binary) +6. `cc` — System default +7. `gcc` — GNU CC + +If a glibc toolchain is detected, configure fails with: + +``` +configure: error: glibc toolchain detected; refusing by default + (use --allow-glibc to override) +``` + +### Libc Detection + +The configure script identifies the libc implementation through three +methods (tried in order): + +1. **Binary inspection**: Compiles a test program, runs `file(1)` on it, + looks for `ld-musl` or `ld-linux` in the interpreter path +2. **Preprocessor macros**: Checks for `__GLIBC__` or `__MUSL__` in the + compiler's predefined macros +3. 
**Target triple**: Inspects the compiler's `-dumpmachine` output for + `musl` or `gnu`/`glibc` substrings + +### Optional Dependencies + +| Library | Symbol | Required By | Fallback | +|------------|------------------|---------------------|----------------------| +| `libcrypt` | `crypt()` | `ed` (legacy `-x`) | Feature disabled | +| `libdl` | `dlopen()` | `sh` (loadable) | Feature disabled | +| `libpthread`| `pthread_create()` | Various | Single-threaded | +| `librt` | `clock_gettime()`| `dd`, `timeout` | Linked if needed | +| `libutil` | `openpty()` | `sh`, `csh` | Pty feature disabled | +| `libattr` | `setxattr()` | `mv`, `cp` | xattr not preserved | +| `libselinux`| `is_selinux_enabled()` | SELinux labels | Labels not set | +| `libedit` | editline(3) | `sh`, `csh` | No line editing | + +## Quick Build + +```sh +cd corebinutils/ + +# Step 1: Configure +./configure + +# Step 2: Build all utilities +make -f GNUmakefile -j$(nproc) all + +# Step 3: (Optional) Run tests +make -f GNUmakefile test + +# Step 4: (Optional) Stage binaries +make -f GNUmakefile stage +``` + +After a successful build, binaries appear in `out/` and staged copies in +`out/bin/`. 
+ +## Configure Script Reference + +### Usage + +``` +./configure [options] +``` + +### General Options + +| Option | Default | Description | +|---------------------------|------------------|--------------------------------| +| `--help` | | Show help and exit | +| `--prefix=PATH` | `/usr/local` | Install prefix | +| `--bindir=PATH` | `<prefix>/bin` | Install binary directory | +| `--host=TRIPLE` | Auto-detected | Target host triple | +| `--build=TRIPLE` | Auto-detected | Build system triple | + +### Toolchain Options + +| Option | Default | Description | +|---------------------------|------------------|--------------------------------| +| `--cc=COMMAND` | Auto-detected | Force specific compiler | +| `--allow-glibc` | Off | Allow glibc toolchain | + +### Flag Options + +| Option | Default | Description | +|---------------------------|------------------|--------------------------------| +| `--extra-cppflags=FLAGS` | Empty | Extra preprocessor flags | +| `--extra-cflags=FLAGS` | Empty | Extra compilation flags | +| `--extra-ldflags=FLAGS` | Empty | Extra linker flags | + +### Local Path Options + +| Option | Default | Description | +|---------------------------|------------------|--------------------------------| +| `--with-local-dir=PATH` | `/usr/local` | Add `PATH/include` and `PATH/lib` | +| `--without-local-dir` | | Disable local path probing | + +### Policy Options + +| Option | Default | Description | +|------------------------------|---------|-----------------------------------| +| `--enable-fail-if-missing` | Off | Fail on missing optional probes | + +### Environment Variables + +The configure script respects standard environment variables: + +| Variable | Purpose | +|-------------|------------------------------------------------| +| `CC` | C compiler (overridden by probing if unusable) | +| `CPPFLAGS` | Preprocessor flags from environment | +| `CFLAGS` | Compilation flags from environment | +| `LDFLAGS` | Linker flags from environment | + +### Default Flags + 
+The configure script applies these base flags: + +``` +CPPFLAGS: -D_DEFAULT_SOURCE -D_XOPEN_SOURCE=700 $env $extra +CFLAGS: -O2 -g -pipe $env $extra +LDFLAGS: $env $extra +``` + +## Configure Output + +### `config.mk` + +A Make include file with feature detection results: + +```makefile +# Auto-generated by ./configure on 2026-04-05T12:00:00Z +CONF_CPPFLAGS := +CONF_LDFLAGS := +CONF_LIBS := +CONFIGURE_TIMESTAMP := 2026-04-05T12:00:00Z +CONFIGURE_HOST := x86_64-unknown-Linux +CONFIGURE_BUILD := x86_64-unknown-Linux +CONFIGURE_CC_MACHINE := x86_64-linux-musl +CONFIGURE_LIBC := musl +CROSS_COMPILING := 0 +EXEEXT := +HAVE_STDLIB_H := 1 +HAVE_COPY_FILE_RANGE := 1 +HAVE_STRLCPY := 1 +# ... (one entry per probed header/function) +CONF_CPPFLAGS += -DHAVE_STDLIB_H=1 -DHAVE_COPY_FILE_RANGE=1 ... +CONF_LIBS += -lcrypt -ldl -lpthread -lrt +``` + +### `GNUmakefile` + +The generated top-level Makefile contains: + +- Toolchain variables (`CC`, `AR`, `RANLIB`, etc.) +- Flag variables (`CPPFLAGS`, `CFLAGS`, `LDFLAGS`) +- Library variables (`CRYPTO_LIBS`, `EDITLINE_LIBS`) +- Path variables (`PREFIX`, `BINDIR`, `MONO_BUILDDIR`, `MONO_OUTDIR`) +- Subdirectory list (`SUBDIRS := cat chflags chmod cp ...`) +- Build, clean, test, install, and utility targets + +### `build/configure/config.log` + +Detailed log of every compiler test and probe result. Useful for debugging +configure failures: + +``` +$ musl-clang -x c build/configure/conftest.c -o build/configure/conftest +$ build/configure/conftest +checking for sys/acl.h... no +checking for copy_file_range... 
yes +``` + +## Makefile Targets + +### Build Targets + +```sh +make -f GNUmakefile all # Build all utilities +make -f GNUmakefile cat # Build only cat (alias for build-cat) +make -f GNUmakefile build-cat # Build only cat +make -f GNUmakefile build-ls # Build only ls +``` + +### Clean Targets + +```sh +make -f GNUmakefile clean # Remove build/ and out/ +make -f GNUmakefile clean-cat # Clean only cat's objects +make -f GNUmakefile distclean # clean + remove GNUmakefile, config.mk +make -f GNUmakefile maintainer-clean # Same as distclean +``` + +### Test Targets + +```sh +make -f GNUmakefile test # Run all test suites +make -f GNUmakefile check # Same as test +make -f GNUmakefile check-cat # Test only cat +make -f GNUmakefile check-ls # Test only ls +``` + +### Install Targets + +```sh +make -f GNUmakefile stage # Copy binaries to out/bin/ +make -f GNUmakefile install # Install to $DESTDIR$PREFIX/bin +make -f GNUmakefile install DESTDIR=/tmp/pkg # Staged install +``` + +### Information Targets + +```sh +make -f GNUmakefile status # Show output directory contents +make -f GNUmakefile list # List all utility subdirectories +make -f GNUmakefile print-config # Show compiler and flags +make -f GNUmakefile print-subdirs # List subdirectories +make -f GNUmakefile help # Show available targets +``` + +### Rebuild and Reconfigure + +```sh +make -f GNUmakefile rebuild # clean + all +make -f GNUmakefile reconfigure # Re-run ./configure +``` + +## Cross-Compilation + +### Basic Cross-Compilation + +```sh +./configure \ + --host=aarch64-linux-musl \ + --build=x86_64-linux-musl \ + --cc=aarch64-linux-musl-gcc + +make -f GNUmakefile -j$(nproc) all +``` + +### Cross-Compilation with Clang + +```sh +./configure \ + --host=aarch64-linux-musl \ + --cc="clang --target=aarch64-linux-musl --sysroot=/path/to/musl-sysroot" + +make -f GNUmakefile -j$(nproc) all +``` + +### Cross vs. 
Native Detection + +When `--host` matches `--build` (or both are auto-detected to the same +value), `REQUIRE_RUNNABLE_CC=1` and the configure script verifies the +compiler produces executables that can actually run. For cross-compilation, +only compilation (not execution) is tested. + +## Build Customization + +### Custom Compiler Flags + +```sh +# Debug build +./configure --extra-cflags="-O0 -g3 -fsanitize=address,undefined" + +# Release build +./configure --extra-cflags="-O3 -DNDEBUG -flto" --extra-ldflags="-flto" + +# With warnings +./configure --extra-cflags="-Wall -Wextra -Werror" +``` + +### Custom Install Prefix + +```sh +./configure --prefix=/opt/project-tick --bindir=/opt/project-tick/sbin +make -f GNUmakefile -j$(nproc) all +make -f GNUmakefile install +``` + +### Building Individual Utilities + +```sh +# Configure once +./configure + +# Build only what you need +make -f GNUmakefile cat ls cp mv rm mkdir +``` + +### Forcing glibc + +```sh +./configure --allow-glibc --cc=gcc +``` + +Note: The primary test target for corebinutils is musl. Building with glibc +may expose minor differences in header availability or function signatures. + +## Troubleshooting + +### "no usable compiler found" + +The configure script could not find any C compiler that: +1. Produces working executables +2. Supports `<stdatomic.h>` (C11) +3. Can run the output (native builds only) + +**Fix**: Install `musl-gcc` or `musl-clang`, or specify a compiler +explicitly with `--cc=...`. + +### "glibc toolchain detected; refusing by default" + +The detected compiler links against glibc instead of musl. + +**Fix**: Install musl development tools or pass `--allow-glibc`. + +### Missing header warnings + +The configure log (`build/configure/config.log`) shows which headers were +not found. Missing optional headers (e.g., `sys/acl.h`) disable related +features but don't prevent building. + +### Linker errors for `-lcrypt` or `-lrt` + +Some utilities use optional libraries. 
If they're not found at configure +time, the corresponding features are disabled. If you see linker errors: + +```sh +# Check what was detected +cat config.mk | grep CONF_LIBS +``` + +### Parallel build failures + +If `make -j$(nproc)` fails but `make -j1` succeeds, a subdirectory +`GNUmakefile` may have missing dependencies. File a bug report. + +### Cleaning stale state + +```sh +make -f GNUmakefile distclean +./configure +make -f GNUmakefile -j$(nproc) all +``` + +## Build Output Structure + +After a successful `make all && make stage`: + +``` +out/ +├── cat +├── chmod +├── cp +├── csh +├── date +├── dd +├── df +├── echo +├── ed +├── expr +├── hostname +├── kill +├── ln +├── ls +├── mkdir +├── mv +├── nproc +├── pax +├── ps +├── pwd +├── realpath +├── rm +├── rmdir +├── sh +├── sleep +├── sync +├── test +├── timeout +├── [ # Symlink or hardlink to test +└── bin/ # Staged binaries (copies) +``` + +## CI Integration + +For CI pipelines, use the non-interactive build sequence: + +```sh +#!/bin/sh +set -eu + +cd corebinutils/ +./configure --prefix=/usr + +# Build with parallelism hinted by configure +JOBS=$(grep -o 'JOBS_HINT := [0-9]*' GNUmakefile | cut -d' ' -f3) +make -f GNUmakefile -j"${JOBS:-1}" all + +# Run tests (SKIP is OK, failures are not) +make -f GNUmakefile test + +# Package +make -f GNUmakefile install DESTDIR="$PWD/pkg" +``` diff --git a/docs/handbook/corebinutils/cat.md b/docs/handbook/corebinutils/cat.md new file mode 100644 index 0000000000..ddc2b842c5 --- /dev/null +++ b/docs/handbook/corebinutils/cat.md @@ -0,0 +1,211 @@ +# cat — Concatenate and Display Files + +## Overview + +`cat` reads files sequentially and writes their contents to standard output. +It supports line numbering, non-printable character visualization, blank +line squeezing, and efficient in-kernel copying. + +**Source**: `cat/cat.c` (single file) +**Origin**: BSD 4.4, University of California, Berkeley +**License**: BSD-3-Clause + +## Synopsis + +``` +cat [-belnstuv] [file ...] 
+``` + +## Options + +| Flag | Long Form | Description | +|------|-----------|-------------| +| `-b` | — | Number non-blank output lines (starting at 1) | +| `-e` | — | Display `$` at end of each line (implies `-v`) | +| `-l` | — | Set exclusive advisory lock on stdout via `flock(2)` | +| `-n` | — | Number all output lines | +| `-s` | — | Squeeze multiple adjacent blank lines into one | +| `-t` | — | Display TAB as `^I` (implies `-v`) | +| `-u` | — | Disable output buffering (write immediately) | +| `-v` | — | Visualize non-printing characters using `^X` and `M-X` notation | + +## Source Analysis + +### Functions + +| Function | Purpose | +|----------|---------| +| `main()` | Option parsing via `getopt(3)`, dispatch to `scanfiles()` | +| `usage()` | Print usage message and exit | +| `scanfiles()` | Iterate over file arguments, handle `-` as stdin | +| `cook_cat()` | Process input with formatting (numbering, visualization, squeezing) | +| `raw_cat()` | Fast buffer-based copy without formatting | +| `in_kernel_copy()` | Zero-copy via `copy_file_range(2)` syscall | +| `init_casper()` | BSD Capsicum sandbox init (disabled on Linux) | +| `init_casper_net()` | BSD Casper network service (disabled on Linux) | +| `udom_open()` | Unix domain socket support (disabled on Linux) | + +### Option Processing + +```c +while ((ch = getopt(argc, argv, "belnstuv")) != -1) + switch (ch) { + case 'b': bflag = nflag = 1; break; /* implies -n */ + case 'e': eflag = vflag = 1; break; /* implies -v */ + case 'l': lflag = 1; break; + case 'n': nflag = 1; break; + case 's': sflag = 1; break; + case 't': tflag = vflag = 1; break; /* implies -v */ + case 'u': setbuf(stdout, NULL); break; + case 'v': vflag = 1; break; + default: usage(); + } +``` + +### I/O Strategy: Three Modes + +`cat` selects among three output strategies based on which flags are active: + +1. 
**`in_kernel_copy()`** — When no formatting flags are set and the output + supports `copy_file_range(2)`, data moves directly between file + descriptors inside the kernel, never entering user space. + +2. **`raw_cat()`** — When no formatting is needed but `copy_file_range` is + unavailable (e.g., stdin is a pipe). Uses an adaptive read buffer. + +3. **`cook_cat()`** — When any formatting flag (`-b`, `-e`, `-n`, `-s`, + `-t`, `-v`) is active. Processes each character individually. + +### Adaptive Buffer Sizing + +`raw_cat()` dynamically sizes its read buffer based on available physical +memory: + +```c +#define PHYSPAGES_THRESHOLD (32*1024) +#define BUFSIZE_MAX (2*1024*1024) /* 2 MB */ +#define BUFSIZE_SMALL (128*1024) /* 128 KB */ + +if (sysconf(_SC_PHYS_PAGES) > PHYSPAGES_THRESHOLD) + bsize = MIN(BUFSIZE_MAX, MAXPHYS * 8); +else + bsize = BUFSIZE_SMALL; +``` + +On systems with more than 128 MB of RAM (32K pages, assuming 4 KB pages), cat uses up +to 2 MB buffers. On constrained systems, it falls back to 128 KB. + +### In-Kernel Copy + +When possible, `cat` uses the Linux `copy_file_range(2)` syscall for +zero-copy I/O: + +```c +static int +in_kernel_copy(int from_fd, int to_fd) +{ + ssize_t ret; + + do { + ret = copy_file_range(from_fd, NULL, to_fd, NULL, SSIZE_MAX, 0); + } while (ret > 0); + + return (ret == 0) ? 0 : -1; +} +``` + +This avoids two data copies per block (kernel→user for read, +user→kernel for write) and can be significantly faster for large files. 
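
The hand-off from the in-kernel path to the buffered path can be sketched as follows. This is a minimal illustration and not the code in `cat/cat.c`: `copy_fd()` and `copy_buffered()` are hypothetical names, the fixed 64 KB buffer stands in for the adaptive sizing described above, and the set of `errno` values that trigger the fallback is an assumption.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* exposes copy_file_range(2) in glibc */
#endif
#include <errno.h>
#include <limits.h>
#include <unistd.h>

/* Hypothetical helper: plain read/write loop for descriptors the kernel path rejects. */
static int
copy_buffered(int from_fd, int to_fd)
{
	char buf[64 * 1024];
	ssize_t nr, nw, off;

	while ((nr = read(from_fd, buf, sizeof(buf))) > 0) {
		/* write(2) may be partial, so loop until the block is flushed. */
		for (off = 0; off < nr; off += nw) {
			nw = write(to_fd, buf + off, (size_t)(nr - off));
			if (nw <= 0)
				return -1;
		}
	}
	return (nr == 0) ? 0 : -1;
}

/* Hypothetical driver: try the in-kernel path, then fall back to buffering. */
static int
copy_fd(int from_fd, int to_fd)
{
	ssize_t ret;

	do {
		ret = copy_file_range(from_fd, NULL, to_fd, NULL, SSIZE_MAX, 0);
	} while (ret > 0);
	if (ret == 0)
		return 0;

	/* Assumed list of "not supported for these descriptors" errors. */
	switch (errno) {
	case EINVAL:
	case EXDEV:
	case ENOSYS:
	case EBADF:
	case EOPNOTSUPP:
		return copy_buffered(from_fd, to_fd);
	default:
		return -1;
	}
}
```

Because both calls pass `NULL` offsets, the file positions advance as data moves, so the buffered loop resumes wherever the in-kernel loop stopped.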
+ +### Character Visualization + +When `-v` is active, `cook_cat()` renders non-printable characters: + +| Character Range | Rendering | Example | +|----------------|-----------|---------| +| `0x00–0x1F` | `^@` to `^_` | `^C` for ETX | +| `0x7F` | `^?` | DEL character | +| `0x80–0x9F` | `M-^@` to `M-^_` | Meta-control | +| `0xA0–0xFE` | `M- ` to `M-~` | Meta-printable | +| `0xFF` | `M-^?` | Meta-DEL | +| TAB (`0x09`) | `^I` (with `-t`) or literal | | +| Newline | `$\n` (with `-e`) or `\n` | | + +### Locale Support + +`cat` calls `setlocale(LC_CTYPE, "")` for wide character handling. In +multibyte locales, the `-v` flag considers locale-specific printability +via `iswprint(3)`. In the C locale, only ASCII printable characters +pass through unmodified. + +### Lock Mode + +The `-l` flag acquires an exclusive advisory lock on stdout before +writing: + +```c +if (lflag) + flock(STDOUT_FILENO, LOCK_EX); +``` + +This prevents interleaved output when multiple `cat` processes write to +the same file simultaneously. The lock is held for the entire duration +of the program. + +## System Calls Used + +| Syscall | Purpose | +|---------|---------| +| `open(2)` | Open input files | +| `read(2)` | Read file data into buffer | +| `write(2)` | Write processed data to stdout | +| `copy_file_range(2)` | Zero-copy kernel-to-kernel transfer | +| `flock(2)` | Advisory locking (with `-l`) | +| `fstat(2)` | Get file type for I/O strategy selection | +| `sysconf(3)` | Query physical page count for buffer sizing | + +## BSD Features Disabled on Linux + +Several BSD-specific features are compiled out on Linux: + +- **Capsicum sandbox** (`cap_enter`, `cap_rights_limit`): The + `init_casper()` function is a no-op stub on Linux. +- **Unix domain socket reading** (`udom_open()`): BSD `cat` can read + from Unix sockets via `connect(2)`. Disabled on Linux. +- **`O_RESOLVE_BENEATH`**: BSD sandbox path resolution flag. Defined to 0 + on Linux. 
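
One common way to compile such features out is to give the missing flag a zero value and replace the missing initializer with an empty function, so call sites stay the same on both platforms. The sketch below assumes that approach; `init_casper_stub()` and `open_input()` are hypothetical names, and the bodies are illustrative rather than copied from `cat/cat.c`.

```c
#include <fcntl.h>
#include <unistd.h>

/* Linux has no O_RESOLVE_BENEATH; a zero value makes OR-ing it a no-op. */
#ifndef O_RESOLVE_BENEATH
#define O_RESOLVE_BENEATH 0
#endif

/* Illustrative stub: nothing to set up when Capsicum is unavailable. */
static void
init_casper_stub(void)
{
}

/* Hypothetical helper: the same open(2) call compiles on BSD and Linux. */
static int
open_input(const char *path)
{
	init_casper_stub();
	return open(path, O_RDONLY | O_RESOLVE_BENEATH);
}
```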
+ +## Examples + +```sh +# Concatenate files +cat file1.txt file2.txt > combined.txt + +# Number non-blank lines +cat -b source.c + +# Show invisible characters +cat -vet binary_file + +# Squeeze blank lines and number +cat -sn logfile.txt + +# Lock stdout for atomic output +cat -l data.csv >> shared_output.csv +``` + +## Exit Codes + +| Code | Meaning | +|------|---------| +| 0 | All files read successfully | +| 1 | Error opening or reading a file | + +## Edge Cases + +- Reading from stdin: When no files are specified or `-` is given, cat + reads from standard input. +- Empty files: Produce no output but are not errors. +- Binary files: Processed byte-by-byte; `-v` makes them viewable. +- Named pipes and devices: `raw_cat()` handles them with buffered reads. + `copy_file_range` is not attempted on non-regular files. diff --git a/docs/handbook/corebinutils/chmod.md b/docs/handbook/corebinutils/chmod.md new file mode 100644 index 0000000000..b5f8a22886 --- /dev/null +++ b/docs/handbook/corebinutils/chmod.md @@ -0,0 +1,296 @@ +# chmod — Change File Permissions + +## Overview + +`chmod` changes the file mode (permission) bits of specified files. It supports +both symbolic and numeric (octal) mode specifications, recursive directory +traversal, symlink handling policies, ACL awareness, and verbose operation. + +**Source**: `chmod/chmod.c`, `chmod/mode.c`, `chmod/mode.h` +**Origin**: BSD 4.4, University of California, Berkeley +**License**: BSD-3-Clause + +## Synopsis + +``` +chmod [-fhvR [-H | -L | -P]] mode file ... 
+``` + +## Options + +| Flag | Description | +|------|-------------| +| `-R` | Recursive: change files and directories recursively | +| `-H` | Follow symlinks on the command line (with `-R` only) | +| `-L` | Follow all symbolic links (with `-R` only) | +| `-P` | Do not follow symbolic links (default with `-R`) | +| `-f` | Force: suppress most error messages | +| `-h` | Affect symlinks themselves, not their targets | +| `-v` | Verbose: print changed files | +| `-vv` | Very verbose: print all files, whether changed or not | + +## Source Analysis + +### chmod.c — Main Implementation + +#### Key Functions + +| Function | Purpose | +|----------|---------| +| `main()` | Parse options via `getopt(3)`, compile mode, dispatch traversal | +| `walk_path()` | Stat a path and decide how to process it | +| `walk_dir()` | Enumerate directory contents for recursive processing | +| `apply_mode()` | Compile mode, apply via `fchmodat(2)`, report changes | +| `stat_path()` | Wrapper choosing between `stat(2)` and `lstat(2)` | +| `should_skip_acl_check()` | Cache per-filesystem ACL support detection | +| `visited_push()` / `visited_check()` | Cycle detection via device/inode tracking | +| `siginfo_handler()` | Handle SIGINFO/SIGUSR1 for progress reporting | +| `join_path()` | Safe path concatenation with separator handling | + +#### Option Processing + +```c +while ((ch = getopt(argc, argv, "HLPRfhv")) != -1) + switch (ch) { + case 'H': Hflag = 1; Lflag = 0; break; + case 'L': Lflag = 1; Hflag = 0; break; + case 'P': Hflag = Lflag = 0; break; + case 'R': Rflag = 1; break; + case 'f': fflag = 1; break; + case 'h': hflag = 1; break; + case 'v': vflag++; break; /* -v increments, -vv = 2 */ + default: usage(); + } +``` + +#### Recursive Traversal + +The `-R` flag triggers recursive directory traversal. 
`chmod` implements its +own traversal with cycle detection rather than using `fts(3)`: + +```c +static int +walk_dir(const char *dir_path, const struct chmod_options *opts) +{ + DIR *dp; + struct dirent *de; + char *child_path; + int ret = 0; + + dp = opendir(dir_path); + while ((de = readdir(dp)) != NULL) { + if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0) + continue; + child_path = join_path(dir_path, de->d_name); + ret |= walk_path(child_path, opts, false); + free(child_path); + } + closedir(dp); + return ret; +} +``` + +#### Cycle Detection + +To prevent infinite traversal through symlink loops or bind mounts, `chmod` +maintains a visited-path stack keyed on `(dev, ino)` pairs: + +```c +static int visited_check(dev_t dev, ino_t ino); /* returns 1 if seen */ +static void visited_push(dev_t dev, ino_t ino); /* record as visited */ +``` + +### mode.c — Mode Parsing Library + +#### Data Types + +```c +typedef struct { + int cmd; /* '+', '-', '=', 'X', 'u', 'g', 'o' */ + mode_t bits; /* Permission bits to modify */ + mode_t who; /* Scope mask (user/group/other/all) */ +} bitcmd_t; +``` + +#### Key Functions + +| Function | Purpose | +|----------|---------| +| `mode_compile()` | Parse mode string into array of `bitcmd_t` operations | +| `mode_apply()` | Apply compiled mode to an existing `mode_t` value | +| `mode_free()` | Free compiled mode array | +| `strmode()` | Convert `mode_t` to display string like `"drwxr-xr-x "` | +| `get_current_umask()` | Atomically read process umask | + +#### Numeric Mode Parsing + +Numeric modes are parsed as octal: + +```c +if (isdigit(*mode_string)) { + /* Parse octal: 755 → rwxr-xr-x, 0644 → rw-r--r-- */ + val = strtol(mode_string, &ep, 8); + /* Set bits directly, clearing old permission bits */ +} +``` + +#### Symbolic Mode Parsing + +Symbolic modes follow the grammar: + +``` +mode ::= clause [, clause ...] +clause ::= [who ...] [action ...] action +who ::= 'u' | 'g' | 'o' | 'a' +action ::= op [perm ...] 
+op ::= '+' | '-' | '=' +perm ::= 'r' | 'w' | 'x' | 'X' | 's' | 't' | 'u' | 'g' | 'o' +``` + +Examples: +- `u+rwx` — Add read/write/execute for user +- `go-w` — Remove write for group and other +- `a=rx` — Set all to read+execute only +- `u=g` — Copy group permissions to user +- `+X` — Add execute only if the file is a directory or already has an execute bit set +- `u+s` — Set SUID bit +- `g+s` — Set SGID bit +- `+t` — Set sticky bit + +#### The 'X' Permission + +The conditional execute permission `X` is a special case: + +```c +/* 'X' only adds execute if: + * - The file is a directory, OR + * - Any execute bit is already set */ +if (cmd == 'X') { + if (S_ISDIR(old_mode) || (old_mode & (S_IXUSR|S_IXGRP|S_IXOTH))) + ; /* apply execute bits */ +} +``` + +This is commonly used with `-R` to make directories traversable without +making regular files executable: `chmod -R u+rwX,go+rX dir/` + +#### Mode Compilation + +The `mode_compile()` function translates a mode string into an array of +`bitcmd_t` instructions that can be applied to any `mode_t`: + +```c +bitcmd_t *mode_compile(const char *mode_string); + +/* Usage: */ +bitcmd_t *set = mode_compile("u+rw,go+r"); +mode_t new_mode = mode_apply(set, old_mode); +mode_free(set); +``` + +This two-phase approach lets the mode be parsed once and applied to many +files during recursive traversal. + +#### strmode() Function + +Converts a numeric `mode_t` into a human-readable string: + +```c +char buf[12]; +strmode(0040755, buf); /* "drwxr-xr-x " → for directories */ +strmode(0100644, buf); /* "-rw-r--r-- " → for regular files */ +``` + +The output is always 11 characters (type + 9 permission chars + space), +plus the terminating NUL, hence `buf[12]`. + +### Umask Interaction + +When no scope (`u`, `g`, `o`, `a`) is specified in a symbolic mode, the +umask determines which bits are affected. 
The umask is read with all signals blocked: + +```c +static mode_t +get_current_umask(void) +{ + mode_t mask; + sigset_t set, oset; + + sigfillset(&set); + sigprocmask(SIG_BLOCK, &set, &oset); + mask = umask(0); + umask(mask); + sigprocmask(SIG_SETMASK, &oset, NULL); + return mask; +} +``` + +Signals are blocked during the read-restore cycle so that no signal +handler can run while the umask is temporarily zero. Blocking signals +does not stop other threads, which share the process umask, from +observing the temporary value. + +## System Calls Used + +| Syscall | Purpose | +|---------|---------| +| `fchmodat(2)` | Apply permission changes | +| `fstatat(2)` | Get current file mode | +| `lstat(2)` | Stat without following symlinks | +| `opendir(3)` / `readdir(3)` | Directory traversal | +| `sigaction(2)` | Install SIGINFO handler | +| `umask(2)` | Read current umask | + +## ACL Integration + +`chmod` is aware of POSIX ACLs. When changing permissions on a file with +ACLs, the ACL mask entry may need updating. The `should_skip_acl_check()` +function caches whether a filesystem supports ACLs to avoid repeated +`pathconf()` calls: + +```c +static bool +should_skip_acl_check(const char *path) +{ + /* Cache per-device ACL support to avoid pathconf() on every file */ +} +``` + +## Examples + +```sh +# Set exact permissions +chmod 755 script.sh +chmod 0644 config.txt + +# Add execute for user +chmod u+x program + +# Recursive: directories traversable, files not executable +chmod -R u+rwX,go+rX project/ + +# Remove write for everyone except owner +chmod go-w important.txt + +# Copy group permissions to other +chmod o=g shared_file + +# Set SUID +chmod u+s /usr/local/bin/helper + +# Verbose mode +chmod -Rv 755 bin/ +``` + +## Exit Codes + +| Code | Meaning | +|------|---------| +| 0 | All files changed successfully | +| 1 | Error changing one or more files (partial failure) | + +## Differences from GNU chmod + +- No `--reference=FILE` option +- No `--changes` (use `-v`) +- `-h` flag affects symlinks (GNU uses `--no-dereference`) +- `-vv` for very verbose (GNU only has one `-v` level) +- ACL awareness is 
filesystem-dependent +- Mode compiler supports `u=g` (copy from group to user) diff --git a/docs/handbook/corebinutils/code-style.md b/docs/handbook/corebinutils/code-style.md new file mode 100644 index 0000000000..2461903725 --- /dev/null +++ b/docs/handbook/corebinutils/code-style.md @@ -0,0 +1,351 @@ +# Code Style — Corebinutils Conventions + +## Overview + +The corebinutils codebase follows FreeBSD kernel style (KNF) with +Linux-specific adaptations. This document catalogs the coding +conventions observed across all utilities. + +## File Organization + +### Standard File Layout + +```c +/*- + * SPDX-License-Identifier: BSD-3-Clause + * + * Copyright (c) YYYY Project Tick + * Copyright (c) YYYY The Regents of the University of California. + * All rights reserved. + * + * Redistribution notice ... + */ + +/* System headers (alphabetical) */ +#include <sys/types.h> +#include <sys/stat.h> +#include <errno.h> +#include <stdio.h> +#include <stdlib.h> +#include <string.h> +#include <unistd.h> + +/* Local headers */ +#include "utility.h" + +/* Macros */ +#define BUFSIZE (128 * 1024) + +/* Types and structs */ +struct options { ... }; + +/* Static function prototypes */ +static void usage(void); + +/* Static globals */ +static const char *progname; + +/* Functions (main last or first, utility-dependent) */ +``` + +### Header Guard Style + +```c +/* No #pragma once — uses traditional guards */ +#ifndef _DD_H_ +#define _DD_H_ +/* ... 
*/ +#endif /* !_DD_H_ */ +``` + +## Naming Conventions + +### Functions + +- **Lowercase with underscores**: `parse_args()`, `copy_file_data()` +- **Static for file-scope**: All non-`main` functions are `static` unless + needed by other translation units +- **Verb-first**: `read_file()`, `write_output()`, `parse_duration()` +- **Predicate prefix**: `is_directory()`, `should_recurse()`, `has_flag()` + +### Variables + +- **Short names in small scopes**: `n`, `p`, `ch`, `sb`, `dp` +- **Descriptive names in structs**: `suppress_newline`, `follow_mode` +- **Constants as macros**: `BUFSIZE`, `EXIT_TIMEOUT`, `COMMLEN` +- **Global flags**: Single-word or abbreviated: `verbose`, `force`, `rflag` + +### Struct Naming + +```c +/* Tagged structs (no typedef for most) */ +struct options { ... }; +struct mount_entry { ... }; + +/* Typedefs only for opaque or complex types */ +typedef struct line line_t; +typedef struct { ... } bitcmd_t; +typedef regex_t pattern_t; +``` + +## Option Processing + +### getopt(3) Pattern + +```c +while ((ch = getopt(argc, argv, "fhilnRsvwx")) != -1) { + switch (ch) { + case 'f': + opts.force = true; + break; + case 'v': + opts.verbose = true; + break; + /* ... */ + default: + usage(); + } +} +argc -= optind; +argv += optind; +``` + +### getopt_long(3) Pattern + +```c +static const struct option long_options[] = { + {"color", optional_argument, NULL, 'G'}, + {"group-directories-first", no_argument, NULL, OPT_GROUPDIRS}, + {NULL, 0, NULL, 0}, +}; + +while ((ch = getopt_long(argc, argv, optstring, + long_options, NULL)) != -1) { ... 
} +``` + +### Manual Parsing (echo) + +```c +/* When getopt is too heavy */ +while (*argv && strcmp(*argv, "-n") == 0) { + suppress_newline = true; + argv++; +} +``` + +## Error Handling + +### BSD err(3) Family + +```c +#include <err.h> + +err(1, "open '%s'", path); /* perror-style with exit */ +errx(2, "invalid mode: %s", s); /* No errno, with exit */ +warn("stat '%s'", path); /* perror-style, no exit */ +warnx("skipping '%s'", path); /* No errno, no exit */ +``` + +### Custom Error Functions + +Many utilities define their own for consistency: + +```c +static void +error_errno(const char *fmt, ...) +{ + int saved = errno; + va_list ap; + fprintf(stderr, "%s: ", progname); + va_start(ap, fmt); + vfprintf(stderr, fmt, ap); + va_end(ap); + fprintf(stderr, ": %s\n", strerror(saved)); +} + +static void +error_msg(const char *fmt, ...) +{ + va_list ap; + fprintf(stderr, "%s: ", progname); + va_start(ap, fmt); + vfprintf(stderr, fmt, ap); + va_end(ap); + fputc('\n', stderr); +} +``` + +### Usage Functions + +```c +static void __dead2 /* noreturn attribute */ +usage(void) +{ + fprintf(stderr, "usage: %s [-fiv] source target\n", progname); + exit(2); +} +``` + +## Memory Management + +### Dynamic Allocation Patterns + +```c +/* Always check allocation */ +char *buf = malloc(size); +if (buf == NULL) + err(1, "malloc"); + +/* strdup with check */ +char *copy = strdup(str); +if (copy == NULL) + err(1, "strdup"); + +/* Adaptive buffer sizing */ +size_t bufsize = BUFSIZE_MAX; +while (bufsize >= BUFSIZE_MIN) { + buf = malloc(bufsize); + if (buf) break; + bufsize /= 2; +} +``` + +### No Global malloc/free Tracking + +Utilities that process-exit after completion do not free final +allocations — the OS reclaims all memory. Early-exit utilities +(cat, echo, pwd) rely on this. 
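
The halving loop shown under Dynamic Allocation Patterns could be factored into a small helper. The sketch below is one possible shape, not code from the tree: `alloc_adaptive()` and its parameters are hypothetical, and it returns `NULL` instead of calling `err(3)` so the caller decides how to fail.

```c
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical helper: try max_size first and halve on failure, stopping
 * once the size would drop below min_size. On success the size actually
 * obtained is stored in *out_size.
 */
static void *
alloc_adaptive(size_t max_size, size_t min_size, size_t *out_size)
{
	size_t size;
	void *buf;

	/* Reject ranges that would never terminate or never allocate. */
	if (min_size == 0 || max_size < min_size)
		return NULL;

	for (size = max_size; size >= min_size; size /= 2) {
		buf = malloc(size);
		if (buf != NULL) {
			*out_size = size;
			return buf;
		}
	}
	return NULL;
}
```

A caller would typically follow a `NULL` return with `err(1, "malloc")`, matching the allocation checks above.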
+ +## Portability Patterns + +### Conditional Compilation + +```c +/* Feature detection from configure */ +#ifdef HAVE_SYS_XATTR_H +#include <sys/xattr.h> +#endif + +/* BSD vs Linux */ +#ifdef __linux__ + /* Linux-specific path */ +#else + /* BSD fallback (rarely used) */ +#endif + +/* musl compatibility */ +#ifndef STAILQ_HEAD +#define STAILQ_HEAD(name, type) ... +#endif +``` + +### Inline Syscall Wrappers + +```c +/* For syscalls not in musl headers */ +static int +linux_statx(int dirfd, const char *path, int flags, + unsigned int mask, struct statx *stx) +{ + return syscall(__NR_statx, dirfd, path, flags, mask, stx); +} +``` + +## Formatting + +### Indentation + +- **Tabs** for indentation (KNF style) +- **8-space tab stops** (standard) +- Continuation lines indented 4 spaces from operator + +### Braces + +```c +/* K&R for functions */ +static void +function_name(int arg) +{ + /* body */ +} + +/* Same-line for control flow */ +if (condition) { + /* body */ +} else { + /* body */ +} + +/* No braces for single statements */ +if (error) + return -1; +``` + +### Line Length + +- Target 80 columns +- Long function signatures wrap at parameter boundaries +- Long strings use concatenation + +### Switch Statements + +```c +switch (ch) { +case 'f': + force = true; + break; +case 'v': + verbose++; + break; +default: + usage(); + /* NOTREACHED */ +} +``` + +## Common Macros + +```c +/* Array size */ +#define nitems(x) (sizeof(x) / sizeof((x)[0])) + +/* Min/Max */ +#ifndef MIN +#define MIN(a, b) ((a) < (b) ? 
(a) : (b)) +#endif + +/* Noreturn */ +#define __dead2 __attribute__((__noreturn__)) + +/* Unused parameter */ +#define __unused __attribute__((__unused__)) +``` + +## Signal Handling Conventions + +```c +/* Volatile sig_atomic_t for signal flags */ +static volatile sig_atomic_t info_requested; + +/* Minimal signal handlers (set flag only) */ +static void +handler(int sig) +{ + (void)sig; + info_requested = 1; +} + +/* Check flag in main loop */ +if (info_requested) { + report_progress(); + info_requested = 0; +} +``` + +## Build System Conventions + +- Per-utility `GNUmakefile` is the source of truth +- Top-level `GNUmakefile` generated by `configure` +- All object files go to `build/<utility>/` +- Final binaries go to `out/bin/` +- `CFLAGS` include `-Wall -Wextra -Werror` by default diff --git a/docs/handbook/corebinutils/cp.md b/docs/handbook/corebinutils/cp.md new file mode 100644 index 0000000000..b15bb01a2d --- /dev/null +++ b/docs/handbook/corebinutils/cp.md @@ -0,0 +1,270 @@ +# cp — Copy Files and Directories + +## Overview + +`cp` copies files and directory trees. It supports recursive copying, archive +mode (preserving metadata), symlink handling policies, sparse file detection, +and interactive/forced overwrite modes. + +**Source**: `cp/cp.c`, `cp/utils.c`, `cp/extern.h`, `cp/fts.c`, `cp/fts.h` +**Origin**: BSD 4.4, University of California, Berkeley +**License**: BSD-3-Clause + +## Synopsis + +``` +cp [-HLPRafilnpsvx] [--sort] [-N mode] source_file target_file +cp [-HLPRafilnpsvx] [--sort] [-N mode] source_file ... 
target_directory +``` + +## Options + +| Flag | Description | +|------|-------------| +| `-R` / `-r` | Recursive: copy directories and their contents | +| `-a` | Archive mode: equivalent to `-R -P -p` | +| `-f` | Force: remove existing target before copying | +| `-i` | Interactive: prompt before overwriting | +| `-l` | Create hard links instead of copying | +| `-n` | No-clobber: do not overwrite existing files | +| `-p` | Preserve: maintain mode, ownership, timestamps | +| `-s` | Create symbolic links instead of copying | +| `-v` | Verbose: print each file as it is copied | +| `-x` | One-filesystem: do not cross mount points | +| `-H` | Follow symlinks on command line (with `-R`) | +| `-L` | Follow all symbolic links (with `-R`) | +| `-P` | Do not follow symbolic links (default with `-R`) | +| `--sort` | Sort entries numerically during recursive copy | +| `-N mode` | Apply negated permissions to regular files | + +## Source Analysis + +### cp.c — Main Logic + +#### Key Data Structures + +```c +typedef struct { + char *p_end; /* Pointer to NULL at end of path */ + char *target_end; /* Pointer to end of target base */ + char p_path[PATH_MAX]; /* Current target path buffer */ + int p_fd; /* Directory file descriptor */ +} PATH_T; + +struct options { + bool recursive; + bool force; + bool interactive; + bool no_clobber; + bool preserve; + bool hard_link; + bool symbolic_link; + bool verbose; + bool one_filesystem; + /* ... */ +}; +``` + +#### Key Functions + +| Function | Purpose | +|----------|---------| +| `main()` | Parse options, stat destination, determine copy mode | +| `copy()` | Main recursive copy driver using FTS traversal | +| `ftscmp()` | qsort comparator for `--sort` numeric ordering | +| `local_strlcpy()` | Portability wrapper for `strlcpy` | +| `local_asprintf()` | Portability wrapper for `asprintf` | + +#### Copy Mode Detection + +`cp` determines the operation mode from the arguments: + +```c +/* Three cases: + * 1. 
cp file1 file2 → file-to-file copy + * 2. cp file1 file2 dir/ → files into directory + * 3. cp -R dir1 dir2 → directory to new directory + */ +if (stat(target, &sb) == 0 && S_ISDIR(sb.st_mode)) + type = DIR_TO_DIR; +else if (argc == 2) + type = FILE_TO_FILE; +else + usage(); /* Multiple sources require directory target */ +``` + +### utils.c — Copy Engine + +#### Adaptive Buffer Sizing + +Like `cat`, `cp` adapts its I/O buffer to available memory: + +```c +#define PHYSPAGES_THRESHOLD (32*1024) +#define BUFSIZE_MAX (2*1024*1024) +#define BUFSIZE_SMALL (MAXPHYS) /* 128 KB */ + +static ssize_t +copy_fallback(int from_fd, int to_fd) +{ + if (buf == NULL) { + if (sysconf(_SC_PHYS_PAGES) > PHYSPAGES_THRESHOLD) + bufsize = MIN(BUFSIZE_MAX, MAXPHYS * 8); + else + bufsize = BUFSIZE_SMALL; + buf = malloc(bufsize); + } + /* read/write loop */ +} +``` + +#### Key Functions in utils.c + +| Function | Purpose | +|----------|---------| +| `copy_fallback()` | Buffer-based file copy with adaptive sizing | +| `copy_file()` | Copy regular file, potentially using `copy_file_range(2)` | +| `copy_link()` | Copy symbolic link (read target, create new symlink) | +| `copy_fifo()` | Copy FIFO via `mkfifo(2)` | +| `copy_special()` | Copy device nodes via `mknod(2)` | +| `setfile()` | Set timestamps, ownership, permissions on target | +| `preserve_fd_acls()` | Copy POSIX ACLs between file descriptors | + +### FTS Traversal + +`cp -R` uses an in-tree FTS (File Traversal Stream) implementation: + +```c +FTS *ftsp; +FTSENT *curr; +int fts_options = FTS_NOCHDIR | FTS_PHYSICAL; + +/* -L switches from physical to logical traversal */ +if (Lflag) { + fts_options &= ~FTS_PHYSICAL; + fts_options |= FTS_LOGICAL; +} + +ftsp = fts_open(argv, fts_options, NULL); +while ((curr = fts_read(ftsp)) != NULL) { + switch (curr->fts_info) { + case FTS_D: /* Directory pre-visit */ + mkdir(target_path, curr->fts_statp->st_mode); + break; + case FTS_F: /* Regular file */ + copy_file(curr->fts_path, target_path); + break; + case FTS_SL: /* Symbolic link */ + 
copy_link(curr->fts_path, target_path); + break; + case FTS_DP: /* Directory post-visit */ + setfile(curr->fts_statp, target_path); + break; + } +} +``` + +### Symlink Handling Modes + +| Mode | Flag | Behavior | +|------|------|----------| +| Physical | `-P` (default) | Copy symlinks as symlinks | +| Command-line follow | `-H` | Follow symlinks named on command line | +| Logical | `-L` | Follow all symlinks, copy targets | + +### Archive Mode + +The `-a` flag combines three flags for complete archival: + +```sh +cp -a source/ dest/ +# Equivalent to: +cp -R -P -p source/ dest/ +``` + +- `-R` — Recursive copy +- `-P` — Don't follow symlinks (preserve them as-is) +- `-p` — Preserve timestamps, ownership, and permissions + +### Metadata Preservation (`-p`) + +When `-p` is specified, `cp` preserves: + +| Metadata | System Call | +|----------|-------------| +| Access time | `utimensat(2)` | +| Modification time | `utimensat(2)` | +| File mode | `fchmod(2)` / `chmod(2)` | +| Owner/group | `fchown(2)` / `lchown(2)` | +| ACLs | `acl_get_fd()` / `acl_set_fd()` (if available) | + +### Cycle Detection + +During recursive copy, `cp` tracks visited directories by `(dev, ino)` +pairs to detect filesystem cycles created by symlinks or bind mounts: + +```c +/* If we've already visited this inode on this device, skip it */ +if (cycle_check(curr->fts_statp->st_dev, curr->fts_statp->st_ino)) { + warnx("%s: directory causes a cycle", curr->fts_path); + fts_set(ftsp, curr, FTS_SKIP); + continue; +} +``` + +## System Calls Used + +| Syscall | Purpose | +|---------|---------| +| `open(2)` | Open source and target files | +| `read(2)` / `write(2)` | Buffer-based data copy | +| `copy_file_range(2)` | Zero-copy in-kernel transfer | +| `mkdir(2)` | Create target directories | +| `mkfifo(2)` | Create FIFO copies | +| `mknod(2)` | Create device node copies | +| `symlink(2)` | Create symbolic links | +| `link(2)` | Create hard links (with `-l`) | +| `readlink(2)` | Read symlink target | +| 
`fchmod(2)` | Set permissions on target | +| `fchown(2)` | Set ownership on target | +| `utimensat(2)` | Set timestamps on target | +| `fstat(2)` | Check file type and metadata | + +## Examples + +```sh +# Simple file copy +cp source.txt dest.txt + +# Recursive directory copy +cp -R src/ dest/ + +# Archive mode (preserve everything) +cp -a project/ backup/project/ + +# Interactive overwrite +cp -i newfile.txt existing.txt + +# Create hard links instead of copies +cp -l large_file.dat link_to_large_file.dat + +# Don't cross filesystem boundaries +cp -Rx /home/user/ /backup/home/user/ + +# Verbose recursive copy +cp -Rv config/ /etc/myapp/ +``` + +## Exit Codes + +| Code | Meaning | +|------|---------| +| 0 | All files copied successfully | +| 1 | Error copying one or more files | + +## Differences from GNU cp + +- No `--reflink` option for CoW copies +- No `--sparse=auto/always/never` (sparse handling is automatic) +- `--sort` flag for sorted recursive output (not in GNU) +- `-N` flag for negated permissions (not in GNU) +- Uses in-tree FTS instead of gnulib +- No SELinux context preservation (use `--preserve=context` in GNU) diff --git a/docs/handbook/corebinutils/date.md b/docs/handbook/corebinutils/date.md new file mode 100644 index 0000000000..d498f406a5 --- /dev/null +++ b/docs/handbook/corebinutils/date.md @@ -0,0 +1,352 @@ +# date — Display and Set System Date + +## Overview + +`date` displays the current date and time, or sets the system clock. It +supports strftime-based format strings, ISO 8601 output, RFC 2822 output, +timezone overrides, date arithmetic via "vary" adjustments, and input +date parsing. 
+ +**Source**: `date/date.c`, `date/vary.c`, `date/vary.h` +**Origin**: BSD 4.4, University of California, Berkeley +**License**: BSD-3-Clause / BSD-2-Clause (vary.c) + +## Synopsis + +``` +date [-jnRu] [-r seconds | filename] [-I[date|hours|minutes|seconds|ns]] + [-f input_fmt] [-v [+|-]val[ymwdHMS]] [-z output_zone] + [+output_format] [[[[[[cc]yy]mm]dd]HH]MM[.ss]] +``` + +## Options + +| Flag | Description | +|------|-------------| +| `-j` | Do not try to set the system clock | +| `-n` | Same as `-j` (compatibility) | +| `-R` | RFC 2822 format output | +| `-u` | Use UTC instead of local time | +| `-r seconds` | Display time from epoch seconds | +| `-r filename` | Display modification time of file | +| `-I[precision]` | ISO 8601 format output | +| `-f input_fmt` | Parse input date using strptime format | +| `-v adjustment` | Adjust date components (can be repeated) | +| `-z timezone` | Use specified timezone for output | + +## Source Analysis + +### date.c — Main Implementation + +#### Key Data Structures + +```c +struct iso8601_fmt { + const char *refname; /* "date", "hours", "minutes", etc. */ + const char *format_string; /* strftime format */ + bool include_zone; /* Whether to append timezone */ +}; + +struct strbuf { + char *data; + size_t len; + size_t cap; +}; + +struct options { + const char *input_format; /* -f format string */ + const char *output_zone; /* -z timezone */ + const char *reference_arg; /* -r argument */ + const char *time_operand; /* MMDDhhmm... 
or parsed date */ + const char *format_operand; /* +format string */ + struct vary *vary_chain; /* -v adjustments */ + const struct iso8601_fmt *iso8601_selected; + bool no_set; /* -j flag */ + bool rfc2822; /* -R flag */ + bool use_utc; /* -u flag */ +}; +``` + +#### ISO 8601 Formats + +```c +static const struct iso8601_fmt iso8601_fmts[] = { + { "date", "%Y-%m-%d", false }, + { "hours", "%Y-%m-%dT%H", true }, + { "minutes", "%Y-%m-%dT%H:%M", true }, + { "seconds", "%Y-%m-%dT%H:%M:%S", true }, + { "ns", "%Y-%m-%dT%H:%M:%S,%N", true }, +}; +``` + +#### Key Functions + +| Function | Purpose | +|----------|---------| +| `main()` | Parse options, resolve time, format output | +| `parse_args()` | Option-by-option argument processing | +| `validate_options()` | Check for conflicting options | +| `set_timezone_or_die()` | Apply timezone via `setenv("TZ", ...)` | +| `read_reference_time()` | Get time from `-r` argument (epoch or file mtime) | +| `read_current_time()` | Get current time via `clock_gettime(2)` | +| `parse_legacy_time()` | Parse `[[[[[cc]yy]mm]dd]HH]MM[.ss]` format | +| `parse_formatted_time()` | Parse via `strptime(3)` with `-f` format | +| `parse_time_operand()` | Dispatch to legacy or formatted parser | +| `set_system_time()` | Set clock via `clock_settime(2)` | +| `apply_variations()` | Apply `-v` adjustments to broken-down time | +| `expand_format_string()` | Expand `%N` (nanoseconds) in format strings | +| `render_format()` | Format via `strftime(3)` with extensions | +| `render_iso8601()` | Generate ISO 8601 output with timezone | +| `render_numeric_timezone()` | Format `+HHMM` timezone offset | +| `print_line_and_exit()` | Write output and exit | + +#### Main Flow + +```c +int main(int argc, char **argv) +{ + parse_args(argc, argv, &options); + validate_options(&options); + setlocale(LC_TIME, ""); + + if (options.use_utc) + set_timezone_or_die("UTC0", "TZ=UTC0"); + + if (options.reference_arg != NULL) + read_reference_time(options.reference_arg, &ts); 
+ else + read_current_time(&ts, &resolution); + + if (options.time_operand != NULL) { + parse_time_operand(&options, &ts, &ts); + if (!options.no_set) + set_system_time(&ts); + } + + localtime_or_die(ts.tv_sec, &tm); + apply_variations(&options, &tm); + + /* Render output based on -I, -R, or +format */ + output = render_format(format, &tm, ts.tv_nsec, resolution.tv_nsec); + print_line_and_exit(output); +} +``` + +#### String Buffer Implementation + +`date.c` includes a custom growable string buffer for format expansion: + +```c +static void strbuf_init(struct strbuf *buf); +static void strbuf_reserve(struct strbuf *buf, size_t extra); +static void strbuf_append_mem(struct strbuf *buf, const char *data, size_t len); +static void strbuf_append_char(struct strbuf *buf, char ch); +static void strbuf_append_str(struct strbuf *buf, const char *text); +static char *strbuf_finish(struct strbuf *buf); +``` + +#### Nanosecond Format Extension + +The `%N` format specifier (not in standard `strftime`) is expanded +manually before passing to `strftime(3)`: + +```c +static void +append_nsec_digits(struct strbuf *buf, const char *pending, size_t len, + long nsec, long resolution) +{ + /* Format nanoseconds with appropriate precision based on resolution */ +} +``` + +### vary.c — Date Arithmetic + +The `-v` flag enables relative date adjustments. Multiple `-v` flags can +be chained to build complex date expressions. 
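
A single `-v` argument of the form `[+|-]val[ymwdHMS]` could be split into its parts along these lines. This is a sketch under stated assumptions, not the parser in `date/vary.c`: `struct adjustment` and `parse_adjustment()` are hypothetical names, the unit letter is treated as mandatory, and named values such as weekdays or months are not handled.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parsed form of one -v argument such as "+1d" or "-3m". */
struct adjustment {
	char	sign;	/* '+', '-', or 0 when the value is absolute */
	int64_t	value;	/* magnitude, never negative */
	char	unit;	/* one of y m w d H M S */
};

/* Returns 0 on success, -1 if text is not [+|-]digits followed by one unit. */
static int
parse_adjustment(const char *text, struct adjustment *adj)
{
	char *end;
	long long value;

	adj->sign = 0;
	if (*text == '+' || *text == '-')
		adj->sign = *text++;

	/* Require a digit here: strtoll alone would accept spaces and signs. */
	if (*text < '0' || *text > '9')
		return -1;

	errno = 0;
	value = strtoll(text, &end, 10);
	if (errno != 0 || end[0] == '\0' || end[1] != '\0')
		return -1;
	if (strchr("ymwdHMS", end[0]) == NULL)
		return -1;

	adj->value = value;
	adj->unit = end[0];
	return 0;
}
```

Keeping the sign separate from the magnitude lets a caller tell a relative adjustment (`+1d`) from an absolute one (`1d`).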
+ +#### Adjustment Types + +| Code | Unit | Example | +|------|------|---------| +| `y` | Years | `-v +1y` (next year) | +| `m` | Months | `-v -3m` (3 months ago) | +| `w` | Weeks | `-v +2w` (2 weeks forward) | +| `d` | Days | `-v +1d` (tomorrow) | +| `H` | Hours | `-v -6H` (6 hours ago) | +| `M` | Minutes | `-v +30M` (30 minutes forward) | +| `S` | Seconds | `-v -10S` (10 seconds ago) | + +#### Named Values + +Month names and weekday names can be used with `=`: + +```sh +date -v =monday # Next Monday +date -v =january # Set month to January +``` + +#### Implementation + +```c +struct trans { + int64_t value; + const char *name; +}; + +static const struct trans trans_mon[] = { + { 1, "january" }, { 2, "february" }, { 3, "march" }, + { 4, "april" }, { 5, "may" }, { 6, "june" }, + { 7, "july" }, { 8, "august" }, { 9, "september" }, + { 10, "october" },{ 11, "november" }, { 12, "december" }, + { -1, NULL } +}; + +static const struct trans trans_wday[] = { + { 0, "sunday" }, { 1, "monday" }, { 2, "tuesday" }, + { 3, "wednesday" },{ 4, "thursday" },{ 5, "friday" }, + { 6, "saturday" }, { -1, NULL } +}; +``` + +The `vary_apply()` function processes each adjustment in the chain, +calling specific adjuster functions: + +```c +static int adjyear(struct tm *tm, char type, int64_t value, bool normalize); +static int adjmon(struct tm *tm, char type, int64_t value, bool is_text, bool normalize); +static int adjday(struct tm *tm, char type, int64_t value, bool normalize); +static int adjwday(struct tm *tm, char type, int64_t value, bool is_text, bool normalize); +static int adjhour(struct tm *tm, char type, int64_t value, bool normalize); +static int adjmin(struct tm *tm, char type, int64_t value, bool normalize); +static int adjsec(struct tm *tm, char type, int64_t value, bool normalize); +``` + +Each adjuster modifies the broken-down `struct tm` and calls +`normalize_tm()` to fix rolled-over fields via `mktime(3)`. 
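
The normalization step can be illustrated in isolation. The sketch below is not taken from `vary.c`: `add_days_normalized()` is a hypothetical helper, and it only shows how `mktime(3)` folds an out-of-range `tm_mday` back into valid month, day, and weekday fields, which is the behavior the adjusters are described as relying on.

```c
#include <time.h>

/*
 * Hypothetical helper (not part of vary.c): add `days` to a broken-down
 * local time and let mktime(3) renormalize the fields.
 * Returns 0 on success, -1 if mktime() cannot represent the result.
 */
int
add_days_normalized(struct tm *tm, int days)
{
    tm->tm_mday += days;   /* may now be out of range, e.g. 32 */
    tm->tm_isdst = -1;     /* let the C library determine DST */

    /* mktime() rewrites tm_mon, tm_mday, tm_wday, tm_yday in place. */
    if (mktime(tm) == (time_t)-1)
        return -1;
    return 0;
}
```

Starting from 31 January 2026 at noon and adding one day, the structure ends up with `tm_mon == 1` and `tm_mday == 1`, that is, 1 February.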
+ +### Timezone Handling + +```c +static void +set_timezone_or_die(const char *tz_value, const char *what) +{ + if (setenv("TZ", tz_value, 1) != 0) + die_errno("setenv %s", what); + tzset(); +} +``` + +The `-u` flag sets `TZ=UTC0`. The `-z` flag sets `TZ` to the specified +value only for output formatting (input parsing uses the original timezone). + +### Legacy Time Format + +The BSD legacy format `[[[[[cc]yy]mm]dd]HH]MM[.ss]` is parsed +right-to-left: + +```c +static void +parse_legacy_time(const char *text, const struct timespec *base, struct timespec *ts) +{ + /* Parse from rightmost position: + * 1. [.ss] - optional seconds + * 2. MM - minutes (required) + * 3. HH - hours + * 4. dd - day + * 5. mm - month + * 6. [cc]yy - year + */ +} +``` + +## System Calls Used + +| Syscall | Purpose | +|---------|---------| +| `clock_gettime(2)` | Read current time with nanosecond precision | +| `clock_settime(2)` | Set system clock (requires root) | +| `stat(2)` | Get file modification time for `-r filename` | +| `setenv(3)` | Set `TZ` environment variable | +| `strftime(3)` | Format broken-down time | +| `strptime(3)` | Parse time from formatted string | +| `mktime(3)` | Normalize broken-down time | +| `localtime_r(3)` | Thread-safe time conversion | + +## Format Strings + +`date` supports all `strftime(3)` format specifiers plus: + +| Specifier | Meaning | +|-----------|---------| +| `%N` | Nanoseconds (extension, expanded before strftime) | +| `%+` | Default format (equivalent to `%a %b %e %T %Z %Y`) | + +Common `strftime` specifiers: + +| Specifier | Output | +|-----------|--------| +| `%Y` | 4-digit year (2026) | +| `%m` | Month (01-12) | +| `%d` | Day (01-31) | +| `%H` | Hour (00-23) | +| `%M` | Minute (00-59) | +| `%S` | Second (00-60) | +| `%T` | Time as `%H:%M:%S` | +| `%Z` | Timezone abbreviation | +| `%z` | Numeric timezone (`+0000`) | +| `%s` | Epoch seconds | +| `%a` | Abbreviated weekday | +| `%b` | Abbreviated month | + +## Examples + +```sh +# Default output 
+date +# → Sun Apr  5 14:30:00 UTC 2026 + +# Custom format +date "+%Y-%m-%d %H:%M:%S" +# → 2026-04-05 14:30:00 + +# ISO 8601 +date -Iseconds +# → 2026-04-05T14:30:00+00:00 + +# RFC 2822 +date -R +# → Sun, 05 Apr 2026 14:30:00 +0000 + +# UTC +date -u + +# Epoch seconds +date +%s +# → 1775399400 + +# Date arithmetic: tomorrow +date -v +1d + +# Date arithmetic: last Monday +date -v -monday + +# Date arithmetic: 3 months from now, at midnight +date -v +3m -v 0H -v 0M -v 0S + +# Parse input format +date -f "%Y%m%d" "20260405" "+%A, %B %d" +# → Sunday, April 05 + +# Display file modification time +date -r /etc/passwd + +# Display epoch time +date -r 1775399400 +``` + +## Exit Codes + +| Code | Meaning | +|------|---------| +| 0 | Success | +| 1 | Error (invalid format, failed to set time, etc.) | diff --git a/docs/handbook/corebinutils/dd.md b/docs/handbook/corebinutils/dd.md new file mode 100644 index 0000000000..df0108c231 --- /dev/null +++ b/docs/handbook/corebinutils/dd.md @@ -0,0 +1,407 @@ +# dd — Data Duplicator + +## Overview + +`dd` copies and optionally converts data between files or devices. It operates +at the block level with configurable input/output block sizes, supports +ASCII/EBCDIC conversion, case conversion, byte swapping, sparse output, +speed throttling, and real-time progress reporting. + +**Source**: `dd/dd.c`, `dd/dd.h`, `dd/extern.h`, `dd/args.c`, `dd/conv.c`, +`dd/conv_tab.c`, `dd/gen.c`, `dd/misc.c`, `dd/position.c` +**Origin**: BSD 4.4, Keith Muller / Lance Visser +**License**: BSD-3-Clause + +## Synopsis + +``` +dd [operand=value ...] 
+``` + +## Operands + +`dd` uses a unique JCL-style syntax (not `getopt`): + +| Operand | Description | Default | +|---------|-------------|---------| +| `if=file` | Input file | stdin | +| `of=file` | Output file | stdout | +| `bs=n` | Block size (sets both ibs and obs) | 512 | +| `ibs=n` | Input block size | 512 | +| `obs=n` | Output block size | 512 | +| `cbs=n` | Conversion block size | — | +| `count=n` | Number of input blocks to copy | All | +| `skip=n` / `iseek=n` | Skip n input blocks | 0 | +| `seek=n` / `oseek=n` | Seek n output blocks | 0 | +| `files=n` | Copy n input files (tape only) | 1 | +| `fillchar=c` | Fill character for sync padding | NUL/space | +| `speed=n` | Maximum bytes per second | Unlimited | +| `status=value` | Progress reporting mode | — | + +### Size Suffixes + +Numeric values accept multiplier suffixes: + +| Suffix | Multiplier | +|--------|-----------| +| `b` | 512 | +| `k` | 1024 | +| `m` | 1024² (1,048,576) | +| `g` | 1024³ (1,073,741,824) | +| `t` | 1024⁴ | +| `w` | `sizeof(int)` | +| `x` | Multiplication (e.g., `2x512` = 1024) | + +### conv= Options + +| Conversion | Description | +|------------|-------------| +| `ascii` | EBCDIC to ASCII | +| `ebcdic` | ASCII to EBCDIC | +| `ibm` | ASCII to IBM EBCDIC | +| `block` | Newline-terminated to fixed-length records | +| `unblock` | Fixed-length records to newline-terminated | +| `lcase` | Convert to lowercase | +| `ucase` | Convert to uppercase | +| `swab` | Swap every pair of bytes | +| `noerror` | Continue after read errors | +| `notrunc` | Don't truncate output file | +| `sync` | Pad input blocks to ibs with NULs/spaces | +| `sparse` | Seek over zero-filled output blocks | +| `fsync` | Physically write output data and metadata | +| `fdatasync` | Physically write output data | +| `pareven` | Set even parity on output | +| `parodd` | Set odd parity on output | +| `parnone` | Strip parity from input | +| `parset` | Set parity bit on output | + +### iflag= and oflag= Options + +| Flag | 
Description | +|------|-------------| +| `direct` | Use `O_DIRECT` for direct I/O | +| `fullblock` | Accumulate full input blocks | + +### status= Values + +| Value | Description | +|-------|-------------| +| `noxfer` | Suppress transfer statistics | +| `none` | Suppress everything | +| `progress` | Print periodic progress via SIGALRM | + +## Source Architecture + +### File Responsibilities + +| File | Purpose | +|------|---------| +| `dd.c` | Main control flow: `main()`, `setup()`, `dd_in()`, `dd_close()` | +| `dd.h` | Shared types: `IO`, `STAT`, conversion flags (40+ bit flags) | +| `extern.h` | External function declarations and global variable exports | +| `args.c` | JCL argument parser: `jcl()`, operand table, size parsing | +| `conv.c` | Conversion functions: `def()`, `block()`, `unblock()` | +| `conv_tab.c` | ASCII/EBCDIC translation tables (256-byte arrays) | +| `gen.c` | Signal handling: `prepare_io()`, `before_io()`, `after_io()` | +| `misc.c` | Summary output, progress reporting, timing | +| `position.c` | Input/output positioning: `pos_in()`, `pos_out()` | + +### Key Data Structures + +#### IO Structure (I/O Stream State) + +```c +typedef struct { + u_char *db; /* Buffer address */ + u_char *dbp; /* Current buffer I/O pointer */ + ssize_t dbcnt; /* Current buffer byte count */ + ssize_t dbrcnt; /* Last read byte count */ + ssize_t dbsz; /* Block size */ + u_int flags; /* ISCHR | ISPIPE | ISTAPE | ISSEEK | NOREAD | ISTRUNC */ + const char *name; /* Filename */ + int fd; /* File descriptor */ + off_t offset; /* Blocks to skip */ + off_t seek_offset; /* Sparse output offset */ +} IO; +``` + +Device type flags: + +| Flag | Value | Meaning | +|------|-------|---------| +| `ISCHR` | 0x01 | Character device | +| `ISPIPE` | 0x02 | Pipe or socket | +| `ISTAPE` | 0x04 | Tape device | +| `ISSEEK` | 0x08 | Seekable | +| `NOREAD` | 0x10 | Write-only (output opened without read) | +| `ISTRUNC` | 0x20 | Truncatable | + +#### STAT Structure (Statistics) + +```c 
+typedef struct { + uintmax_t in_full; /* Full input blocks transferred */ + uintmax_t in_part; /* Partial input blocks */ + uintmax_t out_full; /* Full output blocks */ + uintmax_t out_part; /* Partial output blocks */ + uintmax_t trunc; /* Truncated records */ + uintmax_t swab; /* Odd-length swab blocks */ + uintmax_t bytes; /* Total bytes written */ + struct timespec start; /* Start timestamp */ +} STAT; +``` + +#### Conversion Flags + +The `ddflags` global is a 64-bit bitmask with 37 defined flags: + +```c +#define C_ASCII 0x0000000000000001ULL +#define C_BLOCK 0x0000000000000002ULL +#define C_BS 0x0000000000000004ULL +/* ... 32 more flags ... */ +#define C_IDIRECT 0x0000000800000000ULL +#define C_ODIRECT 0x0000001000000000ULL +``` + +### Argument Parsing (args.c) + +`dd` uses its own JCL-style parser instead of `getopt`: + +```c +static const struct arg { + const char *name; + void (*f)(char *); + uint64_t set, noset; +} args[] = { + { "bs", f_bs, C_BS, C_BS|C_IBS|C_OBS|C_OSYNC }, + { "cbs", f_cbs, C_CBS, C_CBS }, + { "conv", f_conv, 0, 0 }, + { "count", f_count, C_COUNT, C_COUNT }, + { "files", f_files, C_FILES, C_FILES }, + { "fillchar", f_fillchar, C_FILL, C_FILL }, + { "ibs", f_ibs, C_IBS, C_BS|C_IBS }, + { "if", f_if, C_IF, C_IF }, + { "iflag", f_iflag, 0, 0 }, + { "obs", f_obs, C_OBS, C_BS|C_OBS }, + { "of", f_of, C_OF, C_OF }, + { "oflag", f_oflag, 0, 0 }, + { "seek", f_seek, C_SEEK, C_SEEK }, + { "skip", f_skip, C_SKIP, C_SKIP }, + { "speed", f_speed, 0, 0 }, + { "status", f_status, C_STATUS,C_STATUS }, +}; +``` + +Arguments are looked up via `bsearch()` in the sorted table. + +The `noset` field prevents conflicting options: e.g., `bs=` sets `C_BS` and +is rejected if any of `C_BS|C_IBS|C_OBS|C_OSYNC` is already set. 
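
The lookup and conflict check described above can be sketched as follows. This is an illustration under stated assumptions, not the code from `args.c`: `demo_args`, `apply_operand()`, and the `DEMO_C_*` flag values are hypothetical, and the table is reduced to three operands.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative flag bits only; the real values are defined in dd.h. */
#define DEMO_C_BS   0x1ULL
#define DEMO_C_IBS  0x2ULL
#define DEMO_C_OBS  0x4ULL

struct demo_arg {
    const char *name;
    uint64_t set;    /* bits recorded once the operand is accepted */
    uint64_t noset;  /* bits that must not already be set */
};

/* Must stay sorted by name so that bsearch(3) works. */
static const struct demo_arg demo_args[] = {
    { "bs",  DEMO_C_BS,  DEMO_C_BS | DEMO_C_IBS | DEMO_C_OBS },
    { "ibs", DEMO_C_IBS, DEMO_C_BS | DEMO_C_IBS },
    { "obs", DEMO_C_OBS, DEMO_C_BS | DEMO_C_OBS },
};

static int
demo_arg_cmp(const void *a, const void *b)
{
    return strcmp(((const struct demo_arg *)a)->name,
        ((const struct demo_arg *)b)->name);
}

/* Returns 0 on success, -1 for an unknown operand, -2 on a conflict. */
int
apply_operand(uint64_t *flags, const char *name)
{
    struct demo_arg key = { name, 0, 0 };
    const struct demo_arg *ap;

    ap = bsearch(&key, demo_args,
        sizeof(demo_args) / sizeof(demo_args[0]),
        sizeof(demo_args[0]), demo_arg_cmp);
    if (ap == NULL)
        return -1;
    if (*flags & ap->noset)     /* conflicting operand seen earlier */
        return -2;
    *flags |= ap->set;
    return 0;
}
```

With this table, `bs` followed by `ibs` is rejected, while `ibs` followed by `obs` is accepted because neither operand lists the other's bit in its `noset` mask.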
+ +### Conversion Functions (conv.c) + +Three conversion modes: + +#### `def()` — Default (No Conversion) + +```c +void def(void) +{ + if ((t = ctab) != NULL) + for (inp = in.dbp - (cnt = in.dbrcnt); cnt--; ++inp) + *inp = t[*inp]; + + out.dbp = in.dbp; + out.dbcnt = in.dbcnt; + + if (in.dbcnt >= out.dbsz) + dd_out(0); +} +``` + +Simple buffer pass-through with optional character table translation. + +#### `block()` — Variable → Fixed Length + +Converts newline-terminated records to fixed-length records padded with +spaces to `cbs` bytes. Used for ASCII-to-EBCDIC record conversion. + +#### `unblock()` — Fixed → Variable Length + +Converts fixed-length records back to newline-terminated format by +stripping trailing spaces and appending newlines. + +### Signal Handling (gen.c) + +`dd` handles several signals: + +| Signal | Handler | Purpose | +|--------|---------|---------| +| SIGINFO/SIGUSR1 | `siginfo_handler()` | Print transfer summary | +| SIGALRM | `sigalarm_handler()` | Periodic progress (with `status=progress`) | +| SIGINT/SIGTERM | Default + atexit | Print summary before exit | + +The `prepare_io()`, `before_io()`, `after_io()` functions manage signal +masking during I/O operations to prevent interruption during critical +sections. + +```c +volatile sig_atomic_t need_summary; /* Set by SIGINFO */ +volatile sig_atomic_t need_progress; /* Set by SIGALRM */ +volatile sig_atomic_t kill_signal; /* Set by termination signals */ +``` + +### Progress Reporting (misc.c) + +```c +void summary(void) +{ + /* Print: "X+Y records in\nA+B records out\nN bytes transferred in T secs" */ + double elapsed = secs_elapsed(); + /* Print human-readable transfer rate */ +} +``` + +The `format_scaled()` helper renders byte counts in human-readable form +(kB, MB, GB) using configurable base (1000 or 1024). 
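
The scaling idea can be shown with a small stand-alone routine. This is only a sketch: `format_scaled_demo()` and its signature are hypothetical, the real `format_scaled()` in `misc.c` may round and label units differently, and the same unit labels are assumed for both bases.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for format_scaled(): render `bytes` into `buf`
 * using repeated division by `base` (1000 or 1024).
 */
void
format_scaled_demo(char *buf, size_t buflen, uint64_t bytes, unsigned base)
{
    static const char *const units[] = { "B", "kB", "MB", "GB", "TB" };
    const size_t nunits = sizeof(units) / sizeof(units[0]);
    double value = (double)bytes;
    size_t i = 0;

    /* Divide until the value fits below one unit step. */
    while (value >= base && i + 1 < nunits) {
        value /= base;
        i++;
    }

    if (i == 0)
        snprintf(buf, buflen, "%llu B", (unsigned long long)bytes);
    else
        snprintf(buf, buflen, "%.1f %s", value, units[i]);
}
```

Under these assumptions, 1536 bytes with base 1024 renders as `1.5 kB`, and 1500000 bytes with base 1000 renders as `1.5 MB`.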
+ +### Buffer Allocation + +Direct I/O requires page-aligned buffers: + +```c +static void * +alloc_io_buffer(size_t size) +{ + if ((ddflags & (C_IDIRECT | C_ODIRECT)) == 0) + return malloc(size); + + size_t alignment = sysconf(_SC_PAGESIZE); + if (alignment == 0 || alignment == (size_t)-1) + alignment = 4096; + void *buf; + posix_memalign(&buf, alignment, size); + return buf; +} +``` + +### Sparse Output + +With `conv=sparse`, `dd` uses `lseek(2)` to skip over blocks of zeros +instead of writing them: + +```c +#define BISZERO(p, s) ((s) > 0 && *((const char *)p) == 0 && \ + !memcmp((const void *)(p), (const void *)((const char *)p + 1), (s) - 1)) +``` + +### Setup and I/O + +The `setup()` function in `dd.c` handles: + +1. Opening input/output files with appropriate flags +2. Detecting device types (`getfdtype()`) +3. Allocating I/O buffers +4. Setting up character conversion tables (parity, case) +5. Positioning input/output streams +6. Truncating output if needed + +```c +static void setup(void) +{ + /* Open input */ + if (in.name == NULL) { + in.name = "stdin"; + in.fd = STDIN_FILENO; + } else { + iflags = (ddflags & C_IDIRECT) ? 
O_DIRECT : 0; + in.fd = open(in.name, O_RDONLY | iflags, 0); + } + + /* Open output */ + oflags = O_CREAT; + if (!(ddflags & (C_SEEK | C_NOTRUNC))) + oflags |= O_TRUNC; + if (ddflags & C_OFSYNC) + oflags |= O_SYNC; + if (ddflags & C_ODIRECT) + oflags |= O_DIRECT; + out.fd = open(out.name, O_RDWR | oflags, DEFFILEMODE); + + /* Allocate buffers */ + if (!(ddflags & (C_BLOCK | C_UNBLOCK))) { + in.db = alloc_io_buffer(out.dbsz + in.dbsz - 1); + out.db = in.db; /* Single shared buffer */ + } else { + in.db = alloc_io_buffer(MAX(in.dbsz, cbsz) + cbsz); + out.db = alloc_io_buffer(out.dbsz + cbsz); + } +} +``` + +## System Calls Used + +| Syscall | Purpose | +|---------|---------| +| `open(2)` | Open input/output files | +| `read(2)` | Read input blocks | +| `write(2)` | Write output blocks | +| `lseek(2)` | Position streams, sparse output | +| `ftruncate(2)` | Truncate output file | +| `close(2)` | Close file descriptors | +| `ioctl(2)` | Tape device queries (`MTIOCGET`) | +| `sigaction(2)` | Install signal handlers | +| `setitimer(2)` | Periodic SIGALRM for progress | +| `clock_gettime(2)` | Elapsed time calculation | +| `posix_memalign(3)` | Page-aligned buffers for direct I/O | +| `sysconf(3)` | Get page size | + +## Examples + +```sh +# Copy a disk image +dd if=/dev/sda of=disk.img bs=4M status=progress + +# Write an image to a device +dd if=image.iso of=/dev/sdb bs=4M conv=fsync + +# Create a 1GB sparse file +dd if=/dev/zero of=sparse.img bs=1 count=0 seek=1G + +# Convert ASCII to uppercase +dd if=input.txt of=output.txt conv=ucase + +# Copy with direct I/O +dd if=data.bin of=data2.bin bs=4k iflag=direct oflag=direct + +# Network transfer (with throttling) +dd if=large_file.tar bs=1M speed=10M | ssh remote 'dd of=large_file.tar' + +# Skip first 100 blocks of input +dd if=tape.raw of=data.bin skip=100 bs=512 + +# Show progress with SIGUSR1 +dd if=/dev/sda of=backup.img bs=1M & +kill -USR1 $! 
+``` + +## Exit Codes + +| Code | Meaning | +|------|---------| +| 0 | Success | +| 1 | Error (open, read, write, or conversion failure) | + +## Summary Output Format + +``` +X+Y records in +A+B records out +N bytes (H) transferred in T.TTT secs (R/s) +``` + +Where: +- `X` = full input blocks, `Y` = partial input blocks +- `A` = full output blocks, `B` = partial output blocks +- `N` = total bytes, `H` = human-readable size +- `T` = elapsed seconds, `R` = transfer rate diff --git a/docs/handbook/corebinutils/df.md b/docs/handbook/corebinutils/df.md new file mode 100644 index 0000000000..c7b364f3e0 --- /dev/null +++ b/docs/handbook/corebinutils/df.md @@ -0,0 +1,264 @@ +# df — Display Filesystem Space Usage + +## Overview + +`df` reports total, used, and available disk space for mounted filesystems. +It supports multiple output formats (POSIX, human-readable, SI), filesystem +type filtering, inode display, and Linux-native mount information parsing. + +**Source**: `df/df.c` (single file, 100+ functions) +**Origin**: BSD 4.4, University of California, Berkeley +**License**: BSD-3-Clause + +## Synopsis + +``` +df [-abcgHhiklmPTtx] [-t type] [file ...] 
+``` + +## Options + +| Flag | Description | +|------|-------------| +| `-a` | Show all filesystems including zero-size ones | +| `-b` | Display sizes in 512-byte blocks | +| `-c` | Print total line at end | +| `-g` | Display sizes in gigabytes | +| `-H` | Human-readable with SI units (1000-based) | +| `-h` | Human-readable with binary units (1024-based) | +| `-i` | Show inode information instead of block usage | +| `-k` | Display sizes in kilobytes | +| `-l` | Show only local (non-remote) filesystems | +| `-m` | Display sizes in megabytes | +| `-P` | POSIX output format (one line per filesystem) | +| `-T` | Show filesystem type column | +| `-t type` | Filter to specified filesystem type | +| `-x` | Exclude specified filesystem type | +| `-,` | Use thousands separator in output | + +## Source Analysis + +### Key Data Structures + +```c +struct options { + bool show_all; + bool show_inodes; + bool show_type; + bool posix_format; + bool human_readable; + bool si_units; + bool local_only; + bool show_total; + bool thousands_separator; + /* block size settings from -b/-k/-m/-g or BLOCKSIZE env */ +}; + +struct mount_entry { + char *source; /* Device path (/dev/sda1) */ + char *target; /* Mount point (/home) */ + char *fstype; /* Filesystem type (ext4, tmpfs) */ + char *options; /* Mount options (rw,noatime) */ + dev_t device; /* Device number */ +}; + +struct mount_table { + struct mount_entry *entries; + size_t count; + size_t capacity; +}; + +struct row { + char *filesystem; /* Formatted filesystem column */ + char *type; /* Filesystem type */ + char *size; /* Total size */ + char *used; /* Used space */ + char *avail; /* Available space */ + char *capacity; /* Percentage used */ + char *mount_point; /* Mount point path */ + char *iused; /* Inodes used */ + char *ifree; /* Inodes free */ +}; + +struct column_widths { + int filesystem; + int type; + int size; + int used; + int avail; + int capacity; + int mount_point; +}; +``` + +### Linux-Native Mount Parsing + 
+Unlike BSD, which uses `getmntinfo(3)` / `statfs(2)`, this port reads +`/proc/self/mountinfo` directly: + +```c +static int +parse_mountinfo(struct mount_table *table) +{ + FILE *fp = fopen("/proc/self/mountinfo", "r"); + /* Parse each line: + * ID PARENT_ID MAJOR:MINOR ROOT MOUNT_POINT OPTIONS ... - FSTYPE SOURCE SUPER_OPTIONS + */ + while (getline(&line, &linesz, fp) != -1) { + /* Extract fields, unescape special characters */ + entry.source = unescape_mountinfo(source_str); + entry.target = unescape_mountinfo(target_str); + entry.fstype = strdup(fstype_str); + } +} +``` + +#### Escape Handling + +Mount paths in `/proc/self/mountinfo` use octal escapes for special +characters (spaces, newlines, backslashes): + +```c +static char * +unescape_mountinfo(const char *text) +{ + /* Convert \040 → space, \011 → tab, \012 → newline, \134 → backslash */ +} +``` + +### Filesystem Stats + +`df` uses `statvfs(3)` instead of BSD's `statfs(2)`: + +```c +struct statvfs sv; +if (statvfs(mount_point, &sv) != 0) + return -1; + +total_blocks = sv.f_blocks; +free_blocks = sv.f_bfree; +avail_blocks = sv.f_bavail; /* Available to unprivileged users */ +block_size = sv.f_frsize; + +total_inodes = sv.f_files; +free_inodes = sv.f_ffree; +``` + +### Remote Filesystem Detection + +The `-l` (local only) flag requires distinguishing local from remote +filesystems: + +```c +static bool +is_remote_filesystem(const struct mount_entry *entry) +{ + /* Check filesystem type */ + if (strcmp(entry->fstype, "nfs") == 0 || + strcmp(entry->fstype, "nfs4") == 0 || + strcmp(entry->fstype, "cifs") == 0 || + strcmp(entry->fstype, "smbfs") == 0 || + strcmp(entry->fstype, "fuse.sshfs") == 0) + return true; + + /* Check source for remote indicators (host:path or //host/share) */ + if (strchr(entry->source, ':') != NULL) + return true; + if (entry->source[0] == '/' && entry->source[1] == '/') + return true; + + return false; +} +``` + +### Human-Readable Formatting + +```c +static char * 
+format_human_readable(uint64_t bytes, bool si) +{ + unsigned int base = si ? 1000 : 1024; + const char *const *units = si ? si_units : binary_units; + /* Scale and format: "1.5G", "234M", "45K" */ +} +``` + +### BLOCKSIZE Environment + +The `BLOCKSIZE` environment variable can override the default block size: + +```c +char *bs = getenv("BLOCKSIZE"); +if (bs != NULL) { + /* Parse: "512", "K", "M", "G", or "1k", "4k", etc. */ +} +``` + +### Safe Integer Arithmetic + +`df` performs arithmetic with overflow protection: + +```c +/* Safe multiplication with clamping */ +static uint64_t +safe_mul(uint64_t a, uint64_t b) +{ + if (a != 0 && b > UINT64_MAX / a) + return UINT64_MAX; /* Clamp instead of overflow */ + return a * b; +} +``` + +## System Calls Used + +| Syscall | Purpose | +|---------|---------| +| `statvfs(3)` | Query filesystem statistics | +| `stat(2)` | Identify filesystem for file arguments | +| `open(2)` / `read(2)` | Parse `/proc/self/mountinfo` | + +## Examples + +```sh +# Default output +df + +# Human-readable sizes +df -h + +# Show filesystem type +df -hT + +# Only local filesystems +df -hl + +# POSIX format +df -P + +# Inode usage +df -i + +# Specific filesystem +df /home + +# Total line +df -hc + +# Specific filesystem type +df -t ext4 +``` + +## Exit Codes + +| Code | Meaning | +|------|---------| +| 0 | Success | +| 1 | Error accessing filesystem | + +## Differences from GNU df + +- Uses `/proc/self/mountinfo` directly (no libmount) +- No `--output` for custom column selection +- `-c` for total line (GNU uses `--total`) +- BLOCKSIZE env var compatibility (BSD convention) +- No `--sync` / `--no-sync` options diff --git a/docs/handbook/corebinutils/echo.md b/docs/handbook/corebinutils/echo.md new file mode 100644 index 0000000000..da7df16cec --- /dev/null +++ b/docs/handbook/corebinutils/echo.md @@ -0,0 +1,158 @@ +# echo — Write Arguments to Standard Output + +## Overview + +`echo` writes its arguments to standard output, separated by spaces, 
followed +by a newline. It is intentionally minimal — the FreeBSD/BSD implementation +does not support GNU-style `-e` escape processing. + +**Source**: `echo/echo.c` (single file) +**Origin**: BSD 4.4, University of California, Berkeley +**License**: BSD-3-Clause + +## Synopsis + +``` +echo [-n] [string ...] +``` + +## Options + +| Flag | Description | +|------|-------------| +| `-n` | Suppress trailing newline | + +Only a leading `-n` is recognized as an option. Any other arguments +(including `--`) are treated as literal strings and printed. + +## Source Analysis + +### Functions + +| Function | Purpose | +|----------|---------| +| `main()` | Parse arguments and write output | +| `write_all()` | Retry-safe `write(2)` loop handling `EINTR` | +| `warn_errno()` | Error reporting to stderr | +| `trim_trailing_backslash_c()` | Check if final argument ends with `\c` | + +### Option Processing + +`echo` does NOT use `getopt(3)`. It manually checks for `-n`: + +```c +int main(int argc, char *argv[]) +{ + bool suppress_newline = false; + + argv++; /* Skip program name */ + + /* Only leading -n flags are consumed */ + while (*argv && strcmp(*argv, "-n") == 0) { + suppress_newline = true; + argv++; + } + + /* Everything else is literal output */ +} +``` + +### The `\c` Convention + +If the **last** argument ends with `\c`, the trailing newline is suppressed +and the `\c` itself is not printed: + +```c +static bool +trim_trailing_backslash_c(const char *arg, size_t *len) +{ + if (*len >= 2 && arg[*len - 2] == '\\' && arg[*len - 1] == 'c') { + *len -= 2; + return true; /* Suppress newline */ + } + return false; +} +``` + +### I/O Strategy + +Instead of `printf` or `writev`, echo uses a `write(2)` loop: + +```c +static int +write_all(int fd, const void *buf, size_t count) +{ + const char *p = buf; + ssize_t n; + + while (count > 0) { + n = write(fd, p, count); + if (n < 0) { + if (errno == EINTR) + continue; + return -1; + } + p += n; + count -= n; + } + return 0; +} +``` + 
+This avoids `IOV_MAX` limitations that would apply with `writev(2)` when +there are many arguments. + +### Key Behaviors + +| Input | Output | Notes | +|-------|--------|-------| +| `echo hello` | `hello\n` | Basic usage | +| `echo -n hello` | `hello` | No trailing newline | +| `echo -n -n hello` | `hello` | Multiple `-n` consumed | +| `echo -- hello` | `-- hello\n` | `--` is NOT end-of-options | +| `echo -e hello` | `-e hello\n` | `-e` is NOT recognized | +| `echo "hello\c"` | `hello` | `\c` suppresses newline | +| `echo ""` | `\n` | Empty string → just newline | + +## Portability Notes + +- **BSD echo** (this implementation): Only `-n` and trailing `\c` +- **GNU echo**: Supports `-e` for escape sequences (`\n`, `\t`, etc.) + and `-E` to disable them +- **POSIX echo**: Behavior of `-n` and backslash sequences is + implementation-defined +- **Shell built-in**: Most shells have a built-in `echo` that may differ + from the external command + +## System Calls Used + +| Syscall | Purpose | +|---------|---------| +| `write(2)` | All output to stdout | + +## Examples + +```sh +# Simple output +echo Hello, World! + +# No trailing newline +echo -n "prompt> " + +# Literal dash-n (only leading -n is recognized) +echo "The flag is -n" + +# Multiple arguments +echo one two three +# → "one two three" + +# Suppress newline with \c +echo "no newline\c" +``` + +## Exit Codes + +| Code | Meaning | +|------|---------| +| 0 | Success | +| 1 | Write error | diff --git a/docs/handbook/corebinutils/ed.md b/docs/handbook/corebinutils/ed.md new file mode 100644 index 0000000000..3e51e576b6 --- /dev/null +++ b/docs/handbook/corebinutils/ed.md @@ -0,0 +1,306 @@ +# ed — Line Editor + +## Overview + +`ed` is the standard POSIX line editor. It operates on a text buffer that +resides in a temporary scratch file, supports regular expression search and +substitution, global commands, undo, and file I/O. 
This implementation derives +from Andrew Moore's BSD `ed` and the algorithm described in Kernighan and +Plauger's *Software Tools in Pascal*. + +**Source**: `ed/main.c`, `ed/ed.h`, `ed/compat.c`, `ed/compat.h`, `ed/buf.c`, +`ed/glbl.c`, `ed/io.c`, `ed/re.c`, `ed/sub.c`, `ed/undo.c` +**Origin**: BSD 4.4, Andrew Moore (Talke Studio) +**License**: BSD-2-Clause + +## Synopsis + +``` +ed [-] [-sx] [-p string] [file] +``` + +## Options + +| Flag | Description | +|------|-------------| +| `-` | Suppress diagnostics (same as `-s`) | +| `-s` | Script mode: suppress byte counts and `!` prompts | +| `-x` | Encryption mode (**not supported on Linux**) | +| `-p string` | Set the command prompt (default: no prompt) | + +## Source Architecture + +### File Responsibilities + +| File | Purpose | Key Functions | +|------|---------|---------------| +| `main.c` | Main loop, command dispatch, signals | `main()`, `exec_command()`, signal handlers | +| `ed.h` | Types, constants, function prototypes | `line_t`, `undo_t`, error codes | +| `compat.c/h` | Linux portability shims | `strlcpy`, `strlcat` replacements | +| `buf.c` | Scratch file buffer management | `get_sbuf_line()`, `put_sbuf_line()` | +| `glbl.c` | Global command (g/re/cmd) | `exec_global()`, mark management | +| `io.c` | File I/O (read/write) | `read_file()`, `write_file()`, `read_stream()` | +| `re.c` | Regular expression handling | `get_compiled_pattern()`, `search_*()` | +| `sub.c` | Substitution command | `substitute()`, replacement parsing | +| `undo.c` | Undo stack management | `push_undo_stack()`, `undo_last()` | + +### Key Data Structures + +#### Line Node (linked list element) + +```c +typedef struct line { + struct line *q_forw; /* Next line in buffer */ + struct line *q_back; /* Previous line in buffer */ + off_t seek; /* Byte offset in scratch file */ + int len; /* Line length */ +} line_t; +``` + +The edit buffer is a doubly-linked circular list of `line_t` nodes. 
Line +content is stored in an external scratch file, not in memory. + +#### Undo Record + +```c +typedef struct undo { + int type; /* UADD, UDEL, UMOV, VMOV */ + line_t *h; /* Head of affected line range */ + line_t *t; /* Tail of affected line range */ + /* ... */ +} undo_t; +``` + +#### Constants + +```c +#define ERR (-2) /* General error */ +#define EMOD (-3) /* Buffer modified warning */ +#define FATAL (-4) /* Fatal error (abort) */ + +#define MINBUFSZ 512 +#define SE_MAX 30 /* Max regex subexpressions */ +#define LINECHARS INT_MAX +``` + +#### Global Flags + +```c +#define GLB 001 /* Global command active */ +#define GPR 002 /* Print after command */ +#define GLS 004 /* List after command */ +#define GNP 010 /* Enumerate after command */ +#define GSG 020 /* Global substitute */ +``` + +### Main Loop + +```c +int main(volatile int argc, char **volatile argv) +{ + setlocale(LC_ALL, ""); + + /* Detect if invoked as "red" (restricted ed) */ + red = (n = strlen(argv[0])) > 2 && argv[0][n - 3] == 'r'; + + /* Parse options */ + while ((c = getopt(argc, argv, "p:sx")) != -1) { ... 
} + + /* Signal setup */ + signal(SIGHUP, signal_hup); /* Emergency save */ + signal(SIGQUIT, SIG_IGN); /* Ignore quit */ + signal(SIGINT, signal_int); /* Interrupt handling */ + signal(SIGWINCH, handle_winch); /* Terminal resize */ + + /* Initialize buffers, load file if specified */ + init_buffers(); + if (argc && is_legal_filename(*argv)) + read_file(*argv, 0); + + /* Command loop */ + for (;;) { + if (prompt) fputs(prompt, stdout); + status = get_tty_line(); + if (status == EOF) break; + status = exec_command(); + } +} +``` + +### Buffer Management (buf.c) + +The scratch file strategy avoids unlimited memory consumption: + +- Lines are stored in a temporary file created with `mkstemp(3)` +- `put_sbuf_line()` appends a line to the scratch file and returns its offset +- `get_sbuf_line()` reads a line back from the scratch file by offset +- The `line_t` linked list tracks offsets and lengths, not actual text + +```c +/* Append line to scratch file, return its node */ +line_t *put_sbuf_line(const char *text); + +/* Read line from scratch file via offset */ +char *get_sbuf_line(const line_t *lp); +``` + +### File I/O (io.c) + +```c +long read_file(char *fn, long n) +{ + /* Open file or pipe (if fn starts with '!') */ + fp = (*fn == '!') ? popen(fn + 1, "r") : fopen(strip_escapes(fn), "r"); + + /* Read lines into buffer after line n */ + size = read_stream(fp, n); + + /* Print byte count unless in script mode */ + if (!scripted) + fprintf(stdout, "%lu\n", size); +} +``` + +The `read_stream()` function reads from `fp`, appending each line to the +edit buffer via `put_sbuf_line()`, and maintaining the undo stack for +rollback. 
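
The scratch-file approach described above can be sketched in a few lines. This is not the code from `buf.c`: `struct demo_line`, `demo_put_line()`, and `demo_get_line()` are hypothetical names, the scratch stream is assumed to come from `tmpfile(3)` rather than `mkstemp(3)`, and offsets are kept as `long` so that only standard C stdio calls are needed.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified counterpart of line_t: where a line lives. */
struct demo_line {
    long seek;   /* byte offset in the scratch file */
    int len;     /* line length in bytes */
};

/* Append `text` to the scratch file and record its position. */
int
demo_put_line(FILE *scratch, const char *text, struct demo_line *lp)
{
    size_t len = strlen(text);

    if (fseek(scratch, 0L, SEEK_END) != 0)
        return -1;
    lp->seek = ftell(scratch);
    if (lp->seek < 0)
        return -1;
    if (fwrite(text, 1, len, scratch) != len)
        return -1;
    lp->len = (int)len;
    return 0;
}

/* Read a previously stored line back into `buf` by offset. */
int
demo_get_line(FILE *scratch, const struct demo_line *lp, char *buf,
    size_t buflen)
{
    size_t len = (size_t)lp->len;

    if (len >= buflen)
        return -1;
    if (fseek(scratch, lp->seek, SEEK_SET) != 0)
        return -1;
    if (fread(buf, 1, len, scratch) != len)
        return -1;
    buf[len] = '\0';
    return 0;
}
```

Only the small `demo_line` records stay in memory; the text itself is fetched from the scratch file on demand, which is the trade-off the linked list of `line_t` nodes makes.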
+ +### Regular Expressions (re.c) + +Uses POSIX `regex.h` (via `regcomp(3)` / `regexec(3)`): + +```c +typedef regex_t pattern_t; + +pattern_t *get_compiled_pattern(void); +/* Compiles the current regex pattern, caching the last used pattern */ +``` + +### Substitution (sub.c) + +The `s/pattern/replacement/flags` command: +- Supports `\1` through `\9` backreferences +- `g` flag for global replacement +- Count for nth occurrence replacement +- `&` in replacement refers to the matched text + +### Undo (undo.c) + +Every buffer modification pushes an undo record: + +```c +undo_t *push_undo_stack(int type, long from, long to); +int undo_last(void); /* Reverse last modification */ +``` + +Undo types: `UADD` (lines added), `UDEL` (lines deleted), `UMOV` (lines moved). + +### Signal Handling + +| Signal | Handler | Action | +|--------|---------|--------| +| `SIGHUP` | `signal_hup()` | Save buffer to `ed.hup` and exit | +| `SIGINT` | `signal_int()` | Set interrupt flag, longjmp to command prompt | +| `SIGWINCH` | `handle_winch()` | Update terminal width for `l` command | +| `SIGQUIT` | `SIG_IGN` | Ignored | + +### Restricted Mode (red) + +When invoked as `red`, the editor applies these restrictions: +- Shell commands (`!command`) are forbidden +- Filenames with `/` or starting with `!` are rejected +- Directory changes are prevented + +## Commands Reference + +| Command | Description | +|---------|-------------| +| `(.)a` | Append text after line | +| `(.)i` | Insert text before line | +| `(.,.)c` | Change (replace) lines | +| `(.,.)d` | Delete lines | +| `(.,.)p` | Print lines | +| `(.,.)l` | List lines (show non-printable characters) | +| `(.,.)n` | Number and print lines | +| `(.,.)m(.)` | Move lines | +| `(.,.)t(.)` | Copy (transfer) lines | +| `(.,.)s/re/replacement/flags` | Substitute | +| `(1,$)g/re/command` | Global: apply command to matching lines | +| `(1,$)v/re/command` | Inverse global: apply to non-matching lines | +| `(1,$)w file` | Write lines to file | +| `(1,$)W file` | 
Append lines to file | +| `e file` | Edit file (replaces buffer) | +| `E file` | Edit unconditionally | +| `f file` | Set default filename | +| `r file` | Read file into buffer | +| `(.)r !command` | Read command output into buffer | +| `u` | Undo last command | +| `(.)=` | Print line number | +| `(.,.)j` | Join lines | +| `(.)k(c)` | Mark line with character c | +| `q` | Quit (warns if modified) | +| `Q` | Quit unconditionally | +| `H` | Toggle verbose error messages | +| `h` | Print last error message | +| `!command` | Execute shell command | + +### Addressing + +| Address | Meaning | +|---------|---------| +| `.` | Current line | +| `$` | Last line | +| `n` | Line number n | +| `-n` / `+n` | Relative to current line | +| `/re/` | Next line matching regex | +| `?re?` | Previous line matching regex | +| `'c` | Line marked with character c | +| `,` | Equivalent to `1,$` | +| `;` | Equivalent to `.,$` | + +## System Calls Used + +| Syscall | Purpose | +|---------|---------| +| `mkstemp(3)` | Create scratch file | +| `read(2)` / `write(2)` | Scratch file I/O | +| `lseek(2)` | Position in scratch file | +| `fopen(3)` / `fclose(3)` | Read/write user files | +| `popen(3)` / `pclose(3)` | Shell command execution | +| `regcomp(3)` / `regexec(3)` | Regular expression matching | +| `sigsetjmp(3)` / `siglongjmp(3)` | Interrupt recovery | + +## Examples + +```sh +# Edit a file +ed myfile.txt + +# Script mode (for automation) +printf '1,3p\nq\n' | ed -s myfile.txt + +# Global substitution +printf 'g/old/s//new/g\nw\nq\n' | ed -s myfile.txt + +# With prompt +ed -p '> ' myfile.txt + +# Read command output into the buffer and print it +printf 'r !ls -la\n,p\nQ\n' | ed -s +``` + +## Exit Codes + +| Code | Meaning | +|------|---------| +| 0 | Success | +| 1 | Error | +| 2 | Usage error | + +## Linux-Specific Notes + +- The `-x` (encryption) option prints an error and exits, as Linux + does not provide the BSD `des_setkey(3)` functions. 
+- `strlcpy(3)` / `strlcat(3)` are provided by `compat.c` when not + available in the system libc. +- The `SIGWINCH` handler uses `ioctl(TIOCGWINSZ)` for terminal size. diff --git a/docs/handbook/corebinutils/error-handling.md b/docs/handbook/corebinutils/error-handling.md new file mode 100644 index 0000000000..f82eec0b1c --- /dev/null +++ b/docs/handbook/corebinutils/error-handling.md @@ -0,0 +1,315 @@ +# Error Handling — Corebinutils Patterns + +## Overview + +Corebinutils uses a layered error handling strategy: BSD `err(3)` functions +as the primary interface, custom `error_errno()`/`error_msg()` wrappers in +utilities that need more control, and consistent exit codes following +POSIX conventions. + +## err(3) Family + +The BSD `<err.h>` functions are used throughout: + +```c +#include <err.h> + +/* Fatal errors (print message + errno + exit) */ +err(1, "open '%s'", filename); +/* → "utility: open 'file.txt': No such file or directory\n" */ + +/* Fatal errors (print message + exit, no errno) */ +errx(2, "invalid option: -%c", ch); +/* → "utility: invalid option: -z\n" */ + +/* Non-fatal warnings (print message + errno, continue) */ +warn("stat '%s'", filename); +/* → "utility: stat 'file.txt': Permission denied\n" */ + +/* Non-fatal warnings (print message, no errno, continue) */ +warnx("skipping '%s': not a regular file", filename); +/* → "utility: skipping 'foo': not a regular file\n" */ +``` + +### When to Use Each + +| Function | Fatal? | Shows errno? 
| Use Case | +|----------|--------|-------------|----------| +| `err()` | Yes | Yes | Syscall failure, must exit | +| `errx()` | Yes | No | Bad input, usage error | +| `warn()` | No | Yes | Syscall failure, can continue | +| `warnx()` | No | No | Validation issue, can continue | + +## Custom Error Functions + +Several utilities define their own error reporting for program name +control or additional formatting: + +### Pattern: error_errno / error_msg + +```c +static const char *progname; + +static void +error_errno(const char *fmt, ...) +{ + int saved_errno = errno; + va_list ap; + + fprintf(stderr, "%s: ", progname); + va_start(ap, fmt); + vfprintf(stderr, fmt, ap); + va_end(ap); + fprintf(stderr, ": %s\n", strerror(saved_errno)); +} + +static void +error_msg(const char *fmt, ...) +{ + va_list ap; + + fprintf(stderr, "%s: ", progname); + va_start(ap, fmt); + vfprintf(stderr, fmt, ap); + va_end(ap); + fputc('\n', stderr); +} +``` + +Used by: `mkdir`, `chmod`, `hostname`, `domainname`, `nproc` + +### Pattern: die / die_errno + +```c +static void __dead2 +die(const char *fmt, ...) +{ + va_list ap; + fprintf(stderr, "%s: ", progname); + va_start(ap, fmt); + vfprintf(stderr, fmt, ap); + va_end(ap); + fputc('\n', stderr); + exit(1); +} + +static void __dead2 +die_errno(const char *fmt, ...) 
+{ + int saved = errno; + va_list ap; + fprintf(stderr, "%s: ", progname); + va_start(ap, fmt); + vfprintf(stderr, fmt, ap); + va_end(ap); + fprintf(stderr, ": %s\n", strerror(saved)); + exit(1); +} +``` + +Used by: `sleep`, `echo` + +### Pattern: verror_message (centralized) + +```c +static void +verror_message(const char *fmt, va_list ap, bool with_errno) +{ + int saved = errno; + fprintf(stderr, "%s: ", progname); + vfprintf(stderr, fmt, ap); + if (with_errno) + fprintf(stderr, ": %s", strerror(saved)); + fputc('\n', stderr); +} +``` + +## Exit Code Conventions + +### Standard Codes + +| Code | Meaning | Used By | +|------|---------|---------| +| 0 | Success | All utilities | +| 1 | General error | Most utilities | +| 2 | Usage/syntax error | test, expr, timeout, mv | + +### Utility-Specific Codes + +| Utility | Code | Meaning | +|---------|------|---------| +| `test` | 0 | Expression is true | +| `test` | 1 | Expression is false | +| `test` | 2 | Invalid expression | +| `expr` | 0 | Non-null, non-zero result | +| `expr` | 1 | Null or zero result | +| `expr` | 2 | Invalid expression | +| `expr` | 3 | Internal error | +| `timeout` | 124 | Command timed out | +| `timeout` | 125 | `timeout` itself failed | +| `timeout` | 126 | Command not executable | +| `timeout` | 127 | Command not found | + +### Exit on First Error vs. Accumulate + +Two patterns are observed: + +```c +/* Pattern 1: Exit immediately on error */ +if (stat(path, &sb) < 0) + err(1, "stat"); + +/* Pattern 2: Accumulate errors, exit with status */ +int errors = 0; +for (i = 0; i < argc; i++) { + if (process(argv[i]) < 0) { + warn("failed: %s", argv[i]); + errors = 1; + } +} +return errors; +``` + +Pattern 2 is used by multi-argument utilities (rm, chmod, cp, ln) +to process as many arguments as possible even when some fail. 
+ +## errno Preservation + +All error functions save `errno` before calling any function that +might modify it (like `fprintf`): + +```c +static void +error_errno(const char *fmt, ...) +{ + int saved = errno; /* Save before fprintf */ + /* ... */ + fprintf(stderr, ": %s\n", strerror(saved)); +} +``` + +## Signal Error Recovery + +### sigsetjmp/siglongjmp (ed) + +```c +static sigjmp_buf jmpbuf; + +static void +signal_handler(int sig) +{ + (void)sig; + siglongjmp(jmpbuf, 1); +} + +/* In main loop */ +if (sigsetjmp(jmpbuf, 1) != 0) { + /* Returned from signal: reset state */ + fputs("?\n", stderr); +} +``` + +### Flag-Based (sleep, dd) + +```c +static volatile sig_atomic_t got_signal; + +static void +handler(int sig) +{ + got_signal = sig; +} + +/* In main loop */ +if (got_signal) { + cleanup(); + exit(128 + got_signal); +} +``` + +## Validation Patterns + +### At System Boundaries + +```c +/* Validate user input */ +if (argc < 2) { + usage(); + /* NOTREACHED */ +} + +/* Validate parsed values */ +if (val < 0 || val > MAX_VALUE) + errx(2, "value out of range: %ld", val); + +/* Validate system call results (keep the descriptor for later use) */ +fd = open(path, O_RDONLY); +if (fd < 0) + err(1, "open '%s'", path); +``` + +### String-to-Number Conversion + +```c +static long +parse_number(const char *str) +{ + char *end; + errno = 0; + long val = strtol(str, &end, 10); + + if (end == str || *end != '\0') + errx(2, "not a number: %s", str); + if (errno == ERANGE) + errx(2, "number out of range: %s", str); + + return val; +} +``` + +## Write Error Detection + +### Pattern: Check stdout at exit + +```c +/* Catch write errors (e.g., broken pipe) */ +if (fclose(stdout) == EOF) + err(1, "stdout"); + +/* Or equivalently */ +if (fflush(stdout) == EOF) + err(1, "write error"); +``` + +### Pattern: write_all loop + +```c +static int +write_all(int fd, const void *buf, size_t count) +{ + const char *p = buf; + while (count > 0) { + ssize_t n = write(fd, p, count); + if (n < 0) { + if (errno == EINTR) + continue; + return -1; + } + p += 
n; + count -= n; + } + return 0; +} +``` + +Used by: `echo`, `cat`, `dd` + +## Summary of Conventions + +1. Use `err(3)` family when available and sufficient +2. Define custom wrappers only when program name control is needed +3. Save `errno` immediately — before any library calls +4. Exit 0 for success, 1 for errors, 2 for usage +5. Multi-argument commands accumulate errors +6. Validate at system boundaries (input parsing, syscall returns) +7. Signal handlers set flags only — no complex logic +8. Always check `write(2)` / `fclose(3)` return values diff --git a/docs/handbook/corebinutils/expr.md b/docs/handbook/corebinutils/expr.md new file mode 100644 index 0000000000..cd7e8a214c --- /dev/null +++ b/docs/handbook/corebinutils/expr.md @@ -0,0 +1,194 @@ +# expr — Evaluate Expressions + +## Overview + +`expr` evaluates arithmetic, string, and logical expressions from the command +line and writes the result to standard output. It implements a recursive +descent parser with automatic type coercion between strings, numeric strings, +and integers. + +**Source**: `expr/expr.c` (single file) +**Origin**: BSD 4.4, University of California, Berkeley +**License**: BSD-3-Clause + +## Synopsis + +``` +expr expression +``` + +## Source Analysis + +### Value Types + +```c +enum value_type { + INTEGER, /* Pure integer (from arithmetic) */ + NUMERIC_STRING, /* String that looks like a number */ + STRING, /* General string */ +}; + +struct value { + enum value_type type; + union { + intmax_t ival; + char *sval; + }; +}; +``` + +`expr` automatically coerces between types during operations. A value +like `"42"` starts as `NUMERIC_STRING` and is promoted to `INTEGER` for +arithmetic. 
+ +### Parser Architecture + +`expr` uses a recursive descent parser with operator precedence: + +``` +parse_expr() + └── parse_or() /* | operator (lowest precedence) */ + └── parse_and() /* & operator */ + └── parse_compare() /* =, !=, <, >, <=, >= */ + └── parse_add() /* +, - */ + └── parse_mul() /* *, /, % */ + └── parse_primary() /* atoms, ( expr ), : regex */ +``` + +### Operators + +#### Arithmetic Operators + +| Operator | Description | Example | +|----------|-------------|---------| +| `+` | Addition | `expr 2 + 3` → `5` | +| `-` | Subtraction | `expr 5 - 2` → `3` | +| `*` | Multiplication | `expr 4 \* 3` → `12` | +| `/` | Integer division | `expr 10 / 3` → `3` | +| `%` | Modulo | `expr 10 % 3` → `1` | + +#### Comparison Operators + +| Operator | Description | Example | +|----------|-------------|---------| +| `=` | Equal | `expr abc = abc` → `1` | +| `!=` | Not equal | `expr abc != def` → `1` | +| `<` | Less than | `expr 1 \< 2` → `1` | +| `>` | Greater than | `expr 2 \> 1` → `1` | +| `<=` | Less or equal | `expr 1 \<= 1` → `1` | +| `>=` | Greater or equal | `expr 2 \>= 1` → `1` | + +Comparisons between numeric strings use numeric ordering; otherwise +locale-aware string comparison (`strcoll`) is used. 
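The comparison rule stated above (numeric ordering when both operands are numeric, `strcoll(3)` otherwise) can be sketched as follows. `as_integer()` and `compare_operands()` are hypothetical helpers for illustration, and the numeric syntax is assumed to be an optional `-` followed by decimal digits; the actual parser may differ.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <inttypes.h>
#include <stdbool.h>
#include <string.h>

/* True if s is a complete decimal integer that fits in intmax_t. */
bool
as_integer(const char *s, intmax_t *out)
{
	const char *p = (*s == '-') ? s + 1 : s;
	char *end;

	if (!isdigit((unsigned char)*p))
		return false;
	errno = 0;
	*out = strtoimax(s, &end, 10);
	return *end == '\0' && errno != ERANGE;
}

/*
 * Return <0, 0 or >0. Numeric comparison is used only when BOTH
 * operands are integers; otherwise fall back to locale collation.
 */
int
compare_operands(const char *a, const char *b)
{
	intmax_t ia, ib;

	assert(a != NULL && b != NULL);

	if (as_integer(a, &ia) && as_integer(b, &ib))
		return (ia > ib) - (ia < ib);
	return strcoll(a, b);
}
```

This is why `expr 10 \> 9` is true while a plain string comparison of `10` and `9` would order them the other way. A program that wants locale-specific collation would also call `setlocale(LC_ALL, "")` at startup; without it, `strcoll(3)` behaves like `strcmp(3)`.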
+ +#### Logical Operators + +| Operator | Description | Example | +|----------|-------------|---------| +| `\|` | OR (short-circuit) | `expr 0 \| 5` → `5` | +| `&` | AND (short-circuit) | `expr 1 \& 2` → `1` | + +#### String/Regex Operators + +| Operator | Description | Example | +|----------|-------------|---------| +| `:` | Regex match | `expr hello : 'hel\(.*\)'` → `lo` | +| `match` | Same as `:` | `expr match hello 'h.*'` | +| `substr` | Substring | `expr substr hello 2 3` → `ell` | +| `index` | Character position | `expr index hello l` → `3` | +| `length` | String length | `expr length hello` → `5` | + +### Regex Matching + +The `:` operator uses POSIX basic regular expressions (`regcomp` with +`REG_NOSUB` or group capture): + +```c +/* expr STRING : REGEX */ +/* Returns captured \(...\) group or match length */ +``` + +If the regex contains `\(...\)`, the captured substring is returned. +Otherwise, the length of the match is returned. + +### Overflow Checking + +All arithmetic operations check for integer overflow: + +```c +static intmax_t +safe_add(intmax_t a, intmax_t b) +{ + if ((b > 0 && a > INTMAX_MAX - b) || + (b < 0 && a < INTMAX_MIN - b)) + errx(2, "integer overflow"); + return a + b; +} +``` + +### Locale Awareness + +String comparisons use `strcoll(3)` for locale-correct ordering: + +```c +/* Compare as strings using locale collation */ +result = strcoll(left->sval, right->sval); +``` + +## System Calls Used + +| Syscall | Purpose | +|---------|---------| +| `regcomp(3)` / `regexec(3)` | Regular expression matching | +| `strcoll(3)` | Locale-aware string comparison | + +## Examples + +```sh +# Arithmetic +expr 2 + 3 # → 5 +expr 10 / 3 # → 3 +expr 7 % 4 # → 3 + +# String length +expr length "hello" # → 5 + +# Regex match (capture group) +expr "hello-world" : 'hello-\(.*\)' # → world + +# Regex match (length) +expr "hello" : '.*' # → 5 + +# Substring +expr substr "hello" 2 3 # → ell + +# Index (first occurrence) +expr index "hello" "lo" # → 3 + +# 
Comparisons +expr 42 = 42 # → 1 +expr abc \< def # → 1 + +# Logical OR (returns first non-zero/non-empty) +expr 0 \| 5 # → 5 +expr "" \| alt # → alt + +# In shell scripts +count=$(expr $count + 1) +``` + +## Exit Codes + +| Code | Meaning | +|------|---------| +| 0 | Expression is neither null nor zero | +| 1 | Expression is null or zero | +| 2 | Expression is invalid | +| 3 | Internal error | + +## Differences from GNU expr + +- No `--help` or `--version` +- Identical POSIX semantics for `:` operator +- Locale-aware string comparison by default +- Overflow results in error, not wraparound diff --git a/docs/handbook/corebinutils/hostname.md b/docs/handbook/corebinutils/hostname.md new file mode 100644 index 0000000000..d6ea83bff2 --- /dev/null +++ b/docs/handbook/corebinutils/hostname.md @@ -0,0 +1,154 @@ +# hostname — Get or Set the System Hostname + +## Overview + +`hostname` reads or sets the system hostname. On Linux it uses `uname(2)` to +read and `sethostname(2)` to write. The `-f` (FQDN) option is explicitly +unsupported because resolving a fully qualified domain name requires +NSS/DNS, which is outside the scope of a core utility. 
+ +**Source**: `hostname/hostname.c` (single file) +**Origin**: BSD 4.4, University of California, Berkeley +**License**: BSD-3-Clause + +## Synopsis + +``` +hostname [-s | -d] [name-of-host] +``` + +## Options + +| Flag | Description | +|------|-------------| +| `-s` | Print short hostname (truncate at first `.`) | +| `-d` | Print domain part only (after first `.`) | +| `-f` | **Not supported on Linux** — exits with error | + +## Source Analysis + +### Data Structures + +```c +struct options { + bool short_name; /* -s: truncate at first dot */ + bool domain_only; /* -d: print after first dot */ + bool set_mode; /* hostname was provided as argument */ + const char *new_hostname; +}; +``` + +### Functions + +| Function | Purpose | +|----------|---------| +| `main()` | Dispatch between get/set modes | +| `parse_args()` | `getopt(3)` option parsing | +| `dup_hostname()` | Fetch hostname from `uname(2)` and duplicate it | +| `print_hostname()` | Print full, short, or domain part | +| `set_hostname()` | Set hostname via `sethostname(2)` | +| `linux_hostname_max()` | Query max hostname length from UTS namespace | + +### Reading the Hostname + +```c +static char * +dup_hostname(void) +{ + struct utsname uts; + + if (uname(&uts) < 0) + err(1, "uname"); + return strdup(uts.nodename); +} +``` + +### Setting the Hostname + +```c +static void +set_hostname(const char *name) +{ + size_t max_len = linux_hostname_max(); + size_t len = strlen(name); + + if (len > max_len) + errx(1, "hostname too long: %zu > %zu", len, max_len); + + if (sethostname(name, len) < 0) + err(1, "sethostname"); +} +``` + +### Short/Domain Modes + +```c +static void +print_hostname(const char *hostname, const struct options *opts) +{ + if (opts->short_name) { + /* Truncate at first '.' */ + const char *dot = strchr(hostname, '.'); + if (dot) + printf("%.*s\n", (int)(dot - hostname), hostname); + else + puts(hostname); + } else if (opts->domain_only) { + /* Print after first '.' 
or empty */ + const char *dot = strchr(hostname, '.'); + puts(dot ? dot + 1 : ""); + } else { + puts(hostname); + } +} +``` + +### Max Hostname Length + +```c +static size_t +linux_hostname_max(void) +{ + long val = sysconf(_SC_HOST_NAME_MAX); + return (val > 0) ? (size_t)val : 64; +} +``` + +## System Calls Used + +| Syscall | Purpose | +|---------|---------| +| `uname(2)` | Read current hostname | +| `sethostname(2)` | Set new hostname (requires `CAP_SYS_ADMIN`) | +| `sysconf(3)` | Query `_SC_HOST_NAME_MAX` | + +## Examples + +```sh +# Print hostname +hostname + +# Print short hostname +hostname -s + +# Print domain part +hostname -d + +# Set hostname (requires root) +hostname myserver.example.com +``` + +## Exit Codes + +| Code | Meaning | +|------|---------| +| 0 | Success | +| 1 | Error (sethostname failed, invalid option) | + +## Differences from GNU hostname + +- No `-f` / `--fqdn` — Linux requires NSS for FQDN resolution +- No `--ip-address` / `-i` +- No `--alias` / `-a` +- No `--all-fqdns` / `--all-ip-addresses` +- Simpler: read or set only, no DNS lookups diff --git a/docs/handbook/corebinutils/kill.md b/docs/handbook/corebinutils/kill.md new file mode 100644 index 0000000000..eb4d8d55bf --- /dev/null +++ b/docs/handbook/corebinutils/kill.md @@ -0,0 +1,237 @@ +# kill — Send Signals to Processes + +## Overview + +`kill` sends signals to processes or lists available signals. This +implementation supports both numeric and named signal specifications, +real-time signals (`SIGRT`), and can be compiled as a shell built-in. + +**Source**: `kill/kill.c` (single file) +**Origin**: BSD 4.4, University of California, Berkeley +**License**: BSD-3-Clause + +## Synopsis + +``` +kill [-s signal_name] pid ... +kill -l [exit_status ...] +kill -signal_name pid ... +kill -signal_number pid ... 
+``` + +## Options + +| Flag | Description | +|------|-------------| +| `-s signal` | Send the named signal | +| `-l` | List available signal names | +| `-signal_name` | Send named signal (e.g., `-TERM`) | +| `-signal_number` | Send signal by number (e.g., `-15`) | + +## Source Analysis + +### Signal Table + +The signal table maps names to numbers using a macro-generated array: + +```c +struct signal_entry { + const char *name; + int number; +}; + +#define SIGNAL_ENTRY(sig) { #sig, SIG##sig } + +static const struct signal_entry signal_table[] = { + SIGNAL_ENTRY(HUP), + SIGNAL_ENTRY(INT), + SIGNAL_ENTRY(QUIT), + SIGNAL_ENTRY(ILL), + SIGNAL_ENTRY(TRAP), + SIGNAL_ENTRY(ABRT), + SIGNAL_ENTRY(EMT), /* If available */ + SIGNAL_ENTRY(FPE), + SIGNAL_ENTRY(KILL), + SIGNAL_ENTRY(BUS), + SIGNAL_ENTRY(SEGV), + SIGNAL_ENTRY(SYS), + SIGNAL_ENTRY(PIPE), + SIGNAL_ENTRY(ALRM), + SIGNAL_ENTRY(TERM), + SIGNAL_ENTRY(URG), + SIGNAL_ENTRY(STOP), + SIGNAL_ENTRY(TSTP), + SIGNAL_ENTRY(CONT), + SIGNAL_ENTRY(CHLD), + SIGNAL_ENTRY(TTIN), + SIGNAL_ENTRY(TTOU), + SIGNAL_ENTRY(IO), + SIGNAL_ENTRY(XCPU), + SIGNAL_ENTRY(XFSZ), + SIGNAL_ENTRY(VTALRM), + SIGNAL_ENTRY(PROF), + SIGNAL_ENTRY(WINCH), + SIGNAL_ENTRY(INFO), /* If available */ + SIGNAL_ENTRY(USR1), + SIGNAL_ENTRY(USR2), + /* ... 
*/ +}; +``` + +### Key Functions + +| Function | Purpose | +|----------|---------| +| `main()` | Parse options and dispatch signal or list | +| `normalize_signal_name()` | Canonicalize signal name (strip `SIG` prefix, uppercase) | +| `parse_signal_option_token()` | Parse `-SIGNAL` shorthand | +| `parse_signal_for_dash_s()` | Parse signal name/number for `-s` | +| `signal_name_for_number()` | Reverse lookup: number → name | +| `printsignals()` | List all signals (for `-l`) | +| `max_signal_number()` | Find highest valid signal | +| `parse_pid_argument()` | Parse and validate PID string | + +### Signal Name Normalization + +```c +static const char * +normalize_signal_name(const char *name) +{ + /* Strip optional "SIG" prefix */ + if (strncasecmp(name, "SIG", 3) == 0) + name += 3; + + /* Case-insensitive lookup in signal_table */ + for (size_t i = 0; i < SIGNAL_TABLE_SIZE; i++) { + if (strcasecmp(name, signal_table[i].name) == 0) + return signal_table[i].name; + } + return NULL; +} +``` + +### Parsing Signal Options + +The option parsing handles three forms: + +```c +/* Form 1: kill -s SIGNAL pid */ +/* Form 2: kill -SIGNAL pid (dash prefix) */ +/* Form 3: kill -NUMBER pid */ + +static int +parse_signal_option_token(const char *token) +{ + /* Try as number first */ + char *end; + long val = strtol(token, &end, 10); + if (*end == '\0' && val >= 0 && val <= max_signal_number()) + return (int)val; + + /* Try as name */ + const char *name = normalize_signal_name(token); + if (name) { + /* Look up number from normalized name */ + return number_for_name(name); + } + + errx(2, "unknown signal: %s", token); +} +``` + +### Real-Time Signal Support + +```c +/* SIGRTMIN+n and SIGRTMAX-n notation */ +#ifdef SIGRTMIN + if (strncasecmp(name, "RTMIN", 5) == 0) { + int offset = (name[5] == '+') ? atoi(name + 6) : 0; + return SIGRTMIN + offset; + } + if (strncasecmp(name, "RTMAX", 5) == 0) { + int offset = (name[5] == '-') ? 
atoi(name + 6) : 0; + return SIGRTMAX - offset; + } +#endif +``` + +### Listing Signals + +```c +static void +printsignals(FILE *fp) +{ + int columns = 0; + for (int sig = 1; sig <= max_signal_number(); sig++) { + const char *name = signal_name_for_number(sig); + if (name) { + fprintf(fp, "%s", name); + if (++columns >= 8) { + fputc('\n', fp); + columns = 0; + } else { + fputc('\t', fp); + } + } + } +} +``` + +### Signal from Exit Status + +When given an exit status with `-l`, the signal number is extracted: + +```c +/* exit_status > 128 means killed by signal (exit_status - 128) */ +if (exit_status > 128) + sig = exit_status - 128; +``` + +### Shell Built-in Integration + +```c +#ifdef SHELL +/* When compiled into the shell (sh/), kill is a built-in */ +/* Uses different error reporting and argument parsing */ +int killcmd(int argc, char *argv[]); +#endif +``` + +## System Calls Used + +| Syscall | Purpose | +|---------|---------| +| `kill(2)` | Send signal to process or process group | + +## Examples + +```sh +# Send SIGTERM (default) +kill 1234 + +# Send SIGKILL +kill -9 1234 +kill -KILL 1234 +kill -s KILL 1234 + +# Send to process group +kill -TERM -1234 + +# List all signals +kill -l + +# Signal name from exit status +kill -l 137 +# → KILL (137 - 128 = 9 = SIGKILL) + +# Real-time signal +kill -s RTMIN+3 1234 +``` + +## Exit Codes + +| Code | Meaning | +|------|---------| +| 0 | All signals sent successfully | +| 1 | Error sending signal to at least one process | +| 2 | Usage error | diff --git a/docs/handbook/corebinutils/ln.md b/docs/handbook/corebinutils/ln.md new file mode 100644 index 0000000000..848ebb72c0 --- /dev/null +++ b/docs/handbook/corebinutils/ln.md @@ -0,0 +1,190 @@ +# ln — Make Links + +## Overview + +`ln` creates hard links or symbolic links between files. It supports +interactive prompting, forced overwriting, verbose output, and optional +warnings for missing symbolic link targets. 
+ +**Source**: `ln/ln.c` (single file) +**Origin**: BSD 4.4, University of California, Berkeley +**License**: BSD-3-Clause + +## Synopsis + +``` +ln [-sfhivFLPnw] source_file [target_file] +ln [-sfhivFLPnw] source_file ... target_dir +``` + +## Options + +| Flag | Description | +|------|-------------| +| `-s` | Create symbolic links instead of hard links | +| `-f` | Force: remove existing target files | +| `-i` | Interactive: prompt before overwriting | +| `-n` | Don't follow symlinks on the target | +| `-v` | Verbose: print each link created | +| `-w` | Warn if symbolic link target does not exist | +| `-h` | Don't follow symlink if target is a symlink to a directory | +| `-F` | Remove existing target directory before linking | +| `-L` | Follow symlinks on the source | +| `-P` | Don't follow symlinks on the source (default for hard links) | + +## Source Analysis + +### Data Structures + +```c +struct ln_options { + bool force; /* -f: remove existing targets */ + bool remove_dir; /* -F: remove existing directories */ + bool no_target_follow; /* -n/-h: don't follow target symlinks */ + bool interactive; /* -i: prompt before replace */ + bool follow_source_symlink; /* -L: follow source symlinks */ + bool symbolic; /* -s: create symlinks */ + bool verbose; /* -v: print actions */ + bool warn_missing; /* -w: warn on missing symlink target */ + int linkch; /* Function: link or symlink */ +}; +``` + +### Functions + +| Function | Purpose | +|----------|---------| +| `main()` | Parse options, determine single vs. 
multi-target mode | +| `linkit()` | Create one link (core logic) | +| `remove_existing_target()` | Unlink or rmdir existing target | +| `samedirent()` | Check if source and target are the same file | +| `should_append_basename()` | Determine if target is a directory | +| `stat_parent_dir()` | Stat the parent directory of a path | +| `warn_missing_symlink_source()` | Check if symlink target exists | +| `prompt_replace()` | Interactive yes/no prompt | + +### Core Linking Logic + +```c +static int +linkit(const char *source, const char *target, + const struct ln_options *opts) +{ + /* Check if target already exists */ + if (lstat(target, &sb) == 0) { + /* Same file check */ + if (samedirent(source, target)) { + warnx("%s and %s are the same", source, target); + return 1; + } + + /* Interactive prompt */ + if (opts->interactive && !prompt_replace(target)) + return 0; + + /* Remove existing target */ + if (opts->force || opts->interactive) + remove_existing_target(target, opts); + } + + /* Create the link */ + if (opts->symbolic) { + if (symlink(source, target) < 0) { + warn("symlink %s -> %s", target, source); + return 1; + } + } else { + if (link(source, target) < 0) { + warn("link %s -> %s", target, source); + return 1; + } + } + + /* Warn about dangling symlinks */ + if (opts->symbolic && opts->warn_missing) + warn_missing_symlink_source(source, target); + + /* Verbose output */ + if (opts->verbose) + printf("%s -> %s\n", target, source); + + return 0; +} +``` + +### Sameness Detection + +```c +static int +samedirent(const char *source, const char *target) +{ + struct stat sb_src, sb_tgt; + + if (stat(source, &sb_src) < 0) + return 0; + if (stat(target, &sb_tgt) < 0) + return 0; + + return (sb_src.st_dev == sb_tgt.st_dev && + sb_src.st_ino == sb_tgt.st_ino); +} +``` + +### Target Resolution + +When the target is a directory, the source basename is appended: + +```c +static int +should_append_basename(const char *target, + const struct ln_options *opts) +{ + struct 
stat sb; + int (*statfn)(const char *, struct stat *); + + /* -n/-h: use lstat to not follow target symlinks */ + statfn = opts->no_target_follow ? lstat : stat; + + if (statfn(target, &sb) == 0 && S_ISDIR(sb.st_mode)) + return 1; + return 0; +} +``` + +## System Calls Used + +| Syscall | Purpose | +|---------|---------| +| `link(2)` | Create hard link | +| `symlink(2)` | Create symbolic link | +| `lstat(2)` | Stat without following symlinks | +| `stat(2)` | Stat following symlinks | +| `unlink(2)` | Remove existing target | +| `rmdir(2)` | Remove existing target directory (`-F`) | +| `readlink(2)` | Resolve symlink for display | + +## Examples + +```sh +# Hard link +ln file1.txt file2.txt + +# Symbolic link +ln -s /usr/local/bin/python3 /usr/bin/python + +# Force overwrite +ln -sf new_target link_name + +# Verbose with warning +ln -svw /opt/myapp/bin/app /usr/local/bin/app + +# Multiple files into directory +ln -s file1 file2 file3 /tmp/links/ +``` + +## Exit Codes + +| Code | Meaning | +|------|---------| +| 0 | All links created successfully | +| 1 | Error creating one or more links | diff --git a/docs/handbook/corebinutils/ls.md b/docs/handbook/corebinutils/ls.md new file mode 100644 index 0000000000..e6c6314170 --- /dev/null +++ b/docs/handbook/corebinutils/ls.md @@ -0,0 +1,314 @@ +# ls — List Directory Contents + +## Overview + +`ls` lists files and directory contents with extensive formatting, +sorting, filtering, and colorization options. This implementation uses +the Linux `statx(2)` syscall for file metadata (including birth time), +and provides column, long, stream, and single-column layout modes. + +**Source**: `ls/ls.c`, `ls/ls.h`, `ls/print.c`, `ls/cmp.c`, `ls/util.c`, +`ls/extern.h` +**Origin**: BSD 4.4, University of California, Berkeley +**License**: BSD-3-Clause + +## Synopsis + +``` +ls [-ABCFGHILPRSTUWabcdfghiklmnopqrstuvwxy1,] [--color[=when]] + [--group-directories-first] [file ...] 
+``` + +## Source Architecture + +### File Responsibilities + +| File | Purpose | Key Functions | +|------|---------|---------------| +| `ls.c` | Main, option parsing, directory traversal | `main()`, `parse_options()`, `collect_directory_entries()` | +| `ls.h` | Type definitions, enums, structs | `layout_mode`, `sort_mode`, `time_field` | +| `print.c` | Output formatting for all layout modes | `printlong()`, `printcol()`, `printstream()` | +| `cmp.c` | Sorting comparators | `namecmp()`, `mtimecmp()`, `sizecmp()` | +| `util.c` | Helper functions | `emalloc()`, `printescaped()` | +| `extern.h` | Function prototypes across files | All cross-file declarations | + +### Enums (ls.h) + +```c +enum layout_mode { + SINGLE, /* One file per line (-1) */ + COLUMNS, /* Multi-column (-C) */ + LONG, /* Long listing (-l) */ + STREAM, /* Comma-separated (-m) */ +}; + +enum sort_mode { + BY_NAME, /* Default alphabetical */ + BY_TIME, /* -t: modification/access/birth/change time */ + BY_SIZE, /* -S: file size */ + BY_VERSION, /* --sort=version: version number sort */ + UNSORTED, /* -f: no sorting */ +}; + +enum time_field { + MTIME, /* Modification time (default) */ + ATIME, /* Access time (-u) */ + BTIME, /* Birth/creation time (-U) */ + CTIME, /* Inode change time (-c) */ +}; + +enum follow_mode { + FOLLOW_NEVER, /* -P: never follow symlinks */ + FOLLOW_ALWAYS, /* -L: always follow */ + FOLLOW_CMDLINE, /* -H: follow on command line only */ +}; + +enum color_mode { + COLOR_NEVER, + COLOR_ALWAYS, + COLOR_AUTO, /* Only when stdout is a TTY */ +}; +``` + +### File Time Struct + +```c +struct file_time { + struct timespec ts; + bool available; /* False if filesystem doesn't support it */ +}; +``` + +### statx(2) Integration + +Since musl libc may not provide `statx` wrappers, `ls` defines the +syscall interface inline: + +```c +static int +linux_statx(int dirfd, const char *path, int flags, + unsigned int mask, struct statx *stx) +{ + return syscall(__NR_statx, dirfd, path, flags, mask, 
stx); +} +``` + +This enables birth time (`btime`) on filesystems that support it +(ext4, btrfs, XFS) where traditional `stat(2)` does not expose it. + +### Option Parsing + +```c +static const char *optstring = + "ABCFGHILPRSTUWabcdfghiklmnopqrstuvwxy1,"; + +static void +parse_options(int argc, char *argv[]) +{ + /* Short options via getopt(3) */ + while ((ch = getopt_long(argc, argv, optstring, + long_options, NULL)) != -1) { + switch (ch) { + case 'l': layout = LONG; break; + case 'C': layout = COLUMNS; break; + case '1': layout = SINGLE; break; + case 'm': layout = STREAM; break; + case 't': sort = BY_TIME; break; + case 'S': sort = BY_SIZE; break; + case 'r': reverse = true; break; + case 'a': show_hidden = ALL; break; + case 'A': show_hidden = ALMOST_ALL; break; + case 'R': recurse = true; break; + /* ... more options ... */ + } + } +} +``` + +### Long Options + +| Long Option | Description | +|-------------|-------------| +| `--color[=when]` | Colorize output (always/auto/never) | +| `--group-directories-first` | Sort directories before files | +| `--sort=version` | Version-number sort | + +### Directory Traversal + +```c +static void +collect_directory_entries(const char *dir, struct entry_list *list) +{ + DIR *dp = opendir(dir); + struct dirent *ent; + + while ((ent = readdir(dp)) != NULL) { + /* Skip . and .. 
(unless -a) */ + if (!show_hidden && ent->d_name[0] == '.') + continue; + + struct entry *e = alloc_entry(ent->d_name); + stat_with_policy(dir, e); + list_append(list, e); + } + closedir(dp); +} +``` + +### Recursive Listing + +```c +static void +list_directory(const char *path, int depth) +{ + collect_directory_entries(path, &entries); + sort_entries(&entries); + display_entries(&entries); + + if (recurse) { + for (each entry that is a directory) { + if (should_recurse(entry)) { + /* Cycle detection: check device/inode */ + if (visit_stack_contains(entry->ino, entry->dev)) + warnx("cycle detected: %s", path); + else + list_directory(full_path, depth + 1); + } + } + } +} +``` + +### Birth Time via statx + +```c +static void +fill_birthtime(struct entry *e, const struct statx *stx) +{ + if (stx->stx_mask & STATX_BTIME) { + e->btime.ts.tv_sec = stx->stx_btime.tv_sec; + e->btime.ts.tv_nsec = stx->stx_btime.tv_nsec; + e->btime.available = true; + } else { + e->btime.available = false; + } +} +``` + +### Sorting (cmp.c) + +Comparators are selected based on the sort mode and direction: + +```c +int namecmp(const struct entry *a, const struct entry *b); +int mtimecmp(const struct entry *a, const struct entry *b); +int atimecmp(const struct entry *a, const struct entry *b); +int btimecmp(const struct entry *a, const struct entry *b); +int ctimecmp(const struct entry *a, const struct entry *b); +int sizecmp(const struct entry *a, const struct entry *b); +``` + +All comparators fall back to `namecmp()` for stable ordering when +primary keys are equal. + +### Output Formatting (print.c) + +| Function | Layout Mode | +|----------|-------------| +| `printlong()` | `-l` long listing with permissions, owner, size, date | +| `printcol()` | `-C` multi-column (default for TTY) | +| `printstream()` | `-m` comma-separated stream | +| `printsingle()` | `-1` one per line (default for pipe) | + +Human-readable sizes (`-h`) format with K, M, G, T suffixes. 
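
The `-h` formatting described above can be sketched as follows. This is a minimal illustration, not the code in `print.c`: the function name `format_human_size()`, the 1024 divisor, and the rounding rule (one decimal below 10, none above) are all assumptions.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: render a byte count with K, M, G, T suffixes.
 * Values below 1024 are printed as plain integers. The rounding policy
 * here is an assumption and may differ from the real ls.
 */
int
format_human_size(uint64_t bytes, char *buf, size_t buflen)
{
    static const char suffixes[] = "KMGT";
    double value = (double)bytes;
    int idx = -1;

    /* Divide by 1024 until the value fits, stopping at T. */
    while (value >= 1024.0 && idx < 3) {
        value /= 1024.0;
        idx++;
    }

    if (idx < 0)
        return snprintf(buf, buflen, "%llu", (unsigned long long)bytes);
    if (value < 10.0)
        return snprintf(buf, buflen, "%.1f%c", value, suffixes[idx]);
    return snprintf(buf, buflen, "%.0f%c", value, suffixes[idx]);
}
```

Under these assumptions, `format_human_size(1536, buf, sizeof(buf))` writes `1.5K` into `buf`.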
+ +## Full Options Reference + +| Flag | Description | +|------|-------------| +| `-a` | Show all entries (including `.` and `..`) | +| `-A` | Show almost all (exclude `.` and `..`) | +| `-b` | Print C-style escapes for non-printable chars | +| `-C` | Multi-column output (default if TTY) | +| `-c` | Use ctime (inode change time) for sorting/display | +| `-d` | List directories themselves, not contents | +| `-F` | Append type indicator (`/`, `*`, `@`, `=`, `%`, `\|`) | +| `-f` | Unsorted, show all | +| `-G` | Colorize output (same as `--color=auto`) | +| `-g` | Long format without owner | +| `-H` | Follow symlinks on command line | +| `-h` | Human-readable sizes | +| `-I` | Suppress auto-column mode | +| `-i` | Print inode number | +| `-k` | Use 1024-byte blocks | +| `-L` | Follow all symlinks | +| `-l` | Long listing format | +| `-m` | Stream (comma-separated) output | +| `-n` | Numeric UID/GID | +| `-o` | Long format without group | +| `-P` | Never follow symlinks | +| `-p` | Append `/` to directories | +| `-q` | Replace non-printable with `?` | +| `-R` | Recursive listing | +| `-r` | Reverse sort order | +| `-S` | Sort by size (largest first) | +| `-s` | Print block count | +| `-T` | Show complete time information | +| `-t` | Sort by time | +| `-U` | Use birth time | +| `-u` | Use access time | +| `-v` | Sort version numbers naturally | +| `-w` | Force raw (non-printable) output | +| `-x` | Multi-column sorted across | +| `-y` | Sort by extension | +| `-1` | One entry per line | +| `,` | Thousands separator in sizes | + +## System Calls Used + +| Syscall | Purpose | +|---------|---------| +| `statx(2)` | File metadata including birth time | +| `stat(2)` / `lstat(2)` | Fallback file metadata | +| `opendir(3)` / `readdir(3)` | Directory enumeration | +| `readlink(2)` | Resolve symlink targets | +| `ioctl(TIOCGWINSZ)` | Terminal width detection | +| `isatty(3)` | Detect if stdout is a terminal | +| `getpwuid(3)` / `getgrgid(3)` | User/group name lookup | + +## 
Linux-Specific Notes + +- Uses `statx(2)` directly via `syscall()` for birth time support +- Defines `struct statx` inline for musl compatibility +- No BSD file flags (`-o`, `-W` not supported) +- No MAC label support (`-Z` not supported) + +## Examples + +```sh +# Long listing +ls -la + +# Human-readable, sorted by size +ls -lhS + +# Recursive with color +ls -R --color=auto + +# Sort by modification time +ls -lt + +# Show birth time (on supporting filesystems) +ls -lU + +# Directories first +ls --group-directories-first +``` + +## Exit Codes + +| Code | Meaning | +|------|---------| +| 0 | Success | +| 1 | Minor problem (cannot access one file) | +| 2 | Serious trouble (cannot access command line argument) | diff --git a/docs/handbook/corebinutils/mkdir.md b/docs/handbook/corebinutils/mkdir.md new file mode 100644 index 0000000000..dfdbf75d27 --- /dev/null +++ b/docs/handbook/corebinutils/mkdir.md @@ -0,0 +1,194 @@ +# mkdir — Make Directories + +## Overview + +`mkdir` creates directories with specified permissions. It shares the +`mode_compile()` / `mode_apply()` engine with `chmod` for parsing +symbolic and numeric mode specifications. With `-p`, it creates all +missing intermediate directories. + +**Source**: `mkdir/mkdir.c`, `mkdir/mode.c`, `mkdir/mode.h` (shared with chmod) +**Origin**: BSD 4.4, University of California, Berkeley +**License**: BSD-3-Clause + +## Synopsis + +``` +mkdir [-pv] [-m mode] directory ... 
+``` + +## Options + +| Flag | Description | +|------|-------------| +| `-m mode` | Set permissions (numeric or symbolic) | +| `-p` | Create parent directories as needed | +| `-v` | Print each directory as it is created | + +## Source Analysis + +### Functions + +| Function | Purpose | +|----------|---------| +| `main()` | Parse options and iterate over arguments | +| `create_single_path()` | Create one directory (no `-p`) | +| `create_parents_path()` | Create directory with parents (`-p`) | +| `create_component()` | Create a single path component | +| `mkdir_with_umask()` | Atomically apply umask during `mkdir(2)` | +| `existing_directory()` | Check if a path already exists as a directory | +| `current_umask()` | Atomically read the current umask | +| `mode_compile()` | Parse mode string to command array (shared) | +| `mode_apply()` | Apply compiled mode to existing permissions | + +### Simple Creation + +```c +static int +create_single_path(const char *path, mode_t mode) +{ + if (mkdir(path, mode) < 0) { + error_errno("cannot create directory '%s'", path); + return 1; + } + + /* If explicit mode was specified, chmod to override umask */ + if (explicit_mode) { + if (chmod(path, mode) < 0) { + error_errno("cannot set permissions on '%s'", path); + return 1; + } + } + + if (verbose) + printf("mkdir: created directory '%s'\n", path); + + return 0; +} +``` + +### Parent Directory Creation + +```c +static int +create_parents_path(const char *path, mode_t mode, + mode_t intermediate_mode) +{ + char *buf = strdup(path); + char *p = buf; + + /* Skip leading slashes */ + while (*p == '/') p++; + + /* Create each component */ + while (*p) { + char *slash = strchr(p, '/'); + if (slash) *slash = '\0'; + + if (!existing_directory(buf)) { + if (mkdir_with_umask(buf, intermediate_mode) < 0) { + if (errno != EEXIST) { + error_errno("cannot create '%s'", buf); + return 1; + } + } + if (verbose) + printf("mkdir: created directory '%s'\n", buf); + } + + if (slash) { + *slash = '/'; 
+ p = slash + 1; + } else { + break; + } + } + + /* Apply final mode to the leaf directory */ + if (chmod(buf, mode) < 0) { ... } + return 0; +} +``` + +### Atomic Umask Handling + +To prevent race conditions when setting permissions: + +```c +static mode_t +current_umask(void) +{ + /* umask(2) has no read-only form: set a value, then restore it */ + mode_t mask = umask(0); + umask(mask); + return mask; +} + +static int +mkdir_with_umask(const char *path, mode_t mode) +{ + /* Clear the umask so that `mode` is applied exactly as given, + * then restore the previous mask */ + mode_t old = umask(0); + int ret = mkdir(path, mode); + umask(old); + return ret; +} +``` + +Intermediate directories are created with `0300 | mode` to ensure the +creating user always has write and execute access to create children, +even if the specified mode is more restrictive. + +### Mode Compilation (Shared with chmod) + +```sh +# Numeric modes +mkdir -m 755 mydir +# → mode_compile("755") returns compiled bitcmd array + +# Symbolic modes +mkdir -m u=rwx,g=rx,o=rx mydir +# → mode_compile("u=rwx,g=rx,o=rx") + +# Default mode +# 0777 & ~umask (typically 0755) +``` + +## System Calls Used + +| Syscall | Purpose | +|---------|---------| +| `mkdir(2)` | Create directory | +| `chmod(2)` | Set final permissions | +| `umask(2)` | Read/set file creation mask | +| `stat(2)` | Check if path exists | + +## Examples + +```sh +# Simple directory +mkdir mydir + +# With specific permissions +mkdir -m 700 private_dir + +# Create parent directories +mkdir -p /opt/myapp/lib/plugins + +# Verbose +mkdir -pv a/b/c +# mkdir: created directory 'a' +# mkdir: created directory 'a/b' +# mkdir: created directory 'a/b/c' + +# Symbolic mode +mkdir -m u=rwx,g=rx,o= restricted_dir +``` + +## Exit Codes + +| Code | Meaning | +|------|---------| +| 0 | All directories created successfully | +| 1 | Error creating one or more directories | diff --git a/docs/handbook/corebinutils/mv.md b/docs/handbook/corebinutils/mv.md new file
mode 100644 index 0000000000..c6b7369301 --- /dev/null +++ b/docs/handbook/corebinutils/mv.md @@ -0,0 +1,285 @@ +# mv — Move (Rename) Files + +## Overview + +`mv` moves or renames files and directories. When the source and target +are on the same filesystem, it uses `rename(2)`. When they are on +different filesystems, it performs a copy-and-remove fallback with +extended attribute preservation. + +**Source**: `mv/mv.c` (single file) +**Origin**: BSD 4.4, University of California, Berkeley +**License**: BSD-3-Clause + +## Synopsis + +``` +mv [-finv] source target +mv [-finv] source ... directory +``` + +## Options + +| Flag | Description | +|------|-------------| +| `-f` | Force: do not prompt before overwriting | +| `-i` | Interactive: prompt before overwriting | +| `-n` | No clobber: do not overwrite existing files | +| `-v` | Verbose: print each file as it is moved | + +## Source Analysis + +### Data Structures + +```c +struct mv_options { + bool force; /* -f: overwrite without asking */ + bool interactive; /* -i: prompt before overwrite */ + bool no_clobber; /* -n: never overwrite */ + bool no_target_dir_follow; /* Don't follow target symlinks */ + bool verbose; /* -v: display moves */ +}; + +struct move_target { + char *path; + struct stat sb; + bool is_directory; +}; +``` + +### Constants + +```c +#define MV_EXIT_ERROR 1 +#define MV_EXIT_USAGE 2 + +#define COPY_BUFFER_MIN (128 * 1024) /* 128 KB */ +#define COPY_BUFFER_MAX (2 * 1024 * 1024) /* 2 MB */ +``` + +### Functions + +| Function | Purpose | +|----------|---------| +| `main()` | Parse options, determine single vs. 
multi-target | +| `handle_single_move()` | Move one source to one target | +| `apply_existing_target_policy()` | Handle `-f`, `-i`, `-n` logic | +| `copy_move_fallback()` | Cross-device copy+remove | +| `copy_file_data()` | Buffer-based data copy | +| `copy_file_xattrs()` | Preserve extended attributes | +| `copy_directory_tree()` | Recursive directory copy | +| `apply_path_metadata()` | Set ownership, permissions, timestamps | +| `remove_source_tree()` | Remove original after copy | + +### Core Move Logic + +```c +static int +handle_single_move(const char *source, const char *target, + const struct mv_options *opts) +{ + /* Check for self-move */ + struct stat src_sb, tgt_sb; + if (stat(source, &src_sb) < 0) + return MV_EXIT_ERROR; + + /* Handle existing target */ + if (lstat(target, &tgt_sb) == 0) { + /* Same file? (device + inode) */ + if (src_sb.st_dev == tgt_sb.st_dev && + src_sb.st_ino == tgt_sb.st_ino) { + warnx("'%s' and '%s' are the same file", source, target); + return MV_EXIT_ERROR; + } + + /* Apply -f/-i/-n policy */ + int policy = apply_existing_target_policy(target, &tgt_sb, opts); + if (policy != 0) + return policy; + } + + /* Try rename(2) first — fast path */ + if (rename(source, target) == 0) { + if (opts->verbose) + printf("'%s' -> '%s'\n", source, target); + return 0; + } + + /* Cross-device: copy then remove */ + if (errno == EXDEV) + return copy_move_fallback(source, target, &src_sb, opts); + + warn("rename '%s' to '%s'", source, target); + return MV_EXIT_ERROR; +} +``` + +### Cross-Device Copy Fallback + +When `rename(2)` fails with `EXDEV` (different filesystems): + +```c +static int +copy_move_fallback(const char *source, const char *target, + const struct stat *src_sb, + const struct mv_options *opts) +{ + if (S_ISDIR(src_sb->st_mode)) { + /* Recursive directory copy */ + if (copy_directory_tree(source, target) != 0) + return MV_EXIT_ERROR; + } else { + /* Regular file copy */ + if (copy_file_data(source, target) != 0) + return 
MV_EXIT_ERROR; + } + + /* Preserve metadata */ + apply_path_metadata(target, src_sb); + + /* Preserve extended attributes */ + copy_file_xattrs(source, target); + + /* Remove original */ + remove_source_tree(source, src_sb); + + if (opts->verbose) + printf("'%s' -> '%s'\n", source, target); + + return 0; +} +``` + +### Adaptive Buffer Sizing + +```c +static int +copy_file_data(const char *source, const char *target) +{ + /* Allocate buffer based on available memory */ + size_t bufsize = COPY_BUFFER_MAX; + char *buf = NULL; + + while (bufsize >= COPY_BUFFER_MIN) { + buf = malloc(bufsize); + if (buf) break; + bufsize /= 2; + } + + int src_fd = open(source, O_RDONLY); + int tgt_fd = open(target, O_WRONLY | O_CREAT | O_TRUNC, 0666); + + ssize_t n; + while ((n = read(src_fd, buf, bufsize)) > 0) { + if (write_all(tgt_fd, buf, n) < 0) { + warn("write '%s'", target); + return -1; + } + } + + free(buf); + close(src_fd); + close(tgt_fd); + return 0; +} +``` + +### Extended Attribute Preservation + +```c +#include <sys/xattr.h> + +static void +copy_file_xattrs(const char *source, const char *target) +{ + ssize_t list_len = listxattr(source, NULL, 0); + if (list_len <= 0) + return; + + char *list = malloc(list_len); + listxattr(source, list, list_len); + + for (char *name = list; name < list + list_len; + name += strlen(name) + 1) { + ssize_t val_len = getxattr(source, name, NULL, 0); + if (val_len < 0) continue; + + char *val = malloc(val_len); + getxattr(source, name, val, val_len); + setxattr(target, name, val, val_len, 0); + free(val); + } + + free(list); +} +``` + +### Metadata Preservation + +```c +static void +apply_path_metadata(const char *target, const struct stat *sb) +{ + /* Ownership */ + chown(target, sb->st_uid, sb->st_gid); + + /* Permissions */ + chmod(target, sb->st_mode); + + /* Timestamps */ + struct timespec times[2] = { + sb->st_atim, /* Access time */ + sb->st_mtim, /* Modification time */ + }; + utimensat(AT_FDCWD, target, times, 0); +} +``` + +## 
System Calls Used + +| Syscall | Purpose | +|---------|---------| +| `rename(2)` | Same-device move (fast path) | +| `read(2)` / `write(2)` | Cross-device data copy | +| `stat(2)` / `lstat(2)` | File metadata | +| `chown(2)` | Preserve ownership | +| `chmod(2)` | Preserve permissions | +| `utimensat(2)` | Preserve timestamps | +| `listxattr(2)` | List extended attributes | +| `getxattr(2)` / `setxattr(2)` | Copy extended attributes | +| `unlink(2)` / `rmdir(2)` | Remove source after copy | + +## Examples + +```sh +# Rename a file +mv old.txt new.txt + +# Move into directory +mv file.txt /tmp/ + +# Interactive mode +mv -i important.txt /backup/ + +# No clobber +mv -n *.txt /archive/ + +# Verbose +mv -v file1 file2 file3 /dest/ +``` + +## Exit Codes + +| Code | Meaning | +|------|---------| +| 0 | All moves successful | +| 1 | Error during move | +| 2 | Usage error | + +## Differences from GNU mv + +- No `--backup` / `-b` option +- No `--suffix` / `-S` +- No `--target-directory` / `-t` +- No `--update` / `-u` +- Simpler cross-device fallback without sparse file optimization diff --git a/docs/handbook/corebinutils/overview.md b/docs/handbook/corebinutils/overview.md new file mode 100644 index 0000000000..0ac41deee5 --- /dev/null +++ b/docs/handbook/corebinutils/overview.md @@ -0,0 +1,362 @@ +# Corebinutils — Overview + +## What Is Corebinutils? + +Corebinutils is Project Tick's collection of core command-line utilities, ported +from FreeBSD and adapted for Linux with musl libc. It provides the foundational +user-space programs that every Unix system needs — file manipulation, process +control, text processing, and system information tools — built from +battle-tested FreeBSD sources rather than GNU coreutils. + +The project targets a clean, auditable, BSD-licensed alternative to the GNU +toolchain. Every utility compiles against musl libc by default, producing +statically-linkable binaries with minimal dependencies. 
+ +## Heritage and Licensing + +All utilities derive from FreeBSD's `/usr/src/bin/` tree, carrying BSD +3-Clause or BSD 2-Clause licenses from the original Berkeley and FreeBSD +contributors. Project Tick's modifications (Copyright 2026) maintain the same +licensing terms. No GPL-licensed code is present in the tree. + +The copyright headers trace a direct lineage: + +``` +Copyright (c) 1989, 1993, 1994 + The Regents of the University of California. All rights reserved. +Copyright (c) 2026 + Project Tick. All rights reserved. +``` + +Key contributors acknowledged across the codebase include Keith Muller (dd), +Andrew Moore (ed), Michael Fischbein (ls), Ken Smith (mv), and Lance Visser +(dd). + +## Design Philosophy + +### Linux-Native, Not Compatibility Layers + +Unlike many BSD-to-Linux ports that ship a compatibility shim library, +corebinutils rewrites platform-specific code using native Linux APIs: + +- **`/proc/self/mountinfo`** replaces BSD `getmntinfo(3)` in `df` +- **`statx(2)`** replaces BSD `stat(2)` for birth time in `ls` +- **`sched_getaffinity(2)`** replaces BSD `cpuset_getaffinity(2)` in `nproc` +- **`sethostname(2)` from `<unistd.h>`** replaces BSD kernel calls in `hostname` +- **`prctl(PR_SET_CHILD_SUBREAPER)`** replaces BSD `procctl` in `timeout` +- **`fdopendir(3)` + `readdir(3)`** replaces BSD FTS functions in `rm` + +### musl-First Toolchain + +The build system preferentially selects musl-based compilers. The configure +script tries, in order: + +1. `musl-clang` +2. `clang --target=<arch>-linux-musl` +3. `clang --target=<arch>-unknown-linux-musl` +4. `musl-gcc` +5. `clang` (generic) +6. `cc` +7. `gcc` + +If a glibc toolchain is detected, configure refuses to proceed unless +`--allow-glibc` is explicitly passed. + +### No External Dependencies + +Core utilities have zero runtime dependencies beyond libc. Optional features +(readline in `csh`, crypto in `ed`) probe for system libraries at configure +time but degrade gracefully when absent. 
+ +## Complete Utility List + +### File Operations + +| Utility | Description | Complexity | Source Files | +|-----------|--------------------------------------|------------|-------------| +| `cat` | Concatenate and display files | Simple | 1 `.c` | +| `cp` | Copy files and directory trees | Medium | 3+ `.c` | +| `dd` | Block-level data copying/conversion | Complex | 8+ `.c` | +| `ln` | Create hard and symbolic links | Medium | 1 `.c` | +| `mv` | Move/rename files and directories | Medium | 1 `.c` | +| `rm` | Remove files and directories | Medium | 1 `.c` | +| `rmdir` | Remove empty directories | Simple | 1 `.c` | + +### Directory Operations + +| Utility | Description | Complexity | Source Files | +|-------------|------------------------------------|------------|-------------| +| `ls` | List directory contents | Complex | 5+ `.c` | +| `mkdir` | Create directories | Medium | 2 `.c` | +| `pwd` | Print working directory | Simple | 1 `.c` | +| `realpath` | Canonicalize file paths | Simple | 1 `.c` | + +### Permission and Attribute Management + +| Utility | Description | Complexity | Source Files | +|-------------|------------------------------------|------------|-------------| +| `chmod` | Change file permissions | Medium | 2 `.c` | +| `chflags` | Change file flags (BSD compat) | Medium | 4 `.c` | +| `getfacl` | Display file ACLs | Medium | 1 `.c` | +| `setfacl` | Set file ACLs | Medium | 1 `.c` | + +### Process Management + +| Utility | Description | Complexity | Source Files | +|-------------|------------------------------------|------------|-------------| +| `kill` | Send signals to processes | Medium | 1 `.c` | +| `ps` | List running processes | Complex | 6+ `.c` | +| `pkill` | Signal processes by name/attribute | Medium | 1+ `.c` | +| `pwait` | Wait for process termination | Simple | 1 `.c` | +| `timeout` | Run command with time limit | Medium | 1 `.c` | + +### Text Processing + +| Utility | Description | Complexity | Source Files | 
+|-----------|--------------------------------------|------------|-------------| +| `echo` | Write arguments to stdout | Simple | 1 `.c` | +| `ed` | Line-oriented text editor | Complex | 10+ `.c` | +| `expr` | Evaluate expressions | Medium | 1 `.c` | +| `test` | Conditional expression evaluation | Medium | 1 `.c` | + +### Date and Time + +| Utility | Description | Complexity | Source Files | +|-----------|--------------------------------------|------------|-------------| +| `date` | Display/set system date and time | Medium | 2 `.c` | +| `sleep` | Pause for specified duration | Simple | 1 `.c` | + +### System Information + +| Utility | Description | Complexity | Source Files | +|------------------|---------------------------------|------------|-------------| +| `df` | Report filesystem space usage | Complex | 1 `.c` | +| `hostname` | Get/set system hostname | Simple | 1 `.c` | +| `domainname` | Get/set NIS domain name | Simple | 1 `.c` | +| `nproc` | Count available processors | Simple | 1 `.c` | +| `freebsd-version`| Show FreeBSD version (compat) | Simple | Shell script| +| `uuidgen` | Generate UUIDs | Simple | 1 `.c` | + +### Shells + +| Utility | Description | Complexity | Source Files | +|---------|--------------------------------------|------------|-------------| +| `sh` | POSIX-compatible shell | Very High | 60+ `.c` | +| `csh` | C-shell (tcsh port) | Very High | 30+ `.c` | + +### Archive and Mail + +| Utility | Description | Complexity | Source Files | +|---------|--------------------------------------|------------|-------------| +| `pax` | POSIX archive utility (tar/cpio) | Complex | 30+ `.c` | +| `rmail` | Remote mail handler | Simple | 1 `.c` | + +### Miscellaneous + +| Utility | Description | Complexity | Source Files | +|-------------|------------------------------------|------------|-------------| +| `sync` | Flush filesystem buffers | Simple | 1 `.c` | +| `stty` | Set terminal characteristics | Medium | 2+ `.c` | +| `cpuset` | CPU affinity management 
| Medium | 1 `.c` | + +## Shared Components + +The `contrib/` directory provides libraries shared across utilities: + +### `contrib/libc-vis/` +BSD `vis(3)` and `unvis(3)` functions for encoding and decoding special +characters. Used by `ls` for safe filename display and by `pax` for +header encoding. + +### `contrib/libedit/` +BSD `editline(3)` library providing command-line editing with history and +completion support. Used by `csh` and `sh` for interactive input. + +### `contrib/printf/` +Shared `printf` format string processing used by multiple utilities that +need custom format string expansion beyond standard `printf(3)`. + +## Project Structure + +``` +corebinutils/ +├── configure # Top-level configure script (POSIX sh) +├── README.md # Build instructions +├── .gitattributes # Git configuration +├── .gitignore # Build artifact exclusions +├── contrib/ # Shared libraries +│ ├── libc-vis/ # vis(3)/unvis(3) +│ ├── libedit/ # editline(3) +│ └── printf/ # Shared printf helpers +├── cat/ # Each utility in its own directory +│ ├── cat.c # Main source +│ ├── GNUmakefile # Per-utility build rules +│ ├── cat.1 # Manual page +│ └── README.md # Port-specific notes +├── chmod/ +│ ├── chmod.c +│ ├── mode.c # Shared mode parsing library +│ ├── mode.h +│ └── GNUmakefile +├── ... # (33 utility directories total) +└── sh/ # Full POSIX shell (60+ source files) +``` + +## Utility Complexity Classification + +### Tier 1 — Simple (1 source file, <500 lines) + +`cat`, `echo`, `hostname`, `domainname`, `nproc`, `pwd`, `realpath`, `rmdir`, +`sleep`, `sync`, `uuidgen`, `pwait` + +These utilities typically have a `main()` function that parses options with +`getopt(3)`, performs a single system call, and exits. Error handling follows +the `err(3)`/`warn(3)` pattern. 
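
The Tier 1 shape described above (parse options with `getopt(3)`, make one system call, exit) might look roughly like the sketch below. The utility is hypothetical: the `-s` flag and the name `tiny_main()` are assumptions chosen for illustration, and a real utility would name the function `main()`.

```c
#define _POSIX_C_SOURCE 200809L
#include <err.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical Tier 1 utility: print the host name, optionally cut at
 * the first dot (-s). Returns the exit status instead of calling exit()
 * so that it can be driven from a test harness.
 */
int
tiny_main(int argc, char *argv[])
{
    char name[256];
    bool shorten = false;
    int ch;

    optind = 1; /* allow repeated calls within one process */
    while ((ch = getopt(argc, argv, "s")) != -1) {
        switch (ch) {
        case 's':
            shorten = true;
            break;
        default:
            fprintf(stderr, "usage: tiny [-s]\n");
            return 2; /* usage error */
        }
    }
    if (argc - optind != 0) {
        fprintf(stderr, "usage: tiny [-s]\n");
        return 2;
    }

    /* The single system call that does the real work. */
    if (gethostname(name, sizeof(name)) != 0)
        err(1, "gethostname");
    name[sizeof(name) - 1] = '\0';

    if (shorten) {
        char *dot = strchr(name, '.');
        if (dot != NULL)
            *dot = '\0';
    }
    puts(name);
    return 0;
}
```

Returning 2 for a bad option follows the usage-error exit code listed elsewhere in this overview; `err(3)` handles the failure path by printing the program name and `strerror(errno)` before exiting.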
+ +### Tier 2 — Medium (1-3 source files, 500-2000 lines) + +`chmod` (with `mode.c`), `cp` (with `utils.c`, `fts.c`), `date` (with +`vary.c`), `kill`, `ln`, `mkdir` (with `mode.c`), `mv`, `rm`, `test`, +`timeout`, `expr`, `df` + +These utilities involve more complex option parsing, recursive directory +traversal, or multi-step algorithms. They share code through header files +and sometimes reuse `mode.c`/`mode.h`. + +### Tier 3 — Complex (5+ source files, 2000+ lines) + +`dd` (8 source files), `ed` (10 source files), `ls` (5 source files), +`ps` (6 source files), `pax` (30+ source files) + +These are substantial programs with their own internal architecture: +- `dd`: argument parser, conversion engine, signal handling, I/O position logic +- `ed`: command parser, buffer manager, regex engine, undo system +- `ls`: stat engine, sort/compare, print/format, ANSI color +- `ps`: /proc parser, format string engine, process filter, output formatter + +### Tier 4 — Shells (30-60+ source files) + +`sh` (POSIX-compatible) and `csh` (a tcsh port) are full shells with lexers, +parsers, job control, signal handling, built-in commands, and editline +integration.
+ +## Key Differences from GNU Coreutils + +| Feature | Corebinutils (BSD) | GNU Coreutils | +|------------------------|-----------------------------|----------------------------| +| License | BSD-3-Clause / BSD-2-Clause | GPL-3.0 | +| Default libc | musl | glibc | +| `echo` behavior | No `-e` flag (BSD compat) | `-e` for escape sequences | +| `test` parser | Recursive descent | Varies by implementation | +| `ls` birth time | `statx(2)` syscall | `statx(2)` or fallback | +| `dd` progress | SIGINFO + `status=progress` | `status=progress` | +| `sleep` units | `s`, `m`, `h`, `d` suffixes | `s`, `m`, `h`, `d` (GNU ext)| +| Build system | `./configure` + `GNUmakefile`| Autotools (autoconf/automake)| +| Error functions | `err(3)`/`warn(3)` from libc| `error()` from gnulib | +| FTS implementation | In-tree custom `fts.c` | gnulib FTS or `nftw(3)` | + +## Signal Handling Conventions + +Most utilities follow a consistent signal handling pattern: + +- **SIGINFO / SIGUSR1**: Progress reporting. `dd`, `chmod`, `sleep`, and + others install a handler that sets a `volatile sig_atomic_t` flag, which + the main loop checks to print status information. + +- **SIGINT**: Graceful termination. Utilities performing recursive operations + check for pending signals between iterations. + +- **SIGHUP**: In `ed`, triggers an emergency save of the edit buffer to a + temporary file. + +Signal handlers are installed via `sigaction(2)` rather than the legacy +`signal(2)` function, ensuring reliable semantics across platforms. 
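
The flag-based pattern described above can be sketched as follows. This is an illustrative sketch rather than code from any particular utility: the names `progress_requested`, `install_progress_handler()`, and `report_if_requested()` are assumptions, and `SIGINFO` is only registered where the platform defines it (Linux does not).

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <string.h>

/* Set by the handler, polled by the main loop. */
static volatile sig_atomic_t progress_requested;

static void
on_progress_signal(int signo)
{
    (void)signo;
    progress_requested = 1; /* only set a flag; no I/O in the handler */
}

/* Install the handler with sigaction(2). Returns 0 on success. */
int
install_progress_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_progress_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART; /* restart interrupted read/write calls */
#ifdef SIGINFO
    if (sigaction(SIGINFO, &sa, NULL) != 0)
        return -1;
#endif
    return sigaction(SIGUSR1, &sa, NULL);
}

/*
 * Called between iterations of the work loop. Prints a status line and
 * returns 1 if a signal arrived since the last call, otherwise returns 0.
 */
int
report_if_requested(unsigned long done, unsigned long total)
{
    if (!progress_requested)
        return 0;
    progress_requested = 0;
    fprintf(stderr, "%lu/%lu items processed\n", done, total);
    return 1;
}
```

Keeping the handler down to a single assignment avoids calling functions that are not async-signal-safe; all formatting happens later in the main loop.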
+ +## Error Handling Patterns + +All utilities exit with standardized codes: + +| Exit Code | Meaning | +|-----------|------------------------------------------| +| 0 | Success | +| 1 | General failure | +| 2 | Usage error (invalid arguments) | +| 124 | Command timed out (`timeout` only) | +| 125 | `timeout` internal error | +| 126 | Command found but not executable | +| 127 | Command not found | + +Error messages follow the BSD pattern: +```c +error_errno("open %s", path); // "mv: open /foo: Permission denied" +error_msg("invalid mode: %s", arg); // "chmod: invalid mode: xyz" +``` + +Many utilities provide custom `error_errno()` / `error_msg()` wrappers that +prepend the program name, format the message, and optionally append +`strerror(errno)`. + +## Memory Management + +Corebinutils utilities follow BSD memory conventions: + +- **Dynamic allocation**: `malloc(3)` with explicit `NULL` checks, typically + wrapped in `xmalloc()` that calls `err(1, "malloc")` on failure. +- **No fixed-size buffers** for user-controlled data (paths, format strings). +- **Adaptive buffer sizing**: `cat` and `cp` scale I/O buffers based on + available physical memory via `sysconf(_SC_PHYS_PAGES)`. +- **Explicit cleanup**: `free()` is called in long-running loops to avoid + accumulation, though single-pass utilities may rely on process exit. 
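
A minimal sketch of the allocation conventions listed above, assuming an `xmalloc()` wrapper of the kind described. The helper `join_path()` is hypothetical and only illustrates sizing a buffer from its inputs instead of using a fixed-size array.

```c
#include <err.h>
#include <stdlib.h>
#include <string.h>

/* Allocate or exit: callers never see NULL. */
void *
xmalloc(size_t size)
{
    void *p = malloc(size == 0 ? 1 : size);

    if (p == NULL)
        err(1, "malloc");
    return p;
}

/*
 * Hypothetical helper: build "dir/name" in a buffer sized from the
 * inputs, so arbitrarily long paths are handled. The caller frees
 * the result.
 */
char *
join_path(const char *dir, const char *name)
{
    size_t dlen = strlen(dir);
    size_t nlen = strlen(name);
    size_t need_slash = (dlen > 0 && dir[dlen - 1] != '/') ? 1 : 0;
    char *out = xmalloc(dlen + need_slash + nlen + 1);

    memcpy(out, dir, dlen);
    if (need_slash)
        out[dlen] = '/';
    memcpy(out + dlen + need_slash, name, nlen + 1); /* copies the NUL */
    return out;
}
```

A recursive traversal would call `free()` on each joined path once the entry has been processed, in line with the explicit-cleanup point above.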
+ +### Buffer Strategy Example (from `cat.c` and `cp/utils.c`): + +```c +#define PHYSPAGES_THRESHOLD (32*1024) +#define BUFSIZE_MAX (2*1024*1024) +#define BUFSIZE_SMALL (128*1024) + +if (sysconf(_SC_PHYS_PAGES) > PHYSPAGES_THRESHOLD) + bufsize = MIN(BUFSIZE_MAX, MAXPHYS * 8); +else + bufsize = BUFSIZE_SMALL; +``` + +## Testing + +Each utility directory may contain its own test suite, invoked through: + +```sh +make -f GNUmakefile test +``` + +Or for a specific utility: + +```sh +make -f GNUmakefile check-cat +make -f GNUmakefile check-ls +``` + +Tests that require root privileges or specific kernel features print `SKIP` +and continue without failing the overall test run. + +## Building Quick Reference + +```sh +cd corebinutils/ +./configure # Detect toolchain, generate build files +make -f GNUmakefile -j$(nproc) all # Build all utilities +make -f GNUmakefile test # Run test suites +make -f GNUmakefile stage # Copy binaries to out/bin/ +make -f GNUmakefile install # Install to $PREFIX/bin +``` + +See [building.md](building.md) for detailed configure options and build +customization. + +## Further Reading + +- [architecture.md](architecture.md) — Build system internals, code organization +- [building.md](building.md) — Configure options, dependencies, cross-compilation +- Individual utility documentation: [cat.md](cat.md), [ls.md](ls.md), + [dd.md](dd.md), [ps.md](ps.md), etc. +- [code-style.md](code-style.md) — C coding conventions +- [error-handling.md](error-handling.md) — Error patterns and exit codes diff --git a/docs/handbook/corebinutils/ps.md b/docs/handbook/corebinutils/ps.md new file mode 100644 index 0000000000..cbbd749a44 --- /dev/null +++ b/docs/handbook/corebinutils/ps.md @@ -0,0 +1,298 @@ +# ps — Process Status + +## Overview + +`ps` displays information about active processes. This implementation +reads process data from the Linux `/proc` filesystem and presents it +through BSD-style format strings. 
It provides a custom `struct kinfo_proc` +that mirrors FreeBSD's interface while reading from Linux procfs. + +**Source**: `ps/ps.c`, `ps/ps.h`, `ps/fmt.c`, `ps/keyword.c`, +`ps/print.c`, `ps/nlist.c`, `ps/extern.h` +**Origin**: BSD 4.4, University of California, Berkeley +**License**: BSD-3-Clause + +## Synopsis + +``` +ps [-AaCcdefHhjLlMmrSTuvwXxZ] [-D fmt] [-G gid[,gid...]] + [-J jail] [-N system] [-O fmt] [-o fmt] [-p pid[,pid...]] + [-t tty[,tty...]] [-U user[,user...]] [-g group[,group...]] +``` + +## Source Architecture + +### File Responsibilities + +| File | Purpose | +|------|---------| +| `ps.c` | Main program, option parsing, process collection | +| `ps.h` | Data structures, constants, STAILQ macros | +| `fmt.c` | Format string parsing and column management | +| `keyword.c` | Format keyword definitions and lookup table | +| `print.c` | Column value formatters (PID, user, CPU, etc.) | +| `nlist.c` | Name list support (noop on Linux) | +| `extern.h` | Cross-file function declarations | + +### Key Data Structures + +#### Process Information (Linux replacement for BSD kinfo_proc) + +```c +struct kinfo_proc { + pid_t ki_pid; /* Process ID */ + pid_t ki_ppid; /* Parent PID */ + pid_t ki_pgid; /* Process group ID */ + pid_t ki_sid; /* Session ID */ + uid_t ki_uid; /* Real UID */ + uid_t ki_ruid; /* Real UID (copy) */ + uid_t ki_svuid; /* Saved UID */ + gid_t ki_rgid; /* Real GID */ + gid_t ki_svgid; /* Saved GID */ + gid_t ki_groups[KI_NGROUPS]; /* Supplementary groups */ + int ki_ngroups; /* Number of groups */ + dev_t ki_tdev; /* TTY device */ + int ki_flag; /* Process flags */ + int ki_stat; /* Process state */ + char ki_comm[COMMLEN + 1]; /* Command name */ + char ki_wmesg[WMESGLEN + 1]; /* Wait channel */ + int ki_nice; /* Nice value */ + int ki_pri; /* Priority */ + long ki_size; /* Virtual size */ + long ki_rssize; /* Resident size */ + struct timeval ki_start; /* Start time */ + struct timeval ki_rusage; /* Resource usage */ + /* ... 
additional fields ... */ +}; +``` + +#### KINFO Wrapper + +```c +typedef struct { + struct kinfo_proc *ki_p; + char *ki_args; /* Full command line */ + char *ki_env; /* Environment (if -E) */ + double ki_pcpu; /* Computed %CPU */ + long ki_memsize; /* Computed memory size */ +} KINFO; +``` + +#### Format Variable + +```c +typedef struct { + const char *name; /* Keyword name (e.g., "pid", "user") */ + const char *header; /* Column header (e.g., "PID", "USER") */ + int width; /* Column width */ + int (*sprnt)(KINFO *); /* Print function */ + int flag; /* Format flags */ +} VAR; +``` + +### Constants + +```c +#define COMMLEN 256 /* Max command name length */ +#define WMESGLEN 64 /* Max wait message length */ +#define KI_NGROUPS 16 /* Max supplementary groups tracked */ +``` + +### musl Compatibility + +FreeBSD uses `STAILQ_*` macros extensively, but musl's `<sys/queue.h>` +may not provide them. `ps.h` defines custom implementations: + +```c +#ifndef STAILQ_HEAD +#define STAILQ_HEAD(name, type) \ +struct name { \ + struct type *stqh_first; \ + struct type **stqh_last; \ +} +#define STAILQ_ENTRY(type) \ +struct { \ + struct type *stqe_next; \ +} +#define STAILQ_INIT(head) do { ... } while (0) +#define STAILQ_INSERT_TAIL(head, elm, field) do { ... } while (0) +#define STAILQ_FOREACH(var, head, field) ... 
+#endif
+```
+
+### Predefined Format Strings
+
+```c
+/* Default format (-f not specified) */
+const char *dfmt = "pid,tt,stat,time,command";
+
+/* Jobs format (-j) */
+const char *jfmt = "user,pid,ppid,pgid,sid,jobc,stat,tt,time,command";
+
+/* Long format (-l) */
+const char *lfmt = "uid,pid,ppid,cpu,pri,nice,vsz,rss,wchan,stat,tt,time,command";
+
+/* User format (-u) */
+const char *ufmt = "user,pid,%cpu,%mem,vsz,rss,tt,stat,start,time,command";
+
+/* Virtual memory format (-v) */
+const char *vfmt = "pid,stat,time,sl,re,pagein,vsz,rss,lim,tsiz,%cpu,%mem,command";
+```
+
+### /proc Parsing
+
+Process data is read from multiple `/proc/[pid]/` files:
+
+| File | Data Extracted |
+|------|----------------|
+| `/proc/[pid]/stat` | PID, PPID, PGID, state, priority, nice, threads, start time |
+| `/proc/[pid]/status` | UID, GID, groups, memory (VmSize, VmRSS) |
+| `/proc/[pid]/cmdline` | Full command line arguments |
+| `/proc/[pid]/environ` | Environment variables (if requested) |
+| `/proc/[pid]/wchan` | Wait channel name |
+| `/proc/[pid]/fd/0` | Controlling TTY detection |
+
+### Process Filtering
+
+```c
+/* Option string */
+#define PS_ARGS "AaCcD:defG:gHhjJ:LlM:mN:O:o:p:rSTt:U:uvwXxZ"
+
+struct listinfo {
+    int count;
+    int maxcount;
+    int *list;    /* Array of values to match */
+    int (*addelem)(struct listinfo *, const char *);
+};
+```
+
+Filtering by PID, UID, GID, TTY, session, and process group uses
+`struct listinfo` with dynamic arrays and element-specific parsers.
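The `struct listinfo` pattern described above (a growable array plus a per-type element parser) can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `ps.c`: the helper names `listinfo_push`, `addelem_pid`, `listinfo_add_csv`, and `listinfo_contains` are hypothetical, and the struct is a local copy of the one shown in the handbook text.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Local copy of the struct from the text; field meanings are assumed. */
struct listinfo {
    int count;      /* elements in use */
    int maxcount;   /* allocated capacity */
    int *list;      /* values to match */
    int (*addelem)(struct listinfo *, const char *);
};

/* Hypothetical helper: append one value, doubling capacity when full. */
static int
listinfo_push(struct listinfo *li, int value)
{
    if (li->count == li->maxcount) {
        int newmax = li->maxcount ? li->maxcount * 2 : 8;
        int *p = realloc(li->list, (size_t)newmax * sizeof(*p));
        if (p == NULL)
            return -1;
        li->list = p;
        li->maxcount = newmax;
    }
    li->list[li->count++] = value;
    return 0;
}

/* Hypothetical element parser for PIDs: a non-negative decimal integer. */
static int
addelem_pid(struct listinfo *li, const char *elem)
{
    char *end;
    long v;

    errno = 0;
    v = strtol(elem, &end, 10);
    if (end == elem || *end != '\0' || errno != 0 || v < 0 || v > INT_MAX)
        return -1;
    return listinfo_push(li, (int)v);
}

/* Split "1,42,300" on commas and hand each piece to li->addelem. */
static int
listinfo_add_csv(struct listinfo *li, const char *arg)
{
    char buf[32];

    while (*arg != '\0') {
        size_t len = strcspn(arg, ",");
        if (len == 0 || len >= sizeof(buf))
            return -1;              /* empty or oversized element */
        memcpy(buf, arg, len);
        buf[len] = '\0';
        if (li->addelem(li, buf) != 0)
            return -1;
        arg += len;
        if (*arg == ',' && *++arg == '\0')
            return -1;              /* trailing comma */
    }
    return 0;
}

/* Linear membership test, as a filter would use per process. */
static int
listinfo_contains(const struct listinfo *li, int value)
{
    for (int i = 0; i < li->count; i++)
        if (li->list[i] == value)
            return 1;
    return 0;
}
```

A caller would initialise the struct with the parser it wants, for example `struct listinfo pids = {0, 0, NULL, addelem_pid};`, feed it the argument of `-p`, and then test each process with `listinfo_contains`. Keeping the parser behind a function pointer lets the same list code serve PIDs, UIDs, and TTYs.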
+ +### Column Formatting (keyword.c) + +The keyword table maps format names to print functions: + +```c +static VAR var[] = { + {"pid", "PID", 5, s_pid, 0}, + {"ppid", "PPID", 5, s_ppid, 0}, + {"user", "USER", 8, s_user, 0}, + {"uid", "UID", 5, s_uid, 0}, + {"gid", "GID", 5, s_gid, 0}, + {"%cpu", "%CPU", 4, s_pcpu, 0}, + {"%mem", "%MEM", 4, s_pmem, 0}, + {"vsz", "VSZ", 6, s_vsz, 0}, + {"rss", "RSS", 5, s_rss, 0}, + {"tt", "TT", 3, s_tty, 0}, + {"stat", "STAT", 4, s_stat, 0}, + {"time", "TIME", 8, s_time, 0}, + {"command", "COMMAND", 16, s_command, COMM}, + {"args", "COMMAND", 16, s_args, COMM}, + {"comm", "COMMAND", 16, s_comm, COMM}, + {"nice", "NI", 3, s_nice, 0}, + {"pri", "PRI", 3, s_pri, 0}, + {"wchan", "WCHAN", 8, s_wchan, 0}, + {"start", "STARTED", 8, s_start, 0}, + /* ... more keywords ... */ + {NULL, NULL, 0, NULL, 0}, /* Sentinel */ +}; +``` + +### Global State + +```c +int cflag; /* Raw CPU usage */ +int eval; /* Exit value */ +time_t now; /* Current time */ +int rawcpu; /* Don't compute decay */ +int sumrusage; /* Sum child usage */ +int termwidth; /* Terminal width */ +int showthreads; /* Show threads (-H) */ +int hlines; /* Header repeat interval */ +``` + +## Options Reference + +| Flag | Description | +|------|-------------| +| `-A` / `-e` | All processes | +| `-a` | Processes with terminals (except session leaders) | +| `-C` | Raw CPU percentage | +| `-c` | Show command name only (not full path) | +| `-d` | All except session leaders | +| `-f` | Full format | +| `-G gid` | Filter by real group ID | +| `-g group` | Filter by group name | +| `-H` | Show threads | +| `-h` | Repeat header every screenful | +| `-j` | Jobs format | +| `-L` | Show all threads (LWP) | +| `-l` | Long format | +| `-M` | Display MAC label | +| `-m` | Sort by memory usage | +| `-O fmt` | Add columns to default format | +| `-o fmt` | Custom output format | +| `-p pid` | Filter by PID | +| `-r` | Running processes only | +| `-S` | Include child time | +| `-T` | Show threads for 
current terminal | +| `-t tty` | Filter by TTY | +| `-U user` | Filter by effective user | +| `-u` | User format | +| `-v` | Virtual memory format | +| `-w` | Wide output | +| `-X` | Skip processes without controlling TTY | +| `-x` | Include processes without controlling TTY | +| `-Z` | Show security context | + +## System Calls Used + +| Syscall | Purpose | +|---------|---------| +| `opendir(3)` / `readdir(3)` | Enumerate `/proc/` PIDs | +| `open(2)` / `read(2)` | Read `/proc/[pid]/*` files | +| `stat(2)` | Get file owner for UID detection | +| `getpwuid(3)` / `getgrgid(3)` | UID/GID to name resolution | +| `ioctl(TIOCGWINSZ)` | Terminal width | +| `sysconf(3)` | Clock ticks, page size | + +## Examples + +```sh +# Default process list +ps + +# All processes, user format +ps aux + +# Full format +ps -ef + +# Custom columns +ps -o pid,user,%cpu,%mem,command + +# Filter by user +ps -U root + +# Jobs format +ps -j + +# Long format with threads +ps -lH +``` + +## Exit Codes + +| Code | Meaning | +|------|---------| +| 0 | Success | +| 1 | Error | + +## Linux-Specific Notes + +- Reads from `/proc` filesystem instead of BSD `kvm_getprocs(3)` +- Custom `struct kinfo_proc` replaces BSD's `<sys/user.h>` variant +- STAILQ macros defined inline for musl compatibility +- No jail (`-J`) support on Linux +- No Capsicum sandboxing diff --git a/docs/handbook/corebinutils/pwd.md b/docs/handbook/corebinutils/pwd.md new file mode 100644 index 0000000000..8f584e3357 --- /dev/null +++ b/docs/handbook/corebinutils/pwd.md @@ -0,0 +1,152 @@ +# pwd — Print Working Directory + +## Overview + +`pwd` prints the absolute pathname of the current working directory. +It supports logical mode (using `$PWD`) and physical mode (resolving +symlinks). Logical mode is the default, with fallback to physical +if validation fails. 
+
+**Source**: `pwd/pwd.c` (single file)
+**Origin**: BSD 4.4, University of California, Berkeley
+**License**: BSD-3-Clause
+
+## Synopsis
+
+```
+pwd [-L | -P]
+```
+
+## Options
+
+| Flag | Description |
+|------|-------------|
+| `-L` | Logical: use `$PWD` (default) |
+| `-P` | Physical: resolve all symlinks |
+
+When both are specified, the last one wins.
+
+## Source Analysis
+
+### Functions
+
+| Function | Purpose |
+|----------|---------|
+| `main()` | Parse options and dispatch |
+| `getcwd_logical()` | Validate and use `$PWD` |
+| `getcwd_physical()` | Resolve via `getcwd(3)` |
+| `usage()` | Print usage message |
+
+### Logical Mode (Default)
+
+```c
+static char *
+getcwd_logical(void)
+{
+    const char *pwd = getenv("PWD");
+
+    /* Must be set and absolute */
+    if (!pwd || pwd[0] != '/')
+        return NULL;
+
+    /* Must not contain "." or ".." components */
+    if (contains_dot_components(pwd))
+        return NULL;
+
+    /* Must refer to the same directory as "." */
+    struct stat pwd_sb, dot_sb;
+    if (stat(pwd, &pwd_sb) < 0 || stat(".", &dot_sb) < 0)
+        return NULL;
+    if (pwd_sb.st_dev != dot_sb.st_dev ||
+        pwd_sb.st_ino != dot_sb.st_ino)
+        return NULL;
+
+    return strdup(pwd);
+}
+```
+
+The `$PWD` validation ensures:
+1. The value is an absolute path
+2. It contains no `.` or `..` components
+3. The path resolves to the same device and inode as `.`
+
+### Physical Mode
+
+```c
+static char *
+getcwd_physical(void)
+{
+    /* Extension to POSIX: getcwd(NULL, 0) allocates the buffer */
+    return getcwd(NULL, 0);
+}
+```
+
+Uses `getcwd(NULL, 0)`, an extension to POSIX supported by glibc, musl,
+and the BSDs (POSIX itself leaves a NULL buffer unspecified). It allocates
+the returned buffer dynamically, avoiding fixed-size buffer limitations.
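The logical-mode listing calls `contains_dot_components()` without showing it. A minimal sketch of what such a check could look like is given below; the body is an assumption written for illustration and may differ from the helper in `pwd.c`.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical implementation: returns true if any '/'-separated
 * component of path is exactly "." or "..".
 */
static bool
contains_dot_components(const char *path)
{
    const char *p = path;

    while (*p != '\0') {
        /* Skip one or more separators. */
        while (*p == '/')
            p++;

        /* Length of the next component (0 at end of string). */
        size_t len = strcspn(p, "/");

        if ((len == 1 && p[0] == '.') ||
            (len == 2 && p[0] == '.' && p[1] == '.'))
            return true;

        p += len;
    }
    return false;
}
```

Comparing whole components rather than searching for the substring `".."` matters here: names such as `.hidden` or `..user` are ordinary directory entries and should not cause `$PWD` to be rejected.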
+
+### Main Logic
+
+```c
+int main(int argc, char *argv[])
+{
+    int ch;
+    int mode = MODE_LOGICAL;    /* Default: logical */
+
+    while ((ch = getopt(argc, argv, "LP")) != -1) {
+        switch (ch) {
+        case 'L': mode = MODE_LOGICAL; break;
+        case 'P': mode = MODE_PHYSICAL; break;
+        default: usage();
+        }
+    }
+
+    char *cwd;
+    if (mode == MODE_LOGICAL) {
+        cwd = getcwd_logical();
+        if (!cwd)
+            cwd = getcwd_physical();    /* Fallback */
+    } else {
+        cwd = getcwd_physical();
+    }
+
+    if (!cwd)
+        err(1, "getcwd");
+
+    puts(cwd);
+    free(cwd);
+    return 0;
+}
+```
+
+## System Calls Used
+
+| Syscall | Purpose |
+|---------|---------|
+| `getcwd(3)` | Physical working directory |
+| `stat(2)` | Validate `$PWD` against `.` |
+| `getenv(3)` | Read `$PWD` environment variable |
+
+## Examples
+
+```sh
+# Default (logical)
+pwd
+# /home/user/projects/mylink (preserves symlink name)
+
+# Physical
+pwd -P
+# /home/user/actual/path (resolved symlinks)
+
+# Demonstrate difference
+cd /tmp
+ln -s /usr/local/share mylink
+cd mylink
+pwd -L # → /tmp/mylink
+pwd -P # → /usr/local/share
+```
+
+## Exit Codes
+
+| Code | Meaning |
+|------|---------|
+| 0 | Success |
+| 1 | Error (cannot determine directory) |
diff --git a/docs/handbook/corebinutils/realpath.md b/docs/handbook/corebinutils/realpath.md
new file mode 100644
index 0000000000..bf7c1d421f
--- /dev/null
+++ b/docs/handbook/corebinutils/realpath.md
@@ -0,0 +1,119 @@
+# realpath — Resolve to Canonical Path
+
+## Overview
+
+`realpath` resolves each given pathname to its canonical absolute form
+by expanding all symbolic links, resolving `.` and `..` references,
+and removing extra `/` characters.
+
+**Source**: `realpath/realpath.c` (single file)
+**Origin**: BSD 4.4, University of California, Berkeley
+**License**: BSD-3-Clause
+
+## Synopsis
+
+```
+realpath [-q] [path ...]
+```
+
+## Options
+
+| Flag | Description |
+|------|-------------|
+| `-q` | Quiet: suppress error messages for non-existent paths |
+
+## Source Analysis
+
+### Functions
+
+| Function | Purpose |
+|----------|---------|
+| `main()` | Parse options and resolve loop |
+| `resolve_path()` | Wrapper around `realpath(3)` |
+| `set_progname()` | Extract program name from `argv[0]` |
+| `print_line()` | Safe stdout writing |
+| `usage()` | Print usage message |
+| `warnx_msg()` | Warning without errno |
+| `warn_path_errno()` | Warning with errno for path |
+
+### Core Logic
+
+```c
+static int
+resolve_path(const char *path, bool quiet)
+{
+    char *resolved = realpath(path, NULL);
+    if (!resolved) {
+        if (!quiet)
+            warn("%s", path);
+        return 1;
+    }
+
+    puts(resolved);
+    free(resolved);
+    return 0;
+}
+```
+
+### Main Loop
+
+```c
+int main(int argc, char *argv[])
+{
+    bool quiet = false;
+    int ch, errors = 0;
+
+    while ((ch = getopt(argc, argv, "q")) != -1) {
+        switch (ch) {
+        case 'q': quiet = true; break;
+        default: usage();
+        }
+    }
+    argc -= optind;
+    argv += optind;
+
+    if (argc == 0)
+        usage();
+
+    for (int i = 0; i < argc; i++)
+        errors |= resolve_path(argv[i], quiet);
+
+    return errors ? 1 : 0;
+}
+```
+
+Uses `realpath(path, NULL)` (POSIX.1-2008) for dynamic buffer
+allocation, avoiding `PATH_MAX` limitations.
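As a rough, standalone illustration of the same pattern (let `realpath(path, NULL)` allocate the result and fold per-path failures into one status), consider the sketch below. `resolve_all` and its `quiet` parameter are hypothetical names chosen for this example, and `perror` stands in for the `warn` call used in the listings above.

```c
#define _XOPEN_SOURCE 700   /* expose realpath() under strict -std modes */
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: resolve each path, print the canonical form,
 * and return 0 if every path resolved or 1 if at least one failed.
 */
static int
resolve_all(char *const paths[], int n, int quiet)
{
    int errors = 0;

    for (int i = 0; i < n; i++) {
        /* A NULL second argument asks realpath() to malloc the buffer. */
        char *resolved = realpath(paths[i], NULL);

        if (resolved == NULL) {
            if (!quiet)
                perror(paths[i]);
            errors = 1;     /* remember the failure, keep going */
            continue;
        }

        puts(resolved);
        free(resolved);     /* caller owns the allocated result */
    }
    return errors;
}
```

Continuing after a failure, instead of returning early, is what lets a single invocation report every unresolvable argument while still exiting non-zero.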
+ +## System Calls Used + +| Syscall | Purpose | +|---------|---------| +| `realpath(3)` | Canonicalize pathname | + +## Examples + +```sh +# Simple resolution +realpath ../foo/bar +# → /home/user/foo/bar + +# Resolve symlink +ln -s /usr/local/bin target +realpath target +# → /usr/local/bin + +# Quiet mode (no error for missing) +realpath -q /nonexistent/path +# (no output, exit 1) + +# Multiple paths +realpath /tmp/../etc ./relative/path +``` + +## Exit Codes + +| Code | Meaning | +|------|---------| +| 0 | All paths resolved successfully | +| 1 | One or more paths could not be resolved | diff --git a/docs/handbook/corebinutils/rm.md b/docs/handbook/corebinutils/rm.md new file mode 100644 index 0000000000..36c92e34ec --- /dev/null +++ b/docs/handbook/corebinutils/rm.md @@ -0,0 +1,293 @@ +# rm — Remove Files and Directories + +## Overview + +`rm` removes files and directories. It supports recursive removal, +interactive prompting, forced deletion, and protects against removal +of `/`, `.`, and `..`. Directory traversal uses `openat(2)` and +`fdopendir(3)` for safe recursive descent. + +**Source**: `rm/rm.c` (single file) +**Origin**: BSD 4.4, University of California, Berkeley +**License**: BSD-3-Clause + +## Synopsis + +``` +rm [-dfiIPRrvWx] file ... 
+``` + +## Options + +| Flag | Description | +|------|-------------| +| `-d` | Remove empty directories (like `rmdir`) | +| `-f` | Force: no prompts, ignore nonexistent files | +| `-i` | Interactive: prompt for each file | +| `-I` | Prompt once before recursive removal or >3 files | +| `-P` | Overwrite before delete (BSD; not on Linux) | +| `-R` / `-r` | Recursive: remove directories and contents | +| `-v` | Verbose: print each file as removed | +| `-W` | Whiteout (BSD union fs; not on Linux) | +| `-x` | Stay on one filesystem | + +## Source Analysis + +### Data Structures + +```c +struct options_t { + bool force; /* -f */ + bool interactive; /* -i */ + bool prompt_once; /* -I */ + bool recursive; /* -R/-r */ + bool remove_empty; /* -d */ + bool verbose; /* -v */ + bool one_fs; /* -x */ + bool stdin_tty; /* Whether stdin is a TTY */ +}; +``` + +### Functions + +| Function | Purpose | +|----------|---------| +| `main()` | Parse options and dispatch | +| `remove_path()` | Remove a single top-level argument | +| `remove_simple_path()` | Remove non-directory file | +| `remove_path_at()` | Recursive removal at directory fd | +| `prompt_for_removal()` | Interactive prompt for single file | +| `prompt_for_directory_descent()` | Prompt before entering directory | +| `prompt_once()` | One-time batch prompt (`-I`) | +| `prompt_yesno()` | Read yes/no from terminal | +| `join_path()` | Path concatenation | +| `path_is_writable()` | Check write access | + +### Safety Checks + +```c +static int +remove_path(const char *path, const struct options_t *opts) +{ + /* Reject "/" */ + if (strcmp(path, "/") == 0) { + warnx("\"/\" may not be removed"); + return 1; + } + + /* Reject "." and ".." 
*/ + const char *base = basename(path); + if (strcmp(base, ".") == 0 || strcmp(base, "..") == 0) { + warnx("\".\" and \"..\" may not be removed"); + return 1; + } + + struct stat sb; + if (lstat(path, &sb) < 0) { + if (opts->force) + return 0; /* Silently ignore */ + warn("%s", path); + return 1; + } + + if (S_ISDIR(sb.st_mode) && opts->recursive) + return remove_path_at(AT_FDCWD, path, &sb, opts); + else + return remove_simple_path(path, &sb, opts); +} +``` + +### Recursive Removal + +```c +static int +remove_path_at(int dirfd, const char *path, + const struct stat *sb, + const struct options_t *opts) +{ + /* Prompt before descending */ + if (opts->interactive && + !prompt_for_directory_descent(path)) + return 0; + + /* Open directory safely */ + int fd = openat(dirfd, path, + O_RDONLY | O_DIRECTORY | O_NOFOLLOW); + if (fd < 0) { + warn("cannot open '%s'", path); + return 1; + } + + /* One-filesystem check */ + if (opts->one_fs) { + struct stat dir_sb; + fstat(fd, &dir_sb); + if (dir_sb.st_dev != sb->st_dev) { + warnx("skipping '%s' (different filesystem)", path); + close(fd); + return 1; + } + } + + DIR *dp = fdopendir(fd); + struct dirent *ent; + int errors = 0; + + while ((ent = readdir(dp)) != NULL) { + /* Skip . and .. */ + if (ent->d_name[0] == '.' && + (ent->d_name[1] == '\0' || + (ent->d_name[1] == '.' 
&& ent->d_name[2] == '\0'))) + continue; + + struct stat child_sb; + if (fstatat(fd, ent->d_name, &child_sb, + AT_SYMLINK_NOFOLLOW) < 0) { + warn("%s/%s", path, ent->d_name); + errors = 1; + continue; + } + + if (S_ISDIR(child_sb.st_mode)) { + /* Cycle detection: compare device/inode */ + errors |= remove_path_at(fd, + ent->d_name, &child_sb, opts); + } else { + /* Prompt and remove */ + if (!opts->force && + !prompt_for_removal(path, ent->d_name, + &child_sb, opts)) + continue; + if (unlinkat(fd, ent->d_name, 0) < 0) { + warn("cannot remove '%s/%s'", path, ent->d_name); + errors = 1; + } else if (opts->verbose) { + printf("removed '%s/%s'\n", path, ent->d_name); + } + } + } + closedir(dp); + + /* Remove the directory itself */ + if (unlinkat(dirfd, path, AT_REMOVEDIR) < 0) { + warn("cannot remove '%s'", path); + errors = 1; + } else if (opts->verbose) { + printf("removed directory '%s'\n", path); + } + + return errors; +} +``` + +### Interactive Prompting + +```c +static bool +prompt_for_removal(const char *dir, const char *name, + const struct stat *sb, + const struct options_t *opts) +{ + if (opts->force) + return true; + + /* Always prompt in -i mode */ + if (opts->interactive) { + fprintf(stderr, "remove %s '%s/%s'? ", + filetype_name(sb->st_mode), dir, name); + return prompt_yesno(); + } + + /* Prompt for non-writable files (unless -f) */ + if (!path_is_writable(sb) && opts->stdin_tty) { + fprintf(stderr, "remove write-protected %s '%s/%s'? ", + filetype_name(sb->st_mode), dir, name); + return prompt_yesno(); + } + + return true; +} + +static bool +prompt_yesno(void) +{ + char buf[128]; + if (fgets(buf, sizeof(buf), stdin) == NULL) + return false; + return (buf[0] == 'y' || buf[0] == 'Y'); +} +``` + +### Batch Prompt (-I) + +```c +static bool +prompt_once(int count, const char *first_path, + const struct options_t *opts) +{ + if (!opts->prompt_once) + return true; + + if (count > 3 || opts->recursive) { + fprintf(stderr, + "remove %d arguments%s? 
", + count, + opts->recursive ? " recursively" : ""); + return prompt_yesno(); + } + return true; +} +``` + +## System Calls Used + +| Syscall | Purpose | +|---------|---------| +| `unlink(2)` | Remove file | +| `unlinkat(2)` | Remove file/directory relative to dirfd | +| `openat(2)` | Open directory for traversal | +| `fdopendir(3)` | DIR stream from file descriptor | +| `fstatat(2)` | Stat relative to dirfd | +| `lstat(2)` | Stat without following symlinks | +| `readdir(3)` | Read directory entries | +| `rmdir(2)` | Remove empty directory | + +## Examples + +```sh +# Remove a file +rm file.txt + +# Force remove (no prompts) +rm -f *.o + +# Recursive remove +rm -rf build/ + +# Interactive +rm -ri important_dir/ + +# Verbose +rm -rv old_directory/ + +# Prompt once +rm -I *.log + +# Stay on one filesystem +rm -rx /mounted/dir/ +``` + +## Exit Codes + +| Code | Meaning | +|------|---------| +| 0 | All files removed successfully | +| 1 | Error removing one or more files | + +## Differences from GNU rm + +- No `--preserve-root` / `--no-preserve-root` (always refuses `/`) +- No `--one-file-system` long option (uses `-x` instead) +- No `--interactive=WHEN` (only `-i` and `-I`) +- `-P` (overwrite) is BSD-only and not functional on Linux +- `-W` (whiteout) is BSD-only diff --git a/docs/handbook/corebinutils/sleep.md b/docs/handbook/corebinutils/sleep.md new file mode 100644 index 0000000000..b562daff3a --- /dev/null +++ b/docs/handbook/corebinutils/sleep.md @@ -0,0 +1,218 @@ +# sleep — Suspend Execution for an Interval + +## Overview + +`sleep` pauses for the specified duration. It supports fractional seconds, +multiple arguments (accumulated), unit suffixes (`s`, `m`, `h`, `d`), +and `SIGINFO`/`SIGUSR1` for progress reporting. + +**Source**: `sleep/sleep.c` (single file) +**Origin**: BSD 4.4, University of California, Berkeley +**License**: BSD-3-Clause + +## Synopsis + +``` +sleep number[suffix] ... +``` + +## Options + +No flags. 
Arguments are durations with optional unit suffixes.
+
+## Unit Suffixes
+
+| Suffix | Meaning | Multiplier |
+|--------|---------|------------|
+| `s` (default) | Seconds | 1 |
+| `m` | Minutes | 60 |
+| `h` | Hours | 3600 |
+| `d` | Days | 86400 |
+
+## Source Analysis
+
+### Functions
+
+| Function | Purpose |
+|----------|---------|
+| `main()` | Parse arguments and sleep loop |
+| `parse_interval()` | Parse numeric value with unit suffix |
+| `scale_interval()` | Apply unit multiplier with overflow check |
+| `seconds_to_timespec()` | Convert float seconds to `struct timespec` |
+| `seconds_from_timespec()` | Extract seconds from `struct timespec` |
+| `install_info_handler()` | Set up `SIGINFO`/`SIGUSR1` handler |
+| `report_remaining()` | Print remaining time on signal |
+| `die()` / `die_errno()` | Error handling |
+| `usage()` | Print usage and exit |
+
+### Argument Accumulation
+
+Multiple arguments are summed:
+
+```c
+int main(int argc, char *argv[])
+{
+    double total = 0.0;
+
+    for (int i = 1; i < argc; i++) {
+        double interval = parse_interval(argv[i]);
+        total += interval;
+    }
+
+    if (total > (double)TIME_T_MAX)
+        die("total sleep duration too large");
+
+    struct timespec ts = seconds_to_timespec(total);
+    install_info_handler();
+
+    /* Sleep loop with EINTR restart */
+    while (nanosleep(&ts, &ts) < 0) {
+        if (errno != EINTR)
+            die_errno("nanosleep");
+
+        /* Interrupted: ts now holds the remaining time */
+        if (info_requested)
+            report_remaining(&ts);
+    }
+
+    return 0;
+}
+```
+
+### Interval Parsing
+
+```c
+static double
+parse_interval(const char *arg)
+{
+    char *end;
+    double val = strtod(arg, &end);
+
+    if (end == arg || val < 0)
+        die("invalid time interval: %s", arg);
+
+    /* Apply unit suffix */
+    if (*end != '\0') {
+        val = scale_interval(val, *end);
+        end++;
+    }
+
+    if (*end != '\0')
+        die("invalid time interval: %s", arg);
+
+    return val;
+}
+
+static double
+scale_interval(double val, char unit)
+{
+    switch (unit) {
+    case 's': return val;
+    case 'm': return val * 60.0;
+    case 'h': 
return val * 3600.0; + case 'd': return val * 86400.0; + default: + die("invalid unit: %c", unit); + } +} +``` + +### Progress Reporting + +```c +static volatile sig_atomic_t info_requested; + +static void +signal_handler(int sig) +{ + (void)sig; + info_requested = 1; +} + +static void +install_info_handler(void) +{ + struct sigaction sa = { + .sa_handler = signal_handler, + .sa_flags = 0, + }; + sigemptyset(&sa.sa_mask); + +#ifdef SIGINFO + sigaction(SIGINFO, &sa, NULL); +#endif + sigaction(SIGUSR1, &sa, NULL); +} + +static void +report_remaining(const struct timespec *remaining) +{ + double secs = seconds_from_timespec(remaining); + fprintf(stderr, "sleep: about %.1f second(s) remaining\n", secs); + info_requested = 0; +} +``` + +When `nanosleep` returns with `EINTR` and the remaining time is in `ts`, +the handler flag is checked and progress is reported before restarting. + +### Overflow Protection + +```c +static struct timespec +seconds_to_timespec(double sec) +{ + struct timespec ts; + + if (sec >= (double)TIME_T_MAX) { + ts.tv_sec = TIME_T_MAX; + ts.tv_nsec = 0; + } else { + ts.tv_sec = (time_t)sec; + ts.tv_nsec = (long)((sec - ts.tv_sec) * 1e9); + } + + return ts; +} +``` + +## System Calls Used + +| Syscall | Purpose | +|---------|---------| +| `nanosleep(2)` | Sleep with nanosecond precision | +| `sigaction(2)` | Install signal handlers | + +## Examples + +```sh +# Sleep 5 seconds +sleep 5 + +# Fractional seconds +sleep 0.5 + +# With units +sleep 2m # 2 minutes +sleep 1.5h # 90 minutes +sleep 1d # 24 hours + +# Multiple arguments (accumulated) +sleep 1m 30s # 90 seconds total + +# Check remaining time (send SIGUSR1 from another terminal) +kill -USR1 $(pgrep sleep) +# → "sleep: about 42.3 second(s) remaining" +``` + +## Exit Codes + +| Code | Meaning | +|------|---------| +| 0 | Success (slept for full duration) | +| 1 | Error (invalid argument) | + +## Differences from GNU sleep + +- POSIX-compliant with BSD extensions +- Supports `SIGINFO` (on systems 
that have it, otherwise `SIGUSR1`) +- Same unit suffix support (`s`, `m`, `h`, `d`) +- Multiple arguments are accumulated (same as GNU) diff --git a/docs/handbook/corebinutils/test.md b/docs/handbook/corebinutils/test.md new file mode 100644 index 0000000000..11b429ab2a --- /dev/null +++ b/docs/handbook/corebinutils/test.md @@ -0,0 +1,248 @@ +# test — Evaluate Conditional Expressions + +## Overview + +`test` (also invoked as `[`) evaluates file attributes, string comparisons, +and integer arithmetic, returning an exit status of 0 (true) or 1 (false). +It uses a recursive descent parser with short-circuit evaluation and +supports both POSIX and BSD extensions. + +**Source**: `test/test.c` (single file) +**Origin**: BSD 4.4, University of California, Berkeley +**License**: BSD-3-Clause + +## Synopsis + +``` +test expression +[ expression ] +``` + +When invoked as `[`, the last argument must be `]`. + +## Source Analysis + +### Parser Architecture + +```c +struct parser { + int argc; + char **argv; + int pos; /* Current argument index */ +}; + +enum token { + TOK_OPERAND, /* String/number operand */ + TOK_UNARY, /* Unary operator (-f, -d, etc.) */ + TOK_BINARY, /* Binary operator (-eq, =, etc.) */ + TOK_NOT, /* ! */ + TOK_AND, /* -a */ + TOK_OR, /* -o */ + TOK_LPAREN, /* ( */ + TOK_RPAREN, /* ) */ + TOK_END, /* End of arguments */ +}; +``` + +### Operator Table + +```c +struct operator { + const char *name; + enum token type; + int (*eval)(/* ... */); +}; +``` + +### Recursive Descent Grammar + +``` +parse_expr() + └── parse_oexpr() /* -o (OR, lowest precedence) */ + └── parse_aexpr() /* -a (AND) */ + └── parse_nexpr() /* ! 
(NOT) */ + └── parse_primary() /* atoms, ( expr ) */ +``` + +### Functions + +| Function | Purpose | +|----------|---------| +| `main()` | Entry: handle `[`/`test` invocation, drive parser | +| `current_arg()` | Return current argument | +| `peek_arg()` | Look at next argument | +| `advance_arg()` | Consume current argument | +| `lex_token()` | Classify current argument as token type | +| `find_operator()` | Look up operator in table | +| `parse_primary()` | Parse `( expr )`, unary ops, binary ops | +| `parse_nexpr()` | Parse `! expression` | +| `parse_aexpr()` | Parse `expr -a expr` | +| `parse_oexpr()` | Parse `expr -o expr` | +| `parse_binop()` | Evaluate binary operators | +| `evaluate_file_test()` | Evaluate file test primaries | +| `compare_integers()` | Integer comparison | +| `compare_mtime()` | File modification time comparison | +| `newer_file()` | `-nt` test | +| `older_file()` | `-ot` test | +| `same_file()` | `-ef` test | +| `parse_int()` | Parse integer with error checking | +| `effective_access()` | `eaccess(2)` or `faccessat(AT_EACCESS)` | + +### File Test Primaries + +| Operator | Test | System Call | +|----------|------|------------| +| `-b file` | Block special | `stat(2)` + `S_ISBLK` | +| `-c file` | Character special | `stat(2)` + `S_ISCHR` | +| `-d file` | Directory | `stat(2)` + `S_ISDIR` | +| `-e file` | Exists | `stat(2)` | +| `-f file` | Regular file | `stat(2)` + `S_ISREG` | +| `-g file` | Set-GID bit | `stat(2)` + `S_ISGID` | +| `-h file` | Symbolic link | `lstat(2)` + `S_ISLNK` | +| `-k file` | Sticky bit | `stat(2)` + `S_ISVTX` | +| `-L file` | Symbolic link | `lstat(2)` + `S_ISLNK` | +| `-p file` | Named pipe (FIFO) | `stat(2)` + `S_ISFIFO` | +| `-r file` | Readable | `eaccess(2)` or `faccessat(2)` | +| `-s file` | Non-zero size | `stat(2)` + `st_size > 0` | +| `-S file` | Socket | `stat(2)` + `S_ISSOCK` | +| `-t fd` | Is a terminal | `isatty(3)` | +| `-u file` | Set-UID bit | `stat(2)` + `S_ISUID` | +| `-w file` | Writable | 
`eaccess(2)` or `faccessat(2)` | +| `-x file` | Executable | `eaccess(2)` or `faccessat(2)` | +| `-O file` | Owned by EUID | `stat(2)` + `st_uid == geteuid()` | +| `-G file` | Group matches EGID | `stat(2)` + `st_gid == getegid()` | + +### String Operators + +| Operator | Description | +|----------|-------------| +| `-z string` | String is zero length | +| `-n string` | String is non-zero length | +| `s1 = s2` | Strings are identical | +| `s1 == s2` | Strings are identical (alias) | +| `s1 != s2` | Strings differ | +| `s1 < s2` | String less than (lexicographic) | +| `s1 > s2` | String greater than (lexicographic) | + +### Integer Operators + +| Operator | Description | +|----------|-------------| +| `n1 -eq n2` | Equal | +| `n1 -ne n2` | Not equal | +| `n1 -lt n2` | Less than | +| `n1 -le n2` | Less or equal | +| `n1 -gt n2` | Greater than | +| `n1 -ge n2` | Greater or equal | + +### File Comparison Operators + +| Operator | Description | +|----------|-------------| +| `f1 -nt f2` | f1 is newer than f2 | +| `f1 -ot f2` | f1 is older than f2 | +| `f1 -ef f2` | f1 and f2 are the same file (device + inode) | + +### Short-Circuit Evaluation + +```c +static int +parse_oexpr(struct parser *p) +{ + int result = parse_aexpr(p); + + while (current_is(p, "-o")) { + advance_arg(p); + int right = parse_aexpr(p); + result = result || right; /* Short-circuit */ + } + + return result; +} + +static int +parse_aexpr(struct parser *p) +{ + int result = parse_nexpr(p); + + while (current_is(p, "-a")) { + advance_arg(p); + int right = parse_nexpr(p); + result = result && right; /* Short-circuit */ + } + + return result; +} +``` + +### Bracket Mode + +```c +int main(int argc, char *argv[]) +{ + /* If invoked as "[", last arg must be "]" */ + const char *progname = basename(argv[0]); + if (strcmp(progname, "[") == 0) { + if (argc < 2 || strcmp(argv[argc - 1], "]") != 0) + errx(2, "missing ]"); + argc--; /* Remove trailing ] */ + } + + if (argc <= 1) + return 1; /* No expression → false 
*/ + + struct parser p = { argc - 1, argv + 1, 0 }; + int result = parse_oexpr(&p); + + if (p.pos < p.argc) + errx(2, "unexpected argument: %s", current_arg(&p)); + + return !result; /* 0 = true, 1 = false */ +} +``` + +## System Calls Used + +| Syscall | Purpose | +|---------|---------| +| `stat(2)` | File attribute tests | +| `lstat(2)` | Symlink tests (`-h`, `-L`) | +| `eaccess(2)` / `faccessat(2)` | Permission tests (`-r`, `-w`, `-x`) | +| `isatty(3)` | Terminal test (`-t`) | +| `geteuid(3)` / `getegid(3)` | Ownership tests (`-O`, `-G`) | + +## Examples + +```sh +# File exists +test -f /etc/passwd && echo "exists" + +# Using [ syntax +[ -d /tmp ] && echo "is a directory" + +# String comparison +[ "$var" = "hello" ] && echo "match" + +# Integer comparison +[ "$count" -gt 10 ] && echo "more than 10" + +# Combined with AND +[ -f file.txt -a -r file.txt ] && echo "readable file" + +# File newer than another +[ config.new -nt config.old ] && echo "config updated" + +# Negation +[ ! -e /tmp/lockfile ] && echo "no lock" + +# Parenthesized expression +[ \( -f a -o -f b \) -a -r c ] +``` + +## Exit Codes + +| Code | Meaning | +|------|---------| +| 0 | Expression is true | +| 1 | Expression is false | +| 2 | Invalid expression (syntax error) | diff --git a/docs/handbook/corebinutils/timeout.md b/docs/handbook/corebinutils/timeout.md new file mode 100644 index 0000000000..3186b42886 --- /dev/null +++ b/docs/handbook/corebinutils/timeout.md @@ -0,0 +1,297 @@ +# timeout — Run a Command with a Time Limit + +## Overview + +`timeout` runs a command and kills it if it exceeds a time limit. It +supports a two-stage kill strategy: first send a configurable signal +(default `SIGTERM`), then optionally send a second kill signal after +a grace period. Uses `prctl(PR_SET_CHILD_SUBREAPER)` to reliably +reap grandchild processes. 
+ +**Source**: `timeout/timeout.c` (single file) +**Origin**: BSD/Project Tick +**License**: BSD-3-Clause + +## Synopsis + +``` +timeout [--preserve-status] [--foreground] [-k duration] + [-s signal] [--verbose] duration command [arg ...] +``` + +## Options + +| Flag | Description | +|------|-------------| +| `-s signal` | Signal to send on timeout (default: `SIGTERM`) | +| `-k duration` | Kill signal to send after grace period | +| `--preserve-status` | Exit with the command's status, not 124 | +| `--foreground` | Don't create a new process group | +| `--verbose` | Print diagnostics when sending signals | + +## Source Analysis + +### Constants + +```c +#define EXIT_TIMEOUT 124 /* Command timed out */ +#define EXIT_INVALID 125 /* timeout itself failed */ +#define EXIT_CMD_ERROR 126 /* Command found but not executable */ +#define EXIT_CMD_NOENT 127 /* Command not found */ +``` + +### Data Structures + +```c +struct options { + bool foreground; /* --foreground */ + bool preserve; /* --preserve-status */ + bool verbose; /* --verbose */ + bool kill_after_set; /* -k was specified */ + int timeout_signal; /* -s signal (default SIGTERM) */ + double duration; /* Primary timeout */ + double kill_after; /* Grace period before SIGKILL */ + const char *command_name; + char **command_argv; +}; + +struct child_state { + pid_t pid; + int status; + bool exited; + bool signaled; +}; + +struct runtime_state { + struct child_state child; + bool first_timeout_sent; + bool kill_sent; +}; + +enum deadline_kind { + DEADLINE_TIMEOUT, /* Primary timeout */ + DEADLINE_KILL, /* Kill-after grace period */ +}; +``` + +### Functions + +| Function | Purpose | +|----------|---------| +| `main()` | Parse options, fork, wait with timers | +| `parse_duration_or_die()` | Parse duration string (fractional seconds + units) | +| `monotonic_seconds()` | Read `CLOCK_MONOTONIC` | +| `enable_subreaper_or_die()` | Call `prctl(PR_SET_CHILD_SUBREAPER)` | +| `send_signal_to_command()` | Send signal to 
child/process group | +| `arm_second_timer()` | Set up kill-after timer | +| `reap_children()` | Wait for all descendants | +| `child_exec()` | Child process: exec the command | + +### Signal Table + +`timeout` shares the same signal table as `kill`: + +```c +/* Same SIGNAL_ENTRY() macro and signal_entry table */ +/* Supports named signals: TERM, KILL, HUP, INT, etc. */ +/* Supports SIGRTMIN+n notation */ +``` + +### Duration Parsing + +```c +static double +parse_duration_or_die(const char *str) +{ + char *end; + double val = strtod(str, &end); + + if (end == str || val < 0) + errx(EXIT_INVALID, "invalid duration: %s", str); + + /* Apply unit suffix */ + switch (*end) { + case '\0': + case 's': break; /* seconds (default) */ + case 'm': val *= 60; break; + case 'h': val *= 3600; break; + case 'd': val *= 86400; break; + default: + errx(EXIT_INVALID, "invalid unit: %c", *end); + } + + return val; +} +``` + +### Subreaper + +The Linux-specific `prctl(PR_SET_CHILD_SUBREAPER)` ensures that orphaned +grandchild processes are reparented to `timeout` instead of PID 1: + +```c +static void +enable_subreaper_or_die(void) +{ + if (prctl(PR_SET_CHILD_SUBREAPER, 1) < 0) + err(EXIT_INVALID, "prctl(PR_SET_CHILD_SUBREAPER)"); +} +``` + +### Two-Stage Kill Strategy + +``` +┌──────────────────────────────────────────────────┐ +│ timeout -k 5 -s TERM 30 ./long_running_task │ +│ │ +│ 1. Fork and exec ./long_running_task │ +│ 2. Wait up to 30 seconds │ +│ 3. If still running: send SIGTERM │ +│ 4. Wait up to 5 more seconds (-k 5) │ +│ 5. If still running: send SIGKILL │ +│ 6. 
Reap all children │ +└──────────────────────────────────────────────────┘ +``` + +```c +/* Primary timeout handler */ +static void +handle_timeout(struct runtime_state *state, + const struct options *opts) +{ + if (opts->verbose) + warnx("sending signal %s to command '%s'", + signal_name_for_number(opts->timeout_signal), + opts->command_name); + + send_signal_to_command(state, opts->timeout_signal, opts); + state->first_timeout_sent = true; + + /* Arm kill-after timer if specified */ + if (opts->kill_after_set) + arm_second_timer(opts->kill_after); +} + +/* Kill-after timer handler */ +static void +handle_kill_after(struct runtime_state *state, + const struct options *opts) +{ + if (opts->verbose) + warnx("sending SIGKILL to command '%s'", + opts->command_name); + + send_signal_to_command(state, SIGKILL, opts); + state->kill_sent = true; +} +``` + +### Process Group Management + +```c +static void +send_signal_to_command(struct runtime_state *state, + int sig, const struct options *opts) +{ + if (opts->foreground) { + /* Send to child only */ + kill(state->child.pid, sig); + } else { + /* Send to entire process group */ + kill(-state->child.pid, sig); + } +} + +static void +child_exec(const struct options *opts) +{ + if (!opts->foreground) { + /* Create new process group */ + setpgid(0, 0); + } + + execvp(opts->command_name, opts->command_argv); + + /* exec failed */ + int code = (errno == ENOENT) ? 
EXIT_CMD_NOENT : EXIT_CMD_ERROR; + err(code, "exec '%s'", opts->command_name); +} +``` + +### Timer Implementation + +Uses `timer_create(2)` with `CLOCK_MONOTONIC`: + +```c +static void +arm_timer(double seconds) +{ + struct itimerspec its = { + .it_value = { + .tv_sec = (time_t)seconds, + .tv_nsec = (long)((seconds - (time_t)seconds) * 1e9), + }, + }; + + timer_t timerid; + struct sigevent sev = { + .sigev_notify = SIGEV_SIGNAL, + .sigev_signo = SIGALRM, + }; + + timer_create(CLOCK_MONOTONIC, &sev, &timerid); + timer_settime(timerid, 0, &its, NULL); +} +``` + +## System Calls Used + +| Syscall | Purpose | +|---------|---------| +| `fork(2)` | Create child process | +| `execvp(3)` | Execute the command | +| `kill(2)` | Send signal to child/group | +| `waitpid(2)` | Wait for child/grandchild exit | +| `setpgid(2)` | Create new process group | +| `prctl(2)` | `PR_SET_CHILD_SUBREAPER` | +| `timer_create(2)` | POSIX timer for deadline | +| `timer_settime(2)` | Arm the timer | +| `clock_gettime(2)` | `CLOCK_MONOTONIC` for elapsed time | +| `sigaction(2)` | Signal handler setup | + +## Examples + +```sh +# Basic timeout (30 seconds) +timeout 30 make -j4 + +# With kill-after grace period +timeout -k 10 60 ./server + +# Custom signal +timeout -s HUP 300 ./daemon + +# Verbose +timeout --verbose 5 sleep 100 +# timeout: sending signal TERM to command 'sleep' + +# Preserve exit status +timeout --preserve-status 10 ./test_runner +echo $? # Exit code from test_runner, not 124 + +# Fractional seconds +timeout 2.5 curl https://example.com + +# Foreground (no process group) +timeout --foreground 30 ./interactive_app +``` + +## Exit Codes + +| Code | Meaning | +|------|---------| +| 124 | Command timed out | +| 125 | `timeout` itself failed | +| 126 | Command found but not executable | +| 127 | Command not found | +| other | Command's exit status (or 128+signal if killed) | |
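
The exit-code table above can be read as a small decision procedure. The following is a minimal sketch of that mapping, not the code from `timeout/timeout.c`: the helper name `exit_code_for()` and its two boolean parameters are hypothetical, and it assumes that with `--preserve-status` a command killed by a signal is reported as 128+signal.

```c
#include <stdbool.h>
#include <sys/wait.h>

#define EXIT_TIMEOUT 124 /* Command timed out */
#define EXIT_INVALID 125 /* timeout itself failed */

/*
 * Hypothetical helper (not taken from timeout.c): translate a
 * waitpid(2) status into the exit code documented in the table.
 */
static int
exit_code_for(int status, bool timed_out, bool preserve)
{
	/* A timeout reports 124 unless --preserve-status was given. */
	if (timed_out && !preserve)
		return EXIT_TIMEOUT;

	/* Normal exit: pass the command's own status through. */
	if (WIFEXITED(status))
		return WEXITSTATUS(status);

	/* Killed by a signal: shell convention of 128 + signal number. */
	if (WIFSIGNALED(status))
		return 128 + WTERMSIG(status);

	/* Assumption: anything else is treated as an internal failure. */
	return EXIT_INVALID;
}
```

Under these assumptions, a command killed by `SIGKILL` without a timeout yields 137, while a timed-out command yields 124 unless `--preserve-status` is set.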
