The Instant APFS Clone That Took 18 Seconds

Cloning a monorepo with exp took 18.6s, even though I assumed copy-on-write was free. It is free for data, but clonefile(2) is O(N inodes), and a 451k-inode tree costs real time and real disk. Excluding node_modules and symlinking it back cut the clone to 767ms.

2026-03-08 // RAW LEARNING CAPTURE

PROJECT: EXP

exp new is supposed to be instant. It clones a project directory with APFS copy-on-write — no data is copied, just block references, so the marketing in my head said "near-zero." Then I ran it against onbook, a pnpm/bun monorepo, and watched the spinner freeze for 18.6 seconds:

Cloned via clonefile(2) in 18.6s (cleaned .next, .turbo in 14ms)

The cleanup step was 14ms — noise. 99.9% of the wall time was the clone itself, the operation I'd assumed was free. I was wrong about what "free" meant.

clonefile(2) is O(N inodes), not instant

The usual framing is that an APFS clone doesn't copy data, and that part is true. What it leaves out is that clonefile(2) still creates B-tree entries and block references for every inode in the tree, and onbook has a lot of inodes:

451,801 inodes × ~0.04ms/inode = 18s
Actual disk cost (df delta): 159 MB
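To sanity-check the linear model on another tree, here is a minimal sketch: count inodes with a recursive walk and multiply by a per-inode cost. It assumes one inode per directory entry (hard links get counted once per path) and reuses the ~0.04ms/inode figure from onbook, which is specific to that machine and tree. `countInodes` and `estimateCloneSeconds` are hypothetical helpers, not part of exp.

```typescript
import { readdirSync } from "node:fs";
import { join } from "node:path";

// Count the directory itself plus every entry beneath it.
// Dirent.isDirectory() is false for symlinks, so symlinked directories are
// counted as one entry and never followed. Hard links are counted per path.
export function countInodes(dir: string): number {
  let count = 1; // the directory itself
  for (const entry of readdirSync(dir, { withFileTypes: true })) {
    if (entry.isDirectory()) count += countInodes(join(dir, entry.name));
    else count += 1; // regular file, symlink, socket, etc.
  }
  return count;
}

// Linear cost model: clone time scales with inode count, not with bytes.
// 0.04 ms/inode is the onbook measurement; treat it as machine-specific.
export function estimateCloneSeconds(inodes: number, msPerInode = 0.04): number {
  return (inodes * msPerInode) / 1000;
}
```

For onbook, `estimateCloneSeconds(451_801)` comes out to about 18.07, in line with the 18.6s observed.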

So the clone is linear in inode count, and it costs real disk. A measurement detail bit me first: du -sh adds up each file's allocated blocks independently, so a CoW clone, whose blocks are shared with the source, looks identical to a full copy. The only honest measurement is a df delta before and after; that's what surfaces the true block-reference cost.
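A df delta is easy to script. This sketch shells out to `df -P -k` and diffs the Used column around an operation; the POSIX output format keeps the columns stable across macOS and GNU df. `dfUsedKiB` and `dfDeltaMB` are hypothetical names, and the parsing assumes the filesystem name contains no spaces. Anything else writing to the same volume during the run shows up as noise, so repeat the measurement.

```typescript
import { execFileSync } from "node:child_process";

// Used space (KiB) on the filesystem that holds `path`.
// -P keeps each filesystem on one line with fixed columns:
//   Filesystem  1024-blocks  Used  Available  Capacity  Mounted on
// -k (given after -P) forces 1024-byte units.
export function dfUsedKiB(path: string): number {
  const out = execFileSync("df", ["-P", "-k", path], { encoding: "utf8" });
  const lines = out.trim().split("\n");
  const fields = lines[lines.length - 1].trim().split(/\s+/);
  return Number(fields[2]); // the "Used" column
}

// Run `op` and report how much the filesystem's used space changed, in MB.
// Unlike du, this sees only newly allocated blocks, so shared CoW data
// does not inflate the number.
export function dfDeltaMB(path: string, op: () => void): number {
  const before = dfUsedKiB(path);
  op();
  const after = dfUsedKiB(path);
  return (after - before) / 1024;
}
```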

The frozen spinner was the same story from a different angle. The clonefile(2) FFI call (bun:ffi into libSystem) is synchronous, so it blocks Bun's event loop for the full 18s and the setInterval driving the spinner animation never fires. The UI wasn't lying about being stuck; it genuinely was.
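The spinner freeze reproduces without FFI at all. In this sketch a busy-wait stands in for the synchronous clonefile(2) call (an assumption: any call that holds the JavaScript thread behaves the same), and a setInterval counter stands in for the spinner. `blockFor`, `spinnerTicks`, and `sleep` are illustrative names.

```typescript
// Stand-in for a synchronous FFI call: holds the JS thread until it returns.
function blockFor(ms: number): void {
  const end = Date.now() + ms;
  while (Date.now() < end) {} // busy-wait; no timer callback can run meanwhile
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Count how often a setInterval "spinner" fires while `work` runs.
export async function spinnerTicks(
  work: () => void | Promise<void>,
  intervalMs = 10,
): Promise<number> {
  let ticks = 0;
  const spinner = setInterval(() => ticks++, intervalMs);
  await work();
  clearInterval(spinner);
  return ticks;
}

spinnerTicks(() => blockFor(200))
  .then((n) => console.log(`blocking call: ${n} ticks`)) // 0
  .then(() => spinnerTicks(() => sleep(200)))
  .then((n) => console.log(`async wait: ${n} ticks`)); // nonzero
```

The blocking run reports zero ticks: the interval callback can only run once the event loop regains control, and by then `clearInterval` has already executed in a microtask. The `sleep` run yields to the loop, so the counter advances.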

Building the scaffold to measure

Rather than keep guessing, I wrote two scripts: one to scaffold a realistic turbo monorepo fixture (~20k inodes, fast to reset between runs), and one to benchmark clone strategies, reporting both wall-clock time and df-delta disk cost. An early bug in the fixture script is instructive: a find -not -path "*/\.*" filter silently excluded pnpm's .pnpm/ virtual store, so it reported 39 inodes where there were 19,949. When the thing you're measuring is inode count, an inode-hiding glob is a great way to fool yourself.
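The glob bug is easy to reproduce in isolation. find's -path matching treats / as an ordinary character, so `*/\.*` matches any path containing `/.`, which includes everything under node_modules/.pnpm. This sketch mirrors that predicate with a regex and compares it with a filter that skips only explicitly named directories; the sample paths are made up for illustration.

```typescript
// Mirrors `find . -not -path "*/\.*"`: keep a path only if it never contains
// "/.", i.e. drop everything under ANY dot-directory, not just .git.
const keepByGlob = (p: string): boolean => !/\/\./.test(p);

// Narrower filter: drop a path only if one of its components is named here.
const SKIP = new Set([".git"]);
const keepByName = (p: string): boolean => !p.split("/").some((part) => SKIP.has(part));

const paths = [
  "./package.json",
  "./.git/HEAD",
  "./node_modules/.pnpm/react@19.0.0/node_modules/react/index.js",
  "./node_modules/react", // under pnpm, a symlink into .pnpm
];

console.log(paths.filter(keepByGlob).length); // 2: the .pnpm store vanished
console.log(paths.filter(keepByName).length); // 3: only .git is excluded
```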

The dominant cost is node_modules. So every strategy is some variation of "don't clone it."

Walking the strategies

Strategy one: scan the root, skip node_modules, atomically clonefile each remaining top-level entry, then run bun install to rebuild. That is fifteen syscalls instead of a 451k-inode traversal. The clone dropped to 490ms, but install dominated at 6.3s, so end-to-end was only 2.4x faster.
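Strategy one reduces to a short loop. In this sketch the cloner is injected so the logic runs anywhere; on macOS it would wrap clonefile(2), and that binding is left as an assumption since the log doesn't show it. `rootScanClone` and `Cloner` are hypothetical names. Because only the root node_modules is skipped, nested ones such as apps/web/node_modules come along inside their parent directory.

```typescript
import { mkdirSync, readdirSync } from "node:fs";
import { join } from "node:path";

// Hypothetical cloner: on macOS, a wrapper around clonefile(2), which clones
// an entire directory subtree in one call. The destination must not exist yet.
type Cloner = (src: string, dst: string) => void;

// One clone call per top-level entry, skipping only the root node_modules.
export function rootScanClone(source: string, destination: string, clone: Cloner): string[] {
  mkdirSync(destination, { recursive: true }); // clonefile needs the parent to exist
  const cloned: string[] = [];
  for (const name of readdirSync(source)) {
    if (name === "node_modules") continue;
    clone(join(source, name), join(destination, name));
    cloned.push(name);
  }
  return cloned; // the package manager's install then rebuilds node_modules
}
```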

Strategy two: walk the whole tree recursively, excluding node_modules at any depth. More correct, but slower — the recursive exclusion check is itself a full tree walk, so you pay traversal twice, and stripping every nested node_modules means install has more to rebuild.
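One way to structure the recursive variant is in two passes; this is an assumption about structure rather than exp's code, and `findAllNodeModules` and `planClone` are hypothetical names. Pass one is a full userland walk that records every node_modules. Pass two descends only into directories on the path to one and lists everything else for a single whole-subtree clone, so the tree is walked once here and again by the kernel during the clones.

```typescript
import { readdirSync } from "node:fs";
import { join } from "node:path";

// Pass 1: full walk, collecting every node_modules (relative paths).
// node_modules itself is not descended into; symlinks are not followed.
export function findAllNodeModules(root: string, rel = ""): string[] {
  const found: string[] = [];
  for (const e of readdirSync(join(root, rel), { withFileTypes: true })) {
    if (!e.isDirectory()) continue;
    const child = rel ? `${rel}/${e.name}` : e.name;
    if (e.name === "node_modules") found.push(child);
    else found.push(...findAllNodeModules(root, child));
  }
  return found;
}

// Pass 2: list the entries to clone whole. Only ancestors of a node_modules
// are opened; every other entry becomes one clone call.
export function planClone(root: string, nmPaths: string[]): string[] {
  const ancestors = new Set<string>();
  for (const nm of nmPaths) {
    const parts = nm.split("/");
    for (let i = 1; i < parts.length; i++) ancestors.add(parts.slice(0, i).join("/"));
  }
  const plan: string[] = [];
  const visit = (rel: string): void => {
    for (const e of readdirSync(join(root, rel), { withFileTypes: true })) {
      if (e.name === "node_modules") continue;
      const child = rel ? `${rel}/${e.name}` : e.name;
      if (e.isDirectory() && ancestors.has(child)) visit(child);
      else plan.push(child);
    }
  };
  visit("");
  return plan;
}
```

For a tree with node_modules at the root and under apps/web, the plan is apps/web/src, package.json, and packages: three clone calls, with both node_modules left out for install to rebuild.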

The unlock came from a constraint I already knew but hadn't encoded: node_modules never lives inside .git, .next, or .turbo, and I had never seen a real repo nest one deeper than apps/name/node_modules. Adding a noDescend set (.git alone accounted for 357 needlessly walked dirs) plus a depth limit of 3 cut the traverse time from 1.37s to 661ms.
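Pruning fits naturally into the finder. This sketch skips the noDescend names outright and stops recursing at a depth limit, counting root entries as depth 1 so that apps/web/node_modules sits at depth 3. `findNodeModulesPruned` is a hypothetical reimplementation, not exp's findNodeModules, and the depth convention is an assumption.

```typescript
import { readdirSync } from "node:fs";
import { join } from "node:path";

// Directories that never hold a project's node_modules: never walk them.
const NO_DESCEND = new Set([".git", ".next", ".turbo"]);

// Find node_modules at most `maxDepth` levels below `root` (root entries are
// depth 1, so apps/web/node_modules is depth 3). Returns relative paths.
export function findNodeModulesPruned(root: string, maxDepth = 3, rel = "", depth = 1): string[] {
  const found: string[] = [];
  for (const e of readdirSync(join(root, rel), { withFileTypes: true })) {
    if (!e.isDirectory() || NO_DESCEND.has(e.name)) continue;
    const child = rel ? `${rel}/${e.name}` : e.name;
    if (e.name === "node_modules") found.push(child);
    else if (depth < maxDepth) found.push(...findNodeModulesPruned(root, maxDepth, child, depth + 1));
  }
  return found;
}
```

With the default limit of 3, apps/web/src/deep/node_modules is never found; raising the limit to 5 picks it up at the cost of walking deeper.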

The winner: smart-clone + symlink

The realization that closed it: if you're excluding node_modules anyway, why rebuild it at all? Clone everything except node_modules, then symlink the directories back to their source locations.

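```typescript
import { symlinkSync } from "node:fs";
import { join, relative } from "node:path";

// findNodeModules and smartClone are exp's helpers (not shown; option-object signatures assumed).
const nmPaths = findNodeModules(source, { maxDepth: 5 });
smartClone(source, destination, {
  exclude: ["node_modules"],
  noDescend: [".git", ".next", ".turbo"],
  maxDepth: 3,
});
// Link each skipped node_modules back to its source copy; no install step.
for (const nm of nmPaths) {
  symlinkSync(nm, join(destination, relative(source, nm)), "dir");
}
```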

No install step. Symlinks are free.

| Strategy | Wall time | Disk (df delta) | Notes |
|---|---|---|---|
| `clonefile(2)` whole tree | 9.9s | 72 MB | Current default |
| smart-clone + symlink | 767ms | ~0 | 13x faster |
| smart-clone + install | 11.1s | 66 MB | Double-walk penalty |
| root-scan + install | 10.2s | 69 MB | Nested node_modules still cloned |

Thirteen times faster, with near-zero disk overhead. It ships opt-in rather than as the default because of a known footgun: Turbopack rejects symlinks pointing outside the project root. For dogfooding, though, it's the obvious choice. The install-based strategies were dropped outright; spending 6 to 10s rebuilding what a symlink hands you for free is not a trade worth making.

The lesson

This started as "why is this slow?" and ended as a design decision — recursive correctness versus speed, symlink versus install. The session was a shaping exercise wearing a performance-profiling costume, and the pivotal fact (clonefile(2) is O(N inodes)) never appeared in any doc I read. It only fell out of measurement. Write the bench script, measure everything, and let the df delta tell you where the cost actually lives — because the story in your head about "free" copy-on-write is exactly the kind of assumption a 30-second benchmark exists to kill.

LOG.ENTRY_END

ref:exp
