Two Systems Hoping to Agree Is Just Heuristics

I swapped a bad text-height heuristic for an accurate text-measurement library and still got 46px of error. The accuracy wasn't the problem — having two systems predict each other was. The fix was to make measurement and rendering the same system, which took the masonry grid from a 7.7px average error to 0.0px across 40 cards.

2026-03-31 // RAW LEARNING CAPTURE

PROJECT: BOOKMARK-VAULT

Bookmark Vault's masonry grid estimated card height from character count, assuming an average of 8.3px per character. That average is wrong for emoji, CJK, URLs, and mixed content (basically everything real), so cards overlapped or left gaps. The fix seemed obvious: replace the guess with a real measurement. I reached for Pretext (@chenglou/pretext), which measures multiline text layout via canvas measureText() without DOM reflows. I expected this to end the story. It didn't, and the reason why is the actual lesson.
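To make the failure mode concrete, here is one plausible shape for a character-count heuristic. Only the 8.3px average comes from the project; the wrapping formula, the 280px card width and the function name are assumptions for illustration.

```typescript
// Hypothetical reconstruction of a character-count height heuristic.
// AVG_CHAR_W is from the text above; the wrapping formula is an assumption.
const AVG_CHAR_W = 8.3; // px per character

function estimateTextHeight(text: string, maxWidth: number, lineHeight: number): number {
  const charsPerLine = Math.max(1, Math.floor(maxWidth / AVG_CHAR_W));
  const lines = Math.max(1, Math.ceil(text.length / charsPerLine));
  return lines * lineHeight;
}

// Same character count, very different rendered widths, identical estimate.
const latin = "i".repeat(60);
const cjk = "語".repeat(60);
console.log(estimateTextHeight(latin, 280, 21), estimateTextHeight(cjk, 280, 21));
```

The heuristic cannot tell a narrow glyph from a wide one, so both strings get the same height even though a browser would wrap them very differently.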

Can it even measure accurately?

I built a comparison page that rendered 30 bookmark texts in the real DOM and diffed the heights against Pretext's predictions. The API is two calls: prepare(text, font) is expensive (it measures word widths on canvas), while layout(prepared, maxWidth, lineHeight) is cheap, pure arithmetic.
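The expensive/cheap split is easier to see in code. The sketch below is a stand-in and not Pretext's implementation: prepareText, layoutText and MeasureFn are hypothetical names, and the greedy word-wrapping is a simplification of what a real text layout engine does.

```typescript
// Stand-in for the two-phase shape described above (NOT Pretext's code).
type MeasureFn = (s: string) => number;

interface Prepared {
  wordWidths: number[];
  spaceWidth: number;
}

// Expensive phase: one measurement per word, done once per text/font pair.
function prepareText(text: string, measure: MeasureFn): Prepared {
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  return { wordWidths: words.map(measure), spaceWidth: measure(" ") };
}

// Cheap phase: greedy line breaking over cached widths, arithmetic only.
function layoutText(p: Prepared, maxWidth: number, lineHeight: number) {
  let lineCount = p.wordWidths.length > 0 ? 1 : 0;
  let lineWidth = 0;
  for (const w of p.wordWidths) {
    const needed = lineWidth === 0 ? w : lineWidth + p.spaceWidth + w;
    if (needed > maxWidth && lineWidth > 0) {
      lineCount += 1; // word does not fit, start a new line with it
      lineWidth = w;
    } else {
      lineWidth = needed;
    }
  }
  return { lineCount, height: lineCount * lineHeight };
}

// In a browser, measure would wrap canvas measureText(); a fixed
// 8px-per-character stub keeps this sketch runnable anywhere.
const stubMeasure: MeasureFn = (s) => s.length * 8;
const preparedText = prepareText("one two three four five six", stubMeasure);
console.log(layoutText(preparedText, 100, 21));
```

Because layoutText only touches cached numbers, it can be re-run for every column width during a resize without measuring anything again.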

First run, against Tailwind's default ui-sans-serif stack:

Perfect (< 1px): 15/30
Close  (< 5px): 21/30
Avg |delta|:    7.7px
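A report in this shape can be produced by a small summary helper. This is a minimal sketch under assumptions: the Sample type and summarize function are hypothetical, the thresholds are treated as cumulative (a perfect card also counts as close), and the sample heights are made up purely to show the output shape.

```typescript
// Hypothetical summary helper for a predicted-vs-rendered comparison.
interface Sample {
  predicted: number;
  rendered: number;
}

function summarize(samples: Sample[]) {
  const deltas = samples.map((s) => Math.abs(s.predicted - s.rendered));
  const under = (limit: number) => deltas.filter((d) => d < limit).length;
  const avgAbsDelta =
    deltas.length === 0 ? 0 : deltas.reduce((a, b) => a + b, 0) / deltas.length;
  return { perfect: under(1), close: under(5), avgAbsDelta, total: deltas.length };
}

// Made-up heights, only to show the shape of the report.
const report = summarize([
  { predicted: 42, rendered: 42 },
  { predicted: 63, rendered: 84 },
  { predicted: 21, rendered: 21.4 },
]);
console.log(report);
```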

Every failure was exactly 21px — one line height. Pretext kept predicting one fewer line than the DOM rendered. Two culprits, found in sequence:

1. The card text used tracking-wide (letter-spacing 0.025em). Canvas measureText() does not account for letter-spacing, so accumulated width pushed words to the next line in the DOM but not in the prediction. Removing it took the average to 2.1px.
2. system-ui is unsafe for canvas measurement on macOS: SF Pro has variable optical sizing, and canvas resolves a different variant than the DOM does. Switching to a named font ("Helvetica Neue") brought it to 1.4px, with one stubborn sub-pixel outlier.
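A quick bit of arithmetic shows why such a small tracking value is enough to cost a whole line. The sketch assumes a 14px font size (not stated in the project notes) and that spacing is added once per character; trackingExtraWidth is a hypothetical helper.

```typescript
// Extra width that CSS letter-spacing adds on top of a canvas measurement.
// Assumes spacing is applied once per character (code point).
function trackingExtraWidth(text: string, fontSizePx: number, trackingEm: number): number {
  const chars = Array.from(text).length; // code points, so an emoji counts once
  return chars * trackingEm * fontSizePx;
}

// 40 characters at an assumed 14px font size with tracking-wide (0.025em).
const fortyChars = "x".repeat(40);
console.log(trackingExtraWidth(fortyChars, 14, 0.025));
```

Under those assumptions a 40-character line is about 14px wider in the DOM than canvas reports, which is more than enough to push the last word onto a new line.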

The reframe

I was about to call 1.4px "good enough" when the framing flipped. I had been asking, "Can Pretext predict what CSS will do?" The honest answer is "mostly, with edge cases that never fully die." The better question is, "Can Pretext control what CSS does?"

If Pretext determines the line breaks and I render exactly those lines, there is nothing to predict. Measurement is rendering. Zero disagreement is structurally possible, not just achievable on a good day.
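In code, the idea is that one list of lines yields both the height and the render instructions. A minimal sketch, assuming the line strings come from whatever computed the breaks; PlacedLine and placeLines are hypothetical names.

```typescript
// "Measurement is rendering": height and render positions come from one list.
interface PlacedLine {
  text: string;
  top: number; // vertical offset in px from the top of the text block
}

function placeLines(lines: string[], lineHeight: number): { height: number; placed: PlacedLine[] } {
  const placed = lines.map((text, i) => ({ text, top: i * lineHeight }));
  return { height: lines.length * lineHeight, placed };
}

// Each placed line would be drawn as its own non-wrapping element
// (for example with white-space: nowrap), so the browser never re-breaks it.
const textBlock = placeLines(["Two systems hoping", "to agree is just", "heuristics"], 21);
console.log(textBlock.height, textBlock.placed.map((l) => l.top));
```

Since the browser is never asked where to break, there is no second opinion for the measurement to disagree with.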

The heuristic trap

So I integrated it — kept the existing Tailwind BookmarkCard, used Pretext only for the text measurement. The calibrator:

Perfect (< 2px): 0/40
Avg |delta|:     46.6px

Worse than before I started. I had replaced one heuristic (text height) with an accurate measurement, then quietly introduced another: the chrome constants. HEADER_H: 52, FOOTER_H: 40 and friends were guesses about what Tailwind would render. Two systems hoping to agree are heuristics even when one of them is accurate — the inaccuracy just relocates to the seam between them.

One source of truth

The real fix was to build MasonryCard from scratch so every height-bearing dimension reads inline styles from the same constants the measurement uses. Tailwind only handles color and hover.

// pretext-measure.ts — the single source
export const HEADER_H = 52;
export const FOOTER_H = 40;
export const CONTENT_PADDING = 14;

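// masonry-card.tsx: renders from the same numbers
import { HEADER_H, FOOTER_H, CONTENT_PADDING } from "./pretext-measure";

<div style={{ height: HEADER_H }}>{/* header */}</div>
<div style={{ padding: CONTENT_PADDING }}>{/* text lines */}</div>
<div style={{ height: FOOTER_H }}>{/* footer */}</div>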

Calibrator:

40/40 perfect
Avg |delta|: 0.0px
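The reason this cannot drift is that the height calculation and the styles are two views of the same constants. The sketch below assumes a simple vertical stack (header, padded text, footer); cardHeight and cardStyles are hypothetical helpers, not the project's actual functions.

```typescript
// One set of constants feeds both the height calculation and the styles.
const HEADER_H = 52;
const FOOTER_H = 40;
const CONTENT_PADDING = 14;

// Used by the masonry layout before anything is rendered.
function cardHeight(textHeight: number): number {
  return HEADER_H + CONTENT_PADDING * 2 + textHeight + FOOTER_H;
}

// Used by the component; these objects would be passed as inline style props.
function cardStyles(textHeight: number) {
  return {
    header: { height: HEADER_H },
    content: {
      padding: CONTENT_PADDING,
      height: textHeight + CONTENT_PADDING * 2,
      boxSizing: "border-box" as const,
    },
    footer: { height: FOOTER_H },
  };
}

const styles = cardStyles(63);
const renderedHeight = styles.header.height + styles.content.height + styles.footer.height;
console.log(renderedHeight === cardHeight(63));
```

Changing any constant moves the measurement and the rendering together, which is what makes the agreement structural.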

Approach                            Avg |delta|   Perfect
Char-count heuristic                7.7px         15/30
Pretext as predictor of Tailwind    46.6px        0/40
Pretext as source of truth          0.0px         40/40

The same numbers driving measurement now drive layout for both the virtualized feed and a pannable, infinite-wrapping atlas of full tweet cards — the atlas only became possible because deterministic heights let me render real cards instead of bare image tiles. Along the way: stagger DOM insertion (promote ~4 tiles per animation frame rather than dumping 35 elements in one synchronous render, which froze the browser), and fill short masonry columns with overflow copies of their own cards so the wrap boundary has no visible seam.
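Both side lessons can be sketched in a few lines. These are illustrative assumptions rather than the project's code: promoteInBatches and fillColumn are hypothetical names, and the scheduler is injected so the same function can take requestAnimationFrame in a browser or a fake queue elsewhere.

```typescript
// Staggered promotion: hand over a few items per scheduled step.
type Schedule = (cb: () => void) => void;

function promoteInBatches<T>(
  items: T[],
  batchSize: number,
  schedule: Schedule,
  promote: (batch: T[]) => void,
): void {
  let index = 0;
  const step = () => {
    if (index >= items.length) return;
    promote(items.slice(index, index + batchSize));
    index += batchSize;
    if (index < items.length) schedule(step); // more to do, wait for the next frame
  };
  schedule(step);
}

// Seam filling: repeat a column's own card heights until it reaches the target.
function fillColumn(cardHeights: number[], targetHeight: number): number[] {
  const filled = [...cardHeights];
  let total = filled.reduce((a, b) => a + b, 0);
  for (let i = 0; cardHeights.length > 0 && total < targetHeight; i++) {
    const h = cardHeights[i % cardHeights.length];
    if (h <= 0) break; // guard against input that never advances
    filled.push(h);
    total += h;
  }
  return filled;
}

// Fake frame queue so the sketch runs outside a browser.
const frames: Array<() => void> = [];
const batches: number[][] = [];
const tiles = Array.from({ length: 35 }, (_, i) => i);
promoteInBatches(tiles, 4, (cb) => frames.push(cb), (b) => batches.push(b));
while (frames.length > 0) frames.shift()!();
console.log(batches.length, fillColumn([100, 150], 400));
```

In a browser the scheduler argument would be something like (cb) => requestAnimationFrame(() => cb()), so each batch lands in its own frame instead of one long synchronous render.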

The takeaway

The instinct when a prediction is slightly wrong is to make the prediction better. But if accuracy is your safety mechanism, you are always one edge case from a bug, because the two systems are independent and only hope to agree. The durable move is to collapse them — make the thing that measures and the thing that renders read from one source — so agreement is structural rather than earned. Then 0.0px isn't luck. It's the only outcome the system can produce.
