Swarm Migration to Semantic Labels + Image-Based Default

Migrating Docker Swarm from Expanse-themed node labels to Tailscale-matching semantic labels, plus switching default deployment to image-based.

2026-01-24 // RAW LEARNING CAPTURE

Starting Point

The Docker Swarm infrastructure had accumulated two naming/deployment debt items:

  1. Old Expanse-themed node labels (rocinante-bru-m3, nauvoo-bru-home-pc-wsl) didn't match Tailscale device names (joel-m3, joel-desktop-wsl). The CHANGELOG v0.9 explicitly said "Swarm node labels kept at old names to avoid breaking 40+ services."

  2. Volume-mount was still the documented default even though the digitalpine-deploy skill already did image-first. New projects should build → push to ghcr.io → deploy to PHM MacBook (always-on Intel Mac), not mount code from the dev MacBook.

The PHM MacBook (joel-phm-colima) had been added as an always-on worker in v0.9 but had zero services deployed to it. Everything was either volume-mounted on the M3 or image-based on nauvoo (the PC).

Updated the Documentation First

Before touching compose files, updated three docs to flip the default:

swarm-infra/CLAUDE.md — Service template changed from volume-mount to image-based:

# NEW DEFAULT
services:
  myapp:
    image: ghcr.io/digitalpine/myapp:latest
    deploy:
      placement:
        constraints:
          - node.labels.availability == always-on
          - node.labels.tier != edge

Volume-mount pattern demoted to "Local Dev Only" subsection.

~/.claude/CLAUDE.md — Swarm quick ref:

# OLD
- Dev (portable): M3 MacBook (`joel-m3-colima`), volume mounts
- Always-on: PC (`joel-desktop-wsl`) + PHM MacBook (`joel-phm-colima`), ghcr.io images

# NEW
- Default: Build image → push ghcr.io/digitalpine → deploy to PHM MacBook
- Dev-only: Volume mounts on M3 MacBook for rapid iteration

swarm-infra/PHILOSOPHY.md — Flipped from "No image registry required (usually)" to "Image builds are the default."

Audited Current Deployment State

Found 53 docker-compose files across ~/Code. Breakdown:

| Pattern | Count | Target |
| --- | --- | --- |
| Volume mount (code on host) | ~28 | M3 MacBook |
| Image-based (ghcr.io) | ~17 | nauvoo (PC) |
| Infrastructure/dev-only | ~8 | Various |

Key finding: 11 projects were Kindling Pages pitch sites that all shared the same pattern — mount kindling-pages .next/standalone build + their own .project.md:

volumes:
  - /Users/joel/Code/kindling-pages/.next/standalone:/app:ro
  - ./.project.md:/content/.project.md:ro

This pattern was perfect for conversion — the base image already existed (kindling-pages/Dockerfile) and per-project images would be 2 lines each.

Phase 1: Bulk Node Label Rename

Simple find/replace across all compose files:

# 19 files
grep -rl "rocinante-bru-m3" ~/Code --include="docker-compose.yaml" --include="docker-compose.yml" \
  | grep -v node_modules | grep -v .git \
  | xargs sed -i '' 's/rocinante-bru-m3/joel-m3/g'

# 20 files
grep -rl "nauvoo-bru-home-pc-wsl" ~/Code --include="docker-compose.yaml" --include="docker-compose.yml" \
  | grep -v node_modules | grep -v .git \
  | xargs sed -i '' 's/nauvoo-bru-home-pc-wsl/joel-desktop-wsl/g'

Also updated swarm-infra/CLAUDE.md node labels table and stacks/joel-m3.yml.

Left CHANGELOG and devlog references alone (historical).
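
A quick sanity check catches anything the bulk rename missed (a sketch; the `CODE_DIR` variable and its `~/Code` default are assumptions, not from the original commands):

```shell
#!/bin/sh
# List any compose file that still references an old Expanse-themed
# node label. No output means the rename is complete.
CODE_DIR="${CODE_DIR:-$HOME/Code}"   # assumed code root; adjust as needed
grep -rl -e "rocinante-bru-m3" -e "nauvoo-bru-home-pc-wsl" "$CODE_DIR" \
  --include="docker-compose.yaml" --include="docker-compose.yml" 2>/dev/null \
  | grep -v node_modules | grep -v '\.git' || true
```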

Phase 2: Semantic Constraints for Image-Based Services

The key shift: instead of node.labels.host == specific-machine, use node.labels.availability == always-on + node.labels.tier != edge. This lets Swarm schedule anywhere that's always-on (currently PHM + nauvoo).

For the ~20 nauvoo services already using ghcr.io images, used perl for the 1→2 line replacement:

grep -rl "node.labels.host == joel-desktop-wsl" ~/Code --include="docker-compose.*" \
  | grep -v node_modules | grep -v .git \
  | xargs perl -i -pe 's/- node\.labels\.host == joel-desktop-wsl/- node.labels.availability == always-on\n          - node.labels.tier != edge/'

Also switched hammerspoon and jobwatch (image-based but previously pinned to M3).

Decision: User chose "Both always-on" — meaning nauvoo AND PHM both have availability == always-on, so Swarm can schedule on either. No preference for one over the other.
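
For reference, the semantic labels themselves are applied once per node with `docker node update` (a sketch; the node-ID placeholders and the `tier=edge` example are assumptions, since the source doesn't show the exact labeling commands):

```shell
# Mark both always-on workers; services constrained on
# `availability == always-on` can then schedule on either one.
docker node update --label-add availability=always-on <phm-node-id>
docker node update --label-add availability=always-on <nauvoo-node-id>

# An edge node would get tier=edge, which `tier != edge` excludes.
docker node update --label-add tier=edge <edge-node-id>

# Verify the labels landed
docker node inspect <phm-node-id> --format '{{ .Spec.Labels }}'
```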

Phase 3: Converting Kindling Pages to Image-Based

The Pattern

The existing kindling-pages/Dockerfile creates a base image:

FROM node:22-alpine
WORKDIR /app
COPY .next/standalone ./
COPY .next/static ./.next/static
# Added: kindling-pages' own pitch page
COPY content/.project.md /content/.project.md
EXPOSE 3000
ENV PORT=3000 HOSTNAME="0.0.0.0"
CMD ["node", "server.js"]

Per-project Dockerfiles are 2 lines:

FROM ghcr.io/digitalpine/kindling-pages:latest
COPY .project.md /content/.project.md

Projects Converted (9 total)

Created Dockerfiles in: cc-trace-command, pocket, daemon-ranch, bookmark-vault, tugboat, design-studio, pocket-config-sync, tapfate

Updated kindling-pages' own Dockerfile to include its content (so the base image also serves kindling.digitalpine.io).
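
Since the eight new Dockerfiles are identical apart from the repo they live in, generating them is a one-liner loop (a sketch; the `CODE_DIR` variable and `mkdir -p` guard are assumptions for self-containment, not from the original session):

```shell
#!/bin/sh
# Write the two-line per-project Dockerfile into each pitch-site repo.
CODE_DIR="${CODE_DIR:-$HOME/Code}"   # assumed code root; adjust as needed
for dir in cc-trace-command pocket daemon-ranch bookmark-vault \
           tugboat design-studio pocket-config-sync tapfate; do
  mkdir -p "$CODE_DIR/$dir"   # repos assumed to exist; -p keeps the sketch safe
  cat > "$CODE_DIR/$dir/Dockerfile" <<'EOF'
FROM ghcr.io/digitalpine/kindling-pages:latest
COPY .project.md /content/.project.md
EOF
done
```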

Compose File Rewrites

Each went from ~25 lines (volumes, working_dir, command, environment, host constraint) to ~20 lines (image reference + semantic constraints + traefik labels):

services:
  web:
    image: ghcr.io/digitalpine/{project}-page:latest
    networks:
      - digitalpine
    deploy:
      placement:
        constraints:
          - node.labels.availability == always-on
          - node.labels.tier != edge
      labels:
        - "kindling.pages=true"
        - "traefik.enable=true"
        - "traefik.http.routers.{name}.rule=Host(`{name}.digitalpine.io`)"
        - "traefik.http.routers.{name}.entrypoints=web"
        - "traefik.http.services.{name}.loadbalancer.server.port=3000"
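
Once rewritten, each stack redeploys with a single command (stack name hypothetical; the source doesn't show the deploy invocation):

```shell
# Redeploy one converted pitch site from its rewritten compose file.
# --with-registry-auth forwards the manager's ghcr.io login to the
# worker nodes that end up pulling the private image.
docker stack deploy -c docker-compose.yaml --with-registry-auth tugboat
```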

What Was Left Out (Deliberate)

  • skill-pages (5 services) — Mounts from ~/.claude/skills/{name}/.project.md, needs different image strategy
  • webshot — Has browserless (chromium) dependency, too heavy for PHM Intel Mac
  • brubkr, personal-site-test, mac-digitalpine — Dev-only, not worth converting now
  • bfds — Known arch mismatch issue, skipped

Where We Landed

File Changes Made

  • ~39 compose files: label renames
  • ~20 compose files: semantic constraint migration
  • 9 compose files: full rewrites (volume-mount → image-based)
  • 8 new Dockerfiles (2-line each)
  • 1 Dockerfile updated (kindling-pages base)
  • 3 doc files updated (CLAUDE.md, PHILOSOPHY.md, global CLAUDE.md)

Still Needed (Runtime)

The build + deploy step was interrupted. Remaining:

  1. Build kindling-pages: cd ~/Code/kindling-pages && pnpm build && docker buildx build --platform linux/amd64,linux/arm64 -t ghcr.io/digitalpine/kindling-pages:latest --push .

  2. Build per-project images (after base is pushed):

for dir in cc-trace-command pocket daemon-ranch bookmark-vault tugboat design-studio pocket-config-sync tapfate; do
  cd ~/Code/$dir
  docker buildx build --platform linux/amd64,linux/arm64 \
    -t ghcr.io/digitalpine/${dir}-page:latest --push .
done

  3. Relabel swarm nodes:

docker node update --label-rm host <m3-node-id> && docker node update --label-add host=joel-m3 <m3-node-id>
docker node update --label-rm host <nauvoo-node-id> && docker node update --label-add host=joel-desktop-wsl <nauvoo-node-id>

  4. Redeploy all stacks: run docker stack deploy for each affected service.

Takeaways

  • The "avoid breaking 40+ services" hesitation was overblown. Bulk sed/perl across compose files took seconds. The real barrier was psychological, not technical.

  • Semantic labels (availability == always-on) over host-pinning means future node renames are painless — just relabel the node, services keep running.

  • Kindling Pages' shared-build pattern was accidentally perfect for image-based conversion. One base image, N tiny per-project images. The Dockerfile per project is literally 2 lines.

  • PHM MacBook goes from 0 to ~30 services once deployed. It was added in v0.9 (2026-01-23) but never got any workload because the default was still volume-mounts-on-M3.

  • Multi-platform builds matter here because PHM is Intel (amd64) and M3 is ARM (arm64). docker buildx --platform linux/amd64,linux/arm64 handles both.
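
One way to confirm a pushed image really carries both architectures is buildx's imagetools subcommand (a sketch; assumes the image name from the build step above):

```shell
# List the platforms in the pushed manifest. Both linux/amd64 and
# linux/arm64 entries should appear if the multi-platform build worked.
docker buildx imagetools inspect ghcr.io/digitalpine/kindling-pages:latest
```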
