Content/Style Separation Pipeline

Wiring the core style extraction pipeline — separating what is depicted from how it looks for image generation.

2025-01-23 // RAW LEARNING CAPTURE
PROJECT: MOTIF

Devlog: Content/Style Separation Pipeline

Date: 2025-01-23
Commit: dd507cc (Wire core style extraction pipeline with content/style separation)

Starting Point

Motif had a working wizard UI (from design exploration B) but the backend was using the deprecated imagen-3.0-generate-002 model and a naive schema that mixed content with style. The key realization from examining the style-profile-architect reference project: the technique that actually works is separating WHAT is depicted (neutral content) from HOW it looks (structured JSON style profile), then recombining them at generation time.

Previous session had already:

  • Switched from R2 to Cloudflare Images
  • Fixed OAuth auth
  • Started switching from deprecated Imagen to Gemini native image generation

The Reference Project Technique

The working technique from style-profile-architect (explored at ~/Code/style-profile-architect):

Analysis phase — Given N reference images, extract TWO things:

  1. A structured JSON StyleProfile (color, lighting, camera, texture, etc.)
  2. N neutral content prompts — one per image, describing ONLY what's physically depicted with ZERO style words

Generation phase — For each neutral prompt, generate an image using:

CONTENT: ${neutralPrompt}

STYLE PROFILE INSTRUCTIONS: Apply the following style constraints strictly: ${JSON.stringify(profile)}. Ensure the visual medium is ${profile.rendering.medium}.

Refinement phase — Show Gemini both reference images AND generated images side-by-side with labels like [Target Reference Image 1] and [Current Generated Attempt 1], plus user feedback. It returns an updated profile JSON with an assessment of what changed.

The key insight: the JSON profile IS the prompt engineering. The full structured object gets serialized into the generation prompt as text. The model reads it as constraints.
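
The recombination step is small enough to sketch as a single function. This is a minimal illustration, not the reference project's actual code: the name buildGenerationPrompt and the trimmed ProfileLike type are assumptions made for the example, and only the template text comes from the technique described above.

```typescript
// Trimmed stand-in for the real StyleProfile (assumed shape for this sketch).
interface ProfileLike {
  rendering: { medium: string };
  [section: string]: unknown;
}

// Recombine a neutral content prompt with the serialized style profile.
// The whole profile object is embedded as JSON text, so the model reads
// the structured fields as constraints.
function buildGenerationPrompt(neutralPrompt: string, profile: ProfileLike): string {
  return [
    `CONTENT: ${neutralPrompt}`,
    '',
    'STYLE PROFILE INSTRUCTIONS: Apply the following style constraints strictly: ' +
      `${JSON.stringify(profile)}. ` +
      `Ensure the visual medium is ${profile.rendering.medium}.`,
  ].join('\n');
}

// Hypothetical inputs, for illustration only.
const examplePrompt = buildGenerationPrompt('A ceramic mug on a wooden table', {
  rendering: { medium: 'oil-painting' },
  color: { palette: ['#1f2a44', '#e8d9b5'] },
});
```

Keeping this in one function means the analysis, generation, and refinement phases all produce prompts with the same layout.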

Schema Rewrite

Old schema was vague and mixed concerns:

// OLD — too abstract, missing concrete visual attributes
colors: { palette: [{hex, role, name}], mood: string }
typography: { mood, weight, characteristics[] }
composition: { density, rhythm, whitespace }
texture: { qualities[], material }
lighting: { direction, warmth, contrast }
mood: { primary, keywords[] }

New schema matches the reference project — concrete, numeric where appropriate:

// NEW — specific, measurable, machine-readable
color: { palette: string[], hue, saturation, contrast, brightness }
lighting: { direction, intensity, temperature: number, volumetric: boolean, shadowStyle, highlightStyle }
camera: { perspective, angle, zoom, depthOfField }
composition: { framing, focus, depth: number, negativeSpaceRatio: number, motionFlow }
texture: { surfaceStyle, detailLevel, brushStyle, grain, lineQuality }
atmosphere: { fog, vignette, ambientMotion }
mood: { emotionalTone, realismLevel }
rendering: { medium }

Notable changes:

  • palette went from {hex, role, name}[] to plain string[] (just hex values)
  • temperature is Kelvin (2700=warm candlelight, 5500=daylight, 7500=cool blue)
  • depth is 1-10 scale
  • negativeSpaceRatio is 0-1 decimal
  • medium is specific: "matte-photograph", "oil-painting", "3d-render"
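
Because several of these fields now have numeric ranges, a range check on the model's output is cheap to add. The sketch below is an assumption-laden illustration rather than the project's Zod schema: the type RangedStyleFields, the function findRangeProblems, the #RRGGBB palette format, and the 1000 to 12000 Kelvin bounds are all hypothetical. The 1-10 and 0-1 ranges come from the notes above.

```typescript
// Only the fields with a documented range; the rest of the profile is omitted.
interface RangedStyleFields {
  color: { palette: string[] };
  lighting: { temperature: number };
  composition: { depth: number; negativeSpaceRatio: number };
}

// Assumes palette entries are written as '#RRGGBB'.
const HEX_COLOR = /^#[0-9a-fA-F]{6}$/;

// Returns a list of human-readable problems; an empty list means all checks passed.
function findRangeProblems(profile: RangedStyleFields): string[] {
  const problems: string[] = [];

  profile.color.palette.forEach((hex) => {
    if (!HEX_COLOR.test(hex)) problems.push(`palette: "${hex}" is not #RRGGBB`);
  });

  // Assumed bounds: the notes only give 2700, 5500, and 7500 as reference points.
  const { temperature } = profile.lighting;
  if (!(temperature >= 1000 && temperature <= 12000)) {
    problems.push(`temperature: ${temperature} is outside the assumed Kelvin range`);
  }

  const { depth, negativeSpaceRatio } = profile.composition;
  if (!(depth >= 1 && depth <= 10)) problems.push(`depth: ${depth} is not in 1-10`);
  if (!(negativeSpaceRatio >= 0 && negativeSpaceRatio <= 1)) {
    problems.push(`negativeSpaceRatio: ${negativeSpaceRatio} is not in 0-1`);
  }

  return problems;
}
```

The comparisons are written as negated "in range" checks so that NaN is reported as a problem instead of slipping through.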

Gemini Integration (src/lib/gemini.ts)

Two models:

const ANALYSIS_MODEL = 'gemini-2.0-flash';      // Vision + structured JSON output
const IMAGE_GEN_MODEL = 'gemini-2.5-flash-image'; // Native image generation

Analysis uses structured output, with a JSON schema that mirrors the Zod schema:

const response = await ai.models.generateContent({
  model: ANALYSIS_MODEL,
  contents: [{ role: 'user', parts: [...imageParts, { text: ANALYSIS_PROMPT }] }],
  config: {
    responseMimeType: 'application/json',
    responseJsonSchema: STYLE_PROFILE_JSON_SCHEMA,
  },
});

Generation uses responseModalities: ['IMAGE', 'TEXT']:

const response = await ai.models.generateContent({
  model: IMAGE_GEN_MODEL,
  contents: [{ role: 'user', parts: [{ text: fullPrompt }] }],
  config: { responseModalities: ['IMAGE', 'TEXT'] },
});

const parts = response.candidates?.[0]?.content?.parts ?? [];
const imagePart = parts.find((p) => p.inlineData?.data);
// imagePart.inlineData.data is base64, .mimeType is 'image/png'

This replaced the deprecated generateImages API (imagen-3.0-generate-002).
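
Since TEXT is also a requested modality, a response may come back with text parts only, so the lookup is worth wrapping in a helper that can return null. The following is a sketch under stated assumptions: the local types are trimmed structural stand-ins for the SDK's response type, and extractFirstImage, GeneratedImage, and the 'image/png' fallback are hypothetical names and choices for this example.

```typescript
// Trimmed structural types covering only the fields read below (assumed shapes).
interface InlineData { data?: string; mimeType?: string }
interface ResponsePart { text?: string; inlineData?: InlineData }
interface GenerateResponseLike {
  candidates?: Array<{ content?: { parts?: ResponsePart[] } }>;
}

interface GeneratedImage { base64: string; mimeType: string }

// Returns the first part carrying inline image data, or null when the
// response holds no image (for example, a text-only reply).
function extractFirstImage(response: GenerateResponseLike): GeneratedImage | null {
  const parts = response.candidates?.[0]?.content?.parts ?? [];
  for (let i = 0; i < parts.length; i++) {
    const inline = parts[i].inlineData;
    if (inline?.data) {
      // Fall back to PNG if the part does not declare a MIME type (assumption).
      return { base64: inline.data, mimeType: inline.mimeType ?? 'image/png' };
    }
  }
  return null;
}
```

A caller can then treat null as "retry or surface an error" instead of dereferencing an undefined part.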

Refinement builds interleaved image parts with text labels:

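for (let i = 0; i < referenceImages.length; i++) {
  const ref = referenceImages[i];
  // May be undefined when no specimen exists for this slot (array name assumed)
  const gen = generatedImages[i];
  // Label each image so the model can pair reference i with attempt i
  imageParts.push({ text: `[Target Reference Image ${i + 1}]` });
  imageParts.push(createPartFromBase64(ref.data, ref.mimeType));
  if (gen) {
    imageParts.push({ text: `[Current Generated Attempt ${i + 1}]` });
    imageParts.push(createPartFromBase64(gen.data, gen.mimeType));
  }
}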

Prisma Field Addition — The Gotchas

Added neutralPrompts to the StyleProfile model:

model StyleProfile {
  neutralPrompts String @default("[]") // JSON array of content prompts
}

Gotcha 1: prisma db push alone isn't enough. Even though db push reports "in sync", the generated client doesn't know about the new field until prisma generate is run. Error was:

Unknown argument `neutralPrompts`. Available options are marked with ?.

Fix: pnpm prisma generate to regenerate the client.

Gotcha 2: Stale session after db push --force-reset. The earlier session had run --force-reset which wiped the User table. But the browser still had a valid session cookie referencing userId: "cmkrtmny20000frjpm0xeeryk". Result:

PrismaClientKnownRequestError: Foreign key constraint violated

The error message was confusing because it didn't say WHICH FK — had to check sqlite3 ... "SELECT id FROM User;" to confirm the table was empty.

Fix: Added sign-out button so user can clear the stale session and re-auth.
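
A complementary guard is to confirm that the session's user still exists before any write that references it, so a stale cookie produces a clear result instead of a foreign key error. This is a sketch, not what the commit does: checkSessionUser, UserLookup, and SessionCheck are hypothetical names, and the lookup is injected so the example stays self-contained.

```typescript
// Injected lookup, so the sketch does not depend on a database client.
type UserLookup = (id: string) => Promise<{ id: string } | null>;

type SessionCheck =
  | { ok: true; userId: string }
  | { ok: false; reason: 'no-session' | 'stale-session' };

// Distinguishes "not signed in" from "signed in as a user that no longer exists".
async function checkSessionUser(
  sessionUserId: string | undefined,
  findUser: UserLookup,
): Promise<SessionCheck> {
  if (!sessionUserId) return { ok: false, reason: 'no-session' };
  const user = await findUser(sessionUserId);
  if (!user) return { ok: false, reason: 'stale-session' };
  return { ok: true, userId: user.id };
}
```

With Prisma, the lookup could be something like (id) => prisma.user.findUnique({ where: { id }, select: { id: true } }), assuming the model is named User. A 'stale-session' result is the point at which to sign the user out and send them back to the login page.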

Retry Safety in Server Actions

extractStyle can be called multiple times (user retries). Two patterns to handle this:

// Upsert instead of create for idempotent version records
await prisma.profileVersion.upsert({
  where: { profileId_version: { profileId, version: 1 } },
  update: { schema: schemaJson, note: 'Initial extraction' },
  create: { profileId, version: 1, schema: schemaJson, note: 'Initial extraction' },
});

// Delete before regenerating specimens
await prisma.generatedExample.deleteMany({ where: { profileId } });
const generatedImages = await generateAndSaveSpecimens(profileId, schema, prompts);

The Field Rename Cascade

Changing colors → color and palette: {hex,role,name}[] → palette: string[] touched every file that renders the schema. The cascade:

  • page.tsx (homepage) — palette strip, featured specimen, profile cards, mood index
  • gallery/[id]/page.tsx — palette swatches with hover tooltips, schema properties section
  • studio/page.tsx — CssSpecimen component, profile card palette strip
  • studio/new/page.tsx — wizard steps showing schema attributes

Common patterns that broke:

// OLD
schema.colors.palette.map(color => color.hex)
schema.mood.primary
schema.mood.keywords
schema.composition.density
schema.lighting.warmth
schema.texture.material

// NEW
schema.color.palette.map(hex => hex)  // strings directly
schema.mood.emotionalTone
// mood.keywords removed entirely
schema.composition.framing
schema.lighting.intensity
schema.texture.surfaceStyle
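
If any stored profile JSON still carries the old palette shape, rendering code can tolerate both shapes with a small normalizer. This is a hypothetical helper, not part of the commit: normalizePalette and OldPaletteEntry are names invented for the sketch, and whether old-shaped rows exist at all is an assumption.

```typescript
// Old palette entries were objects; new entries are plain hex strings.
type OldPaletteEntry = { hex: string; role?: string; name?: string };

// Accepts either shape (or a mix) and always returns plain hex strings.
function normalizePalette(palette: Array<string | OldPaletteEntry>): string[] {
  return palette.map((entry) => (typeof entry === 'string' ? entry : entry.hex));
}
```

Calling this once where the schema JSON is parsed would keep the individual views on the new string[] shape only.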

Sign-Out with Auth.js v5

Inline server action in a form — cleanest pattern for Auth.js v5:

import { signOut } from '@/lib/auth';

// In the JSX:
<form action={async () => { 'use server'; await signOut({ redirectTo: '/login' }); }}>
  <button type="submit">Sign out</button>
</form>

TypeScript shows what appears to be a false "unused variable" diagnostic for signOut, apparently because it is only referenced inside the inline 'use server' function. The build compiles fine.

Where We Landed

Full pipeline wired:

  1. User uploads reference images → Cloudflare Images
  2. extractStyle → Gemini analyzes → returns StyleProfile JSON + neutral prompts
  3. generateSpecimenImages → applies profile to each prompt → generates specimens
  4. refineStyle → compares reference vs generated → adjusts profile → regenerates
  5. publishProfile → marks published, appears in gallery

Build passes clean. All three gallery/studio views render with the new schema shape. Sign-out available for session management.

Takeaways

  • Content/style separation is the key technique. A flat "generate an image in this style" prompt doesn't work well. Separating the neutral content description from the style constraints, then recombining as CONTENT: ... STYLE: ${JSON.stringify(profile)} gives the model clear structure to follow.

  • Gemini native image generation, requested with responseModalities: ['IMAGE', 'TEXT'], is the replacement for the Imagen generateImages call. The model is gemini-2.5-flash-image. The response comes back in candidates[0].content.parts[]; find the part with inlineData.data (base64).

  • prisma db push syncs the DB but NOT the client. Always follow with prisma generate after schema changes, or the TypeScript types won't include new fields.

  • Session cookies survive database resets. After --force-reset, users need to sign out and back in. The symptom is a foreign key error on any write that references the stale user ID.
