Redesigning Well: A Design System Story from First Principles

Maxime Champoux · March 4, 2026 · 12 min read

In January 2026, Well's engineering dashboard served over 400 workspace configurations across three continents. Our team of four engineers was shipping 27 features per month. The UI was holding, but barely. Every new component meant copying styles from whichever screen looked closest to what we needed, then tweaking hex values until things matched. We had 14 distinct grays in production.

That month, we decided to stop shipping features for two weeks and rebuild our design system from scratch.

This is the story of why that decision made sense, what we built, and how shadow tokens became the single best infrastructure investment a small team can make.

The breaking point

Most startups accumulate design debt the same way they accumulate technical debt: gradually, then suddenly. For Well, the "suddenly" arrived when a client asked for a white-label version of the dashboard.

We said yes. Then we audited the codebase.

Our colors were hardcoded in 340 places across 89 files. Spacing values of 4px, 5px, and 6px appeared in components that sat side by side. We had three button components, each built by a different engineer at a different time, each with slightly different padding, border radius, and hover states. The typography stack referenced six font weights, but only four were actually loaded.

White-labeling this would mean touching every file in the frontend. Manually. Repeatedly.

The calculus was simple. We could spend two weeks now building a system, or spend two weeks per client customization request forever. With our growth trajectory pointing toward more enterprise deals, "forever" was looking expensive.

Why most startups get this wrong

The conventional wisdom in startup engineering is: don't invest in infrastructure until it hurts. Ship first, refactor later. Move fast and fix things.

This advice is correct for teams of 20+ engineers where coordination costs dominate. At that scale, premature abstraction creates more problems than it solves. You end up with a design system committee, a tokens working group, and quarterly "alignment sprints" that produce documentation nobody reads.

At four engineers, the dynamics are different. Every person on the team touches every part of the codebase every week. There's no coordination cost because there's nothing to coordinate. When one person changes the spacing scale, everyone knows by lunch.

Small teams get outsized returns from infrastructure investment because the ratio of building cost to usage is inverted. A design system that takes 80 hours to build and saves 2 hours per engineer per week saves a four-person team 8 hours a week, so it pays for itself in 10 weeks. With 40 engineers, the weekly savings grow to 80 hours, but the building cost balloons to 800 hours (meetings, documentation, migration guides, backward compatibility) and the payback period barely changes: it is still about 10 weeks.
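The payback arithmetic can be written out as a small sketch. The function name and its parameters are illustrative; the numbers are the ones quoted in the paragraph above.

```typescript
// Hypothetical helper: weeks until a one-off build cost is repaid by weekly savings.
function paybackWeeks(
  buildHours: number,
  engineers: number,
  hoursSavedPerEngineerPerWeek: number,
): number {
  const weeklySavings = engineers * hoursSavedPerEngineerPerWeek;
  return buildHours / weeklySavings;
}

// Four engineers: 80 / (4 * 2)
const smallTeamWeeks = paybackWeeks(80, 4, 2);

// Forty engineers: 800 / (40 * 2)
const largeTeamWeeks = paybackWeeks(800, 40, 2);
```

Both calls return 10, which is the point of the comparison: the larger team pays ten times the build cost for the same payback period.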

The lesson: design systems at small scale are cheap to build and expensive to skip.

Shadow tokens: the core idea

Design tokens are variables that store visual design decisions. Instead of writing color: #1a73e8, you write color: var(--color-primary). This is well-understood. Most teams that adopt tokens stop here.

We went one layer deeper with what we call shadow tokens. A shadow token is a token whose reference to another token changes with context. The concept is borrowed from how operating systems handle symbolic links, and it solves the theming problem at its root.

Here's the architecture. We define three layers:

Primitive tokens are raw values. They don't carry semantic meaning. --blue-500: #1a73e8. --gray-100: #f5f5f5. These never appear in component code.

Semantic tokens reference primitives and carry meaning. --color-primary: var(--blue-500). --color-surface: var(--gray-100). Components use these.

Shadow tokens are semantic tokens that change their reference based on context. --color-surface points to --gray-100 in light mode and --gray-900 in dark mode. But it also points to --brand-surface in white-label mode, where --brand-surface is itself a semantic token defined by the client's configuration.

The "shadow" metaphor works because these tokens cast forward. Changing a single primitive value at the base layer cascades through every shadow that references it, across every component, without touching a single line of component code.
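One way to picture the three layers is a lookup where the semantic name stays fixed and only its target changes with context. This is a minimal sketch, not our production code: the `Context` names, the `resolveShadow` helper, and the values for `gray-900` and `brand-surface` are assumptions made for illustration.

```typescript
// Contexts a shadow token can be resolved in (names are assumptions).
type Context = "light" | "dark" | "white-label";

// Raw values. "gray-100" is from the article; the other two are placeholders.
const values: Record<string, string> = {
  "gray-100": "#f5f5f5",
  "gray-900": "#212121", // assumed value
  "brand-surface": "#fff8e1", // hypothetical client-configured value
};

// Shadow layer: for each semantic token, which token it points at per context.
const shadows: Record<string, Record<Context, string>> = {
  "color-surface": {
    light: "gray-100",
    dark: "gray-900",
    "white-label": "brand-surface",
  },
};

// Follow the shadow for the given context and return the underlying value.
function resolveShadow(token: string, context: Context): string {
  const target = shadows[token]?.[context];
  if (target === undefined) {
    throw new Error(`Unknown shadow token: ${token}`);
  }
  const value = values[target];
  if (value === undefined) {
    throw new Error(`Unresolved reference: ${target}`);
  }
  return value;
}
```

Components only ever ask for `color-surface`; which value comes back depends on the context passed in, so nothing in component code changes when a theme does.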

In practice, this meant our white-label implementation went from "touch 340 hardcoded values" to "swap one JSON file."

// primitives.json (per-theme): Well default
{
  "blue-500": "#1a73e8"
}

// primitives.json (per-theme): client theme override
{
  "blue-500": "#6200ee"
}

// semantics.json (shared)
{
  "color-primary": "{blue-500}",
  "color-primary-hover": "{blue-600}",
  "color-surface": "{gray-100}"
}

The token compiler resolves references at build time, generates CSS custom properties, and produces a flat map for runtime use. Total implementation: 340 lines of TypeScript.
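To make the resolve-then-emit step concrete, here is a heavily reduced sketch of that kind of compiler. It is not the 340-line implementation described above: the function names, the `TokenMap` type, and the choice to emit only semantic tokens are all assumptions.

```typescript
type TokenMap = Record<string, string>;

// Resolve one token, following "{name}" references until a raw value is reached.
function resolveToken(name: string, tokens: TokenMap, seen: string[] = []): string {
  if (seen.indexOf(name) !== -1) {
    throw new Error(`Circular reference: ${seen.concat(name).join(" -> ")}`);
  }
  const raw = tokens[name];
  if (raw === undefined) {
    throw new Error(`Unknown token: ${name}`);
  }
  const isReference = raw.startsWith("{") && raw.endsWith("}");
  return isReference ? resolveToken(raw.slice(1, -1), tokens, seen.concat(name)) : raw;
}

// Build the flat runtime map and the CSS custom properties for one theme.
function compileTokens(
  primitives: TokenMap,
  semantics: TokenMap,
): { flat: TokenMap; css: string } {
  const all: TokenMap = { ...primitives, ...semantics };
  const flat: TokenMap = {};
  for (const name of Object.keys(semantics)) {
    flat[name] = resolveToken(name, all);
  }
  const lines = Object.keys(flat).map((name) => `  --${name}: ${flat[name]};`);
  return { flat, css: `:root {\n${lines.join("\n")}\n}` };
}

// Same semantics, two different primitive files.
const sharedSemantics: TokenMap = { "color-primary": "{blue-500}" };
const wellTheme = compileTokens({ "blue-500": "#1a73e8" }, sharedSemantics);
const clientTheme = compileTokens({ "blue-500": "#6200ee" }, sharedSemantics);
```

Swapping the primitives argument is the code-level equivalent of swapping one JSON file: `wellTheme` and `clientTheme` are produced from identical semantic definitions.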

Building the component library

With tokens in place, we rebuilt our component library over six days. The strategy was deliberately minimal: we identified the 22 components that appeared on more than two screens and rebuilt only those. Everything else stayed as-is, scheduled for migration during normal feature work.

The components followed three rules:

Rule 1: No magic numbers. Every spacing value comes from the scale (4, 8, 12, 16, 24, 32, 48, 64). Every color comes from a semantic token. Every font size comes from the type scale. If a designer spec uses 13px, we round to 12px and move on. Precision that humans can't perceive is precision that costs engineering time.

Rule 2: Composition over configuration. Instead of a Button component with 16 props controlling every visual variant, we built a Button with 3 props (size, variant, state) and let composition handle the rest. Need a button with an icon? Compose Icon inside Button. Need a button group? Compose Button inside ButtonGroup. This reduced our component API surface by roughly 60% compared to the old system.

Rule 3: One component, one file, one owner. Each component lives in a single file. Each file has a comment at the top with the author's name. Not for blame, but for questions. When you want to know why Dialog animates from the bottom on mobile but fades in on desktop, you ask the person who built it. Documentation is good. A person who remembers context is better.
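The first two rules translate fairly directly into types. The sketch below is illustrative only: `snapToScale`, the specific prop values, and the string-rendering `Button` and `Icon` stand-ins are assumptions chosen so the example runs without a UI framework.

```typescript
// Rule 1: the only legal spacing values, taken from the scale in the article.
const SPACING_SCALE = [4, 8, 12, 16, 24, 32, 48, 64] as const;
type Spacing = (typeof SPACING_SCALE)[number];

// Hypothetical helper: snap an off-scale spec value to the nearest step.
// When two steps are equally close, the smaller one wins.
function snapToScale(px: number): Spacing {
  let best: Spacing = SPACING_SCALE[0];
  for (const step of SPACING_SCALE) {
    if (Math.abs(step - px) < Math.abs(best - px)) {
      best = step;
    }
  }
  return best;
}

// Rule 2: a deliberately small prop surface. The union members are assumptions.
type ButtonProps = {
  size: "sm" | "md" | "lg";
  variant: "primary" | "secondary";
  state: "default" | "disabled" | "loading";
};

// Framework-free stand-ins that render to HTML strings.
function Icon(name: string): string {
  return `<span class="icon icon-${name}"></span>`;
}

function Button(props: ButtonProps, children: string): string {
  const classes = `btn btn-${props.size} btn-${props.variant} btn-${props.state}`;
  const disabled = props.state === "default" ? "" : " disabled";
  return `<button class="${classes}"${disabled}>${children}</button>`;
}

// Composition instead of an icon prop: the icon is just a child.
const saveButton = Button(
  { size: "md", variant: "primary", state: "default" },
  Icon("save") + " Save",
);
```

With this shape, `snapToScale(13)` gives 12, matching the rounding described in Rule 1, and adding an icon never widens `ButtonProps`.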

Side Panel V2: the design system's first test

The first real test of the new system came immediately. We'd been planning a redesign of our data exploration interface for months. The old version used five separate screens: a list view, a detail view, a chart view, a filter panel, and an export dialog. Users navigated between them constantly, losing context with every screen transition.

Side Panel V2 replaced all five screens with a single sliding panel that overlays the main content. Selecting a data point opens the panel. Inside it, tabs handle the different views (detail, chart, export), and filters live in a collapsible section at the top.

This wasn't a novel UX pattern. Airtable, Linear, and Notion all use side panels for detail views. What made our implementation interesting was how the design system handled it.

The panel itself used four components from the library: Panel, Tabs, Collapse, and DataGrid. Styling required zero custom CSS. The spacing between elements and the responsive breakpoints both came from the layout tokens, and the animation timing came from the motion tokens.

Building the V1 data exploration interface had taken three engineers two weeks. Side Panel V2 took one engineer four days.

That speed difference wasn't just about having reusable components. It was about having reusable decisions. The design system encoded answers to questions like: How much padding should a panel have? (16px on mobile, 24px on desktop.) What easing curve should a slide animation use? (cubic-bezier(0.4, 0, 0.2, 1).) How should focus states look? (2px ring, offset by 2px, using --color-focus.)

Every question the old system forced engineers to answer through inspection or guessing, the new system answered through tokens.
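As a rough illustration of what "reusable decisions" can look like in code, the answers above fit in a single typed object. The values are the ones quoted in this section; the object shape, the helper names, and the use of `outline` for the focus ring are assumptions, and the animation duration is left as a parameter because no value is given here.

```typescript
// Decisions quoted in the article, gathered in one place (shape is an assumption).
const decisions = {
  panelPadding: { mobile: 16, desktop: 24 }, // px
  slideEasing: "cubic-bezier(0.4, 0, 0.2, 1)",
  focusRing: { width: 2, offset: 2, color: "var(--color-focus)" }, // px
} as const;

type Viewport = keyof typeof decisions.panelPadding;

// Style for a sliding panel; the caller supplies the duration.
function panelStyle(viewport: Viewport, durationMs: number): Record<string, string> {
  return {
    padding: `${decisions.panelPadding[viewport]}px`,
    transition: `transform ${durationMs}ms ${decisions.slideEasing}`,
  };
}

// One possible rendering of the focus ring, using outline and outline-offset.
function focusStyle(): Record<string, string> {
  const { width, offset, color } = decisions.focusRing;
  return {
    outline: `${width}px solid ${color}`,
    "outline-offset": `${offset}px`,
  };
}
```

An engineer building a new panel calls `panelStyle("desktop", duration)` rather than inspecting an existing screen to find out that the padding is 24px.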

The spacing scale debate

Not every decision went smoothly. The spacing scale caused a three-day argument between two engineers who together represent half our entire company.

The debate: should we use a 4px base with a linear scale (4, 8, 12, 16, 20, 24...) or a 4px base with a geometric scale (4, 8, 16, 32, 64)?

Linear scales are predictable. The jump between steps is always the same. But they produce too many options in the middle range, and you end up with engineers choosing between 16px and 20px based on vibes rather than hierarchy.

Geometric scales enforce visual hierarchy through increasing jumps. But they leave gaps. The jump from 16 to 32 is often too large for the space between a label and its input field.

We settled on a hybrid: 4, 8, 12, 16, 24, 32, 48, 64. Eight values. The small end is linear (for fine-grained control in dense UI), the large end is closer to geometric (for structural spacing). We chose values by auditing our existing screens and finding the spacing values that actually occurred most frequently.
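An audit like this can be approximated with a short script. The sketch below is an assumption about how one might do it, not a record of how we did: it counts px values in margin, padding, and gap declarations from a CSS string, and the sample stylesheet is made up.

```typescript
// Count how often each px value appears in spacing declarations.
function countSpacingValues(css: string): Record<number, number> {
  const counts: Record<number, number> = {};
  // Matches margin, padding, and gap declarations, including longhands.
  const declarations = css.match(/(?:margin|padding|gap)[a-z-]*\s*:[^;}]+/g) ?? [];
  for (const declaration of declarations) {
    const pxValues = declaration.match(/\d+(?:\.\d+)?px/g) ?? [];
    for (const px of pxValues) {
      const value = parseFloat(px);
      counts[value] = (counts[value] ?? 0) + 1;
    }
  }
  return counts;
}

// Return the most frequent values, breaking ties in favor of the smaller value.
function mostFrequent(counts: Record<number, number>, limit: number): number[] {
  return Object.keys(counts)
    .map(Number)
    .sort((a, b) => counts[b] - counts[a] || a - b)
    .slice(0, limit);
}

// Made-up stylesheet for illustration; width is ignored because it is not spacing.
const sampleCss =
  ".a { padding: 4px 8px; } .b { margin-top: 5px; gap: 8px; } .c { padding: 6px; width: 100px; }";
const spacingCounts = countSpacingValues(sampleCss);
const topTwo = mostFrequent(spacingCounts, 2);
```

On the sample, 8px appears twice and 4px, 5px, and 6px once each, so `topTwo` is `[8, 4]`. Run over a real codebase, the long tail of one-off values is what shows which steps the scale can drop.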

This matters because a spacing scale is a forcing function. Too many options and engineers make arbitrary choices. Too few and they override the system. Eight values turned out to be the right constraint for our product's density.

Typography: fewer decisions, faster shipping

Our type system follows the same philosophy: minimize decisions. We use one typeface (Inter), five sizes (12, 14, 16, 20, 24), two weights (regular and semibold), and one line-height ratio (1.5).

That's it. No tracking adjustments. No condensed variants. No display sizes. Every text element in the application uses one of ten combinations (5 sizes × 2 weights).
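The whole type system is small enough to enumerate. In this sketch the sizes, weights, line height, and typeface come from the text above; the numeric weights 400 and 600 are the standard CSS values for regular and semibold, and the `sans-serif` fallback and the `textStyle` helper are assumptions.

```typescript
const FONT_SIZES = [12, 14, 16, 20, 24] as const;
const FONT_WEIGHTS = { regular: 400, semibold: 600 } as const;
const LINE_HEIGHT = 1.5;

type FontSize = (typeof FONT_SIZES)[number];
type FontWeight = keyof typeof FONT_WEIGHTS;

// Build a CSS font shorthand for one legal size and weight pair.
function textStyle(size: FontSize, weight: FontWeight): string {
  return `font: ${FONT_WEIGHTS[weight]} ${size}px/${LINE_HEIGHT} Inter, sans-serif;`;
}

// Enumerate every legal combination: 5 sizes x 2 weights.
const allTextStyles: string[] = [];
for (const size of FONT_SIZES) {
  for (const weight of Object.keys(FONT_WEIGHTS) as FontWeight[]) {
    allTextStyles.push(textStyle(size, weight));
  }
}
```

Because `FontSize` is a union of five literals, a call like `textStyle(13, "regular")` fails type checking instead of shipping.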

When a new engineer joins and asks "what font size should this label be?" the answer takes two seconds. If it's a secondary label, it's 12. If it's body text, it's 14. If it's a section header, it's 16. There are only five options.

Constraints like these feel limiting in the abstract. In practice, they eliminate an entire category of decision fatigue. Engineers spend time on product logic, not debating whether 13px or 14px looks better for helper text.

Measuring the investment

Eight weeks after the redesign, we tracked the impact across three metrics.

Feature velocity. Before the design system, our average time from design spec to deployed feature was 4.2 days. After: 2.8 days. A 33% reduction. The gains came almost entirely from eliminating the "style matching" phase where engineers would inspect existing screens to figure out correct values.

Visual consistency bugs. We tagged all bugs related to visual inconsistencies (wrong colors, misaligned spacing, inconsistent hover states). Before: 8-12 per sprint. After: 1-2 per sprint. Comparing the midpoints of those ranges, that is roughly an 85% reduction.

White-label turnaround. Our first white-label implementation post-design system took one engineer half a day. Under the old system, our estimate was two engineer-weeks.

The two-week investment returned its cost within the first month and continues to compound.

What we'd do differently

Three things, in hindsight.

First, we should have built the token compiler before the components. We started building components in parallel with the token system, which meant some early components used hardcoded values that we had to go back and replace. Sequential would have been faster despite feeling slower.

Second, we should have migrated old screens more aggressively. Our "migrate during normal feature work" strategy left some screens on the old system for weeks. Users noticed the inconsistency. A dedicated migration sprint of two or three days would have cleaned that up.

Third, we over-invested in motion tokens. We defined 12 animation curves and 8 duration values. In practice, we use three curves and two durations. The rest are dead tokens that add cognitive overhead when someone browses the system. We're planning to prune these.
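Finding prune candidates is a mechanical check. The following is a minimal sketch under the assumption that tokens are consumed as CSS custom properties via `var(--name)`; the helper name and the example token names are hypothetical.

```typescript
// List defined tokens that no source string references via var(--name).
function findDeadTokens(defined: string[], sources: string[]): string[] {
  const used: Record<string, true> = {};
  for (const source of sources) {
    const refs = source.match(/var\(\s*--[\w-]+/g) ?? [];
    for (const ref of refs) {
      // Strip the leading "var(--" to recover the bare token name.
      used[ref.replace(/var\(\s*--/, "")] = true;
    }
  }
  return defined.filter((name) => used[name] !== true);
}

// Hypothetical motion tokens and a single stylesheet that uses two of them.
const definedMotionTokens = ["ease-standard", "ease-bounce", "duration-fast"];
const stylesheets = [
  ".panel { transition: transform var(--duration-fast) var(--ease-standard); }",
];
const deadTokens = findDeadTokens(definedMotionTokens, stylesheets);
```

Here `deadTokens` contains only `ease-bounce`. A check like this could run in CI so unused tokens are flagged as they appear rather than discovered during a cleanup.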

Design quality as signal

There's a business argument for design systems that goes beyond engineering efficiency.

Enterprise buyers evaluate software partly on visual polish. They can't audit your codebase, so they audit your interface. Inconsistent spacing, mismatched colors, janky animations: these register as unreliability signals, even when the underlying product works perfectly.

Two of our Q1 deals included a "product walkthrough" stage where the buyer's team clicked through the app on a shared screen. In both cases, the feedback forms mentioned "clean, consistent interface" as a positive factor. Neither buyer could articulate what made it clean. They just felt it.

A design system doesn't just reduce engineering costs. It raises the visual baseline of every screen in the product. When your worst screen still looks intentional, buyers trust the product more. That trust converts.

For a startup competing against larger incumbents with dedicated design teams, a token-based system lets four engineers produce interface quality that looks like the output of forty. The tokens don't care how many people use them. They enforce consistency regardless of team size.

The meta-lesson

Design systems are rarely discussed as startup strategy. They're filed under "engineering best practices" or "design ops," categories that signal optional overhead.

For a four-person team shipping 27 features a month, the design system was not optional overhead. It was the infrastructure that made that velocity possible without the codebase degrading into entropy.

The analogy I keep coming back to: a design system at a small startup is like version control. You could theoretically ship without it. Some teams do, early on. But the moment you need to collaborate, iterate, or maintain anything, the absence of the system costs more than the system ever would.

We spent two weeks building ours. It will save us years.

Well is a business context layer. We write about the engineering and product decisions behind what we build.

Maxime Champoux

CEO & co-founder, Well

Maxime is the CEO and co-founder of Well. He built Well to rebuild finance around AI-native data, not spreadsheets.
