Building Chat V2: Web Search, Agent Mode, and Conversational AI Architecture

Maxime Champoux · March 7, 2026 · 10 min read

Seventy-two percent of small business owners say they would use AI if it could actually do things in their software, not just answer questions about it. We found that number in a Salesforce SMB survey from late 2025, and it matched what we were hearing from our own users. They liked chatting with Well. They wanted it to do more.

Chat V1 shipped in early 2025. It could answer questions about your finances, explain line items, and summarize trends. Users liked it. But the feedback was consistent: "Why can't I just tell it to send the invoice?"

That question shaped everything about Chat V2. The answer required rethinking what a chat interface should be inside a business tool. Not a chatbot bolted onto a dashboard. Not a search bar with personality. A primary interface that reads your data and acts on it.

What V1 Got Wrong

V1 treated chat as a read layer. You could ask about revenue, compare months, get explanations. The AI had access to your financial data and could reason about it. That was already more than most accounting tools offered.

But it created a strange dynamic. Users would have a conversation, understand a problem, decide on an action, and then leave the chat to go click through menus to execute it. The insight lived in the conversation. The action lived somewhere else.

We also lost context between sessions. Every conversation started fresh. A user might spend ten minutes explaining their business to the AI on Monday, then have to repeat it all on Wednesday. The AI felt helpful in the moment but forgetful across time.

Two problems, then. Chat that could think but not act. And an AI with no memory.

The Dual-Mode Decision

We considered several architectures for V2. The simplest option was to give the existing chat more capabilities and let the model figure out when to read versus when to act. Most AI products take this approach. One input box, one mode, the model decides.

We rejected it. The reason is trust.

When a freelancer asks "what did I invoice last quarter," they expect a lookup. When they say "send a payment reminder to Acme Corp," they expect an action with consequences. These are different mental models. Collapsing them into one mode means the user is never quite sure what the AI might do. Will asking about an invoice accidentally trigger something? Will requesting an action just return information?

So we split the interface explicitly. Ask mode is read-only. It queries your data, compares numbers, explains transactions, and searches the web for context. It cannot modify anything. Agent mode can take actions. It creates invoices, sends reminders, updates records, and triggers workflows. The switch between modes is visible and intentional.
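One way to make the read/write split enforceable rather than cosmetic is to filter the toolset by mode before the model ever sees it. The sketch below assumes each tool carries a `writes` flag. The `Tool`, `Mode`, and `call_tool` names and the three tools are hypothetical illustrations, not Well's actual code.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable, List


class Mode(Enum):
    ASK = "ask"
    AGENT = "agent"


@dataclass(frozen=True)
class Tool:
    name: str
    writes: bool                # True if the tool can modify account data
    run: Callable[..., Any]


class ModeViolation(Exception):
    """Raised when a tool is requested that the current mode does not expose."""


def tools_for_mode(tools: List[Tool], mode: Mode) -> List[Tool]:
    # Ask mode only ever sees read-only tools; Agent mode sees all of them.
    if mode is Mode.ASK:
        return [t for t in tools if not t.writes]
    return list(tools)


def call_tool(tools: List[Tool], mode: Mode, name: str, **kwargs: Any) -> Any:
    available = {t.name: t for t in tools_for_mode(tools, mode)}
    if name not in available:
        raise ModeViolation(f"{name!r} is not available in {mode.value} mode")
    return available[name].run(**kwargs)


# Hypothetical tools, for illustration only.
TOOLS = [
    Tool("query_invoices", writes=False, run=lambda **filters: ["INV-1", "INV-2"]),
    Tool("web_search", writes=False, run=lambda query: f"results for {query}"),
    Tool("send_reminder", writes=True, run=lambda invoice_id: f"queued {invoice_id}"),
]
```

With this shape, a write request in Ask mode fails before anything runs, because the tool is not in the list the mode exposes. The Agent mode confirmation step is left out here to keep the example small.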

This is a UX bet. We are trading the simplicity of a single input for the clarity of explicit intent. Early usage data supports it. Users in Ask mode ask longer, more exploratory questions. Users in Agent mode give shorter, more directive commands. The modes shape the conversation differently, which is exactly what we wanted.

Ask Mode: Read, Compare, Research

Ask mode connects to your Well data and to the web. That combination matters.

A typical Ask mode interaction might start with: "How does my gross margin compare to other design agencies?" The AI pulls your margin from Well, searches the web for industry benchmarks, and returns a comparison. It cites the sources. You can drill down.

Or: "Show me all invoices from Q4 that took longer than 30 days to get paid." That is a pure data query. No web search needed. The AI constructs the filter, pulls the results, and presents them.
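The filter the AI constructs for that query could look something like the function below. It is a sketch under two assumptions the post does not spell out: Q4 means calendar October through December, and "time to get paid" is measured from the issue date. A workspace with a different fiscal year, or one that measures from the due date, would change both. `Invoice` and `slow_paid_q4` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Invoice:
    number: str
    issued: date
    paid: Optional[date]  # None while the invoice is still unpaid


def in_calendar_q4(d: date, year: int) -> bool:
    # Assumption: calendar quarters, not the workspace's fiscal calendar.
    return d.year == year and d.month >= 10


def slow_paid_q4(invoices: List[Invoice], year: int, max_days: int = 30) -> List[Invoice]:
    """Q4 invoices that were paid, but more than `max_days` after they were issued."""
    return [
        inv
        for inv in invoices
        if in_calendar_q4(inv.issued, year)
        and inv.paid is not None
        and (inv.paid - inv.issued).days > max_days
    ]
```

Unpaid invoices are excluded here because they have no payment delay to measure yet; a real query might report them separately.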

The web search augmentation was technically straightforward but significant for the product. Before it, the AI could only tell you about your numbers. With it, the AI can tell you what your numbers mean. A 42% margin is just a number until you know the industry average is 35%. Then it becomes a story.

We built web search as a tool the AI can call when the question requires external context. It does not search on every query. If you ask "what was my revenue in January," there is no reason to hit the web. The model decides when external data would improve the answer. In practice, about 15-20% of Ask mode queries trigger a web search.
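A minimal sketch of that "search only when needed" loop, assuming a model that returns either a final answer or a request to call a named tool. `answer_question`, `ModelTurn`, and `fake_model` are hypothetical. `fake_model` uses a keyword check only so the example runs on its own; in the real system the decision to search comes from the model itself.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class ModelTurn:
    """One model step: either a final answer or a request to call a tool."""
    answer: Optional[str] = None
    tool: Optional[str] = None
    args: dict = field(default_factory=dict)


def answer_question(
    question: str,
    model: Callable[[str, List[str]], ModelTurn],
    tools: Dict[str, Callable[..., str]],
    max_steps: int = 3,
) -> Tuple[str, List[str]]:
    """Let the model choose whether to call tools before answering.

    Returns the final answer and the names of the tools that were called.
    """
    observations: List[str] = []
    called: List[str] = []
    for _ in range(max_steps):
        turn = model(question, observations)
        if turn.tool is None:
            return turn.answer or "", called
        called.append(turn.tool)
        # Tool output is fed back to the model on the next step.
        observations.append(tools[turn.tool](**turn.args))
    return "Step limit reached without an answer.", called


def fake_model(question: str, observations: List[str]) -> ModelTurn:
    # Stand-in for the LLM so the example runs: it requests a web search
    # only for benchmark-style questions, and only once.
    if "compare" in question.lower() and not observations:
        return ModelTurn(tool="web_search", args={"query": "design agency gross margin"})
    return ModelTurn(answer=f"answer using {len(observations)} observation(s)")
```

A revenue lookup returns without touching the web, while a benchmark question triggers one search before the answer. The step limit keeps a misbehaving model from looping on tool calls.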

Agent Mode: Intent, Confirmation, Execution

Agent mode is where V2 diverges from every AI chat product we have studied. The AI can take real actions in your account.

The interaction pattern follows three steps. First, the user states an intent: "Create an invoice for Acme Corp, 40 hours of design work at €95/hour." Second, the AI constructs the action and presents a preview. It shows the invoice with line items, totals, and the recipient. Third, the user confirms or edits before execution.

That confirmation step is non-negotiable. The AI will not execute financial actions without explicit approval. This is where trust gets built or broken. We tested a version without confirmation during internal dogfooding. Even team members who built the system felt uneasy when the AI just did things without asking.
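The intent, preview, confirm, execute pattern can be sketched as a draft object that is rendered for review and only turned into a real invoice after explicit approval. For the example above, 40 hours at €95/hour comes to €3,800. The class and function names are hypothetical, and `create` stands in for whatever actually persists the invoice.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Any, Callable, Tuple


@dataclass(frozen=True)
class LineItem:
    description: str
    quantity: Decimal
    unit_price: Decimal

    @property
    def total(self) -> Decimal:
        return self.quantity * self.unit_price


@dataclass(frozen=True)
class InvoiceDraft:
    client: str
    currency: str
    items: Tuple[LineItem, ...]

    @property
    def total(self) -> Decimal:
        return sum((item.total for item in self.items), Decimal("0"))


class NotConfirmed(Exception):
    """Raised when execution is attempted without explicit user approval."""


def preview(draft: InvoiceDraft) -> str:
    # Step 2: show the user exactly what would be created.
    lines = [f"Invoice for {draft.client}"]
    for item in draft.items:
        lines.append(
            f"  {item.description}: {item.quantity} x {item.unit_price} "
            f"= {item.total} {draft.currency}"
        )
    lines.append(f"  Total: {draft.total} {draft.currency}")
    return "\n".join(lines)


def execute(draft: InvoiceDraft, confirmed: bool, create: Callable[[InvoiceDraft], Any]) -> Any:
    # Step 3: nothing is created unless the user approved this exact draft.
    if not confirmed:
        raise NotConfirmed("explicit approval is required before creating the invoice")
    return create(draft)
```

Using `Decimal` rather than floats keeps totals exact, which matters once amounts reach a client. An edit before confirmation would simply produce a new draft (for example with `dataclasses.replace`) and a fresh preview.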

Agent mode currently supports invoice creation, payment reminders, expense categorization, client communication, and a growing set of workflow triggers. Each new action goes through a security review before shipping. The AI cannot access actions that have not been explicitly enabled for its toolset.

A real interaction from our beta: a user in Agent mode said, "Send payment reminders to everyone who's more than 14 days overdue." The AI queried overdue invoices, found three, generated personalized reminder emails for each, and presented them for review. The user edited one, approved all three, and they were sent. What would have been 15 minutes of clicking through an invoicing UI became a 30-second conversation.
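That reminder flow boils down to three steps: select invoices more than 14 days past due, draft one message per invoice, and send only the drafts the user approved. A hypothetical sketch, assuming "overdue" is measured from each invoice's due date; the names and message template are illustrative, not Well's.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, List


@dataclass(frozen=True)
class OpenInvoice:
    number: str
    client: str
    due: date


@dataclass(frozen=True)
class ReminderDraft:
    invoice: str
    to: str
    body: str
    approved: bool = False  # set only by the user during review


def overdue(invoices: List[OpenInvoice], today: date, min_days: int = 14) -> List[OpenInvoice]:
    # "More than 14 days overdue" is read as strictly more than 14 days past due.
    return [inv for inv in invoices if (today - inv.due).days > min_days]


def draft_reminders(invoices: List[OpenInvoice], today: date) -> List[ReminderDraft]:
    return [
        ReminderDraft(
            invoice=inv.number,
            to=inv.client,
            body=f"Hi {inv.client}, invoice {inv.number} is {(today - inv.due).days} days overdue.",
        )
        for inv in overdue(invoices, today)
    ]


def send_approved(drafts: List[ReminderDraft], send: Callable[[ReminderDraft], None]) -> int:
    # Unapproved drafts are never sent.
    approved = [d for d in drafts if d.approved]
    for d in approved:
        send(d)
    return len(approved)
```

The "edited one, approved all three" step maps to producing new drafts with an updated body and `approved=True` before calling `send_approved`.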

Memory: The AI That Remembers

Conversation persistence was the second major problem from V1. Solving it required building what we call Chat Memory, a structured recall system with four types of persistent facts.

Preferences capture how you like things done. If you tell the AI "I always want invoices in EUR," it remembers. You do not repeat it.

Terminology maps your language to your data. If you call a specific client "the Berlin project," the AI learns that mapping. Next time you mention the Berlin project, it knows which client you mean.

Context stores background facts about your business. "We offer a 10% early payment discount" or "our fiscal year starts in April." These facts inform how the AI interprets questions and constructs actions.

Instructions are explicit rules you set. "Always flag invoices over €5,000 for my review" or "never auto-categorize meals as business expenses." Instructions act as guardrails that the AI follows across every session.

Together, these four types let the AI build an understanding of your business that deepens over time. The first conversation is generic. By the twentieth, the AI knows your clients, your preferences, your fiscal calendar, and your rules. It starts to feel less like a tool and more like a colleague who has been paying attention.

We store memory per workspace, not per user. This means a team shares context. When one partner tells the AI about a new client, the other partner's AI knows too. This was a deliberate decision. Business context is shared context.
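The four memory types and the per-workspace scoping could be modeled roughly as below. This is a sketch, not Well's schema: `WorkspaceMemory`, `Fact`, and the alias table used to resolve phrases like "the Berlin project" are assumptions for illustration. No method takes a user ID; everything is keyed by workspace, which is what makes the context shared across a team.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class MemoryKind(Enum):
    PREFERENCE = "preference"    # "I always want invoices in EUR"
    TERMINOLOGY = "terminology"  # "the Berlin project" -> a specific client
    CONTEXT = "context"          # "our fiscal year starts in April"
    INSTRUCTION = "instruction"  # "always flag invoices over €5,000"


@dataclass(frozen=True)
class Fact:
    kind: MemoryKind
    text: str
    author: str  # who taught it; recorded, but not used to scope access


class WorkspaceMemory:
    """Persistent facts keyed by workspace, so every team member shares them."""

    def __init__(self) -> None:
        self._facts: Dict[str, List[Fact]] = defaultdict(list)
        self._aliases: Dict[str, Dict[str, str]] = defaultdict(dict)

    def remember(self, workspace: str, fact: Fact) -> None:
        self._facts[workspace].append(fact)

    def add_alias(self, workspace: str, phrase: str, client_id: str, author: str) -> None:
        # Terminology also gets a lookup table so phrases resolve to records.
        self._aliases[workspace][phrase.lower()] = client_id
        self.remember(workspace, Fact(MemoryKind.TERMINOLOGY, f"{phrase} -> {client_id}", author))

    def resolve(self, workspace: str, phrase: str) -> Optional[str]:
        return self._aliases.get(workspace, {}).get(phrase.lower())

    def facts(self, workspace: str, kind: Optional[MemoryKind] = None) -> List[Fact]:
        return [f for f in self._facts.get(workspace, []) if kind is None or f.kind is kind]
```

When one partner teaches the AI an alias, any session in the same workspace resolves it, while a different workspace sees nothing.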

Tasks and Rules: Closing the Loop

Memory alone is passive. It makes the AI smarter but not proactive. Tasks and Rules add the active layer.

Tasks come from two sources. Users can create them directly: "Remind me to follow up with Acme Corp next Tuesday." But the AI also suggests tasks based on conversation patterns. If you discuss an overdue payment three times without resolving it, the AI might suggest creating a follow-up task. You can accept or dismiss the suggestion.
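The post does not say how the AI decides that a topic keeps coming up, so what follows is only one plausible heuristic: count the distinct conversations in which an unresolved topic appears, and suggest a follow-up once that count reaches three. The `Mention` records and topic labels are hypothetical and assume the model tags each conversation with the topics it touches.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Mention:
    conversation_id: str
    topic: str      # e.g. "overdue:INV-7", assumed to be tagged by the model
    resolved: bool  # True if this conversation closed the topic out


def suggest_followups(mentions: List[Mention], threshold: int = 3) -> List[str]:
    """Topics raised in at least `threshold` conversations and never resolved."""
    resolved = {m.topic for m in mentions if m.resolved}
    # Count distinct conversations per topic, not raw mentions.
    pairs = {(m.topic, m.conversation_id) for m in mentions}
    per_topic = Counter(topic for topic, _ in pairs)
    return sorted(t for t, n in per_topic.items() if n >= threshold and t not in resolved)
```

The output is a list of suggestions, not tasks; each one still waits for the user to accept or dismiss it.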

Rules are user-defined business logic that the AI enforces. "If any single expense exceeds €1,000, flag it" or "categorize all Stripe payouts as platform revenue." Rules run continuously. They are not triggered by conversation but by data changes. When a new transaction matches a rule, the AI applies it.
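Because rules react to data changes rather than chat messages, they can be sketched as predicates evaluated on each new transaction. The two rules below mirror the examples above, assuming amounts are stored in EUR. `Transaction`, `Rule`, and `on_transaction_created` are hypothetical names, and the event hook that calls them is assumed.

```python
from dataclasses import dataclass, field
from decimal import Decimal
from typing import Callable, List, Optional


@dataclass
class Transaction:
    id: str
    amount: Decimal                 # assumed to be in EUR
    counterparty: str
    kind: str                       # "expense" or "income" (simplified)
    category: Optional[str] = None
    flags: List[str] = field(default_factory=list)


@dataclass(frozen=True)
class Rule:
    name: str
    matches: Callable[[Transaction], bool]
    apply: Callable[[Transaction], None]


def on_transaction_created(tx: Transaction, rules: List[Rule]) -> List[str]:
    """Called on each data change (not on chat); returns names of rules applied."""
    applied = []
    for rule in rules:
        if rule.matches(tx):
            rule.apply(tx)
            applied.append(rule.name)
    return applied


def flag_for_review(tx: Transaction) -> None:
    tx.flags.append("review")


def categorize_platform_revenue(tx: Transaction) -> None:
    tx.category = "platform revenue"


# The two example rules from the text.
RULES = [
    Rule("flag expenses over 1,000 EUR",
         lambda tx: tx.kind == "expense" and tx.amount > Decimal("1000"),
         flag_for_review),
    Rule("categorize Stripe payouts",
         lambda tx: tx.kind == "income" and tx.counterparty == "Stripe",
         categorize_platform_revenue),
]
```

"Exceeds €1,000" is read as strictly greater, so an expense of exactly €1,000 is not flagged.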

The combination of Memory, Tasks, and Rules means Chat V2 is not just a conversation interface. It is a persistent business assistant that learns your preferences, tracks your obligations, and enforces your policies. The chat is the control surface for all of it.

Why Chat as the Primary Interface

A fair question: is this actually different from what other products are doing? Every SaaS tool is adding a chat sidebar. Copilots are everywhere.

The difference is commitment. Most products add chat as a feature alongside their existing UI. The buttons, menus, and dashboards remain primary. Chat is supplementary. An assistant that helps you use the real interface.

We are going the other direction. Chat is the primary interface. The traditional UI still exists and will continue to improve, but we are building for a future where most users interact with Well primarily through conversation. Ask mode replaces report-building. Agent mode replaces form-filling. Memory replaces preference panels. Rules replace manual filters.

This is a conviction, not a hedge. If conversational AI is good enough to run financial operations, then the right architecture is to build around it. If it is not good enough, adding a chat sidebar will not save you anyway.

The early signals are encouraging. V2 beta users complete tasks 40% faster through Agent mode than through the traditional UI. Ask mode users explore their data more broadly, asking questions they would never have bothered to answer by clicking through dashboards. And retention among Chat V2 users is higher than our baseline.

The Technical Bet

Building Chat V2 required decisions about model architecture, tool design, and safety that deserve their own post. But two choices are worth noting here.

First, we built a tool-use framework that is action-specific rather than general-purpose. The AI does not have a generic "do anything" capability. Each action (create invoice, send reminder, categorize expense) is a discrete tool with defined inputs, outputs, and permissions. This limits flexibility but dramatically improves reliability. The AI cannot execute an action that does not exist in its toolset: if the model hallucinates one, there is nothing behind it to run.
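A sketch of what a discrete tool with defined inputs and permissions can look like in practice, assuming each tool declares its parameter types and a required permission scope. The registry, scope names, and `send_reminder` handler are hypothetical. The property that matters is that an invented tool name, a missing permission, or a malformed argument all fail before any handler runs.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Set


@dataclass(frozen=True)
class ToolSpec:
    name: str
    params: Dict[str, type]   # required parameter names and their types
    permission: str           # scope the workspace must have granted
    handler: Callable[..., Any]


class ToolError(Exception):
    """Raised before execution when a tool call is not allowed or malformed."""


class ToolRegistry:
    def __init__(self, specs: List[ToolSpec]) -> None:
        self._specs = {s.name: s for s in specs}

    def invoke(self, name: str, args: Dict[str, Any], granted: Set[str]) -> Any:
        spec = self._specs.get(name)
        if spec is None:
            # A tool name the model invented has nothing behind it to run.
            raise ToolError(f"unknown tool: {name}")
        if spec.permission not in granted:
            raise ToolError(f"{name} requires permission {spec.permission}")
        if set(args) != set(spec.params):
            raise ToolError(f"{name} expects exactly {sorted(spec.params)}")
        for key, expected in spec.params.items():
            if not isinstance(args[key], expected):
                raise ToolError(f"{name}.{key} must be {expected.__name__}")
        return spec.handler(**args)
```

Rejecting extra arguments as well as missing ones keeps the model from smuggling in parameters the handler was never reviewed for.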

Second, we made conversation history searchable and referenceable. Past conversations are not just stored. They are indexed. The AI can search its own history to find relevant context. If you discussed pricing strategy three months ago, the AI can surface those insights when you ask about pricing today. This turns conversation history from a log into a knowledge base.
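The post does not describe how history is indexed, and production systems often use embeddings for this. As a simpler stand-in, the sketch below uses a keyword inverted index that ranks past messages by the number of query terms they share and breaks ties by recency. `HistoryIndex` and `Message` are hypothetical names.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Set


@dataclass(frozen=True)
class Message:
    conversation_id: str
    day: date
    text: str


def tokenize(text: str) -> Set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class HistoryIndex:
    """Keyword inverted index over past messages (a stand-in for real retrieval)."""

    def __init__(self) -> None:
        self._messages: List[Message] = []
        self._postings: Dict[str, Set[int]] = defaultdict(set)

    def add(self, msg: Message) -> None:
        idx = len(self._messages)
        self._messages.append(msg)
        for term in tokenize(msg.text):
            self._postings[term].add(idx)

    def search(self, query: str, k: int = 3) -> List[Message]:
        scores: Dict[int, int] = defaultdict(int)
        for term in tokenize(query):
            for idx in self._postings.get(term, ()):
                scores[idx] += 1
        # Rank by number of shared terms, then prefer more recent messages.
        ranked = sorted(scores, key=lambda i: (-scores[i], -self._messages[i].day.toordinal()))
        return [self._messages[i] for i in ranked[:k]]
```

Swapping in an embedding-based retriever would change `search` but not the idea: past conversations become something the AI can query, not just a log.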

What Comes Next

Chat V2 shipped to beta in February 2026. The roadmap from here focuses on three areas. First, more actions in Agent mode, starting with quote generation and recurring invoice automation. Second, deeper memory, including the ability for the AI to proactively surface relevant past context without being asked. Third, multi-step workflows, where a single conversation can chain multiple actions together.

The underlying thesis has not changed since we started building Well: financial operations for small businesses should not require expertise in financial software. Chat V2 is the most direct expression of that thesis. You tell the AI what you need. It either answers or does it. The interface gets out of the way.

We are betting that the future of business software looks more like a conversation than a dashboard. Chat V2 is how we are building toward that future, one interaction at a time.

Maxime Champoux

CEO & co-founder, Well

Maxime is the CEO and co-founder of Well. He built Well to rebuild finance around AI-native data, not spreadsheets.
