How to Automate Invoice Collection Across 50+ Supplier Portals

Maxime Champoux · 9 min read

Basware's 2024 AP benchmark found that companies with 50+ suppliers spend an average of 14 hours per month just downloading invoices. Not processing them. Not reconciling. Downloading.

I know this because we lived it. Before building Well, our finance team maintained a spreadsheet with login credentials for 63 supplier portals. Every month, someone would spend two full days clicking through each one. OVH buries invoices three clicks deep in a billing panel that requires re-authentication every session. AWS spreads them across accounts and regions, with a console that changes layout quarterly. Google Cloud generates invoices per billing account, so if you run five projects, that is five separate downloads. And those are the good ones. They at least have consistent URLs.

The long tail is worse. Your office cleaning company emails a PDF. Your freelance designer sends a photo of a handwritten receipt. Your Italian logistics partner uses FatturaPA XML. Your coworking space has a portal built in 2009 that only works in Internet Explorer.

The math does not work. APIs exist for maybe 10% of supplier portals. The remaining 90% have no programmatic access. You cannot write a script for a portal that requires solving a CAPTCHA, navigating a JavaScript-heavy SPA, and downloading a PDF that is actually a print-rendered HTML page.

This is not a minor inefficiency. It is a structural bottleneck. AP teams spend their most expensive resource (human attention) on their lowest-value task (downloading files from websites). Every month, without fail, the same ritual.

Traditional automation fails here because each portal is different. Different auth flows. Different session management. Different file formats. Different download mechanisms. RPA tools can handle maybe 20 portals before the maintenance burden of keeping scripts updated outweighs the time saved.

There is a better approach now, and it combines three technologies that did not coexist until recently: LLM-controlled browser automation, OCR-to-structured-data pipelines, and Model Context Protocol connectors for the APIs that do exist.

The before state

A typical month for an AP team at a 50-supplier company:

  • Day 1-2: Log into each supplier portal individually. Navigate to the billing or invoice section. Download the latest invoice. Save it with a consistent naming convention. Repeat 50+ times.
  • Day 3: Chase missing invoices via email. Some suppliers only send invoices on request. Others send them to a shared inbox that nobody monitors.
  • Day 4-5: Manually enter invoice data into the accounting system. Cross-reference with purchase orders. Flag discrepancies.

Total: 10-20 hours of pure download work, plus another 10-15 hours of data entry.

The error rate compounds. A missed invoice means a missed early payment discount (typically 2/10 net 30: 2% off if you pay within 10 days). Across 50 suppliers, that adds up to thousands per month in lost discounts. Late payments damage supplier relationships and, in some jurisdictions, trigger penalties.

The automation stack

Layer 1: API connectors for the 10%

For suppliers that offer proper APIs (Stripe, AWS, major SaaS platforms), use MCP (Model Context Protocol) connectors. These are structured integrations that authenticate via OAuth or API key, call the billing endpoint, and return invoice data in a predictable format.

At Well, we build and maintain MCP connectors for the most common B2B tools. When a connector exists, it is the fastest and most reliable path. You get structured data directly: invoice number, date, line items, tax amounts, totals. No parsing needed.

But connectors only cover the head of the distribution. The SaaS tools that millions of companies use. They do not cover your regional telecom provider, your specialty materials supplier, or the freelancer who invoices you through a WordPress plugin.

Layer 2: LLM-controlled browser for the 90%

This is where traditional automation breaks down and where the new approach diverges. Instead of writing brittle scripts for each portal (click this button, wait 3 seconds, click that link), you use a browser controlled by a large language model.

The concept: an LLM can see a webpage the same way a human does. It can read labels, understand form fields, navigate menus, and adapt when a UI changes. When OVH redesigns their billing page, a scripted bot breaks. An LLM-controlled browser reads the new layout and finds the invoice download button anyway.

We built this into a Chrome Extension called Vturfu (patent FR2500423). It works like this:

  • The extension opens the supplier portal in a real Chrome session
  • The LLM receives a screenshot or DOM snapshot of the page
  • It identifies the login form, enters credentials from a secure vault
  • It navigates to the billing section by reading the UI, not by following hardcoded selectors
  • It identifies and downloads all new invoices since the last collection
  • It handles edge cases: CAPTCHAs get escalated, MFA codes get requested, session timeouts get managed

The key difference from RPA: maintenance is near zero. When a portal changes its UI, the LLM adapts. You do not need to update a script. The model reads the page and figures out the new path to the invoices.
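The steps above amount to an observe, decide, act loop. Here is a minimal sketch of that loop, not Vturfu's actual implementation. The names `Action`, `collect_invoices`, `FakePortal`, and `scripted_decide` are all hypothetical. The `decide` callable stands in for the LLM call, and `FakePortal` stands in for a real Chrome session so the loop can run without a browser.

```python
from dataclasses import dataclass


@dataclass
class Action:
    kind: str         # "type", "click", "download", "escalate", or "done"
    target: str = ""  # description of the element as read from the page, not a CSS selector
    text: str = ""


def collect_invoices(browser, decide, goal, max_steps=25):
    """Observe the page, ask the model for one action, perform it, repeat."""
    history = []
    for _ in range(max_steps):
        page = browser.snapshot()      # screenshot or DOM snapshot in a real setup
        action = decide(goal, page)    # the LLM call in a real setup
        history.append(action)
        if action.kind in ("done", "escalate"):
            break                      # finished, or a human has to step in
        browser.perform(action)
    return history


class FakePortal:
    """Stand-in for a real browser session, used only to exercise the loop."""

    def __init__(self):
        self.page = "login form"
        self.downloads = []

    def snapshot(self):
        return self.page

    def perform(self, action):
        if action.kind == "type":
            self.page = "dashboard with Billing menu"
        elif action.kind == "click":
            self.page = "invoice list: INV-001.pdf"
        elif action.kind == "download":
            self.downloads.append(action.target)
            self.page = "invoice list: all downloaded"


def scripted_decide(goal, page):
    """Placeholder for the model: picks the next action from what the page shows."""
    if "CAPTCHA" in page:
        return Action("escalate", "CAPTCHA")
    if "login form" in page:
        return Action("type", "login form", "<credentials from vault>")
    if "Billing menu" in page:
        return Action("click", "Billing menu")
    if "INV-001.pdf" in page:
        return Action("download", "INV-001.pdf")
    return Action("done")


portal = FakePortal()
steps = collect_invoices(portal, scripted_decide, goal="download new invoices")
print([step.kind for step in steps])
print(portal.downloads)
```

The point of the structure is that nothing in `collect_invoices` knows about a specific portal. All portal-specific knowledge lives in what the model reads from the page at each step, which is why a layout change does not require a code change.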

Layer 3: OCR pipeline for format normalization

Invoices come in many formats:

  • PDF (sometimes searchable, sometimes scanned images)
  • XML (UBL, FatturaPA, ZUGFeRD)
  • HTML (some portals render invoices as web pages with no download option)
  • Email attachments (PDF, Word, or plain text)
  • Photos of paper invoices

An OCR pipeline converts all of these into structured data. Modern OCR, powered by vision models, handles scanned invoices with varying quality, multi-language documents, tables with complex layouts, and even handwritten amounts (though accuracy drops there).

The output is always the same: a structured JSON object with invoice number, date, supplier details, line items, tax breakdown, and total. This feeds directly into your accounting system.
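As an illustration of what such a normalized object could look like, here is a small sketch using Python dataclasses. The field names, the `Invoice` and `LineItem` classes, and the sample values are assumptions made for this example. They are not Well's actual schema.

```python
import json
from dataclasses import asdict, dataclass, field
from decimal import Decimal

CENT = Decimal("0.01")


@dataclass
class LineItem:
    description: str
    quantity: Decimal
    unit_price: Decimal
    tax_rate: Decimal  # 0.20 means 20% VAT

    @property
    def net(self):
        return self.quantity * self.unit_price


@dataclass
class Invoice:
    invoice_number: str
    issue_date: str  # ISO 8601 date
    supplier_id: str
    supplier_name: str
    currency: str
    line_items: list = field(default_factory=list)

    def to_json(self):
        """Serialize the invoice with totals derived from the line items."""
        net = sum((item.net for item in self.line_items), Decimal("0"))
        tax = sum((item.net * item.tax_rate for item in self.line_items), Decimal("0"))
        payload = asdict(self)
        payload["net_total"] = net.quantize(CENT)
        payload["tax_total"] = tax.quantize(CENT)
        payload["total"] = (net + tax).quantize(CENT)
        # Decimal is not JSON-serializable, so amounts are emitted as strings.
        return json.dumps(payload, default=str, indent=2)


invoice = Invoice(
    invoice_number="INV-2024-001",
    issue_date="2024-03-01",
    supplier_id="SUP-042",
    supplier_name="Example Hosting SAS",
    currency="EUR",
    line_items=[LineItem("Dedicated server", Decimal("2"), Decimal("49.90"), Decimal("0.20"))],
)
print(invoice.to_json())
```

Amounts use `Decimal` rather than `float` so that totals do not pick up binary rounding errors before they reach the accounting system.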

The after state

With the three-layer stack running:

  • Automated collection runs nightly. Every supplier portal gets visited, new invoices get downloaded, and data gets extracted. No human involvement.
  • AP team reviews exceptions only. Maybe 5% of invoices need human attention: a CAPTCHA that could not be solved, a portal that is down, or an OCR confidence score below threshold.
  • Data flows into accounting automatically. Structured invoice data gets pushed to your ERP or accounting tool via API.

The 14 hours per month drops to about 1 hour of exception handling.
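The exception-only review can be expressed as a small triage rule. The sketch below is one possible shape for it. The status values, the `CollectionResult` class, and the 0.90 confidence threshold are assumptions for illustration and would need tuning against real data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CollectionResult:
    supplier: str
    status: str                             # "ok", "captcha", or "portal_down" (assumed values)
    ocr_confidence: Optional[float] = None  # None when nothing was extracted


def review_reason(result, min_confidence=0.90):
    """Return why a human should look at this result, or None if it can flow through."""
    if result.status != "ok":
        return f"collection failed: {result.status}"
    if result.ocr_confidence is None or result.ocr_confidence < min_confidence:
        return "OCR confidence below threshold"
    return None


results = [
    CollectionResult("OVH", "ok", 0.99),
    CollectionResult("Coworking space", "captcha"),
    CollectionResult("Office cleaning", "ok", 0.71),
    CollectionResult("AWS", "ok", 0.97),
]
exceptions = {r.supplier: review_reason(r) for r in results if review_reason(r)}
print(exceptions)
```

Everything not in `exceptions` goes straight to the accounting system. The review queue only ever contains items with a stated reason.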

Early payment discounts get captured consistently because invoices arrive the day they are issued, not the day someone remembers to log in. Supplier relationships improve because payments happen on time. The AP team works on reconciliation and analysis instead of downloading files.

Implementation details that matter

Credential management. You need a secure vault for portal credentials. Do not store passwords in plaintext config files. Use a proper secrets manager. The browser automation layer should pull credentials at runtime and never persist them in browser storage.
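A minimal sketch of runtime credential lookup follows. It reads from environment variables as a stand-in for a real secrets manager, and the `PORTAL_<SUPPLIER>_USERNAME` naming convention is an assumption made for this example. The part worth copying is the shape: credentials are fetched per run, kept out of `repr` output so they do not leak into logs, and never written to disk.

```python
import os
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PortalCredentials:
    username: str
    password: str = field(repr=False)  # excluded from repr so it cannot end up in logs


def load_credentials(supplier_id, env=os.environ):
    """Fetch credentials at runtime. `env` would be a secrets-manager client in production."""
    prefix = f"PORTAL_{supplier_id.upper()}_"
    try:
        return PortalCredentials(env[prefix + "USERNAME"], env[prefix + "PASSWORD"])
    except KeyError as missing:
        raise LookupError(f"no credentials configured for {supplier_id}: {missing}") from None


# A plain dict plays the role of the secret store in this example.
fake_store = {"PORTAL_OVH_USERNAME": "ap@example.com", "PORTAL_OVH_PASSWORD": "s3cret"}
creds = load_credentials("ovh", env=fake_store)
print(creds)
```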

Scheduling. Daily collection is better than monthly. Invoices arrive throughout the month, and earlier collection means earlier processing. Run collections overnight so any issues can be addressed during business hours.

Error handling. Build alerting for failed collections. A portal might change its login flow, require a new type of MFA, or go down for maintenance. You want to know within hours, not at month-end when someone notices a missing invoice.
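One simple way to catch silent failures is to track the last successful collection per portal and flag anything that has gone quiet. The sketch below assumes nightly runs and a 36-hour tolerance. Both the threshold and the `stale_portals` helper are illustrative choices, not a prescribed design.

```python
from datetime import datetime, timedelta


def stale_portals(last_success, now, max_age=timedelta(hours=36)):
    """Return portals whose last successful collection is older than max_age."""
    return sorted(portal for portal, ts in last_success.items() if now - ts > max_age)


now = datetime(2024, 3, 10, 8, 0)
last_success = {
    "OVH": datetime(2024, 3, 10, 2, 15),          # collected last night
    "AWS": datetime(2024, 3, 10, 2, 20),
    "Regional telecom": datetime(2024, 3, 7, 2, 30),  # failing for three nights
}
print(stale_portals(last_success, now))
```

The returned list is what you would send to the AP team's alert channel each morning.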

Deduplication. When you run daily collections, you will encounter invoices you have already downloaded. Use invoice number + supplier ID as a composite key to avoid duplicates.
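The composite key can be sketched in a few lines. The invoice dictionaries and the `new_invoices` helper are hypothetical, and normalizing the invoice number (trimming whitespace, uppercasing) is an added assumption to guard against cosmetic differences between runs.

```python
def new_invoices(collected, seen):
    """Keep only invoices whose (supplier ID, invoice number) key has not been seen."""
    fresh = []
    for invoice in collected:
        key = (invoice["supplier_id"], invoice["invoice_number"].strip().upper())
        if key not in seen:
            seen.add(key)
            fresh.append(invoice)
    return fresh


seen_keys = {("SUP-042", "INV-001")}
tonight = [
    {"supplier_id": "SUP-042", "invoice_number": "inv-001 "},  # already collected
    {"supplier_id": "SUP-042", "invoice_number": "INV-002"},
    {"supplier_id": "SUP-007", "invoice_number": "INV-001"},   # same number, different supplier
]
fresh = new_invoices(tonight, seen_keys)
print([(inv["supplier_id"], inv["invoice_number"]) for inv in fresh])
```

The supplier ID has to be part of the key because invoice numbers are only unique per supplier, as the third entry shows.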

Audit trail. For compliance, log every collection: which portal, when, what was downloaded, what data was extracted. Screenshot the portal state at each step. This matters for audits and for debugging when something goes wrong.
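A simple form for such a log is one JSON object per line, appended as each step completes. The sketch below writes to an in-memory buffer. The `log_step` helper and the field names are assumptions, and a real setup would write to append-only storage and reference the saved screenshot by path.

```python
import io
import json
from datetime import datetime, timezone


def log_step(stream, portal, step, **details):
    """Append one audit record as a JSON line and return it."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "portal": portal,
        "step": step,
        **details,
    }
    stream.write(json.dumps(record, sort_keys=True) + "\n")
    return record


audit_log = io.StringIO()  # stands in for an append-only file or log service
log_step(audit_log, "OVH", "login", outcome="success")
log_step(audit_log, "OVH", "download", file="INV-002.pdf", screenshot="ovh-step-2.png")
print(audit_log.getvalue())
```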

What this does not solve

This stack handles collection and data extraction. It does not handle:

  • Three-way matching (invoice vs. PO vs. goods receipt). That is a separate system.
  • Approval workflows. Routing invoices for approval is a process layer, not a collection layer.
  • Payment execution. Actually paying suppliers is downstream.

These are real problems, but they are different problems. Trying to solve everything at once is how AP automation projects fail. Start with collection. It is the highest-pain, lowest-risk starting point.

The economics

A mid-size company with 60 suppliers, assuming a blended cost of €50/hour for AP staff time:

  • Before: 14 hours/month × €50 = €700/month in labor, plus ~€2,000/month in missed early payment discounts = €2,700/month total cost
  • After: 1 hour/month × €50 = €50/month in labor, discounts captured = €50/month total cost
  • Net savings: ~€2,650/month, or €31,800/year

For larger companies with 200+ suppliers, the numbers scale linearly. The automation cost does not.
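The arithmetic above is easy to rerun with your own figures. The snippet below reproduces it with the numbers from this section. The variable names are arbitrary, and the cost of the automation itself is left out, as it is in the figures above.

```python
HOURLY_COST_EUR = 50          # blended AP staff cost
MISSED_DISCOUNTS_EUR = 2000   # estimated monthly early payment discounts lost today

before = 14 * HOURLY_COST_EUR + MISSED_DISCOUNTS_EUR  # manual download labor plus lost discounts
after = 1 * HOURLY_COST_EUR                           # exception handling only, discounts captured
monthly_savings = before - after
annual_savings = monthly_savings * 12

print(f"Before: €{before:,}/month")
print(f"After: €{after:,}/month")
print(f"Savings: €{monthly_savings:,}/month, €{annual_savings:,}/year")
```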

These are the numbers we see across Well customers running this stack today. The variance comes from industry (construction companies have more paper invoices than SaaS companies) and geography (Italian e-invoicing mandates simplify the XML portion; French and German portals remain largely manual).

Getting started

If you want to build this yourself:

  • Audit your supplier list. Categorize each by: has API, has web portal, email-only, paper-only.
  • Start with API connectors for the top 10 suppliers by invoice volume.
  • Set up browser automation for the next 30. This is where the biggest time savings come from.
  • Add OCR for any format that is not machine-readable.
  • Build the data pipeline into your accounting system.

If you do not want to build it, this is what Well does. Our MCP connectors handle the API layer, Vturfu handles the browser layer, and our OCR pipeline handles format normalization. The combination covers close to 100% of invoice sources.
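For the audit step in the list above, a spreadsheet is enough, but the categorization can also be sketched in code. The example below uses the four categories from the list. The mapping from category to layer is an assumption (email-only and paper-only suppliers are both routed to OCR here), and the supplier entries are illustrative.

```python
from collections import Counter
from enum import Enum


class Channel(Enum):
    API = "has API"
    PORTAL = "has web portal"
    EMAIL = "email-only"
    PAPER = "paper-only"


# Assumed routing: which layer of the stack collects from each channel.
LAYER_FOR_CHANNEL = {
    Channel.API: "API connector",
    Channel.PORTAL: "browser automation",
    Channel.EMAIL: "OCR pipeline",
    Channel.PAPER: "OCR pipeline",
}

suppliers = {
    "AWS": Channel.API,
    "OVH": Channel.PORTAL,
    "Coworking space": Channel.PORTAL,
    "Office cleaning": Channel.EMAIL,
    "Freelance designer": Channel.PAPER,
}

plan = {name: LAYER_FOR_CHANNEL[channel] for name, channel in suppliers.items()}
workload = Counter(plan.values())
print(plan)
print(workload)
```

Counting suppliers per layer gives a rough picture of where the build effort should go first.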

The point is not which tool you use. The point is that manually downloading invoices from 50+ portals is a solved problem. The technology exists. The ROI is clear. The only question is how long you want your AP team to keep doing it by hand.

Maxime Champoux, CEO & co-founder, Well


Maxime is the CEO and co-founder of Well. He built Well to rebuild finance around AI-native data, not spreadsheets.
