Cadence Advisors Group
B2B SaaS · 10-week build + 6-week handoff · 16 weeks

Cut support triage from 120 hrs/week to 8 hrs at a Series B SaaS platform

Client: Series B SaaS (Horizontal Workflow, ~$18M ARR)

A Series B workflow platform was drowning in tier-1 support volume. We built a triage and response-draft system that deflected 71% of tickets and cut human review time from three full-time equivalents to one analyst working half-time.

  • Ticket deflection rate: 71%
  • Median time-to-first-response: 32s
  • Annualized payroll savings: $420k
  • CSAT delta (prior 6 months vs. post): +18pts

Challenge

The client's support team had tripled to 14 agents in 18 months, but ticket volume was still outpacing headcount. Their CEO had banned further support hiring until someone could explain why every dollar of ARR growth required 14 cents of support cost. An internal attempt to bolt a general-purpose chatbot onto their Zendesk had produced public brand damage: customers were getting confidently wrong answers about their own billing. The team had decided AI was 'not ready' and was planning to hire another six agents when they called us.

The real problem was narrower than 'build a support bot.' Sixty-two percent of tickets fell into four categories (billing questions, SSO resets, seat management, and integration 401 errors), all of which had deterministic answers in the product's own documentation. But the existing triage layer couldn't distinguish those four from the remaining 38% that genuinely needed a human. Everything was getting human attention because nothing was being classified.
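
As a sketch of how small that label space actually is, in code (the category names here are our illustration, not the client's actual schema):

```python
from enum import Enum

class TicketCategory(str, Enum):
    # The four deterministic tier-1 categories, each answerable from the docs.
    BILLING = "billing_question"
    SSO_RESET = "sso_reset"
    SEAT_MANAGEMENT = "seat_management"
    INTEGRATION_401 = "integration_401"
    # Everything else (~38% of volume) genuinely needs a human.
    NEEDS_HUMAN = "needs_human"
```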

Approach

We scoped the engagement around classification, not generation. The first four weeks went into building a labeled dataset from 18 months of closed tickets and an evaluation harness that measured both classification accuracy and, critically, the false-confident-answer rate, because confidently wrong answers were the pattern that had burned them before.
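
A minimal sketch of the shape of that harness, assuming a simple per-ticket record (the field names and the 0.9 confidence floor are illustrative, not the tuned production values):

```python
from dataclasses import dataclass

@dataclass
class EvalCase:
    predicted: str               # model-assigned category
    actual: str                  # human-labeled ground truth
    confidence: float            # model confidence for this prediction
    auto_answered: bool          # did the system send a reply unreviewed?
    answer_correct: bool | None  # graded against the KB; None if no reply sent

def score(cases: list[EvalCase], confidence_floor: float = 0.9) -> tuple[float, float]:
    """Returns (classification accuracy, false-confident-answer rate).

    A 'false-confident' case is the failure mode that burned the client before:
    a reply sent automatically, above the confidence floor, that was wrong.
    """
    accuracy = sum(c.predicted == c.actual for c in cases) / len(cases)
    false_confident = sum(
        c.auto_answered and c.confidence >= confidence_floor and c.answer_correct is False
        for c in cases
    ) / len(cases)
    return accuracy, false_confident
```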

Once the eval suite was passing at 94% classification accuracy with under 0.3% false-confident responses, we built the response layer. Tier-1 tickets got a drafted reply with citations to the knowledge base; tier-2+ tickets got a structured summary and priority score handed to a human agent. Anything below the confidence threshold, which we tuned against the eval set, went straight to a human with no draft, because a missing draft is cheaper than a wrong one.
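
A sketch of that three-way routing decision (`classify`, `draft_reply`, `summarize`, and `prioritize` stand in for the real components, and the threshold value is illustrative):

```python
TIER_1 = {"billing_question", "sso_reset", "seat_management", "integration_401"}
THRESHOLD = 0.9  # illustrative; the real value was tuned against the eval set

def route(ticket: str, classify, draft_reply, summarize, prioritize) -> dict:
    category, confidence = classify(ticket)

    # Below the threshold: straight to a human, no draft.
    # A missing draft is cheaper than a wrong one.
    if confidence < THRESHOLD:
        return {"route": "human", "category": None, "draft": None}

    # Tier-1: the four deterministic categories get a drafted reply,
    # with citations back to the knowledge base.
    if category in TIER_1:
        reply, citations = draft_reply(ticket, category)
        return {"route": "auto", "category": category,
                "draft": reply, "citations": citations}

    # Tier-2+: summarize and priority-score for a human agent instead of answering.
    return {"route": "human", "category": category,
            "summary": summarize(ticket), "priority": prioritize(ticket, category)}
```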

Weeks 10-16 were handoff: we moved the eval harness, the prompt repo, and the deployment pipeline into their engineering team's own infrastructure and trained two of their engineers to own it. When we left, the client could modify and redeploy the system without calling us.

Outcome

By week 14, the system was handling 71% of inbound tickets end-to-end, with a 32-second median time-to-first-response on auto-resolved tickets and a 4-minute median on human-assisted ones (down from 2 hours 40 minutes). Manual triage dropped from 120 hours per week to 8. CSAT for auto-resolved tickets landed 18 points above the prior 6-month baseline, primarily because customers were getting correct answers in under a minute instead of correct answers in three hours. The client redeployed four support agents into a customer-success motion that now drives $1.1M in annualized expansion revenue. The engineering team has shipped four material updates to the system since handoff, without our involvement.

Stack

  • Claude Sonnet + Haiku routing (see the sketch after this list)
  • pgvector on Supabase
  • Temporal for orchestration
  • Langfuse for eval & observability
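
To make the first bullet concrete: a hedged sketch of the two-model split using the Anthropic Python SDK. The model IDs, prompts, and JSON contract are our assumptions, not the production configuration:

```python
import json
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def classify(ticket_text: str) -> tuple[str, float]:
    """Cheap, fast Haiku call for the high-volume classification step."""
    msg = client.messages.create(
        model="claude-3-5-haiku-latest",  # illustrative model ID
        max_tokens=200,
        system='Classify the support ticket. Reply with JSON: '
               '{"category": "...", "confidence": 0.0-1.0}',
        messages=[{"role": "user", "content": ticket_text}],
    )
    parsed = json.loads(msg.content[0].text)
    # In production, this confidence would be calibrated against the eval set.
    return parsed["category"], float(parsed["confidence"])

def draft_reply(ticket_text: str, kb_passages: list[str]) -> str:
    """Heavier Sonnet call, only for tickets that cleared triage."""
    context = "\n\n".join(kb_passages)
    msg = client.messages.create(
        model="claude-3-5-sonnet-latest",  # illustrative model ID
        max_tokens=1000,
        system="Draft a support reply using ONLY the provided KB passages. "
               "Cite each passage you rely on.",
        messages=[{"role": "user", "content": f"{context}\n\nTicket:\n{ticket_text}"}],
    )
    return msg.content[0].text
```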

Working on something similar?

A partner will respond personally within one business day. If there isn't a fit, we'll tell you so and point you somewhere better.