AI diligence cheat sheet.

For operating partners assessing portfolio AI capability — fast, honest, repeatable.

60% of portfolio companies are experimenting with AI. Only 5% have scaled to production. This checklist separates real capability from pitch-deck narrative — in under 30 minutes.

The cheat sheet.

12 questions organized in four categories. Every question has a passing answer and a red-flag answer. Work through these in the first 30 minutes of any portco AI walk-through.

Engineering reality check

  • Is there a named engineer who owns the AI system in production — not the demo, the live system?

  • Can the team show an audit trail of what the model did and when, for at least the last 30 days?

  • Has the system been updated after a model provider release? How long did it take?

Eval and cost discipline

  • Does the team have an eval harness — a systematic quality measurement — built against real production inputs, not imagined ones?

  • Can they show the eval results? Do they have a regression baseline?

  • Is there a cost model for inference at 2x and 5x current volume? Do the numbers still work?

  • What is the 95th-percentile latency (the response time that the slowest 5% of requests exceed)? Is that acceptable to users?
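
To make the latency question concrete, here is a minimal sketch of how a 95th-percentile figure can be computed from raw request timings. It uses the nearest-rank method, which is one common convention and not necessarily the one a given portco's monitoring tool uses. The function name and the sample latencies are hypothetical.

```python
import math


def percentile_nearest_rank(values, pct):
    """Return the pct-th percentile of values using the nearest-rank method."""
    if not values:
        raise ValueError("need at least one latency sample")
    ordered = sorted(values)
    # 1-based rank of the smallest value that covers pct percent of samples.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Hypothetical request latencies in milliseconds (illustrative only).
latencies_ms = [120, 135, 140, 150, 160, 180, 200, 240, 900, 2500]

p50 = percentile_nearest_rank(latencies_ms, 50)
p95 = percentile_nearest_rank(latencies_ms, 95)
print(f"median: {p50} ms, p95: {p95} ms")
```

In this made-up sample the median looks healthy while the 95th percentile is more than ten times slower, which is why an average or a median alone is not a passing answer.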

Team capability

  • Has anyone on the team shipped an AI system to production before — not a notebook, a live system with real users?

  • Does the team distinguish between the model and the harness? Or do they talk about them as the same thing?

  • When you ask about failure modes, do they have a catalog — or do they say the system works well?

Roadmap credibility

  • Is the AI roadmap ordered by validated use cases with a measurable line of sight to revenue or cost reduction — or by what sounds impressive?

  • If the core model provider raised prices by 3x tomorrow, what would change? Do they have an answer?

“What a target company claims about their AI capabilities and what they actually possess can be dramatically different.”

What real capability looks like.

Four telling indicators from portcos that have actually shipped. Not what they say — what they can show.

They have a production eval harness

Measured against labeled data from real production logs — not a vibes-based merge process. They can show you the dashboard, the regression baseline, and the last time it caught something before users noticed.
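
As an illustration of what "regression baseline" means in practice, the sketch below compares a current eval run against a stored baseline and flags any metric that dropped by more than a tolerance. The metric names, scores, and the 0.02 tolerance are assumptions chosen for the example, not figures from any real portco.

```python
def find_regressions(baseline, current, tolerance=0.02):
    """Return metrics whose current score fell more than `tolerance` below baseline.

    Both arguments map metric name -> score in [0, 1].
    """
    regressions = {}
    for metric, base_score in baseline.items():
        new_score = current.get(metric)
        if new_score is None:
            # A metric that silently disappeared is treated as a regression.
            regressions[metric] = (base_score, None)
        elif base_score - new_score > tolerance:
            regressions[metric] = (base_score, new_score)
    return regressions


# Hypothetical scores measured on labeled production inputs.
baseline = {"exact_match": 0.91, "citation_valid": 0.97, "refusal_correct": 0.88}
current = {"exact_match": 0.92, "citation_valid": 0.90, "refusal_correct": 0.87}

regressions = find_regressions(baseline, current)
for metric, (before, after) in regressions.items():
    print(f"REGRESSION {metric}: {before} -> {after}")
```

A team with a real harness can usually show the equivalent of this check running before every release, along with the history of what it caught.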

They've been through a model provider update

And handled it in under a week. This tells you the harness is decoupled from the model. If it took them two sprints to recover from a model update, the architecture has a dependency problem that will compound.

The cost model is already built

They know their inference cost per query, per user, and what happens to the P&L at 10x volume. If they have to think about it, it hasn't been modeled. If they can't answer it, they haven't shipped a production system — they've shipped a prototype.
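
A cost model of this kind does not need to be elaborate. The sketch below is a deliberately simple linear version: cost per query from token counts and per-token prices, scaled by volume and by a provider price multiplier. Every number, field name, and the class itself are assumptions for illustration. A real model would also account for volume discounts, caching, retries, and fixed infrastructure cost.

```python
from dataclasses import dataclass


@dataclass
class InferenceCostModel:
    queries_per_month: int
    input_tokens_per_query: int
    output_tokens_per_query: int
    input_price_per_million: float   # USD per 1M input tokens (assumed)
    output_price_per_million: float  # USD per 1M output tokens (assumed)

    def cost_per_query(self, price_multiplier=1.0):
        input_cost = self.input_tokens_per_query * self.input_price_per_million / 1_000_000
        output_cost = self.output_tokens_per_query * self.output_price_per_million / 1_000_000
        return (input_cost + output_cost) * price_multiplier

    def monthly_cost(self, volume_multiplier=1.0, price_multiplier=1.0):
        queries = self.queries_per_month * volume_multiplier
        return queries * self.cost_per_query(price_multiplier)


# Hypothetical workload: 100k queries per month, 2,000 tokens in, 500 tokens out.
model = InferenceCostModel(100_000, 2_000, 500, 3.0, 15.0)

for volume in (1, 2, 5, 10):
    print(f"{volume}x volume: ${model.monthly_cost(volume_multiplier=volume):,.0f} per month")

# Sensitivity to a provider price increase at current volume.
print(f"3x price: ${model.monthly_cost(price_multiplier=3):,.0f} per month")
```

The point of the exercise is less the arithmetic than whether the team has already written it down and compared the result to revenue per query.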

There's a named production owner

Not 'the team.' A specific person who gets paged when the system misbehaves. Operating partners at Vista and Apollo have found this is the single best proxy for whether AI is actually managed vs. deployed and forgotten.

What pitch-deck capability looks like.

Four antipatterns that appear frequently in portco AI presentations and reliably indicate low production maturity.

Every AI feature uses RAG, regardless of fit

Retrieval-augmented generation is a pattern, not a solution. Teams that apply it to every problem haven't modeled when long context or fine-tuning would be cheaper and more reliable. It signals framework-first thinking, not problem-first thinking.

The demo is the proof

"It worked in the demo" is not evidence of production capability. If the team can't show you eval results, production logs, or a cost model, the demo is the product. That is not a product.

Accuracy described in vague percentages

'90% accurate' without a definition of the evaluation set, the task, or the baseline tells you nothing. Ask what the accuracy was on the hard inputs, the ones most likely to fail: the ambiguous, the out-of-distribution, the angry-user edge cases.
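
To see how a headline figure can hide the problem, the sketch below breaks a single accuracy number down by input slice. The slice names and counts are invented for the example, constructed so the overall figure comes out at exactly 90%.

```python
from collections import defaultdict


def accuracy_by_slice(results):
    """results: iterable of (slice_name, is_correct) pairs."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for slice_name, is_correct in results:
        totals[slice_name] += 1
        correct[slice_name] += int(is_correct)
    return {name: correct[name] / totals[name] for name in totals}


# Hypothetical eval set of 100 labeled inputs (illustrative counts only).
results = (
    [("routine", True)] * 88 + [("routine", False)] * 2
    + [("ambiguous", True)] * 2 + [("ambiguous", False)] * 4
    + [("out_of_distribution", False)] * 4
)

overall = sum(ok for _, ok in results) / len(results)
per_slice = accuracy_by_slice(results)

print(f"overall: {overall:.0%}")
for name, accuracy in per_slice.items():
    print(f"  {name}: {accuracy:.0%}")
```

In this invented data the system is "90% accurate" overall while getting most ambiguous inputs and every out-of-distribution input wrong. A team that can produce this breakdown on request has defined its evaluation set; a team that cannot has only the headline.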

No failure-mode catalog

When you ask 'what happens when it gets it wrong,' and the answer is 'it almost never does,' you're looking at a team that hasn't operated the system under real load. Every production AI system has known failure modes. The question is whether they've been found and handled.

“Any AI bet as a portfolio value driver must be grounded in validated use cases with a measurable line of sight to EBITDA in six months or less.”

The 48-hour diligence engagement.

Operating partners don't have eight weeks to assess portco AI maturity. They need a clear picture in days — which portcos are ready, which are running pilots with no production path, and which are at risk of a quiet system degradation nobody is watching.

I run this assessment across portcos in 48 hours per company. The output is not a 100-day plan document. It is a ranked list: which portcos have validated use cases, which have production-ready infrastructure, which have the team capability to execute, and where the value creation plan (VCP) should allocate AI investment to move the IRR needle.

The engagement is designed to be repeatable across a portfolio — not customized from scratch for each company. The same 12 questions, the same scoring rubric, the same output format. That consistency is what gives operating partners the portfolio-leverage view that individual engagements can't produce.
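
One way such a rubric could be made mechanical is sketched below. It assumes the simplest possible scheme: each of the 12 questions is marked pass or red flag, scores are summed per category, and portcos are ranked by total. The actual scoring rubric is not described here, so the equal weighting, the category keys, and the portco names are all assumptions for illustration.

```python
# Number of questions per category, matching the cheat sheet (3 + 4 + 3 + 2 = 12).
CATEGORIES = {
    "engineering": 3,
    "eval_and_cost": 4,
    "team": 3,
    "roadmap": 2,
}


def score_portco(answers):
    """answers maps category -> list of booleans (True = passing answer)."""
    scores = {}
    for category, question_count in CATEGORIES.items():
        marks = answers[category]
        if len(marks) != question_count:
            raise ValueError(
                f"{category}: expected {question_count} answers, got {len(marks)}"
            )
        scores[category] = sum(marks)
    scores["total"] = sum(scores.values())
    return scores


def rank_portcos(portfolio):
    """Return (name, scores) pairs sorted from strongest to weakest total."""
    scored = {name: score_portco(answers) for name, answers in portfolio.items()}
    return sorted(scored.items(), key=lambda item: item[1]["total"], reverse=True)


# Hypothetical walk-through results for two portcos.
portfolio = {
    "PortcoB": {
        "engineering": [False, False, False],
        "eval_and_cost": [False, False, False, False],
        "team": [True, False, False],
        "roadmap": [False, False],
    },
    "PortcoA": {
        "engineering": [True, True, False],
        "eval_and_cost": [True, True, True, False],
        "team": [True, True, True],
        "roadmap": [True, False],
    },
}

ranking = rank_portcos(portfolio)
for name, scores in ranking:
    print(f"{name}: {scores['total']}/12 {scores}")
```

Keeping the per-category scores next to the total matters: two portcos with the same total can have very different gaps, and the gaps are what drive where operating resources should go.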

What you get at the end.

  • A ranked assessment of AI maturity across the portcos you want assessed

  • Specific gaps identified per company — not a generic framework

  • A prioritized investment recommendation: where to put operating resources for MOIC impact

  • A repeatable template you can run on future portcos without re-engaging

Book a portfolio diligence call.

Tell me which portcos you want assessed and the timeline. I'll confirm availability and scope within 48 hours.
