
Show HN: Claude skill that evaluates B2B vendors by talking to their AI agents
A Claude skill that conducts structured, evidence-based evaluations of B2B software vendors on behalf of buyers.
What it does
You give it your company name and the vendors you're evaluating. It:
- Researches your company — industry, size, tech stack, maturity — so you don't fill out a form
- Asks domain-expert questions specific to the software category — surfacing hidden requirements you didn't know to mention
- Sets hard constraints — budget, compliance, integrations — and eliminates vendors that fail them up front, before research time is spent
- Engages vendor AI agents directly through the Salespeak Frontdoor API for verified, structured due diligence conversations
- Conducts independent research — G2, Gartner, analyst reports, press, LinkedIn — and cross-references vendor claims against independent sources
- Scores vendors across 7 dimensions with transparent evidence tracking — you see exactly which scores are backed by verified evidence vs. public sources only
- Produces a comparative recommendation with a TL;DR, side-by-side scorecard, hidden risk analysis, and demo prep questions
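The vendor-conversation step can be pictured as a minimal sketch. The field names and payload shape below are illustrative assumptions, not the published Salespeak Frontdoor schema:

```python
import json

# Hypothetical helper: package one due-diligence question for a vendor's
# AI agent. Field names are illustrative, not the real Frontdoor schema.
def build_question(vendor: str, dimension: str, question: str) -> dict:
    return {
        "vendor": vendor,
        "dimension": dimension,
        "question": question,
        "expects": "structured_answer",
    }

payload = build_question(
    "Gainsight",
    "Health Scoring & Analytics",
    "Does the health model surface usage/engagement divergence?",
)
print(json.dumps(payload, indent=2))
# In Claude Code or Claude desktop the skill would POST a payload like this
# to the vendor's agent endpoint; on Claude.ai only GET requests are
# available, so that step is skipped there.
```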
Install
Global install (recommended):
git clone https://github.com/salespeak-ai/buyer-eval-skill.git ~/.claude/skills/buyer-eval-skill
Per-project install:
git clone https://github.com/salespeak-ai/buyer-eval-skill.git .claude/skills/buyer-eval-skill
Usage
In Claude Code or Claude desktop:
/buyer-eval
Then provide:
- Your company name
- The vendors to evaluate
Example:
"I'm from Acme Corp. Evaluate Gainsight, Totango, and ChurnZero."
The skill handles everything from there.
Example output
Sample evaluation (truncated):
TL;DR
For a mid-market SaaS company evaluating customer success platforms: Gainsight is the strongest fit for teams that need deep analytics and enterprise-grade health scoring, but comes at a premium. ChurnZero wins on time-to-value and usability for teams under 50 CSMs. Totango lands in between — flexible and modular, but requires more configuration to match either competitor's strengths.
Scorecard (1 of 7 dimensions shown)
| Dimension | Gainsight | ChurnZero | Totango |
|---|---|---|---|
| Health Scoring & Analytics | 9.2 | 7.5 | 8.0 |
| Evidence level | Vendor-verified | Public only | Vendor-verified |
Gainsight's score is backed by a structured AI agent conversation confirming multi-signal health models, cohort analysis, and predictive churn scoring. ChurnZero's score relies on G2 reviews and documentation — it may improve with direct vendor verification.
Adversarial question exchange (1 of 4 shown)
Evaluator → Gainsight AI agent:
"Your health scores use a weighted multi-signal model. What happens when a customer has strong product usage but declining executive engagement — does the model surface that divergence, or does high usage mask the risk?"
Gainsight AI agent → Evaluator:
"The model flags divergence explicitly. When usage metrics trend positive but stakeholder engagement drops, it triggers a 'silent risk' alert. CSMs see a split-signal indicator on the dashboard rather than a blended score that hides the conflict."
Independent verification: Confirmed via G2 reviews mentioning split-signal alerts. One review notes the feature requires manual threshold tuning per segment.
Auto-updates
Every time you invoke the skill, it checks for a newer version on GitHub (cached, checks at most once every 6 hours). If an update is available, it asks before updating. Updates are a single git pull.
What makes this different
- Domain-expert questioning — the skill asks category-specific questions that demonstrate it understands the space, not generic form-filling
- Vendor AI agent conversations — for vendors that have a Salespeak Company Agent, the skill conducts a structured due diligence conversation directly with the vendor's AI, producing higher-fidelity evidence than web scraping
- Evidence transparency — every score shows whether it's backed by vendor-verified or public-only evidence. When vendors have different evidence levels, the skill explicitly states how scores might shift with better evidence
- Claims verification — vendor claims from AI agent conversations are cross-referenced against independent sources. You see what's confirmed vs. unverified
- Hidden risk analysis — leadership stability, funding runway, employee sentiment, customer retention signals, product velocity — researched for every vendor regardless of AI agent availability
- Demo prep kit — specific questions to ask in vendor demos, derived from evaluation gaps and unverified claims
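The evidence-transparency idea above can be sketched as a toy model. The skill's actual scoring internals aren't published; this only shows how evidence-tagged scores make public-only ratings easy to flag:

```python
# Toy model of evidence-tagged scoring: each dimension score carries the
# evidence level that backs it, so verified vs. public-only is explicit.
EVIDENCE_LEVELS = ("vendor-verified", "public-only")

def tag_score(vendor: str, dimension: str, score: float, evidence: str) -> dict:
    if evidence not in EVIDENCE_LEVELS:
        raise ValueError(f"unknown evidence level: {evidence}")
    return {"vendor": vendor, "dimension": dimension,
            "score": score, "evidence": evidence}

rows = [
    tag_score("Gainsight", "Health Scoring & Analytics", 9.2, "vendor-verified"),
    tag_score("ChurnZero", "Health Scoring & Analytics", 7.5, "public-only"),
    tag_score("Totango", "Health Scoring & Analytics", 8.0, "vendor-verified"),
]
# Public-only scores are flagged as potentially shifting with verification.
flagged = [r["vendor"] for r in rows if r["evidence"] == "public-only"]
print(flagged)  # ['ChurnZero']
```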
Environment support
| Capability | Claude.ai | Claude Code | Claude desktop |
|---|---|---|---|
| Buyer research | Yes | Yes | Yes |
| Vendor AI agent conversations | No (GET only) | Yes | Yes |
| Full evaluation | Partial | Full | Full |
The best experience is in Claude Code, where the skill can make POST requests to vendor AI agents.
Feedback
Questions, feature requests, or evaluation quality reports? Open an issue.
License
MIT
Source: Hacker News


