
Methodology · Rubric v2

How we score.

Most AI tool directories rank tools by superficial signals: vendor marketing copy, paid placement, or raw G2 ratings. None of them are honest about whether a tool is actually a good fit for a solo operator, an early-stage startup, or a small business with a real budget and a real job to do.

Magpie AI scores every tool against six dimensions. The result is a 1.0–5.0 overall score and a recommendation badge that tells you, plainly, whether this tool is worth your time and money.

The six dimensions

SME Fit · 25%

Pricing accessibility, total cost of ownership at scale, buyer-journey friction.

We dock tools that hide pricing, force enterprise sales, or stack a 10× price cliff between tiers. We reward genuine free tiers and predictable pricing under $30 per seat.

Job-to-be-Done Clarity · 15%

How specifically does this tool solve a named, measurable job?

Tools that claim to “do everything with AI” rarely do anything well. We reward focused tools that name a job and a measurable outcome.

Integration & Agent-Readiness · 25%

Public API, MCP server, native integrations with the everyday SaaS stack.

Tied with SME Fit for the highest weight. A tool that can’t be used by an agent today is a tool that won’t be used tomorrow — by anyone, solo operator or scaling startup.

Trust, Stability & Lock-in · 15%

Company age, data portability, contract terms.

Founded in 2024 with no data export and a 3-year contract minimum? We mark it down. Three years old, profitable, with one-click CSV export? We mark it up.

Momentum & Quality Signals · 15%

Rating velocity, community engagement, docs quality, changelog cadence.

A 4.5 climbing toward 4.7 is a healthier signal than a 4.7 falling to 4.5. We try to surface direction, not just absolute numbers.
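Since the point is direction rather than level, one way to make it concrete is a trend slope over the recent rating history. This is purely an illustrative sketch; the rubric doesn't publish a velocity formula, so the least-squares approach and the inputs here are assumptions, not Magpie's actual computation.

```python
# Purely illustrative: "rating velocity" as the least-squares slope of
# a tool's recent rating history. Positive means climbing, negative
# means falling; the real inputs Magpie uses aren't specified here.

def rating_velocity(ratings: list[float]) -> float:
    """Slope of the rating series per period (needs 2+ points)."""
    n = len(ratings)
    mean_x = (n - 1) / 2
    mean_y = sum(ratings) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ratings))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den

rating_velocity([4.3, 4.4, 4.5, 4.6])  # +0.10 per period: healthy
rating_velocity([4.7, 4.7, 4.6, 4.5])  # -0.07 per period: a 4.7 sliding toward 4.5
```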

Compliance & Data Practices · 5%

SOC 2, GDPR, data residency, DPA availability.

Default weight is low because most solo operators and small teams don’t need SOC 2 Type II.

The weights above are defaults and apply to most categories. In finance, legal, HR, and health categories, Compliance scales up to 20–25%, with the increase taken from Job-to-be-Done Clarity.
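Concretely, the overall score is a weighted mean of the six 1–5 dimension scores. Here is a minimal sketch using the default weights above; the names and structure are illustrative assumptions (the real rubric lives in docs/rubric.md), and the regulated-category override is one arithmetic reading of "20–25% at the expense of JTBD", not a published split.

```python
# Minimal sketch of the weighted overall score. Weights come from this
# page; function and field names are assumptions, not Magpie's code.

DEFAULT_WEIGHTS = {
    "sme_fit": 0.25,
    "jtbd_clarity": 0.15,
    "integration": 0.25,
    "trust_stability": 0.15,
    "momentum": 0.15,
    "compliance": 0.05,
}

# One reading of the regulated-category override: Compliance at 20%,
# with the whole increase taken from JTBD. The page gives a 20-25%
# range but not the exact split, so this is an assumption.
REGULATED_OVERRIDE = {"compliance": 0.20, "jtbd_clarity": 0.00}

def overall_score(scores: dict[str, int],
                  override: dict[str, float] | None = None) -> float:
    """Weighted mean of six 1-5 dimension scores, rounded to one decimal."""
    weights = {**DEFAULT_WEIGHTS, **(override or {})}
    assert abs(sum(weights.values()) - 1.0) < 1e-9, "weights must sum to 1"
    return round(sum(scores[dim] * w for dim, w in weights.items()), 1)

scores = {"sme_fit": 5, "jtbd_clarity": 5, "integration": 5,
          "trust_stability": 5, "momentum": 5, "compliance": 1}
print(overall_score(scores))  # 4.8: one weak dimension barely dents the average
```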

Auto-disqualifiers

Any one of these strips the tool's recommendation badge entirely; the low overall score becomes the only user-facing signal. Auto-DQs are absolutes, not weighted signals (see the sketch after the list).

  • Buyer journey is demo-only or contact-sales-only
  • No data export option
  • No public roadmap or changelog in 12+ months
  • Minimum seats >10 or minimum spend >$500/mo
  • Dead/abandoned (no updates in 12+ months)
  • Sketchy data practices (no privacy policy, or vendor claims customer data ownership)
  • Not strictly AI-native (legacy tool with an AI label slapped on)
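As a sketch, with the same caveats as above (the field names on `tool` are hypothetical), the disqualification pass reduces to a single any() over hard checks, applied before any weighting:

```python
# Sketch of the auto-DQ pass. Field names on `tool` are hypothetical;
# the seven checks mirror the list above. One hit strips the badge,
# regardless of how well the tool scores elsewhere.

from datetime import date, timedelta

STALE = timedelta(days=365)  # the "12+ months" threshold

def auto_disqualified(tool: dict, today: date) -> bool:
    return any([
        tool["buyer_journey"] in {"demo_only", "contact_sales_only"},
        not tool["has_data_export"],
        today - tool["last_roadmap_or_changelog"] > STALE,
        tool["min_seats"] > 10 or tool["min_monthly_spend"] > 500,
        today - tool["last_product_update"] > STALE,
        not tool["has_privacy_policy"] or tool["claims_data_ownership"],
        not tool["is_ai_native"],
    ])
```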

Recommendation badges

Best in class · Overall ≥ 4.5 AND no dimension below 4
Recommended · Overall ≥ 4.0 AND no dimension below 3
Consider with caveats · Overall 3.5–4.0, or any dimension below 3

The minimum-dimension floor matters: a tool scoring 5/5/5/5/5/1 still posts a weighted overall between 4.0 and 4.8 depending on which dimension takes the 1, but a 1 anywhere is a red flag worth surfacing. The badge logic catches it and assigns Consider with caveats rather than letting a high average hide the gap.
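In code, the floor is a couple of comparisons. A minimal sketch using the thresholds from the table above; it assumes `overall` comes from a weighted-score step like the earlier sketch, and that an auto-DQ'd tool never reaches this function:

```python
# Sketch of badge assignment with the minimum-dimension floor, using
# the thresholds from the table above.

def badge(overall: float, dimension_scores: dict[str, int]) -> str | None:
    floor = min(dimension_scores.values())
    if overall >= 4.5 and floor >= 4:
        return "Best in class"
    if overall >= 4.0 and floor >= 3:
        return "Recommended"
    if overall >= 3.5 or floor < 3:
        return "Consider with caveats"
    return None  # low overall, no weak-dimension flag: no badge

# The 5/5/5/5/5/1 tool: overall lands between 4.0 and 4.8 depending on
# where the 1 falls, but floor == 1 fails both upper badges and drops
# it to "Consider with caveats".
```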

Read the full spec

The complete rubric — formulas, worked examples, dimension inputs by sheet column — lives in docs/rubric.md on GitHub. The validation agent reads the same document; it's the single source of truth.

Found a tool that should be in the directory? Submit it.