
In the pursuit of data-driven decision making, organizations often find themselves caught in a paradox: the very metrics designed to guide progress can sometimes lead them astray. This becomes particularly evident when Key Performance Indicators (KPIs) begin to diverge from what customers are actually saying and doing.

KPIs vs. Market Feedback as the Basis for Executive Decisions

Jeff Bezos once astutely observed: “When the data and the anecdotes disagree, the anecdotes are usually right. It’s usually not that the data is being mis-collected. It’s usually that you’re not measuring the right thing.” This insight cuts to the heart of a fundamental challenge in business: metrics can tell you what’s happening, but they often fail to tell you why.

Consider the cautionary tale of Wells Fargo, where an overemphasis on cross-selling metrics led to a catastrophic breach of customer trust, with employees opening millions of fraudulent accounts. The metric (cross-selling) was intended to represent deeper customer relationships, but when it became the goal rather than the indicator, it produced precisely the opposite effect.

This pattern recurs across industries. A former FAANG developer described their experience on a team that measured everything yet accomplished little of consequence: “Leadership was obsessed with metrics, we measured everything.” Contrast this with their subsequent experience on a smaller team where success was defined simply: “When the line [revenue] goes up, I’m happy.”

The distinction is telling: the second team focused on outcomes themselves, not on the mechanisms intended to track them.

Case in Point: The Popularity of Claude for Coding

The AI landscape offers a compelling example of this principle in action. Despite not always scoring highest on technical benchmarks, Anthropic’s Claude has become the preferred coding assistant for many developers. A recent survey of 1,370 developers who use AI daily found that a striking 78% preferred Claude 3.5, with only 13% choosing OpenAI’s models and 9% opting for alternatives such as Gemini.

As Anthropic noted in their Claude 3.7 release: “…in developing our reasoning models, we’ve optimized somewhat less for math and computer science competition problems, and instead shifted focus towards real-world tasks that better reflect how businesses actually use LLMs.”

This phenomenon, dubbed “The Claude Effect,” demonstrates what happens when a product’s real-world utility diverges from its benchmark performance. While benchmarks serve an important purpose, they cannot fully capture the intangible qualities that make a tool valuable in practice—the “vibe” that resonates with users.

As one Hacker News commenter noted: “Claude is the best example of benchmarks not being reflective of reality. All the AI labs are so focused on improving benchmark scores but when it comes to providing actual utility Claude has been the winner for quite some time.”

KPIs Are Excellent Sneak Peeks at What’s Coming

This isn’t to suggest that metrics lack value. Well-designed KPIs can provide early signals and valuable insights. They can help organizations identify trends before they become obvious and pinpoint areas that need attention before they become problems.

However, their true power emerges when they’re treated as indicators rather than objectives—as means rather than ends. When metrics become goals in themselves, they create perverse incentives that can damage the very outcomes they were designed to improve.

Consider the practice in some large tech companies of launching half-baked features to claim user acquisition metrics for promotion packets, regardless of whether users actually find value in these features. The metric (user acquisition) becomes divorced from the actual goal (providing user value).

Trust Your Customers’ Feedback (and Wallets) Most

The solution to this dilemma lies in what might be called “Feels over Figures”—recognizing that qualitative feedback often provides insights that quantitative metrics miss. This means:

  1. Conducting regular, in-depth conversations with customers rather than relying solely on survey responses
  2. Prioritizing direct revenue and retention metrics over proxy measurements
  3. Creating space for intuition and judgment in decision-making
  4. Investing in human spot checks and qualitative feedback rather than reducing everything to numbers

OpenAI’s release of GPT-4.5 illustrates this approach. Despite potentially lower performance on some technical benchmarks, the company emphasized the model’s “EQ” or emotional intelligence—recognizing that user experience involves factors that aren’t easily quantified.

As another Hacker News commenter described it, this represents the “gameday mindset”: “Yes data is good. But I think the industry would very often be better off trusting guts and not needing a big, huge expensive UX study or benchmark to prove what you can plainly see.”

In the end, while metrics provide valuable guideposts, they cannot replace the fundamental question: Are we creating something people actually want and will pay for? When metrics and market feedback diverge, wise leaders follow the money and the voices of their customers, recognizing that sometimes, the most valuable insights come from what can’t be measured.

***

JLytics’ mission is to empower CEOs, founders, and business executives to leverage the power of data in their everyday lives so that they can focus on what they do best: lead.

Start the Conversation

Interested in exploring a relationship with a data partner dedicated to supporting executive decision-making? Start the conversation today with JLytics.