Don’t Lose the Plot: How to Prevent AI Usage from Becoming a Performance Metric

David Rice

on May 27, 2026

Explore why tracking AI usage as a performance metric can distort behavior, damage trust, and weaken real AI capability development across teams.

Key Takeaways

Tokenmaxxing Issue: Employees are manipulating AI token usage metrics rather than focusing on actual productivity.

Measurement Flaws: Current evaluation methods prioritize AI activity over meaningful outcomes, leading to misleading performance indicators.

Corruption Risks: Reliance on usage data compromises its reliability, making it difficult to assess true employee performance.

Prevention Strategies: Organizations should shift to outcome-based AI metrics, emphasizing quality over quantity in performance.

Leadership Communication: Effective leadership needs to prioritize learning outcomes in AI discussions to foster genuine capability development.

In early April 2026, a leaked internal dashboard from Meta revealed that the company had been ranking its roughly 85,000 employees by AI token consumption.

Tokens are the units of data AI models process, and someone at Meta decided that counting them was a useful way to track who was actually using AI. The top user had burned through 281 billion tokens in a single month. Total usage tracked on the dashboard exceeded 60 trillion tokens before it was taken down.

Shortly after, the Financial Times reported a parallel pattern at Amazon. Employees were automating unnecessary tasks specifically to inflate their AI usage numbers, manufacturing the appearance of engagement with the company's AI push.

The behavior has since acquired the name “tokenmaxxing”, borrowed from Gen Z slang for maximizing something, which in this case is the visible performance of AI use rather than the thing AI use is supposed to produce.

Two of the world's largest companies, the same dynamic, and a name that suggests it's spreading.

Tokenmaxxing is less a technology problem than a performance management and culture problem. It has HR and operations fingerprints on it, whether or not HR was in the room when the decisions that created it were made.

Someone set the incentive. Someone built or tolerated the leaderboard. Someone decided that consumption was a reasonable proxy for capability. This article is about understanding how that happens and what to do before it does.

Why Measurement Goes Wrong

Organizations don't generally set out to build leaderboard cultures. Tokenmaxxing emerges in the space between what companies actually want AI to deliver and what they can currently measure.

A February 2026 Gallup survey of more than 23,000 U.S. employees found that half of employed adults use AI at least a few times a year, and that employees inside AI-adopting organizations report more disruption and staffing anxiety than those in organizations that haven't adopted yet.

Meanwhile, Gartner data shows that only one in 50 AI investments delivers transformational value, and only one in five delivers any measurable ROI. McKinsey's State of AI 2025 found that most organizations are still in the early stages of capturing enterprise-level value despite rapid adoption.

What's separating higher-value outcomes from lower ones, per McKinsey, is workflow redesign and governance, not raw usage volume.

But boards and CEOs don't feel that nuance. They feel urgency, and that urgency cascades into demands for visible proof of adoption, which in turn sends middle managers looking for something to show. When you cannot measure whether AI is actually improving outcomes, you measure the next closest thing, which is activity. Enter the dashboard.

Yakov Filippenko, CEO of the professional networking platform Intch, draws the parallel to return-to-office mandates. Companies pushed employees back into offices partly because shareholders owned expensive empty buildings, he argues.

Now organizations burn tokens to demonstrate to investors that they aren't missing the AI revolution. In both cases, a legitimate underlying pressure — shareholder visibility, competitive positioning — produced a measurement that served optics rather than operations.

Meta CTO Andrew Bosworth made this logic explicit, publicly describing his best engineer as spending the equivalent of his salary in AI tokens and being "5x to 10x more productive" as a result. "It's like, this is easy money. Keep doing it. No limit."

That framing of token spend as a productivity signal, coming from a senior technology leader at one of the world's most influential companies, is not an isolated opinion. It reflects a broader failure to distinguish between correlation and causation. The high performer happens to use a lot of tokens, so tokens start to look like the cause of high performance rather than a byproduct of it.

The error compounds quickly. Managers who see that framing promoted from the top don't need a formal leaderboard to internalize the message. The behavior follows the signal, even when no one intended the signal to be sent.

What It Costs

Usage dashboards were already an unreliable measure of AI productivity before anyone started gaming them.

Yasser Drif, founder of Roman AI, shared data drawn from his customer base suggesting the gap between activity and output is wider than most organizations assume. Only about 24% of human-initiated AI runs produce a useful artifact. The rest is iteration and work-in-progress that volume metrics inflate into the appearance of productivity. That's the baseline. Tokenmaxxing makes it worse.

The first cost is signal corruption. Once usage becomes the metric, usage data becomes untrustworthy. You can no longer look at who's consuming the most AI and draw any reliable conclusions about who's performing, learning, or adding value.

The data has been gamed, and you won't always know by whom. Any workforce analytics built on top of it inherits that contamination.

The second cost runs deeper. When employees optimize for the appearance of AI fluency rather than developing it, you have reversed the learning conditions you actually need. Genuine AI capability development requires experimentation, including failed experiments, and honest reporting about what worked.

A tokenmaxxing culture creates pressure to perform competence publicly while making honest reporting feel risky. Those are incompatible conditions.

The third cost is what happens to the employees who aren't gaming anything. In an environment where token consumption has become visible and valorized, the person doing careful, well-scoped AI work that produces strong outcomes but doesn't generate headline numbers starts to feel behind.

That feeling is not corrected by reassurance. The fastest way to make genuine AI capability development feel unsafe is to let people who aren't inflating their numbers believe they're losing a race they don't understand.

Amazon, to its credit, responded to the problem it created by restricting team-wide visibility of usage statistics so only individuals and their direct managers could see them. That's a corrective measure. Prevention requires a different kind of action earlier in the process.

How to Prevent It

The practical question for a CHRO or COO is where to start. "Measure outcomes, not inputs" is true but doesn't tell anyone what to change on Monday morning.

Start with an audit of what's currently being tracked and who sees it. Filippenko flags any AI adoption KPI tied directly to compensation as an immediate red flag.

"That's the equivalent of measuring a lawyer's performance by the number of pages they print, or a developer's efficiency by lines of code," says Yakov Filippenko, founder and CEO of Intch.

Yasser Drif, whose company Roman AI builds AI tools deployed across organizations, offers a practical starting point.

"Pull 30 days of usage data, sort by spend, and ask what actually shipped," Drif says. "Can the person show you an artifact?"

The red flags his team watches for include high-spend users with no deliverable attached, run inflation where a task accumulates ten or more AI interactions with no status change, and a single AI champion consuming a disproportionate share of usage credits while producing little finished work. If your dashboards can't answer the "what shipped" question, that's the first problem to fix.
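As a rough illustration, the three red flags above could be checked with a short script. This is a minimal sketch, not Roman AI's method: the `Run` record, its field names, and the `share_limit` and `yield_limit` thresholds are all assumptions made for the example, and the input is assumed to be already filtered to the last 30 days. Only the "ten or more interactions" threshold comes from the description above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Run:
    """One AI interaction. Every field name here is hypothetical."""
    user: str
    task_id: str
    spend: float          # cost of the run in usage credits
    shipped: bool         # True if the run is linked to a finished deliverable
    status_changed: bool  # True if the task's status moved after this run


def audit(runs, run_limit=10, share_limit=0.5, yield_limit=0.1):
    """Return the three red flags for a list of runs (assumed: last 30 days)."""
    spend = defaultdict(float)
    n_runs = defaultdict(int)
    n_shipped = defaultdict(int)
    task_runs = defaultdict(int)
    task_moved = defaultdict(bool)

    for r in runs:
        spend[r.user] += r.spend
        n_runs[r.user] += 1
        n_shipped[r.user] += r.shipped
        task_runs[r.task_id] += 1
        task_moved[r.task_id] |= r.status_changed

    total = sum(spend.values())
    by_spend = sorted(spend, key=spend.get, reverse=True)  # highest spend first

    return {
        # Spenders with no deliverable attached, highest spend first.
        "spend_no_deliverable": [u for u in by_spend if n_shipped[u] == 0],
        # Tasks with run_limit or more runs and no status change at all.
        "run_inflation": sorted(
            t for t, n in task_runs.items()
            if n >= run_limit and not task_moved[t]
        ),
        # Users holding a large share of spend while few of their runs ship.
        # share_limit and yield_limit are assumed cutoffs, not from the article.
        "concentrated_spend": [
            u for u in by_spend
            if total > 0
            and spend[u] / total >= share_limit
            and n_shipped[u] / n_runs[u] < yield_limit
        ],
    }


# Made-up sample data: "ana" runs the same stalled task twelve times.
runs = (
    [Run("ana", "T1", 5.0, False, False)] * 12
    + [Run("ben", "T2", 2.0, True, True), Run("ben", "T3", 1.0, True, True)]
    + [Run("cy", "T4", 4.0, False, True)]
)
report = audit(runs)
```

A report like this is a prompt for a conversation, not a verdict. A user who appears under `spend_no_deliverable` may simply be mid-project, which is why the "can the person show you an artifact?" question still has to be asked by a person.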

If your organization has AI usage dashboards with cross-employee visibility, or if usage volume appears anywhere in a performance review, goal-setting cycle, or manager reporting process, those are the places where tokenmaxxing risk is live. The question isn't whether the data is collected — it's whether it's visible in ways that create competitive pressure around volume rather than quality.

Outcome-based AI measurement looks different depending on the function. In a sales context, it might track whether AI-assisted outreach improved conversion rates, not how many prompts a rep sent. In operations, it might track cycle time reduction or error rate on AI-assisted processes, not system usage logs.

The design principle is the same across all of them. Measure what changed downstream of AI use, not the use itself. This requires function leaders and HR to collaborate on what "better" actually looks like before measurement frameworks are built, not after.
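To make the sales example concrete, here is a minimal sketch of a downstream measure. The `Outreach` record and its fields are hypothetical, and the sample numbers are invented. The point is only that the score is computed from what happened after the outreach, while the prompt count is recorded but deliberately ignored.

```python
from dataclasses import dataclass


@dataclass
class Outreach:
    """One outreach attempt. Field names are assumptions for illustration."""
    rep: str
    ai_assisted: bool
    prompts: int     # activity: prompts sent while drafting (never scored)
    converted: bool  # outcome: did the outreach lead to a conversion?


def conversion_rate(items):
    """Share of attempts that converted, or None if there were no attempts."""
    items = list(items)
    if not items:
        return None
    return sum(o.converted for o in items) / len(items)


def ai_lift(outreach):
    """Conversion rate of AI-assisted outreach minus the unassisted rate."""
    assisted = conversion_rate(o for o in outreach if o.ai_assisted)
    baseline = conversion_rate(o for o in outreach if not o.ai_assisted)
    if assisted is None or baseline is None:
        return None
    return assisted - baseline


# Invented sample: 2 of 4 assisted attempts convert, 1 of 4 unassisted.
sample = (
    [Outreach("rep_a", True, 40, True)] * 2
    + [Outreach("rep_a", True, 40, False)] * 2
    + [Outreach("rep_b", False, 0, True)]
    + [Outreach("rep_b", False, 0, False)] * 3
)
lift = ai_lift(sample)  # 0.5 - 0.25
```

A raw difference like this is still only a correlation. Reps who choose to use AI may differ from those who don't, so a number like `lift` is better treated as a starting point for the function leader and HR conversation than as proof of impact.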

What Message Are You Sending?

Leadership communication needs recalibration alongside the metrics. If the only messages employees hear about AI are about adoption rates and usage milestones, the implicit signal is that consumption is what's being watched.

Leaders who want genuine AI capability development need to talk about specific outcomes: where AI changed how a team works, what a particular experiment taught the organization, where a deployment failed and what that revealed. This kind of storytelling signals that the organization values learning, not the performance of learning.

Correcting course in a team where tokenmaxxing is already present requires care. Employees who gamed a metric were usually responding to a signal leadership created. Treating it as an individual conduct problem misses that and creates exactly the conditions that make honest reporting feel unsafe.

"When your employees are willing to do pointless work just to check a box, that is not a metric problem," Filippenko says. "That is operational rot."

Fixing it means leadership acknowledging the signal it sent before asking people to change their behavior.

The more productive frame for managers is to reset expectations about what the organization actually cares about and then give people a path to demonstrate real capability. That means visible examples of outcome-based success being recognized, not just the removal of the old metric.

Drif's suggested intervention is blunt and worth considering: rename the metric publicly, from "AI usage" to "work completed with AI." The rename itself tells people what the organization actually values, without requiring anyone to be singled out for the behavior the old metric created.

Navigating Ambiguity

The deeper discipline here is patience with ambiguity. Most organizations are not yet in a position to measure AI's contribution to business outcomes with precision, and the pressure to show something in the meantime is real. Dashboards and leaderboards fill that vacuum because they produce numbers, and numbers feel like accountability.

The problem is that the accountability they produce is for the wrong thing. Building measurement systems that hold people accountable for outcomes takes longer and requires more cross-functional design work, but it is the only approach that doesn't corrupt the data you'll eventually need to make better decisions.

Tokenmaxxing is an early warning, not an endpoint. Treat it as a culture signal worth taking seriously now, and you’ll have much cleaner performance data, and much healthier AI adoption, down the road.
