
What CEOs Still Struggle to Answer About Their AI Investments

May 6, 2026

Enterprise AI spending is accelerating with no sign of reversal. According to Gartner, global enterprise AI investment reached $644 billion in 2025. NVIDIA's 2026 State of AI survey found that 86% of organisations plan to increase their AI budgets this year, with nearly 40% planning increases of 10% or more. The companies committing that capital include some of the largest and most analytically sophisticated in the world.

Most of them still cannot tell whether it is working.

A 2026 survey of 100 senior enterprise AI leaders from ModelOp found that more than two-thirds of enterprises still rely on estimates rather than measured financial results to assess return on investment. They track time saved, projected cost reductions, and usage volumes, but not what those inputs translate to in financial terms. The gap between AI activity and measurable return on investment has a name inside the industry: the AI value illusion.

What Companies Are Actually Measuring

The monitoring infrastructure being built around AI use inside large companies is detailed and, in some cases, intrusive. With Microsoft's tools, corporate customers can track active users, prompt volume, and agent activity across entire organisations over time. At companies like Globant, token consumption, usage patterns, and cost are tracked in real time across teams and projects, and project leaders can see usage broken down by individual team member. Token costs are treated as a standard line item in budget and return on investment calculations, sitting alongside labor and infrastructure in the cost of goods.
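Treating tokens as a cost-of-goods line is mechanically simple. Below is a minimal sketch of how token spend might roll up into a per-project cost model alongside labor and infrastructure; the per-token prices, project names, and usage figures are hypothetical, not drawn from any vendor's actual rates.

```python
from dataclasses import dataclass

# Hypothetical per-1K-token prices in USD; real rates vary by model and vendor.
PRICE_PER_1K_INPUT = 0.0025
PRICE_PER_1K_OUTPUT = 0.0100

@dataclass
class ProjectUsage:
    name: str
    input_tokens: int
    output_tokens: int
    labor_cost: float           # fully loaded labor for the period
    infrastructure_cost: float  # compute, storage, licences

    @property
    def token_cost(self) -> float:
        """Token spend, treated as its own cost-of-goods line."""
        return (self.input_tokens / 1000) * PRICE_PER_1K_INPUT \
             + (self.output_tokens / 1000) * PRICE_PER_1K_OUTPUT

    @property
    def total_cost(self) -> float:
        return self.labor_cost + self.infrastructure_cost + self.token_cost

projects = [
    ProjectUsage("support-bot", 120_000_000, 30_000_000, 85_000.0, 12_000.0),
    ProjectUsage("code-assist", 40_000_000, 60_000_000, 140_000.0, 9_000.0),
]

for p in projects:
    print(f"{p.name}: tokens ${p.token_cost:,.0f} of ${p.total_cost:,.0f} total")
```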

The result is that enterprises have comprehensive visibility into how much AI is being used and what it costs. They have considerably less clarity on whether it is producing better outcomes.

McKinsey research shows that 64% of companies say AI is driving innovation, but only 39% report a measurable impact on earnings. PwC data shows that 66% of organisations report measurable productivity improvements from AI agents and 62% expect ROI exceeding 100%, yet for the majority the gap between efficiency gains and P&L impact remains wide. Deloitte's 2026 State of AI in the Enterprise report found that while 54% of organisations expect to move 40% or more of their AI experiments into production within three to six months, only 25% have actually reached that milestone.

The measurement problem is not primarily technical. The tools to track AI usage exist and are being widely deployed. The challenge is attribution: isolating AI as the cause of a productivity or revenue improvement when dozens of other variables are changing simultaneously in a large organisation. As Sameer Gupta, Americas financial services AI leader at EY, has noted, leaders can see where AI is being used and where productivity appears to improve, but isolating AI as the primary driver is hard in practice.

Tokenmaxxing and the Proxy Problem

The combination of detailed usage tracking and performance incentives tied to AI adoption is producing a specific behavioral response in parts of the enterprise workforce. Inside some organisations, internal systems now rank employees on leaderboards by their AI usage volume. That visibility is generating what practitioners call tokenmaxxing: employees increasing their AI usage volume to signal productivity, regardless of whether that usage is producing better outputs.

The dynamic is a predictable consequence of measuring the wrong thing. Token volume is a cost and activity metric; it says nothing about the quality of the work those tokens produced. Research from MIT Sloan flags a related risk: outsourcing cognitive tasks to AI tools without adequate oversight risks eroding the underlying skills workers depend on, in the same way that reliance on calculators degrades mental arithmetic. If performance reviews reward AI usage and employees respond by using AI more extensively rather than more effectively, organisations may find they are measuring the appearance of adoption rather than its substance.

Gartner research shows that only one in five AI investments delivers measurable return on investment, which raises a specific governance question: when performance evaluations are tied to AI use and the majority of AI investments are not generating measurable returns, what signal are those evaluations actually capturing?

The Attribution Gap

The core problem facing enterprise leaders trying to close the measurement gap is not a lack of data. It is a lack of connective tissue between the data that exists and the financial outcomes it is supposed to explain.

According to Futurum Group's 2026 Enterprise Software Survey of 830 IT decision-makers, the share of respondents naming direct financial impact (revenue growth plus profitability) as their primary ROI metric for enterprise AI nearly doubled, to 21.7%. At the same time, the share naming productivity gains, previously the leading success metric, fell sharply. The enterprise buyer base is demanding that AI capability connect directly to the P&L, not just reclaim hours.

That demand is harder to satisfy than it sounds. Most organisations measure AI impact at the group or role level, comparing patterns across teams rather than evaluating individual employees directly. That approach can identify where AI appears to correlate with better outcomes, but correlation is not attribution. A team using AI tools more frequently and performing better may be doing so because of AI, because of better management, because of a favorable market, or because the same employees who adopt new tools early also happen to be higher performers in general. Separating those contributions requires longitudinal data and controlled comparisons that most organisations do not yet have.
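One standard way to move from correlation toward attribution is a controlled before-and-after comparison, such as a difference-in-differences estimate between teams that adopted AI tools and otherwise comparable teams that did not. A minimal sketch of the estimator, with illustrative numbers rather than real data:

```python
# Difference-in-differences: compare the change in an outcome for adopter
# teams against the change for comparable non-adopter teams over the same
# period, netting out shared trends such as market conditions.

# Hypothetical average output per team (e.g., resolved tickets per week).
adopters_before, adopters_after = 100.0, 130.0
controls_before, controls_after = 100.0, 115.0

adopter_change = adopters_after - adopters_before  # +30: AI plus everything else
control_change = controls_after - controls_before  # +15: everything else only

did_estimate = adopter_change - control_change     # +15 attributable to AI
print(f"Estimated AI effect: {did_estimate:+.1f} units per team per week")
# Valid only if both groups would have trended alike absent AI (the
# parallel-trends assumption), which requires exactly the longitudinal
# data most organisations do not yet have.
```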

A recent MIT study found that 95% of enterprise AI initiatives fail to deliver measurable return on investment, a figure that sits in sharp contrast to the optimistic projections in most capital expenditure announcements. Only 41% of AI agent rollouts achieve positive ROI within 12 months, and 19% never reach payback, according to Gartner. The 4.1-month median payback period cited for customer service deployments reflects the deployments that succeed, not the full distribution of outcomes.
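The survivorship effect is easy to see numerically. A toy calculation, using hypothetical payback times but the 19% never-reach-payback share cited above, shows how a median computed only over successful rollouts understates payback for the full population:

```python
import statistics

# Hypothetical payback times in months for 100 rollouts. Per the Gartner
# figure cited above, suppose 19 never reach payback and the rest vary.
successes = [3, 3, 4, 4, 4, 5, 6, 8, 10, 14] * 8 + [4]  # 81 deployments
never_payback = [float("inf")] * 19                      # 19 deployments

median_of_successes = statistics.median(successes)
median_of_all = statistics.median(successes + never_payback)

print(f"Median payback, successes only: {median_of_successes} months")   # 4
print(f"Median payback, full population: {median_of_all} months")        # 6.0
# The headline number describes the survivors; the full distribution,
# including rollouts that never pay back, looks considerably worse.
```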

Where the Evidence Is Strongest

The clearest evidence for AI-driven productivity gains comes from two types of deployment. The first is customer-facing service automation, where the unit of output, a resolved customer query, is countable and the comparison to human-agent handling is direct. The second is software development, where code output can be measured, cycle times are trackable, and error rates provide an independent quality signal.

Salesforce reports that its Agentforce system resolves 63% of customer support questions autonomously while maintaining customer satisfaction comparable to human agents, and that internally the company has automated 96% of support cases while saving more than 50,000 hours of sales work. Travel company Engine deployed an AI agent in 12 days that now handles 50% of chat volume while reducing handle time by 15%. Heathrow Airport has seen a 30% increase in digital revenue linked to AI-driven agents. These are specific, bounded deployments where the outcome is measurable and the comparison group is clear.

In software development, case studies show productivity gains between 20% and 55%, with JPMorgan reporting 10% to 20% productivity increases and EchoStar Hughes saving 35,000 hours through AI coding tools. Federal Reserve analysis found that workers using generative AI save an average 5.4% of their work hours, translating to roughly 2.2 hours per week per knowledge worker. At scale across thousands of employees, those figures represent meaningful recovered capacity, even if they do not yet show up cleanly in earnings.
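The scale claim is simple arithmetic. Taking the Federal Reserve figure of roughly 2.2 hours saved per worker per week, a hypothetical 5,000-person organisation recovers several hundred thousand hours a year; the headcount and working-year assumptions below are illustrative only:

```python
HOURS_SAVED_PER_WEEK = 2.2   # Federal Reserve estimate cited above
WORKERS = 5_000              # hypothetical organisation size
WORK_WEEKS_PER_YEAR = 47     # assumption: allows for leave and holidays
FTE_HOURS_PER_YEAR = 40 * 47 # one full-time equivalent, same assumptions

hours_recovered = HOURS_SAVED_PER_WEEK * WORKERS * WORK_WEEKS_PER_YEAR
fte_equivalent = hours_recovered / FTE_HOURS_PER_YEAR

print(f"Hours recovered per year: {hours_recovered:,.0f}")        # 517,000
print(f"Full-time-equivalent capacity: {fte_equivalent:,.0f}")    # 275
# Roughly 275 FTEs of capacity, none of which shows up in earnings
# unless the recovered time is redirected to measurable work.
```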

The deployments that struggle to show return on investment are the diffuse ones: rolling out AI tools broadly across an organisation and measuring aggregate usage without connecting specific use cases to specific outcomes. Only about 5% of companies achieve substantial AI ROI at scale, according to current analysis, while 35% report partial returns. The organisations that reach measurable returns tend to apply AI to core workflows, pair deployment with process redesign, and scale deliberately rather than widely.

The Measurement Imperative

What is emerging from the current state of enterprise AI is a clear split between organisations that are measuring AI adoption and those that are measuring AI outcomes. The former group is larger, better equipped with monitoring tools, and producing impressive usage statistics. The latter group is smaller, more disciplined in its deployment choices, and generating the return on investment data that boards and CFOs are beginning to demand.

Futurum Group's research shows that direct financial impact has overtaken productivity as the primary ROI metric enterprises care about in 2026. Salesforce CFO research confirms the same shift: 61% of CFOs say AI agents are changing how they evaluate return on investment entirely, moving beyond time saved toward measurable business outcomes including cost avoidance, revenue generated, and risks mitigated.

The companies that will prove their AI investments are working are not necessarily the ones with the highest usage volumes or the most sophisticated tracking infrastructure. They are the ones that defined what success looks like before deployment, built the measurement layer before scaling, and connected specific AI use cases to specific financial outcomes rather than measuring AI as an enterprise-wide phenomenon.
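In practice, defining success before deployment can be as simple as recording, for each bounded use case, a baseline, a target, and the financial outcome the metric is supposed to move, before rollout begins. A minimal sketch of such a measurement contract; the fields and example values are hypothetical (the chat-deflection and handle-time figures echo the Engine example above), not a prescribed standard:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SuccessCriterion:
    """A pre-deployment measurement contract for one AI use case."""
    use_case: str           # a bounded workflow, not "AI adoption" broadly
    baseline: float         # measured before deployment
    target: float           # agreed before deployment, not after
    unit: str
    financial_outcome: str  # the P&L line this metric is supposed to move

criteria = [
    SuccessCriterion("chat deflection", 0.00, 0.50, "share of chat volume",
                     "support cost avoided"),
    SuccessCriterion("handle time", 9.0, 7.6, "minutes per contact",
                     "cost per resolution"),
]

def on_track(c: SuccessCriterion, measured: float) -> bool:
    """Has the metric reached the target, in the target's direction?"""
    if c.target >= c.baseline:
        return measured >= c.target
    return measured <= c.target

print(on_track(criteria[0], measured=0.52))  # True: deflection target met
```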

Everything else (the leaderboards, the token counts, the usage dashboards) is activity data. It describes what is happening. It does not yet explain whether the spending is worth it.

