2026-06-11 Why AI isn’t showing up on your bottom line

Why AI isn’t showing up on your bottom line | Azeem Azhar

•workfutures.io #discover : Why AI isn’t showing up on your bottom line | Azeem Azhar (@2026-07-11)

I had tea with a senior exec at a well-known public tech company last month. She has about a thousand engineers working for her, and nearly every one of them works with Claude Code. They are producing more lines of code, submitting more pull requests, getting more done. Productivity is up for individuals, but she isn’t seeing proportional gains at the organization level. As she put it to me: “one plus one plus one plus one equals one-and-a-half.”

She is not alone. Uber’s COO Andrew Macdonald went on record this week saying that the relationship between AI investment and results is not there yet:

I think maybe implicitly there is more that is getting shipped, but it’s very hard to draw a line between one of those stats and, ‘Okay, now we’re actually producing 25% more useful consumer features.’

AI has delivered something. I have felt it; my team has felt it; most users have felt it, which is why we keep returning and using more of it. Two years ago, only a dozen Anthropic customers were spending over $1 million a year on Claude ^[1]; today, more than 1,000 do. More impressively still, Anthropic’s average corporate customer increased their spend by a factor of five in the past year.

But in the more than three years since ChatGPT’s release, only 27% of executives say AI has met their ROI expectations. What do we make of the other 73%? Could their expectations be too high? Or too low? Do they even have the right class of expectations?

In a way, we can’t tell, but we can feel the vibes. And the vibes are that individual workers are getting faster and more productive. But for now, those individual gains from AI do not compound into firm-level ROI.

That is the puzzle we are going to solve in today’s essay.

Let’s go!

The productivity puzzle, restated

Back in 1987, Robert Solow famously pointed out that you could see the computer age everywhere but in the productivity statistics. He was right for the next few years, then wrong. Paul David realized this was a common problem with general-purpose technologies. They systematically depressed measured productivity in their early stages because companies needed to invest in all sorts of hard and soft know-how before the gains appeared. Erik Brynjolfsson calls this the productivity J-curve: measured productivity dips while firms make complementary intangible investments, and rises only once those investments start to pay off.

Paul David’s 1990 paper on electrification is the canonical account of why a general-purpose technology can sit inside firms for ages before we see the results. The story he tells – building on Warren Devine’s earlier survey of the shift from shafts to wires – runs through three phases. And those phases map onto where AI is now.

Stage 1: The lightbulb

One of electricity’s first factory roles was the simplest – lighting. A brighter floor was safer than one lit by gas, and cleaner than one lit by oil. But work still flowed through the same sequence of people, machines, shafts and belts. Electricity had improved the workers’ immediate environment, but it did not change the factory’s operating logic. When ChatGPT was first released, it did something similar. It increased how quickly we could write emails; individuals sped up on some tasks, but the firm did not. This is Stage 1 of AI transformation: the lightbulb.

Most of the AI products we see today are all about individual productivity. Yes, there are enterprise plans for ChatGPT and Claude and whatever else. But the unit of work is still the task that the individual has to hand. The enterprise plan just lets them quickly access the corporate skills repository.

Stage 2: The group drive

The next stage of electricity adoption in factories focused on cost savings rather than productivity. Louis Bell wrote in 1891 that large central steam plants could be five to seven times more coal-efficient than small engines. Factories bought power from central stations and installed electric motors to drive their existing shafts and belts.

Then a professor of electrical engineering, F. B. Crocker, and his colleagues found another application. Electric power freed the shop floor from the shafting. Mechanical power belts, tools and oil, the mess of the factory floor, could all be moved; machines no longer had to be arranged in parallel lines beneath shafts.

The open question was how many motors a factory needed: one per tool, or one per group of tools? The latter became known as group drive. It was a single motor that powered a cluster of machines via a shared shaft.

Group drive preserved existing layouts, reused sunk capital, needed fewer motors, and gave many of electricity’s benefits without the cost of rebuilding the plant. It was cheaper and easier than the alternatives, so it won and dominated factories through the 1890s, 1900s and into the First World War.

AI agents are better than chatbots. They can handle whole workflows rather than single tasks. But, like group drive, they are attached to the existing organizational geometry. In the case of electricity, this was the shop floor layout. For AI, it’s the web of processes designed by the companies well before anyone knew what an LLM was.

An AI recruiting agent speeds up a process that was previously done by humans and an applicant tracking system. The recruiting pipeline may shrink from weeks to hours. A customer service agent takes on more tickets than the support team could before. The logic behind these examples is essentially cost-saving: fewer support tickets need humans, recruiters screen more candidates with the same headcount and marketers create more variants without additional staff. The firm isn’t making consequential decisions any faster. This is Stage 2, the group drive: machines turning faster on the same shafts.

Stage 3: The unit drive

It was only later, once the organizing logic of the factory moved from cost-saving to throughput, that the deeper value of unit drive, one motor per machine, became clear. In 1913, Ford’s Highland Park plant decided to orient machines and workers around the workflow rather than the geometry of shafts and belts. In the decade from 1919 to 1929, as more factories adopted unit drive, US manufacturing labor productivity grew by 5.4% a year.

Highland Park, 1914

The pattern for AI will mirror the pattern David spotted for electrification. Stage 1 speeds the individual, Stage 2 a workflow and Stage 3 the firm.

The ladder

You’ll be familiar with other maturity models, like Carnegie Mellon’s Capability Maturity Model. These are useful, but they treat each stage as a monolithic capability tier. A capability tier tells you how well a firm does a “fixed thing.”

Our ladder is different. The stages are organizing logics. The logic is the goal the firm is pursuing with the technology. A workshop that installed electric lighting was no worse than Ford’s factory, but it was pursuing a different goal: a safe way to illuminate the workshop floor. You didn’t get to an assembly line by adding more lightbulbs. Factories were pursuing cost savings and everything was organized to fit that goal, from layout to staffing to supplier relationships.

A maturity model will often tell you to do more of what you are doing to move forward. We reckon that what matters is what you are trying to do. And our stages hint at why companies deploying AI might get stuck. A firm does not graduate stage by stage on every axis. An individual developer who is 50% more productive with AI tools but must submit to traditional review cycles will find themselves in a queue. The product team that can prototype faster than ever but needs to wait for sign-offs will build up a backlog of features. A sales team, supercharged by AI-assisted proposals, may close deals faster than legal can review them.

As firms move towards Stage 2 on execution, those managerial layers, how decisions get made, stay where they were. We call that mismatch congestion. It is the buildup of individual and then team outputs waiting for somewhere to go. To become a Stage 3 firm, you need to rebuild around decision speed between workflows, not the speed of individual workflows. If you are already suffering from congestion, adding more workflows and more output to a blocked decision pipeline will only make things worse.
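A toy simulation makes the congestion arithmetic concrete. This is a sketch under made-up assumptions, not a model of any real firm: the function name `simulate_backlog` and the weekly rates are hypothetical. Work is produced at one rate, a decision layer approves at another, and whatever is not approved waits in a queue.

```python
def simulate_backlog(produced_per_week, approved_per_week, weeks):
    """Return (shipped, backlog) after `weeks` of work.

    Hypothetical model: items are produced at a fixed weekly rate and a
    decision layer can approve at most `approved_per_week` of them.
    """
    backlog = 0
    shipped = 0
    for _ in range(weeks):
        backlog += produced_per_week                # new output joins the queue
        approved = min(backlog, approved_per_week)  # the decision layer is the cap
        shipped += approved
        backlog -= approved
    return shipped, backlog


# Illustrative numbers only: AI lifts output by 50%, decision capacity is unchanged.
before = simulate_backlog(produced_per_week=10, approved_per_week=10, weeks=12)
after = simulate_backlog(produced_per_week=15, approved_per_week=10, weeks=12)
print(before)  # (120, 0)
print(after)   # (120, 60)
```

In this sketch the firm ships exactly as much as before, while 60 finished items sit waiting for a decision. Raising `produced_per_week` further only lengthens the queue; raising `approved_per_week` is what changes the shipped total.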

We discovered this at Exponential View. We could prototype, even build, new features faster than our usual processes for releasing them. We’re figuring out how to fix that – and we’re only a team of eight, so we are sympathetic to the challenges of larger orgs.

To speed up those decisions, AI needs to be able to take them. Managerial oversight may be the very thing preventing the cycle time from shortening. AI will need a new cognitive layer that allows the firm to interpret signals without a worker as an intermediary. If a customer sends a feature request to customer support, it typically passes through a support agent and then a product manager, who decides whether it belongs on the roadmap, before a developer codes it weeks or months later. With AI, the signal is observed by an agent. It orients against the roadmap and the codebase. It decides whether the feature is worth drafting. It builds it. Hours, not weeks. This is the Stage 3 firm, the unit drive.

How to move to Stage 3

OODA, redrawn

We are big fans of John Boyd’s Observe-Orient-Decide-Act loop. He is mostly remembered for the maxim, “operate at a faster tempo or rhythm than our adversaries”. And indeed, the OODA loop is a key to understanding Stage 3.

Boyd knew that speed wasn’t the only determining factor in winning. Understanding your environment, having a good model for what makes a good decision or not, is also key. A firm running a fast loop against a bad model of the world just loses money faster.

We’ve called this model of the world the “business operating graph”. The business operating graph is the machine-readable version of what middle management knows, some held in well-organized systems of record, some in reams of emails and meeting notes, and some perhaps just in people’s heads.

And a business operating graph needs to do three things if you are going to make fast, good decisions that don’t end in disaster.

First, signals: in an AI-native firm, the work starts with data, not with a person. A signal comes in, and an agent automatically acts on it. Some of these signals might be things you already track; others might be the type of anomaly suited for a machine to act on.

Second, coordination: a signal has to be tested against a live, shared picture of the business; otherwise, the agent will act in ways untethered from the business. One of the breakthroughs we noticed with the arrival of R Mini Arnold, my agent, and our other OpenClaw agents was that they were so much more useful for having a wider context of our business and current priorities than previous tools.

Third, risk: an agent acting on its own needs explicit rules about when to escalate. In many ways, it is similar to an employee. A support worker can solve a customer complaint, but escalates anything involving the press or a regulator. Firms already govern autonomy through permissions and thresholds; agents need the same structure.

We don’t think of a “business operating graph” as a monolithic thing that needs to be built. It is the “state of the business,” and most companies already have something that represents that. It is the enterprise SaaS platforms like Atlassian, Salesforce or, particularly, ServiceNow.

These are likely to form the foundation of that “business operating graph.”

Most Stage 2 firms have, at best, let AI into a small slice of their work; tightly scoped to a single function, usually customer service. I’m keen to find a company that has completely closed this loop. Fast-fashion company Shein is the closest parallel we found.

The typical fashion company’s OODA loop was seasonal: observe the market through runway trends, then spend the next months developing a line before getting sales feedback. Shein compressed this to one to two weeks. It pushed small batches of designs online, watched clicks, carts and sales, then reordered the winners and killed the losers. The customer signal moved across the firm without the usual congestion. By 2024, Shein had grown to be the same order of magnitude as Inditex, the Marvel Universe of fast fashion.

Shein closed this loop without AI agents. They used clickstream data, on-demand manufacturing, and an instrumented supply chain to build the fastest test-and-order system for fast fashion. In 2021, before ChatGPT, Shein was adding 2,000 to 10,000 items to its inventory per day, with a 40-day turnover. The loop, not the technology that closes it, is Shein’s competitive advantage. And now, AI might just be the cheapest way most firms can ever get there.

The signal

In Stages 1 and 2, a person is the eyes of the business. In Stage 3, the agent first sees the signal on its own. And to make this possible, the firm must capture every decision, transaction, sensor reading, customer action etc., as data.

Jack Dorsey’s fintech company, Block, is one of the few companies we know of making this bet in Stage 3 territory. Jack is very forward and open about the reorg, so we have some first-hand insight to work from. Block receives user data from Cash App and Square and reviews it against the context of the firm’s “economic graph.” This is a living map of everything that makes up Block: the customers, the merchants, their behaviors, products, permissions, past events. The graph gives Block’s humans the context they need to make decisions without going to their boss first.

But for this to work, Block had to reorganize around completely new roles: the individual contributors who build capabilities, the directly responsible individuals who oversee success across teams and the player-coaches who execute and mentor. The firm is flatter, and the signals from all the data the apps capture don’t have to travel far before someone decides to act on them.

Most firms don’t have Block’s advantages, a strong digital trail, or flatness. They may have one but not the other, which is not enough to move to Stage 3. Even in flat hierarchies, the day-to-day runs on tacit knowledge and decisions that the digital layer doesn’t see. Think of all your in-person meetings that don’t exist for AI; or a field engineer’s notes, which have the technical detail about the machine they repaired, but nothing about the judgment calls made on the job. Massive areas of work happen outside the data layer, and they now have to come into the fold.

The manager’s job

For many loops to work well, they need to work together. Back to the Shein example: the company closed many of its critical loops at the same time. The trends team identifies the latest new thing on TikTok, then product runs a limited-series test, then feeds that signal to suppliers who adjust production based on the demand intel and so on. Each loop in the firm has its own “state,” but the shared “state” is what makes the organization. That’s the hard work of coordination. In today’s companies, the shared state is in middle management (and all its tacit and implicit knowledge), with some supportive scaffolding in the SaaS the orgs use.

An AI system for a Stage 3 firm will need the map of the organization that right now lives in middle managers’ heads, in SaaS tools, and spread across slide decks and conversations. But unlike a static process doc, the graph updates all the time as customers make decisions and the organization moves. This concept has antecedents in knowledge graphs, enterprise ontologies and the digital twin tradition.

Palantir has been working with BP to create a digital twin of their oil and gas activity. They mapped all platforms and pipelines, dependencies, conditions that get updated by more than two million sensors. The agents see everything. Each action is grounded in that graph.

Without that shared representation, each loop stays an isolated function. The customer service agent’s activity remains within customer service, and the product team misses a chance to learn from it. The firm speeds up individual functions, but coordination remains slow as human managers shuttle decisions and information between functions.

Dealing with risk

In Stage 2, every consequential piece of work the agent does still goes through a human approval gate. One estimate shows that some 40% of the time saved by AI is eaten up again in checking and correcting AI’s output.

In Stage 3, the approval gate is replaced with an escalation gate. The agent can act on its own within a defined scope, but it calls in the human when it hits the edge of what it’s allowed to do. Human work is to define that edge. I think about it in three dimensions:

Value – how costly is the mistake?
Confidence – how certain is the system?
Reversibility – can you reverse the action?

This is Jeff Bezos’s one-way-door / two-way-door framework, generalized and applied to autonomous systems rather than human decisions.
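The three dimensions can be sketched as a small gate function. This is a minimal illustration, not anyone's production rule set: the names `ProposedAction` and `gate`, the dollar threshold and the confidence threshold are all assumptions, as is the choice to escalate every irreversible action.

```python
from dataclasses import dataclass


@dataclass
class ProposedAction:
    value_at_risk: float  # cost of a mistake, in dollars (hypothetical unit)
    confidence: float     # the system's own certainty, from 0.0 to 1.0
    reversible: bool      # can the action be undone afterwards?


def gate(action, max_value=500.0, min_confidence=0.9):
    """Return "act" or "escalate". Thresholds are illustrative assumptions."""
    if not action.reversible:
        return "escalate"  # one-way door: assumed to always need a human
    if action.value_at_risk > max_value:
        return "escalate"  # too costly to get wrong
    if action.confidence < min_confidence:
        return "escalate"  # the system is not sure enough
    return "act"           # two-way door, cheap, confident


print(gate(ProposedAction(value_at_risk=40.0, confidence=0.97, reversible=True)))    # act
print(gate(ProposedAction(value_at_risk=40.0, confidence=0.97, reversible=False)))   # escalate
print(gate(ProposedAction(value_at_risk=5000.0, confidence=0.99, reversible=True)))  # escalate
```

The human work sits in the parameters rather than in each decision: whoever sets `max_value` and `min_confidence` is defining the edge, and widening them is how autonomy expands over time.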

Once you define what your AIs can do on their own and what they have to escalate to you, activity will undergo an extraordinary acceleration. Spotify calls this the Golden Path. It is a set of checks to ensure code looks and behaves right before it can be merged into the main codebase. If an agent writes code that passes those checks, it can move forward without human review.

The hard jump

The thing to note is that while getting to Stage 1 is really about individual software, Stage 3 is all about fundamental organizational change. Stage 3 needs you to redefine the boundaries of who gets to make which types of decisions. And at the heart of that is the resource allocation within a firm. Decisions that matter, after all, are about whether to deploy resources (as capital or expenses) against a particular goal so that it both meets a financial target and stays within the firm’s desired operation space.

In simple English, that’s really at the heart of what a company does. And it isn’t clear that you can just magic up a piece of software that can do that.

For companies today, Stage 3 looks nothing like Stage 1. That step isn’t just about software, nor is it just about management. It’s about both. And this explains why the “Forward Deployed Engineer”, which is a fancy AI-era word for management consultant, has become such an important part of enterprise AI deployments. OpenAI has launched the Deployment Company with TPG, Bain and McKinsey because deploying AI increasingly means rebuilding the firm’s OODA loops – orientation, decision rules and all – for increasingly autonomous actions.

The system must be allowed to make decisions across the firm, including across managerial hierarchies. The role of the human then shifts from manually routing information to building the systems that route it, defining the constraints under which they operate, and owning the outcomes they produce.

Where to start

The first step toward the unit drive was giving the first machine a motor.

For AI, it will be about finding a tightly scoped loop in your organization and closing it. A loop has these three components: (1) a signal flowing in, (2) a business operating graph that holds the relationships the agent needs and (3) a defined scope for what the agent can act on (mentioned above, value, confidence and reversibility).

We can’t build fully autonomous loops for the main essays Exponential View produces. That Azeem is typing this in right now is the whole point.

But we’ve experimented with this for a new product we’re building. We feed customer feedback directly into product management agents, which analyze it and produce prioritized product features. These, in turn, are turned into specs that coding agents can pick up and develop and deploy. We haven’t completely closed this loop. A closed loop would mean the agents would tell customers we’ve shipped what they asked for and solicit more feedback. But we’ve seen the potential.

And that approach makes sense. Start by building a single loop (and the operating graph it needs). That loop needs a path to being completely closed and autonomous, with humans handling only the exceptions. Then you need to experiment.

Ramp, the fintech, has a good case study with their policy agent. The firm processes transactions for more than 50,000 businesses. Each transaction is a signal that gets checked by an agent against the operating graph; in this case, it’s Ramp’s expense policy. The agent can approve, reject or escalate, citing the specific policy text in its reasoning.

If a human has to step in to fix an ambiguous case, for example, a seat upgrade on a flight when the policy doesn’t explicitly talk about upgrades, the system is updated with the resolution. “Seat upgrades below $200 for VP level and above: approved.” The operating graph updates. Early users of the agent have reduced manual reviews by about 85%.
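The shape of such a loop can be sketched in a few lines. To be clear, this is not Ramp's implementation: `Transaction`, `Rule` and `review` are hypothetical names, the meal rule and the employee levels are invented for illustration, and a real policy engine would be far richer. The sketch only shows the three parts working together: a signal comes in, it is checked against a shared set of rules, and a human resolution of an ambiguous case becomes a new rule.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    category: str
    amount: float
    level: str  # employee level, e.g. "VP" or "IC" (hypothetical labels)


@dataclass
class Rule:
    category: str
    max_amount: float
    levels: frozenset  # levels the rule applies to
    text: str          # policy wording the decision cites


def review(tx, rules):
    """Return (decision, cited_text). An uncovered category escalates."""
    relevant = [r for r in rules if r.category == tx.category]
    if not relevant:
        return "escalate", "no policy covers this category"
    for r in relevant:
        if tx.level in r.levels and tx.amount < r.max_amount:
            return "approve", r.text
    return "reject", relevant[0].text


# Invented starting policy: it says nothing about seat upgrades.
rules = [Rule("meal", 75.0, frozenset({"IC", "VP"}), "Meals under $75: approved.")]
upgrade = Transaction("seat_upgrade", 150.0, "VP")
print(review(upgrade, rules)[0])  # escalate

# A human resolves the case and the resolution is written back as a rule.
rules.append(Rule("seat_upgrade", 200.0, frozenset({"VP", "SVP"}),
                  "Seat upgrades below $200 for VP level and above: approved."))
print(review(upgrade, rules)[0])  # approve
```

The second call never reaches a human. Each escalation that gets written back shrinks the set of cases that need one, which is the mechanism behind a falling manual-review rate.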

So, start by finding a loop that’s useful to your company and that you can take end-to-end through a series of agents. The key thing to acknowledge is that excessive human intervention, through decision-making or escalation, creates congestion and defeats the purpose. Closing one loop is more effective than getting more licenses for Copilot or ChatGPT. Ford’s factory wasn’t more lightbulbs, it was continuous flow.

Stage 3 in an AI firm is more than continuous flow. It will be continuous learning. That’s important. Stage 2 produces productivity gains and cost savings, but those advantages are temporary. Competitors can copy and catch up fast. Your cost advantage will disappear.

What is harder to copy is a firm that learns ever faster.

Bezos, in his 2016 shareholder letter, was already arguing something close to this, in the language of decisions:

Most decisions should probably be made with somewhere around 70% of the information you wish you had. If you wait for 90%, in most cases, you’re probably being slow. Plus, either way, you need to be good at quickly recognizing and correcting bad decisions. If you’re good at course-correcting, being wrong may be less costly than you think, whereas being slow is going to be expensive for sure.

Bezos’s lesson, predating the transformer architecture, let alone ChatGPT, is as true today as it was then.

1+1+1+1=4

Electricity really did take 30 years to show up in aggregate numbers. Wide adoption, as David wrote, “required working out the details in the context of many kinds of new industrial facilities, in many different locales, thereby building up a cadre of experienced factory architects and electrical engineers familiar with the new approach to manufacturing.”

Individual companies, however, did do well – and quickly. Ford, of course, was one. Dunnell Manufacturing saw a production increase of over 25% by 1895; the US Government Printing Office increased pressroom productivity by 15% by 1899 and could fit forty more presses into the same floor area.

Which brings us back to the senior exec I had tea with. Her thousand engineers shipping more code are producing a 1+1+1+1=1.5; that isn’t because AI isn’t working. It’s because the congestion is building up between decisions. The fix is closing one loop, then another, until the firm’s clock speed increases.

The firms of the future will compete on loop speed.

^[1] Annualized run rate