Key Takeaways

Governance Shift: AI governance must evolve from policy to runtime monitoring as agentic AI takes autonomous actions.

Deployment Priority: AWS prioritizes rapid AI deployment over governance, focusing on in-production learning and scaling.

Unmitigated Risks: Credo AI catalogs 1,600 AI risks, with 15% lacking mitigation, raising governance concerns for agentic AI.

Accountability Gaps: Current governance frameworks struggle to assign accountability for AI agent actions affecting users.

During a live demo at HumanX last week, Mati Staniszewski, co-founder of ElevenLabs, walked an audience through a government service agent built on his company's platform.

The agent guided a fictional user through business registration. It authenticated their identity via WhatsApp. It pulled documents from a linked Google account, switched languages mid-conversation when the user mentioned Spanish-speaking employees, and answered questions about the seller's permit process in California.

Then the user asked about taxes. The agent ended the call.

It didn't flag uncertainty. It didn't escalate. It terminated a conversation that a real user would have assumed was being handled, because the agent didn't know a tax department existed to route to.

Staniszewski fixed the problem on stage in a few minutes, connecting the business agent to a tax agent, pushing the change, and restarting. The audience watched a live production update happen in real time.

What's worth sitting with is not that the agent failed. Agents fail. It's what the failure looked like. No warning. No handoff. No record of what the user needed. The gap only became visible when a user walked into it.

Now put that scenario inside Revolut, which has deployed ElevenLabs agents across four million customers in more than 30 languages. Or inside Deutsche Telekom, whose network now includes agents fielding customer queries in real time. The bottleneck, Staniszewski said, is no longer the technology. It's deployment.

That statement is both accurate and, for anyone responsible for what those agents do, the most important sentence of the entire session.

The Model That Governance Was Built For

For most of the last three years, enterprise AI governance operated on a manageable set of assumptions. An AI system surfaced a recommendation. A human reviewed it. The human bore accountability for what happened next. The model wasn't perfect, but it was legible. You could point to where a decision was made and who made it.

That architecture is breaking down.

Navrina Singh, founder and CEO of Credo AI, has spent six years building what she describes as the AI governance category, starting in the predictive machine learning era and now operating deep into the agentic one.

At HumanX, she described the central shift: governance used to be about policy; now it has to be about operational and runtime guardrails. The distinction is bigger than it sounds. Policy is something you write, review annually, and update when something goes wrong.

Runtime governance means monitoring what an AI system is actually doing, in production, continuously, against criteria that may need to shift as the model underneath it evolves.
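
To make that concrete, here is a minimal sketch, in Python, of what a runtime guardrail around a single agent action could look like. Every name in it (GuardrailResult, escalate_to_human, the policy check itself) is hypothetical and illustrative, not a description of any vendor's product.

```python
# Minimal illustrative sketch of runtime governance: every proposed agent
# action passes a policy check *before* execution, and blocked actions
# escalate to a human instead of silently ending the interaction.
# All names here are hypothetical.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class GuardrailResult:
    allowed: bool
    reason: str = ""

@dataclass
class AuditRecord:
    action: str
    allowed: bool
    reason: str
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

audit_log: list[AuditRecord] = []

def check_policy(action: str, context: dict) -> GuardrailResult:
    # Stand-in for continuously updated runtime rules. In a real system
    # these criteria would be versioned and re-evaluated as the underlying
    # model changes, not written once and reviewed annually.
    if action == "end_call" and context.get("unresolved_request"):
        return GuardrailResult(False, "user request unresolved; escalate instead")
    return GuardrailResult(True)

def escalate_to_human(action: str, context: dict) -> None:
    print(f"ESCALATION: blocked '{action}' -> human handoff ({context})")

def execute_with_guardrails(action: str, context: dict) -> None:
    result = check_policy(action, context)
    audit_log.append(AuditRecord(action, result.allowed, result.reason))
    if not result.allowed:
        escalate_to_human(action, context)
        return
    print(f"executing: {action}")

# The failure mode from the ElevenLabs demo: instead of terminating
# silently, the blocked action leaves a record and a handoff.
execute_with_guardrails("end_call", {"unresolved_request": "tax question"})
```

The point of the sketch is the audit record and the escalation path: the demo's silent hang-up would instead produce a trace of what the user needed and a route to a human.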

“You can’t do governance just once. You have to do it continuously,” Singh said.

That shift from policy to runtime maps directly onto the difference between advisory AI and agentic AI. Advisory AI makes suggestions within a defined scope. Agentic AI takes actions, chains those actions across systems, hands off to other agents, and produces outcomes that no single human reviewed before they happened.

The ElevenLabs demo compressed this into a few minutes. One initial user interaction produced authentication via WhatsApp, document retrieval from Google Workspace, a language switch, a seamless agent transfer, and a proactive outbound call offering a grant program. Five distinct autonomous actions. No human greenlit each step. That's what agentic AI looks like at the level of a single use case.

At enterprise scale, the governance gap becomes something considerably harder to manage.

The Builders' Calculus

Matt Garman, CEO of Amazon Web Services, was straightforward about where enterprise AI value is actually accumulating. The early generative AI wins (content creation, document summarization) are giving way to something more fundamental.

Agents are the way that most enterprises and most companies are going to get most of their value out of AI.

Matt Garman

CEO at Amazon Web Services

Amazon's internal deployment tracks that claim. The company's Quick Suite tool gives hundreds of thousands of employees access to AI agents connected to enterprise data across Salesforce, Workday, email, and internal documentation.

Software developers are seeing roughly 4.5 times more output with AI assistance than without. A coding agent called Kiro recently resolved a customer bug request and pushed the fix in 25 minutes.

These are real gains, and Garman presented them without inflation. But embedded in the case for deployment is a priority order that governance has to reckon with. Staniszewski was explicit about it.

The bottleneck isn’t technology anymore. It’s deployment. The companies that ship, that learn, and scale quickly are the companies that will win.

Mati Staniszewski

Co-founder at ElevenLabs

Ship. Learn. Scale. That's the sequence. Governance doesn't appear in it, not because it's being ignored, but because the competitive logic of the current moment rewards velocity. The "learn" phase in that sequence is happening in production, with real users, against real consequences.

This isn't cynicism. Garman and Staniszewski are describing how technology markets work when the rate of capability improvement is high and the cost of waiting is real. But for the CHRO responsible for workforce AI decisions, or the COO whose operations are being restructured around agentic systems, the calculus lands differently. They don't capture the upside of moving first. They inherit the liability when something goes wrong.

A Gap Nobody Has Closed

Credo AI has assembled what Singh describes as the world's largest repository of AI risks, currently cataloging around 1,600 distinct categories. They have mitigations for roughly 85% of those.

The remaining 15%, approximately 240 known risk categories, have no established mitigation yet. That's the state of governance in advisory AI. Agentic systems, where multiple models interact, hand off tasks, and produce outputs that feed into further automated decisions, introduce failure modes that haven't been fully characterized, let alone mitigated.

The harder problem Singh identified is provenance. In a multi-agent system, tracing accountability requires answering not just what decision was made, but how. As in, which agent contributed what, which data sources were consulted, where in the chain an error originated, and how a correction would propagate.

Most governance frameworks were designed to evaluate a model's output. They weren't built to reconstruct the reasoning of an interconnected system operating across platforms, models, and organizational boundaries.
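
As a hedged illustration of what a provenance trace would have to capture, consider the sketch below. The record structure and field names are invented for the example, not an existing standard.

```python
# Illustrative sketch of the provenance a multi-agent system would need
# to retain to answer "which agent did what, from which sources, and
# where did the error enter the chain." All field names are invented.
from dataclasses import dataclass

@dataclass
class ProvenanceStep:
    agent_id: str           # which agent acted
    action: str             # what it did
    inputs_from: list[str]  # upstream steps or data sources consulted
    model_version: str      # the model underneath, which may change without notice

trace = [
    ProvenanceStep("business-agent", "authenticate_user", ["whatsapp"], "v3.1"),
    ProvenanceStep("business-agent", "fetch_documents", ["google-workspace"], "v3.1"),
    ProvenanceStep("tax-agent", "answer_tax_question", ["business-agent:step-2"], "v2.7"),
]

def steps_feeding(trace: list[ProvenanceStep], source: str) -> list[ProvenanceStep]:
    """Walk the chain backward: find which steps consulted a given source,
    so a correction to that source can be propagated forward."""
    return [s for s in trace if source in s.inputs_from]

print([s.action for s in steps_feeding(trace, "google-workspace")])
```

Even this toy version shows why the problem is hard: the trace has to survive handoffs across agents, platforms, and organizational boundaries, and it has to stay accurate as the models underneath each step are swapped out.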

Saahil Jain, CTO of You.com, which pivoted from consumer search to building search APIs for AI agents, framed the structural issue. Human users interact with information and exercise judgment about what they see. Agents consuming information don't have the same capacity for self-correction.

“They will essentially use the context that they’re given and cite it in some way that does not include humans in the loop,” he said.

The governance architecture that enterprises have built assumes human review somewhere in the chain. Over the course of the week at HumanX, I heard the phrase "agents governing agents" again and again. Agentic deployment, it seems, is systematically removing that assumption without replacing it with anything structurally equivalent.

Regulatory frameworks aren't closing the gap. Singh has advised governments across the US, Europe, Australia, and India on AI governance, and she was direct about the massive lag between regulation and what is being deployed.

Rules designed for earlier machine learning systems don't map cleanly onto agentic architectures. The Colorado AI Act is pushing on impact assessments. European privacy and transparency requirements are tightening at the frontier model level.

But the pace of agentic deployment is well ahead of any regulatory cycle, and Singh expects that to remain true for the foreseeable future.

The Compliance Floor Isn't the Ceiling

SOC 2 compliance has become table stakes in enterprise AI procurement. Singh described Fortune 500 clients, after a year of broad experimentation with multiple vendors, now scrutinizing suppliers on whether their data is protected, whether their accountability standards are documented, and whether bringing their AI into the enterprise creates net exposure.

According to Singh, only about 40% of AI vendors are reaching production in large enterprises as a result. The selection pressure is real.

But SOC 2 compliance and agentic governance are different things, and conflating them is one of the more consequential mistakes the enterprise market is currently making.

SOC 2 addresses security and availability. It says something meaningful about whether a vendor's systems are protected and reliable. It says nothing about whether an agent's decisions are traceable, whether its outputs are contextually accurate, whether it escalates edge cases appropriately, or whether its behavior drifts as the models underneath it are updated without notice.

Those are governance questions, and they require a different kind of infrastructure than a compliance certification provides.

Singh outlined what that infrastructure looks like in practice: governance teams that include data scientists, risk and privacy specialists, and behavioral experts, not just security personnel. Independent verification organizations, including an emerging standard called AIUC, that test agentic systems against established criteria through external parties. Continuous runtime monitoring, not just pre-deployment evaluation.
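
One hedged illustration of what continuous runtime monitoring could mean mechanically (not a description of Credo AI's or AIUC's actual tooling): replay a fixed evaluation suite against the live agent on a schedule, and alert when its behavior drifts from the accepted baseline.

```python
# Hedged sketch: detect behavioral drift by replaying a fixed evaluation
# suite against the production agent and comparing pass rates to a
# baseline. The agent call is a stand-in; no real API is implied.
EVAL_SUITE = [
    {"prompt": "user asks about taxes", "expected": "handoff_to_tax_agent"},
    {"prompt": "user switches to Spanish", "expected": "respond_in_spanish"},
]
BASELINE_PASS_RATE = 1.0
DRIFT_THRESHOLD = 0.05  # alert if the pass rate drops more than 5 points

def call_agent(prompt: str) -> str:
    # Placeholder for the deployed agent, whose underlying model may
    # have been updated without notice since the baseline was set.
    return "handoff_to_tax_agent" if "taxes" in prompt else "respond_in_spanish"

def run_drift_check() -> None:
    passed = sum(call_agent(c["prompt"]) == c["expected"] for c in EVAL_SUITE)
    pass_rate = passed / len(EVAL_SUITE)
    if BASELINE_PASS_RATE - pass_rate > DRIFT_THRESHOLD:
        print(f"ALERT: behavior drifted (pass rate {pass_rate:.0%})")
    else:
        print(f"ok: pass rate {pass_rate:.0%}")

run_drift_check()  # scheduled continuously, not run once before deployment
```

The contrast with a compliance certification is the cadence: a SOC 2 audit happens on a cycle measured in months, while a check like this runs against every model update.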

Credo AI recently launched a governance MCP, currently in beta, designed to push guardrails upstream into the development process itself, giving builders the ability to embed governance logic at the point of construction rather than layering it on after the fact.

The direction is right. Whether adoption moves fast enough to matter is a separate question.

The Accountability Question

Garman describes AI transformation at Amazon with specific optimism. Sales teams spending most of their time with customers instead of pipeline administration, software developers unblocked from years-long backlogs, products shipping faster and with more responsiveness to what users actually want. The gains are real. The logic is sound.

But embedded in the agentic deployment picture is a question that governance frameworks haven't answered cleanly. When an AI agent takes an action that harms someone, who is accountable?

The voice agent that ends a call at the wrong moment, stranding a user who needed help. The financial AI that flags someone incorrectly and affects their access to credit. The HR screening agent that deprioritizes a candidate based on criteria encoding historical bias. The benefits platform that auto-denies a claim without any human review before the decision reaches the claimant.

In each case, an agent acted. In each case, a human set it in motion. In each case, the chain between decision and consequence is longer and more diffuse than anything existing governance frameworks were designed to map.

Singh's observation about waiting deserves to be taken seriously beyond the business logic she was offering.

"Many companies, when we speak to them, say let's wait for an incident to happen, and then if we need to invest in AI governance," she said. "By that time, they're going to be irrelevant."

That's a competitive argument. There's also a harder one. As agentic AI moves into benefits administration, performance evaluation, workforce planning, customer finance, and clinical intake, the people on the receiving end of those decisions have legitimate interests in how those decisions are made and who answers when they're wrong.

Governance frameworks calibrated for AI as an advisor are not adequate for AI as an actor. The builders know this. They are deploying anyway. The window for building accountability into these systems before the workflows reorganize around them is not indefinitely open.

David Rice

David Rice is a long time journalist and editor who specializes in covering human resources and leadership topics. His career has seen him focus on a variety of industries for both print and digital publications in the United States and UK.
