
A CTO’s Policy for Non-Engineer Commits.

A product manager just opened a pull request. A designer shipped a working prototype instead of a Figma file. Someone in revenue built a tool over the weekend and wants to know when it goes live. None of them report to engineering, and all of them are now producing code. The instinct to ban it and the instinct to wave it through both fail in predictable ways. There is a better answer, and it fits in four words.

Every CTO needs a vibe coding policy now, because the question it answers has already arrived in your repository. The instinctive responses, a quiet ban or a shrug, both fail in predictable ways. I have watched CEOs drop AI-generated code into a channel and announce, “Let’s launch this today, I vibed it with Claude Code so I know it’s solid.” At our June 11 CTO roundtable on agentic adoption, the seven CTOs in the room converged on a principle that holds up better than either response: bring demos, not code. This article unpacks what that vibe coding policy means in practice and how to turn four words into an actual operating policy.

Everyone Can Produce Code Now

The numbers describe a transition that is already complete. A June 2026 Black Duck survey of 831 software engineers and DevOps professionals found that 97 percent of development teams use AI coding assistants, while only 30 percent operate under a fully governed approach. A quarter of teams have no defined AI coding policy at all. The activity is not confined to engineering either. A Varonis analysis of 1,000 real-world IT environments found that 98 percent of organizations have unsanctioned or unverified apps in use, including shadow AI. The tools are already inside the building, whether or not anyone approved them.

This is not only a CTO’s problem. Every unreviewed tool a non-engineer ships becomes new attack surface, and that lands on the CISO’s desk as squarely as it lands on yours. The same Black Duck survey found nearly two-thirds of teams are already worried their AI assistants are introducing security defects. Who may ship code has quietly become a security question, not just an engineering one.

The boundary that used to separate “people who write code” from “people who request code” was enforced by skill scarcity. AI removed the enforcement mechanism, and no committee voted on it. In some cases, it seems as if CEOs and board members did the voting themselves.

So the CTO inherits a question that did not exist three years ago: what happens when a non-engineer shows up with working code? Most leaders I talk with have answered it by default rather than by design. That is the gap a deliberate vibe coding policy needs to close.

Five Ways the Default Vibe Coding Policy Fails

When there is no deliberate vibe coding policy, organizations fall into one of five default patterns. Each one feels reasonable in the moment. Each one fails for the same underlying reason.

Failure 01

The Unenforced Ban

The most common policy is a prohibition that nobody believes and that makes things worse. Engineering declares that only engineers commit code, the wiki page goes up, and within a month the CMO’s team is running three AI-built tools that IT has never seen. Prohibition does not stop the behavior. It stops the visibility. The near-universal shadow-AI numbers are what a ban looks like in practice.

Failure 02

The Free-for-All

The opposite failure accepts every contribution in the name of velocity. It feels generous and modern, and it quietly transfers risk to the people least equipped to see it. Someone who builds something that works has proven the idea, not the implementation. Enthusiasm is not the same as ownership, and velocity that skips the people accountable for what ships is not speed. It is deferred cost wearing a costume.

Failure 03

The Silent Rewrite

Some teams split the difference informally. They accept the PM’s pull request to be polite, then an engineer quietly rebuilds it from scratch. This doubles the work, hides the true cost of the contribution, and teaches the contributor nothing about why their code could not ship. The org pays for the same feature twice and calls it collaboration.

Failure 04

The Hero Bottleneck

Other teams route every non-engineer artifact to one senior engineer who “handles AI stuff.” That engineer becomes a human relay between two halves of the company. Review queues grow, the senior engineer stops doing senior work, and the process collapses the first week they take vacation.

Failure 05

The Morale Inversion

Engineers watch executives celebrate a weekend prototype that ignored every standard the team is held to, and they read the message clearly: the rules are for us, not for outcomes. Meanwhile the non-engineers who built something real feel dismissed when it dies in review. A policy vacuum manages to demoralize both sides at once.

The Common Thread

Each failure comes from treating this as a code question when it is an ownership question. None of the five patterns answers the only thing that matters: who is accountable for what runs in production? A workable vibe coding policy starts there, not at the tooling layer.

What “Bring Demos, Not Code” Means in Practice

The four words carry a precise meaning. Anyone in the company may bring engineering a problem, and anyone may bring a demo that shows the problem and a candidate solution. What they may not bring is a contribution, meaning code that expects to ship as written. A strong vibe coding policy makes that distinction explicit through five commitments.

Demos, or even prototypes, are specifications and not contributions.

A working prototype from a PM is the richest spec engineering has ever received. It encodes intent, the edge cases the PM cares about, and a tested interaction model. Treat it as input to the build, never as the build itself. The artifact informs the work; it does not become the work.

Engineering owns everything that ships.

No exceptions by seniority or title. If it reaches production, an engineering team owns and operates it, and the CTO answers for it. In practice this is a line the CTO and CISO draw together: security owns the bar that anything reaching production must clear, regardless of who built it or how impressive the demo was. Teams that formalize oversight report major efficiency gains at twice the rate of ungoverned teams.

Problems get a real intake path.

A principle without a doorway becomes a ban in disguise. Non-engineers need a defined place to bring demos and prototypes, a named reviewer rotation rather than a single hero, and a committed response time. If the path is slower than going around it, people will go around it.

“Production-ready” is written down.

Security review, test coverage, observability, data handling, dependency policy. The bar does not move based on who built the demo. Publishing the bar does two things: it protects the codebase, and it shows contributors that the gap between a demo and a product is real rather than political.

Credit flows to the problem-finder.

When the shipped feature traces back to a designer’s demo, the designer’s name stays attached. If contributors lose authorship the moment engineering touches the work, they stop bringing it, and the shadow pipeline reopens. Recognition is the cheapest governance tool you have.
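For teams that want the written bar to be checkable rather than only readable, here is a minimal Python sketch of the "production-ready is written down" and "engineering owns everything that ships" commitments. The five check names come from the list above; the function names, the string keys, and the idea of tracking passed checks as a set are illustrative assumptions, not a prescribed implementation.

```python
from typing import List, Optional, Set

# The written bar: the five items named in the policy above.
# The snake_case keys are hypothetical labels for this sketch.
PRODUCTION_READY_BAR = (
    "security_review",
    "test_coverage",
    "observability",
    "data_handling",
    "dependency_policy",
)


def missing_checks(passed: Set[str]) -> List[str]:
    """Return the bar items that have not been passed yet, in bar order."""
    return [check for check in PRODUCTION_READY_BAR if check not in passed]


def may_ship(passed: Set[str], owning_engineering_team: Optional[str]) -> bool:
    """Allow shipping only if every check passed AND an engineering team owns it.

    Note what is deliberately absent: there is no parameter for who built
    the demo or what their title is, so the bar cannot move based on that.
    """
    return owning_engineering_team is not None and not missing_checks(passed)
```

A contributor who asks why a demo cannot ship can be handed the output of `missing_checks`, which makes the gap between demo and product concrete rather than political.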

The Underlying Principle of This Vibe Coding Policy

The policy works because it trades a false boundary for a true one. The false boundary said only engineers may create code, which is no longer enforceable. The true boundary says only the accountable team may ship code, which is enforceable forever because it rests on responsibility rather than capability.

A demo from a PM is the best specification engineering has ever received. The mistake is letting it become the implementation.

The Policy Skeleton: Making It Real in Your Org

Writing the principle into operations takes less effort than most governance projects because it follows rails your org already has. The roundtable discussion surfaced a skeleton that adapts to most mid-size and enterprise teams. It starts with classifying every non-engineer artifact into one of three buckets.

Disposable

No review

Demos and explorations that never touch real data. Let people build freely here. This is where ideas get cheap and fast, and nothing is at stake.

Internal, guard-railed

Light registration

Tools running on sandboxed data with a defined blast radius. Someone should know they exist, but they do not need the full gate.

Production-intent

Formal intake

Anything customers or critical workflows touch. This enters the intake path, meets the written production-ready bar, and ships under engineering ownership.
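The three buckets can be sketched as a small classification routine. This is an illustration under stated assumptions, not the roundtable's method: the field names are hypothetical, and the rule that any use of real data escalates an artifact to production-intent is a conservative assumption added here, since the buckets above only describe demos with no real data and tools on sandboxed data.

```python
from dataclasses import dataclass
from enum import Enum


class Bucket(Enum):
    # Values are the review level each bucket receives in the skeleton above.
    DISPOSABLE = "no review"
    INTERNAL_GUARDRAILED = "light registration"
    PRODUCTION_INTENT = "formal intake"


@dataclass(frozen=True)
class Artifact:
    # All field names are hypothetical; adapt them to your own intake form.
    name: str
    touches_customers_or_critical_workflows: bool
    uses_real_data: bool
    is_ongoing_tool: bool  # something people keep running, not a one-off demo


def classify(artifact: Artifact) -> Bucket:
    """Return the most restrictive bucket the artifact qualifies for."""
    if artifact.touches_customers_or_critical_workflows:
        return Bucket.PRODUCTION_INTENT
    if artifact.uses_real_data:
        # Assumption of this sketch: real data outside a sandbox is treated
        # as production-intent until engineering says otherwise.
        return Bucket.PRODUCTION_INTENT
    if artifact.is_ongoing_tool:
        return Bucket.INTERNAL_GUARDRAILED
    return Bucket.DISPOSABLE
```

Checking the most restrictive condition first means an artifact can never land in a lighter bucket than its riskiest property warrants.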

Then set the review economics. The reviewer’s job is not to grade the demo’s code, which will usually be discarded. The job is to extract the specification: what problem, what data, what edge cases, what interaction model. A thirty-minute structured review of a working demo can replace what used to be weeks of requirements meetings. That reframe matters, because reviewers who think they are doing code review on AI output burn out fast, and reviewers who understand they are harvesting specs see the value.

Finally, decide where your line sits. The right boundary differs by company. A fintech moving regulated money draws the production-intent line lower than a media company shipping landing pages. This calibration, where the line goes and what it costs to put it in the wrong place, is a judgment call, and it is exactly the kind of decision where operators who have carried the pager tend to see around corners that a generic framework cannot.

Questions to Sit With

If a PM opened a pull request tomorrow, does your team have an answer, or just a reaction?
How many AI-built tools are running in your company right now that engineering has never seen?
What does your written definition of production-ready say, and could a non-engineer find it?
Who reviewed the last prototype an executive built, and what happened to their enthusiasm afterward?
If the answer to a contribution is “not like this,” does your process show the contributor what “like this” would mean?

A Final Thought

The arrival of non-engineer code is not a crisis. It is the first time in software history that the people closest to the problems can show engineering what they mean instead of describing it in a document. A CTO who meets that with a clear vibe coding policy converts a governance headache into the highest-bandwidth requirements channel the org has ever had. A CTO who meets it with a ban or a shrug gets the shadow pipeline, the rework, and the morale bill.

The work is deciding where your line sits before the next pull request forces the decision in public. This connects to the broader discipline of managing AI-era liabilities deliberately. Our guide on AI governance covers the cost side of the same conversation, and the companion piece on the five things every CTO must do in the agentic era sets the wider context for this vibe coding policy.

This article was inspired by our June 11 CTO roundtable.

We run new tables regularly. If you’d like to join a peer group for engineering leaders, request a seat, or talk to us about coaching for the operators making these calls every week.


Leigh Newsome

Partner, Hoola Hoop

Leigh Newsome is a Partner at Hoola Hoop and a CTO coach with 25 years of experience scaling product and engineering teams. He has worked with a wide range of startups and global enterprises, including Avid, Digidesign, WPP, and Kantar/Millward Brown, and successfully led TargetSpot (backed by Union Square Ventures, Bain Capital Ventures, and CBS) through its acquisition to Radionomy Group (Vivendi). When he’s not coaching CTOs, you’ll find him teaching digital audio to graduate students at NYU, building audio and signal processing applications, or flying fixed-wing aircraft, but never all three at once.

The Lesson the Cockpit Already Learned

When the first generation of glass cockpits and flight management systems landed in commercial aviation in the 1980s, the assumption was the obvious one: a more capable autopilot means a less demanding job for the human. The pilots who lived through that transition will tell you the opposite happened. The hand-flying portion of the job shrank. The judgment portion grew. The number of ways a flight could go subtly wrong, before anyone noticed, expanded as the automation took on more of the routine work.

American Airlines highlighted this challenge in the now-famous 1997 training presentation “Children of the Magenta Line,” delivered by Captain Warren VanderBurgh. The presentation warned that as cockpit automation became more capable, pilots could become overly dependent on flight-management systems and less attentive to the broader operational picture. The lesson was not that automation should be avoided. Rather, pilots needed to understand the appropriate level of automation for a given situation, recognize when the system was no longer helping, and be prepared to step down a level – or fly manually – when necessary. This philosophy became an influential part of modern airline training, emphasizing that automation changes the pilot’s role from continuously controlling the aircraft to continuously understanding, monitoring, and managing it.

Air France 447, lost over the South Atlantic in 2009, remains one of the most frequently cited case studies in discussions of automation and human performance. When unreliable airspeed indications caused the autopilot to disengage, control of a still-flyable aircraft was handed back to the crew under difficult and rapidly changing conditions. The pilots struggled to diagnose the aircraft’s state, recognize an aerodynamic stall, and coordinate an effective recovery. The BEA’s final report has become essential reading for aviation and human-factors professionals because it illustrates how quickly automation failures can become understanding failures. The accident was not caused by a malfunctioning autopilot alone; it exposed the challenges of transferring control, situational awareness, and decision-making from machine to human in a high-pressure environment.

Instrument Reading

The cockpit did not lose pilots when automation got better. It raised the bar on what a pilot had to know, when they had to know it, and how fast they had to make the call. The same thing is happening to the CTO role, only it is happening in eighteen months rather than forty years.

Three Forces That Raise the Leadership Tax

The dominant industry framing assumes that agentic autonomy frees leadership capacity. The opposite is closer to the truth. Three reinforcing dynamics, all of which I’m watching play out in the engineering organizations I coach, push the cost of leadership up as autonomy increases.

Force 01

Errors compound at machine speed.

When humans wrote the code, mistakes propagated at human review speed. Agentic systems can propagate the same mistake at the speed of CI/CD. A poorly framed instruction, a missed edge case, or a brittle assumption can move from idea to deployment before reviewers fully understand the consequences. As noted by SquaredTech, teams are increasingly finding that coding agents handle happy-path implementation well but still struggle with business-rule invariants, exceptional states, and edge-case logic. The technical failure is a symptom. The leadership challenge is defining the trust envelope, establishing override points, and assigning clear accountability when automation reaches its limits.

Force 02

The cheap human checkpoints are gone.

Every layer of automation removes a chance for a human to catch a mistake by default. The reviewer-of-last-resort no longer appears simply because someone happens to inspect the diff. The checkpoints that remain have to be designed deliberately, by leaders who understand where the system is fragile, which decisions cannot be delegated, and how learning flows back into the process. A consistent pattern I see across the engineering organizations I coach is that the teams capturing real value from AI built those checkpoints in from the start. The teams getting marginal returns bolted AI onto unchanged workflows and assumed the tooling would absorb the oversight. The difference is not the tooling. It is how the organization chooses to govern the tooling.

Force 03

Accountability did not transfer.

“The agent did it” is not a defense to a board, a regulator, a customer, or a court. The EU AI Act’s Article 14 human-oversight requirement, which applies to covered high-risk AI systems beginning in 2026, reflects a principle that many organizations are already discovering operationally: responsibility cannot be delegated to the agent. Humans must be able to understand, monitor, intervene in, and override AI-driven decisions when necessary. Every CTO I work with has at least one decision sitting in their inbox right now that used to be “the team’s call” and is now “your call,” because someone ultimately owns the decision to let the agents run.

The cumulative effect is that the leadership tax goes up with autonomy, not down. The judgment work that used to be distributed across senior reviewers, code review forums, and the slow friction of human throughput now concentrates in fewer hands, fewer decisions, and a much shorter time window. That is not a bad thing. It is the actual job. But it is a different job than most CTOs were promoted into, and it is one nobody is teaching out loud.

Four Decisions a CTO Never Hands to Autopilot

Every cockpit checklist has a category of items that never come off the captain’s plate, no matter how capable the automation gets. Takeoff and landing briefings. The go-around decision. The diversion call. Anything involving an irreversible commitment of fuel, time, or the safety of the people in the back. These are not on the autopilot’s menu because they are not the autopilot’s job. The same logic applies to the CTO seat. Below are the four decisions I coach every technology leader to keep on a written, visible, never-delegate list.

The Never-Delegate List

Four decisions that stay with the human in the left seat, no matter how good the agent gets.

Setting the trust envelope.

Which categories of work get autonomy, which categories never do, and on what basis the boundary moves. This is a judgment call anchored in business risk, regulatory exposure, and customer impact, not a configuration in the agent platform. The trust envelope is a written artifact, reviewed quarterly, signed by the CTO. Nobody else can make this call credibly because nobody else carries the accountability if it is wrong.

Owner: CTO. Reviewed: Quarterly. Documented: Always.

Naming the override moments.

Where are the canaries? Who has the kill switch? What signal triggers a human stepping back in? In aviation we call this the “stabilized approach criteria,” a small set of conditions that, if not met by 1,000 feet, mean you go around. The criteria exist precisely so the decision is made before the moment. Most engineering orgs have not written the equivalent down for their agents, and almost none have rehearsed the takeover. Both are leadership work.

Owner: CTO. Rehearsed: At least twice a year.

Owning the accountability when (not if) the agent gets it wrong.

The board, the regulator, the customer, and the press all want a named human. The CTO who pre-decides that the answer is “me, with my CEO” walks into the postmortem from a position of credibility. The CTO who tries to diffuse the answer across the team or, worse, onto the model vendor, loses the room within two questions. This is the decision you make before the incident, on a quiet Tuesday, so it is already made when Friday breaks open.

Owner: CTO. Communicated to the team: Before the incident.

The line that doesn’t move.

A small list of decisions that are leadership decisions in perpetuity. What we build and what we deliberately do not. What we say publicly when something goes wrong. Who we hire, who we fire, who we promote. How we treat a customer in crisis. The agent does not get a vote on any of these. The temptation to let the agent draft the customer apology, suggest the hiring shortlist, or pre-write the board memo is real and growing. Most of the time, drafting is fine. The decision is not. Name the line before someone else moves it.

Owner: CTO + CEO. Reviewed: Anytime the business model changes.

The act of writing this list down, and reading it to your leadership team out loud, is one of the highest-leverage exercises I run with the CTOs and CPOs I coach. It is also one of the most uncomfortable, because every line on the list is a place a leader has historically been tempted to let drift to the team, the platform, or the tooling. The drift is what produces the headlines.
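To make the "decision is made before the moment" idea concrete, here is a minimal Python sketch of override criteria for an agent deployment, modeled loosely on the stabilized-approach analogy above. Everything specific in it is a hypothetical placeholder: the signal names, the threshold values, and the rule that a missing signal counts as a breach are assumptions for illustration, and the real criteria belong in the written artifact the CTO owns.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class OverrideCriterion:
    name: str     # hypothetical signal name
    limit: float  # hypothetical threshold; breached when observed > limit


# Written down in advance, like go-around criteria. Values are placeholders.
CRITERIA = (
    OverrideCriterion("error_rate", 0.02),
    OverrideCriterion("rollbacks_last_24h", 1),
)


def breached(observed: Dict[str, float]) -> List[str]:
    """Return the names of criteria that are breached or cannot be evaluated."""
    failures = []
    for criterion in CRITERIA:
        value = observed.get(criterion.name)
        # Assumption: no reading means the approach is not stabilized,
        # so a missing signal is treated the same as a breach.
        if value is None or value > criterion.limit:
            failures.append(criterion.name)
    return failures


def decide(observed: Dict[str, float]) -> str:
    """Return 'human_takeover' if any criterion is breached, else 'continue'."""
    return "human_takeover" if breached(observed) else "continue"
```

The point of the sketch is the shape, not the numbers: the thresholds are fixed before the incident, and the function has no branch for arguing with them in the moment.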

Agentic AI Autonomy Demands More Leadership, Not Less

1.7x revenue growth for “future-built” AI leaders vs laggards (BCG, 2025)

40% greater cost reductions in AI-applied areas vs laggards (BCG, 2025)

BCG’s Widening AI Value Gap report, based on a survey of 1,250 senior executives, found that only 5% of organizations are “future-built” on AI, while 60% remain laggards. The difference isn’t budget. BCG argues that leaders adopt an AI-first operating model centered on reinvention rather than incremental investment.

The organizations capturing the most value consistently do three things. First, they add new roles rather than remove them; common examples in AI-native teams include the AI Reliability Engineer, Spec Author, and Agent Orchestrator (see The AI-Native Team). Second, they spend more time on calibration, trust reviews, and retrospectives than on traditional oversight. Third, they redesign organizations around judgment rather than throughput, pairing a flatter execution layer with a denser judgment layer.

Course Correction

If you are using AI adoption as a reason to flatten your leadership layer, you are running the same play that grounded a generation of cockpits in the 1990s and a generation of self-driving programs in the 2020s. Cut the layer that does the keystrokes. Reinforce the layer that does the judgment.

The Pre-Flight Checklist

If you are a CTO or CPO running an agentic deployment of any meaningful scale, these are the questions I would put on the page in front of you before the next leadership offsite. They are written as a checklist because a checklist is what works under load, and the next eighteen months of this role will be exactly that.

Is your trust envelope written down, signed by the right humans, and reviewed on a cadence the team can recite from memory?
Do the people on your team know what triggers a takeover, who calls it, and what the first three moves are once it is called?
If an agent did something material wrong this week, can you name the human who answers for it without consulting an org chart?
Is your never-delegate list a written artifact your direct reports can quote back to you, or is it living only in your head?
Are you spending more leadership time on calibration and retrospectives than you were a year ago, or are you assuming the agents absorbed the work?

A Final Thought from the Left Seat

The CTOs and CPOs I work with who are getting agentic AI right are not the ones who happen to have the best models, the deepest budget, or the most aggressive rollout schedule. They are the ones who sat down on a Wednesday morning and wrote the trust envelope, the override criteria, the accountability ledger, and the never-delegate list. Then they read them out loud to the team. Then they reviewed them in 90 days. That is not exotic work. It is the work the autopilot cannot do for you.

If you found this useful and want the companion piece on how this connects to the broader board-level conversation about AI accountability, the principles in our agentic AI governance framework pair directly with the never-delegate list above. The two pieces work together: one names the leadership decisions that cannot be automated, the other names the organizational infrastructure required to enforce them.

Ready to talk about CTO coaching with Leigh?

Book a 30-minute introductory call to explore whether coaching is right for you.
