Project Zero for AI Governance: Decision Surfaces, Not Inventories

Executive Summary

Most organizations are being told to update policies and incident response plans for AI. That is not enough. When AI-driven failures occur, the question will not be what you documented; it will be whether you can reconstruct what happened and defend it to regulators, boards, and courts.

This article is for GCs, CISOs, data leaders, and engineers who need a shared, system-level understanding of where AI is making decisions today, and what it takes to make those decisions traceable, governable, and defensible. The core challenge is not inventory; it is traceability. Before reaching for tools or frameworks, organizations must first map where AI is actually making decisions, because defensible governance requires visibility across the full decision chain and must operate at the speed of inference, not the speed of meetings. This work starts at “Zero: Decision Surface Mapping,” because without that map, every downstream policy, tool, or control rests on assumptions rather than systems. As we explain below, Project Zero is not a one-time exercise and may need to be revisited in loops as visibility and tooling mature.

Real governance must be embedded into instrumented pipelines that generate execution-level evidence. The tooling landscape remains fragmented, so organizations should prioritize by risk: Start with decision surfaces that affect people (e.g., safety, discrimination), involve sensitive data, or sit in regulated domains, and build with evolution in mind. If you cannot reconstruct the chain from input to model to output to action, you do not have governance. You have assumptions.

Introduction and Background

This article is the second installment in our AI Governance Series and reflects field observations from RSAC 2026, read through the lens of AI governance infrastructure and the regulatory expectations now taking shape.

There is a critical assumption embedded in most AI governance conversations: that organizations know where their AI systems are. Many do not.

This builds on our earlier work, AI Governance Is Not Policy. It Is Infrastructure, which argued that governance must be embedded into systems, not layered on through documentation and oversight. If governance is infrastructure, the next question is unavoidable: Where, exactly, is that infrastructure operating?

For a GC or CISO, this translates into a simple ask: “Show me where AI is making or steering decisions we would have to defend to a regulator, court, or board after something goes wrong.” That is the map this article helps you build.

Regulatory developments are already reinforcing this shift. The EU AI Act ties high-risk systems to recordkeeping, logging, and post-market monitoring designed to ensure decision traceability over time. U.S. regulators, including the FTC, are emphasizing accountability for AI-driven outcomes, which in practice requires the ability to reconstruct what systems did, when, and under whose authority.

After a week on the RSA Conference floor, meeting vendors and pressure-testing “AI governance” claims, one pattern was consistent: Most organizations cannot tell you where their AI systems are, what decisions those systems influence, or who authorized them to act, particularly as agentic systems become more dynamic and ephemeral.

That is the inventory problem. It also only scratches the surface.

Even organizations with discovery tooling, data mapping, and access controls often cannot answer the questions that regulators, litigants, and boards will ultimately ask: What system made this decision, using what data, and under whose authority, and why was it not stopped sooner?

That is not an inventory problem. It is a traceability problem. And traceability is what enforcement, litigation, and governance ultimately test.

Project Zero: Decision Surface Mapping

Before reaching for tooling, controls, or frameworks, organizations must complete a step almost all of them skip: decision surface mapping. Decision surface mapping asks a different question: not where AI exists, but where it decides.

It is the act of identifying where AI is making or influencing decisions across the enterprise—where outcomes are determined, often across distributed systems, not just within a single CI/CD pipeline.

In practice, decision surfaces include:

  • Co-pilots embedded in business tools
  • Automated workflows in call centers and customer operations
  • Robotic Process Automation (RPA)1 and desktop scripting
  • CI/CD pipelines making release or testing decisions
  • Agent-based orchestration frameworks
  • Downstream actions triggered by model outputs

A concrete example: Consider a customer support environment where a co-pilot drafts responses in a CRM system, a routing model determines which customers are “high priority,” and an RPA bot pushes refunds into a billing system. These components may not appear as a single “AI system,” but together they form a decision surface that determines outcomes—who receives what, when, and based on which inferences. When that chain fails, regulators and courts will not care how the system was labeled. They will ask how that specific outcome was decided, under what authority, and why no one intervened sooner.

That scrutiny intensifies in higher-risk contexts such as algorithmic pricing or areas governed by antidiscrimination law, including lending, employment, and housing. This is your decision surface. If you cannot define it, you may not have defensible business processes.

Decision surface mapping also exposes the scope of personal data flows embedded in AI-driven decisions and the points where those decisions can cause harm. Any surface that touches a customer, employee, or user is simultaneously a data processing activity and a potential source of discrimination, exclusion, or denial of opportunity.

Data protection, AI safety, and regulatory compliance are not separate workstreams. They are outputs of the same map.

AI systems that make or steer high-stakes decisions distribute opacity and risk across domains. Regulators are already asking traceability questions, through data protection, safety, and civil rights frameworks. Organizations that treat these as parallel tracks will duplicate effort and leave gaps no one owns.

A Note on AI Governance Committees

Many organizations are standing up AI governance committees. Some are well-designed: cross-functional, empowered, and tied to existing risk frameworks. This is the right instinct, but it is not enough.

A governance committee can define policy, approve use cases, set risk thresholds, and establish review cadences. What it cannot do is observe what is happening at the pipeline level. A committee reviewing a quarterly AI inventory report is governing a document, not the systems.

The gap between what a committee can see and what systems are doing is where AI governance fails in practice. That gap grows with new model deployments, agentic workflows, third-party integrations, and automated decisions without human review.

Committees govern at the speed of meetings. AI systems operate at the speed of inference. That asymmetry is the structural problem.

The solution is to give committees something real to govern: instrumented pipelines, execution traces, and evidence artifacts that reflect what is happening, not just what was approved. Where possible, policy should be embedded as code to scale governance beyond human review.
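To make “policy as code” concrete, here is a minimal, hedged sketch of what an embedded guardrail might look like: a rule evaluated at inference time that emits an evidence record a committee could later review. All names, thresholds, and fields here are illustrative assumptions, not drawn from any specific platform:

```python
from datetime import datetime, timezone

# Hypothetical policy rules: regulated domains require a human reviewer
# before an AI-driven action executes; names and thresholds are illustrative.
REGULATED_DOMAINS = {"lending", "employment", "housing"}
AUTO_APPROVE_RISK_CEILING = 0.3

def evaluate_policy(decision: dict) -> dict:
    """Evaluate one AI-driven decision against embedded policy and
    return an evidence record a governance committee can review."""
    needs_review = (
        decision["domain"] in REGULATED_DOMAINS
        or decision["risk_score"] > AUTO_APPROVE_RISK_CEILING
    )
    return {
        "decision_id": decision["id"],
        "domain": decision["domain"],
        "risk_score": decision["risk_score"],
        "action": "hold_for_human_review" if needs_review else "auto_approve",
        "policy_version": "2026-01-draft",  # pin the policy that was applied
        "evaluated_at": datetime.now(timezone.utc).isoformat(),
    }

# A low-risk support decision auto-approves; the same inputs in a
# regulated domain are held for human review.
record = evaluate_policy({"id": "d-001", "domain": "support", "risk_score": 0.1})
```

The point of the sketch is not the rule itself but the artifact: every evaluation produces a timestamped, versioned record, which is exactly the execution-level evidence a committee can govern.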

Regulatory expectations around logging, post-market monitoring, and life cycle oversight converge on a simple reality: Without execution evidence, governance is just risk language. Data governance followed this path, evolving from policy into systems that made data visible, classifiable, and controllable.

Even then, many organizations never fully closed the gap. The result is what we often describe as the 1999 Problem: accumulated data debt driven by incomplete inventory and limited visibility into how data moves through systems.

AI will amplify this problem. It is no longer just data moving through systems. It is decisions.

Governance must operate at two speeds simultaneously:

  1. Human Speed: Policy decisions, risk escalation, use case approval, incident response
  2. Machine Speed: Continuous monitoring, behavioral baselining, anomaly detection, and audit trail generation

Neither layer works alone. Committees without instrumentation govern on faith. Instrumentation without committees is logging without accountability. Effective governance requires both: clear decision authority and the evidence to support it.

Why Traditional ‘AI Inventory’ Fails GCs and CISOs

Some organizations are attempting to solve AI inventory issues with variations of scanning and discovery:

  • AI-SPM tools to detect model usage
  • Data Security Posture Management (DSPM) tools to track data exposure
  • Identity systems to map access
  • Code scans for LLM integrations

These are useful. They are also insufficient in isolation, because they answer only fragments of the compliance picture:

  • Where is AI being used?
  • What data is involved?
  • Who has access?

But governance, enforcement, and litigation will ask different questions:

  • What system made or actioned this decision?
  • What data pools were used?
  • Under whose authority was the decision made?
  • Could any resultant harm have been detected sooner?
  • What proof do you have to support your answers to the four questions above?

That is not a discovery problem. That is a traceability problem. And traceability is what enforcement, litigation, and regulatory review ultimately measure.

The Shift: From Asset Inventory to Execution Evidence

AI systems cannot be reliably inventoried statically. They must be understood through evidence of execution. In practice, the organizations with the best chance of getting this right converge on the four-layer evidence model below. If any one of these layers is missing, decision reconstruction can fail.

  1. Decision Entry Points: Where AI-driven or AI-assisted decisions originate (co-pilots, workflows, automation layers, orchestration frameworks)
  2. Invocation Signals: How AI reveals itself in practice (LLM API calls, orchestration libraries in code, event triggers, workflow initiations)
  3. Execution Traces: What happened (prompts, tool calls, outputs, system actions, event executions)
  4. Identity/Authority Binding: Who is accountable (service identities, human owners, authorization boundaries, human-review perimeters; together, these make human-in-the-loop review actionable at scale)
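To make the four layers concrete, here is a hedged sketch of what a single decision trace spanning all four might look like, plus a check that flags missing layers before a record is accepted as evidence. All field names are hypothetical; a real schema would depend on your tooling:

```python
# The four evidence layers; a trace missing any of them cannot support
# full decision reconstruction. Layer names are illustrative.
REQUIRED_LAYERS = {
    "entry_point",  # where the decision originated (co-pilot, workflow, agent)
    "invocation",   # how AI revealed itself (API call, orchestration event)
    "execution",    # what happened (prompt, tool calls, output, action)
    "authority",    # who is accountable (service identity, human owner)
}

def missing_layers(trace: dict) -> set:
    """Return the evidence layers absent from a decision trace.
    An empty result means the chain from input to action is reconstructible."""
    return REQUIRED_LAYERS - {k for k, v in trace.items() if v}

# Illustrative trace from the customer-support example: a routing model
# flags a customer and an RPA bot pushes a refund into billing.
trace = {
    "entry_point": {"system": "crm_copilot", "surface": "refund_workflow"},
    "invocation": {"signal": "llm_api_call", "model": "router-v2"},
    "execution": {"output": "high_priority", "action": "refund_issued"},
    # "authority" layer absent: no service identity or human owner bound,
    # so this decision could not be defended after the fact.
}
gaps = missing_layers(trace)  # {"authority"}
```

The design choice worth noting: the check runs against each trace, at machine speed, rather than against a quarterly inventory report.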

What We Saw at RSA: The Fragmentation Problem

The RSAC 2026 AI governance landscape reflects this fragmentation. Vendors cluster into narrow domains, each addressing a slice of the problem but not the whole decision chain. These domains include:

  • AI discovery and posture management tools attempting to identify AI usage
  • Orchestration frameworks defining how agents act
  • DSPM and lineage tools tracking data movement
  • Identity systems managing access
  • Runtime monitoring observing behavior
  • Supply chain tools watching model integrity

Each is necessary. None is sufficient on its own. This is not a tooling gap. It is an architectural gap. In security industry parlance, it’s the lack of a “single pane of glass”—on steroids—where no single system provides end‑to‑end visibility into how data, models, decisions, and controls interact in practice. As a result, cross‑tool risk, governance, and evidence often remain invisible to practitioners.

Security’s response to this problem was the emergence of Security Information and Event Management (SIEM) tools as a centralized control and observability layer with policy-driven response hooks. We are now seeing analogous thinking emerge around AI control-plane solutions, driven by the same need to correlate signals across fragmented systems and make behavior visible at runtime.

A Realistic Assessment of Current Tooling

The SIEM analogy is instructive, but it should not obscure how long security’s journey to centralized observability took, or how incomplete that journey remains for many organizations. SIEM platforms emerged in the early 2000s, yet two decades later, security teams still struggle with log ingestion gaps, alert fatigue, correlation failures, and visibility blind spots across hybrid environments. Security leaders remember what it felt like to operate without centralized visibility. AI governance is at that exact stage now, but compressed into a much faster timeline. The promise of a single pane of glass has been perpetually five years away for most enterprises.

AI governance tooling is at an earlier and more fragmented stage. The vendor landscape we observed at RSAC 2026 reflects a market still sorting itself into categories: discovery tools that identify model usage but cannot trace decisions; orchestration frameworks that define agent behavior but do not produce litigation-ready evidence; monitoring solutions that detect anomalies but lack the context to explain what went wrong or why. No single platform today delivers end-to-end decision traceability from input to action.

This is not a criticism of the vendors building in this space. It is an objective assessment of where the market stands. Organizations should not wait for a mature, integrated AI governance platform to emerge before their mapping and evidence work begins. That platform may take years to materialize, and regulatory expectations are crystallizing now. Please don’t let perfect be the enemy of good.

The practical implication is that most organizations will need to assemble governance visibility from multiple tools, stitched together through custom integrations, manual processes, and deliberate architectural choices about where to instrument. This is expensive, imperfect, and necessary. It also means that governance programs must be designed with tool evolution in mind, built to incorporate better observability as it becomes available rather than locked into the limitations of today’s offerings.

The organizations best positioned for this transitional period are those that treat tooling as an enabler of governance architecture, not a substitute for it. Start with the four-layer evidence model. Identify which layers your current environment can observe and which remain opaque. Then evaluate tools based on which gaps they close, not which dashboards they provide. A beautiful interface that cannot reconstruct a decision chain is not a governance solution. It is a reporting tool.

Why Starting With Tools Fails

The risks of a tool-first approach bear repeating in concrete terms for legal and risk leaders. Organizations that begin with vendor selection rather than decision surface mapping typically encounter three failure modes: They inherit the visibility limitations of whatever products they purchase; they optimize governance around what is easy to log rather than what matters most; and they miss the decision chains that carry the greatest regulatory and litigation exposure. We have seen this pattern before in security governance: risk scoring tools deployed without underlying methodology, and platforms claiming HIPAA or PCI-DSS compliance without caveating the visibility gaps that compliance actually depends on. AI governance requires the same intellectual rigor that security eventually learned, applied earlier in the maturity curve.

Bringing It Back to Logic: Governing by Dependency, Not Sequence

AI governance that holds under scrutiny follows a logical pattern that often loops:

Zero: Decision Surface Mapping

You cannot govern what you cannot see. Understand and define where decisions occur before you deploy anything.

One: Visibility Capability Assessment

Understand your environment’s capability to trace AI actions, not what vendors claim their tools can detect.

Two: Orchestration Alignment

Determine where controls and interventions belong in the decision chain, not just at the perimeter.

Three: Evidence Design

Define what must be provable, reconstructible, and retained before an incident requires you to prove it.

For executives, the key is to keep asking: “What depends on what?” If a governance decision or disclosure depends on a map you do not yet have, that is a signal to go back to Step Zero before signing.

A word about this sequence:

The sequence is not strictly linear in practice, the way a strict arithmetic order of operations would be; it reflects logical dependency, not procedural linearity.

Most organizations will discover new decision surfaces after deploying visibility tooling at Step One. Runtime monitoring may reveal orchestration patterns that were invisible during the initial mapping exercise. The order above reflects logical priority, not execution rigidity.

Teams should expect to revisit Step Zero repeatedly as observability matures and blind spots are exposed. Governance programs that assume a one-time mapping exercise will fail. Those that build for iteration will improve and mature over time, as we did with privacy and security in the early days. A compliance program is a living, breathing thing.

Where To Start: Prioritizing Decision Surfaces

The scope of decision surface mapping can appear overwhelming, particularly for organizations that have not yet conducted a comprehensive AI inventory. Attempting to map every decision surface simultaneously is neither practical nor necessary. The goal is not completeness on day one. It is structured visibility into the decisions that carry the greatest legal, regulatory, and reputational exposure—in a phrase, risk triage.

Organizations should prioritize decision surfaces based on three factors:

  1. Regulatory Risk Concentration
    Decision surfaces operating in domains subject to hard antidiscrimination law (lending, employment, housing, health care, insurance underwriting) or with true human safety application should be mapped before lower-risk operational uses are. These are the domains where regulators have explicit authority, established enforcement histories, and increasing interest in algorithmic accountability. A co-pilot that assists with marketing copy presents different risk than does a model that influences credit decisions, even if both appear in the same AI inventory.
  1. Personal Data Intensity
    Decision surfaces that ingest, process, or act on sensitive personal data (health information, financial records, biometric identifiers, data concerning minors) demand earlier attention. These surfaces simultaneously trigger data protection obligations under GDPR, U.S. state privacy laws, and sector-specific regimes, meaning a governance failure implicates multiple compliance frameworks at once.
  1. Consumer And Employee Proximity
    Decision surfaces that directly affect individuals (customer service routing, benefits eligibility, performance evaluation, access to opportunities) are more likely to generate complaints, litigation, and regulatory inquiries than are internal operational tools. When a decision surface can injure a person, the standard of care is higher and the tolerance for opacity is lower.

This is not a license to ignore lower-priority surfaces indefinitely. It is a recognition that governance programs must be built incrementally and that sequencing matters. Organizations that attempt to boil the ocean will often end up with superficial coverage everywhere and defensible evidence nowhere. Those that start with high-risk surfaces and expand methodically will develop the institutional muscle (the processes, tooling integrations, and cross-functional coordination) needed to scale.

A practical starting point: Identify three to five decision surfaces where AI is influencing outcomes for customers or employees in regulated or high-sensitivity contexts. Map those surfaces completely using the four-layer model before expanding scope. The lessons learned in those initial exercises will reshape how the organization approaches the broader mapping effort.
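The three prioritization factors above can be reduced to a simple, reviewable triage score. This is an illustrative sketch with made-up weights and surface names, not a validated methodology; the point is that sequencing decisions can be made explicit rather than intuitive:

```python
# Illustrative weights for the three prioritization factors; tune per
# organization and document the rationale.
WEIGHTS = {"regulatory_risk": 3, "data_sensitivity": 2, "person_proximity": 2}

def triage_score(surface: dict) -> int:
    """Score a decision surface (each factor rated 0-3) so the
    highest-risk surfaces are mapped first."""
    return sum(WEIGHTS[factor] * surface[factor] for factor in WEIGHTS)

# Hypothetical decision surfaces with factor ratings.
surfaces = [
    {"name": "marketing_copy_copilot", "regulatory_risk": 0,
     "data_sensitivity": 1, "person_proximity": 0},
    {"name": "credit_decision_model", "regulatory_risk": 3,
     "data_sensitivity": 3, "person_proximity": 3},
    {"name": "support_refund_bot", "regulatory_risk": 1,
     "data_sensitivity": 2, "person_proximity": 3},
]

# Map in descending score order: the credit model comes first, the
# marketing co-pilot last.
ordered = sorted(surfaces, key=triage_score, reverse=True)
```

A scored list like this also produces its own evidence: if a regulator later asks why one surface was mapped before another, the answer is a recorded methodology, not a recollection.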

The Bottom Line

AI governance is not an inventory exercise. It is an evidence architecture exercise. And it starts at Step Zero. Policies promise governance. Decision surfaces define it. Pipelines prove it.

Organizations that understand this early will build systems that hold. The rest will discover it under pressure.

What Comes Next: The Evidence Stack

This article is the starting point for a broader build: a legal engineering evidence stack. From here, we will walk through each domain of the AI governance infrastructure stack and answer a single question: What evidence does this domain need to produce for governance to hold?

The RSAC 2026 AI Governance Evidence Exhibitor Cheat Sheet we previously published provides the spine for this series. It maps eight governance domains:

  1. Governance and Risk Orchestration
    Q: How are governance decisions recorded in these platforms connected to the systems running AI?
  2. AI Discovery and Security Posture Management
    Q: Can your organization identify every AI system touching your data right now?
  3. Agent Orchestration and Workflow Control
    Q: What actions are your AI agents authorized to take, and how are those actions recorded?
  4. DSPM
    Q: What sensitive data has already reached your models, and did you know before today?
  5. Data Lineage and Pipeline Visibility
    Q: Can you trace, step-by-step, how a specific model output was produced?
  6. Identity and Access Governance for AI Systems
    Q: What systems can your AI agents access (using what identities), and who last reviewed those authorizations?
  7. Runtime Protection and Behavioral Monitoring
    Q: If a deployed AI system began behaving unexpectedly tomorrow, how quickly would you know, and what would stop it?
  8. AI Supply Chain and Model Integrity
    Q: If your AI vendor pushed a model update tonight, how would you know whether system behavior changed?

Each domain is anchored by a core diagnostic question that cuts to its evidence obligation. In future installments, we will make this stack explicit and visual. We will publish one article for each domain that provides details on available products for evidence generation. Every few weeks, we will take one domain, examine the vendor landscape, test the technology claims, and assess what evidence that domain can and cannot produce today.

We are mapping AI governance one step at a time, with your input and help. Please keep your feedback coming. We are all learning together.


1 Note: In some legal regimes, traditional diligence or negligence standards may be displaced or irrelevant. Certain AI systems may instead be evaluated under products liability or statutory frameworks that focus on the presence of a defect, violation, or harm rather than the reasonableness of governance processes. In such contexts, evidence of governance may inform risk assessment or enforcement posture but will not, by itself, eliminate liability. In these heightened‑liability settings, deeper decision surface mapping, targeted automation, and cross‑company coordination are necessary to give individuals the time and visibility to foresee and mitigate harm, where possible.