This article is sponsored by Tabnine and was written, edited, and published in alignment with our Emerj sponsored content guidelines. Learn more about our thought leadership and content creation services on our Emerj Media Services page.
Complex work depends on context, and AI does not have it by default. To leverage AI at scale and generate a return on investment, businesses need a way to equip agents with the organizational knowledge, system awareness, and guardrails they would normally expect a human hire to learn through onboarding.
The scale of this problem is far larger than most executives realize. MIT’s NANDA initiative reports that 95% of enterprise generative AI pilots fail to deliver measurable business value, despite an estimated $30–40 billion in collective investment. The core barrier, according to the report, is not model quality or regulation but approach: specifically, the failure of most GenAI systems to retain feedback, adapt to workflow context, or improve over time.
A recent article published on ResearchGate, Governed Memory: A Production Architecture for Multi‑Agent Workflows, demonstrates that even advanced AI systems operate with only 53–65% accuracy on long‑horizon, multi‑step enterprise tasks when they lack a shared, governed organizational context; introducing a dedicated context layer raises performance to 74.8% on the LoCoMo benchmark, a material reduction in task failure for production workflows.
The study shows that this same context layer reduces token consumption by 50.3% across multi‑step executions, directly lowering operating costs, while enforcing zero cross‑entity data leakage under adversarial testing—an essential requirement for regulated environments.
Enterprises need a way to give AI agents the same onboarding, institutional knowledge, and guardrails that human engineers receive — delivered through governed, on‑prem, context‑rich infrastructure — so they can operate safely, efficiently, and at scale.
Emerj recently hosted a conversation with Eran Yahav, CTO and co-founder at Tabnine. The AI in Business podcast discussion uncovered what enterprise AI agents lack to scale inside complex, existing systems, and the infrastructure leaders can put in place to deploy agents with measurable returns.
This article explores rethinking how enterprises approach AI pilots by centering the systems that determine whether agents succeed or fail:
- Organizational context as infrastructure: Agents without institutional knowledge fail complex tasks at the same rate as an uninformed new hire, making context the foundation of reliable AI deployment.
- Pre-computing organizational knowledge: A context engine that maps dependencies upfront eliminates redundant token consumption and prevents agents from executing on outdated information.
- Perimeter deployment as a compliance requirement: The context engine aggregates the organization’s most sensitive systems, making inside-the-firewall deployment a security requirement rather than an optional configuration.
Listen to the full episode below:
Episode: Why Enterprise AI Fails Without a Context Engine – with Eran Yahav of Tabnine
Guest: Eran Yahav, CTO and co-founder, Tabnine.
Expertise: AI for Software Engineering, Program Analysis & Synthesis, Developer Productivity Tools, Programming Languages & Verification
Brief Recognition: Eran Yahav previously served as a Research Staff Member at the IBM T.J. Watson Research Center, where he worked on static analysis, program synthesis, and program verification. He is the co-founder and CTO of Tabnine (formerly Codota), where he has led technical development since around 2014. He is a Professor of Computer Science at the Technion – Israel Institute of Technology, with a research record recognized by the Alon Fellowship for Outstanding Young Researchers, an ERC Consolidator Grant, and the Robin Milner Young Researcher Award.
Organizational Context As Infrastructure
Yahav argues that the primary reason AI agents fail in complex enterprise tasks is not model capability but the absence of organizational context.
That challenge is especially acute in brownfield environments, where teams work within existing systems rather than building from scratch.
In large firms — especially banks — human engineers require six to nine months to become productive because they must learn the systems, dependencies, business logic, and unwritten norms encoded in millions of lines of legacy code. AI agents face the same environment but without any mechanism to absorb this institutional knowledge.
He describes the gap this way:
“AI agents are really facing this critical challenge of not having the understanding that human engineers do. They need to understand the entire context in which they operate. They need to understand the organization, the existing systems, how existing systems are being maintained and manipulated.”
— Eran Yahav, CTO and co‑founder, Tabnine
Without this grounding, agents frequently select outdated components, misinterpret legacy patterns, or follow the first API they encounter — outcomes that mirror how an untrained developer would navigate a large brownfield system.
To address this, Yahav recommends treating organizational context as the foundation of any AI initiative. A dedicated context layer must:
- Aggregate code, design documents, incident reports, and production telemetry
- Map dependencies and relationships across systems
- Surface only the relevant context at execution time
- Maintain a governed representation of how the enterprise actually operates
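The responsibilities above can be sketched in code. The following is a minimal, hypothetical illustration (not Tabnine's actual API; all class and method names are invented for this example) of a context layer that aggregates knowledge sources, records dependencies, and surfaces only task-relevant context:

```python
from dataclasses import dataclass, field

# Hypothetical sketch of a context layer. It aggregates knowledge sources,
# records dependencies between systems, and surfaces only the entries
# relevant to a given task, rather than dumping everything into the agent.

@dataclass
class ContextLayer:
    documents: dict = field(default_factory=dict)      # source -> content
    dependencies: dict = field(default_factory=dict)   # system -> set of systems it depends on

    def ingest(self, source: str, content: str) -> None:
        """Aggregate code, design docs, incident reports, telemetry."""
        self.documents[source] = content

    def link(self, system: str, depends_on: str) -> None:
        """Map a dependency relationship between two systems."""
        self.dependencies.setdefault(system, set()).add(depends_on)

    def relevant_context(self, task_keywords: set) -> list:
        """Surface only the documents that mention the task's keywords."""
        return [
            source for source, content in self.documents.items()
            if any(kw in content.lower() for kw in task_keywords)
        ]

# Usage: an agent asks only for the slice of context its task needs.
ctx = ContextLayer()
ctx.ingest("payments/README.md", "Payments service handles card settlement.")
ctx.ingest("hr/incident-42.md", "Outage caused by stale employee cache.")
ctx.link("payments", "ledger")

print(ctx.relevant_context({"employee"}))  # ['hr/incident-42.md']
```

A production system would of course use semantic retrieval and a real dependency graph rather than keyword matching; the point is only that the agent queries a governed store instead of crawling raw systems.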
As Yahav explains, this layer functions as the map that defines the universe in which agents operate. It shifts the enterprise AI roadmap away from larger models or more pilots and toward the infrastructure required for agents to behave predictably.
Pre-computing Organizational Knowledge
Yahav emphasizes that even highly capable agents fail when they must independently rediscover how an enterprise’s systems work. Without a shared context layer, agents crawl irrelevant services, misidentify dependencies, or latch onto outdated components — behavior that inflates token spend and slows execution.
He illustrates the issue concretely: ask an agent how to retrieve employee data, and you’ll get one answer. In a large enterprise, there may be fourteen different ways to do that — and the first one the agent encounters is often deprecated, incorrect, or simply the most expensive. The agent confidently executes on the wrong path because it lacks a mechanism to determine which option reflects the current organizational reality.
To prevent this, the context engine continuously ingests source code, architectural artifacts, historical incident data, and production‑level logs, and pre‑computes the dependencies. Instead of reconstructing this knowledge for every task, agents query a governed, up‑to‑date map of the organization, which narrows their reasoning to the systems that matter.
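The "fourteen ways to retrieve employee data" problem can be made concrete with a small, hypothetical sketch (endpoint names, flags, and costs are invented for illustration and do not describe Tabnine's implementation). A pre-computed, governed map lets the agent filter out deprecated options and pick the cheapest current path instead of executing on the first one it finds:

```python
# Hypothetical, pre-computed registry of the ways one capability
# ("fetch employee data") can be fulfilled across the enterprise.
# Each entry: (endpoint name, deprecated flag, relative cost per call).
ENDPOINTS = [
    ("legacy_soap_hr_v1", True, 9.0),    # deprecated: agents must not use it
    ("employee_graphql", False, 3.5),
    ("hr_rest_v2", False, 1.2),          # current, cheapest option
    ("warehouse_full_scan", False, 40.0),
]

def resolve(capability_endpoints):
    """Return the cheapest non-deprecated endpoint, or None if none exist."""
    live = [e for e in capability_endpoints if not e[1]]
    return min(live, key=lambda e: e[2])[0] if live else None

print(resolve(ENDPOINTS))  # hr_rest_v2
```

Without this lookup, an agent that happens to encounter `legacy_soap_hr_v1` first would confidently execute on a deprecated path; with it, deprecation and cost are decided once, upfront, rather than rediscovered on every task.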
The Ferrari analogy captures the operational stakes: an agent can move extremely fast, but without a map, it drives in circles, burning fuel and producing unreliable output. As Yahav puts it:
“The agent itself is like this very powerful car. It’s like a Ferrari. It can go really, really fast. But if it doesn’t have a map of where it’s trying to go, it will just drive in circles very, very quickly and basically burn a lot of fuel and get nowhere.”
— Eran Yahav, CTO and co‑founder, Tabnine
In Yahav’s experience, enterprises operating with a centralized context layer see 2× higher success rates and up to 80% reductions in token consumption.
For CFOs, Yahav recommends starting with two metrics: token spend and team output velocity. Without both in view, there is no baseline from which to measure whether agents are delivering returns. He is direct about the current state of ROI measurement:
“You really need to measure how much are you spending on agents and have some visibility into the velocity of the team, what is that I’m getting as output. The current ways in which we have to measure this are not sufficiently sophisticated — and this is true not just for us, but for the entire industry.”
— Eran Yahav, CTO and co‑founder, Tabnine
He notes that the remaining challenge for CFOs is that measuring agent velocity and output quality is still immature across the industry. Leaders need visibility into what they are spending on agents and what they are getting back; while context reduces waste, the tooling for quantifying ROI is still evolving.
Perimeter Deployment As a Compliance Requirement
Eran stresses that the context engine cannot sit outside the enterprise boundary. Because it touches the organization’s most sensitive engineering assets, from source code to design records to production telemetry, it effectively becomes a high‑fidelity representation of the organization’s internal systems. For regulated industries, this makes perimeter‑based deployment non‑negotiable.
He explains that customers routinely require the context engine to run behind their firewalls or in a fully air‑gapped environment, since it touches the most sensitive sources of institutional knowledge. This is not only a security requirement but a trust requirement: enterprises must know that the system governing agent behavior is not exposing or transmitting internal logic to external infrastructure.
Yahav frames it this way: “It has access to many of the most precious sources of information inside the organization. Many of our customers want the context engine to run inside their perimeter.”
Beyond data protection, the context layer also becomes the mechanism that ensures agents behave safely. As organizations delegate more tasks to autonomous systems, leaders need confidence that agents understand the systems they are modifying.
Eran argues that trust is impossible without context: agents must be onboarded with the same institutional awareness as human engineers before they can be allowed to manipulate production‑adjacent systems.
He states clearly that AI cannot be deployed at scale unless the context layer operates within a secure, governed environment. This is the only way to:
- Prevent leakage
- Maintain regulatory posture
- Ensure that agent‑driven changes are reviewable and safe