This article is part 4 of a 5-part series from Schema App CTO Mark van Berkel exploring the shift from Schema Markup to Content Knowledge Graphs, governed entity data, and the modern infrastructure required for AI search and the emerging agentic web.
If you are new to the series, start with the earlier articles:
- Part 1: From Structured Data to Knowledge Graphs: Why Most Brands Are Still at Step One
- Part 2: Entity Governance: The Missing Layer in AI-Ready Content Systems
- Part 3: The Shift From AI SEO Tactics to Knowledge Infrastructure
While much of the industry is debating prompt hacks, llms.txt files, and “how to optimize for ChatGPT,” another conversation is happening among the teams that have spent the last decade building machine-readable infrastructure.
That conversation is about architecture. Specifically, it is about what a complete enterprise data stack looks like in a world where AI systems do more than crawl and index content. They retrieve from it, reason over it, and increasingly act on behalf of users as the agentic web continues to emerge.
The foundations of this stack already exist. Some layers have been operational for years, while others are emerging quickly. The real question is whether organizations will build this infrastructure intentionally or rely on fragmented representations of their business assembled by external systems.
This article maps the stack from the structured data foundations most teams already know, through the Content Knowledge Graph and governance layers that create durable machine understanding, to the vector, retrieval, and action layers shaping the emerging agentic web.

The foundation: Schema Markup as the ingestion layer
Every story about machine-readable web infrastructure starts in the same place: JSON-LD and Schema.org.
Schema.org gave the web a shared vocabulary for describing entities and relationships. JSON-LD made those descriptions easier to implement and maintain. The introduction of stable identifiers via @id allowed entities to persist across pages and the web, rather than existing as isolated snippets of markup.
For the first time, organizations could explicitly describe meaning in a machine-readable way:
- This is a Product
- It has this name
- It costs this much
- It is offered by this Organization
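As a minimal sketch of what those four statements look like in practice, the snippet below builds the JSON-LD as Python dictionaries and serializes it. The `example.com` identifiers, the product name, and the price are placeholders invented for illustration.

```python
import json

# Hypothetical identifiers on example.com; swap in your own canonical URLs.
ORG_ID = "https://example.com/#organization"
PRODUCT_ID = "https://example.com/products/widget#product"

product = {
    "@context": "https://schema.org",
    "@type": "Product",                    # This is a Product
    "@id": PRODUCT_ID,                     # stable identifier, reusable across pages
    "name": "Example Widget",              # It has this name
    "offers": {
        "@type": "Offer",
        "price": "49.00",                  # It costs this much
        "priceCurrency": "USD",
        "offeredBy": {"@id": ORG_ID},      # It is offered by this Organization
    },
}

organization = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "@id": ORG_ID,
    "name": "Example Co",
}

# Serialize for embedding in a <script type="application/ld+json"> tag.
json_ld = json.dumps([product, organization], indent=2)
```

Because the offer points at the organization by `@id` rather than repeating its details, any other page can reference the same organization without redefining it.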
The industry adopted this enthusiastically, and the benefits were real: rich results, Knowledge Panel eligibility, and improved crawling and indexing. For many organizations, implementing Schema Markup on key templates was the first time they had thought systematically about how machines understood their content.
But most organizations stopped at the markup layer.
The majority of Schema Markup implementations today are still page-level. The JSON-LD validates correctly, the required properties are present, and the markup is attached to pages as part of the publishing workflow.
That is still valuable work. But Schema Markup was never intended to be the entire strategy. It is the ingestion layer of a much larger architecture. It is the bridge between human-readable content and machine-readable understanding.
The shift: websites as semantic data layers
The meaningful leap beyond page-level markup comes from building a structured entity layer, or “Content Knowledge Graph,” underneath the website itself.
In a Content Knowledge Graph, entities exist independently from individual pages. Products, services, people, locations, and brand concepts become persistent objects connected through explicit relationships rather than isolated page content. This changes how machines understand (and display) your business.
Cross-page consistency becomes structural instead of manual, and relationships between entities become queryable. The same governed entity layer can simultaneously support structured data, APIs, AI retrieval systems, internal applications, and future AI experiences.
This is what it means to treat your web presence as a graph instead of a collection of pages.
We built our platform around this principle from the beginning. Every implementation becomes a named graph with persistent entities, explicit relationships, and a machine-readable data layer that exists independently from the page layer.
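To make the distinction concrete, here is a small illustrative sketch of an entity layer that lives apart from the page layer. It is not how any particular platform is implemented; the dictionaries, URLs, and helper names are assumptions chosen for the example.

```python
# Entities are stored once, keyed by a stable @id (hypothetical example.com URLs).
ENTITIES = {
    "https://example.com/#org": {"@type": "Organization", "name": "Example Co"},
    "https://example.com/#widget": {
        "@type": "Product",
        "name": "Example Widget",
        "brand": {"@id": "https://example.com/#org"},
    },
}

# Pages only reference entity ids; they never carry their own copy of the facts.
PAGES = {
    "/products/widget": ["https://example.com/#widget", "https://example.com/#org"],
    "/about": ["https://example.com/#org"],
}


def page_json_ld(path: str) -> dict:
    """Assemble a page's JSON-LD from the shared entity layer."""
    graph = [{"@id": eid, **ENTITIES[eid]} for eid in PAGES[path]]
    return {"@context": "https://schema.org", "@graph": graph}


def pages_mentioning(entity_id: str) -> list:
    """Relationships are queryable: which pages reference this entity?"""
    return sorted(path for path, ids in PAGES.items() if entity_id in ids)
```

Changing the organization's name in `ENTITIES` updates every page that references it, which is the sense in which cross-page consistency becomes structural instead of manual.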
The gap between “we implemented Schema Markup” and “we operate a Content Knowledge Graph” is still large for most organizations. But that gap increasingly determines whether AI systems can deeply understand your business or simply parse fragments of it.
The governance layer: validating models, not just syntax
There is a major difference between validating that JSON-LD is syntactically correct and validating that the data accurately represents the business. Most tools today focus on syntax validation. They confirm that the required properties exist, flag formatting issues, and check compliance with structured data guidelines.
That is necessary, but it is only the starting point.
Governance operates at the model level. It validates whether entities are represented consistently, whether relationships are accurate, and whether the business rules behind the data are being enforced across systems.
For example:
- Is the product connected to the correct organization?
- Is approved terminology being used consistently?
- Are controlled vocabularies being enforced?
- Has a content update created drift between the visible page and the structured entity layer?
In RDF-based systems, SHACL provides a formal way to validate these constraints. But the larger principle applies regardless of technology: governance is about maintaining data quality and consistency over time.
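The sketch below shows the idea of model-level validation using plain Python checks rather than SHACL itself. In an RDF system the same rules would normally be written as SHACL shapes; the entity records, the `offeredBy` link, and the controlled vocabulary here are hypothetical.

```python
# Hypothetical controlled vocabulary approved by the business.
APPROVED_CATEGORIES = {"Retention Tools", "Analytics"}

# Hypothetical entity records; prod:2 deliberately breaks both rules.
entities = {
    "org:1": {"type": "Organization", "name": "Example Co"},
    "prod:1": {"type": "Product", "name": "Widget",
               "category": "Retention Tools", "offeredBy": "org:1"},
    "prod:2": {"type": "Product", "name": "Gadget",
               "category": "Churn Stuff", "offeredBy": "org:9"},
}


def validate(entities: dict) -> list:
    """Return human-readable violations of two model-level rules."""
    violations = []
    for eid, entity in entities.items():
        if entity["type"] != "Product":
            continue
        # Rule 1: every product must link to an Organization that exists.
        target = entities.get(entity.get("offeredBy"))
        if target is None or target["type"] != "Organization":
            violations.append(f"{eid}: offeredBy must reference a known Organization")
        # Rule 2: category must come from the controlled vocabulary.
        if entity.get("category") not in APPROVED_CATEGORIES:
            violations.append(
                f"{eid}: category {entity.get('category')!r} is not in the controlled vocabulary"
            )
    return violations
```

Both records would pass a syntax check. Only the model-level rules reveal that `prod:2` points at an organization that does not exist and uses unapproved terminology.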
This becomes especially important as organizations scale. Schema drift happens continuously: products evolve, content changes, and teams update pages without updating the underlying entity layer. Without governance, inconsistencies accumulate quietly until AI systems begin surfacing conflicting information back to users. This is real brand risk, especially in regulated industries like healthcare and finance.
Mature governance is an observability discipline. It continuously monitors for drift among source systems, the entity graph, and the surfaces consumers actually read, and alerts when claims about the same entity no longer agree. The goal is to detect inconsistency before a model surfaces it as an answer, not after.
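A drift check of this kind can be sketched in a few lines. The example assumes you can already extract comparable claims about one entity from each surface; the surface names and values are invented, and a production monitor would add scheduling and alerting around it.

```python
def find_drift(claims_by_surface: dict) -> dict:
    """Return each property whose value differs between surfaces."""
    drift = {}
    # Collect every property name seen on any surface.
    properties = set().union(*claims_by_surface.values())
    for prop in sorted(properties):
        values = {
            surface: claims.get(prop)
            for surface, claims in claims_by_surface.items()
        }
        if len(set(values.values())) > 1:
            drift[prop] = values
    return drift


# Hypothetical claims about one product from three surfaces.
claims = {
    "source_system": {"price": "49.00", "name": "Example Widget"},
    "entity_graph": {"price": "49.00", "name": "Example Widget"},
    "visible_page": {"price": "59.00", "name": "Example Widget"},
}
drift = find_drift(claims)
```

Here `drift` contains only `price`, with the value reported by each surface, which is the signal a team would want before a model repeats the wrong number.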
Entity resolution: connecting the same entity, everywhere
Entity resolution is one of the most important layers in the stack.
Externally, it connects your entities to public knowledge sources like Wikidata or Google’s Knowledge Graph so machines can recognize they refer to the same real-world thing.
Internally, it ensures the same entity remains consistent across your own ecosystem. For example, the same physician appearing across a healthcare provider directory, location pages, and the main website should be recognized as a single entity, not as fragmented versions of the same person.
This matters because AI systems constantly compare information across sources. When entities appear inconsistent across your own properties, it becomes a trust issue.
On our own site, implementing robust entity linking by connecting our entities to external knowledge bases and improving internal coherence increased AI Overview visibility by 19.72%. That is the practical impact of entity resolution done well: a measurable expansion in the surfaces where your brand is cited.
Most organizations think about entity linking externally. The harder challenge is maintaining coherence internally at enterprise scale.
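As a simplified illustration of internal resolution, the sketch below merges records that share a strong identifier and attaches external links to the merged entity. It assumes a reliable shared key exists (a made-up `npi` field here); real resolution often needs fuzzy matching and human review. All records, URLs, and the Wikidata placeholder are hypothetical.

```python
from collections import defaultdict

# Hypothetical records for physicians from three internal systems.
records = [
    {"source": "provider_directory", "npi": "1234567890", "name": "Dr. Jane Q. Smith"},
    {"source": "location_page", "npi": "1234567890", "name": "Jane Smith, MD"},
    {"source": "main_site", "npi": "1234567890", "name": "Jane Smith"},
    {"source": "main_site", "npi": "0987654321", "name": "John Doe"},
]

# Placeholder external link; real values would come from reviewed matches.
SAME_AS = {"1234567890": ["https://www.wikidata.org/entity/Q_PLACEHOLDER"]}


def resolve(records: list) -> dict:
    """Group records that share an identifier into one entity each."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["npi"]].append(record)

    entities = {}
    for npi, group in grouped.items():
        entity_id = f"https://example.com/physicians/{npi}#person"
        entities[entity_id] = {
            "id": entity_id,
            # Naive choice for illustration: keep the longest name variant.
            "name": max((r["name"] for r in group), key=len),
            "same_as": SAME_AS.get(npi, []),
            "sources": sorted({r["source"] for r in group}),
        }
    return entities


resolved = resolve(records)
```

Four records collapse into two entities, and every system that mentions the first physician now resolves to the same identifier.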
See how Entity Hub helps organizations manage entity relationships and governed data at enterprise scale.
When graphs alone reach their limits
A Content Knowledge Graph built on curated entities, explicit relationships, governance, and entity resolution is powerful. It provides precision, authority, and a structural model of your business that no amount of unstructured content can replicate.
But it is not the whole picture.
Graphs excel at representing what is known and structured. They are less effective at handling the messy, natural-language reality of how people ask questions (especially within AI interfaces), how content relates to content semantically (not just structurally), and how agents need to interact with your brand to complete tasks.
The limitations are specific:
- Semantic similarity at the phrase level. A graph knows that Product X and Product Y are related by an explicit relationship. It does not inherently know that a blog post about “reducing customer churn” is semantically relevant to a product page about “retention tools”, even though a human would make that connection immediately.
- Natural-language interfaces. Users and agents increasingly interact through conversation rather than structured queries. A graph provides the facts; it does not natively support the conversational layer that makes those facts accessible.
- Task completion. In the emerging agentic web, machines do not just look up information; they act: scheduling, quoting, comparing, purchasing. A graph provides the entities, but the action layer is something else.
That is why the modern AI stack is layered rather than graph-only.
The modern stack: complementary layers working together
The architecture supporting the agentic web is layered, with each layer contributing a distinct capability.
Duane Forrester recently mapped a four-layer machine-readable content stack as the architecture that comes after llms.txt, and he is right about every layer in it. Those four layers map closely to ingestion, graph, governance, and entity resolution, the foundation our customers have been operating for years. The retrieval patterns, conversational interfaces, and action layers that sit above them are where the agentic web actually happens, and they are the focus of the rest of this article.
Vector layer: semantic retrieval and similarity
Embeddings (dense numerical representations of text, entities, and concepts) provide what graphs do not: the ability to find content that is semantically similar even when there is no explicit structural relationship.
This matters for retrieval. When a user asks, “How do I reduce churn for enterprise accounts?”, the vector layer finds the content on your site that is most relevant to that question, even if the page never uses the word “churn.” The graph then provides the precise facts about the products, services, and experts referenced in that content.
The combination is more powerful than either alone. Graphs provide precision. Vectors provide recall. Together, they ensure that the right content is found and the right facts are grounded.
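The division of labor can be sketched as follows: cosine similarity over embeddings finds the relevant page, then the graph supplies the facts for the entities that page mentions. The three-dimensional vectors are hand-written toy values standing in for the output of a real embedding model, and the page paths and facts are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Toy "embeddings", hand-written for illustration only.
DOCS = {
    "/blog/keeping-enterprise-customers": {
        "vec": [0.9, 0.1, 0.0],
        "mentions": ["prod:retention"],
    },
    "/blog/office-move": {"vec": [0.0, 0.2, 0.9], "mentions": []},
}

# Governed facts from the graph, keyed by entity id.
FACTS = {"prod:retention": {"name": "Retention Tools", "type": "Product"}}


def retrieve(query_vec, k=1):
    """Vector step finds pages; graph step attaches facts for mentioned entities."""
    ranked = sorted(
        DOCS, key=lambda path: cosine(query_vec, DOCS[path]["vec"]), reverse=True
    )
    return [
        {"page": path, "facts": [FACTS[e] for e in DOCS[path]["mentions"]]}
        for path in ranked[:k]
    ]


# Stand-in for the embedded question about reducing churn.
query_vec = [0.8, 0.2, 0.1]
results = retrieve(query_vec)
```

The top result is the customer-retention post even though nothing in the lookup depends on a shared keyword, and the returned facts come from the governed graph rather than from the page text.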
Analytics on graph-shaped data
Once your business is modelled as a graph, the same algorithms that power search and recommendation systems become available to you internally. Modern graph platforms like Amazon Neptune Analytics expose them as managed primitives, so the question shifts from “can we run this” to “what should we ask of our own graph.”
Two algorithms in particular are useful for content, SEO, and brand teams.
- PageRank, the algorithm that built modern search, ranks entities by the structural importance of the relationships pointing to them. Run over your Content Knowledge Graph, it tells you which entities in your business carry the most representational weight, and therefore which ones must be most accurate, most up to date, and most consistently described across the surfaces an AI system might cite. The entities at the top of that ranking are the ones whose facts you cannot afford to get wrong.
- Louvain community detection finds clusters of densely connected entities. Run over your graph, it surfaces the natural topic pillars in your business, often with a precision that human content audits miss. Large central clusters are the topics you already cover with authority. Small or isolated clusters are the gaps that show up when an AI system is asked to summarize what you do.
These are not new algorithms. What changes when you run them over a governed Content Knowledge Graph is the question they answer: not “which page is underperforming,” but “which entity in our business is structurally invisible to the systems we want to be cited by.” That shift in the question is the difference between a publishing asset and an intelligence asset. Personalized PageRank, seeded on a strategic topic, sharpens it further by ranking entities relative to that topic rather than the graph as a whole.
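For readers who want to see the mechanics, here is a plain power-iteration sketch of PageRank with optional seeding for the personalized variant. It does not use Neptune Analytics or any graph platform API, and Louvain is omitted for brevity. The edge list, entity names, and the `pagerank` helper are hypothetical.

```python
def pagerank(edges, damping=0.85, iterations=50, seeds=None):
    """Power-iteration PageRank over (source, target) edges.

    Passing `seeds` restricts teleportation to those nodes,
    which gives Personalized PageRank.
    """
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    seeds = list(seeds) if seeds else nodes
    teleport = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(teleport)

    for _ in range(iterations):
        # Rank held by nodes with no outgoing links is redistributed via teleport.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        new_rank = {
            n: (1 - damping + damping * dangling) * teleport[n] for n in nodes
        }
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank


# Hypothetical entity graph: an edge means "source references target".
EDGES = [
    ("product:widget", "org:example"),
    ("product:gadget", "org:example"),
    ("person:jane", "org:example"),
    ("article:churn", "product:widget"),
    ("article:churn", "person:jane"),
]

ranks = pagerank(EDGES)
top_entity = max(ranks, key=ranks.get)

# Personalized variant: importance as seen from one strategic topic.
seeded = pagerank(EDGES, seeds=["article:churn"])
```

In the unseeded run the organization ranks highest because every other entity points to it. In the seeded run, `product:gadget` receives no rank at all, since nothing reachable from the churn article references it. That is the "structurally invisible" signal in miniature.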
AI retrieval and tool protocols
LLMs need three things to produce accurate, grounded answers: context, structure, and trust signals.
Retrieval-Augmented Generation (RAG) is the dominant pattern for providing context, fetching relevant information from a knowledge base and injecting it into the model’s prompt.
Graph-aware RAG is significantly more effective than naive text-chunk retrieval, because it provides structured context: not just a passage of text, but entities, relationships, and provenance that the model can reason over.
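One way to picture structured context is as a set of facts with provenance attached, formatted into the prompt ahead of the question. The sketch below is an assumption-laden illustration: the triples, source URLs, and prompt wording are invented, and no model is actually called.

```python
# Hypothetical triples with provenance: (subject, predicate, object, source URL).
TRIPLES = [
    ("Retention Tools", "offeredBy", "Example Co",
     "https://example.com/products/retention"),
    ("Retention Tools", "category", "Customer Success",
     "https://example.com/products/retention"),
    ("Jane Smith", "worksFor", "Example Co",
     "https://example.com/team/jane"),
]


def build_context(entity: str) -> str:
    """Format every triple that mentions the entity as a line of grounded context."""
    lines = [
        f"- {s} {p} {o} (source: {src})"
        for s, p, o, src in TRIPLES
        if entity in (s, o)
    ]
    return "Known facts:\n" + "\n".join(lines)


def build_prompt(question: str, entity: str) -> str:
    """Place the structured context ahead of the user's question."""
    return (
        f"{build_context(entity)}\n\n"
        "Answer using only the facts above.\n"
        f"Question: {question}"
    )


prompt = build_prompt("Who offers Retention Tools?", "Retention Tools")
```

Compared with pasting in a paragraph of page copy, the model receives explicit relationships and a source for each one, and unrelated entities never enter the prompt.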
Tool protocols like the Model Context Protocol (MCP) extend this further by providing AI systems with a structured interface to your data that describes what data is available, how to request it, and the format of the response.
This is how your Content Knowledge Graph becomes callable by AI agents, not just indexable by crawlers.
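To give a feel for what "callable" means, the sketch below pairs a tool descriptor with a handler. The descriptor follows the general shape MCP uses when listing tools (a name, a description, and a JSON Schema for the input). The tool name, the in-memory graph, and the handler are hypothetical, and no MCP SDK is involved.

```python
import json

# Descriptor shaped like an MCP tool listing entry. The tool itself is hypothetical.
GET_ENTITY_TOOL = {
    "name": "get_entity",
    "description": "Look up a governed entity by its @id and return its facts.",
    "inputSchema": {
        "type": "object",
        "properties": {"id": {"type": "string", "description": "Entity @id"}},
        "required": ["id"],
    },
}

# Stand-in for the Content Knowledge Graph.
GRAPH = {
    "https://example.com/#widget": {"@type": "Product", "name": "Example Widget"},
}


def call_get_entity(arguments: dict) -> str:
    """Hypothetical handler: check required arguments, then return JSON text."""
    required = GET_ENTITY_TOOL["inputSchema"]["required"]
    missing = [key for key in required if key not in arguments]
    if missing:
        return json.dumps({"error": f"missing arguments: {missing}"})
    entity = GRAPH.get(arguments["id"])
    if entity is None:
        return json.dumps({"error": "unknown entity"})
    return json.dumps({"@id": arguments["id"], **entity})
```

The descriptor tells an agent what it can ask for and how; the handler answers from governed data instead of leaving the agent to infer facts from crawled pages.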
Conversational interfaces over governed data
Natural-language interfaces (chatbots, conversational search, in-product copilots) are the front end of the agentic web. They are also the layer where governance pays off most visibly.
A conversational interface is only as good as the data it draws from. If the underlying Content Knowledge Graph is inconsistent or ungoverned, the conversational layer will confidently deliver wrong answers that users experience as hallucinations. Users tend to trust a natural-language answer more than a raw data display, which makes ungoverned chat worse than no chat at all.
The flip side is the opportunity. Open standards like MCP and NLWeb are emerging to let any organization expose their Content Knowledge Graph as a conversational interface that AI systems and agents can query directly. NLWeb-style interfaces turn your governed entity layer into a callable Q&A surface, with provenance preserved end to end.
The brands that publish their own conversational layer over governed data control how they are represented. The brands that do not are represented by whatever model someone else trained.
The action layer: from answers to execution
This is where the agentic web becomes tangible. Historically, websites helped users discover information, then pushed them into separate workflows to complete tasks like booking, purchasing, or requesting a quote.
Now those workflows are becoming machine-readable.
AI agents will not just discover that your organization offers consultations. They will understand how to schedule them, what information is required, and what constraints apply.
Schema.org already supports this direction through Action types like ScheduleAction, QuoteAction, and SearchAction.
We describe these machine-readable workflows as Agentic Entry Points: structured service definitions that tell AI systems what your business can do. Organizations that become callable by AI agents will have a structural advantage over organizations that remain informational only.
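As one possible shape for such a definition, the sketch below attaches a `ScheduleAction` to a service using Schema.org's `potentialAction` and `EntryPoint` vocabulary. The URLs, service name, and booking endpoint are hypothetical, and whether a given agent consumes this markup depends on that agent.

```python
import json

# Hypothetical service with a machine-readable scheduling action.
service = {
    "@context": "https://schema.org",
    "@type": "Service",
    "@id": "https://example.com/#consultation",
    "name": "Initial Consultation",
    "provider": {"@id": "https://example.com/#organization"},
    "potentialAction": {
        "@type": "ScheduleAction",
        "name": "Book a consultation",
        # EntryPoint describes where and how the action can be invoked.
        "target": {
            "@type": "EntryPoint",
            "urlTemplate": "https://example.com/api/bookings",
            "httpMethod": "POST",
            "contentType": "application/json",
        },
    },
}

service_json_ld = json.dumps(service, indent=2)
```

The service is still described as an entity, but it now also states what can be done with it and which endpoint performs the task.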
Callable does not mean open. Production action layers carry the same authentication, rate-limiting, and audit discipline as any enterprise API. The governance layer extends to who is calling and what they are entitled to do, not only to the data the response contains.
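A minimal sketch of that discipline might look like the following. It assumes authentication has already happened upstream and that `caller` is the authenticated identity. The entitlements, limits, and in-memory stores are hypothetical stand-ins for whatever your API gateway provides.

```python
import time
from typing import Optional

# Hypothetical entitlements: which authenticated callers may perform which actions.
ENTITLEMENTS = {"agent-a": {"schedule_consultation"}}
RATE_LIMIT = 2           # calls allowed per window, per caller
WINDOW_SECONDS = 60.0

_calls = {}              # caller -> timestamps of recent allowed calls
AUDIT_LOG = []           # every decision is recorded, including refusals


def authorize(caller: str, action: str, now: Optional[float] = None) -> str:
    """Return 'allowed', 'forbidden', or 'rate_limited', and record the decision."""
    now = time.time() if now is None else now
    if action not in ENTITLEMENTS.get(caller, set()):
        decision = "forbidden"
    else:
        # Sliding window: keep only calls made within the last WINDOW_SECONDS.
        recent = [t for t in _calls.get(caller, []) if now - t < WINDOW_SECONDS]
        if len(recent) >= RATE_LIMIT:
            decision = "rate_limited"
        else:
            recent.append(now)
            decision = "allowed"
        _calls[caller] = recent
    AUDIT_LOG.append(
        {"caller": caller, "action": action, "decision": decision, "at": now}
    )
    return decision
```

An entitled caller is allowed until it exceeds the limit for the window, an unknown caller is refused outright, and both outcomes land in the audit log, so governance covers who is calling as well as what the response contains.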
The brands that win this surface will be the ones that publish Agentic Entry Points the same way they publish APIs today: with explicit contracts, observability, and access control built in from the start.

Why this changes the future of SEO
The transition from traditional SEO to AI-driven discovery changes what organizations are optimizing for.
| Era | What machines do | What brands optimize for |
|---|---|---|
| Classic SEO | Rank pages | Visibility in search results |
| AI discovery | Assemble grounded answers | Being cited and trusted |
| Agentic web | Complete tasks on behalf of users | Being callable by agents |
This does not make SEO irrelevant. The teams that have spent years structuring meaning, modelling entities, and governing machine-readable data are uniquely positioned to lead this transition because the underlying principles remain the same. The systems around them are becoming more capable.
What the market is still missing
For all the energy in “AI readiness” conversations, most organizations are over-invested in content generation and surface-level format optimization, and under-invested in the infrastructure underneath.
They produce more content without an entity model to make it consistent, and chase new formats without governance to make the data accurate. They talk about agents without the structured, machine-readable service layer that agents require.
But you cannot prompt your way out of bad data.
If your Content Knowledge Graph is stale, your retrieval returns stale context. If your entities are inconsistent, your conversational interface generates inconsistent answers. If your action layer is not wired to governed data, your agent interactions are unreliable. Each layer in the stack amplifies the layer below it. That is the upside when the foundation is sound, and the risk when it is not.
What forward-thinking teams should do now
The full stack described here is not something you build in a quarter. It is a direction, and the practical question is where to start.
- If you have not yet built a Content Knowledge Graph: Start there. Model your core entities and relationships. Establish governance. This is the foundation on which everything else depends.
- If you have a Content Knowledge Graph but limited retrieval: Explore how your entity data can serve as structured context for AI systems, through APIs, through tool protocols, or through vector-augmented retrieval patterns.
- If you are thinking about the agentic web: Identify one or two high-value, repeatable tasks that an agent should be able to complete with your brand. Define them as machine-readable actions and publish your first Agentic Entry Point. Being callable once is more valuable than planning to be callable broadly.
In every case, treat the machine-readable layer as a product: with owners, a roadmap, quality standards, and operational discipline.
Build every layer in this stack correctly and you can still publish a contradiction in plain English. The graph can be right while the paragraph above it is wrong, and AI systems will read that as a brand that does not agree with itself.
If you are exploring this architecture, the next step is implementation: entity-first design, governance, and retrieval systems grounded in trusted data. We have been building these systems for more than a decade. If you are looking to build a stronger foundation for AI search and the agentic web, get in touch with us.

