
How Enterprises Can Maintain Data Ownership in the Age of AI

Reading Time: 5 minutes

Key Takeaways

  • Data ownership drives AI independence and long-term flexibility for enterprises
  • Closed ecosystems increase costs and limit innovation within your organization
  • Open standards enable scalable, future-ready AI strategies

As organizations race to adopt AI, an important strategic question is emerging:

Will enterprises control their data and intelligence, or will they depend on a handful of proprietary platforms to access them?

The way companies structure, store, and expose their data today will shape how independent their AI capabilities are tomorrow. Organizations that maintain control over their data can integrate new technologies and build flexible AI strategies. Those that rely on closed systems risk becoming dependent on a single vendor ecosystem.

The Risk of Closed Data Ecosystems

Venture capitalist Bill Gurley has warned about the growing risk of “closed data” vendors. These vendors restrict how customer data can be used, often limiting it to their own AI models or platforms.

When data is locked inside a closed ecosystem, organizations lose flexibility. It becomes harder to integrate new tools, adopt specialized solutions, or use proprietary data to train independent AI systems.

Gurley argues that long-term innovation depends on interoperability and portability. Instead of locking data into a single platform, companies should focus on connecting specialized tools and workflows around their own unique datasets.

Salesforce and the API Tax

One example often cited in this debate is Salesforce. While cloud platforms originally promised easier access to data, some vendors now charge significant fees for customers to retrieve or move their own data through APIs.

This practice creates what many describe as a “digital toll road.” If a vendor charges you to access the data your organization created, it becomes harder to move that data into new systems, train independent AI models, or integrate best-of-breed tools.

Over time, these costs and restrictions can push companies to remain within a single ecosystem, even when better technologies exist elsewhere. Gurley sees these closed environments as a direct risk to innovation.

The FAIR Framework: The Standard for Data Excellence

To avoid these limitations, many organizations are adopting the FAIR data principles. Established in 2016, these guidelines provide a measurable roadmap for ensuring data is optimized for both human and machine intelligence:

  • Findable: Data must have unique, persistent identifiers (such as DOIs or IRIs) and rich metadata so it can be easily found by AI agents.
  • Accessible: Data should be retrievable via standardized, open communication protocols (like HTTP or SPARQL).
  • Interoperable: Data must use a shared, broadly applicable language for knowledge representation—this is where RDF and Schema.org become critical.
  • Reusable: Data must have clear provenance and usage licenses so it can be combined and utilized in new settings without legal or technical friction.

It is important to note that FAIR does not mean public. Data can remain private and secure while still being structured in a way that allows internal teams and AI systems to use it effectively.
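To make this concrete, here is a minimal sketch of what a FAIR-aligned data record can look like, using the open-source rdflib library in Python. The dataset, IRIs, and license URL are illustrative assumptions rather than a prescribed setup; the point is that the record carries a persistent identifier (Findable), an HTTP-retrievable location (Accessible), Schema.org vocabulary (Interoperable), and an explicit license and creator (Reusable).

  from rdflib import Graph, Literal, Namespace, URIRef
  from rdflib.namespace import RDF

  SCHEMA = Namespace("https://schema.org/")

  g = Graph()
  g.bind("schema", SCHEMA)

  # Findable: a persistent, globally unique identifier (hypothetical IRI)
  dataset = URIRef("https://example.com/id/dataset/product-catalog")

  g.add((dataset, RDF.type, SCHEMA.Dataset))
  g.add((dataset, SCHEMA.name, Literal("Product Catalog Knowledge Graph")))
  g.add((dataset, SCHEMA.description, Literal("Internal catalog data described with Schema.org terms.")))

  # Accessible: retrievable over a standard protocol (HTTP), even behind authentication
  g.add((dataset, SCHEMA.contentUrl, URIRef("https://example.com/data/product-catalog.ttl")))

  # Reusable: explicit license and provenance
  g.add((dataset, SCHEMA.license, URIRef("https://example.com/licenses/internal-use")))
  g.add((dataset, SCHEMA.creator, URIRef("https://example.com/id/org/acme")))

  # Interoperable: serialized in an open W3C format that any RDF tool can read
  print(g.serialize(format="turtle"))

Nothing about this structure requires the data to be public; the same description can live entirely behind your firewall and still be usable by internal teams and AI systems.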

Learn what marketers must do to prepare their data for the Agentic Web.

The Technical Foundation: The Semantic Web Stack

The open approach Gurley advocates is not only a business philosophy. It is supported by a technical architecture based on W3C standards, often referred to as the “Semantic Web Stack.”

These technologies allow organizations to transform raw data into a structured Knowledge Graph that remains portable and vendor independent:

  • RDF (Resource Description Framework): A graph-based data model using global identifiers (IRIs). It is “open-ended by nature,” allowing new relationships and data to be added without breaking existing structures. Richer ontologies (such as OWL, when formal reasoning is needed) can sit on top of this graph model.
  • Schema.org: A shared vocabulary understood by search engines and AI. When combined with RDF, it creates “universal machine readability”.
  • SPARQL: A powerful query language that allows for distributed queries across different data sources, ensuring you aren’t trapped in a single proprietary silo.
  • SHACL (Shapes Constraint Language): A W3C standard for describing shapes and constraints on your RDF data so you can validate that graphs match your business rules. SHACL is useful for governance, quality, and keeping agent-facing data consistent without locking it in a proprietary schema tool.
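The sketch below shows how these layers fit together, using the open-source rdflib and pySHACL libraries in Python. The product data, IRIs, and the validation shape are illustrative assumptions: JSON-LD carries the Schema.org description, SPARQL queries the resulting RDF graph, and SHACL checks it against a simple business rule.

  import json
  from rdflib import Graph
  from pyshacl import validate

  # Schema.org data expressed as JSON-LD (the serialization), backed by RDF (the model)
  jsonld_doc = json.dumps({
      "@context": {"@vocab": "https://schema.org/"},
      "@id": "https://example.com/id/product/42",
      "@type": "Product",
      "name": "Trail Running Shoe",
      "brand": {"@id": "https://example.com/id/brand/acme", "@type": "Brand", "name": "Acme"},
  })

  g = Graph()
  g.parse(data=jsonld_doc, format="json-ld")

  # SPARQL: query the graph with an open, standard language
  results = g.query("""
      PREFIX schema: <https://schema.org/>
      SELECT ?product ?name WHERE {
          ?product a schema:Product ;
                   schema:name ?name .
      }
  """)
  for row in results:
      print(row.product, row.name)

  # SHACL: validate the graph against a business rule (every Product needs a name)
  shapes_ttl = """
      @prefix sh:     <http://www.w3.org/ns/shacl#> .
      @prefix schema: <https://schema.org/> .
      @prefix ex:     <https://example.com/shapes/> .

      ex:ProductShape a sh:NodeShape ;
          sh:targetClass schema:Product ;
          sh:property [ sh:path schema:name ; sh:minCount 1 ] .
  """
  shapes = Graph().parse(data=shapes_ttl, format="turtle")

  conforms, _, report = validate(g, shacl_graph=shapes)
  print("Conforms to business rules:", conforms)

Because every layer is a W3C standard, the same graph, query, and shape can move between tools and vendors without being rewritten.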

R.V. Guha: An Architect of the Open Web

R.V. Guha spent decades pushing the web toward structured, portable data.

At Apple, he built the Meta Content Framework (MCF); at Netscape, he worked with Tim Bray to turn MCF into an XML-based framework that fed the W3C’s Resource Description Framework. Marc Andreessen, writing in 1999, likened Guha’s early RDF work to Aldus Manutius making books pocket-sized, as RDF was a step toward machine-usable structure across devices, not just more pages.

Guha was an acknowledged contributor to RDF and, with Dan Brickley, co-editor of RDF Schema (RDFS), the 2004 W3C Recommendation that defines the vocabulary layer on top of RDF’s graph model.

A Legacy of Interoperability

Guha’s prior projects form the bedrock of the open web:

  • RSS (originally RDF Site Summary): Guha created the first version of RSS at Netscape, enabling a “programmable web” where content could flow freely across sites without being trapped by a single gatekeeper.
  • RDF (Resource Description Framework): As a primary architect of RDF, he helped establish the graph-based “web of data” model: identifiers and relationships that make enterprise knowledge portable across tools and vendors.
  • Schema.org: As a founder of Schema.org, he helped define a shared vocabulary that major search engines and many AI systems already use as a bridge between published content and machine understanding.

These standards now play a central role in how search engines and AI systems interpret information on the web.

NLWeb and the Future of AI Access

Guha’s latest initiative, NLWeb (Natural Language Web), represents the next evolution of this “open” philosophy in the age of AI. Unlike assistants such as ChatGPT, which typically act as external “black boxes” that crawl a site and guess at what it means, NLWeb puts power back into the hands of the website owner:

  • Data Sovereignty: NLWeb allows publishers to create conversational interfaces directly on their own sites, using their own data without surrendering it to a central model provider.
  • Standardized Handshake: Every NLWeb-enabled site functions as a Model Context Protocol (MCP) server. This allows any trusted AI agent to query the site’s data accurately, rather than “scraping and guessing”.
  • Future-Proofing: By combining NLWeb with MCP, you ensure agents receive governed, accurate data, effectively “agent-proofing” your content for the next generation of AI discovery.
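As a rough illustration of the difference between scraping and querying, here is a sketch of an agent-side request to an NLWeb-enabled site in Python. It assumes a hypothetical deployment at example.com; the endpoint path, parameter names, and response shape are illustrative only, not a definitive description of the NLWeb or MCP specifications.

  import requests

  # Ask the site a natural-language question instead of scraping its HTML
  resp = requests.get(
      "https://example.com/ask",  # hypothetical NLWeb endpoint
      params={"query": "Which trail running shoes are waterproof?"},
      timeout=10,
  )
  resp.raise_for_status()

  # An NLWeb-style response describes results with Schema.org vocabulary,
  # so the agent reads typed items rather than guessing from markup.
  for item in resp.json().get("results", []):
      print(item.get("@type"), "|", item.get("name"), "|", item.get("url"))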

How Schema App Supports an Open, AI-Ready Data Strategy

Schema App enables enterprises to build and govern their Content Knowledge Graph using open semantic web standards like Schema.org and RDF, with support for emerging open-source protocols like NLWeb. This ensures your data remains portable, interoperable, and fully under your control.

Unlike closed platforms, Schema App is built on an open architecture. You are not locked into a proprietary system or forced to pay to access your own data. Instead, your Knowledge Graph becomes a reusable, governed data layer that can support search, analytics, and AI use cases across your organization.

By structuring your data in this way, you are not just optimizing for today’s search engines. You are creating a foundation that allows AI agents, applications, and future technologies to reliably access and use your data without friction.

Addressing Common Concerns About Data Ownership in Your Partnership With Schema App

Objection Rebuttal & Support
"We fear vendor lock-in." You own the data and the model. Because your Schema App Knowledge Graph is built on open W3C standards (RDF + Schema.org), the data is portable and can be exported at any time.
"This sounds expensive and complex." The upfront cost reduces long-term debt. RDF's open-ended nature means new consumers (chatbots, analytics) can reuse the same graph without new integration costs later.
"Why not just use simple JSON?" RDF provides the "meaning". While we at Schema App may use JSON-LD for compatibility, RDF is the model that defines the unambiguous relationships required for AI to function without error.
"Search/AI won't use this." It is the language of Agents. Exposing your graph via standard protocols like MCP makes your data directly callable by AI agents, transforming your site from a destination for people into an API for agents.

Data Ownership Is the Foundation of Enterprise AI

The most resilient enterprises will be those that control and connect their proprietary data across systems.

By building on open standards like RDF and Schema.org, organizations create a portable, interoperable knowledge layer that supports evolving AI use cases without vendor lock-in.

When you own your data, it remains a strategic asset, ready to power innovation instead of being restricted by it.

Mark van Berkel, CTO and Co-founder

Mark van Berkel is the Chief Technology Officer and Co-founder of Schema App. A veteran in semantic technologies, Mark has a Master of Engineering – Industrial Information Engineering from the University of Toronto, where he helped build a semantic technology application for SAP Research Labs. Today, he dedicates his time to developing products and solutions that help enterprise teams structure and connect their data so it is accurately understood by search engines and AI, improving visibility and enabling more effective AI-driven outcomes.