How to Leverage Your Schema.org Knowledge Graph for LLMs Like ChatGPT


It’s no secret that the AI revolution is well underway. According to a report by Accenture, 42% of companies want to make a large investment in ChatGPT in 2023.

Most organizations are trying to stay competitive by embracing the AI changes in the market and identifying ways to leverage “off-the-shelf” Large Language Models (LLMs) to optimize tasks and automate business processes.

However, as the adoption of generative AI accelerates, companies will need to fine-tune their LLMs using their own data sets to maximize the value of the technology and address their unique needs. Organizations have an opportunity to use their content Knowledge Graphs to accelerate their AI initiatives and get SEO benefits at the same time.

What is an LLM? 

A Large Language Model (LLM) is a type of generative artificial intelligence (AI) that relies on deep learning and massive data sets to understand, summarize, translate, predict and generate new content.

LLMs are most commonly used in natural language processing (NLP) applications like ChatGPT, where users can input a query in natural language and receive a generated response. Businesses can use these LLM-powered tools internally to provide employees with Q&A support or externally to deliver a better customer experience.

Despite the efficiency and benefits they offer, however, LLMs also have their challenges.

LLMs are known for their tendency to ‘hallucinate’, producing erroneous outputs that are not grounded in the training data or that stem from misinterpreting the input prompt. They are expensive to train and run, hard to audit and explain, and often provide inconsistent answers.

Thankfully, you can use knowledge graphs to help mitigate some of these issues and provide structured and reliable information for the LLMs to use.

What is a Knowledge Graph?

Gartner’s “30 Emerging Technologies That Will Guide Your Business Decisions” report, published in February 2024, highlighted Generative AI and Knowledge Graphs as critical emerging technologies companies should invest in within the next 0-1 years. 

A Knowledge Graph is a collection of relationships between things defined using a standardized vocabulary, from which new knowledge can be gained through inferencing. When knowledge is organized in a structured format, it enables efficiencies in the retrieval of information and improves accuracy.

For instance, most organizations have websites that contain extensive information about the business, such as its products and services, locations, blogs, events, case studies, and more. However, the information is unstructured, because it exists as text on the website.

You can use Structured Data, also known as Schema Markup, to describe the content and entities on each page, as well as the relationships between these entities across your site and beyond. Implementing semantic Schema Markup can:

  • Help search engines better understand and contextualize your content, thereby providing users with more relevant results on the SERP
  • Help your organization develop a reusable content knowledge graph. This graph can provide valuable structured information to enhance your business’s capabilities with LLMs.
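
As a minimal sketch of what connected Schema Markup can look like, the Python snippet below builds JSON-LD for a hypothetical physician page and links the physician to an organization by `@id`. The names and `example.com` URLs are illustrative assumptions; the types and properties used (`Physician`, `MedicalOrganization`, `medicalSpecialty`, `openingHours`, `parentOrganization`) come from the Schema.org vocabulary.

```python
import json

# Hypothetical identifiers; any stable URL on your own site would work.
ORG_ID = "https://example.com/#organization"
PHYSICIAN_ID = "https://example.com/physicians/jane-doe#physician"

graph = {
    "@context": "https://schema.org",
    "@graph": [
        {
            "@type": "MedicalOrganization",
            "@id": ORG_ID,
            "name": "Example Health Network",
        },
        {
            "@type": "Physician",
            "@id": PHYSICIAN_ID,
            "name": "Dr. Jane Doe",
            "medicalSpecialty": "https://schema.org/Neurologic",
            "openingHours": "Mo-Fr 08:00-12:00",
            # Reference the organization by @id instead of redefining it,
            # so both pages describe the same entity.
            "parentOrganization": {"@id": ORG_ID},
        },
    ],
}

json_ld = json.dumps(graph, indent=2)
print(json_ld)
```

The `@id` reference is what turns two isolated descriptions into a relationship that a knowledge graph can traverse.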

Learn the fundamentals of Content Knowledge Graphs and actionable steps to develop your own using Schema Markup.

Using an LLM to generate your Schema Markup

To develop your content knowledge graph, you can create Schema Markup to represent your content. One of the new ways SEOs can achieve this is to use an LLM to generate Schema Markup for a page. This sounds great in theory. However, there are several risks and challenges associated with this approach.

One such risk is property hallucination, where the LLM makes up properties that don’t exist in the Schema.org vocabulary. The LLM is also likely unaware of Google’s required and recommended structured data properties, so it will guess at them and jeopardize your chances of achieving a rich result. To overcome this, you need a human to verify the structured data properties generated by the LLM.
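
Part of that verification can be automated before a human reviews the markup. The sketch below flags properties that are not on an allow-list for the markup's type. The allow-list here is a small illustrative subset and an assumption of this example; a real check would load the full Schema.org vocabulary and Google's documented required and recommended properties. The `doctorRating` property is deliberately made up to stand in for a hallucination.

```python
# Illustrative subset only, not the full Schema.org definition of Physician.
KNOWN_PROPERTIES = {
    "Physician": {
        "name", "medicalSpecialty", "openingHours",
        "address", "telephone", "url",
    },
}


def find_unknown_properties(markup: dict) -> list[str]:
    """Return property names that are not on the allow-list for the markup's @type."""
    allowed = KNOWN_PROPERTIES.get(markup.get("@type", ""), set())
    return sorted(
        key for key in markup
        if not key.startswith("@") and key not in allowed
    )


# Hypothetical LLM output for a physician page.
llm_output = {
    "@type": "Physician",
    "name": "Dr. Jane Doe",
    "medicalSpecialty": "https://schema.org/Neurologic",
    "doctorRating": "5 stars",  # made-up property, used here as the hallucination
}

print(find_unknown_properties(llm_output))  # ['doctorRating']
```

A check like this only catches invented property names. It cannot tell whether the values are accurate, so human review is still needed.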

LLMs are good at identifying entities on Wikidata. However, they lack knowledge of entities defined elsewhere on your site. This means the markup an LLM creates will contain duplicate entities that are disconnected across pages on your site, or even within a page, making it even more difficult for you to manage your entities.
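
To illustrate why shared identifiers matter, the sketch below merges JSON-LD nodes from two hypothetical pages. Because both nodes use the same `@id`, they collapse into one entity; without a shared `@id`, the same organization would appear twice. The `merge_by_id` helper and the sample data are assumptions made for this example.

```python
def merge_by_id(nodes: list[dict]) -> dict[str, dict]:
    """Merge JSON-LD nodes that share an @id into a single entity.

    Later nodes overwrite earlier values for the same property.
    """
    merged: dict[str, dict] = {}
    for node in nodes:
        entity = merged.setdefault(node["@id"], {})
        entity.update(node)
    return merged


ORG_ID = "https://example.com/#organization"  # hypothetical identifier

nodes_from_pages = [
    # From the home page
    {"@id": ORG_ID, "@type": "Organization", "name": "Example Health Network"},
    # From the contact page, describing the same entity
    {"@id": ORG_ID, "telephone": "+1-555-0100"},
]

entities = merge_by_id(nodes_from_pages)
print(len(entities))     # 1
print(entities[ORG_ID])
```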

In addition to creating duplicate entities, LLMs cannot manage your Schema Markup at scale. They can only produce static Schema Markup for each page. If you make changes to the content on your site, your Schema Markup will not update dynamically, which results in schema drift.

With all the risks and challenges of this piecemeal approach, the Schema Markup created by the LLM is static and unconnected for each page, so it doesn’t help you develop your content knowledge graph.

Instead, you should create your Schema Markup in a connected, scalable way that updates dynamically. That way, you’ll have an up-to-date knowledge graph that can be used not only for SEO but also to accelerate your AI experiences and initiatives.

Synergy Between Knowledge Graphs and LLMs

There are three main ways of leveraging the content knowledge graph to enhance the capabilities of LLMs for businesses.

  1. Businesses can train their LLMs using their content knowledge graph.
  2. Businesses can use LLMs to query their content knowledge graphs.
  3. Businesses can structure their information in the form of a knowledge graph to help the LLM function more effectively.

Training the LLM Using Your Content Knowledge Graph

For a business to thrive in this technological age, connecting with customers through their preferred channel is crucial. LLM-powered AI experiences that answer questions in an automated, context-aware manner can support multi-channel digital strategies. By leveraging AI to support multiple channels, businesses can serve their customers through their preferred channels without having to hire more employees.

That said, if you want to leverage an AI chatbot to serve your customers, you want it to provide your customers with the right answers at all times. However, LLMs don’t have the ability to perform a fact check. They generate responses based on patterns and probabilities. This results in issues such as inaccurate responses and hallucinations.

To mitigate this issue, businesses can use their content knowledge graphs to train and ground the LLM for specific use cases. In the case of an AI chatbot, the LLMs would need an understanding of what entities and relations you have in your business to provide accurate responses to your customers.

Using the Schema.org Vocabulary to Define Entities

The Schema.org vocabulary is robust, and by leveraging the wide range of properties available in the vocabulary, you can describe the entities on your website and how they are related with more specificity. Together, these website entities form a content knowledge graph: a comprehensive dataset that can ground your LLMs. The result is accurate, fact-based answers that enhance your AI experience.

Let’s illustrate how your content knowledge graph can train and inform your AI Chatbot.

A healthcare network in the US has a website with pages on their physicians, locations, specializations, services, etc. The physician page has content relating to the specific physician’s specialties, ratings, service areas and opening hours.

Suppose the healthcare network has a content knowledge graph that captures all the information on its site. When a user asks the AI Chatbot, “I want to book a morning appointment with a neurologist in Minnesota this week,” the chatbot can work out the answer by accessing the healthcare network’s content knowledge graph. The response would list the neurologists who serve patients in Minnesota and have morning appointments available, along with their booking links.
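
The lookup behind that answer can be sketched in a few lines. The records below are hypothetical and heavily simplified: a real content knowledge graph would model opening hours and service areas as linked entities, and checking live appointment availability would need a booking system. The `find_physicians` helper is an assumption of this example.

```python
from datetime import time

# Hypothetical, flattened records as they might be read from a content knowledge graph.
physicians = [
    {"name": "Dr. Jane Doe", "medicalSpecialty": "Neurologic",
     "areaServed": "Minnesota", "opens": time(8, 0),
     "url": "https://example.com/book/jane-doe"},
    {"name": "Dr. John Roe", "medicalSpecialty": "Neurologic",
     "areaServed": "Wisconsin", "opens": time(8, 0),
     "url": "https://example.com/book/john-roe"},
    {"name": "Dr. Ann Poe", "medicalSpecialty": "Cardiovascular",
     "areaServed": "Minnesota", "opens": time(13, 0),
     "url": "https://example.com/book/ann-poe"},
]


def find_physicians(records, specialty, state, opens_before=time(12, 0)):
    """Return (name, booking URL) pairs matching specialty, state and morning hours."""
    return [
        (r["name"], r["url"])
        for r in records
        if r["medicalSpecialty"] == specialty
        and r["areaServed"] == state
        and r["opens"] < opens_before
    ]


print(find_physicians(physicians, "Neurologic", "Minnesota"))
# [('Dr. Jane Doe', 'https://example.com/book/jane-doe')]
```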

The content knowledge graph is also readily available, so you can quickly deploy your knowledge graph and train your LLM. If you are a Schema App customer, we can easily export your content knowledge graph for you to train your LLM.

Using LLMs to Query Your Knowledge Graph

Instead of training the LLM, you can use the LLM to generate the queries to get the answers directly from your content knowledge graph.

This approach of generating answers through the LLM is less complicated, less expensive and more scalable. All you need is a content knowledge graph and a SPARQL endpoint. (Good news, Schema App offers both of these.)

  1. The Schema App application loads the content model from your content knowledge graph, which would be all the Schema.org data types and properties that exist within your website knowledge graph.
  2. Then the user would ask the Schema App application a question.
  3. The Schema App application combines the question with the content model and asks the LLM to write a SPARQL query. Note: The only thing the LLM does is transform the question into a query.
  4. The Schema App application then executes the SPARQL query against your content knowledge graph and either displays the results directly or uses the LLM to format them as a response.
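
The steps above can be sketched as follows. This is not Schema App's implementation: `build_prompt`, `answer`, `fake_llm` and `fake_sparql_endpoint` are hypothetical names, and the two `fake_*` functions are stand-ins for a real LLM call and a real SPARQL endpoint so that the example runs on its own.

```python
from typing import Callable


def build_prompt(content_model: dict[str, list[str]], question: str) -> str:
    """Step 3: combine the content model and the user's question into one prompt."""
    model_lines = [
        f"- {type_name}: {', '.join(properties)}"
        for type_name, properties in sorted(content_model.items())
    ]
    return (
        "Write a SPARQL query over Schema.org data that answers the question.\n"
        "Use only these types and properties:\n"
        + "\n".join(model_lines)
        + f"\nQuestion: {question}\nReturn only the query."
    )


def answer(question: str,
           content_model: dict[str, list[str]],
           llm: Callable[[str], str],
           run_sparql: Callable[[str], list[dict]]) -> list[dict]:
    """Steps 3 and 4: the LLM writes the query, the knowledge graph supplies the data."""
    query = llm(build_prompt(content_model, question))
    return run_sparql(query)


# Step 1: the content model, i.e. the types and properties present in the graph.
content_model = {"Physician": ["medicalSpecialty", "name"]}


def fake_llm(prompt: str) -> str:
    """Stand-in for an LLM call; returns a fixed query for this example."""
    return (
        "PREFIX schema: <https://schema.org/>\n"
        "SELECT ?name WHERE {\n"
        "  ?p a schema:Physician ;\n"
        "     schema:name ?name ;\n"
        "     schema:medicalSpecialty schema:Neurologic .\n"
        "}"
    )


def fake_sparql_endpoint(query: str) -> list[dict]:
    """Stand-in for a SPARQL endpoint; returns canned rows for this example."""
    return [{"name": "Dr. Jane Doe"}]


# Step 2: the user asks a question.
print(answer("Which physicians are neurologists?", content_model,
             fake_llm, fake_sparql_endpoint))
```

Note that the prompt contains only the content model and the question. The data itself never passes through the LLM.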

This method is possible because the LLMs have a great understanding of SPARQL and can help translate the question from natural language to a SPARQL query.

By doing this, the LLM doesn’t have to hold the data in memory or be trained on the data because the answers exist within the content knowledge graph, which makes it a stateless and less resource-intensive solution. Furthermore, companies can avoid providing all their data to the LLM. This method gives the knowledge graph owner a control point, so they can allow only the questions on their data that they approve.

Overcoming LLM Restrictions

This approach also overcomes some of the restrictions of the LLMs.

For example, LLMs have token limits, which restrict how much text can be included in the input and output. This approach avoids the problem by using the LLM only to build the query and using the knowledge graph to answer it. Since SPARQL queries can run over gigabytes of data, they don’t have any token limitations. This means you can use an entire content knowledge graph without worrying about the word limit.

By using the LLM for the sole purpose of querying the knowledge graph, you can achieve your AI outcomes in an elegant, cost-effective manner and have control of your data while also overcoming some of the current LLM restrictions.

Optimizing LLMs by Managing Data in the form of a Knowledge Graph

“You can machine learn Obama’s birthplace every time you need it, but it costs a lot and you’re never sure it is correct.” – Jamie Taylor, Google Knowledge Graph

One of the most considerable costs of running an LLM is the inference cost (aka the cost of running a query through the LLM).

In comparison to a traditional query, LLMs like ChatGPT have to run on expensive GPUs to answer queries ($0.36 per query according to research), which can eat into profits in the long run.

Businesses can reduce the inference cost of the LLM by storing the historical responses or knowledge generated by the LLM in the form of a knowledge graph. That way, if someone asks the question again, the LLM does not have to exhaust resources to regenerate the same answer. It can simply look up the answer stored in the knowledge graph.
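
A minimal sketch of this lookup-before-generate pattern is shown below. The `AnswerCache` class is hypothetical, a plain dictionary stands in for the knowledge graph store, and `fake_llm` stands in for a real LLM call. The cost figure reuses the $0.36 per query estimate quoted above purely for illustration.

```python
from typing import Callable

COST_PER_LLM_QUERY = 0.36  # illustrative, taken from the estimate quoted above


class AnswerCache:
    """Look up stored answers first; call the LLM only on a miss."""

    def __init__(self, generate: Callable[[str], str]):
        self._generate = generate
        self._answers: dict[str, str] = {}  # stand-in for a knowledge graph store
        self.llm_calls = 0

    @staticmethod
    def _key(question: str) -> str:
        # Normalize case and whitespace so trivially different phrasings match.
        return " ".join(question.lower().split())

    def ask(self, question: str) -> str:
        key = self._key(question)
        if key not in self._answers:
            self.llm_calls += 1
            self._answers[key] = self._generate(question)
        return self._answers[key]


def fake_llm(question: str) -> str:
    """Stand-in for an expensive LLM call."""
    return "Honolulu, Hawaii"


cache = AnswerCache(fake_llm)
cache.ask("Where was Obama born?")
cache.ask("where was  Obama born?")  # served from the store, no second LLM call

print(cache.llm_calls)  # 1
print(f"Estimated LLM cost: ${cache.llm_calls * COST_PER_LLM_QUERY:.2f}")
```

Exact-match lookup is the simplest possible design. Matching differently worded questions to the same stored fact is where structuring the answers as entities and relationships, rather than as cached strings, becomes useful.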

Unstructured data that the LLM is trained on can also cause inefficiencies in the retrieval of information and high inference costs. Therefore, converting unstructured data such as documents and web pages into a knowledge graph can reduce information retrieval time and produce more reliable facts.

As the volume of data in the hybrid cloud environment continues to grow exponentially, knowledge graphs play a crucial role in data management and organization. They contribute to the ‘Big Convergence,’ which combines data management and knowledge management to ensure efficient information organization and retrieval.

Build Your Knowledge Graph Through Schema App

In summary, the integration of knowledge graphs with LLMs can significantly enhance decision-making accuracy, especially in the realm of Marketing.

The content knowledge graph is an excellent foundation to leverage schema data in LLM tools, leading to more AI-ready platforms. It’s an investment that could pay off handsomely, especially in a world increasingly reliant on AI and knowledge management.

At Schema App, we can help you quickly implement your Schema Markup data layer and develop a semantically relevant and ready-to-use content knowledge graph to prepare your organization for AI.

Regardless of whether you use Schema App to author your Schema Markup, we can produce a content knowledge graph for you. Schema App can capture the Schema.org data from your existing implementation using our Schema App Analyzer to develop your marketing knowledge graph.

Get in touch with our team to find out more about how Schema App can help you build your marketing knowledge graph to enhance your LLM.

Mark van Berkel, Schema App

Mark van Berkel is the Chief Technology Officer and Co-founder of Schema App. A veteran in semantic technologies, Mark has a Master of Engineering – Industrial Information Engineering from the University of Toronto, where he helped build a semantic technology application for SAP Research Labs. Today, he dedicates his time to developing products and solutions that allow enterprise teams to leverage Schema Markup to boost their SEO strategy and drive results.
