The 4 Steps to Building a Content Knowledge Graph

Last Updated: 5 months ago

Reading Time: 9 minutes

Table of Contents

Knowledge graphs have been central to semantic technology for decades. From healthcare and eCommerce to fraud detection and SEO, knowledge graphs empower organizations to harness the full potential of their information architecture.

But even with a long history, knowledge graphs are more relevant than they’ve ever been. According to Gartner’s Emerging Tech Impact Report, a robust knowledge graph is imperative for organizations looking to implement generative AI technologies. Knowledge graphs can help organizations ground their AI initiatives—like LLMs—in factual data about the organization.

If you’re interested in building a knowledge graph but are unsure where to start, you’re in luck. The good news is that if you have a website, you can construct a reusable content knowledge graph that supports both SEO and your internal AI initiatives.

This article will take you through the four steps of building a content knowledge graph using the Schema.org vocabulary.

Why should you use Schema.org to build your content knowledge graph?

You can create a knowledge graph using any number of ontologies, vocabularies, or glossaries. However, Schema.org should be the vocabulary of choice for constructing a content knowledge graph since it allows you to simultaneously maximize the SEO benefits.

Help search engines clearly understand and contextualize the content on your web page

The Schema.org vocabulary was created by major search engines as an industry-standard vocabulary for translating human-readable web content into a language that machines understand. By using this vocabulary to construct a knowledge graph based on your content, you’re also reaping the SEO benefits that come with it, including:

Equipping search engines with an accurate understanding of your brand content
Facilitating accurate and pertinent search queries that closely match your content
Driving more targeted, engaged, and quality traffic to your site

Achieve rich results and stand out in search

By annotating your web content with the required Schema.org types and properties, search engines like Google may award visually enhanced search features for content like Products, Videos, Recipes, and Ratings. These rich results present key information directly in the SERP, and can increase click-through rates and drive more engagement and quality traffic to your pages.

Building Your Content Knowledge Graph

So you know about the SEO benefits of using Schema.org, but how does that get you a content knowledge graph? In the book Knowledge Graphs: Methodology, Tools and Selected Use Cases, Semantic Web and Knowledge Graph Experts, Fensel et al., break down the process of creating a knowledge graph into four steps:

Knowledge Creation,
Knowledge Hosting,
Knowledge Curation, and
Knowledge Deployment.

We’ve applied an SEO lens to these steps to explain how you can create a robust content knowledge graph using your organization’s web content. Let’s get started.

Step 1: Knowledge Creation

The first step to building a content knowledge graph is having high-quality, original content on your website and marking up that content using the Schema.org vocabulary.

Have high-quality content on your website

As a general first rule, you need to ensure your website content supports the specific objectives of your organization. Whether you’re selling good and services, educating the public, or wanting to build authority in a particular domain of expertise, the content across your website should exist to support those goals.

Beyond this, Google has shared guidelines on what it deems “helpful, reliable, people-first content.” This is an excellent resource that provides a series of questions you can use to assess the quality of your content. For example, you’ll want to ensure your content provides:

Original information, reporting, research, or analysis
A substantial, complete, or comprehensive description of the topic
Substantial value when compared to other pages in search results

Marking up your content using the Schema.org vocabulary

To start building your content knowledge graph, you must annotate your high-quality content using types and properties from the Schema.org vocabulary. The annotations can be expressed in various formats, but Google recommends using JSON-LD. This translates the human-readable content on your website into machine-readable statements called RDF triples.

Include URIs in your Schema Markup to disambiguate your entities
In order to provide value beyond SEO, the entities in your Schema Markup must be represented by Uniform Resource Identifiers (URIs).

In JSON-LD, these identifiers appear as @ids to give the entities in your markup a unique identity that disambiguates and differentiates them from other entities – similar to how a social security number can uniquely differentiate people who may share the same name.

While Schema Markup still provides SEO value without including @ids, they are a requirement for the markup to become a reusable knowledge graph.

How to Apply JSON-LD to Web Pages
There are a few options for implementing Schema Markup on your web pages. You can manually author the JSON-LD and insert it in your webpages’ HTML, or you can use a plugin to generate and deploy the markup on your site.

Manual authoring requires technical expertise and isn’t scalable if you have a large number of webpages, and while plugins that auto-generate markup are a scalable authoring solution, the markup is generally much less descriptive. If you want to customize your markup and ensure it is dynamic and connected, we recommend using the Schema App Highlighter to generate and deploy your markup at scale without having to do any manual coding.

Whatever method you choose, the Schema Markup you author must appear in the HTML of the webpages being described, making it available for search engines and other web crawlers. In this state, your webpage content transforms into semantically enriched data, but this data doesn’t truly become a knowledge graph until it’s been collected and stored.

Step 2: Knowledge Hosting

In the hosting step, the Schema Markup you’ve authored for your website must be collected and hosted in a way that allows the RDF data to be retrieved.

Collecting the Schema Markup

There are two ways of collecting the Schema Markup once it has been applied to a website:

1. Crawling: Where a crawler crawls a website, extracts the JSON-LD that has been applied, and stores it in a knowledge graph.

2. Mapping: Many tools that map content to Schema.org will also store that markup in a knowledge graph.

But where does this storage occur?

Storing Data

Because knowledge graphs are represented as RDF triples, the best place to store them for easy retrieval is an RDF database or triplestore. There are a variety of RDF stores available. Examples include:

OpenLink Virtuoso
Ontotext GraphDB
Amazon Neptune
Stardog
AllegroGraph

For more information and to compare the various options, check out DB-engines.com. They rank the popularity of database management systems and provide helpful analysis.

Retrieving Data

You can retrieve RDF data from a database or triplestore using SPARQL – an RDF query language. In the simplest terms, SPARQL uses known information to find unknown information (variables) using pattern matching.

For example, we could write a SPARQL query that says, “Find all the people in my database who work for Schema App and know about semantic technology.” “Mark van Berkel,” our co-founder, would return as a match, and so would all other entities in our knowledge graph that match the same criteria.

When you add Schema Markup to your website using Schema App’s authoring tools, we host that data for you in our Knowledge Graph Data Platform. You can query your own graph using the SPARQL endpoint interface in your account. You can also use our Export Data API to export your knowledge graph for reuse in other contexts.

Once you have found an appropriate way to host your knowledge graph, you can move on to curation.

Step 3: Knowledge Curation

It is a well-known fact that cleaning data is time consuming, and resource intensive, especially if you’ve got a lot of it. That said, we will address 3 of the most important aspects of curating your data to ensure your high-quality web content has resulted in a high-quality content knowledge graph.

In the knowledge curation step, you should ensure that the data within your content knowledge graph is:

Accessible
Correct
Complete

Let’s break those down further.

Accessible

The data in your knowledge graph needs to be available.

For example, when extracting your content knowledge graph from your website, you’ll want to ensure that none of your web pages run into issues like “404 not found” errors. You will also want to ensure that the RDF store you’ve selected for hosting keeps your data retrievable and secure.

Correct

Your markup is free of syntax errors
The language used to express your knowledge graph can’t have syntax errors like missing commas or brackets in the wrong places. Auto-generated markup from plugins or other authoring tools will prevent this from happening, but if you’ve authored your markup manually, you’ll need to take extra precautions. Syntax errors can be identified on a page-by-page basis by running your pages through the Schema Validator.

The markup must align with the content on the page
If you make content changes to your page without updating your markup, your knowledge graph will become out of date.

Assessing whether the statements in your markup are correct and up-to-date can be difficult depending on the size of your dataset and how you manage your Schema Markup. This is especially true if you implement your markup manually. As previously mentioned, data cleanup is complex and resource-intensive, and becomes ever more so as your content grows and changes over time.

Therefore, we recommend using a dynamic Schema Markup generator tool like the Schema App Highlighter to ensure your page’s markup always aligns with its content and your RDF triples remain correct.

The markup follows the Schema.org vocabulary guidelines
You also need to ensure that your entity types use the most descriptive properties and that the properties used connect to expected types. For example, I can’t say that a Person worksFor another Person, because Schema.org states that the worksFor property can only connect a Person to an Organization.

The types and relationships you apply to your data dictate what you’re able to query for in your graph, and as a result also play an essential role in the completeness of your Knowledge Curation.

Complete

Ensure your knowledge graph contains enough data to answer queries relevant to your use cases. For example, if you want to know the correlation between ratings for products of specific sizes, colors, or prices, those properties must exist in your data.

In cases where your content references well-known entities (like brands, people, places, or concepts), you may also want to implement entity linking. Entity linking is a process that identifies entities in text and links them to corresponding known entities from external knowledge bases like Wikipedia, DBpedia, and Google’s Knowledge Graph.

You can apply entity linking:

Manually for absolute precision
Automatically using Natural Language Processing APIs

Once embedded in your markup, these entities provide additional SEO value by helping search engines like Google disambiguate and contextualize your content to provide more accurate results for search queries. When it comes to your content knowledge graph, entity linking makes your knowledge graph more descriptive, providing an even richer data layer for you to reuse.

Step 4: Knowledge Deployment

The knowledge deployment stage transforms the knowledge graph’s theoretical structure into practical applications that drive tangible benefits for your organization and its stakeholders. In fact, I prefer to call this the “Reuse” stage, since this is when you can finally reuse the knowledge graph you’ve created for all sorts of different initiatives.

To reap the SEO benefits we’ve previously described, you’ll need to ensure you’ve published your Schema Markup externally for search engines to consume.

Beyond the SEO benefits, your content knowledge graph can be reused for things like enhancing user experience, content optimization, and AI and machine learning. Let’s explore these opportunities further.

Enhancing User Experience

You can utilize your content knowledge graph to improve website navigation and internal search functionality.

For example, if a user visits a product page on an eCommerce site for smartphones, the content knowledge graph can be leveraged for a recommendation engine to dynamically generate suggestions based on the products being viewed. This can appear as a “You May Also Like” section or complementary products, like phone cases or chargers, suggested during checkout. This enhanced user experience can significantly increase engagement and conversion rates.

Content Optimization

You can use your content knowledge graph to optimize existing content or identify gaps in your content.

For instance, your organization likely publishes blog posts on various topics. With a content knowledge graph, you can analyze the connections among entities in your blog posts. This analysis helps you pinpoint clusters of related topics or categories that have more coverage. If you notice gaps in the topics your organization wants to emphasize in their web presence, you can create additional content to fill those gaps.

AI and Machine Learning Applications

Organizations can use knowledge graphs to accelerate their AI initiatives, including Chatbots and other LLM functions.

Knowledge graphs provide a foundation for training AI and machine learning models for tasks such as natural language processing, recommendation systems, and predictive analytics. Knowledge graph data is already structured, making it easier for machines to process than unstructured content (natural language). This makes using AI less costly as use continues to scale.

If you’re concerned about the risks of hallucinations from LLMs, you’ll be happy to know that knowledge graphs can also be leveraged for Retrieval-Augmented Generation (RAG), resulting in more accurate answers to queries.

These are just some of the ways a content knowledge graph can support your organization in this rapidly changing technical landscape. And the best part is, you can easily construct one with the pre-existing data that constitutes your website.

Developing a Content Knowledge Graph for Your Organization

Although creating a content knowledge graph has only four steps, implementing these steps can be resource-intensive. However, with the numerous possibilities for reuse, building a content knowledge graph is a worthwhile investment that will yield a strong return as semantic search, AI, and knowledge management continue to evolve.

At Schema App, we can help you implement your Schema Markup data layer and develop a semantically enriched reusable content knowledge graph to prepare your organization for AI and support your semantic SEO efforts.

Get in touch with our team to learn more.

Jasmine Drudge-Willson Product Manager

Jasmine is the Product Manager at Schema App. Schema App is an end-to-end Schema Markup solution that helps enterprise SEO teams create, deploy and manage Schema Markup to stand out in search.

AI, AI Search, content knowledge graph, Knowledge Graph, Semantic SEO

Measurable Impact of Scaling Entity Linking for Entity Disambiguation

The Value of Schema Markup for Healthcare Organizations