Interview with Schema App Creator Mark van Berkel: Schema at Scale Now and in the Future


Martha: Hi, and welcome to Schema Stories. My name is Martha van Berkel, and here we interview thought leaders in the schema markup and structured data world. Today I’m absolutely delighted to welcome my co-founder, Mark van Berkel. Welcome, Mark.

Mark: Hi Martha.

Martha: So, let’s kick off. Start by telling us a little bit about yourself and your background.

Mark: Sure. I started off as a developer and spent a few years doing custom development. In 2005 I started a Master of Engineering, and that’s where I started to learn about semantic technologies, way back in the day. That was 13 years ago. I did a proof of concept for SAP Research Labs, which was really interesting and whet my appetite for getting into the field, but we were a little early even in those days, given the tools that were available. I spent a few more years doing consulting and other IT projects, and then in 2012 I started Hunch Manifest, rolled through some different ideas, and eventually landed on the Schema App idea. So that’s my background: developer first, then technical team lead and architect, really interested in information architecture especially, and in the interplay between semantic technologies and the rest of the world.

Martha: Fantastic. So tell us, why did you build Schema App? Where did that come from?

Mark: Good question. When we started the business in 2012, we hadn’t yet determined what we were actually going to build and sell, or what was going to take off, so we tried a couple of ideas. One of them was in 2013: I built a little gadget to help some of our marketing clients get found on the web. It was very narrow; we were looking at home and construction businesses, and we were trying to create templates for schema markup and put them in the hands of the people who needed help getting found online. Those were very early days, and it was just an interest of mine. About a year later, I was at the SemTechBiz conference in San Francisco, where there were a few thought leaders at the intersection of semantic technology and SEO. I believe Jay Myers from Best Buy was there, Barbara Starr, and someone else on the panel whose name escapes me at the moment. It was interesting because they were even articulating how few people lived at that intersection. Coming from a semantic technology background, I thought, “Oh, well. This is something that’s very interesting to me and something we can work on.” From that, I introduced a JSON-LD schema generator. That was in 2014, but it was still a bit premature. At a conference about a year after that first one, I met Aaron Bradley, and he said, “Oh, that’s a great little tool,” but he wondered whether Google was using JSON-LD to reward companies with rich snippets. The answer at that point was, “No, I haven’t seen evidence of that yet.” So we were a little early with the JSON-LD product, but then Google did support it, and it was like, “Okay, well, it’s game on. Let’s really get this schema product and the schema vocabulary into the hands of a lot more people.” So for me it was about making adoption a lot easier. For marketers who don’t have the IT resources to build it into their site, how do we get robust schema markup into their hands? And for the experts, how do we enable them to do the more complex things at a scale that is otherwise quite difficult to achieve? That’s really where I think we provide a lot of value, and where we continue to build out our products.

Martha: Now, you kind of skipped over the fact that you also built it for us to use when we were a digital marketing agency. Do you want to talk a little bit about the challenges when you first started with schema markup? Then we’ll transition to some of the biggest challenges of doing it at scale.

Mark: Yeah. We had a period of roughly two years where we were a marketing agency providing SEO and email marketing strategies and services, and schema markup was something we often recommended clients take a look at. We wanted to include it among their tactics, so for me it was a productivity tool: how can I generate schema markup a little quicker? At the time there were hardly any generators out there. There was one by Raven Tools, but it was very limited in scope. I wanted something that covered the whole vocabulary, so I could better describe all the businesses we were doing marketing for: all their services and all their products, with their additional attributes and properties. That’s where the juicy details live, where you can really articulate something. And Google was moving quickly and introducing new features. So it was a productivity tool for generating markup, and also for maintaining it. That’s a big part that I think is overlooked. It’s great to generate code and copy and paste it in, but what happens next month when the rules or the vocabulary change? How do you go back and update all of it? I’m not really interested in doing the same work over and over again, as my co-founder can attest. I’m much happier figuring out how to solve something once for a lot of people. With a database-backed generating tool, we can query the data and update it on the fly, which makes maintenance a breeze rather than a pain that gets overlooked.
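The maintenance point is easier to see when markup is stored as data rather than pasted into pages. Below is a minimal Python sketch, assuming each page’s JSON-LD is kept as a dictionary in some store, that applies one vocabulary change to every stored entity at once. schema.org has, for example, superseded plural properties such as `reviews` and `photos` with `review` and `photo`; the `SUPERSEDED` mapping, the `migrate` function, and the sample store are hypothetical and illustrative, not a complete list and not Schema App’s actual implementation.

```python
# Hypothetical mapping of superseded schema.org property names to their
# replacements; illustrative only, not an exhaustive list.
SUPERSEDED = {
    "reviews": "review",
    "photos": "photo",
    "employees": "employee",
    "contactPoints": "contactPoint",
}


def _as_list(value):
    return value if isinstance(value, list) else [value]


def migrate(node):
    """Recursively rename superseded properties in a JSON-LD node.

    If an entity already uses both the old and the new name, their values are
    merged into one list instead of one silently overwriting the other.
    """
    if isinstance(node, list):
        return [migrate(item) for item in node]
    if not isinstance(node, dict):
        return node  # strings, numbers, etc. are left unchanged
    out = {}
    for key, value in node.items():
        new_key = SUPERSEDED.get(key, key)
        migrated = migrate(value)
        if new_key in out:
            out[new_key] = _as_list(out[new_key]) + _as_list(migrated)
        else:
            out[new_key] = migrated
    return out


# One update applied across every stored page, instead of editing each page by hand.
store = {
    "/locations/toronto": {
        "@context": "https://schema.org",
        "@type": "LocalBusiness",
        "name": "Example Renovations",
        "photos": {"@type": "ImageObject", "contentUrl": "https://example.com/shop.jpg"},
    },
}
store = {url: migrate(markup) for url, markup in store.items()}
```

Because the change lives in one function applied to the data, the next vocabulary release means editing the mapping, not revisiting every page.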

Martha: You’ve talked a lot about maintenance as one of the big challenges when you start doing more detailed schema markup, and doing it at scale. Can you talk a bit more about other challenges you’ve seen working with global clients on schema markup at scale, and perhaps how you’ve overcome them?

Mark: Primarily, there’s a bit of a divide between marketing and IT. Marketing wants to adopt all these tools so they can get the latest features from Google, and yet IT wants to be in control of the technology. IT wants to call the shots and do the implementing, so there’s some tension between the desire to go fast and the need to maintain the stability of the system, which IT does often and does well. One thing we often deal with is how to speak to the IT team about the role they still play, and the different ways they may want to think about this. There’s a whole lifecycle here, with probably four different steps. First, there’s strategy: what are you going to mark up, and what content might you want to adjust in your HTML? Second, there’s generation: how do you actually map the data you have into schema markup? Third, there’s reporting: how do you know and measure that it’s been implemented, and how do you know the depth and breadth of the schema markup across your website’s content? And fourth, there’s leveraging it: once you have all this schema markup, you can repurpose it for analytics or for chatbots. There are so many add-ons and expansion opportunities here that IT doesn’t look at. IT tends to look at the second step.
Marketing might look at the first step and say, “Okay, here’s what we want to mark up,” and then IT says, “Okay, here’s your generated code,” waves its hands, and is done. So how does marketing stay on top of the quality of that content? Is it meeting their needs, and are they repurposing it to make the best use of all this rich information? Overall, we see this pattern, this limited thinking about the lifecycle, over and over. More specifically, microdata has long been the method of choice. Adding properties within HTML elements is a great way to template the repeated data items on a web page, but the challenge, again, is maintenance. Designers might go in and adjust how an image is displayed, forget to retain the itemprop attribute, and there goes your image feature in Google. Meanwhile, marketing may also want to get things in there, and with the interplay between the developer, the designer, and the marketing team, there are too many hands in the pot with microdata. It also struggles with constantly changing requirements in busy environments. In simpler, smaller teams maybe that’s not a challenge; maybe those roles are all one and the same person, and everything’s hunky-dory. JavaScript implementations have also become quite popular in the last year, but they’re not a silver bullet either. With JavaScript, there are a couple of different approaches you could use.
I think the predominant one is data scraping: you use JavaScript to inspect elements on the page, find some ID or class in the HTML that’s unique to the page, grab that, and pull it into the name of the product or the article or whatever the case may be. But again, this is sensitive to design changes. If your team is a bit distributed, maybe a designer adjusted something, or perhaps a third party on your site handles reviews and injects widgets for the reviews or the aggregate score; if they change something, that can throw off your JavaScript. So there’s quite a bit of maintenance, and that’s one of the challenges with JavaScript. Then there’s the moody Structured Data Testing Tool, which may or may not work from month to month, so you really have to know whether your JavaScript implementation is valid. Check that it’s compatible with Googlebot, which renders pages with Chrome version 41, and make sure there’s no JavaScript in there that Chrome 41 will have a problem with. That helps you be assured that Google is actually going to pick it up. Other consumers may not see it at all: it’s great that Google supports rendering JavaScript, but Bing and some of the others don’t provide that level of JavaScript support. Still, JavaScript is often great for bootstrapping things, because if you have a tag manager it’s pretty easy and quick to set up. Another solution is templated JSON-LD. If you have some sort of template for products or articles, you may want to have the developer create schema markup in a JSON-LD block.
At least there you’re taking the page design out of the equation: you have an information layer that gets translated into schema markup. But the marketing person asking for changes to that schema markup may wait weeks or even months for a development cycle where the changes actually get implemented. Also, as a developer myself, I had to learn the lesson that I’m not a schema expert. Just asking a developer to put schema markup in doesn’t mean it’s going to be right, and it probably won’t be unless they’ve had their hand in the schema markup pot for a while and understand some of the best practices.
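To make the templated JSON-LD approach concrete, here is a minimal Python sketch, assuming a product record with hypothetical `name`, `sku`, `price`, and `currency` fields. The markup is built from the data rather than scraped from the rendered page, so a design change to the HTML does not break it. The `product_jsonld` helper is illustrative, not Schema App’s or any CMS’s API.

```python
import json


def product_jsonld(product: dict) -> str:
    """Render a schema.org Product as a JSON-LD <script> block.

    `product` is a hypothetical record with name, sku, price and currency keys.
    """
    data = {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": product["name"],
        "sku": product["sku"],
        "offers": {
            "@type": "Offer",
            "price": str(product["price"]),
            "priceCurrency": product["currency"],
        },
    }
    # Escape "</" so a value such as "</script>" cannot close the tag early;
    # "<\/" is still valid JSON and parses back to "</".
    body = json.dumps(data, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


print(product_jsonld({"name": "Cordless Drill", "sku": "CD-100", "price": 129.99, "currency": "CAD"}))
```

Because the template owns the markup, a requested change such as adding `brand` is one edit to the template rather than a per-page fix, though it still has to go through the development cycle described above.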

Martha: I’ll just interrupt you here to ask: does Schema App solve some of those problems? Was part of the reason you built it to bring together the best of those worlds? Can you speak briefly about that? I also want to talk about schema ownership as another topic.

Mark: Yes, we’ve solved a number of those things, especially the IT challenge. We give the marketing team the tools to mark things up without touching a line of code, so the time to change is much faster. They’re able to deploy markup across a whole site within a couple of hours, even for big sites, if the pages are templated. Our Highlighter does quite a bit of the work for these large sites, and our Editor is quite robust at creating the detailed markup they may want. We also provide add-ons that bootstrap you to a minimum level of quality for the markup on your site: for WordPress you can install our plugin, for Shopify our add-on, and so on. That gets you to what I’d call table stakes, and after that we want you to optimize even further. So we give them that flexibility. It’s not bulletproof by any stretch of the imagination; design changes could still affect how we roll out the schema markup. But as long as it’s in the hands, or under the control, of one person, I think it’s at least something that can be solved.

Martha: Can you talk a little bit about schema ownership? This is something I think is starting to evolve in the market, as we see announcements about claim reviews, and even some of the releases we saw today about how you can get Google to crawl your URLs for job postings, or tell Google which sites have job postings. How do you see schema ownership today, and where do you see it going?

Mark: Yeah. The recent stuff is all around sdPublisher and sdLicense. The “sd” stands for structured data; I don’t know why they abbreviated it, but they’re structured data publisher and structured data license. Those are properties you can put into your schema markup to say, “I’m the owner, and I license anybody under the Creative Commons XYZ license to use this data.” This is definitely useful for instructing consumers like Googlebot, Bing, Yandex, Schema App, or whoever, on how you want your data to be used. I think it would allow you to open source your data: if you want it shared through the Data Commons, you can include that publishing license to say, “Yes, put all my data, or certain parts of my data, into this broader web repository.” It’s a very practical step for acknowledging the license under which you’re sharing this information. So there’s that kind of ownership of the data, but even within an organization there are ownership questions around who has responsibility for the schema markup. Is there a data architecture team, or is it a marketing function? Who owns schema quality, and even quantity? Who’s responsible site-wide, or is it segregated by sub-site? I don’t know if people really have a clear sense of who takes ownership. In some businesses I think it often falls to marketing, but it’s not a question we hear very often.
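As an illustration of the ownership properties mentioned above, here is a minimal Python sketch that adds schema.org’s `sdPublisher` and `sdLicense` properties to an existing JSON-LD entity. The publisher name and the choice of the Creative Commons Attribution 4.0 license URL are assumptions for the example, and the `with_sd_ownership` helper is hypothetical.

```python
import json


def with_sd_ownership(entity: dict, publisher_name: str, license_url: str) -> dict:
    """Return a copy of a JSON-LD entity annotated with structured-data ownership.

    sdPublisher names who published the structured data; sdLicense points to
    the license under which that data may be reused.
    """
    annotated = dict(entity)  # shallow copy so the caller's dict is not modified
    annotated["sdPublisher"] = {"@type": "Organization", "name": publisher_name}
    annotated["sdLicense"] = license_url
    return annotated


article = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "Schema at Scale Now and in the Future",
}
licensed = with_sd_ownership(
    article,
    publisher_name="Example Publisher",  # hypothetical publisher
    license_url="https://creativecommons.org/licenses/by/4.0/",
)
print(json.dumps(licensed, indent=2))
```

Note that `sdPublisher` describes who published the structured data itself, which can differ from the publisher of the page content.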

Martha: Very cool! Thanks for sharing. One last question and then we’ll be at time. Where do you see this evolving over the next one, two, maybe five years?

Mark: So, where are things going? Um, good question. As a segue from the last one, the Data Commons is an interesting possibility. Today it has the ClaimReview data: you can download basically all of that schema markup from the Data Commons and do what you want with it, for example analyzing which claims have been rated true and which have been rated false, and build some interesting add-ons from that. I would say this is probably an early signal of what is to come. I really think the opportunity is hidden in these kinds of add-ons and other ways of looking at your data. I have a list of things I’m thinking about, or that I think we’re starting to see emerge. Some things aren’t even new; they’re just reimagined ways of doing things, because now you have a common vocabulary across a lot of sites. For example, there’s been talk for a couple of years about augmenting your analytics data with your business information, or semantic analytics: how do you segment the data for the blog posts on your site by author, tag, or category, to get additional insight into which types of content perform best? I still don’t think that has enough legs yet, so it has a long way to run, because it’s still kind of difficult to set up. What else? If you have a knowledge graph, let’s say because you’ve done a good job marking up your site, what else can you do with it? You could think of it as a data repository for informing a chatbot, for instance; that’s a very common one I think people can look at. There are also lots of interesting actions in the vocabulary. Maybe there’s a potential action to read a white paper.
For instance, if you have all your white papers or downloadable assets described in the vocabulary, you can provide that to some sort of chatbot, or translate it for other consumers of that list of white papers. It could be other forms, like contact forms. There are a bunch of different actions, like ViewAction and WatchAction, for different tools. We see a bit of the Google Assistant stuff helping you link up with podcasts, where you can play a podcast based on its schema markup, so why couldn’t it be a TV show or a VideoObject? And why does it have to be limited to Google, when Alexa is coming around and doing similar things? Or there’s your own experience: you could repurpose that same information in your own internal assistant services or chatbots, maybe through the Facebook chatbot infrastructure. Other things I’ve thought about are browser extensions that let consumers leverage the schema markup, or maybe on-site search, or AdWords custom targeting. I personally haven’t seen much of that implemented, but I’ve heard people talking about it, and I think it’s an interesting opportunity.
Otherwise, I should mention the community generally. I’m sure the vocabulary is going to continue to expand; we continue to see two or three releases per year, so we’re going to see more and more specific classes and enumerations, and maybe more extensions like GS1. Maybe we’ll even get a definition of Google’s own extension for the things they keep putting out. I think that can also be motivated by these add-ons. Let’s say your strategy is driven by this kind of augmented analytics: knowing which articles perform best can inform your content strategy. Maybe there’s a way you want to segment the data that isn’t in the vocabulary, so you may want to create your own extension. I think that has a long way to go, but this is a good place to start. Otherwise, I’d just add one more thing: Google’s adding so many interesting things lately that it’s going to be a lot of fun to see where Google continues to expand its feature set.
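The semantic analytics idea, segmenting performance by author, tag, or category, amounts to joining analytics data with the markup already on each page. Here is a minimal Python sketch, assuming pageview counts keyed by URL and each page’s JSON-LD already parsed into a dictionary; the sample data and the `author_name` and `views_by_author` helpers are hypothetical.

```python
from collections import defaultdict


def author_name(markup: dict) -> str:
    """Extract an author name from a schema.org CreativeWork.

    Handles `author` given as a Person/Organization object, a plain string,
    or a list (the first author is used); otherwise returns "(unknown)".
    """
    author = markup.get("author")
    if isinstance(author, list):
        author = author[0] if author else None
    if isinstance(author, dict):
        return author.get("name", "(unknown)")
    if isinstance(author, str):
        return author
    return "(unknown)"


def views_by_author(pages: dict, pageviews: dict) -> dict:
    """Sum pageviews per author, using the markup stored for each URL."""
    totals = defaultdict(int)
    for url, views in pageviews.items():
        totals[author_name(pages.get(url, {}))] += views
    return dict(totals)


pages = {
    "/blog/jsonld-basics": {"@type": "BlogPosting", "author": {"@type": "Person", "name": "Mark"}},
    "/blog/schema-strategy": {"@type": "BlogPosting", "author": {"@type": "Person", "name": "Martha"}},
    "/blog/sd-license": {"@type": "BlogPosting", "author": [{"@type": "Person", "name": "Mark"}]},
}
pageviews = {"/blog/jsonld-basics": 120, "/blog/schema-strategy": 80, "/blog/sd-license": 40, "/about": 15}
print(views_by_author(pages, pageviews))
```

The same join works for other properties, such as `articleSection` for category; only the extraction step changes.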

Martha: Lots to think about. I feel like we could almost do a podcast or an interview on each of those add-ons, exploring how else you can use your schema markup. Thank you so much for joining us today, Mark. If people want to find you online, where should they look?

Mark: I’m semi-active in a number of communities, including the Google+ Semantic Search Marketing group run by Aaron and Jarno, so I participate there. I’m also on Twitter at @vberkel and on LinkedIn as Mark van Berkel, so I’m pretty easy to find. And there’s our main website; somewhere in there you’ll find my handiwork as well.

Martha: Yeah, Mark is easy to find. Thank you, Mark, for joining us today, for helping us understand where this is evolving, and for sharing how Schema App got started. Have a great day.

At Schema App, one of our core values is to always be learning and teaching. That’s why we love talking with other structured data experts!



Mark van Berkel, Schema App

Mark van Berkel is the Chief Technology Officer and Co-founder of Schema App. A veteran in semantic technologies, Mark has a Master of Engineering – Industrial Information Engineering from the University of Toronto, where he helped build a semantic technology application for SAP Research Labs. Today, he dedicates his time to developing products and solutions that allow enterprise teams to leverage Schema Markup to boost their SEO strategy and drive results.