Hello and welcome to Schema Stories. My name is Martha Van Berkel and I’m the co-founder of Schema App. We run Schema Stories to bring real people, who work with Schema.org in real life, to share how it all fits in, the value, and where they see it going. Today I’m honored to be joined by a distinguished thought leader in the Semantic Web, Richard Wallis. Welcome, Richard.
I’d like you to introduce yourself so maybe tell us a bit about yourself and your history with semantic search.
Okay, this could take a long time if we go too far back, so let me just say I’ve been in computing longer than I care to admit. In later years I’ve been involved with semantic web technologies in all sorts of sectors, with a bit of a focus on cultural heritage and libraries. I’ve worked for library system companies and things like that. More recently I’ve become an independent consultant, advisor, trainer, and anything else you’d like me to do except make the tea. I am lousy at making tea. For anything to do with structured data on the web, the semantic web, etc., I work for various organizations looking to find out about, or get some advice about, applying semantic web techniques, specifically around Schema.org, in their organizations. I also work with, notice the word ‘with’, not for, Google, in their work delivering Schema.org to the world. Some may know that Schema.org is jointly sponsored by Google, Bing, Yahoo and Yandex (the Russian search engine). Part of Google’s role is that they provide the webmaster for the site, or the vocabulary master, a guy called Dan Brickley, and I work with Dan in helping with the delivery of the vocabulary. I actually get involved in some of the Python code that runs the site, the things that make it run faster, and the things that make it run slower, but you don’t do many of those, with a bit of luck. I also work on extending the vocabulary, advising people, and a kind of outreach towards people wanting to get involved in using the vocabulary, or even extending it and adding to it. So it’s quite a varied role. This has all come from the short history of Schema.org and its use in what you term semantic search. Maybe we’ll explore this a little bit later on: it might be based on semantic principles, but most of it has nothing to do with search, even though the search engines make a great deal of use of it.
You recently published a blog post about some of your recent work around the Schema.org vocabulary. Can you talk a bit about your passion for working on the vocabulary, and your latest work?
Well, this stems from being a practical user of structured data on the web and having come through the birth of the Semantic Web and linked data as it arrived in my work. When I was working for library companies, one of the major frustrations was that we were experts in structured data and we published data in our own vocabularies, which nobody else on the web understood. So when Schema.org arrived on the scene, it was obvious that it was going to become, if it took off, which it has, a de facto vocabulary for all sectors across the web. That meant you could understand data published by people in the sector next door and share your resources in the same way. That kind of passion drove me to look at Schema.org from a bibliographic point of view and see some of its limitations. Rather than sitting there and moaning about limitations, I set up a W3C community group with the wonderful name of Schema Bib Extend. It grew to about a hundred people across the world, mostly in the bibliographic domain but also in publishing and other areas, and we put together proposals to the Schema.org community for extending the vocabulary in a bibliographic direction. Over a period of about two years, we added a great deal, so there’s not much you can’t say now about things in the bibliographic and publishing world with Schema.org. That then moved on to some more specific bibliographic work, which turned into one of the first two extensions to Schema.org. Schema.org has its core vocabulary, and it has extensions, which tend to be sector-specific although available to everybody. So bib.schema.org, which was released about a year ago now, came out of that group and my efforts. From that, having seen the benefits and the potential of this, I’ve been passionate about spreading the word across the web world for the benefit of all, because I can see this going to some interesting places.
Well, let’s talk about those interesting places. Right now people mostly think about Schema.org as “how do I get rich snippets or evolve my search results”, but that is just a small part of the benefit of what you can do when you start structuring your data on the web. Do you want to talk about your thoughts on those additional values, or where you think the application or use of structured data and Schema.org may go?
Yeah, I think we notice only slowly how the web is evolving, but the web is changing dramatically and has been for the last couple of years, and it’s on a trajectory to change even more. What the search engines were finding is that just by interpreting the text of a web page you could infer a reasonable amount about the thing that you thought the page was about. They got very clever at this, but I think they’ve hit the problem of diminishing returns from identifying text on the page. What they really want to know is what the thing is that this page is about, and how that thing relates to other things. So if it’s a widget, how does it relate to the organization that made it, or the organization that is selling it, or the delivery terms, or the offer to sell it, or multiple offers, and that kind of stuff? If it’s a bank talking about loans, what sort of loans are they talking about? Is it commercial loans, mortgage loans? What’s the interest rate? All that kind of thing becomes very difficult to infer from text. So, building on a lot of previous technologies, you embed structured data within the page, specifically with Schema.org, for no other reason than that’s what the search engines are looking for, and you can tell them specific detail about the individual things. Very often it’s things that are described on the page in relationship to other things, not necessarily in your domain. So when you’re talking about the place your local branch is located, you can say that this is the same as this place as defined in Wikipedia, for instance. The search engines are then starting to build up a context for the whole world: the things in it and the relationships between them. In scientific terms that’s called a graph, and this is where the term knowledge graph comes from. That is what the search engines are doing in building these massive related-entity graphs of what’s available in the world.
If you are part of that, you can then become part of what the search engines are using this for. Some of it is rich snippets: what was the rating of this particular product when it was sold, or the TV program when it was viewed. Some of it is used to help them deliver information boxes, answer boxes. So, you know, if you ask how high Mount Everest is, invariably you get an answer box with just a picture of the mountain and 8,848 meters. So where are those attributes coming from? They’re coming from the knowledge graph. So they are then using this to evolve the web. The web is changing out of sight. For any of us, if somebody said “How would you describe the web to a Martian from outer space?”, we’d be describing a standard desktop or laptop computer screen with a web page on it. Things have changed, and let me demonstrate. I don’t know whether this is going to work, I hate doing live demos, but let’s see what happens.
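As an illustration of the point about tying a branch to a place "as defined in Wikipedia", here is a minimal sketch in Python that builds Schema.org JSON-LD for a bank branch. The branch name, the parent organization and the choice of Cambridge are assumptions made up for the example, not details from the conversation; the types and properties (BankOrCreditUnion, parentOrganization, containedInPlace, sameAs) are standard Schema.org terms.

```python
import json

# Hypothetical branch: the names and the choice of place are illustrative only.
branch = {
    "@context": "https://schema.org",
    "@type": "BankOrCreditUnion",
    "name": "Example Bank, Bridge Street branch",
    "parentOrganization": {"@type": "Organization", "name": "Example Bank"},
    "containedInPlace": {
        "@type": "City",
        "name": "Cambridge",
        # sameAs says "this place is the same thing as the one described here"
        "sameAs": "https://en.wikipedia.org/wiki/Cambridge",
    },
}


def to_script_tag(entity: dict) -> str:
    """Wrap an entity as a JSON-LD script element ready to embed in a page."""
    body = json.dumps(entity, indent=2)
    return f'<script type="application/ld+json">\n{body}\n</script>'


print(to_script_tag(branch))
```

The sameAs link is what lets a consumer join this page's description of the place to what it already knows about that place from elsewhere.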
Richard: Hey Siri, where can I get some cash?
Siri: One possibility is Halifax nearby on Bridge Street, do you want that one?
Richard: No, I don’t.
Siri: A second is HSBC on Bridge Street less than two miles to your east, how about that one?
Siri: Okay, I can call or get directions. Just tell me what you’d like to do.
Richard: Walking directions.
Siri: Getting walking directions to HSBC.
Anyway, there is a map that tells me how to walk to that bank, and that is now a common interaction on the web. How many web pages did we see then? Zero. So rich snippets and where we rank in search results, yes, they’re important, but they’re not the whole story. Think about the information that the search engines and others are using to power conversations like that. There are some entities, or things, that it’s getting from my phone: it knows who I am and where I am. But entities like cash, and what we call cash machines in the UK or ATMs in the States, are entities associated with banks, which are associated with bank branches and locations. It can then start inferring: hang on, I know where Richard is, I know where the nearest bank is, I can draw a line between the two and draw a map.
So is this what you meant in your latest presentation, where you said “the future is global and contextual”? Is it that interactions with computers, or information, or whatever it is, need that context to add value?
Yes. You can have the cleverest computer in the world, but it is only as good as the information put into it. We hear about cognitive computing, computers winning quiz shows and that kind of stuff, but that computer, which was IBM’s Watson, could only win that quiz show if it had the information needed to answer the questions. So what we’re doing here is providing a major source of information, amongst other sources, to these knowledge graphs. That is setting a global context that the search engines can use for rich snippets and possibly for improving their search results (even they tend to be a little bit coy about whether that is actually happening or not). Like I said, I work with Google, not for Google, so I don’t know the answer to that question myself. But it’s also laying down the context for semantic discovery algorithms to be run, not just by the search engines, because this data is open to anybody. It builds a map of knowledge across the world, a map of entities and their relationships and attributes, that we can be part of. If you look back two years, that conversation I just had with my phone would have been seen as something utterly amazing. If you tried it two years ago and said “Oh, it’s not very good”, try again today, on Android, Apple and other operating systems. It is much better today than it was before, and that’s because they’ve got more information to work with and they’ve improved the speech recognition. When somebody, later on, asks their driverless car “Take me to the nearest gas station that sells chocolate”, where’s the information going to come from to feed that, unless you, as the owner of the gas station, list the products that you sell on your site in a structured way? That’s what’s going to drive this thing forward. So a de facto, general-purpose vocabulary is an enabler.
A lot of people are saying “We’re at the head of a new wave driven by Schema.org.” In fact, we’re at the head of a new wave driven by about six or seven different technologies, and Schema.org is just an enabler. If you want to be part of this, and by the way gain the benefits of rich snippets and similar activities, get on board now.
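To make the gas station example concrete, here is a small Python sketch of what "listing the products you sell in a structured way" could look like, and how a consumer might use it. The station names and products are invented for illustration; GasStation, makesOffer, Offer and itemOffered are real Schema.org terms, but the matching logic is only an assumption about how an agent might query such data.

```python
# Hypothetical stations; in practice each dict would be JSON-LD published on the
# station's own website.
stations = [
    {
        "@type": "GasStation",
        "name": "Station A",
        "makesOffer": [
            {"@type": "Offer", "itemOffered": {"@type": "Product", "name": "Chocolate bar"}},
            {"@type": "Offer", "itemOffered": {"@type": "Product", "name": "Unleaded petrol"}},
        ],
    },
    {
        "@type": "GasStation",
        "name": "Station B",
        "makesOffer": [
            {"@type": "Offer", "itemOffered": {"@type": "Product", "name": "Screenwash"}},
        ],
    },
]


def sells(station: dict, keyword: str) -> bool:
    """Return True if any offered product name contains the keyword."""
    for offer in station.get("makesOffer", []):
        name = offer.get("itemOffered", {}).get("name", "")
        if keyword.lower() in name.lower():
            return True
    return False


matches = [s["name"] for s in stations if sells(s, "chocolate")]
print(matches)
```

A station that publishes no offers simply never appears in the results, which is the point being made: the agent can only work with the data it is given.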
For people who are new to doing structured data, what’s the urgency? We’re seeing this great change, so is there urgency?
It’s about as urgent as SEO is. If you want to be part of the world, and stand a chance of the search engines identifying that your product, your organization or your service may be just the answer to a question one of their users is asking, you have to tell them. They’re not telepathic. I know some people think they are, but they can only work on the data that’s given to them. Now, what you may find with something like Schema.org is that if you provide more accurate information to the search engines, they might be able to answer the question with your resource quicker than if they had traditionally sent the user to one of your web pages. This is where the SEO community have a little bit of worry. Theoretically you could end up with less traffic to your site, but you could end up with more accurate traffic. So, another example: if a user is looking for the customer service department of your organization, traditionally they’d type in “customer service Widgets Incorporated”, end up on the Widgets Incorporated home page, and have to click down to find the phone number. Today, if you’re providing structured information, the answer to that question will be presented on the search engine’s page, probably, if it’s on a portable device, with a “ring me now” button to click on, so you get a satisfied customer who never touched your website.
Will the intent of websites go away, so that information will only appear in search and there’ll be some other way to present information about your company? Will the website itself become historical?
Well, it still has a purpose, because if it’s a description of a product, or the “order me” page, or something like that, the user has to have somewhere to land. The home page starts to become less relevant, and for the whole site, its major customer might well be the search engines: you provide information to them so they can direct resources to you. On that particular example with the phone number, people ask “How do I analyze the usage of that?” I did hear of an organization that asked themselves exactly that question. They have a special phone number that is only put in the structured data, so they know that calls to that phone number have only come from the search engine. So there are ways around this, and there are ways to use the structured data in your website for better analytics. That’s a topic I’m not going to go into now, or we would be here all day.
We’re quite passionate about semantic analytics, and it’s something that we’re working hard to enable in our suite of tools. Maybe we’ll have a follow-up conversation about the big benefits of structuring your data for a deep understanding of what your traffic is about.
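The customer service example maps onto the Schema.org contactPoint property. The sketch below, in Python, shows one plausible shape for that markup and a lookup over it. The URL and the telephone number are placeholders, and the phone_for helper is a hypothetical stand-in for whatever a search engine actually does with the data.

```python
import json

# Widgets Incorporated is the example organization from the conversation;
# the URL and telephone number are placeholders.
organization = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Widgets Incorporated",
    "url": "https://www.example.com",
    "contactPoint": [
        {
            "@type": "ContactPoint",
            "contactType": "customer service",
            "telephone": "+1-555-0100",
        }
    ],
}


def phone_for(entity: dict, contact_type: str):
    """Return the telephone number for a given contact type, or None."""
    points = entity.get("contactPoint", [])
    if isinstance(points, dict):  # a single contact point is also valid JSON-LD
        points = [points]
    for point in points:
        if point.get("contactType") == contact_type:
            return point.get("telephone")
    return None


print(json.dumps(organization, indent=2))
print(phone_for(organization, "customer service"))
```

Publishing a number that appears only in this markup, as in the analytics anecdote that follows, would need no change to the structure, only a different telephone value.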
I know that you’ve worked on extensions, including external extensions. So for people who are not the experts, where there isn’t appropriate vocabulary, what do you recommend to them?
There’s a spectrum. There is core Schema.org and its core vocabulary. Then there are what are termed hosted extensions: they’re extensions that have been accepted by the Schema.org community and they’re as good as part of the core vocabulary. In fact, the URL or URI you embed in your data is still schema.org/, it’s not bib.schema.org. It’s more of a navigational exercise, so people can find definitions on the website. If you have even more data that you want to expose about your resources, data that is too industry-specific for a general-purpose vocabulary like Schema.org, you can build your own external extension, which builds on top of Schema.org. It assumes Schema.org is there, and then you extend those terms. If you then embed that in your website, the systems, as a minimum, will be able to understand the Schema.org data you’ve got in there, and specialists, very often people cooperating in the same marketplace or the same sector (cultural heritage is great for this), will also understand what you’re describing in an industry-specific vocabulary that’s built on top of Schema.org, or relates to Schema.org through some relationships between the vocabularies. It can get a bit technical, and you have to be passionate about what you’re doing in your sector. I’ve done some work in the banking sector to help improve the Schema.org vocabulary for describing bank accounts and loans and things like that. They have such a detailed vocabulary that there’s no way you could embed all of it in Schema.org and still keep the general population understanding what the heck’s going on. They’re looking at building an external extension outside of that, as well as enhancing the vocabulary inside Schema.org. One of the things I would encourage people to do, if you find that the vocabulary is limited in your area and you think “I’d love to be able to describe this, but can’t”, is get on board with the group.
Anybody can make a suggestion: you don’t have to be a search engine company, you don’t have to be a database expert, you can just be a user who says “I’m trying to mark up my left-handed screw widget and I can’t find a property for direction of screw. Can I suggest one?” Anybody can do that; it’s a living vocabulary. The ideal is to put out a new release every one to two months; it has probably turned out to be every two to three months at the current rate. So it’s constantly evolving and changing and building, so take part in the conversation.
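A rough sketch of the external extension idea, in Python: the data uses Schema.org as its base vocabulary and adds terms from a second, sector-specific vocabulary. The "ex" prefix, its example.org URL, the ScrewWidget type and the screwDirection property are all hypothetical, invented here to echo the left-handed screw widget; they are not part of Schema.org or any real extension.

```python
# "@vocab" makes unprefixed terms resolve against Schema.org; the "ex" prefix
# points at a made-up external vocabulary.
widget = {
    "@context": {
        "@vocab": "https://schema.org/",
        "ex": "https://example.org/vocab/",
    },
    "@type": ["Product", "ex:ScrewWidget"],
    "name": "Left-handed screw widget",
    "ex:screwDirection": "left-handed",
}


def schema_only_view(entity: dict, prefix: str = "ex:") -> dict:
    """Approximate what a Schema.org-only consumer keeps: drop extension terms."""
    view = {}
    for key, value in entity.items():
        if key.startswith(prefix):
            continue  # extension property: ignored by a general-purpose consumer
        if key == "@type":
            types = value if isinstance(value, list) else [value]
            value = [t for t in types if not t.startswith(prefix)]
        view[key] = value
    return view


general = schema_only_view(widget)
print(general["@type"], general["name"])
```

A general-purpose consumer still sees a Product with a name, while a specialist that knows the extension vocabulary can also read the screw direction from the same markup.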
For the last piece: when I first introduced you, I asked “Can you introduce how you’re related to semantic search?” and you said you might have some thoughts on my term semantic search. Do you want to give us a rant?
Yeah, well, for the first 20 years of its life, the web taught us that the only way to find something is to search. As human beings we search, and then we tend to discover relationships. How often, for the people who still go to libraries, have you gone into a library, gone to exactly the shelf where the book you want is, and come away with a different book, because you’ve searched in the general area and then made related decisions about what you want? Schema.org and structured data are very closely associated with things and their relationships. So look at, say, the Google page with its ten results, which are your search results, and then the knowledge panel on the right-hand side for the person, or the organization, or the topic you’re looking for. If you start clicking around in there, you’re not searching. As the Semantic Web people would put it, you’re navigating the graph of knowledge about the resources on the web, and that’s where the real value is, and that’s what’s happening behind the scenes. The intelligent agents, the cognitive computing algorithms that are driving things like Siri, are navigating those relationships as well as looking at text on pages. So it’s kind of delivering the other side of the discovery coin that’s been missing for the last 20 years on the web. So when somebody says “semantic search”, I kind of sigh gently and say it’s far more than that. But I can understand why such a marketing term has evolved. I remember trying to explain what Web 2.0 was a few years ago, with equal discomfort. But people generally congregate around that flag and have an idea of what we’re on about.
Almost like “searching the Semantic Web” would be more accurate.
Yeah, or navigating the Semantic Web, and not using the word search at all. But never mind! I’m not starting a movement to say “please drop the word search from semantic search”.
Richard, is there anything else you’d like to add about Schema.org, or your thoughts on the future?
One of the questions I’m often asked when I’m working with clients, national libraries and people like that, and other organisations, is “Can I just sprinkle Schema.org into my website, or do I have to re-engineer my whole data infrastructure to take account of it?” The answer is actually both. You can gain a lot of benefit from capturing the data that you already have in the front end that’s building your web pages and placing it in structured form using Schema.org in your website. There are benefits around that, many of which we’ve discussed already. Equally, you may have a lot of structured data in your organization. To pick a bank again: I’ve got branches, I’ve got types of accounts, I’ve got types of loans, I’ve got terms and conditions, and all that kind of thing, which are entities in the environment. It is well worth, whilst you’re evolving your internal infrastructure, identifying those entities, so that when they do surface through the website you’ve got a nice self-contained structured definition of that thing, which you can then relate to other things that you are describing on the website. So you can relate a ‘loan advisor’ to ‘mortgage loans’ in a particular branch. You can bring those things together much more easily if you’ve got an internal structure to do it, but you don’t have to go the whole hog. You can equally use things at the surface, and then, when you are happy with them, look at the way they impact your structure. It does affect your thinking: when you start describing the entities on your web page, it gets you to think about your internal structures as well, and to identify the benefits of using a bit of structure internally.
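The difference between searching and navigating can be shown with a toy graph. This Python sketch stores a few subject, predicate, object statements and answers a question by following relationships instead of matching text. The predicate names are illustrative labels chosen for the example, and the tiny data set is an assumption; a real knowledge graph is vastly larger and built differently.

```python
from collections import defaultdict

# A toy graph of (subject, predicate, object) statements.
triples = [
    ("Mount Everest", "type", "Mountain"),
    ("Mount Everest", "elevation", "8,848 meters"),
    ("Mount Everest", "containedInPlace", "Himalayas"),
    ("Himalayas", "containedInPlace", "Asia"),
]

# Index the statements so each entity maps predicate -> object.
graph = defaultdict(dict)
for subject, predicate, obj in triples:
    graph[subject][predicate] = obj


def follow(start: str, *predicates: str):
    """Walk from an entity along a chain of relationships; None if the path breaks."""
    node = start
    for predicate in predicates:
        node = graph.get(node, {}).get(predicate)
        if node is None:
            return None
    return node


print(follow("Mount Everest", "elevation"))
print(follow("Mount Everest", "containedInPlace", "containedInPlace"))
```

The first call is the answer-box case, reading one attribute of one entity; the second hops across two relationships, which is the kind of navigation being described.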
One of the conversations my co-founder Mark and I have been having is around open data and about structuring open data with Schema.org or other vocabularies and the power that will have, taking open data to the next level.
Well, if you’re sharing Schema.org data in your web page, you’re providing open data to the world. Open data has been proven as a success. I presume people might have seen this diagram, the open data cloud diagram, based on linked data techniques. It started in 2006, and that picture is from 2014. It was open data, and it used linked data principles, but the key differences between that and today are a) you had to produce a special server endpoint to deliver that data, and b) everybody was using a different vocabulary. Even the libraries on there are using different flavors of similar vocabularies. So nobody could really understand each other at a deep level. Lay structured data on top of that and all of a sudden you’ve got structured open data across the web. It doesn’t mean all your data needs to be open. You go into a commercial organization and say “You need to share some open data on your website” and they go “We’ve got trade secrets! We can’t open up everything. What about our customers?” You have to get beyond that and say “The elements you want to make open, you expose.” I try to stand that on its head by asking “Tell me what doesn’t need to be shared?” It’s a much easier question to answer, and you end up with a much better solution. So with open data, we’re already on the way. Anybody could parse my website, or the website of a library that I work with, and capture the Schema.org data openly off their sites.
Thank you so much, Richard, we are out of time. Thank you for generously sharing your thoughts on where the future is. For people who want to follow up with Richard, you can find him on his blog, dataliberate.com. He’s always blogging about the different elements that he’s involved with, as well as changes. Thank you, Richard, and thank you for joining us on Schema Stories.
Thank you for inviting me, Martha.
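As a sketch of how anybody could capture Schema.org data off a site, the Python below pulls JSON-LD blocks out of a page using only the standard library. The sample HTML and the library name in it are made up for the example, and the extractor covers JSON-LD only; Schema.org data embedded as Microdata or RDFa would need a different parser.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the contents of <script type="application/ld+json"> elements."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.entities = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.entities.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip blocks that are not valid JSON


# Made-up page standing in for a library's website.
sample_html = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Library", "name": "Example City Library"}
</script>
</head><body><h1>Welcome</h1></body></html>
"""

extractor = JsonLdExtractor()
extractor.feed(sample_html)
print(extractor.entities)
```

In practice the HTML would come from fetching a live page; the extraction step stays the same because the data sits in the page itself, with no special server endpoint involved.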