Steve Macbeth, the executive sponsor for schema.org from Microsoft (Bing), joins Martha van Berkel in a conversation about schema.org. Steve shares why he got involved at the start of the semantic web standard and how he sees schema markup playing a role in virtual reality and AI. He also gives advice on how to make sure that your content is semantically connected. If you are interested in understanding why structured data on the web is so important in this changing world, take 19 minutes to listen to or read this interview.
Steve: Hi, thanks for having me.
Martha: No problem. Well, let’s jump right in. Tell us a bit about yourself and how you got started with schema.org.
Steve: I am a long-term Microsoft employee. I run a small incubation engineering team right now, but I’ve been with the company since 2002. Most of my work has been in the AI space, natural language and speech. I started in search, which at the time was MSN Search, in 2006. I went to China for three years to start a development center there. During that time we supported many search features, primarily in structured data, so I got a lot of experience being a consumer of structured data. At the time there wasn’t very good structured data on the web, so we had to build a lot of technology to convert largely unstructured data into structured data in order to power features like vertical search. That was my introduction. When I came back from China, I ended up running the Bing core relevance team. That was when I wanted to try to solve the structured data problem more systemically than by scraping the web and converting unstructured data into structured data. That’s a bit of background on me.
Martha: And so you were there right at the very beginning, at the inception, from what I understand?
Steve: Yeah. Guha, the original founder, who works at Google, came to visit Microsoft in 2009 and basically proposed the idea of Bing, Google, and, at the time, Yahoo collaborating on building a schema that we would endorse. It was aligned with work I had already been doing, and I think it addressed a number of challenges we were seeing, so the opportunity to do that at an industry level and partner with Google and Yahoo seemed great. We weren’t, and aren’t, the leaders in search, and I think whenever you’re not the leader and the leader wants to do standards work, it’s always beneficial to participate.
Martha: In those very early days, were there any key areas of schema.org that you were really passionate about and pushed to have included in the schema.org vocabulary?
Steve: Well, when we originally started, we took a very noun-focused perspective: how do we capture a bunch of the nouns that people are searching for, like movies, locations, books, and products, and provide some structure around them? For me, I’ve always been very interested in the action side of the web. I had particularly started to look a lot at mobile search and at how actions get linked to in the mobile space, which led to the whole actions extension we did. This was probably two years after we had launched the original schema.org work. We started to add actions, and that was the area I was most interested in and where I still see the most opportunity. I don’t think we’ve ever made as much progress there as we have in the noun space, but I would love to see more progress.
Martha: One of the drivers we’ve been talking about, especially with enterprises looking at actions, is that if you define them on the website, a chatbot can then understand the actions it can perform on the page. Do you have any thoughts on that as a business driver for defining those actions?
Steve: I think the struggle with this kind of market is that there’s a catch-22. Publishers have limited resources, so they’re only going to mark up where there’s value in marking up, and consumers of markup will only build features where there’s lots of markup. So if there’s no markup in an area, no one builds features, and if there are no features, there’s no markup. This has always been the biggest struggle in this space, I think. So the more people can think of ways to use the existing markup, the better. I love the idea that bots are starting to consume some of the existing markup and use it to be smarter and to drive value.
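To make the idea of bots consuming existing action markup a little more concrete, here is a minimal sketch using schema.org’s `potentialAction` property. It is an illustration under stated assumptions, not a description of how Bing or any particular chatbot reads markup: the restaurant, its URLs, and the `list_actions` helper are all hypothetical. The sketch parses a JSON-LD block that declares a `ReserveAction` and an `OrderAction`, then lists the actions a bot could discover on the page.

```python
import json

# Hypothetical JSON-LD that a restaurant page might embed in a
# <script type="application/ld+json"> tag. Names and URLs are placeholders.
PAGE_JSONLD = """
{
  "@context": "https://schema.org",
  "@type": "Restaurant",
  "name": "Example Bistro",
  "potentialAction": [
    {
      "@type": "ReserveAction",
      "target": {
        "@type": "EntryPoint",
        "urlTemplate": "https://example.com/reserve?date={date}",
        "actionPlatform": "https://schema.org/DesktopWebPlatform"
      }
    },
    {
      "@type": "OrderAction",
      "target": "https://example.com/order"
    }
  ]
}
"""


def list_actions(jsonld_text):
    """Hypothetical helper: return (action type, target URL or URL template)
    pairs from the potentialAction property of a single JSON-LD node."""
    node = json.loads(jsonld_text)
    actions = node.get("potentialAction", [])
    if isinstance(actions, dict):  # a single action may appear without a list
        actions = [actions]
    found = []
    for action in actions:
        target = action.get("target")
        if isinstance(target, dict):  # an EntryPoint object
            url = target.get("urlTemplate") or target.get("url")
        else:  # a plain URL string
            url = target
        found.append((action.get("@type"), url))
    return found


for action_type, url in list_actions(PAGE_JSONLD):
    print(action_type, url)
```

The point of the sketch is the catch-22 Steve describes: the bot adds value only because the markup already exists, and it reads the same vocabulary a search engine would, rather than requiring a bot-specific format.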
Martha: Here’s a very broad question. The Semantic Web is something those of us who are semantic technologists get really excited about. Especially as you look at the AI components of things, why are you passionate about the Semantic Web, and where do you see it going?
Steve: Yeah, I think there are two aspects. One is personality-wise: why am I interested? I’m an orderly person. I like things to be orderly, and the non-semantic web is a very disorganized place; unstructured text is very disorganized. The organizer in me likes the idea of clean, well-connected lines and lots of semantic meaning in relationships. My personality is predisposed to wanting to organize things, and I think semantic markup is an outlet for that. I don’t know if it’s a personality flaw or a strength.
Martha: We love it. Our mission is to translate the world’s content so it can be understood by machines; in other words, we want to organize the web. We’re right there with you.
Steve: And then from a practical perspective: I’m not on the Bing team anymore. I don’t work directly on search; I’m more in the broad AI space. In fact, there was just a big reorg today at Microsoft that changes my team’s charter slightly, toward the AI perception space. My team has been merged with a number of other teams doing things like vision and speech, and my team is doing computer vision. While there’s not a direct relationship right now, I do think that longer term, AR and VR and how they interact with the Semantic Web are super important. As you view or interact with things in the real world, the ability to connect to the semantic meaning around that information becomes super important, and I think there are a lot of rich experiences that could be built if you could connect an item you see in the real world with its digital counterpart, its relationships, and its actions. I think this is a very nascent area for the Semantic Web. The interesting question now is really how to connect the digital world and the physical world, and I think semantic meaning may be the fabric with which we can connect those two things. You see this already with mapping. If maps didn’t have latitude and longitude, you couldn’t connect them to the physical world, but maps are deeply semantic in their design, so they’re very easy to stitch together with the real world. Most data on the web doesn’t have that kind of semantic backbone, so it’s much more difficult to connect to the real world, and we tend to use the map as the mechanism to connect it. I know I’m looking at this restaurant not because I recognize the restaurant itself, but because the map has this restaurant at a given location and, in the real world, I am at that location. So the map acts as the interface between the real world and the digital world.
I think we need more interfaces between the digital world and the real world.
Martha: Yeah, it’s interesting. It’s almost like what we’re seeing today around images and connecting images to the knowledge graph, let’s call it. You’re saying take it a step further and make it more like our human experience connecting to our knowledge graph. I love how you are painting the next step and why we should be building this foundation. We’ve talked about Bing, and there’s been lots of conversation about Bing recently consuming JSON-LD, something it has taken a bit longer than Google to do, or at least to show evidence of supporting. Can you talk a little bit about how Bing uses schema.org and entities? I know that’s not your primary team, but maybe you could make some comments on that?
Steve: Yeah, let me first start by giving you the caveat that anything I say might be incorrect, and if it is, I apologize, because I am not as involved day to day with that work as I used to be. I think there are a few areas where Bing consumes the data, but mostly it’s from our object repository. We have an entity repository called Satori; I think this is public knowledge. That is where all of the entities Bing knows about are stored, and that entity database is obviously used to drive many features in Bing: relevance features, ad components, and rich vertical search features that leverage that data to do rich captions, filtering, and so on. But more and more, other products are starting to take advantage of it too.
Enterprise functionality is starting to look at how to take a customer’s catalog and bind it to the web graph. Again, I don’t think this is any different from the conversation we had earlier about finding these connection points. If I have a catalog, and in that catalog is a bunch of products, how do I bind that to sentiment on the web? To do that, I need to find some connective tissue that says this product in your catalog is the same as this product people are talking about on Twitter or Facebook or Instagram, or writing reviews about on the web. So for us, Satori is this connecting fabric: we know the names of things because people search for the names of things, we know the canonical web addresses of things, and then we have all this structured data about those things. So we can use natural words and well-known IDs as that connecting fabric. The main way that Microsoft consumes and uses structured data is through Satori.
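The “connective tissue” Steve describes, binding a catalog item to the same product elsewhere on the web using well-known IDs and names, can be sketched as a simple two-stage match. This is a toy illustration, not how Satori works: it assumes each record may carry a schema.org `gtin13` value as the well-known ID, uses a canonical URL as the web entity’s ID, and falls back to a normalized product name. All products, GTINs, URLs, and the `link_catalog` and `normalize_name` helpers are hypothetical.

```python
import re


def normalize_name(name):
    """Lowercase and collapse punctuation/whitespace so that trivially
    different spellings of a product name compare equal."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", name.lower()).split())


def link_catalog(catalog, web_entities):
    """Map each catalog SKU to a web entity ID (here, a canonical URL).
    Prefer an exact match on a well-known identifier (gtin13); fall back
    to the normalized product name. Unmatched SKUs map to None."""
    by_gtin = {e["gtin13"]: e["id"] for e in web_entities if e.get("gtin13")}
    by_name = {normalize_name(e["name"]): e["id"] for e in web_entities}
    links = {}
    for item in catalog:
        entity_id = by_gtin.get(item.get("gtin13"))
        if entity_id is None:
            entity_id = by_name.get(normalize_name(item["name"]))
        links[item["sku"]] = entity_id
    return links


# Hypothetical data: the first item matches on its ID despite a different
# name, the second matches on its name, the third has no counterpart.
catalog = [
    {"sku": "A-1", "name": "Acme Trail Shoe 2", "gtin13": "0000000000017"},
    {"sku": "B-7", "name": "Acme  trail-runner jacket"},
    {"sku": "C-3", "name": "Unlisted Widget"},
]
web_entities = [
    {"id": "https://example.com/products/trail-shoe-2",
     "name": "ACME Trail Shoe II", "gtin13": "0000000000017"},
    {"id": "https://example.com/products/jacket",
     "name": "Acme Trail Runner Jacket"},
]

print(link_catalog(catalog, web_entities))
```

Trying the ID first reflects the point that well-known IDs are the more reliable bridge; name matching alone would miss the first item and could produce false matches at scale.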
Martha: Got it! So would Satori be the equivalent of Google’s Knowledge Graph? I think that’s a great way for people to connect those dots and speak Microsoft language instead of always speaking Google language.
Steve: But a much sexier name.
Martha: Yeah, and you’ve put me in contact with the Satori team, so hopefully we can get them on here to have a similar conversation. Talk to me a little bit about AI. You’re in the VR space, and you talked about those connecting pieces. Do you see any other AI experiences emerging in the near term?
Steve: Well, that is a very broad question. Yes. We have an event at Microsoft called TechFest, which is just ending. TechFest is a three-day event where researchers generally showcase the work they’ve done, and anybody at Microsoft and some press can come through and see it. It’s kind of like a science fair, so it’s always a good opportunity to see what’s going on. I would say the majority of the booths at TechFest this year were AI related, leveraging AI, or making it easier to build AI. So I think there’s a huge focus in the company and in the industry on how we can use AI to make software better. I think there will come a time when nobody talks about AI, because AI in some ways is like object-oriented programming. Nobody asks, “Are there more object-oriented features coming?” because it’s a tool to build real-world things. Right now AI is novel, so it’s a way to differentiate, like object-oriented programming was 15 years ago. I think in another five years, maybe less, people will stop talking about AI in products because every product will have some aspect of AI.
Martha: Is there something at TechFest that stood out to you and got you excited?
Steve: It’s all proprietary, so I can’t really share. There’s a lot of stuff in the VR space I’m excited about personally. I went to see Ready Player One last night, so I kind of love VR right now. In the VR space there was work on haptic feedback, which I think is really exciting, so for me that area is super exciting. As for the overlap between AI, semantic data like we talked about, and AR: I’m a big believer that in the near future, maybe 10 years, everybody will use some form of augmented reality through glasses or contacts or something like that.
Martha: Excellent! Well, thank you so much for being part of today and sharing that vision, especially what I’ll call the blurring line between reality and the connected graphs of things. To me, that was a new way to see the evolution. If people want to get in touch with you or follow your thinking or the work you’re doing, how do they get in touch with you?
Steve: They can just email me. I’m on LinkedIn; if you search for Steve Macbeth Microsoft on LinkedIn, you can find me. My email address at Microsoft is just Steve.Macbeth, so if anybody wants to reach out, let’s do that. I think my stuff’s also on the schema.org website somewhere.
Martha: I think that’s where I found you.
Steve: I think we had talked about one other question that we didn’t get to, but I had an answer for it: what could people who are doing semantic markup do? The thing we talked about earlier, creating connection points, is the most important. Semantics is better than no semantics, but semantics without the ability to connect to other data is almost as valueless as no semantics. In my opinion, semantic data is only valuable when it can be bridged to other data.
Martha: I love that. Yeah, we talked about islands of code versus actually building a graph. Is that what you mean?
Steve: Exactly. So think about it this way: I’ve just marked up this entity or this action on my page; what is the mechanism by which it can be related to other things? I think Wikipedia forms a very nice backbone for canonical IDs. Very early in the formation of schema.org, we made a strong decision not to support canonical IDs, and I think that was important, because supporting them would have been very politically contentious at the time: we basically would have had to pick somebody’s ID system. I think the time has come for canonical IDs, and I would love to see schema.org or some other organization take them on. But in the meantime, the more you can think about what the canonical ID for an entity is, the better. Is it latitude and longitude? Is it a link back to Wikipedia, which I think can form this backbone of canonical IDs? ISBN can play that role for certain types of data. Just think about the mechanism by which this entity can be connected to other IDs. And then, for the future, how can it connect to the real world? QR codes in semantics seem like a really weird idea on the surface, but I think they will become more and more important, because they’re a mechanism that allows people in the physical world to interact with your semantic data, in the same way that I think lat/long does. I just wanted to throw in that last bit.
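Steve’s “connection points” map onto concrete schema.org properties: `sameAs` for a link to a canonical ID such as a Wikidata or Wikipedia entry, `geo` with `GeoCoordinates` for latitude and longitude, and `isbn` for books. The sketch below is a minimal illustration under these assumptions; it does not reflect any schema.org canonical-ID scheme (which, as Steve notes, schema.org chose not to define). It builds JSON-LD for a hypothetical small business whose service area is tied to Wikidata’s item for Paris (Q90), then walks the markup to list the values other datasets could join on. The business name, coordinates, and both helper functions are hypothetical.

```python
import json


def local_business_jsonld(name, city_name, city_wikidata_url, lat, lon):
    """Hypothetical helper: schema.org JSON-LD for a small business whose
    service area is linked to a canonical ID and whose location is
    expressed as latitude/longitude."""
    return {
        "@context": "https://schema.org",
        "@type": "LocalBusiness",
        "name": name,
        "areaServed": {
            "@type": "City",
            "name": city_name,
            # sameAs ties this local "City" node to a shared, canonical ID
            "sameAs": city_wikidata_url,
        },
        "geo": {"@type": "GeoCoordinates", "latitude": lat, "longitude": lon},
    }


def connection_points(node):
    """Recursively collect values other datasets can join on:
    sameAs URLs, ISBNs, and latitude/longitude pairs."""
    points = []
    if isinstance(node, dict):
        same_as = node.get("sameAs")
        if isinstance(same_as, str):
            points.append(("sameAs", same_as))
        elif isinstance(same_as, list):
            points.extend(("sameAs", url) for url in same_as)
        if "isbn" in node:
            points.append(("isbn", node["isbn"]))
        if node.get("@type") == "GeoCoordinates":
            points.append(("geo", (node["latitude"], node["longitude"])))
        for value in node.values():
            if isinstance(value, (dict, list)):
                points.extend(connection_points(value))
    elif isinstance(node, list):
        for item in node:
            points.extend(connection_points(item))
    return points


business = local_business_jsonld(
    "Example Bakery", "Paris", "https://www.wikidata.org/wiki/Q90", 48.8566, 2.3522
)
print(json.dumps(business, indent=2))
print(connection_points(business))
```

A page whose markup yields an empty list from a walk like this is an “island of code”: it is valid markup, but nothing in it can be bridged to other data.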
Martha: At Schema App we are true believers in connecting your data and defining it wherever that definition lives. More groups are emerging, whether data commons or open data initiatives, that define those things for different communities or entity types. How do we continue to encourage people, even at a very basic level? If you’re a small business, you can define your city by the Wikidata entry for that city, and so on. There are very simple ways to start making those connections. We’ve been marching alongside you, Steve, trying to get even the earliest adopters of schema.org to think about those connections or paths between things, to make sure their data is connected.
Martha: Well, thank you. I love that last thought; thanks for adding it in. Again, you can find Steve on LinkedIn, and we’ll also share his LinkedIn and email in the post so people can find him there. Thank you again for taking the time. I know you’re a busy man. And thank you for your continued support of schema.org. Have a good day.
Steve: Thanks for having me.