Semantic Web & Linked Data: Technology of the future? Hopefully.
The inspiration for this short article is a simple question my wife asked while we were enjoying a recent episode of Continuum, a Canadian science-fiction series:
Q1. Hey, Emre, isn’t the girl who is playing Kiera’s grandmother the same girl who played Rosie Larsen in The Killing?
I said that I believed so and that it would be very easy to get the definitive answer via IMDb. Before I finished my sentence, though, it occurred to me that this would be a nice test of the current state of the Semantic Web and Linked Data. After all, how difficult would it be to query the wonderful world of linked data with a couple of SPARQL queries, and even go further by asking the following question:
Q2. Who are the actors that performed both in The Killing and Continuum?
After all, the Semantic Web and linked data, coupled with DBpedia, can easily tell us which actors starred in both Hoffa and The Shining, right? Simply running the following SPARQL query using http://live.dbpedia.org/sparql:
shows the correct answer as Jack Nicholson. And if we can do this, then we can easily answer the second question (Q2), right? Well, let’s put on our semantic web and linked data hacker hat and try it out using the following SPARQL query:
and see that the result set is empty. According to Semantic Web and Linked Data there are no common artists between those TV series.
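The two queries mentioned above are not reproduced in this text. As a rough sketch only, a query of this kind could be built and sent from Python as shown below. It assumes that DBpedia links a work to its cast through the dbo:starring property, and that the films live under resource IRIs such as http://dbpedia.org/resource/Hoffa and http://dbpedia.org/resource/The_Shining_(film); the helper names build_common_cast_query and run_query are made up for illustration, and the endpoint may no longer answer at this address.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "http://live.dbpedia.org/sparql"  # endpoint named in the article


def build_common_cast_query(resource_a, resource_b):
    """Build a SPARQL query for people listed as starring in both works.

    resource_a and resource_b are full DBpedia resource IRIs (assumed).
    """
    return (
        "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
        "SELECT DISTINCT ?actor WHERE {\n"
        f"  <{resource_a}> dbo:starring ?actor .\n"
        f"  <{resource_b}> dbo:starring ?actor .\n"
        "}"
    )


def run_query(query, endpoint=ENDPOINT):
    """Send the query over HTTP GET and return the matching actor IRIs."""
    params = urllib.parse.urlencode(
        {"query": query, "format": "application/sparql-results+json"}
    )
    with urllib.request.urlopen(f"{endpoint}?{params}") as response:
        data = json.load(response)
    return [row["actor"]["value"] for row in data["results"]["bindings"]]


# Assumed resource IRIs for the two films; not taken from the original post.
HOFFA = "http://dbpedia.org/resource/Hoffa"
THE_SHINING = "http://dbpedia.org/resource/The_Shining_(film)"
film_query = build_common_cast_query(HOFFA, THE_SHINING)
```

Calling run_query(film_query) would need network access. Swapping in the resource IRIs of the two TV series gives the equivalent of Q2, and an empty list back would correspond to the empty result set described above.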
Enter Python and IMDbPY: Technology of Today
After that surprising result I decided to take off my semantic web and linked data hacker hat and put on my Python hacker hat. Enter IMDbPY:
IMDbPY is a Python package useful to retrieve and manage the data of the IMDb movie database about movies, people, characters and companies.
I had used this very nice piece of software in the past, back when I was trying to build tvrecommend, my own TV movies recommendation engine that made heavy use of libsvm and IMDbPY (rest in peace :))
A quick, dirty and short Python script turned out to be more than enough:
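The original script is not reproduced in this text. As a rough sketch of how such a script could look (not the author's actual code), the version below fetches the cast of each title with IMDbPY and intersects the two casts by person ID. The function names are made up for illustration. It also assumes that the first search hit is the intended title, which may well be wrong for an ambiguous name like "The Killing", and that the default data fetched for a title includes a usable cast list.

```python
def common_people(cast_a, cast_b):
    """Return {person_id: name} for people who appear in both casts.

    Each cast is a dict mapping an IMDb person ID to a display name.
    """
    shared_ids = cast_a.keys() & cast_b.keys()
    return {pid: cast_a[pid] for pid in sorted(shared_ids)}


def fetch_cast(title):
    """Look up a title on IMDb and return its cast as {person_id: name}.

    Assumption: the first search result is the title we want.
    """
    # Imported lazily so common_people() can be used without IMDbPY installed.
    from imdb import IMDb

    ia = IMDb()
    hits = ia.search_movie(title)
    if not hits:
        return {}
    movie = ia.get_movie(hits[0].movieID)
    return {person.personID: person["name"] for person in movie.get("cast", [])}


def main():
    """Print everyone found in the casts of both series (needs network access)."""
    killing = fetch_cast("The Killing")
    continuum = fetch_cast("Continuum")
    for name in common_people(killing, continuum).values():
        print(name)
```

Running main() performs the lookups. Matching on person IDs rather than names avoids treating two different people with the same name as one.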
According to this script not one but three people acted in both The Killing and Continuum:
This simple example shows two things:
- Semantic Web technologies such as SPARQL and Linked Data sources such as DBpedia sometimes work, and when they do, querying them declaratively is much simpler and shorter than implementing an imperative programming solution.
- But unless there is some real incentive for big companies (such as Amazon.com, which owns IMDb) to publish their up-to-date data in a machine-readable, linked format that conforms to the Semantic Web standards, and unless adding semantically tagged and SPARQL-queryable data becomes as easy for people as adding a few sentences in English, we still have a long way to go in establishing reliable platforms that consume open data.
Of course, the second problem is well known in semantic web, linked data and open data circles as a variation of the chicken-and-egg problem. However, I have yet to witness a truly powerful and flexible solution that goes beyond focused and limited domains.