Is Semantic Web and Linked Data Good Enough? SPARQL & DBPedia vs. Python & IMDbPY

11 Jul

Semantic Web & Linked Data: Technology of the future? Hopefully.

The inspiration for this short article was a simple question my wife asked while we were enjoying a recent episode of Continuum, a Canadian science-fiction series:

Q1. Hey, Emre, isn’t the girl who is playing Kiera’s grandmother the same girl who played Rosie Larsen in The Killing?

I said that I believed so and that it would be very easy to get the definitive answer via IMDb. Before I finished my sentence, though, it occurred to me that this would be a nice test of the current state of the Semantic Web and Linked Data. After all, how difficult could it be to query the wonderful world of linked data with a couple of SPARQL queries, and even go further by asking the following question:

Q2. Who are the actors that performed both in The Killing and Continuum?

What will be the state of the Semantic Web and Linked Data 65 years from now?

After all, the semantic web and linked data, coupled with DBpedia, can easily tell us the actors who starred in both Hoffa and The Shining, right? Simply running the following SPARQL query at http://live.dbpedia.org/sparql:

PREFIX dbpedia2: <http://dbpedia.org/property/>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
SELECT ?artist
FROM NAMED <http://live.dbpedia.org>
WHERE {
  <http://dbpedia.org/resource/The_Shining_%28film%29> dbpedia2:starring ?artist .
  <http://dbpedia.org/resource/Hoffa> dbpedia-owl:starring ?artist .
}
LIMIT 10

shows the correct answer as Jack Nicholson. And if we can do this, then we can easily answer the second question (Q2), right? Well, let’s put on our semantic web and linked data hacker hat and try it out with the equivalent SPARQL query for the two TV series. The result set is empty: according to Semantic Web and Linked Data, there are no common artists between those TV series.
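
As a rough illustration of what a query for Q2 might look like, here is a minimal Python sketch that builds the same two-pattern SPARQL query for an arbitrary pair of DBpedia resources. The two resource URIs and the helper name common_cast_query are assumptions made for illustration; they are not necessarily the ones used when the empty result was obtained.

```python
# Hypothetical resource URIs: assumptions for illustration only.
THE_KILLING = "http://dbpedia.org/resource/The_Killing_(U.S._TV_series)"
CONTINUUM = "http://dbpedia.org/resource/Continuum_(TV_series)"


def common_cast_query(resource_a, resource_b, limit=10):
    """Build a SPARQL query for artists listed as starring in both resources."""
    return (
        "PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>\n"
        "SELECT DISTINCT ?artist\n"
        "WHERE {\n"
        f"  <{resource_a}> dbpedia-owl:starring ?artist .\n"
        f"  <{resource_b}> dbpedia-owl:starring ?artist .\n"
        "}\n"
        f"LIMIT {limit}"
    )


if __name__ == "__main__":
    # Paste the printed query into a SPARQL endpoint to try it out.
    print(common_cast_query(THE_KILLING, CONTINUUM))
```

Because both triple patterns share the ?artist variable, the query only returns artists attached to both resources. If either page lacks starring data, the result is empty.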

Enter Python and IMDbPY: Technology of Today

After that surprising result, I decided to take off my semantic web and linked data hacker hat and put on my Python hacker hat. Enter IMDbPY:

IMDbPY is a Python package useful to retrieve and manage the data of the IMDb movie database about movies, people, characters and companies.

I had used this very nice piece of software in the past, back when I was trying to build tvrecommend, my own TV movie recommendation engine that made heavy use of libsvm and IMDbPY (rest in peace :)).

A quick, dirty and short Python script turned out to be more than enough:

from imdb import IMDb

imdb = IMDb()

# Fetch both titles by their IMDb IDs
the_killing = imdb.get_movie('1637727')
continuum = imdb.get_movie('1954347')

# Load the full cast of The Killing and the episode list of Continuum
imdb.update(the_killing, 'full credits')
imdb.update(continuum, 'episodes')

# Pick season 1, episode 5 of Continuum and load its details
continuum_episode = continuum['episodes'][1][5]
imdb.update(continuum_episode)

cast_of_the_killing = the_killing['cast']
cast_of_continuum_episode = continuum_episode['cast']

# Print everyone who appears in both cast lists
for actor in set(cast_of_the_killing).intersection(cast_of_continuum_episode):
    print(actor)

According to this script, not one but three people acted in both The Killing and Continuum.
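
The set intersection at the heart of the IMDbPY script can be tried without network access. The following is a minimal sketch that matches people by ID rather than by name, so that two different people who share a name are not confused. The helper name common_people and all IDs and names in the sample data are made-up placeholders, not real IMDb credits.

```python
def common_people(cast_a, cast_b):
    """Return {person_id: name} for people present in both cast lists.

    Each cast list is an iterable of (person_id, name) pairs. Matching is
    done on the ID only, never on the name.
    """
    names_a = dict(cast_a)
    ids_b = {person_id for person_id, _ in cast_b}
    return {pid: name for pid, name in names_a.items() if pid in ids_b}


# Made-up placeholder data, not real IMDb IDs or credits.
show_a = [("p1", "Actor One"), ("p2", "Actor Two"), ("p3", "Actor Three")]
show_b = [("p2", "Actor Two"), ("p4", "Actor Three"), ("p5", "Actor Five")]

for person_id, name in sorted(common_people(show_a, show_b).items()):
    print(person_id, name)
```

Here only p2 is reported: "Actor Three" appears in both lists, but under two different IDs, so it is treated as two distinct people.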

Lessons Learned

This simple example shows two things:

  • Semantic Web technologies such as SPARQL and Linked Data sources such as DBpedia sometimes work, and when they do, querying them declaratively is much simpler and shorter than implementing an imperative programming solution.
  • But unless there is some real incentive for big companies (such as Amazon.com, which owns IMDb) to publish their up-to-date data in a machine-readable, linked format that conforms to the Semantic Web standards, and unless adding semantically tagged and SPARQL-queryable data becomes as easy for people as adding a few sentences in English, we still have a long way to go in establishing reliable platforms that consume open data.

Of course, the second problem is well known in semantic web, linked data and open data circles as a variation of the chicken-and-egg problem. However, I have yet to witness a truly powerful and flexible solution that goes beyond focused and limited domains.

About the author: Emre is the co-founder & CTO of TM Data ICT Solutions in Belgium. You can read more about him on the About page of this blog.

 
Posted on July 11, 2012 in Programlama, python

 

6 responses to “Is Semantic Web and Linked Data Good Enough? SPARQL & DBPedia vs. Python & IMDbPY”

  1. Alexandru

    July 11, 2012 at 22:23

    I think you used the wrong knowledge base. DBpedia is based on Wikipedia, not on IMDb, so of course the information is going to be different. If you had used LinkedMDB, which among other sources also pulls its data from IMDb, you might have gotten different results:
    http://www.linkedmdb.org/snorql/ .

     
  2. Emre Sevinc

    July 11, 2012 at 23:11

    Hello Alexandru,

    I did not mention it in my article, but I also tried LinkedMDB and its SPARQL endpoints. Unfortunately, the simplest query fails to return a result, and its web pages provide content that is not very up to date. For example, see the LinkedMDB page of Rachel Nichols, who stars in Continuum: http://data.linkedmdb.org/page/actor/623 You will not see Continuum listed there.

     
  3. coskun gunduz

    July 12, 2012 at 10:47

    Big (or all!) companies should find ways to publish their up-to-date data without human effort (or with much less effort, just for confirmation for example), i.e. automatically with some software.

     
  4. Emre Sevinc

    July 13, 2012 at 00:06

    Yes, I agree, but I still can’t see the short-term incentive for many of the companies. The lack of widespread Semantic Web expertise and of easy-to-integrate tools only adds to this. Nevertheless, I still think there is hope. For example, this video alone is a very good indication of semantic web activities going at full speed: http://videolectures.net/w3cworkshop2012_herman_w3c_semantic/

     
  5. Çağatay Çallı

    July 13, 2012 at 08:59

    There is also the problem of implementing query engines correctly according to the SPARQL standards. Most projects fall short here and are too defective to support even the simple facilities offered by SPARQL.

    Aside from Apache Jena (a great project) and professional products, finding a correct tool (in your preferred programming language) to work with your own linked data is still a painful waste of effort.

     
  6. Emre Sevinc

    July 13, 2012 at 10:30

    Çağatay,

    Would you care to give examples of SPARQL engines that return defective results for queries? Nowadays we are using OWLIM for the CUBIST project, and it is proving to be a very powerful solution. I also did a project in the past using AllegroGraph (and Common Lisp), and that platform was a convenient one, too.

    On the other hand, yes, Jena is like the bread and butter of semantic web programming, and I’m glad that people are working on various bindings for it, such as Scala. Nevertheless, there is still a long way to go in terms of our ability to express our designs and queries as compactly and intuitively as possible.

     
