According to its website: Pattern is a web mining module for the Python programming language. It bundles tools for data retrieval (Google + Twitter + Wikipedia API, web spider, HTML DOM parser), text analysis (rule-based shallow parser, WordNet interface, syntactical + semantical n-gram search algorithm, tf-idf + cosine similarity + LSA metrics) and data visualization (graph networks).
The module is bundled with 30+ example scripts.
Probably the most interesting use of the system is “Belgian elections, June 13, 2010 – Twitter opinion mining”:
In the week before the Belgian 2010 elections, we analyzed approximately 7,600 tweets that mentioned the name of a Belgian politician. What makes this experiment interesting is the fact that Belgium is divided in a Dutch-speaking half (Flanders, 60% of the population) and a French-speaking half (Wallonia, 40% of the population). Flemings can only vote for Flemish politicians, Walloons can only vote for Walloon politicians.
However, the most striking part of the project is that the researchers did not build anything specific to Dutch or French. Instead, they simply used Google Translate to translate the Dutch and French tweets into English and then applied existing sentiment analysis tools for English:
The sentiment_score() function in the example uses SentiWordNet to rate words. Take the following tweet — chosen for its visible (positive) sentiment: “Danny Pieters, sterke speech voor een gedurfde en degelijke sociale bescherming.” We translate it into English using Google Translate and then weigh the individual words: …
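To make the "weigh the individual words" step concrete, here is a minimal sketch of word-level sentiment scoring on an already translated sentence. It is not the Pattern example's actual sentiment_score() code: the name toy_sentiment_score, the tiny lexicon, and its numbers are all made up for illustration and are not SentiWordNet values. The English sentence is a rough stand-in for the translated tweet, not the real Google Translate output.

```python
import re

# Hypothetical toy lexicon: word -> (positive, negative) scores in [0, 1].
# These numbers are invented for illustration and are NOT SentiWordNet values.
TOY_LEXICON = {
    "strong": (0.5, 0.0),
    "daring": (0.5, 0.25),
    "solid": (0.5, 0.0),
    "weak": (0.0, 0.5),
}

def toy_sentiment_score(text, lexicon=TOY_LEXICON):
    """Sum (positive - negative) over every word that appears in the lexicon."""
    words = re.findall(r"[a-z]+", text.lower())
    return sum(lexicon[w][0] - lexicon[w][1] for w in words if w in lexicon)

# Assumed stand-in for the translated tweet (the translation step itself
# is taken as already done; no translation API is called here).
english = "A strong speech for a daring and solid social protection."
print(toy_sentiment_score(english))  # a positive total means positive sentiment
```

Words missing from the lexicon simply contribute nothing, which is one reason such an approach depends heavily on how well the translation maps onto the lexicon's vocabulary.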
One wonders whether that would work for other languages, such as Turkish, too. Apparently their Google Translate-based system predicted the election outcome correctly: http://www.clips.ua.ac.be/pages/pattern-examples-elections