
Monthly Archives: August 2010

Adventures in the world of multi boot USB sticks


I’m in the process of creating the ultimate multi-boot USB stick for personal and professional use. So far I have experimented a little with Parted Magic, Clonezilla and grml. Here are the sites I have found useful so far:

- Create your own save-your-ass multi-boot USB stick

- Boot Multiple ISO from USB (MultiBoot USB)

 

Posted by on August 31, 2010 in sysadmin

 

Turkish Deasciifier: Firefox add-on version statistics


Turkish Deasciifier Firefox Add-on recent statistics


I’m happy to see that the Turkish Deasciifier Firefox add-on is being downloaded every day and used regularly by people who like or need it. I plan to add some features, but for now I’m waiting for the Jetpack SDK developers to solve some technical problems.

PS: Those fancy graphics are part of the Mozilla add-on management pages, and the charting is done with the Simile Timeline component.

 

Posted by on August 28, 2010 in Linguistics, Programlama

 


Some recent NLP articles for Turkish processing


Here’s a short list compiled by Ahmet A. Akın:

- Unsupervised Search for The Optimal Segmentation for Statistical Machine Translation (Coşkun Mermer – Ahmet Akın)

- Syntax-to-Morphology Mapping in Factored Phrase-Based Statistical Machine Translation from English to Turkish (Reyyan Yeniterzi, Kemal Oflazer)

- Annotating Subordinators in the Turkish Discourse Bank (Deniz Zeyrek, Ümit Turan, Cem Bozşahin, Ruket Cakıcı, Ayışığı Sevdik-Çallı, Işın Demirşahin, Berfin Aktaş, İhsan Yalçınkaya, Hale Ögel)

- A Stochastic Finite-State Morphological Parser for Turkish (Haşim Sak & Tunga Güngör, Murat Saraçlar)

- Collocation Extraction in Turkish Texts Using Statistical Methods (Senem Kumova Metin, Bahar Karaoğlan)

- A Freely Available Morphological Analyzer for Turkish (Çağrı Çöltekin)

 

Posted by on August 28, 2010 in Linguistics, Programlama

 

TRmorph: a relatively complete morphological analyzer for Turkish (under GPL)


“TRmorph is a relatively complete morphological analyzer for Turkish. It is implemented using SFST, and uses a lexicon based on (but heavily modified) the word list from Zemberek spell checker. The morphological analyzer is distributed under the GPL.

To use the analyzer you need SFST. As well as the full source code, a compiled fsa, suitable to be used with SFST’s fst-mor or fst-infl is included. A UNIX makefile is provided for easy compilation from the sources (see the included README file for details). The analyzer is fairly complete, however, it may not be easy on unaccustomed eyes. Documentation and cleanup work is going on, you may want to visit soon to get a newer version.”

For details and a live demo, see http://www.let.rug.nl/~coltekin/trmorph/ and http://www.let.rug.nl/~coltekin/papers/coltekin-lrec2010.pdf

For some relevant natural language processing resources please see Resources for Turkish morphological processing, Morphological Disambiguation of Turkish Text with Perceptron Algorithm and http://denizyuret.blogspot.com/2006/11/turkish-resources.html.

 

Posted by on August 25, 2010 in Linguistics, Programlama

 

Hamsi can become more than a type of fish due to a Turkish mathematician


Hamsi


Hamsi, one of the most famous types of fish in Turkey, may now become famous in the world of cryptography and security as well. A Turkish mathematician who is pursuing a Ph.D. at K.U. Leuven has submitted a cryptographic hashing algorithm to NIST.

Özgül Küçük’s Hamsi cryptographic hashing algorithm is one of the 14 algorithms that made it to the second round of the international contest.

As we’re getting very close to the finals I’m eagerly waiting to hear about the winner.

PS: It is also a pleasure to see the name of Alp Öztarhan, a colleague of mine, who seems to have implemented the first version of the algorithm.

 

Posted by on August 21, 2010 in Programlama, security

 

fail2ban: Defending Apache against brute force attacks to digest authentication protected pages


I’ve just realized that the default filters installed with fail2ban in Ubuntu GNU/Linux do not help you when you use Digest Authentication with Apache. To get the most basic protection against brute force attacks on a web service that uses digest authentication, you need to modify /etc/fail2ban/filter.d/apache-auth.conf. I have tried the suggestion given on the fail2ban wiki (http://www.fail2ban.org/wiki/index.php/Talk:Apache) and it seems to work.

Once you have added the suggested line to the apache-auth.conf file, try entering a few wrong username / password combinations when you are presented with the authentication window, and then check whether fail2ban detects them (I’m assuming your log files are at their usual locations):


$ fail2ban-regex /var/log/apache2/error.log /etc/fail2ban/filter.d/apache-auth.conf

If it returns success and you can see that the relevant IP addresses are matched then you can restart your fail2ban server and have one more level of protection.
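A quick sanity check that goes well with fail2ban-regex is to confirm that the failed digest logins really show up in the error log, and from which addresses. The sketch below is only an illustration: the sample log lines are made up (modeled on what I assume to be the Apache 2.2 error log format, where digest failures start with "Digest: user"), the addresses are from the documentation range 192.0.2.0/24, and `digest_failures` is a hypothetical helper name. It is not the filter line from the wiki.

```shell
#!/bin/sh
# Made-up sample of Apache 2.2-style error log lines (illustrative only).
sample_log=$(mktemp)
trap 'rm -f "$sample_log"' EXIT
cat > "$sample_log" <<'EOF'
[Sat Aug 21 10:00:01 2010] [error] [client 192.0.2.10] Digest: user `alice' in realm `private' not found: /secret/
[Sat Aug 21 10:00:05 2010] [error] [client 192.0.2.10] Digest: user `bob' in realm `private' not found: /secret/
[Sat Aug 21 10:00:09 2010] [error] [client 192.0.2.20] File does not exist: /var/www/favicon.ico
EOF

# Print "count address" for every client that produced a digest error.
# Assumption: digest authentication failures contain "Digest: user".
digest_failures() {
    grep 'Digest: user' "$1" \
        | sed -n 's/.*\[client \([0-9.]*\)\].*/\1/p' \
        | sort | uniq -c | awk '{print $1, $2}'
}

digest_failures "$sample_log"
```

To try it on a real system, call `digest_failures /var/log/apache2/error.log` instead of using the sample file. The addresses it prints should be the same ones that fail2ban-regex reports as matched.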

 

Posted by on August 21, 2010 in security, sysadmin

 

Getting ready for the CALL conference at UA – ANTWERP CALL 2010: Motivation and beyond


I’ll try to attend the international CALL (Computer Assisted Language Learning) conference at the University of Antwerp over the next three days:

University of Antwerp


Keynote speakers Antonie Alm (University of Otago, New Zealand), Maarten Vansteenkiste (Ghent University, Belgium) and Ema Ushioda (Warwick University, United Kingdom) will provide an overview of literature on motivation, an introduction to Self-Determination Theory and a presentation of the L2 SELF model.

Here are some highlighted topics from the conference:

* the impact of ICT on motivation;
* designing for motivation;
* the role of ICT in the analysis of motivation;
* the relationship between motivation and proficiency level;
* learning styles;
* anxiety;
* technophobia/technophilia;
* self-models;
* teacher motivation.

 

Posted by on August 17, 2010 in Linguistics, Programlama

 

Live Carillon Performance from Ghent


Some more live carillon performances, this time from Ghent. Close your eyes in order to stay away from the distraction of videos and enjoy the exciting rich timbre of the bells:


 

Posted by on August 15, 2010 in Music

 

Ancient Symbols, Computational Linguistics, and the Reviewing Practices of the General Science Journals


The strongest criticism so far has been directed at one of the most controversial and recently popular pieces of research, which used computers to understand ancient symbols. The issue was made famous by WIRED’s “Artificial Intelligence Cracks Ancient Mystery” article. Richard Sproat’s strong criticism of the misuse of statistical methods to detect whether a sequence of symbols constitutes a language is worth reading: “Ancient Symbols, Computational Linguistics, and the Reviewing Practices of the General Science Journals”.

UPDATE: Rao’s answer to the following criticism can be read at Rebuttal of Sproat, Farmer, et al.’s supposed “refutation”. Also see http://indusresearch.wikidot.com/script

“Few archaeological finds are as evocative as artifacts inscribed with symbols. Whenever an archaeologist finds a potsherd or a seal impression that seems to have symbols scratched or impressed on the surface, it is natural to want to ‘read’ the symbols. And if the symbols come from an undeciphered or previously unknown symbol system it is common to ask what language the symbols supposedly represent and whether the system can be deciphered.

Of course the first question that really should be asked is whether the symbols are in fact writing. A writing system, as linguists usually define it, is a symbol system that is used to represent language. Familiar examples are alphabets such as the Latin, Greek, Cyrillic, or Hangul alphabets, alphasyllabaries such as Devanagari or Tamil, syllabaries such as Cherokee or Kana, and morphosyllabic systems like Chinese characters. But symbol systems that do not encode language abound: European heraldry, mathematical notation, labanotation (used to represent dance), and Boy Scout merit badges are all examples of symbol systems that represent things, but do not function as part of a system that represents language. Whether an unknown system is writing or not is a difficult question to answer.

 

Posted by on August 15, 2010 in Linguistics, Programlama, Science

 

Back from London


 

Posted by on August 14, 2010 in General

 
 