turkish-deasciifier

Diacritics restoration: can we do better by using neural networks and deep learning? Perspectives from a 10-year-old open source project

UPDATE (2023-06-14): Now that we’re living in the world of ChatGPT and Large Language Models (LLMs), a software developer, Murat Çorlu, has reported that ChatGPT performs very well at diacritics restoration (deasciification) for Turkish: https://twitter.com/muratcorlu/status/1668335101602848768 He shared his example at https://chat.openai.com/share/3bb666fd-9f35-40df-8efb-9dd0c59bb264. To see whether ChatGPT is really the best (see the accuracy benchmark given in “TABLE IV” below), a nice experiment would be to take a validated Turkish corpus, “asciify” it, feed the result to ChatGPT (e.g. via its API), retrieve the “deasciified” output, compare it to the original corpus, and check what percentage of the text matches the original. If the result turns out to be at least 1-2 points higher than 97.06%, we’ll have a clear winner! 😉 Of course, care should be taken that the initial Turkish corpus is not only validated (all diacritics are correct), but also representative of Turkish usage across many domains, including multilingual texts, texts with heavy foreign terminology, abbreviations, ambiguities, etc.
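The scoring side of that experiment could be sketched roughly as follows. This is only an illustration under stated assumptions: the names `asciify` and `word_accuracy` are made up for this sketch, accuracy is measured per whitespace-separated word (which may not be how the 97.06% figure was computed), and `restore` stands in for whatever system is being tested, whether ChatGPT behind an API call or any other deasciifier.

```python
from typing import Callable, Iterable

# Turkish letters (including dotless ı and dotted İ) and their ASCII look-alikes.
ASCIIFY_TABLE = str.maketrans("çğıöşüÇĞİÖŞÜ", "cgiosuCGIOSU")


def asciify(text: str) -> str:
    """Replace Turkish-specific letters, simulating text typed on a US/UK keyboard."""
    return text.translate(ASCIIFY_TABLE)


def word_accuracy(corpus: Iterable[str], restore: Callable[[str], str]) -> float:
    """Percentage of words that `restore` recovers exactly from the asciified corpus.

    `restore` is a placeholder (an assumption of this sketch) for the system
    under test; it takes ASCII-only text and returns text with diacritics.
    """
    total = correct = 0
    for line in corpus:
        expected = line.split()
        actual = restore(asciify(line)).split()
        total += len(expected)
        # zip() stops at the shorter list, so words missing at the end of
        # the output are counted as errors.
        correct += sum(e == a for e, a in zip(expected, actual))
    return 100.0 * correct / total if total else 0.0


# Baseline: a "restorer" that does nothing. Only words that never had
# diacritics ("bir", "kahve") match, so this prints 50.0.
print(word_accuracy(["acı bir kahve içtim"], lambda s: s))
```

Any candidate system would then be plugged in as `restore` and run over the same validated corpus, so that the resulting percentages are directly comparable.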

People who need to write correctly in languages that have letters with diacritics, such as ‘ğ’, ‘ş’, ‘ö’, ‘ı’, etc., can have trouble with standard US or UK QWERTY keyboards because those layouts lack such letters. If you also need to switch between languages such as English and Turkish, you know what I mean.

Possible forms of diacritic restoration in Turkish for “aci”. Source: “Diacritic Restoration Using Recurrent Neural Network” by Ayşenur Genç Uzun

The process of taking a piece of writing without correct spelling (one that uses standard ASCII characters, without proper diacritics) and replacing the relevant letters with the correct ones is known as “diacritics restoration” or “diacritics reconstruction” (or, colloquially, “deASCIIfication”). About 10 years ago, I wrote a Python program to help people with this: Turkish Deasciifier, a port of the Emacs Lisp code developed by Prof. Deniz Yüret. There’s also a web interface at http://turkceyap.appspot.com.
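To give a feel for why restoration is ambiguous, here is a minimal sketch that enumerates every spelling an ASCII word could stand for. It is not how Turkish Deasciifier chooses an answer; the `ALTERNATIVES` table and the `candidate_forms` function are hypothetical names for this illustration, and only lowercase letters are handled.

```python
from itertools import product

# Each ambiguous ASCII letter and the Turkish letters it may stand for.
ALTERNATIVES = {"c": "cç", "g": "gğ", "i": "iı", "o": "oö", "s": "sş", "u": "uü"}


def candidate_forms(ascii_word: str) -> list[str]:
    """Return every spelling obtainable by optionally restoring each letter."""
    # Unambiguous letters contribute a single choice: themselves.
    choices = [ALTERNATIVES.get(ch, ch) for ch in ascii_word]
    return ["".join(combo) for combo in product(*choices)]


print(candidate_forms("aci"))  # ['aci', 'acı', 'açi', 'açı']
```

The number of candidates doubles with every ambiguous letter, and more than one of them can be a real word (for “aci”, both “acı” and “açı” are), so a restorer has to rely on context rather than a simple dictionary lookup.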



Posted by Emre Sevinç on October 22, 2020 in Linguistics, Programlama, python, Science

Tags: deasciify, turkish-deasciifier

Turkish Mode for Emacs is now available as a package via MELPA

29 Mar

Turkish Mode for Emacs, developed by Deniz Yüret, is now available as a package via MELPA. This is for people trying to type Turkish documents on a U.S. keyboard using Emacs. The program provides a `turkish-mode` in which the correct Turkish accents are added to the ASCII version of the last word typed each time the user hits space. If you are using a recent stable version of Emacs that lets you use the Emacs package manager, and you’ve added MELPA as a repository, installing it is as easy as running:

M-x package-install turkish

and then putting the following line in your init file:

(require 'turkish)

Once you have done that, you can toggle Turkish mode in any Emacs session with:

M-x turkish-mode

The same program has been ported to many programming languages and is available on many platforms, including a Python package, a Java package, a Perl CPAN package, an Ubuntu PPA package, a web application, a Chrome plug-in, a Firefox add-on, and a Safari add-on.


Posted by Emre Sevinç on March 29, 2016 in Emacs, Programlama

Tags: turkish-deasciifier

Turkish Deasciifier Firefox Add-on: 1.5 Years Later

30 Jan

It’s been almost 1.5 years since the release of the initial version of the Firefox add-on for Turkish Deasciifier. According to the latest statistics, it has been downloaded more than 6300 times and close to 300 people use it every day:

Turkish Deasciifier Firefox Add-on Statistics

Thanks to the recent changes in Firefox update policy, most of the users seem to be able to experience the Turkish Deasciifier add-on on very recent, up-to-date versions of Firefox, and it is also great to know that there are no compatibility issues related to the newest versions. Being able to easily upgrade your browser without any add-on glitches is very important from the perspective of usability. Another important point is being able to install the add-on without any browser restart, thanks to the underlying Add-on SDK that also renders add-on development very comfortable, at least compared to the development with traditional technologies.


Posted by Emre Sevinç on January 30, 2012 in General, Programlama

Tags: add-on, add-on sdk, firefox, javascript, mozilla, turkish-deasciifier

Turkish Deasciifier: Future-proof (or compatible with Firefox 6.* at least)

09 Jun

I’ve just received an e-mail from the Firefox Add-ons Team related to my Turkish Deasciifier add-on:

Dear add-on author,

Good news! Our automated tests did not detect any compatibility issues with your add-on Turkish Deasciifier and Firefox 6. We’ve updated your add-on’s compatibility to work with Firefox 6.* so that our Aurora users can begin using your add-on. Firefox 6 beta is expected in just a few weeks.

You can learn more about what’s new in Firefox 6 at http://blog.mozilla.com/addons/2011/06/07/making-compatible-with-firefox-5-and-6/ and https://developer.mozilla.org/en/Firefox_6_for_developers

For more information on our new compatibility process with rapid Firefox releases, please read this post: http://blog.mozilla.com/addons/2011/04/19/add-on-compatibility-rapid-releases/

Thank you,
Firefox Add-ons team


Posted by Emre Sevinç on June 9, 2011 in General, Programlama

Tags: add-on sdk, firefox, jetpack, mozilla, turkish-deasciifier

Firefox 4.0 compatible Turkish Deasciifier (v. 0.2.2) is released

23 May

I promised the thousands of users of Turkish Deasciifier that I’d release a version compatible with Firefox 4.0 as soon as Mozilla released it. Well, better late than never: it took me some time to find a suitable window in which to work on it, but it has finally been officially reviewed and released. Current users will get it via the automatic update mechanism within Firefox. New users can install it by visiting https://addons.mozilla.org/firefox/addon/turkish-deasciifier/.

Enjoy automatic conversion to Turkish letters while using your non-Turkish keyboard.

For users who would rather use it online than install the add-on, http://turkceyap.appspot.com/ is always available for your linguistic pleasure.

For more info and news visit http://ileriseviye.org/blog/tag/turkish-deasciifier/. For grabbing the source code and suggesting patches visit https://github.com/emres/jetpack-turkish-deasciifier.


Posted by Emre Sevinç on May 23, 2011 in Programlama

Tags: add-on sdk, firefox, jetpack, mozilla, turkish-deasciifier

Modifying old Turkish Deasciifier code to make it compatible with Firefox 4.0

22 May

I promised the users of my Turkish Deasciifier add-on that I’d release a Firefox 4.0 compatible version as soon as I could after Firefox 4.0 was released. I started working on the old code, which was based on an ancient version of Add-on SDK. It took me some time to wrap my head around the radical changes that the Add-on SDK team made to various APIs, but I was finally able to port the old code to the latest version of Add-on SDK.

Emacs environment for developing Firefox add-ons using Add-on SDK (Jetpack)

I uploaded the packaged .xpi file as version 0.2.2 to addons.mozilla.org, and it has been placed in the review queue to be reviewed by someone from the add-ons team at Mozilla. As soon as it’s officially reviewed, users of the old version will get an automatic update. Alternatively, they can simply visit https://addons.mozilla.org/en-US/firefox/addon/turkish-deasciifier/ to enjoy automatic Turkish letter conversion on non-Turkish keyboards.


Posted by Emre Sevinç on May 22, 2011 in Programlama

Tags: add-on sdk, firefox, jetpack, mozilla, turkish-deasciifier

Turkish Deasciifier Firefox add-on version 0.2.1 is published

15 Jan

Turkish Deasciifier Firefox add-on is a tiny add-on that lets you type Turkish correctly without a Turkish keyboard. I listened to requests from its users and added a keyboard shortcut: WindowsKey + Shift + t, or in other words Meta + Shift + t. I know this combination is probably already assigned to some function in Windows 7, and that the right thing to do is to offer a user interface so that the end user can select the combination herself. However, I checked with the people who requested this feature, and once they confirmed that it worked for them, I made a minor modification to the code and released version 0.2.1.

During the process I learned that Jetpack SDK is now called Add-on SDK and that it has moved to version 1.01b. I also discovered that this latest version of Add-on SDK is no longer compatible with the Firefox 3.6.x series, which is why I had to rely on an older version of Add-on SDK to prepare the new version of the Turkish Deasciifier add-on. It looks like this is the time to worry about Firefox 4.0 compatibility: Making your add-on compatible with Firefox 4. In this case I prefer to wait for the official release of Firefox 4.0, and then I’ll update my add-on as soon as possible (and add a preferences user interface for shortcut selection). In the meantime, ‘2011 Jetpack Roadmap’ by Myk Melez is very good reading for everybody doing Jetpack / Add-on SDK based development.

You can read more about Turkish Deasciifier at http://ileriseviye.org/blog/tag/turkish-deasciifier/ and browse the source code at https://github.com/emres/jetpack-turkish-deasciifier.

If you want to try the system without using Firefox or an add-on you can visit http://turkceyap.appspot.com/.

PS: Speaking of Firefox 4.0 beta, you should definitely give it a try because it is a huge pile of awesome. 😉


Posted by Emre Sevinç on January 15, 2011 in Linguistics, Programlama

Tags: jetpack, turkish-deasciifier

Turkish Deasciifier: Added as an Ubuntu package to my PPA

11 Nov

I’ve just added Turkish Deasciifier to my Ubuntu PPA (Personal Package Archive). Nearly all of the Debian packaging work was done by my dear friend and Debian developer Recai Oktaş, but due to some minor rough edges and lack of time on his part, it is yet to take its place in the official Debian software repository. So, in the meantime, I decided to give it a try and learn more about Launchpad, PPAs, and Ubuntu packaging and building. The result is available at https://launchpad.net/~emre-sevinc/+archive/deasciifier. This means Ubuntu GNU/Linux users can now add my PPA to their repository list.


Posted by Emre Sevinç on November 11, 2010 in Linguistics, Linux, Programlama

Tags: turkish-deasciifier, ubuntu

Turkish Deasciifier: Firefox add-on has been downloaded more than 1000 times

25 Sep

The Turkish Deasciifier add-on has been downloaded more than 1000 times as of today. Compared to super popular Firefox add-ons that have been downloaded tens of millions of times in a few years, this number may seem small. Nevertheless, I believe it deserves a celebration for such a tiny linguistic utility (one that targets a language that is not very widely used, spoken by about 80 million people). I hope it will continue to be useful to people out there who don’t have Turkish keyboards but want to write using proper Turkish letters.


Posted by Emre Sevinç on September 25, 2010 in Linguistics, Programlama

Tags: turkish-deasciifier

Turkish Deasciifier: Firefox add-on version statistics

28 Aug

Turkish Deasciifier Firefox Add-on recent statistics

I’m happy to see that Turkish Deasciifier Firefox add-on is being downloaded every day and used regularly by people who like / need it. I plan to add some features but currently I’m waiting for Jetpack SDK developers to solve some technical problems.

PS: Those fancy graphics are part of the Mozilla add-on management pages, and the charting component is the Simile Timeline component.


Posted by Emre Sevinç on August 28, 2010 in Linguistics, Programlama

Tags: turkish-deasciifier
