Word Frequencies and Language Resources (different sets of corpora)

18 Jun

Recently I’ve been searching for non-lemmatized word frequency tables compiled for various languages such as German, French, Spanish and Dutch. So far it seems a better idea to construct such tables myself from different corpora. Here are some relevant links.

Two outstanding examples:

– Wortschatz: 57 Corpus-Based Monolingual Dictionaries.

– negr@corpus: A Syntactically Annotated Corpus of German Newspaper Texts. The corpus is available free of charge to all universities and other non-profit research organizations; others should contact the maintainers for conditions. Version 2 of the corpus is now available, containing 20,602 sentences (355,096 tokens).

Other links:

– Statistical natural language processing and corpus-based computational linguistics: An annotated list of resources.

– Statistical Language Modeling Toolkit.

– The LDC Corpus Catalog. The catalog of the Linguistic Data Consortium (LDC) contains hundreds of corpora of language data.

– European Corpus Initiative Multilingual Corpus I (ECI/MCI): The European Corpus Initiative (ECI) was founded to oversee the acquisition and preparation of a large multilingual corpus (ECI/MCI) to be made available in digital form for scientific research at as low a cost as possible. The corpus has been available on CD-ROM since 1994 and is distributed by ELSNET.

– ELRA: The missions of the European Language Resources Association (ELRA) are to promote language resources for the Human Language Technology (HLT) sector and to evaluate language engineering technologies.

– ELDA (Evaluations and Language resources Distribution Agency) is ELRA’s operational body, set up to identify, classify, collect, validate and produce the language resources that may be needed by the HLT (Human Language Technology) community. ELDA is also involved in HLT evaluation campaigns.
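Once raw text has been obtained from sources like these, building a non-lemmatized frequency table is mostly a matter of tokenizing and counting. The sketch below is a minimal illustration under stated assumptions, not a reference implementation: the names `tokenize`, `frequency_table` and `per_million` are hypothetical, and the letter-based regex tokenizer is a deliberately simple stand-in for a real, language-aware tokenizer. Because no lemmatization is applied, inflected forms such as "Haus", "Hauses" and "Häuser" stay separate entries.

```python
import re
import unicodedata
from collections import Counter

# Hypothetical, simplistic tokenizer: runs of Unicode letters, optionally joined
# by an internal apostrophe or hyphen. Digits and punctuation are dropped.
TOKEN_RE = re.compile(r"[^\W\d_]+(?:['’-][^\W\d_]+)*")

def tokenize(text: str, lowercase: bool = True) -> list[str]:
    """Split text into word forms without lemmatizing them."""
    text = unicodedata.normalize("NFC", text)  # unify composed/decomposed umlauts
    tokens = TOKEN_RE.findall(text)
    return [t.lower() for t in tokens] if lowercase else tokens

def frequency_table(corpora: dict[str, str]) -> tuple[Counter, dict[str, int]]:
    """Merge raw token counts from several corpora; also return each corpus size."""
    total: Counter = Counter()
    sizes: dict[str, int] = {}
    for name, text in corpora.items():
        counts = Counter(tokenize(text))
        sizes[name] = sum(counts.values())
        total.update(counts)
    return total, sizes

def per_million(counts: Counter) -> dict[str, float]:
    """Normalize raw counts to occurrences per million tokens, most frequent first."""
    n = sum(counts.values())
    if n == 0:
        return {}
    return {word: c * 1_000_000 / n for word, c in counts.most_common()}

if __name__ == "__main__":
    # Two tiny made-up "corpora", only to exercise the functions.
    corpora = {
        "news": "Der Hund und die Katze.",
        "fiction": "Die Katze schläft. Der Hund bellt.",
    }
    total, sizes = frequency_table(corpora)
    print(sizes)
    for word, freq in per_million(total).items():
        print(f"{word}\t{total[word]}\t{freq:.1f}")
```

Lowercasing by default is a design choice with a cost for German in particular: it merges the noun "Essen" with the verb "essen", so passing `lowercase=False` may be preferable there. Per-million normalization makes tables built from corpora of different sizes roughly comparable, although it does not correct for differences in genre between the corpora.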


Posted on June 18, 2010 in Linguistics, Programming

