I have recently been searching for non-lemmatized word frequency tables compiled for various languages such as German, French, Spanish, and Dutch. So far, it seems a better idea to construct such tables myself from different corpora. Here are some relevant links.
2 outstanding examples:
– negr@corpus: A Syntactically Annotated Corpus of German Newspaper Texts. The corpus is available free of charge to all universities and other non-profit research organizations. Others please contact us for conditions. Version 2 of the corpus is now available containing 20602 sentences (355096 tokens). http://www.coli.uni-saarland.de/projects/sfb378/negra-corpus/negra-corpus.html
– Statistical natural language processing and corpus-based computational linguistics: An annotated list of resources: http://nlp.stanford.edu/links/statnlp.html
– Statistical Language Modeling Toolkit: http://svr-www.eng.cam.ac.uk/~prc14/toolkit.html
– The LDC Corpus Catalog. The LDC’s Catalog contains hundreds of corpora of language data. http://www.ldc.upenn.edu/Catalog/
– European Corpus Initiative Multilingual Corpus I (ECI/MCI): The European Corpus Initiative (ECI) was founded to oversee the acquisition and preparation of a large multilingual corpus (ECI/MCI) to be made available in digital form for scientific research at as low a cost as possible. The corpus has been available on CD-ROM since 1994, and is being distributed by ELSNET. http://www.elsnet.org/resources/eciCorpus.html
– ELRA’s missions are to promote language resources for the Human Language Technology (HLT) sector, and to evaluate language engineering technologies. http://www.elra.info/
– ELDA – Evaluations and Language resources Distribution Agency – is ELRA’s operational body, set up to identify, classify, collect, validate and produce the language resources which may be needed by the HLT – Human Language Technology – community. ELDA is also involved in HLT evaluation campaigns. http://www.elda.org/
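Once a corpus from one of these sources is at hand as plain text, building a non-lemmatized frequency table is mostly a matter of tokenizing and counting surface forms. Below is a minimal Python sketch of that idea, not a definitive implementation. The tokenizer pattern, the function name word_frequencies, the lowercasing default and the sample sentences are all my own assumptions for illustration; none of the corpora above prescribes them, and annotated corpora such as negr@corpus ship in their own formats that would need converting first.

```python
import re
from collections import Counter

# Assumed tokenizer: runs of Unicode letters, optionally joined by an
# apostrophe or hyphen (e.g. "l'heure", "Nord-Süd"). Digits are skipped.
TOKEN_RE = re.compile(r"[^\W\d_]+(?:['’-][^\W\d_]+)*")


def word_frequencies(lines, lowercase=True):
    """Count surface word forms (no lemmatization) in an iterable of text lines."""
    counts = Counter()
    for line in lines:
        if lowercase:
            line = line.lower()
        counts.update(TOKEN_RE.findall(line))
    return counts


# Made-up sample sentences; inflected forms stay separate entries.
sample = ["Der Hund sieht die Hunde.", "Die Hunde sehen den Hund."]
freq = word_frequencies(sample)
for word, n in freq.most_common(3):
    print(word, n)
```

Because nothing is lemmatized, "hund" and "hunde" are counted as two distinct entries, which is the point of such a table. For a real corpus, the same function can be fed an open file object (opened with the right encoding) instead of the sample list, since it only needs an iterable of lines.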