Natural Language Processing for Dutch

03 Jun

Some links in the intersection of Dutch and NLP:

– Dutch corpora in NLTK

– The Alpino Treebank: This treebank contains syntactically annotated Dutch sentences. The treebank (more than 150,000 words) includes the full cdbl (newspaper) part of the Eindhoven corpus.

– LASSY (Large Scale Syntactic Annotation of written Dutch) is a STEVIN project. STEVIN is a Flemish-Dutch Language and Speech Processing Technology Programme launched by the Nederlandse Taalunie; its programme office is run jointly by the NWO Humanities Division and SenterNovem. A large corpus of written Dutch texts (1,000,000 words) is syntactically annotated and manually corrected, based on D-COI and its successor. In addition, the full corpus to be developed in the successor of D-COI (500,000,000 words) is syntactically annotated automatically. The project aims to extend the available syntactically annotated corpora for Dutch both in size and with respect to text genres and topical domains.

– DAISY: Dutch lAnguage Investigation of Summarization technologY
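
As a quick way into the first two links: NLTK ships a sample of the Alpino Treebank as its `alpino` corpus. Below is a minimal sketch of reading a few tokens from it. The helper name `alpino_sample` and its `download` flag are made up for this post, and the sketch assumes NLTK is installed; if the corpus data is missing it simply returns None.

```python
def alpino_sample(n=10, download=False):
    """Return the first n tokens of NLTK's Alpino sample, or None if unavailable.

    Illustrative helper, not part of NLTK itself.
    """
    try:
        import nltk
        from nltk.corpus import alpino
    except ImportError:
        # NLTK is not installed in this environment.
        return None

    if download:
        # Fetches the corpus data into the local nltk_data directory (needs network).
        nltk.download("alpino", quiet=True)

    try:
        # words() is a lazy view over the treebank tokens; slice, then copy to a list.
        return list(alpino.words()[:n])
    except LookupError:
        # Corpus data not found locally; rerun with download=True.
        return None


if __name__ == "__main__":
    print(alpino_sample(10))
```

The same corpus reader also exposes `sents()` and `parsed_sents()` for sentence-level and tree-level access.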


Posted on June 3, 2010 in Linguistics, Programming

