Newsroom
Get an overview of product news, blog posts, and events here.
Blog
26.08.2015
Language Identification and Language Chunking
Identifying the language of a given text is a crucial preprocessing step for almost all text analysis methods. It has been considered a solved problem for more than 20 years. Available solutions build on the simple observation that every language has typical letter sequences (letter n-grams) that occur significantly more frequently in that language than in others.
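The n-gram observation can be illustrated with a minimal sketch. This is not the method from the blog post itself: the function names, the two tiny training samples, and the overlap score are assumptions chosen for illustration, and a real system would build its profiles from large corpora.

```python
from collections import Counter


def trigram_profile(text):
    """Return the relative frequency of each letter trigram in the text."""
    # Normalize whitespace and pad with spaces so word boundaries count too.
    padded = " " + " ".join(text.lower().split()) + " "
    grams = Counter(padded[i:i + 3] for i in range(len(padded) - 2))
    total = sum(grams.values())
    return {gram: count / total for gram, count in grams.items()}


# Hypothetical, deliberately tiny training samples (assumption).
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and then "
          "this is what the people think",
    "de": "der schnelle braune fuchs springt weit und der hund ist faul "
          "das ist was die leute denken",
}
PROFILES = {lang: trigram_profile(sample) for lang, sample in SAMPLES.items()}


def identify_language(text):
    """Pick the language whose trigram profile overlaps most with the text."""
    target = trigram_profile(text)

    def overlap(profile):
        # Sum the shared probability mass over the trigrams of the input.
        return sum(min(weight, profile.get(gram, 0.0))
                   for gram, weight in target.items())

    return max(PROFILES, key=lambda lang: overlap(PROFILES[lang]))


print(identify_language("the dog is over there"))
print(identify_language("der hund ist das"))
```

Trigrams such as "the" or "der" dominate one profile and are rare or absent in the other, which is what lets even this small overlap score separate the two languages.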
Blog
07.07.2015
The difference between stemming and lemmatization
"Stemming" as well as "Lemmatization" are commonly used buzzwords in the field of Information Retrieval (IR), particularly in the development of powerful search engines. [...]
So what exactly is the difference between these two methods? What are the advantages and disadvantages and which one should be preferred? [...]
Blog
13.04.2015
Approximative data structures for natural language processing
Some say software developers draw their motivation from minimizing or maximizing numbers in any given problem. That's a smug innuendo. In my experience, developers are always on the lookout for beautiful solutions, of which numbers are but a symptom. The use of approximative data structures for language processing is one such example of a beautiful idea with nice numbers.
Press Area
Go here for our press releases and information for journalists.