Blog

Here’s an overview of our latest blog posts on enterprise search, document intelligence and legal tech.
26.08.2015

Language Identification and Language Chunking

Identifying the language of a given text is a crucial preprocessing step for almost all text analysis methods. It has been considered a solved problem for more than 20 years. Available solutions build on a simple observation: every language has typical letter sequences (letter n-grams) that occur significantly more frequently in that language than in others.
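
To make the n-gram observation concrete, here is a minimal sketch in Python. It is not the method from the post itself, only an illustration under stated assumptions: the two tiny training sentences, the trigram size, the cosine similarity measure and all function names are hypothetical choices made for this example. A real system would build its profiles from far larger corpora.

```python
from collections import Counter
from math import sqrt


def ngram_profile(text, n=3):
    """Return relative frequencies of letter n-grams in `text`."""
    # Normalize whitespace and pad with spaces so word boundaries count too.
    padded = " " + " ".join(text.lower().split()) + " "
    grams = Counter(padded[i:i + n] for i in range(len(padded) - n + 1))
    total = sum(grams.values())
    return {gram: count / total for gram, count in grams.items()}


def similarity(p, q):
    """Cosine similarity between two n-gram profiles."""
    dot = sum(weight * q.get(gram, 0.0) for gram, weight in p.items())
    norm_p = sqrt(sum(w * w for w in p.values()))
    norm_q = sqrt(sum(w * w for w in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


def identify(text, profiles):
    """Pick the language whose profile is most similar to the text's."""
    target = ngram_profile(text)
    return max(profiles, key=lambda lang: similarity(target, profiles[lang]))


# Toy training data (assumption: far too small for real use).
TRAINING = {
    "en": "the quick brown fox jumps over the lazy dog and this is what "
          "the language looks like when it is written",
    "de": "der schnelle braune fuchs springt über den faulen hund und so "
          "sieht die sprache aus wenn sie geschrieben wird",
}
PROFILES = {lang: ngram_profile(text) for lang, text in TRAINING.items()}

print(identify("this is the language", PROFILES))
print(identify("das ist die sprache", PROFILES))
```

With these toy profiles, the first call should pick English and the second German, because trigrams such as " th" and "the" are frequent in the English sample while "die" and "che" only show up in the German one.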
07.07.2015

The difference between stemming and lemmatization

"Stemming" as well as "Lemmatization" are commonly used buzzwords in the field of Information Retrieval (IR), particularly in the development of powerful search engines. [...]

So what exactly is the difference between these two methods? What are their advantages and disadvantages, and which one should be preferred? [...]
13.04.2015

Approximative data structures for natural language processing

Some say software developers draw their motivation from minimizing or maximizing numbers in any given problem. That's a smug innuendo. In my experience, developers are always on the lookout for beautiful solutions, of which numbers are but a symptom. The use of approximate data structures for language processing is one such example of a beautiful idea with nice numbers.
