In our glossary, we explain key terms from the fields of enterprise search, artificial intelligence, and knowledge management. The definitions help you better understand technologies such as semantic search, generative AI, and machine learning.

What are agents (or AI agents)?

Autonomous software programs that are capable of independently planning, executing, and monitoring complex tasks. In enterprise search, for example, AI agents can collect information, interpret queries, optimize searches, or make proactive recommendations. The term “agentic” describes systems or architectures based on the coordination of multiple such agents.

What is an AI assistant/enterprise chatbot?

A generative AI-based system that enables natural language interactions within a company to find information, automate tasks, or provide support by accessing internal company data.

What is an AI platform (generic/without company data)?

A platform or service that enables direct interaction with large language models (LLMs) without connecting them to proprietary data sources. It leverages the general knowledge of the LLM and its language generation capabilities for a variety of tasks, but is not tailored to specific company information.

What is an algorithm?

A clearly defined formula that determines the order of hits within the hit list.

What are clients?

In information technology, a client represents a self-contained unit in terms of data and organization. A system is described as multi-client capable if it can serve multiple clients, such as locations or subsidiaries. The clients are technically separated from each other and have no access to each other's data.

What is a composite decomposition?

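A linguistic text analysis method that breaks compound terms down into their individual components so that each component can also be found by the search, for example “bookshelf” into “book” and “shelf”.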
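The idea can be illustrated with a minimal sketch. It assumes a small, hypothetical vocabulary and a simple recursive dictionary lookup; production systems typically use much larger dictionaries and language-specific rules. The names `split_compound`, `VOCABULARY`, and `min_part` are illustrative only.

```python
def split_compound(word, vocabulary, min_part=3):
    """Return the known parts that make up `word`, or None if no full split exists."""
    word = word.lower()
    if word in vocabulary:
        return [word]
    # Try every split point: the left part must be a known word,
    # the remainder is decomposed recursively.
    for i in range(min_part, len(word) - min_part + 1):
        head, tail = word[:i], word[i:]
        if head in vocabulary:
            rest = split_compound(tail, vocabulary, min_part)
            if rest is not None:
                return [head] + rest
    return None


# Hypothetical vocabulary, for illustration only.
VOCABULARY = {"data", "base", "book", "shelf", "key", "word"}

parts = split_compound("bookshelf", VOCABULARY)
print(parts)
```

With this vocabulary, a search for “shelf” could then also match documents that only contain “bookshelf”.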

What is a crawler?

A software component that automatically collects content from connected data sources, capturing relevant metadata and (if available) permissions, and making this data available for subsequent indexing.

What are embeddings?

Numerical representations of texts, words, or other data that capture their semantic meaning. They enable AI models to understand relationships between data points and perform similarity searches.
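As a rough illustration, the following sketch compares hand-written three-dimensional vectors using cosine similarity, a common similarity measure for embeddings. The vectors in `EMBEDDINGS` are made up for this example; real embeddings come from a trained model and usually have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: close to 1.0 means very similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings, invented for illustration only.
EMBEDDINGS = {
    "invoice": [0.9, 0.1, 0.0],
    "bill": [0.8, 0.2, 0.1],
    "vacation": [0.0, 0.2, 0.9],
}

sim_related = cosine_similarity(EMBEDDINGS["invoice"], EMBEDDINGS["bill"])
sim_unrelated = cosine_similarity(EMBEDDINGS["invoice"], EMBEDDINGS["vacation"])
print(round(sim_related, 2), round(sim_unrelated, 2))
```

In this toy data, “invoice” and “bill” point in nearly the same direction, while “invoice” and “vacation” do not.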

What is enrichment?

The process of enhancing or enriching data and documents by adding additional information or metadata. This can include automatic classification, extraction of entities (people, places), or linking to external data to improve search quality.

What is enterprise search?

A technology that enables companies to make internal and, where applicable, external data from heterogeneous sources comprehensively searchable. The aim is to provide employees with fast and relevant access to all the information they need, thereby increasing efficiency.

What is full-text search?

A search method that searches the entire content of documents to find occurrences of specific words or phrases. Unlike a pure metadata search, it takes into account all text components of a document.

What is generative AI?

A type of artificial intelligence that is capable of creating new content (e.g., texts, summaries, responses) based on learned patterns and provided context.

What are guardrails?

Technical safeguards that control the inputs, processing steps, and outputs of AI systems. They help detect and prevent manipulative instructions, non-compliant user requests, or the unintentional disclosure of sensitive information.

What are hallucinations?

The term describes the tendency of generative AI models to generate plausible-sounding but factually incorrect, fabricated, or irrelevant information. RAG is a method for minimizing hallucinations in enterprise search.

What is a hit list?

Lists all results that the index contains for the search term. Unlike Internet search engines, the hit list in enterprise search applications always returns the exact number of hits and not an approximate number of results.

What is an index?

A central data structure within an enterprise search system in which all documents that are to be found via the search are stored. The original documents are not stored in their entirety in the index, but are broken down into their relevant components. These are stored in a highly optimized, database-like format to ensure fast and accurate searching and to enable access to the original documents.

What is an indexer?

The indexer is the component or process in an enterprise search system that processes the data collected from source systems and converts it into a search-optimized structure (the index). This involves breaking down content, extracting metadata, and organizing it for quick retrieval.

What is indexing?

Process in which a document is broken down into its individual components, such as metadata or text passages. The text is then analyzed: the language is identified, words are reduced to their basic form, and the occurrence of each word across the entire index is determined. Finally, the document is stored in the search index in a specific format.
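A heavily simplified sketch of these steps is an inverted index that maps each basic word form to the documents containing it. The example assumes English-only text, skips language identification, and uses a tiny hypothetical lookup table (`LEMMAS`) in place of real linguistic analysis; the actual index format of a given product will differ.

```python
import re
from collections import defaultdict

# Hypothetical lemma table; real systems use dictionaries or language models.
LEMMAS = {"books": "book", "booked": "book", "invoices": "invoice"}


def analyze(text):
    """Split text into lowercase tokens and reduce each to its basic form."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [LEMMAS.get(token, token) for token in tokens]


def build_index(documents):
    """Map each term to {document id: number of occurrences in that document}."""
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        for term in analyze(text):
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index


DOCUMENTS = {
    "doc1": "Books booked for the library",
    "doc2": "Invoices for the booked room",
}

index = build_index(DOCUMENTS)
print(index["book"])
```

Because both “Books” and “booked” are reduced to “book”, a query for “book” finds both documents without scanning the original texts.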

What is a large language model, or LLM for short?

An AI model that has been trained on huge amounts of data, primarily text (multimodal variants also on audio, video, or images), to understand the information it contains and generate human-like language based on that information. In enterprise search, it is used to interpret queries, condense content, and formulate answers, typically in combination with retrieval so that answers are based on company sources.

What is lemmatization?

Conversion of inflected words to their basic forms, for example: “booked -> book, books -> book”.

What is machine learning (ML)?

A subfield of artificial intelligence in which models learn patterns from sample data in order to make predictions or decisions. In the context of enterprise search, ML is used to improve the relevance of search results, automatically generate metadata, classify documents, and train the capabilities of LLMs.

What is MCP (Model Context Protocol)?

MCP (Model Context Protocol) is an open standard that unifies the integration of AI applications and large language models (LLMs) with external tools, systems, and data sources, such as email, GitHub, Notion, or databases. MCP was developed by Anthropic and released as an open-source project.

What are metadata?

Additional information about the document, such as author, category, document type. Maintaining metadata is like “search engine optimization” for enterprise search applications. Only if the metadata is complete and well maintained will the documents be findable later on. A good enterprise search enables automatic metadata generation through tagging (keyword indexing).

What are multimodal models?

AI models that simultaneously process and understand information from multiple data modalities (e.g., text, images, audio, video). In enterprise search, they enable comprehensive search and analysis of content across different formats to make hidden information accessible.

What is natural language processing (NLP)?

An area of artificial intelligence that encompasses methods for processing and understanding natural language. In enterprise search, NLP helps to better interpret search queries and content (e.g., synonyms, spellings, word forms), analyze and enrich texts (e.g., entities, topics, classification), and forms the basis for semantic search and AI functions such as summarization or response generation.

What are passages?

Short, relevant text passages or sentences within a document. In modern search systems (especially those using RAG), passages are often presented directly as a more precise answer to a search query or used to generate answers, rather than the entire document.

What is a permission-checked search?

An essential feature in enterprise search that ensures users can only access search results for which they have the appropriate access rights. This is essential for data security and compliance in a corporate context.
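One common way to implement this is to store the permitted groups with each indexed document and to filter hits against the groups of the searching user. The sketch below assumes exactly that model; the `Hit` class, the group names, and the function `permission_checked` are hypothetical and not taken from any specific product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    title: str
    allowed_groups: frozenset  # groups permitted to see this document


def permission_checked(hits, user_groups):
    """Keep only hits that share at least one group with the user."""
    user_groups = set(user_groups)
    return [hit for hit in hits if hit.allowed_groups & user_groups]


# Hypothetical hits and group names, for illustration only.
ALL_HITS = [
    Hit("Travel policy", frozenset({"all-employees"})),
    Hit("Salary overview", frozenset({"hr"})),
    Hit("Board minutes", frozenset({"management"})),
]

visible = permission_checked(ALL_HITS, ["all-employees", "hr"])
print([hit.title for hit in visible])
```

A user in the groups “all-employees” and “hr” never sees “Board minutes”, not even as a title in the hit list.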

What is a preview?

A reduced-size or shortened representation of the contents of a document or file. In enterprise search, a preview is typically displayed in the search results to give users a quick insight into the content so that they can assess the relevance of a document without having to open it completely.

What is prompt engineering?

The art and science of formulating effective prompts for generative AI models in order to obtain desired, precise, and useful responses.

What is a source system?

Any system from which the enterprise search solution collects and indexes content to make it searchable. Examples include file systems, databases, content management systems (CMS), ERP systems, CRM systems, and web portals.

What is relevance?

Determines the order of hits within the hit list. Relevance is determined by the algorithm, but can also be influenced by search profiles and boost factors.
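How boost factors can reorder a hit list may be sketched as follows. The example assumes that the algorithm has already produced a base score per hit and that a search profile is simply a mapping from document type to a multiplier; the field names, scores, and the profile `SUPPORT_PROFILE` are invented for illustration.

```python
def rank(hits, boosts):
    """Sort hits by base score multiplied by the boost factor for their type."""
    def boosted(hit):
        return hit["score"] * boosts.get(hit["type"], 1.0)
    return sorted(hits, key=boosted, reverse=True)


# Hypothetical base scores as an algorithm might produce them.
BASE_HITS = [
    {"title": "Meeting notes", "type": "note", "score": 2.0},
    {"title": "Product manual", "type": "manual", "score": 1.5},
    {"title": "Old newsletter", "type": "newsletter", "score": 1.8},
]

# Hypothetical search profile for a support team: prefer manuals, demote newsletters.
SUPPORT_PROFILE = {"manual": 2.0, "newsletter": 0.5}

ranked = rank(BASE_HITS, SUPPORT_PROFILE)
print([hit["title"] for hit in ranked])
```

Without boosts, the meeting notes would rank first; with the profile applied, the product manual moves to the top.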

What is Retrieval Augmented Generation (RAG)?

A technique that combines the ability of an LLM to generate text with the ability of a search system to retrieve relevant information from a knowledge base. RAG grounds generated responses in facts taken directly from internal company data, which reduces “hallucinations.”
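The retrieval and prompt-building half of this technique can be sketched without calling any model. The example below assumes a tiny in-memory knowledge base with invented content and uses simple word overlap as a stand-in for a real search system; the resulting prompt would then be passed to an LLM of choice, which is left out here.

```python
import re

# Hypothetical knowledge base; the contents are invented for illustration.
KNOWLEDGE_BASE = {
    "hr-handbook": "Employees are entitled to 30 vacation days per year.",
    "it-guide": "Password resets are requested through the service desk portal.",
    "travel-policy": "Train travel is preferred for trips under 400 kilometers.",
}


def tokenize(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, knowledge_base, top_k=1):
    """Return the ids of the top_k sources sharing the most words with the question."""
    query_terms = tokenize(question)
    scored = [
        (len(query_terms & tokenize(text)), source)
        for source, text in knowledge_base.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [source for overlap, source in scored[:top_k] if overlap > 0]


def build_prompt(question, knowledge_base, top_k=1):
    """Combine the retrieved passages and the question into one LLM prompt."""
    sources = retrieve(question, knowledge_base, top_k)
    context = "\n".join(f"[{s}] {knowledge_base[s]}" for s in sources)
    return (
        "Answer using only the sources below and cite them.\n"
        f"Sources:\n{context}\n"
        f"Question: {question}"
    )


prompt = build_prompt("How many vacation days do employees get?", KNOWLEDGE_BASE)
print(prompt)
```

Because the model is instructed to answer only from the retrieved sources, its response can be traced back to a specific document.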

What is semantic search?

A search method that not only searches for keywords, but also understands the meaning and context of the search query in order to deliver more relevant results. It is significantly improved by LLMs and vector search.

What is a search filter?

Interactive elements in the search interface that enable users to narrow down search results based on specific criteria (e.g., document type, author, date, topic). They are often based on metadata or classifications and improve the precision of the search.

What are search profiles?

Can be defined for different users and user groups and enable customized hit lists by prioritizing certain document types, data sources, or authors.

What is text classification?

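A method that automatically assigns documents or text passages to categories, for example by topic or document type. It is a key technology for analyzing documents and identifying their topics and content.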

What is vector search?

A search method in which texts (e.g., search queries and document content) are converted into numerical vectors (embeddings) in order to find content based on semantic similarity rather than just identical words. In enterprise search, vector search complements traditional keyword search and improves semantic hits, RAG applications, and the discovery of content with similar meanings, among other things.
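A minimal sketch of this principle ranks documents by the cosine similarity between their vectors and the query vector. All vectors below are hand-written placeholders; in practice they would be produced by an embedding model, and a vector database or approximate nearest-neighbor index would replace the full scan used here.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


# Hypothetical document embeddings, invented for illustration only.
DOC_VECTORS = {
    "How to submit an invoice": [0.9, 0.1, 0.1],
    "Paying supplier bills": [0.7, 0.5, 0.0],
    "Booking annual leave": [0.1, 0.1, 0.9],
}


def vector_search(query_vector, doc_vectors, top_k=2):
    """Return the top_k document titles most similar to the query vector."""
    ranked = sorted(
        doc_vectors,
        key=lambda title: cosine(query_vector, doc_vectors[title]),
        reverse=True,
    )
    return ranked[:top_k]


# Hypothetical embedding of the query "outstanding payment".
QUERY_VECTOR = [0.85, 0.2, 0.05]

results = vector_search(QUERY_VECTOR, DOC_VECTORS)
print(results)
```

The query shares no keyword with either returned title, yet both are found because their vectors lie close to the query vector.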

Glossary - Important terms related to enterprise search and artificial intelligence