Glossary - Important terms related to enterprise search and artificial intelligence

In our glossary, we explain key terms from the fields of enterprise search, artificial intelligence, and knowledge management. The definitions help you better understand technologies such as semantic search, generative AI, and machine learning.

Autonomous software programs that are capable of independently planning, executing, and monitoring complex tasks. In enterprise search, for example, AI agents can proactively collect information, interpret queries, optimize searches, or make recommendations. The term agentic describes systems or architectures based on the coordination of multiple such agents.

A clearly defined formula that determines the order of hits within the hit list.

A software component that automatically collects content from connected data sources, capturing relevant metadata and (if available) permissions, and making this data available for subsequent indexing.

Numerical representations of texts, words, or other data that capture their semantic meaning. They enable AI models to understand relationships between data points and perform similarity searches.

The process of enhancing or enriching data and documents by adding additional information or metadata. This can include automatic classification, extraction of entities (people, places), or linking to external data to improve search quality.

A technology that enables companies to make internal and, where applicable, external data from heterogeneous sources comprehensively searchable. The aim is to provide employees with fast and relevant access to all the information they need, thereby increasing efficiency.

A type of artificial intelligence that is capable of creating new content (e.g., texts, summaries, responses) based on learned patterns and provided context.

The term describes the tendency of generative AI models to generate plausible-sounding but factually incorrect, fabricated, or irrelevant information. RAG is a method for minimizing hallucinations in enterprise search.

A central data structure within an enterprise search system that holds the content of all documents that should be findable via search. The original documents are not stored in their entirety in the index, but are broken down into their relevant components. These are stored in a highly optimized, database-like format to ensure fast and accurate searching and to enable access to the original documents.

The indexer is the component or process in an enterprise search system that processes the data collected from source systems and converts it into a search-optimized structure (the index). This involves breaking down content, extracting metadata, and organizing it for quick retrieval.

Process in which a document is broken down into its individual components, such as metadata and text passages. The text is then analyzed: the language is identified, words are reduced to their base forms, and the occurrence of each word across the entire index is determined. Finally, the document is stored in the search index in a specific format.
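The indexing steps described above (breaking the text into words, reducing them to base forms, and recording occurrences) can be illustrated with a minimal sketch of an inverted index. The base-form table, the function names, and the sample documents are hypothetical; production systems use language-specific analyzers and far more elaborate storage formats.

```python
from collections import defaultdict

# Hypothetical toy table of base forms; real analyzers are language-specific.
BASE_FORMS = {"books": "book", "booked": "book", "searches": "search"}

def analyze(text):
    """Lowercase the text, split it into words, and map each word to its base form."""
    tokens = []
    for raw in text.lower().split():
        word = raw.strip(".,;:!?\"'()")
        if word:
            tokens.append(BASE_FORMS.get(word, word))
    return tokens

def build_index(documents):
    """Map each base-form term to the documents containing it, with occurrence counts."""
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        for term in analyze(text):
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index

docs = {
    "doc1": "She booked two books.",
    "doc2": "The search index stores books.",
}
index = build_index(docs)
print(dict(index["book"]))
```

Looking up a term in `index` returns the matching document IDs directly, which is why the search does not need to scan the original documents.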

A generative AI-based system that enables natural language interactions within a company to find information, automate tasks, or provide support by accessing internal company data.

A platform or service that enables direct interaction with large language models (LLMs) without connecting them to proprietary data sources. It leverages the general knowledge of the LLM and its language generation capabilities for a variety of tasks, but is not tailored to specific company information.

A linguistic text analysis method that breaks compound terms down into their individual component words.
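As a rough illustration of this kind of decomposition, the following sketch splits a compound into known words using a small vocabulary. The vocabulary and the function name `split_compound` are assumptions made for this example; real linguistic analysis relies on full lexicons and language-specific rules.

```python
# Hypothetical mini vocabulary; a real system would use a full lexicon.
VOCABULARY = {"data", "base", "work", "flow", "knowledge", "management"}

def split_compound(word, vocabulary=VOCABULARY):
    """Return the known parts that cover the whole word, or [word] if no split exists."""
    word = word.lower()

    def helper(rest):
        if not rest:
            return []
        # Try longer prefixes first so that longer known words are preferred.
        for end in range(len(rest), 0, -1):
            prefix = rest[:end]
            if prefix in vocabulary:
                tail = helper(rest[end:])
                if tail is not None:
                    return [prefix] + tail
        return None  # no combination of known words covers the rest

    parts = helper(word)
    return parts if parts else [word]

print(split_compound("Workflow"))
print(split_compound("index"))
```

Words that cannot be fully covered by known parts are returned unchanged, so the sketch never produces partial or invented fragments.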

An AI model that has been trained on very large amounts of data, primarily text (multimodal variants also on audio, video, or images), to understand the information it contains and generate human-like language based on that information. In enterprise search, it is used to interpret queries, condense content, and formulate answers, typically in combination with retrieval so that answers are based on company sources.

Conversion of inflected words to their base forms. Example: “booked -> book, books -> book”

A subfield of artificial intelligence in which models learn patterns from sample data in order to make predictions or decisions. In the context of enterprise search, ML is used to improve the relevance of search results, automatically generate metadata, classify documents, and train LLMs.

In information technology, a client represents a self-contained unit in terms of data and organization. A system is described as multi-client capable if it can serve multiple clients, such as locations or subsidiaries. The clients are technically separated from each other and have no access to each other's data.

Additional information about the document, such as author, category, document type. Maintaining metadata is like “search engine optimization” for enterprise search applications. Only if the metadata is complete and well maintained will the documents be findable later on. A good enterprise search enables automatic metadata generation through tagging (keyword indexing).

AI models that simultaneously process and understand information from multiple data modalities (e.g., text, images, audio, video). In enterprise search, they enable comprehensive search and analysis of content across different formats to make hidden information accessible.

An area of artificial intelligence that encompasses methods for processing and understanding natural language. In enterprise search, NLP helps to better interpret search queries and content (e.g., synonyms, spellings, word forms), analyze and enrich texts (e.g., entities, topics, classification), and forms the basis for semantic search and AI functions such as summarization or response generation.

Short, relevant text passages or sentences within a document. In modern search systems (especially those using RAG), passages are often presented directly as a more precise answer to a search query or used to generate answers, rather than the entire document.

A preview or reduced-size representation of the contents of a document or file. In enterprise search, a preview is typically displayed in the search results to give users a quick insight into the content so that they can assess the relevance of a document without having to open it completely.

The art and science of formulating effective prompts for generative AI models in order to obtain desired, precise, and useful responses.

Any system from which the enterprise search solution collects and indexes content to make it searchable. Examples include file systems, databases, content management systems (CMS), ERP systems, CRM systems, and web portals.

A feature in enterprise search that ensures users only see search results for which they have the appropriate access rights. This is essential for data security and compliance in a corporate context.
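One simple way to picture such a check is to filter the hits against the groups a user belongs to. The field name `allowed_groups`, the group names, and the function `filter_by_permission` are hypothetical; how permissions are actually stored and evaluated depends on the search product and the source systems.

```python
def filter_by_permission(hits, user_groups):
    """Keep only hits whose allowed groups overlap with the user's groups."""
    user_groups = set(user_groups)
    return [hit for hit in hits if user_groups & set(hit["allowed_groups"])]

# Hypothetical hits with the permissions captured from the source systems.
hits = [
    {"id": "hr-001", "allowed_groups": ["hr"]},
    {"id": "wiki-042", "allowed_groups": ["all-staff"]},
    {"id": "fin-007", "allowed_groups": ["finance", "management"]},
]

visible = filter_by_permission(hits, ["all-staff", "finance"])
print([hit["id"] for hit in visible])
```

A user without any matching group receives an empty list, so restricted documents do not appear in the results at all.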

Determines the order of hits within the hit list. Relevance is determined by the algorithm, but can also be influenced by search profiles and boost factors.
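The influence of boost factors on the order of hits can be sketched as follows. The scores, source names, and the multiplicative boost are assumptions chosen for illustration; actual ranking formulas differ between products.

```python
def rank(hits, boosts):
    """Sort hits by base score multiplied by the boost factor of their source."""
    def final_score(hit):
        # Sources without an explicit boost keep their base score (factor 1.0).
        return hit["score"] * boosts.get(hit["source"], 1.0)
    return sorted(hits, key=final_score, reverse=True)

# Hypothetical hits with base relevance scores from the ranking algorithm.
hits = [
    {"id": "a", "source": "fileshare", "score": 0.8},
    {"id": "b", "source": "wiki", "score": 0.6},
    {"id": "c", "source": "crm", "score": 0.7},
]

# Hypothetical search profile that prioritizes wiki content.
ranked = rank(hits, {"wiki": 1.5})
print([hit["id"] for hit in ranked])
```

Without the boost, hit "a" would rank first; with the wiki boost, hit "b" moves to the top of the hit list.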

A technique that combines the ability of an LLM to generate text with the ability of a search system to retrieve relevant information from a knowledge base. RAG grounds generated responses in retrieved internal company data, which reduces “hallucinations.”
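The two halves of this technique, retrieval and generation, can be outlined in a small sketch. It only covers retrieving passages and assembling the prompt; the call to an LLM is deliberately left out. The word-overlap retrieval, the prompt wording, and the sample passages are hypothetical simplifications, not a description of any specific product.

```python
def retrieve(question, passages, top_k=2):
    """Rank passages by the number of words they share with the question (toy retrieval)."""
    q_words = set(question.lower().split())
    scored = []
    for passage in passages:
        overlap = len(q_words & set(passage["text"].lower().split()))
        if overlap:
            scored.append((overlap, passage))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [passage for _, passage in scored[:top_k]]

def build_prompt(question, retrieved):
    """Combine the retrieved passages and the question into one prompt for an LLM."""
    sources = "\n".join(f"[{p['id']}] {p['text']}" for p in retrieved)
    return (
        "Answer using only the sources below. "
        "If they do not contain the answer, say so.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}"
    )

# Hypothetical passages from an internal knowledge base.
passages = [
    {"id": "hr-12", "text": "Employees receive 30 vacation days per year"},
    {"id": "it-03", "text": "Passwords must be changed every 90 days"},
    {"id": "fin-08", "text": "Travel expenses are reimbursed within two weeks"},
]

question = "how many vacation days do employees receive"
retrieved = retrieve(question, passages)
prompt = build_prompt(question, retrieved)
print(prompt)
```

Because the prompt carries the source IDs, a generated answer can point back to the passages it was based on.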

A search method that not only searches for keywords, but also understands the meaning and context of the search query in order to deliver more relevant results. It is significantly improved by LLMs and vector search.

Interactive elements in the search interface that enable users to narrow down search results based on specific criteria (e.g., document type, author, date, topic). They are often based on metadata or classifications and improve the precision of the search.
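How such elements are derived from metadata can be shown with a short sketch: counting the values of a metadata field yields the selectable options, and choosing one narrows the results. The field names and sample hits are hypothetical.

```python
from collections import Counter

def facet_counts(hits, field):
    """Count how many hits carry each value of the given metadata field."""
    return Counter(hit[field] for hit in hits if field in hit)

def apply_filter(hits, field, value):
    """Narrow the hits down to those whose metadata field matches the selected value."""
    return [hit for hit in hits if hit.get(field) == value]

# Hypothetical hits with metadata.
hits = [
    {"id": 1, "doc_type": "pdf", "author": "Miller"},
    {"id": 2, "doc_type": "pdf", "author": "Chen"},
    {"id": 3, "doc_type": "email", "author": "Miller"},
]

print(facet_counts(hits, "doc_type"))
print([hit["id"] for hit in apply_filter(hits, "author", "Miller")])
```

The counts are what a search interface typically shows next to each option, so users can see how many results remain before they click.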

Can be defined for different users and user groups and enable customized hit lists by prioritizing certain document types, data sources, or authors.

Lists all results that are found in the index for the search term. Unlike Internet search engines, the hit list in enterprise search applications always returns the exact number of hits and not an approximate number of results.

Text classification is a key technology for analyzing documents and identifying topics and content.

A search method in which texts (e.g., search queries and document content) are converted into numerical vectors (embeddings) in order to find content based on semantic similarity rather than just identical words. In enterprise search, vector search complements traditional keyword search and improves semantic hits, RAG applications, and the discovery of content with similar meanings, among other things.
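The comparison of vectors by semantic similarity is commonly done with cosine similarity. The sketch below uses hand-written three-dimensional vectors as stand-ins for embeddings; in practice an embedding model produces vectors with hundreds or thousands of dimensions, and the document names here are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: values near 1.0 mean very similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def vector_search(query_vec, doc_vecs, top_k=2):
    """Return the top_k documents whose vectors are most similar to the query vector."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Hand-written stand-ins for embeddings (assumption for illustration only).
doc_vecs = {
    "vacation-policy": [0.9, 0.1, 0.0],
    "holiday-request-form": [0.8, 0.3, 0.1],
    "server-maintenance": [0.0, 0.2, 0.9],
}

results = vector_search([1.0, 0.2, 0.0], doc_vecs)
print([doc_id for doc_id, _ in results])
```

The two documents about time off rank above the unrelated one even though no keywords are compared, which is the core idea behind finding content with similar meanings.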

A search method that searches the entire content of documents to find occurrences of specific words or phrases. Unlike a pure metadata search, it takes into account all text components of a document.