Elasticsearch Vector DB

What is Elasticsearch?

Elasticsearch is a free, open-source search engine based on Apache Lucene. On top of full-text search it offers aggregations and analytics, and it is commonly paired with Logstash for ETL and Kibana for dashboards. This combination has made it one of the most widely used search engines, and many products and companies use it as a centralized data store.

Elasticsearch can store petabytes of data across many servers (nodes) and search and analyze that data at high speed.

Elasticsearch is well suited to unstructured data: documents are stored as JSON, the schema is detected automatically, and new fields can be added on the fly without modifying the schema by hand.

What is Elasticsearch’s Vector DB?

As of version 8.11, Elasticsearch’s vector database offers an efficient way to create, store, and search vector embeddings at scale.

Text search and vector search can be combined for hybrid retrieval, which uses keyword matching and semantic similarity together to improve relevance and accuracy.
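One common way to merge a text ranking and a vector ranking is reciprocal rank fusion (RRF), where each document scores the sum of 1 / (rank_constant + rank) over the lists it appears in. The sketch below is a plain-Python illustration of that formula, not Elasticsearch's own implementation; the document IDs and the two hit lists are made up, and the rank constant of 60 is an assumption chosen for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, rank_constant=60):
    """Fuse several ranked lists of document IDs into one ranking."""
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the top hit of each list contributes the most.
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (rank_constant + rank)
    # Highest fused score first; ties are broken by document ID.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical hit lists from a BM25 query and a vector query.
bm25_hits = ["doc_a", "doc_b", "doc_c"]
knn_hits = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([bm25_hits, knn_hits])
print(fused)
```

Documents that rank well in both lists (doc_a and doc_c here) rise above documents that appear in only one, without the two score scales ever having to be normalized against each other.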

Elasticsearch includes a full vector database, multiple types of retrieval (text, sparse and dense vector, hybrid), and your choice of machine learning model architectures.

Build your search experience with aggregations, filtering and faceting, and autocomplete. Run your search in the cloud, on-premises, or in an air-gapped environment.

Elastic k-Nearest Neighbor (kNN) Search

The k-nearest neighbor (kNN) algorithm performs a similarity search on fields of the dense_vector type. This type of search, more accurately called “approximate kNN”, accepts a vector (an embedding) as the search term and returns the k entries in the index whose vectors are closest to it.
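To make the idea concrete, here is a minimal sketch of exact (brute-force) kNN with cosine similarity in plain Python. It compares the query against every stored vector, which is the cost that approximate kNN avoids by searching an index structure instead. The three-dimensional vectors and their labels are invented for illustration; real embeddings have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def exact_knn(query, vectors, k):
    """Return the IDs of the k stored vectors most similar to the query."""
    scored = [(cosine_similarity(query, vec), doc_id)
              for doc_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


# Toy "index": hypothetical 3-dimensional embeddings.
vectors = {
    "noodles": [0.9, 0.1, 0.0],
    "dumplings": [0.8, 0.2, 0.1],
    "football": [0.0, 0.1, 0.9],
}
query = [1.0, 0.0, 0.0]

nearest = exact_knn(query, vectors, k=2)
print(nearest)
```

The approximate variant trades a small amount of recall for speed, which is why the search request also takes a num_candidates setting in addition to k.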

Managed Cloud Deployment

Elastic Vector Search: Self-Managed (Basic) vs. Managed Cloud

Embeddings
    Store embeddings: Free
    Generate embeddings: Paid
    Custom embeddings: Free

Retrieval
    Search embeddings: Free
    Search BM25: Free
    Hybrid search (BM25 + vectors): Free
    Reciprocal Rank Fusion (RRF): Paid
    Filtering, faceting, aggregations: Free
    Search autocomplete: Free
    Optimized for multiple data types (text, vector, geo): Free

Machine Learning
    Support for several embedding models: Paid
    Built-in semantic search model: Paid
    Data inference pipelines: Paid

Search Experience Tools
    Ingest tools (web crawler*, connectors*, API framework, Beats, Fleet, Agent): items marked * are Paid
    Document and field level security: Paid

Elastic Tools
    Observability tools (Kibana): Free
    Search UI components: Free
    Piped queries, ES|QL (coming soon): Free

The number of dimensions in an Elasticsearch vector is limited. Up to version 8.9.2 the maximum was 1024 dimensions. Version 8.10.0 raised the limit to 2048 dimensions, and version 8.11 raised it again to 4096.

Semantic Search Example

We ran a vector search over a Wikipedia dataset, using ‘multilingual-22-12’ as the embedding model for the queries.

First, we created a vector search index with a vector size of 768 and cosine similarity.
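The index mapping itself is not shown in this post. The sketch below is one plausible version, assuming the field name my_vector used in the requests here and text mappings for title and description; any other fields are left to dynamic mapping. The Python only builds and prints the request body and does not contact a cluster.

```python
import json

# Index name taken from the requests in this post.
index_name = "vector-search-index"

# Assumed mapping: a 768-dimensional dense_vector field indexed for kNN
# search with cosine similarity, plus two text fields for keyword search.
mapping_body = {
    "mappings": {
        "properties": {
            "my_vector": {
                "type": "dense_vector",
                "dims": 768,
                "index": True,
                "similarity": "cosine",
            },
            "title": {"type": "text"},
            "description": {"type": "text"},
        }
    }
}

# Print the request in console form so it can be pasted into Kibana Dev Tools.
print(f"PUT {index_name}")
print(json.dumps(mapping_body, indent=2))
```

The dims value has to match the output size of the embedding model, so a different model would need a different number here.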

				
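Each document is indexed with its embedding and metadata. The doc[...] values below stand for the corresponding fields of one dataset record and are not literal JSON:

POST vector-search-index/_doc
{
    "my_vector": doc["emb"],
    "id": doc["id"],
    "title": doc["title"],
    "description": doc["description"],
    "url": doc["url"],
    "wiki_id": doc["wiki_id"],
    "views": doc["views"],
    "paragraph_id": doc["paragraph_id"],
    "langs": doc["langs"]
}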
				
			

Now we can perform semantic vector search and hybrid search.

Here we perform a semantic search with the query “Food in the Chinese cuisine”. The query_vector shown is truncated; replace it with the full embedding of the query text:

POST vector-search-index/_search
{
    "knn": {
        "field": "my_vector",
        "query_vector": [0.5355, 0.081, ...],
        "k": 5,
        "num_candidates": 100
    }
}
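The hybrid variant mentioned earlier is not shown in this post. A minimal sketch, assuming the same index and field names: a _search body that carries both a query clause (BM25 text match) and a knn clause, so that a document's score combines its text score and its vector score. Matching on the description field is an assumption, and placeholder_vector is a stand-in for a real query embedding. The code only builds the request body.

```python
import json


def build_hybrid_search(query_text, query_vector, k=5, num_candidates=100):
    """Build a _search body combining a text match with a kNN clause."""
    return {
        # Lexical part: BM25 match on an assumed text field.
        "query": {"match": {"description": {"query": query_text}}},
        # Vector part: same shape as the kNN request in this post.
        "knn": {
            "field": "my_vector",
            "query_vector": query_vector,
            "k": k,
            "num_candidates": num_candidates,
        },
        "size": k,
    }


# Stand-in for the real 768-dimensional embedding of the query text.
placeholder_vector = [0.1] * 768

body = build_hybrid_search("Food in the Chinese cuisine", placeholder_vector)

# Show the structure without dumping all 768 numbers.
preview = dict(body, knn=dict(body["knn"], query_vector="<768 floats>"))
print("POST vector-search-index/_search")
print(json.dumps(preview, indent=2))
```

In practice the query text would be embedded with the same model used at indexing time, so that query and document vectors live in the same space.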
				
			

Results
