MongoDB Atlas Vector Search

MongoDB Atlas

MongoDB Atlas is a managed cloud database service that handles the deployment and management of your databases. It works as a regular MongoDB database and also as an AI vector database. While MongoDB itself can be self-hosted, Atlas takes care of administrative tasks for you and adds security and scalability features.

What is Atlas Vector Search?

Atlas Vector Search, available on Atlas clusters running MongoDB 7.0.2 or later, merges your operational database and vector search capabilities into one fully managed system. It gives you full vector database functionality next to your operational data.

You can store all your data, metadata, and vector embeddings within Atlas. Then, leverage Atlas Vector Search to develop AI-driven applications.

Through the $vectorSearch aggregation stage, you can run semantic searches directly on the data stored in your Atlas cluster. As long as your vector embeddings have 4096 dimensions or fewer, you can index them alongside your other data in the cluster. You can then use the $vectorSearch stage to filter your data and run semantic searches across the indexed fields.

Atlas Vector Search is available at no cost, and you can use it on an M0 Free Forever Cluster. However, vector indexing is not possible on a local MongoDB deployment; it is only available on Atlas.

Atlas Vector Embeddings

Vector embeddings are stored and indexed within Atlas, which provides a robust platform for efficient search and retrieval. You can generate the embeddings with pre-trained machine learning models, such as those from OpenAI and Hugging Face.

Use cases

  • Retrieval augmented generation (RAG)
  • Semantic search
  • Recommendation engines
  • Dynamic personalization

Index types

To perform a vector search, you first create an Atlas Vector Search index on your dataset. Indexes are created in the MongoDB Atlas platform. It is possible to create 3 indexes for free.

Atlas Search

Used for keyword and full-text search.

Example of a JSON index definition:

				
{
  "mappings": {
    "dynamic": false,
    "fields": {
      "title": [
        {
          "type": "string"
        }
      ]
    }
  }
}

				
			

Atlas Vector Search

Used for semantic vector search. It queries your vector embeddings by semantic similarity using an approximate nearest neighbor (ANN) search algorithm.

Example of a JSON index definition:

				
{
  "fields": [
    {
      "numDimensions": 1536,
      "path": "plot_embedding",
      "similarity": "euclidean",
      "type": "vector"
    },
    {
      "path": "genres",
      "type": "filter"
    },
    {
      "path": "year",
      "type": "filter"
    }
  ]
}

				
			

This example uses the Movies dataset. The similarity function can be euclidean, cosine, or dotProduct. You can also choose which fields to index as filters, such as the movie genre or the release year. numDimensions is the size of your embeddings.
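As a minimal sketch of the index definition described above, the following Python helper builds the same structure and checks the two constraints this article mentions: the allowed similarity functions and the 4096-dimension limit. The function name build_vector_index and its parameters are hypothetical and only serve the illustration; the resulting dictionary still has to be submitted to Atlas through whichever interface you use to create indexes.

```python
import json

# Allowed values and the dimension limit are the ones quoted in this article.
ALLOWED_SIMILARITY = {"euclidean", "cosine", "dotProduct"}
MAX_DIMENSIONS = 4096


def build_vector_index(path, num_dimensions, similarity, filter_paths=()):
    """Return an index definition shaped like the JSON example above.

    Hypothetical helper: it only builds and validates a dictionary,
    it does not talk to Atlas.
    """
    if similarity not in ALLOWED_SIMILARITY:
        raise ValueError(f"unsupported similarity: {similarity!r}")
    if not 1 <= num_dimensions <= MAX_DIMENSIONS:
        raise ValueError(f"numDimensions must be between 1 and {MAX_DIMENSIONS}")

    # One "vector" field for the embeddings ...
    fields = [{
        "type": "vector",
        "path": path,
        "numDimensions": num_dimensions,
        "similarity": similarity,
    }]
    # ... plus one "filter" field per path that queries may filter on.
    fields += [{"type": "filter", "path": p} for p in filter_paths]
    return {"fields": fields}


definition = build_vector_index(
    "plot_embedding", 1536, "euclidean", filter_paths=["genres", "year"]
)
print(json.dumps(definition, indent=2))
```

Validating the definition before sending it keeps typos such as "Cosine" or an oversized numDimensions from surfacing only when the index is created.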

Now you can perform a semantic vector search, using the $vectorSearch pipeline stage to find the movies whose embeddings are closest to the query vector.

Note that the query vector must be generated with the same embedding model as the vectors stored in the plot_embedding field. The query below considers up to 200 nearest-neighbor candidates (numCandidates) and limits the results to 10 documents. A $project stage then specifies which fields to include in the output, together with the search score.

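Score formulas:

For cosine and dotProduct similarity: score = (1 + cosine/dot_product(v1,v2)) / 2

For euclidean similarity, which the index above uses: score = 1 / (1 + euclidean(v1,v2))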
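To make the score normalization concrete, here is a small Python sketch that applies the cosine formula above to toy two-dimensional vectors. It also includes the Euclidean variant, 1 / (1 + distance), which is how the Atlas documentation describes scoring for euclidean indexes as far as I know; treat that part as an assumption. The function names are made up for this illustration, and Atlas computes these scores server-side, so this is only a way to reason about the values you get back.

```python
import math


def cosine_similarity(v1, v2):
    """Plain cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(v1, v2))
    norm = math.sqrt(sum(a * a for a in v1)) * math.sqrt(sum(b * b for b in v2))
    return dot / norm


def euclidean_distance(v1, v2):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v1, v2)))


def cosine_score(v1, v2):
    # Formula from the text: maps a similarity in [-1, 1] to a score in [0, 1].
    return (1 + cosine_similarity(v1, v2)) / 2


def euclidean_score(v1, v2):
    # Assumed normalization for euclidean indexes: identical vectors score 1,
    # and the score shrinks toward 0 as the distance grows.
    return 1 / (1 + euclidean_distance(v1, v2))


same = cosine_score([1.0, 0.0], [1.0, 0.0])        # identical direction
orthogonal = cosine_score([1.0, 0.0], [0.0, 1.0])  # unrelated direction
opposite = cosine_score([1.0, 0.0], [-1.0, 0.0])   # opposite direction
print(same, orthogonal, opposite)
print(euclidean_score([0.0, 0.0], [3.0, 4.0]))     # distance 5
```

With either formula, a higher score means a closer match, which is why the results can simply be sorted by score in descending order.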

Example of an aggregation pipeline for the query:

				
					
[
  {
    '$vectorSearch': {
      'index': 'YOUR INDEX NAME',
      'path': 'plot_embedding',
      'filter': {
        '$or': [
          {
            'genres': {
              '$ne': 'Crime'
            }
          }, {
            '$and': [
              {
                'year': {
                  '$lte': 2015
                }
              }, {
                'genres': {
                  '$eq': 'Action'
                }
              }
            ]
          }
        ]
      },
      'queryVector': [-0.016465975, -0.0036450154,...],
      'numCandidates': 200,
      'limit': 10
    }
  }, {
    '$project': {
      '_id': 0,
      'title': 1,
      'genres': 1,
      'plot': 1,
      'year': 1,
      'score': {
        '$meta': 'vectorSearchScore'
      }
    }
  }
]


				
			
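The pipeline above can also be assembled programmatically. The sketch below is one way to do it in Python, assuming you later pass the result to an aggregate call such as PyMongo's collection.aggregate(pipeline). The helper build_vector_search_pipeline and its parameter names are hypothetical. The check that limit does not exceed numCandidates reflects my understanding of the $vectorSearch rules, and the three-element query vector is a placeholder: a real one needs as many dimensions as the index declares.

```python
def build_vector_search_pipeline(index, path, query_vector, num_candidates,
                                 limit, filter_expr=None, project_fields=()):
    """Build a two-stage pipeline: $vectorSearch followed by $project.

    Hypothetical helper; it only assembles dictionaries.
    """
    if limit > num_candidates:
        # Assumption: the number of returned documents cannot exceed
        # the number of candidates considered.
        raise ValueError("limit must not exceed numCandidates")

    search_stage = {
        "index": index,
        "path": path,
        "queryVector": list(query_vector),
        "numCandidates": num_candidates,
        "limit": limit,
    }
    if filter_expr is not None:
        # Only fields indexed with type "filter" can be used here.
        search_stage["filter"] = filter_expr

    projection = {"_id": 0}
    projection.update({field: 1 for field in project_fields})
    # Expose the similarity score computed by the search stage.
    projection["score"] = {"$meta": "vectorSearchScore"}

    return [{"$vectorSearch": search_stage}, {"$project": projection}]


pipeline = build_vector_search_pipeline(
    index="YOUR INDEX NAME",
    path="plot_embedding",
    query_vector=[0.1, 0.2, 0.3],  # placeholder vector, far too short for a real index
    num_candidates=200,
    limit=10,
    filter_expr={"year": {"$lte": 2015}},
    project_fields=["title", "genres", "plot", "year"],
)
print(pipeline)
```

Keeping the pipeline in a function makes it easy to reuse the same index and projection while swapping the query vector and filter per request.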

Also, we can combine results from both semantic search and full-text search queries to create a Hybrid Search.

Each result set is scored by reciprocal rank, and the two scores are then added together for each document:

score = 1.0 / (document position in the results + vector or full-text penalty + constant value)
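The following Python sketch works through the formula above on two small, made-up result lists. It assumes positions are counted from 0, a constant of 1, and penalties of 1 for vector results and 10 for full-text results, which are the values used in this article's query. The titles and function names are hypothetical. The fusion itself normally runs inside the aggregation pipeline; this is just the arithmetic.

```python
def reciprocal_rank_scores(ranked_titles, penalty, constant=1):
    """Score each title as 1.0 / (position + penalty + constant)."""
    scores = {}
    for position, title in enumerate(ranked_titles):  # positions start at 0
        score = 1.0 / (position + penalty + constant)
        # If a title appears twice, keep its best (highest) score.
        scores[title] = max(score, scores.get(title, 0.0))
    return scores


def fuse(vector_results, text_results, vector_penalty=1, full_text_penalty=10):
    """Add the per-list scores and sort by the combined score, best first."""
    vs = reciprocal_rank_scores(vector_results, vector_penalty)
    fts = reciprocal_rank_scores(text_results, full_text_penalty)
    combined = {
        title: vs.get(title, 0.0) + fts.get(title, 0.0)  # missing score counts as 0
        for title in set(vs) | set(fts)
    }
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


# Hypothetical result lists, best match first.
vector_results = ["A", "B", "C"]
text_results = ["C", "D"]

fused = fuse(vector_results, text_results)
for title, score in fused:
    print(title, round(score, 4))
```

Title C ranks last in the vector results but also appears in the full-text results, so its combined score moves it ahead of B. The larger full-text penalty means a full-text hit alone (D) contributes less than any of the vector hits.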

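Example of the query, written for the MongoDB shell (mongosh):

// Penalties added to each document's rank; a larger penalty lowers the weight of that result set
const vector_penalty = 1;
const full_text_penalty = 10;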
db.embedded_movies.aggregate([
 {
   "$vectorSearch": {
     "index": "YOUR SEMANTIC INDEX NAME",
     "path": "plot_embedding",
     "queryVector": [-0.0105516575,-0.014830452,...],
     "numCandidates": 100,
     "limit": 20
   }
 }, {
   "$group": {
     "_id": null,
     "docs": {"$push": "$$ROOT"}
   }
 }, {
   "$unwind": {
     "path": "$docs",
     "includeArrayIndex": "rank"
   }
 }, {
   "$addFields": {
     "vs_score": {
       "$divide": [1.0, {"$add": ["$rank", vector_penalty, 1]}]
     }
   }
 }, {
   "$project": {
     "vs_score": 1,
     "_id": "$docs._id",
     "title": "$docs.title"
   }
 },
 {
   "$unionWith": {
     "coll": "movies",
     "pipeline": [
       {
         "$search": {
           "index": "YOUR FULL TEXT INDEX NAME",
           "phrase": {
             "query": "new york",
             "path": "title"
           }
         }
       }, {
         "$limit": 20
       }, {
         "$group": {
           "_id": null,
           "docs": {"$push": "$$ROOT"}
         }
       }, {
         "$unwind": {
           "path": "$docs",
           "includeArrayIndex": "rank"
         }
       }, {
         "$addFields": {
           "fts_score": {
             "$divide": [
               1.0,
               {"$add": ["$rank", full_text_penalty, 1]}
             ]
           }
         }
       },
       {
         "$project": {
           "fts_score": 1,
           "_id": "$docs._id",
           "title": "$docs.title"
         }
       }
     ]
   }
 },
 {
   "$group": {
     "_id": "$title",
     "vs_score": {"$max": "$vs_score"},
     "fts_score": {"$max": "$fts_score"}
   }
 },
 {
   "$project": {
     "_id": 1,
     "title": 1,
     "vs_score": {"$ifNull": ["$vs_score", 0]},
     "fts_score": {"$ifNull": ["$fts_score", 0]}
   }
 },
 {
   "$project": {
     "score": {"$add": ["$fts_score", "$vs_score"]},
     "_id": 1,
     "title": 1,
     "vs_score": 1,
     "fts_score": 1
   }
 },
 {"$sort": {"score": -1}},
 {"$limit": 10}
])


				
			
