Using Vector Search with Microsoft Azure Cosmos DB for MongoDB (vCore)

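Introduction

Microsoft Azure Cosmos DB for MongoDB is one of the APIs offered by Azure Cosmos DB. It lets you run MongoDB applications on Cosmos DB with little or no change to your code. Developers get Cosmos DB's global distribution, multi-model support, and large-scale elasticity while continuing to use familiar MongoDB tools and SDKs.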

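Here are some of the main features of Azure Cosmos DB for MongoDB:

Global distribution
Azure Cosmos DB lets you distribute your data to regions around the world and move it between geographic locations at any time.
Multi-model and multi-API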
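Besides the MongoDB API, Cosmos DB also offers the SQL API, the Gremlin API (for graph databases), and the Azure Table API. The API is chosen per account when the account is created, so you pick the model that fits each workload rather than mixing them inside one database.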
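Automatic data partitioning
Azure Cosmos DB partitions your data automatically so that you can scale while keeping efficiency and performance high.
Elastic scaling
Cosmos DB lets you keep scaling storage and throughput as your workload grows.
Real-time analytics
Azure Cosmos DB supports real-time analytics, so you can query and visualize your data as it arrives.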

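When you choose the vCore model, you get a server-based configuration that offers more flexibility and control. You can choose a specific number of virtual cores, the amount of memory, and the storage size, and you can decide whether to enable geo-redundancy or multi-region writes. This model is a better fit for large-scale, production-grade applications that need that extra control.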

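Creating the Database

First, we need to start a Cosmos DB server on Azure. Note that the vector search we need only works in the vCore version, and it was a preview feature at the time of writing!

So the service we want to create is Azure Cosmos DB for MongoDB (vCore).

I picked the lowest tier for every configurable option, yet the estimated cost was still $175 USD.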

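Installing the MongoDB Shell

Next, to follow the official tutorial for setting up vector search, we need to work directly in the MongoDB Shell. I had hoped to do the whole backend in Python, since that makes integration easier, but pymongo still lacks feature parity with MongoDB in a number of areas.

This is the installation guide I followed (mainly the Ubuntu one):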
https://www.mongodb.com/docs/manual/tutorial/install-mongodb-on-ubuntu/

0. Check versions

MongoDB versions need to be matched carefully. For example, the Ubuntu 22.04 I use is only supported starting with MongoDB 6.0, so the installation below is based on Ubuntu 22.04 + MongoDB 6.0.

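1. Import the public key into the package management system

If gnupg is not installed on your system, you can install it with the command below. (The gnupg package in Ubuntu provides GNU Privacy Guard (GnuPG), a free implementation of the OpenPGP standard as defined by RFC 4880.)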

sudo apt install gnupg

Next, import MongoDB's public GPG key:

curl -fsSL https://pgp.mongodb.com/server-6.0.asc | \
   sudo gpg -o /usr/share/keyrings/mongodb-server-6.0.gpg \
   --dearmor

2. Create the MongoDB list file

sudo touch /etc/apt/sources.list.d/mongodb-org-6.0.list
echo "deb [ arch=amd64,arm64 signed-by=/usr/share/keyrings/mongodb-server-6.0.gpg ] https://repo.mongodb.org/apt/ubuntu jammy/mongodb-org/6.0 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-6.0.list

Then reload the local package database.

sudo apt update

Now we can install the MongoDB packages.

sudo apt install -y mongodb-org

(Optional) To keep automatic upgrades from unintentionally updating MongoDB, you can pin the packages at the currently installed version.

echo "mongodb-org hold" | sudo dpkg --set-selections
echo "mongodb-org-database hold" | sudo dpkg --set-selections
echo "mongodb-org-server hold" | sudo dpkg --set-selections
echo "mongodb-mongosh hold" | sudo dpkg --set-selections
echo "mongodb-org-mongos hold" | sudo dpkg --set-selections
echo "mongodb-org-tools hold" | sudo dpkg --set-selections

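Connecting to the Azure Cosmos DB Database

1. Add your local IP in the network settings

First, find the "Network" option in the left-hand menu and add your local machine's IP to the list of IPs allowed to connect (you could also open it to all IPs, but that is not recommended). There is a button that adds your IP in one click.

2. Find the connection string

To connect to the database created in Azure Cosmos DB, open the Cosmos DB resource we just created and select Connection strings in the left-hand menu.

Once there, we will see a connection string that we can use to connect from the local machine.

Copy this connection string; we can then connect to Cosmos DB directly with mongosh. Remember that the copied string still contains placeholders for the username and password, and both must be filled in correctly before the connection will work!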
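Since the username and password have to be filled in by hand, here is a minimal Python sketch of doing that substitution safely. It assumes the copied string uses the placeholders `<user>` and `<password>` (check what your copied string actually contains), and the host name in the example is made up. The helper name `fill_connection_string` is my own. Special characters in credentials need to be percent-escaped inside a MongoDB URI, which is what `quote_plus` does here.

```python
from urllib.parse import quote_plus


def fill_connection_string(template: str, username: str, password: str) -> str:
    """Replace the credential placeholders with percent-escaped values."""
    # The placeholder names are an assumption; adjust them to match the
    # string you copied from the Connection strings page.
    return (
        template
        .replace("<user>", quote_plus(username))
        .replace("<password>", quote_plus(password))
    )


# Hypothetical template; the host name is invented for illustration only.
template = "mongodb+srv://<user>:<password>@example-cluster.example.com/?tls=true"
connection_string = fill_connection_string(template, "admin", "p@ss:word/1")
print(connection_string)
```

Without the escaping, characters such as `@`, `:` or `/` in a password would be read as part of the URI structure and the connection would fail to parse.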

3. Connect to Cosmos DB

Open a terminal and run:

mongosh "<YOUR_CONNECTION_STRING>"

Vector Search

1. Create the collection

I recommend doing this part directly in the mongo shell, for the reason given in the installation section above.

use test;

db.runCommand({
   createIndexes: 'exampleCollection',
   indexes: [
     {
       name: 'vectorSearchIndex',
       key: {
         "vectorContent": "cosmosSearch"
       },
       cosmosSearchOptions: {
         kind: 'vector-ivf',
         numLists: 100,
         similarity: 'COS',
         dimensions: 3
       }
     }
   ]
 });

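exampleCollection: the name of the collection we are creating.
vectorContent: the name of the field that will hold the vectors. You can rename it, but the value cosmosSearch that follows it is fixed.
numLists: the number of clusters the IVF index uses to group the vectors, not the number of documents. The official documentation suggests roughly documentCount / 1000 for up to 1 million documents; a value of 1 amounts to a brute-force search.
similarity: the similarity metric used for the search. The options are COS (cosine distance), L2 (Euclidean distance), and IP (inner product).
dimensions: the number of dimensions in each vector. It is 3 here to match the sample data used below.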
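To make the shape of the index definition easier to reuse, here is a small Python sketch that builds the same command document as a dictionary. The function name `build_vector_index_command` and its defaults are my own, not part of any library. Sending the result through pymongo's `Database.command` is an assumption I have not verified against Cosmos DB here, so treat the shell version above as the reference.

```python
def build_vector_index_command(collection, field, dimensions,
                               num_lists=100, similarity="COS"):
    """Build a createIndexes command document for a vector-ivf index."""
    # Only the three metrics listed above are accepted.
    if similarity not in {"COS", "L2", "IP"}:
        raise ValueError(f"unsupported similarity: {similarity}")
    return {
        "createIndexes": collection,  # must stay the first key of the command
        "indexes": [
            {
                "name": "vectorSearchIndex",
                "key": {field: "cosmosSearch"},
                "cosmosSearchOptions": {
                    "kind": "vector-ivf",
                    "numLists": num_lists,
                    "similarity": similarity,
                    "dimensions": dimensions,
                },
            }
        ],
    }


command = build_vector_index_command("exampleCollection", "vectorContent", dimensions=3)
print(command)

# With a live connection this could be sent as (untested assumption):
#   client["test"].command(command)
```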

2. Insert data

I recommend doing this step in Python. When there is a lot of data to load, inserting it programmatically is much easier.

First, install pymongo.

pip3 install pymongo

Then connect to the database and insert the data:

from pymongo import MongoClient


# Connect to Azure Cosmos DB
connecting_string = "<YOUR_CONNECTION_STRING>"
client = MongoClient(connecting_string)

# Get the collection we want to insert data into
db = client["test"]
collection = db["exampleCollection"]

# Prepare data
docs = [
    {
        "name": "Eugenia Lopez",
        "bio": "Eugenia is the CEO of AdventureWorks.",
        "vectorContent": [0.51, 0.12, 0.23],
    },
    {
        "name": "Cameron Baker",
        "bio": "Cameron Baker is the CFO of AdventureWorks.",
        "vectorContent": [0.55, 0.89, 0.44],
    },
    {
        "name": "Jessie Irwin", 
        "bio": "Jessie Irwin is the former CEO of AdventureWorks and now the director of the Our Planet initiative.", 
        "vectorContent": [0.13, 0.92, 0.85],
    },
    {
        "name": "Rory Nguyen", 
        "bio": "Rory Nguyen is the founder of AdventureWorks and the president of the Our Planet initiative.", 
        "vectorContent": [0.91, 0.76, 0.83],
    },
]

# Insert
result = collection.insert_many(docs)
print(f"Inserted document ids: {result.inserted_ids}")
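Because the index was declared with dimensions: 3, every vector we insert should have exactly three numeric components. Here is a minimal pre-insert check, assuming the field name vectorContent from the index above. The helper `find_bad_vectors` is a name I made up, and how the server reacts to mismatched vectors is not something this sketch covers.

```python
def find_bad_vectors(docs, field="vectorContent", dimensions=3):
    """Return the names of documents whose vector is missing or malformed."""
    bad = []
    for doc in docs:
        vector = doc.get(field)
        ok = (
            isinstance(vector, list)
            and len(vector) == dimensions
            # bool is a subclass of int in Python, so exclude it explicitly
            and all(
                isinstance(x, (int, float)) and not isinstance(x, bool)
                for x in vector
            )
        )
        if not ok:
            bad.append(doc.get("name", "<unnamed>"))
    return bad


# Made-up sample documents for illustration
sample = [
    {"name": "ok", "vectorContent": [0.51, 0.12, 0.23]},
    {"name": "too short", "vectorContent": [0.55, 0.89]},
    {"name": "missing"},
]
print(find_bad_vectors(sample))  # ['too short', 'missing']
```

Running a check like this before insert_many makes it easier to spot which document is the problem when loading a large batch.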

3. Query data by vector

The snippet below continues the script from step 2 and reuses its collection object.

# Define the query vector
queryVector = [-0.52, 0.28, 0.12]

# Define the aggregate query
query = [
    {
        "$search": {
            "cosmosSearch": {
                "vector": queryVector,
                "path": "vectorContent",
                "k": 3
            },
            "returnStoredSource": True
        }
    }
]

# Execute the aggregate query
results = collection.aggregate(query)

# Print the results
for result in results:
    print(result)
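To get a feel for what the COS metric should rank highest, here is a brute-force sketch that computes cosine similarity locally for the four sample vectors and the query vector used above. This is plain Python with no database involved. The IVF index is approximate, so I would treat this only as a sanity check, not as a guarantee of what the server returns.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


query_vector = [-0.52, 0.28, 0.12]
vectors = {
    "Eugenia Lopez": [0.51, 0.12, 0.23],
    "Cameron Baker": [0.55, 0.89, 0.44],
    "Jessie Irwin": [0.13, 0.92, 0.85],
    "Rory Nguyen": [0.91, 0.76, 0.83],
}

# Rank names from most to least similar to the query vector
ranked = sorted(
    vectors,
    key=lambda name: cosine_similarity(query_vector, vectors[name]),
    reverse=True,
)
top_3 = ranked[:3]
print(top_3)  # ['Jessie Irwin', 'Cameron Baker', 'Rory Nguyen']
```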
