Using the Vector Databases Chroma and Milvus

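I. What Is a Vector Database?

A vector database is a database built specifically to store and retrieve vector data. It is widely used in image search, recommender systems, natural language processing, and similar fields.

Put simply:

You give the database a set of "feature vectors" (for example, numeric representations of images or text).
You ask the database, "Which stored vectors are most similar to this one?"
The database quickly returns the most similar results.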
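
The three steps above can be sketched in plain Python as a brute-force search. This is only an illustration of the idea, not how Chroma or Milvus work internally; the function names and the toy vectors are made up for this example.

```python
import math

def euclidean(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def most_similar(query, vectors, k=1):
    """Return the k (id, distance) pairs closest to query, nearest first."""
    scored = [(vec_id, euclidean(query, vec)) for vec_id, vec in vectors.items()]
    return sorted(scored, key=lambda pair: pair[1])[:k]

# Toy 3-dimensional "feature vectors" keyed by id (illustrative values).
vectors = {
    "apple": [0.1, 0.2, 0.3],
    "banana": [0.2, 0.1, 0.4],
    "orange": [0.15, 0.22, 0.35],
}

# Ask: which stored vectors are most similar to this one?
print(most_similar([0.1, 0.2, 0.31], vectors, k=2))
```

A real vector database replaces the full scan with an index so that it does not have to compare the query against every stored vector.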

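II. Chroma and Milvus at a Glance

Name	Characteristics	Language support	Typical use
Chroma	Lightweight, Python-friendly, easy to pick up	Python	Small projects, prototypes, rapid development
Milvus	Enterprise-grade, high performance, several deployment options	Multiple languages (Python, Go, and others)	Large-scale, high-concurrency, complex workloads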

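III. Prerequisites

Operating system: Windows, macOS, or Linux
Python version: 3.7 or later (recent chromadb and pymilvus releases may require a newer Python; check each package's requirements on PyPI)
Package manager: pip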

IV. Installation and Configuration

1. Installing Chroma

Install the Python library directly:

pip install chromadb

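2. Installing Milvus

Milvus consists of two parts:

Milvus Server (the core database service, installed separately or run in Docker)
Milvus Python SDK (the client used to call Milvus from Python)

2.1 Using the official script (the easiest route)

The script provided by the Milvus project starts Milvus in Docker with embedded etcd enabled and the startup configuration handled for you. Docker must be installed and running first: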

curl -sfL https://raw.githubusercontent.com/milvus-io/milvus/master/scripts/standalone_embed.sh -o standalone_embed.sh
bash standalone_embed.sh start

2.2 Verifying the installation

After startup, check the container status:

docker ps

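The Milvus container should be listed as running. The commands here use the name milvus_standalone; if docker ps reports a different name (for example milvus-standalone), use that name instead.

Check the logs to confirm that embedded etcd started successfully and that there are no connection errors: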

docker logs milvus_standalone

The startup log should contain no errors.

Test the port:

nc -zv localhost 19530

A successful connection means Milvus is listening on the port.
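
On systems without nc (a default Windows install, for instance), the same check can be done from Python with the standard library. A minimal sketch; the helper name port_is_open is made up for this example, and 19530 is the port used in the command above.

```python
import socket

def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        # create_connection raises OSError on refusal, timeout, or DNS failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    print("Milvus reachable:", port_is_open("localhost", 19530))
```

This only shows that something accepts TCP connections on the port; it does not prove the service behind it is healthy.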

2.3 Installing the Milvus Python SDK

pip install pymilvus
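
To confirm that both client libraries landed in the active Python environment, the installed versions can be read with the standard library. A small sketch, assuming Python 3.8 or later (where importlib.metadata is available); the helper name installed_version is made up for this example.

```python
from importlib import metadata

def installed_version(package):
    """Return the installed version string of package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

for name in ("chromadb", "pymilvus"):
    print(name, installed_version(name) or "not installed")
```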

V. Usage Examples

1. A simple Chroma example

import chromadb

# Create a client with the newer PersistentClient API; data is stored under the given path
client = chromadb.PersistentClient(path=".chromadb/")

# Create or fetch the collection; get_or_create_collection avoids an error when the collection already exists
collection = client.get_or_create_collection("test_collection")

# Insert vector data
collection.add(
    documents=["apple", "banana", "orange"],  # text descriptions
    embeddings=[[0.1, 0.2, 0.3], [0.2, 0.1, 0.4], [0.15, 0.22, 0.35]],  # corresponding vectors (example values)
    ids=["1", "2", "3"]
)

# Query for the most similar vector
results = collection.query(
    query_embeddings=[[0.1, 0.2, 0.31]],
    n_results=1
)

print(results)

[Figure: returned result]

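Notes:

documents is the text you hand to the database.
embeddings holds the vector representation of each text (normally produced by a model).
A query takes a vector and returns the n closest results.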
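
The object returned by collection.query is a dictionary whose values are nested lists, with one inner list per query vector. The sketch below flattens that shape into one row per hit. The helper names are made up for this example, and sample is a hand-written stand-in with the same nesting rather than captured output; its distance value is illustrative.

```python
def column(results, key, query_index, size):
    """Return the per-query list stored under key, or placeholders if it is absent."""
    values = results.get(key)
    return values[query_index] if values else [None] * size

def rows_from_query(results):
    """Flatten a query result into one dict per hit."""
    rows = []
    for q, ids in enumerate(results["ids"]):
        docs = column(results, "documents", q, len(ids))
        dists = column(results, "distances", q, len(ids))
        for doc_id, doc, dist in zip(ids, docs, dists):
            rows.append({"query": q, "id": doc_id, "document": doc, "distance": dist})
    return rows

# Stand-in for a result with one query vector and n_results=1 (illustrative values).
sample = {
    "ids": [["1"]],
    "documents": [["apple"]],
    "distances": [[0.0001]],
}

for row in rows_from_query(sample):
    print(row)
```

With several query vectors, the "query" field tells you which input each hit belongs to.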

2. A simple Milvus example

from pymilvus import connections, FieldSchema, CollectionSchema, DataType, Collection

# Connect to Milvus
connections.connect("default", host="127.0.0.1", port="19530")

# Define the collection schema: an integer primary key and a 3-dimensional float vector
fields = [
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=False),
    FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=3)
]

schema = CollectionSchema(fields, "test collection")

# Create the collection
collection = Collection("test_collection", schema)

# Insert data (column-wise: one list per field, in schema order)
ids = [1, 2, 3]
embeddings = [
    [0.1, 0.2, 0.3],
    [0.2, 0.1, 0.4],
    [0.15, 0.22, 0.35]
]

collection.insert([ids, embeddings])

# Create an index on the vector field
index_params = {
    "index_type": "IVF_FLAT",
    "params": {"nlist": 10},
    "metric_type": "L2"
}
collection.create_index("embedding", index_params)

# Load the collection into memory so it can be searched
collection.load()

# Search for the nearest vectors
search_params = {"metric_type": "L2", "params": {"nprobe": 10}}
results = collection.search([[0.1, 0.2, 0.31]], "embedding", search_params, limit=2)

# results[0] holds the hits for the first (and only) query vector
for result in results[0]:
    print(f"id: {result.id}, distance: {result.distance}")

[Figure: output of the run]
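
The index parameters in the example deserve a word. In an IVF-style index, nlist is the number of buckets (clusters) the vectors are partitioned into, and nprobe is how many of those buckets are scanned per query. The toy sketch below illustrates that idea in plain Python. It is a simplification and not Milvus's implementation: the centroids are hand-picked instead of learned by clustering, the function names are made up, and vector 4 is an extra point added so that there is more than one bucket.

```python
def sq_l2(a, b):
    """Squared Euclidean distance (ranks vectors in the same order as Euclidean distance)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_ivf(vectors, centroids):
    """Assign every (id, vector) pair to the bucket of its nearest centroid."""
    buckets = [[] for _ in centroids]
    for vec_id, vec in vectors.items():
        nearest = min(range(len(centroids)), key=lambda c: sq_l2(vec, centroids[c]))
        buckets[nearest].append((vec_id, vec))
    return buckets

def ivf_search(query, centroids, buckets, nprobe, limit):
    """Scan only the nprobe buckets whose centroids are closest to the query."""
    order = sorted(range(len(centroids)), key=lambda c: sq_l2(query, centroids[c]))
    candidates = [item for c in order[:nprobe] for item in buckets[c]]
    scored = sorted((sq_l2(query, vec), vec_id) for vec_id, vec in candidates)
    return [(vec_id, dist) for dist, vec_id in scored[:limit]]

vectors = {
    1: [0.1, 0.2, 0.3],
    2: [0.2, 0.1, 0.4],
    3: [0.15, 0.22, 0.35],
    4: [0.9, 0.8, 0.7],  # hypothetical far-away point
}
# Two buckets, so nlist = 2 here; a real index learns the centroids from the data.
centroids = [[0.15, 0.17, 0.35], [0.9, 0.8, 0.7]]
buckets = build_ivf(vectors, centroids)

hits = ivf_search([0.1, 0.2, 0.31], centroids, buckets, nprobe=1, limit=2)
print(hits)
```

A larger nprobe scans more buckets, which tends to improve recall at the cost of speed; setting nprobe equal to nlist amounts to scanning everything.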

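VI. Summary

Feature	Chroma	Milvus
Installation	Pure Python library, quick and simple	Requires a running service; Docker deployment recommended
Project scale	Small projects, development and testing	Large-scale, production environments
Language support	Python first	Multiple languages
Performance	Moderate	High performance, supports distributed deployment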

posted @ 2025-07-13 11:22 久曲健
