
Vector Operations with SK

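First of all, happy New Year 2025 to everyone.
Looking at the LLM applications that made it into production in 2024, almost all of them are built on RAG. The vast majority of individuals and companies do not have the capacity to do fine-tuning, and RAG is far more approachable in terms of both difficulty and cost.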

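In RAG (Retrieval-Augmented Generation), the point of vectors is to turn text into high-dimensional vector representations so that similarity search and information retrieval can be done efficiently. Specifically, vectors play the following roles in RAG:
Text embedding: text data (such as user queries and document content) is converted into vectors. These vectors capture the semantics of the text, so that similar texts end up close to each other in the vector space.
Similarity search: by computing a distance between vectors (such as cosine similarity), the document vectors most similar to the query vector can be found quickly, which makes retrieval efficient.
Augmented generation: when a generative model (such as GPT) produces text, the documents found through vector retrieval are supplied as supporting context, which improves the relevance and accuracy of the output.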
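To make the similarity search step concrete, here is a minimal sketch of cosine similarity in plain C#. It is not part of SK; the `VectorMath` class and the three-dimensional toy vectors are made up for illustration (real embeddings have hundreds or thousands of dimensions).

```csharp
using System;

public static class VectorMath
{
    // Cosine similarity: dot(a, b) / (|a| * |b|).
    // Close to 1 means the vectors point in nearly the same direction.
    // Note: returns NaN if either vector is all zeros.
    public static double CosineSimilarity(ReadOnlySpan<float> a, ReadOnlySpan<float> b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same dimension.");

        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }
}

public static class Program
{
    public static void Main()
    {
        // Hypothetical toy embeddings, chosen by hand for the demo.
        float[] query = { 1.0f, 0.0f, 0.0f };
        float[] similarDoc = { 0.9f, 0.1f, 0.0f };
        float[] unrelatedDoc = { 0.0f, 0.1f, 0.9f };

        // The similar document scores much higher than the unrelated one.
        Console.WriteLine(VectorMath.CosineSimilarity(query, similarDoc));
        Console.WriteLine(VectorMath.CosineSimilarity(query, unrelatedDoc));
    }
}
```

Cosine distance, which appears later in the model definition, is commonly defined as 1 minus this similarity, so a smaller distance means a closer match.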

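Storing and Retrieving Vectors with SK

Using RAG almost always involves basic operations such as storing and retrieving vectors. Fortunately, SK already wraps all of this for us. Let's see how to work with vectors using SK.

Defining the User Model Class

We define a UserModel class to describe the data structure. VectorStoreRecordKeyAttribute marks the key field, VectorStoreRecordDataAttribute marks the data fields, and VectorStoreRecordVectorAttribute marks the vector field.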

public class UserModel
{
    // Unique key of the record in the vector store.
    [VectorStoreRecordKey]
    public string UserId { get; set; }

    [VectorStoreRecordData]
    public string UserName { get; set; }

    [VectorStoreRecordData]
    public string Hobby { get; set; }

    // Text that gets embedded; it has no attribute, so it is not stored.
    public string Description => $"{UserName}'s ID is {UserId} and hobby is {Hobby}";

    // 1024 must match the output dimension of the embedding model.
    [VectorStoreRecordVector(1024, DistanceFunction.CosineDistance, IndexKind.Hnsw)]
    public ReadOnlyMemory<float>? DescriptionEmbedding { get; set; }
}

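SK provides the IVectorStore interface, so any vector storage solution only has to implement this interface. SK also ships many out-of-the-box connectors, for example InMemory, Redis, Azure Cosmos, Qdrant, and PG. They can be used as soon as they are installed from NuGet.
Below we use Redis as the vector database for the demonstration.

Installing Redis Stack Server with Docker

Plain Redis does not support vector search by default, so we need the redis/redis-stack-server:latest image.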

docker run -d --name redis-stack-server -p 6379:6379 redis/redis-stack-server:latest

Initializing RedisVectorStore

// ConnectionMultiplexer comes from the StackExchange.Redis package.
var vectorStore = new RedisVectorStore(
    ConnectionMultiplexer.Connect("localhost:6379").GetDatabase(),
    new() { StorageType = RedisStorageType.HashSet });

Initializing the Collection

Create a collection to store the user records. A collection can be thought of as a table in a relational database.

// init collection
var collection = vectorStore.GetCollection<string, UserModel>("ks_user");
await collection.CreateCollectionIfNotExistsAsync();

Initializing the EmbeddingGenerationService

As before, we use a local ollama service to provide embedding generation. This service is the core of every text-to-vector conversion.

// init embedding service
// ollamaEndpoint and modelName are assumed to be defined elsewhere.
var ollamaApiClient = new OllamaApiClient(new Uri(ollamaEndpoint), modelName);
var embeddingGenerator = ollamaApiClient.AsTextEmbeddingGenerationService();

Vector CRUD

The following code shows how to convert the Description field of each user into a vector and then perform the most basic Insert, Update, Delete, and Get operations.

// init user infos and vector
var users = this.CreateUserModels();
foreach (var user in users)
{
    user.DescriptionEmbedding = await embeddingGenerator.GenerateEmbeddingAsync(user.Description);
}

// insert or update
foreach (var user in users)
{
    await collection.UpsertAsync(user);           
}

// get
var alice = await collection.GetAsync("1");
// GetAsync returns null when the key does not exist.
Console.WriteLine(alice?.UserName);
var all = collection.GetBatchAsync(users.Select(x => x.UserId));
await foreach(var user in all)
{
    Console.WriteLine(user.UserName);
}

// delete
await collection.DeleteAsync("1");

Vector Search

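The following shows how to run a vector similarity search. First generate a vector from the question text, then search with that vector. When searching, you can configure which vector field to match against and how many top results to return.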

// search
var vectorSearchOptions = new VectorSearchOptions
{
    VectorPropertyName = nameof(UserModel.DescriptionEmbedding),
    Top = 3
};
var query = await embeddingGenerator.GenerateEmbeddingAsync("Whose hobby is swimming?");
var searchResult = await collection.VectorizedSearchAsync(query, vectorSearchOptions);
await foreach (var user in searchResult.Results)
{
    Console.WriteLine(user.Record.UserName);
    Console.WriteLine(user.Score);
}
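Conceptually, a search with Top = 3 ranks the stored vectors by similarity to the query vector and keeps the best three. The sketch below does this by brute force in plain C#, as an illustration only: it is not how SK or Redis implement the search (an HNSW index avoids scanning every record), and the `BruteForceSearch` class, the user names other than Alice, and the toy vectors are all made up for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BruteForceSearch
{
    // Cosine similarity between two vectors of equal length.
    public static double CosineSimilarity(float[] a, float[] b)
    {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }

    // Score every record against the query and keep the `top` best matches.
    public static List<(string Key, double Score)> TopK(
        IReadOnlyDictionary<string, float[]> records, float[] query, int top)
    {
        return records
            .Select(r => (Key: r.Key, Score: CosineSimilarity(r.Value, query)))
            .OrderByDescending(r => r.Score)
            .Take(top)
            .ToList();
    }
}

public static class TopKDemo
{
    public static void Main()
    {
        // Hypothetical 3-dimensional embeddings standing in for real ones.
        var records = new Dictionary<string, float[]>
        {
            ["Alice"] = new[] { 0.9f, 0.1f, 0.0f },
            ["Bob"]   = new[] { 0.1f, 0.9f, 0.0f },
            ["Carol"] = new[] { 0.7f, 0.3f, 0.1f },
            ["Dave"]  = new[] { 0.0f, 0.1f, 0.9f },
        };
        float[] query = { 1.0f, 0.0f, 0.0f };

        // Prints the three best matches in order: Alice, Carol, Bob.
        foreach (var (key, score) in BruteForceSearch.TopK(records, query, 3))
        {
            Console.WriteLine($"{key}: {score:F3}");
        }
    }
}
```

The full scan costs time proportional to the number of records, which is why vector stores build an index such as the Hnsw kind chosen in the model definition.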

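Summary

Above we showed how to vectorize a data model and perform basic CRUD operations against Redis. We also showed how to vectorize a text question and search with it, which is similarity retrieval. Although the demo runs on Redis, SK offers many other options, so you can quickly switch to the vector database you prefer for storage, for example Azure Cosmos, Qdrant, PG, or SQLite. That is about all there is to say. I hope this article helps you learn SemanticKernel. Thanks.

The sample code has been uploaded to GitHub: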
https://github.com/kklldog/SKLearning

Posted 2025-03-01 16:47 by Agile.Zhou
