[ai-cache] Implement a WASM plugin for LLM result retrieval based on vector similarity #1290

Suchun-sv · 2024-09-06T11:18:09Z

Ⅰ. Describe what this PR did

Adds to the ai-cache plugin the ability to retrieve cached results based on text vector similarity.

Ⅱ. Does this pull request fix one issue?

update
update: note: do not use TLS when using the HTTP protocol
update: add lobechat
add: makefile for ai-proxy
fix bugs
fix bugs
fix: redis connection
fix: dashvector and dashscope cluster
fix: change vdb collection
feat: add chroma logic
docs: add API description
update: no callback version
fix: change to callback
fix: finish chrome
remove: key
update: gitignore

codecov-commenter · 2024-09-06T11:39:52Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 44.22%. Comparing base (ef31e09) to head (1767896).
Report is 94 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1290      +/-   ##
==========================================
+ Coverage   35.91%   44.22%   +8.31%     
==========================================
  Files          69       75       +6     
  Lines       11576     9823    -1753     
==========================================
+ Hits         4157     4344     +187     
+ Misses       7104     5150    -1954     
- Partials      315      329      +14

see 90 files with indirect coverage changes

plugins/wasm-go/extensions/ai-cache/cache/provider.go

plugins/wasm-go/extensions/ai-cache/cache/redis.go

plugins/wasm-go/extensions/ai-cache/config/config.go

plugins/wasm-go/extensions/ai-cache/vector/dashvector.go

plugins/wasm-go/extensions/ai-cache/main.go

CH3CHO · 2024-09-08T07:23:27Z

plugins/wasm-go/extensions/ai-cache/main.go

-	}
-	log.Debugf("unknown message:%s", bodyJson)
-	return ""
+	RedisSearchHandler(key, ctx, config, log, stream, true)


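At present, all cache logic is entered through CacheProvider. If I have no Redis locally, can I skip the cache stage and query the VDB directly?

It seems both this and the embedding part need to be made optional? Some VDBs do not support storing embeddings directly.

In theory, we need to check whether an embeddingProvider and a cacheProvider are configured so that different entry functions are used, and adjust the logic in core.go accordingly?

Can you give an example of a VDB that does not support storing embeddings directly? How would we query that kind of VDB?

The exact way to change this can be thought through further.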

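> At present, all cache logic is entered through CacheProvider. If I have no Redis locally, can I skip the cache stage and query the VDB directly?

There are two ways to skip the cache stage and query the VDB directly:

1. Run the vector query with a vector produced by a user-defined embedding model. This approach requires an embeddingProvider in the plugin.

2. Some VDBs wrap the embedding step and support querying directly with a string, for example Chroma[1] and Weaviate[2]. This approach does not require an embeddingProvider.

Should both integration styles be considered here?

[1] https://docs.trychroma.com/reference/py-collection#query
[2] https://weaviate.io/developers/weaviate/quickstart#step-6-queries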

I wrote a rough sketch of the overall flow. It may be incomplete and is for reference only:

func onHttpRequestBody() {
	if err := searchCache(ctx, config, key); err != nil {
		return types.ActionContinue
	}
	return types.ActionPause
}

func searchCache(ctx, config, key) {
	cacheProvider := config.ActiveCacheProvider
	if cacheProvider == nil {
		return searchVectorDb(ctx, config, key)
	}
	cacheProvider.search(key, func (response, err) {
		processCacheResponse(ctx, config, key, response, err)
	})
	return nil
}

func processCacheResponse(ctx, config, key, response, err) {
	if err == nil && response != nil {
		// cache hit
		sendResponse(ctx, response)
		proxywasm.ResumeHttpRequest()
		return
	}
	// err != nil: search failed
	// err == nil && response == nil: cache miss
	err = searchVectorDb(ctx, config, key)
	if err != nil {
		proxywasm.ResumeHttpRequest()
		return
	}
}

func searchVectorDb(ctx, config, key) {
	embeddingProvider := config.ActiveEmbeddingProvider
	if embeddingProvider == nil {
		return errors.New("xxxx")
	}
	embeddingProvider.search(key, func (response, err) {
		processVectorDbResponse(ctx, response, err)
	})
	return nil
}

func processVectorDbResponse(ctx, response, err) {
}

func sendResponse(ctx wrapper.HttpContext, response string) {
	stream := ctx.Get("xxx")
	if stream {
		// send stream response
	} else {
		// send normal response
	}
}

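> There are two ways to skip the cache stage and query the VDB directly:
>
> 1. Run the vector query with a vector produced by a user-defined embedding model. This approach requires an embeddingProvider in the plugin.
>
> 2. Some VDBs wrap the embedding step and support querying directly with a string, for example Chroma[1] and Weaviate[2]. This approach does not require an embeddingProvider.
>
> Should both integration styles be considered here?
>
> [1] https://docs.trychroma.com/reference/py-collection#query
> [2] https://weaviate.io/developers/weaviate/quickstart#step-6-queries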

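Yes, both need to be supported.

For the implementation, consider separating them at the interface level. A concrete provider can choose to implement either the text-based query interface or the embedding-based query interface, and the callback uses the same function signature in both cases. When core makes the call, it checks which interface the provider implements and decides whether the embedding provider needs to be invoked.

Likewise, if some provider does not support upload, upload can also be implemented as a separate interface.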
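The interface-level split described above can be sketched roughly as follows. This is a minimal illustration, not the plugin's actual code: the names `StringQuerier`, `EmbeddingQuerier`, `Embedder`, `QueryResult`, and `queryVectorDB` are all hypothetical, and the embedding call is synchronous here only to keep the sketch short, whereas the plugin works with asynchronous callbacks.

```go
package main

import (
	"errors"
	"fmt"
)

// QueryResult is a hypothetical result type used only in this sketch.
type QueryResult struct {
	Text   string
	Answer string
}

// QueryCallback is shared by both query styles, so the caller handles
// results the same way regardless of how the provider was queried.
type QueryCallback func(results []QueryResult, err error)

// StringQuerier is implemented by providers that embed text internally.
type StringQuerier interface {
	QueryString(text string, cb QueryCallback) error
}

// EmbeddingQuerier is implemented by providers that need a precomputed vector.
type EmbeddingQuerier interface {
	QueryEmbedding(emb []float64, cb QueryCallback) error
}

// Embedder stands in for the embedding provider (assumed synchronous here).
type Embedder interface {
	Embed(text string) ([]float64, error)
}

// queryVectorDB picks the query path from the interfaces the provider implements.
func queryVectorDB(provider interface{}, embedder Embedder, text string, cb QueryCallback) error {
	if q, ok := provider.(StringQuerier); ok {
		return q.QueryString(text, cb) // no embedding provider needed
	}
	if q, ok := provider.(EmbeddingQuerier); ok {
		if embedder == nil {
			return errors.New("provider needs embeddings but no embedding provider is configured")
		}
		emb, err := embedder.Embed(text)
		if err != nil {
			return fmt.Errorf("embed query text: %v", err)
		}
		return q.QueryEmbedding(emb, cb)
	}
	return errors.New("provider supports neither text nor embedding queries")
}

// Fake providers for the demonstration.
type textOnlyDB struct{}

func (textOnlyDB) QueryString(text string, cb QueryCallback) error {
	cb([]QueryResult{{Text: text, Answer: "from-text"}}, nil)
	return nil
}

type vectorOnlyDB struct{}

func (vectorOnlyDB) QueryEmbedding(emb []float64, cb QueryCallback) error {
	cb([]QueryResult{{Answer: fmt.Sprintf("from-vector-%d", len(emb))}}, nil)
	return nil
}

type fixedEmbedder struct{}

func (fixedEmbedder) Embed(string) ([]float64, error) { return []float64{0.1, 0.2, 0.3}, nil }

func main() {
	show := func(r []QueryResult, err error) { fmt.Println(r[0].Answer, err) }
	_ = queryVectorDB(textOnlyDB{}, nil, "hello", show)
	_ = queryVectorDB(vectorOnlyDB{}, fixedEmbedder{}, "hello", show)
}
```

An optional upload capability could be detected the same way, with a separate uploader interface and a type assertion at the call site.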

plugins/wasm-go/extensions/ai-cache/embedding/dashscope.go

plugins/wasm-go/extensions/ai-cache/config/config.go

plugins/wasm-go/extensions/ai-cache/core.go

plugins/wasm-go/extensions/ai-cache/vector/dashvector.go

plugins/wasm-go/extensions/ai-cache/core.go

CH3CHO · 2024-09-12T01:58:10Z

plugins/wasm-go/extensions/ai-cache/core.go

+
+	if err != nil {
+		log.Errorf("Failed to retrieve key: %s from cache, error: %v", key, err)
+		proxywasm.ResumeHttpRequest()


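There is a problem here. If this branch is taken, the function does not return, and the outer onHttpRequestBody does not return types.ActionPause either. I am not sure whether this ResumeHttpRequest call would raise an error, but at the very least the outer request would hang indefinitely.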

Do I understand correctly that a return is needed after proxywasm.ResumeHttpRequest() to make sure the current function exits? It has now been changed to:

err := activeCacheProvider.Get(queryKey, func(response resp.Value) {
	if err := response.Error(); err == nil && !response.IsNull() {
		log.Infof("cache hit, key: %s", key)
		processCacheHit(key, response, stream, ctx, config, log)
	} else {
		if err != nil {
			log.Errorf("error retrieving key: %s from cache, error: %v", key, err)
		}
		if response.IsNull() {
			log.Infof("cache miss, key: %s", key)
		}
		if useSimilaritySearch {
			err = performSimilaritySearch(key, ctx, config, log, key, stream)
			if err != nil {
				log.Errorf("failed to perform similarity search for key: %s, error: %v", key, err)
				proxywasm.ResumeHttpRequest()
				return
			}
		}
		proxywasm.ResumeHttpRequest()
		return
	}
})
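The concern in this thread is the pairing between pausing and resuming: a request should be resumed only if it was actually paused, and a synchronous dispatch failure should fall through to a "continue" result instead. The sketch below simulates that contract without the proxy-wasm SDK. `Action`, `host`, `asyncGet`, `brokenGet`, and `pendingCache` are hypothetical stand-ins introduced for illustration; they are not part of the plugin or the SDK.

```go
package main

import (
	"errors"
	"fmt"
)

// Action mirrors the idea of types.ActionContinue / types.ActionPause.
type Action int

const (
	ActionContinue Action = iota
	ActionPause
)

// asyncGet is a stand-in for a cache lookup. A non-nil return value means the
// call was never dispatched, so the callback will never run.
type asyncGet func(key string, cb func(value string, found bool)) error

// host counts resume and respond calls in place of the real host functions.
type host struct {
	resumes   int
	responses int
}

func onRequestBody(h *host, get asyncGet, key string) Action {
	err := get(key, func(value string, found bool) {
		// Asynchronous path: the request is paused, so every branch must
		// either answer the client or resume the request.
		if found {
			h.responses++
			return
		}
		h.resumes++
	})
	if err != nil {
		// Synchronous failure: the request was never paused, so let it
		// continue instead of calling resume.
		return ActionContinue
	}
	return ActionPause
}

// brokenGet fails before dispatching anything.
func brokenGet(string, func(string, bool)) error { return errors.New("redis unavailable") }

// pendingCache stores the callback so it can be fired later, like a real reply.
type pendingCache struct{ cb func(string, bool) }

func (p *pendingCache) get(key string, cb func(string, bool)) error {
	p.cb = cb
	return nil
}

func main() {
	h := &host{}
	fmt.Println(onRequestBody(h, brokenGet, "k") == ActionContinue)

	p := &pendingCache{}
	fmt.Println(onRequestBody(h, p.get, "k") == ActionPause)
	p.cb("", false) // simulate a cache miss arriving later
	fmt.Println(h.resumes, h.responses)
}
```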

johnlanni · 2024-09-12T07:11:45Z

This seems to overlap with this PR? #1248

As commented in that PR, the embedding and vector logic is fairly generic; I suggest moving it into a separate ai-utils directory.

CH3CHO · 2024-09-12T07:13:18Z

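> This seems to overlap with this PR? #1248
>
> As commented in that PR, the embedding and vector logic is fairly generic; I suggest moving it into a separate ai-utils directory.

#1248 extends this PR with support for more DBs. It was simply opened ahead of time. The merge order will be this PR first, then #1248. The overlapping parts will be merged here.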

CH3CHO · 2024-09-18T08:50:39Z

plugins/wasm-go/extensions/ai-cache/embedding/dashscope.go

+	DOMAIN             = "dashscope.aliyuncs.com"
+	PORT               = 443
+	DEFAULT_MODEL_NAME = "text-embedding-v1"
+	ENDPOINT           = "/api/v1/services/embeddings/text-embedding/text-embedding"


These fields should ideally carry the dashscope keyword; otherwise it will be hard to pick names when other services are integrated.

Changed to:

const (
	DASHSCOPE_DOMAIN             = "dashscope.aliyuncs.com"
	DASHSCOPE_PORT               = 443
	DASHSCOPE_DEFAULT_MODEL_NAME = "text-embedding-v1"
	DASHSCOPE_ENDPOINT           = "/api/v1/services/embeddings/text-embedding/text-embedding"
)

CH3CHO · 2024-09-18T08:54:31Z

plugins/wasm-go/extensions/ai-cache/core.go

 	"fmt"

 	"github.com/alibaba/higress/plugins/wasm-go/extensions/ai-cache/config"
 	"github.com/alibaba/higress/plugins/wasm-go/extensions/ai-cache/vector"
 	"github.com/alibaba/higress/plugins/wasm-go/pkg/wrapper"
+	"github.com/go-errors/errors"


This import is probably wrong.

Changed to:

"errors"

CH3CHO · 2024-09-18T09:04:32Z

plugins/wasm-go/extensions/ai-cache/core.go

+			processCacheHit(key, mostSimilarData.Answer, stream, ctx, config, log)
+		} else {
+			// otherwise, continue to check cache for the most similar key
+			CheckCacheForKey(mostSimilarData.Text, ctx, config, log, stream, false)


The error returned here is not handled.

Changed to:

err = CheckCacheForKey(mostSimilarData.Text, ctx, config, log, stream, false)
if err != nil {
	log.Errorf("check cache for key: %s failed, error: %v", mostSimilarData.Text, err)
	proxywasm.ResumeHttpRequest()
}

CH3CHO · 2024-09-18T09:05:23Z

plugins/wasm-go/extensions/ai-cache/core.go

+	})
+
+	if err != nil {
+		log.Errorf("Failed to retrieve key: %s from cache, error: %v", key, err)


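Why does this place use %v to format err, while another place uses %s + err.Error()...

I had apparently been in the habit of using string concatenation inside errors.New. Everything has now been unified to the %v style: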

err = fmt.Errorf("failed to parse response: %v", err)

CH3CHO · 2024-09-18T09:10:10Z

plugins/wasm-go/extensions/ai-cache/core.go

+			log.Errorf("Failed to perform similarity search for key: %s, error: %v", key, err)
+		}
+	}
+	proxywasm.ResumeHttpRequest()


If the VDB has been queried, shouldn't we avoid calling Resume here?

Changed so that Resume is called only when the VDB is not used or the VDB lookup fails, which seems more reasonable:

// handleCacheResponse processes cache response and handles cache hits and misses.
func handleCacheResponse(key string, response resp.Value, ctx wrapper.HttpContext, log wrapper.Log, stream bool, config config.PluginConfig, useSimilaritySearch bool) {
	if err := response.Error(); err == nil && !response.IsNull() {
		log.Infof("Cache hit for key: %s", key)
		processCacheHit(key, response.String(), stream, ctx, config, log)
		return
	}

	log.Infof("Cache miss for key: %s", key)
	if err := response.Error(); err != nil {
		log.Errorf("Error retrieving key: %s from cache, error: %v", key, err)
	}

	if useSimilaritySearch {
		if err := performSimilaritySearch(key, ctx, config, log, key, stream); err != nil {
			log.Errorf("Failed to perform similarity search for key: %s, error: %v", key, err)
			proxywasm.ResumeHttpRequest()
		}
	} else {
		proxywasm.ResumeHttpRequest()
	}
}

CH3CHO · 2024-09-18T09:13:47Z

plugins/wasm-go/extensions/ai-cache/core.go

+	// Attempt to upload answer embedding first
+	if ansEmbUploader, ok := activeVectorProvider.(vector.AnswerEmbeddingUploader); ok {
+		log.Infof("[onHttpResponseBody] uploading answer embedding for key: %s", key)
+		err := ansEmbUploader.UploadAnswerEmbedding(key, emb, value, ctx, log, nil)


Could this name easily be misread as uploading the embedding of the answer?

The method name and interface type have been changed to:

UploadAnswerAndEmbedding
AnswerAndEmbeddingUploader

CH3CHO

So far so good.

CH3CHO · 2024-09-21T08:28:07Z

plugins/wasm-go/extensions/ai-cache/vector/dashvector.go

@@ -157,6 +157,13 @@ func (d *DvProvider) QueryEmbedding(
 	return err
 }

+func checkField(fields map[string]interface{}, key string) string {


Would getField be a better name?
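Only the signature of this helper is visible in the diff, so its body is unknown. Purely as an illustration of why a getter-style name fits a function with this signature, here is one way such a helper could look. The empty-string fallback for missing or non-string values is an assumption, not the PR's behavior.

```go
package main

import "fmt"

// getField returns fields[key] as a string. It returns "" when the key is
// missing or the value is not a string (assumed fallback for this sketch).
func getField(fields map[string]interface{}, key string) string {
	s, _ := fields[key].(string)
	return s
}

func main() {
	fields := map[string]interface{}{"query": "hello", "score": 0.9}
	fmt.Printf("%q %q %q\n",
		getField(fields, "query"),
		getField(fields, "score"),
		getField(fields, "missing"))
}
```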

johnlanni and others added 17 commits August 1, 2024 15:09

fix bugs

4f7bfbd

fix bugs

0f9e816

fix bugs

ff1bce6

fix conflict

f2a9ff6

Merge branch 'alibaba:main' into main

5cbae03

alter some errors

27b2f71

fix: embedding error

130f2ee

fix bugs && update interface design

56314d7

fix bugs && refine the variable names

85549d0

update design for cache to support extension

8444f5e

Merge branch 'alibaba:main' into main

a655bc4

Refined the code; README.md content needs to be updated.

d68fa88

fix bugs, README.md to be updated

5179392

fix bugs, refine variable name, update README.md

ece7e2f

Merge branch 'alibaba:main' into main

e868a1a

delete folder

138a526

Suchun-sv requested review from johnlanni, WeixinX and CH3CHO as code owners September 6, 2024 11:18

Suchun-sv and others added 3 commits September 6, 2024 12:59

fix typos

e8ad550

fix typos

c83f5c4

change append to appendMsg

f3d3292

CH3CHO requested changes Sep 8, 2024

Suchun-sv and others added 2 commits September 11, 2024 00:52

fix bugs and refine code

b0cf29d

Merge branch 'main' into main

4a18f96

CH3CHO reviewed Sep 11, 2024

CH3CHO reviewed Sep 11, 2024

CH3CHO reviewed Sep 12, 2024

Suchun-sv and others added 2 commits September 12, 2024 04:51

fix bugs and update the SetEx function

21c9a79

Merge branch 'main' into main

1767896

Optimize query flow logic (not fully tested)

71b9530

CH3CHO requested changes Sep 18, 2024

Suchun-sv added 2 commits September 21, 2024 00:53

Fix bugs and verify removal of cache setting

51b9ccc

fix bugs and update logic as requested

3583bc9

CH3CHO reviewed Sep 21, 2024
