ChromaDB

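The following lists all available ChromaDB procedures. Note that the procedure list and signatures are consistent with those of the other vector databases, such as Qdrant.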

name	description
apoc.vectordb.chroma.info(hostOrKey, collection, $config)	Gets information about the specified existing collection, or throws a 500 error if the collection does not exist.
apoc.vectordb.chroma.createCollection(hostOrKey, collection, similarity, size, $config)	Creates a collection with the name specified in the 2nd parameter and with the specified `similarity` and `size`. The default endpoint is `<hostOrKey param>/api/v1/collections`.
apoc.vectordb.chroma.deleteCollection(hostOrKey, collection, $config)	Deletes the collection with the name specified in the 2nd parameter. The default endpoint is `<hostOrKey param>/api/v1/collections/<collection param>`.
apoc.vectordb.chroma.upsert(hostOrKey, collection, vectors, $config)	Upserts, in the collection with the name specified in the 2nd parameter, the vectors [{id: 'id', vector: '<vectorDb>', metadata: '<metadata>'}]. The default endpoint is `<hostOrKey param>/api/v1/collections/<collection param>/upsert`.
apoc.vectordb.chroma.delete(hostOrKey, collection, ids, $config)	Deletes the vectors with the specified `ids`. The default endpoint is `<hostOrKey param>/api/v1/collections/<collection param>/delete`.
apoc.vectordb.chroma.get(hostOrKey, collection, ids, $config)	Gets the vectors with the specified `ids`. The default endpoint is `<hostOrKey param>/api/v1/collections/<collection param>/get`.
apoc.vectordb.chroma.query(hostOrKey, collection, vector, filter, limit, $config)	Retrieves the vectors closest to the specified `vector`, returning at most `limit` results, in the collection with the name specified in the 2nd parameter. The default endpoint is `<hostOrKey param>/api/v1/collections/<collection param>/query`.
apoc.vectordb.chroma.getAndUpdate(hostOrKey, collection, ids, $config)	Gets the vectors with the specified `ids`, and optionally creates/updates Neo4j entities. The default endpoint is `<hostOrKey param>/api/v1/collections/<collection param>/get`.
apoc.vectordb.chroma.queryAndUpdate(hostOrKey, collection, vector, filter, limit, $config)	Retrieves the vectors closest to the specified `vector`, returning at most `limit` results, in the collection with the name specified in the 2nd parameter, and optionally creates/updates Neo4j entities. The default endpoint is `<hostOrKey param>/api/v1/collections/<collection param>/query`.

The 1st parameter can be a key defined by the APOC config apoc.chroma.<key>.host=myHost. If hostOrKey=null, the default is 'http://localhost:8000'.
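As a minimal sketch of the key-based form, assuming a hypothetical key named `custom` has been added to the APOC configuration, a call could pass the key instead of a host. The key name and the collection name here are illustrative only.

```cypher
// Assumed APOC config entry (hypothetical key name "custom"):
//   apoc.chroma.custom.host=myHost
// The 1st argument is then the key rather than a host URL.
CALL apoc.vectordb.chroma.info('custom', 'test_collection', {})
```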

Examples

Get collection info (it leverages the corresponding ChromaDB REST API)

CALL apoc.vectordb.chroma.info($host, 'test_collection', {<optional config>})

Table 1. Example results
value
{"name": "test_collection", "metadata": {"size": 4, "hnsw:space": "cosine"}, "database": "default_database", "id": "74ebe008-1ccb-4d3d-8c5d-cdd7cfa526c2", "tenant": "default_tenant"}

Create a collection (it leverages the corresponding ChromaDB REST API)

CALL apoc.vectordb.chroma.createCollection($host, 'test_collection', 'Cosine', 4, {<optional config>})

Table 2. Example results
name	metadata	database	id	tenant
test_collection	{"size": 4, "hnsw:space": "cosine"}	default_database	9c046861-f46f-417d-bd01-ca8c9f99aee5	default_tenant

Delete a collection (it leverages the corresponding ChromaDB REST API)

CALL apoc.vectordb.chroma.deleteCollection($host, '<collection_id>', {<optional config>})

This returns an empty result.

Upsert vectors (it leverages the corresponding ChromaDB REST API)

CALL apoc.vectordb.chroma.upsert($host, '<collection_id>',
    [
        {id: 1, vector: [0.05, 0.61, 0.76, 0.74], metadata: {city: "Berlin", foo: "one"}, text: 'ajeje'},
        {id: 2, vector: [0.19, 0.81, 0.75, 0.11], metadata: {city: "London", foo: "two"}, text: 'brazorf'}
    ],
    {<optional config>})

This returns an empty result.

Get vectors (it leverages the corresponding ChromaDB REST API)

CALL apoc.vectordb.chroma.get($host, '<collection_id>', ['1','2'], {<optional config>})

Table 3. Example results
score	metadata	id	vector	text	entity
null	{city: "Berlin", foo: "one"}	null	null	null	null
null	{city: "London", foo: "two"}	null	null	null	null

Get vectors with {allResults: true}

CALL apoc.vectordb.chroma.get($host, '<collection_id>', ['1','2'], {allResults: true, <optional config>})

Table 4. Example results
score	metadata	id	vector	text	entity
null	{city: "Berlin", foo: "one"}	1	[…]	ajeje	null
null	{city: "London", foo: "two"}	2	[…]	brazorf	null

Query vectors (it leverages the corresponding ChromaDB REST API)

CALL apoc.vectordb.chroma.query($host,
    '<collection_id>',
    [0.2, 0.1, 0.9, 0.7],
    {city: 'London'},
    5,
    {allResults: true, <optional config>})

Table 5. Example results
score	metadata	id	vector	text
1	{city: "Berlin", foo: "one"}	1	[…]	ajeje
0.1	{city: "London", foo: "two"}	2	[…]	brazorf

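We can define a mapping that uses the vector metadata to fetch the associated nodes and relationships, and optionally to create them.

For example, if we have created 2 vectors with the upsert procedure above, we can populate some existing nodes (i.e. (:Test {myId: 'one'}) and (:Test {myId: 'two'})).

Query vectors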
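As a minimal setup sketch, assuming a database that does not yet contain these nodes, the two existing nodes could be created beforehand as follows. The myId values are chosen to match the foo metadata values of the upserted vectors.

```cypher
// Setup sketch: create the two nodes that the mapping will match.
// myId must equal the `foo` metadata value of the corresponding vector.
CREATE (:Test {myId: 'one'}), (:Test {myId: 'two'})
```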

CALL apoc.vectordb.chroma.queryAndUpdate($host, '<collection_id>',
    [0.2, 0.1, 0.9, 0.7],
    {},
    5,
    { mapping: {
            embeddingKey: "vect",
            nodeLabel: "Test",
            entityKey: "myId",
            metadataKey: "foo"
        }
    })

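This populates the two nodes as (:Test {myId: 'one', city: 'Berlin', vect: [vector1]}) and (:Test {myId: 'two', city: 'London', vect: [vector2]}), which are returned in the entity column of the result.

We can also set the mapping configuration mode to CREATE_IF_MISSING (creates the nodes if they do not exist), READ_ONLY (only searches for nodes/relationships, without making updates) or UPDATE_EXISTING (the default behavior):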

CALL apoc.vectordb.chroma.queryAndUpdate($host, '<collection_id>',
    [0.2, 0.1, 0.9, 0.7],
    {},
    5,
    { mapping: {
            mode: "CREATE_IF_MISSING",
            embeddingKey: "vect",
            nodeLabel: "Test",
            entityKey: "myId",
            metadataKey: "foo"
        }
    })

This creates 2 new nodes as above.

Alternatively, we can populate existing relationships (i.e. (:Start)-[:TEST {myId: 'one'}]->(:End) and (:Start)-[:TEST {myId: 'two'}]->(:End)):

CALL apoc.vectordb.chroma.queryAndUpdate($host, '<collection_id>',
    [0.2, 0.1, 0.9, 0.7],
    {},
    5,
    { mapping: {
            embeddingKey: "vect",
            relType: "TEST",
            entityKey: "myId",
            metadataKey: "foo"
        }
    })

This populates the two relationships as ()-[:TEST {myId: 'one', city: 'Berlin', vect: [vector1]}]-() and ()-[:TEST {myId: 'two', city: 'London', vect: [vector2]}]-(), which are returned in the entity column of the result.

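We can also use a mapping with the apoc.vectordb.chroma.query* procedures to search for nodes/relationships matching the label/type and metadataKey, without making updates (i.e. equivalent to a *.queryAndUpdate procedure with a mapping config containing mode: "READ_ONLY").

For example, with the previous relationships, we can execute the following procedure, which only returns the relationships in the rel column:

CALL apoc.vectordb.chroma.query($host, 'test_collection',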
    [0.2, 0.1, 0.9, 0.7],
    {},
    5,
    { fields: ["city", "foo"],
      mapping: {
        relType: "TEST",
        entityKey: "myId",
        metadataKey: "foo"
      }
    })

We can also use a mapping with the apoc.vectordb.chroma.get* procedures.
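A sketch of what this could look like, reusing the node mapping from the query examples and the ids from the upsert example. It assumes the (:Test) nodes already exist, and the yielded column names follow the getAndUpdate call used with apoc.ml.rag in this section.

```cypher
// Sketch: fetch vectors by id and map them onto existing (:Test) nodes.
CALL apoc.vectordb.chroma.getAndUpdate($host, '<collection_id>', ['1', '2'],
    { mapping: {
            embeddingKey: "vect",
            nodeLabel: "Test",
            entityKey: "myId",
            metadataKey: "foo"
        }
    })
YIELD node, metadata, id
RETURN id, metadata, node
```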

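To optimize performance, we can choose what to YIELD from the apoc.vectordb.chroma.query and apoc.vectordb.chroma.get procedures. For example, by executing CALL apoc.vectordb.chroma.query(…) YIELD metadata, score, id, the REST API request will contain {"include": ["metadatas", "documents", "distances"]}, so that other values we do not need are not returned.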
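Written out in full, such a call might look like the sketch below. The query vector and limit are taken from the earlier examples, and the empty maps stand in for an optional filter and config.

```cypher
// Sketch: yield only the columns that are needed, so the request
// sent to ChromaDB does not ask for the remaining values.
CALL apoc.vectordb.chroma.query($host, '<collection_id>',
    [0.2, 0.1, 0.9, 0.7],
    {},
    5,
    {})
YIELD metadata, score, id
RETURN id, score, metadata
```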

The vector database procedures can be combined with apoc.ml.rag as follows:

CALL apoc.vectordb.chroma.getAndUpdate($host, $collection, [<id1>, <id2>], $conf) YIELD node, metadata, id, vector
WITH collect(node) as paths
CALL apoc.ml.rag(paths, $attributes, $question, $confPrompt) YIELD value
RETURN value

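This returns a string that answers the $question by leveraging the embeddings of the database vectors.

Delete vectors (it leverages the corresponding ChromaDB REST API)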

CALL apoc.vectordb.chroma.delete($host, '<collection_id>', [1,2], {<optional config>})

This returns an array of strings containing the deleted ids. For example, ["1", "2"].