Weaviate
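Here is a list of all available Weaviate procedures. Note that the list and the procedure signatures are consistent with the other vector database procedures, such as the Qdrant ones: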
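name | description |
---|---|
apoc.vectordb.weaviate.info($host, $collectionName, $config) | Gets information about the specified existing collection, or throws a FileNotFoundException if it does not exist |
apoc.vectordb.weaviate.createCollection(hostOrKey, collection, similarity, size, $config) | Creates a collection with the name specified in the second parameter, and with the specified … |
apoc.vectordb.weaviate.deleteCollection(hostOrKey, collection, $config) | Deletes the collection with the name specified in the second parameter. The default endpoint is … |
apoc.vectordb.weaviate.upsert(hostOrKey, collection, vectors, $config) | Upserts, in the collection with the name specified in the second parameter, the vectors given as [{id: 'id', vector: '<vectorDb>', metadata: '<metadata>'}]. The default endpoint is … |
apoc.vectordb.weaviate.delete(hostOrKey, collection, ids, $config) | Deletes the vectors with the specified … |
apoc.vectordb.weaviate.get(hostOrKey, collection, ids, $config) | Gets the vectors with the specified … |
apoc.vectordb.weaviate.query(hostOrKey, collection, vector, filter, limit, $config) | In the collection with the name specified in the second parameter, retrieves the vectors closest to the specified … |
apoc.vectordb.weaviate.getAndUpdate(hostOrKey, collection, ids, $config) | Gets the vectors with the specified … |
apoc.vectordb.weaviate.queryAndUpdate(hostOrKey, collection, vector, filter, limit, $config) | In the collection with the name specified in the second parameter, retrieves the vectors closest to the specified … |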
where the first parameter can be a key defined by the APOC config apoc.weaviate.<key>.host=myHost. With hostOrKey=null, the default is 'https://:8080/v1'.
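As a mental model only, the resolution of the first parameter can be sketched in Python. The function name resolve_base_url, the apoc_conf dictionary and the default_url argument are hypothetical, and the fallback that treats a value that is not a configured key as the host itself is an assumption based on the parameter name hostOrKey, not a description of APOC's implementation.

```python
from typing import Mapping, Optional


def resolve_base_url(host_or_key: Optional[str],
                     apoc_conf: Mapping[str, str],
                     default_url: str) -> str:
    """Hypothetical helper sketching how a hostOrKey argument could be resolved."""
    if host_or_key is None:
        # Null argument: fall back to the documented default.
        return default_url
    # A key is looked up as apoc.weaviate.<key>.host in the APOC config.
    configured = apoc_conf.get(f"apoc.weaviate.{host_or_key}.host")
    if configured is not None:
        return configured
    # Assumption: a value that is not a configured key is used as the host itself.
    return host_or_key


conf = {"apoc.weaviate.prod.host": "myHost"}
print(resolve_base_url("prod", conf, "<default>"))
print(resolve_base_url("https://example.test", conf, "<default>"))
print(resolve_base_url(None, conf, "<default>"))
```

With this model, a short key such as prod keeps the actual host out of the queries, so it can be changed in the configuration without editing them.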
Examples
CALL apoc.vectordb.weaviate.info($host, 'test_collection', {<optional config>})
value |
---|
{"vectorizer": "none", "invertedIndexConfig": {"bm25": {"b": 0.75, "k1": 1.2}, "stopwords": {"additions": null, "removals": null, "preset": "en"}, "cleanupIntervalSeconds": 60}, "vectorIndexConfig": {"ef": -1, "dynamicEfMin": 100, "pq": {"centroids": 256, "trainingLimit": 100000, "encoder": {"type": "kmeans", "distribution": "log-normal"}, "enabled": false, "bitCompression": false, "segments": 0 }, "distance": "cosine", "skip": false, "dynamicEfFactor": 8, "bq": {"enabled": false}, "vectorCacheMaxObjects": 1000000000000, "cleanupIntervalSeconds": 300, "dynamicEfMax": 500, "efConstruction": 128, "flatSearchCutoff": 40000, "maxConnections": 64}, "multiTenancyConfig": {"enabled": false}, "vectorIndexType": "hnsw", "replicationConfig": {"factor": 1}, "shardingConfig": {"desiredVirtualCount": 128, "desiredCount": 1, "actualCount": 1, "function": "murmur3", "virtualPerPhysical": 128, "strategy": "hash", "actualVirtualCount": 128, "key": "_id"}, "class": "TestCollection", "properties": [{"name": "city", "description": "This property was generated by Weaviate’s auto-schema feature on Wed Jul 10 12:50:18 2024", "indexFilterable": true, "tokenization": "word", "indexSearchable": true, "dataType": ["text"]}, {"name": "foo", "description": "This property was generated by Weaviate’s auto-schema feature on Wed Jul 10 12:50:18 2024", "indexFilterable": true, "tokenization": "word", "indexSearchable": true, "dataType": ["text"]} ] } |
CALL apoc.vectordb.weaviate.createCollection($host, 'test_collection', 'Cosine', 4, {<optional config>})
vectorizer | invertedIndexConfig | vectorIndexConfig | multiTenancyConfig | vectorIndexType | replicationConfig | shardingConfig | class | properties |
---|---|---|---|---|---|---|---|---|
none | {"bm25": { "b": 0.75, "k1": 1.2 }, "stopwords": { "additions": null, "removals": null, "preset": "en" }, "cleanupIntervalSeconds": 60} | { "ef": -1, "dynamicEfMin": 100, "pq": { "centroids": 256, "trainingLimit": 100000, "encoder": { "type": "kmeans", "distribution": "log-normal" }, "enabled": false, "bitCompression": false, "segments": 0 }, "distance": "cosine", "skip": false, "dynamicEfFactor": 8, "bq": { "enabled": false }, "vectorCacheMaxObjects": 1000000000000, "cleanupIntervalSeconds": 300, "dynamicEfMax": 500, "efConstruction": 128, "flatSearchCutoff": 40000, "maxConnections": 64 } | { "enabled": false } | hnsw | { "factor": 1 } | { "desiredVirtualCount": 128, "desiredCount": 1, "actualCount": 1, "function": "murmur3", "virtualPerPhysical": 128, "strategy": "hash", "actualVirtualCount": 128, "key": "_id" } | TestCollection | null |
CALL apoc.vectordb.weaviate.createCollection("https://<weaviateInstanceId>.weaviate.network",
'TestCollection',
'cosine',
4,
{headers: {Authorization: 'Bearer <apiKey>'}})
vectorizer | invertedIndexConfig | vectorIndexConfig | multiTenancyConfig | vectorIndexType | replicationConfig | shardingConfig | class | properties |
---|---|---|---|---|---|---|---|---|
none | {"bm25": { "b": 0.75, "k1": 1.2 }, "stopwords": { "additions": null, "removals": null, "preset": "en" }, "cleanupIntervalSeconds": 60} | { "ef": -1, "dynamicEfMin": 100, "pq": { "centroids": 256, "trainingLimit": 100000, "encoder": { "type": "kmeans", "distribution": "log-normal" }, "enabled": false, "bitCompression": false, "segments": 0 }, "distance": "cosine", "skip": false, "dynamicEfFactor": 8, "bq": { "enabled": false }, "vectorCacheMaxObjects": 1000000000000, "cleanupIntervalSeconds": 300, "dynamicEfMax": 500, "efConstruction": 128, "flatSearchCutoff": 40000, "maxConnections": 64 } | { "enabled": false } | hnsw | { "factor": 1 } | { "desiredVirtualCount": 128, "desiredCount": 1, "actualCount": 1, "function": "murmur3", "virtualPerPhysical": 128, "strategy": "hash", "actualVirtualCount": 128, "key": "_id" } | TestCollection | null |
CALL apoc.vectordb.weaviate.deleteCollection($host, 'test_collection', {<optional config>})
which returns an empty result.
CALL apoc.vectordb.weaviate.upsert($host, 'test_collection',
[
{id: "8ef2b3a7-1e56-4ddd-b8c3-2ca8901ce308", vector: [0.05, 0.61, 0.76, 0.74], metadata: {city: "Berlin", foo: "one"}},
{id: "9ef2b3a7-1e56-4ddd-b8c3-2ca8901ce308", vector: [0.19, 0.81, 0.75, 0.11], metadata: {city: "London", foo: "two"}}
],
{<optional config>})
lastUpdateTimeUnix | vector | id | creationTimeUnix | class | properties |
---|---|---|---|---|---|
1721293838439 | [0.05, 0.61, 0.76, 0.74] | 8ef2b3a7-1e56-4ddd-b8c3-2ca8901ce308 | 1721293838439 | TestCollection | {city: "Berlin", foo: "one"} |
1721293838439 | [0.19, 0.81, 0.75, 0.11] | 9ef2b3a7-1e56-4ddd-b8c3-2ca8901ce308 | 1721293838439 | TestCollection | {city: "London", foo: "two"} |
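When the vectors come from application code, it can help to validate them before passing them to the upsert procedure as a query parameter. The following Python sketch is only an illustration: build_vectors is a hypothetical helper, and it assumes that ids must be UUID strings (as in the upsert example) and that each vector must have the size given when the collection was created (4 in these examples).

```python
import uuid
from typing import Any, Dict, List, Sequence


def build_vectors(rows: Sequence[Dict[str, Any]], size: int) -> List[Dict[str, Any]]:
    """Hypothetical helper: validate rows and return them as {id, vector, metadata} maps."""
    checked = []
    for row in rows:
        # uuid.UUID raises ValueError when the id is not a well-formed UUID.
        object_id = str(uuid.UUID(row["id"]))
        vector = [float(x) for x in row["vector"]]
        if len(vector) != size:
            raise ValueError(
                f"vector for {object_id} has {len(vector)} dimensions, expected {size}"
            )
        checked.append({
            "id": object_id,
            "vector": vector,
            "metadata": dict(row.get("metadata", {})),
        })
    return checked


vectors = build_vectors(
    [
        {"id": "8ef2b3a7-1e56-4ddd-b8c3-2ca8901ce308",
         "vector": [0.05, 0.61, 0.76, 0.74],
         "metadata": {"city": "Berlin", "foo": "one"}},
        {"id": "9ef2b3a7-1e56-4ddd-b8c3-2ca8901ce308",
         "vector": [0.19, 0.81, 0.75, 0.11],
         "metadata": {"city": "London", "foo": "two"}},
    ],
    size=4,
)
print(len(vectors))
```

Failing early in the client gives a clearer error than a rejected request, at the cost of duplicating checks that the server may also perform.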
CALL apoc.vectordb.weaviate.get($host, 'test_collection', [1,2], {<optional config>})
score | metadata | id | vector | text | entity |
---|---|---|---|---|---|
null | {city: "Berlin", foo: "one"} | null | null | null | null |
null | {city: "Berlin", foo: "two"} | null | null | null | null |
CALL apoc.vectordb.weaviate.get($host, 'test_collection', [1,2], {allResults: true, <optional config>})
score | metadata | id | vector | text | entity |
---|---|---|---|---|---|
null | {city: "Berlin", foo: "one"} | 1 | […] | null | null |
null | {city: "Berlin", foo: "two"} | 2 | […] | null | null |
CALL apoc.vectordb.weaviate.query($host,
'test_collection',
[0.2, 0.1, 0.9, 0.7],
'{operator: Equal, valueString: "London", path: ["city"]}',
5,
{fields: ["city", "foo"], allResults: true, <other optional config>})
score | metadata | id | vector | text |
---|---|---|---|---|
1 | {city: "Berlin", foo: "one"} | 1 | […] | null |
0.1 | {city: "Berlin", foo: "two"} | 2 | […] | null |
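The filter argument in the query example is passed as a string. When the filtered value comes from user input, building that string by hand is error-prone, so a small helper can take care of the quoting. The sketch below is an illustration only: where_equal is a hypothetical function, and it covers just the Equal operator with valueString shown in the example above.

```python
import json
from typing import Sequence


def where_equal(path: Sequence[str], value: str) -> str:
    """Hypothetical helper: build an Equal filter string in the style of the query example."""
    # json.dumps produces double-quoted, escaped strings and lists,
    # which matches the syntax used by the filter in the example.
    return (
        "{operator: Equal, "
        f"valueString: {json.dumps(value)}, "
        f"path: {json.dumps(list(path))}}}"
    )


print(where_equal(["city"], "London"))
# {operator: Equal, valueString: "London", path: ["city"]}
```

The returned string can then be supplied as the filter parameter of the query procedure.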
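We can define a mapping to fetch the associated nodes and relationships, and optionally create them, by leveraging the vector metadata.
For example, if we have created 2 vectors with the upsert procedure above, we can populate some existing nodes (i.e. (:Test {myId: 'one'}) and (:Test {myId: 'two'})):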
CALL apoc.vectordb.weaviate.queryAndUpdate($host, 'test_collection',
[0.2, 0.1, 0.9, 0.7],
{},
5,
{ fields: ["city", "foo"],
mapping: {
embeddingKey: "vect",
nodeLabel: "Test",
entityKey: "myId",
metadataKey: "foo"
}
})
which populates the two nodes as (:Test {myId: 'one', city: 'Berlin', vect: [vector1]}) and (:Test {myId: 'two', city: 'London', vect: [vector2]}), which will be returned in the entity result column.
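We can also set the mode of the mapping configuration to CREATE_IF_MISSING (which creates the nodes if they do not exist), READ_ONLY (which only searches for nodes/relationships, without updating them) or UPDATE_EXISTING (the default behavior):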
CALL apoc.vectordb.weaviate.queryAndUpdate($host, 'test_collection',
[0.2, 0.1, 0.9, 0.7],
{},
5,
{ fields: ["city", "foo"],
mapping: {
mode: "CREATE_IF_MISSING",
embeddingKey: "vect",
nodeLabel: "Test",
entityKey: "myId",
metadataKey: "foo"
}
})
which creates 2 new nodes as described above.
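The difference between the three modes can be pictured with a toy in-memory model. This Python sketch is not APOC's implementation: apply_mapping is a hypothetical function, nodes are plain dictionaries, labels (nodeLabel) are ignored, and the rule that every metadata entry except the metadataKey one is copied onto the entity is an assumption inferred from the populated nodes shown above.

```python
from typing import Any, Dict, List, Optional

Node = Dict[str, Any]


def apply_mapping(nodes: List[Node], results: List[Dict[str, Any]],
                  mapping: Dict[str, str]) -> List[Optional[Node]]:
    """Toy model of the mapping modes; returns one entity (or None) per result."""
    mode = mapping.get("mode", "UPDATE_EXISTING")
    entity_key = mapping["entityKey"]
    metadata_key = mapping["metadataKey"]
    embedding_key = mapping.get("embeddingKey")
    entities: List[Optional[Node]] = []
    for result in results:
        metadata = result["metadata"]
        wanted = metadata[metadata_key]
        # Match the entity whose entityKey property equals the metadata value.
        node = next((n for n in nodes if n.get(entity_key) == wanted), None)
        if node is None and mode == "CREATE_IF_MISSING":
            node = {entity_key: wanted}
            nodes.append(node)
        if node is not None and mode != "READ_ONLY":
            # Copy the remaining metadata and the embedding onto the entity.
            node.update({k: v for k, v in metadata.items() if k != metadata_key})
            if embedding_key is not None:
                node[embedding_key] = result["vector"]
        entities.append(node)
    return entities


results = [
    {"metadata": {"city": "Berlin", "foo": "one"}, "vector": [0.05, 0.61, 0.76, 0.74]},
    {"metadata": {"city": "London", "foo": "two"}, "vector": [0.19, 0.81, 0.75, 0.11]},
]
mapping = {"mode": "CREATE_IF_MISSING", "embeddingKey": "vect",
           "entityKey": "myId", "metadataKey": "foo"}
graph: List[Node] = []
apply_mapping(graph, results, mapping)
print([node["myId"] for node in graph])
```

In this model, the default mode leaves an empty graph untouched, whereas CREATE_IF_MISSING adds one entity per result before populating it.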
Alternatively, we can populate existing relationships (i.e. (:Start)-[:TEST {myId: 'one'}]->(:End) and (:Start)-[:TEST {myId: 'two'}]->(:End)):
CALL apoc.vectordb.weaviate.queryAndUpdate($host, 'test_collection',
[0.2, 0.1, 0.9, 0.7],
{},
5,
{ fields: ["city", "foo"],
mapping: {
embeddingKey: "vect",
relType: "TEST",
entityKey: "myId",
metadataKey: "foo"
}
})
which populates the two relationships as ()-[:TEST {myId: 'one', city: 'Berlin', vect: [vector1]}]-() and ()-[:TEST {myId: 'two', city: 'London', vect: [vector2]}]-(), which will be returned in the entity result column.
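We can also use the mapping with the apoc.vectordb.weaviate.query procedure to search for the nodes/relationships that match the label/type and metadataKey, without updating them (i.e. equivalent to the *.queryAndUpdate procedure with the mode: "READ_ONLY" mapping config).
For example, with the previous relationships, we can execute the following procedure, which only returns the relationships in the rel column: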
CALL apoc.vectordb.weaviate.query($host, 'test_collection',
[0.2, 0.1, 0.9, 0.7],
{},
5,
{ fields: ["city", "foo"],
mapping: {
relType: "TEST",
entityKey: "myId",
metadataKey: "foo"
}
})
We can also use the mapping with …
To optimize performance, we can choose … when using apoc.vectordb.weaviate.query and … For example, by executing …
The vector database procedures can be executed together with apoc.ml.rag, as follows:
CALL apoc.vectordb.weaviate.getAndUpdate($host, $collection, [<id1>, <id2>], $conf) YIELD score, node, metadata, id, vector
WITH collect(node) as paths
CALL apoc.ml.rag(paths, $attributes, $question, $confPrompt) YIELD value
RETURN value
which returns a string that answers the $question by leveraging the embeddings of the database vectors.
CALL apoc.vectordb.weaviate.delete($host, 'test_collection', [1,2], {<optional config>})
value |
---|
["1", "2"] |