Find movies from a search prompt
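Once the movie nodes have embeddings that encode their title and plot, and the database has a vector index on those embeddings, you can retrieve movies from a vague description (a search prompt), much as you query a search engine with a few keywords to find relevant web pages.
The examples on this page show how to retrieve movies related to the prompt "a criminal is changed through love". You can think of this as going to a cinema and asking: "Which movies would you recommend? Today I'd like to watch one in which a criminal is changed through love." The main difference is that the conversation at the cinema takes place in natural language, whereas for a database search the prompt is first converted into an embedding. The vector index then uses the prompt embedding to retrieve the nodes most similar to the search prompt.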
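Always use the same model to generate embeddings: pick one model and use it for both the dataset and the search prompt. Mixing models with different vector sizes results in an error. Models with the same vector size will run, but they are unlikely to work well together, because they were most likely trained differently.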
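A minimal sketch of a guard against the size mismatch described above, run before an embedding is sent to the database. The names `check_embedding_size` and `INDEX_DIMENSIONS` are hypothetical; 384 is the output size of all-MiniLM-L6-v2, the model used in the SentenceTransformers example on this page. The check only catches differing sizes; it cannot detect two different models that happen to share a size.

```python
# Assumption: the vector index holds all-MiniLM-L6-v2 embeddings (384 dimensions).
INDEX_DIMENSIONS = 384


def check_embedding_size(embedding, expected=INDEX_DIMENSIONS):
    """Return the embedding unchanged, or raise ValueError on a size mismatch."""
    actual = len(embedding)
    if actual != expected:
        raise ValueError(
            f'embedding has {actual} dimensions, index expects {expected}'
        )
    return embedding


check_embedding_size([0.0] * 384)  # same size as the index: passes

try:
    check_embedding_size([0.0] * 1536)  # e.g. produced by a different model
except ValueError as error:
    print(error)
```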
Similarity with embeddings from an open-source library
This example uses SentenceTransformers to retrieve the nodes related to the description "a criminal is changed through love".
from sentence_transformers import SentenceTransformer
import neo4j

URI = '<URI for Neo4j database>'
AUTH = ('<Username>', '<Password>')
DB_NAME = '<Database name>'  # examples: 'recommendations-50', 'neo4j'

driver = neo4j.GraphDatabase.driver(URI, auth=AUTH)
driver.verify_connectivity()

model = SentenceTransformer('all-MiniLM-L6-v2')  # (1)

query_prompt = 'a criminal is changed through love'  # (2)
query_embedding = model.encode(query_prompt)

# (3)
related_movies, _, _ = driver.execute_query('''
    CALL db.index.vector.queryNodes('moviePlots', 5, $queryEmbedding)
    YIELD node, score
    RETURN node.title AS title, node.plot AS plot, score
    ''', queryEmbedding=query_embedding,
    database_=DB_NAME)

print(f'Movies whose plot and title relate to `{query_prompt}`:')
for record in related_movies:
    print(record)
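(1) The model used to generate the embedding for the search prompt must be the same one used to generate the embeddings you are searching through. This tutorial generated them with all-MiniLM-L6-v2, so the same model is used here.
(2) The query prompt contains a vague description of the movies to retrieve. It is then encoded into an embedding so that it can be used to query for similar nodes.
(3) To query a vector index, use the Cypher procedure db.index.vector.queryNodes. The database returns the 5 nodes from the vector index moviePlots that are most similar to query_embedding, together with a score of how well each one matches the query embedding.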
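To make the score more concrete, the sketch below ranks a few toy vectors against a query by brute force. It assumes the index was created with cosine similarity (the index definition is not shown on this page), in which case Neo4j reports the score normalized to the range 0 to 1 as (1 + cosine) / 2. The function names and the three-dimensional vectors are made up for illustration, and a real vector index uses an approximate nearest-neighbor search rather than comparing the query against every node.

```python
import math


def cosine(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def normalized_score(a, b):
    """Map cosine similarity from [-1, 1] to [0, 1] (assumed scoring)."""
    return (1 + cosine(a, b)) / 2


def top_k(query, candidates, k):
    """Return the k (name, score) pairs with the highest score."""
    scored = [(name, normalized_score(query, vector))
              for name, vector in candidates.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy stand-ins for node embeddings; real ones have hundreds of dimensions.
toy_embeddings = {
    'same direction': [2.0, 0.0, 0.0],
    'orthogonal': [0.0, 1.0, 0.0],
    'opposite': [-1.0, 0.0, 0.0],
}

for name, score in top_k([1.0, 0.0, 0.0], toy_embeddings, k=2):
    print(name, score)
```

Under this assumption, a score close to 1 means the two vectors point in almost the same direction, while a score of 0.5 means they are unrelated (orthogonal).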
Movies whose plot and title relate to `a criminal is changed through love`:
<Record title='I Love You Phillip Morris' plot="A cop turns con man once he comes out of the closet. Once imprisoned, he meets the second love of his life, whom he'll stop at nothing to be with." score=0.792834997177124>
<Record title='Laura' plot='A police detective falls in love with the woman whose murder he is investigating.' score=0.7741715908050537>
<Record title='Despicable Me' plot='When a criminal mastermind uses a trio of orphan girls as pawns for a grand scheme, he finds their love is profoundly changing him for the better.' score=0.772994875907898>
<Record title='Laws of Attraction' plot='Amidst a sea of litigation, two New York City divorce lawyers find love.' score=0.7727792263031006>
<Record title='Love the Hard Way' plot='The story of a petty thief who meets an innocent young woman and brings her into his world of crime while she teaches him the lessons of enjoying life and being loved.' score=0.7681001424789429>
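For a more readable listing, each record can be turned into a plain dictionary (the driver's `Record.data()` method does this) and printed one movie per line. The sketch below uses hard-coded dictionaries copied from the output above so that it runs without a database; `format_movie` is a hypothetical helper, not part of the driver.

```python
def format_movie(row, plot_width=60):
    """Format one result as 'score%  title: plot', truncating long plots."""
    plot = row['plot']
    if len(plot) > plot_width:
        plot = plot[:plot_width - 3] + '...'
    return f"{row['score']:.0%}  {row['title']}: {plot}"


# With a live connection this could be:
#   rows = [record.data() for record in related_movies]
rows = [
    {'title': 'I Love You Phillip Morris',
     'plot': "A cop turns con man once he comes out of the closet. Once "
             "imprisoned, he meets the second love of his life, whom he'll "
             "stop at nothing to be with.",
     'score': 0.792834997177124},
    {'title': 'Laura',
     'plot': 'A police detective falls in love with the woman whose murder '
             'he is investigating.',
     'score': 0.7741715908050537},
]

for row in rows:
    print(format_movie(row))
```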
Similarity with embeddings from OpenAI and other cloud services
This example uses OpenAI to retrieve the nodes related to the description "a criminal is changed through love".
import neo4j

URI = '<URI for Neo4j database>'
AUTH = ('<Username>', '<Password>')
DB_NAME = '<Database name>'  # examples: 'recommendations-50', 'neo4j'

driver = neo4j.GraphDatabase.driver(URI, auth=AUTH)
driver.verify_connectivity()

openAI_token = '<OpenAI API token>'  # (1)

search_prompt = 'a criminal is changed through love'  # (2)

search_query = '''
    WITH genai.vector.encode($searchPrompt, 'OpenAI', { token: $token }) AS queryVector  // (3)
    CALL db.index.vector.queryNodes('moviePlots', 5, queryVector)  // (4)
    YIELD node, score
    RETURN node.title as title, node.plot, score
    '''
records, summary, _ = driver.execute_query(
    search_query, searchPrompt=search_prompt, token=openAI_token,
    database_=DB_NAME)

print(f'Movies whose plot and title relate to `{search_prompt}`:')
for record in records:
    print(record)
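(1) Your OpenAI API token, for example sk-proj-XXXX.
(2) The query prompt contains a vague description of the movies to retrieve.
(3) The query prompt is encoded into an embedding with the Cypher function genai.vector.encode(), so that it can be used to query for similar nodes.
(4) To query a vector index, use the Cypher procedure db.index.vector.queryNodes. The database returns the 5 nodes from the vector index moviePlots that are most similar to queryVector, together with a score of how well each one matches the query embedding.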
Movies whose plot and title relate to `a criminal is changed through love`:
<Record title='I Love You Phillip Morris' node.plot="A cop turns con man once he comes out of the closet. Once imprisoned, he meets the second love of his life, whom he'll stop at nothing to be with." score=0.9272396564483643>
<Record title='Love the Hard Way' node.plot='The story of a petty thief who meets an innocent young woman and brings her into his world of crime while she teaches him the lessons of enjoying life and being loved.' score=0.9221653938293457>
<Record title='Laura' node.plot='A police detective falls in love with the woman whose murder he is investigating.' score=0.9215129017829895>
<Record title='Despicable Me' node.plot='When a criminal mastermind uses a trio of orphan girls as pawns for a grand scheme, he finds their love is profoundly changing him for the better.' score=0.9206478595733643>
<Record title='Cook the Thief His Wife & Her Lover, The' node.plot='The wife of an abusive criminal finds solace in the arms of a kind regular guest in her husbands restaurant.' score=0.9205931425094604>
Quality of the matches
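The quality of the matches depends mainly on the embedding model and the dataset, not on the Neo4j vector index. The embeddings are computed by a model outside the database, either in your application (as with SentenceTransformers) or by a provider called from Cypher through genai.vector.encode() (as with OpenAI); the database stores them as properties and indexes them.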
Consider the nodes retrieved with SentenceTransformers for the search prompt "a criminal is changed through love" (the OpenAI results are similar):
Movies whose plot and title relate to `a criminal is changed through love`:
<Record title='I Love You Phillip Morris' plot="A cop turns con man once he comes out of the closet. Once imprisoned, he meets the second love of his life, whom he'll stop at nothing to be with." score=0.792834997177124>
<Record title='Laura' plot='A police detective falls in love with the woman whose murder he is investigating.' score=0.7741715908050537>
<Record title='Despicable Me' plot='When a criminal mastermind uses a trio of orphan girls as pawns for a grand scheme, he finds their love is profoundly changing him for the better.' score=0.772994875907898>
<Record title='Laws of Attraction' plot='Amidst a sea of litigation, two New York City divorce lawyers find love.' score=0.7727792263031006>
<Record title='Love the Hard Way' plot='The story of a petty thief who meets an innocent young woman and brings her into his world of crime while she teaches him the lessons of enjoying life and being loved.' score=0.7681001424789429>
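This example shows the embeddings working as expected: Despicable Me ranks third, with a relevance score of 77%. At the same time, it shows the limits of the embedding model, because it also retrieves movies that are not really related to the prompt:
- Laura has no "criminal changed through love", but it does have a police detective (someone who routinely deals with criminals) falling in love against the backdrop of a murder (again related to criminals).
- Laws of Attraction has no criminal at all, but it has: attraction, which relates to love; litigation, which usually takes place in courts and relates to criminals; lawyers, who are often associated with criminals; and love, albeit between lawyers.
- Love the Hard Way is almost the reverse: an innocent student falls in love with a low-level criminal (a petty thief) and is drawn into a downward spiral.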
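Even though these movies are barely related to the search prompt, the database is right: according to the embeddings, they are the most relevant ones. Why the embeddings don't encode meaning the way you expected is a question that has nothing to do with the vector index and everything to do with the external AI model. If your search prompt returns poor results, investigate the embedding model and the dataset it is applied to rather than tuning Neo4j settings.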