Find movies from a search prompt

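Once the movie nodes have embeddings that encode their title and plot, and a vector index on those embeddings exists in the database, you can retrieve movies that match a loose description (a search prompt), much as you would use a search engine to find relevant web pages from a few keywords.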

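The examples on this page show how to retrieve movies related to the prompt "a criminal is changed through love". You can picture it as going to a movie theater and asking: "What movies can you recommend? Today I'm in the mood for a movie about a criminal who is changed through love." The main difference is that at the movie theater the conversation happens in natural language, whereas to search the database the prompt must first be converted into an embedding. The vector index can then use the prompt embedding to retrieve the nodes whose embeddings are most similar to it.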
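To make "most similar" concrete, here is a minimal sketch of what a similarity search computes, written as a brute-force scan in plain Python. It assumes cosine similarity and uses made-up three-dimensional vectors and hypothetical movie names; a real vector index uses whatever similarity function it was created with, typically relies on approximate search rather than a full scan, and may report scores on a different scale.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equally sized vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(prompt_embedding, node_embeddings, k):
    """Return the k (title, similarity) pairs closest to the prompt embedding."""
    scored = [
        (title, cosine_similarity(prompt_embedding, embedding))
        for title, embedding in node_embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy data: hypothetical titles and hand-written vectors, not real embeddings.
toy_embeddings = {
    'Movie A': [1.0, 0.0, 0.0],
    'Movie B': [0.8, 0.6, 0.0],
    'Movie C': [0.0, 0.0, 1.0],
}
toy_prompt = [1.0, 0.1, 0.0]

top = most_similar(toy_prompt, toy_embeddings, 2)
for title, similarity in top:
    print(title, round(similarity, 4))
```

The vector index plays the role of most_similar() here, but over the stored node embeddings and without scanning every node.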

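Always generate embeddings with the same model: pick one model and use it for both the dataset and the search prompts. Mixing models with different vector sizes results in an error. Models with the same vector size will run without errors, but they are unlikely to work well together, since they were most likely trained differently.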
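One way to catch a model mix-up early is to compare the size of the prompt embedding with the size the index was built for before sending the query. The sketch below is only an illustration: check_dimensions() and INDEX_DIMENSIONS are hypothetical names, and the value 384 assumes the index was populated with all-MiniLM-L6-v2 embeddings. Note that this check cannot detect two different models that happen to share a vector size.

```python
# Assumption: the index holds all-MiniLM-L6-v2 embeddings, which have 384 dimensions.
INDEX_DIMENSIONS = 384


def check_dimensions(embedding, expected_dimensions):
    """Raise a descriptive error if the embedding size does not match the index."""
    actual_dimensions = len(embedding)
    if actual_dimensions != expected_dimensions:
        raise ValueError(
            f'Embedding has {actual_dimensions} dimensions, '
            f'but the index expects {expected_dimensions}. '
            'Was the prompt encoded with a different model?'
        )
    return embedding


# A vector of the right size passes through unchanged.
ok = check_dimensions([0.0] * 384, INDEX_DIMENSIONS)

# A vector of a different size (1536 here, as an example) is rejected.
try:
    check_dimensions([0.0] * 1536, INDEX_DIMENSIONS)
    mismatch_error = None
except ValueError as error:
    mismatch_error = str(error)

print(mismatch_error)
```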

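Similarity with embeddings generated by an open-source library

This example uses SentenceTransformers to retrieve the nodes related to the description "a criminal is changed through love".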

from sentence_transformers import SentenceTransformer
import neo4j


URI = '<URI for Neo4j database>'
AUTH = ('<Username>', '<Password>')
DB_NAME = '<Database name>'  # examples: 'recommendations-50', 'neo4j'

driver = neo4j.GraphDatabase.driver(URI, auth=AUTH)
driver.verify_connectivity()

model = SentenceTransformer('all-MiniLM-L6-v2')  # (1)

query_prompt = 'a criminal is changed through love'  # (2)
query_embedding = model.encode(query_prompt)

related_movies, _, _ = driver.execute_query(  # (3)
    '''
    CALL db.index.vector.queryNodes('moviePlots', 5, $queryEmbedding)
    YIELD node, score
    RETURN node.title AS title, node.plot AS plot, score
    ''',
    queryEmbedding=query_embedding,
    database_=DB_NAME)
print(f'Movies whose plot and title relate to `{query_prompt}`:')
for record in related_movies:
    print(record)
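1 It is important to generate the embedding of the search prompt with the same model that generated the embeddings you are searching. This tutorial used all-MiniLM-L6-v2 to generate the embeddings, so the same model is used again here.
2 The query prompt contains a loose description of the movies to retrieve. It is then encoded into an embedding, so that it can be used to query for similar nodes.
3 To query a vector index, use the Cypher procedure db.index.vector.queryNodes. The database returns the 5 nodes from the vector index moviePlots that are most similar to query_embedding, together with a score indicating how closely each one matches the query embedding.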
Movies whose plot and title relate to `a criminal is changed through love`:
<Record title='I Love You Phillip Morris' plot="A cop turns con man once he comes out of the closet. Once imprisoned, he meets the second love of his life, whom he'll stop at nothing to be with." score=0.792834997177124>
<Record title='Laura' plot='A police detective falls in love with the woman whose murder he is investigating.' score=0.7741715908050537>
<Record title='Despicable Me' plot='When a criminal mastermind uses a trio of orphan girls as pawns for a grand scheme, he finds their love is profoundly changing him for the better.' score=0.772994875907898>
<Record title='Laws of Attraction' plot='Amidst a sea of litigation, two New York City divorce lawyers find love.' score=0.7727792263031006>
<Record title='Love the Hard Way' plot='The story of a petty thief who meets an innocent young woman and brings her into his world of crime while she teaches him the lessons of enjoying life and being loved.' score=0.7681001424789429>

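Similarity with embeddings generated by OpenAI and other cloud services

This example uses OpenAI to retrieve the nodes related to the description "a criminal is changed through love".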

import neo4j


URI = '<URI for Neo4j database>'
AUTH = ('<Username>', '<Password>')
DB_NAME = '<Database name>'  # examples: 'recommendations-50', 'neo4j'
driver = neo4j.GraphDatabase.driver(URI, auth=AUTH)
driver.verify_connectivity()

openAI_token = '<OpenAI API token>'  # (1)

search_prompt = 'a criminal is changed through love'  # (2)

search_query = '''
WITH genai.vector.encode($searchPrompt, 'OpenAI', { token: $token }) AS queryVector  // (3)
CALL db.index.vector.queryNodes('moviePlots', 5, queryVector)  // (4)
YIELD node, score
RETURN node.title AS title, node.plot, score
'''
records, summary, _ = driver.execute_query(
    search_query, searchPrompt=search_prompt, token=openAI_token,
    database_=DB_NAME)
print(f'Movies whose plot and title relate to `{search_prompt}`:')
for record in records:
    print(record)
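1 Your OpenAI API token, for example sk-proj-XXXX.
2 The query prompt contains a loose description of the movies to retrieve.
3 The query prompt is encoded into an embedding through the Cypher function genai.vector.encode(), so that it can be used to query for similar nodes.
4 To query a vector index, use the Cypher procedure db.index.vector.queryNodes. The database returns the 5 nodes from the vector index moviePlots that are most similar to queryVector, together with a score indicating how closely each one matches the query embedding.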
Movies whose plot and title relate to `a criminal is changed through love`:
<Record title='I Love You Phillip Morris' node.plot="A cop turns con man once he comes out of the closet. Once imprisoned, he meets the second love of his life, whom he'll stop at nothing to be with." score=0.9272396564483643>
<Record title='Love the Hard Way' node.plot='The story of a petty thief who meets an innocent young woman and brings her into his world of crime while she teaches him the lessons of enjoying life and being loved.' score=0.9221653938293457>
<Record title='Laura' node.plot='A police detective falls in love with the woman whose murder he is investigating.' score=0.9215129017829895>
<Record title='Despicable Me' node.plot='When a criminal mastermind uses a trio of orphan girls as pawns for a grand scheme, he finds their love is profoundly changing him for the better.' score=0.9206478595733643>
<Record title='Cook the Thief His Wife & Her Lover, The' node.plot='The wife of an abusive criminal finds solace in the arms of a kind regular guest in her husbands restaurant.' score=0.9205931425094604>
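The two result lists printed on this page can be compared directly. The sketch below copies the titles and scores from those outputs and checks which movies both models retrieved. It compares titles rather than scores, on the assumption that scores produced by different embedding models are not on a common scale (here roughly 0.77 to 0.79 for SentenceTransformers against roughly 0.92 to 0.93 for OpenAI).

```python
# Titles and scores copied from the two outputs shown on this page.
sentence_transformers_results = [
    ('I Love You Phillip Morris', 0.792834997177124),
    ('Laura', 0.7741715908050537),
    ('Despicable Me', 0.772994875907898),
    ('Laws of Attraction', 0.7727792263031006),
    ('Love the Hard Way', 0.7681001424789429),
]
openai_results = [
    ('I Love You Phillip Morris', 0.9272396564483643),
    ('Love the Hard Way', 0.9221653938293457),
    ('Laura', 0.9215129017829895),
    ('Despicable Me', 0.9206478595733643),
    ('Cook the Thief His Wife & Her Lover, The', 0.9205931425094604),
]

st_titles = [title for title, _ in sentence_transformers_results]
openai_titles = [title for title, _ in openai_results]

# Movies retrieved by both models, and movies unique to each one.
shared = [title for title in st_titles if title in openai_titles]
only_sentence_transformers = [t for t in st_titles if t not in openai_titles]
only_openai = [t for t in openai_titles if t not in st_titles]

print(f'Retrieved by both models ({len(shared)} of 5): {shared}')
print(f'Only SentenceTransformers: {only_sentence_transformers}')
print(f'Only OpenAI: {only_openai}')
```

For this prompt the two models agree on four of the five movies, although they rank them differently.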

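Quality of matches

The quality of the matches depends mainly on the embedding model and on the dataset, not on the Neo4j vector index. The embeddings are computed by an external model, whether in your application (as with SentenceTransformers) or through a call to a cloud provider (as with genai.vector.encode()); the database stores them as node properties, indexes them, and returns the ones nearest to the query.

Consider the nodes retrieved with SentenceTransformers for the search prompt "a criminal is changed through love" (the results with OpenAI are similar):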

Movies whose plot and title relate to `a criminal is changed through love`:
<Record title='I Love You Phillip Morris' plot="A cop turns con man once he comes out of the closet. Once imprisoned, he meets the second love of his life, whom he'll stop at nothing to be with." score=0.792834997177124>
<Record title='Laura' plot='A police detective falls in love with the woman whose murder he is investigating.' score=0.7741715908050537>
<Record title='Despicable Me' plot='When a criminal mastermind uses a trio of orphan girls as pawns for a grand scheme, he finds their love is profoundly changing him for the better.' score=0.772994875907898>
<Record title='Laws of Attraction' plot='Amidst a sea of litigation, two New York City divorce lawyers find love.' score=0.7727792263031006>
<Record title='Love the Hard Way' plot='The story of a petty thief who meets an innocent young woman and brings her into his world of crime while she teaches him the lessons of enjoying life and being loved.' score=0.7681001424789429>

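This example shows that the embeddings work as expected: Despicable Me ranks third, with a similarity score of about 0.77. At the same time, it shows the limits of the embedding model, since it also retrieves movies that are unrelated to the prompt:

  • Laura has no "criminal changed through love", but it has a police detective (who often deals with criminals) falling in love in the context of a murder (again related to criminals).

  • Laws of Attraction has no criminals at all, but it has: attraction, which is related to love; litigation, which normally happens in courts, which are associated with criminals; lawyers, who often deal with criminals; and love, although between lawyers.

  • Love the Hard Way is nearly the opposite: an innocent student falls in love with a small-time criminal (a petty thief) and is led astray.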

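Even if these movies are barely related to the search prompt, the database is right: according to the embeddings, they are the most related movies. Why the embeddings fail to encode meaning the way one would expect is a question that has nothing to do with the vector index and everything to do with the external AI model. If your search prompts return poor results, investigate the embedding model and the dataset it is applied to before tuning settings on the Neo4j side.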
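A quick way to start such an investigation is to look at how far apart the scores actually are. The sketch below uses the SentenceTransformers scores printed above and computes the gap between each pair of consecutive results; it is a diagnostic illustration only, and the variable names are arbitrary.

```python
# Titles and scores copied from the SentenceTransformers output above.
results = [
    ('I Love You Phillip Morris', 0.792834997177124),
    ('Laura', 0.7741715908050537),
    ('Despicable Me', 0.772994875907898),
    ('Laws of Attraction', 0.7727792263031006),
    ('Love the Hard Way', 0.7681001424789429),
]

scores = [score for _, score in results]

# Distance between the best and the worst of the five matches.
spread = scores[0] - scores[-1]

# Gap between each result and the one ranked immediately below it.
gaps = [
    (results[i][0], results[i + 1][0], scores[i] - scores[i + 1])
    for i in range(len(results) - 1)
]
smallest_gap = min(gaps, key=lambda gap: gap[2])

print(f'Spread between rank 1 and rank 5: {spread:.4f}')
for higher, lower, gap in gaps:
    print(f'{higher} -> {lower}: {gap:.4f}')
```

The whole top five sits within about 0.025 of each other, and Despicable Me is separated from Laws of Attraction by only about 0.0002. According to this model, a movie that fits the prompt and one that does not are almost equally similar to it, which is a property of the embeddings and not of the index.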