Executing the different algorithm modes
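This example explains the execution modes of GDS algorithms and how to use each of them.
Setup
For more information on how to get started using Python, see the Connecting with Python tutorial.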
pip install graphdatascience
# Import the client
from graphdatascience import GraphDataScience
# Replace with the actual URI, username, and password
AURA_CONNECTION_URI = "neo4j+s://xxxxxxxx.databases.neo4j.io"
AURA_USERNAME = "neo4j"
AURA_PASSWORD = ""
# Configure the client with AuraDS-recommended settings
gds = GraphDataScience(
    AURA_CONNECTION_URI,
    auth=(AURA_USERNAME, AURA_PASSWORD),
    aura_ds=True
)
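For more information on how to get started using the Cypher Shell, see the Neo4j Cypher Shell tutorial.
Run the following commands from the directory where the Cypher shell is installed.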
export AURA_CONNECTION_URI="neo4j+s://xxxxxxxx.databases.neo4j.io"
export AURA_USERNAME="neo4j"
export AURA_PASSWORD=""
./cypher-shell -a $AURA_CONNECTION_URI -u $AURA_USERNAME -p $AURA_PASSWORD
For more information on how to get started using Python, see the Connecting with Python tutorial.
pip install neo4j
# Import the driver
from neo4j import GraphDatabase
# Replace with the actual URI, username, and password
AURA_CONNECTION_URI = "neo4j+s://xxxxxxxx.databases.neo4j.io"
AURA_USERNAME = "neo4j"
AURA_PASSWORD = ""
# Instantiate the driver
driver = GraphDatabase.driver(
    AURA_CONNECTION_URI,
    auth=(AURA_USERNAME, AURA_PASSWORD)
)
# Import to prettify results
import json
# Import for the JSON helper function
from neo4j.time import DateTime
# Helper function for serializing Neo4j DateTime in JSON dumps
def default(o):
    if isinstance(o, DateTime):
        return o.isoformat()
Create an example graph
We start by creating some basic graph data.
gds.run_cypher("""
CREATE
(home:Page {name:'Home'}),
(about:Page {name:'About'}),
(product:Page {name:'Product'}),
(links:Page {name:'Links'}),
(a:Page {name:'Site A'}),
(b:Page {name:'Site B'}),
(c:Page {name:'Site C'}),
(d:Page {name:'Site D'}),
(home)-[:LINKS {weight: 0.2}]->(about),
(home)-[:LINKS {weight: 0.2}]->(links),
(home)-[:LINKS {weight: 0.6}]->(product),
(about)-[:LINKS {weight: 1.0}]->(home),
(product)-[:LINKS {weight: 1.0}]->(home),
(a)-[:LINKS {weight: 1.0}]->(home),
(b)-[:LINKS {weight: 1.0}]->(home),
(c)-[:LINKS {weight: 1.0}]->(home),
(d)-[:LINKS {weight: 1.0}]->(home),
(links)-[:LINKS {weight: 0.8}]->(home),
(links)-[:LINKS {weight: 0.05}]->(a),
(links)-[:LINKS {weight: 0.05}]->(b),
(links)-[:LINKS {weight: 0.05}]->(c),
(links)-[:LINKS {weight: 0.05}]->(d)
""")
CREATE
(home:Page {name:'Home'}),
(about:Page {name:'About'}),
(product:Page {name:'Product'}),
(links:Page {name:'Links'}),
(a:Page {name:'Site A'}),
(b:Page {name:'Site B'}),
(c:Page {name:'Site C'}),
(d:Page {name:'Site D'}),
(home)-[:LINKS {weight: 0.2}]->(about),
(home)-[:LINKS {weight: 0.2}]->(links),
(home)-[:LINKS {weight: 0.6}]->(product),
(about)-[:LINKS {weight: 1.0}]->(home),
(product)-[:LINKS {weight: 1.0}]->(home),
(a)-[:LINKS {weight: 1.0}]->(home),
(b)-[:LINKS {weight: 1.0}]->(home),
(c)-[:LINKS {weight: 1.0}]->(home),
(d)-[:LINKS {weight: 1.0}]->(home),
(links)-[:LINKS {weight: 0.8}]->(home),
(links)-[:LINKS {weight: 0.05}]->(a),
(links)-[:LINKS {weight: 0.05}]->(b),
(links)-[:LINKS {weight: 0.05}]->(c),
(links)-[:LINKS {weight: 0.05}]->(d)
# Cypher query
create_example_graph_on_disk_query = """
CREATE
(home:Page {name:'Home'}),
(about:Page {name:'About'}),
(product:Page {name:'Product'}),
(links:Page {name:'Links'}),
(a:Page {name:'Site A'}),
(b:Page {name:'Site B'}),
(c:Page {name:'Site C'}),
(d:Page {name:'Site D'}),
(home)-[:LINKS {weight: 0.2}]->(about),
(home)-[:LINKS {weight: 0.2}]->(links),
(home)-[:LINKS {weight: 0.6}]->(product),
(about)-[:LINKS {weight: 1.0}]->(home),
(product)-[:LINKS {weight: 1.0}]->(home),
(a)-[:LINKS {weight: 1.0}]->(home),
(b)-[:LINKS {weight: 1.0}]->(home),
(c)-[:LINKS {weight: 1.0}]->(home),
(d)-[:LINKS {weight: 1.0}]->(home),
(links)-[:LINKS {weight: 0.8}]->(home),
(links)-[:LINKS {weight: 0.05}]->(a),
(links)-[:LINKS {weight: 0.05}]->(b),
(links)-[:LINKS {weight: 0.05}]->(c),
(links)-[:LINKS {weight: 0.05}]->(d)
"""
# Create the driver session
with driver.session() as session:
    # Run query
    result = session.run(create_example_graph_on_disk_query).data()
    # Prettify the result
    print(json.dumps(result, indent=2, sort_keys=True))
We then project an in-memory graph from the data we just created.
g, result = gds.graph.project(
    "example-graph",
    "Page",
    "LINKS",
    relationshipProperties="weight"
)
print(result)
CALL gds.graph.project(
'example-graph',
'Page',
'LINKS',
{
relationshipProperties: 'weight'
}
)
# Cypher query
create_example_graph_in_memory_query = """
CALL gds.graph.project(
'example-graph',
'Page',
'LINKS',
{
relationshipProperties: 'weight'
}
)
"""
# Create the driver session
with driver.session() as session:
    # Run query
    result = session.run(create_example_graph_in_memory_query).data()
    # Prettify the result
    print(json.dumps(result, indent=2, sort_keys=True))
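Execution modes
Every production-tier algorithm can be run in four different modes:
- stats
- stream
- mutate
- write
An additional estimate mode is explained in detail in the Estimate memory usage and resize an instance section.
In the following, we use the PageRank algorithm to show how to use each execution mode.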
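As a mental model for the sections that follow, here is a toy sketch in plain Python of where each mode leaves its results. It is not how GDS is implemented: `toy_page_rank`, `ToyModes`, and the `nodeCount` summary field are invented for illustration, the power iteration is a textbook-style formulation, and the scores it produces should not be expected to match GDS output.

```python
# Toy model of the four execution modes. All names here are hypothetical;
# none of this is part of the GDS library or its Python client.

def toy_page_rank(nodes, edges, max_iterations=20, damping_factor=0.85):
    """Textbook-style power iteration over unweighted, directed edges."""
    out_degree = {n: 0 for n in nodes}
    for source, _ in edges:
        out_degree[source] += 1
    scores = {n: 1.0 for n in nodes}
    for _ in range(max_iterations):
        incoming = {n: 0.0 for n in nodes}
        for source, target in edges:
            incoming[target] += scores[source] / out_degree[source]
        scores = {
            n: (1 - damping_factor) + damping_factor * incoming[n] for n in nodes
        }
    return scores


class ToyModes:
    def __init__(self, nodes, edges):
        self.edges = list(edges)
        # The "database" and the "projection" keep separate node properties.
        self.database = {n: {} for n in nodes}
        self.projection = {n: {} for n in nodes}

    def _scores(self):
        return toy_page_rank(list(self.projection), self.edges)

    def stream(self):
        """Return one (node, score) row per node; nothing is stored."""
        return sorted(self._scores().items())

    def stats(self):
        """Return a single summary row; nothing is stored."""
        scores = list(self._scores().values())
        return {"nodeCount": len(scores), "min": min(scores), "max": max(scores)}

    def mutate(self, mutate_property):
        """Store scores on the projection only; the property must be new."""
        if any(mutate_property in props for props in self.projection.values()):
            raise ValueError(f"property {mutate_property!r} already exists")
        for node, score in self._scores().items():
            self.projection[node][mutate_property] = score
        return {"nodePropertiesWritten": len(self.projection)}

    def write(self, write_property):
        """Store scores on the database only."""
        for node, score in self._scores().items():
            self.database[node][write_property] = score
        return {"nodePropertiesWritten": len(self.database)}


toy = ToyModes(
    nodes=["Home", "About", "Product"],
    edges=[
        ("Home", "About"),
        ("Home", "Product"),
        ("About", "Home"),
        ("Product", "Home"),
    ],
)
```

Calling `toy.stream()` or `toy.stats()` leaves both dictionaries untouched, `toy.mutate("pageRankScore")` changes only `toy.projection`, and `toy.write("pageRankScore")` changes only `toy.database`.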
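Stats
The stats mode is useful for evaluating algorithm performance without mutating the in-memory graph. When an algorithm runs in this mode, a single row is returned containing a summary of the algorithm statistics (for example, counts or a percentile distribution).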
result = gds.pageRank.stats(
    g,
    maxIterations=20,
    dampingFactor=0.85
)
print(result)
CALL gds.pageRank.stats(
'example-graph',
{maxIterations: 20, dampingFactor: 0.85}
)
YIELD ranIterations,
didConverge,
preProcessingMillis,
computeMillis,
postProcessingMillis,
centralityDistribution,
configuration
RETURN *
# Cypher query
page_rank_stats_example_graph_query = """
CALL gds.pageRank.stats(
'example-graph',
{maxIterations: 20, dampingFactor: 0.85}
)
YIELD ranIterations,
didConverge,
preProcessingMillis,
computeMillis,
postProcessingMillis,
centralityDistribution,
configuration
RETURN *
"""
# Create the driver session
with driver.session() as session:
    # Run query
    result = session.run(page_rank_stats_example_graph_query).data()
    # Prettify the result
    print(json.dumps(result, indent=2, sort_keys=True))
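The result contains the time it took to run the algorithm (computeMillis), along with other details such as the centrality distribution and the configuration parameters.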
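To pull the main fields out of a stats result, a small helper along these lines may be handy. It is a sketch: `summarize_stats` is a hypothetical name, the sample dictionary holds placeholder values rather than real GDS output, and the code assumes that `centralityDistribution` exposes `min` and `max` entries.

```python
def summarize_stats(result):
    """Format the main fields of a PageRank stats result.

    `result` can be any mapping-like object indexed by the YIELD field names.
    """
    distribution = result["centralityDistribution"]
    return (
        f"iterations={result['ranIterations']}, "
        f"converged={result['didConverge']}, "
        f"computeMillis={result['computeMillis']}, "
        f"score range=[{distribution['min']:.3f}, {distribution['max']:.3f}]"
    )


# Placeholder values for illustration only, not real GDS output.
example_result = {
    "ranIterations": 20,
    "didConverge": False,
    "computeMillis": 5,
    "centralityDistribution": {"min": 0.15, "max": 3.2},
}
print(summarize_stats(example_result))
```

With the Python driver, `session.run(...).data()` returns a list of rows, so the helper would be called on the first element of that list.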
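Stream
The stream mode returns the results of the algorithm as Cypher result rows. This is similar to how standard Cypher read queries operate.
In the PageRank example, this mode returns a node ID and the computed PageRank score for each node. The gds.util.asNode function can then be used to look up a node from its node ID.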
results = gds.pageRank.stream(
    g,
    maxIterations=20,
    dampingFactor=0.85
)
print(results)
CALL gds.pageRank.stream(
'example-graph',
{maxIterations: 20, dampingFactor: 0.85}
)
YIELD nodeId, score
RETURN *
# Cypher query to just get internal node ID and score
page_rank_stream_example_graph_query = """
CALL gds.pageRank.stream(
'example-graph',
{maxIterations: 20, dampingFactor: 0.85}
)
YIELD nodeId, score
RETURN *
"""
# Create the driver session
with driver.session() as session:
    # Run query
    results = session.run(page_rank_stream_example_graph_query).data()
    # Prettify the results
    print(json.dumps(results, indent=2, sort_keys=True))
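The stream call above returns internal node IDs only. One way to get readable output is to resolve each ID with gds.util.asNode inside the same query. The constant and function names in this sketch are hypothetical, and the query assumes the `name` property from the example graph created earlier.

```python
# Same stream call as above, but each node ID is resolved to its `name` property.
PAGE_RANK_STREAM_WITH_NAMES_QUERY = """
CALL gds.pageRank.stream(
  'example-graph',
  {maxIterations: 20, dampingFactor: 0.85}
)
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS name, score
ORDER BY score DESC
"""


def stream_page_rank_with_names(session):
    """Run the query on a driver session and return a list of row dicts.

    Any object exposing the same run(...).data() interface works too.
    """
    return session.run(PAGE_RANK_STREAM_WITH_NAMES_QUERY).data()


# Usage with the driver created in the setup section:
# with driver.session() as session:
#     for row in stream_page_rank_with_names(session):
#         print(row["name"], row["score"])
```

Sorting in Cypher keeps the ordering on the server side, so the client only has to print the rows.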
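Because an algorithm can run for a long time and the connection can drop unexpectedly, we recommend using the mutate and write modes to make sure that the computation completes and the results are saved.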
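Mutate
The mutate mode operates on the in-memory graph and updates it with a new property specified by the mutateProperty configuration parameter. The new property must not already exist in the in-memory graph.
This mode is useful when chaining the execution of several algorithms, each of which relies on the results of the previous one.
In the case of PageRank, the result of this mode is a score for each node. In this example, we add the computed score to each node of the in-memory graph as the value of a new property named pageRankScore.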
result = gds.pageRank.mutate(
    g,
    mutateProperty="pageRankScore",
    maxIterations=20,
    dampingFactor=0.85
)
print(result)
CALL gds.pageRank.mutate(
'example-graph',
{mutateProperty: 'pageRankScore', maxIterations: 20, dampingFactor: 0.85}
)
YIELD nodePropertiesWritten, ranIterations
RETURN *
# Cypher query to mutate the graph
page_rank_mutate_example_graph_query = """
CALL gds.pageRank.mutate(
'example-graph',
{mutateProperty: 'pageRankScore', maxIterations: 20, dampingFactor: 0.85}
)
YIELD nodePropertiesWritten, ranIterations
RETURN *
"""
# Create the driver session
with driver.session() as session:
    # Run query
    result = session.run(page_rank_mutate_example_graph_query).data()
    # Prettify the result
    print(json.dumps(result, indent=2, sort_keys=True))
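Write
The write mode writes the results of the algorithm computation back to the Neo4j database. The written data can be node properties (such as PageRank scores), new relationships (such as Node Similarity similarities), or relationship properties (only for newly created relationships).
Similarly to the previous example, here we add the score computed by the PageRank algorithm to each node of the Neo4j database as the value of a new property named pageRankScore.
To use the result of a write mode computation with another algorithm, a new in-memory graph must be created from the Neo4j database.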
result = gds.pageRank.write(
    g,
    writeProperty="pageRankScore",
    maxIterations=20,
    dampingFactor=0.85
)
print(result)
CALL gds.pageRank.write(
'example-graph',
{writeProperty: 'pageRankScore', maxIterations: 20, dampingFactor: 0.85}
)
YIELD nodePropertiesWritten, ranIterations
RETURN *
# Cypher query to write the graph
page_rank_write_example_graph_query = """
CALL gds.pageRank.write(
'example-graph',
{writeProperty: 'pageRankScore', maxIterations: 20, dampingFactor: 0.85}
)
YIELD nodePropertiesWritten, ranIterations
RETURN *
"""
# Create the driver session
with driver.session() as session:
    # Run query
    result = session.run(page_rank_write_example_graph_query).data()
    # Prettify the result
    print(json.dumps(result, indent=2, sort_keys=True))
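As the note above says, using the written scores in another algorithm requires a new projection that loads them from the database. The sketch below builds a node projection that includes the written property. `node_projection_with_properties` and the graph name `example-graph-with-scores` are hypothetical, and the commented call assumes the `gds` client from the setup section.

```python
def node_projection_with_properties(label, properties):
    """Build a native-projection node spec that loads the given node properties."""
    return {label: {"properties": list(properties)}}


node_spec = node_projection_with_properties("Page", ["pageRankScore"])
print(node_spec)

# Usage with the client from the setup section (graph name is hypothetical):
# g2, result = gds.graph.project(
#     "example-graph-with-scores",
#     node_spec,
#     "LINKS",
# )
```

A separate graph name is used here because the original projection, example-graph, still exists until it is dropped in the cleanup step.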
Cleanup
After completing the example, you can delete both the in-memory graph and the data in the Neo4j database.
result = gds.graph.drop(g)
print(result)
gds.run_cypher("""
MATCH (n)
DETACH DELETE n
""")
CALL gds.graph.drop('example-graph');
MATCH (n)
DETACH DELETE n;
delete_example_in_memory_graph_query = """
CALL gds.graph.drop('example-graph')
"""
delete_example_graph = """
MATCH (n)
DETACH DELETE n
"""
with driver.session() as session:
    # Delete in-memory graph
    result = session.run(delete_example_in_memory_graph_query).data()
    # Prettify the result
    print(json.dumps(result, indent=2, sort_keys=True, default=default))
    # Delete data from Neo4j
    result = session.run(delete_example_graph).data()
    # Prettify the result
    print(json.dumps(result, indent=2, sort_keys=True, default=default))
References
Cypher
- Learn more about the Cypher syntax
- You can use the Cypher Cheat Sheet as a reference of all available Cypher features