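Executing the different algorithm modes

Follow along with a notebook in Google Colab

This example explains the execution modes of GDS algorithms and how to use each of them.

Setup

For more information on how to get started using Python, refer to the Connecting with Python tutorial.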

pip install graphdatascience
# Import the client
from graphdatascience import GraphDataScience

# Replace with the actual URI, username, and password
AURA_CONNECTION_URI = "neo4j+s://xxxxxxxx.databases.neo4j.io"
AURA_USERNAME = "neo4j"
AURA_PASSWORD = ""

# Configure the client with AuraDS-recommended settings
gds = GraphDataScience(
    AURA_CONNECTION_URI,
    auth=(AURA_USERNAME, AURA_PASSWORD),
    aura_ds=True
)

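In the following code examples we use the print function to print Pandas DataFrame and Series objects. You can try different ways to print a Pandas object, for instance via the to_string and to_json methods; if you use a JSON representation, in some cases you may need to include a default handler to handle Neo4j DateTime objects. Check the Python connection section for some examples.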

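For more information on how to get started using the Cypher Shell, refer to the Neo4j Cypher Shell tutorial.

Run the following commands from the directory where the Cypher shell is installed.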
export AURA_CONNECTION_URI="neo4j+s://xxxxxxxx.databases.neo4j.io"
export AURA_USERNAME="neo4j"
export AURA_PASSWORD=""

./cypher-shell -a $AURA_CONNECTION_URI -u $AURA_USERNAME -p $AURA_PASSWORD

For more information on how to get started using Python, refer to the Connecting with Python tutorial.

pip install neo4j
# Import the driver
from neo4j import GraphDatabase

# Replace with the actual URI, username, and password
AURA_CONNECTION_URI = "neo4j+s://xxxxxxxx.databases.neo4j.io"
AURA_USERNAME = "neo4j"
AURA_PASSWORD = ""

# Instantiate the driver
driver = GraphDatabase.driver(
    AURA_CONNECTION_URI,
    auth=(AURA_USERNAME, AURA_PASSWORD)
)
# Import to prettify results
import json

# Import for the JSON helper function
from neo4j.time import DateTime

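# Helper function for serializing Neo4j DateTime in JSON dumps
def default(o):
    if isinstance(o, DateTime):
        return o.isoformat()
    # Report any other unsupported type instead of silently writing null
    raise TypeError(f"Object of type {type(o).__name__} is not JSON serializable")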
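
To see what such a default handler does without connecting to a database, the following minimal sketch passes one to json.dumps. It is only an illustration: the standard-library datetime stands in for a Neo4j DateTime (both expose isoformat()), and the row contents are hypothetical sample values, not real output.

```python
import json
from datetime import datetime, timezone


def default(o):
    # Assumption: like neo4j.time.DateTime, the value exposes isoformat()
    if hasattr(o, "isoformat"):
        return o.isoformat()
    # Anything else is reported as unsupported instead of being written as null
    raise TypeError(f"Object of type {type(o).__name__} is not JSON serializable")


# Hypothetical result row; datetime stands in for a Neo4j DateTime value
row = {
    "graphName": "example-graph",
    "creationTime": datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc),
}

# json.dumps calls default() only for values it cannot serialize itself
serialized = json.dumps(row, indent=2, sort_keys=True, default=default)
print(serialized)
```

Without the default argument, json.dumps would raise a TypeError on the date-time value.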

Create an example graph

We start by creating some basic graph data.

gds.run_cypher("""
    CREATE
      (home:Page {name:'Home'}),
      (about:Page {name:'About'}),
      (product:Page {name:'Product'}),
      (links:Page {name:'Links'}),
      (a:Page {name:'Site A'}),
      (b:Page {name:'Site B'}),
      (c:Page {name:'Site C'}),
      (d:Page {name:'Site D'}),

      (home)-[:LINKS {weight: 0.2}]->(about),
      (home)-[:LINKS {weight: 0.2}]->(links),
      (home)-[:LINKS {weight: 0.6}]->(product),
      (about)-[:LINKS {weight: 1.0}]->(home),
      (product)-[:LINKS {weight: 1.0}]->(home),
      (a)-[:LINKS {weight: 1.0}]->(home),
      (b)-[:LINKS {weight: 1.0}]->(home),
      (c)-[:LINKS {weight: 1.0}]->(home),
      (d)-[:LINKS {weight: 1.0}]->(home),
      (links)-[:LINKS {weight: 0.8}]->(home),
      (links)-[:LINKS {weight: 0.05}]->(a),
      (links)-[:LINKS {weight: 0.05}]->(b),
      (links)-[:LINKS {weight: 0.05}]->(c),
      (links)-[:LINKS {weight: 0.05}]->(d)
""")
CREATE
  (home:Page {name:'Home'}),
  (about:Page {name:'About'}),
  (product:Page {name:'Product'}),
  (links:Page {name:'Links'}),
  (a:Page {name:'Site A'}),
  (b:Page {name:'Site B'}),
  (c:Page {name:'Site C'}),
  (d:Page {name:'Site D'}),

  (home)-[:LINKS {weight: 0.2}]->(about),
  (home)-[:LINKS {weight: 0.2}]->(links),
  (home)-[:LINKS {weight: 0.6}]->(product),
  (about)-[:LINKS {weight: 1.0}]->(home),
  (product)-[:LINKS {weight: 1.0}]->(home),
  (a)-[:LINKS {weight: 1.0}]->(home),
  (b)-[:LINKS {weight: 1.0}]->(home),
  (c)-[:LINKS {weight: 1.0}]->(home),
  (d)-[:LINKS {weight: 1.0}]->(home),
  (links)-[:LINKS {weight: 0.8}]->(home),
  (links)-[:LINKS {weight: 0.05}]->(a),
  (links)-[:LINKS {weight: 0.05}]->(b),
  (links)-[:LINKS {weight: 0.05}]->(c),
  (links)-[:LINKS {weight: 0.05}]->(d)
# Cypher query
create_example_graph_on_disk_query = """
    CREATE
      (home:Page {name:'Home'}),
      (about:Page {name:'About'}),
      (product:Page {name:'Product'}),
      (links:Page {name:'Links'}),
      (a:Page {name:'Site A'}),
      (b:Page {name:'Site B'}),
      (c:Page {name:'Site C'}),
      (d:Page {name:'Site D'}),

      (home)-[:LINKS {weight: 0.2}]->(about),
      (home)-[:LINKS {weight: 0.2}]->(links),
      (home)-[:LINKS {weight: 0.6}]->(product),
      (about)-[:LINKS {weight: 1.0}]->(home),
      (product)-[:LINKS {weight: 1.0}]->(home),
      (a)-[:LINKS {weight: 1.0}]->(home),
      (b)-[:LINKS {weight: 1.0}]->(home),
      (c)-[:LINKS {weight: 1.0}]->(home),
      (d)-[:LINKS {weight: 1.0}]->(home),
      (links)-[:LINKS {weight: 0.8}]->(home),
      (links)-[:LINKS {weight: 0.05}]->(a),
      (links)-[:LINKS {weight: 0.05}]->(b),
      (links)-[:LINKS {weight: 0.05}]->(c),
      (links)-[:LINKS {weight: 0.05}]->(d)
"""

# Create the driver session
with driver.session() as session:
    # Run query
    result = session.run(create_example_graph_on_disk_query).data()

    # Prettify the result
    print(json.dumps(result, indent=2, sort_keys=True))

We then project an in-memory graph from the data we just created.

g, result = gds.graph.project(
    "example-graph",
    "Page",
    "LINKS",
    relationshipProperties="weight"
)

print(result)
CALL gds.graph.project(
  'example-graph',
  'Page',
  'LINKS',
  {
    relationshipProperties: 'weight'
  }
)
# Cypher query
create_example_graph_in_memory_query = """
    CALL gds.graph.project(
      'example-graph',
      'Page',
      'LINKS',
      {
        relationshipProperties: 'weight'
      }
    )
"""

# Create the driver session
with driver.session() as session:
    # Run query
    result = session.run(create_example_graph_in_memory_query).data()

    # Prettify the result
    print(json.dumps(result, indent=2, sort_keys=True))

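Execution modes

Every production-tier algorithm can be run in four different modes:

  • stats

  • stream

  • mutate

  • write

An additional estimate mode is explained in detail in the Estimate memory usage and resize an instance section.

In the following, we use the PageRank algorithm to show how each execution mode is used.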
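
All four modes share the same call shape, gds.<algorithm>.<mode>(graphName, configuration), and differ mainly in where the result ends up. The sketch below tries to make that explicit. The build_call helper and the MODES table are hypothetical names introduced here for illustration; they are not part of the GDS client or the Neo4j driver, and the helper only builds query text.

```python
# Hypothetical helper: it only builds query text and is not part of any Neo4j library.
# Each mode maps to the configuration parameter it requires, if any.
MODES = {
    "stats": None,                # summary row only, nothing is stored
    "stream": None,               # result rows are returned to the caller
    "mutate": "mutateProperty",   # result is stored in the in-memory graph
    "write": "writeProperty",     # result is stored in the Neo4j database
}


def build_call(algorithm, mode, graph_name, config):
    """Return (query, parameters) for CALL gds.<algorithm>.<mode>(graphName, config)."""
    if mode not in MODES:
        raise ValueError(f"Unknown execution mode: {mode}")
    required = MODES[mode]
    if required is not None and required not in config:
        raise ValueError(f"Mode '{mode}' needs the '{required}' configuration parameter")
    query = f"CALL gds.{algorithm}.{mode}($graphName, $config)"
    return query, {"graphName": graph_name, "config": dict(config)}


query, params = build_call(
    "pageRank",
    "mutate",
    "example-graph",
    {"mutateProperty": "pageRankScore", "maxIterations": 20, "dampingFactor": 0.85},
)
print(query)
print(params)
```

The returned pair could be passed to a driver session as session.run(query, params); the examples in the rest of this page spell the queries out in full instead.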

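Stats

The stats mode can be useful for evaluating algorithm performance without mutating the in-memory graph. When running an algorithm in this mode, a single row containing a summary of the algorithm statistics (for example, counts or a percentile distribution) is returned.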

result = gds.pageRank.stats(
    g,
    maxIterations=20,
    dampingFactor=0.85
)

print(result)
CALL gds.pageRank.stats(
  'example-graph',
  {maxIterations: 20, dampingFactor: 0.85}
)
YIELD ranIterations,
  didConverge,
  preProcessingMillis,
  computeMillis,
  postProcessingMillis,
  centralityDistribution,
  configuration
RETURN *
# Cypher query
page_rank_stats_example_graph_query = """
    CALL gds.pageRank.stats(
      'example-graph',
      {maxIterations: 20, dampingFactor: 0.85}
    )
    YIELD ranIterations,
      didConverge,
      preProcessingMillis,
      computeMillis,
      postProcessingMillis,
      centralityDistribution,
      configuration
    RETURN *
"""

# Create the driver session
with driver.session() as session:
    # Run query
    result = session.run(page_rank_stats_example_graph_query).data()

    # Prettify the result
    print(json.dumps(result, indent=2, sort_keys=True))

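The result contains the time taken to run the algorithm (computeMillis), along with other details such as the centrality distribution and the configuration parameters.

Stream

The stream mode returns the results of the algorithm as Cypher result rows. This is similar to how standard Cypher read queries operate.

In the PageRank example, this mode returns a node ID and the computed PageRank score for each node. The gds.util.asNode function can then be used to look up a node from its node ID.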

results = gds.pageRank.stream(
    g,
    maxIterations=20,
    dampingFactor=0.85
)

print(results)
CALL gds.pageRank.stream(
  'example-graph',
  {maxIterations: 20, dampingFactor: 0.85}
)
YIELD nodeId, score
RETURN *
# Cypher query to just get internal node ID and score
page_rank_stream_example_graph_query = """
    CALL gds.pageRank.stream(
      'example-graph',
      {maxIterations: 20, dampingFactor: 0.85}
    )
    YIELD nodeId, score
    RETURN *
"""

# Create the driver session
with driver.session() as session:
    # Run query
    results = session.run(page_rank_stream_example_graph_query).data()

    # Prettify the results
    print(json.dumps(results, indent=2, sort_keys=True))
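
As a sketch of the gds.util.asNode lookup mentioned above, the query below returns each page name next to its score instead of the internal node ID. The variable name, the stream_named_scores helper, and the ORDER BY clause are illustrative additions; the function expects an open session from the Python driver.

```python
# Cypher query that resolves each node ID to the node's name property
page_rank_stream_named_query = """
    CALL gds.pageRank.stream(
      'example-graph',
      {maxIterations: 20, dampingFactor: 0.85}
    )
    YIELD nodeId, score
    RETURN gds.util.asNode(nodeId).name AS name, score
    ORDER BY score DESC
"""


def stream_named_scores(session):
    """Run the query in an open driver session and return a list of dicts."""
    return session.run(page_rank_stream_named_query).data()
```

With the driver created in the setup section, this could be called as stream_named_scores(session) inside a with driver.session() as session: block.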

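Since an algorithm can run for a long time and the connection may drop unexpectedly, we recommend using the mutate or write mode to make sure that the computation completes and the result is saved.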

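Mutate

The mutate mode operates on the in-memory graph and updates it with a new property, named by the mutateProperty configuration parameter. The new property must not already exist in the in-memory graph.

This mode is useful when chaining the execution of several algorithms, each of which relies on the result of the previous one.

In the case of PageRank, the result of this mode is a score for each node. In this example, we add the computed score to each node of the in-memory graph as the value of a new property called pageRankScore.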

result = gds.pageRank.mutate(
    g,
    mutateProperty="pageRankScore",
    maxIterations=20,
    dampingFactor=0.85
)

print(result)
CALL gds.pageRank.mutate(
  'example-graph',
  {mutateProperty: 'pageRankScore', maxIterations: 20, dampingFactor: 0.85}
)
YIELD nodePropertiesWritten, ranIterations
RETURN *
# Cypher query to mutate the in-memory graph
page_rank_mutate_example_graph_query = """
    CALL gds.pageRank.mutate(
      'example-graph',
      {mutateProperty: 'pageRankScore', maxIterations: 20, dampingFactor: 0.85}
    )
    YIELD nodePropertiesWritten, ranIterations
    RETURN *
"""

# Create the driver session
with driver.session() as session:
    # Run query
    result = session.run(page_rank_mutate_example_graph_query).data()

    # Prettify the result
    print(json.dumps(result, indent=2, sort_keys=True))
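
The chaining described for the mutate mode might look like the sketch below, which assumes the mutate call above has already added pageRankScore to example-graph. It uses the k-Nearest Neighbors procedure gds.knn.stream as the follow-up algorithm. That choice, the topK value, and the find_similar_pages helper name are illustrative assumptions, not part of the original tutorial.

```python
# Cypher query for a follow-up step that reads the mutated pageRankScore property
knn_on_page_rank_query = """
    CALL gds.knn.stream(
      'example-graph',
      {nodeProperties: ['pageRankScore'], topK: 2}
    )
    YIELD node1, node2, similarity
    RETURN gds.util.asNode(node1).name AS page,
      gds.util.asNode(node2).name AS similarPage,
      similarity
    ORDER BY page, similarity DESC
"""


def find_similar_pages(session):
    """Run kNN on the in-memory graph; pageRankScore must already exist there."""
    return session.run(knn_on_page_rank_query).data()
```

Because the property lives in the in-memory graph, the second algorithm can read it without anything being written to the database first.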

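Write

The write mode writes the result of the algorithm computation back to the Neo4j database. The written data can be node properties (such as PageRank scores), new relationships (such as Node Similarity similarities), or relationship properties (only for newly created relationships).

Similarly to the previous example, here we add the computed score of the PageRank algorithm to each node of the Neo4j database as the value of a new property called pageRankScore.

To use the result computed in write mode with another algorithm, a new in-memory graph must be created from the Neo4j database.
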
result = gds.pageRank.write(
    g,
    writeProperty="pageRankScore",
    maxIterations=20,
    dampingFactor=0.85
)

print(result)
CALL gds.pageRank.write(
  'example-graph',
  {writeProperty: 'pageRankScore', maxIterations: 20, dampingFactor: 0.85}
)
YIELD nodePropertiesWritten, ranIterations
RETURN *
# Cypher query to write the graph
page_rank_write_example_graph_query = """
    CALL gds.pageRank.write(
      'example-graph',
      {writeProperty: 'pageRankScore', maxIterations: 20, dampingFactor: 0.85}
    )
    YIELD nodePropertiesWritten, ranIterations
    RETURN *
"""

# Create the driver session
with driver.session() as session:
    # Run query
    result = session.run(page_rank_write_example_graph_query).data()

    # Prettify the result
    print(json.dumps(result, indent=2, sort_keys=True))
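
To make the written scores available to another algorithm, a new projection has to include the property. The following is a sketch of such a projection, assuming the write call above has completed. The graph name example-graph-with-scores and the helper name are hypothetical choices made for this illustration.

```python
# Cypher query that projects a new in-memory graph including the written property
project_with_scores_query = """
    CALL gds.graph.project(
      'example-graph-with-scores',
      {Page: {properties: ['pageRankScore']}},
      'LINKS',
      {relationshipProperties: 'weight'}
    )
"""


def project_graph_with_scores(session):
    """Create the new projection in an open driver session and return its summary rows."""
    return session.run(project_with_scores_query).data()
```

If this extra graph is created, it should be dropped during cleanup in the same way as example-graph.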

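Cleanup

Once the example is complete, the in-memory graph and the data in the Neo4j database can be deleted. Note that the MATCH (n) DETACH DELETE n query removes every node and relationship in the database, not only the example data.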

result = gds.graph.drop(g)
print(result)

gds.run_cypher("""
    MATCH (n)
    DETACH DELETE n
""")
CALL gds.graph.drop('example-graph');

MATCH (n)
DETACH DELETE n;
delete_example_in_memory_graph_query = """
    CALL gds.graph.drop('example-graph')
"""

delete_example_graph = """
    MATCH (n)
    DETACH DELETE n
"""

with driver.session() as session:
    # Delete in-memory graph
    result = session.run(delete_example_in_memory_graph_query).data()

    # Prettify the result
    print(json.dumps(result, indent=2, sort_keys=True, default=default))

    # Delete data from Neo4j
    result = session.run(delete_example_graph).data()

    # Prettify the result
    print(json.dumps(result, indent=2, sort_keys=True, default=default))

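Closing the connection

The connection should always be closed when it is no longer needed.

Although the GDS client automatically closes the connection when the object is deleted, it is good practice to close it explicitly.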

# Close the client connection
gds.close()
# Close the driver connection
driver.close()
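
One way to make sure the connection is closed even when an error occurs is to wrap the client in contextlib.closing, which calls close() on exit. The sketch below shows the pattern with a stand-in object so that it runs without a database. The run_and_close helper and the FakeClient class are hypothetical names used only for this illustration.

```python
from contextlib import closing


def run_and_close(make_client, work):
    """Create a client, pass it to work(), and close it even if work() raises."""
    with closing(make_client()) as client:
        return work(client)


# Stand-in object used only to show the behaviour without a database
class FakeClient:
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


client = FakeClient()
result = run_and_close(lambda: client, lambda c: "done")
print(result, client.closed)  # prints "done True"
```

Any object with a close() method should work in place of the stand-in, including the gds client and the driver objects created in the setup section.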
