Executing the different algorithm modes

Follow along with a notebook in Google Colab.

This example explains the execution modes of GDS algorithms and how to use each of them.

Setup

For more information on how to get started using Python, refer to the Connecting with Python tutorial.

pip install graphdatascience

# Import the client
from graphdatascience import GraphDataScience

# Replace with the actual URI, username, and password
AURA_CONNECTION_URI = "neo4j+s://xxxxxxxx.databases.neo4j.io"
AURA_USERNAME = "neo4j"
AURA_PASSWORD = ""

# Configure the client with AuraDS-recommended settings
gds = GraphDataScience(
    AURA_CONNECTION_URI,
    auth=(AURA_USERNAME, AURA_PASSWORD),
    aura_ds=True
)

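In the following code examples, we use the print function to print Pandas DataFrame and Series objects. You can try different ways to print a Pandas object, for instance via the to_string and to_json methods. If you use a JSON representation, you may in some cases need to include a default handler to serialize Neo4j DateTime objects. Check the Python connection section for some examples.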
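The default-handler pattern mentioned above can be sketched with the standard library alone. The sketch rests on two assumptions. First, a `datetime.datetime` value stands in for a Neo4j `DateTime`, since both expose an `isoformat()` method. Second, the `record` dictionary is hypothetical sample data rather than real GDS output.

```python
import json
from datetime import datetime, timezone

# Hypothetical record; datetime stands in for a Neo4j DateTime value,
# which json.dumps cannot serialize on its own.
record = {
    "graphName": "example-graph",
    "creationTime": datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc),
}

def default(o):
    # Serialize anything with an isoformat() method (datetime here,
    # neo4j.time.DateTime with the real driver) as an ISO 8601 string.
    if hasattr(o, "isoformat"):
        return o.isoformat()
    # Mirror json's own behavior for any other unsupported type.
    raise TypeError(f"Object of type {type(o).__name__} is not JSON serializable")

output = json.dumps(record, indent=2, sort_keys=True, default=default)
print(output)
```

pandas' `DataFrame.to_json` accepts a comparable callable through its `default_handler` parameter.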

For more information on how to get started using Cypher Shell, refer to the Neo4j Cypher Shell tutorial.

Run the following commands from the directory where you installed Cypher Shell.

export AURA_CONNECTION_URI="neo4j+s://xxxxxxxx.databases.neo4j.io"
export AURA_USERNAME="neo4j"
export AURA_PASSWORD=""

./cypher-shell -a "$AURA_CONNECTION_URI" -u "$AURA_USERNAME" -p "$AURA_PASSWORD"

For more information on how to get started using Python, refer to the Connecting with Python tutorial.

pip install neo4j

# Import the driver
from neo4j import GraphDatabase

# Replace with the actual URI, username, and password
AURA_CONNECTION_URI = "neo4j+s://xxxxxxxx.databases.neo4j.io"
AURA_USERNAME = "neo4j"
AURA_PASSWORD = ""

# Instantiate the driver
driver = GraphDatabase.driver(
    AURA_CONNECTION_URI,
    auth=(AURA_USERNAME, AURA_PASSWORD)
)

# Import to prettify results
import json

# Import for the JSON helper function
from neo4j.time import DateTime

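# Helper function for serializing Neo4j DateTime in JSON dumps
def default(o):
    if isinstance(o, DateTime):
        return o.isoformat()
    # Let json.dumps report any other unsupported type as usual
    raise TypeError(f"Object of type {type(o).__name__} is not JSON serializable")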

Create an example graph

We start by creating some basic graph data.

gds.run_cypher("""
    CREATE
      (home:Page {name:'Home'}),
      (about:Page {name:'About'}),
      (product:Page {name:'Product'}),
      (links:Page {name:'Links'}),
      (a:Page {name:'Site A'}),
      (b:Page {name:'Site B'}),
      (c:Page {name:'Site C'}),
      (d:Page {name:'Site D'}),

      (home)-[:LINKS {weight: 0.2}]->(about),
      (home)-[:LINKS {weight: 0.2}]->(links),
      (home)-[:LINKS {weight: 0.6}]->(product),
      (about)-[:LINKS {weight: 1.0}]->(home),
      (product)-[:LINKS {weight: 1.0}]->(home),
      (a)-[:LINKS {weight: 1.0}]->(home),
      (b)-[:LINKS {weight: 1.0}]->(home),
      (c)-[:LINKS {weight: 1.0}]->(home),
      (d)-[:LINKS {weight: 1.0}]->(home),
      (links)-[:LINKS {weight: 0.8}]->(home),
      (links)-[:LINKS {weight: 0.05}]->(a),
      (links)-[:LINKS {weight: 0.05}]->(b),
      (links)-[:LINKS {weight: 0.05}]->(c),
      (links)-[:LINKS {weight: 0.05}]->(d)
""")

CREATE
  (home:Page {name:'Home'}),
  (about:Page {name:'About'}),
  (product:Page {name:'Product'}),
  (links:Page {name:'Links'}),
  (a:Page {name:'Site A'}),
  (b:Page {name:'Site B'}),
  (c:Page {name:'Site C'}),
  (d:Page {name:'Site D'}),

  (home)-[:LINKS {weight: 0.2}]->(about),
  (home)-[:LINKS {weight: 0.2}]->(links),
  (home)-[:LINKS {weight: 0.6}]->(product),
  (about)-[:LINKS {weight: 1.0}]->(home),
  (product)-[:LINKS {weight: 1.0}]->(home),
  (a)-[:LINKS {weight: 1.0}]->(home),
  (b)-[:LINKS {weight: 1.0}]->(home),
  (c)-[:LINKS {weight: 1.0}]->(home),
  (d)-[:LINKS {weight: 1.0}]->(home),
  (links)-[:LINKS {weight: 0.8}]->(home),
  (links)-[:LINKS {weight: 0.05}]->(a),
  (links)-[:LINKS {weight: 0.05}]->(b),
  (links)-[:LINKS {weight: 0.05}]->(c),
  (links)-[:LINKS {weight: 0.05}]->(d)

# Cypher query
create_example_graph_on_disk_query = """
    CREATE
      (home:Page {name:'Home'}),
      (about:Page {name:'About'}),
      (product:Page {name:'Product'}),
      (links:Page {name:'Links'}),
      (a:Page {name:'Site A'}),
      (b:Page {name:'Site B'}),
      (c:Page {name:'Site C'}),
      (d:Page {name:'Site D'}),

      (home)-[:LINKS {weight: 0.2}]->(about),
      (home)-[:LINKS {weight: 0.2}]->(links),
      (home)-[:LINKS {weight: 0.6}]->(product),
      (about)-[:LINKS {weight: 1.0}]->(home),
      (product)-[:LINKS {weight: 1.0}]->(home),
      (a)-[:LINKS {weight: 1.0}]->(home),
      (b)-[:LINKS {weight: 1.0}]->(home),
      (c)-[:LINKS {weight: 1.0}]->(home),
      (d)-[:LINKS {weight: 1.0}]->(home),
      (links)-[:LINKS {weight: 0.8}]->(home),
      (links)-[:LINKS {weight: 0.05}]->(a),
      (links)-[:LINKS {weight: 0.05}]->(b),
      (links)-[:LINKS {weight: 0.05}]->(c),
      (links)-[:LINKS {weight: 0.05}]->(d)
"""

# Create the driver session
with driver.session() as session:
    # Run query
    result = session.run(create_example_graph_on_disk_query).data()

    # Prettify the result
    print(json.dumps(result, indent=2, sort_keys=True))
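The sketch below restates the LINKS relationships from the CREATE query above as a plain Python edge list, which makes the example data easier to reason about. This representation is hypothetical and is not something GDS produces. The sketch then checks one property of the data: the weights on each page's outgoing links sum to 1.0.

```python
import math
from collections import defaultdict

# (source, target, weight) for every LINKS relationship in the CREATE query
EDGES = [
    ("Home", "About", 0.2), ("Home", "Links", 0.2), ("Home", "Product", 0.6),
    ("About", "Home", 1.0), ("Product", "Home", 1.0),
    ("Site A", "Home", 1.0), ("Site B", "Home", 1.0),
    ("Site C", "Home", 1.0), ("Site D", "Home", 1.0),
    ("Links", "Home", 0.8),
    ("Links", "Site A", 0.05), ("Links", "Site B", 0.05),
    ("Links", "Site C", 0.05), ("Links", "Site D", 0.05),
]

# Sum the weights of each page's outgoing links
out_weight = defaultdict(float)
for source, _target, weight in EDGES:
    out_weight[source] += weight

for page, total in sorted(out_weight.items()):
    print(f"{page}: {total:.2f}")

# Every page distributes a total outgoing weight of 1.0 (up to float rounding)
assert all(math.isclose(total, 1.0) for total in out_weight.values())
```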

We then project an in-memory graph from the data we just created.

g, result = gds.graph.project(
    "example-graph",
    "Page",
    "LINKS",
    relationshipProperties="weight"
)

print(result)

CALL gds.graph.project(
  'example-graph',
  'Page',
  'LINKS',
  {
    relationshipProperties: 'weight'
  }
)

# Cypher query
create_example_graph_in_memory_query = """
    CALL gds.graph.project(
      'example-graph',
      'Page',
      'LINKS',
      {
        relationshipProperties: 'weight'
      }
    )
"""

# Create the driver session
with driver.session() as session:
    # Run query
    result = session.run(create_example_graph_in_memory_query).data()

    # Prettify the result
    print(json.dumps(result, indent=2, sort_keys=True))

Execution modes

Every production-quality algorithm can be run in four different modes:

Stats
Stream
Mutate
Write

The additional estimate mode is explained in detail in the Estimating memory usage and resizing an instance section.

In the following, we use the PageRank algorithm to show how to use each execution mode.

Stats

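The stats mode can be used to evaluate the performance of an algorithm without altering the in-memory graph. When an algorithm runs in this mode, it returns a single row containing a summary of the algorithm statistics (for example, counts or a percentile distribution).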

result = gds.pageRank.stats(
    g,
    maxIterations=20,
    dampingFactor=0.85
)

print(result)

CALL gds.pageRank.stats(
  'example-graph',
  {maxIterations: 20, dampingFactor: 0.85}
)
YIELD ranIterations,
  didConverge,
  preProcessingMillis,
  computeMillis,
  postProcessingMillis,
  centralityDistribution,
  configuration
RETURN *

# Cypher query
page_rank_stats_example_graph_query = """
    CALL gds.pageRank.stats(
      'example-graph',
      {maxIterations: 20, dampingFactor: 0.85}
    )
    YIELD ranIterations,
      didConverge,
      preProcessingMillis,
      computeMillis,
      postProcessingMillis,
      centralityDistribution,
      configuration
    RETURN *
"""

# Create the driver session
with driver.session() as session:
    # Run query
    result = session.run(page_rank_stats_example_graph_query).data()

    # Prettify the result
    print(json.dumps(result, indent=2, sort_keys=True))

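The result contains the time spent computing the algorithm, in milliseconds (computeMillis). It also includes other details such as the number of iterations run, whether the algorithm converged, the centrality distribution, and the configuration parameters.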
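The centralityDistribution field summarizes the score distribution rather than listing the scores themselves. The sketch below gives a rough idea of what such a one-row summary contains: it computes the minimum, maximum, mean and a few nearest-rank percentiles over a list of scores using the standard library. Several parts are assumptions:

- the scores are made up;
- the nearest-rank method is a choice made for this sketch;
- the key names are chosen for illustration.

GDS computes its own distribution internally, so its exact keys and values may differ.

```python
import math
import statistics

# Hypothetical PageRank scores, one per node (not real GDS output)
scores = [0.15, 0.15, 0.15, 0.15, 0.21, 0.24, 0.33, 3.2]

def percentile(values, p):
    # Nearest-rank percentile: the smallest value with at least p% of values at or below it
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

summary = {
    "min": min(scores),
    "max": max(scores),
    "mean": statistics.fmean(scores),
    **{f"p{p}": percentile(scores, p) for p in (50, 75, 90, 95, 99)},
}
print(summary)
```

A single skewed value (here 3.2) pulls the mean and the upper percentiles up while p50 stays low. That is the kind of shape a distribution summary is meant to reveal without streaming every score.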

Stream

The stream mode returns the result of the algorithm as Cypher result rows. This is similar to how standard Cypher read queries operate.

In the PageRank example, this mode returns a node ID and the computed PageRank score for each node. The node itself can then be looked up from its node ID with the gds.util.asNode function.

results = gds.pageRank.stream(
    g,
    maxIterations=20,
    dampingFactor=0.85
)

print(results)

CALL gds.pageRank.stream(
  'example-graph',
  {maxIterations: 20, dampingFactor: 0.85}
)
YIELD nodeId, score
RETURN *

# Cypher query to just get internal node ID and score
page_rank_stream_example_graph_query = """
    CALL gds.pageRank.stream(
      'example-graph',
      {maxIterations: 20, dampingFactor: 0.85}
    )
    YIELD nodeId, score
    RETURN *
"""

# Create the driver session
with driver.session() as session:
    # Run query
    results = session.run(page_rank_stream_example_graph_query).data()

    # Prettify the results
    print(json.dumps(results, indent=2, sort_keys=True))
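The sketch below is a minimal pure-Python power-iteration PageRank over the same LINKS relationships, using the same maxIterations and dampingFactor values. It shows what the nodeId/score rows represent. It is a textbook sketch, not the GDS implementation, and it makes these assumptions:

- It uses the non-normalized formulation score = (1 - d) + d * sum(score(u) / outDegree(u)).
- It starts every node at 1.0 and applies no convergence tolerance, so its numbers may differ from GDS output.
- It ignores the weight property. This matches the calls above, none of which sets relationshipWeightProperty.

The final name lookup plays the role of gds.util.asNode. The node IDs here are hypothetical list positions, not GDS internal IDs.

```python
# Node names indexed by a hypothetical node ID (list position)
NAMES = ["Home", "About", "Product", "Links", "Site A", "Site B", "Site C", "Site D"]

# LINKS relationships as (source ID, target ID); weights are ignored (unweighted run)
EDGES = [
    (0, 1), (0, 3), (0, 2),                  # Home -> About, Links, Product
    (1, 0), (2, 0),                          # About, Product -> Home
    (4, 0), (5, 0), (6, 0), (7, 0),          # Site A..D -> Home
    (3, 0), (3, 4), (3, 5), (3, 6), (3, 7),  # Links -> Home, Site A..D
]

def page_rank(num_nodes, edges, max_iterations=20, damping_factor=0.85):
    out_degree = [0] * num_nodes
    for source, _ in edges:
        out_degree[source] += 1
    scores = [1.0] * num_nodes  # initial score per node (an assumption)
    for _ in range(max_iterations):
        incoming = [0.0] * num_nodes
        for source, target in edges:
            # Each node shares its current score equally among its outgoing links
            incoming[target] += scores[source] / out_degree[source]
        scores = [(1 - damping_factor) + damping_factor * s for s in incoming]
    return scores

scores = page_rank(len(NAMES), EDGES)

# Rows shaped like the stream output, then resolved to names (the asNode step)
rows = [{"nodeId": node_id, "score": score} for node_id, score in enumerate(scores)]
for row in sorted(rows, key=lambda r: r["score"], reverse=True):
    print(f"{NAMES[row['nodeId']]:<8} {row['score']:.4f}")
```

Home ends up with the highest score because every other page links to it. Every page in this graph has at least one outgoing link, so under this formulation the scores keep summing to the number of nodes.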

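Since an algorithm can run for a long time and the connection might drop unexpectedly, we recommend using the mutate and write modes to make sure that the computation completes and its results are saved.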

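Mutate

The mutate mode operates on the in-memory graph and updates it with a new property, whose name is specified by the mutateProperty configuration parameter. The new property must not already exist in the in-memory graph.

This mode is useful when chaining the execution of several algorithms, where each algorithm relies on the results of the previous one.

In the case of PageRank, the result of this mode is a score for each node. In this example, we add the computed score to each node of the in-memory graph as the value of a new property called pageRankScore.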

result = gds.pageRank.mutate(
    g,
    mutateProperty="pageRankScore",
    maxIterations=20,
    dampingFactor=0.85
)

print(result)

CALL gds.pageRank.mutate(
  'example-graph',
  {mutateProperty: 'pageRankScore', maxIterations: 20, dampingFactor: 0.85}
)
YIELD nodePropertiesWritten, ranIterations
RETURN *

# Cypher query to mutate the in-memory graph
page_rank_mutate_example_graph_query = """
    CALL gds.pageRank.mutate(
      'example-graph',
      {mutateProperty: 'pageRankScore', maxIterations: 20, dampingFactor: 0.85}
    )
    YIELD nodePropertiesWritten, ranIterations
    RETURN *
"""

# Create the driver session
with driver.session() as session:
    # Run query
    result = session.run(page_rank_mutate_example_graph_query).data()

    # Prettify the result
    print(json.dumps(result, indent=2, sort_keys=True))
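A small toy model can illustrate the rule that mutateProperty must name a new property. The InMemoryGraph class below is hypothetical and the scores are made up. The class only mimics the behavior described above:

- a mutate step adds a node property to the in-memory graph;
- it refuses a name that already exists;
- a later step can read the property back, which is what makes chaining algorithms possible.

```python
class InMemoryGraph:
    """Hypothetical stand-in for the node property store of a projected graph."""

    def __init__(self, node_ids):
        self.node_ids = list(node_ids)
        self.node_properties = {}  # property name -> {node ID: value}

    def mutate(self, mutate_property, values):
        # A mutate step may only add a property that does not exist yet
        if mutate_property in self.node_properties:
            raise ValueError(f"Node property '{mutate_property}' already exists")
        self.node_properties[mutate_property] = dict(values)
        return {"nodePropertiesWritten": len(values)}


g = InMemoryGraph(range(3))

# First step: store hypothetical scores under a new property name
first = g.mutate("pageRankScore", {0: 1.9, 1: 0.6, 2: 0.5})
print(first)

# Second step in the chain reads the property written by the first step
scores = g.node_properties["pageRankScore"]
top = max(scores.values())
print(g.mutate("scaledScore", {n: s / top for n, s in scores.items()}))

# Repeating the first step with the same property name is rejected
try:
    g.mutate("pageRankScore", {0: 1.0})
except ValueError as error:
    print(error)
```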

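Write

The write mode writes the results of the algorithm computation back to the Neo4j database. The written data can be node properties (such as PageRank scores), new relationships (such as Node Similarity similarities), or relationship properties (only for newly created relationships).

Similar to the previous example, here we add the computed PageRank score to each node in the Neo4j database as the value of a new property called pageRankScore.

To use the results computed in write mode with another algorithm, a new in-memory graph must be projected from the Neo4j database.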

result = gds.pageRank.write(
    g,
    writeProperty="pageRankScore",
    maxIterations=20,
    dampingFactor=0.85
)

print(result)

CALL gds.pageRank.write(
  'example-graph',
  {writeProperty: 'pageRankScore', maxIterations: 20, dampingFactor: 0.85}
)
YIELD nodePropertiesWritten, ranIterations
RETURN *

# Cypher query to write the PageRank scores back to the database
page_rank_write_example_graph_query = """
    CALL gds.pageRank.write(
      'example-graph',
      {writeProperty: 'pageRankScore', maxIterations: 20, dampingFactor: 0.85}
    )
    YIELD nodePropertiesWritten, ranIterations
    RETURN *
"""

# Create the driver session
with driver.session() as session:
    # Run query
    result = session.run(page_rank_write_example_graph_query).data()

    # Prettify the result
    print(json.dumps(result, indent=2, sort_keys=True))
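Written results only become visible to other algorithms after a new projection, and a few lines of Python can model why. In this hypothetical sketch, the database is a plain dictionary of node properties and project() copies the requested properties into a separate snapshot. As a result, a write to the database does not show up in a graph projected earlier. The score values are made up.

```python
# Hypothetical database: node name -> stored node properties
database = {"Home": {"name": "Home"}, "About": {"name": "About"}}


def project(db, properties):
    # A projection copies the requested properties as they are at projection time
    return {node: {p: props[p] for p in properties if p in props}
            for node, props in db.items()}


def write(db, write_property, values):
    # A write step stores one value per node back into the database
    for node, value in values.items():
        db[node][write_property] = value
    return {"nodePropertiesWritten": len(values)}


old_graph = project(database, ["pageRankScore"])
print(write(database, "pageRankScore", {"Home": 1.9, "About": 0.6}))

print(old_graph["Home"])  # {}: the earlier projection does not see the written property
new_graph = project(database, ["pageRankScore"])
print(new_graph["Home"])  # {'pageRankScore': 1.9}: a new projection does
```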

Cleanup

When you are done with the example, you can delete the in-memory graph as well as the data in the Neo4j database.

result = gds.graph.drop(g)
print(result)

gds.run_cypher("""
    MATCH (n)
    DETACH DELETE n
""")

CALL gds.graph.drop('example-graph');

MATCH (n)
DETACH DELETE n;

delete_example_in_memory_graph_query = """
    CALL gds.graph.drop('example-graph')
"""

delete_example_graph = """
    MATCH (n)
    DETACH DELETE n
"""

with driver.session() as session:
    # Delete in-memory graph
    result = session.run(delete_example_in_memory_graph_query).data()

    # Prettify the result
    print(json.dumps(result, indent=2, sort_keys=True, default=default))

    # Delete data from Neo4j
    result = session.run(delete_example_graph).data()

    # Prettify the result
    print(json.dumps(result, indent=2, sort_keys=True, default=default))

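Closing the connection

The connection should always be closed when it is no longer needed.

Although the GDS client closes the connection automatically when the object is deleted, it is good practice to close it explicitly.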

# Close the client connection
gds.close()

# Close the driver connection
driver.close()
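Python's `contextlib.closing` is an alternative to remembering the close() calls. It closes any object that has a close() method when the with block exits, even if an exception is raised. The sketch assumes only that such a close() method exists. FakeConnection is a hypothetical stand-in so that the example runs without a database.

```python
from contextlib import closing


class FakeConnection:
    """Hypothetical stand-in for an object with a close() method, such as gds or driver."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


conn = FakeConnection()
try:
    with closing(conn):
        raise RuntimeError("simulated failure during a query")
except RuntimeError:
    pass

print(conn.closed)  # True: close() ran even though the block raised
```

With the objects from this example, the same pattern would wrap the GraphDataScience client or the driver, for instance `with closing(gds):`, since both expose the close() method used above.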
