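Migrating from the legacy to the new Cypher projection

Who should read this guide

This guide is intended for users who have been using the legacy Cypher projection, gds.graph.project.cypher. Cypher projections are now done with the gds.graph.project aggregation function. We assume that most of the operations and concepts mentioned here need little explanation, so the examples and comparisons are intentionally brief. Please see the documentation for the Cypher projection for more details.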

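Structural changes

The legacy Cypher projection is a standalone procedure call in which Cypher queries are passed as string arguments and executed by GDS. The new Cypher projection is an aggregation function that is called as part of a Cypher query. GDS is no longer responsible for, nor in control of, the execution of the Cypher query. Migrating to the new Cypher projection therefore requires changing the overall shape of the Cypher query.

There are no longer separate node and relationship queries. Instead, write one query that produces pairs of source and target nodes and use gds.graph.project to aggregate them into the graph catalog. Because the relationship query of the legacy Cypher projection already requires you to return source and target node pairs, it is a good starting point for the new query. Roughly speaking, the query has to be rewritten as follows: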

Table 1. Structural changes between the two Cypher projections

Legacy:

CALL gds.graph.project.cypher(
  $graphName,
  $nodeQuery,
  $relationshipQuery,
  $configuration
)

New:

$relationshipQuery
RETURN gds.graph.project(
  $graphName,
  sourceNode,
  targetNode,
  $dataConfig,
  $configuration
)

The query no longer has to follow a particular structure; you can use any Cypher query that produces pairs of source and target nodes.

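Semantic changes

The legacy Cypher projection has separate queries for nodes and relationships. The node query is executed first and defines all the nodes in the graph. The relationship query is executed second, and the previously imported nodes act as a filter for the relationships: only relationships between previously imported nodes are imported into the graph. Any node that is imported by the node query but does not appear in any relationship ends up as a disconnected node in the graph. By default, all nodes are disconnected unless they also appear in a relationship.

The new Cypher projection has no separate node and relationship queries. A node query is no longer needed, because nodes are created implicitly from the source and target node pairs. Disconnected nodes have to be created explicitly in the query by setting the target node to NULL. By default, all nodes are connected unless they are explicitly disconnected.

Since the new Cypher projection is no longer responsible for executing the Cypher query, the graph configuration can no longer return the node and relationship queries.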

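Examples

The following examples are based on the examples listed in the documentation for the legacy Cypher projection and the new Cypher projection.

Simple graph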

Table 2. Side-by-side comparison of the two Cypher projections

Simple graph projection, potentially with disconnected nodes

Legacy:

CALL gds.graph.project.cypher(
  'persons',
  'MATCH (n:Person) RETURN id(n) AS id',
  'MATCH (n:Person)-[r:KNOWS]->(m:Person) RETURN id(n) AS source, id(m) AS target')
YIELD
  graphName AS graph, nodeCount AS nodes, relationshipCount AS rels

New:

MATCH (n:Person)
OPTIONAL MATCH (n)-[r:KNOWS]->(m:Person)
WITH gds.graph.project('persons', n, m) AS g
RETURN
  g.graphName AS graph, g.nodeCount AS nodes, g.relationshipCount AS rels

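Simple graph projection without disconnected nodes

Legacy:

Not applicable; the legacy Cypher projection cannot guarantee that nodes are connected.

New:

MATCH (n:Person)-[r:KNOWS]->(m:Person)
WITH gds.graph.project('persons', n, m) AS g
RETURN
  g.graphName AS graph, g.nodeCount AS nodes, g.relationshipCount AS rels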

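The direct translation requires an OPTIONAL MATCH clause to create the disconnected nodes, so that the same graph is produced. This might not be what you originally wanted, but it was unavoidable because the legacy Cypher projection cannot guarantee that nodes are connected. By using only the equivalent of the $relationshipQuery, we can now also get a graph with only connected nodes in the new Cypher projection.

Another difference is that we pass the nodes directly to the new Cypher projection, whereas the legacy Cypher projection required us to pass node IDs. By passing the nodes directly, the Cypher projection knows that the source of the projection is a Neo4j database, which makes it possible to use .write procedures. It is also possible to pass node IDs instead of nodes, … gds.graph.project('persons', id(n), id(m)), but this is only recommended when the source of the projection is not a Neo4j database. Please see Arbitrary source and target ID values for more details.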
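Both variants above write a graph named 'persons' to the graph catalog, and a projection fails if a graph with that name already exists. When running the legacy and new queries one after the other to compare results, the earlier graph has to be removed first. A minimal sketch, assuming the graph name from the examples above:

```cypher
// Drop the 'persons' graph from the catalog before projecting it again.
// The second argument (failIfMissing) is set to false so that the call
// does not raise an error when no graph with that name exists.
CALL gds.graph.drop('persons', false)
YIELD graphName
RETURN graphName
```

If no graph was dropped, the query is expected to return no usable graph name rather than an error, so it can be run unconditionally between the two projections.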

Multi-graph

Table 3. Side-by-side comparison of the two Cypher projections

Multi-graph projection

Legacy:

CALL gds.graph.project.cypher(
  'personsAndBooks',
  'MATCH (n) WHERE n:Person OR n:Book RETURN id(n) AS id, labels(n) AS labels',
  'MATCH (n)-[r:KNOWS|READ]->(m) RETURN id(n) AS source, id(m) AS target, type(r) AS type')
YIELD
  graphName AS graph, nodeQuery, nodeCount AS nodes, relationshipCount AS rels

New:

MATCH (n)
WHERE n:Person OR n:Book
OPTIONAL MATCH (n)-[r:KNOWS|READ]->(m)
WHERE m:Person OR m:Book
WITH gds.graph.project(
  'personsAndBooks',
  n,
  m,
  {
    sourceNodeLabels: labels(n),
    targetNodeLabels: labels(m),
    relationshipType: type(r)
  }
) AS g
RETURN
  g.graphName AS graph, g.nodeCount AS nodes, g.relationshipCount AS rels

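Similar to the previous example, we have to use an OPTIONAL MATCH clause to create the disconnected nodes, so that the same graph is produced. The query may also look different depending on the actual graph schema and on whether disconnected nodes are wanted.

Node labels and relationship types are passed to the new Cypher projection in an additional configuration map. Node labels need to be passed as sourceNodeLabels and targetNodeLabels, and relationship types as relationshipType. Please see Multi-graph for more details.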

Node properties

Table 4. Side-by-side comparison of the two Cypher projections

Graph projection with node properties

Legacy:

CALL gds.graph.project.cypher(
  'graphWithProperties',
  'MATCH (n:Person)
   RETURN
    id(n) AS id,
    labels(n) AS labels,
    n.age AS age',
  'MATCH (n)-[r:KNOWS]->(m) RETURN id(n) AS source, id(m) AS target, type(r) AS type'
)
YIELD
  graphName, nodeCount AS nodes, relationshipCount AS rels
RETURN graphName, nodes, rels

New:

MATCH (n:Person)
OPTIONAL MATCH (n)-[r:KNOWS]->(m:Person)
WITH gds.graph.project(
  'graphWithProperties',
  n,
  m,
  {
    sourceNodeLabels: labels(n),
    targetNodeLabels: labels(m),
    sourceNodeProperties: n { .age },
    targetNodeProperties: m { .age },
    relationshipType: type(r)
  }
) AS g
RETURN
  g.graphName AS graph, g.nodeCount AS nodes, g.relationshipCount AS rels

Graph projection with optional node properties

Legacy:

CALL gds.graph.project.cypher(
  'graphWithProperties',
  'MATCH (n)
   WHERE n:Book OR n:Person
   RETURN
    id(n) AS id,
    labels(n) AS labels,
    coalesce(n.age, 18) AS age,
    coalesce(n.price, 5.0) AS price,
    n.ratings AS ratings',
  'MATCH (n)-[r:KNOWS|READ]->(m) RETURN id(n) AS source, id(m) AS target, type(r) AS type'
)
YIELD
  graphName, nodeCount AS nodes, relationshipCount AS rels
RETURN graphName, nodes, rels

New:

MATCH (n)
WHERE n:Person OR n:Book
OPTIONAL MATCH (n)-[r:KNOWS|READ]->(m)
WHERE m:Person OR m:Book
WITH gds.graph.project(
  'graphWithProperties',
  n,
  m,
  {
    sourceNodeLabels: labels(n),
    targetNodeLabels: labels(m),
    sourceNodeProperties: n { age: coalesce(n.age, 18), price: coalesce(n.price, 5.0), .ratings },
    targetNodeProperties: m { age: coalesce(m.age, 18), price: coalesce(m.price, 5.0), .ratings },
    relationshipType: type(r)
  }
) AS g
RETURN
  g.graphName AS graph, g.nodeCount AS nodes, g.relationshipCount AS rels

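Similar to the previous example, we pass the labels and properties in the additional map. We can use map projections as well as any other Cypher expression to create the properties. Please see Node properties for more details.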

Relationship properties

Table 5. Side-by-side comparison of the two Cypher projections

Graph projection with relationship properties

Legacy:

CALL gds.graph.project.cypher(
  'readWithProperties',
  'MATCH (n) RETURN id(n) AS id',
  'MATCH (n)-[r:READ]->(m)
    RETURN id(n) AS source, id(m) AS target, r.numberOfPages AS numberOfPages'
)
YIELD
  graphName AS graph, nodeCount AS nodes, relationshipCount AS rels

New:

MATCH (n)-[r:READ]->(m)
WITH gds.graph.project(
  'readWithProperties',
  n,
  m,
  { relationshipProperties: r { .numberOfPages } }
) AS g
RETURN
  g.graphName AS graph, g.nodeCount AS nodes, g.relationshipCount AS rels

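Similar to the previous example, we pass the properties in the additional map, this time under the relationshipProperties key. We can use map projections as well as any other Cypher expression to create the properties. Please see Relationship properties for more details.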
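Because any Cypher expression can be used for a property value, a missing relationship property can be replaced with a default in the same way as the node properties earlier. The following is a sketch only: the graph name 'readWithDefaultPages' and the default value 0 are assumptions made for illustration.

```cypher
// Project READ relationships and fall back to 0 pages
// when a relationship has no numberOfPages property.
MATCH (n)-[r:READ]->(m)
WITH gds.graph.project(
  'readWithDefaultPages',   // hypothetical graph name
  n,
  m,
  { relationshipProperties: { numberOfPages: coalesce(r.numberOfPages, 0) } }
) AS g
RETURN
  g.graphName AS graph, g.nodeCount AS nodes, g.relationshipCount AS rels
```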

Parallel relationships

Table 6. Side-by-side comparison of the two Cypher projections

Graph projection with parallel relationships

Legacy:

CALL gds.graph.project.cypher(
  'readCount',
  'MATCH (n) RETURN id(n) AS id',
  'MATCH (n)-[r:READ]->(m)
    RETURN id(n) AS source, id(m) AS target, type(r) AS type, count(r) AS numberOfReads'
)
YIELD
  graphName AS graph, nodeCount AS nodes, relationshipCount AS rels

New:

MATCH (n)-[r:READ]->(m)
WITH n, m, count(r) AS numberOfReads
WITH gds.graph.project(
  'readCount',
  n,
  m,
  {
    relationshipProperties: { numberOfReads: numberOfReads }
  }
) AS g
RETURN
  g.graphName AS graph, g.nodeCount AS nodes, g.relationshipCount AS rels

Graph projection with parallel relationships and relationship properties

Legacy:

CALL gds.graph.project.cypher(
  'readSums',
  'MATCH (n) RETURN id(n) AS id',
  'MATCH (n)-[r:READ]->(m)
    RETURN id(n) AS source, id(m) AS target, sum(r.numberOfPages) AS numberOfPages'
)
YIELD
  graphName AS graph, nodeCount AS nodes, relationshipCount AS rels

New:

MATCH (n)-[r:READ]->(m)
WITH n, m, sum(r.numberOfPages) AS numberOfPages
WITH gds.graph.project(
  'readSums',
  n,
  m,
  {
    relationshipProperties: { numberOfPages: numberOfPages }
  }
) AS g
RETURN
  g.graphName AS graph, g.nodeCount AS nodes, g.relationshipCount AS rels

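Similar to the legacy Cypher projection, GDS provides no mechanism of its own for aggregating parallel relationships here. Parallel relationships are aggregated in the query, in whatever way suits the graph schema and the data. Please see Parallel relationships for more details.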
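Since the aggregation happens in ordinary Cypher, several aggregates can be computed in one pass and stored as separate relationship properties. A sketch that combines the two examples above; the graph name 'readAggregates' is hypothetical:

```cypher
// Collapse parallel READ relationships between the same pair of nodes
// into one relationship carrying both a count and a sum.
MATCH (n)-[r:READ]->(m)
WITH n, m, count(r) AS numberOfReads, sum(r.numberOfPages) AS numberOfPages
WITH gds.graph.project(
  'readAggregates',   // hypothetical graph name
  n,
  m,
  {
    relationshipProperties: {
      numberOfReads: numberOfReads,
      numberOfPages: numberOfPages
    }
  }
) AS g
RETURN
  g.graphName AS graph, g.nodeCount AS nodes, g.relationshipCount AS rels
```

Grouping by n and m in the first WITH clause is what reduces each set of parallel relationships to a single row before it reaches the projection.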

Projecting filtered graphs

Table 7. Side-by-side comparison of the two Cypher projections

Graph projection with a filtered graph

Legacy:

CALL gds.graph.project.cypher(
  'existingNumberOfPages',
  'MATCH (n) RETURN id(n) AS id',
  'MATCH (n)-[r:READ]->(m)
    WHERE r.numberOfPages IS NOT NULL
    RETURN id(n) AS source, id(m) AS target, r.numberOfPages AS numberOfPages'
)
YIELD
  graphName AS graph, nodeCount AS nodes, relationshipCount AS rels

New:

MATCH (n)
OPTIONAL MATCH (n)-[r:READ]->(m)
WHERE r.numberOfPages IS NOT NULL
WITH gds.graph.project('existingNumberOfPages', n, m, { relationshipProperties: r { .numberOfPages } }) AS g
RETURN
  g.graphName AS graph, g.nodeCount AS nodes, g.relationshipCount AS rels

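Similar to the legacy Cypher projection, we can apply any Cypher method to filter the data before passing it to the Cypher projection. Please see Projecting filtered Neo4j graphs for more details.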
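The translation above keeps every node, because the legacy node query matched all nodes. If only the nodes that take part in a matching relationship are wanted, the OPTIONAL MATCH can be replaced with a plain MATCH. A sketch of that variant, using a hypothetical graph name:

```cypher
// Only nodes that appear in a READ relationship with a numberOfPages
// property are projected; no disconnected nodes are created.
MATCH (n)-[r:READ]->(m)
WHERE r.numberOfPages IS NOT NULL
WITH gds.graph.project(
  'existingNumberOfPagesConnected',   // hypothetical graph name
  n,
  m,
  { relationshipProperties: r { .numberOfPages } }
) AS g
RETURN
  g.graphName AS graph, g.nodeCount AS nodes, g.relationshipCount AS rels
```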

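Projecting undirected graphs

Table 8. Side-by-side comparison of the two Cypher projections

Graph projection with an undirected graph

Legacy:

Not applicable; the legacy Cypher projection cannot project undirected graphs.

New:

MATCH (source)-[r:KNOWS|READ]->(target)
WHERE source:Book OR source:Person
WITH gds.graph.project(
  'graphWithUndirectedRelationships',
  source,
  target,
  {},
  {undirectedRelationshipTypes: ['*']}
) AS g
RETURN g.graphName AS graph, g.nodeCount AS nodes, g.relationshipCount AS rels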

The new Cypher projection can project undirected graphs. Please see Undirected relationships for more details.
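The undirectedRelationshipTypes setting takes a list, with '*' standing for all relationship types. As a sketch, and assuming that a relationship type listed there must also be projected under that name through relationshipType, only some of the types could be made undirected. The graph name 'knowsUndirected' is hypothetical.

```cypher
// Project KNOWS as undirected while READ keeps its direction.
// relationshipType is set so that the projected relationships carry
// the type names referred to in undirectedRelationshipTypes.
MATCH (source)-[r:KNOWS|READ]->(target)
WITH gds.graph.project(
  'knowsUndirected',   // hypothetical graph name
  source,
  target,
  { relationshipType: type(r) },
  { undirectedRelationshipTypes: ['KNOWS'] }
) AS g
RETURN g.graphName AS graph, g.nodeCount AS nodes, g.relationshipCount AS rels
```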

Memory estimation

Table 9. Side-by-side comparison of the two Cypher projections

Memory estimation of the projected graph

Legacy:

CALL gds.graph.project.cypher.estimate(
  'MATCH (n:Person) RETURN id(n) AS id',
  'MATCH (n:Person)-[r:KNOWS]->(m:Person) RETURN id(n) AS source, id(m) AS target'
) YIELD requiredMemory, bytesMin, bytesMax

New:

MATCH (n:Person)-[r:KNOWS]-(m)
WITH count(n) AS nodeCount, count(r) AS relationshipCount
CALL gds.graph.project.estimate('*', '*', {
  nodeCount: nodeCount,
  relationshipCount: relationshipCount
})
YIELD requiredMemory, bytesMin, bytesMax
RETURN requiredMemory, bytesMin, bytesMax

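Since the new Cypher projection is no longer a procedure, it has no .estimate method either. Instead, we can use the gds.graph.project.estimate procedure to estimate the memory requirements of the graph projection.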
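The counts passed to gds.graph.project.estimate only need to approximate the graph that will be projected. If counts closer to the legacy estimate are wanted, the nodes and the directed relationships can be counted in separate steps. This is a sketch under the assumption that the data uses the same Person and KNOWS model as the legacy queries:

```cypher
// Count each Person node once.
MATCH (n:Person)
WITH count(n) AS nodeCount
// Count each directed KNOWS relationship between two Person nodes once.
MATCH (:Person)-[r:KNOWS]->(:Person)
WITH nodeCount, count(r) AS relationshipCount
// Estimate memory for a graph of that size.
CALL gds.graph.project.estimate('*', '*', {
  nodeCount: nodeCount,
  relationshipCount: relationshipCount
})
YIELD requiredMemory, bytesMin, bytesMax
RETURN requiredMemory, bytesMin, bytesMax
```

Note that this form returns no rows when no KNOWS relationship matches, because the second MATCH then produces nothing to aggregate over.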
