Reading relationships

All examples on this page assume that the SparkSession has been initialized with the appropriate authentication options. See the Quickstart examples for more details.

You can read a relationship along with its source and target nodes by specifying the relationship type, the source node labels, and the target node labels.

Example (Scala)
val df = spark.read
    .format("org.neo4j.spark.DataSource")
    .option("relationship", "BOUGHT")
    .option("relationship.source.labels", ":Customer")
    .option("relationship.target.labels", ":Product")
    .load()

df.show()
Example (Python)
df = (
    spark.read.format("org.neo4j.spark.DataSource")
    .option("relationship", "BOUGHT")
    .option("relationship.source.labels", ":Customer")
    .option("relationship.target.labels", ":Product")
    .load()
)

df.show()
Equivalent Cypher query
MATCH (source:Customer)
MATCH (target:Product)
MATCH (source)-[rel:BOUGHT]->(target)
RETURN ...

The exact RETURN clause depends on the value of the relationship.nodes.map option.

DataFrame columns

When reading data with this method, the DataFrame contains the following columns:

  • <rel.id>: the internal Neo4j ID of the relationship

  • <rel.type>: the relationship type

  • rel.[property name]: the relationship properties

Additional columns are added depending on the value of the relationship.nodes.map option:

relationship.nodes.map set to false (the default):

  • <source.id>: the internal Neo4j ID of the source node

  • <source.labels>: the list of labels of the source node

  • <target.id>: the internal Neo4j ID of the target node

  • <target.labels>: the list of labels of the target node

  • source.[property name]: the source node properties

  • target.[property name]: the target node properties

relationship.nodes.map set to true:

  • source: the map of source node properties

  • target: the map of target node properties
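To make the two layouts concrete, here is a rough pure-Python sketch (not the connector's implementation; the data and helper names are hypothetical) of how a single BOUGHT relationship could be projected into the two column sets:

```python
# Illustrative sketch only: shows how one relationship record could be
# projected into the two column layouts described above. The actual mapping
# is performed internally by the Neo4j Spark connector.

rel = {"id": 3189, "type": "BOUGHT", "props": {"order": "ABC100", "quantity": 200}}
source = {"id": 1100, "labels": ["Customer"], "props": {"name": "John", "surname": "Doe", "id": 1}}
target = {"id": 1040, "labels": ["Product"], "props": {"name": "Product 1"}}

def flat_row(rel, source, target):
    """relationship.nodes.map = false: node properties become flat columns."""
    row = {"<rel.id>": rel["id"], "<rel.type>": rel["type"]}
    row.update({f"rel.{k}": v for k, v in rel["props"].items()})
    row["<source.id>"] = source["id"]
    row["<source.labels>"] = source["labels"]
    row.update({f"source.{k}": v for k, v in source["props"].items()})
    row["<target.id>"] = target["id"]
    row["<target.labels>"] = target["labels"]
    row.update({f"target.{k}": v for k, v in target["props"].items()})
    return row

def map_row(rel, source, target):
    """relationship.nodes.map = true: each node collapses into one map column."""
    row = {"<rel.id>": rel["id"], "<rel.type>": rel["type"]}
    row.update({f"rel.{k}": v for k, v in rel["props"].items()})
    row["source"] = {**source["props"], "<labels>": source["labels"], "<id>": source["id"]}
    row["target"] = {**target["props"], "<labels>": target["labels"], "<id>": target["id"]}
    return row
```

Note how the map layout keeps the node's internal ID and labels inside the map under the special `<id>` and `<labels>` keys, matching the result tables below.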

Example 1. relationship.nodes.map set to false
Example (Scala)
val df = spark.read
    .format("org.neo4j.spark.DataSource")
    .option("relationship", "BOUGHT")
    // It can be omitted, since `false` is the default
    .option("relationship.nodes.map", "false")
    .option("relationship.source.labels", ":Customer")
    .option("relationship.target.labels", ":Product")
    .load()

df.show()
Example (Python)
df = (
    spark.read.format("org.neo4j.spark.DataSource")
    .option("relationship", "BOUGHT")
    # It can be omitted, since `false` is the default
    .option("relationship.nodes.map", "false")
    .option("relationship.source.labels", ":Customer")
    .option("relationship.target.labels", ":Product")
    .load()
)

df.show()
Table 1. Results

| <rel.id> | <rel.type> | <source.id> | <source.labels> | source.surname | source.name | source.id | <target.id> | <target.labels> | target.name | rel.order | rel.quantity |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 3189 | BOUGHT | 1100 | [Customer] | Doe | John | 1 | 1040 | [Product] | Product 1 | ABC100 | 200 |
| 3190 | BOUGHT | 1099 | [Customer] | Doe | Jane | 2 | 1039 | [Product] | Product 2 | ABC200 | 100 |

Example 2. relationship.nodes.map set to true
Example (Scala)
val df = spark.read
    .format("org.neo4j.spark.DataSource")
    .option("relationship", "BOUGHT")
    .option("relationship.nodes.map", "true")
    .option("relationship.source.labels", ":Customer")
    .option("relationship.target.labels", ":Product")
    .load()

// Use `false` to print the whole DataFrame
df.show(false)
Example (Python)
df = (
    spark.read.format("org.neo4j.spark.DataSource")
    .option("relationship", "BOUGHT")
    .option("relationship.nodes.map", "true")
    .option("relationship.source.labels", ":Customer")
    .option("relationship.target.labels", ":Product")
    .load()
)

# Use `false` to print the whole DataFrame
df.show(truncate=False)
Table 2. Results

| <rel.id> | <rel.type> | <source> | <target> | rel.order | rel.quantity |
|---|---|---|---|---|---|
| 3189 | BOUGHT | {surname: "Doe", name: "John", id: 1, <labels>: ["Customer"], <id>: 1100} | {name: "Product 1", <labels>: ["Product"], <id>: 1040} | ABC100 | 200 |
| 3190 | BOUGHT | {surname: "Doe", name: "Jane", id: 2, <labels>: ["Customer"], <id>: 1099} | {name: "Product 2", <labels>: ["Product"], <id>: 1039} | ABC200 | 100 |

The schema of the node and relationship property columns is inferred as described in Schema inference.
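As a rough illustration of the idea behind schema inference (a hypothetical sketch, not the connector's actual logic, which samples records from Neo4j), a column type can be derived from sampled property values:

```python
# Hypothetical sketch of value-based type inference: sample the property
# values of a column and pick a Spark-like type name for it.

def infer_type(values):
    types = {type(v) for v in values if v is not None}
    if types <= {bool}:
        return "BooleanType"
    if types <= {int}:
        return "LongType"
    if types <= {int, float}:
        return "DoubleType"
    if types <= {str}:
        return "StringType"
    return "StringType"  # fall back to string for mixed samples
```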

Filtering

You can use the where and filter functions in Spark to filter on properties of the relationship, the source node, or the target node. The correct format for the filter depends on the value of the relationship.nodes.map option.

relationship.nodes.map set to false (the default):

  • `source.[property]` for source node properties

  • `rel.[property]` for relationship properties

  • `target.[property]` for target node properties

relationship.nodes.map set to true:

  • `<source>`.`[property]` for source node map properties

  • `<rel>`.`[property]` for relationship map properties

  • `<target>`.`[property]` for target node map properties
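The backticks matter: flat column names such as source.id contain a literal dot, so the whole name must be quoted as one identifier, while with nodes.map set to true the map column and the key are quoted separately. A small hypothetical helper (not part of the connector) makes the two quoting styles explicit:

```python
# Hypothetical helpers illustrating the two quoting styles used in the
# where()/filter() expressions below. Not part of the connector API.

def flat_filter(column, op, value):
    """nodes.map = false: backtick-quote the whole dotted column name."""
    return f"`{column}` {op} {value}"

def map_filter(entity, prop, op, value):
    """nodes.map = true: quote the map column and the property key separately."""
    return f"`<{entity}>`.`{prop}` {op} {value}"

print(flat_filter("source.id", ">", 1))    # `source.id` > 1
print(map_filter("source", "id", ">", 1))  # `<source>`.`id` > 1
```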

Examples

Example 3. relationship.nodes.map set to false
Example (Scala)
val df = spark.read
    .format("org.neo4j.spark.DataSource")
    .option("relationship", "BOUGHT")
    .option("relationship.nodes.map", "false")
    .option("relationship.source.labels", ":Customer")
    .option("relationship.target.labels", ":Product")
    .load()

df.where("`source.id` > 1").show()
Example (Python)
df = (
    spark.read.format("org.neo4j.spark.DataSource")
    .option("relationship", "BOUGHT")
    .option("relationship.nodes.map", "false")
    .option("relationship.source.labels", ":Customer")
    .option("relationship.target.labels", ":Product")
    .load()
)

df.where("`source.id` > 1").show()
Table 3. Results

| <rel.id> | <rel.type> | <source.id> | <source.labels> | source.surname | source.name | source.id | <target.id> | <target.labels> | target.name | rel.order | rel.quantity |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 3190 | BOUGHT | 1099 | [Customer] | Doe | Jane | 2 | 1039 | [Product] | Product 2 | ABC200 | 100 |

Example 4. relationship.nodes.map set to true
Example (Scala)
val df = spark.read
    .format("org.neo4j.spark.DataSource")
    .option("relationship", "BOUGHT")
    .option("relationship.nodes.map", "true")
    .option("relationship.source.labels", ":Customer")
    .option("relationship.target.labels", ":Product")
    .load()

// Use `false` to print the whole DataFrame
df.where("`<source>`.`id` > 1").show(false)
Example (Python)
df = (
    spark.read.format("org.neo4j.spark.DataSource")
    .option("relationship", "BOUGHT")
    .option("relationship.nodes.map", "true")
    .option("relationship.source.labels", ":Customer")
    .option("relationship.target.labels", ":Product")
    .load()
)

# Use `false` to print the whole DataFrame
df.where("`<source>`.`id` > 1").show(truncate=False)
Table 4. Results

| <rel.id> | <rel.type> | <source> | <target> | rel.order | rel.quantity |
|---|---|---|---|---|---|
| 3190 | BOUGHT | {surname: "Doe", name: "Jane", id: 2, <labels>: ["Customer"], <id>: 1099} | {name: "Product 2", <labels>: ["Product"], <id>: 1039} | ABC200 | 100 |