Reading relationships

All examples on this page assume that the SparkSession has been initialized with the appropriate authentication options. See the Quickstart examples for more details.

You can read a relationship along with its source and target nodes by specifying the relationship type, the source node labels, and the target node labels.

Example (Scala)
val df = spark.read
    .format("org.neo4j.spark.DataSource")
    .option("relationship", "BOUGHT")
    .option("relationship.source.labels", ":Customer")
    .option("relationship.target.labels", ":Product")
    .load()

df.show()
Example (Python)
df = (
    spark.read.format("org.neo4j.spark.DataSource")
    .option("relationship", "BOUGHT")
    .option("relationship.source.labels", ":Customer")
    .option("relationship.target.labels", ":Product")
    .load()
)

df.show()
Equivalent Cypher query
MATCH (source:Customer)
MATCH (target:Product)
MATCH (source)-[rel:BOUGHT]->(target)
RETURN ...

The exact RETURN clause depends on the value of the relationship.nodes.map option.

DataFrame columns

When reading data with this method, the DataFrame contains the following columns:

  • <rel.id>: the internal Neo4j ID of the relationship

  • <rel.type>: the relationship type

  • rel.[property name]: the relationship properties

Additional columns are added depending on the value of the relationship.nodes.map option:

relationship.nodes.map set to false (the default):

  • <source.id>: the internal Neo4j ID of the source node

  • <source.labels>: the list of labels of the source node

  • <target.id>: the internal Neo4j ID of the target node

  • <target.labels>: the list of labels of the target node

  • source.[property name]: the source node properties

  • target.[property name]: the target node properties

relationship.nodes.map set to true:

  • source: the map of source node properties

  • target: the map of target node properties
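To make the two layouts concrete, here is a rough pure-Python sketch (not the connector's implementation; the data and helper names are hypothetical) of how a single BOUGHT relationship could be projected into the two column sets:

```python
# Illustrative sketch only: shows how one relationship record could be
# projected into the two column layouts described above. The actual mapping
# is performed internally by the Neo4j Spark connector.

rel = {"id": 3189, "type": "BOUGHT", "props": {"order": "ABC100", "quantity": 200}}
source = {"id": 1100, "labels": ["Customer"], "props": {"name": "John", "surname": "Doe", "id": 1}}
target = {"id": 1040, "labels": ["Product"], "props": {"name": "Product 1"}}

def flat_row(rel, source, target):
    """relationship.nodes.map = false: node properties become flat columns."""
    row = {"<rel.id>": rel["id"], "<rel.type>": rel["type"]}
    row.update({f"rel.{k}": v for k, v in rel["props"].items()})
    row["<source.id>"] = source["id"]
    row["<source.labels>"] = source["labels"]
    row.update({f"source.{k}": v for k, v in source["props"].items()})
    row["<target.id>"] = target["id"]
    row["<target.labels>"] = target["labels"]
    row.update({f"target.{k}": v for k, v in target["props"].items()})
    return row

def map_row(rel, source, target):
    """relationship.nodes.map = true: each node collapses into one map column."""
    row = {"<rel.id>": rel["id"], "<rel.type>": rel["type"]}
    row.update({f"rel.{k}": v for k, v in rel["props"].items()})
    row["source"] = {**source["props"], "<labels>": source["labels"], "<id>": source["id"]}
    row["target"] = {**target["props"], "<labels>": target["labels"], "<id>": target["id"]}
    return row
```

Note how the map layout keeps the node's internal ID and labels inside the map under the special `<id>` and `<labels>` keys, matching the result tables below.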

Example 1. relationship.nodes.map set to false
Example (Scala)
val df = spark.read
    .format("org.neo4j.spark.DataSource")
    .option("relationship", "BOUGHT")
    // It can be omitted, since `false` is the default
    .option("relationship.nodes.map", "false")
    .option("relationship.source.labels", ":Customer")
    .option("relationship.target.labels", ":Product")
    .load()

df.show()
Example (Python)
df = (
    spark.read.format("org.neo4j.spark.DataSource")
    .option("relationship", "BOUGHT")
    # It can be omitted, since `false` is the default
    .option("relationship.nodes.map", "false")
    .option("relationship.source.labels", ":Customer")
    .option("relationship.target.labels", ":Product")
    .load()
)

df.show()
Table 1. Results

| <rel.id> | <rel.type> | <source.id> | <source.labels> | source.surname | source.name | source.id | <target.id> | <target.labels> | target.name | rel.order | rel.quantity |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 3189 | BOUGHT | 1100 | [Customer] | Doe | John | 1 | 1040 | [Product] | Product 1 | ABC100 | 200 |
| 3190 | BOUGHT | 1099 | [Customer] | Doe | Jane | 2 | 1039 | [Product] | Product 2 | ABC200 | 100 |

Example 2. relationship.nodes.map set to true
Example (Scala)
val df = spark.read
    .format("org.neo4j.spark.DataSource")
    .option("relationship", "BOUGHT")
    .option("relationship.nodes.map", "true")
    .option("relationship.source.labels", ":Customer")
    .option("relationship.target.labels", ":Product")
    .load()

// Use `false` to print the whole DataFrame
df.show(false)
Example (Python)
df = (
    spark.read.format("org.neo4j.spark.DataSource")
    .option("relationship", "BOUGHT")
    .option("relationship.nodes.map", "true")
    .option("relationship.source.labels", ":Customer")
    .option("relationship.target.labels", ":Product")
    .load()
)

# Use `false` to print the whole DataFrame
df.show(truncate=False)
Table 2. Results

| <rel.id> | <rel.type> | <source> | <target> | rel.order | rel.quantity |
|---|---|---|---|---|---|
| 3189 | BOUGHT | {surname: "Doe", name: "John", id: 1, <labels>: ["Customer"], <id>: 1100} | {name: "Product 1", <labels>: ["Product"], <id>: 1040} | ABC100 | 200 |
| 3190 | BOUGHT | {surname: "Doe", name: "Jane", id: 2, <labels>: ["Customer"], <id>: 1099} | {name: "Product 2", <labels>: ["Product"], <id>: 1039} | ABC200 | 100 |

The schema of the node and relationship property columns is inferred as described in Schema inference.
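As a rough illustration of the idea behind schema inference (a hypothetical sketch, not the connector's actual logic, which samples records from Neo4j), a column type can be derived from sampled property values:

```python
# Hypothetical sketch of value-based type inference: sample the property
# values of a column and pick a Spark-like type name for it.

def infer_type(values):
    types = {type(v) for v in values if v is not None}
    if types <= {bool}:
        return "BooleanType"
    if types <= {int}:
        return "LongType"
    if types <= {int, float}:
        return "DoubleType"
    if types <= {str}:
        return "StringType"
    return "StringType"  # fall back to string for mixed samples
```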

Filtering

You can use the where and filter functions in Spark to filter on properties of the relationship, the source node, or the target node. The correct format for the filter depends on the value of the relationship.nodes.map option.

relationship.nodes.map set to false (the default):

  • `source.[property]` for source node properties

  • `rel.[property]` for relationship properties

  • `target.[property]` for target node properties

relationship.nodes.map set to true:

  • `<source>`.`[property]` for source node map properties

  • `<rel>`.`[property]` for relationship map properties

  • `<target>`.`[property]` for target node map properties
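The backticks matter: flat column names such as source.id contain a literal dot, so the whole name must be quoted as one identifier, while with nodes.map set to true the map column and the key are quoted separately. A small hypothetical helper (not part of the connector) makes the two quoting styles explicit:

```python
# Hypothetical helpers illustrating the two quoting styles used in the
# where()/filter() expressions below. Not part of the connector API.

def flat_filter(column, op, value):
    """nodes.map = false: backtick-quote the whole dotted column name."""
    return f"`{column}` {op} {value}"

def map_filter(entity, prop, op, value):
    """nodes.map = true: quote the map column and the property key separately."""
    return f"`<{entity}>`.`{prop}` {op} {value}"

print(flat_filter("source.id", ">", 1))    # `source.id` > 1
print(map_filter("source", "id", ">", 1))  # `<source>`.`id` > 1
```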

Examples

Example 3. relationship.nodes.map set to false
Example (Scala)
val df = spark.read
    .format("org.neo4j.spark.DataSource")
    .option("relationship", "BOUGHT")
    .option("relationship.nodes.map", "false")
    .option("relationship.source.labels", ":Customer")
    .option("relationship.target.labels", ":Product")
    .load()

df.where("`source.id` > 1").show()
Example (Python)
df = (
    spark.read.format("org.neo4j.spark.DataSource")
    .option("relationship", "BOUGHT")
    .option("relationship.nodes.map", "false")
    .option("relationship.source.labels", ":Customer")
    .option("relationship.target.labels", ":Product")
    .load()
)

df.where("`source.id` > 1").show()
Table 3. Results

| <rel.id> | <rel.type> | <source.id> | <source.labels> | source.surname | source.name | source.id | <target.id> | <target.labels> | target.name | rel.order | rel.quantity |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 3190 | BOUGHT | 1099 | [Customer] | Doe | Jane | 2 | 1039 | [Product] | Product 2 | ABC200 | 100 |

Example 4. relationship.nodes.map set to true
Example (Scala)
val df = spark.read
    .format("org.neo4j.spark.DataSource")
    .option("relationship", "BOUGHT")
    .option("relationship.nodes.map", "true")
    .option("relationship.source.labels", ":Customer")
    .option("relationship.target.labels", ":Product")
    .load()

// Use `false` to print the whole DataFrame
df.where("`<source>`.`id` > 1").show(false)
Example (Python)
df = (
    spark.read.format("org.neo4j.spark.DataSource")
    .option("relationship", "BOUGHT")
    .option("relationship.nodes.map", "true")
    .option("relationship.source.labels", ":Customer")
    .option("relationship.target.labels", ":Product")
    .load()
)

# Use `false` to print the whole DataFrame
df.where("`<source>`.`id` > 1").show(truncate=False)
Table 4. Results

| <rel.id> | <rel.type> | <source> | <target> | rel.order | rel.quantity |
|---|---|---|---|---|---|
| 3190 | BOUGHT | {surname: "Doe", name: "Jane", id: 2, <labels>: ["Customer"], <id>: 1099} | {name: "Product 2", <labels>: ["Product"], <id>: 1039} | ABC200 | 100 |