Reading relationships

All the examples on this page assume that the `SparkSession` has been initialized with the appropriate authentication options.

You can read a relationship along with its source and target nodes by specifying the relationship type, the source node labels, and the target node labels.
Scala:

```scala
val df = spark.read
  .format("org.neo4j.spark.DataSource")
  .option("relationship", "BOUGHT")
  .option("relationship.source.labels", ":Customer")
  .option("relationship.target.labels", ":Product")
  .load()

df.show()
```

Python:

```python
df = (
    spark.read.format("org.neo4j.spark.DataSource")
    .option("relationship", "BOUGHT")
    .option("relationship.source.labels", ":Customer")
    .option("relationship.target.labels", ":Product")
    .load()
)
df.show()
```
The equivalent Cypher query is:

```cypher
MATCH (source:Customer)
MATCH (target:Product)
MATCH (source)-[rel:BOUGHT]->(target)
RETURN ...
```

The exact `RETURN` clause depends on the value of the `relationship.nodes.map` option.
DataFrame columns

When you read data with this method, the DataFrame contains the following columns:

- `<rel.id>`: the Neo4j internal ID of the relationship
- `<rel.type>`: the relationship type
- `rel.[property name]`: the relationship properties

Additional columns are added depending on the value of the `relationship.nodes.map` option:
| `relationship.nodes.map` set to `false` (default) | `relationship.nodes.map` set to `true` |
|---|---|
| `<source.id>`: internal ID of the source node<br>`<source.labels>`: labels of the source node<br>`source.[property name]`: source node properties<br>`<target.id>`: internal ID of the target node<br>`<target.labels>`: labels of the target node<br>`target.[property name]`: target node properties | `<source>`: map of all source node attributes<br>`<target>`: map of all target node attributes |
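To make the difference between the two layouts concrete, here is a plain-Python sketch (no Spark or Neo4j required) of the same example row under each setting. The values mirror the example tables on this page; this is illustrative data only, not output produced by the connector.

```python
# relationship.nodes.map = false: node attributes become flattened,
# prefixed top-level columns.
row_flat = {
    "<rel.id>": 3189,
    "<rel.type>": "BOUGHT",
    "<source.id>": 1100,
    "<source.labels>": ["Customer"],
    "source.surname": "Doe",
    "source.name": "John",
    "source.id": 1,
    "<target.id>": 1040,
    "<target.labels>": ["Product"],
    "target.name": "Product 1",
    "rel.order": "ABC100",
    "rel.quantity": 200,
}

# relationship.nodes.map = true: each node collapses into a single
# map column; the internal ID and labels keep their angle-bracket keys.
row_mapped = {
    "<rel.id>": 3189,
    "<rel.type>": "BOUGHT",
    "<source>": {
        "surname": "Doe",
        "name": "John",
        "id": 1,
        "<labels>": ["Customer"],
        "<id>": 1100,
    },
    "<target>": {"name": "Product 1", "<labels>": ["Product"], "<id>": 1040},
    "rel.order": "ABC100",
    "rel.quantity": 200,
}

# The relationship-level columns are identical in both layouts;
# only the node columns change shape.
shared = {k: v for k, v in row_flat.items() if k.startswith(("<rel", "rel."))}
print(shared)
```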
`relationship.nodes.map` set to `false`
Scala:

```scala
val df = spark.read
  .format("org.neo4j.spark.DataSource")
  .option("relationship", "BOUGHT")
  // It can be omitted, since `false` is the default
  .option("relationship.nodes.map", "false")
  .option("relationship.source.labels", ":Customer")
  .option("relationship.target.labels", ":Product")
  .load()

df.show()
```

Python:

```python
df = (
    spark.read.format("org.neo4j.spark.DataSource")
    .option("relationship", "BOUGHT")
    # It can be omitted, since `false` is the default
    .option("relationship.nodes.map", "false")
    .option("relationship.source.labels", ":Customer")
    .option("relationship.target.labels", ":Product")
    .load()
)
df.show()
```
| `<rel.id>` | `<rel.type>` | `<source.id>` | `<source.labels>` | `source.surname` | `source.name` | `source.id` | `<target.id>` | `<target.labels>` | `target.name` | `rel.order` | `rel.quantity` |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 3189 | BOUGHT | 1100 | [Customer] | Doe | John | 1 | 1040 | [Product] | Product 1 | ABC100 | 200 |
| 3190 | BOUGHT | 1099 | [Customer] | Doe | Jane | 2 | 1039 | [Product] | Product 2 | ABC200 | 100 |
`relationship.nodes.map` set to `true`
Scala:

```scala
val df = spark.read
  .format("org.neo4j.spark.DataSource")
  .option("relationship", "BOUGHT")
  .option("relationship.nodes.map", "true")
  .option("relationship.source.labels", ":Customer")
  .option("relationship.target.labels", ":Product")
  .load()

// Use `false` to print the whole DataFrame
df.show(false)
```

Python:

```python
df = (
    spark.read.format("org.neo4j.spark.DataSource")
    .option("relationship", "BOUGHT")
    .option("relationship.nodes.map", "true")
    .option("relationship.source.labels", ":Customer")
    .option("relationship.target.labels", ":Product")
    .load()
)
# Use `truncate=False` to print the whole DataFrame
df.show(truncate=False)
```
| `<rel.id>` | `<rel.type>` | `<source>` | `<target>` | `rel.order` | `rel.quantity` |
|---|---|---|---|---|---|
| 3189 | BOUGHT | `{surname: "Doe", name: "John", id: 1, <labels>: ["Customer"], <id>: 1100}` | `{name: "Product 1", <labels>: ["Product"], <id>: 1040}` | ABC100 | 200 |
| 3190 | BOUGHT | `{surname: "Doe", name: "Jane", id: 2, <labels>: ["Customer"], <id>: 1099}` | `{name: "Product 2", <labels>: ["Product"], <id>: 1039}` | ABC200 | 100 |
The schema of the node and relationship property columns is inferred as described in Schema inference.
Filtering

You can use Spark's `where` and `filter` functions to filter on properties of the relationship, the source node, or the target node. The correct format of the filter depends on the value of the `relationship.nodes.map` option:
| `relationship.nodes.map` set to `false` (default) | `relationship.nodes.map` set to `true` |
|---|---|
| `` `source.id` > 1 `` | `` `<source>`.`id` > 1 `` |
Examples

`relationship.nodes.map` set to `false`
Scala:

```scala
val df = spark.read
  .format("org.neo4j.spark.DataSource")
  .option("relationship", "BOUGHT")
  .option("relationship.nodes.map", "false")
  .option("relationship.source.labels", ":Customer")
  .option("relationship.target.labels", ":Product")
  .load()

df.where("`source.id` > 1").show()
```

Python:

```python
df = (
    spark.read.format("org.neo4j.spark.DataSource")
    .option("relationship", "BOUGHT")
    .option("relationship.nodes.map", "false")
    .option("relationship.source.labels", ":Customer")
    .option("relationship.target.labels", ":Product")
    .load()
)
df.where("`source.id` > 1").show()
| `<rel.id>` | `<rel.type>` | `<source.id>` | `<source.labels>` | `source.surname` | `source.name` | `source.id` | `<target.id>` | `<target.labels>` | `target.name` | `rel.order` | `rel.quantity` |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 3190 | BOUGHT | 1099 | [Customer] | Doe | Jane | 2 | 1039 | [Product] | Product 2 | ABC200 | 100 |
`relationship.nodes.map` set to `true`
Scala:

```scala
val df = spark.read
  .format("org.neo4j.spark.DataSource")
  .option("relationship", "BOUGHT")
  .option("relationship.nodes.map", "true")
  .option("relationship.source.labels", ":Customer")
  .option("relationship.target.labels", ":Product")
  .load()

// Use `false` to print the whole DataFrame
df.where("`<source>`.`id` > 1").show(false)
```

Python:

```python
df = (
    spark.read.format("org.neo4j.spark.DataSource")
    .option("relationship", "BOUGHT")
    .option("relationship.nodes.map", "true")
    .option("relationship.source.labels", ":Customer")
    .option("relationship.target.labels", ":Product")
    .load()
)
# Use `truncate=False` to print the whole DataFrame
df.where("`<source>`.`id` > 1").show(truncate=False)
```
| `<rel.id>` | `<rel.type>` | `<source>` | `<target>` | `rel.order` | `rel.quantity` |
|---|---|---|---|---|---|
| 3190 | BOUGHT | `{surname: "Doe", name: "Jane", id: 2, <labels>: ["Customer"], <id>: 1099}` | `{name: "Product 2", <labels>: ["Product"], <id>: 1039}` | ABC200 | 100 |