导出到 CSV

CSV 导出过程将数据导出为 Python 和 R 生态系统中的数据科学库更支持的格式。我们也可能希望将数据导出为 JSON 格式,以便导入其他工具或用于查询结果的通用共享。本节中描述的过程支持导出到文件或作为流。

对于 apoc.export.csv.allapoc.export.csv.dataapoc.export.csv.graph,节点和关系属性按字母顺序排列,使用以下结构:

_id,_labels,<自然排序的节点属性列表>,_start,_end,_type,<自然排序的关系属性列表>.

对于包含节点属性 agecitykidsmalenamestreet 以及关系属性 barfoo 的图,我们将得到以下结果:

_id,_labels,age,city,kids,male,name,street,_start,_end,_type,bar,foo

导出的标签按字母顺序排列。labels() 函数的输出未排序,请与 apoc.coll.sort() 结合使用。

请注意,为了执行正确的点序列化,不建议导出具有坐标 x,y 和 crs: 'wgs-84' 的点,例如 point({x: 56.7, y: 12.78, crs: 'wgs-84'})。否则,该点将以经度和纬度(以及高度)而不是 x 和 y(以及 z)导出。

可用过程

下表描述了可用过程:

合格名称 类型

apoc.export.csv.all
apoc.export.csv.all(file STRING, config MAP<STRING, ANY>) - 将完整数据库导出到提供的 CSV 文件。

过程

apoc.export.csv.data
apoc.export.csv.data(nodes LIST<NODE>, rels LIST<RELATIONSHIP>, file STRING, config MAP<STRING, ANY>) - 将给定的 NODERELATIONSHIP 值导出到提供的 CSV 文件。

过程

apoc.export.csv.graph
apoc.export.csv.graph(graph MAP<STRING, ANY>, file STRING, config MAP<STRING, ANY>) - 将给定图导出到提供的 CSV 文件。

过程

apoc.export.csv.query
apoc.export.csv.query(query STRING, file STRING, config MAP<STRING, ANY>) - 将运行给定 Cypher 查询的结果导出到提供的 CSV 文件。

过程

导出到文件

默认情况下,禁用导出到文件系统。我们可以通过在 apoc.conf 中设置以下属性来启用它:

apoc.conf
apoc.export.file.enabled=true

有关访问 apoc.conf 的更多信息,请参阅配置选项一章。

如果我们尝试使用任何导出过程而未首先设置此属性,我们将收到以下错误消息:

调用过程失败:原因:java.lang.RuntimeException: 文件导出未启用,请在您的 apoc.conf 中设置 apoc.export.file.enabled=true。否则,如果您在没有文件系统访问权限的云环境中运行,请使用 {stream:true} 配置并使用 'file' 参数的 null 值将导出流回您的客户端。

导出文件会写入 import 目录,该目录由 server.directories.import 属性定义。这意味着我们提供的任何文件路径都是相对于此目录的。如果我们尝试写入绝对路径,例如 /tmp/filename,我们将收到类似以下内容的错误消息:

调用过程失败:原因:java.io.FileNotFoundException: /path/to/neo4j/import/tmp/fileName (没有此类文件或目录)

我们可以通过在 apoc.conf 中设置以下属性来启用在文件系统上的任何位置写入:

apoc.conf
apoc.import.file.use_neo4j_config=false

Neo4j 现在将能够在文件系统上的任何位置写入,因此在设置此属性之前请确保这是您的意图。

导出流

如果我们不想导出到文件,可以通过传入 null 文件名并提供 stream:true 配置来将结果流式传输回 data 列。

示例

本节包含演示如何使用 CSV 导出过程的示例。这些示例基于电影数据集,可以通过运行以下 Cypher 查询导入:

CREATE (TheMatrix:Movie {title:'The Matrix', released:1999, tagline:'Welcome to the Real World'})
CREATE (Keanu:Person {name:'Keanu Reeves', born:1964})
CREATE (Carrie:Person {name:'Carrie-Anne Moss', born:1967})
CREATE (Laurence:Person {name:'Laurence Fishburne', born:1961})
CREATE (Hugo:Person {name:'Hugo Weaving', born:1960})
CREATE (LillyW:Person {name:'Lilly Wachowski', born:1967})
CREATE (LanaW:Person {name:'Lana Wachowski', born:1965})
CREATE (JoelS:Person {name:'Joel Silver', born:1952})
CREATE
(Keanu)-[:ACTED_IN {roles:['Neo']}]->(TheMatrix),
(Carrie)-[:ACTED_IN {roles:['Trinity']}]->(TheMatrix),
(Laurence)-[:ACTED_IN {roles:['Morpheus']}]->(TheMatrix),
(Hugo)-[:ACTED_IN {roles:['Agent Smith']}]->(TheMatrix),
(LillyW)-[:DIRECTED]->(TheMatrix),
(LanaW)-[:DIRECTED]->(TheMatrix),
(JoelS)-[:PRODUCED]->(TheMatrix);

下面的 Neo4j Browser 可视化显示了导入的图:

play movies

将整个数据库导出到 CSV

apoc.export.csv.all 过程将整个数据库导出到 CSV 文件或作为流。

以下查询将整个数据库导出到文件 movies.csv
CALL apoc.export.csv.all("movies.csv", {})
表 1. 结果
文件 来源 格式 节点 关系 属性 时间 批处理大小 批次 完成 数据

"movies.csv"

"database: nodes(8), rels(7)"

"csv"

8

7

21

39

15

20000

1

TRUE

NULL

movies.csv 的内容如下所示:

movies.csv
"_id","_labels","born","name","released","tagline","title","_start","_end","_type","roles"
"188",":Movie","","","1999","Welcome to the Real World","The Matrix",,,,
"189",":Person","1964","Keanu Reeves","","","",,,,
"190",":Person","1967","Carrie-Anne Moss","","","",,,,
"191",":Person","1961","Laurence Fishburne","","","",,,,
"192",":Person","1960","Hugo Weaving","","","",,,,
"193",":Person","1967","Lilly Wachowski","","","",,,,
"194",":Person","1965","Lana Wachowski","","","",,,,
"195",":Person","1952","Joel Silver","","","",,,,
,,,,,,,"189","188","ACTED_IN","[""Neo""]"
,,,,,,,"190","188","ACTED_IN","[""Trinity""]"
,,,,,,,"191","188","ACTED_IN","[""Morpheus""]"
,,,,,,,"192","188","ACTED_IN","[""Agent Smith""]"
,,,,,,,"193","188","DIRECTED",""
,,,,,,,"194","188","DIRECTED",""
,,,,,,,"195","188","PRODUCED",""
以下查询在 data 列中返回整个数据库的流:
CALL apoc.export.csv.all(null, {stream:true})
YIELD file, nodes, relationships, properties, data
RETURN file, nodes, relationships, properties, data
表 2. 结果
文件 节点 关系 属性 数据

NULL

8

7

21

"\"_id\",\"_labels\",\"born\",\"name\",\"released\",\"tagline\",\"title\",\"_start\",\"_end\",\"_type\",\"roles\" \"188\",\":Movie\",\"\",\"\",\"1999\",\"Welcome to the Real World\",\"The Matrix\",,,, \"189\",\":Person\",\"1964\",\"Keanu Reeves\",\"\",\"\",\"\",,,, \"190\",\":Person\",\"1967\",\"Carrie-Anne Moss\",\"\",\"\",\"\",,,, \"191\",\":Person\",\"1961\",\"Laurence Fishburne\",\"\",\"\",\"\",,,, \"192\",\":Person\",\"1960\",\"Hugo Weaving\",\"\",\"\",\"\",,,, \"193\",\":Person\",\"1967\",\"Lilly Wachowski\",\"\",\"\",\"\",,,, \"194\",\":Person\",\"1965\",\"Lana Wachowski\",\"\",\"\",\"\",,,, \"195\",\":Person\",\"1952\",\"Joel Silver\",\"\",\"\",\"\",,,, ,,,,,,,\"189\",\"188\",\"ACTED_IN\",\"[\"\"Neo\"\"]\" ,,,,,,,\"190\",\"188\",\"ACTED_IN\",\"[\"\"Trinity\"\"]\" ,,,,,,,\"191\",\"188\",\"ACTED_IN\",\"[\"\"Morpheus\"\"]\" ,,,,,,,\"192\",\"188\",\"ACTED_IN\",\"[\"\"Agent Smith\"\"]\" ,,,,,,,\"193\",\"188\",\"DIRECTED\",\"\" ,,,,,,,\"194\",\"188\",\"DIRECTED\",\"\" ,,,,,,,\"195\",\"188\",\"PRODUCED\",\"\" "

将指定节点和关系导出到 CSV

apoc.export.csv.data 过程将指定节点和关系导出到 CSV 文件或作为流。

以下查询将所有带有 :Person 标签且 name 属性以 L 开头的节点导出到文件 movies-l.csv
MATCH (person:Person)
WHERE person.name STARTS WITH "L"
WITH collect(person) AS people
CALL apoc.export.csv.data(people, [], "movies-l.csv", {})
YIELD file, source, format, nodes, relationships, properties, time, rows, batchSize, batches, done, data
RETURN file, source, format, nodes, relationships, properties, time, rows, batchSize, batches, done, data
表 3. 结果
文件 来源 格式 节点 关系 属性 时间 批处理大小 批次 完成 数据

"movies-l.csv"

"data: nodes(3), rels(0)"

"csv"

3

0

6

2

3

20000

1

TRUE

NULL

movies-l.csv 的内容如下所示:

"_id","_labels","born","name","_start","_end","_type"
"191",":Person","1961","Laurence Fishburne",,,
"193",":Person","1967","Lilly Wachowski",,,
"194",":Person","1965","Lana Wachowski",,,
以下查询将所有 ACTED_IN 关系以及该关系两侧带有 PersonMovie 标签的节点导出到文件 movies-actedIn.csv
MATCH (person:Person)-[actedIn:ACTED_IN]->(movie:Movie)
WITH collect(DISTINCT person) AS people, collect(DISTINCT movie) AS movies, collect(actedIn) AS actedInRels
CALL apoc.export.csv.data(people + movies, actedInRels, "movies-actedIn.csv", {})
YIELD file, source, format, nodes, relationships, properties, time, rows, batchSize, batches, done, data
RETURN file, source, format, nodes, relationships, properties, time, rows, batchSize, batches, done, data
表 4. 结果
文件 来源 格式 节点 关系 属性 时间 批处理大小 批次 完成 数据

"movies-actedIn.csv"

"data: nodes(5), rels(4)"

"csv"

5

4

15

2

9

20000

1

TRUE

NULL

movies-actedIn.csv 的内容如下所示:

"_id","_labels","born","name","released","tagline","title","_start","_end","_type","roles"
"189",":Person","1964","Keanu Reeves","","","",,,,
"190",":Person","1967","Carrie-Anne Moss","","","",,,,
"191",":Person","1961","Laurence Fishburne","","","",,,,
"192",":Person","1960","Hugo Weaving","","","",,,,
"188",":Movie","","","1999","Welcome to the Real World","The Matrix",,,,
,,,,,,,"189","188","ACTED_IN","[""Neo""]"
,,,,,,,"190","188","ACTED_IN","[""Trinity""]"
,,,,,,,"191","188","ACTED_IN","[""Morpheus""]"
,,,,,,,"192","188","ACTED_IN","[""Agent Smith""]"
以下查询在 data 列中返回所有 ACTED_IN 关系以及该关系两侧带有 PersonMovie 标签的节点流:
MATCH (person:Person)-[actedIn:ACTED_IN]->(movie:Movie)
WITH collect(DISTINCT person) AS people, collect(DISTINCT movie) AS movies, collect(actedIn) AS actedInRels
CALL apoc.export.csv.data(people + movies, actedInRels, null, {stream: true})
YIELD file, nodes, relationships, properties, data
RETURN file, nodes, relationships, properties, data
表 5. 结果
文件 节点 关系 属性 数据

NULL

5

4

15

"\"_id\",\"_labels\",\"born\",\"name\",\"released\",\"tagline\",\"title\",\"_start\",\"_end\",\"_type\",\"roles\" \"190\",\":Person\",\"1967\",\"Carrie-Anne Moss\",\"\",\"\",\"\",,,, \"189\",\":Person\",\"1964\",\"Keanu Reeves\",\"\",\"\",\"\",,,, \"191\",\":Person\",\"1961\",\"Laurence Fishburne\",\"\",\"\",\"\",,,, \"192\",\":Person\",\"1960\",\"Hugo Weaving\",\"\",\"\",\"\",,,, \"188\",\":Movie\",\"\",\"\",\"1999\",\"Welcome to the Real World\",\"The Matrix\",,,, ,,,,,,,\"189\",\"188\",\"ACTED_IN\",\"[\"\"Neo\"\"]\" ,,,,,,,\"190\",\"188\",\"ACTED_IN\",\"[\"\"Trinity\"\"]\" ,,,,,,,\"191\",\"188\",\"ACTED_IN\",\"[\"\"Morpheus\"\"]\" ,,,,,,,\"192\",\"188\",\"ACTED_IN\",\"[\"\"Agent Smith\"\"]\" "

将 Cypher 查询结果导出到 CSV

apoc.export.csv.query 过程将 Cypher 查询的结果导出到 CSV 文件或作为流。

以下查询将所有 DIRECTED 关系以及该关系两侧带有 PersonMovie 标签的节点导出到文件 movies-directed.csv
WITH "MATCH path = (person:Person)-[:DIRECTED]->(movie)
      RETURN person.name AS name, person.born AS born,
             movie.title AS title, movie.tagline AS tagline, movie.released AS released" AS query
CALL apoc.export.csv.query(query, "movies-directed.csv", {})
YIELD file, source, format, nodes, relationships, properties, time, rows, batchSize, batches, done, data
RETURN file, source, format, nodes, relationships, properties, time, rows, batchSize, batches, done, data;
表 6. 结果
文件 来源 格式 节点 关系 属性 时间 批处理大小 批次 完成 数据

"movies-directed.csv"

"statement: cols(5)"

"csv"

0

0

10

3

2

20000

1

TRUE

NULL

movies-directed.csv 的内容如下所示:

"name","born","role","title","tagline","released"
"Lilly Wachowski","1967","","The Matrix","Welcome to the Real World","1999"
"Lana Wachowski","1965","","The Matrix","Welcome to the Real World","1999"
以下查询返回所有 DIRECTED 关系以及该关系两侧带有 PersonMovie 标签的节点流:
WITH "MATCH path = (person:Person)-[:DIRECTED]->(movie)
      RETURN person.name AS name, person.born AS born,
             movie.title AS title, movie.tagline AS tagline, movie.released AS released" AS query
CALL apoc.export.csv.query(query, null, {stream: true})
YIELD file, nodes, relationships, properties, data
RETURN file, nodes, relationships, properties, data;
表 7. 结果
文件 节点 关系 属性 数据

NULL

0

0

10

"\"name\",\"born\",\"title\",\"tagline\",\"released\" \"Lilly Wachowski\",\"1967\",\"The Matrix\",\"Welcome to the Real World\",\"1999\" \"Lana Wachowski\",\"1965\",\"The Matrix\",\"Welcome to the Real World\",\"1999\" "

您也可以压缩要导出的文件。有关更多信息,请参见此处

当启用 bulkImport 配置时,它会创建一个文件列表,可用于 Neo4j 批量导入。

此配置仅可与 apoc.export.csv.allapoc.export.csv.graph 一起使用。

所有创建的文件都按以下方式命名:

  • 节点文件以输入文件名为基础,并附加 .nodes.[LABEL_NAME].csv

  • 关系文件以输入文件名为基础,并附加 .relationships.[TYPE_NAME].csv

如果节点或关系有多个标签/类型,它将为每个标签/类型创建一个文件。

配置参数

这些过程支持以下配置参数:

表 8. 配置选项
参数 默认值 描述

批处理大小

20000

批处理大小。

分隔符

","

CSV 文件的分隔符字符。

数组分隔符

";"

用于数组的分隔符字符(用于批量导入)。

引号

'always'

CSV 使用的引用字符,可能的值有:

  • none:不添加引号。

  • always:所有值都添加引号。

  • ifNeeded:仅在必要时对字符串应用引号。

区分空值

false

区分空字符串和空值。其行为受 quotes 配置值的影响:

  • none:空字符串和空值不加引号。

  • ifNeeded && differentiateNulls:空字符串用 "" 引号括起来,空值则留空。

  • ifNeeded && !differentiateNulls:空字符串和空值都留空。

  • always && differentiateNulls:空字符串用 "" 引号括起来,空值则留空。

  • always && !differentiateNulls:空字符串和空值都用 "" 引号括起来。

使用类型

false

在文件头中添加类型。

批量导入

true

为 Neo4j Admin 导入创建文件。

超时秒数

100

查询在超时前应运行的最长时间(秒)。

独立标题

false

创建两个文件:一个用于标题,一个用于数据。

流式语句

false

通过配置 batchSize 参数,将结果分批处理到多行中。

false

等同于 streamStatements 配置。

引号配置示例

可以使用不同的引用策略将图导出到 CSV 文件。

以下查询在一个包含一个节点的图上执行:

CREATE (:Quote {name: 'foo,bar,baz', array:["a","b","c"], other: 123})
不带引号导出
CALL apoc.export.csv.all("fileNoQuote.csv", {quotes: 'none'})

返回以下 CSV 文件:

fileNoQuote.csv
_id,_labels,array,name,other,_start,_end,_type
5,:Quote,["a","b","c"],foo,bar,baz,123,,,
带引号导出
CALL apoc.export.csv.all("fileAlwaysQuote.csv", {quotes: 'always'})

返回以下 CSV 文件:

fileAlwaysQuote.csv
"_id","_labels","array","name","other","_start","_end","_type"
"5",":Quote","[""a"",""b"",""c""]","foo,bar,baz","123",,,
仅在需要时带引号导出
CALL apoc.export.csv.all("fileIfNeededQuote.csv", {quotes: 'ifNeeded'})

返回以下 CSV 文件:

fileIfNeededQuote.csv
_id,_labels,array,name,other,_start,_end,_type
5,:Quote,"[""a"",""b"",""c""]","foo,bar,baz",123,,,

区分空值配置示例

默认情况下,空字符串和空值被视为相同,但可以使用 differentiateNulls 配置项更改此行为。

以下查询在包含两个节点的图上执行:

CREATE (:Quote {value1: "Hello", value2: ""})
CREATE (:Quote {value1: "I have a \"null\" value", value2: null})
带引号并区分空值导出
CALL apoc.export.csv.all("fileAlwaysQuote.csv", {quotes: 'always', differentiateNulls: true})

返回以下 CSV 文件:

fileAlwaysQuote.csv
"_id","_labels","value1","value2","_start","_end","_type"
"0",":Quote","Hello","",,,
"1",":Quote","I have a ""null"" value",,,,

value2 的空字符串值用引号 "" 表示,而空值则留空。

仅在需要时带引号并区分空值导出
CALL apoc.export.csv.all("fileIfNeededQuote.csv", {quotes: 'ifNeeded', differentiateNulls: true})

返回以下 CSV 文件:

fileIfNeededQuote.csv
_id,_labels,value1,value2,_start,_end,_type
0,:Quote,Hello,"",,,
1,:Quote,"I have a ""null"" value",,,,

字符串 Hello 不带引号,因为它不包含任何特殊字符。字符串 I have a "null" value 带引号,因为它包含特殊字符。空字符串带引号是为了将其与空值区分开来。空值为空。

不带引号导出

当不带引号导出时,无论 differentiateNulls 设置为何值,空值和空字符串都将始终不带引号显示。

CALL apoc.export.csv.all("fileNoQuote.csv", {quotes: 'none', differentiateNulls: true})

返回以下 CSV 文件:

fileNoQuote.csv
_id,_labels,value1,value2,_start,_end,_type
0,:Quote,Hello,,,,
1,:Quote,I have a "null" value,,,,
© . All rights reserved.