Export to Apache Arrow

These procedures export data into a format that is used by many Apache and non-Apache tools.

Available Procedures

The table below describes the available procedures:

Qualified Name	Type
apoc.export.arrow.all `apoc.export.arrow.all(file STRING, config MAP<STRING, ANY>)` - exports the full database as an arrow file.	`Procedure`
apoc.export.arrow.graph `apoc.export.arrow.graph(file STRING, graph ANY, config MAP<STRING, ANY>)` - exports the given graph as an arrow file.	`Procedure`
apoc.export.arrow.query `apoc.export.arrow.query(file STRING, query STRING, config MAP<STRING, ANY>)` - exports the results of the given Cypher query as an arrow file.	`Procedure`
apoc.export.arrow.stream.all `apoc.export.arrow.stream.all(config MAP<STRING, ANY>)` - exports the full database as an arrow byte array.	`Procedure`
apoc.export.arrow.stream.graph `apoc.export.arrow.stream.graph(graph ANY, config MAP<STRING, ANY>)` - exports the given graph as an arrow byte array.	`Procedure`
apoc.export.arrow.stream.query `apoc.export.arrow.stream.query(query ANY, config MAP<STRING, ANY>)` - exports the given Cypher query as an arrow byte array.	`Procedure`
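
As a minimal sketch of how the file-based procedures are called, the query below exports the whole database with `apoc.export.arrow.all`. The file name `all.arrow` is a hypothetical placeholder, the empty map leaves the configuration at its defaults, and the yielded columns are assumed to match those of the file-based query export. File export must be enabled for the call to succeed.

```cypher
// Export every node and relationship to a single Arrow file.
// 'all.arrow' is a placeholder name, resolved relative to the import directory.
// {} passes an empty config map so the default settings apply.
CALL apoc.export.arrow.all('all.arrow', {})
YIELD file, nodes, relationships, properties
RETURN file, nodes, relationships, properties;
```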

Exporting to a file

By default, exporting to the file system is disabled. We can enable it by setting the following property in apoc.conf:

apoc.conf

apoc.export.file.enabled=true

If we try to use any of the export procedures without having first set this property, we will get the following error message:

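Failed to invoke procedure: Caused by: java.lang.RuntimeException: Export to files not enabled, please set apoc.export.file.enabled=true in your apoc.conf. Otherwise, if you are running in a cloud environment without filesystem access, use the {stream:true} config and null as a 'file' parameter to stream the export back to your client. Note that the stream mode cannot be used with the apoc.export.xls.* procedures.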

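Exported files are written to the import directory, which is defined by the server.directories.import property. This means that any file path we provide is relative to this directory. If we try to write to an absolute path, such as /tmp/filename, we will get an error message similar to the following: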

Failed to invoke procedure: Caused by: java.io.FileNotFoundException: /path/to/neo4j/import/tmp/fileName (No such file or directory)

We can enable writing to anywhere on the file system by setting the following property in apoc.conf:

apoc.conf

apoc.import.file.use_neo4j_config=false

Neo4j will now be able to write anywhere on the file system, so be sure that this is your intention before setting this property.

Examples

Export the results of a Cypher query to an Apache Arrow file

CALL apoc.export.arrow.query('query_test.arrow',
    "RETURN 1 AS intData, 'a' AS stringData,
        true AS boolData,
        [1, 2, 3] AS intArray,
        [1.1, 2.2, 3.3] AS doubleArray,
        [true, false, true] AS boolArray,
        [1, '2', true, null] AS mixedArray,
        {foo: 'bar'} AS mapData,
        localdatetime('2015-05-18T19:32:24') as dateData,
        [[0]] AS arrayArray,
        1.1 AS doubleData"
) YIELD file

Table 1. Results
file	source	format	nodes	relationships	properties	time	rows	batchSize	batches	done	data
"query_test.arrow"	"statement: cols(11)"	"arrow"	0	0	11	468	11	2000	1	true	<null>

Export the results of a Cypher query as Apache Arrow binary output

CALL apoc.export.arrow.stream.query(
    "RETURN 1 AS intData, 'a' AS stringData,
        true AS boolData,
        [1, 2, 3] AS intArray,
        [1.1, 2.2, 3.3] AS doubleArray,
        [true, false, true] AS boolArray,
        [1, '2', true, null] AS mixedArray,
        {foo: 'bar'} AS mapData,
        localdatetime('2015-05-18T19:32:24') as dateData,
        [[0]] AS arrayArray,
        1.1 AS doubleData"
) YIELD value

Table 2. Results
value
<Apache Arrow binary output>
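
The same streaming pattern can be sketched for the whole database with `apoc.export.arrow.stream.all`, which takes only a config map and no file path. This is an illustrative sketch: the empty config map and the `value` column are assumptions carried over from the streaming query procedure, not something confirmed here.

```cypher
// Stream the whole database back to the client as Arrow bytes.
// No file is written; {} passes an empty config map so defaults apply.
// The 'value' column is assumed to match the streaming query procedure.
CALL apoc.export.arrow.stream.all({})
YIELD value
RETURN value;
```

Each returned `value` is a byte array, so the client is responsible for decoding it with an Arrow reader.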