Load XML

Many existing enterprise applications, endpoints, and files use XML as their data exchange format. The Load XML procedures let us process these files.

Procedure and Function Overview

The table below describes the available procedures and functions:

Qualified Name	Type
apoc.load.xml `apoc.load.xml(urlOrBinary ANY, path STRING, config MAP<STRING, ANY>, simple BOOLEAN)` - loads a single nested `MAP` from an XML URL (e.g. web-API).	`Procedure`
apoc.xml.parse `apoc.xml.parse(data STRING, path STRING, config MAP<STRING, ANY>, simple BOOLEAN)` - parses the given XML `STRING` as a `MAP`.	`Function`
apoc.import.xml `apoc.import.xml(urlOrBinary ANY, config MAP<STRING, ANY>)` - imports a graph from the provided XML file.	`Procedure`

`apoc.load.xml`

This procedure takes a file or HTTP URL and parses the XML into a map data structure.

Signature

`apoc.load.xml(urlOrBinary :: ANY, path = / :: STRING, config = {} :: MAP, simple = false :: BOOLEAN) :: (value :: MAP)`

The map is created using the following rules:

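In simple mode, each type of child element has its own entry in the parent map.
The element type used as the key is prefixed with _ to prevent collisions with attributes.
If there is only one element, the entry has that element as its value rather than a collection.
If there is more than one element, the entry holds a list of values.
Each child still has its _type field to tell the children apart.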

This procedure supports the following configuration parameters:

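Table 1. Config
Name	Type	Default	Description
failOnError	BOOLEAN	true	fail if an error is encountered while parsing the XML
headers	MAP	{}	HTTP headers to use when querying the XML document
binary	`Enum[NONE, BYTES, GZIP, BZIP2, DEFLATE, BLOCK_LZ4, FRAMED_SNAPPY]`	`null`	if not null, allows passing binary data instead of a file name/URL as the first parameter. Similar to the Binary file example
charset	java.nio.charset.Charset	`UTF_8`	optional charset, used when the `binary` config is not null and a string is passed as the file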
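The `binary` option lets the first argument be compressed bytes instead of a URL. The Python sketch below only mimics what that option implies for `GZIP`: decompress the bytes, decode them with a charset (UTF-8 here, matching the documented `UTF_8` default), then parse. The helper `load_gzip_xml` is hypothetical and is not part of APOC.

```python
import gzip
import xml.etree.ElementTree as ET


def load_gzip_xml(data: bytes, charset: str = "utf-8") -> ET.Element:
    """Hypothetical helper: decompress GZIP bytes, decode them, and parse the XML."""
    # Step 1: undo the GZIP compression (what binary: 'GZIP' implies)
    raw = gzip.decompress(data)
    # Step 2: decode the bytes with the configured charset
    text = raw.decode(charset)
    # Step 3: parse the decoded text into an element tree
    return ET.fromstring(text)


compressed = gzip.compress('<catalog><book id="bk101"/></catalog>'.encode("utf-8"))
root = load_gzip_xml(compressed)
print(root.tag, root.find("book").get("id"))  # catalog bk101
```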

`apoc.xml.parse`

If our dataset contains nodes that store XML as a property value, we can parse it into a map with the apoc.xml.parse function.

Signature

`apoc.xml.parse(data :: STRING, path = / :: STRING, config = {} :: MAP, simple = false :: BOOLEAN) :: MAP`

This function supports the following configuration parameters:

Table 2. Config
Name	Type	Default	Description
failOnError	BOOLEAN	true	fail if an error is encountered while parsing the XML

The following parses an XML string into a Cypher map:

WITH '<?xml version="1.0"?><table><tr><td><img src="pix/logo-tl.gif"></img></td></tr></table>' AS xmlString
RETURN apoc.xml.parse(xmlString) AS value

Table 3. Results
value
{_type: "table", _children: [{_type: "tr", _children: [{_type: "td", _children: [{_type: "img", src: "pix/logo-tl.gif"}]}]}]}
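The default (non-simple) mapping shown above can be approximated in a few lines of Python with the standard library `xml.etree.ElementTree`. This is a sketch of the documented output shape (`_type`, attributes as plain keys, `_text`, `_children`), not APOC's implementation. Collapsing whitespace inside text is an assumption based on the `description` values in the results later on this page, and mixed content (text following a child element) is ignored.

```python
import xml.etree.ElementTree as ET


def element_to_map(elem: ET.Element) -> dict:
    """Convert one element into the {_type, attributes, _text, _children} shape."""
    result = {"_type": elem.tag}
    # XML attributes become ordinary keys on the map
    result.update(elem.attrib)
    # Collapse whitespace runs in the element's leading text (assumption)
    text = " ".join((elem.text or "").split())
    if text:
        result["_text"] = text
    # Nested elements are converted recursively, in document order
    children = [element_to_map(child) for child in elem]
    if children:
        result["_children"] = children
    return result


def parse_xml(xml_string: str) -> dict:
    """Parse an XML string and convert its root element."""
    return element_to_map(ET.fromstring(xml_string))


xml_string = '<?xml version="1.0"?><table><tr><td><img src="pix/logo-tl.gif"></img></td></tr></table>'
print(parse_xml(xml_string))
```

For this input the printed dict has the same nesting as the Cypher map in Table 3.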

`apoc.import.xml`

If we don't want to transform the XML before creating the graph structure, we can use the apoc.import.xml procedure to create a 1:1 mapping of the XML into the graph.

Signature

`apoc.import.xml(urlOrBinary :: ANY, config = {} :: MAP) :: (node :: NODE)`

This procedure returns a node that represents the XML document; the nodes and relationships beneath it map to the XML structure.

The following mapping rules are applied:

xml	label	properties
document	XmlDocument	_xmlVersion, _xmlEncoding
processing instruction	XmlProcessingInstruction	_piData, _piTarget
element/tag	XmlTag	_name
attribute	n/a	property in the XmlTag node
text	XmlWord	a separate node is created for each word

The nodes of the XML document are connected by the following relationships:

Relationship type	Description
:IS_CHILD_OF	points to a nested XML element
:FIRST_CHILD_OF	points to the first child
:NEXT_SIBLING	points to the next XML element on the same nesting level
:NEXT	produces a linear chain through the whole document
:NEXT_WORD	only generated if the config map contains `createNextWordRelationships:true`; connects the words in the XML into a text flow
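To make the structural relationship rules concrete, the following Python sketch derives `IS_CHILD_OF`, `FIRST_CHILD_OF` and `NEXT_SIBLING` triples from an element tree. It only models the rules listed above: the direction (child to parent for `IS_CHILD_OF` and `FIRST_CHILD_OF`) is an assumption, the path-style node identifiers are hypothetical, and `NEXT`, `NEXT_WORD` and the word nodes are left out.

```python
import xml.etree.ElementTree as ET


def structural_edges(xml_string: str) -> list[tuple[str, str, str]]:
    """Return (source, relType, target) triples between XmlTag-like nodes."""
    edges: list[tuple[str, str, str]] = []

    def visit(parent: ET.Element, parent_id: str) -> None:
        children = list(parent)
        # Hypothetical node ids: parent id + "/" + tag + [position]
        ids = [f"{parent_id}/{c.tag}[{i}]" for i, c in enumerate(children)]
        for i, child in enumerate(children):
            # Every child points to its parent (assumed direction)
            edges.append((ids[i], "IS_CHILD_OF", parent_id))
            if i == 0:
                # Only the first child also gets FIRST_CHILD_OF
                edges.append((ids[i], "FIRST_CHILD_OF", parent_id))
            else:
                # Siblings on the same level are chained in document order
                edges.append((ids[i - 1], "NEXT_SIBLING", ids[i]))
            visit(child, ids[i])

    root = ET.fromstring(xml_string)
    visit(root, root.tag)
    return edges


for edge in structural_edges("<catalog><book><title/></book><book/></catalog>"):
    print(edge)
```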

This procedure supports the following configuration parameters:

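Table 4. Config
Config option	Default	Description
connectCharacters	false	if `true`, XML text elements are child nodes of their tags and are connected to each other by relationships of type `relType` (see below)
filterLeadingWhitespace	false	if `true`, whitespace at the beginning of each line is skipped
delimiter	`\s` (regex whitespace)	if given, text elements are split into separate nodes at this delimiter
label	XmlCharacter	label used for the nodes that represent text elements
relType	`NE`	relationship type used to connect the text elements into a linked list
charactersForTag	{}	map of tag name → string. For each given tag name, an additional text element is added with the value as its `text` property. This is useful, for example, for `<lb/>` tags in TEI-XML, which can then be represented as `<lb> </lb>`.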
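The `delimiter` and `relType` options describe how text is turned into a chain of nodes. A rough Python model of that behaviour: split the text with the delimiter regex (default `\s`), drop empty tokens (an assumption), and link each token to the next with `relType` (default `NE`). The function `text_to_chain` is hypothetical and only illustrates the idea.

```python
import re


def text_to_chain(text: str, delimiter: str = r"\s", rel_type: str = "NE"):
    """Split text into word tokens and link neighbours into a linked list."""
    # re.split yields empty strings for consecutive delimiters; drop them
    words = [w for w in re.split(delimiter, text) if w]
    # One (word, relType, nextWord) triple per adjacent pair
    links = [(words[i], rel_type, words[i + 1]) for i in range(len(words) - 1)]
    return words, links


words, links = text_to_chain("Midnight  Rain", rel_type="NEXT_WORD")
print(words)  # ['Midnight', 'Rain']
print(links)  # [('Midnight', 'NEXT_WORD', 'Rain')]
```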

Importing from a file

By default, importing from the file system is disabled. We can enable it by setting the following property in apoc.conf:

apoc.conf

apoc.import.file.enabled=true

If we try to use any of the import procedures without first setting this property, we'll get the following error message:

Failed to invoke procedure: Caused by: java.lang.RuntimeException: Import from files not enabled, please set apoc.import.file.enabled=true in your apoc.conf

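Import files are read from the import directory, which is defined by the server.directories.import property. This means that any file path we provide is resolved relative to this directory. If we try to read from an absolute path, such as /tmp/filename, we'll get an error message similar to the following: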

Failed to invoke procedure: Caused by: java.lang.RuntimeException: Can’t read url or key file:/path/to/neo4j/import/tmp/filename as json: /path/to/neo4j//import/tmp/filename (No such file or directory)
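The error message shows how the path was resolved: the path part of the `file:` URL was appended to the import directory. A simplified Python model of that resolution, assuming `/path/to/neo4j/import` as the import directory (the helper name is hypothetical, and this is not APOC's code):

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def resolve_import_path(url: str, import_dir: str) -> str:
    """Model of resolving a file: URL against the import directory."""
    # urlparse("file:///tmp/filename").path == "/tmp/filename"
    path = urlparse(url).path
    # Strip the leading "/" so the path is treated as relative
    return str(PurePosixPath(import_dir) / path.lstrip("/"))


print(resolve_import_path("file:///tmp/filename", "/path/to/neo4j/import"))
# /path/to/neo4j/import/tmp/filename
```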

We can enable reading files from anywhere on the file system by setting the following property in apoc.conf:

apoc.conf

apoc.import.file.use_neo4j_config=false

Neo4j will now be able to read from anywhere on the file system, so be sure that this is your intention before setting this property.

Examples

The examples in this section are based on Microsoft's book.xml file.

book.xml

<?xml version="1.0"?>
<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>
      <title>XML Developer's Guide</title>
      <genre>Computer</genre>
      <price>44.95</price>
      <publish_date>2000-10-01</publish_date>
      <description>An in-depth look at creating applications
      with XML.</description>
   </book>
   <book id="bk102">
      <author>Ralls, Kim</author>
      <title>Midnight Rain</title>
      <genre>Fantasy</genre>
      <price>5.95</price>
      <publish_date>2000-12-16</publish_date>
      <description>A former architect battles corporate zombies,
...

This file can be downloaded from GitHub.

Import from local file

The books.xml file described below contains the first two books from the Microsoft Books XML file. We'll use this smaller file in this section to simplify our examples.

books.xml

<?xml version="1.0"?>
<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>
      <author>Arciniegas, Fabio</author>
      <title>XML Developer's Guide</title>
      <genre>Computer</genre>
      <price>44.95</price>
      <publish_date>2000-10-01</publish_date>
      <description>An in-depth look at creating applications
      with XML.</description>
   </book>
   <book id="bk102">
      <author>Ralls, Kim</author>
      <title>Midnight Rain</title>
      <genre>Fantasy</genre>
      <price>5.95</price>
      <publish_date>2000-12-16</publish_date>
      <description>A former architect battles corporate zombies,
      an evil sorceress, and her own childhood to become queen
      of the world.</description>
   </book>
</catalog>

We'll place this file into the import directory of our Neo4j instance. Now let's write a query using the apoc.load.xml procedure to explore this file.

The following query processes books.xml and returns the content as Cypher data structures

CALL apoc.load.xml("file:///books.xml")
YIELD value
RETURN value

Table 5. Results
value
{_type: "catalog", _children: [{_type: "book", _children: [{_type: "author", _text: "Gambardella, Matthew"}, {_type: "author", _text: "Arciniegas, Fabio"}, {_type: "title", _text: "XML Developer’s Guide"}, {_type: "genre", _text: "Computer"}, {_type: "price", _text: "44.95"}, {_type: "publish_date", _text: "2000-10-01"}, {_type: "description", _text: "An in-depth look at creating applications with XML."}], id: "bk101"}, {_type: "book", _children: [{_type: "author", _text: "Ralls, Kim"}, {_type: "title", _text: "Midnight Rain"}, {_type: "genre", _text: "Fantasy"}, {_type: "price", _text: "5.95"}, {_type: "publish_date", _text: "2000-12-16"}, {_type: "description", _text: "A former architect battles corporate zombies, an evil sorceress, and her own childhood to become queen of the world."}], id: "bk102"}]}

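We get back a map representing the XML structure. Whenever an XML element is nested inside another one, it is accessible via the _children property. We can write the following query to get a better understanding of what our file contains.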

The following query processes books.xml and parses the results to pull out the title, description, genre, and authors

CALL apoc.load.xml("file:///books.xml")
YIELD value
UNWIND value._children AS book
RETURN book.id AS bookId,
       [item in book._children WHERE item._type = "title"][0] AS title,
       [item in book._children WHERE item._type = "description"][0] AS description,
       [item in book._children WHERE item._type = "author"] AS authors,
       [item in book._children WHERE item._type = "genre"][0] AS genre;

Table 6. Results
bookId	title	description	authors	genre
"bk101"	{_type: "title", _text: "XML Developer’s Guide"}	{_type: "description", _text: "An in-depth look at creating applications with XML."}	[{_type: "author", _text: "Gambardella, Matthew"}, {_type: "author", _text: "Arciniegas, Fabio"}]	{_type: "genre", _text: "Computer"}
"bk102"	{_type: "title", _text: "Midnight Rain"}	{_type: "description", _text: "A former architect battles corporate zombies, an evil sorceress, and her own childhood to become queen of the world."}	[{_type: "author", _text: "Ralls, Kim"}]	{_type: "genre", _text: "Fantasy"}

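The list comprehensions in this query filter `_children` by `_type`. The same logic in Python, applied to an abbreviated copy of the first book from Table 5, looks like this. Because `[...][0]` in Cypher returns `null` for an empty list rather than failing, the sketch returns `None` in that case. The helper names are hypothetical.

```python
def first_of_type(children, type_name):
    """Equivalent of [item IN children WHERE item._type = type_name][0]."""
    matches = [item for item in children if item["_type"] == type_name]
    # Cypher returns null when indexing past the end of a list
    return matches[0] if matches else None


def all_of_type(children, type_name):
    """Equivalent of [item IN children WHERE item._type = type_name]."""
    return [item for item in children if item["_type"] == type_name]


# Abbreviated copy of the first book map from Table 5 (no price or description)
book = {
    "_type": "book",
    "id": "bk101",
    "_children": [
        {"_type": "author", "_text": "Gambardella, Matthew"},
        {"_type": "author", "_text": "Arciniegas, Fabio"},
        {"_type": "title", "_text": "XML Developer's Guide"},
        {"_type": "genre", "_text": "Computer"},
    ],
}

print(book["id"], first_of_type(book["_children"], "title")["_text"])
print([a["_text"] for a in all_of_type(book["_children"], "author")])
print(first_of_type(book["_children"], "price"))  # None
```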
Now let's create a graph of the books and their metadata, authors, and genres.

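The following query processes books.xml and creates Book, Genre, and Author nodes from the extracted title, description, genre, and authors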

CALL apoc.load.xml("file:///books.xml")
YIELD value
UNWIND value._children AS book

WITH book.id AS bookId,
     [item in book._children WHERE item._type = "title"][0] AS title,
     [item in book._children WHERE item._type = "description"][0] AS description,
     [item in book._children WHERE item._type = "author"] AS authors,
     [item in book._children WHERE item._type = "genre"][0] AS genre

MERGE (b:Book {id: bookId})
SET b.title = title._text, b.description = description._text

MERGE (g:Genre {name: genre._text})
MERGE (b)-[:HAS_GENRE]->(g)

WITH b, authors
UNWIND authors AS author
MERGE (a:Author {name:author._text})
MERGE (a)-[:WROTE]->(b);
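The query uses `MERGE` rather than `CREATE`, so a genre or author shared by several books maps to a single node, and running the query a second time does not create duplicates. The Python model below imitates that get-or-create behaviour with a dictionary keyed by label and key property. It only illustrates `MERGE` semantics; `merge_books` and the sample data layout are hypothetical.

```python
def merge_books(books, nodes=None, rels=None):
    """Imitate the MERGE statements: nodes keyed by (label, key), rels as a set."""
    nodes = {} if nodes is None else nodes
    rels = set() if rels is None else rels
    for book in books:
        # MERGE (b:Book {id: bookId}) followed by SET b.title = ...
        b = ("Book", book["id"])
        nodes.setdefault(b, {}).update(title=book["title"])
        # MERGE (g:Genre {name: ...}) and MERGE (b)-[:HAS_GENRE]->(g)
        g = ("Genre", book["genre"])
        nodes.setdefault(g, {})
        rels.add((b, "HAS_GENRE", g))
        # MERGE (a:Author {name: ...}) and MERGE (a)-[:WROTE]->(b)
        for author in book["authors"]:
            a = ("Author", author)
            nodes.setdefault(a, {})
            rels.add((a, "WROTE", b))
    return nodes, rels


books = [
    {"id": "bk101", "title": "XML Developer's Guide", "genre": "Computer",
     "authors": ["Gambardella, Matthew", "Arciniegas, Fabio"]},
    {"id": "bk102", "title": "Midnight Rain", "genre": "Fantasy",
     "authors": ["Ralls, Kim"]},
]
nodes, rels = merge_books(books)
nodes, rels = merge_books(books, nodes, rels)  # second run adds nothing new
print(len(nodes), len(rels))  # 7 5
```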

The Neo4j Browser visualization below shows the imported graph:

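You can use the failOnError config to handle the result when the URL or the XML is incorrect. For example, with the help of the apoc.do.when procedure, you can return nothingToDo as the result when the URL is incorrect: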

CALL apoc.load.xml("MY_XML_URL", '', {failOnError:false})
YIELD value
WITH value as valueXml
call apoc.do.when(valueXml["_type"] is null, "return 'nothingToDo' as result", "return valueXml as result", {valueXml: valueXml})
YIELD value
UNWIND value["result"] as result
RETURN result
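Conceptually, `failOnError: false` turns a parse failure into an empty result instead of an exception, which is why the query above checks whether `_type` is null. The Python sketch below models that switch. Returning an empty dict on failure is an assumption (the query above only shows that `_type` is null in that case), and `load_xml` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def load_xml(xml_string: str, fail_on_error: bool = True) -> dict:
    """Parse XML; on a parse error either re-raise or return an empty map."""
    try:
        root = ET.fromstring(xml_string)
    except ET.ParseError:
        if fail_on_error:
            raise
        return {}
    return {"_type": root.tag}


# Unclosed tags make the document invalid
value = load_xml("<catalog><book>", fail_on_error=False)
result = "nothingToDo" if value.get("_type") is None else value
print(result)  # nothingToDo
```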

Import from GitHub

We can also process XML files from HTTP or HTTPS URIs. Let's start by processing the books.xml file hosted on GitHub.

This time we'll pass in true as the fourth argument of the procedure. This means that the XML will be parsed in simple mode.

The following query loads the books.xml file from GitHub using simple mode

WITH "https://raw.githubusercontent.com/neo4j/apoc/2025.05/core/src/test/resources/xml/books.xml" AS uri
CALL apoc.load.xml(uri, '', {}, true)
YIELD value
RETURN value;

Table 7. Results
value
{_type: "catalog", _catalog: [{_type: "book", _book: [{_type: "author", _text: "Gambardella, Matthew"}, {_type: "author", _text: "Arciniegas, Fabio"}, {_type: "title", _text: "XML Developer’s Guide"}, {_type: "genre", _text: "Computer"}, {_type: "price", _text: "44.95"}, {_type: "publish_date", _text: "2000-10-01"}, {_type: "description", _text: "An in-depth look at creating applications with XML."}], id: "bk101"}, {_type: "book", _book: [{_type: "author", _text: "Ralls, Kim"}, {_type: "title", _text: "Midnight Rain"}, {_type: "genre", _text: "Fantasy"}, {_type: "price", _text: "5.95"}, {_type: "publish_date", _text: "2000-12-16"}, {_type: "description", _text: "A former architect battles corporate zombies, an evil sorceress, and her own childhood to become queen of the world."}], id: "bk102"}, {_type: "book", _book: [{_type: "author", _text: "Corets, Eva"}, {_type: "title", _text: "Maeve Ascendant"}, {_type: "genre", _text: "Fantasy"}, {_type: "price", _text: "5.95"}, {_type: "publish_date", _text: "2000-11-17"}, {_type: "description", _text: "After the collapse of a nanotechnology society in England, the young survivors lay the foundation for a new society."}], id: "bk103"}, {_type: "book", _book: [{_type: "author", _text: "Corets, Eva"}, {_type: "title", _text: "Oberon’s Legacy"}, {_type: "genre", _text: "Fantasy"}, {_type: "price", _text: "5.95"}, {_type: "publish_date", _text: "2001-03-10"}, {_type: "description", _text: "In post-apocalypse England, the mysterious agent known only as Oberon helps to create a new life for the inhabitants of London. Sequel to Maeve Ascendant."}], id: "bk104"}, {_type: "book", _book: [{_type: "author", _text: "Corets, Eva"}, {_type: "title", _text: "The Sundered Grail"}, {_type: "genre", _text: "Fantasy"}, {_type: "price", _text: "5.95"}, {_type: "publish_date", _text: "2001-09-10"}, {_type: "description", _text: "The two daughters of Maeve, half-sisters, battle one another for control of England. Sequel to Oberon’s Legacy."}], id: "bk105"}, {_type: "book", _book: [{_type: "author", _text: "Randall, Cynthia"}, {_type: "title", _text: "Lover Birds"}, {_type: "genre", _text: "Romance"}, {_type: "price", _text: "4.95"}, {_type: "publish_date", _text: "2000-09-02"}, {_type: "description", _text: "When Carla meets Paul at an ornithology conference, tempers fly as feathers get ruffled."}], id: "bk106"}, {_type: "book", _book: [{_type: "author", _text: "Thurman, Paula"}, {_type: "title", _text: "Splish Splash"}, {_type: "genre", _text: "Romance"}, {_type: "price", _text: "4.95"}, {_type: "publish_date", _text: "2000-11-02"}, {_type: "description", _text: "A deep sea diver finds true love twenty thousand leagues beneath the sea."}], id: "bk107"}, {_type: "book", _book: [{_type: "author", _text: "Knorr, Stefan"}, {_type: "title", _text: "Creepy Crawlies"}, {_type: "genre", _text: "Horror"}, {_type: "price", _text: "4.95"}, {_type: "publish_date", _text: "2000-12-06"}, {_type: "description", _text: "An anthology of horror stories about roaches, centipedes, scorpions and other insects."}], id: "bk108"}, {_type: "book", _book: [{_type: "author", _text: "Kress, Peter"}, {_type: "title", _text: "Paradox Lost"}, {_type: "genre", _text: "Science Fiction"}, {_type: "price", _text: "6.95"}, {_type: "publish_date", _text: "2000-11-02"}, {_type: "description", _text: "After an inadvertant trip through a Heisenberg Uncertainty Device, James Salway discovers the problems of being quantum."}], id: "bk109"}, {_type: "book", _book: [{_type: "author", _text: "O’Brien, Tim"}, {_type: "title", _text: "Microsoft .NET: The Programming Bible"}, {_type: "genre", _text: "Computer"}, {_type: "price", _text: "36.95"}, {_type: "publish_date", _text: "2000-12-09"}, {_type: "description", _text: "Microsoft’s .NET initiative is explored in detail in this deep programmer’s reference."}], id: "bk110"}, {_type: "book", _book: [{_type: "author", _text: "O’Brien, Tim"}, {_type: "title", _text: "MSXML3: A Comprehensive Guide"}, {_type: "genre", _text: "Computer"}, {_type: "price", _text: "36.95"}, {_type: "publish_date", _text: "2000-12-01"}, {_type: "description", _text: "The Microsoft MSXML3 parser is covered in detail, with attention to XML DOM interfaces, XSLT processing, SAX and more."}], id: "bk111"}, {_type: "book", _book: [{_type: "author", _text: "Galos, Mike"}, {_type: "title", _text: "Visual Studio 7: A Comprehensive Guide"}, {_type: "genre", _text: "Computer"}, {_type: "price", _text: "49.95"}, {_type: "publish_date", _text: "2001-04-16"}, {_type: "description", _text: "Microsoft Visual Studio 7 is explored in depth, looking at how Visual Basic, Visual C+, C#, and ASP are integrated into a comprehensive development environment."}], id: "bk112"}]}

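Again we get back a map representing the XML structure, but its shape differs from the one we got without simple mode. This time, nested XML elements are accessible via a property named after the enclosing element and prefixed with _: for example, the book elements are in the `_catalog` list of the catalog map, and each book's child elements are in its `_book` list.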

We can write the following query to get a better understanding of what our file contains.

The following query processes books.xml and parses the results to pull out the title, description, genre, and authors

WITH "https://raw.githubusercontent.com/neo4j/apoc/2025.05/core/src/test/resources/xml/books.xml" AS uri
CALL apoc.load.xml(uri, '', {}, true)
YIELD value
UNWIND value._catalog AS book
RETURN book.id AS bookId,
       [item in book._book WHERE item._type = "title"][0] AS title,
       [item in book._book WHERE item._type = "description"][0] AS description,
       [item in book._book WHERE item._type = "author"] AS authors,
       [item in book._book WHERE item._type = "genre"][0] AS genre;

Table 8. Results
bookId	title	description	authors	genre
"bk101"	{_type: "title", _text: "XML Developer’s Guide"}	{_type: "description", _text: "An in-depth look at creating applications with XML."}	[{_type: "author", _text: "Gambardella, Matthew"}, {_type: "author", _text: "Arciniegas, Fabio"}]	{_type: "genre", _text: "Computer"}
"bk102"	{_type: "title", _text: "Midnight Rain"}	{_type: "description", _text: "A former architect battles corporate zombies, an evil sorceress, and her own childhood to become queen of the world."}	[{_type: "author", _text: "Ralls, Kim"}]	{_type: "genre", _text: "Fantasy"}
"bk103"	{_type: "title", _text: "Maeve Ascendant"}	{_type: "description", _text: "After the collapse of a nanotechnology society in England, the young survivors lay the foundation for a new society."}	[{_type: "author", _text: "Corets, Eva"}]	{_type: "genre", _text: "Fantasy"}
"bk104"	{_type: "title", _text: "Oberon’s Legacy"}	{_type: "description", _text: "In post-apocalypse England, the mysterious agent known only as Oberon helps to create a new life for the inhabitants of London. Sequel to Maeve Ascendant."}	[{_type: "author", _text: "Corets, Eva"}]	{_type: "genre", _text: "Fantasy"}
"bk105"	{_type: "title", _text: "The Sundered Grail"}	{_type: "description", _text: "The two daughters of Maeve, half-sisters, battle one another for control of England. Sequel to Oberon’s Legacy."}	[{_type: "author", _text: "Corets, Eva"}]	{_type: "genre", _text: "Fantasy"}
"bk106"	{_type: "title", _text: "Lover Birds"}	{_type: "description", _text: "When Carla meets Paul at an ornithology conference, tempers fly as feathers get ruffled."}	[{_type: "author", _text: "Randall, Cynthia"}]	{_type: "genre", _text: "Romance"}
"bk107"	{_type: "title", _text: "Splish Splash"}	{_type: "description", _text: "A deep sea diver finds true love twenty thousand leagues beneath the sea."}	[{_type: "author", _text: "Thurman, Paula"}]	{_type: "genre", _text: "Romance"}
"bk108"	{_type: "title", _text: "Creepy Crawlies"}	{_type: "description", _text: "An anthology of horror stories about roaches, centipedes, scorpions and other insects."}	[{_type: "author", _text: "Knorr, Stefan"}]	{_type: "genre", _text: "Horror"}
"bk109"	{_type: "title", _text: "Paradox Lost"}	{_type: "description", _text: "After an inadvertant trip through a Heisenberg Uncertainty Device, James Salway discovers the problems of being quantum."}	[{_type: "author", _text: "Kress, Peter"}]	{_type: "genre", _text: "Science Fiction"}
"bk110"	{_type: "title", _text: "Microsoft .NET: The Programming Bible"}	{_type: "description", _text: "Microsoft’s .NET initiative is explored in detail in this deep programmer’s reference."}	[{_type: "author", _text: "O’Brien, Tim"}]	{_type: "genre", _text: "Computer"}
"bk111"	{_type: "title", _text: "MSXML3: A Comprehensive Guide"}	{_type: "description", _text: "The Microsoft MSXML3 parser is covered in detail, with attention to XML DOM interfaces, XSLT processing, SAX and more."}	[{_type: "author", _text: "O’Brien, Tim"}]	{_type: "genre", _text: "Computer"}
"bk112"	{_type: "title", _text: "Visual Studio 7: A Comprehensive Guide"}	{_type: "description", _text: "Microsoft Visual Studio 7 is explored in depth, looking at how Visual Basic, Visual C+, C#, and ASP are integrated into a comprehensive development environment."}	[{_type: "author", _text: "Galos, Mike"}]	{_type: "genre", _text: "Computer"}

Instead of just returning this data, we can also create a graph of the books and their metadata, authors, and genres.

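The following query processes books.xml and creates Book, Genre, and Author nodes from the extracted title, description, genre, and authors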

WITH "https://raw.githubusercontent.com/neo4j/apoc/2025.05/core/src/test/resources/xml/books.xml" AS uri
CALL apoc.load.xml(uri, '', {}, true)
YIELD value
UNWIND value._catalog AS book
WITH book.id AS bookId,
     [item in book._book WHERE item._type = "title"][0] AS title,
     [item in book._book WHERE item._type = "description"][0] AS description,
     [item in book._book WHERE item._type = "author"] AS authors,
     [item in book._book WHERE item._type = "genre"][0] AS genre

MERGE (b:Book {id: bookId})
SET b.title = title._text, b.description = description._text

MERGE (g:Genre {name: genre._text})
MERGE (b)-[:HAS_GENRE]->(g)

WITH b, authors
UNWIND authors AS author
MERGE (a:Author {name:author._text})
MERGE (a)-[:WROTE]->(b);

The Neo4j Browser visualization below shows the imported graph:

XPath expressions

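We can also provide an XPath expression to select nodes from an XML document. If we only want to return books with the Computer genre, we could write the following query: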

CALL apoc.load.xml(
  "https://raw.githubusercontent.com/neo4j/apoc/2025.05/core/src/test/resources/xml/books.xml",
  '/catalog/book[genre=\"Computer\"]'
)
YIELD value as book
WITH book.id as id, [attr IN book._children WHERE attr._type IN ['title','price'] | attr._text] as pairs
RETURN id, pairs[0] as title, pairs[1] as price;

Table 9. Results
id	title	price
"bk101"	"XML Developer’s Guide"	"44.95"
"bk110"	"Microsoft .NET: The Programming Bible"	"36.95"
"bk111"	"MSXML3: A Comprehensive Guide"	"36.95"
"bk112"	"Visual Studio 7: A Comprehensive Guide"	"49.95"

In this case we return only the id, title, and price, but we could return any other element.
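For experimenting with predicates outside Neo4j, Python's `xml.etree.ElementTree` supports a limited XPath subset that includes the `[tag='text']` predicate used above. This is not the XPath engine APOC uses, and ElementTree only accepts relative paths, so the sketch queries `book[...]` from the `catalog` root instead of `/catalog/book[...]`. The inline XML is an abbreviated excerpt of books.xml.

```python
import xml.etree.ElementTree as ET

CATALOG = """<catalog>
  <book id="bk101"><title>XML Developer's Guide</title><genre>Computer</genre><price>44.95</price></book>
  <book id="bk102"><title>Midnight Rain</title><genre>Fantasy</genre><price>5.95</price></book>
</catalog>"""

root = ET.fromstring(CATALOG)
# Relative equivalent of /catalog/book[genre="Computer"]
for book in root.findall("book[genre='Computer']"):
    print(book.get("id"), book.findtext("title"), book.findtext("price"))
```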

We can also return just a single specific element. For example, the following query returns the author of the book with id = bk102

CALL apoc.load.xml(
  'https://raw.githubusercontent.com/neo4j/apoc/2025.05/core/src/test/resources/xml/books.xml',
  '/catalog/book[@id="bk102"]/author'
)
YIELD value as result
WITH result._text as author
RETURN author;

Table 10. Results
author
"Ralls, Kim"

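The `[@id="bk102"]` attribute predicate is also part of ElementTree's XPath subset, so the same lookup can be tried in Python on a small inline excerpt of books.xml (again with a relative path, and not with APOC's engine):

```python
import xml.etree.ElementTree as ET

CATALOG = """<catalog>
  <book id="bk101"><author>Gambardella, Matthew</author></book>
  <book id="bk102"><author>Ralls, Kim</author></book>
</catalog>"""

root = ET.fromstring(CATALOG)
# Relative equivalent of /catalog/book[@id="bk102"]/author
author = root.find("book[@id='bk102']/author")
print(author.text)  # Ralls, Kim
```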
Extracting data structures

We can turn the values into a map using the apoc.map.fromPairs function.

call apoc.load.xml("https://raw.githubusercontent.com/neo4j/apoc/2025.05/core/src/test/resources/xml/books.xml")
yield value as catalog
UNWIND catalog._children as book
WITH book.id as id, [attr IN book._children WHERE attr._type IN ['author','title'] | [attr._type, attr._text]] as pairs
WITH id, apoc.map.fromPairs(pairs) AS value
RETURN id, value

Table 11. Results
id	value
"bk101"	{title: "XML Developer’s Guide", author: "Arciniegas, Fabio"}
"bk102"	{title: "Midnight Rain", author: "Ralls, Kim"}
"bk103"	{title: "Maeve Ascendant", author: "Corets, Eva"}
"bk104"	{title: "Oberon’s Legacy", author: "Corets, Eva"}
"bk105"	{title: "The Sundered Grail", author: "Corets, Eva"}
"bk106"	{title: "Lover Birds", author: "Randall, Cynthia"}
"bk107"	{title: "Splish Splash", author: "Thurman, Paula"}
"bk108"	{title: "Creepy Crawlies", author: "Knorr, Stefan"}
"bk109"	{title: "Paradox Lost", author: "Kress, Peter"}
"bk110"	{title: "Microsoft .NET: The Programming Bible", author: "O’Brien, Tim"}
"bk111"	{title: "MSXML3: A Comprehensive Guide", author: "O’Brien, Tim"}
"bk112"	{title: "Visual Studio 7: A Comprehensive Guide", author: "Galos, Mike"}

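Book bk101 has two authors, yet its map above keeps only "Arciniegas, Fabio". Judging by that result, when a key occurs more than once among the pairs, the last pair wins. Python's `dict()` constructor behaves the same way, which makes for a quick check of the effect (this illustrates the behaviour, not APOC's code):

```python
# The pairs built for bk101 by the list comprehension above
pairs = [
    ["author", "Gambardella, Matthew"],
    ["author", "Arciniegas, Fabio"],
    ["title", "XML Developer's Guide"],
]
# Later pairs overwrite earlier ones with the same key
value = dict(pairs)
print(value["author"])  # Arciniegas, Fabio
```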
Now we can cleanly access the properties from the map.

call apoc.load.xml("https://raw.githubusercontent.com/neo4j/apoc/2025.05/core/src/test/resources/xml/books.xml")
yield value as catalog
UNWIND catalog._children as book
WITH book.id as id, [attr IN book._children WHERE attr._type IN ['author','title'] | [attr._type, attr._text]] as pairs
WITH id, apoc.map.fromPairs(pairs) AS value
RETURN id, value.title, value.author

Table 12. Results
id	value.title	value.author
"bk101"	"XML Developer’s Guide"	"Arciniegas, Fabio"
"bk102"	"Midnight Rain"	"Ralls, Kim"
"bk103"	"Maeve Ascendant"	"Corets, Eva"
"bk104"	"Oberon’s Legacy"	"Corets, Eva"
"bk105"	"The Sundered Grail"	"Corets, Eva"
"bk106"	"Lover Birds"	"Randall, Cynthia"
"bk107"	"Splish Splash"	"Thurman, Paula"
"bk108"	"Creepy Crawlies"	"Knorr, Stefan"
"bk109"	"Paradox Lost"	"Kress, Peter"
"bk110"	"Microsoft .NET: The Programming Bible"	"O’Brien, Tim"
"bk111"	"MSXML3: A Comprehensive Guide"	"O’Brien, Tim"
"bk112"	"Visual Studio 7: A Comprehensive Guide"	"Galos, Mike"

Import XML directly

We can write the following query to create a graph structure of the Microsoft books XML file.

The following creates a graph structure based on the contents of books.xml

CALL apoc.import.xml(
  "https://raw.githubusercontent.com/neo4j/apoc/2025.05/core/src/test/resources/xml/books.xml",
  {relType:'NEXT_WORD', label:'XmlWord'}
)
YIELD node
RETURN node;

node
(:XmlDocument {_xmlVersion: "1.0", _xmlEncoding: "UTF-8", url: "https://raw.githubusercontent.com/neo4j/apoc/2025.05/core/src/test/resources/xml/books.xml"})

The Neo4j Browser visualization below shows the imported graph: