设置和使用复合数据库

复合数据库允许查询同时访问多个图。此功能支持

数据联邦：能够以不相交的图的形式访问分布式源中的数据。
数据分片：能够以在多个数据库上分区的公共图的形式访问分布式源中的数据。

在本教程中，您将学习如何

为复合数据库使用建模数据
为复合数据库创建数据库
将数据导入到您的数据库
配置复合数据库
使用单个 Cypher 查询检索数据

为复合数据库使用建模数据

本教程中的示例数据基于 Microsoft 创建的 Northwind 数据集。它包含一家名为“Northwind Traders”的虚构小型公司的销售数据。数据包括客户、产品、客户订单、仓库库存、运输、供应商、员工和销售区域。

有关 Northwind（关系数据集）如何建模为图的更多信息，请在 Neo4j Browser 中运行 :guide northwind-graph 以播放内置指南 Northwind Graph。请参阅Neo4j Browser 文档。

Northwind 图模型包含以下数据

节点标签
- :Product
- :Category
- :Supplier
- :Order
- :Customer
关系类型
- :SUPPLIES
- :PART_OF
- :ORDERS
- :PURCHASED

图 1. Northwind 数据模型

在此场景中，假设数据隐私限制要求客户数据存储在其原始区域。为简单起见，有两个区域：美洲 (AME) 和欧洲 (EU)。第一步是重新建模 Northwind 数据集，以便客户数据可以与没有隐私限制的产品目录分离。您创建两个图：一个用于产品目录，包括 :Product、:Category、:Supplier、:PART_OF 和 :SUPPLIES；另一个是分区的图，位于两个数据库中，分别用于 EU 和 AME 的客户订单，包含 :Product、:Order、:Customer、:PURCHASED 和 :ORDERS。

图 2. 新数据模型

数据联邦

通过这种方式，产品数据和客户数据位于两个不相交的图中，具有不同的标签和关系类型。这称为数据联邦。

要跨它们进行查询，您必须联邦化这些图，因为关系不能跨越它们。这通过使用代理节点建模模式来完成：具有 :Product 标签的节点必须存在于两个联邦域中。

在产品目录图中，带有 :Product 标签的节点包含与产品相关的所有数据，而在客户图中，相同的标签与一个仅包含 productID 的代理节点关联。productID 属性允许您在此联邦中链接跨图的数据。

图 3. 数据联邦

数据分片

由于客户数据属于两个区域（EU 和 AME），您必须将其分区到两个数据库中。生成的两个图具有相同的模型（相同的标签，相同的关系类型），但数据不同。这称为数据分片。

图 4. 数据分片

通常，分片主要有两种用例。最常见的是可伸缩性，即不同的分片可以部署在不同的服务器上，从而将负载分散到不同的资源上。另一个原因可能是数据法规：不同的分片可以部署在位于不同位置的服务器上，并独立管理。

为复合数据库创建数据库

在本教程中，您将创建以下数据库

db0 用于产品目录。
db1 用于欧盟客户数据。
db2 用于美洲客户。

启动 Neo4j DBMS。
```
bin/neo4j start
```

检查所有可用的数据库。

ls -al /data/databases/

total 0
drwxr-xr-x@  5 username  staff   160  9 Jun 12:53 .
drwxr-xr-x@  5 username  staff   160  9 Jun 12:53 ..
drwxr-xr-x  37 username  staff  1184  9 Jun 12:53 neo4j
-rw-r--r--   1 username  staff     0  9 Jun 12:53 store_lock
drwxr-xr-x  38 username  staff  1216  9 Jun 12:53 system

使用 cypher-shell 和默认凭据连接到 Neo4j DBMS，并在提示时更改密码

bin/cypher-shell -u neo4j -p neo4j

Password change required
new password: ********
Connected to Neo4j 2025.01 at neo4j://:7687 as user neo4j.
Type :help for a list of available commands or :exit to exit the shell.
Note that Cypher queries must end with a semicolon.

有关 Cypher Shell 命令行界面 (CLI) 及其使用方法的更多信息，请参阅Cypher Shell。

运行命令 SHOW DATABASES 以列出所有可用的数据库

SHOW DATABASES;

+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| name     | type       | aliases | access       | address          | role      | writer | requestedStatus | currentStatus | statusMessage | default | home  | constituents |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| "neo4j"  | "standard" | []      | "read-write" | "localhost:7687" | "primary" | TRUE   | "online"        | "online"      | ""            | TRUE    | TRUE  | []           |
| "system" | "system"   | []      | "read-write" | "localhost:7687" | "primary" | TRUE   | "online"        | "online"      | ""            | FALSE   | FALSE | []           |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
2 rows available after 102 ms, consumed after another 11 ms

运行命令 CREATE DATABASE <database-name> 来创建数据库

CREATE DATABASE db0;

0 rows available after 137 ms, consumed after another 0 ms

CREATE DATABASE db1;

0 rows available after 141 ms, consumed after another 0 ms

CREATE DATABASE db2;

0 rows available after 135 ms, consumed after another 0 ms

再次运行命令 SHOW DATABASES 以验证新数据库是否已创建并处于在线状态

SHOW DATABASES;

+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| name     | type       | aliases | access       | address          | role      | writer | requestedStatus | currentStatus | statusMessage | default | home  | constituents |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| "db0"    | "standard" | []      | "read-write" | "localhost:7687" | "primary" | TRUE   | "online"        | "online"      | ""            | FALSE   | FALSE | []           |
| "db1"    | "standard" | []      | "read-write" | "localhost:7687" | "primary" | TRUE   | "online"        | "online"      | ""            | FALSE   | FALSE | []           |
| "db2"    | "standard" | []      | "read-write" | "localhost:7687" | "primary" | TRUE   | "online"        | "online"      | ""            | FALSE   | FALSE | []           |
| "neo4j"  | "standard" | []      | "read-write" | "localhost:7687" | "primary" | TRUE   | "online"        | "online"      | ""            | TRUE    | TRUE  | []           |
| "system" | "system"   | []      | "read-write" | "localhost:7687" | "primary" | TRUE   | "online"        | "online"      | ""            | FALSE   | FALSE | []           |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
5 rows available after 8 ms, consumed after another 7 ms

将数据导入到您的数据库

您可以使用命令 LOAD CSV WITH HEADERS FROM 将数据导入到数据库。

将产品目录加载到 db0 中

运行以下 Cypher 查询，将活动数据库更改为 db0，并添加产品数据

:use db0;

LOAD CSV WITH HEADERS FROM "https://data.neo4j.com/northwind/products.csv" AS row
CREATE (n:Product)
SET n = row,
n.unitPrice = toFloat(row.unitPrice),
n.unitsInStock = toInteger(row.unitsInStock), n.unitsOnOrder = toInteger(row.unitsOnOrder),
n.reorderLevel = toInteger(row.reorderLevel), n.discontinued = (row.discontinued <> "0");

LOAD CSV WITH HEADERS FROM "https://data.neo4j.com/northwind/categories.csv" AS row
CREATE (n:Category)
SET n = row;

LOAD CSV WITH HEADERS FROM "https://data.neo4j.com/northwind/suppliers.csv" AS row
CREATE (n:Supplier)
SET n = row;

CREATE INDEX FOR (p:Product) ON (p.productID);
CREATE INDEX FOR (c:Category) ON (c.categoryID);
CREATE INDEX FOR (s:Supplier) ON (s.supplierID);

MATCH (p:Product),(c:Category)
WHERE p.categoryID = c.categoryID
CREATE (p)-[:PART_OF]->(c);

MATCH (p:Product),(s:Supplier)
WHERE p.supplierID = s.supplierID
CREATE (s)-[:SUPPLIES]->(p);

按回车键。

验证产品数据是否已加载到 db0 中

MATCH (s:Supplier)-[:SUPPLIES]->(p:Product)-[:PART_OF]->(c:Category)
RETURN s.companyName AS Supplier, p.productName AS Product, c.categoryName AS Category
LIMIT 5;

+--------------------------------------------------------------------------+
| Supplier                     | Product                     | Category    |
+--------------------------------------------------------------------------+
| "Bigfoot Breweries"          | "Sasquatch Ale"             | "Beverages" |
| "Pavlova"                    | "Outback Lager"             | "Beverages" |
| "Bigfoot Breweries"          | "Laughing Lumberjack Lager" | "Beverages" |
| "Bigfoot Breweries"          | "Steeleye Stout"            | "Beverages" |
| "Aux joyeux ecclésiastiques" | "Côte de Blaye"             | "Beverages" |
+--------------------------------------------------------------------------+

5 rows available after 202 ms, consumed after another 5 ms

将欧盟客户和相关订单加载到 db1 中

运行以下 Cypher 查询，将活动数据库更改为 db1，并添加欧盟客户和订单

:use db1;

:param europe => ['Germany', 'UK', 'Sweden', 'France', 'Spain', 'Switzerland', 'Austria', 'Italy', 'Portugal', 'Ireland', 'Belgium', 'Norway', 'Denmark', 'Finland'];

LOAD CSV WITH HEADERS FROM "https://data.neo4j.com/northwind/customers.csv" AS row
WITH row
WHERE row.country IN $europe
CREATE (n:Customer)
SET n = row;

CREATE INDEX FOR (c:Customer) ON (c.customerID);

LOAD CSV WITH HEADERS FROM "https://data.neo4j.com/northwind/orders.csv" AS row
WITH row
MATCH (c:Customer)
WHERE row.customerID = c.customerID
CREATE (o:Order)
SET o = row;

CREATE INDEX FOR (o:Order) ON (o.orderID);

MATCH (c:Customer),(o:Order)
WHERE c.customerID = o.customerID
CREATE (c)-[:PURCHASED]->(o);

LOAD CSV WITH HEADERS FROM "https://data.neo4j.com/northwind/products.csv" AS row
CREATE (n:Product)
SET n.productID = row.productID;

CREATE INDEX FOR (p:Product) ON (p.productID);

LOAD CSV WITH HEADERS FROM "https://data.neo4j.com/northwind/order-details.csv" AS row
MATCH (p:Product), (o:Order)
WHERE p.productID = row.productID AND o.orderID = row.orderID
CREATE (o)-[details:ORDERS]->(p)
SET details = row, details.quantity = toInteger(row.quantity);

按回车键。

验证欧盟客户订单数据是否已加载到 db1 中

MATCH (c:Customer)-[:PURCHASED]->(o:Order)-[:ORDERS]->(p:Product)
RETURN c.companyName AS Customer, c.country AS CustomerCountry, o.orderID AS Order, p.productID AS Product
LIMIT 5;

+-------------------------------------------------------------+
| Customer              | CustomerCountry | Order   | Product |
+-------------------------------------------------------------+
| "Alfreds Futterkiste" | "Germany"       | "10692" | "63"    |
| "Alfreds Futterkiste" | "Germany"       | "10835" | "77"    |
| "Alfreds Futterkiste" | "Germany"       | "10835" | "59"    |
| "Alfreds Futterkiste" | "Germany"       | "10702" | "76"    |
| "Alfreds Futterkiste" | "Germany"       | "10702" | "3"     |
+-------------------------------------------------------------+

5 rows available after 47 ms, consumed after another 2 ms

将美洲客户和相关订单加载到 db2 中

运行以下 Cypher 查询，将活动数据库更改为 db2，并添加美洲客户和订单

:use db2;

:param americas => ['Mexico', 'Canada', 'Argentina', 'Brazil', 'USA', 'Venezuela'];

LOAD CSV WITH HEADERS FROM "https://data.neo4j.com/northwind/customers.csv" AS row
WITH row
WHERE row.country IN $americas
CREATE (n:Customer)
SET n = row;

CREATE INDEX FOR (c:Customer) ON (c.customerID);

LOAD CSV WITH HEADERS FROM "https://data.neo4j.com/northwind/orders.csv" AS row
WITH row
MATCH (c:Customer)
WHERE row.customerID = c.customerID
CREATE (o:Order)
SET o = row;

CREATE INDEX FOR (o:Order) ON (o.orderID);

MATCH (c:Customer),(o:Order)
WHERE c.customerID = o.customerID
CREATE (c)-[:PURCHASED]->(o);

LOAD CSV WITH HEADERS FROM "https://data.neo4j.com/northwind/products.csv" AS row
CREATE (n:Product)
SET n.productID = row.productID;

CREATE INDEX FOR (p:Product) ON (p.productID);

LOAD CSV WITH HEADERS FROM "https://data.neo4j.com/northwind/order-details.csv" AS row
MATCH (p:Product), (o:Order)
WHERE p.productID = row.productID AND o.orderID = row.orderID
CREATE (o)-[details:ORDERS]->(p)
SET details = row,
details.quantity = toInteger(row.quantity);

按回车键。

验证美洲客户订单数据是否已加载到 db2 中

MATCH (c:Customer)-[:PURCHASED]->(o:Order)-[:ORDERS]->(p:Product)
RETURN c.companyName AS Customer, c.country AS CustomerCountry, o.orderID AS Order, p.productID AS Product
LIMIT 5;

+----------------------------------------------------------------------------+
| Customer                             | CustomerCountry | Order   | Product |
+----------------------------------------------------------------------------+
| "Ana Trujillo Emparedados y helados" | "Mexico"        | "10759" | "32"    |
| "Ana Trujillo Emparedados y helados" | "Mexico"        | "10926" | "72"    |
| "Ana Trujillo Emparedados y helados" | "Mexico"        | "10926" | "13"    |
| "Ana Trujillo Emparedados y helados" | "Mexico"        | "10926" | "19"    |
| "Ana Trujillo Emparedados y helados" | "Mexico"        | "10926" | "11"    |
+----------------------------------------------------------------------------+

5 rows available after 42 ms, consumed after another 1 ms

配置复合数据库

使用 CREATE COMPOSITE DATABASE Cypher 命令设置复合数据库，并将本地数据库别名作为组成部分添加到复合数据库中。在此示例中，复合数据库名为 compositenw。

运行命令 CREATE COMPOSITE DATABASE <composite-database-name> 来创建复合数据库

CREATE COMPOSITE DATABASE compositenw;

0 rows available after 137 ms, consumed after another 0 ms

运行命令 CREATE ALIAS <composite-database-name>.<alias-name> FOR DATABASE <database-name> 来创建组成数据库别名

CREATE ALIAS compositenw.product FOR DATABASE db0;

0 rows available after 101 ms, consumed after another 0 ms

CREATE ALIAS compositenw.customerEU FOR DATABASE db1;

0 rows available after 107 ms, consumed after another 0 ms

CREATE ALIAS compositenw.customerAME FOR DATABASE db2;

0 rows available after 98 ms, consumed after another 0 ms

本教程中的组成数据库别名是本地数据库别名（目标是同一 Neo4j DBMS 中的数据库），但它们也可以是远程数据库别名（目标是另一个 Neo4j DBMS 中的数据库）。

运行命令 SHOW DATABASES 以验证复合数据库是否已配置并处于在线状态

SHOW DATABASES;

+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| name          | type        | aliases                     | access       | address          | role      | writer | requestedStatus | currentStatus | statusMessage | default | home  | constituents                                                                 |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| "db0"         | "standard"  | ["compositenw.product"]     | "read-write" | "localhost:7687" | "primary" | TRUE   | "online"        | "online"      | ""            | FALSE   | FALSE | []                                                                           |
| "db1"         | "standard"  | ["compositenw.customerEU"]  | "read-write" | "localhost:7687" | "primary" | TRUE   | "online"        | "online"      | ""            | FALSE   | FALSE | []                                                                           |
| "db2"         | "standard"  | ["compositenw.customerAME"] | "read-write" | "localhost:7687" | "primary" | TRUE   | "online"        | "online"      | ""            | FALSE   | FALSE | []                                                                           |
| "compositenw" | "composite" | []                          | "read-only"  | "localhost:7687" | "primary" | FALSE  | "online"        | "online"      | ""            | FALSE   | FALSE | ["compositenw.customerAME", "compositenw.customerEU", "compositenw.product"] |
| "neo4j"       | "standard"  | []                          | "read-write" | "localhost:7687" | "primary" | TRUE   | "online"        | "online"      | ""            | TRUE    | TRUE  | []                                                                           |
| "system"      | "system"    | []                          | "read-write" | "localhost:7687" | "primary" | TRUE   | "online"        | "online"      | ""            | FALSE   | FALSE | []                                                                           |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
6 rows available after 242 ms, consumed after another 18 ms

运行命令 SHOW ALIASES FOR DATABASES 以验证数据库别名是否已配置

SHOW ALIASES FOR DATABASES;

+--------------------------------------------------------------------------------+
| name                      | composite     | database  | location | url  | user |
+--------------------------------------------------------------------------------+
| "compositenw.product"     | "compositenw" | "db0"     | "local"  | null | null |
| "compositenw.customerEU"  | "compositenw" | "db1"     | "local"  | null | null |
| "compositenw.customerAME" | "compositenw" | "db2"     | "local"  | null | null |
+--------------------------------------------------------------------------------+
3 rows available after 203 ms, consumed after another 16 ms

使用单个 Cypher 查询检索数据

查询单个数据库

连接到复合数据库后，您可以使用 Cypher 子句 USE 和别名名称从单个数据库检索数据

:use compositenw

USE compositenw.product
MATCH (p:Product)
RETURN p.productName AS product
LIMIT 5;

+--------------------------------+
| product                        |
+--------------------------------+
| "Chai"                         |
| "Chang"                        |
| "Aniseed Syrup"                |
| "Chef Anton's Cajun Seasoning" |
| "Chef Anton's Gumbo Mix"       |
+--------------------------------+

5 rows available after 6 ms, consumed after another 21 ms

跨多个分片查询

使用复合数据库查询两个分片，并获取姓名以 A 开头的客户

:use compositenw

USE compositenw.customerAME
MATCH (c:Customer)
WHERE c.customerID STARTS WITH 'A'
RETURN c.customerID AS name, c.country AS country
  UNION
USE compositenw.customerEU
MATCH (c:Customer)
WHERE c.customerID STARTS WITH 'A'
RETURN c.customerID AS name, c.country AS country
LIMIT 5;

+---------------------+
| name    | country   |
+---------------------+
| "ANATR" | "Mexico"  |
| "ANTON" | "Mexico"  |
| "ALFKI" | "Germany" |
| "AROUT" | "UK"      |
+---------------------+

4 rows available after 25 ms, consumed after another 56 ms

或者，使用更常见的复合数据库惯用法

:use compositenw

UNWIND ['compositenw.customerAME', 'compositenw.customerEU'] AS g
CALL {
  USE graph.byName(g)
  MATCH (c:Customer)
  WHERE c.customerID STARTS WITH 'A'
  RETURN c.customerID AS name, c.country AS country
}
RETURN name, country
LIMIT 5;

+---------------------+
| name    | country   |
+---------------------+
| "ANATR" | "Mexico"  |
| "ANTON" | "Mexico"  |
| "ALFKI" | "Germany" |
| "AROUT" | "UK"      |
+---------------------+

4 rows available after 61 ms, consumed after another 8 ms

跨联邦和分片查询

这是一个更复杂的查询，它使用所有 3 个数据库来查找所有购买了肉类/家禽类别中已停产产品的客户

:use compositenw

CALL {
  USE compositenw.product
  MATCH (p:Product)-[:PART_OF]->(c:Category)
  WHERE p.discontinued = true
    AND c.categoryName = 'Meat/Poultry'
  RETURN COLLECT(p.productID) AS pids
}
WITH *
UNWIND [g IN graph.names() WHERE g STARTS WITH 'compositenw.customer'] AS g
CALL {
  USE graph.byName(g)
  WITH pids
  UNWIND pids as pid
  MATCH (p:Product{productID:pid})<-[:ORDERS]-(:Order)<-[:PURCHASED]-(c:Customer)
  RETURN DISTINCT c.customerID AS customer, c.country AS country
}
RETURN customer, country
LIMIT 20;

+--------------------------+
| customer | country       |
+--------------------------+
| "RICSU"  | "Switzerland" |
| "PERIC"  | "Mexico"      |
| "WARTH"  | "Finland"     |
| "WELLI"  | "Brazil"      |
| "DRACD"  | "Germany"     |
| "RATTC"  | "USA"         |
| "HUNGO"  | "Ireland"     |
| "QUEDE"  | "Brazil"      |
| "SEVES"  | "UK"          |
| "ANTON"  | "Mexico"      |
| "BERGS"  | "Sweden"      |
| "SAVEA"  | "USA"         |
| "AROUT"  | "UK"          |
| "FAMIA"  | "Brazil"      |
| "WANDK"  | "Germany"     |
| "WHITC"  | "USA"         |
| "ISLAT"  | "UK"          |
| "LONEP"  | "USA"         |
| "QUICK"  | "Germany"     |
| "HILAA"  | "Venezuela"   |
+--------------------------+

20 rows available after 51 ms, consumed after another 2 ms

此查询的工作方式是 compositenw 调用数据库 db0 以检索肉类/家禽类别中所有已停产的产品。然后，使用返回的产品 ID，它并行查询 db1 和 db2，并获取购买了这些产品的客户及其国家/地区。

您刚刚学习了如何使用单个 Cypher 查询从多个数据库存储和检索数据。

有关复合数据库的更多详细信息，请参阅概念。