设置和使用复合数据库

复合数据库允许查询同时访问多个图。此功能支持

  • 数据联邦:能够以不相交的图的形式访问分布式源中的数据。

  • 数据分片:能够以在多个数据库上分区的公共图的形式访问分布式源中的数据。

在本教程中,您将学习如何

为复合数据库使用建模数据

本教程中的示例数据基于 Microsoft 创建的 Northwind 数据集。它包含一家名为“Northwind Traders”的虚构小型公司的销售数据。数据包括客户、产品、客户订单、仓库库存、运输、供应商、员工和销售区域。

有关 Northwind(关系数据集)如何建模为图的更多信息,请在 Neo4j Browser 中运行 :guide northwind-graph 以播放内置指南 Northwind Graph。请参阅Neo4j Browser 文档

Northwind 图模型包含以下数据

  • 节点标签

    • :Product

    • :Category

    • :Supplier

    • :Order

    • :Customer

  • 关系类型

    • :SUPPLIES

    • :PART_OF

    • :ORDERS

    • :PURCHASED

northwind datamodel
图 1. Northwind 数据模型

在此场景中,假设数据隐私限制要求客户数据存储在其原始区域。为简单起见,有两个区域:美洲 (AME) 和欧洲 (EU)。第一步是重新建模 Northwind 数据集,以便客户数据可以与没有隐私限制的产品目录分离。您创建两个图:一个用于产品目录,包括 :Product:Category:Supplier:PART_OF:SUPPLIES;另一个是分区的图,位于两个数据库中,分别用于 EU 和 AME 的客户订单,包含 :Product:Order:Customer:PURCHASED:ORDERS

northwind composite datamodel
图 2. 新数据模型

数据联邦

通过这种方式,产品数据和客户数据位于两个不相交的图中,具有不同的标签和关系类型。这称为数据联邦

要跨它们进行查询,您必须联邦化这些图,因为关系不能跨越它们。这通过使用代理节点建模模式来完成:具有 :Product 标签的节点必须存在于两个联邦域中。

在产品目录图中,带有 :Product 标签的节点包含与产品相关的所有数据,而在客户图中,相同的标签与一个仅包含 productID 的代理节点关联。productID 属性允许您在此联邦中链接跨图的数据。

federation
图 3. 数据联邦

数据分片

由于客户数据属于两个区域(EU 和 AME),您必须将其分区到两个数据库中。生成的两个图具有相同的模型(相同的标签,相同的关系类型),但数据不同。这称为数据分片

sharding2
图 4. 数据分片

通常,分片主要有两种用例。最常见的是可伸缩性,即不同的分片可以部署在不同的服务器上,从而将负载分散到不同的资源上。另一个原因可能是数据法规:不同的分片可以部署在位于不同位置的服务器上,并独立管理。

为复合数据库创建数据库

在本教程中,您将创建以下数据库

  • db0 用于产品目录。

  • db1 用于欧盟客户数据。

  • db2 用于美洲客户。

  1. 启动 Neo4j DBMS。

    bin/neo4j start
  2. 检查所有可用的数据库。

    ls -al /data/databases/
    total 0
    drwxr-xr-x@  5 username  staff   160  9 Jun 12:53 .
    drwxr-xr-x@  5 username  staff   160  9 Jun 12:53 ..
    drwxr-xr-x  37 username  staff  1184  9 Jun 12:53 neo4j
    -rw-r--r--   1 username  staff     0  9 Jun 12:53 store_lock
    drwxr-xr-x  38 username  staff  1216  9 Jun 12:53 system
  3. 使用 cypher-shell 和默认凭据连接到 Neo4j DBMS,并在提示时更改密码

    bin/cypher-shell -u neo4j -p neo4j
    Password change required
    new password: ********
    Connected to Neo4j 2025.01 at neo4j://localhost:7687 as user neo4j.
    Type :help for a list of available commands or :exit to exit the shell.
    Note that Cypher queries must end with a semicolon.

    有关 Cypher Shell 命令行界面 (CLI) 及其使用方法的更多信息,请参阅Cypher Shell

  4. 运行命令 SHOW DATABASES 以列出所有可用的数据库

    SHOW DATABASES;
    +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | name     | type       | aliases | access       | address          | role      | writer | requestedStatus | currentStatus | statusMessage | default | home  | constituents |
    +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | "neo4j"  | "standard" | []      | "read-write" | "localhost:7687" | "primary" | TRUE   | "online"        | "online"      | ""            | TRUE    | TRUE  | []           |
    | "system" | "system"   | []      | "read-write" | "localhost:7687" | "primary" | TRUE   | "online"        | "online"      | ""            | FALSE   | FALSE | []           |
    +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    2 rows available after 102 ms, consumed after another 11 ms
  5. 运行命令 CREATE DATABASE <database-name> 来创建数据库

    CREATE DATABASE db0;
    0 rows available after 137 ms, consumed after another 0 ms
    CREATE DATABASE db1;
    0 rows available after 141 ms, consumed after another 0 ms
    CREATE DATABASE db2;
    0 rows available after 135 ms, consumed after another 0 ms
  6. 再次运行命令 SHOW DATABASES 以验证新数据库是否已创建并处于在线状态

    SHOW DATABASES;
    +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | name     | type       | aliases | access       | address          | role      | writer | requestedStatus | currentStatus | statusMessage | default | home  | constituents |
    +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | "db0"    | "standard" | []      | "read-write" | "localhost:7687" | "primary" | TRUE   | "online"        | "online"      | ""            | FALSE   | FALSE | []           |
    | "db1"    | "standard" | []      | "read-write" | "localhost:7687" | "primary" | TRUE   | "online"        | "online"      | ""            | FALSE   | FALSE | []           |
    | "db2"    | "standard" | []      | "read-write" | "localhost:7687" | "primary" | TRUE   | "online"        | "online"      | ""            | FALSE   | FALSE | []           |
    | "neo4j"  | "standard" | []      | "read-write" | "localhost:7687" | "primary" | TRUE   | "online"        | "online"      | ""            | TRUE    | TRUE  | []           |
    | "system" | "system"   | []      | "read-write" | "localhost:7687" | "primary" | TRUE   | "online"        | "online"      | ""            | FALSE   | FALSE | []           |
    +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    5 rows available after 8 ms, consumed after another 7 ms

将数据导入到您的数据库

您可以使用命令 LOAD CSV WITH HEADERS FROM 将数据导入到数据库。

将产品目录加载到 db0 中

  1. 运行以下 Cypher 查询,将活动数据库更改为 db0,并添加产品数据

    :use db0;
    
    LOAD CSV WITH HEADERS FROM "https://data.neo4j.com/northwind/products.csv" AS row
    CREATE (n:Product)
    SET n = row,
    n.unitPrice = toFloat(row.unitPrice),
    n.unitsInStock = toInteger(row.unitsInStock), n.unitsOnOrder = toInteger(row.unitsOnOrder),
    n.reorderLevel = toInteger(row.reorderLevel), n.discontinued = (row.discontinued <> "0");
    
    LOAD CSV WITH HEADERS FROM "https://data.neo4j.com/northwind/categories.csv" AS row
    CREATE (n:Category)
    SET n = row;
    
    LOAD CSV WITH HEADERS FROM "https://data.neo4j.com/northwind/suppliers.csv" AS row
    CREATE (n:Supplier)
    SET n = row;
    
    CREATE INDEX FOR (p:Product) ON (p.productID);
    CREATE INDEX FOR (c:Category) ON (c.categoryID);
    CREATE INDEX FOR (s:Supplier) ON (s.supplierID);
    
    MATCH (p:Product),(c:Category)
    WHERE p.categoryID = c.categoryID
    CREATE (p)-[:PART_OF]->(c);
    
    MATCH (p:Product),(s:Supplier)
    WHERE p.supplierID = s.supplierID
    CREATE (s)-[:SUPPLIES]->(p);
  2. 按回车键。

  3. 验证产品数据是否已加载到 db0

    MATCH (s:Supplier)-[:SUPPLIES]->(p:Product)-[:PART_OF]->(c:Category)
    RETURN s.companyName AS Supplier, p.productName AS Product, c.categoryName AS Category
    LIMIT 5;
    +--------------------------------------------------------------------------+
    | Supplier                     | Product                     | Category    |
    +--------------------------------------------------------------------------+
    | "Bigfoot Breweries"          | "Sasquatch Ale"             | "Beverages" |
    | "Pavlova"                    | "Outback Lager"             | "Beverages" |
    | "Bigfoot Breweries"          | "Laughing Lumberjack Lager" | "Beverages" |
    | "Bigfoot Breweries"          | "Steeleye Stout"            | "Beverages" |
    | "Aux joyeux ecclésiastiques" | "Côte de Blaye"             | "Beverages" |
    +--------------------------------------------------------------------------+
    
    5 rows available after 202 ms, consumed after another 5 ms

将欧盟客户和相关订单加载到 db1 中

  1. 运行以下 Cypher 查询,将活动数据库更改为 db1,并添加欧盟客户和订单

    :use db1;
    
    :param europe => ['Germany', 'UK', 'Sweden', 'France', 'Spain', 'Switzerland', 'Austria', 'Italy', 'Portugal', 'Ireland', 'Belgium', 'Norway', 'Denmark', 'Finland'];
    
    LOAD CSV WITH HEADERS FROM "https://data.neo4j.com/northwind/customers.csv" AS row
    WITH row
    WHERE row.country IN $europe
    CREATE (n:Customer)
    SET n = row;
    
    CREATE INDEX FOR (c:Customer) ON (c.customerID);
    
    LOAD CSV WITH HEADERS FROM "https://data.neo4j.com/northwind/orders.csv" AS row
    WITH row
    MATCH (c:Customer)
    WHERE row.customerID = c.customerID
    CREATE (o:Order)
    SET o = row;
    
    CREATE INDEX FOR (o:Order) ON (o.orderID);
    
    MATCH (c:Customer),(o:Order)
    WHERE c.customerID = o.customerID
    CREATE (c)-[:PURCHASED]->(o);
    
    LOAD CSV WITH HEADERS FROM "https://data.neo4j.com/northwind/products.csv" AS row
    CREATE (n:Product)
    SET n.productID = row.productID;
    
    CREATE INDEX FOR (p:Product) ON (p.productID);
    
    LOAD CSV WITH HEADERS FROM "https://data.neo4j.com/northwind/order-details.csv" AS row
    MATCH (p:Product), (o:Order)
    WHERE p.productID = row.productID AND o.orderID = row.orderID
    CREATE (o)-[details:ORDERS]->(p)
    SET details = row, details.quantity = toInteger(row.quantity);
  2. 按回车键。

  3. 验证欧盟客户订单数据是否已加载到 db1

    MATCH (c:Customer)-[:PURCHASED]->(o:Order)-[:ORDERS]->(p:Product)
    RETURN c.companyName AS Customer, c.country AS CustomerCountry, o.orderID AS Order, p.productID AS Product
    LIMIT 5;
    +-------------------------------------------------------------+
    | Customer              | CustomerCountry | Order   | Product |
    +-------------------------------------------------------------+
    | "Alfreds Futterkiste" | "Germany"       | "10692" | "63"    |
    | "Alfreds Futterkiste" | "Germany"       | "10835" | "77"    |
    | "Alfreds Futterkiste" | "Germany"       | "10835" | "59"    |
    | "Alfreds Futterkiste" | "Germany"       | "10702" | "76"    |
    | "Alfreds Futterkiste" | "Germany"       | "10702" | "3"     |
    +-------------------------------------------------------------+
    
    5 rows available after 47 ms, consumed after another 2 ms

将美洲客户和相关订单加载到 db2 中

  1. 运行以下 Cypher 查询,将活动数据库更改为 db2,并添加美洲客户和订单

    :use db2;
    
    :param americas => ['Mexico', 'Canada', 'Argentina', 'Brazil', 'USA', 'Venezuela'];
    
    LOAD CSV WITH HEADERS FROM "https://data.neo4j.com/northwind/customers.csv" AS row
    WITH row
    WHERE row.country IN $americas
    CREATE (n:Customer)
    SET n = row;
    
    CREATE INDEX FOR (c:Customer) ON (c.customerID);
    
    LOAD CSV WITH HEADERS FROM "https://data.neo4j.com/northwind/orders.csv" AS row
    WITH row
    MATCH (c:Customer)
    WHERE row.customerID = c.customerID
    CREATE (o:Order)
    SET o = row;
    
    CREATE INDEX FOR (o:Order) ON (o.orderID);
    
    MATCH (c:Customer),(o:Order)
    WHERE c.customerID = o.customerID
    CREATE (c)-[:PURCHASED]->(o);
    
    LOAD CSV WITH HEADERS FROM "https://data.neo4j.com/northwind/products.csv" AS row
    CREATE (n:Product)
    SET n.productID = row.productID;
    
    CREATE INDEX FOR (p:Product) ON (p.productID);
    
    LOAD CSV WITH HEADERS FROM "https://data.neo4j.com/northwind/order-details.csv" AS row
    MATCH (p:Product), (o:Order)
    WHERE p.productID = row.productID AND o.orderID = row.orderID
    CREATE (o)-[details:ORDERS]->(p)
    SET details = row,
    details.quantity = toInteger(row.quantity);
  2. 按回车键。

  3. 验证美洲客户订单数据是否已加载到 db2

    MATCH (c:Customer)-[:PURCHASED]->(o:Order)-[:ORDERS]->(p:Product)
    RETURN c.companyName AS Customer, c.country AS CustomerCountry, o.orderID AS Order, p.productID AS Product
    LIMIT 5;
    +----------------------------------------------------------------------------+
    | Customer                             | CustomerCountry | Order   | Product |
    +----------------------------------------------------------------------------+
    | "Ana Trujillo Emparedados y helados" | "Mexico"        | "10759" | "32"    |
    | "Ana Trujillo Emparedados y helados" | "Mexico"        | "10926" | "72"    |
    | "Ana Trujillo Emparedados y helados" | "Mexico"        | "10926" | "13"    |
    | "Ana Trujillo Emparedados y helados" | "Mexico"        | "10926" | "19"    |
    | "Ana Trujillo Emparedados y helados" | "Mexico"        | "10926" | "11"    |
    +----------------------------------------------------------------------------+
    
    5 rows available after 42 ms, consumed after another 1 ms

配置复合数据库

使用 CREATE COMPOSITE DATABASE Cypher 命令设置复合数据库,并将本地数据库别名作为组成部分添加到复合数据库中。在此示例中,复合数据库名为 compositenw

  1. 运行命令 CREATE COMPOSITE DATABASE <composite-database-name> 来创建复合数据库

    CREATE COMPOSITE DATABASE compositenw;
    0 rows available after 137 ms, consumed after another 0 ms
  2. 运行命令 CREATE ALIAS <composite-database-name>.<alias-name> FOR DATABASE <database-name> 来创建组成数据库别名

    CREATE ALIAS compositenw.product FOR DATABASE db0;
    0 rows available after 101 ms, consumed after another 0 ms
    CREATE ALIAS compositenw.customerEU FOR DATABASE db1;
    0 rows available after 107 ms, consumed after another 0 ms
    CREATE ALIAS compositenw.customerAME FOR DATABASE db2;
    0 rows available after 98 ms, consumed after another 0 ms

    本教程中的组成数据库别名是本地数据库别名(目标是同一 Neo4j DBMS 中的数据库),但它们也可以是远程数据库别名(目标是另一个 Neo4j DBMS 中的数据库)。

  3. 运行命令 SHOW DATABASES 以验证复合数据库是否已配置并处于在线状态

    SHOW DATABASES;
    +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | name          | type        | aliases                     | access       | address          | role      | writer | requestedStatus | currentStatus | statusMessage | default | home  | constituents                                                                 |
    +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | "db0"         | "standard"  | ["compositenw.product"]     | "read-write" | "localhost:7687" | "primary" | TRUE   | "online"        | "online"      | ""            | FALSE   | FALSE | []                                                                           |
    | "db1"         | "standard"  | ["compositenw.customerEU"]  | "read-write" | "localhost:7687" | "primary" | TRUE   | "online"        | "online"      | ""            | FALSE   | FALSE | []                                                                           |
    | "db2"         | "standard"  | ["compositenw.customerAME"] | "read-write" | "localhost:7687" | "primary" | TRUE   | "online"        | "online"      | ""            | FALSE   | FALSE | []                                                                           |
    | "compositenw" | "composite" | []                          | "read-only"  | "localhost:7687" | "primary" | FALSE  | "online"        | "online"      | ""            | FALSE   | FALSE | ["compositenw.customerAME", "compositenw.customerEU", "compositenw.product"] |
    | "neo4j"       | "standard"  | []                          | "read-write" | "localhost:7687" | "primary" | TRUE   | "online"        | "online"      | ""            | TRUE    | TRUE  | []                                                                           |
    | "system"      | "system"    | []                          | "read-write" | "localhost:7687" | "primary" | TRUE   | "online"        | "online"      | ""            | FALSE   | FALSE | []                                                                           |
    +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    6 rows available after 242 ms, consumed after another 18 ms
  4. 运行命令 SHOW ALIASES FOR DATABASES 以验证数据库别名是否已配置

    SHOW ALIASES FOR DATABASES;
    +--------------------------------------------------------------------------------+
    | name                      | composite     | database  | location | url  | user |
    +--------------------------------------------------------------------------------+
    | "compositenw.product"     | "compositenw" | "db0"     | "local"  | null | null |
    | "compositenw.customerEU"  | "compositenw" | "db1"     | "local"  | null | null |
    | "compositenw.customerAME" | "compositenw" | "db2"     | "local"  | null | null |
    +--------------------------------------------------------------------------------+
    3 rows available after 203 ms, consumed after another 16 ms

使用单个 Cypher 查询检索数据

查询单个数据库

连接到复合数据库后,您可以使用 Cypher 子句 USE 和别名名称从单个数据库检索数据

:use compositenw
USE compositenw.product
MATCH (p:Product)
RETURN p.productName AS product
LIMIT 5;
+--------------------------------+
| product                        |
+--------------------------------+
| "Chai"                         |
| "Chang"                        |
| "Aniseed Syrup"                |
| "Chef Anton's Cajun Seasoning" |
| "Chef Anton's Gumbo Mix"       |
+--------------------------------+

5 rows available after 6 ms, consumed after another 21 ms

跨多个分片查询

使用复合数据库查询两个分片,并获取姓名以 A 开头的客户

:use compositenw
USE compositenw.customerAME
MATCH (c:Customer)
WHERE c.customerID STARTS WITH 'A'
RETURN c.customerID AS name, c.country AS country
  UNION
USE compositenw.customerEU
MATCH (c:Customer)
WHERE c.customerID STARTS WITH 'A'
RETURN c.customerID AS name, c.country AS country
LIMIT 5;
+---------------------+
| name    | country   |
+---------------------+
| "ANATR" | "Mexico"  |
| "ANTON" | "Mexico"  |
| "ALFKI" | "Germany" |
| "AROUT" | "UK"      |
+---------------------+

4 rows available after 25 ms, consumed after another 56 ms

或者,使用更常见的复合数据库惯用法

:use compositenw
UNWIND ['compositenw.customerAME', 'compositenw.customerEU'] AS g
CALL {
  USE graph.byName(g)
  MATCH (c:Customer)
  WHERE c.customerID STARTS WITH 'A'
  RETURN c.customerID AS name, c.country AS country
}
RETURN name, country
LIMIT 5;
+---------------------+
| name    | country   |
+---------------------+
| "ANATR" | "Mexico"  |
| "ANTON" | "Mexico"  |
| "ALFKI" | "Germany" |
| "AROUT" | "UK"      |
+---------------------+

4 rows available after 61 ms, consumed after another 8 ms

跨联邦和分片查询

这是一个更复杂的查询,它使用所有 3 个数据库来查找所有购买了肉类/家禽类别中已停产产品的客户

:use compositenw
CALL {
  USE compositenw.product
  MATCH (p:Product)-[:PART_OF]->(c:Category)
  WHERE p.discontinued = true
    AND c.categoryName = 'Meat/Poultry'
  RETURN COLLECT(p.productID) AS pids
}
WITH *
UNWIND [g IN graph.names() WHERE g STARTS WITH 'compositenw.customer'] AS g
CALL {
  USE graph.byName(g)
  WITH pids
  UNWIND pids as pid
  MATCH (p:Product{productID:pid})<-[:ORDERS]-(:Order)<-[:PURCHASED]-(c:Customer)
  RETURN DISTINCT c.customerID AS customer, c.country AS country
}
RETURN customer, country
LIMIT 20;
+--------------------------+
| customer | country       |
+--------------------------+
| "RICSU"  | "Switzerland" |
| "PERIC"  | "Mexico"      |
| "WARTH"  | "Finland"     |
| "WELLI"  | "Brazil"      |
| "DRACD"  | "Germany"     |
| "RATTC"  | "USA"         |
| "HUNGO"  | "Ireland"     |
| "QUEDE"  | "Brazil"      |
| "SEVES"  | "UK"          |
| "ANTON"  | "Mexico"      |
| "BERGS"  | "Sweden"      |
| "SAVEA"  | "USA"         |
| "AROUT"  | "UK"          |
| "FAMIA"  | "Brazil"      |
| "WANDK"  | "Germany"     |
| "WHITC"  | "USA"         |
| "ISLAT"  | "UK"          |
| "LONEP"  | "USA"         |
| "QUICK"  | "Germany"     |
| "HILAA"  | "Venezuela"   |
+--------------------------+

20 rows available after 51 ms, consumed after another 2 ms

此查询的工作方式是 compositenw 调用数据库 db0 以检索肉类/家禽类别中所有已停产的产品。然后,使用返回的产品 ID,它并行查询 db1db2,并获取购买了这些产品的客户及其国家/地区。

您刚刚学习了如何使用单个 Cypher 查询从多个数据库存储和检索数据。

有关复合数据库的更多详细信息,请参阅概念

© . All rights reserved.