设置和使用复合数据库

复合数据库允许同时访问多个图的查询。这是一个启用以下功能的功能:

  • 数据联合:以非连接图的形式访问分布式来源中可用的数据的能力。

  • 数据分片:以在多个数据库上分区的公共图的形式访问分布式来源中可用的数据的能力。

在本教程中,您将学习如何:

为复合数据库使用对您的数据进行建模

本教程中的示例数据基于由 Microsoft 创建的 Northwind 数据集。它包含一家名为“Northwind Traders”的虚构小型公司的销售数据。数据包括客户、产品、客户订单、仓库库存、运输、供应商、员工和销售区域。

有关如何将 Northwind(关系型数据集)建模成图的更多信息,请在 Neo4j 浏览器中运行 :guide northwind-graph 来播放内置的 Northwind 图指南。请参阅 Neo4j 浏览器文档

Northwind 图模型包含以下数据

  • 节点标签

    • :Product

    • :Category

    • :Supplier

    • :Order

    • :Customer

  • 关系类型

    • :SUPPLIES

    • :PART_OF

    • :ORDERS

    • :PURCHASED

northwind datamodel
图 1. Northwind 数据模型

在此场景中,假设数据隐私限制要求客户数据存储在其原始区域。为简单起见,有两个区域:美洲 (AME) 和欧洲 (EU)。第一步是对 Northwind 数据集进行重新建模,以便将客户数据与产品目录分离,产品目录没有隐私限制。您将创建两个图:一个用于产品目录,其中包括 :Product:Category:Supplier:PART_OF:SUPPLIES,以及一个在两个数据库中为 EU 和 AME 的客户订单进行分区图,其中包括 :Product:Order:Customer:PURCHASED:ORDERS

northwind composite datamodel
图 2. 新的数据模型

数据联合

通过这种方式,产品和客户数据位于两个**不相交的图**中,具有不同的标签和关系类型。这称为数据联合

要跨图查询,您必须联合这些图,因为关系不能跨图。这是通过使用代理节点建模模式来完成的:具有 :Product 标签的节点必须存在于两个联合域中。

在产品目录图中,具有 :Product 标签的节点包含与产品相关的所有数据,而在客户图中,同一个标签与一个代理节点关联,该代理节点只包含 productIDproductID 属性允许您在该联合中的图之间链接数据。

federation
图 3. 数据联合

数据分片

由于客户数据适用于两个区域(EU 和 AME),因此您必须将其划分为两个数据库。生成的两个图具有相同的模型(相同的标签、相同的关系列),但数据不同。这称为数据分片

sharding2
图 4. 数据分片

一般而言,有几种主要用例需要分片。最常见的是可扩展性,即不同的分片可以部署在不同的服务器上,将负载分散到不同的资源上。另一个原因可能是数据法规:不同的分片可以部署在位于不同位置的服务器上,并独立管理。

为复合数据库创建数据库

在本教程中,您将创建以下数据库

  • db0 用于产品目录。

  • db1 用于 EU 客户数据。

  • db2 用于 AME 客户。

  1. 启动 Neo4j DBMS。

    bin/neo4j start
  2. 检查所有可用数据库。

    ls -al /data/databases/
    total 0
    drwxr-xr-x@  5 username  staff   160  9 Jun 12:53 .
    drwxr-xr-x@  5 username  staff   160  9 Jun 12:53 ..
    drwxr-xr-x  37 username  staff  1184  9 Jun 12:53 neo4j
    -rw-r--r--   1 username  staff     0  9 Jun 12:53 store_lock
    drwxr-xr-x  38 username  staff  1216  9 Jun 12:53 system
  3. 使用默认凭据通过 cypher-shell 连接到 Neo4j DBMS,并在提示时更改密码

    bin/cypher-shell -u neo4j -p neo4j
    Password change required
    new password: ********
    Connected to Neo4j 5 at neo4j://localhost:7687 as user neo4j.
    Type :help for a list of available commands or :exit to exit the shell.
    Note that Cypher queries must end with a semicolon.

    有关 Cypher Shell 命令行界面 (CLI) 及其使用方法的更多信息,请参阅 Cypher Shell

  4. 运行命令 SHOW DATABASES 列出所有可用数据库

    SHOW DATABASES;
    +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | name     | type       | aliases | access       | address          | role      | writer | requestedStatus | currentStatus | statusMessage | default | home  | constituents |
    +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | "neo4j"  | "standard" | []      | "read-write" | "localhost:7687" | "primary" | TRUE   | "online"        | "online"      | ""            | TRUE    | TRUE  | []           |
    | "system" | "system"   | []      | "read-write" | "localhost:7687" | "primary" | TRUE   | "online"        | "online"      | ""            | FALSE   | FALSE | []           |
    +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    2 rows available after 102 ms, consumed after another 11 ms
  5. 运行命令 CREATE DATABASE <database-name> 创建数据库

    CREATE DATABASE db0;
    0 rows available after 137 ms, consumed after another 0 ms
    CREATE DATABASE db1;
    0 rows available after 141 ms, consumed after another 0 ms
    CREATE DATABASE db2;
    0 rows available after 135 ms, consumed after another 0 ms
  6. 再次运行命令 SHOW DATABASES 验证新数据库是否已创建且处于 online 状态

    SHOW DATABASES;
    +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | name     | type       | aliases | access       | address          | role      | writer | requestedStatus | currentStatus | statusMessage | default | home  | constituents |
    +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | "db0"    | "standard" | []      | "read-write" | "localhost:7687" | "primary" | TRUE   | "online"        | "online"      | ""            | FALSE   | FALSE | []           |
    | "db1"    | "standard" | []      | "read-write" | "localhost:7687" | "primary" | TRUE   | "online"        | "online"      | ""            | FALSE   | FALSE | []           |
    | "db2"    | "standard" | []      | "read-write" | "localhost:7687" | "primary" | TRUE   | "online"        | "online"      | ""            | FALSE   | FALSE | []           |
    | "neo4j"  | "standard" | []      | "read-write" | "localhost:7687" | "primary" | TRUE   | "online"        | "online"      | ""            | TRUE    | TRUE  | []           |
    | "system" | "system"   | []      | "read-write" | "localhost:7687" | "primary" | TRUE   | "online"        | "online"      | ""            | FALSE   | FALSE | []           |
    +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    5 rows available after 8 ms, consumed after another 7 ms

将数据导入数据库

您可以使用命令 LOAD CSV WITH HEADERS FROM 将数据导入数据库。

将产品目录加载到 db0 中

  1. 运行以下 Cypher 查询将活动数据库更改为 db0 并添加产品数据

    :use db0;
    
    LOAD CSV WITH HEADERS FROM "https://data.neo4j.com/northwind/products.csv" AS row
    CREATE (n:Product)
    SET n = row,
    n.unitPrice = toFloat(row.unitPrice),
    n.unitsInStock = toInteger(row.unitsInStock), n.unitsOnOrder = toInteger(row.unitsOnOrder),
    n.reorderLevel = toInteger(row.reorderLevel), n.discontinued = (row.discontinued <> "0");
    
    LOAD CSV WITH HEADERS FROM "https://data.neo4j.com/northwind/categories.csv" AS row
    CREATE (n:Category)
    SET n = row;
    
    LOAD CSV WITH HEADERS FROM "https://data.neo4j.com/northwind/suppliers.csv" AS row
    CREATE (n:Supplier)
    SET n = row;
    
    CREATE INDEX FOR (p:Product) ON (p.productID);
    CREATE INDEX FOR (c:Category) ON (c.categoryID);
    CREATE INDEX FOR (s:Supplier) ON (s.supplierID);
    
    MATCH (p:Product),(c:Category)
    WHERE p.categoryID = c.categoryID
    CREATE (p)-[:PART_OF]->(c);
    
    MATCH (p:Product),(s:Supplier)
    WHERE p.supplierID = s.supplierID
    CREATE (s)-[:SUPPLIES]->(p);
  2. 按 Enter 键。

  3. 验证产品数据是否已加载到 db0

    MATCH (s:Supplier)-[:SUPPLIES]->(p:Product)-[:PART_OF]->(c:Category)
    RETURN s.companyName AS Supplier, p.productName AS Product, c.categoryName AS Category
    LIMIT 5;
    +--------------------------------------------------------------------------+
    | Supplier                     | Product                     | Category    |
    +--------------------------------------------------------------------------+
    | "Bigfoot Breweries"          | "Sasquatch Ale"             | "Beverages" |
    | "Pavlova"                    | "Outback Lager"             | "Beverages" |
    | "Bigfoot Breweries"          | "Laughing Lumberjack Lager" | "Beverages" |
    | "Bigfoot Breweries"          | "Steeleye Stout"            | "Beverages" |
    | "Aux joyeux ecclésiastiques" | "Côte de Blaye"             | "Beverages" |
    +--------------------------------------------------------------------------+
    
    5 rows available after 202 ms, consumed after another 5 ms

将 EU 客户及其相关订单加载到 db1 中

  1. 运行以下 Cypher 查询将活动数据库更改为 db1 并添加 EU 客户和订单

    :use db1;
    
    :param europe => ['Germany', 'UK', 'Sweden', 'France', 'Spain', 'Switzerland', 'Austria', 'Italy', 'Portugal', 'Ireland', 'Belgium', 'Norway', 'Denmark', 'Finland'];
    
    LOAD CSV WITH HEADERS FROM "https://data.neo4j.com/northwind/customers.csv" AS row
    WITH row
    WHERE row.country IN $europe
    CREATE (n:Customer)
    SET n = row;
    
    CREATE INDEX FOR (c:Customer) ON (c.customerID);
    
    LOAD CSV WITH HEADERS FROM "https://data.neo4j.com/northwind/orders.csv" AS row
    WITH row
    MATCH (c:Customer)
    WHERE row.customerID = c.customerID
    CREATE (o:Order)
    SET o = row;
    
    CREATE INDEX FOR (o:Order) ON (o.orderID);
    
    MATCH (c:Customer),(o:Order)
    WHERE c.customerID = o.customerID
    CREATE (c)-[:PURCHASED]->(o);
    
    LOAD CSV WITH HEADERS FROM "https://data.neo4j.com/northwind/products.csv" AS row
    CREATE (n:Product)
    SET n.productID = row.productID;
    
    CREATE INDEX FOR (p:Product) ON (p.productID);
    
    LOAD CSV WITH HEADERS FROM "https://data.neo4j.com/northwind/order-details.csv" AS row
    MATCH (p:Product), (o:Order)
    WHERE p.productID = row.productID AND o.orderID = row.orderID
    CREATE (o)-[details:ORDERS]->(p)
    SET details = row, details.quantity = toInteger(row.quantity);
  2. 按 Enter 键。

  3. 验证 EU 客户订单数据是否已加载到 db1

    MATCH (c:Customer)-[:PURCHASED]->(o:Order)-[:ORDERS]->(p:Product)
    RETURN c.companyName AS Customer, c.country AS CustomerCountry, o.orderID AS Order, p.productID AS Product
    LIMIT 5;
    +-------------------------------------------------------------+
    | Customer              | CustomerCountry | Order   | Product |
    +-------------------------------------------------------------+
    | "Alfreds Futterkiste" | "Germany"       | "10692" | "63"    |
    | "Alfreds Futterkiste" | "Germany"       | "10835" | "77"    |
    | "Alfreds Futterkiste" | "Germany"       | "10835" | "59"    |
    | "Alfreds Futterkiste" | "Germany"       | "10702" | "76"    |
    | "Alfreds Futterkiste" | "Germany"       | "10702" | "3"     |
    +-------------------------------------------------------------+
    
    5 rows available after 47 ms, consumed after another 2 ms

将 AME 客户及其相关订单加载到 db2 中

  1. 运行以下 Cypher 查询将活动数据库更改为 db2 并添加 AME 客户和订单

    :use db2;
    
    :param americas => ['Mexico', 'Canada', 'Argentina', 'Brazil', 'USA', 'Venezuela'];
    
    LOAD CSV WITH HEADERS FROM "https://data.neo4j.com/northwind/customers.csv" AS row
    WITH row
    WHERE row.country IN $americas
    CREATE (n:Customer)
    SET n = row;
    
    CREATE INDEX FOR (c:Customer) ON (c.customerID);
    
    LOAD CSV WITH HEADERS FROM "https://data.neo4j.com/northwind/orders.csv" AS row
    WITH row
    MATCH (c:Customer)
    WHERE row.customerID = c.customerID
    CREATE (o:Order)
    SET o = row;
    
    CREATE INDEX FOR (o:Order) ON (o.orderID);
    
    MATCH (c:Customer),(o:Order)
    WHERE c.customerID = o.customerID
    CREATE (c)-[:PURCHASED]->(o);
    
    LOAD CSV WITH HEADERS FROM "https://data.neo4j.com/northwind/products.csv" AS row
    CREATE (n:Product)
    SET n.productID = row.productID;
    
    CREATE INDEX FOR (p:Product) ON (p.productID);
    
    LOAD CSV WITH HEADERS FROM "https://data.neo4j.com/northwind/order-details.csv" AS row
    MATCH (p:Product), (o:Order)
    WHERE p.productID = row.productID AND o.orderID = row.orderID
    CREATE (o)-[details:ORDERS]->(p)
    SET details = row,
    details.quantity = toInteger(row.quantity);
  2. 按 Enter 键。

  3. 验证 AME 客户订单数据是否已加载到 db2

    MATCH (c:Customer)-[:PURCHASED]->(o:Order)-[:ORDERS]->(p:Product)
    RETURN c.companyName AS Customer, c.country AS CustomerCountry, o.orderID AS Order, p.productID AS Product
    LIMIT 5;
    +----------------------------------------------------------------------------+
    | Customer                             | CustomerCountry | Order   | Product |
    +----------------------------------------------------------------------------+
    | "Ana Trujillo Emparedados y helados" | "Mexico"        | "10759" | "32"    |
    | "Ana Trujillo Emparedados y helados" | "Mexico"        | "10926" | "72"    |
    | "Ana Trujillo Emparedados y helados" | "Mexico"        | "10926" | "13"    |
    | "Ana Trujillo Emparedados y helados" | "Mexico"        | "10926" | "19"    |
    | "Ana Trujillo Emparedados y helados" | "Mexico"        | "10926" | "11"    |
    +----------------------------------------------------------------------------+
    
    5 rows available after 42 ms, consumed after another 1 ms

配置复合数据库

使用 CREATE COMPOSITE DATABASE Cypher 命令设置复合数据库,并将本地数据库别名作为组成部分添加到复合数据库。在此示例中,复合数据库称为 compositenw

  1. 运行命令 CREATE COMPOSITE DATABASE <composite-database-name> 创建复合数据库

    CREATE COMPOSITE DATABASE compositenw;
    0 rows available after 137 ms, consumed after another 0 ms
  2. 运行命令 CREATE ALIAS <composite-database-name>.<alias-name> FOR DATABASE <database-name> 创建组成数据库别名

    CREATE ALIAS compositenw.product FOR DATABASE db0;
    0 rows available after 101 ms, consumed after another 0 ms
    CREATE ALIAS compositenw.customerEU FOR DATABASE db1;
    0 rows available after 107 ms, consumed after another 0 ms
    CREATE ALIAS compositenw.customerAME FOR DATABASE db2;
    0 rows available after 98 ms, consumed after another 0 ms

    本教程中的组成数据库别名是本地数据库别名(针对同一个 Neo4j DBMS 中的数据库),但它们也可以是远程数据库别名(针对另一个 Neo4j DBMS 中的数据库)。

  3. 运行命令 SHOW DATABASES 验证复合数据库是否已配置且处于 online 状态

    SHOW DATABASES;
    +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | name          | type        | aliases                     | access       | address          | role      | writer | requestedStatus | currentStatus | statusMessage | default | home  | constituents                                                                 |
    +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | "db0"         | "standard"  | ["compositenw.product"]     | "read-write" | "localhost:7687" | "primary" | TRUE   | "online"        | "online"      | ""            | FALSE   | FALSE | []                                                                           |
    | "db1"         | "standard"  | ["compositenw.customerEU"]  | "read-write" | "localhost:7687" | "primary" | TRUE   | "online"        | "online"      | ""            | FALSE   | FALSE | []                                                                           |
    | "db2"         | "standard"  | ["compositenw.customerAME"] | "read-write" | "localhost:7687" | "primary" | TRUE   | "online"        | "online"      | ""            | FALSE   | FALSE | []                                                                           |
    | "compositenw" | "composite" | []                          | "read-only"  | "localhost:7687" | "primary" | FALSE  | "online"        | "online"      | ""            | FALSE   | FALSE | ["compositenw.customerAME", "compositenw.customerEU", "compositenw.product"] |
    | "neo4j"       | "standard"  | []                          | "read-write" | "localhost:7687" | "primary" | TRUE   | "online"        | "online"      | ""            | TRUE    | TRUE  | []                                                                           |
    | "system"      | "system"    | []                          | "read-write" | "localhost:7687" | "primary" | TRUE   | "online"        | "online"      | ""            | FALSE   | FALSE | []                                                                           |
    +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    6 rows available after 242 ms, consumed after another 18 ms
  4. 运行命令 SHOW ALIASES FOR DATABASES 验证数据库别名是否已配置

    SHOW ALIASES FOR DATABASES;
    +--------------------------------------------------------------------------------+
    | name                      | composite     | database  | location | url  | user |
    +--------------------------------------------------------------------------------+
    | "compositenw.product"     | "compositenw" | "db0"     | "local"  | null | null |
    | "compositenw.customerEU"  | "compositenw" | "db1"     | "local"  | null | null |
    | "compositenw.customerAME" | "compositenw" | "db2"     | "local"  | null | null |
    +--------------------------------------------------------------------------------+
    3 rows available after 203 ms, consumed after another 16 ms

使用单个 Cypher 查询检索数据

查询单个数据库

连接到复合数据库后,您可以通过使用 Cypher 子句 USE 和别名名称来检索单个数据库中的数据

:use compositenw
USE compositenw.product
MATCH (p:Product)
RETURN p.productName AS product
LIMIT 5;
+--------------------------------+
| product                        |
+--------------------------------+
| "Chai"                         |
| "Chang"                        |
| "Aniseed Syrup"                |
| "Chef Anton's Cajun Seasoning" |
| "Chef Anton's Gumbo Mix"       |
+--------------------------------+

5 rows available after 6 ms, consumed after another 21 ms

跨多个分片查询

使用复合数据库查询两个分片并获取名称以 A 开头的客户

:use compositenw
USE compositenw.customerAME
MATCH (c:Customer)
WHERE c.customerID STARTS WITH 'A'
RETURN c.customerID AS name, c.country AS country
  UNION
USE compositenw.customerEU
MATCH (c:Customer)
WHERE c.customerID STARTS WITH 'A'
RETURN c.customerID AS name, c.country AS country
LIMIT 5;
+---------------------+
| name    | country   |
+---------------------+
| "ANATR" | "Mexico"  |
| "ANTON" | "Mexico"  |
| "ALFKI" | "Germany" |
| "AROUT" | "UK"      |
+---------------------+

4 rows available after 25 ms, consumed after another 56 ms

或者,使用更常见的复合数据库习惯用法

:use compositenw
UNWIND ['compositenw.customerAME', 'compositenw.customerEU'] AS g
CALL {
  USE graph.byName(g)
  MATCH (c:Customer)
  WHERE c.customerID STARTS WITH 'A'
  RETURN c.customerID AS name, c.country AS country
}
RETURN name, country
LIMIT 5;
+---------------------+
| name    | country   |
+---------------------+
| "ANATR" | "Mexico"  |
| "ANTON" | "Mexico"  |
| "ALFKI" | "Germany" |
| "AROUT" | "UK"      |
+---------------------+

4 rows available after 61 ms, consumed after another 8 ms

跨联合和分片查询

这是一个更复杂的查询,它使用所有 3 个数据库查找购买了肉类/家禽类别的停产产品的客户

:use compositenw
CALL {
  USE compositenw.product
  MATCH (p:Product)-[:PART_OF]->(c:Category)
  WHERE p.discontinued = true
    AND c.categoryName = 'Meat/Poultry'
  RETURN COLLECT(p.productID) AS pids
}
WITH *
UNWIND [g IN graph.names() WHERE g STARTS WITH 'compositenw.customer'] AS g
CALL {
  USE graph.byName(g)
  WITH pids
  UNWIND pids as pid
  MATCH (p:Product{productID:pid})<-[:ORDERS]-(:Order)<-[:PURCHASED]-(c:Customer)
  RETURN DISTINCT c.customerID AS customer, c.country AS country
}
RETURN customer, country
LIMIT 20;
+--------------------------+
| customer | country       |
+--------------------------+
| "RICSU"  | "Switzerland" |
| "PERIC"  | "Mexico"      |
| "WARTH"  | "Finland"     |
| "WELLI"  | "Brazil"      |
| "DRACD"  | "Germany"     |
| "RATTC"  | "USA"         |
| "HUNGO"  | "Ireland"     |
| "QUEDE"  | "Brazil"      |
| "SEVES"  | "UK"          |
| "ANTON"  | "Mexico"      |
| "BERGS"  | "Sweden"      |
| "SAVEA"  | "USA"         |
| "AROUT"  | "UK"          |
| "FAMIA"  | "Brazil"      |
| "WANDK"  | "Germany"     |
| "WHITC"  | "USA"         |
| "ISLAT"  | "UK"          |
| "LONEP"  | "USA"         |
| "QUICK"  | "Germany"     |
| "HILAA"  | "Venezuela"   |
+--------------------------+

20 rows available after 51 ms, consumed after another 2 ms

此查询的工作方式是 compositenw 调用数据库 db0 检索肉类/家禽类别的所有停产产品。然后,使用返回的产品 ID,它**并行**查询 db1db2,并获取购买了这些产品的客户及其国家/地区。

您刚刚学习了如何使用单个 Cypher 查询从多个数据库存储和检索数据。

有关复合数据库的更多详细信息,请参阅 概念