Knowledge Base

Database Compaction in 4.0 Using neo4j-admin copy

This article demonstrates how to use the neo4j-admin copy tool to reclaim unused space held by the Neo4j store files.

1). Add 100,000 nodes: FOREACH (x IN range(1,100000) | CREATE (n:testnode1 {id:x}))

2). Check the allocated ID range: MATCH (n:testnode1) RETURN ID(n) AS ID ORDER BY ID LIMIT 5

  • IDs in ascending order: 0, 1, 2, 3, 4; IDs in descending order (ORDER BY ID DESC): 99999, 99998, 99997, 99996, 99995.
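Cypher's range(1,100000) includes both endpoints, so the FOREACH in step 1 creates exactly 100,000 nodes, which matches the ID span 0 to 99999 reported above. A minimal Python sketch of that arithmetic, assuming sequential ID allocation from 0 on an empty store (this is what the output above shows, not a general guarantee):

```python
# Cypher range(1, 100000) is inclusive at both ends; Python's range excludes
# the upper bound, so the equivalent Python range ends at 100001.
property_values = list(range(1, 100000 + 1))

# Assumption: an empty store hands out internal node IDs sequentially from 0.
node_ids = list(range(len(property_values)))

first_five_ascending = sorted(node_ids)[:5]
first_five_descending = sorted(node_ids, reverse=True)[:5]

print(len(property_values))
print(first_five_ascending)
print(first_five_descending)
```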

3). Run the :sysinfo command: Total Store Size = 18.6 MiB, ID Allocation: Node ID 100000, Property ID 100000.

4). We can then delete the nodes created above with MATCH (n) DETACH DELETE n

5). After the delete, :sysinfo still reports Total Store Size = 18.6 MiB, ID Allocation: Node ID 100000, Property ID 100000.

6). We can then run a full neo4j-admin backup (https://neo4j.ac.cn/docs/operations-manual/current/backup-restore/online-backup/) to take an online backup, which by default performs a checkpoint (flushing any updates cached in the page cache to the store files).

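7). As the steps above show, the allocated IDs remain unchanged and the store size has not shrunk despite the deletes. At this point, or on a production database where frequent bulk loads and deletes can leave the store files holding a large amount of unused space, we can use the neo4j-admin copy tool introduced in 4.0 (essentially store-utils merged into the product) (https://neo4j.ac.cn/docs/operations-manual/current/tools/neo4j-admin/#neo4j-admin-syntax-and-commands). We can run neo4j-admin copy against the backup taken in step 6. Note that neo4j-admin copy can only be run on an offline database or on a backup.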
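Why deletes alone do not shrink the store can be illustrated with a toy model. The sketch below is not Neo4j's actual storage code: ToyStore, its methods, and the 15-byte record size are all hypothetical and exist only to illustrate a fixed-size-record file in which a record's ID is its slot position. It also ignores ID reuse, which a real store performs for later writes.

```python
RECORD_SIZE = 15  # bytes per record; illustrative value for this toy model only


class ToyStore:
    """Fixed-size-record store: a record's ID is its slot index in the file."""

    def __init__(self):
        self.in_use = []  # in_use[i] is True while record i holds live data

    def create(self):
        self.in_use.append(True)
        return len(self.in_use) - 1  # new ID is the next slot at the end

    def delete(self, record_id):
        # Deleting only flips the in-use flag; the slot stays in the file.
        self.in_use[record_id] = False

    def high_id(self):
        return len(self.in_use)

    def file_size(self):
        return len(self.in_use) * RECORD_SIZE

    def compact_copy(self):
        """Build a new store that contains only the live records."""
        target = ToyStore()
        for live in self.in_use:
            if live:
                target.create()
        return target


store = ToyStore()
ids = [store.create() for _ in range(100_000)]
size_before_delete = store.file_size()

for record_id in ids:
    store.delete(record_id)

size_after_delete = store.file_size()  # unchanged: slots are only marked unused
compacted = store.compact_copy()       # nothing is live, so nothing is copied

print(size_before_delete, size_after_delete, compacted.file_size())
```

In this model the file size and high ID stay the same after every record is deleted, and only the copy into a fresh store drops the unused slots, which mirrors what :sysinfo reports before and after neo4j-admin copy in this article.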

8). Run the neo4j-admin copy command, for example:

$ ./bin/neo4j-admin copy --from-database=neo4j --to-database=1/backups/copy

Starting to copy store, output will be saved to: /$neo4j_home/logs/neo4j-admin-copy-2020-01-16.12.06.38.log
2020-01-16 12:06:38.777+0000 INFO [StoreCopy] ### Copy Data ###
2020-01-16 12:06:38.778+0000 INFO [StoreCopy] Source: /Users/um/neo4j/4.0/cc/1/data/databases/neo4j
2020-01-16 12:06:38.778+0000 INFO [StoreCopy] Target: /Users/um/neo4j/4.0/cc/1/data/databases/1/backups/copy
2020-01-16 12:06:38.779+0000 INFO [StoreCopy] Empty database created, will start importing readable data from the source.
2020-01-16 12:06:40.159+0000 INFO [o.n.i.b.ImportLogic] Import starting

Import starting 2020-01-16 12:06:40.227+0000
  Estimated number of nodes: 0.00
  Estimated number of node properties: 0.00
  Estimated number of relationships: 0.00
  Estimated number of relationship properties: 0.00
  Estimated disk space usage: 3.922MiB
  Estimated required memory usage: 7.969MiB

(1/4) Node import 2020-01-16 12:06:40.604+0000
  Estimated number of nodes: 0.00
  Estimated disk space usage: 1.961MiB
  Estimated required memory usage: 7.969MiB
(2/4) Relationship import 2020-01-16 12:06:42.804+0000
  Estimated number of relationships: 0.00
  Estimated disk space usage: 1.961MiB
  Estimated required memory usage: 7.969MiB
(3/4) Relationship linking 2020-01-16 12:06:43.046+0000
  Estimated required memory usage: 7.969MiB
(4/4) Post processing 2020-01-16 12:06:43.461+0000
  Estimated required memory usage: 7.969MiB
-......... .......... .......... .......... ..........   5% ∆226ms
.......... .......... .......... .......... ..........  10% ∆1ms
.......... .......... .......... .......... ..........  15% ∆1ms
.......... .......... .......... .......... ..........  20% ∆1ms
.......... .......... .......... .......... ..........  25% ∆0ms
.......... .......... .......... .......... ..........  30% ∆1ms
.......... .......... .......... .......... ..........  35% ∆0ms
.......... .......... .......... .......... ..........  40% ∆1ms
.......... .......... .......... .......... ..........  45% ∆0ms
.......... .......... .......... .......... ..........  50% ∆1ms
.......... .......... .......... .......... ..........  55% ∆0ms
.......... .......... .......... .......... ..........  60% ∆0ms
.......... .......... .......... .......... ..........  65% ∆1ms
.......... .......... .......... .......... ..........  70% ∆0ms
.......... .......... .......... .......... ..........  75% ∆1ms
.......... .......... .......... .......... ..........  80% ∆0ms
.......... .......... .......... .......... ..........  85% ∆0ms
.......... .......... .......... .......... ..........  90% ∆1ms
.......... .......... .......... .......... ..........  95% ∆0ms
.......... .......... .......... .......... .......... 100% ∆1ms

IMPORT DONE in 3s 860ms.
Imported:
  0 nodes
  0 relationships
  0 properties
Peak memory usage: 7.969MiB
2020-01-16 12:06:44.031+0000 INFO [o.n.i.b.ImportLogic] Import completed successfully, took 3s 860ms. Imported:
  0 nodes
  0 relationships
  0 properties
2020-01-16 12:06:44.318+0000 INFO [StoreCopy] Import summary: Copying of 200622 records took 5 seconds (40124 rec/s). Unused Records 200622 (100%) Removed Records 0 (0%)
2020-01-16 12:06:44.318+0000 INFO [StoreCopy] ### Extracting schema ###
2020-01-16 12:06:44.319+0000 INFO [StoreCopy] Trying to extract schema...
2020-01-16 12:06:44.330+0000 INFO [StoreCopy] ... found 0 schema definition. The following can be used to recreate the schema:
2020-01-16 12:06:44.332+0000 INFO [StoreCopy]

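The copy above completed in about 6 seconds and produced a compact, consistent store (any inconsistent nodes, properties, or relationships are not copied to the newly created store). Also note that, as the Target line in the log shows, '/copy' was created under $neo4j_home/data/databases/1/backups/copy and not under /current-directory/1/backups/copy, because the copy tool prefixes the specified destination directory with $neo4j_home/data/databases.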
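The "Import summary" line in the log carries the figures that matter for compaction. As a rough sketch, the line can be parsed and sanity-checked in Python. The regular expression is written for the exact line shown in this article and is an assumption about its format, not a documented log contract; reading "total minus unused minus removed" as the number of records actually copied is likewise an interpretation.

```python
import re

# The summary line exactly as it appears in the log above (timestamp omitted).
summary = (
    "Import summary: Copying of 200622 records took 5 seconds (40124 rec/s). "
    "Unused Records 200622 (100%) Removed Records 0 (0%)"
)

pattern = re.compile(
    r"Copying of (?P<total>\d+) records took (?P<seconds>\d+) seconds "
    r"\((?P<rate>\d+) rec/s\)\. "
    r"Unused Records (?P<unused>\d+) \((?P<unused_pct>\d+)%\) "
    r"Removed Records (?P<removed>\d+) \((?P<removed_pct>\d+)%\)"
)

match = pattern.search(summary)
fields = {name: int(value) for name, value in match.groupdict().items()}

# Interpretation (assumption): records that were neither unused nor removed
# are the ones that carried live data into the new store.
live_records = fields["total"] - fields["unused"] - fields["removed"]

print(fields)
print(live_records)
```

For this run every scanned record was unused, which agrees with the "0 nodes, 0 relationships, 0 properties" import result.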

9). We can then restore the copy above on a standalone Neo4j 4.0 instance and compare it with the earlier store size of 61.6MiB: run ./sa/bin/neo4j-admin restore --from=cc/1/data/databases/1/backups/copy --verbose --database=sa/data/databases/neo4j --force

Note that the restored neo4j database ends up under $neo4j_home/data/databases/sa/data/databases, because here too the specified destination directory is prefixed with $neo4j_home/data/databases.
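The prefixing behaviour can be checked against the Target line in the copy log. The sketch below only reproduces the path arithmetic; resolve_under_databases is a hypothetical helper written for this illustration, not part of neo4j-admin, and the $neo4j_home value is the one implied by the Source and Target log lines.

```python
from pathlib import PurePosixPath


def resolve_under_databases(neo4j_home, relative_target):
    """Hypothetical helper: join a relative target onto <neo4j_home>/data/databases."""
    return PurePosixPath(neo4j_home) / "data" / "databases" / relative_target


# $neo4j_home of the source instance, as implied by the log's Source/Target lines.
neo4j_home = "/Users/um/neo4j/4.0/cc/1"

# Value passed to --to-database in step 8.
copy_target = resolve_under_databases(neo4j_home, "1/backups/copy")
print(copy_target)
```

The result matches the Target path printed by the tool, so a relative destination should be read as relative to $neo4j_home/data/databases rather than to the current working directory.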

10). Finally, compare the total store size now (after compaction) with the size before.

In this example, :sysinfo on the restored database above shows Total Store Size = 800.00 KiB.

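This shows that the neo4j-admin copy tool successfully compacted the store, and that the space the ID store had reserved for future ID creation was returned to the operating system.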
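To put a number on the saving, the before and after sizes can be compared directly. This is a small sketch using the 18.6 MiB reported by :sysinfo in steps 3 and 5 and the 800.00 KiB reported after the restore; reduction_percent is a hypothetical helper name.

```python
KIB_PER_MIB = 1024


def reduction_percent(before_mib, after_kib):
    """Percentage of the store size removed, given MiB before and KiB after."""
    after_mib = after_kib / KIB_PER_MIB
    return (1 - after_mib / before_mib) * 100


# 18.6 MiB: :sysinfo before compaction; 800.00 KiB: :sysinfo after restore.
saving = reduction_percent(18.6, 800.0)
print(f"{saving:.1f}%")
```

With these inputs the store shrinks by roughly 95.8%.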
