Database Compaction in 4.0 Using neo4j-admin copy

This article shows how to use the neo4j-admin copy tool to reclaim unused space held by the Neo4j store files.

1). Add 100k nodes: FOREACH (x IN range(1,100000) | CREATE (n:testnode1 {id:x}))

2). Check the allocated ID range: MATCH (n:testnode1) RETURN ID(n) AS ID ORDER BY ID LIMIT 5 (append DESC to ORDER BY ID for the descending list).

  • IDs ascending: 0, 1, 2, 3, 4; IDs descending: 99999, 99998, 99997, 99996, 99995.

3). Run :sysinfo: total store size = 18.6 MiB; ID allocation: node ID 100000, property ID 100000.

4). Delete the nodes created above with MATCH (n) DETACH DELETE n

5). Run :sysinfo again. It still reports total store size = 18.6 MiB; ID allocation: node ID 100000, property ID 100000.
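For repeat runs, the Cypher statements from steps 1, 2 and 4 can be collected into a small script. The sketch below is only an illustration: the scratch file name, the cypher-shell credentials and the idea of piping the file through cypher-shell are assumptions, not part of the original walkthrough.

```shell
# Hypothetical scratch file; the name and location are placeholders.
script="${TMPDIR:-/tmp}/compaction-demo.cypher"

cat > "$script" <<'EOF'
// Step 1: create 100k nodes
FOREACH (x IN range(1, 100000) | CREATE (n:testnode1 {id: x}));
// Step 2: inspect the lowest allocated node IDs
MATCH (n:testnode1) RETURN id(n) AS ID ORDER BY ID LIMIT 5;
// Step 4: delete all nodes again
MATCH (n) DETACH DELETE n;
EOF

# Assumed invocation (credentials are placeholders):
#   cypher-shell -u neo4j -p <password> < "$script"
# :sysinfo is a Neo4j Browser command, so steps 3 and 5 are still checked there.
echo "Wrote $(grep -c ';$' "$script") statements to $script"
```

To check :sysinfo between the create and the delete, run the statements one at a time rather than piping the whole file.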

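6). Next, take a full online backup with neo4j-admin backup (https://neo4j.ac.cn/docs/operations-manual/current/backup-restore/online-backup/). By default the backup performs a checkpoint, which flushes any cached updates in the page cache to the store files.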

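7). After step 6, the allocated IDs are still unchanged and the store size has not shrunk, even though the nodes were deleted. In this situation, or in a production database where frequent large loads and deletes leave the store files holding a lot of unused space, we can use the neo4j-admin copy tool introduced in 4.0 (effectively a merged version of store-utils) (https://neo4j.ac.cn/docs/operations-manual/current/tools/neo4j-admin/#neo4j-admin-syntax-and-commands). We can run neo4j-admin copy against the backup taken in step 6. Note that neo4j-admin copy can only be run on an offline database or on a backup.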

8). Run neo4j-admin copy, for example:

$ ./bin/neo4j-admin copy --from-database=neo4j --to-database=1/backups/copy

Starting to copy store, output will be saved to: /$neo4j_home/logs/neo4j-admin-copy-2020-01-16.12.06.38.log
2020-01-16 12:06:38.777+0000 INFO [StoreCopy] ### Copy Data ###
2020-01-16 12:06:38.778+0000 INFO [StoreCopy] Source: /Users/um/neo4j/4.0/cc/1/data/databases/neo4j
2020-01-16 12:06:38.778+0000 INFO [StoreCopy] Target: /Users/um/neo4j/4.0/cc/1/data/databases/1/backups/copy
2020-01-16 12:06:38.779+0000 INFO [StoreCopy] Empty database created, will start importing readable data from the source.
2020-01-16 12:06:40.159+0000 INFO [o.n.i.b.ImportLogic] Import starting

Import starting 2020-01-16 12:06:40.227+0000
  Estimated number of nodes: 0.00
  Estimated number of node properties: 0.00
  Estimated number of relationships: 0.00
  Estimated number of relationship properties: 0.00
  Estimated disk space usage: 3.922MiB
  Estimated required memory usage: 7.969MiB

(1/4) Node import 2020-01-16 12:06:40.604+0000
  Estimated number of nodes: 0.00
  Estimated disk space usage: 1.961MiB
  Estimated required memory usage: 7.969MiB
(2/4) Relationship import 2020-01-16 12:06:42.804+0000
  Estimated number of relationships: 0.00
  Estimated disk space usage: 1.961MiB
  Estimated required memory usage: 7.969MiB
(3/4) Relationship linking 2020-01-16 12:06:43.046+0000
  Estimated required memory usage: 7.969MiB
(4/4) Post processing 2020-01-16 12:06:43.461+0000
  Estimated required memory usage: 7.969MiB
-......... .......... .......... .......... ..........   5% ∆226ms
.......... .......... .......... .......... ..........  10% ∆1ms
.......... .......... .......... .......... ..........  15% ∆1ms
.......... .......... .......... .......... ..........  20% ∆1ms
.......... .......... .......... .......... ..........  25% ∆0ms
.......... .......... .......... .......... ..........  30% ∆1ms
.......... .......... .......... .......... ..........  35% ∆0ms
.......... .......... .......... .......... ..........  40% ∆1ms
.......... .......... .......... .......... ..........  45% ∆0ms
.......... .......... .......... .......... ..........  50% ∆1ms
.......... .......... .......... .......... ..........  55% ∆0ms
.......... .......... .......... .......... ..........  60% ∆0ms
.......... .......... .......... .......... ..........  65% ∆1ms
.......... .......... .......... .......... ..........  70% ∆0ms
.......... .......... .......... .......... ..........  75% ∆1ms
.......... .......... .......... .......... ..........  80% ∆0ms
.......... .......... .......... .......... ..........  85% ∆0ms
.......... .......... .......... .......... ..........  90% ∆1ms
.......... .......... .......... .......... ..........  95% ∆0ms
.......... .......... .......... .......... .......... 100% ∆1ms

IMPORT DONE in 3s 860ms.
Imported:
  0 nodes
  0 relationships
  0 properties
Peak memory usage: 7.969MiB
2020-01-16 12:06:44.031+0000 INFO [o.n.i.b.ImportLogic] Import completed successfully, took 3s 860ms. Imported:
  0 nodes
  0 relationships
  0 properties
2020-01-16 12:06:44.318+0000 INFO [StoreCopy] Import summary: Copying of 200622 records took 5 seconds (40124 rec/s). Unused Records 200622 (100%) Removed Records 0 (0%)
2020-01-16 12:06:44.318+0000 INFO [StoreCopy] ### Extracting schema ###
2020-01-16 12:06:44.319+0000 INFO [StoreCopy] Trying to extract schema...
2020-01-16 12:06:44.330+0000 INFO [StoreCopy] ... found 0 schema definition. The following can be used to recreate the schema:
2020-01-16 12:06:44.332+0000 INFO [StoreCopy]

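The example above finished in about 6 seconds and produced a compact, consistent store (any inconsistent nodes, properties, or relationships are not copied to the newly created store). Also note that the "/copy" directory was created under $neo4j_home/data/databases (the Target line in the log shows .../data/databases/1/backups/copy) rather than in /current-directory/1/backups/copy, because the copy tool resolves the specified name relative to $neo4j_home/data/databases.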
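The "Import summary" line in the log is the quickest indicator of how much of the store was reclaimable. As a minimal sketch, the helper below (summarize_copy_log is a hypothetical name, not part of neo4j-admin) extracts the three record counts from that line; it assumes the line keeps the exact wording shown in the log above.

```shell
# Hypothetical helper: read a neo4j-admin copy log on stdin and print the
# record counts found on its "Import summary" line.
summarize_copy_log() {
  sed -n 's/.*Import summary: Copying of \([0-9][0-9]*\) records.*Unused Records \([0-9][0-9]*\) .*Removed Records \([0-9][0-9]*\) .*/records=\1 unused=\2 removed=\3/p'
}

# Sample line taken verbatim from the log above.
sample='2020-01-16 12:06:44.318+0000 INFO [StoreCopy] Import summary: Copying of 200622 records took 5 seconds (40124 rec/s). Unused Records 200622 (100%) Removed Records 0 (0%)'

printf '%s\n' "$sample" | summarize_copy_log
# prints: records=200622 unused=200622 removed=0
```

Here every one of the 200622 records was unused, which matches the "0 nodes, 0 relationships, 0 properties" import result. The same function could be fed the saved log file named at the start of the output, assuming that file contains the same summary line as the console.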

9). We can then restore the copy above onto a standalone Neo4j 4.0 instance and compare the store size with the size reported in steps 3 and 5 (18.6 MiB). Run: ./sa/bin/neo4j-admin restore --from=cc/1/data/databases/1/backups/copy --verbose --database=sa/data/databases/neo4j --force

Note that the restored neo4j database is placed under $neo4j_home/data/databases/sa/data/databases; here too, the specified directory is combined with $neo4j_home/data/databases.
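Because both copy and restore combine the given name with the databases directory, it can help to work out the resulting location before running either command. The following sketch only mirrors the behaviour observed in this article's log; resolve_database_dir is a hypothetical helper, and the rule it encodes is an assumption drawn from the Target line, not a documented API.

```shell
# Hypothetical helper: join a Neo4j home directory and a database name the way
# the Target line in the copy log suggests (<home>/data/databases/<name>).
resolve_database_dir() {
  neo4j_home=$1
  database_name=$2
  # Strip one trailing slash from the home directory before joining.
  printf '%s/data/databases/%s\n' "${neo4j_home%/}" "$database_name"
}

resolve_database_dir /Users/um/neo4j/4.0/cc/1 1/backups/copy
# prints: /Users/um/neo4j/4.0/cc/1/data/databases/1/backups/copy
```

The printed path is the same as the Target reported by the copy run above, so passing a relative name such as 1/backups/copy does not place the copy under the current working directory.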

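10). Finally, compare the total store size now (after compaction) with the earlier value.

Running :sysinfo on the restored database now shows total store size = 800.00 KiB in this example.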

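This shows that the neo4j-admin copy tool has compacted the store, and that the space the ID files had kept reserved for reuse by future IDs has been returned to the operating system.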
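To put the two :sysinfo readings side by side, the reduction can be worked out directly. This is a small arithmetic sketch using the figures reported in this example (18.6 MiB before, 800 KiB after); the variable names are illustrative only.

```shell
before_mib=18.6   # total store size reported by :sysinfo in steps 3 and 5
after_kib=800     # total store size reported after restoring the compacted copy

# Convert MiB to KiB (x1024) and express the saving as a percentage.
reduction=$(LC_ALL=C awk -v before="$before_mib" -v after="$after_kib" \
  'BEGIN { printf "%.1f", (1 - after / (before * 1024)) * 100 }')

echo "Store size reduced by ${reduction}%"
# prints: Store size reduced by 95.8%
```

The figure applies only to this test database, where every record had been deleted; a production store with live data will shrink by less.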
