知识库

如何通过消息或调试日志快速识别长时间的 GC 暂停

Neo4j 中的 Java 垃圾回收 (GC) 暂停由 MonitorGc 进程监控,并记录在 $NEO4J_HOME/logs/debug.log 文件中(对于 Neo4j v2.3.x 及更早版本,记录在 $NEO4J_HOME/data/graph.db/messages.log 中)。要快速找到 Neo4j 中前 10 个最长的 GC 暂停,

在 Neo4j 3.0 到 Neo4j 3.3 版本中,可以运行

$ grep -n -i blocked debug.log | sort -r -n -k 11 | head -10

结果将类似如下所示:

7830:2016-12-05 19:04:23.865+0000 WARN  [o.n.k.i.c.MonitorGc] GC Monitor: Application threads blocked for 707ms.
3283:2016-12-02 13:25:52.264+0000 WARN  [o.n.k.i.c.MonitorGc] GC Monitor: Application threads blocked for 579ms.
6360:2016-12-02 15:38:35.502+0000 WARN  [o.n.k.i.c.MonitorGc] GC Monitor: Application threads blocked for 381ms.
6349:2016-12-02 15:02:33.156+0000 WARN  [o.n.k.i.c.MonitorGc] GC Monitor: Application threads blocked for 285ms.
6361:2016-12-02 15:50:45.560+0000 WARN  [o.n.k.i.c.MonitorGc] GC Monitor: Application threads blocked for 274ms.
6920:2016-12-05 16:57:47.170+0000 WARN  [o.n.k.i.c.MonitorGc] GC Monitor: Application threads blocked for 251ms.

示例中的 grep 命令将对 debug.log 文件进行不区分大小写的搜索(-i 参数的作用),查找字符串 'blocked' 并返回相应的行号(-n 参数的作用)。然后,这些结果将根据表示时间的字段进行排序(分别为 -k 11-k 10 的作用),并按数字顺序而非 ASCII 顺序排序(-n 参数的作用),最后按时间倒序显示(-r 参数的作用)。最后,head -10 将只显示排序后的前 10 个结果。

对于 Neo4j 3.4 及更高版本,可以运行

$ grep -n "VM stop-the-world" debug.log | sort -t= -nrk2 | head -10

结果将类似如下所示:

27675:2019-04-09 10:39:27.115+0000 WARN [o.n.k.i.c.VmPauseMonitorComponent] Detected VM stop-the-world pause: {pauseTime=516, gcTime=508, gcCount=1}
2671:2019-04-08 08:03:12.921+0000 WARN [o.n.k.i.c.VmPauseMonitorComponent] Detected VM stop-the-world pause: {pauseTime=483, gcTime=487, gcCount=1}
27885:2019-04-09 11:01:07.033+0000 WARN [o.n.k.i.c.VmPauseMonitorComponent] Detected VM stop-the-world pause: {pauseTime=371, gcTime=422, gcCount=1}
16302:2019-04-09 09:53:31.066+0000 WARN [o.n.k.i.c.VmPauseMonitorComponent] Detected VM stop-the-world pause: {pauseTime=338, gcTime=372, gcCount=1}
14283:2019-04-08 16:54:03.108+0000 WARN [o.n.k.i.c.VmPauseMonitorComponent] Detected VM stop-the-world pause: {pauseTime=272, gcTime=358, gcCount=1}
27904:2019-04-09 11:26:29.463+0000 WARN [o.n.k.i.c.VmPauseMonitorComponent] Detected VM stop-the-world pause: {pauseTime=346, gcTime=355, gcCount=1}
4006:2019-04-08 12:14:33.514+0000 WARN [o.n.k.i.c.VmPauseMonitorComponent] Detected VM stop-the-world pause: {pauseTime=352, gcTime=349, gcCount=1}
14519:2019-04-08 20:37:12.126+0000 WARN [o.n.k.i.c.VmPauseMonitorComponent] Detected VM stop-the-world pause: {pauseTime=253, gcTime=345, gcCount=1}
3983:2019-04-08 11:17:23.039+0000 WARN [o.n.k.i.c.VmPauseMonitorComponent] Detected VM stop-the-world pause: {pauseTime=301, gcTime=322, gcCount=1}
16110:2019-04-09 07:19:25.626+0000 WARN [o.n.k.i.c.VmPauseMonitorComponent] Detected VM stop-the-world pause: {pauseTime=318, gcTime=314, gcCount=1}
© . All rights reserved.