知识库

解析 query.log 的一种方法

如果通过 Neo4j Enterprise 参数 dbms.logs.query.enabled 启用了 query.log,则可以使用包含的 bash shell 脚本快速解析日志并识别基于总执行时间的 10 个最昂贵的查询,如果启用了 dbms.logs.query.time_logging_enabled,则还会报告基于计划、CPU 和等待时间的 10 个最昂贵的查询。

bash shell 脚本如下所示

#!/bin/bash
# Copyright (c) 2002-2018 "Neo Technology,"
# Network Engine for Objects in Lund AB [http://neotechnology.com]
# This file is a commercial add-on to Neo4j Enterprise Edition.

# only parse those lines where the 3rd field is 'INFO'
# count total number of lines in query.log which report INFO in the 3rd field

parse() {
logfile=$1
if [ -e $logfile ]; then
    # log file exits
    awk '$3== "INFO" {print "First Query Reported at: " $1 " " $2}' $logfile | head -1
    awk '$3== "INFO" {print " Last Query Reported at: " $1 " " $2}' $logfile | tail -1


    awk '$3== "INFO" {count++;  total=total+$4 } END{print "\n*******EXECUTION:******* \nTotal # of Completed Queries: " count "\n       Total Duration (msec): " total "\n   Avg of all Queries (msec): " total/count "\n" }' $logfile

    # print longest query
    printf "Top 10 Longest Queries longest on top and leading line number: \n"
    printf "Note: Queries which are multi-line will only report first line of query !!!\n\n\n"
    awk '$3== "INFO" {print "line:" NR "\t" $0}' $logfile | sort -n -r -k 5 | head -10

    pcw=`grep "INFO" $logfile | grep "(planning:" | head -1`
    if [ ! -z "$pcw" ]; then
          # only enter this block if conf/neo4j.conf has
          #     dbms.logs.query.time_logging_enabled=true
          # which thus causes field 6 to appear as `(planning:...........`
          #
          # produce metrics on planning, CPU and waiting
          #PLANING
          awk '$3== "INFO" && $6== "(planning:" {count++;  total=total+$7 } $7== "0," {planzero++;} END{print "\n\t\t*******PLANNING:******* \n Total # of Completed Queries: " count "\n        Total Duration (msec): " total "\n    Avg of all Queries (msec): " total/count "\nNumber of Queries NOT Planned: " planzero "    " (planzero/count)*100 "%\n" }' $logfile

          # print longest query
          awk '$3== "INFO" {print "line:" NR "\t" $0}' $logfile | sort -n -r -k 8 | head -10



          #CPU
          awk '$3== "INFO" && $8== "cpu:" {count++;  total=total+$9 } END{print "\n\t\t*******CPU:******* \nTotal # of Completed Queries: " count "\n       Total Duration (msec): " total "\n   Avg of all Queries (msec): " total/count "\n" }' $logfile
awk '$3== "INFO" {print "line:" NR "\t" $0}' $logfile | sort -n -r -k 10 | head -10

          #WAITING
          awk '$3== "INFO" &&$10== "waiting:" {count++;  total=total+$11 } $11== "0)" {nowait++;} END{print "\n\t\t\t*******WAITING:******* \n Total # of Completed Queries: " count "\n        Total Duration (msec): " total "\n    Avg of all Queries (msec): " total/count "\nNumber of Queries NOT Waiting: " nowait "      " (nowait/count)*100 "%\n" }' $logfile
awk '$3== "INFO" {print "line:" NR "\t" $0}' $logfile | sort -n -r -k 12 | head -10
     fi
else
    # logfile does not exist
    printf "\n$logfile does not exist\n"
    printf "Usage: $0 <filename>\n"
fi
}

if [ -z $1 ]; then
        parse "query.log"
  else
        parse $1
fi

运行脚本并将要分析的 query.log 文件传递给它,例如 ./parse_querylog.sh query.log,输出将类似于

  1. dbms.logs.query.time_logging_enabled=false

    First Query Reported at: 2018-04-10 11:48:36.425+0000
     Last Query Reported at: 2018-04-10 11:49:01.213+0000
    
    *******EXECUTION:*******
    Total # of Completed Queries: 6
           Total Duration (msec): 2521
       Avg of all Queries (msec): 420.167
    
    Top 10 Longest Queries longest on top and leading line number:
    Note: Queries which are multi-line will only report first line of query !!!
    
    
    line:1  2018-04-10 11:48:36.425+0000 INFO  2398 ms: embedded-session             - MATCH (a:` Arbitrary label name that really doesn't matter `) RETURN a LIMIT 0 - {} - {}
    line:2  2018-04-10 11:48:37.106+0000 INFO  48 ms: bolt-session  bolt            neo4j-java/dev          client/127.0.0.1:35796  server/127.0.0.1:7687>   - RETURN 1 - {} - {}
    line:3  2018-04-10 11:48:43.839+0000 INFO  33 ms: bolt-session  bolt            neo4j-java/dev          client/127.0.0.1:35796  server/127.0.0.1:7687>   - match (n) return count(n); - {} - {}
    line:9  2018-04-10 11:49:01.213+0000 INFO  21 ms: bolt-session  bolt            neo4j-java/dev          client/127.0.0.1:35796  server/127.0.0.1:7687>   -
    line:4  2018-04-10 11:48:47.494+0000 INFO  12 ms: bolt-session  bolt            neo4j-java/dev          client/127.0.0.1:35796  server/127.0.0.1:7687>   -
    line:6  2018-04-10 11:48:51.613+0000 INFO  9 ms: bolt-session   bolt            neo4j-java/dev          client/127.0.0.1:35796  server/127.0.0.1:7687>   -
  2. dbms.logs.query.time_logging_enabled=true

    (**注意:**每个部分的 10 个查询列表已缩减为 2 行以简洁起见。此外,查询的 Cypher 也已缩减)

    First Query Reported at: 2018-01-24 07:58:13.360+0000
     Last Query Reported at: 2018-01-24 15:19:40.897+0000
    
    *******EXECUTION:*******
    Total # of Completed Queries: 13655
           Total Duration (msec): 1207191
       Avg of all Queries (msec): 88.4065
    
    Top 10 Longest Queries longest on top and leading line number:
    Note: Queries which are multi-line will only report first line of query !!!
    
    line:12100      2018-01-24 08:49:14.144+0000 INFO  10082 ms: (planning: 0, cpu: 81, waiting: 0) - bolt-session  bolt    neo4j   neo4j-java/1.4.2-45c2930bc28fac23dda088b300977b804fedf8bb              client/10.21.1.3:56870    server/10.0.1.2:7687>        neo4j - MATCH (n:Person............)
    line:58883      2018-01-24 12:28:17.472+0000 INFO  2530 ms: (planning: 313, cpu: 2520, waiting: 0) - bolt-session       bolt    neo4j   neo4j-java/1.4.2-45c2930bc28fac23dda088b300977b804fedf8bb              client/10.21.21.27:56870        server/10.0.1.2:7687>        neo4j -  MATCH (n:Person)
    
                    *******PLANNING:*******
     Total # of Completed Queries: 13655
            Total Duration (msec): 8798
        Avg of all Queries (msec): 0.644306
    Number of Queries NOT Planned: 11448    83.8374%
    
    line:37438      2018-01-24 10:43:28.532+0000 INFO  2255 ms: (planning: 991, cpu: 2239, waiting: 0) - bolt-session       bolt    neo4j   neo4j-java/1.4.2-45c2930bc28fac23dda088b300977b804fedf8bb              client/10.21.21.45:56870      server/10.0.1.2:7687>        neo4j -  MATCH (n:Person...............)
    line:89641      2018-01-24 15:13:42.191+0000 INFO  513 ms: (planning: 409, cpu: 510, waiting: 0) - bolt-session bolt    neo4j   neo4j-java/1.4.2-45c2930bc28fac23dda088b300977b804fedf8bb              client/10.21.21.45:56870       server/10.0.1.2:7687>        neo4j - MATCH (n:Person...........) {}
    
                    *******CPU:*******
    Total # of Completed Queries: 13655
           Total Duration (msec): 1194341
       Avg of all Queries (msec): 87.4655
    
    line:58883      2018-01-24 12:28:17.472+0000 INFO  2530 ms: (planning: 313, cpu: 2520, waiting: 0) - bolt-session       bolt    neo4j   neo4j-java/1.4.2-45c2930bc28fac23dda088b300977b804fedf8bb              client/10.21.21.27:56870        server/10.0.1.2:7687>       neo4j -  MATCH (n:Person .......)
    line:386        2018-01-24 07:59:54.851+0000 INFO  2359 ms: (planning: 256, cpu: 2350, waiting: 0) - bolt-session       bolt    neo4j   neo4j-java/1.4.2-45c2930bc28fac23dda088b300977b804fedf8bb              client/10.21.21.27:49536        server/10.0.1.2:7687>        neo4j -  MATCH (n:Person {.......})
    
                            *******WAITING:*******
     Total # of Completed Queries: 13655
            Total Duration (msec): 27
        Avg of all Queries (msec): 0.0019773
    Number of Queries NOT Waiting: 13654      99.9927%
    
    line:81         2018-01-24 07:58:33.168+0000 INFO  67 ms: (planning: 0, cpu: 67, waiting: 27) - bolt-session    bolt    neo4j   neo4j-java/1.4.2-45c2930bc28fac23dda088b300977b804fedf8bb              client/10.21.21.27:49536        server/10.0.1.2:7687>        neo4j -  MATCH (n:Person) return n:Deleted
    line:9991       2018-01-24 08:40:16.476+0000 INFO  74 ms: (planning: 0, cpu: 74, waiting: 0) - bolt-session     bolt    neo4j   neo4j-java/1.4.2-45c2930bc28fac23dda088b300977b804fedf8bb              client/10.21.21.27:49536        server/10.0.1.2:7687>        neo4j -  MATCH (n:Person..............)

从第二次运行的输出中,我们看到日志跨越了从 First Query Reported at: 2018-01-24 07:58:13.360+0000Last Query Reported at: 2018-01-24 15:19:40.897+0000。然后,脚本报告我们发现了 Total # of Completed Queries: 13655,然后提供了基于总/平均执行时间、总/平均计划时间、总/平均 CPU 时间和总/平均等待时间的 10 个最昂贵查询的列表。此外,从输出中我们可以看到“查询计划”不是一个重要问题,因为 Number of Queries NOT Planned: 11448 83.8374% 表示 83.8374% 的查询来自查询计划缓存。最后,“锁定”不是问题,因为 Number of Queries NOT Waiting: 13654 99.9927% 表示 99.9927% 的所有查询在“等待”状态下花费的时间为零。

脚本有一些“注意事项”。
  • 上面的脚本将报告满足任何“前 10 名”结果的查询在日志文件中的行号。但是,如果查询是多行的,包括回车符,则仅显示查询的第一行。需要在特定的行号读取 query.log 以完全了解相关的查询。

  • 如果针对 Neo4j 冷启动后的 query.log 运行,则报告的时间可能会被夸大。这是因为查询不在查询计划缓存或页面缓存中。

  • 可以通过参数 dbms.logs.query.threshold 配置 query.log 日志记录,仅记录持续时间超过 X 的查询。如果设置为 0,则记录所有内容。如果设置为 2 秒,则仅记录超过 2 秒的查询,这将影响以上结果。