知识库

诊断存储性能问题

缓慢的存储会影响 Neo4j 的性能,因此我们建议在 产品文档 中使用固态硬盘。

基准测试底层系统

在 Ubuntu 或 RedHat 上,可以使用 fio 工具对底层存储进行基准测试。

以下命令将在当前文件夹中创建一个 1GB 的文件

$ fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=random_read_write.fio --bs=4k --iodepth=64 --size=1G --readwrite=randrw --rwmixread=50

它在具有 gp2(通用用途 SSD)的 AWS 实例上产生以下输出。我们可以看到 1500iops

test: (g=0): rw=randrw, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=64
fio-2.1.5
Starting 1 process
test: Laying out IO file(s) (1 file(s) / 1024MB)
Jobs: 1 (f=1): [m] [100.0% done] [6100KB/5892KB/0KB /s] [1525/1473/0 iops] [eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=2394: Thu Nov  7 15:34:32 2019
  read : io=524704KB, bw=6074.2KB/s, iops=1518, runt= 86383msec
  write: io=523872KB, bw=6064.6KB/s, iops=1516, runt= 86383msec
  cpu          : usr=0.23%, sys=1.21%, ctx=86679, majf=0, minf=6
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued    : total=r=131176/w=130968/d=0, short=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
   READ: io=524704KB, aggrb=6074KB/s, minb=6074KB/s, maxb=6074KB/s, mint=86383msec, maxt=86383msec
  WRITE: io=523872KB, aggrb=6064KB/s, minb=6064KB/s, maxb=6064KB/s, mint=86383msec, maxt=86383msec

Disk stats (read/write):
  nvme0n1: ios=131032/130833, merge=0/27, ticks=1289168/1306400, in_queue=2508680, util=99.78%

在现代笔记本电脑上,您可以预期 13000iops。

$ fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=random_read_write.fio --bs=4k --iodepth=64 --size=1G --readwrite=randrw --rwmixread=50
test: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=64
fio-3.7
Starting 1 process
Jobs: 1 (f=1): [m(1)][100.0%][r=52.4MiB/s,w=52.4MiB/s][r=13.4k,w=13.4k IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=8720: Thu Nov 14 15:25:49 2019
   read: IOPS=13.4k, BW=52.2MiB/s (54.7MB/s)(512MiB/9804msec)
   bw (  KiB/s): min=51625, max=55192, per=99.55%, avg=53220.42, stdev=977.74, samples=19
   iops        : min=12906, max=13798, avg=13304.89, stdev=244.52, samples=19
  write: IOPS=13.4k, BW=52.2MiB/s (54.8MB/s)(512MiB/9804msec)
   bw (  KiB/s): min=51865, max=54634, per=99.55%, avg=53247.68, stdev=774.47, samples=19
   iops        : min=12966, max=13658, avg=13311.63, stdev=193.63, samples=19
  cpu          : usr=4.52%, sys=75.46%, ctx=1117, majf=0, minf=28
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued rwts: total=131040,131104,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
   READ: bw=52.2MiB/s (54.7MB/s), 52.2MiB/s-52.2MiB/s (54.7MB/s-54.7MB/s), io=512MiB (537MB), run=9804-9804msec
  WRITE: bw=52.2MiB/s (54.8MB/s), 52.2MiB/s-52.2MiB/s (54.8MB/s-54.8MB/s), io=512MiB (537MB), run=9804-9804msec

Disk stats (read/write):
  sda: ios=128856/128900, merge=19/39, ticks=148871/43881, in_queue=192710, util=99.10%

在本地连接的 nvme 驱动器上:我们得到 100 000 iops。

$ fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=random_read_write.fio --bs=4k --iodepth=64 --size=1G --readwrite=randrw --rwmixread=50
test: (g=0): rw=randrw, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=64
fio-2.1.5
Starting 1 process
test: Laying out IO file(s) (1 file(s) / 1024MB)
Jobs: 1 (f=1)
test: (groupid=0, jobs=1): err= 0: pid=5451: Sat Nov 30 04:12:00 2019
  read : io=524704KB, bw=411856KB/s, iops=102963, runt=  1274msec
  write: io=523872KB, bw=411203KB/s, iops=102800, runt=  1274msec
  cpu          : usr=28.91%, sys=63.71%, ctx=5080, majf=0, minf=5
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued    : total=r=131176/w=130968/d=0, short=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
   READ: io=524704KB, aggrb=411855KB/s, minb=411855KB/s, maxb=411855KB/s, mint=1274msec, maxt=1274msec
  WRITE: io=523872KB, aggrb=411202KB/s, minb=411202KB/s, maxb=411202KB/s, mint=1274msec, maxt=1274msec

Disk stats (read/write):
  nvme0n1: ios=108546/108561, merge=0/0, ticks=40192/9032, in_queue=48820, util=86.23%

吞吐量优化的 HDD 将能够处理更少的 iops:66 iops

$ fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=random_read_write.fio --bs=4k --iodepth=64 --size=1G --readwrite=randrw --rwmixread=50
test: (g=0): rw=randrw, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=64
fio-2.1.5
Starting 1 process
Jobs: 1 (f=1): [m] [99.9% done] [768KB/732KB/0KB /s] [192/183/0 iops] [eta 00m:02s]
test: (groupid=0, jobs=1): err= 0: pid=967: Fri Nov 29 17:35:59 2019
  read : io=524704KB, bw=273928B/s, iops=66, runt=1961452msec
  write: io=523872KB, bw=273493B/s, iops=66, runt=1961452msec
  cpu          : usr=0.08%, sys=0.18%, ctx=235668, majf=0, minf=5
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued    : total=r=131176/w=130968/d=0, short=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
   READ: io=524704KB, aggrb=267KB/s, minb=267KB/s, maxb=267KB/s, mint=1961452msec, maxt=1961452msec
  WRITE: io=523872KB, aggrb=267KB/s, minb=267KB/s, maxb=267KB/s, mint=1961452msec, maxt=1961452msec

Disk stats (read/write):
  xvdba: ios=130360/130859, merge=641/1085, ticks=59917432/59357236, in_queue=119277180, util=100.00%

实时监控特定存储设备

首先,您需要确定 Neo4j 使用的是哪个存储设备。

$ df /var/lib/neo4j/
Filesystem     1K-blocks    Used Available Use% Mounted on
/dev/sda1        4972896 3534496   1188180  75% /

您可以使用 sysstat 包中提供的 iostat。以下命令将每 3 秒为您提供所有设备的磁盘活动。

$ iostat -m 3
Linux 3.10.0-1062.4.1.el7.x86_64 (node02) 	14/11/19 	_x86_64_	(1 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.67    0.00    0.79    0.06    0.00   98.48

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sda             119.72         0.35         0.27        807        624

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    0.00    0.00    0.00  100.00

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sda               0.00         0.00         0.00          0          0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    0.00    0.00    0.00  100.00

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sda               0.00         0.00         0.00          0          0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    0.00    0.00    0.00  100.00

查看控制台中的异常错误消息

运行以下命令

$ dmesg | grep -v "eth0\|IPv6\|IPVS\|docker\|promiscuous"

如果您看到类似错误 INFO: task neo4j.FileIOHel:78205 blocked for more than 120 seconds.,您可能需要与系统管理员一起查看存储配置。

[Sun Oct 20 10:26:10 2019] INFO: task neo4j.FileIOHel:78205 blocked for more than 120 seconds.
[Sun Oct 20 10:26:10 2019] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Sun Oct 20 10:26:10 2019] neo4j.FileIOHel D ffff959abd738000     0 78205      1 0x00000080
[Sun Oct 20 10:26:10 2019] Call Trace:
[Sun Oct 20 10:26:10 2019]  [<ffffffff90d7ceb4>] ? __radix_tree_lookup+0x84/0xf0
[Sun Oct 20 10:26:10 2019]  [<ffffffff91167020>] ? bit_wait+0x50/0x50
[Sun Oct 20 10:26:10 2019]  [<ffffffff91168ed9>] schedule+0x29/0x70
[Sun Oct 20 10:26:10 2019]  [<ffffffff911669e1>] schedule_timeout+0x221/0x2d0
[Sun Oct 20 10:26:10 2019]  [<ffffffff90b01292>] ? ktime_get_ts64+0x52/0xf0
[Sun Oct 20 10:26:10 2019]  [<ffffffff91167020>] ? bit_wait+0x50/0x50
[Sun Oct 20 10:26:10 2019]  [<ffffffff911685ad>] io_schedule_timeout+0xad/0x130
[Sun Oct 20 10:26:10 2019]  [<ffffffff91168648>] io_schedule+0x18/0x20
[Sun Oct 20 10:26:10 2019]  [<ffffffff91167031>] bit_wait_io+0x11/0x50
[Sun Oct 20 10:26:10 2019]  [<ffffffff91166b57>] __wait_on_bit+0x67/0x90
[Sun Oct 20 10:26:10 2019]  [<ffffffff90bb960e>] ? __find_get_pages+0x11e/0x1c0
[Sun Oct 20 10:26:10 2019]  [<ffffffff90bb5ab1>] wait_on_page_bit+0x81/0xa0
[Sun Oct 20 10:26:10 2019]  [<ffffffff90ac2f30>] ? wake_bit_function+0x40/0x40
[Sun Oct 20 10:26:10 2019]  [<ffffffff90bc722b>] truncate_inode_pages_range+0x42b/0x700
[Sun Oct 20 10:26:10 2019]  [<ffffffffc025c167>] ? __xfs_trans_commit+0x157/0x260 [xfs]
[Sun Oct 20 10:26:10 2019]  [<ffffffffc025c530>] ? xfs_trans_commit+0x10/0x20 [xfs]
[Sun Oct 20 10:26:10 2019]  [<ffffffff90c7005a>] ? __inode_wait_for_writeback+0x7a/0xf0
[Sun Oct 20 10:26:10 2019]  [<ffffffff90ac2f30>] ? wake_bit_function+0x40/0x40
[Sun Oct 20 10:26:10 2019]  [<ffffffff90bc756f>] truncate_inode_pages_final+0x4f/0x60
[Sun Oct 20 10:26:10 2019]  [<ffffffff90c5f16c>] evict+0x16c/0x180
[Sun Oct 20 10:26:10 2019]  [<ffffffff90c5f9bc>] iput+0xfc/0x190
[Sun Oct 20 10:26:10 2019]  [<ffffffff90c53a6e>] do_unlinkat+0x1ae/0x2d0
[Sun Oct 20 10:26:10 2019]  [<ffffffff90c4777e>] ? SYSC_newstat+0x3e/0x60
[Sun Oct 20 10:26:10 2019]  [<ffffffff90c54b26>] SyS_unlink+0x16/0x20
[Sun Oct 20 10:26:10 2019]  [<ffffffff91175ddb>] system_call_fastpath+0x22/0x27