Raising the Systemd Thread Limit

Problem

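In some heavily loaded or large multi-database environments, you may find that your Systemd unit configuration caps the maximum number of tasks (processes and threads) at a value that is too low for your use case.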

When this happens, Neo4j's debug.log file contains the following error:

java.lang.OutOfMemoryError: unable to create native thread: possibly out of memory or process/resource limits reached

The output of systemctl status neo4j shows the number of running tasks at or near the limit, which is 4915 in this example:

● neo4j.service - Neo4j Graph Database
Loaded: loaded (/lib/systemd/system/neo4j.service; enabled; vendor preset: enabled)
Active: active (running) since Thu 1970-01-01 00:00:00 UTC; 48min ago
Main PID: 20000 (java)
Tasks: 4915 (limit: 4915)
CGroup: /system.slice/neo4j.service
  ├─20000 /usr/bin/java     org.neo4j.server.startup.Neo4jCommand console
  └─20047 /usr/lib/jvm/java-11-openjdk-amd64/bin/java     com.neo4j.server.enterprise.EnterpriseEntryPoint
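
To make the comparison between the task count and the limit less error-prone, the "Tasks:" line can be parsed mechanically. The following is a minimal sketch, not part of Neo4j or systemd: `tasks_usage` is a hypothetical helper name, and it assumes the status line has the shape shown above, `Tasks: <current> (limit: <limit>)`.

```shell
# Hypothetical helper: reads `systemctl status` output on stdin and prints
# "<current> <limit> <percent used>" for the first "Tasks:" line it finds.
# Prints nothing if no limit is reported on that line.
tasks_usage() {
  awk '$1 == "Tasks:" {
    cur = $2 + 0
    lim = $4
    gsub(/[^0-9]/, "", lim)      # strip the trailing ")" from "4915)"
    lim = lim + 0
    if (lim > 0) printf "%d %d %d\n", cur, lim, (cur * 100) / lim
    exit
  }'
}

# Usage on a live system (assumes the unit is named neo4j):
# systemctl status neo4j | tasks_usage
```

A percentage close to 100 matches the situation described here, where new threads can no longer be created.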

Looking further, you may find reports like the following in the Linux syslog or messages file:

Jan 01 00:48:00 neo4j-core1 kernel: [ TIME ] cgroup: fork rejected by pids controller in /system.slice/neo4j.service
Jan 01 00:48:02 neo4j-core1 neo4j[ PID ]: [ TIME ][warning][os,thread] Failed to start thread - pthread_create failed (EAGAIN) for attributes: stacksize: 1024k, guardsize: 0k, detached.
Jan 01 00:48:02 neo4j-core1 neo4j[ PID ]: Exception in thread "neo4j.Scheduler-1" java.lang.OutOfMemoryError: unable to create native thread: possibly out of memory or process/resource limits reached
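
The kernel message above is the clearest sign that the limit was hit by the pids controller rather than by memory pressure. As a rough sketch, log lines can be filtered for it as follows. `count_fork_rejections` is a hypothetical helper name, and where the kernel log lives on your system (the journal, /var/log/syslog, or /var/log/messages) is an assumption that depends on the distribution.

```shell
# Hypothetical helper: reads log lines on stdin and prints how many times the
# kernel pids controller rejected a fork for the given unit
# (defaults to neo4j.service).
count_fork_rejections() {
  unit="${1:-neo4j.service}"
  # grep -c exits non-zero when the count is 0, so "|| true" keeps the
  # function usable in scripts that run with "set -e".
  grep -F "fork rejected by pids controller" | grep -cF "/${unit}" || true
}

# Usage (assumes kernel messages are available through the journal):
# journalctl -k | count_fork_rejections neo4j.service
```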

Systemd may terminate and restart the Neo4j server.

Workaround

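In these cases, you may want to *cautiously* raise the maximum thread limit as a workaround. Except in rare cases, this should only be needed temporarily.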

First, make sure Neo4j is stopped:

systemctl stop neo4j

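Then open the neo4j.service unit for editing (systemctl edit stores your changes in a drop-in override rather than modifying the packaged unit file):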

systemctl edit neo4j

and add or increase the TasksMax setting:

[Service]
# The user and group which the service runs as.
User=neo4j
Group=neo4j

# If it takes longer than this then the shutdown is considered to have failed.
# This may need to be increased if the system serves long-running transactions.
TimeoutSec=120

# Increase the systemd process / task limit
TasksMax=6500
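
Where an interactive editor is inconvenient (for example in provisioning scripts), the same override can be written as a file. The sketch below is one possible approach, not the documented Neo4j procedure: `write_tasksmax_dropin` is a hypothetical helper, and it assumes the standard drop-in location /etc/systemd/system/neo4j.service.d/override.conf, which is where systemctl edit places its overrides. Only the TasksMax line is written, since a drop-in overrides individual settings and does not need to repeat the rest of the unit.

```shell
# Hypothetical helper: writes a minimal drop-in that sets only TasksMax.
#   $1 = drop-in directory, $2 = new task limit (positive integer)
write_tasksmax_dropin() {
  dropin_dir="$1"
  tasks_max="$2"
  case "$tasks_max" in
    ''|*[!0-9]*)
      echo "TasksMax must be a positive integer" >&2
      return 1
      ;;
  esac
  mkdir -p "$dropin_dir" || return 1
  printf '[Service]\nTasksMax=%s\n' "$tasks_max" > "$dropin_dir/override.conf"
}

# Usage (needs root; the path assumes the unit is named neo4j.service):
# write_tasksmax_dropin /etc/systemd/system/neo4j.service.d 6500
```

A file written this way only takes effect after systemctl daemon-reload.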

After saving the change, reload the Systemd configuration and start Neo4j again:

systemctl daemon-reload
systemctl start neo4j
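
Once the service is back up, it is worth confirming that the new limit is in effect. systemctl show can print the TasksMax and TasksCurrent properties as KEY=VALUE lines. The following sketch computes the remaining headroom from that output. `tasks_headroom` is a hypothetical helper name, and it prints "unknown" whenever either value is not a plain number (for example when TasksMax is infinity).

```shell
# Hypothetical helper: reads KEY=VALUE lines (as printed by `systemctl show`)
# on stdin and prints TasksMax minus TasksCurrent, or "unknown" if either
# value is missing or not a plain integer.
tasks_headroom() {
  awk -F= '
    $1 == "TasksMax"     { max = $2 }
    $1 == "TasksCurrent" { cur = $2 }
    END {
      if (max ~ /^[0-9]+$/ && cur ~ /^[0-9]+$/) print max - cur
      else print "unknown"
    }'
}

# Usage (assumes the unit is named neo4j):
# systemctl show neo4j -p TasksMax -p TasksCurrent | tasks_headroom
```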

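Again, be very cautious when raising the task limit for a system service. A high per-process thread count can be a strong indicator of misconfiguration, a sizing problem, or a misbehaving plugin.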
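
To see which process is actually responsible for the thread count, the kernel's per-process status file can be inspected on Linux. This is a minimal sketch: `thread_count` is a hypothetical helper, and it assumes the usual /proc/<pid>/status layout with a "Threads:" line. Note that the Systemd task count covers every process in the service's cgroup, so a single process may account for only part of it.

```shell
# Hypothetical helper: prints the "Threads:" value from a /proc/<pid>/status
# file whose path is passed as $1.
thread_count() {
  awk '$1 == "Threads:" { print $2; exit }' "$1"
}

# Usage on Linux (20047 is the Java PID from the example status output):
# thread_count /proc/20047/status
```

Sampling this value over time can help show whether the thread count keeps growing or settles at a stable level, which is useful input when looking for the root cause.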

As with all workarounds, use this one sparingly until the root cause has been fully identified and a more permanent solution is in place.