Ingress controller reported: epoll_create () failed (24: Too many open files)

2022-09-15 03:18:34

The ingress controller does not work on a POWER machine that has 160 cores. More generally, the ingress controller might fail when it runs on a node with a large number of cores.

Causes
The ingress controller might be running on a node that has too many cores. The maximum number of open file descriptors for each worker process is calculated with the following formula: (RLIMIT_NOFILE/worker-processes) - 1024. With many worker processes, the share left for each worker becomes too small. To resolve the issue, you can either decrease the value of worker-processes or increase the RLIMIT_NOFILE value of the container.
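As a rough illustration of the formula above, the following sketch computes the per-worker limit for a few worker counts. The RLIMIT_NOFILE value of 65536 is an assumption chosen for the example, not a value taken from the affected system, and integer division is assumed.

```python
def max_worker_open_files(rlimit_nofile: int, worker_processes: int) -> int:
    """Apply (RLIMIT_NOFILE / worker-processes) - 1024 with integer division."""
    if worker_processes < 1:
        raise ValueError("worker_processes must be at least 1")
    return rlimit_nofile // worker_processes - 1024


# Assumed container limit, for illustration only.
ASSUMED_RLIMIT_NOFILE = 65536

for workers in (160, 16, 2):
    limit = max_worker_open_files(ASSUMED_RLIMIT_NOFILE, workers)
    print(f"worker-processes={workers}: per-worker limit = {limit}")
```

Under these assumed numbers, 160 workers yield a negative per-worker limit, which is consistent with the "Too many open files" failure, while 2 workers leave ample headroom.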

Solution one: Edit the ConfigMap of nginx-ingress-controller and set a lower value for worker-processes.
To edit the ConfigMap of nginx-ingress-controller, run the following command:

kubectl -n kube-system edit cm nginx-ingress-controller

Add worker-processes: "2" to the ConfigMap, as shown in the following example. Note: The appropriate value might not be 2; it depends on your sysctl configuration.

# Edit the following object. Lines beginning with a '#' are ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: v1
data:
 body-size: "0"
 disable-access-log: "true"
 worker-processes: "2"
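Because the right worker-processes value depends on the environment, one possible way to pick it is to invert the formula (RLIMIT_NOFILE/worker-processes) - 1024. The sketch below is only an assumption-based helper: `min_files_per_worker` is a hypothetical target that you would choose yourself, and the limit of 65536 is an example value.

```python
def max_workers_for(rlimit_nofile: int, min_files_per_worker: int) -> int:
    """Largest worker count w such that (rlimit_nofile // w) - 1024 >= min_files_per_worker.

    Returns 0 when no worker count can satisfy the target.
    """
    if min_files_per_worker < 0:
        raise ValueError("min_files_per_worker must not be negative")
    # rlimit // w >= m + 1024 holds exactly when w <= rlimit // (m + 1024).
    return rlimit_nofile // (min_files_per_worker + 1024)


# Assumed values, for illustration only.
print(max_workers_for(rlimit_nofile=65536, min_files_per_worker=1024))
```

The result is an upper bound for worker-processes under the stated assumptions; a lower value such as the "2" in the example above also satisfies it.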

What happened?
Just deploy a fresh Kubernetes cluster with CRI-O, deploy ingress-nginx, and try to load ingresses. NGINX throws errors because it cannot respawn its worker processes:

2022/04/05 09:27:49 [alert] 56#56: sendmsg() failed (9: Bad file descriptor)
2022/04/05 09:27:49 [alert] 56#56: sendmsg() failed (9: Bad file descriptor)
2022/04/05 09:27:49 [alert] 1415#1415: pthread_create() failed (11: Resource temporarily unavailable)
2022/04/05 09:27:49 [alert] 1411#1411: pthread_create() failed (11: Resource temporarily unavailable)
2022/04/05 09:27:49 [alert] 1431#1431: pthread_create() failed (11: Resource temporarily unavailable)
2022/04/05 09:27:50 [alert] 56#56: worker process 1190 exited with fatal code 2 and cannot be respawned
2022/04/05 09:27:50 [alert] 56#56: worker process 1191 exited with fatal code 2 and cannot be respawned

If I just switch from CRI-O to Docker, it works without any error. That's why I think the issue is related to CRI-O and not to ingress-nginx.

The issue only happens if NGINX has many worker processes. From my testing, the threshold seems to be around 14 to 18 worker processes.

We ran into the issue because, by default, ingress-nginx uses an auto setting for worker processes, which spawns as many workers as there are detected cores. On large systems with xxx cores, the issue occurs.

For solutions, refer to the following two articles.


Author: Corwien
