Ingress controller reported: epoll_create() failed (24: Too many open files)

The ingress controller does not work on a POWER machine that has 160 cores. More generally, the ingress controller might fail when it runs on a node with a large number of cores.

Causes
The ingress controller might be running on a node that has too many cores. The maximum number of open file descriptors is calculated with the following formula: (RLIMIT_NOFILE / worker-processes) - 1024, so the more worker processes there are, the smaller the result. To resolve the problem, you can either decrease the number of worker processes or increase the RLIMIT_NOFILE value of the container.
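To make the formula concrete, here is a minimal sketch in shell arithmetic. The function name max_open_files and the limit of 65536 are assumptions chosen for illustration, not values from the controller, and integer division is assumed.

```shell
#!/bin/sh
# Hypothetical helper: evaluate (RLIMIT_NOFILE / worker-processes) - 1024.
# $1 = RLIMIT_NOFILE of the container, $2 = number of worker processes.
# Shell arithmetic uses integer division (assumed to match the controller).
max_open_files() {
  echo $(( $1 / $2 - 1024 ))
}

# Assumed limit of 65536 with one worker per core on a 160-core node:
max_open_files 65536 160   # prints -615

# Same assumed limit with worker-processes set to 2:
max_open_files 65536 2     # prints 31744
```

With 160 workers the result drops below zero under this assumed limit, which illustrates why lowering worker-processes or raising RLIMIT_NOFILE helps.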

Solution one: Edit the ConfigMap of nginx-ingress-controller with a decreased value of worker-processes.
To edit the ConfigMap of nginx-ingress-controller, run the following command:

kubectl -n kube-system edit cm nginx-ingress-controller

Add worker-processes: "2" to the ConfigMap, as shown in the following example. Note: The right value might not be 2; it depends on your sysctl configuration.

# Edit the following object. Lines beginning with a '#' are ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: v1
data:
 body-size: "0"
 disable-access-log: "true"
 worker-processes: "2"
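
The same change can also be applied without opening an editor. The following is a minimal sketch that uses kubectl patch with a JSON merge patch, assuming the ConfigMap name and namespace from the command above. The helper names build_patch and set_worker_processes are hypothetical, and 2 is only the example value from this section.

```shell
#!/bin/sh
# Hypothetical helper: build a JSON merge patch that sets worker-processes.
build_patch() {
  printf '{"data":{"worker-processes":"%s"}}' "$1"
}

# Hypothetical helper: apply the patch to the ConfigMap (needs cluster access).
set_worker_processes() {
  kubectl -n kube-system patch cm nginx-ingress-controller \
    --type merge -p "$(build_patch "$1")"
}

# Print the patch that would be sent for the example value.
build_patch 2; echo

# To apply it against a cluster, run:
#   set_worker_processes 2
```

A merge patch only touches the worker-processes key, so other entries in the ConfigMap such as body-size stay as they are.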

What happened?
Deploy a fresh k8s cluster with crio, deploy ingress-nginx, and try to load ingresses. Nginx throws errors because it cannot respawn its worker processes:

2022/04/05 09:27:49 [alert] 56#56: sendmsg() failed (9: Bad file descriptor)
2022/04/05 09:27:49 [alert] 56#56: sendmsg() failed (9: Bad file descriptor)
2022/04/05 09:27:49 [alert] 1415#1415: pthread_create() failed (11: Resource temporarily unavailable)
2022/04/05 09:27:49 [alert] 1411#1411: pthread_create() failed (11: Resource temporarily unavailable)
2022/04/05 09:27:49 [alert] 1431#1431: pthread_create() failed (11: Resource temporarily unavailable)
2022/04/05 09:27:50 [alert] 56#56: worker process 1190 exited with fatal code 2 and cannot be respawned
2022/04/05 09:27:50 [alert] 56#56: worker process 1191 exited with fatal code 2 and cannot be respawned

If I just switch from crio to docker, it works without any error. That's why I think the issue is related to crio and not to ingress-nginx.

The issue only happens if nginx has many worker processes. From testing, I think the sweet spot is around 14-18 worker processes.

We ran into the issue because, by default, nginx-ingress has an auto setting for worker processes, which spawns as many workers as there are detected cores. If you have large systems with xxx cores, the issue occurs.
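
To get a rough idea of how many workers the auto setting would start, you can check how many cores a process sees on the node. This is a minimal sketch, assuming a Linux system with getconf available. The helper name detected_cores is hypothetical, and the controller's own detection may differ from this number.

```shell
#!/bin/sh
# Hypothetical helper: number of online cores visible to this process.
detected_cores() {
  getconf _NPROCESSORS_ONLN
}

echo "cores detected: $(detected_cores)"
```

If the number printed is far above the 14-18 range mentioned above, setting worker-processes explicitly is worth trying.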

For the solution, refer to the following two articles.


Related articles:
IBM | Ingress controller reported: epoll_create() failed (24: Too many open files)
nginx-ingress unable to reload worker processes
