In my K8s cluster, pods occasionally get stuck in the "CreateContainerError" state without any errors or other visible problems. Every time, the problem is fixed simply by deleting the stuck pod and letting the Deployment recreate it. I suspect an NFS PVC issue, but the problem occurs only once every 3-4 months, which makes debugging very difficult.
The real problem is that when this happens, the only way to bring the affected service back is to manually delete the pod. I tried to find a way to do this in line with the k8s philosophy, but apparently there is no option to force a Deployment to recreate a pod.
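For reference, the manual workaround could be scripted roughly as follows. This is only a sketch, not a built-in Kubernetes mechanism: it assumes `kubectl` is on the PATH and already configured for the cluster, and the function names `stuck_pods` and `delete_stuck_pods` are hypothetical helpers I made up for illustration. It looks for containers whose waiting reason is "CreateContainerError" in the output of `kubectl get pods -o json` and deletes the pods that own them.

```python
import json
import subprocess

STUCK_REASON = "CreateContainerError"


def stuck_pods(pod_list):
    """Return (namespace, name) for each pod with a container waiting on STUCK_REASON.

    `pod_list` is the parsed JSON from `kubectl get pods -o json`.
    """
    found = []
    for pod in pod_list.get("items", []):
        status = pod.get("status", {})
        # Check both regular and init containers; either key may be absent or null.
        statuses = (status.get("containerStatuses") or []) + (
            status.get("initContainerStatuses") or []
        )
        for container in statuses:
            waiting = container.get("state", {}).get("waiting") or {}
            if waiting.get("reason") == STUCK_REASON:
                meta = pod["metadata"]
                found.append((meta["namespace"], meta["name"]))
                break  # one stuck container is enough to flag the pod
    return found


def delete_stuck_pods():
    """Hypothetical helper: delete every stuck pod so its controller recreates it."""
    result = subprocess.run(
        ["kubectl", "get", "pods", "--all-namespaces", "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    for namespace, name in stuck_pods(json.loads(result.stdout)):
        subprocess.run(["kubectl", "delete", "pod", name, "-n", namespace], check=True)
```

Something like `delete_stuck_pods()` could be run periodically (for example from cron or a CronJob with suitable RBAC permissions), but that is still just the manual fix on a timer rather than something the Deployment does on its own.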
My question is: is there a way or a tool to automatically recreate pods that end up in the "CreateContainerError" state?