
I've created a CronJob and pushed it to deployment, but when I watch it run in OpenShift, I get the following error message:

Cannot determine if job needs to be started: Too many missed start time (> 100). Set or decrease .spec.startingDeadlineSeconds or check clock skew.

From what I understand, this means a job failed to run. But I don't understand why it is failing. Shouldn't that be logged somewhere? And if it is, where can I find it?

The CronJob controller keeps trying to start a job according to the most recent schedule, but keeps failing, and evidently it has done so more than 100 times.

I've checked the syntax of my cron job and it doesn't report any errors. Besides, if there were syntax errors, I wouldn't even be allowed to push.

Anyone know what's wrong?

My CronJob:

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: my-cjob
  labels:
    job-name: my-cjob
spec:
  schedule: "*/5 * * * *"
  # activeDeadlineSeconds: 180 # 3 min <<- should this help and why?
  jobTemplate:
    spec:
      template:
        metadata:
          name: my-cjob
          labels:
            job-name: my-cjob
        spec:
          containers:
          - name: my-cjob
            image: my-image-name
          restartPolicy: OnFailure

Or should I be using startingDeadlineSeconds? Anyone who has hit this error message and found a solution?

Update (as requested in the comments)

When running kubectl get cronjob I get the following:

NAME           SCHEDULE      SUSPEND   ACTIVE    LAST SCHEDULE   AGE
my-cjob        */5 * * * *   False     0         <none>          2d

When running kubectl logs my-cjob I get the following:

Error from server (NotFound): pods "my-cjob" not found

When running kubectl describe cronjob my-cjob I get the following:

Error from server (NotFound): the server could not find the requested resource

When running kubectl logs <cronjob-pod-name> I get many lines of output, which are very difficult for me to understand and sort through.

When running kubectl describe pod <cronjob-pod-name> I also get a lot of output, but it is much easier to read. Anything specific I should look for?

Running kubectl get events I get a lot, but I think this is the related one:

LAST SEEN   FIRST SEEN   COUNT     NAME                                            KIND                    SUBOBJECT                                 TYPE      REASON              SOURCE                                      MESSAGE
1h          1h           2         xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx             Pod                     spec.containers{apiproxy}                 Warning   Unhealthy           kubelet, xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx   Liveness probe failed: Get http://xxxx/xxxx: dial tcp xxxx:8080: connect: connection refused
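To narrow the event list down to a single object instead of scanning everything, filtering by the involved object's name may help (the pod name here is a placeholder, as in the output above):

```shell
# Show only events that involve the given pod, sorted newest-last
kubectl get events \
  --field-selector involvedObject.name=<cronjob-pod-name> \
  --sort-by=.lastTimestamp
```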
  • You can check the logs of cronjob like anyother object in the cluster. Use kubectl logs <cronjob-pod-name> and kubectl describe pod <cronjob-pod-name>. Can you update your question with results of that commands?
    – acid_fuji
    Mar 5, 2020 at 9:30
  • Was your cronjob somehow suspended, or did you shut down the cluster? Can you test this by setting .spec.concurrencyPolicy to Forbid?
    – acid_fuji
    Mar 5, 2020 at 9:49
  • @acid_fuji I've updated the question - I can't run the commands you specify, as you can see. Haven't tested the concurrencyPolicy option yet. How can I find the cronjob-pod-name? If it's the one I think it is, it says: Readiness probe failed: Get http://xxxxxx:8080\xxx: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
    – nelion
    Mar 5, 2020 at 9:59
  • Should I try adding initialDelaySeconds? Trying different long shots, I guess.
    – nelion
    Mar 5, 2020 at 10:08
  • If you can't see it, it means that it never actually got to run. What does your cronjob do? What kind of image is that? Can you check kubectl get events?
    – acid_fuji
    Mar 5, 2020 at 10:24

3 Answers


Setting startingDeadlineSeconds to 180 fixed the problem, together with removing spec.template.metadata.labels from the job template.
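For reference, a minimal sketch of what the fixed manifest would look like, assuming the rest of the original spec stays unchanged:

```yaml
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: my-cjob
spec:
  schedule: "*/5 * * * *"
  # Only count schedules missed within the last 180s; with this set,
  # the controller can never accumulate >100 missed start times.
  startingDeadlineSeconds: 180
  jobTemplate:
    spec:
      template:
        # note: no metadata.labels here, per the fix above
        spec:
          containers:
          - name: my-cjob
            image: my-image-name
          restartPolicy: OnFailure
```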


I suspended my workload, then resumed it after quite a while and saw the same error. Isn't this a bug? I triggered the suspend action on purpose, so the time between suspend and resume should NOT be counted against missed starts.


The root cause for this issue:

For every CronJob, the CronJob controller checks how many schedules it missed in the window from its last scheduled time until now. If there are more than 100 missed schedules, it does not start the job and logs this error.^1 With a */5 * * * * schedule, that threshold is crossed after roughly 500 minutes (about 8 hours 20 minutes) without a successful start.

A schedule is counted as missed if the job fails to be created at its scheduled time. For example, if concurrencyPolicy is set to Forbid and a new run is attempted while a previous one is still running, that run counts as missed.^1
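As a hedged fragment (only the relevant spec fields shown), this is where concurrencyPolicy sits:

```yaml
spec:
  schedule: "*/5 * * * *"
  # Forbid: skip a new run while the previous one is still active.
  # Each skipped run counts as a missed schedule.
  concurrencyPolicy: Forbid
```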

The simplest solution I can think of is recreating the CronJob, which clears the missed schedules.
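A sketch of the recreate, assuming the manifest is saved locally as my-cjob.yaml (hypothetical filename):

```shell
# Deleting the CronJob discards its status.lastScheduleTime,
# which is what the missed-schedule count is derived from
kubectl delete cronjob my-cjob

# Recreate it from the saved manifest
kubectl apply -f my-cjob.yaml
```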
