49

I have set up a CloudWatch Events rule that starts an ECS task definition when a previous task definition completes.

I can see that the event triggers the task definition, but it fails.

The only visibility of this failure is in the rule metrics, where I see the FailedInvocations metric.

Question: are there any logs that show why the trigger failed?

I can manually set up the rule via the management console and everything works fine.

The error occurs when I set up the rule via a CloudFormation template.

I have compared the two rules and both are identical, except the role. However, both roles have the same permissions.

1
  • Could you please post the relevant parts of your template?
    – Miles
    Feb 4, 2018 at 8:18

9 Answers

30

If the rule was triggered successfully but the invocation of the target failed, you should see a trace of the API call in the Event History in AWS CloudTrail. Look at the errorCode and errorMessage properties:

{
   [..]
   "errorCode": "InvalidInputException",
   "errorMessage": "Artifacts type is required",
   [..]
}
3
  • Thank you for this. This was the only way I could figure out what was actually going on with my CloudWatch rule.
    – dimiguel
    Apr 18, 2019 at 23:51
  • I think this should be the correct answer. Thank you for posting this.
    – metasync
    Feb 28, 2020 at 18:17
  • This is how I found out what was going on: I enabled CloudTrail and searched its logs in CloudWatch to find the problem. Thank you for your answer.
    – Son Lam
    Feb 27, 2021 at 4:13
25

CloudTrail logs helped. The event name is RunTask. The issue was: "errorCode": "InvalidParameterException", "errorMessage": "Override for container named rds-task is not a container in the TaskDefinition."

The AWS documentation for debugging CloudWatch events is here:
https://docs.aws.amazon.com/AmazonCloudWatch/latest/events/CWE_Troubleshooting.html

I opened a PR to add documentation for debugging failed ECS Task Invocations from CloudWatch Events:
https://github.com/awsdocs/amazon-cloudwatch-events-user-guide/pull/12/files

1
  • I was not able to find any RunTask events, but I got the logs by looking at the StartPipelineExecution event. (The use case was triggering a SageMaker pipeline from EventBridge.)
    – user1114
    Jul 16, 2022 at 13:45
18

This stumped us for ages. The main issue is the role problem Nathan B mentions, but something else that tripped us up is that, when this answer was first written, Scheduled Containers wouldn't work in awsvpc mode (and by extension Fargate). That is now possible if you specify the network configuration explicitly. Here's a sample CloudFormation template:

---
AWSTemplateFormatVersion: 2010-09-09
Description: Fee Recon infrastructure

Parameters:
  
  ClusterArn:
    Type: String
    Description: The Arn of the ECS Cluster to run the scheduled container on

  SecurityGroup:
    Type: String
    Description: The security group the task will use

  Subnet0:
    Type: String
    Description: A subnet that the task will run in

  Subnet1:
    Type: String
    Description: A subnet that the task will run in

Resources:

  TaskRole:
    Type: AWS::IAM::Role
    Properties:
      Path: /
      AssumeRolePolicyDocument:
        Statement:
          - Action:
              - sts:AssumeRole
            Effect: Allow
            Principal:
              Service:
                - ecs-tasks.amazonaws.com
        Version: 2012-10-17
      Policies:
       - PolicyName: TaskPolicy
         PolicyDocument:
           Version: 2012-10-17
           Statement:
             - Effect: Allow
               Action:
                 - 'ses:SendEmail'
                 - 'ses:SendRawEmail'
               Resource: '*'

  TaskDefinition:
    Type: AWS::ECS::TaskDefinition
    Properties:
      TaskRoleArn: !Ref TaskRole
      ContainerDefinitions:
        - Name: !Sub my-container
          Essential: true
          Image: !Sub <aws-account-no>.dkr.ecr.eu-west-1.amazonaws.com/mycontainer
          Memory: 2048
          Cpu: 1024

  CloudWatchEventECSRole:
   Type: AWS::IAM::Role
   Properties:
     AssumeRolePolicyDocument:
       Version: 2012-10-17
       Statement:
         - Effect: Allow
           Principal:
             Service:
               - events.amazonaws.com
           Action:
             - sts:AssumeRole
     Path: /
     Policies:
       - PolicyName: CloudwatchEventsInvokeECSRunTask
         PolicyDocument:
           Version: 2012-10-17
           Statement:
             - Effect: Allow
               Action: 'ecs:RunTask'
               Resource: !Ref TaskDefinition

  TaskSchedule:
    Type: AWS::Events::Rule
    Properties:
      Description: Runs every 10 minutes
      Name: ScheduledTask
      ScheduleExpression: cron(0/10 * * * ? *)
      State: ENABLED
      Targets:
        - Id: ScheduledEcsTask
          RoleArn: !GetAtt CloudWatchEventECSRole.Arn
          EcsParameters:
            TaskDefinitionArn: !Ref TaskDefinition
            TaskCount: 1
            NetworkConfiguration:
              AwsVpcConfiguration:
                SecurityGroups:
                  - !Ref SecurityGroup
                Subnets:
                  - !Ref Subnet0
                  - !Ref Subnet1
          Arn: !Ref ClusterArn

Note: I've added the ClusterArn as a parameter to the template, as well as the security group and subnets you want the task to run in.

There are two roles you need to care about. The first is the role (TaskRole) for the task itself: in this example the container just sends an email using SES, so it has the necessary permissions. The second role (CloudWatchEventECSRole) is the one that makes it all work. Note that the principal in its AssumeRolePolicyDocument is events.amazonaws.com, and the resource in its Policies array is the ECS task definition defined in the template.
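
For a Fargate task, the target's EcsParameters would likely also need a launch type. The fragment below is a minimal sketch of that variation of the TaskSchedule target, not a tested template. It assumes the task definition declares NetworkMode: awsvpc and RequiresCompatibilities: FARGATE, and the AssignPublicIp value is an assumption that depends on how your subnets reach the image registry.

```yaml
# Sketch only: Fargate variant of the Targets entry in TaskSchedule.
Targets:
  - Id: ScheduledEcsTask
    Arn: !Ref ClusterArn
    RoleArn: !GetAtt CloudWatchEventECSRole.Arn
    EcsParameters:
      TaskDefinitionArn: !Ref TaskDefinition
      TaskCount: 1
      LaunchType: FARGATE            # assumption: the task runs on Fargate
      NetworkConfiguration:
        AwsVpcConfiguration:
          AssignPublicIp: DISABLED   # assumption: private subnets with NAT or VPC endpoints
          SecurityGroups:
            - !Ref SecurityGroup
          Subnets:
            - !Ref Subnet0           # parameter names as declared under Parameters
            - !Ref Subnet1
```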

2
  • I think it actually can work with awsvpc mode, but you need to explicitly specify the network policy (mismo.team/…)
    – ritmatter
    Dec 23, 2021 at 0:58
  • At the time of writing this answer was correct but you can now, I'll update the answer.
    – Stefano
    Jan 5, 2022 at 13:12
15

This problem was due to not setting the role's principal service to include events.amazonaws.com. Without it, the role could not be assumed to start the task.

It's a shame AWS doesn't have better logging for FailedInvocations.

4
  • 1
    Thanks for posting this! I'm having what I think is a similar issue. If you could share more about the solution you found, that may help me get this over the finish line. Really appreciate any more detail you can offer. Feb 18, 2018 at 4:50
  • 1
    For future readers: I found out the events are actually logged, including the error message of the failed invocation. Just go to CloudTrail -> Event history. Jun 1, 2021 at 13:23
  • @RobbertvandenBogerd any tips on filtering? I'm not finding anything specific to the invocations.
    – Taylor
    Jun 1, 2021 at 19:18
  • Not really, just try to invoke the thing you are testing with and see which events come by at that moment. good luck Jun 2, 2021 at 20:09
7

In case other people come here looking for the setup necessary to make this work for a task in Fargate: there is some extra configuration in addition to Stefano's answer. Running tasks in Fargate requires setting up an execution role, so you need to allow the CloudWatchEventECSRole to pass it. Add this statement to that role:

{
    "Effect": "Allow",
    "Action": "iam:PassRole",
    "Resource": [
        "arn:aws:iam::<account>:role/<executionRole>"
    ]
}
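
In a CloudFormation template such as Stefano's, the same statement could be added to the Policies list of CloudWatchEventECSRole. This is a hedged sketch: ExecutionRole is a hypothetical AWS::IAM::Role resource that is not in the original template, and the policy name is made up. RunTask also has to pass the task role when the task definition sets one, so TaskRole is listed as well.

```yaml
# Sketch: extra entry for the Policies list of CloudWatchEventECSRole.
- PolicyName: CloudwatchEventsPassEcsRoles   # hypothetical name
  PolicyDocument:
    Version: '2012-10-17'
    Statement:
      - Effect: Allow
        Action: 'iam:PassRole'
        Resource:
          - !GetAtt ExecutionRole.Arn   # hypothetical execution role resource
          - !GetAtt TaskRole.Arn        # task role from the template
```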
2
  • 2
    The built in policy "AmazonEC2ContainerServiceEventsRole" has iam:PassRole too. It does a "StringLike": { "iam:PassedToService": "ecs-tasks.amazonaws.com" } instead of naming resources.
    – twamley
    Sep 13, 2019 at 20:06
  • This was the part that I was missing... It really would be helpful if the documentation said "The policy that the Trigger needs to use needs these permissions: ..."
    – Mike Caron
    Feb 12, 2021 at 16:36
5

I too was not seeing my Lambda executing, but I did find evidence of FailedInvocations in CloudWatch Events (but only via the Event Rule Metrics link, which took me to https://console.aws.amazon.com/cloudwatch/home?region={your_aws_region}#metricsV2:graph=~();query=~'*7bAWS*2fEvents*2cRuleName*7d*2{Lambda_Physical_ID}).

I was not seeing the "trigger" in the console either, so I took a step back and did a simpler SAM deploy with the Events property set, then looked at the processed template to see how it was done in that case. Below is what I ended up using to have an EventBridge scheduled event fire my Lambda (an alias in my case, which is how I discovered this).

Simple SAM approach to scheduled invocations

(Add this property to your AWS::Serverless::Function)

Events:
  InvokeMyLambda:
    Type: Schedule
    Properties:
      Schedule: rate(1 minute)
      Description: Run SampleLambdaFunction once every minute.
      Enabled: True

By looking at the converted template in CloudFormation and comparing it to the version without Events, I was able to identify not only the AWS::Events::Rule (which is what I expected to see invoking the Lambda), but also an AWS::Lambda::Permission.

Hopefully this is what you all need as well to get invocations working (without needing the missing logs to see why) :P

Working approach

MyLambdaScheduledEvent:
  Type: AWS::Events::Rule
  Properties:
    Name: MyLambdaScheduledEvent
    EventBusName: "default"
    State: ENABLED
    ScheduleExpression: rate(5 minutes) # same as cron(0/5 * * * ? *)
    Description: Run MyLambda once every 5 minutes.
    Targets:
    - Id: EventMyLambdaScheduled
      Arn: !Ref MyLambda
MyLambdaScheduledEventPermission:
  Type: AWS::Lambda::Permission
  Properties:
    Action: lambda:InvokeFunction
    Principal: events.amazonaws.com
    FunctionName: !Ref MyLambda
    SourceArn: !GetAtt MyLambdaScheduledEvent.Arn
2

For anyone who is struggling with setting up scheduled tasks on Fargate and is using Terraform to set up their cloud, take a look at this module: https://github.com/dxw/terraform-aws-ecs-scheduled-task

It helps in setting up the scheduled tasks through CloudWatch Events and sets the correct IAM roles.

1
  • Thanks. As Jonny Cundall's answer said, the role with an "iam:PassRole" action should be used for the CloudWatchEvent service.
    – Jay
    Oct 14, 2020 at 17:49
1

I spent ages trying to troubleshoot this. When I created an ECS scheduled task via the command line, the task was created but never started. Thanks to this post, I discovered by looking at the Event History in CloudTrail that the ECS instances had all died and there were no EC2 instances running!

{
   [..]
   "errorCode": "InvalidParameterException",
   "errorMessage": "No Container Instances were found in your cluster.",
   [..]
}
0

For me, the target was an SQS FIFO queue. The problem only became apparent after I recreated the queue without FIFO and everything worked.

So the root cause was that content-based deduplication was not enabled. (At the time the queue was set up, I didn't need it.) Amazon, please help EventBridge track these FailedInvocations.

In the meantime, I submitted a PR to update the docs: https://github.com/awsdocs/amazon-cloudwatch-events-user-guide/pull/14
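
For reference, a FIFO queue target in CloudFormation might look like the sketch below. The resource names, queue name, schedule and message group ID are all made up for illustration, and the queue policy that lets events.amazonaws.com send messages is left out. The assumption here is that a FIFO target needs a MessageGroupId on the rule target and ContentBasedDeduplication on the queue, since the rule does not supply a deduplication ID of its own.

```yaml
# Sketch only: a rule targeting an SQS FIFO queue (all names hypothetical).
Resources:
  EventsQueue:
    Type: AWS::SQS::Queue
    Properties:
      QueueName: events-target.fifo      # FIFO queue names must end in .fifo
      FifoQueue: true
      ContentBasedDeduplication: true    # the setting that was missing in my case

  QueueRule:
    Type: AWS::Events::Rule
    Properties:
      ScheduleExpression: rate(5 minutes)
      State: ENABLED
      Targets:
        - Id: FifoQueueTarget
          Arn: !GetAtt EventsQueue.Arn
          SqsParameters:
            MessageGroupId: scheduled-events   # required for FIFO targets
```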
