Lock down AWS Fargate networking when using ECR as an image repository


We set up an 'internal only' Fargate task the other day with all outbound egress traffic locked down. This took more effort than anticipated, so I want a reference I can look back on in case I run into the issue again.

Symptoms

When you attempt to run a Fargate task, the task fails with the message CannotPullContainerError: context canceled. You may also see a message that looks like this (GUID values scrubbed):

Status reason    CannotPullContainer

Error: error pulling image configuration: Get https://prod-us-east-1-starport-layer-bucket.s3.amazonaws.com/61d6a6b7-a74d-4472-96e4-710ec9a7a96b/d834cc02-771d-4c2b-b7bb-7da5334eea15?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-A

Solution / Configuration

  • First, be sure to enable the appropriate VPC endpoints (interface and gateway) in your region and VPC:

    • com.amazonaws.us-east-1.ecr.api
    • com.amazonaws.us-east-1.ecr.dkr
    • com.amazonaws.us-east-1.s3

    Be sure to check the Enable Private DNS Name checkbox for the interface endpoints (ecr.api and ecr.dkr)! The S3 gateway endpoint has no such option.

  • The next requirement is to use Route 53 for DNS. Running your own DNS adds complications that the Route 53 Resolver service avoids. Using Route 53 ensures that the DNS records for these AWS services resolve to addresses inside your VPC, which lets us meet our compliance goal of restricting outbound internet access.

  • Now for the fun / non-obvious part: in the security group assigned to your Fargate task, you need to allow outbound (egress) traffic to the S3 endpoint in your VPC. To do this, identify the S3 prefix list for your VPC's region (identifiers scrubbed):

    $ aws ec2 describe-prefix-lists --region us-east-1
    {
        "PrefixLists": [
            {
                "Cidrs": [
                    "9.8.7.0/17",
                    "10.9.8.0/15"
                ],
                "PrefixListId": "pl-55443322",
                "PrefixListName": "com.amazonaws.us-east-1.s3"
            },
            {
                "Cidrs": [
                    "11.10.9.0/22",
                    "12.11.10.0/20"
                ],
                "PrefixListId": "pl-11223344",
                "PrefixListName": "com.amazonaws.us-east-1.dynamodb"
            }
        ]
    }

    From the above output, take the S3 PrefixListId (pl-55443322 here) and use it as the destination of an outbound HTTPS (TCP 443) rule in the task's security group.
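
A minimal sketch of that security group egress rule using the AWS CLI. The security group ID (sg-0123456789abcdef0) and the helper names s3_prefix_list_query and egress_permission are hypothetical and only for illustration; the prefix list ID is the scrubbed value from the output above. The helpers only build strings, and the aws calls are left commented out so nothing changes until you choose to run them.

```shell
#!/usr/bin/env bash
# Sketch only: the helper names and the security group ID are hypothetical.

REGION="us-east-1"
SG_ID="sg-0123456789abcdef0"   # hypothetical: the Fargate task's security group

# Build a JMESPath query that selects the S3 prefix list ID for a region.
s3_prefix_list_query() {
  printf "PrefixLists[?PrefixListName=='com.amazonaws.%s.s3'].PrefixListId" "$1"
}

# Build the shorthand --ip-permissions value that allows HTTPS (TCP 443)
# egress to the given prefix list.
egress_permission() {
  printf 'IpProtocol=tcp,FromPort=443,ToPort=443,PrefixListIds=[{PrefixListId=%s}]' "$1"
}

# Look up the prefix list ID, then add the egress rule (uncomment to run):
# PL_ID=$(aws ec2 describe-prefix-lists --region "$REGION" \
#   --query "$(s3_prefix_list_query "$REGION")" --output text)
# aws ec2 authorize-security-group-egress --region "$REGION" \
#   --group-id "$SG_ID" --ip-permissions "$(egress_permission "$PL_ID")"
```

With the scrubbed ID from above, egress_permission pl-55443322 produces IpProtocol=tcp,FromPort=443,ToPort=443,PrefixListIds=[{PrefixListId=pl-55443322}]. Referencing the prefix list rather than its CIDRs means the rule should keep working if AWS changes the underlying S3 address ranges.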

Concepts / Conclusion

AWS Elastic Container Registry (ECR) uses S3 on the back end. Amazon provides VPC endpoints for communicating with ECR (API and DKR), but these cover only the 'API level'. Our assumption was that ECR could be treated as a unit and that AWS would handle any back-end dependencies, which unfortunately proved not to be the case. For Fargate to pull a container image and start the task, it needs a channel open to S3, where the image layers are stored. There is documentation that describes this, but it was hard for us to find, and we ended up spinning our wheels for a while.
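
The three dependencies described above can be sketched as a small script that lists the endpoint services an image pull relies on and the endpoint type each one uses in this setup. The function names and all resource IDs (vpc-..., subnet-..., sg-..., rtb-...) are hypothetical placeholders, and the gateway type for S3 reflects the configuration in this post rather than the only possible one.

```shell
#!/usr/bin/env bash
# Sketch only: function names and resource IDs are hypothetical.

REGION="us-east-1"

# Service names of the endpoints an ECR image pull depends on.
ecr_pull_endpoint_services() {
  local region="$1"
  printf 'com.amazonaws.%s.ecr.api\n' "$region"
  printf 'com.amazonaws.%s.ecr.dkr\n' "$region"
  printf 'com.amazonaws.%s.s3\n' "$region"
}

# In this setup S3 uses a gateway endpoint; the ECR services use interface endpoints.
endpoint_type() {
  case "$1" in
    *.s3) echo "Gateway" ;;
    *)    echo "Interface" ;;
  esac
}

# Print the plan: one line per service with its endpoint type.
ecr_pull_endpoint_services "$REGION" | while read -r svc; do
  echo "$svc -> $(endpoint_type "$svc")"
done

# Creating the endpoints (uncomment and substitute your own IDs to run):
# aws ec2 create-vpc-endpoint --region "$REGION" --vpc-id vpc-0123456789abcdef0 \
#   --vpc-endpoint-type Interface --service-name "com.amazonaws.$REGION.ecr.api" \
#   --subnet-ids subnet-0123456789abcdef0 --security-group-ids sg-0123456789abcdef0 \
#   --private-dns-enabled
# aws ec2 create-vpc-endpoint --region "$REGION" --vpc-id vpc-0123456789abcdef0 \
#   --vpc-endpoint-type Gateway --service-name "com.amazonaws.$REGION.s3" \
#   --route-table-ids rtb-0123456789abcdef0
```

The ecr.dkr endpoint would be created the same way as ecr.api, with only the service name changed.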

The key takeaway for us is to do better POCs of AWS services before implementing them in production. We did not anticipate this snag, and it delayed our deployment timetable.