AWS Public Sector Blog
Detect vulnerabilities in the Docker images in your applications
A guest post by Shane Riddell, Technical Fellow, ellucian®
With cyberattacks on the rise against higher education institutions, it is critical to detect vulnerabilities in the Docker images that may run your applications. This involves not just vulnerability scanning of the applications, but of the OS packages installed in a Docker image as well.
The ecr-cve-monitor project is an open-source proof of concept designed to fill the OS/package vulnerability scanning space for Docker images stored in Amazon Elastic Container Registry (ECR). It’s based on Clair and Klar, and designed specifically for use with ECR. Any image pushed to a repository in ECR is automatically scanned and a report is generated for it. Any new CVE that affects an already scanned image triggers the creation of an updated report.
Reports are stored as gzip compressed JSON files in time-series in Amazon Simple Storage Service (Amazon S3), making it easy to query for images with CVEs via Amazon Athena.
Ecr-cve-monitor is message-based. All operations are passed as messages on an Amazon Simple Queue Service (SQS) queue to provide automatic retries, with a final dead-letter queue.
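As a rough illustration of the retry-then-dead-letter behavior, the sketch below builds the queue attributes that tell SQS to move a message to a dead-letter queue after it has been received a set number of times without being deleted. The function name, the ARN, and the default retry count are assumptions for illustration and are not taken from the project.

```python
import json


def queue_attributes(dead_letter_queue_arn: str, max_receive_count: int = 5) -> dict:
    """Build SQS queue attributes that route repeatedly failing messages to a DLQ.

    The helper name and the default of 5 receives are illustrative assumptions.
    """
    if max_receive_count < 1:
        raise ValueError("max_receive_count must be at least 1")
    redrive_policy = {
        "deadLetterTargetArn": dead_letter_queue_arn,
        "maxReceiveCount": max_receive_count,
    }
    # SQS expects the redrive policy as a JSON string, not a nested dict.
    return {"RedrivePolicy": json.dumps(redrive_policy)}
```

The returned dictionary could be passed as the `Attributes` argument when creating the input queue with boto3's `create_queue`.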
Clair itself functions by ‘indexing’ all layers in a Docker image for ‘features’ and then stores those features in PostgreSQL. If a new CVE comes in to Clair that affects a layer Clair has already indexed, it issues a notification to the custom ecr-cve-monitor notification endpoint, which converts it to a rescan message on the input queue.
If a new image is pushed to ECR, CloudTrail generates a CloudWatch event, which triggers a small AWS Lambda function that puts a scan image message on the input queue for the new image. Thus, new images are automatically added to those monitored and existing images that are affected by new CVEs can be identified.
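A minimal sketch of what the Lambda function's core step might look like follows. It assumes the CloudWatch event wraps a CloudTrail `PutImage` record whose `responseElements.image` carries the registry ID, repository name, and image digest. The message fields (including the `"type": "ScanImage"` key) are hypothetical; the project's actual message format is documented in its README.

```python
import json


def build_scan_message(event: dict) -> str:
    """Turn an ECR PutImage CloudWatch event into a scan-request message body.

    The output message shape is a hypothetical example, not the project's format.
    """
    detail = event.get("detail") or {}
    if detail.get("eventName") != "PutImage":
        raise ValueError("not an ECR PutImage event")
    # A failed push has no responseElements, so there is nothing to scan.
    image = (detail.get("responseElements") or {}).get("image")
    if not image:
        raise ValueError("event does not describe a stored image")
    message = {
        "type": "ScanImage",
        "region": detail.get("awsRegion"),
        "registryId": image["registryId"],
        "repositoryName": image["repositoryName"],
        # The digest, not the tag, identifies the image permanently.
        "imageDigest": image["imageId"]["imageDigest"],
    }
    return json.dumps(message, sort_keys=True)
```

A Lambda handler would call this function and send the returned string to the input queue.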
To bootstrap a new installation, some simple Python scripts are provided to generate and load ‘ScanImage’ messages to the pending scan queue for all existing images in a given ECR. The installation can also be temporarily scaled up during the initial load to reduce the time it takes to index an existing ECR containing many images.
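The bootstrap step could be sketched as follows. This is not the project's script: it is a hypothetical helper that groups image digests into batches of at most 10 entries, which is the limit for a single SQS `SendMessageBatch` call. The message body fields are assumptions.

```python
import json
from itertools import islice
from typing import Iterable, Iterator, List

SQS_MAX_BATCH = 10  # SendMessageBatch accepts at most 10 entries per call


def scan_message_batches(
    registry_id: str, repository: str, digests: Iterable[str]
) -> Iterator[List[dict]]:
    """Yield lists of SendMessageBatch entries, one ScanImage message per digest."""
    remaining = iter(digests)
    while True:
        chunk = list(islice(remaining, SQS_MAX_BATCH))
        if not chunk:
            return
        yield [
            {
                # Entry IDs only need to be unique within one batch.
                "Id": str(position),
                "MessageBody": json.dumps(
                    {
                        "type": "ScanImage",  # hypothetical message shape
                        "registryId": registry_id,
                        "repositoryName": repository,
                        "imageDigest": digest,
                    }
                ),
            }
            for position, digest in enumerate(chunk)
        ]
```

Each yielded list could be passed as the `Entries` argument to boto3's `send_message_batch`, with the digests themselves coming from a paginated ECR `list_images` call.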
Although it has only been tested with a single registry so far, it was designed to handle multiple registries and regions, assuming you set up the necessary cross-account permissions to allow the account in which ecr-cve-monitor is deployed to pull all images in the other account’s ECR.
Any time an image is scanned, either because it was just pushed to a repo or because Clair detected that a layer in the image is affected by a new CVE, ecr-cve-monitor generates a new JSON report of all vulnerabilities in that image and stores the result in S3 under a year/month/day time-series scheme. For a given day, only one report (at most) will exist for an image.
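One way to get the one-report-per-image-per-day property is to derive the S3 object key deterministically from the date and the image, so a rescan on the same day overwrites the earlier report. The sketch below assumes that approach. The `year=/month=/day=` prefix matches the partition layout used later in this post, while the rest of the key and both function names are illustrative assumptions.

```python
import gzip
import json
from datetime import date


def report_key(report_date: date, repository: str, image_digest: str) -> str:
    """Build a time-series S3 key; the same image and day always map to one key."""
    # Replace the colon in "sha256:..." to keep the object name simple.
    safe_digest = image_digest.replace(":", "-")
    return (
        f"year={report_date:%Y}/month={report_date:%m}/day={report_date:%d}/"
        f"{repository}/{safe_digest}.json.gz"
    )


def encode_report(report: dict) -> bytes:
    """Serialize a report as single-line JSON and gzip it for storage in S3."""
    return gzip.compress(json.dumps(report).encode("utf-8"))
```

The bytes returned by `encode_report` could then be uploaded under `report_key(...)` with an S3 `put_object` call.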
Amazon Athena can be used to generate reports such as ‘show me any image with 1 or more high level CVEs’ or ‘show me any images with new high CVE vulnerabilities in the last 2 days’. The time-series storage also means only a small amount of the data needs to be loaded as an Athena partition, so you can scan a small subset of the data for any new vulnerabilities in the last 24 hours, for example.
Note that Clair does not recognize or track images directly; it only scans and knows about layers. Software that uses Clair (in ecr-cve-monitor, the clair scanner) is responsible for sending in each layer with a unique ID. The clair scanner is also responsible for tracking which layers are present in which Docker images in the ECR. Ecr-cve-monitor accomplishes this by mapping the unique layer ID to the image, as identified by its unique registry ID, in a DynamoDB table.
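The layer-to-image mapping is essentially a reverse index. The sketch below uses an in-memory dictionary as a stand-in for the DynamoDB table to show the idea: record every layer of an image when it is scanned, then look up the affected images when Clair reports a layer. The class and method names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# (registry ID, repository name, image digest); an assumed identifier shape.
ImageId = Tuple[str, str, str]


class LayerIndex:
    """In-memory stand-in for the layer ID -> images table."""

    def __init__(self) -> None:
        self._images_by_layer: Dict[str, Set[ImageId]] = defaultdict(set)

    def record_image(self, image_id: ImageId, layer_ids: Iterable[str]) -> None:
        """Remember that every given layer is part of this image."""
        for layer_id in layer_ids:
            self._images_by_layer[layer_id].add(image_id)

    def images_for_layer(self, layer_id: str) -> Set[ImageId]:
        """Return the images to rescan when a layer is affected by a new CVE."""
        return set(self._images_by_layer.get(layer_id, ()))
```

Because base layers are shared, one notification for a single layer can fan out into rescan messages for many images.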
Uniquely identifying images can be confusing. Images have an internal sha256 identifier (the image ID that Docker shows locally), but this is not the address of the image in ECR. ECR identifies each image by a separate sha256 digest, the imageDigest, which is computed over the image manifest. This is the true unique ID within an ECR. Docker tags are mutable, so they cannot be used for tracking because they could point to a different image over time.
So the reports are generated in terms of the ECR identifier, repo id, and registry sha256 ID (which permanently identifies an image in an ECR repository).
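To make the identifier concrete, the sketch below turns a registry ID, region, repository, and digest into a pull reference that pins the exact image rather than a tag. The function name is hypothetical; the `<registry>.dkr.ecr.<region>.amazonaws.com/<repository>@<digest>` form is the standard ECR address for pulling by digest.

```python
import re

_SHA256_DIGEST = re.compile(r"sha256:[0-9a-f]{64}")


def image_uri(registry_id: str, region: str, repository: str, image_digest: str) -> str:
    """Return an immutable ECR reference that addresses an image by digest."""
    if not _SHA256_DIGEST.fullmatch(image_digest):
        raise ValueError(f"not a sha256 digest: {image_digest!r}")
    return f"{registry_id}.dkr.ecr.{region}.amazonaws.com/{repository}@{image_digest}"
```

Unlike a `repository:tag` reference, the result keeps pointing at the same image even if its tags are later moved.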
A second layer of reporting would be necessary to translate Athena query results into images using the human-friendly tags assigned to that image, although it would only be guaranteed accurate at the time of the report, as the tags (particularly the ‘latest’ tag) can change.
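A sketch of that second reporting layer is shown below, under the assumption that you have already fetched image details from the ECR `DescribeImages` API, whose entries carry an `imageDigest` and, for tagged images, an `imageTags` list. The helper name is hypothetical, and the result is only a snapshot of the tags at lookup time.

```python
from typing import Dict, Iterable, List


def tags_by_digest(image_details: Iterable[dict]) -> Dict[str, List[str]]:
    """Map each image digest to its current tags (empty list if untagged)."""
    return {
        detail["imageDigest"]: sorted(detail.get("imageTags") or [])
        for detail in image_details
    }
```

Digests returned by an Athena query can then be looked up in this mapping to label each vulnerable image with its human-friendly tags.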
The README.md in the project contains examples of using AWS Glue and Athena to set up the table to query. To query for High CVEs detected on a particular date, first add the partition for that date to the table:
ALTER TABLE reports ADD PARTITION (year='2019',month='01',day='15') location 's3://my-scan-results/year=2019/month=01/day=15/'
You can then query for any images that were detected to have at least 1 High CVE on the 15th.
select distinct ECRMetadata.registryId, ECRMetadata.repositoryName, ECRMetadata.imageId.imageDigest from reports where cardinality(vulnerabilities.High) > 0 and year='2019' and month='01' and day='15' order by ECRMetadata.registryId, ECRMetadata.repositoryName, ECRMetadata.imageId.imageDigest;
This would be a typical query you would want to run daily to identify any new vulnerabilities that have entered or been identified in your repositories in the previous day, without scanning through all of the historical data. With this in place, you can have continuous monitoring of all images in your repositories against the CVE feeds.
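For a daily run, the partition statement has to be generated for the right date. The sketch below builds it for the previous day, reusing the table name and bucket from the example above as placeholders. The function names are hypothetical, and `IF NOT EXISTS` is added so that running the job twice does not fail.

```python
from datetime import date, timedelta


def previous_day(today: date) -> date:
    """Return the day whose reports a daily job should examine."""
    return today - timedelta(days=1)


def add_partition_sql(table: str, bucket: str, day: date) -> str:
    """Build the Athena statement that registers one day's reports as a partition.

    The table and bucket are interpolated directly, so pass only trusted values.
    """
    year, month, dom = f"{day:%Y}", f"{day:%m}", f"{day:%d}"
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION (year='{year}',month='{month}',day='{dom}') "
        f"LOCATION 's3://{bucket}/year={year}/month={month}/day={dom}/'"
    )
```

For example, `add_partition_sql("reports", "my-scan-results", previous_day(date.today()))` yields the statement to submit to Athena before running the High CVE query for that day.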
Ecr-cve-monitor only reports OS-level CVE vulnerabilities in your containers. To put it into effect, you need to decide for your organization how to deal with images identified as vulnerable. This could involve quarantining them so they can’t be further deployed, blocking deployment of them via your CI/CD pipeline, or further reporting on how you run your containers (Amazon Elastic Container Service (Amazon ECS) or Amazon Elastic Container Service for Kubernetes (Amazon EKS)) to identify vulnerable images that are actively running. The optimal mix of techniques depends on your SLAs, sensitivity to downtime, and the severity of a new CVE that is detected in an already running image.
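As one example of such a policy, a CI/CD pipeline could refuse to deploy an image whose latest report lists vulnerabilities at or above a chosen severity. The sketch below assumes the report has a `vulnerabilities` object keyed by severity name with a list under each key, as the `vulnerabilities.High` field in the query above suggests. The function names are hypothetical, and the severity names other than High are Clair's names, assumed to appear as report keys.

```python
from typing import Dict, Iterable

# Which severities block a deployment is a policy choice, shown here as a default.
BLOCKING_SEVERITIES = ("High", "Critical", "Defcon1")


def blocking_counts(
    report: dict, severities: Iterable[str] = BLOCKING_SEVERITIES
) -> Dict[str, int]:
    """Count vulnerabilities per blocking severity, omitting severities with none."""
    vulnerabilities = report.get("vulnerabilities") or {}
    counts = {name: len(vulnerabilities.get(name) or []) for name in severities}
    return {name: count for name, count in counts.items() if count}


def may_deploy(report: dict) -> bool:
    """Allow deployment only when no blocking vulnerability is present."""
    return not blocking_counts(report)
```

A pipeline step could call `may_deploy` on the most recent report for the image digest being released and fail the build when it returns `False`.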
To run or experiment with the ecr-cve-monitor project, visit https://github.com/sriddell/ecr-cve-monitor and view the README.md for details on installation, operation, and underlying architecture.