AWS Public Sector Blog

Using AWS CDK to build an extensible file-scanning solution for Amazon S3 buckets


In today’s digital world, ensuring the security of information is essential. Data security is especially important when an organization receives files from external sources. Amazon Simple Storage Service (Amazon S3) provides robust security features, such as default encryption of all uploaded files. Even so, there are scenarios where incoming files must be verified to be free of malware, such as viruses, before they are processed or stored inside the organization. Public sector customers in particular are often required to use a chain of multiple antivirus scanners to make sure incoming files are not malicious.

We released a solution to an AWS Samples GitHub repository that uses the AWS Cloud Development Kit (AWS CDK) to provide a fully extensible and scalable file-scanning pipeline for Amazon S3 buckets. The solution lets customers integrate any available virus scanner that runs on Microsoft Windows, so incoming files can be checked against a wide range of threats.

Solution architecture

Figure 1 shows the architecture of the solution described in this post. It illustrates two example antivirus scanners and shows where additional scanners can be integrated as needed.

Figure 1. Architectural diagram of the solution described in this post. The major components are an Amazon S3 bucket, Amazon Simple Notification Service (Amazon SNS), AWS Lambda, Amazon Simple Queue Service (Amazon SQS), Amazon Elastic Compute Cloud (Amazon EC2), Amazon CloudFront, and Amazon Cognito.

The solution is built with a CDK stack that includes the following components:

  1. S3 bucket: This bucket serves as the primary ingestion point for files.
  2. Amazon Simple Queue Service (Amazon SQS) queue: Incoming files trigger notifications that are sent to an SQS queue.
  3. Auto Scaling groups: For each configured virus scanning solution, there is an Auto Scaling group that scales dynamically based on the depth of the SQS queue.
  4. Scanning instances: These instances, launched by the Auto Scaling groups, run the configured virus scanners and process files from the SQS queue.
  5. Clean and infected buckets: After scanning, clean files are moved to the clean bucket, while infected files are quarantined in the infected bucket.
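The post does not spell out the exact scaling policy, so the following is only a minimal sketch of the idea behind item 3: deriving a desired instance count from the depth of the SQS queue. The interface `ScalingLimits`, the function `desiredInstances`, and all numeric values are hypothetical and are not taken from the repository.

```typescript
// Hypothetical names and values; the repository's real scaling policy may differ.
interface ScalingLimits {
  minInstances: number;        // lower bound of the Auto Scaling group
  maxInstances: number;        // upper bound of the Auto Scaling group
  messagesPerInstance: number; // backlog one instance is expected to work through
}

// Derive a desired instance count from the number of messages waiting in the queue.
function desiredInstances(queueDepth: number, limits: ScalingLimits): number {
  if (queueDepth < 0 || limits.messagesPerInstance <= 0) {
    throw new RangeError("queueDepth must be >= 0 and messagesPerInstance > 0");
  }
  // Round up so that a partially filled "slot" still gets an instance.
  const needed = Math.ceil(queueDepth / limits.messagesPerInstance);
  // Clamp the result to the group's configured bounds.
  return Math.min(limits.maxInstances, Math.max(limits.minInstances, needed));
}

const limits: ScalingLimits = { minInstances: 0, maxInstances: 4, messagesPerInstance: 10 };
console.log(desiredInstances(0, limits));   // 0
console.log(desiredInstances(25, limits));  // 3
console.log(desiredInstances(500, limits)); // 4
```

With a minimum of zero, the group can scale in completely when no files are waiting, which keeps idle cost low at the price of a start-up delay for the first file.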

By using the AWS CDK, customers can configure, adapt, and deploy the solution with their preferred virus scanners. This flexibility helps organizations respond to evolving threat landscapes and use the most advanced virus scanning technologies available. The project contains a configurable CDK construct that allows you to add further antivirus scanners:

const additionalScanner = new ec2Scanner(this, "AdditionalScanner", {
    vpc, // VPC the scanner will be deployed in
    inputTopic, // SNS topic that S3 sends bucket notifications to
    inputBucket, // S3 bucket to scan
    tagPrefix: "ADDITIONAL_SCANNER", // prefix for the tags added to scanned files
    avPath: "./examples/clamav/", // folder with AV scanner scripts
});
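When several scanners are chained, each one reports its own result under its own `tagPrefix`. The post does not describe how those results are combined, so the following sketch only illustrates one plausible rule: a file counts as clean when every configured scanner reports it clean, and a single detection is enough to quarantine it. The types `Verdict` and `ScanResults` and the function `routeFile` are hypothetical and do not reflect the repository's actual tag format.

```typescript
// Hypothetical verdict values; the real tag values in the repository may differ.
type Verdict = "CLEAN" | "INFECTED";

// Hypothetical shape: one entry per configured scanner, keyed by its tagPrefix.
// A missing entry means that scanner has not reported yet.
type ScanResults = Record<string, Verdict | undefined>;

type Destination = "clean" | "infected" | "pending";

// Decide where a file should go based on the verdicts of all configured scanners.
function routeFile(prefixes: string[], results: ScanResults): Destination {
  if (prefixes.length === 0) {
    throw new Error("at least one scanner must be configured");
  }
  let pending = false;
  for (const prefix of prefixes) {
    const verdict = results[prefix];
    if (verdict === "INFECTED") return "infected"; // one detection is enough
    if (verdict === undefined) pending = true;     // this scanner has not finished
  }
  return pending ? "pending" : "clean";
}

const scanners = ["CLAMAV", "ADDITIONAL_SCANNER"];
console.log(routeFile(scanners, { CLAMAV: "CLEAN" }));                                 // pending
console.log(routeFile(scanners, { CLAMAV: "CLEAN", ADDITIONAL_SCANNER: "CLEAN" }));    // clean
console.log(routeFile(scanners, { CLAMAV: "CLEAN", ADDITIONAL_SCANNER: "INFECTED" })); // infected
```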

Solution walkthrough

To use your custom antivirus solution, you need to create two PowerShell scripts:

  1. install.ps1 – Installs the antivirus solution and all of its dependencies.
  2. scan.ps1 – Is called for each scanned file and returns the results in a specific format.

You can find examples for ClamAV and Microsoft Defender in the “examples” folder of the repository. In addition, the solution comes with a demo front-end application that lets you upload files and test the antivirus scanning. To trigger a detection without handling real malware, you can use the EICAR anti-malware test file.
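As a small illustration, the EICAR test file is just a 68-character ASCII string that antivirus products agree to report as a detection. The sketch below assembles that string; the constant and function names are hypothetical, and the string is split into two parts only so that the source file itself is less likely to be flagged by a scanner.

```typescript
// The standard EICAR test string, split in two so this source file is less
// likely to be quarantined by a local antivirus scanner.
const EICAR_PART_1 = "X5O!P%@AP[4\\PZX54(P^)7CC)7}$";
const EICAR_PART_2 = "EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*";

// Returns the complete test string; save it as a file and upload it to test detection.
function eicarTestString(): string {
  return EICAR_PART_1 + EICAR_PART_2;
}

console.log(eicarTestString().length); // 68
```

Saving the returned string to a file and uploading it through the demo front end should result in the file being reported as infected, assuming the configured scanners recognize the EICAR signature.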

Conclusion

This solution offers a comprehensive and flexible approach to ensuring the security of files uploaded to Amazon S3 buckets. Organizations can easily integrate and customize their preferred virus scanning solutions and stay ahead of evolving threats. To get started with this adaptable solution, visit the AWS Samples GitHub repository and explore the sample code and documentation. Enhance your data security posture, protect your sensitive information, and use the power of AWS CDK to build a resilient and adaptable file-scanning solution tailored to your organization’s needs.

Benedikt Pauwels

Benedikt is a senior solutions architect dedicated to the public sector in Germany. He combines a software architecture background with a passion for innovation. Benedikt's expertise lies in the realm of application modernization and the seamless integration of serverless technologies. In the ever-evolving landscape of public sector services, he tries to be a vital resource, spearheading digital transformation.

Michael Wahlers

Michael Wahlers, a principal solutions architect and public sector specialist, operates in Germany, Austria, and Switzerland to enable public institutions with innovative digital solutions. His expertise ensures seamless service delivery and helps shape the digital future of the public sector in the region. Beyond his professional commitments, he enjoys exploring complex distributed systems, integrating local contexts into his work.