AWS Public Sector Blog
Using AWS CDK to build an extensible file-scanning solution for Amazon S3 buckets
Organizations that receive files from external sources need to make sure those files are safe before processing or storing them. Amazon Simple Storage Service (Amazon S3) provides robust security features, such as default encryption of all uploaded files, but encryption alone does not tell you whether an incoming file contains malware such as a virus. Public sector customers in particular often face a requirement to pass incoming files through a chain of multiple antivirus scanners to make sure they are not malicious.
We released a solution in an AWS Samples GitHub repository that uses the AWS Cloud Development Kit (AWS CDK) to provide a fully extensible and scalable file-scanning pipeline for Amazon S3 buckets. The solution lets customers integrate any available virus scanner that runs on Microsoft Windows, so several scanners can check the same incoming files.
Solution architecture
Figure 1 shows the architecture of the solution described in this post. It illustrates two example antivirus scanners and shows where additional scanners can be integrated as needed.
The solution is built with a CDK stack that includes the following components:
- S3 bucket: This bucket serves as the primary ingestion point for files.
- Amazon Simple Queue Service (Amazon SQS) queue: Incoming files trigger notifications that are sent to an SQS queue.
- Auto Scaling groups: For each configured virus scanning solution, there is an Auto Scaling group that scales dynamically based on the depth of the SQS queue.
- Scanning instances: These instances, launched by the Auto Scaling groups, run the configured virus scanners and process files from the SQS queue.
- Clean and infected buckets: After scanning, clean files are moved to the clean bucket, while infected files are quarantined in the infected bucket.
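To make the queue-depth scaling idea concrete, here is a minimal sketch of a "backlog per instance" sizing rule. It is an illustration only and is not taken from the repository: the `ScalingLimits` type, the `desiredCapacity` function, and the numbers used are all assumptions, and the actual solution may rely on different Auto Scaling policies and thresholds.

```typescript
// Hypothetical sizing rule for one scanner's Auto Scaling group.
// All names and values are assumptions made for illustration.
interface ScalingLimits {
  minInstances: number;        // lower bound of the Auto Scaling group
  maxInstances: number;        // upper bound of the Auto Scaling group
  messagesPerInstance: number; // backlog one instance is expected to absorb
}

// Derive a desired instance count from the number of waiting queue messages.
function desiredCapacity(queueDepth: number, limits: ScalingLimits): number {
  if (queueDepth < 0 || limits.messagesPerInstance <= 0) {
    throw new RangeError("queueDepth must be >= 0 and messagesPerInstance > 0");
  }
  // Round up so a partial backlog still gets an instance.
  const needed = Math.ceil(queueDepth / limits.messagesPerInstance);
  // Clamp the result to the group's configured bounds.
  return Math.min(limits.maxInstances, Math.max(limits.minInstances, needed));
}

const limits: ScalingLimits = { minInstances: 0, maxInstances: 4, messagesPerInstance: 10 };
console.log(desiredCapacity(0, limits));   // 0
console.log(desiredCapacity(25, limits));  // 3
console.log(desiredCapacity(500, limits)); // 4
```

With a minimum of zero, a rule like this lets a scanner fleet scale in completely while no files are waiting, and the maximum caps cost during upload bursts.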
By using the AWS CDK, customers can configure, adapt, and deploy the solution with their preferred virus scanners. This flexibility helps organizations respond to evolving threat landscapes by adding or replacing scanners as needed. The project contains a configurable CDK construct that lets you add further antivirus scanners:
```typescript
const additionalScanner = new ec2Scanner(this, "AdditionalScanner", {
  vpc,                             // VPC the scanner will be deployed in
  inputTopic,                      // SNS topic that S3 sends bucket notifications to
  inputBucket,                     // S3 bucket to scan
  tagPrefix: "ADDITIONAL_SCANNER", // prefix for the tags added to scanned files
  avPath: "./examples/clamav/",    // folder with AV scanner scripts
});
```
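Because each scanner writes tags under its own `tagPrefix`, the results of a scanner chain can be combined per file. The sketch below shows one possible way to do that. The tag layout (`<tagPrefix>_STATUS` with values `CLEAN` or `INFECTED`), the `chainVerdict` function, and the `CLAMAV` prefix are assumptions for illustration; check the repository for the tag names and values the solution actually uses.

```typescript
// Hypothetical aggregation of per-scanner tags into one verdict.
// The tag key format "<tagPrefix>_STATUS" is an assumption, not the repository's format.
type ChainVerdict = "CLEAN" | "INFECTED" | "PENDING";
type ObjectTags = Record<string, string>;

function chainVerdict(tags: ObjectTags, tagPrefixes: string[]): ChainVerdict {
  if (tagPrefixes.length === 0) {
    throw new Error("at least one scanner tag prefix is required");
  }
  let pending = false;
  for (const prefix of tagPrefixes) {
    const status = tags[`${prefix}_STATUS`];
    // A single detection is enough to quarantine the file.
    if (status === "INFECTED") return "INFECTED";
    // A missing or unknown status means this scanner has not reported yet.
    if (status !== "CLEAN") pending = true;
  }
  // The file counts as clean only if every scanner in the chain reported CLEAN.
  return pending ? "PENDING" : "CLEAN";
}

const prefixes = ["CLAMAV", "ADDITIONAL_SCANNER"];
console.log(chainVerdict({ CLAMAV_STATUS: "CLEAN" }, prefixes)); // PENDING
console.log(chainVerdict({ CLAMAV_STATUS: "CLEAN", ADDITIONAL_SCANNER_STATUS: "CLEAN" }, prefixes)); // CLEAN
console.log(chainVerdict({ CLAMAV_STATUS: "INFECTED" }, prefixes)); // INFECTED
```

Treating a missing tag as pending rather than clean is the conservative choice here: a file is released only after every configured scanner has reported.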
Solution walkthrough
To use your own antivirus solution, you need to create two PowerShell scripts:
- install.ps1: Installs the antivirus solution and all of its dependencies.
- scan.ps1: Is called for each scanned file and returns the results in a specific format.
You can find examples for ClamAV and Microsoft Defender in the “examples” folder of the repository. The solution also comes with a demo front-end application that lets you upload files and test the antivirus scanning. To trigger a detection during testing, you can use the EICAR Anti-Malware Testfile.
Conclusion
This solution offers a flexible approach to scanning files uploaded to Amazon S3 buckets. Organizations can integrate and customize their preferred virus scanners and add or replace them as threats evolve. To get started, visit the AWS Samples GitHub repository and explore the sample code and documentation, then use the AWS CDK to tailor the file-scanning pipeline to your organization’s needs.