Preventing Malware in Serverless Web Applications with SophosLabs Intelix

By James Wilson, Sr. Product Manager at SophosLabs
By Tom Ellis, Sr. Solutions Architect ISV at AWS

Building web applications in a serverless environment has brought many advantages, including scalability and a focus on maintaining code rather than servers.

As with every big shift, however, there are certain practices such as malware protection that need to be reinvented.

Gone are the days of managing your own servers, being able to install your chosen anti-malware product, and feeling safe in the knowledge that any malicious files will be caught as they are saved to the server.

Customers are building more and more web applications that take uploaded files from application users. In a serverless environment, it’s much harder to use traditional anti-malware products to keep you and your customers safe.

To solve this, organizations need a solution that is easy to query from a web application via an API, with no infrastructure required. This solution needs to give a clear verdict, telling you whether a file is clean or malicious, and provide additional metadata on the findings.

In this post, we will introduce SophosLabs Intelix, a suite of APIs which provide specific, actionable intelligence about files. Intelix has been built by combining tools and information from SophosLabs and Sophos AI. Sophos is an AWS Security Competency Partner.

We’ll also cover how to use the Intelix APIs for analysis and deploy a solution to scan objects upon upload to Amazon Simple Storage Service (Amazon S3). This uses an S3 event trigger and an AWS Lambda function that calls the Intelix APIs, copies clean objects to a second S3 bucket, and logs the analysis results to Amazon CloudWatch Logs.

By using Intelix within a Lambda function, you can maintain your serverless infrastructure and protect assets from malicious content.

SophosLabs Intelix is available on AWS Marketplace.

SophosLabs Intelix

SophosLabs is a tier-1 threat research lab with decades of experience analyzing the global threat landscape and staying ahead of emerging threats.

The Sophos AI team, meanwhile, is pushing the boundaries of applying machine learning (ML) and other techniques to cybersecurity.

The combination of these teams’ efforts led to a powerful solution which is provided through easy-to-use APIs. Intelix offers a layered approach to security and allows you to deeply inspect suspicious files, while letting obviously clean files pass through the scanning quickly.

Figure 1 – Intelix solution architecture.

The services that SophosLabs Intelix offers can be split into three layers:

Cloud lookups
Static analysis
Dynamic analysis

As you go down through the layers, the time taken to complete the analysis increases as the depth of the inspection increases. At the first two layers, the response from Intelix can be clean, malicious, or not known.

Dynamic analysis analyzes the behavior of the file within the sandbox. It will always give a clean or malicious response depending on whether or not malicious behavior was observed.

Implementation Plan

The flow chart below shows how to implement Intelix: we work through the layers in order, looking for the first definitive answer as to whether the file is clean or malicious.

Figure 2 – Intelix services flow chart.
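The flow can be sketched in a few lines of Python. This is a minimal illustration rather than the code from the example repository: the function name first_definitive_verdict and the "clean" / "malicious" / "unknown" strings are our own assumptions, and each layer is passed in as a callable so the sketch runs without calling Intelix.

```python
from typing import Callable, Iterable, Tuple

Verdict = str  # one of "clean", "malicious", or "unknown" (illustrative labels)


def first_definitive_verdict(
    layers: Iterable[Tuple[str, Callable[[], Verdict]]],
) -> Tuple[str, Verdict]:
    """Run each layer in order and stop at the first clean or malicious verdict."""
    last_name = "none"
    for name, run_layer in layers:
        last_name = name
        verdict = run_layer()
        if verdict in ("clean", "malicious"):
            # A definitive answer: deeper (slower) layers are not needed.
            return name, verdict
    # No layer gave a definitive answer.
    return last_name, "unknown"


if __name__ == "__main__":
    # Stand-in layers; real ones would call the Intelix APIs.
    example_layers = [
        ("cloud_lookup", lambda: "unknown"),
        ("static_analysis", lambda: "unknown"),
        ("dynamic_analysis", lambda: "malicious"),
    ]
    print(first_definitive_verdict(example_layers))
```

Ordering the layers from fastest to slowest means most files are resolved by the cheap checks, and only files that remain unknown pay the cost of detonation.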

To allow Intelix to integrate easily with your application, we’re going to build a Lambda function that will take a file and tell us whether it’s clean or malicious.

This Lambda function can be deployed as part of your serverless application. An example is provided whereby an S3 event trigger calls Lambda when a new object is uploaded. It then runs through the analysis before moving clean (non-malicious) data to a second bucket and logging the results to Amazon CloudWatch Logs.
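As a rough sketch of the first step in such a handler, the function below pulls the bucket name and object key out of an S3 event notification. The name objects_from_s3_event is hypothetical and not taken from the example repository; the Records / s3 / bucket / object layout is the standard S3 notification structure.

```python
from typing import List, Tuple
from urllib.parse import unquote_plus


def objects_from_s3_event(event: dict) -> List[Tuple[str, str]]:
    """Return (bucket, key) pairs for every record in an S3 event notification."""
    pairs = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 notifications (spaces become "+").
        key = unquote_plus(record["s3"]["object"]["key"])
        pairs.append((bucket, key))
    return pairs
```

Decoding the key before using it matters for uploads whose names contain spaces or special characters; without it, a later download of the object can fail with a missing-key error.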

The example code is available on GitHub.

Login

Once you have registered for the Intelix service via AWS Marketplace, you are provided with API credentials in the form of a client_id and client_secret. These are required to access the Intelix APIs and must be base64 encoded:

    # Generate base64-encoded API credentials on a single line (no wrapping)
    $ echo -n "my_client_id:my_client_secret" | base64 -w0

This will return the base64 credentials, which you can export as the INTELIX_CREDENTIALS environment variable to run locally, or use in the Lambda environment configuration.
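If you would rather produce the same value from Python, for example on a system whose base64 command lacks the -w0 option, a minimal sketch might look like this. The function name encode_credentials is our own and is not part of the example repository.

```python
import base64


def encode_credentials(client_id: str, client_secret: str) -> str:
    """Return base64("client_id:client_secret") as a single unwrapped line."""
    raw = f"{client_id}:{client_secret}".encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


if __name__ == "__main__":
    # Placeholder values; substitute the credentials issued to you.
    print(encode_credentials("my_client_id", "my_client_secret"))
```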

Within the Lambda function, you’ll need to get an authorization token for Intelix to make the request with your credentials. This can be obtained quickly via the login token endpoint:

    import os
    import requests

    # Set up the authorization header using the credentials supplied
    intelix_credentials = os.environ["INTELIX_CREDENTIALS"]
    auth_string = "Basic " + intelix_credentials
    headers = {"Authorization": auth_string}
    data = {"grant_type": "client_credentials"}
    response = requests.post("https://api.labs.sophos.com/oauth2/token", data=data, headers=headers)
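Requesting a new token before every API call adds a round trip each time. One option is to cache the token until shortly before it expires, as in the sketch below. This is an illustration under assumptions: TokenCache and fetch_token are hypothetical names, and we assume the token response is JSON containing access_token and expires_in fields, as is usual for an OAuth 2.0 client credentials response. Check the Intelix API documentation for the exact response format.

```python
import time
from typing import Callable


class TokenCache:
    """Caches an access token and refreshes it shortly before it expires."""

    def __init__(
        self,
        fetch_token: Callable[[], dict],
        clock: Callable[[], float] = time.monotonic,
        margin_seconds: float = 60.0,
    ) -> None:
        # fetch_token is a hypothetical callable that performs the login
        # request and returns the parsed JSON body of the token response.
        self._fetch_token = fetch_token
        self._clock = clock
        self._margin = margin_seconds
        self._token = None
        self._expires_at = 0.0

    def get(self) -> str:
        now = self._clock()
        if self._token is None or now >= self._expires_at:
            payload = self._fetch_token()
            self._token = payload["access_token"]
            # Assumption: expires_in is the token lifetime in seconds.
            self._expires_at = now + payload.get("expires_in", 0) - self._margin
        return self._token
```

In a Lambda function, creating the cache at module level lets warm invocations reuse the token instead of logging in again.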

Cloud Lookups

Cloud lookups provide the ability to query SophosLabs’ database of known clean and malicious files. This is the fastest request you can make to the Intelix API, because it is synchronous.

The hash of the file is sent to the cloud lookup API, and this provides a JSON response which includes the reputation score we can use to see if the file is clean, malicious, or not known.

def cloud_lookup(file_hash):
    # Based on the SHA256 get the score of the file from Cloud Lookup - File Reputation
    login()
    headers = {"Authorization": access_token}
    url = "https://de.api.labs.sophos.com/lookup/files/v1/" + file_hash
    response = requests.get(url, headers=headers)
    score = response.json()['reputationScore']
    print("Score: " + str(score))
    print("Raw response: \n" + json.dumps(response.json(), indent=4))
    return score
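The file_hash argument is the SHA-256 of the file, written as a hexadecimal string. A minimal sketch for computing it from a local file, such as an object downloaded to the Lambda function’s temporary storage, is shown below. The helper name sha256_of_file is our own.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Return the SHA-256 of a file as a lowercase hex string."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large uploads do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```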

In the following example, the cloud lookup has identified the file as being malicious; the reputation score returned is below 20.

{
    "reputationScore": 18,
    "requestId": "daad9e8a-ee68-46c9-a188-f517e4e258c5,1499499514"
}
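To turn a score into a verdict, a small helper like the one below can be used. Treat it as a sketch: the threshold of 20 for malicious comes from the example above, while the threshold of 70 for clean is an assumption on our part that you should confirm against the Intelix API documentation. Anything in between is treated as not definitive.

```python
def classify_score(score: int, malicious_below: int = 20, clean_from: int = 70) -> str:
    """Map an Intelix score (0-100) to "malicious", "clean", or "unknown"."""
    if score < malicious_below:
        return "malicious"
    # Assumption: scores at or above clean_from indicate a known clean file.
    if score >= clean_from:
        return "clean"
    return "unknown"
```

Keeping the thresholds as parameters makes it easy to apply a stricter policy, for example by raising clean_from for uploads from untrusted users.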

Static Analysis

Static analysis scans the file with the Sophos Anti-Malware engine. For files that contain active content, such as documents and Portable Executable (PE) files, it also uses machine learning models to inspect the file features and structure for malicious content.

In this case, the complete file will need to be sent to Intelix for analysis as part of the request:

    headers = {"Authorization": access_token}
    files = {"file": open(filename, 'rb')}
    response = requests.post(url, headers=headers, files=files)

The response will either be the report, or a job id indicating the analysis is in progress. When the job is in progress, the following JSON is returned:

{
    "jobId": "476615a45483be21231e967b59ec6004",
    "jobStatus": "IN_PROGRESS",
    "requestId": "31945e71-263f-4f2e-bb0b-b52c714a5716"
}

We can determine which response we are getting based on the status code:

    # Response 200 means that the report has been provided in the response
    if response.status_code == 200:
        return response
    # Response 202 means that the job is in progress and we need to wait for the response
    elif response.status_code == 202:
        jobId = response.json()["jobId"]

If we have the job id, then we’ll need to poll the service every few seconds to get the report. The report can take up to 15 minutes depending on the analysis carried out, but it typically returns within minutes.
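A polling loop along these lines is one way to do that. It is a sketch under assumptions: poll_for_report is our own name, and fetch_report is a hypothetical callable that requests the report for a job id and returns the HTTP status code and parsed JSON body. We assume the report request mirrors the submission, returning 202 while the job is in progress and 200 once the report is ready.

```python
import time
from typing import Callable, Tuple


def poll_for_report(
    fetch_report: Callable[[str], Tuple[int, dict]],
    job_id: str,
    interval_seconds: float = 5.0,
    timeout_seconds: float = 900.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Poll until the report is ready, or raise if it takes too long."""
    deadline = clock() + timeout_seconds
    while True:
        status_code, body = fetch_report(job_id)
        if status_code == 200:
            # The report is included in the response body.
            return body
        if status_code != 202:
            raise RuntimeError(f"Unexpected status {status_code} for job {job_id}")
        if clock() + interval_seconds > deadline:
            raise TimeoutError(f"Job {job_id} did not finish within {timeout_seconds} seconds")
        sleep(interval_seconds)
```

Because a Lambda function can run for at most 15 minutes, set timeout_seconds comfortably below the function’s configured timeout so the function can log the failure before it is stopped.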

To put this together for static analysis, we will send the file for analysis and then get the score from the report provided:

def static_analysis(filename):
    # Send the file for Static analysis and return the score
    url = "https://de.api.labs.sophos.com/analysis/file/static/v1/"
    response = get_analysis(filename, url)
    score = response.json()["report"]["score"]
    print("Score: " + str(score))
    print("Raw response: \n" + json.dumps(response.json(), indent=4))
    return score

The response contains a full intelligence report. As shown above, the score under the report object is the important field. In the example below, the score is 50:

{
    "jobId": "405efb10f827b81c0a5a35122712fcbc",
    "jobStatus": "SUCCESS",
    "report": {
        "analysis_subject": {
            "mime_type": "text/plain",
            "sha1": "9f75bdb4d8bdd6d628ca77d3c524876e097bbcff",
            "sha256": "9e1f489443982d7452b6a9cdf6be9b224be85c9f468e6e03a9100c7b2c57831c"
        },
        "analysis_type": "static",
        "detection": {
            "permalink": "",
            "positives": 0,
            "sophos": "",
            "sophos_ml": "",
            "total": 0
        },
        "object_type": "file",
        "reputation": {
            "first_seen": "",
            "last_seen": "",
            "prevalence": "",
            "score": 30,
            "score_string": "Unknown reputation"
        },
        "score": 50,
        "submission": "2021-02-25T19:11:11Z"
    },
    "requestId": "ae655d0c-c43f-4f19-900f-8cfaf92557cd"
}

Dynamic Analysis

Dynamic analysis detonates the file in the SophosLabs sandbox environment. This is the longest scan and typically takes five minutes.

The sample is detonated in a virtual machine (VM) containing all of Sophos’ detection capabilities, as well as specific tools for monitoring behavior through memory dumps and other techniques.

This provides a detailed report on whether any malicious behavior is exhibited.

Submitting a sample for dynamic analysis is the same as for static analysis (as is collecting the verdict), but uses a different URL for the request, as shown in the code example below:

def dynamic_analysis(filename):
    # Send the file for dynamic analysis and return the score
    url = "https://de.api.labs.sophos.com/analysis/file/dynamic/v1/"
    response = get_analysis(filename, url)
    score = response.json()["report"]["score"]
    print("Score: " + str(score))
    print("Raw response: \n" + json.dumps(response.json(), indent=4))
    return score

The response you get from dynamic analysis will have the full intelligence report, containing all of the information observed during detonation of the sample.

As with the static analysis, the score in the report section is the important field. In the example below (which has detail removed for brevity) you can see the score is 100:

{
    "jobId": "4d9a0c15dfbdb9cf2fb6755234d90ff4",
    "jobStatus": "SUCCESS",
    "report": {
        "activity_tree": {
            "data": {
                "edges": [
                    {
                        "id": "3304:132587538952774909_1684:132587538992585120",
                        "label": "CreateProcess",
                        "source": "3304:132587538952774909",
                        "target": "1684:132587538992585120"
                        ...

        "processes": [
            {
                "command_line": "%programfiles%\\Microsoft Office\\root\\Office16\\WINWORD.EXE \"%input_sample%\"",
                "parent_process": "%sandbox_framework%",
                "pid": 1684,
                "ppid": 3304,
                "process": "%programfiles%\\microsoft office\\root\\office16\\winword.exe",
                "start_time": "2021-02-25T19:11:39Z"
            }
        ],
        "score": 100,
        ...
        "submission": "2021-02-25T19:11:21Z"
    },
    "requestId": "c5474042-0245-4a76-bd40-dd54c12294a8"
}
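Beyond the score, the report can be mined for details worth logging. As a small sketch, the helper below lists the processes recorded in a dynamic analysis report, using only the pid, ppid, and process fields shown in the example above. The function name summarize_processes is our own, and its argument is the object found under the "report" key.

```python
from typing import List


def summarize_processes(report: dict) -> List[str]:
    """Return one "ppid -> pid: process" line per process in the report."""
    lines = []
    for proc in report.get("processes", []):
        lines.append(f'{proc.get("ppid")} -> {proc.get("pid")}: {proc.get("process")}')
    return lines
```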

By using a Lambda function to access Intelix functionality, you can get a detailed analysis of each uploaded file, ensuring it’s not malicious before you use it any further within your web applications and AWS environment.

AWS CDK Deployment Example

An example deployment template using the AWS Cloud Development Kit (AWS CDK) has been provided within the GitHub repository. This allows you to easily deploy an Amazon S3 bucket for data uploads, an AWS Lambda function for Intelix analysis, and an S3 bucket for clean data.

When the Lambda function runs the analysis, results are published to Amazon CloudWatch Logs and, if clean, the data is copied to a second S3 bucket. This can be deployed in a few simple steps. For further information, please see README.md within the Git repository:

$ git clone https://github.com/sophoslabs/intelix-lambda-example.git && cd intelix-lambda-example
$ python3 -m venv .venv && source .venv/bin/activate # set up a new Python environment
$ pip3 install -r requirements.txt # install dependencies
$ pip3 install requests -t ./resources # pull dependencies into the Lambda code directory
$ cdk bootstrap # bootstrap AWS CDK
# Add your INTELIX_CREDENTIALS to cdk_intelix_lambda/intelix_lambda_service.py or via the AWS console after deployment
$ cdk deploy

Summary

With SophosLabs Intelix, you have a robust anti-malware solution that integrates via a RESTful API and gives you the flexibility to build the service into your own serverless application.

The example code we shared in this post provides a great starting point to interact with the Intelix API. You could further customize this with a state machine in AWS Step Functions or event triggers with Amazon EventBridge.

If you’re building a web application that accepts files from untrusted sources, try out SophosLabs Intelix to keep your customers and systems free from malware.

Sign up for Intelix on AWS Marketplace, which includes a free tier to help you get started. You can also access the API documentation, and deploy the sample code used in this post on the SophosLabs GitHub repository.

Sophos – AWS Partner Spotlight

Sophos is an AWS Security Competency Partner that designs products to eliminate complexity, from network to endpoint to server security.
