AWS for Industries
FSI Services Spotlight: Athena
Editor’s note: Visit AWS FSI Services Spotlight to view the previous posts of the monthly blog series.
In this edition of the Financial Services Industry (FSI) Services Spotlight monthly blog series, we highlight five key considerations for Amazon Athena: achieving compliance, data protection, isolation of compute environments, automating audits with APIs, and operational access and security. For each of these five areas, we provide specific guidance, suggested reference architectures, and technical code that can help streamline service approval for the featured service. This material may need to be adapted to your specific use case and environment.
Amazon Athena is a serverless interactive SQL query service that enables customers to query large volumes of data stored in Amazon Simple Storage Service (Amazon S3) or in other sources without managing the underlying infrastructure or setting up complex ETL processes. Customers simply register their preexisting S3 datasets as tables in Athena’s underlying metastore and can immediately begin querying them using standard SQL. With the rise in prevalence of data lake and lakehouse architectures, Athena has become a popular option for FSI customers for interactive SQL analytics.
A common usage pattern for FSI customers is to set up Athena as an underlying query engine for downstream analytic use cases such as machine learning and business intelligence. To support this pattern, Athena provides a number of access options, including interactive access through the AWS Console, programmatic access through the AWS SDKs and analytics libraries such as AWS Data Wrangler, and JDBC or ODBC drivers for access from external BI tools and SQL clients.
Amazon Athena Use Cases in the Financial Services Industry
Tech-driven insurance agency Aioi Nissay Dowa Insurance Services USA (AIS) knows that connected vehicles are the next frontier for the automotive industry. AIS sought to collect data from vehicles and other telematics devices and process that data on a single service. From there, the company could use the data to assess a driver’s risk for insurance purposes. Among the AWS services that AIS found helpful was Amazon Athena. “Previously, it was difficult to scale cost effectively while delivering the promised insights from telematics data,” says Michael Fischer, chief of innovation at AIS. “So we turned to the elasticity of a serverless architecture and used Amazon Athena as one of our processing methods.”
Eko, which democratizes banking and financial services by helping low- to moderate-income workers digitize their earnings, has built a data lake on the AWS Cloud using Amazon EMR for big data processing and AWS Glue to prepare and load data for analysis. Amazon Athena is at the core of its analytics pipeline and is used to run serverless queries from data stored in Amazon S3.
Lastly, TNG FinTech Group aims to improve the lives of the unbanked population in Asia by providing social and financial inclusion. TNG also takes advantage of Amazon Athena and AWS Glue to run serverless queries with extract, transform, and load functionality. It uses Amazon Athena to automate queries in its wallet service, so customers can easily retrieve information on past transactions, with no manual intervention required from TNG staff. The Group relies on AWS Glue to transform raw data from the CSV file format before it is integrated into Amazon Athena for processing or transferred to the data lake on Amazon S3.
“In the past, we had to do a lot of coding work by ourselves and scalability was also a problem with CSV files. After moving to AWS, we can leverage cloud resources like AWS Glue and Amazon Athena to shorten the processing time,” says Chris Chan, head of engineering at TNG.
Achieving Compliance with Amazon Athena
Security is a shared responsibility between AWS and the customer. AWS is responsible for protecting the infrastructure that runs AWS services in the AWS Cloud and also provides you with services that you can use securely. Your responsibility is determined by the AWS service that you use. On the customer’s side of the shared responsibility model, customers should first determine their requirements for network connectivity, encryption and access to other AWS resources. We will dive deeper into those topics in the upcoming sections.
Amazon Athena falls under the scope of the following compliance programs with regard to AWS’s side of the shared responsibility model. In the following sections, we cover topics on the customer side of the shared responsibility model.
- SOC 1, 2, 3
- PCI
- FedRAMP High
- DoD CC SRG
- HIPAA BAA
- IRAP Protected
- FINMA
- ISO/IEC 27001:2013, 27017:2015, 27018:2019, and ISO/IEC 9001:2015
- OSPAR
- C5
- MTCS
- ISMAP
- K-ISMS
- ENS-HIGH
- HITRUST CSF
Data Protection with Amazon Athena
Encryption is a commonly used mechanism to protect data in transit and at rest. Athena uses Transport Layer Security (TLS) encryption for data in transit between Athena and Amazon S3, and between Athena and the customer applications accessing it. TLS is also used to encrypt query results that are streamed to JDBC or ODBC clients. Customers can also apply an aws:SecureTransport condition on the underlying S3 bucket to deny unencrypted HTTP requests. Compliance with this policy can then be monitored by enabling the s3-bucket-ssl-requests-only rule in AWS Config.
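As a minimal sketch of the aws:SecureTransport condition described above, the following Python snippet builds a bucket policy that denies any request made without TLS. The function name and the bucket name are illustrative assumptions, and the resulting policy should be reviewed against your own bucket configuration before use.

```python
import json


def deny_insecure_transport_policy(bucket_name: str) -> dict:
    """Build a bucket policy that denies any S3 request not sent over TLS."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                # Cover both the bucket itself and every object in it.
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
                # aws:SecureTransport evaluates to "false" for plain HTTP requests.
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            }
        ],
    }


# "my-athena-data-bucket" is a placeholder bucket name.
print(json.dumps(deny_insecure_transport_policy("my-athena-data-bucket"), indent=2))
```

The printed JSON can be pasted into the bucket policy editor or merged with an existing bucket policy.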
Since Athena queries data that is stored in Amazon S3, you can leverage a number of supported options for encryption at rest, including SSE-S3, SSE-KMS, and CSE-KMS. Furthermore, Athena allows you to encrypt the query results regardless of whether the source data itself is encrypted. Additionally, Athena uses the AWS Glue Data Catalog to store metadata about the datasets, which includes the path to the S3 dataset along with the schema information. Customers also have the option of encrypting the Data Catalog with their own AWS Key Management Service (AWS KMS) key.
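To illustrate query result encryption, here is a minimal sketch that assembles the parameters for Athena’s StartQueryExecution API with SSE-KMS encrypted results. The function name, output location, and key ARN are hypothetical placeholders. The snippet only builds the request and does not call AWS.

```python
def encrypted_query_request(sql, output_location, kms_key_arn, workgroup="primary"):
    """Build StartQueryExecution parameters that request SSE-KMS encrypted results."""
    return {
        "QueryString": sql,
        "WorkGroup": workgroup,
        "ResultConfiguration": {
            "OutputLocation": output_location,
            "EncryptionConfiguration": {
                # Other accepted values are "SSE_S3" and "CSE_KMS".
                "EncryptionOption": "SSE_KMS",
                "KmsKey": kms_key_arn,
            },
        },
    }


# All identifiers below are placeholders for illustration only.
request = encrypted_query_request(
    sql="SELECT 1",
    output_location="s3://my-athena-results-bucket/results/",
    kms_key_arn="arn:aws:kms:us-east-1:123456789012:key/EXAMPLE-KEY-ID",
)
print(request["ResultConfiguration"]["EncryptionConfiguration"]["EncryptionOption"])
```

With the AWS SDK for Python installed and credentials configured, a dictionary like this could be passed as `boto3.client("athena").start_query_execution(**request)`.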
To ensure encryption when certain Amazon Athena actions are performed, customers can use service control policies (SCPs) and Amazon S3 condition keys as a preventive mechanism. SCPs are a type of organization policy that you can use to manage permissions in your organization. They offer central control over the maximum available permissions for all accounts in your organization and help ensure that your accounts stay within your organization’s access control guidelines. The following example illustrates how an SCP configured with a deny strategy can restrict uploads of unencrypted data to S3. It denies any s3:PutObject request that omits the server-side encryption header or sets it to a value other than AES256 (SSE-S3).
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyIncorrectEncryptionHeader",
      "Effect": "Deny",
      "Action": "s3:PutObject",
      "Resource": "*",
      "Condition": {
        "StringNotEquals": {
          "s3:x-amz-server-side-encryption": "AES256"
        }
      }
    },
    {
      "Sid": "DenyUnencryptedObjectUploads",
      "Effect": "Deny",
      "Action": "s3:PutObject",
      "Resource": "*",
      "Condition": {
        "Null": {
          "s3:x-amz-server-side-encryption": "true"
        }
      }
    }
  ]
}
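To make the effect of the two Deny statements concrete, the following sketch models them in plain Python. It is a deliberately simplified illustration of the condition logic, not the IAM policy evaluation engine, and the function name is an assumption made for this example.

```python
from typing import Mapping

SSE_HEADER = "x-amz-server-side-encryption"


def put_object_denied(headers: Mapping[str, str]) -> bool:
    """Return True if the example policy would deny a PutObject request."""
    sse = headers.get(SSE_HEADER)
    # DenyUnencryptedObjectUploads: the Null condition matches a missing header.
    if sse is None:
        return True
    # DenyIncorrectEncryptionHeader: StringNotEquals matches any value but AES256.
    return sse != "AES256"


for candidate in ({}, {SSE_HEADER: "AES256"}, {SSE_HEADER: "aws:kms"}):
    print(candidate, "denied" if put_object_denied(candidate) else "allowed")
```

Note that an upload requesting SSE-KMS (header value `aws:kms`) is also denied by this policy as written. Organizations that standardize on SSE-KMS would need to adjust the expected header value accordingly.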
Isolation of compute environments with Amazon Athena
Amazon Athena is a managed service and does not expose any isolated compute resources on the customer’s side of the shared responsibility model. As a managed service, Amazon Athena is protected by the AWS global network security procedures that are described in the Introduction to AWS Security whitepaper.
Customers can establish a private connection between their VPC and Amazon Athena by creating an interface VPC endpoint. Interface endpoints are powered by AWS PrivateLink, a technology that enables you to privately access Amazon Athena APIs or JDBC/ODBC endpoints without an internet gateway, NAT device, VPN connection, or AWS Direct Connect connection. Using an interface VPC endpoint, instances in your VPC don’t need public IP addresses to communicate with Amazon Athena. The use of interface VPC endpoints also ensures that traffic between your VPC and Amazon Athena does not leave the Amazon network.
When using an interface endpoint, customers can attach an endpoint policy that controls access to the service to which they are connecting. The following is an example of an endpoint policy for Amazon Athena. When attached to a VPC endpoint, this policy grants access to the specified Amazon Athena actions for all principals on all resources.
{
  "Statement": [
    {
      "Principal": "*",
      "Effect": "Allow",
      "Action": [
        "athena:StartQueryExecution",
        "athena:RunQuery",
        "athena:GetQueryExecution",
        "athena:GetQueryResults",
        "athena:CancelQueryExecution",
        "athena:ListWorkGroups",
        "athena:GetWorkGroup",
        "athena:TagResource"
      ],
      "Resource": [
        "*"
      ]
    }
  ]
}
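As a rough sketch of how an endpoint policy like the one above might be attached when the interface endpoint is created, the snippet below assembles parameters for the Amazon EC2 CreateVpcEndpoint API. The function name, subnet ID, and security group ID are hypothetical, and the policy shown is a shortened stand-in for illustration.

```python
import json


def athena_endpoint_request(region, vpc_id, subnet_ids, security_group_ids, policy):
    """Build CreateVpcEndpoint parameters for an Athena interface endpoint."""
    return {
        "VpcEndpointType": "Interface",
        "VpcId": vpc_id,
        # Interface endpoint service names follow com.amazonaws.<region>.<service>.
        "ServiceName": f"com.amazonaws.{region}.athena",
        "SubnetIds": list(subnet_ids),
        "SecurityGroupIds": list(security_group_ids),
        "PrivateDnsEnabled": True,
        # The API expects the policy as a JSON string, not a dictionary.
        "PolicyDocument": json.dumps(policy),
    }


# Shortened example policy; the identifiers below are placeholders.
example_policy = {
    "Statement": [
        {
            "Principal": "*",
            "Effect": "Allow",
            "Action": ["athena:StartQueryExecution", "athena:GetQueryResults"],
            "Resource": ["*"],
        }
    ]
}
request = athena_endpoint_request(
    "us-east-1", "vpc-111bbb22", ["subnet-0example"], ["sg-0example"], example_policy
)
print(request["ServiceName"])
```

With boto3 available, the dictionary could be passed as `boto3.client("ec2").create_vpc_endpoint(**request)`. The same settings can also be expressed in infrastructure-as-code templates.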
Similarly, by using resource-based policies, customers can restrict access to their resources to only allow access from VPC endpoints. For example, by using S3 bucket policies, customers can restrict access to a given Amazon S3 bucket only through the endpoint. This ensures that traffic remains private and only flows through the endpoint. The following is an example of a policy that allows VPC vpc-111bbb22 to access my_secure_bucket and its objects. The policy denies all access to the bucket if the specified VPC is not being used. The aws:sourceVpc condition does not require an ARN for the VPC resource, only the VPC ID.
{
  "Version": "2012-10-17",
  "Id": "Policy1415115909152",
  "Statement": [
    {
      "Sid": "Access-to-specific-VPC-only",
      "Principal": "*",
      "Action": "s3:*",
      "Effect": "Deny",
      "Resource": [
        "arn:aws:s3:::my_secure_bucket",
        "arn:aws:s3:::my_secure_bucket/*"
      ],
      "Condition": {
        "StringNotEquals": {
          "aws:sourceVpc": "vpc-111bbb22"
        }
      }
    }
  ]
}
Access can further be restricted to specific AWS Services by leveraging the aws:CalledVia condition key, which allows you to create distinct access rules for the actions performed by your IAM principals. Further details along with an example policy showing how to limit S3 access exclusively to Athena can be found in this blog post.
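As an illustration of the aws:CalledVia condition key, the following sketch builds an identity-based policy that allows S3 reads on a bucket only when the request is made through Athena on the principal’s behalf. The function name, statement ID, and chosen S3 actions are assumptions for this example, so adapt them to the access your queries actually need.

```python
import json


def s3_read_via_athena_only(bucket_name: str) -> dict:
    """Build an IAM policy allowing S3 reads only when called via Athena."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowS3ReadWhenCalledViaAthena",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:ListBucket", "s3:GetBucketLocation"],
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
                # aws:CalledVia is multivalued, so a set operator is required.
                "Condition": {
                    "ForAnyValue:StringEquals": {
                        "aws:CalledVia": ["athena.amazonaws.com"]
                    }
                },
            }
        ],
    }


# "my-athena-data-bucket" is a placeholder bucket name.
print(json.dumps(s3_read_via_athena_only("my-athena-data-bucket"), indent=2))
```

A principal holding only this S3 permission could read the data by running Athena queries, but a direct `GetObject` call from the same principal would not match the condition.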
To secure access to Athena from on-premises JDBC/ODBC clients or applications, customers can establish an AWS Site-to-Site VPN connection or an AWS Direct Connect connection leveraging MACsec. Either approach enables private and secure communication between the customer’s private network and their AWS VPC.
Automating audits with APIs with Amazon Athena
AWS Config monitors the configuration of resources and provides some out-of-the-box rules to alert when resources fall into a non-compliant state. There are a number of S3-specific managed rules within AWS Config that can be applied to ensure that appropriate data protection policies are in place.
A wide array of options is also available to monitor usage and detect issues. Athena integrates with AWS CloudTrail to automatically log actions taken by a user, role, or AWS service in Athena. All Athena API calls, such as initiating a query execution, are logged in CloudTrail. Particular attention should be paid to the non-idempotent API calls listed below. These should be tightly monitored and controlled, as improper access can lead to security risks and unexpected charges.
- StartQueryExecution and StopQueryExecution
- CreateDataCatalog
- CreatePreparedStatement
- CreateWorkGroup
- UpdateDataCatalog
- UpdatePreparedStatement
- UpdateWorkGroup
- DeleteDataCatalog
- DeleteNamedQuery
- DeletePreparedStatement
- DeleteWorkGroup
Here is an example of a CloudTrail log entry for a successful StartQueryExecution API call:
{
  "eventVersion": "1.05",
  "userIdentity": {
    "type": "IAMUser",
    "principalId": "EXAMPLE_PRINCIPAL_ID",
    "arn": "arn:aws:iam::123456789012:user/johndoe",
    "accountId": "123456789012",
    "accessKeyId": "EXAMPLE_KEY_ID",
    "userName": "johndoe"
  },
  "eventTime": "2017-05-04T00:23:55Z",
  "eventSource": "athena.amazonaws.com",
  "eventName": "StartQueryExecution",
  "awsRegion": "us-east-1",
  "sourceIPAddress": "77.88.999.69",
  "userAgent": "aws-internal/3",
  "requestParameters": {
    "clientRequestToken": "16bc6e70-f972-4260-b18a-db1b623cb35c",
    "resultConfiguration": {
      "outputLocation": "s3://athena-johndoe-test/test/"
    },
    "queryString": "Select 10"
  },
  "responseElements": {
    "queryExecutionId": "b621c254-74e0-48e3-9630-78ed857782f9"
  },
  "requestID": "f5039b01-305f-11e7-b146-c3fc56a7dc7a",
  "eventID": "c97cf8c8-6112-467a-8777-53bb38f83fd5",
  "eventType": "AwsApiCall",
  "recipientAccountId": "123456789012"
}
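One possible way to act on the API list above is to scan CloudTrail records for those event names. The sketch below assumes the records have already been loaded from a CloudTrail log file (which wraps events in a top-level "Records" array). The function name and the summary fields are illustrative choices.

```python
import json

# Event names taken from the list of non-idempotent Athena API calls above.
MONITORED_EVENTS = {
    "StartQueryExecution", "StopQueryExecution",
    "CreateDataCatalog", "CreatePreparedStatement", "CreateWorkGroup",
    "UpdateDataCatalog", "UpdatePreparedStatement", "UpdateWorkGroup",
    "DeleteDataCatalog", "DeleteNamedQuery", "DeletePreparedStatement",
    "DeleteWorkGroup",
}


def flag_athena_events(records):
    """Return a compact summary for each monitored Athena event."""
    flagged = []
    for record in records:
        if record.get("eventSource") != "athena.amazonaws.com":
            continue
        if record.get("eventName") not in MONITORED_EVENTS:
            continue
        flagged.append({
            "time": record.get("eventTime"),
            "event": record.get("eventName"),
            "principal": record.get("userIdentity", {}).get("arn"),
            "sourceIp": record.get("sourceIPAddress"),
        })
    return flagged


# Abbreviated version of the sample event shown above.
sample_log = {"Records": [{
    "eventSource": "athena.amazonaws.com",
    "eventName": "StartQueryExecution",
    "eventTime": "2017-05-04T00:23:55Z",
    "sourceIPAddress": "77.88.999.69",
    "userIdentity": {"arn": "arn:aws:iam::123456789012:user/johndoe"},
}]}
for hit in flag_athena_events(sample_log["Records"]):
    print(json.dumps(hit))
```

In practice, the same filter could run inside a scheduled job or be expressed as an event pattern, depending on how your organization consumes CloudTrail data.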
Financial services customers can also use AWS Audit Manager to continuously audit their AWS usage and simplify how they assess risk and compliance with regulations and industry standards. AWS Audit Manager automates evidence collection and organizes the evidence as defined by the control set in the selected framework, such as PCI-DSS, SOC 2, or GDPR. Audit Manager collects data from sources including CloudTrail to compare the environment’s configurations against the compliance controls. Because all Amazon Athena API calls are logged in CloudTrail, Audit Manager’s integration with CloudTrail is useful for demonstrating that controls have been met.
Considering the encryption requirement in SOC 2 for example, rather than querying across all CloudTrail logs to ensure the S3 bucket for Amazon Athena’s output is encrypted, customers can centrally collect evidence in Audit Manager to check whether the requirement is being met. Audit Manager saves time with automated collection of evidence and provides audit-ready reports for customers to review. The Audit Manager assessment report uses cryptographic verification to help you ensure the integrity of the assessment report.
The following screenshot illustrates the configuration of a custom control’s data source for the Amazon Athena action of interest.
Furthermore, the GetQueryExecution API call provides additional information about the query, such as its execution time and the amount of data scanned. This API can be used in conjunction with CloudWatch and Lambda to capture query execution details for real-time monitoring and analysis. Refer to this blog post for an end-to-end implementation of an Athena monitoring solution.
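As a small sketch of the kind of details that can be captured, the function below summarizes a GetQueryExecution response. It assumes the response has already been retrieved (for example with the AWS SDK). The function name, output field names, and sample values are illustrative.

```python
def summarize_query_execution(response: dict) -> dict:
    """Extract state, data scanned, and engine time from a GetQueryExecution response."""
    execution = response["QueryExecution"]
    stats = execution.get("Statistics", {})
    scanned_bytes = stats.get("DataScannedInBytes", 0)
    return {
        "query_execution_id": execution["QueryExecutionId"],
        "state": execution["Status"]["State"],
        "workgroup": execution.get("WorkGroup"),
        "data_scanned_mb": round(scanned_bytes / 1_000_000, 2),
        "engine_time_s": stats.get("EngineExecutionTimeInMillis", 0) / 1000,
    }


# Hand-written sample response with placeholder values.
sample_response = {
    "QueryExecution": {
        "QueryExecutionId": "b621c254-74e0-48e3-9630-78ed857782f9",
        "WorkGroup": "primary",
        "Status": {"State": "SUCCEEDED"},
        "Statistics": {
            "EngineExecutionTimeInMillis": 1500,
            "DataScannedInBytes": 25_000_000,
        },
    }
}
print(summarize_query_execution(sample_response))
```

A summary like this could be emitted as a custom metric or log line so that unusually large scans stand out.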
Operational access and security with Amazon Athena
AWS customers in the financial services industry may require visibility into any access to their data stored on AWS. Customers can review third-party auditor reports such as the AWS SOC 2 Type II report, ISO 27001, and others in AWS Artifact.
Fine-grained permissions can be defined for users, groups, and services using AWS IAM. The Athena documentation provides a number of sample policies to restrict access to particular S3 locations, databases, and tables. For example, a common Amazon Athena scenario is granting access to users in an account different from the bucket owner so that they can perform queries. In this case, use a bucket policy to grant access.
The following example bucket policy, created and applied to bucket s3://my-athena-data-bucket by the bucket owner, grants access to all users in account 123456789123, which is a different account.
{
  "Version": "2012-10-17",
  "Id": "MyPolicyID",
  "Statement": [
    {
      "Sid": "MyStatementSid",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::123456789123:root"
      },
      "Action": [
        "s3:GetBucketLocation",
        "s3:GetObject",
        "s3:ListBucket",
        "s3:ListBucketMultipartUploads",
        "s3:ListMultipartUploadParts",
        "s3:AbortMultipartUpload",
        "s3:PutObject"
      ],
      "Resource": [
        "arn:aws:s3:::my-athena-data-bucket",
        "arn:aws:s3:::my-athena-data-bucket/*"
      ]
    }
  ]
}
Athena also integrates with AWS Lake Formation, a service that provides an authorization and governance layer on data stored in Amazon S3. With AWS Lake Formation, customers can more easily define fine-grained access policies at the database, table, or column level. Each time an Athena principal (user, group, or role) runs a query on data registered with Lake Formation, Lake Formation verifies that the principal has the appropriate permissions to the database, table, and underlying S3 objects. If the principal has access, Lake Formation vends temporary credentials to Athena, and the query runs.
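To give a sense of what a column-level permission looks like, the following sketch assembles parameters for the Lake Formation GrantPermissions API. The function name, role ARN, database, table, and column names are hypothetical placeholders, and the snippet does not call AWS.

```python
def column_level_select_grant(principal_arn, database, table, columns):
    """Build GrantPermissions parameters for SELECT on specific columns."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                # Only these columns become visible to the principal.
                "ColumnNames": list(columns),
            }
        },
        "Permissions": ["SELECT"],
    }


# Placeholder identifiers for illustration only.
grant = column_level_select_grant(
    principal_arn="arn:aws:iam::123456789012:role/AnalystRole",
    database="transactions_db",
    table="payments",
    columns=["transaction_id", "amount", "currency"],
)
print(grant["Resource"]["TableWithColumns"]["ColumnNames"])
```

With boto3 available, this could be passed as `boto3.client("lakeformation").grant_permissions(**grant)`. Columns left out of the list, such as ones holding account numbers, would then not be queryable by that role through Athena.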
Additional controls can be implemented by assigning users to Athena Workgroups. Workgroups are used to separate users, teams, applications, and workloads, to set data usage controls, and to track costs. Furthermore, data query patterns within a Workgroup can be restricted to a predefined set of parameterized queries using Prepared Statements. These can help prevent SQL injection attacks and reduce the likelihood of users executing expensive queries.
SAML 2.0-based federation can be used to securely access Athena via JDBC/ODBC clients without the need to embed credentials in the connection string. Alternatively, IAM credentials can be supplied directly in the JDBC/ODBC connection string. However, extra care must then be taken to ensure that these credentials are not inadvertently leaked. As such, SAML 2.0 federation is the recommended approach for accessing Athena via JDBC/ODBC.
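The Workgroup controls and Prepared Statements described above can be sketched as request parameters for the Athena CreateWorkGroup and CreatePreparedStatement APIs. The function names, workgroup name, table name, and scan limit below are assumptions made for illustration.

```python
def workgroup_request(name, output_location, kms_key_arn, scan_limit_bytes):
    """Build CreateWorkGroup parameters with enforced settings and a per-query scan limit."""
    return {
        "Name": name,
        "Configuration": {
            "ResultConfiguration": {
                "OutputLocation": output_location,
                "EncryptionConfiguration": {
                    "EncryptionOption": "SSE_KMS",
                    "KmsKey": kms_key_arn,
                },
            },
            # Override client-side settings with the workgroup settings.
            "EnforceWorkGroupConfiguration": True,
            "PublishCloudWatchMetricsEnabled": True,
            # Queries that scan more than this many bytes are cancelled.
            "BytesScannedCutoffPerQuery": scan_limit_bytes,
        },
    }


def prepared_statement_request(workgroup, statement_name, sql):
    """Build CreatePreparedStatement parameters for a parameterized query."""
    return {
        "WorkGroup": workgroup,
        "StatementName": statement_name,
        "QueryStatement": sql,
    }


# Placeholder names and values for illustration only.
wg = workgroup_request(
    "fsi-analysts",
    "s3://my-athena-results-bucket/fsi-analysts/",
    "arn:aws:kms:us-east-1:123456789012:key/EXAMPLE-KEY-ID",
    scan_limit_bytes=10 * 1000 ** 3,
)
stmt = prepared_statement_request(
    "fsi-analysts",
    "payments_by_account",
    # The question mark is bound to a value at execution time.
    "SELECT * FROM payments WHERE account_id = ?",
)
print(wg["Configuration"]["BytesScannedCutoffPerQuery"], stmt["StatementName"])
```

Because the query text is fixed and only the parameter value varies, callers cannot alter the shape of the query, which is what limits injection and runaway scans.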
Conclusion
In this post, we reviewed Amazon Athena and highlighted key information that can help FSI customers accelerate approval of the service within five categories: achieving compliance, data protection, isolation of compute environments, automating audits with APIs, and operational access and security. While not a one-size-fits-all approach, the guidance provided in the preceding sections can be adapted to meet your organization’s security and compliance requirements and provides a consolidated list of key areas for Athena.
In the meantime, be sure to visit AWS FSI Services Spotlight and stay tuned for more financial services news and best practices.