AWS Partner Network (APN) Blog
Delivering Comprehensive Cybersecurity Insights with Tenable One Data Platform on AWS
By Tom Milner, Director of Engineering – Tenable
By Gavin D’Mello, Sr. Data Engineer – Tenable
By Tim Sitze, Solutions Architect – AWS
By John Klacynski, Sr. CSM – AWS
The Tenable One Exposure Management Platform allows organizations to gain a comprehensive view of their attack surface and vulnerabilities to prevent likely attacks and accurately communicate cyber risk.
Tenable One is driven by a data platform that uses data from all of Tenable’s vulnerability management, cloud security, identity exposure, web app scanning, and external attack surface management point products to provide cybersecurity leaders a comprehensive and contextual view of their attack surface.
Each of Tenable’s point products stores and models data in a separate silo for its own purposes. The data in each silo is structured to address the customer problem within that product’s remit.
To support the platform’s advanced analytics, Tenable needed a solution to bring all of the siloed datasets into a single standardized structure. To do this, Tenable created a new internal Apache Kafka consumer service called Kafka-sync that can handle the rich variety of customer data.
In this post, we outline how the Tenable data engineering team uses Amazon Web Services (AWS) to ingest data from multiple sources and transform it into a single standard structure. By standardizing into a single data structure, Tenable is able to focus on giving customers the business insights and actionable intelligence they need from an exposure management platform.
Tenable is an AWS Security Competency Partner and AWS Marketplace Seller that empowers organizations to understand and reduce their cyber risk by providing visibility across the entire attack surface.
Deliver Business Value Faster with Managed Services
From the start, Tenable wanted its Exposure Management Platform to offer customers a comprehensive set of features. The platform delivers aggregated cyber risk insights in the form of an exposure view, attack path analysis, and a full asset inventory that gives a centralized view of all customer assets.
These features are powered by the data customers send to Tenable, and they needed a scalable and reliable platform to build upon. The data engineering team wanted to keep its focus on delivering business features to customers, not on managing infrastructure, so the decision was made to use AWS serverless and managed services first. This allowed the team to reduce operational workload and maintain its focus on customers.
The AWS services that Tenable chose to build the Tenable One data platform are:
- AWS Batch for running long-lived data processing and enrichment jobs.
- AWS Lambda for running shorter-lived and less frequent workloads.
- Amazon EventBridge to carry event-based notifications from Amazon Simple Storage Service (Amazon S3).
- AWS Step Functions for orchestrating AWS Batch jobs.
- Amazon S3 for staging data and emitting event notifications, a key feature of an event-driven architecture (see the configuration sketch after Figure 1).
Figure 1 – Tenable One data flow.
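As a minimal illustration of the event-driven glue between these services, the sketch below shows how S3 event notifications could be enabled so that EventBridge receives object-created events. It assumes boto3, and the bucket name and region are hypothetical rather than Tenable’s actual configuration.

```python
# Minimal sketch: deliver object-level events from a staging bucket to EventBridge.
# The bucket name and region are placeholders, not Tenable's actual configuration.
import boto3

s3 = boto3.client("s3", region_name="us-east-1")

# Once EventBridge notifications are enabled, every object-level event on the
# bucket (including s3:ObjectCreated:*) is sent to the account's default event bus.
s3.put_bucket_notification_configuration(
    Bucket="example-kafka-sync-staging",  # hypothetical staging bucket
    NotificationConfiguration={"EventBridgeConfiguration": {}},
)
```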
Kafka as a Service
Tenable uses Apache Kafka as a message broker to publish and transport data between services. Kafka was chosen for its high throughput and its support for many different data formats, including Apache Avro and JSON.
Kafka is used by different teams to publish data about their domains to the wider organization. Apache Avro, meanwhile, is the preferred message format due to its support for row-level data and schema evolution. Avro’s support for complex union types and nested records gives Tenable more flexibility in how it structures its data within Kafka, but it also pushes complexity onto consuming services, and these advanced features are not always supported by consumer tooling. For example, because it relies on these features, Tenable is unable to use Kafka Connect to stream data into Tenable One; the blockers include a known issue with Kafka Connect’s handling of Avro union schema normalization and the high cardinality of nested records.
In addition, Kafka Connect makes it hard to control output file sizes, and too many small files drive up data processing costs. For these reasons, Tenable built a custom Kafka consumer service called Kafka-sync.
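To make the schema problem concrete, here is a purely hypothetical sketch of what normalizing a union-typed Avro field can look like; the field name and helper function are illustrative assumptions, not Tenable’s actual Kafka-sync code.

```python
# Illustrative only: an Avro field whose type is a union of several branches, e.g.
#   "finding": ["null", "string", {"type": "record", "name": "Detail", ...}]
# is awkward for tabular sinks because the concrete branch varies per message.
# One way to "normalize" it is to map each branch onto its own nullable column.

def normalize_union(field_name, value):
    """Flatten a union-typed value into fixed, branch-specific keys."""
    flat = {f"{field_name}_string": None, f"{field_name}_record": None}
    if isinstance(value, str):
        flat[f"{field_name}_string"] = value
    elif isinstance(value, dict):
        flat[f"{field_name}_record"] = value  # nested record kept as a struct/JSON blob
    return flat

print(normalize_union("finding", "CVE-2024-0001"))
# {'finding_string': 'CVE-2024-0001', 'finding_record': None}
```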
Custom Kafka Consumer Service
Kafka-sync helps speed up the ingestion of existing and new data sources into Tenable One. It does this by implementing the following features:
- Avro schema normalization to handle complex union types and nested records.
- Containerized code that can be swapped between AWS Batch and AWS Lambda for easy deployment and integration.
- Output files whose size is configurable by size threshold or time window (see the consumer sketch after this list).
- Ingestion time for published messages reduced from 24 hours to one hour.
- A configurable service that scales to handle any number of topics and partitions.
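The sketch below shows how a consumer along these lines might batch messages and flush them to Amazon S3 on either a size threshold or a time window. It assumes the confluent-kafka and boto3 libraries; the topic, bucket, group ID, and thresholds are placeholder values, not Tenable’s actual configuration.

```python
# Minimal sketch of a size- or time-windowed Kafka-to-S3 consumer.
# All names and thresholds are placeholders.
import time
import boto3
from confluent_kafka import Consumer

MAX_BATCH_BYTES = 64 * 1024 * 1024   # flush at roughly 64 MB of buffered messages
MAX_WINDOW_SECONDS = 300             # or every 5 minutes, whichever comes first

consumer = Consumer({
    "bootstrap.servers": "broker:9092",
    "group.id": "kafka-sync-example",
    "auto.offset.reset": "earliest",
    "enable.auto.commit": False,      # commit offsets only after a successful S3 write
})
consumer.subscribe(["example-topic"])
s3 = boto3.client("s3")

batch, batch_bytes, window_start = [], 0, time.time()

def flush():
    global batch, batch_bytes, window_start
    if batch:
        key = f"example-prefix/batch-{int(time.time())}.jsonl"
        s3.put_object(Bucket="example-kafka-sync-staging", Key=key, Body=b"\n".join(batch))
        consumer.commit(asynchronous=False)  # offsets advance only after the write succeeds
    batch, batch_bytes, window_start = [], 0, time.time()

while True:
    msg = consumer.poll(timeout=1.0)
    if msg is not None and msg.error() is None:
        batch.append(msg.value())
        batch_bytes += len(msg.value())
    if batch_bytes >= MAX_BATCH_BYTES or time.time() - window_start >= MAX_WINDOW_SECONDS:
        flush()
```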
The service has now reached a level where the addition of a new dataset is a one-line Terraform change. This means new sources of data can be integrated into Tenable One within a matter of hours, which accelerates the ability to deliver new insights for customers. This is important as Tenable looks to address customers’ needs to use more data to analyze and manage exposure.
Implementing the Claim-Check Pattern
Kafka has a default limit of 1 MB per message, and this is often too low to accommodate the data customers send to Tenable via scans. The limit could be increased, but Tenable prefers to keep the constraint in place: the cluster is shared across multiple teams and customer-facing products, and the 1 MB cap helps protect the message broker for all customers and products.
However, this data is needed by multiple downstream services including Tenable One. The solution to this challenge was to implement the claim-check pattern using Kafka and Amazon DynamoDB.
Figure 2 – Claim-check data flow.
In the first step of the pattern, the producer service splits the incoming data into two parts. The first part is primarily unstructured and is written to Amazon DynamoDB.
Next, a uniform resource identifier (URI) is generated that points to the DynamoDB record. This URI and a second set of structured fields are written as a new message to Kafka. These structured fields are generally enough for most consumer services to work with.
Each consumer service can then decide whether it needs to retrieve the extra data from DynamoDB, which was chosen for its elastic scalability and low operational cost.
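A sketch of the producer side of this pattern, assuming boto3 and the confluent-kafka client, is shown below; the table name, topic, URI format, and field names are illustrative assumptions rather than Tenable’s actual schema.

```python
# Hypothetical producer-side claim check: large payload to DynamoDB, pointer to Kafka.
import json
import uuid
import boto3
from confluent_kafka import Producer

dynamodb = boto3.client("dynamodb")
producer = Producer({"bootstrap.servers": "broker:9092"})

def publish_scan(structured_fields: dict, unstructured_payload: str):
    # 1. Write the large, mostly unstructured part of the record to DynamoDB.
    item_id = str(uuid.uuid4())
    dynamodb.put_item(
        TableName="example-scan-payloads",
        Item={"id": {"S": item_id}, "payload": {"S": unstructured_payload}},
    )

    # 2. Generate a URI (the "claim check") that points back at the DynamoDB item.
    claim_check_uri = f"dynamodb://example-scan-payloads/{item_id}"

    # 3. Publish only the structured fields plus the URI to Kafka, keeping the
    #    message well under the 1 MB limit.
    message = {**structured_fields, "payload_uri": claim_check_uri}
    producer.produce("example-scan-topic", value=json.dumps(message).encode("utf-8"))
    producer.flush()
```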
Tenable One needs the unstructured data from DynamoDB to build the most comprehensive analytics for customers. The team could have extended Kafka-sync to fetch this data when consuming the original structured message from the Kafka cluster, but this would have coupled all datasets to the same pattern, even those that do not use the claim-check pattern.
For this reason, the team decided to decouple the enrichment into a separate event-driven service. Integration between the Kafka-sync and claim-check services is driven by Amazon S3 event notifications. Data flows between the two services and is enriched in the following steps:
- Each file written to Amazon S3 by Kafka-sync triggers an event notification.
- These events are published to Amazon EventBridge.
- An EventBridge rule matches the file prefix of the datasets to be enriched and targets an AWS Step Functions state machine.
- The state machine does further filtering to check whether enrichment needs to run for the S3 object. If it does, it invokes the scan-enrichment AWS Batch job.
- The AWS Batch job enriches the original Kafka message by using the embedded URI to read from DynamoDB. It then combines both datasets and writes them into one S3 object to be ingested into Tenable One (a minimal sketch of this step follows the list).
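The sketch below shows what this enrichment step could look like in code, reusing the hypothetical URI format from the producer sketch above; the bucket, key layout, and attribute names are placeholders, not Tenable’s actual implementation.

```python
# Hypothetical enrichment job: read staged records from S3, resolve each claim-check
# URI against DynamoDB, and write one combined object back to S3.
import json
import boto3

s3 = boto3.client("s3")
dynamodb = boto3.client("dynamodb")

def enrich(bucket: str, key: str):
    # Read the newline-delimited records that Kafka-sync staged in S3.
    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
    enriched = []
    for line in body.splitlines():
        record = json.loads(line)
        # URI format mirrors the producer sketch: dynamodb://<table>/<item id>.
        table, item_id = record["payload_uri"].removeprefix("dynamodb://").split("/", 1)
        item = dynamodb.get_item(TableName=table, Key={"id": {"S": item_id}})["Item"]
        record["payload"] = item["payload"]["S"]
        enriched.append(json.dumps(record))
    # Write both datasets combined into a single object for ingestion into Tenable One.
    s3.put_object(Bucket=bucket, Key=f"enriched/{key}", Body="\n".join(enriched).encode("utf-8"))
```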
Summary
In this post, you learned how the Tenable data engineering team uses AWS to build data standardization and enrichment services to deliver data into the Tenable One Exposure Management Platform. This processed data is a key element in the platform and enables the team to develop new and exciting features for customers.
By leveraging AWS Cloud technologies and continuously seeking to improve their solutions, Tenable’s team can stay focused on delivering customer value without managing their own infrastructure. Tenable will continue to integrate new data sources and provide organizations with the tools they need to effectively mitigate risk and protect assets in the ever-changing threat landscape.
You can also learn more about Tenable in AWS Marketplace.
Tenable – AWS Partner Spotlight
Tenable is an AWS Security Competency Partner that empowers organizations to understand and reduce their cyber risk by providing visibility across the entire attack surface.