AWS Partner Network (APN) Blog

Network Access Patterns of AWS Lambda for Confluent Cloud

By Geetha Anne, Sr. Solutions Engineer – Confluent


Networking plays a crucial role in design decisions for building a serverless application, and AWS Lambda is arguably the backbone of the Amazon Web Services (AWS) serverless platform.

With its event-driven nature, AWS Lambda provides seamless integration with modern-day platforms like Confluent Cloud. To facilitate this communication, it’s worth investing time in learning about how to set up communication channels properly between Lambda and other services.

In this post, I will cover best practices to set up network access paths for Lambda when integrating with Confluent Cloud. I'll also review details about various resources in Confluent Cloud like connectors, Kafka endpoints, and AWS PrivateLink connections, as well as ways to establish connectivity between them and Lambda.

Founded by the creators of Apache Kafka, Confluent is an AWS Data and Analytics Competency Partner that enables organizations to harness business value from stream data. Confluent is also a validated AWS Lambda Ready product.

What Are Apache Kafka, Confluent Cloud, and Connectors?

Apache Kafka is a community distributed event streaming platform capable of handling trillions of events a day. Confluent re-architected Kafka for the cloud to be elastically scalable and globally available, providing a serverless, cost-effective, and fully managed service ready to deploy, operate, and scale in a matter of minutes.

Kafka Connect allows you to integrate Apache Kafka with other apps and data systems with no new code. Confluent takes it one step further by offering an extensive portfolio of pre-built Kafka connectors, enabling you to modernize your entire data architecture even faster with powerful integrations at any scale.

Confluent connectors also provide peace of mind with enterprise-grade security, reliability, compatibility, and support. With 120+ connectors, stream processing, security and data governance, and global availability, Confluent Cloud enables you to meet all of your data in motion needs.

Confluent Cloud is a fully managed cloud service that provides a simple, scalable, resilient, and secure event streaming platform. You can access Confluent Cloud clusters through secure internet endpoints, AWS PrivateLink connections, virtual private cloud (VPC) peering, or AWS Transit Gateway.

AWS Lambda is a serverless, event-driven compute service that lets you run code for virtually any type of application or backend service without provisioning or managing servers. Lambda functions can be triggered by a variety of AWS events or by supporting third-party services to build reactive, event-driven systems.

Confluent Cloud + AWS Lambda

In order to create a cloud-based powerhouse, you need to combine tools that complement each other's strengths. Confluent Cloud clusters can act as an event source for AWS Lambda and can also receive data sent from AWS. In the other direction, Confluent can send aggregated data to AWS through the Lambda Sink Connector; the Lambda function processes it and sends it over a RESTful HTTPS connection to downstream web applications, databases, or microservices.

The AWS Lambda Sink connector pulls records from one or more Kafka topics, converts them to JSON, and executes a Lambda function. The response of the Lambda function can optionally be written to another Kafka topic. The Lambda function can be invoked either synchronously or asynchronously.
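The flow in the preceding paragraph can be modeled in a few lines of Python. This is a minimal sketch under stated assumptions, not Confluent's implementation: `run_sink_batch`, `echo`, and `response_topic` are hypothetical names, `invoke` stands in for the Lambda Invoke API, and grouping a batch into a single invocation is an assumption.

```python
import json
from typing import Callable, Optional


def run_sink_batch(
    records: list,
    invoke: Callable[[str], dict],
    response_topic: Optional[str] = None,
) -> list:
    """Model one batch: records -> JSON payload -> Lambda -> optional response topic.

    `invoke` stands in for a Lambda Invoke API call and returns the function's
    parsed response. Returns the (topic, value) pairs that would be produced
    back to Kafka; the list is empty when no response topic is configured.
    """
    payload = json.dumps(records)   # convert the Kafka records to JSON
    response = invoke(payload)      # assumption: one invocation per batch
    if response_topic is None:
        return []                   # writing the response is optional
    return [(response_topic, json.dumps(response))]


def echo(payload: str) -> dict:
    """Hypothetical stand-in for the Lambda function: counts the records it received."""
    return {"received": len(json.loads(payload))}


print(run_sink_batch([{"id": 1}, {"id": 2}], echo, "lambda-responses"))
```

Keeping `invoke` as a parameter separates the data flow from the transport, which is the part the connector handles for you.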

It's worth noting that Lambda event source mapping can invoke the Lambda function automatically when events occur, by mapping an event source such as Confluent Cloud Kafka topics to a Lambda function. Event source mapping is especially useful when you need processing to scale automatically. If you have a consistent workload and want to invoke the function asynchronously, or want a dead-letter queue, the Lambda Sink Connector is the better choice.
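When Confluent Cloud is the event source through event source mapping, Lambda delivers records grouped by topic partition, with base64-encoded keys and values. The handler below is a minimal sketch of decoding that payload; the processing step is left as a placeholder, and the field names follow the self-managed Apache Kafka event format as we understand it, so verify them against the AWS Lambda documentation.

```python
import base64


def handler(event, context):
    """Decode records delivered by a self-managed Kafka event source mapping."""
    decoded = []
    # "records" maps "<topic>-<partition>" to a list of records for that partition
    for batch in event.get("records", {}).values():
        for record in batch:
            raw = record.get("value")
            value = base64.b64decode(raw).decode("utf-8") if raw else None
            decoded.append({
                "topic": record["topic"],
                "partition": record["partition"],
                "offset": record["offset"],
                "value": value,
            })
    # Placeholder: process the decoded records here (for example, write to a database)
    return {"processed": len(decoded), "records": decoded}
```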


Figure 1 – Holistic view of Confluent Cloud and AWS Lambda integration.

In another scenario, requests can be accepted on AWS via Amazon API Gateway and processed in Lambda, which sends messages to a Confluent Cloud topic. Confluent can then use one of its connectors to send the data to a destination system such as Oracle, MongoDB, Amazon Relational Database Service (Amazon RDS), or object storage like Amazon Simple Storage Service (Amazon S3).
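A sketch of the Lambda side of this flow, assuming an API Gateway proxy integration that passes a JSON request body. The producer is injected so the handler stays testable; a `confluent_kafka.Producer` configured for your cluster would fit this interface, but the `make_handler` name, the `id` key field, and the 202 response are illustrative choices, not anything prescribed by this post.

```python
import json


def make_handler(producer, topic):
    """Build a handler that forwards an API Gateway request body to a Kafka topic.

    `producer` is any object with produce(topic, value=..., key=...) and flush(),
    for example a confluent_kafka.Producer (assumption).
    """
    def handler(event, context):
        payload = json.loads(event.get("body") or "{}")
        key = payload.get("id")  # hypothetical: use an "id" field as the record key
        producer.produce(
            topic,
            value=json.dumps(payload).encode("utf-8"),
            key=str(key).encode("utf-8") if key is not None else None,
        )
        producer.flush()  # wait for delivery; simple, but adds latency to every request
        return {"statusCode": 202, "body": json.dumps({"accepted": True})}

    return handler
```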

AWS PrivateLink Overview

AWS PrivateLink provides private connectivity between a customer's Amazon Virtual Private Cloud (Amazon VPC) and the Confluent Cloud VPC, without exposing the traffic to the public network.

To create a dedicated cluster with AWS PrivateLink, you’ll need to create a Confluent Cloud network first in the required cloud and region. Follow the steps in the Confluent documentation to get started.

When PrivateLink is configured with Confluent Cloud, the customer's VPC must allow outbound internet connections for domain name system (DNS) resolution, Confluent Cloud Schema Registry, and the Confluent CLI. DNS requests must be able to reach the public DNS authority and then traverse to the private hosted zone.

Once PrivateLink connectivity is established, you'll find the following information in the Confluent Cloud console, under "Cluster Settings" for your cluster and under "Confluent Cloud Network Overview" for your Confluent Cloud network:

  • Kafka Bootstrap (in the General tab)
  • Availability Zone IDs (in the Networking tab)
  • VPC Service Endpoint Name (in the Networking tab)
  • DNS Domain Name (in the Networking tab)
  • Zonal DNS Subdomain Names (in the Networking tab)
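It can help to capture these values in one place and sanity-check them before creating DNS records. The dataclass below is a hypothetical helper (its field names and the example values are placeholders). It checks that the bootstrap host falls under the DNS domain, so that a wildcard record for that domain would cover it.

```python
from dataclasses import dataclass, field


@dataclass
class PrivateLinkSettings:
    """Values copied from the Confluent Cloud console (field names are our own)."""
    kafka_bootstrap: str            # host:port from the General tab
    availability_zone_ids: list     # e.g. ["usw2-az1", "usw2-az2", "usw2-az3"]
    vpc_service_endpoint_name: str  # VPC Service Endpoint Name from the Networking tab
    dns_domain: str                 # DNS Domain Name from the Networking tab
    zonal_subdomains: dict = field(default_factory=dict)  # zone ID -> subdomain

    def bootstrap_host(self) -> str:
        """Strip the port from the bootstrap address."""
        return self.kafka_bootstrap.rsplit(":", 1)[0]

    def bootstrap_in_domain(self) -> bool:
        """True if the bootstrap host is a subdomain of the DNS domain."""
        host = self.bootstrap_host().lower().rstrip(".")
        return host.endswith("." + self.dns_domain.lower().rstrip("."))
```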

Network Architecture for AWS Lambda and Confluent Cloud

The following diagram summarizes the network architecture of the customer VPC, the Confluent Cloud VPC, and the Lambda functions.


Figure 2 – Network architecture depicting Confluent Cloud and AWS Lambda integration.

VPC Network Elements

There are three kinds of VPCs involved when using Confluent Cloud dedicated clusters as event sources to Lambda:

  • Confluent Cloud VPC
  • Customer AWS VPC
  • AWS Lambda service VPC

The Confluent Cloud VPC is where all Confluent-managed resources reside. AWS PrivateLink enables secure, uni-directional communication from the customer VPC to the Confluent VPC and protects against data exfiltration.

Because there is no direct network access to the execution environment where Lambda functions run, Confluent Cloud can invoke Lambda functions only through the Lambda API. By default, when a Lambda function is not connected to a VPC, it can access anything available on the public internet, such as other AWS services, public Kafka bootstrap endpoints, and services and endpoints outside of AWS.

When you connect a Lambda function to a VPC, Lambda creates elastic network interfaces (ENIs) in the customer VPC. These ENIs give the function network access to private resources. The function itself continues to run inside the Lambda service VPC and reaches network resources only through the customer VPC. All invocations still come through the Lambda API.
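Connecting a function to the customer VPC comes down to its VPC configuration. A minimal sketch that builds the arguments for boto3's `update_function_configuration`; the function name, subnet IDs, and security group IDs are placeholders.

```python
def vpc_config_request(function_name, subnet_ids, security_group_ids):
    """Keyword arguments for boto3's lambda_client.update_function_configuration()."""
    if not subnet_ids or not security_group_ids:
        raise ValueError("at least one subnet and one security group are required")
    return {
        "FunctionName": function_name,
        "VpcConfig": {
            "SubnetIds": list(subnet_ids),  # private subnets in the customer VPC
            "SecurityGroupIds": list(security_group_ids),
        },
    }

# Usage (requires AWS credentials):
# import boto3
# boto3.client("lambda").update_function_configuration(
#     **vpc_config_request("cc-consumer", ["subnet-aaa"], ["sg-123"]))
```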

Under the hood, AWS Lambda creates a Hyperplane ENI, a managed network resource that the Lambda service controls. It allows multiple execution environments to securely access resources inside the VPCs in your account.

Previously, Lambda mapped network interfaces in your VPC directly to individual execution environments. Now, network interfaces in your VPC are mapped to the Hyperplane ENI, and functions connect through it. The Hyperplane ENI is created when a function is created or its VPC settings are updated, and it can scale to support large numbers of concurrent function executions.

When your Lambda function and customer VPC reside in the same region, you can use a VPC interface endpoint to establish a private connection between the customer VPC and the Lambda API. With interface endpoints, traffic stays within the AWS network instead of routing through the public internet. Interface endpoints are reached through network interfaces in your subnets rather than through route table entries; with private DNS enabled, the regular Lambda API hostname resolves to the endpoint from the private subnets where the Lambda ENIs exist.
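As a sketch, these are the arguments you might pass to boto3's `ec2.create_vpc_endpoint` to create an interface endpoint for the Lambda API. The region, VPC, subnet, and security group values are placeholders, and enabling private DNS is an assumption about how you want the standard Lambda hostname to resolve inside the VPC.

```python
def lambda_interface_endpoint_request(region, vpc_id, subnet_ids, security_group_ids):
    """Keyword arguments for boto3's ec2_client.create_vpc_endpoint()."""
    return {
        "VpcEndpointType": "Interface",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.lambda",  # Lambda API endpoint service
        "SubnetIds": list(subnet_ids),
        "SecurityGroupIds": list(security_group_ids),    # must allow inbound 443
        "PrivateDnsEnabled": True,  # lambda.<region>.amazonaws.com resolves to the endpoint
    }

# Usage (requires AWS credentials):
# import boto3
# boto3.client("ec2").create_vpc_endpoint(**lambda_interface_endpoint_request(
#     "us-west-2", "vpc-0abc", ["subnet-aaa"], ["sg-123"]))
```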

Below are some notable points when integrating Confluent Cloud with AWS Lambda:

  • The Lambda API is available through public internet endpoints or a VPC interface endpoint. When you configure the Lambda Sink Connector, Lambda functions are referred to by name, so the connector uses the public AWS API endpoints and public addresses, and traffic stays within the AWS backbone.
  • When Confluent Cloud is used as a trigger through event source mapping, Lambda polls a topic for events and consumes a batch of records. If the Confluent Cloud cluster uses a private network configuration, the event source mapping must be attached to the customer VPC (through its subnets and security groups) so that Lambda can reach resources in the VPC, such as topics.
  • Confluent's fully managed Lambda Sink Connector triggers the Lambda function by calling the Lambda API with a JSON payload of events from a configured Kafka topic. The connector can reach the Lambda API endpoint over public address space even when Lambda has an AWS PrivateLink configuration.
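For the event source mapping path, the subnets and security groups are given on the mapping itself. The sketch below builds arguments for boto3's `create_event_source_mapping` against a self-managed Kafka endpoint. It assumes SASL/PLAIN authentication with a Confluent Cloud API key and secret stored in AWS Secrets Manager (the `BASIC_AUTH` access type), which also means the execution role needs `secretsmanager:GetSecretValue` on that secret. All names, ARNs, and IDs are placeholders.

```python
def kafka_esm_request(function_name, bootstrap_servers, topic, secret_arn,
                      subnet_ids, security_group_ids, batch_size=100):
    """Keyword arguments for boto3's lambda_client.create_event_source_mapping()."""
    access = [{"Type": "BASIC_AUTH", "URI": secret_arn}]  # SASL/PLAIN API key and secret
    access += [{"Type": "VPC_SUBNET", "URI": f"subnet:{s}"} for s in subnet_ids]
    access += [{"Type": "VPC_SECURITY_GROUP", "URI": f"security_group:{g}"}
               for g in security_group_ids]
    return {
        "FunctionName": function_name,
        "SelfManagedEventSource": {
            "Endpoints": {"KAFKA_BOOTSTRAP_SERVERS": list(bootstrap_servers)}
        },
        "Topics": [topic],
        "StartingPosition": "LATEST",  # or "TRIM_HORIZON" to read from the beginning
        "BatchSize": batch_size,
        "SourceAccessConfigurations": access,
    }
```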

For detailed information on integrating AWS Lambda with Confluent Cloud dedicated clusters, refer to this step-by-step guide.

VPC Permissions

If the Lambda function is connected to a VPC, it must have permission to access VPC resources. To access these resources, the function's execution role must have the following permissions:

  • ec2:CreateNetworkInterface
  • ec2:DescribeNetworkInterfaces
  • ec2:DescribeVpcs
  • ec2:DeleteNetworkInterface
  • ec2:DescribeSubnets
  • ec2:DescribeSecurityGroups
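The same permissions can be expressed as an IAM policy document, generated in Python so it can be reviewed or passed to an API call. A minimal sketch; `vpc_access_policy` is a hypothetical helper, and it uses `"Resource": "*"`, which is the usual pattern for these EC2 actions.

```python
import json

VPC_ACTIONS = [
    "ec2:CreateNetworkInterface",
    "ec2:DescribeNetworkInterfaces",
    "ec2:DescribeVpcs",
    "ec2:DeleteNetworkInterface",
    "ec2:DescribeSubnets",
    "ec2:DescribeSecurityGroups",
]


def vpc_access_policy() -> str:
    """Policy document granting the VPC permissions listed above."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": VPC_ACTIONS, "Resource": "*"}
        ],
    }
    return json.dumps(policy, indent=4)


print(vpc_access_policy())
```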

By default, Lambda is not permitted to perform the required or optional actions for a self-managed Kafka cluster. You need to define these actions in an identity and access management (IAM) policy, and then attach the policy to the function's execution role.

Before you can run the Lambda Sink Connector, you must provide credentials and the region where the Lambda function is located. The credentials you provide need permissions for the actions lambda:InvokeFunction and lambda:GetFunction. An example of how this policy may be set up is shown below:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "lambda:InvokeFunction",
                "lambda:GetFunction"
            ],
            "Resource": "*"
        }
    ]
}
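The example above allows the actions on every function. If the connector only needs to invoke one function, you may prefer to scope the policy to that function's ARN. A sketch with a placeholder region, account ID, and function name; the second ARN, with the `:*` suffix, covers qualified versions and aliases.

```python
def connector_invoke_policy(region, account_id, function_name) -> dict:
    """Same actions as the example above, limited to a single function."""
    arn = f"arn:aws:lambda:{region}:{account_id}:function:{function_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["lambda:InvokeFunction", "lambda:GetFunction"],
            "Resource": [arn, f"{arn}:*"],  # unqualified and qualified ARNs
        }],
    }
```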

Follow these security group rules when configuring a customer VPC that connects to the Confluent VPC:

  • Inbound rules: Allow all traffic on the Kafka broker port for the security groups specified for your event source. Kafka uses port 9092 by default.
  • Outbound rules: Allow all traffic on port 443 for all destinations. Allow all traffic on the Kafka broker port for the security groups specified for your event source. Kafka uses port 9092 by default.
  • For VPC endpoints: If you use VPC endpoints instead of a network address translation (NAT) gateway, the security groups associated with the VPC endpoints must allow all inbound traffic on port 443 from the event source's security groups.
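These rules can be expressed as `IpPermissions` entries for boto3's `authorize_security_group_ingress` and `authorize_security_group_egress`. A minimal sketch, assuming a single security group for the event source and the default Kafka port 9092; the helper names and group IDs are hypothetical.

```python
KAFKA_PORT = 9092
HTTPS_PORT = 443


def tcp_from_group(port, source_group_id):
    """One IpPermissions entry: TCP on `port` to or from members of `source_group_id`."""
    return {
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        "UserIdGroupPairs": [{"GroupId": source_group_id}],
    }


def event_source_sg_rules(event_source_sg):
    """Ingress and egress permissions for the event source security group."""
    ingress = [tcp_from_group(KAFKA_PORT, event_source_sg)]
    egress = [
        {"IpProtocol": "tcp", "FromPort": HTTPS_PORT, "ToPort": HTTPS_PORT,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},     # port 443 to all destinations
        tcp_from_group(KAFKA_PORT, event_source_sg),  # Kafka broker port
    ]
    return {"ingress": ingress, "egress": egress}


def endpoint_sg_ingress(event_source_sg):
    """Ingress for the VPC endpoint security groups when no NAT gateway is used."""
    return [tcp_from_group(HTTPS_PORT, event_source_sg)]

# Usage (requires AWS credentials):
# ec2.authorize_security_group_ingress(GroupId="sg-123", IpPermissions=rules["ingress"])
# ec2.authorize_security_group_egress(GroupId="sg-123", IpPermissions=rules["egress"])
```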

Configuring Lambda Networking for Confluent Cloud

The two connectivity patterns between AWS Lambda and Confluent Cloud resources when using AWS PrivateLink as the network configuration are:

  • Connectivity from Lambda to the Confluent Cloud VPC through the customer VPC.
  • Reachability from a Confluent Cloud resource, such as the Lambda Sink Connector, to the Lambda function.

Establish Lambda Service to Confluent Cloud Connectivity

With the default settings, the Lambda function must resolve the DNS name of the Confluent Cloud cluster through the private hosted zone associated with the customer VPC, using the Lambda ENIs.

For the Kafka bootstrap URL shown in the Confluent Cloud user interface (UI) to resolve successfully, the function needs access to a private hosted zone in the customer VPC. To set this up, gather all of the information listed in the "AWS PrivateLink Overview" section of this post.

The Confluent Cloud VPC and cluster are created in specific zones that, for optimal usage, should match the zones of the VPC you make the AWS PrivateLink connections from. You must have subnets in your VPC for these zones so that IP addresses can be allocated from them. You can also have subnets in other zones.

Use AWS Availability Zone IDs for this matching rather than zone names, because the same zone name can map to a different physical zone in each AWS account. You can find the Availability Zone IDs for your Confluent Cloud cluster in the Confluent Cloud console.
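A quick way to check the zone requirement is to compare the cluster's Availability Zone IDs with the `AvailabilityZoneId` of each subnet returned by `ec2.describe_subnets`. A sketch; the helper names are hypothetical and the zone and subnet IDs in the test data are made up.

```python
def subnet_zone_ids(describe_subnets_response):
    """Collect AvailabilityZoneId values from an ec2 describe_subnets() response."""
    return {s["AvailabilityZoneId"] for s in describe_subnets_response["Subnets"]}


def missing_zone_ids(confluent_zone_ids, vpc_zone_ids):
    """Confluent Cloud zone IDs that have no subnet in your VPC, sorted."""
    return sorted(set(confluent_zone_ids) - set(vpc_zone_ids))
```

Subnets in extra zones are ignored, which matches the note above that other zones are allowed.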

DNS changes must be made to ensure that connectivity passes through AWS PrivateLink in the supported pattern. Any DNS provider that can route DNS as described below, such as Amazon Route 53, is acceptable.

The Confluent documentation explains how to:

  • Set up the VPC endpoint for AWS PrivateLink in your AWS account.
  • Verify that DNS hostnames and DNS resolution are enabled.
  • Create the VPC endpoint.
  • Set up DNS records to use AWS VPC endpoints with private hosted zones.
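For the DNS check in the second step, `ec2.describe_vpc_attribute` returns one attribute per call. The helper below is a hypothetical sketch that inspects the two responses, assuming the documented response shape with a nested `Value` boolean.

```python
def dns_attributes_enabled(support_response, hostnames_response):
    """True if both DNS resolution and DNS hostnames are enabled on the VPC."""
    support = support_response["EnableDnsSupport"]["Value"]
    hostnames = hostnames_response["EnableDnsHostnames"]["Value"]
    return bool(support and hostnames)

# Usage (requires AWS credentials):
# ec2 = boto3.client("ec2")
# ok = dns_attributes_enabled(
#     ec2.describe_vpc_attribute(VpcId="vpc-0abc", Attribute="enableDnsSupport"),
#     ec2.describe_vpc_attribute(VpcId="vpc-0abc", Attribute="enableDnsHostnames"))
```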


Figure 3 – AWS Lambda console showing resolution of Confluent Cloud Kafka Bootstrap URL via Lambda ENI.

In the allocated private hosted zone, set up DNS records for Confluent Cloud multi-AZ clusters. Use the "Create Record" button to create the records, pointing them at the multi-zone VPC endpoint DNS name from the previous steps. Because the records resolve to the interface endpoint's network interfaces, no route table entry for the endpoint is needed in the private subnets where the Lambda ENIs exist.

*.$domain CNAME “All Zones VPC Endpoint” TTL 60

For example:

*.labcd565.us-west-2.aws.confluent.cloud CNAME vpce-09f9f82eed-9gxp2f7v.vpce70ee9e.us-west-2.vpce.amazonaws.com TTL 60
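If you manage the private hosted zone programmatically, the same wildcard record can be written with Route 53's `change_resource_record_sets`. A sketch that builds the `ChangeBatch`, using the example domain and endpoint DNS name from above; `wildcard_cname_change` is a hypothetical helper, and `UPSERT` is a choice that makes the call safe to repeat.

```python
def wildcard_cname_change(dns_domain, all_zones_endpoint_dns, ttl=60):
    """ChangeBatch for route53_client.change_resource_record_sets()."""
    return {
        "Changes": [{
            "Action": "UPSERT",  # create the record, or update it if it already exists
            "ResourceRecordSet": {
                "Name": f"*.{dns_domain}",
                "Type": "CNAME",
                "TTL": ttl,
                "ResourceRecords": [{"Value": all_zones_endpoint_dns}],
            },
        }]
    }

# Usage (requires AWS credentials and the private hosted zone ID):
# boto3.client("route53").change_resource_record_sets(
#     HostedZoneId="Z0EXAMPLE", ChangeBatch=wildcard_cname_change(domain, endpoint))
```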

The CNAME is used to ensure Amazon Route 53 health checks are used in the case of AWS outages.


Figure 4 – Log events in AWS Lambda displaying Confluent Cloud DNS resolution.
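To produce log lines like those in Figure 4, a diagnostic function can resolve the bootstrap host from inside the VPC and report whether every address is private, which is what you expect when resolution goes through the PrivateLink endpoint. A minimal sketch; the `bootstrap_host` event field and the helper names are hypothetical, and the port defaults to 9092.

```python
import ipaddress
import socket


def resolve_ipv4(host, port=9092):
    """IPv4 addresses that `host` resolves to from inside the function."""
    infos = socket.getaddrinfo(host, port, socket.AF_INET, socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


def all_private(addresses):
    """True if there is at least one address and every address is private."""
    return bool(addresses) and all(ipaddress.ip_address(a).is_private for a in addresses)


def handler(event, context):
    # Hypothetical event field: the Kafka Bootstrap host without the port
    host = event["bootstrap_host"]
    addresses = resolve_ipv4(host)
    print({"host": host, "addresses": addresses})  # appears in CloudWatch Logs
    return {"private": all_private(addresses), "addresses": addresses}
```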

Connectivity from Confluent Cloud to AWS Lambda

The fully managed Lambda Sink Connector can invoke the Lambda function and push messages from a Confluent Cloud topic into it. The connector communicates with the public Lambda API, while the function interacts with Confluent Cloud through the customer VPC.

  • In synchronous mode: Kafka records within topic partitions are processed in parallel. The response from Lambda is written to the success-<connector-id> topic. If an error occurs during Lambda execution, the connector writes the error to the error-<connector-id> topic and continues. For additional details about Lambda invocation, see the documentation on Synchronous invocation.
  • In asynchronous mode: Kafka messages are processed on a best-effort, sequential basis. Lambda will automatically retry up to two times, after which it can move the request to a dead letter queue. For additional details about Lambda invocation, see the documentation on Asynchronous invocation.
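For asynchronous mode, the retry count and the dead-letter queue are properties of the function. A sketch that builds arguments for boto3's `put_function_event_invoke_config` (retries, 0 to 2) and `update_function_configuration` (`DeadLetterConfig`); the helper name, the function name, and the SQS queue ARN are placeholders, and the execution role would also need permission to send messages to that queue.

```python
def async_failure_handling(function_name, dlq_arn, max_retries=2):
    """Return (retry_request, dlq_request) keyword-argument dicts for two Lambda API calls."""
    if not 0 <= max_retries <= 2:
        raise ValueError("asynchronous invocation allows 0 to 2 retry attempts")
    # For lambda_client.put_function_event_invoke_config()
    retry_request = {"FunctionName": function_name, "MaximumRetryAttempts": max_retries}
    # For lambda_client.update_function_configuration(); target is an SQS queue or SNS topic
    dlq_request = {"FunctionName": function_name, "DeadLetterConfig": {"TargetArn": dlq_arn}}
    return retry_request, dlq_request
```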


Figure 5 – Confluent Cloud console displaying a running Lambda Sink Connector.

Conclusion

Understanding how AWS Lambda’s inherent event-driven nature can work alongside a highly reliable streaming platform like Confluent Cloud can help you build scalable applications in an automated fashion.

In this post, I explained how to establish a secure network connection between AWS Lambda and Confluent Cloud when Confluent Cloud is used as an event source. I covered how to set up hosted zones and DNS resolution for Confluent Cloud when it is configured with AWS PrivateLink.

Finally, I discussed how Confluent’s Lambda Sink Connector can act as a trigger to a Lambda function and be configured over Lambda’s public API.

The content and opinions in this blog are those of the third-party author and AWS is not responsible for the content or accuracy of this post.



Confluent – AWS Partner Spotlight

Confluent is an AWS Data and Analytics Competency Partner that was founded by the creators of Apache Kafka and enables organizations to harness business value from stream data.
