Posted On: Nov 22, 2022
Amazon EMR Serverless announces support for reading and writing data in Amazon DynamoDB with your Spark and Hive workflows. You can now export, import, query, and join tables in Amazon DynamoDB directly from your EMR Serverless Spark and Hive applications. Amazon DynamoDB is a fully managed NoSQL database that meets the latency and throughput requirements of highly demanding applications by providing single-digit millisecond latency and predictable performance with seamless throughput and storage scalability.
AWS users often need to process data stored in Amazon DynamoDB efficiently and at scale for downstream analytics. The Amazon EMR team built and open-sourced emr-dynamodb-connector to simplify how Apache Spark and Apache Hive applications access Amazon DynamoDB and how that access is configured. This connector enables multiple analytics use cases, including efficiently processing data in Amazon DynamoDB or joining tables in Amazon DynamoDB with external tables in Amazon S3, Amazon RDS, or other data stores that can be accessed by Amazon EMR Serverless. With Amazon EMR release 6.9, you can get all the benefits of the Amazon DynamoDB connector with your Amazon EMR Serverless applications. You can access Amazon DynamoDB tables both cross-region and cross-account.
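As a rough illustration of how a Hive application might expose a DynamoDB table through the connector, the sketch below builds a `CREATE EXTERNAL TABLE` statement in Python. The storage handler class and the `dynamodb.table.name` and `dynamodb.column.mapping` properties come from the open-source emr-dynamodb-connector; the helper name `dynamodb_hive_ddl`, the table and column names, and the use of `dynamodb.region` for a cross-region table are assumptions made for this example and should be checked against the EMR Serverless documentation.

```python
def dynamodb_hive_ddl(hive_table, ddb_table, columns, region=None):
    """Build a Hive DDL string for an external table backed by DynamoDB.

    columns: list of (hive_column, hive_type, dynamodb_attribute) tuples.
    This helper is hypothetical; it only formats a string and does not
    contact Hive or DynamoDB.
    """
    # Hive column definitions, e.g. "order_id string, total double"
    col_defs = ", ".join(f"{name} {hive_type}" for name, hive_type, _ in columns)
    # Connector mapping of Hive columns to DynamoDB attributes
    mapping = ",".join(f"{name}:{attr}" for name, _, attr in columns)

    props = {
        "dynamodb.table.name": ddb_table,
        "dynamodb.column.mapping": mapping,
    }
    if region:
        # Assumption: property used to point at a table in another region
        props["dynamodb.region"] = region

    prop_sql = ", ".join(f'"{key}" = "{value}"' for key, value in props.items())
    return (
        f"CREATE EXTERNAL TABLE {hive_table} ({col_defs}) "
        "STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler' "
        f"TBLPROPERTIES ({prop_sql})"
    )


# Example with made-up table and attribute names
ddl = dynamodb_hive_ddl(
    "orders_ddb",
    "Orders",
    [("order_id", "string", "OrderId"), ("total", "double", "Total")],
    region="us-west-2",
)
print(ddl)
```

Once such a table exists, it could be queried or joined with other Hive tables (for example, tables over Amazon S3) using ordinary HiveQL.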
We are also delighted to share that EMR Serverless supports accessing specific Amazon S3 buckets in other AWS accounts, so your Spark and Hive applications can process the data stored there. AWS customers use multiple AWS accounts to better separate different projects or lines of business. Having cross-account capabilities simplifies securing and managing distributed data lakes across multiple accounts through a centralized approach. With cross-account access to Amazon S3, you can run your EMR Serverless Spark or Hive application in one AWS account and access data stored in specific buckets in other AWS accounts for processing.
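One common way to grant this kind of cross-account access is a bucket policy in the account that owns the data, naming the IAM role used by the job in the other account. The Python sketch below only assembles such a policy document as JSON. The bucket name, account ID, role name, and the helper `cross_account_bucket_policy` are placeholders invented for this example, and the read-only action list is an assumption; the job's role would also need matching S3 permissions in its own account.

```python
import json


def cross_account_bucket_policy(bucket, role_arn):
    """Return a read-only S3 bucket policy for a role in another account.

    Hypothetical helper: it builds a dict and does not call AWS.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself
                "Sid": "AllowListBucket",
                "Effect": "Allow",
                "Principal": {"AWS": role_arn},
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                # Reading applies to the objects inside the bucket
                "Sid": "AllowObjectRead",
                "Effect": "Allow",
                "Principal": {"AWS": role_arn},
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


# Placeholder bucket and role ARN for illustration only
policy = cross_account_bucket_policy(
    "example-shared-bucket",
    "arn:aws:iam::111122223333:role/example-emr-serverless-job-role",
)
print(json.dumps(policy, indent=2))
```

Scoping the policy to a single role and to specific buckets keeps the grant narrow, which fits the "specific buckets" wording above; a write-capable job would need additional actions such as `s3:PutObject`.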
These features are now available in all EMR Serverless regions. To learn more, refer to the Amazon EMR Serverless documentation.