AWS Big Data Blog

Tag: pandas

Advanced patterns with AWS SDK for pandas on AWS Glue for Ray

September 2023: This post was reviewed and updated with a new dataset and related code blocks and images. AWS SDK for pandas is a popular Python library among data scientists, data engineers, and developers. It simplifies interaction between AWS data and analytics services and pandas DataFrames. It allows easy integration and data movement between 22 […]

Scale AWS SDK for pandas workloads with AWS Glue for Ray

September 2023: This post was reviewed and updated with a new dataset and related code blocks and images. AWS SDK for pandas is an open-source library that extends the popular Python pandas library, enabling you to connect to AWS data and analytics services using pandas DataFrames. We’ve seen customers use the library in combination […]

Introducing AWS Glue for Ray: Scaling your data integration workloads using Python

AWS Glue is a serverless data integration service that makes it simple to discover, prepare, move, and integrate data from multiple sources for analytics, machine learning (ML), and application development. Today, AWS Glue processes customer jobs using either Apache Spark’s distributed processing engine for large workloads or Python’s single-node processing engine for smaller workloads. Customers […]