AWS Partner Network (APN) Blog

Tag: Apache Arrow

Using Dremio for Fast and Easy Analysis of Amazon S3 Data

Although many SQL engines allow tools to query Amazon S3 data, organizations still face challenges such as high latency and infrastructure costs. Learn how Dremio empowers analysts and data scientists to analyze data in S3 directly at interactive speed, without having to physically copy data into other systems or create extracts, cubes, or aggregation tables. Dremio’s unique architecture enables faster and more reliable query performance than traditional SQL engines.

Training Multiple Machine Learning Models Simultaneously Using Spark and Apache Arrow

Apache Spark is a distributed computing framework that uses PyArrow to support features such as Pandas UDFs. You can use Spark's distributed processing to manage the machine learning model lifecycle at massive scale, with many models in production. Learn how Perion Network implemented a model lifecycle capability that distributes the training and testing stages with a few lines of PySpark code. This capability improved the performance and accuracy of Perion’s ML models.
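
To give a feel for the pattern, here is a minimal sketch of training one model per group with a grouped-map Pandas function. It is an illustration under stated assumptions, not Perion's actual implementation: the column names `model_id`, `x`, and `y`, the helper names, and the simple linear fit are all hypothetical placeholders for a real feature set and estimator. The Spark entry point assumes Spark 3.0 or later, where `groupBy(...).applyInPandas(...)` is available.

```python
import numpy as np
import pandas as pd

# Output schema for the grouped-map function, written as a Spark DDL string.
# Column names are hypothetical and chosen only for this sketch.
RESULT_SCHEMA = "model_id string, slope double, intercept double, rmse double"


def train_one_group(pdf: pd.DataFrame) -> pd.DataFrame:
    """Fit y = slope * x + intercept on one group's rows and report training RMSE."""
    x = pdf["x"].to_numpy(dtype=float)
    y = pdf["y"].to_numpy(dtype=float)
    # A degree-1 polynomial fit stands in for whatever estimator you actually use.
    slope, intercept = np.polyfit(x, y, deg=1)
    rmse = float(np.sqrt(np.mean((slope * x + intercept - y) ** 2)))
    return pd.DataFrame(
        {
            "model_id": [str(pdf["model_id"].iloc[0])],
            "slope": [float(slope)],
            "intercept": [float(intercept)],
            "rmse": [rmse],
        }
    )


def train_all_with_spark(spark_df):
    """Run train_one_group once per model_id on the Spark executors.

    Spark uses Arrow to hand each group to the function as a pandas DataFrame.
    """
    return spark_df.groupBy("model_id").applyInPandas(
        train_one_group, schema=RESULT_SCHEMA
    )


def train_all_locally(pdf: pd.DataFrame) -> pd.DataFrame:
    """Single-machine equivalent, useful for testing the per-group function."""
    parts = [train_one_group(group) for _, group in pdf.groupby("model_id")]
    return pd.concat(parts, ignore_index=True)
```

Because the per-group function only sees a pandas DataFrame, it can be checked locally with `train_all_locally` before handing a Spark DataFrame with the same columns to `train_all_with_spark`.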