AWS Partner Network (APN) Blog
Tag: Apache Spark
Building an Agile Business Rules Engine on AWS
When implemented in legacy on-premises systems, business rules tend to be rigid and limit business agility. By combining AWS managed services, Capgemini can build a cloud-based rules engine that scales with increasing data volume, is easily configured, and can be accessed quickly. This enables organizations to develop, update, or retire business rules without relying on IT.
How Verisk Argus Migrated Petabytes of SQL Server Data to AWS
Verisk provides data analytic insights to customers in insurance, energy and specialized markets, and financial services. Learn how a joint team from AWS and Verisk Argus migrated petabytes of SQL Server data from on-premises to Amazon S3 and Amazon S3 Glacier using AWS Snowball Edge and the Amazon EMR custom Spark ingestion framework. Through this migration framework, Verisk Argus is positioned to save millions of dollars by moving to AWS from their data center.
Using Databricks SQL on Photon to Power Your AWS Lake House
Databricks SQL is a dedicated workspace for data analysts that comprises a native SQL editor, drag-and-drop dashboards, and built-in connectors for all major business intelligence tools as well as Photon. In this post, Volker Tjaden, an APN Ambassador from Databricks, shares the technical capabilities of Databricks SQL and walks through two examples: ingesting, querying, and visualizing AWS CloudTrail log data, and building near real-time dashboards on data coming from Amazon Kinesis.
Fully Managed Data Governance with Amazon EMR Integration with Apache Ranger and Privacera
Privacera is an AWS Partner that provides security and privacy tools for enterprises to secure and govern user access to databases and datastores in the cloud. PrivaceraCloud reduces the burden of self-managing Apache Ranger by providing Ranger as a hosted service. It provides centralized management of data access, authorization policies, and auditing. Learn how Amazon EMR can integrate with PrivaceraCloud to provide a fully managed data governance solution.
How Tamr Optimized Amazon EMR Workloads to Unify 200 Billion Records 5x Faster than On-Premises
Global business leaders recognize the value of advanced and augmented big data analytics over various internal and external data sources. However, technical leaders also face challenges capturing insights from data silos without unified master data. Learn how migrating Tamr’s data mastering solutions from on-premises to AWS allowed a customer to process billions of records five times faster with fully managed Amazon EMR clusters.
Bigstream Provides Big Data Acceleration with Apache Spark and Amazon EMR
Apache Spark and its parallel processing framework, along with the ease of scaling up in public clouds, have pushed out the limits for data analytics. Learn how Bigstream addresses growing Spark needs, with software that optimizes existing CPU infrastructure and can also seamlessly incorporate advanced programmable hardware. With the same number of servers, Bigstream can accelerate Spark clusters 3x with software alone and 10x when introducing FPGAs.
Helping a Pharmaceutical Company Drive Business Insights Using ZS Accelerators on Amazon Redshift
As data variety and volumes grow, it’s become increasingly necessary to ensure that all of an organization’s workloads run as efficiently as possible to reduce overall turnaround time and TCO. Get an overview of the data and analytics platform ZS built to streamline and improve contracting analytics for a top life sciences company. Then, dive deep into the data architecture and learn how ZS evolved its data technology stack to get maximum performance.
How SnapLogic eXtreme Helps Visualize Spark ETL Pipelines on Amazon EMR
Fully managed cloud services enable global enterprises to focus on strategic differentiators versus maintaining infrastructure. They do this by creating data lakes and performing big data processing in the cloud. SnapLogic eXtreme allows citizen integrators, those who can’t code, and data integrators to efficiently support and augment data-integration use cases by performing complex transformations on large volumes of data. Learn how to set up SnapLogic eXtreme and use Amazon EMR to do Amazon Redshift ETL.
Training Multiple Machine Learning Models Simultaneously Using Spark and Apache Arrow
Spark is a distributed computing framework that has added features such as Pandas UDFs by using PyArrow. You can leverage Spark for distributed, advanced machine learning model lifecycle capabilities to build massive-scale products with many models in production. Learn how Perion Network implemented a model lifecycle capability to distribute the training and testing stages with a few lines of PySpark code. This capability improved the performance and accuracy of Perion’s ML models.
Accelerating Machine Learning with Qubole and Amazon SageMaker Integration
Data scientists creating enterprise machine learning models to process large volumes of data spend a significant portion of their time managing the infrastructure required to process the data, rather than exploring the data and building ML models. You can reduce this overhead by running Qubole data processing tools and Amazon SageMaker. An open data lake platform, Qubole automates the administration and management of your resources on AWS.