AWS Big Data Blog

Category: Artificial Intelligence

Use a linear learner algorithm in Amazon Redshift ML to solve regression and classification problems

Amazon Redshift is the fastest, most widely used, fully managed, and petabyte-scale cloud data warehouse. Tens of thousands of customers use Amazon Redshift to process exabytes of data every day to power their analytics workloads. Amazon Redshift ML, powered by Amazon SageMaker, makes it easy for SQL users such as data analysts, data scientists, and […]

Read More

Secure data movement across Amazon S3 and Amazon Redshift using role chaining and ASSUMEROLE

Data lakes use a ring of purpose-built data services around a central data lake. Data needs to move between these services and data stores easily and securely. The following are some examples of such services: Amazon Simple Storage Service (Amazon S3), which stores structured, unstructured, and semi-structured data; Amazon Redshift, a fully managed, petabyte-scale data […]

Read More

Use Amazon CodeGuru Profiler to monitor and optimize performance in Amazon Kinesis Data Analytics applications for Apache Flink

Amazon Kinesis Data Analytics makes it easy to transform and analyze streaming data and gain actionable insights in real time with Apache Flink. Apache Flink is an open-source framework and engine for processing data streams in real time. Kinesis Data Analytics reduces the complexity of building and managing Apache Flink applications using open-source libraries and […]

Read More

Build and deploy custom connectors for Amazon Redshift with Amazon Lookout for Metrics

Amazon Lookout for Metrics detects outliers in your time series data, determines their root causes, and enables you to quickly take action. Built from the same technology used by Amazon.com, Lookout for Metrics reflects 20 years of expertise in outlier detection and machine learning (ML). Read our GitHub repo to learn more about how to […]

Read More

How Cynamics built a high-scale, near-real-time, streaming AI inference system using AWS

This post is co-authored by Dr. Yehezkel Aviv, Co-Founder and CTO of Cynamics, and Sapir Kraus, Head of Engineering at Cynamics. Cynamics provides a new paradigm of cybersecurity: predicting attacks long before they hit by collecting small network samples (less than 1%), inferring from them how the full network (100%) behaves, and predicting threats […]

Read More

Backtest trading strategies with Amazon Kinesis Data Streams long-term retention and Amazon SageMaker

Real-time insight is critical when it comes to building trading strategies. Any delay in data insight can cost traders a lot of money. Often, you need to look at historical market trends to predict future trading patterns and make the right bid. The more historical data you analyze, the better your trading predictions. Back […]

Read More

Provide data reliability in Amazon Redshift at scale using Great Expectations library

Ensuring data reliability is one of the key objectives of maintaining data integrity and is crucial for building data trust across an organization. Data reliability means that the data is complete and accurate. It’s the catalyst for delivering trusted data analytics and insights. Incomplete or inaccurate data leads business leaders and data analysts to make […]

Read More

WeatherBug reduced ETL latency to 30 times faster using Amazon Redshift Spectrum

This post is co-written with data engineers Anton Morozov and James Phillips from WeatherBug. WeatherBug is a brand owned by GroundTruth, based in New York City, that provides location-based advertising solutions to businesses. WeatherBug consists of a mobile app reporting live and forecast data on hyperlocal weather to consumer users. The WeatherBug Data Engineering team […]

Read More

How MEDHOST’s cardiac risk prediction successfully leveraged AWS analytic services

MEDHOST has been providing products and services to healthcare facilities of all types and sizes for over 35 years. Today, more than 1,000 healthcare facilities are partnering with MEDHOST and enhancing their patient care and operational excellence with its integrated clinical and financial EHR solutions. MEDHOST also offers a comprehensive Emergency Department Information System with […]

Read More

Simplify data discovery for business users by adding data descriptions in the AWS Glue Data Catalog

In this post, we discuss how to use the AWS Glue Data Catalog to simplify the process of adding data descriptions and allow data analysts to access, search, and discover this cataloged metadata with BI tools. In this solution, we use the AWS Glue Data Catalog to break the silos between cross-functional data producer teams, sometimes also known […]

Read More