AWS Architecture Blog

Category: AWS Glue

Figure 2. Lake House architecture on AWS

Architecting Persona-centric Data Platform with On-premises Data Sources

Many organizations are moving their data from silos and aggregating it in one location. Collecting this data in a data lake enables you to perform analytics and machine learning on that data. You can store your data in purpose-built data stores, like a data warehouse, to get quick results for complex queries on structured data. […]

Figure 1. Architecture for managing, anonymizing, and analyzing medical image data

Using AppStream 2.0 to Deliver PACS and Image Analysis in Clinical Trials

Hospitals and clinical trial sites manage sensitive patient data. They are often required to grant remote access to custom Windows-based applications for patient record review and medical image analysis. This typically requires providing physicians and staff with remote access to on-premises workstations over VPN, with some flavor of remote desktop software. This can be both […]

The following diagram shows the components that are used in this solution. We use an AWS CloudFormation template to set up the required networking components (for example, VPC, subnets).

Field Notes: Develop Data Pre-processing Scripts Using Amazon SageMaker Studio and an AWS Glue Development Endpoint

This post was co-written with Marcus Rosen, a Principal – Machine Learning Operations with Rio Tinto, a global mining company. Data pre-processing is an important step in setting up Machine Learning (ML) projects for success. Many AWS customers use Apache Spark on AWS Glue or Amazon EMR to run data pre-processing scripts while using Amazon SageMaker […]

Figure 1. Example architecture using AWS Managed Services

Building a Cloud-based OLAP Cube and ETL Architecture with AWS Managed Services

For decades, enterprises used online analytical processing (OLAP) workloads to answer complex questions about their business by filtering and aggregating their data. These complex queries were compute- and memory-intensive. This required teams to build and maintain complex extract, transform, and load (ETL) pipelines to model and organize data, oftentimes with commercial-grade analytics tools. In this […]

Pilot consideration process

Designing a Successful Pilot Phase for Your Cloud Migration

Pilot phases, or pilots, as we will call them from now on, should be conducted to test and find the positive and negative aspects of a particular use case, design pattern, or application migration approach. They allow you to validate the foundation of your architecture (for example, with a landing zone governed by AWS Control […]

AI-powered Passenger Callback System for Airlines

NLX is Helping Travelers Amid Disruption with AI-Powered Automation

This post was co-written by Andrei Papancea and Vlad Papancea of NLX and Sekhar Mallipeddi. Travel impacts brought by the global pandemic left several airlines experiencing frequent flight disruptions, which increased the number of flight schedule change notifications sent to affected travelers. Every month, tens of thousands of passengers and related flight crew have to be contacted […]

Figure 5. The Full Architectural Diagram

Reduce Operational Load using AWS Managed Services for your Data Solutions

As the volume of customers’ data grows, companies are realizing the benefits that data has for their business. Amazon Web Services (AWS) offers many database and analytics services, which give companies the ability to build complex data management workloads. At the same time, these services can reduce the operational overhead compared to traditional operations. Using […]

Figure 3. Replay Architecture

Amazon MSK Backup for Archival, Replay, or Analytics

Amazon MSK is a fully managed service that helps you build and run applications that use Apache Kafka to process streaming data. Apache Kafka is an open-source platform for building real-time streaming data pipelines and applications. With Amazon MSK, you can use native Apache Kafka APIs to populate data lakes. You can also stream changes to […]

Amazon Personalize: from datasets to a recommendation API

Automating Recommendation Engine Training with Amazon Personalize and AWS Glue

Customers from startups to enterprises observe increased revenue when personalizing customer interactions. Still, many companies are not yet leveraging the power of personalization, or are relying solely on rule-based strategies. Those strategies are effort-intensive to maintain and not effective. Common reasons for not launching machine learning (ML) based personalization projects include: the complexity of aggregating […]

Mercado Libre logo

Mercado Libre: How to Block Malicious Traffic in a Dynamic Environment

Blog post contributors: Pablo Garbossa and Federico Alliani of Mercado Libre. Introduction: Mercado Libre (MELI) is the leading e-commerce and FinTech company in Latin America. We have a presence in 18 countries across Latin America, and our mission is to democratize commerce and payments to impact the development of the region. We manage an ecosystem […]
