AWS Architecture Blog

Category: AWS Glue

Serverless S3 metadata search

Swiftly Search Metadata with an Amazon S3 Serverless Architecture

As you increase the number of objects in Amazon Simple Storage Service (Amazon S3), you’ll need the ability to search through them and quickly find the information you need. In this blog post, we offer you a cost-effective solution that uses a serverless architecture to search through your metadata. Using a serverless architecture helps you […]

Read More
Figure 2. Containerized application for ingestion and Amazon Kinesis for format conversion

Designing a High-volume Streaming Data Ingestion Platform Natively on AWS

The total global data storage is projected to exceed 200 zettabytes by 2025. This exponential growth of data demands increased vigilance against cybercrimes. Emerging cybersecurity trends include increasing service attacks, ransomware, and critical infrastructure threats. Businesses are changing how they approach cybersecurity and are looking for new ways to tackle these threats. In the past, […]

Read More

Field Notes: How to Build an AWS Glue Workflow using the AWS Cloud Development Kit

Many customers use AWS Glue workflows to build and orchestrate their ETL (extract-transform-load) pipelines directly in the AWS Glue console using the visual tool to author workflows. This can be time consuming, harder to version control, and error prone due to manual configurations, when compared to managing your workflows as code. To improve your operational […]

Read More
Figure 1. Data flow - Source to data lake target

Hybrid Cloud Architectures Using Self-hosted Apache Kafka and AWS Glue

Using analytics to gain insights from a variety of datasets is key to successful transformation. There are many options to consider to realize the full value and potential of our data in a hybrid cloud infrastructure. Common practice is to route data produced from on-premises to a central repository or data lake. Here it can […]

Read More
Figure 1. OR optimization options

Emerging Solutions for Operations Research on AWS

September 8, 2021: Amazon Elasticsearch Service has been renamed to Amazon OpenSearch Service. See details. Operations research (OR) uses mathematical and analytical tools to arrive at optimal solutions for complex business problems like workforce scheduling. The mathematical techniques used to solve these problems, such as linear programming and mixed-integer programming, require the use of optimization […]

Read More
Figure 2. Building Lake House architectures with AWS Glue

How to Accelerate Building a Lake House Architecture with AWS Glue

Customers are building databases, data warehouses, and data lake solutions in isolation from each other, each having its own separate data ingestion, storage, management, and governance layers. Often these disjointed efforts to build separate data stores end up creating data silos, data integration complexities, excessive data movement, and data consistency issues. These issues are preventing […]

Read More
High-level design for an AWS lake house implementation

Benefits of Modernizing On-premises Analytics with an AWS Lake House

Organizational analytics systems have shifted from running in the background of IT systems to being critical to an organization’s health. Analytics systems help businesses make better decisions, but they tend to be complex and are often not agile enough to scale quickly. To help with this, customers upgrade their traditional on-premises online analytic processing (OLAP) […]

Read More
Figure 1. Notional architecture for improving forecasting accuracy solution and SAP integration

Improving Retail Forecast Accuracy with Machine Learning

The global retail market continues to grow larger and the influx of consumer data increases daily. The rise in volume, variety, and velocity of data poses challenges with demand forecasting and inventory planning. Outdated systems generate inaccurate demand forecasts. This results in multiple challenges for retailers. They are faced with over-stocking and lost sales, and […]

Read More
Figure 2. A serverless architecture representing data workflow

Building a Showback Dashboard for Cost Visibility with Serverless Architectures

Enterprises with centralized IT organizations and multiple lines of businesses frequently use showback or chargeback mechanisms to hold their departments accountable for their technology usage and costs. Chargeback involves actually billing a department for the cost of their division’s usage. Showback focuses on visibility to make the department more cost conscientious and encourage operational efficiency. […]

Read More
Figure 2. Lake House architecture on AWS

Architecting Persona-centric Data Platform with On-premises Data Sources

Many organizations are moving their data from silos and aggregating it in one location. Collecting this data in a data lake enables you to perform analytics and machine learning on that data. You can store your data in purpose-built data stores, like a data warehouse, to get quick results for complex queries on structured data. […]

Read More