AWS Architecture Blog

Category: Amazon Simple Storage Service (S3)

Figure 1. River architecture diagram, depicting the flow of data from data producers through the River data ingestion service into Snowflake.

How Cimpress Built a Self-service, API-driven Data Platform Ingestion Service

Cimpress is a global company that specializes in mass customization, empowering individuals and businesses to design, personalize, and customize their own products – such as packaging, signage, masks, and clothing – and to buy those products affordably. Cimpress is composed of multiple businesses that have the option to use the Cimpress data platform. To provide […]

Serverless S3 metadata search

Swiftly Search Metadata with an Amazon S3 Serverless Architecture

As you increase the number of objects in Amazon Simple Storage Service (Amazon S3), you’ll need the ability to search through them and quickly find the information you need. In this blog post, we offer you a cost-effective solution that uses a serverless architecture to search through your metadata. Using a serverless architecture helps you […]

Figure 2. AWS Storage Gateway now supports AWS PrivateLink for Amazon S3 endpoints and Amazon S3 Access Points

Connect Amazon S3 File Gateway using AWS PrivateLink for Amazon S3

AWS Storage Gateway is a set of services that provides on-premises access to virtually unlimited cloud storage. You can extend your on-premises storage capacity, and move on-premises backups and archives to the cloud. It provides low-latency access to cloud storage by caching frequently accessed data on premises, while storing data securely and durably in the […]

CloudWatch for monitoring your storage resources

Optimizing your AWS Infrastructure for Sustainability, Part II: Storage

In Part I of this series, we introduced you to strategies to optimize the compute layer of your AWS architecture for sustainability. We provided you with success criteria, metrics, and architectural patterns to help you improve resource and energy efficiency of your AWS workloads. This blog post focuses on the storage layer of your AWS infrastructure and provides […]

Figure 1. Data flow - Source to data lake target

Hybrid Cloud Architectures Using Self-hosted Apache Kafka and AWS Glue

Using analytics to gain insights from a variety of datasets is key to successful transformation. There are many options to consider to realize the full value and potential of our data in a hybrid cloud infrastructure. A common practice is to route data produced on premises to a central repository or data lake. Here it can […]

Figure 4. Machine to Cloud Connectivity (M2C2) Framework architecture

Securely Ingest Industrial Data to AWS via Machine to Cloud Solution

As a manufacturing enterprise, maximizing your operational efficiency and optimizing output are critical factors in this competitive global market. However, many manufacturers are unable to frequently collect data, link data together, and generate insights to help them optimize performance. Furthermore, decades of competing standards for connectivity have resulted in the lack of universal protocols to […]

Figure 1. Example request dataflow through AWS

Practical Entity Resolution on AWS to Reconcile Data in the Real World

This post was co-written with Mamoon Chowdry, Solutions Architect, previously at AWS. Businesses and organizations from many industries often struggle to ensure that their data is accurate. Data often has to exactly match people or things in the real world, such as a customer name, an address, or a company. Matching our data is important […]

Document processing architectural diagram

Convert and Watermark Documents Automatically with Amazon S3 Object Lambda

When you provide access to a sensitive document to someone outside of your organization, you likely need to ensure that the document is read-only. In this case, your document should be associated with a specific user in case it is shared. For example, authors often embed user-specific watermarks into their ebooks. This way, if their […]

Figure 1. Object expiry architecture flow

Expiring Amazon S3 Objects Based on Last Accessed Date to Decrease Costs

Organizations are using Amazon Simple Storage Service (Amazon S3) for building their data lakes, websites, mobile applications, and enterprise applications. As the number of objects within your S3 bucket increases, you may want to move older objects into lower-cost tiers of Amazon S3. In some cases you may want to delete the objects altogether to further […]

Figure 2. Fraud detection using machine learning architecture on AWS

Analyze Fraud Transactions using Amazon Fraud Detector and Amazon Athena

Organizations with online businesses have to be on guard constantly for fraudulent activity, such as fake accounts or payments made with stolen credit cards. One way they try to identify fraudsters is by using fraud detection applications. Some of these applications use machine learning (ML). A common challenge with ML is the need for a […]