AWS Open Source Blog
Category: Amazon Simple Storage Service (S3)
Build, train, and deploy Amazon Fraud Detector models using the open source Python SDK
Companies providing digital services are looking for ways to effectively identify fraudulent activities, such as online payment fraud and fake account creation. Amazon Fraud Detector is a fully managed service that uses machine learning (ML) and builds on 20 years of fraud detection expertise from Amazon Web Services (AWS) and Amazon.com to automatically identify potentially […]
Learn Amazon Simple Storage Service transfer configuration with Syne Tune
The object storage service Amazon Simple Storage Service (Amazon S3) is a foundational storage building block powering a variety of workloads from asset backup and serving, to analytics and machine learning. In this blog post, we describe how to search and find a scenario-specific optimized S3 download configuration in minutes using the open source distributed […]
Delta Sharing on AWS
This post was written by Frank Munz, Staff Developer Advocate at Databricks. An introduction to Delta Sharing: During the past decade, much thought went into system and application architectures using domain-driven design and microservices, but we are still on the verge of building distributed data meshes. Such data meshes are based on two fundamental principles: […]
Introducing Assisted Log Enabler for AWS
Logging information is important for troubleshooting issues and analyzing performance, and when Amazon Web Services (AWS) customers do not have logging turned on, the ability to assist them becomes limited, to the point that performing analysis may be impossible. In some cases, customers may not have the technical expertise needed to set up logging properly […]
How Falco uses Prow on AWS for open source testing
This post was co-written with Leo Di Donato, an open source software engineer at Sysdig in the Office of the CTO. Kubernetes has seen massive growth in the past few years. However, with all growth come growing pains, and CI/CD has brought a few interesting problems to the space, especially for the open source community. […]
Community collaboration: The S3A story
Sometimes the best open source contributions involve doing less, not more. For example, Charity Majors has posited, “The best senior engineers I’ve worked with are the ones who worked the hardest not to have to write new code.” It’s not that writing new lines of code is bad. No, it’s really a matter of keeping […]
Improving HA and long-term storage for Prometheus using Thanos on EKS with S3
Prometheus is an open source systems monitoring and alerting toolkit that is widely adopted as a standard monitoring tool with self-managed and provider-managed Kubernetes. Prometheus provides many useful features, such as dynamic service discovery, powerful queries, and seamless alert notification integration. Beyond a certain scale, however, problems arise when basic Prometheus capabilities do not meet requirements […]
How to deploy a live events solution built with the Amazon Chime SDK
In this tutorial, I will explain how to deploy an interactive live events solution with which speakers can present to a large pre-selected audience, and moderators can screen attendees to participate in the broadcast. This interactive live events solution, built with the Amazon Chime SDK, addresses many of the shortcomings of traditional online meeting platforms […]
How a startup wants to help secure the open source ecosystem with huntr, a bug bounty board
This article is a guest post from 418sec co-founders Adam Nygate, Jake Mimoni, and Jamie Slome. Dependency on open source code has grown over the years, and as new open source technologies are introduced, so are more vulnerabilities. Review by “many eyes” helps secure open source software, and depends on exposing the code to as […]
fMRI data preprocessing on AWS using fMRIprep
A typical fMRI study often produces terabytes or more of imaging data. Storing and preprocessing this data can be challenging on a single computer because it often has neither enough disk space to store the data nor enough computing power to preprocess it. Traditionally, researchers use a combination of cloud-based storage and on-premises high-performance clusters […]