AWS Open Source Blog

Category: Amazon Simple Storage Service (S3)

Build, train, and deploy Amazon Fraud Detector models using the open source Python SDK

Companies providing digital services are looking for ways to effectively identify fraudulent activities, such as online payment fraud and fake account creation. Amazon Fraud Detector is a fully managed service that uses machine learning (ML) and builds on 20 years of fraud detection expertise from Amazon Web Services (AWS) and Amazon.com to automatically identify potentially […]

Learn Amazon Simple Storage Service transfer configuration with Syne Tune

The object storage service Amazon Simple Storage Service (Amazon S3) is a foundational storage building block powering a variety of workloads, from asset backup and serving to analytics and machine learning. In this blog post, we describe how to find a scenario-specific optimized S3 download configuration in minutes using the open source distributed […]

Delta Sharing on AWS

This post was written by Frank Munz, Staff Developer Advocate at Databricks. An introduction to Delta Sharing: During the past decade, much thought went into system and application architectures using domain-driven design and microservices, but we are still on the verge of building distributed data meshes. Such data meshes are based on two fundamental principles: […]

Introducing Assisted Log Enabler for AWS

Logging information is important for troubleshooting issues and analyzing performance. When Amazon Web Services (AWS) customers do not have logging turned on, the ability to assist them becomes limited, to the point that performing analysis may be impossible. In some cases, customers may not have the technical expertise needed to set up logging properly […]

How Falco uses Prow on AWS for open source testing

This post was co-written with Leo Di Donato, an open source software engineer at Sysdig in the Office of the CTO. Kubernetes has seen massive growth in the past few years. However, with all growth comes growing pains, and CI/CD has brought a few interesting problems to the space, especially for the open source community. […]

Improving HA and long-term storage for Prometheus using Thanos on EKS with S3

Prometheus is an open source systems monitoring and alerting toolkit that is widely adopted as a standard monitoring tool with self-managed and provider-managed Kubernetes. Prometheus provides many useful features, such as dynamic service discovery, powerful queries, and seamless alert notification integration. Beyond a certain scale, however, problems arise when basic Prometheus capabilities do not meet requirements […]

How to deploy a live events solution built with the Amazon Chime SDK

In this tutorial, I will explain how to deploy an interactive live events solution with which speakers can present to a large pre-selected audience, and moderators can screen attendees to participate in the broadcast. This interactive live events solution, built with the Amazon Chime SDK, addresses many of the shortcomings of traditional online meeting platforms […]

How a startup wants to help secure the open source ecosystem with huntr, a bug bounty board

This article is a guest post from 418sec co-founders Adam Nygate, Jake Mimoni, and Jamie Slome. Dependency on open source code has grown over the years, and as new open source technologies are introduced, so are more vulnerabilities. Review by “many eyes” helps secure open source software, and depends on exposing the code to as […]

User uploads data in BIDS format to S3 and starts the Lambda function → Lambda parses the uploaded data and launches a cluster of EC2 instances → EC2 instances run fMRIprep which preprocesses the data → preprocessed data are saved to S3.

fMRI data preprocessing on AWS using fMRIprep

A typical fMRI study often produces terabytes or more of imaging data. Storing and preprocessing this data can be challenging on a single computer because it often has neither enough disk space to store the data nor enough computing power to preprocess it. Traditionally, researchers use a combination of cloud-based storage and on-premises high-performance clusters […]