AWS Big Data Blog
Category: Analytics
A dive into redBus’s data platform and how they used Amazon QuickSight to accelerate business insights
This post is co-authored with Girish Kumar Chidananda from redBus. redBus is one of the earliest adopters of AWS in India, and most of its services and applications are hosted on the AWS Cloud. AWS provided redBus the flexibility to scale their infrastructure rapidly while keeping costs extremely low. AWS has a comprehensive suite of services […]
Enable cross-account sharing with direct IAM principals using AWS Lake Formation Tags
With AWS Lake Formation, you can build data lakes with multiple AWS accounts in a variety of ways. For example, you could build a data mesh, implementing a centralized data governance model and decoupling data producers from the central governance. Such data lakes enable the data-as-an-asset paradigm and unleash new possibilities with […]
Build a serverless streaming pipeline with Amazon MSK Serverless, Amazon MSK Connect, and MongoDB Atlas
This post was cowritten with Babu Srinivasan and Robert Walters from MongoDB. Amazon Managed Streaming for Apache Kafka (Amazon MSK) is a fully managed, highly available Apache Kafka service. Amazon MSK makes it easy to ingest and process streaming data in real time and use that data easily within the AWS ecosystem. With Amazon MSK […]
How NETSCOUT built a global DDoS awareness platform with Amazon OpenSearch Service
This post was co-written with Hardik Modi, AVP, Threat and Mitigation Products at NETSCOUT. NETSCOUT Omnis Threat Horizon is a global cybersecurity awareness platform providing users with highly contextualized visibility into “over the horizon” threat activity on the global DDoS (Distributed Denial of Service) landscape—threats that could be impacting their industry, their customers, or their […]
Build highly available streams with Amazon Kinesis Data Streams
Many use cases are moving towards a real-time data strategy due to demand for real-time insights, low-latency response times, and the ability to adapt to the changing needs of end-users. For this type of workload, you can use Amazon Kinesis Data Streams to seamlessly provision, store, write, and read data in a streaming fashion. With […]
Build near real-time logistics dashboards using Amazon Redshift and Amazon Managed Grafana for better operational intelligence
Amazon Redshift is a fully managed data warehousing service that is currently helping tens of thousands of customers manage analytics at scale. It continues to lead price-performance benchmarks, and separates compute and storage so each can be scaled independently and you only pay for what you need. It also eliminates data silos by simplifying access […]
Amazon QuickSight AWS re:Invent recap 2022
AWS re:Invent is a learning conference hosted by AWS for the global cloud computing community. Re:Invent was held at the end of 2022 in Las Vegas, Nevada, from November 28 to December 2. Amazon QuickSight powers data-driven organizations with unified business intelligence (BI) at hyperscale. This post walks you through a full recap of QuickSight […]
How BookMyShow saved 80% in costs by migrating to an AWS modern data architecture
This is a guest post co-authored by Mahesh Vandi Chalil, Chief Technology Officer of BookMyShow. BookMyShow (BMS), a leading entertainment company in India, provides an online ticketing platform for movies, plays, concerts, and sporting events. Selling up to 200 million tickets on an annual run rate basis (pre-COVID) to customers in India, Sri Lanka, Singapore, […]
Accelerate orchestration of an ELT process using AWS Step Functions and Amazon Redshift Data API
Extract, Load, and Transform (ELT) is a modern design strategy where raw data is first loaded into the data warehouse and then transformed with familiar Structured Query Language (SQL) semantics leveraging the power of massively parallel processing (MPP) architecture of the data warehouse. When you use an ELT pattern, you can also use your existing […]
Add your own libraries and application dependencies to Spark and Hive on Amazon EMR Serverless with custom images
Amazon EMR Serverless allows you to run open-source big data frameworks such as Apache Spark and Apache Hive without managing clusters and servers. Many customers who run Spark and Hive applications want to add their own libraries and dependencies to the application runtime. For example, you may want to add popular open-source extensions to Spark, […]