AWS Big Data Blog
Analyzing petabytes of trade and quote data with Amazon FinSpace
We recently announced Amazon FinSpace, a fully managed data management and analytics service that makes it easy to store, catalog, and prepare financial industry data at scale, reducing the time it takes for financial services industry (FSI) customers to find and access all types of financial data for analysis from months to minutes. Financial services organizations […]
How Digital Infuzion solves the challenge of large-scale scientific data collaboration with Amazon QuickSight
This is a guest post by Digital Infuzion. In their own words, “Digital Infuzion (DIFZ), a leader in information technology, helps solve complex challenges related to genomics, health, and biomedical data, while collaborating with partners including the J. Craig Venter Institute, Gryphon Scientific, ICF International, and others engaged in scientific research. Together, we create novel […]
Orchestrate AWS Glue DataBrew jobs using Amazon Managed Workflows for Apache Airflow
As data volumes grow across the industry, big data analytics is becoming a common requirement in analytics and machine learning (ML) use cases. Analysts are building complex data transformation pipelines that include multiple steps for data preparation and cleansing. However, analysts may want a simpler orchestration mechanism with a graphical user interface that […]
Enrich your data stream asynchronously using Amazon Kinesis Data Analytics for Apache Flink
Streaming data into or out of a data system must be fast. One of the most expensive pieces of any streaming system is the I/O of the system: reading from the streaming layer using Apache Kafka or Amazon Kinesis, reading a file, writing to an Amazon Simple Storage Service (Amazon S3) data lake, or communicating […]
Amazon EMR introduces EMR runtime for Presto, providing a 2.6 times speedup
Presto is an open-source distributed SQL query engine for running interactive analytic queries against data sources of all sizes ranging from gigabytes to petabytes. Presto was designed and written from the ground up for interactive analytics, and approaches the speed of commercial data warehouses while scaling to the size of organizations like Facebook. Running Presto […]
Amazon Redshift announces general availability of support for JSON and semi-structured data processing
At AWS re:Invent 2020, we announced the preview of native support for JSON and semi-structured data in Amazon Redshift. This includes a new data type, SUPER, which allows you to store JSON and other semi-structured data in Amazon Redshift tables, and support for the PartiQL query language, which allows you to seamlessly query and process […]
Build a Lake House Architecture on AWS
October 2022: This post was reviewed for accuracy. Organizations can gain deeper and richer insights when they bring together all their relevant data of all structures and types and from all sources to analyze. In order to analyze these vast amounts of data, they are taking all their data from various silos and aggregating all […]
How Goldman Sachs migrated from their on-premises Apache Kafka cluster to Amazon MSK
This is a guest post by Zachary Whitford, Associate; Richa Prajapati, Vice President; and Aldo Piddiu, Vice President, in the Global Investment Research engineering team at Goldman Sachs. To see how Goldman Sachs is innovating more with AWS, visit the Goldman Sachs Leading Cloud Innovator page. The Global Investment Research (GIR) division at Goldman Sachs delivers […]
Manage fine-grained access control using AWS Lake Formation
AWS Lake Formation is a fully managed service that helps you build, secure, and manage data lakes, and provide access control for data in the data lake. Customers across lines of business (LOBs) need a way to manage granular access permissions for different users at the table and column level. Lake Formation helps you manage […]
Set up and manage data ingestion easily with Amazon Redshift native console integration with partners
We’re excited to announce that Amazon Redshift console partner integration is now generally available. This new console integration provides rapid provisioning and seamless integration with AWS partners. You can onboard with data integration partner solutions in less than a minute directly on the Amazon Redshift console, and ingest data from multiple data sources using partners’ […]