AWS Storage Blog
Isima.io optimizes price performance for OLAP workloads using Amazon EBS
Isima.io, a unified analytics startup founded in 2016, aims to accelerate analytics outcomes for organizations. Isima.io does this by combining multiple data management disciplines, including Enterprise Service Bus (ESB), Extract-Transform-Load (ETL), Enterprise Data Warehouse (EDW), and Business Intelligence (BI), into one hyper-converged system. IT teams can only win by building differentiated, agile data apps. Yet current best practice mandates point solutions for onboarding, processing, consuming, and operating data. Gluing this SaaS tooling together requires many months and an army of data engineers, architects, and scientists, which isn’t sustainable. Isima addresses this engineering challenge with its flagship product, bi(OS).
In this post, we describe how Isima’s bi(OS) uses Amazon Elastic Block Store (Amazon EBS) and local non-volatile memory express (NVMe) instance store volumes on Amazon Elastic Compute Cloud (Amazon EC2) i3/i4 instances to deliver the best price performance for its customers.
Bi(OS) on AWS: The middleware for data
“Succeeding with data in today’s world really requires taking the end-to-end view of your data and not looking at point solutions along the journey,” said Adam Selipsky, CEO of AWS, in a November 2022 interview. Isima has held the same belief since the company’s founding.
Isima created bi(OS) on AWS more than three years ago. Bi(OS) is a lean, modern data stack that can onboard, process, consume, and operate data for real-time analytics. It acts as a data hub between Online Transaction Processing (OLTP), Online Analytical Processing (OLAP), and third-party software-as-a-service (SaaS) systems. Bi(OS) includes no-code connectors to ingest data from OLTP systems (Amazon Relational Database Service and Amazon DynamoDB), OLAP systems (Amazon Simple Storage Service and Amazon Redshift), and third-party SaaS systems (for example, Google/Facebook Ads, Clevertap, and Google Tag Manager). Low-latency microservices, near-real-time analytics jobs, and ad hoc queries can interface with bi(OS) via SQL-friendly Python, Java, and JavaScript SDKs.
Overcoming the performance requirement to better serve customers
Isima serves global e-commerce unicorns and enterprises with a mean throughput of 3K operations/sec and peaks of more than 3X on Black Friday. Delivering timely recommendations with low latency is key to any e-commerce application: higher latency can result in a poor customer experience and lost sales opportunities. To guard against this, a microservice that delivers real-time recommendations to customers times out if the supporting data subsystem pushes customer-experienced latency beyond one second. Because bi(OS) supports these use cases in production at scale, it must deliver a p99 latency below 100 milliseconds. The challenge is that each infrastructure component (storage, compute, networking) is individually optimized to be composable and elastic, whereas real-time use cases require consistent, high-performance storage, compute, and networking to converge. Bi(OS) achieves this by relying on Amazon EBS and local NVMe drives on i3/i4 instances. It uses gp3 volumes, the latest generation of general-purpose SSD-based EBS volumes, to store log data, and NVMe drives to store raw data for processing. Combining these two storage types allows bi(OS) to meet its Quality of Service (QoS) promises to its customers.
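To make the p99 target concrete, here is a minimal sketch of how a sub-100 millisecond p99 could be checked against a batch of request latencies. It is not Isima's tooling: the function names, the nearest-rank percentile method, and the sample values are all assumptions chosen for illustration.

```python
def percentile_nearest_rank(samples_ms, pct):
    """Return the pct-th percentile (pct is an integer 1..100), nearest-rank method."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples_ms)
    # Integer ceiling of pct * n / 100 avoids floating-point rounding surprises.
    rank = (pct * len(ordered) + 99) // 100  # 1-based rank
    return ordered[rank - 1]


def meets_latency_target(samples_ms, target_ms=100.0, pct=99):
    """True when the chosen percentile is strictly below the target."""
    return percentile_nearest_rank(samples_ms, pct) < target_ms


# Hypothetical data: 1% slow requests still passes, slightly more does not.
within_budget = [20.0] * 990 + [250.0] * 10
over_budget = [20.0] * 989 + [250.0] * 11

print(meets_latency_target(within_budget))
print(meets_latency_target(over_budget))
```

The sketch shows why a p99 target is stricter than a mean: a small increase in the share of slow requests flips the result even though the average latency barely moves.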
Benefits of using Amazon EBS and local NVMe storage
Recently, Isima’s engineering team compared storage characteristics across different cloud providers. A single bi(OS) cluster was deployed across the major cloud providers, in the closest regions and with similar infrastructure choices. The team then ran a gradually increasing 48-hour load, simulating an extreme example of a real-time recommendation engine, that peaked at the following rates:
– inserts+upserts: ~2800 operations/sec
– selects: ~185 operations/sec
Each select query returned ~650 rows. Data inserted 20 hours ago was selected to avoid host caching effects. All inserts+upserts and selects were done with QUORUM consistency (i.e., blocking reads across at least two clouds). Moreover, metrics were reported for the last 12 hours to avoid cold-start effects. This test concluded that when viewed holistically, Amazon EBS and local NVMe storage from Amazon EC2 i3/i4 instances provide the best predictability of price performance across the comparable cloud options.
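The peak figures above translate into a read volume that is easy to underestimate. The short sketch below works through the arithmetic, using only the rates quoted in the test description. The replication factor of 3 is an assumption made for illustration (the post does not state it), and `quorum_size` uses the standard majority rule rather than any bi(OS)-specific API.

```python
# Peak rates quoted in the test description.
WRITES_PER_SEC = 2800    # ~2800 inserts+upserts per second
SELECTS_PER_SEC = 185    # ~185 selects per second
ROWS_PER_SELECT = 650    # ~650 rows returned per select


def quorum_size(replication_factor):
    """Majority quorum: more than half of the replicas must respond."""
    return replication_factor // 2 + 1


total_ops_per_sec = WRITES_PER_SEC + SELECTS_PER_SEC
rows_read_per_sec = SELECTS_PER_SEC * ROWS_PER_SELECT

# Assumed replication factor; not stated in the post.
ASSUMED_REPLICATION_FACTOR = 3

print(f"total operations/sec at peak: {total_ops_per_sec}")
print(f"rows read/sec at peak: {rows_read_per_sec}")
print(f"replicas needed per QUORUM operation: {quorum_size(ASSUMED_REPLICATION_FACTOR)}")
```

Under that assumed replication factor, every operation blocks on two replicas, which is consistent with the description of reads blocking across at least two clouds.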
“Compared to Amazon EBS, the equivalent block storage options from other leading cloud vendors had 7X more reads breaching the high latency threshold and spent 317X more elapsed time in the outlier area. Furthermore, these breaches happened within two weeks, compared to 90 days for Amazon EBS.” – Darshan Rawal, CEO, Isima.io
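For readers who want to run a similar comparison, the sketch below shows one way the two quantities in the quote (number of reads breaching a latency threshold, and elapsed time spent in the outlier area) could be derived from timestamped latency samples. The exact definitions Isima used are not given in this post, so the threshold value, the interval-based definition of time in breach, and the sample data are all assumptions.

```python
def breach_stats(samples, threshold_ms):
    """Summarize threshold breaches.

    samples: list of (timestamp_seconds, latency_ms) tuples sorted by time.
    Returns (breach_count, seconds_in_breach). Time in breach is assumed to be
    the sum of intervals that start at a breaching sample and end at the next
    sample; the final sample contributes no interval.
    """
    breach_count = sum(1 for _, latency in samples if latency > threshold_ms)
    seconds_in_breach = 0.0
    for (t0, latency), (t1, _) in zip(samples, samples[1:]):
        if latency > threshold_ms:
            seconds_in_breach += t1 - t0
    return breach_count, seconds_in_breach


# Hypothetical samples for two storage options, one reading every 10 seconds.
option_a = [(0, 50.0), (10, 150.0), (20, 40.0), (30, 45.0)]
option_b = [(0, 50.0), (10, 150.0), (20, 160.0), (30, 40.0)]

HYPOTHETICAL_THRESHOLD_MS = 100.0  # assumed; the post does not state the threshold

count_a, time_a = breach_stats(option_a, HYPOTHETICAL_THRESHOLD_MS)
count_b, time_b = breach_stats(option_b, HYPOTHETICAL_THRESHOLD_MS)
print(f"breach ratio (b/a): {count_b / count_a}, time ratio (b/a): {time_b / time_a}")
```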
Conclusion
Isima’s validation shows that AWS infrastructure featuring Amazon EBS and Amazon EC2 is the best choice for Isima to run real-time OLAP data platforms. Specifically, AWS provides the most predictable price performance across all storage classes, converged compute and storage, and a self-serve experience for data analysts, scientists, and engineers. To learn more, read about bi(OS), and watch a two-minute video about how e-commerce pioneers use bi(OS).