Data lakehouse analytics will replace data warehouse analytics

Albert Wong
Nov 2, 2023


Data lakehouses will eventually replace data warehouses

I believe that data lakehouse analytics will replace data warehouse analytics. Data lakehouses offer a number of advantages over traditional data warehouses, including:

  • Flexibility: Data lakehouses are more flexible than data warehouses because they can store and process all types of data, including structured, semi-structured, and unstructured data. This makes them ideal for organizations that need to analyze a variety of data sources, such as social media data, IoT data, and sensor data.
  • Cost-effectiveness: Data lakehouses are more cost-effective than data warehouses because they are built on top of cloud object storage, where organizations pay only for the storage they actually use. In some cases, moving from AWS EBS to AWS S3 cuts storage costs by 90%.
  • Scalability: Data lakehouses are more scalable than data warehouses because data lives in object storage while compute runs separately, so each can be scaled up or down independently to meet changing needs.
  • Performance: Data lakehouses can offer high performance for both data engineering and data science workloads. In our own testing of StarRocks on AWS EBS SSD vs. StarRocks with Apache Iceberg on S3, the two setups performed within 10% of each other.
  • Feature parity: StarRocks in data lakehouse mode provides many of the same features as StarRocks in data warehouse mode, such as views, materialized views, data security, caching, and more.
  • Data pipeline complexity: Duplicating data (and paying for the copies), plus maintaining ingestion and pipeline scripts for multiple systems, consumes a lot of staff time, software spend, new processes, and testing cycles. People and organizations don't want that. Writing to a single data lake and querying it directly seems simpler.

These advantages make data lakehouses a more attractive option for organizations that are looking to build a modern data platform.
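Where can a figure like "90% of storage costs" come from? A minimal back-of-the-envelope sketch follows. It assumes that a warehouse keeps several replicas of each piece of data on EBS and bills for provisioned (not used) capacity, while S3 bills once per stored GB. The function name, the `replicas` and `ebs_utilization` parameters, and the prices in the usage line are all hypothetical illustrations, not measured values; check current AWS pricing for your region.

```python
def storage_savings(
    ebs_price_gb_month: float,
    s3_price_gb_month: float,
    replicas: int = 1,
    ebs_utilization: float = 1.0,
) -> float:
    """Fraction of monthly storage cost saved by keeping one logical copy of
    the data on S3 instead of on EBS volumes (hypothetical cost model).

    replicas: copies of each piece of data the warehouse stores on EBS.
    ebs_utilization: fraction of provisioned EBS capacity actually filled;
        EBS bills for provisioned capacity, S3 for bytes stored.
    """
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    if not 0.0 < ebs_utilization <= 1.0:
        raise ValueError("ebs_utilization must be in (0, 1]")
    # EBS cost per logical GB: every replica is billed, and so is unused provisioned space.
    ebs_cost = ebs_price_gb_month * replicas / ebs_utilization
    # S3 cost per logical GB: durability is built into the service, so one copy is billed.
    s3_cost = s3_price_gb_month
    return 1.0 - s3_cost / ebs_cost


# Illustrative prices only (assumed, per GB-month): EBS 0.08, S3 0.023, 3 replicas.
print(round(storage_savings(0.08, 0.023, replicas=3), 2))  # → 0.9
```

Under this model the result depends heavily on replication and utilization: with a single, fully used EBS copy the same illustrative prices give roughly 71% savings, which is consistent with the original claim that the 90% figure holds "in some cases."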

The goal is to query data directly on top of the lake, support performant JOINs at scale, and serve thousands of users running ad hoc queries. The approach: run StarRocks on top of the raw data, then create views or materialized views as needed. For example:

  • Airbnb with StarRocks: 4 JOINs across billions of rows in under 4 seconds.
  • Tencent Games with StarRocks: 400+ users running ad hoc queries on xx+ petabytes of data stored as Apache Iceberg files.
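To make "run StarRocks on top of raw data, then create views or materialized views as needed" more concrete, here is a minimal sketch that assembles the SQL for that workflow: register Iceberg tables as an external catalog, define a plain view over them, and add an asynchronously refreshed materialized view where precomputation helps. It is written against StarRocks 3.x syntax as we understand it; the helper `lakehouse_ddl`, all catalog, database, table, and column names, and the metastore URI are hypothetical, and S3 credential properties are omitted. Verify the exact syntax against the StarRocks documentation for your version.

```python
def lakehouse_ddl(catalog: str, metastore_uri: str, db: str, table: str) -> list[str]:
    """Return StarRocks SQL statements, in execution order, that expose an
    Iceberg table to StarRocks and layer a view and an async materialized
    view on top of it. All names are hypothetical placeholders."""
    source = f"{catalog}.{db}.{table}"
    return [
        # 1. Register Iceberg tables (tracked here by a Hive metastore) as an
        #    external catalog. Storage credential properties are omitted.
        f"""CREATE EXTERNAL CATALOG {catalog}
PROPERTIES (
    "type" = "iceberg",
    "iceberg.catalog.type" = "hive",
    "hive.metastore.uris" = "{metastore_uri}"
)""",
        # 2. A plain view: no data is copied; queries read the Iceberg files.
        f"""CREATE VIEW daily_orders AS
SELECT order_date, COUNT(*) AS orders
FROM {source}
GROUP BY order_date""",
        # 3. An async materialized view: StarRocks stores the precomputed
        #    result and refreshes it on a schedule (hourly here).
        f"""CREATE MATERIALIZED VIEW daily_revenue_mv
REFRESH ASYNC EVERY (INTERVAL 1 HOUR)
AS SELECT order_date, SUM(amount) AS revenue
FROM {source}
GROUP BY order_date""",
    ]


for stmt in lakehouse_ddl("iceberg_cat", "thrift://metastore.example:9083", "sales", "orders"):
    print(stmt + ";\n")
```

Because StarRocks speaks the MySQL protocol, these statements could be executed with any MySQL-compatible client. The design choice is to start with cheap views and promote only the hot, expensive queries to materialized views.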

This is our goal at StarRocks. Check us out at http://starrocks.io

StarRocks, a Linux Foundation project, is a next-generation sub-second MPP OLAP database for full analytics scenarios, including multi-dimensional analytics, real-time analytics, and ad-hoc queries. StarRocks received InfoWorld’s 2023 BOSSIE Award for best open source software.
