Data lakehouse analytics will replace data warehouse analytics
I believe data lakehouse analytics will eventually replace data warehouse analytics. Data lakehouses offer several advantages over traditional data warehouses, including:
- Flexibility: Data lakehouses are more flexible than data warehouses because they can store and process all types of data, including structured, semi-structured, and unstructured data. This makes them ideal for organizations that need to analyze a variety of data sources, such as social media data, IoT data, and sensor data.
- Cost-effectiveness: Data lakehouses are more cost-effective than data warehouses because they are built on top of cloud object storage, so organizations pay only for the storage they use. In some cases, moving from AWS EBS to AWS S3 reduces storage costs by 90%.
- Scalability: Data lakehouses are more scalable than data warehouses because they can be easily scaled up or down to meet changing needs.
- Performance: Data lakehouses can offer high performance for both data engineering and data science workloads. In our own testing of StarRocks on AWS EBS SSD vs. StarRocks with Apache Iceberg on S3, the performance of the two setups differed by less than 10%.
- Feature parity: StarRocks in data lakehouse mode provides features similar to those of StarRocks in data warehouse mode, such as views, materialized views, data security, caching, and more.
- Simpler data pipelines: Feeding multiple systems means duplicated data and its associated costs, plus ingestion and pipeline scripts for every system. That takes a lot of people’s time, software spend, new processes, and multiple cycles of testing. People and organizations don’t want it. Writing to a single data lake and then querying it directly is simpler.
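To make the storage cost point concrete, here is a minimal back-of-the-envelope sketch. The per-GB prices and the replica count are illustrative assumptions, not figures from this post or a price quote, and `storage_saving_pct` is a hypothetical helper name. Plug in your own region's pricing and replication settings.

```python
def storage_saving_pct(block_price_per_gb, object_price_per_gb, block_replicas=1):
    """Percentage reduction in monthly storage cost when the same data
    moves from (possibly replicated) block storage to object storage."""
    # Block storage is billed once per replica the database keeps.
    block_cost = block_price_per_gb * block_replicas
    return 100.0 * (block_cost - object_price_per_gb) / block_cost


# Assumed placeholder prices in USD per GB-month; check current AWS pricing.
ASSUMED_EBS_PRICE = 0.08
ASSUMED_S3_PRICE = 0.023
# Assumption: the warehouse keeps three copies of each block of data on EBS.
ASSUMED_REPLICAS = 3

saving = storage_saving_pct(ASSUMED_EBS_PRICE, ASSUMED_S3_PRICE, ASSUMED_REPLICAS)
print(f"Estimated storage saving: {saving:.1f}%")
```

Under these assumed numbers the saving comes out at roughly 90%. With a single copy on block storage it would be noticeably smaller, so the replication factor matters as much as the price gap.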
These advantages make data lakehouses a more attractive option for organizations that are looking to build a modern data platform.
This is our goal at StarRocks. Check us out at http://starrocks.io
StarRocks, a Linux Foundation project, is a next-generation sub-second MPP OLAP database for full analytics scenarios, including multi-dimensional analytics, real-time analytics, and ad-hoc queries. StarRocks received InfoWorld’s 2023 BOSSIE Award for best open source software.