E-commerce Funnel Analysis with StarRocks: 87 Million Records, Apache Hudi, Apache Iceberg, Delta Lake (MinIO, Apache HMS, Apache xTable)

Albert Wong
Mar 1, 2024


StarRocks Open Data Lakehouse Architecture
Open Data Lakehouse vs others

I’m Albert, Head of Community and Developer Relations at CelerData. CelerData is a $60 million VC-funded startup that is building StarRocks, an open-source replacement for Snowflake, BigQuery, Redshift, and Databricks SQL warehouse.

In this tutorial, I’ll guide you through working with an 87 million record e-commerce dataset. The data is initially stored in Hudi format on MinIO, an S3-compatible object store. We’ll then use StarRocks to run three queries that involve JOINs. The tutorial also covers creating mirrored copies of the dataset in Iceberg and Delta Lake formats using the Apache xTable library.
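
To make that workflow concrete, here is a minimal Python sketch that only builds the text involved: a StarRocks external catalog statement for Hudi tables on MinIO, one JOIN query, and an xTable dataset config. It connects to nothing. The catalog name, metastore URI, MinIO endpoint, credentials, bucket path, and the `user_behavior` and `item` tables with their columns are all assumptions made for illustration, not values taken from the demo repository, so treat them as placeholders.

```python
"""Sketch only: builds SQL and config text; no connection is opened."""


def hudi_catalog_ddl(name, metastore_uri, s3_endpoint, access_key, secret_key):
    """Return a StarRocks CREATE EXTERNAL CATALOG statement for Hudi on MinIO."""
    props = {
        "type": "hudi",
        "hive.metastore.type": "hive",
        "hive.metastore.uris": metastore_uri,
        "aws.s3.use_instance_profile": "false",
        "aws.s3.access_key": access_key,
        "aws.s3.secret_key": secret_key,
        # Assumes a plain-HTTP MinIO endpoint addressed by path, not by virtual host.
        "aws.s3.enable_ssl": "false",
        "aws.s3.enable_path_style_access": "true",
        "aws.s3.endpoint": s3_endpoint,
    }
    body = ",\n".join(f'    "{key}" = "{value}"' for key, value in props.items())
    return f"CREATE EXTERNAL CATALOG {name}\nPROPERTIES (\n{body}\n);"


def funnel_join_sql(catalog, database):
    """Return a JOIN query over two hypothetical tables (names are assumptions)."""
    return (
        "SELECT i.category_id, b.behavior_type, COUNT(DISTINCT b.user_id) AS users\n"
        f"FROM {catalog}.{database}.user_behavior AS b\n"
        f"JOIN {catalog}.{database}.item AS i ON b.item_id = i.item_id\n"
        "GROUP BY i.category_id, b.behavior_type\n"
        "ORDER BY i.category_id, users DESC;"
    )


def xtable_config_yaml(table_name, base_path):
    """Return an xTable dataset config that mirrors a Hudi table to Iceberg and Delta."""
    return (
        "sourceFormat: HUDI\n"
        "targetFormats:\n"
        "  - ICEBERG\n"
        "  - DELTA\n"
        "datasets:\n"
        "  -\n"
        f"    tableBasePath: {base_path}\n"
        f"    tableName: {table_name}\n"
    )


if __name__ == "__main__":
    # Every value below is a placeholder, not taken from the tutorial.
    print(hudi_catalog_ddl("hudi_catalog", "thrift://hive-metastore:9083",
                           "http://minio:9000", "admin", "password"))
    print(funnel_join_sql("hudi_catalog", "ecommerce"))
    print(xtable_config_yaml("user_behavior", "s3a://warehouse/user_behavior"))
```

Because StarRocks speaks the MySQL wire protocol, the two SQL strings can be sent with any MySQL-compatible client. The YAML is the kind of file the xTable utilities jar accepts through its `--datasetConfig` option. Check the exact keys against the xTable version you run.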

StarRocks Open Data Lakehouse Architecture with Apache xTable
Query data directly on the lake, with performant JOINs at scale and support for thousands of users running ad hoc queries
Airbnb with StarRocks: 4 JOINs across billions of rows in under 4 seconds
Tencent Games with StarRocks: 400+ users doing ad hoc queries on xx+ petabytes of data on Apache Iceberg files.

See more at https://github.com/StarRocks/demo/tree/master/documentation-samples/datalakehouse


Written by Albert Wong

#eCommerce #Java #Database #k8s #Automation. Hobbies: #BoardGames #Comics #Skeet #VideoGames #Pinball #Magic #YelpElite #Travel #Candy
