Origin stories of Apache Iceberg, Apache Hudi, and Delta Lake, and the vendors that control them

Albert Wong
1 min read · Sep 25, 2024


Iceberg came out of Netflix to support BI and dashboards. It is designed for read-heavy workloads (roughly 90% reads, 10% writes). Per https://tableformats.sundeck.io/, Tabular employs 36% of the committers and wrote roughly 60% of the codebase.

Hudi came out of Uber to store receipts. It is designed both for read-heavy workloads (roughly 90% reads, 10% writes, via copy-on-write tables) and for balanced read/write workloads (roughly 50% reads, 50% writes, via merge-on-read tables). Per https://tableformats.sundeck.io/, Onehouse employs 19% of the committers and wrote roughly 20% of the codebase.
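To make the copy-on-write vs. merge-on-read trade-off concrete, here is a minimal PySpark sketch of writing a Hudi table and choosing the table type. The bucket path, table name, and schema are illustrative assumptions, not anything from Uber's actual setup; the Hudi Spark bundle is assumed to be on the classpath (e.g. via `--packages`).

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hudi-table-types")
    # Kryo serialization is the usual recommendation for Hudi on Spark.
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .getOrCreate()
)

# Toy receipt records; schema is a made-up example.
df = spark.createDataFrame(
    [("r1", "2024-09-25", 12.50)], ["receipt_id", "ts", "amount"]
)

(
    df.write.format("hudi")
    # COPY_ON_WRITE suits the ~90/10 read-heavy case: each write rewrites
    # data files, so reads stay cheap. Switch to MERGE_ON_READ for the
    # balanced ~50/50 case: writes append row-based logs that readers
    # merge on the fly.
    .option("hoodie.datasource.write.table.type", "COPY_ON_WRITE")
    .option("hoodie.table.name", "receipts")
    .option("hoodie.datasource.write.recordkey.field", "receipt_id")
    .option("hoodie.datasource.write.precombine.field", "ts")
    .mode("append")
    .save("s3://my-bucket/lake/receipts")  # hypothetical base path
)
```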

Delta Lake came out of Databricks. It is designed for AI/ML and Spark pipelines. Per https://tableformats.sundeck.io/, Databricks employs 100% of the committers and wrote roughly 100% of the codebase.

Will Iceberg win? I think the real answer is interoperability. Because different workloads favor different formats, you need a data lakehouse that supports all 3 formats. Onehouse.ai and others can provide this by storing the data once and serving it in all 3 formats (no duplication).
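As a sketch of what "one copy, three formats" can look like from the query side: a metadata-translation layer such as Apache XTable (incubating), which began as Onehouse's OneTable project, rewrites only the table metadata so the same underlying Parquet files are readable through each format. The paths, catalog, and table names below are assumptions, and the snippet presumes Spark is configured with the Hudi, Delta, and Iceberg connectors and that metadata for each format has already been generated over the same base path.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("interop-reads").getOrCreate()

base = "s3://my-bucket/lake/receipts"  # hypothetical shared base path

# Same Parquet data files, three different table-format readers:
hudi_df = spark.read.format("hudi").load(base)
delta_df = spark.read.format("delta").load(base)
# Iceberg reads typically go through a configured catalog rather than a path.
iceberg_df = spark.read.table("lake_catalog.db.receipts")

# All three views sit on one physical copy of the data.
assert hudi_df.count() == delta_df.count() == iceberg_df.count()
```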
