Origin stories of Apache Iceberg, Apache Hudi, and Delta Lake, and the vendors that control them

Albert Wong
1 min read · Sep 25, 2024


Iceberg came out of Netflix to support BI and dashboards. It is designed for read-heavy workloads (roughly 90% reads, 10% writes). Per https://tableformats.sundeck.io/, Tabular employs 36% of the committers and wrote roughly 60% of the codebase.

Hudi came out of Uber to store receipts. It is designed both for read-heavy workloads (roughly 90% reads, 10% writes, via copy-on-write tables) and for balanced read/write workloads (roughly 50% reads, 50% writes, via merge-on-read tables). Per https://tableformats.sundeck.io/, Onehouse employs 19% of the committers and wrote roughly 20% of the codebase.
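To make the copy-on-write vs. merge-on-read trade-off concrete, here is a minimal PySpark sketch of writing a Hudi table and choosing the table type. The bucket path, table name, and schema are illustrative assumptions, not anything from Uber's actual setup; the Hudi Spark bundle is assumed to be on the classpath (e.g. via `--packages`).

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hudi-table-types")
    # Kryo serialization is the usual recommendation for Hudi on Spark.
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .getOrCreate()
)

# Toy receipt records; schema is a made-up example.
df = spark.createDataFrame(
    [("r1", "2024-09-25", 12.50)], ["receipt_id", "ts", "amount"]
)

(
    df.write.format("hudi")
    # COPY_ON_WRITE suits the ~90/10 read-heavy case: each write rewrites
    # data files, so reads stay cheap. Switch to MERGE_ON_READ for the
    # balanced ~50/50 case: writes append row-based logs that readers
    # merge on the fly.
    .option("hoodie.datasource.write.table.type", "COPY_ON_WRITE")
    .option("hoodie.table.name", "receipts")
    .option("hoodie.datasource.write.recordkey.field", "receipt_id")
    .option("hoodie.datasource.write.precombine.field", "ts")
    .mode("append")
    .save("s3://my-bucket/lake/receipts")  # hypothetical base path
)
```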

Delta Lake came out of Databricks. It is designed for AI/ML and Spark pipelines. Per https://tableformats.sundeck.io/, Databricks employs 100% of the committers and wrote roughly 100% of the codebase.

Will Iceberg win? I think the real answer is interoperability. Because different workloads favor different formats, you need a data lakehouse that supports all 3 formats. Onehouse.ai and others can provide this by storing the data once and serving it in all 3 formats (no duplication).
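As a sketch of what "one copy, three formats" can look like from the query side: a metadata-translation layer such as Apache XTable (incubating), which began as Onehouse's OneTable project, rewrites only the table metadata so the same underlying Parquet files are readable through each format. The paths, catalog, and table names below are assumptions, and the snippet presumes Spark is configured with the Hudi, Delta, and Iceberg connectors and that metadata for each format has already been generated over the same base path.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("interop-reads").getOrCreate()

base = "s3://my-bucket/lake/receipts"  # hypothetical shared base path

# Same Parquet data files, three different table-format readers:
hudi_df = spark.read.format("hudi").load(base)
delta_df = spark.read.format("delta").load(base)
# Iceberg reads typically go through a configured catalog rather than a path.
iceberg_df = spark.read.table("lake_catalog.db.receipts")

# All three views sit on one physical copy of the data.
assert hudi_df.count() == delta_df.count() == iceberg_df.count()
```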
