My favorite articles for understanding the differences between the open table formats Apache Hudi, Apache Iceberg, Apache Hive, and Delta Lake

Albert Wong
Aug 22, 2023
Popular open table formats: Apache Iceberg, Apache Hudi, Apache Hive, and Delta Lake

Personally, I think it’s important to go to the source for your information.

Written by Onehouse.ai, the main contributors to Apache Hudi: https://www.onehouse.ai/blog/apache-hudi-vs-delta-lake-vs-apache-iceberg-lakehouse-feature-comparison. The author, Kyle, is the VP of Product at Onehouse, so I think this article should be treated as the authority on which features and capabilities Apache Hudi does or does not have.

Since Apache Hudi was created at Uber, this is a good article on their architectural decisions: https://www.uber.com/blog/ubers-lakehouse-architecture/.

Apache Iceberg came out of Netflix, but it was written by the team that created Apache Parquet. Here is an article, in their own words, about the value of Apache Iceberg: https://tabular.medium.com/iceberg-in-modern-data-architecture-c647a1f29cb3

Written by AWS: https://aws.amazon.com/blogs/big-data/choosing-an-open-table-format-for-your-transactional-data-lake-on-aws/. In this situation, AWS can be seen as an unbiased third party, but they do talk about how they implement the various open table formats on their own infrastructure. I’ve seen cases in the past where their implementation and architecture were sub-optimal (due to how AWS works).

Update Nov 2023: Onehouse.ai just released https://onetable.dev/. OneTable lets you convert one open table format to another (e.g., Hudi to Iceberg).
