OneTable: A way to convert one open table format to another
If you’ve read some of the articles on Apache Hudi, Apache Iceberg, and Delta Lake, you know that all three formats were built with specific use cases in mind, and so their features differ (see https://atwong.medium.com/my-favorite-articles-to-understand-the-differences-between-open-table-formats-apache-hudi-apache-de0bd760eead to understand more). That makes it difficult to pick the “correct” open table format.
Onehouse.ai has come up with a solution: why pick at all? Why not write in one format and then convert it to one or more other open table formats (e.g., write in Hudi and allow reads from Iceberg and Delta Lake)? Check out this interesting project at https://onetable.dev/
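To make the "write in one, read in others" idea concrete, here is a minimal Python sketch of what describing such a conversion could look like. Everything in it is an assumption for illustration: the helper `build_sync_config` is hypothetical, the key names (`sourceFormat`, `targetFormats`, `datasets`, `tableBasePath`, `tableName`) are modeled on my recollection of the project's dataset config and should be checked against the docs at https://onetable.dev/, and the bucket path and table name are made up.

```python
import json

# Formats named in this article; assumed spelling for the config values.
SUPPORTED_FORMATS = {"HUDI", "ICEBERG", "DELTA"}


def build_sync_config(source_format, target_formats, table_base_path, table_name):
    """Return a dict describing one table written in one format and exposed in others.

    Hypothetical helper: the key names are assumptions, not a confirmed schema.
    """
    source = source_format.upper()
    targets = [t.upper() for t in target_formats]

    # Reject anything outside the three formats discussed here.
    unknown = ({source} | set(targets)) - SUPPORTED_FORMATS
    if unknown:
        raise ValueError(f"unsupported format(s): {sorted(unknown)}")

    # Converting a format to itself makes no sense.
    if source in targets:
        raise ValueError("source format must not also be a target")

    return {
        "sourceFormat": source,
        "targetFormats": targets,
        "datasets": [
            {"tableBasePath": table_base_path, "tableName": table_name},
        ],
    }


# Write in Hudi, allow reads from Iceberg and Delta Lake (hypothetical table).
config = build_sync_config(
    "hudi",
    ["iceberg", "delta"],
    "s3://my-bucket/warehouse/trips",
    "trips",
)

# JSON is also valid YAML, so this text could be saved as a .yaml file.
print(json.dumps(config, indent=2))
```

The point of the sketch is only the shape of the idea: one source format, a list of target formats, and the location of the table. How the actual conversion is run is described in the project's own documentation.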
By the way, if you need a query engine that can read Apache Hudi, Apache Iceberg and Delta Lake, check these out:
StarRocks: StarRocks was designed to address the challenges of real-time analytics, including high concurrency, low latency, and a wide range of analytical workloads, and it can query data directly from data lakes. StarRocks received InfoWorld’s 2023 BOSSIE Award for best open source software. Read more at http://starrocks.io
ClickHouse: ClickHouse is an open-source column-oriented database management system (DBMS) for online analytical processing (OLAP). It is designed for real-time analytics and can handle large volumes of data with high performance. ClickHouse is used by a number of companies, including Netflix, Airbnb, and Uber, to power their real-time analytics applications. Read more at http://clickhouse.com
Trino: Trino, formerly known as PrestoSQL, is an open-source distributed SQL query engine that is designed to run fast analytic queries against various data sources ranging in size from gigabytes to petabytes. It is a popular choice for data lakehouse architectures, where it can query data directly from its native storage format, such as Iceberg or Delta Lake. Trino is known for its high performance and ability to handle complex queries efficiently. It is also scalable and can be deployed on a cluster of machines to handle large workloads. Read more at http://trino.io