How does onehouse.ai differ from established lakehouses such as Databricks, Snowflake, Cloudera, AWS Data Lake, and Fivetran?

Albert Wong

Question: How does onehouse.ai differ from established lakehouses such as Databricks, Snowflake, Cloudera, AWS Data Lake, and Fivetran?

Onehouse differentiates itself with an open, engine-neutral data lakehouse architecture that makes a single copy of data universally accessible from Databricks, Snowflake, Cloudera, and AWS native services, instead of building a separate data silo on each. Onehouse decouples lakehouse data storage from lakehouse/warehouse compute engines in a way that avoids data lock-in and promotes interoperability. We played an important role in interoperability by open-sourcing "OneTable" last year with Microsoft and Google; it is now Apache XTable (Incubating).
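To make the "single copy, many engines" idea concrete, here is a minimal PySpark sketch; the bucket path, table name, sample rows, and bundle version are hypothetical. It writes one Hudi table to cloud storage, and a tool like Apache XTable can then generate Iceberg/Delta metadata over the same files so other engines query that one copy instead of maintaining their own silo.

from pyspark.sql import SparkSession

# The Hudi Spark bundle must be on the classpath; the version below is illustrative.
spark = (
    SparkSession.builder
    .appName("single-copy-lakehouse-sketch")
    .config("spark.jars.packages", "org.apache.hudi:hudi-spark3.4-bundle_2.12:0.14.1")
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .config("spark.sql.extensions", "org.apache.spark.sql.hudi.HoodieSparkSessionExtension")
    .getOrCreate()
)

# One copy of the data, written once as an open Hudi table (path is hypothetical).
df = spark.createDataFrame([(1, "alice", 1000), (2, "bob", 1001)], ["id", "name", "ts"])
(df.write.format("hudi")
    .option("hoodie.table.name", "customers")
    .option("hoodie.datasource.write.recordkey.field", "id")
    .option("hoodie.datasource.write.precombine.field", "ts")
    .mode("overwrite")
    .save("s3://example-bucket/lake/customers"))

# Any Hudi-aware engine reads the same files; XTable can expose them to Iceberg/Delta readers too.
spark.read.format("hudi").load("s3://example-bucket/lake/customers").show()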

From a technology perspective, Onehouse provides higher-scale, faster, and more efficient data ingestion and ETL pipeline capabilities to populate such a data lakehouse. This comes from a newer incremental data processing model implemented on top of Apache Hudi, proven at industry scale on massive data lakes at Uber, Walmart, Zoom, TikTok, and others. Onehouse also provides centralized data management across the aforementioned lakehouse platforms, so that users pay the cost and complexity of data management just once across all their compute engines.
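The incremental model is easiest to see with Hudi's incremental query type: instead of rescanning the whole table, a pipeline pulls only the records committed after a checkpoint. A minimal sketch, assuming the hypothetical table from the previous example and an illustrative begin-instant timestamp:

from pyspark.sql import SparkSession

# Same Hudi-enabled session setup as the previous sketch; bundle version is illustrative.
spark = (
    SparkSession.builder
    .appName("incremental-read-sketch")
    .config("spark.jars.packages", "org.apache.hudi:hudi-spark3.4-bundle_2.12:0.14.1")
    .getOrCreate()
)

# Read only records committed after this (illustrative) Hudi instant time, which is
# what lets downstream ETL process changes rather than full snapshots each run.
changes = (
    spark.read.format("hudi")
    .option("hoodie.datasource.query.type", "incremental")
    .option("hoodie.datasource.read.begin.instanttime", "20240601000000")
    .load("s3://example-bucket/lake/customers")
)
changes.show()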

Question: Can you briefly summarize the difficulty in deploying Hudi yourself, and explain how Onehouse makes that simpler?

Hudi is an open data lakehouse platform that also provides a table format optimized for incremental reads and writes. While Hudi provides rich functionality to ingest, manage, and transform data, companies still have to integrate about half a dozen open-source tools to achieve a production-quality data lakehouse. They may also have to invest engineering resources to manage, optimize, and tune hundreds to thousands of tables for optimal performance, as sketched below.
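As one illustration of that tuning burden, here is a hedged sketch of the kind of per-table Hudi write options a team ends up choosing and re-tuning by hand for compaction, clustering, cleaning, and file sizing; the values are purely illustrative, and the right settings depend on each table's workload.

# Illustrative per-table Hudi tuning surface; real values depend on workload and layout.
table_service_options = {
    "hoodie.table.name": "customers",
    # Merge-on-read compaction cadence.
    "hoodie.compact.inline": "true",
    "hoodie.compact.inline.max.delta.commits": "5",
    # Clustering to fix small-file and data-layout problems.
    "hoodie.clustering.inline": "true",
    "hoodie.clustering.inline.max.commits": "4",
    # How much commit history the cleaner retains.
    "hoodie.cleaner.commits.retained": "10",
    # Target file sizing.
    "hoodie.parquet.small.file.limit": str(100 * 1024 * 1024),
    "hoodie.parquet.max.file.size": str(512 * 1024 * 1024),
}

# Applied on every write, and typically revisited per table as data volumes change:
# df.write.format("hudi").options(**table_service_options).mode("append").save(path)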

Onehouse provides a turnkey, Snowflake-like user experience for building such data lakehouses. Users can get an open data lakehouse up and running in under an hour, with broad interoperability across all major cloud-native services, warehouses, and data lake engines. Onehouse fully manages the stored data and the ingestion/ETL pipelines needed to build the data lakehouse, reducing engineering time and time-to-value.

Read more at https://techcrunch.com/2024/06/26/data-lakehouse-onehouse-nabs-35m-to-capitalize-on-genai-revolution/
