Universal data lakehouse: The most vendor/tool-neutral data architecture

Albert Wong
3 min read · Sep 30, 2024


What is a Universal Data Lakehouse?

A universal data lakehouse is a new type of data platform that allows organizations to store and process all of their data in a single place, regardless of the data’s source, format, or use case. As defined by OneHouse, a universal data lakehouse is built on open data formats with universal data interoperability, and it provides a true separation of storage and compute. This means that organizations can ingest and transform data from any source, manage it centrally in a data lakehouse, and query or access it with the engine of their choice.
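To make the "one copy of the data, many engines" idea concrete, here is a minimal sketch: data is written once as an open-format table on storage, then read back by two independent engines. This is an illustrative example, not a specific vendor's implementation; it assumes the `pandas`, `deltalake` (delta-rs), and `duckdb` Python packages are installed, and the `./lakehouse/orders` path is made up for the demo.

```python
# Sketch: write data once in an open table format, query it with different engines.
import pandas as pd
from deltalake import DeltaTable, write_deltalake

orders = pd.DataFrame({
    "order_id": [1, 2, 3],
    "customer": ["alice", "bob", "carol"],
    "amount":   [120.0, 35.5, 79.9],
})

# Ingest: land the data as an open Delta Lake table on storage
# (a local path here, but it could equally be s3:// or gs:// object storage).
write_deltalake("./lakehouse/orders", orders, mode="overwrite")

# Engine 1: read the same table through delta-rs / pandas.
print(DeltaTable("./lakehouse/orders").to_pandas())

# Engine 2: query the very same files with DuckDB
# (assumes a DuckDB build that provides the delta extension and delta_scan).
import duckdb
duckdb.sql("INSTALL delta")
duckdb.sql("LOAD delta")
print(duckdb.sql(
    "SELECT customer, SUM(amount) AS total "
    "FROM delta_scan('./lakehouse/orders') GROUP BY customer"
))
```

The storage layer (plain files in an open format) and the compute layer (whichever engine you point at those files) stay independent, which is the separation of storage and compute the definition refers to.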

Data Lake vs. Data Lakehouse

Traditionally, data lakes have been used as a central repository for storing all of an organization’s data. However, data lakes typically lack the structure and governance required to effectively query and analyze the data. In contrast, data lakehouses combine the storage capabilities of a data lake with the structure and governance of a data warehouse. This allows organizations to not only store all of their data in a single place, but also to easily query and analyze it.

One of the key differentiators between a data lake and a data lakehouse is ACID capabilities. ACID stands for Atomicity, Consistency, Isolation, and Durability; these properties ensure that transactions complete reliably and prevent data corruption. Because a data lakehouse supports ACID transactions, you can update and query the data directly on the lake instead of treating it purely as an archive.
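As a sketch of what ACID buys you in practice, the snippet below performs an upsert (MERGE) against the table from the previous example: the whole change is committed atomically, so readers see either the old snapshot or the new one, never a half-applied write. It assumes a `deltalake` (delta-rs) version that includes the merge API; table path and column names carry over from the earlier illustrative example.

```python
# Sketch: an atomic upsert on an open-format table via delta-rs.
import pandas as pd
from deltalake import DeltaTable

changes = pd.DataFrame({
    "order_id": [2, 4],             # order 2 is updated, order 4 is new
    "customer": ["bob", "dave"],
    "amount":   [40.0, 15.0],
})

(
    DeltaTable("./lakehouse/orders")
    .merge(
        source=changes,
        predicate="target.order_id = source.order_id",
        source_alias="source",
        target_alias="target",
    )
    .when_matched_update_all()      # update rows that already exist
    .when_not_matched_insert_all()  # insert rows that don't
    .execute()                      # one atomic commit to the table log
)
```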

The Value of a Universal Data Lakehouse

A universal data lakehouse offers many benefits. As defined by OneHouse, some of the key ones include:

  • Improved data management: A universal data lakehouse gives organizations a single, governed place to store and manage all of their data.
  • Simplified analytics: A universal data lakehouse makes it easier for organizations to analyze their data by providing a unified platform for querying and accessing it. It does this by supporting the Apache Hudi, Apache Iceberg, and Delta Lake table formats, which lets the lakehouse work with the vast majority of query engines and analytics frameworks in the marketplace.
  • Reduced costs: By eliminating the need for separate data lakes and data warehouses, a universal data lakehouse can help organizations reduce their data storage costs. You no longer need a format-specific data silo (e.g., Iceberg for read-heavy workloads, Hudi for insert/upsert-heavy workloads, Delta Lake for AI/ML/Spark workloads).
  • Increased agility: A universal data lakehouse can help organizations be more agile by making it easier to access and analyze data. You can pick the best query engine or analytics framework for each job without being locked in (see the sketch after this list).
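As an illustration of that freedom, the same open-format table written in the earlier examples can also be read by Spark for heavier transformation or ML work, while an embedded engine like DuckDB handles interactive queries. This is a hedged sketch assuming the `pyspark` and `delta-spark` packages; the Maven coordinates and version are illustrative and may differ in your environment.

```python
# Sketch: picking a different engine (Spark) for the same open-format table.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("lakehouse-demo")
    # Pull in Delta Lake support for Spark (version shown is illustrative).
    .config("spark.jars.packages", "io.delta:delta-spark_2.12:3.1.0")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

# Read the same table the other engines used; no copy or export step needed.
orders = spark.read.format("delta").load("./lakehouse/orders")
orders.groupBy("customer").sum("amount").show()
```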

Who Builds Universal Data Lakehouses?

Several vendors offer universal data lakehouse solutions. Some of the most popular vendors include:

  • Databricks
  • Snowflake
  • Amazon Web Services (AWS)
  • Google Cloud Platform (GCP)
  • Microsoft Azure
  • OneHouse

These vendors offer a variety of features and capabilities, so it is important to carefully evaluate your needs before selecting a vendor.
