Top Open Source Alternatives to OLAP databases Snowflake, RedShift, and BigQuery

Albert Wong
Aug 29, 2023

Snowflake, AWS RedShift, and GCP BigQuery are popular cloud-based OLAP data warehouses that offer scalability, performance, and security. However, they can be expensive, both for small businesses or organizations with limited data needs and for companies with very large volumes of data.

A number of open source alternatives offer similar features at a lower cost. Here are some of the top ones:

ClickBench, a benchmark of many OLAP databases

StarRocks:
StarRocks is an open-source, distributed, MPP (Massively Parallel Processing) OLAP database designed for high performance and scalability. It excels at real-time, sub-second analytics and fits an open data lakehouse architecture because it can query data in Apache Hudi, Apache Iceberg, Apache Hive, and Delta Lake.

StarRocks Architecture Overview
AirBnB with StarRocks: 4 JOINS with billions of rows in under 4 seconds
Tencent Games with StarRocks: 400+ users doing ad hoc queries on xx+ petabytes of data on Apache Iceberg files.

StarRocks is a Linux Foundation project. CelerData, one of its main sponsors, is based in Silicon Valley, CA.

The StarRocks project has been adopted by a number of organizations, including AirBnB, Alibaba, Tencent, and JD.com. It is a promising new OLAP database that has the potential to revolutionize the way we analyze data.

ClickHouse:
ClickHouse is an open-source, column-oriented database management system (DBMS) for online analytical processing (OLAP). It is designed to be fast and scalable for analytical workloads, such as aggregations and joins. ClickHouse is written in C++ and runs natively on Linux, FreeBSD, and macOS; on Windows it is typically run under WSL.
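To make "column-oriented" a bit more concrete, here is a toy Python sketch that contrasts a row layout with a column layout for the same data. It is not how ClickHouse is implemented; the sample data and the function names (`total_amount_row_store`, `total_amount_column_store`, `total_by_country`) are made up for illustration, and real column stores add compression, vectorized execution, and on-disk formats that this ignores.

```python
# Toy illustration only; all names and data below are hypothetical.

# Row-oriented layout: one record per row, all fields stored together.
rows = [
    {"user_id": 1, "country": "US", "amount": 20.0},
    {"user_id": 2, "country": "DE", "amount": 35.5},
    {"user_id": 3, "country": "US", "amount": 12.5},
]

# Column-oriented layout: one contiguous list per field.
columns = {
    "user_id": [1, 2, 3],
    "country": ["US", "DE", "US"],
    "amount": [20.0, 35.5, 12.5],
}


def total_amount_row_store(records):
    # Every whole record is visited even though only "amount" is needed.
    return sum(record["amount"] for record in records)


def total_amount_column_store(table):
    # Only the "amount" column is read; the other columns are never touched.
    return sum(table["amount"])


def total_by_country(table):
    # Equivalent of GROUP BY country with SUM(amount),
    # reading just the two columns involved.
    totals = {}
    for country, amount in zip(table["country"], table["amount"]):
        totals[country] = totals.get(country, 0.0) + amount
    return totals
```

Both layouts give the same answer. The difference is how much data has to be scanned: an analytical query that touches two columns out of hundreds only reads those two in a column store.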

ClickHouse was created at Yandex, a Russian technology company, by a team led by Alexey Milovidov, and was released as open source in 2016. In 2021 it was spun out into ClickHouse, Inc., co-founded by Milovidov, Yury Izrailevsky, and Aaron Katz.

Databend:
Databend is an open-source, cloud-native data warehouse that is designed to be fast, scalable, and cost-effective.

Databend is developed by Datafuse Labs, a company based in Beijing, China. The company was founded in 2020 and is backed by investors such as Sequoia Capital China and Source Code Capital.

Databend is still under development, but it has already been adopted by a number of organizations, including ByteDance, Meituan, and NetEase. The company is also planning to expand to other regions, such as North America and Europe.

SelectDB:
SelectDB is a cloud-native real-time data warehouse that is built on top of Apache Doris. It is a fully-managed service that can be deployed on AWS, Azure, or Alibaba Cloud. SelectDB is designed to be easy to use and scalable, making it a good choice for businesses of all sizes.

SelectDB is developed by Flywheel Data, a Chinese company that is also the core developer of Apache Doris. Flywheel Data was founded in 2018 and is headquartered in Beijing. The company has raised over CN¥400 million in funding from investors such as IDG Capital and Sequoia Capital China.

SelectDB is a relatively new product, but it has already been adopted by a number of organizations, including JD.com, Meituan, and 360. The company is also planning to expand to other regions, such as North America and Europe.

DuckDB:
DuckDB is an open-source, embedded, in-process, relational, OLAP (Online Analytical Processing) database management system (DBMS) that aims to be the OLAP version of SQLite. It is designed to be fast and efficient for analytical workloads, such as aggregations and joins.
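To show what "embedded, in-process" means in practice, here is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in, since SQLite is the system DuckDB is compared to above. It is not DuckDB itself, and the `orders` and `customers` tables are hypothetical. DuckDB's Python package offers a similar connect-and-execute workflow (`duckdb.connect()`), but that is not shown or tested here.

```python
import sqlite3

# Stand-in example: this uses SQLite from the standard library, not DuckDB.
# The database engine runs inside this Python process; no server is started.
con = sqlite3.connect(":memory:")

# Hypothetical tables, for illustration only.
con.execute(
    "CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL)"
)
con.execute("CREATE TABLE customers (customer_id INTEGER, country TEXT)")
con.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 10, 20.0), (2, 11, 35.5), (3, 10, 12.5)],
)
con.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(10, "US"), (11, "DE")],
)

# A typical analytical query: a join followed by an aggregation.
revenue_by_country = con.execute(
    """
    SELECT c.country, SUM(o.amount) AS revenue
    FROM orders AS o
    JOIN customers AS c ON c.customer_id = o.customer_id
    GROUP BY c.country
    ORDER BY c.country
    """
).fetchall()

print(revenue_by_country)
con.close()
```

The point of the sketch is the deployment model: the whole database is a library call away, with nothing to install or operate separately. DuckDB keeps that model but uses a columnar engine aimed at analytical queries like the one above.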

These are just a few of the many open source alternatives to Snowflake, RedShift, and BigQuery. The best choice for your organization will depend on your specific needs and budget.

Here are some factors to consider when choosing an open source alternative to Snowflake, RedShift, and BigQuery:

  • Your data needs: How much data do you need to store? What type of data is it?
  • Your budget: How much are you willing to spend on a data warehouse?
  • Your technical expertise: How much technical expertise do you have? Some open source data warehouses are more complex to set up and manage than others.
  • Your integration needs: Do you need to integrate your data warehouse with other applications?

Once you have weighed these factors, you can narrow down your choices and pick the open source alternative that is right for you.
