The hottest area of database design: Querying billions of rows per second with SIMD

2 min read · Jul 18, 2023

As a database engineer, I get asked all the time, “What is the hottest area of database design right now?” My answer is SIMD, which stands for single instruction, multiple data: one CPU instruction operates on several values at once, which lets a database process a lot of data very, very fast.

Josh Weinstein said it best (full article linked below).

SIMD. Single instruction multi data. You may not have heard of these four words before, but they have the power to make software run at lightning speed. They can accelerate actions like copying or searching data 10x, 20x or more times faster than with traditionally written code. The CPUs that power our computers today possess a special set of instructions that can process data simultaneously, and in parallel. In fact, the sets of these instructions have been around for a number of years. They are seldom explored or discussed, but have the potential to provide unparalleled performance in a world of ever growing software capacity.
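To make the idea concrete, here is a minimal sketch of a filtered sum (think `SELECT SUM(x) WHERE x > threshold`) written two ways. It is not taken from any of the databases mentioned in this post. The function names and the lane count of 8 are my own assumptions for illustration, and pure Python does not emit SIMD instructions, so this only models the data flow: each line inside the lane-wise loop stands in for a single vector instruction that real SIMD code would apply to all lanes at once.

```python
LANES = 8  # assumed vector width: 8 x 32-bit values, as in a 256-bit register


def scalar_filtered_sum(values, threshold):
    """Traditional loop: one compare, one branch, and one add per element."""
    total = 0
    for v in values:
        if v > threshold:
            total += v
    return total


def lanewise_filtered_sum(values, threshold):
    """Model of the SIMD version: each loop step handles LANES elements.

    Python still visits the lanes one by one, so this illustrates the
    structure of vectorized code, not its speed.
    """
    acc = [0] * LANES  # one running partial sum per lane
    full = len(values) - len(values) % LANES
    for i in range(0, full, LANES):
        chunk = values[i:i + LANES]                           # vector load
        mask = [v > threshold for v in chunk]                 # vector compare
        kept = [v if m else 0 for v, m in zip(chunk, mask)]   # masked blend with 0
        acc = [a + k for a, k in zip(acc, kept)]              # vector add
    total = sum(acc)  # horizontal reduction of the per-lane sums
    # Scalar tail for leftover elements that do not fill a whole vector.
    for v in values[full:]:
        if v > threshold:
            total += v
    return total


if __name__ == "__main__":
    data = list(range(100))
    print(scalar_filtered_sum(data, 49), lanewise_filtered_sum(data, 49))
```

Notice that the lane-wise loop has no `if` in its body. The comparison produces a mask and the mask selects which values contribute, which is the branch-free shape that vectorized query engines typically rely on.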

The idea of using SIMD in databases has been around for a while. Academic papers such as http://www.cs.columbia.edu/~kar/pubsk/simd.pdf and https://15721.courses.cs.cmu.edu/spring2016/papers/p1493-polychroniou.pdf discussed its potential as early as the 2000s, but only recently have database developers started to build it into database products.

As of right now, I’m aware of only four databases that use SIMD at the core of their query layer: StarRocks (OLAP), Apache Druid (OLAP), ClickHouse (OLAP) and QuestDB (time-series). All of them are fast. However, among the OLAP databases I mentioned, only StarRocks does performant JOINs at scale. Read more about how StarRocks implements this at https://docs.starrocks.io/en-us/2.5/introduction/Features

Performance difference of various OLAP databases

Graphic of JOIN performance on the TPC-DS test data, comparing StarRocks and Trino (I would expect similar results with AWS Athena and PrestoDB)

Graphic of the SSB flat-table benchmark across the SIMD databases StarRocks, ClickHouse and Apache Druid. Note: ClickHouse and Druid only partially support JOINs, so we compared denormalized tables.

More info at https://github.com/alberttwong/databasecomparison

Aggregating billions of rows per second with SIMD | QuestDB

How SIMD instructions make aggregations faster in QuestDB, including benchmark results and a comparison with Postgres.

questdb.io

Searching Gigabytes of Data Per Second With SIMD

SIMD. Single instruction multi data. You may not have heard of these four words before, but they have the power to make…

medium.com


Written by Albert Wong
