What is TPC-DS and why should you care?

Albert Wong
2 min readOct 12, 2023

--

StarRocks vs Trino TPC-DS 1TB with cache

TL-DR; It’s a schema of 24 tables built for a retail scenario. There are 99 complex SQL queries that execute:

  • Sequential scans of large amounts of data
  • Aggregation of large amounts of data
  • Multi-table joins
  • Possibly extensive sorting

To try to answer questions like:

Query #57: Find the item brands and categories for each call center and their monthly sales figures for a specified year, where the monthly sales figure deviated more than 10% of the average monthly sales for the year, sorted by deviation and call center. Report the sales deviation from the previous and following month.

Longer answer

I need a sample data set and queries that represent a real life life scenario. I could build something from scratch or export some of the production data or use a public reference. Here is where TPC comes into play. TPC is the Transaction Processing Performance Council and they created a tool to generate data and they provide a bunch of real life business queries that you can run against that generated data. This is a good way to benchmark databases against each other. The latest schema and queries can be found at https://www.tpc.org/TPC_Documents_Current_Versions/pdf/TPC-DS_v3.2.0.pdf.

For instance, StarRocks publishes a benchmark that they are 5x faster than Trino (see benchmark report at https://www.starrocks.io/blog/benchmark-test). They also have benchmarks against other databases like Apache Druid and Clickhouse.

--

--

Albert Wong

#eCommerce #Java #Database #k8s #Automation. Hobbies: #BoardGames #Comics #Skeet #VideoGames #Pinball #Magic #YelpElite #Travel #Candy