Easiest way to load TPC-DS data into StarRocks

Albert Wong
1 min readOct 6, 2023

--

StarRocks with TPC DS data

At StarRocks/Celerdata, we do a lot of testing. One the tests I had to run was TPC-DS benchmarking in StarRocks. Here are the steps that I used.

  1. Use duckDB to generate the TPC-DS data set

Install duckDB and then run the following commands in the duckdb shell.

INSTALL tpcds;
LOAD tpcds;
SELECT * FROM dsdgen(sf=1);
EXPORT DATABASE ‘TARGET_DIR’ (
FORMAT PARQUET);

This will generate 24 tables and 2 scripts (schema.sql and load.sql)

Note: Generating TPC-DS data is painful. TPC.org scripts are not user friendly. Using duckDB is the easiest way by FAR.

Tip: If you’re going to load data for an OLAP database, I would export to parquet format. If you are going to load data for an OLTP database, use CSV.

2. Load the parquet files into StarRocks

Using https://docs.starrocks.io/en-us/latest/sql-reference/sql-functions/table-functions/files as a reference, run the follow SQL command.

INSERT INTO item SELECT * FROM FILES( “path” = “s3://item.parquet”, “format” = “parquet”, “aws.s3.access_key” = “XXXXXXXXXX”, “aws.s3.secret_key” = “YYYYYYYYYY”, “aws.s3.region” = “us-west-2”

--

--

Albert Wong
Albert Wong

Written by Albert Wong

#eCommerce #Java #Database #k8s #Automation. Hobbies: #BoardGames #Comics #Skeet #VideoGames #Pinball #Magic #YelpElite #Travel #Candy

No responses yet