How to register Delta Lake open table format files into Apache Hive Metastore (HMS)

Albert Wong
Mar 14, 2024

--

In my open data lakehouse tutorial at https://github.com/StarRocks/demo/tree/master/documentation-samples/datalakehouse, you can see that I have Delta Lake files stored in S3-compatible MinIO.

To register those files in Hive Metastore, all you need now is spark-sql.

Run spark-sql with the Delta Lake, Hive Metastore, and S3A configs:

spark-sql --packages io.delta:delta-core_2.12:2.0.0 \
--conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" \
--conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog" \
--conf "spark.sql.catalogImplementation=hive" \
--conf "spark.sql.hive.thriftServer.singleSession=false" \
--conf "spark.serializer=org.apache.spark.serializer.KryoSerializer" \
--conf "spark.hive.metastore.uris=thrift://hive-metastore:9083" \
--conf "spark.hive.metastore.schema.verification=false" \
--conf "spark.hadoop.fs.s3.impl=org.apache.hadoop.fs.s3a.S3AFileSystem" \
--conf "spark.hadoop.fs.s3n.impl=org.apache.hadoop.fs.s3a.S3AFileSystem" \
--conf "spark.hadoop.fs.s3a.endpoint=http://minio:9000" \
--conf "spark.hadoop.fs.s3a.path.style.access=true" \
--conf "spark.hadoop.fs.s3a.access.key=admin" \
--conf "spark.hadoop.fs.s3a.secret.key=password"
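If you script this launch rather than typing it, the flags are easier to review in one place. Here is a minimal Python sketch that assembles the same command; the endpoint, credentials, and metastore URI are the demo values from the tutorial environment, so adjust them for your own setup:

```python
# Assemble the spark-sql launch command from a single conf map.
# All values below mirror the demo command above (not production settings).
confs = {
    "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
    "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.delta.catalog.DeltaCatalog",
    "spark.sql.catalogImplementation": "hive",
    "spark.sql.hive.thriftServer.singleSession": "false",
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    "spark.hive.metastore.uris": "thrift://hive-metastore:9083",
    "spark.hive.metastore.schema.verification": "false",
    "spark.hadoop.fs.s3.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",
    "spark.hadoop.fs.s3n.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",
    "spark.hadoop.fs.s3a.endpoint": "http://minio:9000",
    "spark.hadoop.fs.s3a.path.style.access": "true",
    "spark.hadoop.fs.s3a.access.key": "admin",
    "spark.hadoop.fs.s3a.secret.key": "password",
}

cmd = ["spark-sql", "--packages", "io.delta:delta-core_2.12:2.0.0"]
for key, value in confs.items():
    cmd += ["--conf", f"{key}={value}"]

# Print for inspection; pass `cmd` to subprocess.run() to actually launch.
print(" ".join(cmd))
```

Keeping the confs in a dict also makes it easy to swap the MinIO endpoint or credentials per environment without editing the command string.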

Then register the Delta Lake files in HMS:

CREATE SCHEMA delta_db LOCATION 's3://warehouse/';

CREATE TABLE delta_db.user_behavior USING DELTA LOCATION 's3://huditest/hudi_ecommerce_user_behavior';

CREATE TABLE delta_db.item USING DELTA LOCATION 's3://huditest/hudi_ecommerce_item';
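If you have more than a couple of Delta tables to register, the DDL can be generated from a simple table-to-location map instead of written by hand. A sketch, using the schema and paths from the statements above (the `IF NOT EXISTS` guards are an addition so re-runs are harmless):

```python
# Generate the HMS registration DDL for several Delta tables at once.
# Schema name and S3 locations are the demo values from the statements above.
tables = {
    "user_behavior": "s3://huditest/hudi_ecommerce_user_behavior",
    "item": "s3://huditest/hudi_ecommerce_item",
}

ddl = ["CREATE SCHEMA IF NOT EXISTS delta_db LOCATION 's3://warehouse/';"]
for name, location in tables.items():
    ddl.append(
        f"CREATE TABLE IF NOT EXISTS delta_db.{name} USING DELTA LOCATION '{location}';"
    )

# Feed the result to spark-sql via -e, or write it to a file and use -f.
print("\n".join(ddl))
```

Once the statements have run, `SHOW TABLES IN delta_db;` in the same spark-sql session is a quick sanity check that the tables landed in the metastore.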

See more at https://xtable.apache.org/docs/hms#register-the-target-table-in-hive-metastore
