How to register Delta Lake open table format files into Apache Hive Metastore (HMS)
Mar 14, 2024
In my open data lakehouse tutorial at https://github.com/StarRocks/demo/tree/master/documentation-samples/datalakehouse, you can see that I have Delta Lake files stored in S3-compatible MinIO.
All you need now is spark-sql.
Launch spark-sql with the Delta Lake and Hive Metastore configurations:
spark-sql --packages io.delta:delta-core_2.12:2.0.0 \
--conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" \
--conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog" \
--conf "spark.sql.catalogImplementation=hive" \
--conf "spark.sql.hive.thriftServer.singleSession=false" \
--conf "spark.serializer=org.apache.spark.serializer.KryoSerializer" \
--conf "spark.hive.metastore.uris=thrift://hive-metastore:9083" \
--conf "spark.hive.metastore.schema.verification=false" \
--conf "spark.hadoop.fs.s3.impl=org.apache.hadoop.fs.s3a.S3AFileSystem" \
--conf "spark.hadoop.fs.s3n.impl=org.apache.hadoop.fs.s3a.S3AFileSystem" \
--conf "spark.hadoop.fs.s3a.endpoint=http://minio:9000" \
--conf "spark.hadoop.fs.s3a.path.style.access=true" \
--conf "spark.hadoop.fs.s3a.access.key=admin" \
--conf "spark.hadoop.fs.s3a.secret.key=password"
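Once the spark-sql prompt comes up, you can do a quick sanity check that the session is actually talking to the Hive Metastore (the database names you see will depend on what is already registered in your HMS):

```sql
-- List databases known to the Hive Metastore; 'default' should appear at minimum
SHOW DATABASES;
```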
Register the Delta Lake tables into HMS:
CREATE SCHEMA delta_db LOCATION 's3://warehouse/';
CREATE TABLE delta_db.user_behavior USING DELTA LOCATION 's3://huditest/hudi_ecommerce_user_behavior';
CREATE TABLE delta_db.item USING DELTA LOCATION 's3://huditest/hudi_ecommerce_item';
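To verify the registration, you can query the tables straight from the same spark-sql session (a quick sanity check; the schema and table names follow the CREATE statements above, and row counts will depend on your data):

```sql
-- Confirm both tables are registered under the new schema
SHOW TABLES IN delta_db;

-- Read a few rows through the HMS-registered Delta table
SELECT * FROM delta_db.user_behavior LIMIT 5;
```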
See more at https://xtable.apache.org/docs/hms#register-the-target-table-in-hive-metastore