How to register Apache Iceberg files into Apache Hive Metastore (HMS)
Mar 14, 2024
In my open data lakehouse tutorial at https://github.com/StarRocks/demo/tree/master/documentation-samples/datalakehouse, the Apache Iceberg files live in S3-compatible MinIO object storage. All you need to register those tables into Hive Metastore is spark-sql.
Run spark-sql with Iceberg configs
spark-sql --packages org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:1.2.1 \
--conf "spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions" \
--conf "spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog" \
--conf "spark.sql.catalog.spark_catalog.type=hive" \
--conf "spark.sql.catalog.hive_prod=org.apache.iceberg.spark.SparkCatalog" \
--conf "spark.sql.catalog.hive_prod.type=hive"
--conf "spark.sql.catalogImplementation=hive"
--conf "spark.sql.hive.thriftServer.singleSession=false"
--conf "spark.serializer=org.apache.spark.serializer.KryoSerializer"
--conf "spark.hive.metastore.uris=thrift://hive-metastore:9083"
--conf "spark.hive.metastore.schema.verification=false"
--conf "spark.hadoop.fs.s3.impl=org.apache.hadoop.fs.s3a.S3AFileSystem"
--conf "spark.hadoop.fs.s3n.impl=org.apache.hadoop.fs.s3a.S3AFileSystem"
--conf "spark.hadoop.fs.s3a.endpoint=http://minio:9000"
--conf "spark.hadoop.fs.s3a.path.style.access=true"
--conf "spark.hadoop.fs.s3a.access.key=admin"
--conf "spark.hadoop.fs.s3a.secret.key=password"
Register the Iceberg files into HMS
CREATE SCHEMA iceberg_db LOCATION 's3://warehouse/';
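register_table takes the path to one specific metadata JSON file. If you are not sure which vN.metadata.json is current, you can list the table's metadata/ directory first. A minimal sketch with the AWS CLI, assuming the MinIO endpoint and credentials from the spark-sql configs above:

# Credentials and endpoint are assumptions matching the spark-sql configs above
export AWS_ACCESS_KEY_ID=admin
export AWS_SECRET_ACCESS_KEY=password
export AWS_DEFAULT_REGION=us-east-1
# List the Iceberg metadata files for the user_behavior table;
# register the highest-numbered vN.metadata.json (v2.metadata.json here)
aws s3 ls s3://huditest/hudi_ecommerce_user_behavior/metadata/ --endpoint-url http://minio:9000

With the metadata file identified, register each table: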
CALL hive_prod.system.register_table(
  table => 'hive_prod.iceberg_db.user_behavior',
  metadata_file => 's3://huditest/hudi_ecommerce_user_behavior/metadata/v2.metadata.json'
);
CALL hive_prod.system.register_table(
  table => 'hive_prod.iceberg_db.item',
  metadata_file => 's3://huditest/hudi_ecommerce_item/metadata/v2.metadata.json'
);
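If the registration succeeded, the tables are immediately queryable from the same spark-sql session (and from any other engine pointed at this HMS, such as the StarRocks setup in the tutorial linked above):

-- Spot-check the registered tables
SELECT COUNT(*) FROM hive_prod.iceberg_db.user_behavior;
SELECT * FROM hive_prod.iceberg_db.item LIMIT 5;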
For more detail, see the Apache XTable docs: https://xtable.apache.org/docs/hms#register-the-target-table-in-hive-metastore