Configure AWS’ EMR Trino with Apache Hudi + AWS Glue Catalog + AWS S3
Apr 9, 2025
EMR offers a Trino CLI environment, but documentation for configuring it with Hudi and Glue is limited. These are the relevant configurations.
0. First, log in to the primary EMR node
❯ aws ssm start-session --target i-07a771f39b703e0d3
Starting session with SessionId: botocore-session-1744161528-anuqba7noocecf5995hoel346e
sh-5.2$ sudo su -
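If the instance ID of the primary node is not at hand, it can be looked up from the cluster ID before starting the session. The following is a minimal sketch, assuming the AWS CLI is configured and the cluster uses uniform instance groups rather than instance fleets. The helper name primary_instance_id and the cluster ID j-EXAMPLE are hypothetical placeholders, not values from this setup.

```shell
# Sketch: resolve the EC2 instance ID of the EMR primary node.
# primary_instance_id is a hypothetical helper; $1 is your own cluster ID.
primary_instance_id() {
  # MASTER is the instance group type the EMR API uses for the primary node.
  aws emr list-instances \
    --cluster-id "$1" \
    --instance-group-types MASTER \
    --query 'Instances[0].Ec2InstanceId' \
    --output text
}

# Usage (j-EXAMPLE is a placeholder cluster ID):
#   aws ssm start-session --target "$(primary_instance_id j-EXAMPLE)"
```

Wrapping the lookup in a function keeps the session command a one-liner and avoids copying instance IDs by hand.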
- Modify the Trino configuration to integrate Hudi. If a hudi.properties file is absent, create it.
- Populate the hudi.properties file with the necessary Hudi connection details.
- Restart the Trino service.
- After the restart, verify the Hudi integration by examining the hudi catalog for available schemas/databases.
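The first two steps can be scripted. This is a minimal sketch, not a definitive setup: write_hudi_catalog is a hypothetical helper, the default directory is the catalog path used on the node in this article, and the two properties it writes are the Hudi connector name and the Glue metastore setting from the file shown in this article.

```shell
# Sketch: create the Hudi catalog file only if it does not exist yet.
# write_hudi_catalog is a hypothetical helper; $1 optionally overrides
# the catalog directory (default: the path used in this article).
write_hudi_catalog() {
  catalog_dir="${1:-/etc/trino/conf/catalog}"
  target="$catalog_dir/hudi.properties"
  if [ -f "$target" ]; then
    # Leave an existing file untouched so manual edits are not lost.
    echo "exists: $target"
    return 0
  fi
  printf '%s\n' \
    'connector.name=hudi' \
    'hive.metastore=glue' > "$target"
  echo "created: $target"
}

# On the primary node, as root:
#   write_hudi_catalog && systemctl restart trino-server
```

Because the helper skips an existing file, running it a second time is harmless.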
[root@ip-10-0-105-40 ~]# cat /etc/trino/conf/catalog/hudi.properties
connector.name=hudi
hive.metastore=glue
[root@ip-10-0-105-40 ~]# sudo systemctl restart trino-server
[root@ip-10-0-105-40 ~]# trino-cli --catalog=hudi
trino> show catalogs;
Catalog
---------
hive
hudi
system
(3 rows)
trino> show schemas from hudi;
Schema
-----------------------
cdc_db
cdc_inc
database_001
database_0103_002
db1
db2507
db_001
trino> use cdc_inc;
USE
trino:cdc_inc> show tables;
Table
------------------
cdc_inc_table_ro
cdc_inc_table_rt
(2 rows)
Query 20250409_014150_00009_2w5es, FINISHED, 3 nodes
Splits: 20 total, 20 done (100.00%)
0.32 [2 rows, 66B] [6 rows/s, 206B/s]
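The same checks can be run without the interactive prompt, which is handy in a bootstrap or smoke-test script. This is a sketch under stated assumptions: list_hudi_schemas and list_hudi_tables are hypothetical helpers, and the TRINO_CLI variable is an illustrative override that defaults to the trino-cli command used above. The --execute, --catalog and --schema options belong to the Trino CLI.

```shell
# Sketch: non-interactive versions of the checks above.
# TRINO_CLI is an illustrative override; it defaults to trino-cli.
list_hudi_schemas() {
  "${TRINO_CLI:-trino-cli}" --catalog=hudi --execute 'SHOW SCHEMAS'
}

# $1 is the schema (Glue database) to inspect, e.g. cdc_inc.
list_hudi_tables() {
  "${TRINO_CLI:-trino-cli}" --catalog=hudi --schema="$1" --execute 'SHOW TABLES'
}

# Usage on the primary node:
#   list_hudi_schemas
#   list_hudi_tables cdc_inc
```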