Ray.io + Apache Hudi
Historically, documentation for using Apache Hudi with Ray.io has been limited due to the absence of a native Python library for Hudi…
Apr 9

Configure AWS’ EMR Trino with Apache Hudi + AWS Glue Catalog + AWS S3
EMR offers a Trino CLI environment, but documentation for configuring it with Hudi and Glue is limited. These are the relevant…
Apr 9

Top Open Source Software for mirroring/offload/CDC data from OLTP to Data Lakehouse or OLAP
Offloading data from an Online Transaction Processing (OLTP) system to a Data Lakehouse or an Online Analytical Processing (OLAP) system is…
Mar 20

What is missing in AWS EMR and Databricks. Autoscaling and …
If you’ve been using AWS EMR for a while, you’ll have noticed that some things are suboptimal. Here’s a short list.
Mar 18

FastAvro doesn’t work with Confluent AVRO and here’s how to get it working
You’ll get an error like "Unknown magic byte" when using the Python FastAvro library to send messages to a Kafka topic.
Nov 22, 2024

Connecting AWS Athena to Preset.io
You would think that https://docs.preset.io/docs/aws-athena is enough to get it working, but it’s not.
Nov 14, 2024

Installing DataHub Core (docker compose) with personal tokens
DataHub does not provide a free managed solution, so you need to install DataHub Core. DataHub Core is the open source version of DataHub…
Nov 14, 2024

Python Client for AWS MSK and AWS Glue Schema Registry and AVRO message payload
While Python is a powerful language for working with AWS MSK, there’s currently a gap in official documentation and examples for…
Nov 13, 2024

Python Client with AWS MSK and Confluent Schema Registry and AVRO message payload
Download the official Python Client at https://github.com/confluentinc/confluent-kafka-python/blob/master/examples/avro_producer.py.
Nov 13, 2024

Creating and Connecting an AWS MSK cluster using SASL
Create an AWS Secret in AWS Secrets Manager (other type of secret, 2 key/value rows of username and password; need to create a new…
Nov 6, 2024