FastAvro doesn’t work with Confluent Avro out of the box, and here’s how to get it working
If you use the Python fastavro library to serialize messages and send them to a Kafka topic, consumers that use Confluent’s Avro deserializer will fail with an “Unknown magic byte” error like this:
Caused by: org.apache.kafka.common.errors.SerializationException: Unknown magic byte!
at io.confluent.kafka.serializers.AbstractKafkaSchemaSerDe.getByteBuffer(AbstractKafkaSchemaSerDe.java:244)
at io.confluent.kafka.serializers.AbstractKafkaAvroDeserializer$DeserializationContext.<init>(AbstractKafkaAvroDeserializer.java:334)
at io.confluent.kafka.serializers.AbstractKafkaAvroDeserializer.deserialize(AbstractKafkaAvroDeserializer.java:151)
at org.apache.hudi.utilities.deser.KafkaAvroSchemaDeserializer.deserialize(KafkaAvroSchemaDeserializer.java:78)
at io.confluent.kafka.serializers.KafkaAvroDeserializer.deserialize(KafkaAvroDeserializer.java:53)
at org.apache.kafka.common.serialization.Deserializer.deserialize(Deserializer.java:60)
at org.apache.kafka.clients.consumer.internals.Fetcher.parseRecord(Fetcher.java:1426)
... 73 more
It took me a while, but with the help of my coworker Sagar, we found this article https://blog.datachef.co/deserialzing-confluent-avro-record-kafka-spark?x-host=blog.datachef.co#revealing-the-confidential-confluent-avro-format, which explained to us that a Confluent Kafka Avro message isn’t “just an Avro payload”. It’s special. It has 5 additional bytes prepended to the Avro-encoded body: 1 “magic byte” (always 0) followed by a 4-byte big-endian schema registry ID. Libraries like https://github.com/wbarnha/kafka-python-ng ship serializer and deserializer classes that already know how to handle this framing. If you use fastavro, you have to construct it yourself.
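Here’s a minimal sketch of that framing using only the standard library. The function names (`encode_confluent`, `decode_confluent`) are my own, not from fastavro or any Confluent library; the Avro body itself would come from fastavro’s `schemaless_writer`/`schemaless_reader`.

```python
import struct

MAGIC_BYTE = 0  # Confluent wire format always starts with a zero byte


def encode_confluent(schema_id: int, avro_payload: bytes) -> bytes:
    """Prepend the 5-byte Confluent header to an Avro binary payload.

    avro_payload is the schemaless Avro body, e.g. what you get from:
        buf = io.BytesIO()
        fastavro.schemaless_writer(buf, parsed_schema, record)
        avro_payload = buf.getvalue()
    """
    # 1 magic byte + 4-byte big-endian schema registry ID, then the Avro body
    return struct.pack(">bI", MAGIC_BYTE, schema_id) + avro_payload


def decode_confluent(message: bytes) -> tuple[int, bytes]:
    """Strip the header from a consumed message; returns (schema_id, avro_payload)."""
    magic, schema_id = struct.unpack(">bI", message[:5])
    if magic != MAGIC_BYTE:
        # This is the same condition the Java consumer was complaining about
        raise ValueError("Unknown magic byte!")
    return schema_id, message[5:]
```

On the produce side you’d frame the fastavro output with `encode_confluent` before handing it to your Kafka producer; on the consume side you’d call `decode_confluent` first, look up the schema by the returned ID, and then pass the payload to `schemaless_reader`.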
If you want a Python Kafka client that connects to Kafka/MSK and the AWS Glue Schema Registry, check out https://github.com/sagarlakshmipathy/python-avro-msk-glue-sr