If you have an application that uses Apache Kafka and need to create a Kafka installation, perhaps you should give Azure EventHub a try first, since it now supports the Kafka interface. With EventHub you get a managed service that is very powerful and lets you skip provisioning Kafka on beefy VMs.
Deploy the Kafka-enabled EventHub
There is a page in the Microsoft documentation for how to do this step https://docs.microsoft.com/en-us/azure/event-hubs/event-hubs-quickstart-kafka-enabled-event-hubs and in that tutorial you will also find a sample program on GitHub that you can compile and test. The docs forget to mention that once you’ve created the EventHub namespace, you must also create an EventHub named “test” for the sample program to work.
The magic trick to making EventHub support the Kafka APIs is selecting the “Enable Kafka” checkbox when you create the namespace.
Test driving Kafka/EventHub with a sample app
That the Microsoft sample application works is perhaps no surprise, since it was targeted and tested for use with Kafka/EventHub. Therefore, let’s test functionality with an app that was written for Kafka and doesn’t even know what Azure EventHub is.
Looking for one, I found this page https://mapr.com/blog/getting-started-sample-programs-apache-kafka-09/ that describes the process. It uses a sample app on GitHub https://github.com/mapr-demos/kafka-sample-programs that is written in Java and builds with Maven. I git-cloned the repo and made the following changes:
- pom.xml – the dependency version for kafka-clients must be changed from 0.9.0 to 1.0.0. The 0.9.0 Kafka client will not work with Kafka/EventHub, which requires SASL_SSL to connect
<dependencies>
  <dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka-clients</artifactId>
    <version>1.0.0</version>
  </dependency>
</dependencies>
- Whoever created the sample didn’t think of locales that use a comma as the decimal separator. If you, like me, write PI as 3,14 and have your laptop set for that, the consumer app will throw exceptions, since the producer sends json data where the decimal point is a comma. The solution is to add Locale.ROOT to the String.format method calls (a small demonstration of the problem follows after this list)
// Producer.java
import java.util.Locale;

for (int i = 0; i < 10000; i++) {  // send lots of messages
    producer.send(new ProducerRecord<String, String>(
        "fast-messages",
        String.format(Locale.ROOT,
            "{\"type\":\"test\", \"t\":%.3f, \"k\":%d}",
            System.nanoTime() * 1e-9, i)));
}
- I also changed the for-loop so it doesn’t send 1 million messages, since that is a little bit over the top for my test case.
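To see why Locale.ROOT matters, here is a minimal standalone sketch (the class name LocaleDemo and the hard-coded Swedish locale are just my illustration, not part of the sample):

import java.util.Locale;

public class LocaleDemo {
    public static void main(String[] args) {
        // Force a locale that uses comma as the decimal separator, e.g. Swedish
        Locale.setDefault(new Locale("sv", "SE"));

        // Prints "3,142" -- not a valid JSON number, so the consumer chokes
        System.out.println(String.format("%.3f", Math.PI));

        // Prints "3.142" regardless of the platform locale
        System.out.println(String.format(Locale.ROOT, "%.3f", Math.PI));
    }
}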
Changing the connection properties
As described in the Microsoft docs referenced above, you need to modify the properties files in the resources folder so that the clients connect to Azure EventHub.
bootstrap.servers=myevthubns.servicebus.windows.net:9093
acks=all
retries=0
batch.size=16384
#auto.commit.interval.ms=1000
linger.ms=0
key.serializer=org.apache.kafka.common.serialization.StringSerializer
value.serializer=org.apache.kafka.common.serialization.StringSerializer
#block.on.buffer.full=true
security.protocol=SASL_SSL
sasl.mechanism=PLAIN
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="$ConnectionString" password="Endpoint=sb://myevthubns.servicebus.windows.net/;SharedAccessKeyName=KafkaSasPolicyMSL;SharedAccessKey=...";
You need to change bootstrap.servers to point to the FQDN of your EventHub namespace. Then you need to add the last three lines, which enable SASL and also hold the complete EventHub connection string (which you can find in the Azure portal). I’ve also created a dedicated SAS Policy for the app so I don’t use the default one.
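As a sanity check that the settings are picked up, a minimal producer along these lines should be able to send a single message straight to the Kafka-enabled EventHub. The class name and the producer.props resource path are my own sketch; the topic name fast-messages comes from the sample:

import java.io.InputStream;
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class EventHubSmokeTest {
    public static void main(String[] args) throws Exception {
        // Load the modified properties file with the SASL_SSL settings
        Properties props = new Properties();
        try (InputStream in = EventHubSmokeTest.class.getResourceAsStream("/producer.props")) {
            props.load(in);
        }

        // The Kafka client neither knows nor cares that the broker is EventHub
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("fast-messages",
                    "{\"type\":\"test\", \"t\":0.001, \"k\":0}")).get(); // block until acked
        }
    }
}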
Building the sample (mvn clean package) and running it proves that an application written against the Kafka APIs can work directly with a Kafka-enabled EventHub.
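The consumer side is just as transparent. Here is a minimal sketch, assuming a consumer.props that carries the same bootstrap.servers and SASL settings as above, plus a group.id and the String deserializers (the class name is mine):

import java.io.InputStream;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class EventHubConsumeTest {
    public static void main(String[] args) throws Exception {
        // Same SASL_SSL plumbing as the producer, loaded from the resources folder
        Properties props = new Properties();
        try (InputStream in = EventHubConsumeTest.class.getResourceAsStream("/consumer.props")) {
            props.load(in);
        }

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("fast-messages"));
            // Poll a few times and print whatever the producer wrote
            for (int i = 0; i < 10; i++) {
                ConsumerRecords<String, String> records = consumer.poll(1000);
                for (ConsumerRecord<String, String> record : records) {
                    System.out.println(record.value());
                }
            }
        }
    }
}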
Before you run the producer, you should create the topics it uses as EventHubs in the Azure portal. If you don’t, the producer will run painfully slowly, since it gets a timeout on each send it does.
At this point I changed the for-loop back again to send 1 million messages in order to give EventHub a little workout. The Metrics dashboard in the Azure portal illustrates the two test runs. You can see the first batch of 10000 messages being just a blip on the chart. When sending 1 million messages, we get a throughput of around 6MB/second over a public internet connection, which is quite good.
So the benefits of using EventHub with Kafka enabled are quite substantial: you get a managed service that is performant and scalable, with an affordable pay-as-you-go model.