Kafka Connect
03/02/2023
Kafka Connect is a framework for connecting Kafka with external systems such as databases, key-value stores, search indexes, and file systems, using so-called Connectors.
Concepts
Connectors ( Sources and Sinks: reusable pieces of code, packaged as Java JARs )
Transforms ( Simple logic to alter each message produced by or sent to a connector )
Tasks ( Connectors + user configuration )
Workers ( The processes that execute tasks )
Dead Letter Queue ( How Connect handles connector errors )
Converters ( The code used to translate data between Connect and the system sending or receiving data )
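To make these concepts concrete, here is a minimal sketch of how they might appear together in a single sink connector configuration. The connector name, topic, Elasticsearch URL, field name, and dead letter queue topic are hypothetical placeholders. The configuration keys are ones Kafka Connect and the Elasticsearch sink are understood to accept, but verify them against the versions you actually run.

```python
import json


def build_sink_config(topic: str, es_url: str, dlq_topic: str) -> dict:
    """Return a sink connector configuration as a plain dict.

    All argument values are illustrative placeholders, not real endpoints.
    """
    return {
        # Connector: the reusable plugin (a Java class shipped in a JAR).
        "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
        "connection.url": es_url,
        "topics": topic,
        # Tasks: upper bound on how many tasks the workers may run for this connector.
        "tasks.max": "2",
        # Converters: how record keys and values are translated when read from Kafka.
        "key.converter": "org.apache.kafka.connect.storage.StringConverter",
        "value.converter": "org.apache.kafka.connect.json.JsonConverter",
        "value.converter.schemas.enable": "false",
        # Transforms: a single message transform that stamps each record value.
        "transforms": "addTimestamp",
        "transforms.addTimestamp.type": "org.apache.kafka.connect.transforms.InsertField$Value",
        "transforms.addTimestamp.timestamp.field": "ingested_at",
        # Dead Letter Queue: route records that fail to a separate topic instead of stopping.
        "errors.tolerance": "all",
        "errors.deadletterqueue.topic.name": dlq_topic,
        "errors.deadletterqueue.context.headers.enable": "true",
    }


def to_rest_payload(name: str, config: dict) -> str:
    """Wrap a config in the {"name", "config"} JSON body used to create a connector."""
    return json.dumps({"name": name, "config": config}, indent=2)


if __name__ == "__main__":
    cfg = build_sink_config("orders", "http://localhost:9200", "dlq-orders")
    print(to_rest_payload("es-sink-orders", cfg))
```

A payload of this shape is what gets POSTed to the `/connectors` endpoint of a worker's REST API (port 8083 by default). The workers then split the connector into tasks and run them.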
Sample Connectors
Elasticsearch Service Sink
HDFS 2 Sink
Amazon S3 Sink
Replicator ( Allows you to easily and reliably replicate topics from one Apache Kafka cluster to another )
Jira
MySQL Source
Why Not Write Your Own Integrations?
All of this sounds great, but you're probably asking, "Why Kafka Connect? Why not write our own integrations?"
Apache Kafka has its own very capable producer and consumer APIs and client libraries available in many languages, including C/C++, Java, Python, and Go. So it makes sense for you to wonder why you wouldn't just write your own code to move data from a system and write it to Kafka. Doesn't it make sense to write a quick bit of consumer code to read from a topic and push it to a target system?
The problem is that if you are going to do this properly, then you need to be able to account for and handle failures, restarts, logging, scaling out and back down again elastically, and also running across multiple nodes. And that's all before you've thought about serialization and data formats. Of course, once you've done all of these things, you've written something that is probably similar to Kafka Connect, but without the many years of development, testing, production validation, and community that exists around Kafka Connect. Even if you have built a better mousetrap, is all the time that you've spent writing that code to solve this problem worth it? Would your effort result in something that significantly differentiates your business from anyone else doing similar integration?
The bottom line is that integrating external data systems with Kafka is a solved problem. There may be a few edge cases where a bespoke solution is appropriate, but by and large, you'll find that Kafka Connect will become the first thing you think of when you need to integrate a data system with Kafka.
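As a rough illustration of how a "quick bit of consumer code" grows, the sketch below covers only one of those concerns: retrying a failed write and setting aside messages that never succeed. It runs against in-memory stand-ins. `deliver_all`, `flaky_sink`, and the list used as a dead letter queue are hypothetical names made up for this example. No Kafka client is involved, and offsets, restarts, scaling, logging, and serialization are all still missing.

```python
from typing import Callable, Iterable, List, Tuple


def deliver_all(
    messages: Iterable[str],
    write: Callable[[str], None],
    max_attempts: int = 3,
) -> Tuple[int, List[str]]:
    """Push each message to `write`, retrying on failure.

    Returns (number delivered, messages given up on after max_attempts).
    """
    delivered = 0
    dead_letters: List[str] = []
    for message in messages:
        for attempt in range(1, max_attempts + 1):
            try:
                write(message)
                delivered += 1
                break  # success, move on to the next message
            except Exception:
                # Out of retries: park the message instead of blocking the pipeline.
                if attempt == max_attempts:
                    dead_letters.append(message)
    return delivered, dead_letters


def flaky_sink(message: str) -> None:
    """Hypothetical target system that rejects blank input."""
    if not message.strip():
        raise ValueError("empty message")


if __name__ == "__main__":
    print(deliver_all(["a", "", "b"], flaky_sink))  # prints (2, [''])
```

Even this small loop leaves open questions that Kafka Connect already answers, such as where the dead letters go, how progress is tracked across restarts, and how work is shared between nodes.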