📃
Tech White Papers
  • 📃White Papers
  • 🪶Apache
    • Kafka (EN)
      • Kafka Connect
      • Kafka Streams
      • ksqlDB
    • Ignite (TR)
      • Clustering
        • Baseline Topology
      • Thin Clients
      • Data Modeling
        • Data Partitioning
        • Affinity Colocation
      • Memory Architecture
      • Persistence
        • External Storage
        • Swapping
        • Snapshot
        • Disk Compression
        • Persistence Tuning
        • Change Data Capture
      • Cluster Snapshots
      • Data Rebalancing
      • Data Streaming
      • Using Key-Value API
        • Basic Cache Operations
        • Working With Binary Objects
      • Performing Transactions
      • Working with SQL
        • Understanding Schemas
        • Defining Indexes
        • Distributed Joins
      • Distributed Computing
      • Machine Learning
      • Using Continuous Queries
      • Using Ignite Messaging
      • .NET Specific
        • LINQ
        • Serialization
      • Working With Events
        • Events
      • Performance and Troubleshooting
        • Generic Performance Tips
        • Memory and JVM Tuning
        • Persistence Tuning
        • SQL Performance Tuning
        • Thread Pools Tuning
    • Pulsar (TR)
  • 📜Data
    • ClickHouse (TR)
    • QuestDB (TR)
  • Comparison
    • Pulsar vs Kafka
    • ClickHouse vs QuestDB
  • Architectural
    • Microservices
      • Design Principles
      • Design Patterns
Powered by GitBook
On this page
  • Concepts
  • Sample Connectors
  • Why Not Write Your Own Integrations?

Was this helpful?

  1. Apache
  2. Kafka (EN)

Kafka Connect

03/02/2023

PreviousKafka (EN)NextKafka Streams

Last updated 2 years ago

Was this helpful?

Kafka Connect is a framework for connecting Kafka with external systems such as databases, key-value stores, search indexes, and file systems, using so-called Connectors.

Concepts

  • ( Sources, Sinks ⇒ re-usable piece of code(java jars) )

  • • ( Simple logic to alter each message produced by or sent to a connector )

  • ( Connectors + user configuration )

  • ( Tasks are executed by )

  • ( How Connect handles connector errors

  • ( The code used to translate data between Connect and the system sending or receiving data )

Sample Connectors

  • Elasticsearch Service Sink

  • HDFS 2 Sink

  • Amazon S3 Sink

  • **Replicator (**Replicator allows you to easily and reliably replicate topics from one Apache Kafka cluster to another.)

  • Jira

  • MySQL Source

Why Not Write Your Own Integrations?

All of this sounds great, but you’re probably asking, “Why Kafka Connect? Why not write our own integrations?”

Apache Kafka has its own very capable producer and consumer APIs and client libraries available in many languages, including C/C++, Java, Python, and Go. So it makes sense for you to wonder why you wouldn’t just write your own code to move data from a system and write it to Kafka—doesn’t it make sense to write a quick bit of consumer code to read from a topic and push it to a target system?

The problem is that if you are going to do this properly, then you need to be able to account for and handle failures, restarts, logging, scaling out and back down again elastically, and also running across multiple nodes. And that’s all before you’ve thought about serialization and data formats. Of course, once you’ve done all of these things, you’ve written something that is probably similar to Kafka Connect, but without the many years of development, testing, production validation, and community that exists around Kafka Connect. Even if you have built a better mousetrap, is all the time that you’ve spent writing that code to solve this problem worth it? Would your effort result in something that significantly differentiates your business from anyone else doing similar integration?

The bottom line is that integrating external data systems with Kafka is a solved problem. There may be a few edge cases where a bespoke solution is appropriate, but by and large, you’ll find that Kafka Connect will become the first thing you think of when you need to integrate a data system with Kafka.

🪶
Connectors
Transforms
Tasks
Workers
Dead Letter Queue
Converters
Much more on here…