Kafka (EN)

03/02/2023

Apache Kafka is an open-source distributed event streaming platform used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications.

  • Message broker

  • Append-only log with offset-based message reads/writes; sequential disk I/O, batching and zero-copy transfer are among the reasons Kafka is fast

  • Written in Scala and Java

  • Named after the writer Franz Kafka

  • Originally developed at LinkedIn

  • Developed under the leadership of Jay Kreps
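The offset-based read/write idea can be illustrated with a small Python sketch. It is a toy model and not Kafka's storage engine. Real partitions are split into segment files with separate offset index files, and the names PartitionLog, append, read and end_offset are hypothetical. What it shows is that a record is only ever appended, and a reader asks for data "starting at offset N" instead of searching. This is why consumers can resume or replay cheaply.

```python
# Toy model of a single partition: an append-only log where every record is
# addressed by its offset (hypothetical class, not a Kafka API).

class PartitionLog:
    def __init__(self):
        self._records = []  # records are only ever appended, never modified

    def append(self, value):
        """Append a record and return its offset (its position in the log)."""
        self._records.append(value)
        return len(self._records) - 1

    def read(self, offset, max_records=10):
        """Return up to max_records records starting at the given offset."""
        return self._records[offset:offset + max_records]

    def end_offset(self):
        """Offset that the next appended record will receive."""
        return len(self._records)


log = PartitionLog()
for event in ["signup", "login", "purchase"]:
    print(log.append(event))   # prints 0, 1, 2 on separate lines
print(log.read(1))             # ['login', 'purchase']
print(log.end_offset())        # 3
```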

Features

  • Fast ( high throughput and low latency )

  • Scalable ( horizontally scalable by adding nodes (brokers) and partitions )

  • Reliable ( fault tolerant and distributed )

  • Durable ( messages persisted to disk in an immutable, replicated log; with replication and acks = all, acknowledged messages survive broker failures )

Use Cases

  • Application analytics

  • Monitoring/Metrics

  • Log collection

  • Stream processing

  • Recommendation engine

  • Fraud and anomaly detection

  • System integration

Companies Using Kafka

  • Uber

  • Netflix

  • Spotify

  • Activision

  • Slack

  • Pinterest

  • LinkedIn

  • Shopify

Concepts

  • Producer

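    • Producer acknowledgment ( acks setting )

      • acks = 0: fastest but riskiest, messages can be lost. The producer sends the message to Kafka and keeps going without waiting for any response.

      • acks = 1: moderately fast and safe, loss is less likely. The producer waits until the partition leader has written the message but not until the followers have copied it, so the message can still be lost if the leader fails before replication.

      • acks = all (or -1): slowest but safest. The producer waits until the leader and all in-sync replicas have the message, so an acknowledged message is lost only if every in-sync replica fails (setting min.insync.replicas to 2 or more keeps that set from shrinking to the leader alone).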

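  • Consumer ( within a consumer group each partition is read by only one consumer, so 1 consumer per partition ⇒ best practice for maximum parallelism; extra consumers stay idle )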

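    • Read Strategies ( delivery semantics )

      • At Most Once ( offset committed before processing; messages may be lost but are never reprocessed )

      • At Least Once ( offset committed after processing; messages are never lost but may be reprocessed; most used )

      • Exactly Once ( idempotent producer plus transactions; performance impact )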

  • Partition ( ordered, append-only sequence of events/messages/records; ordering is guaranteed only within a partition )

  • Record/Event/Message ( each item in partition )

  • Offset ( message position/index in partition )

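  • Topic ( named stream of records, split into one or more partitions )

  • Kafka Broker ( Kafka server that stores topic partitions and serves producers and consumers; a cluster has several brokers )

  • Consumer Group ( consumers sharing a group id split a topic's partitions among themselves for parallel processing; every separate group receives all messages, as in the pub-sub pattern )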

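  • Distributed Systems

    • Leader ( Master: the replica that handles writes, and by default reads, for a partition )

    • Follower ( Slave: a replica that copies the leader's log and can be elected leader if the leader fails )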

    • Topic Based Scaling

    • Partition Based Scaling

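  • Kafka Connect ( framework for moving data between Kafka and external systems through connectors )

  • Kafka Streams ( client library for building stream processing applications on top of Kafka )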

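  • Apache ZooKeeper ( cluster coordination and metadata ⇒ Which brokers are alive? Who is the controller? The controller then decides which replica is the leader of each partition. Newer Kafka versions can run without ZooKeeper in the built-in KRaft mode )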

  • Confluent Cloud

  • Apache Flink ( Stateful Computations over Data Streams )

  • Apache Hadoop
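The trade-off between the producer acks levels can be made concrete with a toy simulation. It is a hypothetical model, not the Kafka client API. It assumes one leader and two followers that are both in sync, and that followers copy a record only after the leader has written it. The function silently_lost and its parameters are illustrative names. The function reports whether the producer believes a send succeeded while the record is actually gone.

```python
# Toy model of the producer acks setting (hypothetical, not the Kafka client API).
# Assumptions: one leader and two in-sync followers; followers copy a record
# only after the leader has written it.

def silently_lost(acks, reaches_leader=True, leader_crashes_before_replication=False):
    """Return True if the producer treats the send as successful
    but no surviving replica has the record."""
    leader_has_it = reaches_leader
    followers_have_it = reaches_leader and not leader_crashes_before_replication

    if acks == 0:
        sent_ok = True                  # fire-and-forget: no response is awaited
    elif acks == 1:
        sent_ok = leader_has_it         # success once the leader has written it
    elif acks == "all":
        sent_ok = followers_have_it     # success once all in-sync replicas have it
    else:
        raise ValueError("acks must be 0, 1 or 'all'")

    if leader_crashes_before_replication:
        durable = followers_have_it     # a follower is elected as the new leader
    else:
        durable = leader_has_it
    return sent_ok and not durable


scenarios = {
    "network drop": {"reaches_leader": False},
    "leader crash": {"leader_crashes_before_replication": True},
}
for name, options in scenarios.items():
    for acks in (0, 1, "all"):
        print(f"{name}, acks={acks}: silently lost = {silently_lost(acks, **options)}")
```

In this model acks = 0 loses the record silently in both scenarios, and acks = 1 does so only in the leader-crash scenario. With acks = all the record is never lost silently, because the producer receives no success response and can retry. The model ignores the case where the in-sync replica set shrinks to the leader alone, which is what min.insync.replicas guards against.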
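The "1 consumer per partition" rule and consumer-group parallelism can be illustrated by how a group divides one topic's partitions. The sketch below is simplified to a single topic and written in the spirit of Kafka's range assignment strategy. In practice the client's configured assignor does this work, and assign_partitions is a hypothetical name. Each partition goes to exactly one consumer in the group, and consumers beyond the partition count receive nothing.

```python
# Simplified, single-topic sketch of range-style partition assignment inside
# one consumer group (hypothetical helper; real clients use a configured assignor).

def assign_partitions(num_partitions, consumers):
    """Map each consumer to a contiguous range of partition numbers."""
    if not consumers:
        return {}
    consumers = sorted(consumers)
    base, extra = divmod(num_partitions, len(consumers))
    assignment, start = {}, 0
    for i, consumer in enumerate(consumers):
        count = base + (1 if i < extra else 0)  # first `extra` consumers get one more
        assignment[consumer] = list(range(start, start + count))
        start += count
    return assignment


print(assign_partitions(4, ["c1", "c2"]))
# {'c1': [0, 1], 'c2': [2, 3]}
print(assign_partitions(4, ["c1", "c2", "c3", "c4", "c5"]))
# {'c1': [0], 'c2': [1], 'c3': [2], 'c4': [3], 'c5': []}  -> c5 sits idle
```

A second consumer group reading the same topic would compute its own, independent assignment, so it would also receive every partition.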
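The difference between the At Most Once and At Least Once read strategies comes down to when the consumer commits its offset relative to processing. The toy simulation below commits after every record, which real consumers usually do not (they commit in batches). It then crashes between the two steps for one record and restarts from the last committed offset. The function name and the mode strings are hypothetical. Exactly Once, which relies on idempotent producers and transactions, is not modeled.

```python
# Toy model of consumer delivery semantics (hypothetical, not a Kafka API).
# The consumer crashes at offset `crash_at` between its two steps, then
# restarts from the last committed offset and processes the rest.

def consume_with_crash(records, mode, crash_at):
    processed = []
    committed = 0  # offset of the next record to read after a restart

    for offset in range(len(records)):
        if mode == "at-most-once":
            committed = offset + 1              # 1) commit first
            if offset == crash_at:
                break                           # crash before processing -> record lost
            processed.append(records[offset])   # 2) then process
        elif mode == "at-least-once":
            processed.append(records[offset])   # 1) process first
            if offset == crash_at:
                break                           # crash before committing -> reprocessed
            committed = offset + 1              # 2) then commit
        else:
            raise ValueError("unknown mode: " + mode)

    # Restart: resume from the last committed offset (no further crash).
    for offset in range(committed, len(records)):
        processed.append(records[offset])
    return processed


print(consume_with_crash(["a", "b", "c"], "at-most-once", crash_at=1))
# ['a', 'c']            -> 'b' is lost
print(consume_with_crash(["a", "b", "c"], "at-least-once", crash_at=1))
# ['a', 'b', 'b', 'c']  -> 'b' is processed twice
```

This is why At Least Once is usually paired with idempotent processing: duplicates are easier to tolerate than silent gaps.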

Key Differences With Other Messaging Systems

  • Kafka differs from traditional message queues in several ways. Kafka retains messages after they have been consumed, until the topic's retention time or size limit is reached. In contrast, RabbitMQ (with classic queues) removes a message from the queue as soon as the consumer acknowledges it.

  • RabbitMQ pushes messages to consumers, whereas Kafka consumers pull (fetch) messages from the brokers.

  • Kafka scales horizontally by adding brokers and partitions, whereas traditional message queues are typically scaled vertically.
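The retention and pull differences above can be contrasted in a toy model. On the Kafka-style side, records stay in the log, and each consumer group only tracks its own offset and pulls when it is ready. On the queue-style side, a consumed message is removed for everyone. The queue only models deletion, not push delivery. RetainedLog, DeletingQueue, publish, poll and consume are hypothetical names. Real Kafka retention is also bounded by topic settings such as retention.ms and retention.bytes, which are not modeled.

```python
from collections import deque

# Kafka-style (toy): records are retained; each consumer group pulls from
# its own offset.
class RetainedLog:
    def __init__(self):
        self.records = []
        self.group_offsets = {}   # consumer group id -> next offset to read

    def publish(self, value):
        self.records.append(value)

    def poll(self, group, max_records=10):
        """Consumer-driven pull: return the next records for this group."""
        start = self.group_offsets.get(group, 0)
        batch = self.records[start:start + max_records]
        self.group_offsets[group] = start + len(batch)
        return batch


# Queue-style (toy): a consumed message is removed from the queue.
class DeletingQueue:
    def __init__(self):
        self.messages = deque()

    def publish(self, value):
        self.messages.append(value)

    def consume(self):
        return self.messages.popleft() if self.messages else None


log, queue = RetainedLog(), DeletingQueue()
for event in ["e1", "e2", "e3"]:
    log.publish(event)
    queue.publish(event)

print(log.poll("billing"))     # ['e1', 'e2', 'e3']
print(log.poll("analytics"))   # ['e1', 'e2', 'e3']  (same data, independent offset)
print(len(log.records))        # 3  (nothing deleted after consumption)

print(queue.consume())         # e1
print(len(queue.messages))     # 2  (e1 is gone for every other consumer)
```

Because the log keeps the data, a new consumer group can also start from offset 0 later and replay the full history.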
