Kafka (EN)

03/02/2023

Apache Kafka is an open-source distributed event streaming platform used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications.

Message broker
Offset-based ( index-based ) message reads and sequential appends to a log, which is a large part of why Kafka is fast
Written in Scala and Java
Named after Franz Kafka
Originally developed at LinkedIn
Developed under the leadership of Jay Kreps

Features

Fast ( high throughput and low latency )
Scalable ( horizontally scalable by adding brokers and partitions )
Reliable ( fault tolerant and distributed )
Durable ( messages are persisted to disk in an immutable log; with replication and acks = all, data loss is very unlikely )

Use Cases

Application analytics
Monitoring/Metrics
Log collecting
Stream processing
Recommendation engine
Fraud and anomaly detection
Integrate systems

Companies Using Kafka

Uber
Netflix
Spotify
Activision
Slack
Pinterest
LinkedIn
Shopify

Concepts

Producer
- Producer acknowledgment
  - acks = 0: fastest but riskiest, with the highest chance of message loss. The producer sends the message to Kafka and continues without waiting for a response.
  - acks = 1: a middle ground between speed and safety, with a low chance of message loss. The producer waits until the leader has received the message but does not wait for the followers.
  - acks = all or -1: slowest but safest. The producer waits until the leader and all in-sync followers have received the message, so message loss is very unlikely.
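
The three acks settings above can be sketched as a toy model in plain Python. This is not the Kafka client API: `required_acks` and `in_sync_replicas` are hypothetical names used only for illustration, and the model only counts how many replica acknowledgments a send waits for.

```python
def required_acks(acks: str, in_sync_replicas: int) -> int:
    """Return how many replica acknowledgments a send waits for (toy model).

    `in_sync_replicas` counts the leader plus the followers that are caught up.
    """
    if in_sync_replicas < 1:
        raise ValueError("a partition needs at least a leader")
    if acks == "0":
        return 0                  # fire and forget: no broker response awaited
    if acks == "1":
        return 1                  # leader only: followers may still lag behind
    if acks in ("all", "-1"):
        return in_sync_replicas   # leader and every in-sync follower
    raise ValueError(f"unknown acks value: {acks!r}")


if __name__ == "__main__":
    for setting in ("0", "1", "all"):
        print(setting, required_acks(setting, in_sync_replicas=3))
```

The fewer acknowledgments a send waits for, the sooner the producer can move on, and the more likely it is that a broker failure loses a message the producer already considers sent.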
Consumer ( best practice: one consumer per partition within a consumer group )
- Read Strategies
  - At Most Once
  - At Least Once ( most used )
  - Exactly Once ( transactional, performance impact )
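
The difference between the first two strategies comes down to whether the consumer commits its offset before or after processing a record. The following is a minimal simulation under that assumption, in plain Python with no Kafka client; `consume`, `commit_first`, and `crash_at` are hypothetical names. Exactly-once is not modeled here, since it would additionally need transactions or idempotent processing.

```python
def consume(records, commit_first, crash_at=None):
    """Process `records` in a first run that may crash, then a restart.

    commit_first=True  -> commit the offset, then process (at most once).
    commit_first=False -> process, then commit the offset (at least once).
    `crash_at` is the offset at which the first run dies between the two steps.
    Returns the list of records that were actually processed.
    """
    committed = 0      # next offset to read after a restart
    processed = []

    def run(crash_index):
        nonlocal committed
        for offset in range(committed, len(records)):
            if commit_first:
                committed = offset + 1
                if offset == crash_index:
                    return  # crashed after the commit, before processing
                processed.append(records[offset])
            else:
                processed.append(records[offset])
                if offset == crash_index:
                    return  # crashed after processing, before the commit
                committed = offset + 1

    run(crash_at)          # first run, may crash
    if crash_at is not None:
        run(None)          # restart from the last committed offset
    return processed


if __name__ == "__main__":
    print(consume(["a", "b", "c"], commit_first=True, crash_at=1))
    print(consume(["a", "b", "c"], commit_first=False, crash_at=1))
```

With a crash at offset 1, committing first skips record `b` entirely, while processing first handles `b` twice. That duplicate is why at-least-once consumers are usually written to tolerate reprocessing.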
Partition ( event/message/record holder )
Record/Event/Message ( each item in partition )
Offset ( message position/index in partition )
Topic ( partition holder )
Kafka Broker ( topics holder )
Consumer Group ( consumers that share a topic's partitions for parallel processing; separate groups each receive every message, as in the pub-sub pattern )
Distributed Systems
- Leader ( Master )
- Follower ( Slave )
- Topic Based Scaling
- Partition Based Scaling
Kafka Connect
Kafka Streams
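
To make the relationship between topic, partition, offset, and consumer group concrete, here is a small in-memory sketch in plain Python. It is an illustration only: `Topic` and `assign_partitions` are hypothetical names, and the round-robin assignment is a simplification that does not reproduce any specific assignor shipped with Kafka.

```python
class Topic:
    """Toy topic: a fixed number of partitions, each an append-only list."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, partition, record):
        """Append a record and return its offset (its index in the partition)."""
        log = self.partitions[partition]
        log.append(record)
        return len(log) - 1

    def read(self, partition, offset):
        """Return all records in the partition from `offset` onward."""
        return self.partitions[partition][offset:]


def assign_partitions(num_partitions, consumers):
    """Spread partitions over the consumers of one group, round-robin.

    Each partition goes to exactly one consumer in the group. A consumer
    may own several partitions, or none if the group has more consumers
    than the topic has partitions.
    """
    assignment = {consumer: [] for consumer in consumers}
    for partition in range(num_partitions):
        owner = consumers[partition % len(consumers)]
        assignment[owner].append(partition)
    return assignment


if __name__ == "__main__":
    topic = Topic("orders", num_partitions=3)
    topic.append(0, "order-1")
    topic.append(0, "order-2")
    print(topic.read(0, 1))
    print(assign_partitions(3, ["c1", "c2"]))
```

In this model, adding a fourth consumer to a three-partition topic leaves that consumer idle, which is the reasoning behind matching the number of consumers in a group to the number of partitions.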

Related Techs

Apache ZooKeeper ( cluster coordination and leader election: which broker is the leader and which are followers; it relies on a consensus protocol, ZAB, rather than a gossip protocol )
Confluent Cloud
Apache Flink ( Stateful Computations over Data Streams )
Apache Hadoop

Key Differences With Other Messaging Systems

Kafka differs from traditional message queues in several ways. Kafka retains a message after it has been consumed, until the topic's retention policy removes it. By contrast, RabbitMQ by default removes a message from its queue once it has been consumed and acknowledged.
RabbitMQ pushes messages to consumers, while Kafka consumers pull messages from the broker.
Kafka is designed to scale horizontally by adding brokers and partitions, whereas traditional message queues have more often been scaled vertically.
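
The retention and pull differences can be illustrated with two toy brokers in plain Python. Neither class reflects the real RabbitMQ or Kafka APIs; `ToyQueue`, `ToyLog`, and their methods are hypothetical names, and the log keeps consumer offsets in a simple dictionary for brevity.

```python
from collections import deque


class ToyQueue:
    """Queue-style broker: a message is gone once a consumer has taken it."""

    def __init__(self):
        self._messages = deque()

    def publish(self, message):
        self._messages.append(message)

    def consume(self):
        return self._messages.popleft() if self._messages else None


class ToyLog:
    """Log-style broker: records stay; each consumer has its own offset."""

    def __init__(self):
        self._records = []
        self._offsets = {}   # consumer name -> next offset to read

    def publish(self, record):
        self._records.append(record)

    def poll(self, consumer):
        """Pull everything the consumer has not read yet and advance its offset."""
        start = self._offsets.get(consumer, 0)
        batch = self._records[start:]
        self._offsets[consumer] = len(self._records)
        return batch

    def seek(self, consumer, offset):
        """Move the consumer's offset so earlier records can be read again."""
        self._offsets[consumer] = offset


if __name__ == "__main__":
    log = ToyLog()
    log.publish("a")
    log.publish("b")
    print(log.poll("analytics"))   # first consumer reads both records
    print(log.poll("billing"))     # second consumer still sees both records
```

Because the log never deletes on read, a second consumer, or the same consumer after a `seek`, can replay records that have already been read, which the queue model cannot do.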


