Kafka (EN)
03/02/2023
Apache Kafka is an open-source distributed event streaming platform used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications.
Message broker
Offset ( index ) based sequential reads and writes, which is one reason Kafka is fast
Written in Scala and Java
Named after the writer Franz Kafka
Originally developed at LinkedIn
Development was led by Jay Kreps
Features
Fast ( high throughput and low latency )
Scalable ( horizontally scalable by adding nodes and partitions )
Reliable ( fault tolerant and distributed )
Durable ( messages are persisted to disk in an immutable log; with replication, data loss is very unlikely )
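The immutable log and offset-based reads mentioned above can be sketched with a small in-memory model. This is only an illustration: `PartitionLog` and its methods are hypothetical names, not part of any Kafka client, and a real broker writes segments to disk instead of keeping a Python list.

```python
from dataclasses import dataclass, field


@dataclass
class PartitionLog:
    """Toy in-memory stand-in for one partition's append-only log (hypothetical)."""
    records: list = field(default_factory=list)

    def append(self, value: str) -> int:
        """Append a record to the end of the log and return its offset."""
        self.records.append(value)
        return len(self.records) - 1

    def read(self, offset: int, max_records: int = 10) -> list:
        """Return up to max_records starting at offset; nothing is deleted."""
        return self.records[offset:offset + max_records]


log = PartitionLog()
first = log.append("order-created")
second = log.append("order-paid")
third = log.append("order-shipped")

# Reading by offset is a direct slice; earlier records stay available for replay.
from_second = log.read(second)
replay_all = log.read(0)
```

Records are only ever added at the end, and a reader chooses where to start by offset, so the same data can be read again from any position.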
Use Cases
Application analytics
Monitoring/Metrics
Log collecting
Stream processing
Recommendation engine
Fraud and anomaly detection
Integrate systems
Companies Using Kafka
Uber
Netflix
Spotify
Activision
Slack
Pinterest
LinkedIn
Shopify
Concepts
Producer
Producer acknowledgment
acks = 0: Fastest but riskiest; the chance of message loss is highest. The producer sends the message to Kafka and continues without waiting for a response.
acks = 1: A middle ground between speed and safety; the chance of message loss is low. The producer waits until the leader has written the message, but does not wait for the followers to replicate it.
acks = all ( or -1 ): Slowest but safest; message loss is very unlikely. The producer waits until the leader and all in-sync followers have the message.
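The three acknowledgment levels can be compared with a toy model. The function names below are hypothetical and this is not client code; it only counts how many replicas must confirm a write before the producer moves on, assuming the leader is always one of the in-sync replicas.

```python
def confirmations_required(acks: str, in_sync_replicas: int) -> int:
    """Toy model: how many replicas must confirm before the producer moves on."""
    if acks == "0":
        return 0                    # fire and forget
    if acks == "1":
        return 1                    # leader only
    if acks in ("all", "-1"):
        return in_sync_replicas     # leader plus every in-sync follower
    raise ValueError(f"unsupported acks value: {acks!r}")


def survives_leader_crash(acks: str, in_sync_replicas: int) -> bool:
    """True if a sent record is guaranteed to exist on a replica besides the leader."""
    return confirmations_required(acks, in_sync_replicas) > 1


for level in ("0", "1", "all"):
    print(level, confirmations_required(level, 3), survives_leader_crash(level, 3))
```

In this model acks = all with only one in-sync replica is no safer than acks = 1, which is why it is usually combined with the `min.insync.replicas` setting on the broker or topic.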
Consumer ( within a consumer group each partition is read by only one consumer; one consumer per partition ⇒ best practice )
Read Strategies
At Most Once
At Least Once ( most used )
Exactly Once ( transactional, performance impact )
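The difference between at most once and at least once comes down to whether the offset is committed before or after the record is processed. A minimal simulation, assuming a single consumer that crashes once between its two steps and then restarts from the last committed offset ( `consume` and its parameters are hypothetical, not a Kafka API ):

```python
def consume(records: list, commit_first: bool, crash_at: int) -> list:
    """Simulate one consumer that crashes once, between its two steps,
    while handling offset crash_at, then restarts from the committed offset."""
    committed = 0          # next offset to read after a restart
    processed = []         # records whose processing actually happened
    crashed = False
    while committed < len(records):
        offset = committed
        crash_now = offset == crash_at and not crashed
        if commit_first:
            committed = offset + 1              # step 1: commit
            if crash_now:
                crashed = True
                continue                        # record is never processed
            processed.append(records[offset])   # step 2: process
        else:
            processed.append(records[offset])   # step 1: process
            if crash_now:
                crashed = True
                continue                        # offset was never committed
            committed = offset + 1              # step 2: commit
    return processed


at_most_once = consume(["a", "b", "c"], commit_first=True, crash_at=1)
at_least_once = consume(["a", "b", "c"], commit_first=False, crash_at=1)
print(at_most_once)    # ['a', 'c']
print(at_least_once)   # ['a', 'b', 'b', 'c']
```

Committing first loses "b"; processing first delivers "b" twice, so at-least-once consumers are usually written to be idempotent. Exactly once needs the commit and the processing result to succeed or fail together, which is what the transactional approach provides.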
Partition ( event/message/record holder )
Record/Event/Message ( each item in partition )
Offset ( message position/index in partition )
Topic ( partition holder )
Kafka Broker ( topics holder )
Consumer Group ( lets consumers process partitions in parallel; each group receives every message, like the pub-sub pattern )
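How partitions are shared inside one consumer group can be sketched with a simple round-robin assignment. This is an illustrative stand-in, not the code of Kafka's own assignors, and `assign_partitions` is a hypothetical helper.

```python
def assign_partitions(partitions: list, consumers: list) -> dict:
    """Give each partition to exactly one consumer in the group, round-robin."""
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


partitions = [0, 1, 2]
balanced = assign_partitions(partitions, ["c1", "c2", "c3"])
oversized = assign_partitions(partitions, ["c1", "c2", "c3", "c4"])
undersized = assign_partitions(partitions, ["c1", "c2"])

print(balanced)     # {'c1': [0], 'c2': [1], 'c3': [2]}
print(oversized)    # {'c1': [0], 'c2': [1], 'c3': [2], 'c4': []}
print(undersized)   # {'c1': [0, 2], 'c2': [1]}
```

With more consumers than partitions the extra consumer stays idle, so the partition count sets the upper limit on parallelism within a group.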
Distributed Systems
Leader ( Master )
Follower ( Slave )
Topic Based Scaling
Partition Based Scaling
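Partition based scaling relies on spreading records across partitions, usually by hashing the record key. A rough sketch, assuming `zlib.crc32` as a stand-in hash ( Kafka's default partitioner uses murmur2 for keyed records ) and a hypothetical `choose_partition` helper:

```python
import zlib


def choose_partition(key: str, num_partitions: int) -> int:
    """Map a record key to a partition; crc32 is a stand-in for Kafka's hash."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


keys = ["user-1", "user-2", "user-3", "user-1"]
placements = [choose_partition(key, 4) for key in keys]

# The same key always lands on the same partition, which keeps per-key ordering.
same_key_same_partition = placements[0] == placements[3]
```

Because the result depends on the partition count, adding partitions later changes where existing keys are routed.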
Kafka Connect
Kafka Streams
Related Techs
Apache ZooKeeper ( Cluster coordination and leader election ⇒ Who is the leader? Who are the followers? Ok, you are the leader, take this message )
Confluent Cloud
Apache Flink ( Stateful Computations over Data Streams )
Apache Hadoop
Key Differences With Other Messaging Systems
Kafka differs from traditional message queues in several ways. Kafka retains a message after it has been consumed, until the configured retention limit is reached. By contrast, RabbitMQ removes a message from the queue once it has been consumed and acknowledged.
RabbitMQ pushes messages to consumers, whereas Kafka consumers pull messages from the broker.
Kafka scales horizontally by adding brokers and partitions, whereas traditional message queues are more often scaled vertically.
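The retention and pull differences above can be illustrated with two toy models: a queue where reading removes the message, and a log where each consumer group keeps its own offset. The names ( `poll`, `group_offsets`, the group names ) are hypothetical and neither model uses a real RabbitMQ or Kafka client.

```python
from collections import deque

# Traditional queue model: a consumed message is removed from the queue.
queue = deque(["m1", "m2", "m3"])
queue_first_reader = [queue.popleft() for _ in range(len(queue))]
queue_second_reader = list(queue)          # nothing left to read

# Log model: records stay in place; each consumer group tracks its own offset.
log = ["m1", "m2", "m3"]
group_offsets = {"analytics": 0, "billing": 0}


def poll(group: str, max_records: int) -> list:
    """Pull up to max_records for a group and advance only that group's offset."""
    start = group_offsets[group]
    batch = log[start:start + max_records]
    group_offsets[group] = start + len(batch)
    return batch


analytics_batch = poll("analytics", 2)     # ['m1', 'm2']
billing_batch = poll("billing", 3)         # ['m1', 'm2', 'm3']
```

Each group pulls at its own pace and sees every record, while the second reader of the queue finds it empty.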