# Kafka (EN)

Apache Kafka is an open-source distributed event streaming platform used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications.

* Message broker
* Offset-based ( index-like ) message reads and writes on a sequential, append-only log, which is one reason Kafka is fast
* Written in Scala and Java
* Named after the writer Franz Kafka
* Originally developed at LinkedIn
* Development was led by Jay Kreps
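
The index-style ( offset-based ) read/write idea from the list above can be pictured with a toy append-only log. This is only a minimal in-memory sketch, not Kafka's actual storage code; the class name `PartitionLog` and its methods are hypothetical.

```python
class PartitionLog:
    """Toy in-memory stand-in for one append-only log (hypothetical name)."""

    def __init__(self):
        # Records are only ever appended, never modified in place.
        self._records = []

    def append(self, value):
        """Store a record at the end of the log and return its offset."""
        self._records.append(value)
        return len(self._records) - 1

    def read(self, offset, max_records=10):
        """Return up to max_records records starting at the given offset."""
        return self._records[offset:offset + max_records]


log = PartitionLog()
for event in ["signup", "login", "purchase"]:
    log.append(event)

# A reader asks for "everything from offset 1 onwards".
print(log.read(1))  # ['login', 'purchase']
```

Because a reader only needs to remember a single number ( its offset ), reads are simple sequential scans from that position.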

### Features

* Fast ( high throughput and low latency )
* Scalable ( horizontally scalable by adding nodes and partitions )
* Reliable ( fault tolerant and distributed )
* Durable ( messages are persisted to disk in an immutable log and replicated, which guards against data loss )
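
Partition-based scaling relies on spreading messages over partitions while keeping messages with the same key together. The sketch below illustrates that idea only; `choose_partition` is a hypothetical helper, and `zlib.crc32` is an assumption standing in for whatever hash function a real Kafka client uses.

```python
import zlib


def choose_partition(key: str, num_partitions: int) -> int:
    """Map a message key to a partition number (hypothetical helper).

    crc32 is a stand-in hash chosen because it is stable across runs;
    real clients use their own hash function.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


keys = ["user-1", "user-2", "user-3", "user-1"]
assignments = [choose_partition(k, 3) for k in keys]

# The same key always lands on the same partition, so per-key ordering holds.
print(assignments[0] == assignments[3])  # True
```

Adding partitions lets more consumers work in parallel, at the cost of changing which partition a given key maps to.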

### Use Cases

* Application analytics
* Monitoring/Metrics
* Log collecting
* Stream processing
* Recommendation engine
* Fraud and anomaly detection
* System integration

### Companies Using Kafka

* Uber
* Netflix
* Spotify
* Activision
* Slack
* Pinterest
* LinkedIn
* Shopify

### Concepts

* Producer
  * Producer acknowledgment
    * acks = 0: fastest but riskiest, with the highest chance of message loss. The producer sends the message to Kafka and keeps going without waiting for a response.
    * acks = 1: a middle ground between speed and safety, with a small chance of message loss. The producer waits until the leader has received the message but does not wait for the followers.
    * acks = all ( or -1 ): slowest but safest, with the lowest chance of message loss. The producer waits until the leader and all in-sync followers have received the message.
* Consumer ( one consumer per partition within a consumer group ⇒ best practice )
  * Read Strategies
    * At Most Once
    * At Least Once ( most used )
    * Exactly Once ( transactional, performance impact )
* Partition ( event/message/record holder )
* Record/Event/Message ( each item in partition )
* Offset ( message position/index in partition )
* Topic ( partition holder )
* Kafka Broker ( topics holder )
* Consumer Group ( consumers that share the partitions of a topic for parallel processing; separate groups each receive all messages, like the pub-sub pattern )
* Distributed Systems
  * Leader ( Master )
  * Follower ( Slave )
  * Topic Based Scaling
  * Partition Based Scaling
* Kafka Connect
* Kafka Streams
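
The producer acknowledgment levels in the list above can be summarised as a small decision function. This is a simplified model of the trade-off, not the client's real logic; the function name and its parameters are hypothetical, and only the `acks` values `0`, `1` and `all` / `-1` come from the notes.

```python
def producer_considers_sent(acks, leader_wrote, followers_wrote):
    """Return True if the producer would treat the send as successful.

    acks            -- "0", "1", "all" or "-1"
    leader_wrote    -- whether the partition leader stored the record
    followers_wrote -- one bool per in-sync follower (simplified model)
    """
    if acks == "0":
        # Fire and forget: the producer never waits for a response.
        return True
    if acks == "1":
        # Only the leader's write is awaited.
        return leader_wrote
    if acks in ("all", "-1"):
        # Leader and every in-sync follower must have the record.
        return leader_wrote and all(followers_wrote)
    raise ValueError(f"unsupported acks value: {acks!r}")


# The leader failed to store the record: acks=0 does not notice the loss.
print(producer_considers_sent("0", False, [False]))         # True
# The leader stored it but one follower is behind.
print(producer_considers_sent("1", True, [True, False]))    # True
print(producer_considers_sent("all", True, [True, False]))  # False
```

The stricter the setting, the more writes the producer waits for, which is where the extra latency of `acks = all` comes from.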

### Related Techs

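* Apache ZooKeeper ( cluster coordination: tracks the brokers and elects the controller, which in turn assigns partition leaders. It uses the ZAB consensus protocol rather than a gossip protocol. Newer Kafka versions replace it with the built-in KRaft mode )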
* Confluent Cloud
* Apache Flink ( Stateful Computations over Data Streams )
* Apache Hadoop

### Key Differences With Other Messaging Systems

* Kafka differs from traditional message queues in several ways. Kafka retains a message after it has been consumed, until its configured retention limit is reached. In contrast, RabbitMQ by default deletes a message from the queue once it has been consumed and acknowledged.
* RabbitMQ pushes messages to consumers, whereas Kafka consumers pull messages from the broker.
* Kafka is designed to scale horizontally by adding brokers and partitions, whereas traditional message queues typically scale vertically.
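
The retention and pull differences above can be contrasted with two toy models. Both classes are hypothetical, heavily simplified sketches written for these notes; neither reflects the real Kafka or RabbitMQ implementation.

```python
from collections import deque


class RetainingLog:
    """Kafka-like model: records stay put, each group tracks its own offset."""

    def __init__(self):
        self.records = []
        self.next_offset = {}  # consumer group name -> next offset to read

    def publish(self, value):
        self.records.append(value)

    def poll(self, group, max_records=10):
        """The consumer pulls: it asks for records from its own position."""
        start = self.next_offset.get(group, 0)
        batch = self.records[start:start + max_records]
        self.next_offset[group] = start + len(batch)
        return batch


class DeletingQueue:
    """Queue-like model: a message is removed as soon as it is handed out."""

    def __init__(self):
        self.items = deque()

    def publish(self, value):
        self.items.append(value)

    def deliver(self):
        return self.items.popleft() if self.items else None


log = RetainingLog()
queue = DeletingQueue()
for msg in ["a", "b"]:
    log.publish(msg)
    queue.publish(msg)

print(log.poll("analytics"))  # ['a', 'b']
print(log.poll("billing"))    # ['a', 'b']  (still there for another group)
print(queue.deliver())        # a
print(len(queue.items))       # 1  (the delivered message is gone)
```

Because the log keeps its records, a new consumer group can start from offset 0 and replay history, which a queue that deletes on consumption cannot offer.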

