Kafka (EN)

03/02/2023

Apache Kafka is an open-source distributed event streaming platform used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications.

  • Message broker

  • Messages are appended to a log and read back by offset (index), which is a large part of why Kafka is fast

  • Written in Scala and Java

  • Named after the author Franz Kafka

  • Originally developed at LinkedIn

  • Developed under the leadership of Jay Kreps
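To make the offset-based read/write idea concrete, here is a minimal in-memory sketch. The `PartitionLog` class and its method names are hypothetical and are not part of any Kafka client; real Kafka keeps the log in segment files on disk with separate index files, which this toy does not model.

```python
class PartitionLog:
    """Toy append-only log: records are only ever added at the end."""

    def __init__(self):
        self._records = []

    def append(self, value):
        """Append a record and return the offset it was stored at."""
        self._records.append(value)
        return len(self._records) - 1

    def read(self, offset, max_records=10):
        """Return up to max_records records starting at the given offset."""
        return self._records[offset:offset + max_records]


log = PartitionLog()
offsets = [log.append(v) for v in ["a", "b", "c", "d"]]

# A reader can jump straight to any offset without scanning earlier records.
batch = log.read(offset=2, max_records=2)
```

Writes only touch the end of the log and reads are a direct lookup by position, which is the property the bullet above refers to.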

Features

  • Fast ( high throughput and low latency )

  • Scalable ( horizontally scalable by adding nodes and partitions )

  • Reliable ( fault tolerant and distributed )

  • Durable ( messages are persisted to disk in an immutable log and replicated, so data loss can be avoided with suitable replication and acks settings )
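Horizontal scaling with partitions can be sketched as follows. This is an illustration under assumptions: `choose_partition` is a hypothetical helper, and it uses CRC32 as a stand-in hash, whereas Kafka's default partitioner uses a different hash (murmur2). The point is only that records with the same key land in the same partition, so different partitions can live on different nodes while per-key order is kept.

```python
import zlib


def choose_partition(key: str, num_partitions: int) -> int:
    """Map a record key to a partition number (CRC32 is a stand-in hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


NUM_PARTITIONS = 3
partitions = {p: [] for p in range(NUM_PARTITIONS)}

events = [
    ("user-1", "login"),
    ("user-2", "login"),
    ("user-1", "click"),
    ("user-1", "logout"),
]

# Each record is routed by its key; the same key always maps to the same partition.
for key, value in events:
    partitions[choose_partition(key, NUM_PARTITIONS)].append((key, value))

user1_partition = choose_partition("user-1", NUM_PARTITIONS)
user1_events = [v for k, v in partitions[user1_partition] if k == "user-1"]
```

Adding partitions (and nodes to host them) spreads keys over more machines, which is what makes the scaling horizontal.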

Use Cases

  • Application analytics

  • Monitoring/Metrics

  • Log collecting

  • Stream processing

  • Recommendation engine

  • Fraud and anomaly detection

  • Integrate systems

Companies Using Kafka

  • Uber

  • Netflix

  • Spotify

  • Activision

  • Slack

  • Pinterest

  • LinkedIn

  • Shopify

Concepts

  • Producer

    • Producer acknowledgment

      • acks = 0: fastest but riskiest, with a high chance of message loss. The producer sends the message to Kafka and continues without waiting for a response.

      • acks = 1: a middle ground between speed and safety, with a small chance of message loss. The producer waits until the leader has the message but does not wait for the followers.

      • acks = all (or -1): slowest but safest. The producer waits until the leader and all in-sync followers have the message, so an acknowledged message survives a leader failure.

  • Consumer ( within a consumer group, assign 1 consumer to 1 partition ⇒ best practice )

    • Read Strategies

      • At Most Once

      • At Least Once ( most used )

      • Exactly Once ( transactional, performance impact )

  • Partition ( event/message/record holder )

  • Record/Event/Message ( each item in partition )

  • Offset ( message position/index in partition )

  • Topic ( partition holder )

  • Kafka Broker ( topics holder )

  • Consumer Group ( allows parallel processing for partitions, like pub-sub pattern )

  • Distributed Systems

    • Leader ( Master )

    • Follower ( Slave )

    • Topic Based Scaling

    • Partition Based Scaling

  • Kafka Connect

  • Kafka Streams
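The trade-off between the three producer acknowledgment levels listed above can be illustrated with a toy simulation. Everything here is hypothetical (`ToyPartition`, `send`, the `delivered` flag, `fail_over`) and greatly simplified: in real Kafka, followers fetch from the leader and `acks=all` waits for the in-sync replica set, not for every follower.

```python
class ToyPartition:
    """One leader replica plus follower replicas, each an in-memory list."""

    def __init__(self, num_followers=2):
        self.leader = []
        self.followers = [[] for _ in range(num_followers)]

    def replicate(self):
        """Copy any records the followers are missing from the leader."""
        for follower in self.followers:
            follower.extend(self.leader[len(follower):])

    def send(self, record, acks, delivered=True):
        """Return True if the producer treats the send as successful.

        `delivered` is a hypothetical switch that simulates the record
        being lost on the way to the leader.
        """
        if acks == 0:
            # Fire and forget: success is reported without any response.
            if delivered:
                self.leader.append(record)
            return True
        if not delivered:
            return False  # No ack arrives, so the producer knows to retry.
        self.leader.append(record)
        if acks in ("all", -1):
            self.replicate()  # Ack only after the followers have the record.
        return True

    def fail_over(self):
        """Simulate a leader crash: the first follower becomes the leader."""
        self.leader = self.followers.pop(0)


# acks=0: the producer believes the send worked even though it was lost.
p0 = ToyPartition()
ok0 = p0.send("m1", acks=0, delivered=False)

# acks=1: acknowledged by the leader, then lost when the leader crashes
# before the followers have copied the record.
p1 = ToyPartition()
ok1 = p1.send("m1", acks=1)
p1.fail_over()

# acks=all: acknowledged only after replication, so it survives the crash.
p_all = ToyPartition()
ok_all = p_all.send("m1", acks="all")
p_all.fail_over()
```

In this sketch `p0.leader` and `p1.leader` end up empty while `p_all.leader` still holds the record, which mirrors the risk ordering described for `acks = 0`, `acks = 1`, and `acks = all`.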

Related Techs

  • Apache ZooKeeper ( cluster coordination and leader election ⇒ Who is the leader? Who is a follower? OK, you are the leader, take this message )

  • Confluent Cloud

  • Apache Flink ( Stateful Computations over Data Streams )

  • Apache Hadoop

Key Differences With Other Messaging Systems

  • Kafka differs from traditional message queues in several ways. Kafka retains a message after it has been consumed. By contrast, RabbitMQ deletes a message once it has been consumed.

  • RabbitMQ pushes messages to consumers, whereas Kafka consumers pull messages from the broker.

  • Kafka scales horizontally, whereas traditional message queues typically scale vertically.
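The retention and pull differences above can be sketched with two toy classes. Both `RetainedLog` and `DeletingQueue` are hypothetical names for illustration and do not correspond to Kafka or RabbitMQ APIs; the sketch models only retention and consumer-driven polling, not push delivery or acknowledgments.

```python
from collections import deque


class RetainedLog:
    """Kafka-style: records stay in the log; each group tracks its own offset."""

    def __init__(self):
        self.records = []
        self.group_offsets = {}

    def publish(self, record):
        self.records.append(record)

    def poll(self, group, max_records=10):
        """Consumer-initiated pull: return the next records for this group."""
        start = self.group_offsets.get(group, 0)
        batch = self.records[start:start + max_records]
        self.group_offsets[group] = start + len(batch)
        return batch


class DeletingQueue:
    """Traditional-queue style: a record is removed once it is consumed."""

    def __init__(self):
        self.records = deque()

    def publish(self, record):
        self.records.append(record)

    def consume(self):
        """Remove and return the oldest record, or None if the queue is empty."""
        return self.records.popleft() if self.records else None


event_log = RetainedLog()
queue = DeletingQueue()
for record in ["e1", "e2"]:
    event_log.publish(record)
    queue.publish(record)

# Two independent groups each read the full history from the retained log.
analytics = event_log.poll("analytics")
billing = event_log.poll("billing")

# The queue hands each record out once; afterwards it is gone.
first, second, third = queue.consume(), queue.consume(), queue.consume()
```

Because the log keeps its records, a new consumer group can be added later and still replay everything, while the queue has nothing left to offer a second reader.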


Last updated 2 years ago
