Intra-cluster Replication for Apache Kafka
 
Jun Rao
 
About myself
 
Engineer at LinkedIn since 2010
Worked on Apache Kafka and Cassandra
Database researcher at IBM
 
Outline
 
Overview of Kafka
Kafka architecture
Kafka replication design
Performance
Q/A
 
What’s Kafka
 
A distributed pub/sub messaging system
Used in many places
LinkedIn, Twitter, Box, FourSquare …
What do people use it for?
log aggregation
real-time event processing
monitoring
queuing
Example Kafka Apps at LinkedIn
Kafka Deployment at LinkedIn
[Diagram: live services in the live data center publish interactive data (human, machine) into Kafka clusters; the data flows to Kafka and Hadoop clusters in the offline data center and feeds monitoring]
 
Per day stats
 writes: 10+ billion messages (2+TB compressed data)
 reads: 50+ billion messages
 
Kafka vs. Other Messaging Systems
 
Scale-out from the ground up
Persistence to disks
High throughput (10s MB/sec per server)
Multi-subscription
 
Outline
 
Overview of Kafka
Kafka architecture
Kafka replication design
Performance
Q/A
 
Kafka Architecture
[Diagram: producers publish to a cluster of brokers; consumers pull from the brokers; Zookeeper coordinates the cluster]
 
Terminologies
 
Topic = message stream
Topic has partitions
partitions distributed to brokers
Partition has a log on disk
message persisted in log
message addressed by offset
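Slides aside, a toy sketch of the partition-log abstraction these terms describe (class and method names are mine, purely illustrative): each append gets the next offset, and reads address messages by offset.

import java.util.ArrayList;
import java.util.List;

// Illustrative only: a partition's log as an in-memory list.
public class PartitionLogSketch {
  private final List<String> messages = new ArrayList<String>();

  long append(String message) {
    messages.add(message);
    return messages.size() - 1; // offset assigned to the appended message
  }

  String read(long offset) {
    return messages.get((int) offset); // messages are addressed by offset
  }
}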
API
 
Producer
messages = new ArrayList<KeyedMessage<K,V>>();
messages.add(new KeyedMessage<K,V>("topic1", null, "msg1"));
send(messages);
 
Consumer
streams[] = Consumer.createMessageStream("topic1", 1);
for (message : streams[0]) {
  // do something with message
}
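For a fuller picture, a runnable sketch of the same producer calls against the 0.8-era Java client; the broker address, serializer, and config values are assumptions for illustration, not from the slides:

import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

public class ProducerSketch {
  public static void main(String[] args) {
    // Assumed broker address and serializer; adjust for a real cluster.
    Properties props = new Properties();
    props.put("metadata.broker.list", "localhost:9092");
    props.put("serializer.class", "kafka.serializer.StringEncoder");

    Producer<String, String> producer =
        new Producer<String, String>(new ProducerConfig(props));

    List<KeyedMessage<String, String>> messages =
        new ArrayList<KeyedMessage<String, String>>();
    // null key: the partitioner chooses the partition
    messages.add(new KeyedMessage<String, String>("topic1", null, "msg1"));

    producer.send(messages);
    producer.close();
  }
}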
 
Deliver High Throughput
 
Simple storage
[Diagram: logs in broker; each partition (topic1:part1, topic2:part1) is a sequence of segment files (segment-1 … segment-n) plus an index; writes append() to the last segment, reads read() messages msg-1 … msg-n]
Batched writes and reads
Zero-copy transfer from file to socket
Compression (batched)
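The zero-copy transfer is Java NIO's FileChannel.transferTo path; a minimal sketch, with the segment file name and destination address assumed:

import java.io.IOException;
import java.io.RandomAccessFile;
import java.net.InetSocketAddress;
import java.nio.channels.FileChannel;
import java.nio.channels.SocketChannel;

public class ZeroCopySketch {
  public static void main(String[] args) throws IOException {
    // Hypothetical segment file and consumer socket.
    FileChannel segment = new RandomAccessFile("segment-1.log", "r").getChannel();
    SocketChannel consumer = SocketChannel.open(new InetSocketAddress("localhost", 9092));

    long position = 0;
    long remaining = segment.size();
    while (remaining > 0) {
      // transferTo moves bytes from file to socket inside the kernel,
      // skipping the copy through user-space buffers.
      long sent = segment.transferTo(position, remaining, consumer);
      position += sent;
      remaining -= sent;
    }
    segment.close();
    consumer.close();
  }
}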
 
Outline
 
Overview of Kafka
Kafka architecture
Kafka replication design
Performance
Q/A
 
Why Replication
 
Broker can go down
controlled: rolling restart for code/config push
uncontrolled: isolated broker failure
If broker down
some partitions unavailable
could be permanent data loss
Replication → higher availability and durability
 
CAP Theorem
 
Pick two from
consistency
availability
network partitioning
 
Kafka Replication: Pick CA
 
Brokers within a datacenter
i.e., network partitioning is rare
Strong consistency
replicas byte-wise identical
Highly available
typical failover time: < 10ms
 
Replicas and Layout
 
Partition has replicas
Replicas spread evenly among brokers
[Diagram: four brokers; replicas of topic1-part1, topic1-part2, topic2-part1, and topic2-part2 spread evenly across the brokers' logs]
 
Maintain Strongly Consistent Replicas
 
One of the replicas is leader
All writes go to leader
Leader propagates writes to followers in order
Leader decides when to commit message
 
Conventional Quorum-based Commit
 
Wait for majority of replicas (e.g. Zookeeper)
Plus: good latency
Minus: 2f+1 replicas → tolerate f failures
ideally want to tolerate 2f failures
 
Commit Messages in Kafka
 
Leader maintains in-sync-replicas (ISR)
initially, all replicas in ISR
message committed if received by ISR
follower fails → dropped from ISR
leader commits using new ISR
Benefit: f replicas → tolerate f-1 failures
latency less of an issue within a datacenter
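To make the commit rule concrete, a toy sketch (not Kafka's actual code): a message is committed once every replica still in the ISR has it, so the committed offset is the minimum log-end offset across the ISR, and dropping a failed follower lets commits proceed.

import java.util.HashMap;
import java.util.Map;

public class IsrCommitSketch {
  // Log-end offset reported by each replica currently in the ISR.
  private final Map<String, Long> isrLogEndOffsets = new HashMap<String, Long>();

  void onFetch(String replica, long logEndOffset) {
    isrLogEndOffsets.put(replica, logEndOffset);
  }

  void onFollowerFailure(String replica) {
    // A failed follower is dropped from the ISR, so the leader
    // keeps committing with the replicas that remain.
    isrLogEndOffsets.remove(replica);
  }

  long committedOffset() {
    long committed = Long.MAX_VALUE;
    for (long offset : isrLogEndOffsets.values()) {
      committed = Math.min(committed, offset);
    }
    return committed == Long.MAX_VALUE ? -1L : committed;
  }
}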
 
Data Flow in Replication
[Diagram: (1) producer sends to the leader of topic1-part1 on broker 1; (2) leader propagates to the followers on brokers 2 and 3; (3) followers ack and the leader commits; (4) leader acks the producer; consumers read committed messages from the leader]
 
When producer receives ack   Latency                Durability on failures
no ack                       no network delay       some data loss
wait for leader              1 network roundtrip    a few data loss
wait for committed           2 network roundtrips   no data loss
 
Only committed messages exposed to consumers
→ independent of ack type chosen by producer
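In the 0.8 producer this choice maps to the request.required.acks setting; a short sketch of the three options (values as I understand the 0.8 producer config, set on the same Properties object as in the producer sketch above):

// Choose one; trades publish latency against durability.
props.put("request.required.acks", "0");   // no ack: fastest, may lose data
props.put("request.required.acks", "1");   // wait for leader: one roundtrip
props.put("request.required.acks", "-1");  // wait for committed (full ISR): no data loss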
 
Extend to Multiple Partitions
 
Leaders are evenly spread among brokers
 
[Diagram: topic1-part1, topic2-part1, and topic3-part1 spread across brokers 1-4; each partition has one leader and two followers, with the leaders placed on different brokers]
 
Handling Follower Failures
 
Leader maintains last committed offset
propagated to followers
checkpointed to disk
When follower restarts
truncate log to last committed
fetch data from leader
fully caught up → added to ISR
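A toy model of that restart sequence, with logs as in-memory lists (illustrative only, not Kafka code):

import java.util.List;

public class FollowerRecoverySketch {
  static void recover(List<String> followerLog, long lastCommitted, List<String> leaderLog) {
    // 1. Truncate everything past the checkpointed committed offset;
    //    those messages were never committed and may differ from the leader's.
    while (followerLog.size() > lastCommitted) {
      followerLog.remove(followerLog.size() - 1);
    }
    // 2. Fetch the missing suffix from the leader.
    for (int i = followerLog.size(); i < leaderLog.size(); i++) {
      followerLog.add(leaderLog.get(i));
    }
    // 3. Now byte-wise identical to the leader; eligible to rejoin the ISR.
  }
}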
 
Handling Leader Failure
 
Use an embedded controller (inspired by Helix)
detect broker failure via Zookeeper
on leader failure: elect new leader from ISR
committed messages not lost
Leader and ISR written to Zookeeper
for controller failover
expected to change infrequently
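A toy sketch of why electing from the ISR is safe (illustrative, not the controller's code): every ISR member already holds all committed messages, so any of them can take over without losing commits.

import java.util.LinkedHashSet;
import java.util.Set;

public class LeaderFailoverSketch {
  static String electNewLeader(Set<String> isr, String failedLeader) {
    isr.remove(failedLeader);
    if (isr.isEmpty()) {
      // No in-sync replica left; refusing avoids losing committed messages.
      throw new IllegalStateException("no in-sync replica available");
    }
    // Any remaining ISR member holds every committed message.
    return isr.iterator().next();
  }

  public static void main(String[] args) {
    Set<String> isr = new LinkedHashSet<String>();
    isr.add("A"); isr.add("B"); isr.add("C");
    System.out.println(electNewLeader(isr, "A")); // prints B
  }
}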
Example of Replica Recovery
 
1. ISR = {A,B,C}; leader A commits message m1 (last committed = m1); logs: A has m1, m2, m3; B has m1, m2; C has m1
2. A fails and B becomes the new leader; ISR = {B,C}; B commits m2, but not m3
3. B commits new messages m4, m5
4. A comes back, truncates its log to m1, and catches up from B; finally ISR = {A,B,C}
 
Outline
 
Overview of Kafka
Kafka architecture
Kafka replication design
Performance
Q/A
 
Setup
 
3 brokers
1 topic with 1 partition
Replication factor=3
Message size = 1KB
 
Choosing between Latency and Durability
 
When producer receives ack   Time to publish a message (ms)   Durability on failures
no ack                       0.29                             some data loss
wait for leader              1.05                             a few data loss
wait for committed           2.05                             no data loss
 
Producer Throughput
 
[Charts: producer throughput (0-70 MB/s) vs. messages per send (1, 10, 100, 1000) and vs. # of concurrent producers (1, 5, 10, 20), for the no ack, leader, and committed settings]
 
Consumer Throughput
 
[Chart: consumer throughput (0-100 MB/s) vs. fetch size (1KB, 10KB, 100KB, 1MB)]
 
Q/A
 
Kafka 0.8.0 (intra-cluster replication)
expected to be released in Mar
various performance improvements in the future
Check out more about Kafka
http://kafka.apache.org/
Kafka meetup tonight