Understanding Cassandra - A High Availability Database Solution

Cassandra, a massively scalable open-source NoSQL database, offers high availability and resilience with its peer-to-peer architecture, replication, and continuous availability across multiple data centers. Its query language, CQL, provides SQL-like functionality with some limitations. Explore the features, architecture, and usage of Cassandra in this comprehensive overview.


Uploaded on Sep 18, 2024


Presentation Transcript


  1. HDB++: HIGH AVAILABILITY WITH Page 1 l TANGO Meeting l 20 May 2015 l Reynald Bourtembourg

  2. OVERVIEW What is Cassandra (C*)? Who is using C*? CQL. C* architecture. Request coordination. Consistency. Monitoring tool. HDB++.

  4. WHAT IS CASSANDRA? In mythology: an excellent oracle who was not believed. A massively scalable open source NoSQL (Not Only SQL) database. Created by Facebook. Open source since 2008. Apache License 2.0, compatible with GPLv3.

  5. WHAT IS CASSANDRA? Peer-to-peer architecture. No single point of failure. Replication. Continuous availability. Multi data center support. 100s to 1000s of nodes. Java. High write throughput. Read efficiency.

  6. WHAT IS CASSANDRA? [Figure not reproduced in the transcript.] Source: http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html

  8. WHO IS USING CASSANDRA?

  10. CASSANDRA QUERY LANGUAGE. CQL (Cassandra Query Language) is very similar to SQL, but with restrictions and limitations. JOIN requests are forbidden. No subqueries. String comparisons are limited when not using Solr (e.g. select * from my_table where mystring like '%tango%' is not possible). No OR operator. A WHERE condition can only be applied on an indexed column (or the primary key).

  11. CASSANDRA QUERY LANGUAGE. Collections (64K limitation): list, set, map. TTL. INSERT = UPDATE (UPSERT). Doc: http://www.datastax.com/documentation/cql/3.1/cql/cql_intro_c.html. cqlsh.

  12. CASSANDRA QUERY LANGUAGE
      CREATE TABLE IF NOT EXISTS att_scalar_devdouble_ro (
          att_conf_id timeuuid,
          period text,
          data_time timestamp,
          data_time_us int,
          value_r double,
          quality int,
          error_desc text,
          PRIMARY KEY ((att_conf_id, period), data_time, data_time_us)
      ) WITH comment='Scalar DevDouble ReadOnly Values Table';

  13. CASSANDRA QUERY LANGUAGE
      CREATE TABLE IF NOT EXISTS att_scalar_devdouble_ro (
          att_conf_id timeuuid,
          period text,
          data_time timestamp,
          data_time_us int,
          value_r double,
          quality int,
          error_desc text,
          PRIMARY KEY ((att_conf_id, period), data_time, data_time_us)
      );
      Partition key: (att_conf_id, period). Clustering columns: data_time, data_time_us.
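As a rough illustration of what the partition key and the clustering columns mean for this table, the Python sketch below groups rows by (att_conf_id, period) and returns each partition ordered by (data_time, data_time_us). It is a toy in-memory model, not how Cassandra stores data; the `ToyTable` class and the sample values are hypothetical, and plain strings and integers stand in for the timeuuid and timestamp types.

```python
from collections import defaultdict


class ToyTable:
    """Toy model of PRIMARY KEY ((att_conf_id, period), data_time, data_time_us)."""

    def __init__(self):
        # partition key -> {clustering key -> value_r}
        self.partitions = defaultdict(dict)

    def upsert(self, att_conf_id, period, data_time, data_time_us, value_r):
        partition_key = (att_conf_id, period)
        clustering_key = (data_time, data_time_us)
        # Writing the same full primary key again overwrites the row (UPSERT).
        self.partitions[partition_key][clustering_key] = value_r

    def read_partition(self, att_conf_id, period):
        rows = self.partitions.get((att_conf_id, period), {})
        # Rows within a partition are ordered by the clustering columns.
        return sorted(rows.items())


table = ToyTable()
table.upsert("attr-1", "2015-05-20", 1432100002, 0, 3.3)
table.upsert("attr-1", "2015-05-20", 1432100001, 500, 1.1)
table.upsert("attr-1", "2015-05-20", 1432100001, 500, 2.2)  # overwrites 1.1
table.upsert("attr-1", "2015-05-21", 1432186400, 0, 4.4)    # a second partition
```

Reading one (att_conf_id, period) pair touches a single partition, which is the reason a period value is part of the partition key in this kind of time-series table.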

  15. CASSANDRA ARCHITECTURE. Node: one Cassandra instance (Java process). [Diagram: Nodes 1 to 8 on the token range from -2^63 to +2^63-1.]

  16. CASSANDRA ARCHITECTURE. Partition: ordered and replicable unit of data on a node, identified by a token. The partitioner (based on the Murmur3 algorithm by default) distributes the data across the nodes. [Diagram: Nodes 1 to 8 on the token range from -2^63 to +2^63-1.]
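The idea of a partitioner mapping keys onto a token range can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the standard library has no Murmur3, so BLAKE2b is used as a stand-in hash, each node gets exactly one evenly spaced token, and the names `token_for`, `build_ring` and `owner` are hypothetical.

```python
import bisect
import hashlib

MIN_TOKEN = -2**63
MAX_TOKEN = 2**63 - 1


def token_for(partition_key: str) -> int:
    """Map a key to a signed 64-bit token (BLAKE2b as a stand-in for Murmur3)."""
    digest = hashlib.blake2b(partition_key.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big", signed=True)


def build_ring(node_names):
    """Assign one evenly spaced token per node; returns a sorted list of (token, node)."""
    step = 2**64 // len(node_names)
    return [(MIN_TOKEN + i * step, name) for i, name in enumerate(node_names)]


def owner(ring, token):
    """First node whose token is >= the key's token, wrapping around the ring."""
    tokens = [t for t, _ in ring]
    index = bisect.bisect_left(tokens, token) % len(ring)
    return ring[index][1]


ring = build_ring([f"node{i}" for i in range(1, 9)])
print(owner(ring, token_for("tango")))
```

Because the hash spreads keys roughly uniformly over the token range, each of the eight nodes ends up responsible for a similar share of the partitions.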

  17. CASSANDRA ARCHITECTURE. Rack: logical set of nodes. [Diagram: Nodes 1 to 8 on the token range from -2^63 to +2^63-1, grouped into Racks 1 to 4.]

  18. CASSANDRA ARCHITECTURE. Data Center: logical set of racks. [Diagram: nodes on the token range from -2^63 to +2^63-1, grouped into Racks 1 to 4 and into Data Center 1 and Data Center 2.]

  19. REQUEST COORDINATION. Cluster: full set of nodes which maps to a single complete token ring. [Diagram: Cassandra cluster on the token range from -2^63 to +2^63-1, with Racks 1 to 4 in Data Center 1 and Data Center 2.]

  21. REQUEST COORDINATION. Coordinator: the node chosen by the client to receive a particular read or write request to its cluster. [Diagram: a client and Nodes 1 to 4 in Data Center 1; one node acts as coordinator for the read/write.]

  24. REQUEST COORDINATION. Any node can coordinate any request. Each client request may be coordinated by a different node. No single point of failure. [Diagram: a client and Nodes 1 to 4 in Data Center 1. Labels: Coordinator, Acknowledge.]

  25. REQUEST COORDINATION. The Cassandra driver chooses the coordinator node (round-robin pattern, token-aware pattern). The driver is a client library to manage requests. Many open source drivers for many programming languages: Python, Java, C++, Node.js, Perl, C#, Go, PHP, Clojure, Scala, R (GNU S), Ruby, ODBC, Erlang, Haskell, Rust. [Diagram: client, driver, coordinator and Nodes 1 to 4.]
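The round-robin pattern mentioned on this slide can be sketched as follows. This is only an illustration of the idea, not the API of any real Cassandra driver; the `RoundRobinPolicy` class and its `next_coordinator` method are hypothetical names.

```python
import itertools


class RoundRobinPolicy:
    """Hypothetical policy: each request goes to the next node in turn."""

    def __init__(self, nodes):
        self._cycle = itertools.cycle(list(nodes))

    def next_coordinator(self):
        return next(self._cycle)


policy = RoundRobinPolicy(["node1", "node2", "node3", "node4"])
chosen = [policy.next_coordinator() for _ in range(6)]
print(chosen)
```

A token-aware pattern would instead pick a node that holds a replica of the requested partition, which saves one network hop between the coordinator and the data.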

  26. REQUEST COORDINATION. The coordinator manages the replication process. Replication Factor (RF): onto how many nodes a write should be copied. The write will occur on the nodes responsible for that partition. 1 ≤ RF ≤ (#nodes in cluster). Every write is time-stamped. [Diagram: client, driver, coordinator and Nodes 1 to 4, RF=3.]
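A minimal sketch of replica selection, assuming the simplest placement rule (start at the node that owns the partition and take the next nodes clockwise on the ring, ignoring racks and data centers). The `replicas_for` helper and the node names are hypothetical.

```python
def replicas_for(ring_nodes, owner_index, rf):
    """Return rf distinct nodes, starting at the owner and walking the ring clockwise."""
    if not 1 <= rf <= len(ring_nodes):
        raise ValueError("RF must satisfy 1 <= RF <= number of nodes in the cluster")
    return [ring_nodes[(owner_index + i) % len(ring_nodes)] for i in range(rf)]


nodes = ["node1", "node2", "node3", "node4"]
# The partition is owned by node4 (index 3); with RF=3 the walk wraps around the ring.
print(replicas_for(nodes, owner_index=3, rf=3))
```

With four nodes and RF=3, every partition lives on three of the four nodes, so any single node can be lost without losing data.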

  29. CONSISTENCY. The coordinator applies the Consistency Level (CL). Consistency Level (CL): number of nodes which must acknowledge a request. Examples of CL: ONE, TWO, THREE, ANY, ALL, QUORUM (= floor(RF/2) + 1), EACH_QUORUM, LOCAL_QUORUM. CL may vary for each request. On success, the coordinator notifies the client (with the most recent partition data in case of a read request).
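The quorum size uses integer division, which the short Python sketch below makes explicit. Only the single data center levels with a fixed count are covered; ANY, EACH_QUORUM and LOCAL_QUORUM are left out because they depend on hints or on the data center layout. The function names are hypothetical.

```python
def quorum(rf: int) -> int:
    """QUORUM = floor(RF / 2) + 1."""
    return rf // 2 + 1


def required_acks(cl: str, rf: int) -> int:
    """Number of replica acknowledgements needed for a given consistency level."""
    levels = {"ONE": 1, "TWO": 2, "THREE": 3, "QUORUM": quorum(rf), "ALL": rf}
    return levels[cl]


def is_satisfied(cl: str, rf: int, acks: int) -> bool:
    return acks >= required_acks(cl, rf)


for rf in (3, 4, 5):
    print(rf, quorum(rf))
```

With RF=3, a QUORUM request needs two acknowledgements, so it still succeeds when one replica is down.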

  30. CONSISTENCY ONE - READ - SINGLE DC (RF=3). [Diagram: client, driver, coordinator and Nodes 1 to 6. Legend: direct read request; digest read request (hash) + eventual read repair.]

  34. CONSISTENCY QUORUM READ - SINGLE DC (RF=3). [Diagram: client, driver, coordinator and Nodes 1 to 6. Legend: direct read request; digest read request (hash).]

  37. CONSISTENCY QUORUM READ - SINGLE DC (RF=3). In case of inconsistency: the most recent data is returned. [Diagram: client, driver, coordinator and Nodes 1 to 6. Legend: direct read request; digest read request (hash).]

  38. CONSISTENCY QUORUM READ - SINGLE DC (RF=3). Read repair if needed. [Diagram: client, driver, coordinator and Nodes 1 to 6. Legend: direct read request; digest read request (hash).]
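The two rules on these slides (return the most recent data, then repair replicas that were behind) can be sketched as below. This is a simplification: it assumes the coordinator already holds a full (timestamp, value) pair from each replica that answered, whereas the slides show one direct read plus digest (hash) reads. The `resolve_read` function and the sample data are hypothetical.

```python
def resolve_read(responses):
    """responses: {node: (write_timestamp, value)} from the replicas that answered.

    Returns the most recent value and the list of replicas holding older data.
    """
    newest_ts, newest_value = max(responses.values(), key=lambda r: r[0])
    stale = sorted(node for node, (ts, _) in responses.items() if ts < newest_ts)
    return newest_value, stale


# QUORUM read with RF=3: two replicas answered, and node3 missed the latest write.
responses = {"node2": (1002, 21.5), "node3": (1001, 20.9)}
value, to_repair = resolve_read(responses)
print(value, to_repair)
```

The comparison relies on every write being time-stamped, which is why the newest timestamp wins.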

  39. CONSISTENCY ONE WRITE - SINGLE DC (RF=3). [Diagram: client, driver, coordinator and Nodes 1 to 6. Label: Write Request.]

  40. CONSISTENCY ONE WRITE - SINGLE DC (RF=3). [Diagram: client, driver, coordinator and Nodes 1 to 6. Labels: ACK, ACK.]

  41. CONSISTENCY ONE WRITE - SINGLE DC (RF=3). [Diagram: client, driver, coordinator and Nodes 1 to 6. Label: Write Request.]

  42. CONSISTENCY ONE WRITE - SINGLE DC (RF=3). [Diagram: client, driver, coordinator and Nodes 1 to 6. Labels: SUCCESS, ACK, ACK.]

  43. CONSISTENCY ONE WRITE - SINGLE DC (RF=3). Hinted handoff mechanism: max_hint_window_in_ms property in the cassandra.yaml file. [Diagram: client, driver, coordinator and Nodes 1 to 6. Labels: SUCCESS, hint, ACK, ACK.]

  44. CONSISTENCY ONE WRITE - SINGLE DC (RF=3). Hinted handoff mechanism: max_hint_window_in_ms property in the cassandra.yaml file. [Diagram: client, driver, coordinator and Nodes 1 to 6. Labels: hint, Write Request.]

  46. CONSISTENCY ONE WRITE - SINGLE DC (RF=3). Hinted handoff mechanism. [Diagram: client, driver, coordinator and Nodes 1 to 6.]

  47. CONSISTENCY. If node downtime > max_hint_window_in_ms: anti-entropy node repair. [Diagram: Node 4.]
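The hinted handoff rule can be modelled in a few lines: while a replica has been down for less than max_hint_window_in_ms, the coordinator keeps a hint for each missed write and replays the hints when the replica returns; beyond the window no hints are kept and the replica needs an anti-entropy repair. The `Coordinator` class below is a hypothetical toy, and the 3-hour window is only an example value.

```python
class Coordinator:
    """Toy model of hint storage for one down replica."""

    def __init__(self, max_hint_window_in_ms):
        self.max_hint_window_in_ms = max_hint_window_in_ms
        self.hints = []  # missed writes to replay to the down replica

    def write_for_down_replica(self, mutation, replica_down_for_ms):
        """Store a hint only while the replica's downtime is within the window."""
        if replica_down_for_ms <= self.max_hint_window_in_ms:
            self.hints.append(mutation)
            return True
        return False  # too long: the replica will need an anti-entropy repair

    def replay(self):
        """Called when the replica is back: hand over and clear the stored hints."""
        delivered, self.hints = self.hints, []
        return delivered


three_hours_ms = 3 * 60 * 60 * 1000  # example window, in milliseconds
coordinator = Coordinator(three_hours_ms)
hinted = coordinator.write_for_down_replica("write-1", replica_down_for_ms=60_000)
skipped = coordinator.write_for_down_replica("write-2", replica_down_for_ms=4 * 60 * 60 * 1000)
```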

  48. CONSISTENCY QUORUM WRITE - SINGLE DC (RF=3). [Diagram: client, driver, coordinator and Nodes 1 to 6. Label: Write Request.]

  49. CONSISTENCY QUORUM WRITE - SINGLE DC (RF=3). [Diagram: client, driver, coordinator and Nodes 1 to 6. Labels: ACK, ACK.]

  50. CONSISTENCY QUORUM WRITE - SINGLE DC (RF=3). [Diagram: client, driver, coordinator and Nodes 1 to 6. Labels: SUCCESS, ACK, ACK.]
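The write sequences on the ONE and QUORUM slides differ only in how many ACKs the coordinator waits for before reporting SUCCESS. The sketch below is a simplified model of that decision, assuming the write is sent to all RF replicas and that hints are kept for replicas that did not answer; the `coordinate_write` function and the node states are hypothetical.

```python
def coordinate_write(replica_is_up, required_acks):
    """Toy write path: send to every replica, count ACKs, keep hints for the rest."""
    acked = sorted(node for node, up in replica_is_up.items() if up)
    hinted = sorted(node for node, up in replica_is_up.items() if not up)
    return {
        "success": len(acked) >= required_acks,
        "acked": acked,
        "hints_for": hinted,
    }


# RF=3: node2, node3 and node4 hold the partition; node4 is down.
replicas = {"node2": True, "node3": True, "node4": False}
one_write = coordinate_write(replicas, required_acks=1)     # CL ONE
quorum_write = coordinate_write(replicas, required_acks=2)  # CL QUORUM with RF=3
all_write = coordinate_write(replicas, required_acks=3)     # CL ALL
```

With one replica down, the ONE and QUORUM writes still succeed and only the ALL write fails, which is the availability trade-off the consistency level controls.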
