What is the Cassandra database?

Apache Cassandra is a free, open-source, distributed NoSQL database that was initially developed at Facebook. It is a wide-column (column-oriented) NoSQL store with an RDBMS-like schema: a keyspace contains tables, and tables are made up of columns.

A Cassandra cluster is masterless: data is distributed evenly across its nodes to support scale-out. In addition, data is replicated between nodes to support high availability.

The cluster nodes communicate with each other to track the state of the cluster and of each node, and to determine which nodes data should be written to and read from.

While Cassandra belongs to the NoSQL family, it has a SQL-like interface called CQL (Cassandra Query Language). This enables the user to create tables with primary keys and clustering columns, and to write “SQL” to insert and select data. Cassandra distributes the data between the nodes according to the partition key (the first part of the primary key), which helps balance read and write requests evenly between the nodes.
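
To make the idea of key-based distribution more concrete, here is a minimal Python sketch of a token ring. It is an illustration under simplifying assumptions, not Cassandra's actual implementation: it uses MD5 as a stand-in for Cassandra's default Murmur3 partitioner, gives each node a single token (real clusters typically use many virtual nodes per machine), and the node names and the `Ring` class are hypothetical.

```python
import bisect
import hashlib


def token_for(value: str) -> int:
    """Hash a string to a 64-bit token (MD5 stand-in, not Murmur3)."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class Ring:
    """Toy token ring: one token per node, replicas placed clockwise."""

    def __init__(self, nodes, replication_factor=3):
        # A key cannot have more replicas than there are nodes.
        self.replication_factor = min(replication_factor, len(nodes))
        # Sort nodes by the token derived from their (hypothetical) names.
        self.entries = sorted((token_for(name), name) for name in nodes)
        self.tokens = [token for token, _ in self.entries]

    def replicas(self, partition_key: str):
        """Return the nodes that would hold a copy of this partition key."""
        # First node whose token is >= the key's token, wrapping around.
        start = bisect.bisect_left(self.tokens, token_for(partition_key))
        start %= len(self.entries)
        return [
            self.entries[(start + i) % len(self.entries)][1]
            for i in range(self.replication_factor)
        ]


ring = Ring(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
print(ring.replicas("user:42"))
```

Because the replica list depends only on the key's hash and the ring layout, any node can work out where a row lives without asking a master, which is what makes the masterless design possible.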


In which cases is Cassandra the right choice?

Cassandra is a good fit for storing large amounts of structured data under a massive number of read and write requests, from 100k OPS (operations per second) to millions of OPS. It’s well suited to storing users, user activity, sessions, IoT and time-series data.

Cassandra is not the best fit for unstructured data or document-oriented workloads, where a document database such as MongoDB or Couchbase is a better choice.


Benefits of using Cassandra database

What makes Cassandra one of the most popular databases? Well, for most it’s the platform’s ability to store very large amounts of data in a reliable way. 

Other benefits of Cassandra include:


    1. High performance – it can support millions of operations per second.
    2. High availability and data replication – data is replicated between nodes, so even if a node is down the system continues to operate normally without downtime.
    3. CQL, a SQL-like syntax – easy for developers and analysts to work with.
    4. Open source – the database is free to use, modify and distribute, which heavily reduces costs compared to proprietary databases.
    5. Masterless – there’s no need to define a master node, as every node serves reads and writes for the data it holds locally, bringing high availability and speed.
    6. Easy to scale out and in – by adding or decommissioning nodes. After each operation, the cluster rebalances the data between the nodes.
    7. Multi-region, multi-DC replication – supports master-master replication across zones and data centers.
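
The scale-out point in item 6 can be sketched in a few lines of Python. This is a toy model under stated assumptions (MD5 in place of Cassandra's Murmur3 partitioner, one token per node, hypothetical node names and helper functions), meant only to show why adding a node moves some of the data rather than all of it.

```python
import bisect
import hashlib


def token_for(value: str) -> int:
    """Hash a string to a 64-bit token (MD5 stand-in, not Murmur3)."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


def owner(nodes, key: str) -> str:
    """Return the node whose token is the first one >= the key's token."""
    ring = sorted((token_for(name), name) for name in nodes)
    tokens = [token for token, _ in ring]
    index = bisect.bisect_left(tokens, token_for(key)) % len(ring)
    return ring[index][1]


def moved_keys(old_nodes, new_node: str, keys):
    """Map each key whose owner changes to its (old_owner, new_owner) pair."""
    new_nodes = list(old_nodes) + [new_node]
    moved = {}
    for key in keys:
        before, after = owner(old_nodes, key), owner(new_nodes, key)
        if before != after:
            moved[key] = (before, after)
    return moved


keys = [f"user:{i}" for i in range(1000)]
moved = moved_keys(["node-a", "node-b", "node-c"], "node-d", keys)
print(f"{len(moved)} of {len(keys)} keys move to the new node")
```

In this model every key that changes owner lands on the newly added node, and all other keys stay where they were, so the existing nodes never need to exchange data among themselves.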


Considerations when using Cassandra

Although Cassandra is a favorite, there are some downsides that should be considered.


  1. It’s not ACID compliant – Cassandra doesn’t support ACID transactions or relational data properties, meaning you cannot commit or roll back a transaction, and data consistency between nodes is not guaranteed at all times.
  2. Possible latency/memory issues – since it handles large amounts of data and many requests, operations may slow down, causing latency and JVM memory management issues. You need to know how to tune the system correctly to avoid this situation.
  3. Data repair – repairs must be run when a node recovers after a long outage, or after a prolonged network issue between nodes or data centers.
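  4. Aggregations and joins – joins are not supported, and aggregations are limited: CQL offers basic aggregate functions such as COUNT and SUM, but they are only efficient within a single partition.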
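
On the consistency point in item 1, Cassandra lets each read and write choose how many replicas must acknowledge it. A common rule of thumb is that a read is guaranteed to overlap with the latest acknowledged write when the number of replicas written plus the number of replicas read exceeds the replication factor. The sketch below only does that arithmetic; the function names are hypothetical and it does not talk to a cluster.

```python
def quorum(replication_factor: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return replication_factor // 2 + 1


def read_overlaps_write(replication_factor: int, write_acks: int, read_acks: int) -> bool:
    """True when any set of read replicas must include at least one
    replica that acknowledged the write (write_acks + read_acks > RF)."""
    return write_acks + read_acks > replication_factor


rf = 3
q = quorum(rf)
print(q, read_overlaps_write(rf, q, q), read_overlaps_write(rf, 1, 1))
```

With a replication factor of 3, a quorum is 2 replicas, so quorum writes combined with quorum reads overlap, while writing to one replica and reading from one replica does not. The trade-off is that requiring more acknowledgements generally costs latency and reduces tolerance for unavailable nodes.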


SeaData Services  

Whether you’re currently using Cassandra or thinking about implementing it as your database, you may run into some issues along the way. These issues might seem big, but they can be easily solved and prevented with the help of professional experts.


Sea Data is a leader in all Cassandra DBA services, including:

Big Data Architecture and Cassandra schema design

Cassandra DBA – Cluster Installation, performance tuning, backup and recovery.

Data Engineering Using Python and PySpark

Application development using Python and Node.js

24/7 support


 Contact us to help you troubleshoot or set up your Cassandra database. 

