Apache Hadoop is free, open-source software for massively distributed computation and Big Data storage. It can store petabytes of data and process them quickly. It does this with a cluster of many commodity servers (nodes), where each data node stores a portion of the data and also serves as a compute node that processes its local data. The Hadoop ecosystem uses the MapReduce model to process huge amounts of data at high speed: the Map step filters and sorts data on the local nodes, and the Reduce step performs a summary operation over the output of all nodes.
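To make the Map and Reduce steps concrete, here is a minimal single-process sketch in plain Python. It is not Hadoop code: the function names and the sample chunks are hypothetical, and each string only stands in for the data local to one node.

```python
from collections import defaultdict

def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one node's local chunk."""
    return [(word.lower(), 1) for word in chunk.split()]

def shuffle(mapped_outputs):
    """Shuffle: group the emitted values by key across every node's output."""
    groups = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: summarize each key's values, here by summing the counts."""
    return {key: sum(values) for key, values in groups.items()}

# Hypothetical chunks, each standing in for the data held by one node.
chunks = ["big data big cluster", "data node data"]
word_counts = reduce_phase(shuffle(map_phase(c) for c in chunks))
print(word_counts)
```

In a real cluster the map calls run in parallel on the nodes that hold the data, and only the much smaller intermediate pairs travel over the network to the reducers.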

The Hadoop storage layer is based on HDFS, the Hadoop Distributed File System. Hadoop splits files into large blocks and distributes them across the cluster's data nodes.
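The block splitting described above can be sketched with simple arithmetic. This is an illustration only: the 128 MiB block size is an assumption for the example (it matches the default in recent Hadoop releases but is configurable), the node names are made up, and the round-robin assignment is a simplification rather than the actual HDFS placement policy.

```python
MIB = 1024 * 1024
BLOCK_SIZE = 128 * MIB  # assumed block size for this sketch

def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return (offset, length) pairs that cover a file of file_size bytes."""
    blocks = []
    offset = 0
    while offset < file_size:
        length = min(block_size, file_size - offset)
        blocks.append((offset, length))
        offset += length
    return blocks

def distribute(blocks, nodes):
    """Assign blocks to nodes round-robin (a simplification, not HDFS's real policy)."""
    layout = {node: [] for node in nodes}
    for index, block in enumerate(blocks):
        layout[nodes[index % len(nodes)]].append(block)
    return layout

# A hypothetical 300 MiB file spread over three hypothetical data nodes.
blocks = split_into_blocks(300 * MIB)
layout = distribute(blocks, ["node-1", "node-2", "node-3"])
for node, held in layout.items():
    print(node, [(off // MIB, length // MIB) for off, length in held])
```

Only the last block is smaller than the block size, so a 300 MiB file becomes two full blocks plus one 44 MiB block.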

HDFS storage has the following advantages:

  1. Shared storage – storage is shared between all data nodes and HDFS clients.
  2. High availability – during writes, data blocks are replicated to other nodes, so if a node fails the data is still available for reads and writes on other nodes.
  3. Scalability – scaling out by adding data nodes grows storage and processing capacity accordingly.
  4. Performance – the more data nodes, the more I/O throughput the cluster delivers.
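The high-availability point can be illustrated with a small simulation. This is a sketch under stated assumptions, not how the NameNode actually places replicas: the replication factor of 3 is assumed for the example (it is the usual HDFS default but is configurable), the node names are hypothetical, and replicas are simply placed on consecutive nodes.

```python
REPLICATION = 3  # assumed replication factor for this sketch

def place_replicas(num_blocks, nodes, replication=REPLICATION):
    """Place each block on `replication` distinct nodes (simplified placement)."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block_id: [nodes[(block_id + i) % len(nodes)] for i in range(replication)]
        for block_id in range(num_blocks)
    }

def readable_blocks(placement, failed_nodes):
    """Return the blocks that still have at least one replica on a live node."""
    return {
        block_id
        for block_id, holders in placement.items()
        if any(node not in failed_nodes for node in holders)
    }

nodes = ["node-1", "node-2", "node-3", "node-4"]  # hypothetical cluster
placement = place_replicas(6, nodes)

# With three replicas per block, losing any two nodes leaves every block readable.
print(sorted(readable_blocks(placement, {"node-1", "node-2"})))
```

A block becomes unreadable only when every node holding one of its replicas is down at the same time, which is why a higher replication factor trades disk space for availability.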

The Hadoop ecosystem can be extended with submodules: software packages that can be installed on top of or alongside Hadoop, such as Apache Hive, Apache HBase, Apache Spark, Impala and so on.

Hadoop is available today in two main distributions, which can be free or licensed:

  1. Cloudera Distribution Including Apache Hadoop (CDH)
  2. Hortonworks Data Platform (HDP)

Hortonworks and Cloudera merged in 2019.

Use Case:

Hadoop is a great solution for on-premises Big Data warehouses or data lakes where data is stored in batches and fast analytics over huge datasets is required. In addition, the license is free.

Our Services:

  • Big Data Architecture
  • Hadoop DevOps / DataOps: Cluster Installation, Performance Tuning, Upgrades, Backup and Recovery
  • Data Engineering using Python and PySpark
  • ETL and Analytics Development using Impala and Spark
  • 24/7 Support

