BigData

BigData platforms emerged as the amount of data started to grow dramatically with the spread of web and mobile applications. Large volumes of user activity data were gathered and needed to be stored on platforms with large, fast storage and enough computation resources to process them. These data platforms should scale easily, without downtime, as the amount of data grows, and should give good response times to user and process queries. Big data platforms use clusters to split the data between many servers: each server stores a portion of the data on its local disk storage, which is called sharding. Every shard is replicated to other nodes for high availability in case one node is down.
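As a rough illustration of sharding with replication, the sketch below assigns each record key to a shard with a stable hash and places every shard on more than one node. The node names, the `place` helper, and the replication factor of 2 are assumptions made for this example; real platforms use their own placement strategies.

```python
import hashlib
from typing import Dict, List


def shard_for(key: str, num_shards: int) -> int:
    """Map a record key to a shard number using a stable hash."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def replica_nodes(shard: int, nodes: List[str], replication_factor: int) -> List[str]:
    """Place a shard on `replication_factor` consecutive nodes (wrapping around)."""
    return [nodes[(shard + i) % len(nodes)] for i in range(replication_factor)]


def place(keys: List[str], nodes: List[str], replication_factor: int = 2) -> Dict[str, List[str]]:
    """Return, for every key, the list of nodes that hold a copy of it."""
    placement = {}
    for key in keys:
        shard = shard_for(key, len(nodes))  # one shard per node in this sketch
        placement[key] = replica_nodes(shard, nodes, replication_factor)
    return placement


# Hypothetical three-node cluster and a few record keys.
nodes = ["node-a", "node-b", "node-c"]
placement = place(["user-1", "user-2", "user-3", "user-4"], nodes)
for key, holders in placement.items():
    print(key, "->", holders)
```

Because every key lives on two different nodes, losing any single node still leaves one copy of each record available.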

When a query is executed, all servers process it in parallel, each on its local data, using the power of all nodes to process the data as fast as possible. Big Data platforms also use common architecture solutions:

  • Columnar storage – instead of storing all columns together in one row, each column is stored in a dedicated storage segment. Thus, when a query touches only a few columns, only those columns are read from disk and not the whole record.

  • Compression – because many records hold repeating values, the column data is also compressed. This stores less data on disk and speeds up queries, which read less data.

  • Cluster – the ability to distribute load and data among many servers and scale out in case more resources are needed.

  • Sharding or partitions – data is distributed between many servers; each server stores and processes a portion of the whole data in a shared-nothing architecture.

  • Parallelism – huge amounts of data are processed fast because many cluster servers process the data at the same time.
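The columnar storage and compression ideas above can be sketched in a few lines. The example pivots a small set of made-up rows into one list per column, answers an aggregate by reading a single column, and compresses a repetitive column with run-length encoding. The sample data and the choice of run-length encoding are assumptions for illustration; actual platforms combine several encodings.

```python
from itertools import groupby

# Hypothetical sample records, stored row by row.
rows = [
    {"user_id": 1, "country": "IL", "amount": 10},
    {"user_id": 2, "country": "IL", "amount": 25},
    {"user_id": 3, "country": "IL", "amount": 5},
    {"user_id": 4, "country": "US", "amount": 40},
    {"user_id": 5, "country": "US", "amount": 15},
]


def to_columns(rows):
    """Pivot row records into one list of values per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def rle_encode(values):
    """Run-length encode a column: consecutive repeats become (value, count)."""
    return [(value, len(list(group))) for value, group in groupby(values)]


def rle_decode(pairs):
    """Expand (value, count) pairs back into the original column."""
    return [value for value, count in pairs for _ in range(count)]


columns = to_columns(rows)

# A query such as SUM(amount) only needs the "amount" column.
total_amount = sum(columns["amount"])

# Repeating values in the "country" column compress well.
encoded_country = rle_encode(columns["country"])

print(total_amount)
print(encoded_country)
```

Here the five country values shrink to two (value, count) pairs, and `rle_decode` restores the original column when it is needed.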

Traditional databases, which rely on central shared storage (a shared-everything architecture), could not store such huge amounts of data at a reasonable cost, and processing the data and running queries on it would take a long time.

SeaData is an expert in the world of BigData and provides Data Architecture consulting services, DataOps, and Data Engineering projects to leading companies in Israel and around the world.

We work in on-premises and cloud environments, depending on the customer's use case.

Some of the technologies we specialize in are listed below:

Hadoop

Apache Hadoop is free, open-source software for massively distributed computation and Big Data storage. It can store petabytes of data and process them in parallel. It does this using a cluster of many commodity servers (nodes), where each data node stores a portion of the data and also serves as a compute node that processes its local data.
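The idea of each node processing its own local data, with the partial results merged afterwards, can be sketched as a small word count. This is not the Hadoop API: the node names and sample lines are made up, and threads merely stand in for separate servers.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical cluster: each node holds only its own portion of the data.
node_data = {
    "node-1": ["big data", "data lake"],
    "node-2": ["data warehouse", "big cluster"],
    "node-3": ["cluster node", "data node"],
}


def count_words_locally(lines):
    """Work done on one node: count words in that node's local lines only."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def distributed_word_count(node_data):
    """Run the local step on every node in parallel, then merge the partial counts."""
    with ThreadPoolExecutor(max_workers=len(node_data)) as pool:
        partials = list(pool.map(count_words_locally, node_data.values()))
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


result = distributed_word_count(node_data)
print(result.most_common(3))
```

Only the small per-node counters travel between the workers and the merge step; the raw lines never leave the node that stores them.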

 

Read More about Hadoop

Google BigQuery

Google BigQuery is a fast, powerful, flexible, and cost-effective serverless data warehouse that is tightly integrated with the other services on Google Cloud Platform. Designed to help you make informed decisions quickly, the cloud-based data warehouse and analytics platform uses a built-in query engine and a highly scalable serverless computing model to process terabytes of data in seconds and petabytes in minutes.

Read more about Google BigQuery

Amazon Athena

Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 (Simple Storage Service), using standard SQL. Since Athena is serverless, there is no infrastructure to manage, and you pay only for the queries that you run.

Read More about Amazon Athena

Presto

Presto is an open-source, distributed SQL query engine that can query data stored in Hadoop as well as in other data sources. It uses an architecture similar to a classic massively parallel processing (MPP) database management system and was designed for fast analytic queries against data of any size.

Read more about Presto 

Exasol

Exasol is a high-performance, parallelized relational database management system (RDBMS) that runs on a cluster of standard server hardware.

The database is designed to process data in memory, while the data is persistently stored on disk in line with the ACID rules.

Read More about Exasol

Vertica

Vertica is an elastically scalable, advanced SQL analytics database purpose-built to manage rapidly growing volumes of data, maximizing cloud economics for mission-critical big data analytics initiatives.

Read more about Vertica 

Amazon Redshift

Amazon Redshift is a fully managed, cloud-based big data warehouse service offered by Amazon.

The platform provides a storage system that stores petabytes of data in easy-to-access clusters of nodes that can be queried in parallel. Each of these nodes can be accessed independently by users and applications.

Read more about Amazon Redshift
