Data Warehouse (DWH) and BI

Data Warehouse is a data platform where organisations store all their information from external or internal sources . an external source can be data produced by Google analytics or facebook .Internal sources can be systems like CRM  and Accounting systems .To be able to move all data from different sources types that use different access way (Rest API , SQL Query Syntax) Data Engineer need build data pipelines which Extract , Transform and Load (ETL) the data from DWH sources and loaded into the Data Warehouse .
It’s important that all data will be in a single place so it can be analysed easily .
Data warehouse is used by business intelligence (BI)  , data analysts to get insights from the company activities and how to improve it by data science to do deep learning and to find behaviour patterns and AI algorithms which analyze big amounts of data and give the user the exact required information or find suspicious behaviour.

Following are examples for analysis processes that can run on a data warehouse .

  1. Understand users / customer activity (Profiling) and how to improve it to increase income 
  2. Cross application Data integrity checks  to verify data consistency , no data has been lost and there are no discrepancies .Data loss and discrepancies can lead you to the company losing money. 

Data warehouses are the biggest database in the organization as it stores all organization data .

For Data Warehouse (DWH) and BI we use three types of database solutions depending on the DWH size and the data platform the customer uses:

  1. Small DWH – Uses the same database brand the customer already uses so
    The application is fully compatible with the database that is already in use.
    The databases can be: Oracle, SQL Server, PostgreSQL and MySQL.

The first three listed above are more compatible as they support parallel query execution and analytic functions, MySQL only supports Version 8 and above.

  1. Medium Data Warehouse – Uses analytic columnar databases such as Exasol, Vertica and Redshift. These databases support scale out for extending CPU, RAM and Disk resources and are best suited for analytic queries. These are best suited for databases from 1 TB up to 10’s TBs of data.
  2. Big Data Warehouse – Uses Big Data Platforms Such as Hadoop, Google BigQuery, Amazon Athena or Presto over Cloud Storage. These data platforms are suited for 100’S TB storage. They scale out of the box and can give great response time for huge amounts of data and queries running on huge amounts of data.

 

For ETL tools or frameworks we use Apache Air Flow or talend.

 

It is important to define the correct DWH data model to have the best read and write performance and that it will answer all product / customer business requirements . In order to achieve this you need a good Data Architect.

Our Data Engineers use Python (small/medium DWH) and Spark (Huge Data Warehouse) to extract the data from source systems or messages queue systems like Kafka and RabbitMQ transform and load it into the DWH platform.

We use a data lake so all customer dimensions, fields and statistics are stored in one or more tables for easy and fast query analysis.

For vitalization we use Tableau, Qlik Sense and Power BI. Our experts build the optimized data layer above the DWH platform in order to best access the data to the data analysts and business stakeholders.

Contact Us today to hear more about our solutions. We will help build the best DWH to suit your business needs at the lowest cost and the best performance.

Share:

More Posts

Hadoop

 Apache Hadoop is free open Source software for massive distributed computation and Big Data storage. It can store

Read More »

We Are Here For You :

Or fill in your details and we will contact you ASAP: