Presto is an open-source, distributed SQL query engine that is often used to query data stored in Hadoop. Its architecture resembles that of a classic massively parallel processing (MPP) database management system, and it is designed for fast analytic queries against data sets ranging from gigabytes to petabytes.

Presto was originally developed at Facebook, which needed to run interactive queries against large data warehouses in Hadoop. It was designed specifically to run fast queries against warehouses storing petabytes of data.

Presto supports both non-relational sources, such as the Hadoop Distributed File System (HDFS), Amazon S3, Cassandra, MongoDB, and HBase, and relational data sources such as MySQL, PostgreSQL, Amazon Redshift, Microsoft SQL Server, and Teradata.

A Presto cluster has one coordinator node working with multiple worker nodes. Users submit a SQL query to the coordinator, which uses a custom query and execution engine to parse the query, plan it, and schedule the distributed query plan across the worker nodes.
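The coordinator and worker roles described above can be illustrated with a small toy model. This is only a sketch of the general scatter-and-merge idea, not Presto's actual engine: the data, the function names `worker_partial_sum` and `coordinator_run`, and the use of threads in place of worker nodes are all assumptions made for illustration. It mimics what a query such as `SELECT country, sum(amount) ... GROUP BY country` might look like when split across workers.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical input: (country, amount) rows, already divided into splits.
SPLITS = [
    [("US", 10), ("DE", 5)],
    [("US", 7), ("FR", 3)],
    [("DE", 1), ("FR", 4)],
]


def worker_partial_sum(split):
    """Worker stage (illustrative): aggregate one split locally."""
    partial = Counter()
    for country, amount in split:
        partial[country] += amount
    return partial


def coordinator_run(splits, num_workers=3):
    """Coordinator stage (illustrative): hand splits to workers, merge partials."""
    # Threads stand in for worker nodes in this toy model.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(worker_partial_sum, splits))

    # Final aggregation: add up the per-split sums for each key.
    total = Counter()
    for partial in partials:
        total.update(partial)
    return dict(total)


result = coordinator_run(SPLITS)
print(result)
```

Each worker only ever sees its own split, and the coordinator only sees small partial results, which is the property that lets this style of execution scale out.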

Presto is designed to support standard ANSI SQL semantics, including complex queries, aggregations, joins (including left and right outer joins), sub-queries, window functions, distinct counts, and approximate percentiles.

Presto can query data where it is stored, without needing to move data into a separate analytics system. Query execution runs in parallel over a pure memory-based architecture, with most results returning in seconds.
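As a rough illustration of querying data in place, the sketch below builds a single SQL statement that joins a table in one source with a table in another, using Presto's `catalog.schema.table` naming and its `approx_percentile` function. The catalog, schema, table, and column names (`hive.web.page_views`, `mysql.crm.customers`, `user_id`, `latency_ms`, `region`) and the helper `build_federated_query` are hypothetical; real catalog names depend on how the cluster's connectors are configured.

```python
def build_federated_query(events_table, customers_table, percentile=0.95):
    """Return SQL text joining two tables that live in different catalogs.

    Illustrative only: the table and column names are assumptions, and the
    table names are interpolated as-is, so they must come from trusted input.
    """
    if not 0 <= percentile <= 1:
        raise ValueError("percentile must be between 0 and 1")

    return (
        "SELECT c.region,\n"
        "       count(DISTINCT v.user_id) AS users,\n"
        f"       approx_percentile(v.latency_ms, {percentile}) AS latency_pctl\n"
        f"FROM {events_table} AS v\n"
        f"JOIN {customers_table} AS c\n"
        "  ON v.user_id = c.user_id\n"
        "GROUP BY c.region"
    )


# Hypothetical names: a Hive-backed events table and a MySQL-backed customer table.
query = build_federated_query("hive.web.page_views", "mysql.crm.customers")
print(query)
```

The resulting text could be submitted through any Presto client; neither table has to be copied into a separate system before the join runs.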

Use Cases:

Big data platform for data warehouses and analytic databases

Our Services:

Data Architecture, Data Modelling, Data Engineering and Development, Data Analysis

