PrestoCon 2020 has ended
Thursday, September 24 • 12:35pm - 1:05pm
Optimizing Query Performance by Decoupling Presto and Hive Data Warehouse - Gene Pang & Calvin Jia, Alluxio, Inc.

Presto is commonly used to query existing Hive data warehouses. Because of existing applications, tech debt, or previous operational challenges, Presto may not achieve its full potential and is instead constrained by past decisions. Challenges include an overloaded Hive Metastore, unoptimized data layouts such as too many small files, and a lack of influence over existing Hive applications.

Ideally, Presto would access data independently of how the data was originally managed. Alluxio, as a data orchestration layer, provides the physical data independence Presto needs to interact with the data more efficiently. In addition to caching, Alluxio provides a catalog service to abstract the table metadata, and transformations to expose compute-optimized data. In this talk, Gene describes the challenges of using Presto with Hive and discusses how Alluxio data orchestration can solve them.

Speakers
Calvin Jia

Software Engineer, Alluxio
Calvin Jia is the top contributor to the Alluxio project. He has been involved as a core maintainer and release manager since the early days, when the project was known as Tachyon. Calvin has a B.S. from the University of California, Berkeley.
Gene Pang

Head Architect, Alluxio, Inc.
Gene Pang is the PMC Maintainer of the Alluxio open source project and a founding member of Alluxio, Inc. He graduated with a Ph.D. from the AMPLab at UC Berkeley, working on distributed database systems. Before starting at Berkeley, he worked at Google and has an M.S. from Stanford...


Thursday September 24, 2020 12:35pm - 1:05pm PDT