Table Of ContentAn Adaptive Partitioning Scheme for Ad-hoc and
Time-varying Database Analytics
by
Anil Shanbhag
B.Tech. in Computer Science
Indian Institute of Technology Bombay, 2014
Submitted to the Department of Electrical Engineering and Computer
Science
in partial fulfillment of the requirements for the degree of
Master of Science in Electrical Engineering and Computer Science
at the
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
June 2016
(cid:13)c Massachusetts Institute of Technology 2016. All rights reserved.
Author ..............................................................
Department of Electrical Engineering and Computer Science
May 19, 2016
Certified by..........................................................
Samuel Madden
Professor of Electrical Engineering and Computer Science
Thesis Supervisor
Accepted by.........................................................
Leslie A. Kolodziejski
Chairman, Department Committee on Graduate Students
2
An Adaptive Partitioning Scheme for Ad-hoc and
Time-varying Database Analytics
by
Anil Shanbhag
Submitted to the Department of Electrical Engineering and Computer Science
on May 19, 2016, in partial fulfillment of the
requirements for the degree of
Master of Science in Electrical Engineering and Computer Science
Abstract
Data partitioning significantly improves query performance in distributed database
systems. A large number of techniques have been proposed to efficiently partition a
dataset, often focusing on finding the best partitioning for a particular query work-
load. However, many modern analytic applications involve ad-hoc or exploratory
analysiswhereusersdonothavearepresentativequeryworkload. Furthermore, work-
loads change over time as businesses evolve or as analysts gain better understanding
of their data. Static workload-based data partitioning techniques are therefore not
suitable for such settings. In this thesis, we present Amoeba, an adaptive distributed
storage system for data skipping. It does not require an upfront query workload and
adapts the data partitioning according to the queries posed by users over time. We
present the data structures, partitioning algorithms, and an efficient implementation
on top of Apache Spark and HDFS. Our experimental results show that the Amoeba
storage system provides improved query performance for ad-hoc workloads, adapts to
changes in the query workloads, and converges to a steady state in case of recurring
workloads. On a real world workload, Amoeba reduces the total workload runtime
by 1.8x compared to Spark with data partitioned and 3.4x compared to unmodified
Spark.
Thesis Supervisor: Samuel Madden
Title: Professor of Electrical Engineering and Computer Science
4
Acknowledgments
I would like to thank Alekh Jindal, Qui Nguyen, Aaron Elmore, Jorge Quiane and
Divyakanth Agarwal who have contributed many ideas to this work and helped build
the system.
I would also like to thank Prof. Samuel Madden, my thesis supervisor, for being
a constant source of guidance and feedback in this project and outside.
Finally, I am always grateful to my family and friends, who encouraged me and
supported me along the way.
5
6
Contents
1 Introduction 13
2 Related Work 17
3 System Overview 21
4 Upfront Data Partitioning 23
4.1 Key Ideas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
4.2 Upfront Partitioning Algorithm . . . . . . . . . . . . . . . . . . . . . 27
5 Adaptive Repartitioning 31
5.1 Workload Monitor and Cost Model . . . . . . . . . . . . . . . . . . . 32
5.2 Partitioning Tree Transformations . . . . . . . . . . . . . . . . . . . . 33
5.3 Divide-And-Conquer Repartitioning . . . . . . . . . . . . . . . . . . . 36
5.4 Handling Multiple Predicates . . . . . . . . . . . . . . . . . . . . . . 39
6 Implementation 41
6.1 Initial Robust Partitioning . . . . . . . . . . . . . . . . . . . . . . . . 41
6.2 Query Execution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
7 Discussion 45
7.1 Leveraging Replication . . . . . . . . . . . . . . . . . . . . . . . . . . 45
7.2 Handling Joins . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
8 Evaluation 47
8.1 Upfront Partitioning Performance . . . . . . . . . . . . . . . . . . . . 47
8.2 Micro-benchmarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
7
8.3 Amoeba on Real Workload . . . . . . . . . . . . . . . . . . . . . . . 53
9 Conclusion 55
Appendices 60
A Fast Remote Reads 61
8
List of Figures
1-1 Example partitioning tree with 8 blocks . . . . . . . . . . . . . . . . . 14
3-1 Amoeba Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
4-1 Partitioning Techniques. . . . . . . . . . . . . . . . . . . . . . . . . . 25
4-2 Upfront Partitioning Algorithm Example. . . . . . . . . . . . . . . . . 26
5-1 Node swap in the partitioning tree. . . . . . . . . . . . . . . . . . . . 34
5-2 Illustrating adaptive partitioning when predicate A appears repeatedly. 35
2
5-3 Node pushdown in partitioning tree. . . . . . . . . . . . . . . . . . . 35
5-4 Node rotation in partitioning tree. . . . . . . . . . . . . . . . . . . . . 35
7-1 Heterogenous Replication. . . . . . . . . . . . . . . . . . . . . . . . . 45
8-1 Ad-hoc query runtimes for different attributes of TPC-H lineitem. . . 48
8-2 Comparing the upload time in Amoeba with HDFS . . . . . . . . . . 49
8-3 Comparing performance of upfront partition tree vs kd-tree . . . . . . 49
8-4 Query runtimes for changing query attributes on TPC-H lineitem. . . 51
8-5 Query runtimes for changing predicates on the same attribute of TPC-
H lineitem. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
8-6 Cumulative Optimizer Runtime Across 100 queries . . . . . . . . . . 52
8-7 Cumulative Repartitioning Cost . . . . . . . . . . . . . . . . . . . . . 52
8-8 Total runtimes of the different approaches . . . . . . . . . . . . . . . 53
A-1 Response time with varying data locality (%) . . . . . . . . . . . . . 61
9
10
Description:suitable for such settings. In this thesis, we present Amoeba, an adaptive distributed . Adaptive partitioning/indexing is extensively used in modern single node in- memory column stores Similar to MAGIC, both Oracle and MySQL.