Scheduling Techniques in Different Architectures of Data-parallel Clusters for High Performance
Committee Members: Haiying Shen (Advisor), Andrew Grimshaw (Chair), Mary Lou Soffa, David Evans, Zongli Lin
Abstract:
Many previous studies show that current production clusters process increasingly diverse jobs with widely varying characteristics (e.g., input data size, shuffle data size, and output data size). First, prior work has shown that shuffle-heavy jobs make up a large fraction of current production workloads; these jobs can create network bottlenecks in the clusters and hence greatly degrade cluster performance. Second, small jobs (i.e., jobs with little data to process) often dominate production workloads. However, current architectures and schedulers of data-parallel clusters were originally built to process data-intensive jobs with large input datasets. This mismatch between the actual workloads and the original design objectives results in poor job performance in the clusters.
The key contribution of this dissertation is the design of job schedulers for different architectures of data-parallel clusters to handle these diverse workloads. First, we design a job scheduler that places tasks carefully to avoid or reduce the network congestion caused by the large number of shuffle-heavy jobs. Second, we present a job scheduler that efficiently exploits the optical circuit switch in a hybrid electrical/optical datacenter network to improve job performance by finding the optimal tradeoff between parallelism and traffic aggregation. Third, recent work advocates hybrid scale-up/scale-out clusters (Hybrid clusters for short) to handle workloads consisting of a majority of jobs with small input data sizes and a small number of jobs with large input data sizes. We design job placement and data placement strategies for Hybrid clusters to address these challenges, which significantly improve the performance of workloads dominated by small jobs.
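To make the hybrid scale-up/scale-out idea concrete, the following is a minimal sketch of a size-threshold job placement rule. The threshold, names, and policy below are assumptions chosen for illustration only; they do not reproduce the dissertation's actual job and data placement strategies.

```python
# Illustrative sketch: route small jobs to the scale-up machine and large jobs
# to the scale-out machines. The threshold and policy are assumed, not the
# dissertation's algorithm.
from dataclasses import dataclass
from typing import List

SCALE_UP_THRESHOLD_GB = 32.0  # assumed cutoff for jobs that fit in scale-up memory


@dataclass
class Job:
    job_id: str
    input_size_gb: float


def place_job(job: Job, scale_up_free_gb: float) -> str:
    """Choose a destination for a job based on its input size and the
    remaining scale-up memory; larger jobs fall through to scale-out."""
    if job.input_size_gb <= SCALE_UP_THRESHOLD_GB and job.input_size_gb <= scale_up_free_gb:
        return "scale-up"
    return "scale-out"


if __name__ == "__main__":
    workload: List[Job] = [Job("small-query", 2.0), Job("etl-batch", 500.0)]
    free_gb = 64.0
    for j in workload:
        target = place_job(j, free_gb)
        if target == "scale-up":
            free_gb -= j.input_size_gb  # reserve scale-up memory for the small job
        print(f"{j.job_id}: {target}")
```

In practice, such a rule would also need to account for data locality and memory contention on the scale-up machine, which is what makes the placement problem nontrivial.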