Submission #22: Towards Emergent Scheduling for Distributed Execution Frameworks
================================================================================
Authors
-------
1. Paul Allan Dean
(Lancaster University)
2. Barry Porter (Lancaster University)
Abstract
--------
The increased volume of data available to organisations, the computational resources available within data centres, and the demand for systems capable of processing Terabytes/Petabytes of data has influenced the development of numerous Distributed Execution Frameworks (DEFs), such as Apache Spark and Ray [1], [2]. DEFs have become a ubiquitous part of modern data processing systems, both as standalone processing frameworks and in a pipeline for a larger service, for example, feature extraction within a recommender system processing large volumes of data. The wide range of workload types for DEFs make it extremely challenging to build a scheduler able to maximise throughput for all cases, causing scheduling overheads for certain workload types, for example, Apache Spark adds latency to all scheduling decisions hindering the performance of latency-sensitive workloads. In turn, this has prompted the creation of DEFs with differing schedulers for Reinforcement learning [2], Machine Learning (Deep Learning) [3], [4], and real-time stream processing Workloads [5]–[7]. However, each of these DEFs has become highly specialised to a particular workload type, for example Ray focuses on being able to efficiently schedule Reinforcement learning workloads [2]. Other DEF schedulers have been developed to improve scalability [8]–[11], while limiting the scheduling policy to a single metric and losing guarantees for data-locality, significantly hindering the performance of data-intensive operations.Hybrid DEFs attempted to benefit from two scheduling architectures providing efficient scheduling for a larger set of workloads [11], at the cost of being unable to exploit the full performance of a single architecture. Previous approaches to adapting the schedulers of DEFs have been limited by their need for user intervention: policy adaptation requires user-provided completion time goals for incoming workloads [12]; architecture adaptation [13] requires user creation of a model for adapting a known workload; and learning based approaches to date have been limited to learning a single scheduling policy for a specific architecture [14]. Our research explores a new approach to scheduling within DEFs, an Emergent Scheduler, capable of autonomously selecting and composing the ideal architecture and policy for a given
workload at runtime, drawing on a large pool of potential building blocks. • Can we use behavioural composition (instead of policy adaptation) to form different optimal schedulers for different workloads / environments? • Can we learn the optimal composition at runtime? How much performance gain do we get vs a fixed architecture? We present initial results which demonstrate the performance of different scheduling strategies when being subjected to workloads of varying granularity, indicating different scheduling policies are better suited to different workloads. The results represent an interesting runtime search space for learning the ideal composition for a given workload. However, the learning process in this domain is uniquely challenging in that workloads can be highly divergent with few shared characteristics, are often non-repeating, and their arrival can be sporadic. This is a key area of future work, in addition to considering storage locations of data and more diverse scheduling architectures.