## Online Scheduling in Fault-prone Systems: Performance Optimization and Energy Efficiency

##### Author(s)

Zavou, Elli

##### Supervisor(s)/Director(s)

Fernández Anta, Antonio

##### Date

2016-09-30

##### Abstract

Everyone is familiar with the problem of online scheduling, even if they are not aware of it: it appears in the way we prioritize our everyday decisions and in the way a delivery service must choose its route in order to cover ongoing requests. In computer science, the problem is of even greater importance. This thesis considers two main families of online scheduling problems in computer science. It aims to provide an extended, clear framework for their analysis, while also presenting some common characteristics that connect these problems.
The first and main family of online scheduling problems considered is task scheduling in fault-prone computing systems. As the number of clients and the possibilities offered by the rapid development of computing systems grow with time, an increase in the demand for computationally intensive tasks is inevitable. Uniprocessors can no longer cope with the escalation of these demands, which has led, among other things, to the development of multicore-based parallel machines, Internet-based computing platforms and co-operational distributed systems. Nonetheless, the challenges of these systems, even of the simplest ones, are numerous. They have to deal with continuous dynamic requests from the clients, which are probably not of the same nature (they require different amounts of computational resources). The processing elements (i.e., machines) may suffer from unpredictable failures, either malicious or due to overload. Furthermore, depending on the size of these systems and the exact processing units, their power consumption may be significant, even equal to the electricity needed for a small town. Hence, limiting their power consumption is another challenge.
To analyze such a system, one must consider the online nature of the problem: the dynamic task arrivals (client requests) of different sizes (computational demands), and the unpredictable machine crashes and restarts (failures). It is important to give guarantees for the performance of the algorithms used in these systems. The thesis therefore conducts worst-case competitive analysis and covers a significant part of the three dimensions of the problem. More precisely, it studies the effects of the number of machines, the number of different task sizes and the speed of the machines on the efficiency of online scheduling algorithms. As explained throughout the thesis, the speed of the machines affects the power consumption of the system. As performance measures, this thesis uses the completed load, the pending load and the latency competitiveness of the algorithms. In some cases, it considers the long-term competitiveness versions of these measures as well.
One of the most important results shown is that resource augmentation, in the form of increased machine speedup, is necessary in order to achieve some competitiveness or to reach optimal competitiveness. The sufficient amount of speedup is found, and online algorithms that achieve the desired competitiveness are proposed and analyzed. Apart from the algorithms designed, some of the most widely used scheduling algorithms are analyzed, for the first time, in the model considered: namely, Longest In System (LIS), Shortest In System (SIS), Largest Processing Time (LPT), and Smallest Processing Time (SPT). Nonetheless, deciding on the best algorithm among them is not easy. Each algorithm behaves better with respect to a different evaluation metric and under different model parameters.
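The four classical policies named above differ only in which pending task they pick next. The following is a minimal Python sketch of those selection rules, not the thesis's formal model: the `Task` fields, the `pick_next` helper, and the tie-breaking by task identifier are assumptions made for illustration, and machine crashes, restarts, and speedup are not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    task_id: int    # hypothetical identifier, used only to break ties
    arrival: float  # time at which the task entered the system
    size: float     # processing time required at machine speed 1


# Each policy maps a task to a sort key; the task with the smallest key runs next.
POLICIES = {
    "LIS": lambda t: (t.arrival, t.task_id),   # Longest In System: earliest arrival first
    "SIS": lambda t: (-t.arrival, t.task_id),  # Shortest In System: latest arrival first
    "LPT": lambda t: (-t.size, t.task_id),     # Largest Processing Time first
    "SPT": lambda t: (t.size, t.task_id),      # Smallest Processing Time first
}


def pick_next(pending, policy):
    """Return the pending task that the named policy would schedule next."""
    return min(pending, key=POLICIES[policy])
```

For example, with pending tasks that arrived at times 0, 1 and 2 with sizes 5, 2 and 9, LIS selects the first task, SPT the second, and both SIS and LPT the third. The same task set can thus be served in very different orders, which is consistent with the observation that no single policy is best under every metric.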
The second family of problems considered is packet scheduling over an unreliable wireless communication link. As the thesis argues, these problems have a strong connection to the task scheduling problem, especially when considering one machine and no speedup, so some of the results can be shared. A setting with a single pair of nodes is considered, connected through an unreliable wireless channel. The sending station transmits packets to a receiving station over the channel, which can be jammed, corrupting the packet being transmitted. First, worst-case scenarios are assumed for the channel jams, modeled by a malicious adversarial entity. The packet arrivals, however, follow a stochastic distribution, and competitive analysis of scheduling algorithms is pursued, giving matching bounds for the most pessimistic scenarios of channel jams. The aim of the algorithms is to find the schedule (the order of transmission of the arriving packets) that maximizes the asymptotic throughput, which corresponds to the long-term competitive ratio of the total length of successfully transmitted packets.
Then, a slightly different problem is considered, assuming an infinite amount of data to be transmitted over the same unreliable communication link. This time, however, an adversarial entity with constrained power is assumed for the channel jams. The constrained power is modeled by an Adversarial Queueing Theory (AQT) approach, defined by two main parameters: ρ, the error availability rate, and σ, the maximum batch of errors available to the adversary at any time. This is the first time AQT is used to model channel jams; it has mostly been used to model packet arrivals in networking problems. In this problem, the scheduling algorithms must decide on the length of the packets to be transmitted, with the objective of maximizing the goodput rate, that is, the rate of successfully transmitted load. Even for the simplest settings, the analysis and results turn out to be non-trivial.
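One way to picture a jammer constrained by an error availability rate and a maximum batch of errors is as a token bucket: error tokens accrue at the availability rate, at most the batch size can be held at once, and each jam spends one token. The Python sketch below illustrates that reading only. The `ErrorBudget` class, its method names, the assumption that the adversary starts with a full batch, and the `goodput` helper are hypothetical and are not taken from the thesis.

```python
class ErrorBudget:
    """Token-bucket sketch of a rate- and batch-bounded jammer (assumed model)."""

    def __init__(self, rate, batch):
        self.rate = rate             # error availability rate (tokens per time unit)
        self.batch = batch           # maximum number of errors held at any time
        self.tokens = float(batch)   # assumption: the adversary starts with a full batch
        self.now = 0.0

    def advance(self, t):
        """Move time forward to t, accruing tokens at `rate`, capped at `batch`."""
        if t < self.now:
            raise ValueError("time cannot move backwards")
        self.tokens = min(self.batch, self.tokens + self.rate * (t - self.now))
        self.now = t

    def try_jam(self):
        """Spend one token to jam the channel; return False if no full token is left."""
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def goodput(successful_lengths, elapsed):
    """Rate of successfully transmitted load: total delivered length per time unit."""
    return sum(successful_lengths) / elapsed
```

Under this sketch, an adversary with rate 0.5 and batch 2 can jam twice immediately, must then wait two time units to regain a single error, and can never accumulate more than two. A sender choosing packet lengths faces the resulting trade-off: longer packets carry more load per transmission but lose more when a jam hits them.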