Evaluating Reliability Techniques in the Master-Worker Paradigm
We consider a distributed system that carries out computational tasks according to the master-worker paradigm. A master has a set of computational tasks to resolve. She assigns each task to a set of workers over the Internet, instead of computing the task locally. For each task, each worker replies to the master with the task result. Since the task was not computed locally, the master cannot trust the result, for two main reasons: (i) workers might deliberately provide an incorrect result, and (ii) the result might be corrupted by a hardware or software failure during the execution of the task. Given the above, we model workers as either “altruistic”, always willing to provide the correct result to each task, or “troll”, always trying to provide an incorrect result to each task. Moreover, we model the failure of a worker to comply with her intended behavior as an error probability ε. The goal of the master is to compute the correct result of all the tasks with high probability. In the literature, two techniques have been used to achieve this goal: (i) “voting”, which determines the correct result of a task given multiple replies from distinct workers; and (ii) “challenges”, which are tasks whose result is known and can be used to detect altruistic workers. What separates our work from the current literature is the realistic modelling of the workers’ behavior and the fact that we do not restrict the task result to a binary set of answers; the domain of possible replies for a task can have multiple correct and multiple incorrect results. Given the above, we evaluate the performance of the two techniques described in the literature in two scenarios: ε = 0 and ε > 0. Performance is measured in terms of: (1) time, i.e., the number of rounds performed by an algorithm for the computation of all the tasks, and (2) work, i.e., the total number of task computations performed by the workers.
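The worker model and the voting technique described above can be illustrated with a minimal Python sketch. It is not the algorithm evaluated in this work: the result sets `CORRECT` and `INCORRECT`, the function names `worker_reply` and `vote`, and the use of a simple plurality rule are all assumptions made for illustration.

```python
import random
from collections import Counter

# Hypothetical result domain for one task. The abstract allows several
# correct and several incorrect results, so both sets have more than one value.
CORRECT = {"c1", "c2"}
INCORRECT = {"w1", "w2", "w3"}


def worker_reply(kind, eps, rng):
    """Return one reply from a worker of type "altruistic" or "troll".

    With probability eps the worker fails to follow its intended behavior
    (an altruistic worker replies incorrectly, a troll replies correctly).
    """
    intends_correct = (kind == "altruistic")
    if rng.random() < eps:
        intends_correct = not intends_correct
    pool = CORRECT if intends_correct else INCORRECT
    return rng.choice(sorted(pool))


def vote(replies):
    """Plurality voting (an assumed rule): accept the most frequent reply."""
    return Counter(replies).most_common(1)[0][0]


if __name__ == "__main__":
    rng = random.Random(0)
    kinds = ["altruistic"] * 7 + ["troll"] * 3
    replies = [worker_reply(k, eps=0.1, rng=rng) for k in kinds]
    accepted = vote(replies)
    print(accepted, accepted in CORRECT)
```

Because the domain is not binary, replies of the same type can split across several values, which is one reason a plurality rule behaves differently here than in the binary setting.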
The case ε = 0 serves as a best-case scenario that provides the optimal time and work bounds for the problem. For the case ε > 0, we propose two “natural” algorithms: one that combines voting and challenges, and one that uses only voting. Both algorithms assume that certain system parameters are known. Since this might not always be the case, we also provide an algorithm that correctly estimates these parameters with high probability.
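One way to picture a combination of challenges and voting, together with the time and work measures, is the following Python sketch. It is an illustration under stated assumptions, not the algorithm proposed in this work: the function `challenge_then_vote`, the rule of trusting a worker that passes more than half of the challenges, and the convention that all workers compute in parallel within a round are hypothetical. For brevity, challenges and the real task share the same hypothetical result sets.

```python
import random
from collections import Counter

CORRECT = {"c1", "c2"}      # hypothetical correct results of a task
INCORRECT = {"w1", "w2"}    # hypothetical incorrect results


def reply(kind, eps, rng):
    """Reply of an "altruistic" or "troll" worker that errs with probability eps."""
    ok = (kind == "altruistic")
    if rng.random() < eps:
        ok = not ok
    return rng.choice(sorted(CORRECT if ok else INCORRECT))


def challenge_then_vote(kinds, eps, n_challenges, rng):
    """Return (accepted_result, rounds, work) for a single task.

    rounds counts time (one round per batch of parallel assignments);
    work counts every task computation performed by any worker.
    """
    rounds = 0
    work = 0
    passed = [0] * len(kinds)
    # Challenge phase: the master knows the result, so she can check each reply.
    for _ in range(n_challenges):
        rounds += 1
        for i, kind in enumerate(kinds):
            work += 1
            if reply(kind, eps, rng) in CORRECT:
                passed[i] += 1
    # Assumed rule: trust workers that passed more than half of the challenges.
    trusted = [i for i, p in enumerate(passed) if 2 * p > n_challenges]
    if not trusted:
        return None, rounds, work
    # Voting phase: only trusted workers compute the real task.
    rounds += 1
    replies = [reply(kinds[i], eps, rng) for i in trusted]
    work += len(replies)
    return Counter(replies).most_common(1)[0][0], rounds, work


if __name__ == "__main__":
    kinds = ["altruistic"] * 3 + ["troll"] * 4
    print(challenge_then_vote(kinds, eps=0.0, n_challenges=1, rng=random.Random(1)))
```

With ε = 0 a single challenge separates the two worker types exactly, so the vote is taken among altruistic workers only, even though trolls outnumber them in this example. With ε > 0 more challenges are needed, which increases both rounds and work.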