Results and programming techniques from the CR3x+1 project
SCOLLO, Giuseppe
2010-01-01
Abstract
The CR3x+1 project runs a parallel algorithm on the Cometa Grid Infrastructure to search for Class Records in the 3x+1 Problem. The search up to 2^59 was recently completed, following a 15-month computational effort that delivered 150 new Class Records. Besides presenting these results, this paper introduces a few techniques adopted in the project to keep the search on schedule despite occasional failures and delays of running jobs. The main idea is to exploit the space-time tradeoff enabled by the availability of several hundred processors: the project recovers from the failure of a sequential job by subsegmenting its search interval and reassigning the search over that interval to a collection of shorter parallel jobs. The implementation of this idea is made easier by a recent upgrade of the Grid Workload Management System (WMS), which supports collection-type jobs, but it has been developed to run under both the new WMS and the legacy one. Furthermore, although the implementation specifically refers to the 3x+1 Problem, it seems easily adaptable to a wider class of search programs over a countable search space. Examples of problems solved by this kind of program are provided, together with an attempt to characterize this wider applicability class. A summary of the measured performance of the CR3x+1 search program, across the several versions it has run in, is also presented. The paper concludes with an overview of current research questions, which aim at full automation of job control and management for this program and, in perspective, for any suitable program in the aforementioned wider class.
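The recovery idea described in the abstract can be illustrated with a minimal sketch. It is not the project's code: the function name `subsegment`, the half-open interval convention, and the number of parts are assumptions made for illustration only, and submission of the resulting jobs to the WMS is not shown.

```python
def subsegment(start, end, parts):
    """Split the half-open interval [start, end) into contiguous pieces.

    Hypothetical helper: each piece could be handed to one shorter
    parallel job replacing a failed sequential job over [start, end).
    """
    if parts < 1 or end <= start:
        raise ValueError("need parts >= 1 and a non-empty interval")
    # Never create more pieces than there are integers to search.
    parts = min(parts, end - start)
    base, extra = divmod(end - start, parts)
    segments = []
    lo = start
    for i in range(parts):
        # The first `extra` pieces take one additional element each.
        hi = lo + base + (1 if i < extra else 0)
        segments.append((lo, hi))
        lo = hi
    return segments


# Example: a failed job over [2^58, 2^59) reassigned to 8 shorter jobs.
jobs = subsegment(2**58, 2**59, 8)
```

The pieces are contiguous and disjoint, so together they cover exactly the interval of the failed job, and the time to finish it shrinks roughly in proportion to the number of processors available for the shorter jobs.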