Software developed by UB scientists streamlines the management of large-scale computing using commercial systems

Picture of the LHCb experiment
(29/06/2010)

Researchers from the Institute of Cosmos Sciences of the University of Barcelona (http://icc.ub.edu/) have developed DIRAC, a software package that manages high-power commercial computing systems and optimizes the use of large-scale resources. The software is already used to manage data processing in one of the main experiments carried out at the Large Hadron Collider (LHC), operated by the European Organization for Nuclear Research (CERN). The aim of the DIRAC project, funded by CPAN, the National Centre for Particle, Astroparticle and Nuclear Physics, as part of the Consolider-Ingenio 2010 program, is to extend the software's capability to manage the computing resources needed by users across the scientific community. The software has been successfully tested in simulations carried out for the Belle experiment (Japan), using 2,000 Amazon Elastic Compute Cloud (EC2) processors.

DIRAC manages the execution of the data processing systems used in the LHCb (Large Hadron Collider beauty) experiment to identify the different types of particles measured. It also controls the execution of the algorithms used to select the most relevant data from the huge volume recorded: around 10 million collisions are needed to reconstruct a single beauty particle, which provides a basis for studying the asymmetry between matter and antimatter in the universe. DIRAC also distributes the results across the Worldwide LHC Computing Grid (WLCG), a network of more than 300 computing centres in 57 countries, 120 of which contribute to the LHCb project, and retrieves the data required for post-experiment analysis. DIRAC is the result of a collaborative initiative between experts from the ICCUB and the University of Santiago de Compostela.
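
In practice, work enters the system as jobs described through DIRAC's Python interface, which the scheduler then matches to available resources. A minimal sketch of a submission is shown below; the script name, sandbox files and CPU-time request are illustrative examples, not taken from the LHCb production setup.

    from DIRAC.Core.Base import Script
    Script.parseCommandLine()  # initialize the DIRAC client configuration

    from DIRAC.Interfaces.API.Dirac import Dirac
    from DIRAC.Interfaces.API.Job import Job

    job = Job()
    job.setName("lhcb_simulation_batch")        # illustrative job name
    job.setExecutable("run_simulation.sh")      # hypothetical simulation script
    job.setInputSandbox(["run_simulation.sh"])  # files shipped with the job
    job.setOutputSandbox(["sim_output.log"])    # files returned after execution
    job.setCPUTime(86400)                       # requested CPU time, in seconds

    dirac = Dirac()
    result = dirac.submitJob(job)  # DIRAC matches the job to a free resource
    print(result)                  # e.g. {'OK': True, 'Value': <job ID>}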

 
According to Ricardo Graciani, a researcher at the ICCUB and director of the DIRAC project, "computer simulation of the collisions and the response of the LHCb detector is crucial to our research, but it requires an enormous volume of computing resources". Constructing a computing grid with resources dedicated exclusively to data processing for a single experiment is expensive and comparatively inefficient, since, as Graciani explains, "there are peaks and troughs in the demand on the system". Graciani gives the example of the Belle project, an international initiative at the KEK particle accelerator (Japan) that conducts experiments similar to LHCb's. Belle collects data for six months each year, and a further three months are needed to carry out the corresponding computer simulations. "These requirements lead to the over-provision of computing resources during the remainder of the year", explains Graciani, which entails "additional costs" for installation and operation.
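
The scale of that over-provision is easy to quantify from the figures Graciani cites; the simplified calculation below just restates them as a utilization rate, ignoring differences between data-taking and simulation workloads.

    # Belle's annual computing demand, per the figures quoted above
    data_taking_months = 6
    simulation_months = 3
    busy_months = data_taking_months + simulation_months

    utilization = busy_months / 12
    print(f"Utilization of a dedicated system: {utilization:.0%}")  # 75%
    print(f"Capacity paid for but left idle:   {1 - utilization:.0%}")  # 25%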
 
Cost-benefit ratio
To optimize computing costs, the team of Spanish researchers worked with the CPAN project to adapt the DIRAC software to manage resources provided by Amazon, a leading provider of online computing services. Together with scientists from the University of Melbourne, the ICCUB researchers carried out simulations for the Belle project using 250 Amazon EC2 virtual machines, which together provide the equivalent power of 2,000 networked processors. "The first results show that these new resources provide over 95% efficiency", says Graciani.
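
As an illustration of the elasticity involved, a pool of virtual machines of this kind can be requested and released through Amazon's API in a few lines. The sketch below uses the boto3 Python library, which postdates this work; the machine image is hypothetical and is assumed to boot directly into a DIRAC pilot agent that pulls work from the central task queue.

    import boto3

    # Connect to the EC2 service in a chosen region (illustrative values throughout).
    ec2 = boto3.client("ec2", region_name="us-east-1")

    # Request a pool of identical virtual machines. The AMI is assumed to be
    # pre-built with the experiment software and a DIRAC pilot agent.
    response = ec2.run_instances(
        ImageId="ami-0abc12345",   # hypothetical pre-configured image
        InstanceType="c1.xlarge",  # 8 virtual cores per VM: 250 VMs ~ 2,000 processors
        MinCount=1,
        MaxCount=250,              # scale the pool to the simulation campaign
    )

    instance_ids = [i["InstanceId"] for i in response["Instances"]]
    print(f"Started {len(instance_ids)} instances")

    # When demand drops, the same API releases the resources, so nothing
    # is paid for outside the processing peaks.
    ec2.terminate_instances(InstanceIds=instance_ids)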
 
The test was carried out over a ten-day period and the results have been submitted for publication in the Journal of Grid Computing. During the 7,500 computing hours, an operating peak of 2,000 processors running simultaneously was sustained for 18 hours. "DIRAC enables us to harness the flexibility of the Amazon system to optimize resource use according to specific requirements", explains Graciani. The exercise yielded an estimated cost of US$6,000 for 1,426 simulations, equivalent to 120 million collisions or 2,700 GB of experimental data.
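
Those headline figures translate into simple unit costs; the calculation below merely restates the article's numbers, rounded to the cent.

    # Figures quoted above, restated as unit costs
    total_cost_usd = 6_000
    simulations = 1_426
    collisions = 120_000_000
    data_gb = 2_700

    print(f"Cost per simulation:    ${total_cost_usd / simulations:.2f}")       # ~$4.21
    print(f"Cost per GB of data:    ${total_cost_usd / data_gb:.2f}")           # ~$2.22
    print(f"Cost per 1M collisions: ${total_cost_usd / collisions * 1e6:.2f}")  # ~$50.00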
 
"The cost-benefit ratio, which can still be improved, will help us to assess the suitability of resources offered by Amazon or other companies, identify possible grid-based resources, or select a suitable combination of the two solutions", says Graciani. Major international collaborations such as the LHC project or global biotechnology networks rely on the computational support of high-power systems. Ultimately, the ICCUB team aims to extend the capacities of DIRAC to make it compatible with the computing requirements of researchers working in any scientific discipline who run commercial compute capacity systems alongside shared grid resources or in-house packages. As Graciani explains, this will make it possible to "optimize the cost of computing resources".