# Special Session: Coding Efforts

This session is partially sponsored by the EU FP7 project PRACE-2IP under grant agreement number RI-283493, and organized by the Cyprus Institute (CASTORC).

PRACE: Partnership for Advanced Computing in Europe
Claudio Gheller
Fri, 14:00, Seminar Room G -- Parallels 9G (Slides)

tba

PLQCD library for Lattice QCD on multi-core machines
Abdou Abdel-Rehim, Constantia Alexandrou, Giannis Koutsou, Nikos Anastopoulos, Nikela Papadopoulou
Fri, 14:20, Seminar Room G -- Parallels 9G (Slides)

PLQCD is a stand-alone software library for lattice QCD developed under PRACE. It provides an implementation of the Dirac operator for Wilson-type fermions and a few efficient linear solvers. The library is optimized for multi-core machines using a hybrid OpenMP+MPI parallelization. The main objectives of the library are to provide a scalable implementation of the Dirac operator and to speed up the computation of quark propagators. In this talk we describe the PLQCD library and present some test results.

Recent development in the tmLQCD software suite
Carsten Urbach
Fri, 14:40, Seminar Room G -- Parallels 9G (Slides)

tmLQCD is a software suite for lattice QCD simulations. It offers a wide range of options for simulations with Wilson-type fermions. We present an overview of recent developments in the tmLQCD software suite. In particular, we will discuss the performance obtained on BG/Q and other architectures. Moreover, we review the implemented Dirac operators and actions.

Bridge++: an object-oriented C++ code for lattice simulations
Satoru Ueda, Sinya Aoki, Tatsumi Aoyama, Kazuyuki Kanaya, Hideo Matsufuru, Shinji Motoki, Yusuke Namekawa, Hidekatsu Nemura, Yusuke Taniguchi, Naoya Ukita
Fri, 15:00, Seminar Room G -- Parallels 9G (Slides)

We are developing a new code set, "Bridge++", for lattice simulations, aiming at an extensible, readable, and portable workbench while maintaining high performance. Bridge++ covers most conventionally used lattice actions and numerical algorithms, and runs on massively parallel machines with or without arithmetic accelerators such as GPGPUs. In this talk, we explain the strategy and basic design of Bridge++ as well as the current status of the project, including sustained-performance results on several systems.

Overview of the Columbia Physics System (CPS)
Chulwoo Jung
Fri, 15:20, Seminar Room G -- Parallels 9G (Slides)

I will give an overview of the Columbia Physics System (CPS), a C++-based code suite developed for lattice QCD mainly by members of Columbia University, Brookhaven National Laboratory, and Edinburgh University. CPS has been, and continues to be, used extensively for ensemble generation and measurements by the RBC and UKQCD collaborations.

Experiences with Lattice QCD on the Juelich BG/Q
Stefan Krieg, Kalman Szabo
Fri, 15:40, Seminar Room G -- Parallels 9G (Slides)

The implementation and tuning of the Wuppertal software suite "dynqcd" on BG/Q is discussed and performance results are shown.

Experiences with OpenMP in tmLQCD
Bartosz Kostrzewa, Albert Deuzeman, Carsten Urbach
Fri, 16:30, Seminar Room G -- Parallels 10G (Slides)

In this contribution, the introduction of OpenMP into a lattice QCD code is illustrated using the tmLQCD software suite of the European Twisted Mass Collaboration (ETMC). Using specific examples from a number of routines, one possible approach to adding multi-threading through OpenMP is constructed and the benefits of this particular method are clarified. As a particular focus, problems of data concurrency, race conditions, and subtle probabilistic program errors are analyzed, and possible approaches for their mitigation are discussed. Finally, a short overview of threading overheads on different architectures is given and possible improvements of the approach are presented.

The QUDA library for QCD on CUDA
M. Clark
Fri, 16:50, Seminar Room G -- Parallels 10G (Slides)

The exponential growth of floating-point power in GPUs, combined with their high memory bandwidth, has made them an attractive platform on which to deploy HPC applications. We review the QUDA library, a domain-specific library designed to accelerate legacy lattice quantum chromodynamics applications by providing a rich set of the common performance-critical algorithms, including highly optimized sparse linear solvers.

Mobius domain wall fermion method on QUDA
Hyung-Jin Kim, Taku Izubuchi, Chulwoo Jung, Eigo Shintani
Fri, 17:10, Seminar Room G -- Parallels 10G (Slides)

The Mobius Domain Wall Fermion (DWF) method is an extended form of Shamir's domain wall fermion action which reproduces the same overlap action in the limit $$L_s\rightarrow\infty$$ without increasing the numerical cost. Mobius DWF therefore has the advantage of a smaller chiral symmetry violation, originating from the residual mass, compared with Shamir's DWF: with $$\alpha = O(L_s)$$, the $$O(1/L_{s})$$ error in $$M_{res}$$ of Shamir's DWF is reduced to $$O(1/L_{s}^2)$$ in the Mobius method. This improved chirality of the Mobius operator allows us to use a smaller fifth-dimensional lattice extent without sacrificing precision. The smaller lattice data size is in turn very helpful for running the DWF algorithm on GPUs: although GPUs have recently been used successfully in lattice QCD applications, their limited memory makes DWF computations especially difficult. To address this, we have implemented the Mobius DWF method based on the QUDA library. Optimization is still in progress. We will show preliminary hadron vacuum polarization data measured with the Mobius DWF method in QUDA.

Implementation of the twisted mass fermion operator in QUDA library
Alexei Strelchenko, Constantia Alexandrou, Giannis Koutsou, Alejandro Vaquero
Fri, 17:30, Seminar Room G -- Parallels 10G (Slides)

In this report we present results of the implementation of the twisted mass fermion operator within the QUDA framework, an open-source library for performing lattice QCD calculations on Graphics Processing Units (GPUs) using NVIDIA's CUDA platform. Performance analysis is provided for both the degenerate and the non-degenerate case.

A QUDA branch to compute disconnected diagrams on GPUs
Alejandro Vaquero, Constantia Alexandrou, Kyriacos Hadjiyiannakou, Giannis Koutsou, Alexei Strelchenko
Fri, 17:50, Seminar Room G -- Parallels 10G (Slides)

Although QUDA allows for an efficient computation of many QCD quantities, it surprisingly lacks tools to evaluate disconnected diagrams, for which GPUs are especially well suited. We aim to fill this gap by creating our own branch of QUDA, which includes the new kernels and functions required to calculate fermion loops using several methods and fermionic regularizations.

Code development (not only) for NSPT
Michele Brambilla, Dirk Hesse, Francesco Di Renzo
Fri, 18:10, Seminar Room G -- Parallels 10G (Slides)

In recent years NSPT has proven capable of providing high-order perturbative results more easily than traditional approaches. The technique is based on the numerical integration of the equations of stochastic quantization; one thus obtains perturbative results in a way that is an alternative to standard Feynman diagrams. In practice one needs to solve a hierarchy of stochastic differential equations, the solution being sought as a perturbative expansion. The key point for an efficient solution is a framework in which everything is computed order by order. We are currently developing "parmalgt", a general C++ framework mainly intended for NSPT computations. The programs are highly templated and C++11 compliant in order to achieve good performance without loss of readability. Multithreading (via OpenMP) and MPI parallelization (the latter at the moment at a preliminary stage) are hidden from the user. Some results obtained with the current implementation of the code are presented in the talks by D. Hesse and M. Dalla Brida.

The Parma group is also involved in other code-oriented activities. In particular, some effort is devoted to the general framework of multithread+multiprocess parallelization. An approach that hides inter-process communications in order to improve performance will be briefly discussed.
