Algorithms and Machines

UV suppression by smearing and screening correlators
Nikhil Karthik, Sourendu Gupta
Mon, 14:00, Seminar Room D -- Parallels 1D (Slides)

We investigate the mechanism of smearing in the APE, Stout, HYP and HEX schemes through their effect on glue and quark Fourier modes. Using this, we non-perturbatively tune the smearing parameters to their optimum values. Smearing causes a super-linear improvement in taste symmetry breaking in the high temperature phase of QCD. We use optimal smearing in the high temperature phase and find close agreement of meson screening masses with weak coupling predictions.

Testing reweighting method for truncated Overlap fermions
Ken-Ichi Ishikawa
Mon, 14:20, Seminar Room D -- Parallels 1D (Slides)

Maintaining lattice chiral symmetry during the HMC algorithm is computationally demanding. One way to reduce the total cost is to relax the chiral symmetry requirement during generation and to recover the symmetry at the measurement stage by reweighting. The HMC algorithm with the truncated overlap fermion, which has approximate lattice chiral symmetry, has been proposed by Borici in terms of domain-wall type fermions. The reweighting factor is the determinant ratio between the truncated and exact overlap operators and is estimated by a noise method. We implement the truncated overlap fermion in terms of domain-wall fermions and test the behavior of the reweighting factor against the truncation level (fifth dimensional extent) on a set of small lattices.

Scaling, topological tunneling and actions for weak coupling DWF calculations
Greg McGlynn, Robert Mawhinney
Mon, 14:40, Seminar Room D -- Parallels 1D (Slides)

We present results from a 2+1 flavor DWF calculation at 1/a = 3 GeV and discuss strategies for similar calculations at finer lattice spacings which will target charm physics. At weak coupling the autocorrelation time of the global topological charge becomes very long because the HMC algorithm has trouble moving between topological sectors. We report the results of simulations that test several ideas for reducing the autocorrelation time of topological charge. In weak coupling quenched simulations we find that the open boundary conditions suggested by Lüscher and Schaefer do not improve topological autocorrelation times. We present preliminary results from simulations using a ``dislocation-enhancing determinant ratio'' to improve topological tunneling.

Adaptive Aggregation Based Domain Decomposition Multigrid for the Lattice Wilson Dirac Operator
Matthias Rottmann, Andreas Frommer, Karsten Kahl, Stefan Krieg, Björn Leder
Mon, 15:00, Seminar Room D -- Parallels 1D (Slides)

In this talk, we present a multigrid approach for the inversion of the lattice Wilson Dirac operator. It combines components that have already been used separately in lattice QCD, namely the domain decomposition method ``Schwarz Alternating Procedure'' as a smoother, also known from the ``Inexact Deflation'' method and the \(\gamma_5\)-preserving aggregation based interpolation, introduced by the Boston and Boulder group. We will point out the major differences to the existing hierarchical approaches and we will show numerical results from our MPI-C Code. Aspects from the recently published numbers in [arXiv:1303.1377] for the two-grid approach will be picked up but also new three-grid results will be shown.

HDCG: Hierarchically Deflated Conjugate Gradient algorithm for 5d Chiral Fermions
Peter Boyle
Mon, 15:20, Seminar Room D -- Parallels 1D (Slides)

I present an algorithm for 5d Chiral Fermions that, after a modest subspace generation phase, accelerates the inversion of the red-black preconditioned Hermitian operator (normal equations) for general Mobius fermions. The approach is an extension of the inexact deflation approach to this system of equations, but also bears some similarity to algebraic multigrid approaches. The little Dirac operator is expensive due to the next-to-next-to-next-to-nearest neighbour stencil. I find that preconditioned CG is remarkably stable against preconditioner noise and that flexible algorithms are not needed. By using the little Dirac operator solely as a preconditioner and adopting a three level deflation, the overhead is suppressed by two orders of magnitude compared to the inexact deflation approach. On small (\(16^3\)) volumes the approach is similar in performance to EigCG. On large (\(48^3\)) volumes and at the physical quark masses the approach is between 2x and 5x more effective than EigCG with much reduced setup cost, and 10x more effective than CGNR.

Optimization of the Oktay-Kronfeld Action Conjugate Gradient Inverter
Yong-Chull Jang, Jon Bailey, Carleton DeTar, Andreas Kronfeld, Weonjong Lee, Bugra Oktay
Mon, 15:40, Seminar Room D -- Parallels 1D (Slides)

Carrying out the Fermilab improvement program to third order in heavy-quark effective theory yields the Oktay-Kronfeld (OK) action, a promising candidate for precise calculations of the spectra of heavy quark systems and weak matrix elements of heavy-light systems. We have optimized the OK-action conjugate gradient inverter in the SciDAC QOP/QDP library and are developing a GPU code. The OK action is rewritten and the needed gauge-link combinations are precalculated. This procedure accelerates the conjugate gradient by more than a factor of five. The remaining floating-point operations are simple matrix multiplications between gauge links and fermion vectors, which we accelerate with CUDA. We present preliminary results for the spectra confirming expected decreases of heavy-quark discretization errors.

One flavor mass reweighting: foundations
Björn Leder, Jacob Finkenrath, Francesco Knechtli
Mon, 16:30, Seminar Room D -- Parallels 2D (Slides)

Reweighting is not a new method in lattice QCD, but a comprehensive analysis is missing in the literature. We close this gap by presenting: (i) a proof of an integral representation of the complex determinant of a complex matrix, (ii) a method to control the stochastic error of its Monte Carlo estimation, (iii) expansions of the stochastic error and the ensemble fluctuations of the one flavor reweighting factor. Based on (iii) we present a detailed scaling analysis and optimized reweighting strategies.
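
As an illustration of point (i) and of the stochastic estimation in (ii), the following is a minimal sketch, not the authors' code. It assumes the standard Gaussian identity 1/det(A) = ∫ ∏ d²η_i/π exp(-η†Aη), valid when the Hermitian part of A is positive definite, and estimates it by drawing η from exp(-η†η)/π^n and averaging exp(-η†(A-1)η). The variance of this estimator is finite only if A + A† - 1 is positive definite. The 2x2 matrix A and all function names are made up for the example.

```python
import cmath
import random


def quad_form(eta, m):
    """Return eta^dagger * m * eta for a complex vector and a square matrix."""
    n = len(eta)
    return sum(eta[i].conjugate() * m[i][j] * eta[j]
               for i in range(n) for j in range(n))


def inverse_det_estimate(a, n_samples, seed=0):
    """Monte Carlo estimate of 1/det(a) from complex Gaussian noise vectors."""
    rng = random.Random(seed)
    n = len(a)
    # a - 1: the part of the quadratic form not absorbed into the sampling weight.
    a_minus_one = [[a[i][j] - (1.0 if i == j else 0.0) for j in range(n)]
                   for i in range(n)]
    # Density exp(-|eta|^2)/pi means real and imaginary parts have variance 1/2.
    sigma = 0.5 ** 0.5
    total = 0j
    for _ in range(n_samples):
        eta = [complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
               for _ in range(n)]
        total += cmath.exp(-quad_form(eta, a_minus_one))
    return total / n_samples


# Hypothetical toy matrix: its Hermitian part is diag(1.2, 0.9), so both the
# integral and the variance of the estimator are finite.
A = [[1.2, 0.1 + 0.05j],
     [-0.1 + 0.05j, 0.9 + 0.1j]]
EXACT_INV_DET = 1.0 / (A[0][0] * A[1][1] - A[0][1] * A[1][0])

if __name__ == "__main__":
    estimate = inverse_det_estimate(A, 20000)
    print("stochastic:", estimate)
    print("exact:     ", EXACT_INV_DET)
```

In a lattice application the matrix would be a ratio of Dirac operators and each sample would cost a solve, which is why controlling the stochastic error matters; the toy matrix here only shows the mechanics.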

Towards Simulations of 1+1 Flavor QCD
Jacob Finkenrath, Francesco Knechtli, Björn Leder
Mon, 16:50, Seminar Room D -- Parallels 2D (Slides)

Today's simulations in lattice quantum chromodynamics get closer and closer to the physical point while simultaneously reducing the statistical errors. Small effects like isospin symmetry breaking are becoming significant. Incorporating such effects into the Boltzmann factor by reweighting introduces fluctuations which increase with the volume. Correlations between parts of the lattice actions can be utilized to reduce these fluctuations significantly. We combine and compare approaches consisting of reweighting and a modified sampling for generating configurations. Employing these methods we can estimate non-perturbatively the effect of the sea quarks on isospin splitting.

Exact Pseudofermion Action for Hybrid Monte Carlo Simulation of One-Flavor Domain-Wall Fermion
Yu-Chih Chen, Wen-Ping Chen, Ting-Wai Chiu, Han-Yi Chou, Tung-Han Hsieh
Mon, 17:10, Seminar Room D -- Parallels 2D (Slides)

We present a novel pseudofermion action for hybrid Monte Carlo simulation of one-flavor domain-wall fermions (DWF) in lattice QCD. This pseudofermion action is exact and requires no square root, unlike the widely used rational hybrid Monte Carlo algorithm (RHMC). We compare the performance of the one-flavor algorithm (OFA) with RHMC and find that OFA outperforms RHMC in both efficiency and memory consumption. Using our one-flavor and two-flavor algorithms, we perform HMC simulations of \( 2+1+1 \) flavor lattice QCD with optimal domain-wall fermions. We outline our recent simulations on the \(32^3 \times 64 \times 16 \) lattice, using multiple Nvidia GTX-TITAN GPUs.

Simulating the Random Surface representation of Abelian Gauge Theories
Tomasz Korzec, Ulli Wolff
Mon, 17:30, Seminar Room D -- Parallels 2D (Slides)

We present a Monte-Carlo algorithm for the simulation of the all-order strong coupling expansion of the Z2 gauge theory. This random surface ensemble is completely equivalent to the standard formulation, but allows one to measure some quantities, like Polyakov loop correlators or excess free energies, with an accuracy that could not have been easily achieved with traditional simulation methods. One interesting application of the algorithm is the comparison of the D=3 model with predictions from effective string theories, for which we refer to the following talk by Ulli Wolff.

Simulated random surfaces and effective string models in 3d Z(2) gauge theory
Ulli Wolff, Tomasz Korzec
Mon, 17:50, Seminar Room D -- Parallels 2D (Slides)

We apply an all-order strong coupling simulation algorithm presented in the previous talk by Tomasz Korzec to study the three dimensional Z(2) gauge theory. The Polyakov line correlation has constant and large signal to noise ratio for arbitrary separations at low temperature. Thus we can precisely estimate ground state energies of flux states which are related to the string tension and compare with effective string model predictions.

Applicability of Quasi-Monte Carlo for lattice systems
Andreas Ammon, Karl Jansen, Hernan Leovey, Andreas Griewank, Michael Müller-Preußker
Mon, 18:10, Seminar Room D -- Parallels 2D (Slides)

This project investigates the applicability of Quasi-Monte Carlo methods to Euclidean lattice systems in order to improve the asymptotic error behavior of observables for such theories. The error of an observable calculated by averaging over random observations generated from ordinary Monte Carlo simulations behaves like \(N^{-1/2}\), where \(N\) is the number of observations. By means of Quasi-Monte Carlo methods it is possible to improve this behavior for certain problems to \(N^{-1}\), or even further if the problems are regular enough. We adapted and applied this approach to simple systems like the quantum harmonic and anharmonic oscillator and verified an improved error scaling.
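
The difference between the two error scalings can be seen in a toy setting. The sketch below is an assumption-laden illustration, not the method of the talk: it integrates the hypothetical integrand x^2 over [0, 1] (exact value 1/3), once with pseudo-random points and once with the base-2 van der Corput sequence, a simple one-dimensional low-discrepancy sequence. All function names are invented for the example.

```python
import random


def van_der_corput(i, base=2):
    """Return the i-th point (i >= 0) of the van der Corput sequence in [0, 1)."""
    x, denom = 0.0, 1.0
    while i > 0:
        i, digit = divmod(i, base)
        denom *= base
        x += digit / denom
    return x


def integrand(x):
    """Toy integrand; its integral over [0, 1] is exactly 1/3."""
    return x * x


EXACT = 1.0 / 3.0


def mc_estimate(n, rng):
    """Ordinary Monte Carlo average over n pseudo-random points."""
    return sum(integrand(rng.random()) for _ in range(n)) / n


def qmc_estimate(n):
    """Quasi-Monte Carlo average over the first n van der Corput points."""
    return sum(integrand(van_der_corput(i)) for i in range(n)) / n


def rms_mc_error(n, repeats=50, seed=1234):
    """Root-mean-square Monte Carlo error over independent repetitions."""
    rng = random.Random(seed)
    squares = [(mc_estimate(n, rng) - EXACT) ** 2 for _ in range(repeats)]
    return (sum(squares) / repeats) ** 0.5


if __name__ == "__main__":
    print("N, RMS MC error, QMC error")
    for n in (2 ** 6, 2 ** 8, 2 ** 10, 2 ** 12):
        print(n, rms_mc_error(n), abs(qmc_estimate(n) - EXACT))
```

Quadrupling N should roughly halve the Monte Carlo error and quarter the quasi-Monte Carlo error in this example. Lattice path integrals are high-dimensional, so this one-dimensional case shows only the scaling behaviour and says nothing about how the approach carries over to them.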

2D and 3D Antiferromagnetic Ising Model with ``topological'' term at \(\theta=\pi\)
Gennaro Cortese, Vicente Azcoiti, Eduardo Follana, Matteo Giordano
Mon, 18:30, Seminar Room D -- Parallels 2D (Slides)

In this work we study the Antiferromagnetic Ising model with an imaginary magnetic field \(i\theta\) at \(\theta=\pi\) in two and three dimensions. For this purpose we develop a new algorithm, not affected by the sign problem, that allows us to perform numerical simulations.

QCL: OpenCL meta programming for lattice QCD
Massimo Di Pierro
Mon, 18:50, Seminar Room D -- Parallels 2D (Slides)

QCL is a Python application which generates and runs portable and efficient OpenCL code from a high-level description of the action. The action, both the gauge part and the fermionic part, is described in terms of relative paths on the lattice and its symmetry groups. The machine generated code runs on any platform supporting OpenCL including multiple CPUs and GPUs. The generated code is highly inlined to minimize function calls and designed to minimize the number of arithmetic operations. It supports arbitrary dimensions, actions, and gauge groups.

Lattice Simulations using OpenACC compilers
Pushan Majumdar
Tue, 16:20, Seminar Room G -- Parallels 4G (Slides)

OpenACC compilers allow one to use Graphics Processing Units without having to write explicit CUDA code. Programs can be modified incrementally using OpenMP-like directives, which cause the compiler to generate CUDA kernels to be run on the GPUs. In this presentation we will look at the performance gain in lattice simulations using OpenACC compilers for both pure gauge and dynamical fermion simulations.

Twisted-Mass Lattice QCD using OpenCL
Matthias Bach, Christopher Pinke, Owe Philipsen, Volker Lindenstruth
Tue, 16:40, Seminar Room G -- Parallels 4G (Slides)

Graphics Processing Units (GPUs) are by now an established tool for Lattice QCD applications. I present an update on our OpenCL based code for Lattice QCD with twisted-mass fermions. On current generation AMD GPUs we now reach 100 GFLOPS in double-precision Dslash and 70 GFLOPS in our double-precision inverter. For the hybrid Monte-Carlo (HMC) we improve energy-efficiency by a factor of four over a plain CPU system. We also found one 4-GPU node to provide about 12 times the throughput of a pure CPU system of comparable cost.

QDP-JIT: A QDP++ Implementation for CUDA-Enabled GPUs
Frank Winter
Tue, 17:00, Seminar Room G -- Parallels 4G (Slides)

QDP++ provides parallel data types and operations suitable for lattice gauge theory, similar to high-level domain-specific languages. Heterogeneity with massively multi-core accelerators is becoming ubiquitous and offers tremendous computational power. However, current parallel programming models like the CUDA architecture expose many low-level programming details to the user, opening a gap between high-level usability and low-level programmability. QDP-JIT leverages a novel approach to bridge this gap. While maintaining the full QDP++ API, high-performance compute kernels are generated and launched on-the-fly. Low-level GPU programming details are completely concealed from the user.

Adaptive Multigrid Algorithms on GPUs
M Clark, Michael Cheng, Richard Brower
Tue, 17:20, Seminar Room G -- Parallels 4G (Slides)

Graphics Processing Units (GPUs) are an increasingly popular platform upon which to deploy LQCD calculations. While there has been much progress to date in developing solver algorithms to improve strong scaling on such platforms, there has been less focus on deploying mathematically optimal algorithms. A good example is the class of hierarchical solver algorithms such as adaptive multigrid, which are known to solve the Dirac equation with optimal O(N) complexity. We describe progress to date in deploying adaptive multigrid solver algorithms to NVIDIA GPU architectures and discuss the suitability of heterogeneous architectures for hierarchical algorithms.

DWF Solvers and Clover for BGQ
Karthee Sivalingam, Peter Boyle
Poster Session

Solving QCD on the lattice usually involves hundreds of thousands of inversions in a serially dependent importance sampling of the QCD path integral. Inverter performance is therefore critical for overall simulation performance. Inverting this sparse matrix requires an iterative solver that repeatedly applies the operator. This work describes the porting and optimisation of a Clover inverter to the Blue Gene/Q architecture using the BAGEL compiler. Different iterative solvers for DWF are also discussed and compared.
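
To make "an iterative solver built on repeated operator application" concrete, here is a minimal matrix-free conjugate gradient sketch. It is an illustration under stated assumptions, not the BAGEL-generated code: the operator is a hypothetical one-dimensional periodic Laplacian plus mass term, chosen only because it is sparse, symmetric and positive definite; it is neither the Clover nor the DWF operator. All names are invented for the example.

```python
def apply_operator(x, mass_sq=0.1):
    """Toy sparse operator: (2 + m^2) x_i - x_{i+1} - x_{i-1}, periodic in i."""
    n = len(x)
    return [(2.0 + mass_sq) * x[i] - x[(i + 1) % n] - x[(i - 1) % n]
            for i in range(n)]


def dot(u, v):
    """Real inner product of two vectors stored as lists."""
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(apply_a, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A, given only x -> A x.

    Returns the solution and the number of operator applications used.
    """
    x = [0.0] * len(b)
    r = list(b)            # residual b - A x for the initial guess x = 0
    p = list(r)            # search direction
    rr = dot(r, r)
    b_norm_sq = dot(b, b)
    if b_norm_sq == 0.0:
        return x, 0
    for iteration in range(1, max_iter + 1):
        ap = apply_a(p)    # the only place the operator is touched
        alpha = rr / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rr_new = dot(r, r)
        if rr_new <= tol * tol * b_norm_sq:
            return x, iteration
        beta = rr_new / rr
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x, max_iter


if __name__ == "__main__":
    source = [1.0] + [0.0] * 31          # point source on a 32-site lattice
    solution, n_applications = conjugate_gradient(apply_operator, source)
    print("operator applications:", n_applications)
```

Each iteration costs one operator application plus a few vector operations, so the time per application dominates the solve.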

Performance of Kepler GTX Titan GPUs and Xeon Phi system
Hwancheol Jeong, Kwang-jong Choi, Joo Hwan Kim, Joungjin Lee, Weonjong Lee, Young Woo Lee, Jeonghwan Pak, Sang-Hyun Park, Jun-sik Yoo
Poster Session

NVIDIA's GTX Titan, a Kepler GPU, provides a high performance-to-price ratio for computing. Although it is a GeForce model, the GTX Titan delivers double precision floating point performance as high as that of the most recent Tesla K20X. It also offers high memory bandwidth as well as additional cache. Along with the hardware improvements, new CUDA programming technologies such as Direct Parallelism and GPU Direct communication have been introduced. We analyze the performance of the GTX Titan and these CUDA technologies. We also compare GTX Titan GPUs with the Xeon Phi coprocessor.

Getting Covariantly Smeared Sources into Better Shape
Georg von Hippel, Benjamin Jäger, Thomas Rae, Hartmut Wittig
Poster Session

The use of covariantly smeared sources in hadronic correlators is a common method of improving the projection onto the ground state. Studying the dependence of the shape of such sources on the gauge field background, we find that localized fluxes of magnetic field can strongly distort them, which results in a reduction of the smearing radii that can be reached by iterative smearing prescriptions, in particular as the continuum limit is approached. As a remedy, we propose a novel covariant smearing procedure ("free-form smearing") enabling the creation of arbitrarily shaped sources, including in particular Gaussians of arbitrary radius, as well as shapes with nodes, such as hydrogenic wavefunctions.

The openQCD code
Stefan Schaefer
Poster Session

OpenQCD is a code for QCD simulations with improved Wilson fermions. Its main features are the open boundary conditions in time, which solve the problem of topology freezing as the continuum limit is approached, a locally deflated SAP-GCR solver, which is very efficient for small quark masses, and twisted-mass reweighting, which stabilizes the simulations. Any number of quark flavors can be simulated, with single flavors implemented by the RHMC and Hasenbusch twisted-mass splitting for the degenerate flavors.

An implementation of hybrid parallel C++ code for the four-point correlation function of various baryon-baryon systems
Hidekatsu Nemura
Poster Session

We present our recent effort to develop a C++ code to calculate the four-point correlation functions of various baryon-baryon (BB) systems, which are the primary quantities for studying the nuclear force and hyperonic nuclear forces from lattice QCD. In recent years, 2+1 flavor lattice QCD calculations have been widely performed. Flavor symmetry breaking effects are then a central issue, so that many BB channels have to be calculated. This situation is in contrast to the study of flavor symmetric BB interactions, where each channel is classified into one of only six flavor irreducible representations. This work is also aimed at large volume lattice QCD calculations of the hyperonic nuclear forces performed closer to the physical quark mass. A hybrid parallel code is implemented using MPI and OpenMP together, and is ported to Bridge++, a recently developed C++ code set for lattice QCD calculations. The present code now works on BlueGene/Q and shows better performance in hybrid parallel execution than in flat MPI. In this contribution, we will discuss how the computational time is reduced for various BB channels by a diagrammatic classification.

The new "Gauge Connection" at NERSC
Massimo Di Pierro, James Simone, James Hetrick, Carleton DeTar, Shreyas Cholia
Poster Session

We present a new and improved version of the "Gauge Connection", the web interface to the repository of lattice ensembles hosted at NERSC. The goal of the new version is to host lattice QCD ensembles as well as to allow users to search in one place for both local (NERSC-hosted) ensembles and remote (ILDG-hosted) ensembles. The system creates a local database image of remote ensembles from information obtained via ILDG web services, then uses the metadata to create searchable tags. Ensembles are searchable by name, location, and tags. Each ensemble is also associated with a wiki page which can be edited by users to document the ensemble. The system monitors and logs user activity for statistical purposes. The local files are stored on the NERSC HPSS tape storage system and can be downloaded using a provided download script, which can also convert file formats. We are currently implementing a mechanism to download files via Globus Online, a web-based interface for scheduling transfers across GridFTP sites. This will be added to the site in the near future.

JLQCD IroIro++ lattice code on BG/Q
Guido Cossu, Shoji Hashimoto, Takashi Kaneko, Junichi Noaki, Peter Boyle, Hidenori Fukaya
Poster Session

We will present our experience with the multipurpose C++ code IroIro++, designed for JLQCD to run on the BG/Q installation at KEK. We will discuss details of the code design, manageability and performance improvements specific to the IBM Blue Gene/Q.
