Clone the repository

git clone https://github.com/claudiopica/HiRep

Make sure that the build command nj (located in the Make/ folder of the repository) and ninja are in your PATH.
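
A minimal sketch for putting Make/ on the PATH in the current shell session and checking that both tools are found. It assumes the repository was cloned into ./HiRep (the default directory name created by git clone) and that the commands are run from the directory containing it; HIREP_DIR is a name introduced here for illustration.

```shell
# Assumption: HiRep was cloned into ./HiRep and we are in its parent directory.
HIREP_DIR="$PWD/HiRep"
export PATH="$HIREP_DIR/Make:$PATH"

# Report whether the two build tools can now be found.
for tool in nj ninja; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "found $tool at $(command -v "$tool")"
  else
    echo "warning: $tool not found in PATH" >&2
  fi
done
```

To make the change permanent, the export line can be added to your shell start-up file (for example ~/.bashrc), with the absolute path of your checkout instead of $PWD.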

Adjust compilation options

Edit the file Make/MkFlags to set the desired options. Alternatively, the option file can be generated with the Make/write_mkflags.pl tool. Run

write_mkflags.pl -h

for a list of available options. The most important ones include:

Number of colors (NG)

NG=3

Gauge group SU(NG) or SO(NG)

GAUGE_GROUP = GAUGE_SUN

#GAUGE_GROUP = GAUGE_SON

Representation of fermion fields

REPR = REPR_FUNDAMENTAL
#REPR = REPR_SYMMETRIC
#REPR = REPR_ANTISYMMETRIC
#REPR = REPR_ADJOINT
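
As an illustration of how these three settings combine, here is a sketch of the corresponding MkFlags lines for an SU(2) gauge theory with fermions in the adjoint representation. The choice of theory is only an example; any combination of the values listed above is written the same way.

```
# Example choice: SU(2) gauge group with adjoint fermions
NG=2
GAUGE_GROUP = GAUGE_SUN
REPR = REPR_ADJOINT
```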

Lattice boundary conditions

Uncomment (or add) the line for the boundary condition you want in each direction.

Available options, where <DIR> is one of T, X, Y, Z, are

BC_<DIR>_PERIODIC, for periodic boundary conditions
BC_<DIR>_ANTIPERIODIC, for antiperiodic boundary conditions
BC_<DIR>_THETA, which associates a twisting angle with the fermionic field in the direction <DIR>. The concrete angle has to be specified in the input file.
BC_<DIR>_OPEN, for open boundary conditions. Open boundary conditions can only be set in the T direction.

Example with antiperiodic boundary conditions in the time direction and periodic boundary conditions in the spatial directions:

MACRO += BC_T_ANTIPERIODIC
MACRO += BC_X_PERIODIC
MACRO += BC_Y_PERIODIC
MACRO += BC_Z_PERIODIC
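
Similarly, a sketch for open boundary conditions, which are only available in the time direction. Keeping the spatial directions periodic is an assumption made for this example; choose whatever your setup requires.

```
# Open boundary conditions (only allowed in the T direction)
MACRO += BC_T_OPEN
# Periodic boundary conditions in the spatial directions (example choice)
MACRO += BC_X_PERIODIC
MACRO += BC_Y_PERIODIC
MACRO += BC_Z_PERIODIC
```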

Parallelization

You can select a number of features via the MACRO variable. The most important ones are:

To compile with MPI, add

MACRO += WITH_MPI

To compile with acceleration for NVIDIA (CUDA) GPUs, enable GPU support and the new geometry. If you enable GPUs but forget to set the new geometry, the compilation will fail.

MACRO += WITH_GPU

MACRO += WITH_NEW_GEOMETRY

To compile for AMD GPUs, additionally add the HIP flag:

MACRO += WITH_GPU
MACRO += WITH_NEW_GEOMETRY
MACRO += HIP
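
Combining the MPI and GPU flags gives, for example, the MACRO lines for a build running on AMD GPUs spread over several nodes. This is a sketch assuming that communication between GPUs goes through MPI, as in the multi-GPU setups discussed below.

```
# Multi-node, multi-GPU build for AMD GPUs
MACRO += WITH_MPI
MACRO += WITH_GPU
MACRO += WITH_NEW_GEOMETRY
MACRO += HIP
```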

Other standard options

MACRO += UPDATE_EO

enables even-odd preconditioning; there is practically never a reason to disable it.

MACRO += NDEBUG

suppresses debug output. If you remove this option, HiRep prints a large amount of additional output that is usually unnecessary.

MACRO += CHECK_SPINOR_MATCHING

This performs a check on the geometries of the spinors and is essential for debugging. In general, leaving it as a safety check does not hurt, but if you simulate with very small local lattices, you may want to disable it and check whether there is a performance improvement.

MACRO += IO_FLUSH

flushes output to file immediately instead of buffering it.
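
Collecting the standard options above, a reasonable block for production runs could look like this. Keeping CHECK_SPINOR_MATCHING is a judgment call that depends on the size of your local lattices, as explained above.

```
# Standard options for production runs
MACRO += UPDATE_EO
MACRO += NDEBUG
MACRO += CHECK_SPINOR_MATCHING
MACRO += IO_FLUSH
```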

Advanced compilation settings

Blocking and non-blocking communications

The setting

MACRO += COMMS_NONBLOCKING

switches from sequential blocking communications to non-blocking calls that return immediately. For multi-GPU jobs on unusual node and network topologies, blocking communications can perform better, because they prevent too many outstanding requests from piling up. For large jobs on a supercomputer, however, non-blocking communications are substantially faster. Blocking communications are the default; add this flag when tuning performance for your system.

New and old geometry

While the new geometry is the only supported option for GPUs, the old geometry minimizes the number of copies needed for send buffer synchronization, so it may be the better choice when compiling for CPUs. However, this depends on the system, so it is worth testing which one performs better in your production setting.

The old geometry is the default. To use the new geometry, compile with

MACRO += WITH_NEW_GEOMETRY
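
One way to run such a test on CPUs is to build two variants that differ only in the geometry line. The sketch below assumes an MPI build with non-blocking communications; keep all other options at the values you use in production.

```
# CPU-only MPI build for benchmarking the two geometries
MACRO += WITH_MPI
MACRO += COMMS_NONBLOCKING
# Variant with the new geometry; comment out this line for the default old geometry
MACRO += WITH_NEW_GEOMETRY
```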

Large-N simulations

For GPU setups, there is an alternative kernel implementation that scales better for large gauge groups. When simulating SU(NG) with NG > 5 on NVIDIA GPUs, try enabling

MACRO += LARGE_N

This option is also useful for all gauge groups when using AMD GPUs because the kernel is optimized to minimize register pressure.
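
For instance, a sketch of the relevant MkFlags lines for SU(6) on NVIDIA GPUs with the large-N kernels; NG=6 is only an example value above the NG > 5 threshold mentioned above.

```
# Example choice: SU(6) on NVIDIA GPUs
NG=6
GAUGE_GROUP = GAUGE_SUN
MACRO += WITH_GPU
MACRO += WITH_NEW_GEOMETRY
MACRO += LARGE_N
```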

Hardware locality

For GPU setups, you can use hwloc to make sure that the CPUs used to manage the GPUs on a node are located in the same NUMA domain. To enable this, compile with

MACRO += HWLOC

and add -lhwloc to LDFLAGS so that the hwloc library is linked dynamically.
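
A sketch of the two corresponding MkFlags entries. The -L path is a placeholder for a hwloc installation outside the default library search path; drop it if hwloc is installed system-wide.

```
MACRO += HWLOC
# /path/to/hwloc/lib is a placeholder; only -lhwloc is needed for a system-wide install
LDFLAGS = -L/path/to/hwloc/lib -lhwloc
```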

Compiler options

To compile the code for your laptop, you only need to set the C compiler, for example:

CC = gcc
CFLAGS = -Wall -O3
INCLUDE = 
LDFLAGS =

To compile with MPI support, you additionally need to set the MPI compiler wrapper:

CC = gcc
MPICC = mpicc
CFLAGS = -Wall -O3
GPUFLAGS =
INCLUDE =
LDFLAGS =

Another example: to use the Intel compiler and Intel's MPI implementation without CUDA, one could use:

CC = icc
MPICC = mpiicc
CFLAGS = -O3
LDFLAGS =
INCLUDE =

With GPUs, you can set your choice of C, C++, MPI, and CUDA compilers and their options using the variables:

CC = gcc
MPICC = mpicc
NVCC = nvcc
CXX = g++
CFLAGS = -Wall -O3
LDFLAGS =
GPUFLAGS = -arch=sm_80 
INCLUDE = 
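
Combining the pieces of this guide, a complete Make/MkFlags for an MPI build on NVIDIA GPUs might look as follows. This is a sketch: the gauge group, representation, and boundary conditions are example choices, and sm_80 targets GPUs of compute capability 8.0 (for example the A100), so other architectures need a different value (for example sm_90 for the H100).

```
# Gauge theory (example choice): SU(3) with fundamental fermions
NG=3
GAUGE_GROUP = GAUGE_SUN
REPR = REPR_FUNDAMENTAL

# Boundary conditions
MACRO += BC_T_ANTIPERIODIC
MACRO += BC_X_PERIODIC
MACRO += BC_Y_PERIODIC
MACRO += BC_Z_PERIODIC

# Parallelization: MPI + NVIDIA GPUs (GPUs require the new geometry)
MACRO += WITH_MPI
MACRO += WITH_GPU
MACRO += WITH_NEW_GEOMETRY

# Standard options
MACRO += UPDATE_EO
MACRO += NDEBUG
MACRO += CHECK_SPINOR_MATCHING

# Compilers and flags
CC = gcc
MPICC = mpicc
NVCC = nvcc
CXX = g++
CFLAGS = -Wall -O3
GPUFLAGS = -arch=sm_80
INCLUDE =
LDFLAGS =
```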

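For the AMD GPUs on LUMI, it appears favorable to use hipcc. Here it is selected through the MPICH_CC environment variable, so that the mpicc wrapper used as the GPU compiler calls hipcc: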

ENV = MPICH_CC=hipcc
CC = gcc
MPICC = cc
CFLAGS = -Wall -O3
NVCC = mpicc
GPUFLAGS = -w --offload-arch=gfx90a 
INCLUDE =
LDFLAGS = --offload-arch=gfx90a

For more information on configuring the code for AMD GPUs, see the user guide on the GitHub pages.

Compile the code

From the root folder just type:

nj

(nj is a tool in the Make/ folder: make sure it is in your PATH!) This compiles the libhr.a library and all executables available in the HiRep distribution, including hmc for simulations with dynamical fermions, suN for pure gauge simulations, and all applicable tests. To compile only one of the executables, e.g., suN, change to the corresponding directory, e.g., PureGauge, and run nj from there.
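
A minimal sketch of building only suN. It assumes the checkout lives in ./HiRep (override with the illustrative variable HIREP_DIR) and that Make/ is already on the PATH; it only prints a message if either assumption does not hold.

```shell
#!/bin/sh
# Build only the pure gauge executable suN instead of the full distribution.
# Assumptions: the repository lives in ./HiRep (override with HIREP_DIR)
# and the Make/ folder containing nj is already on PATH.
HIREP_DIR="${HIREP_DIR:-$PWD/HiRep}"

if [ -d "$HIREP_DIR/PureGauge" ] && command -v nj >/dev/null 2>&1; then
  # Run nj from the PureGauge directory; the subshell keeps the
  # caller's working directory unchanged.
  (cd "$HIREP_DIR/PureGauge" && nj)
else
  echo "HiRep checkout or nj not found under $HIREP_DIR" >&2
fi
```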

All build artefacts, except the final executables, are located in the build folder at the root directory of the distribution.