Arm Performance Reports

Arm Performance Reports is a low-overhead tool that gives a high-level overview of your application's performance, including computation, communication, and I/O. It produces a one-page summary of the run in text or HTML format.

To use this tool, prefix your execution command with perf-report. You do not need to recompile your code with any additional flags to generate a performance report. Instead, compile with the flags that you believe give the best performance for your code.

Use one of the following commands, depending on the type of job.

# For a single-core job.
$ perf-report <executable>

# For an MPI job.
$ perf-report mpirun -n 4 <executable>
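If the same launch line is shared between machines where the Arm Forge module may or may not be loaded, the prefix can be made conditional. The following is a minimal sketch, not part of the tool itself: run_with_report is a hypothetical helper name, and it only assumes that perf-report is found on PATH once the module is loaded.

```shell
#!/bin/bash
# Hypothetical helper (not part of Arm Forge): prefix the given command with
# perf-report when it is available on PATH, otherwise run the command unchanged.
run_with_report() {
  if command -v perf-report >/dev/null 2>&1; then
    perf-report "$@"
  else
    # Warn on stderr so the program's own stdout is left untouched.
    echo "perf-report not found; running without a performance report" >&2
    "$@"
  fi
}
```

It would be called in the same way as the commands above, for example run_with_report mpirun -n 4 <executable>.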

You can also submit a job to the scheduler so that the performance report is generated automatically. The following example script (for a hybrid MPI + OpenMP job) generates the performance report for your code.

#!/bin/bash --login

#SBATCH -J Your_Job_Name
#SBATCH -o Your_Output_File_Name.o%j
#SBATCH -e Your_Error_File_Name.e%j
#SBATCH -p Partition_on_which_to_run_code
##SBATCH -A Account
#SBATCH --nodes=8
#SBATCH --ntasks=16
#SBATCH --ntasks-per-node=2
#SBATCH --exclusive
#SBATCH -t HH:MM:SS

export nodecnt=$SLURM_JOB_NUM_NODES
export corecnt=$(( SLURM_CPUS_ON_NODE * nodecnt ))
export mpicnt=$SLURM_NTASKS
export threadspermpi=$(( SLURM_CPUS_ON_NODE / SLURM_NTASKS_PER_NODE ))
export threadcnt=$(( mpicnt * threadspermpi ))
export OMP_NUM_THREADS=$threadspermpi
export OMP_PLACES=cores

if [ $threadcnt -ne $corecnt ]
then
  echo "Error, mismatch between requested and available hardware!"
  exit 1
fi

#Dial3
module purge
module load intel-parallel-studio/cluster.2019.5
module load arm/forge/21.0.2

# Prevent Arm Forge from exiting if it cannot find a license within the specified time limit.
export ALLINEA_NO_TIMEOUT=1

export EXE_DIR=Your_executable_directory_path

perf-report --processes=$SLURM_NTASKS --procs-per-node=$SLURM_NTASKS_PER_NODE --mpi=intel-mpi "$EXE_DIR/Your_executable"

This will generate an .html file and a .txt file that show details about your code, such as the total run time, the time spent in MPI, and whether the code is compute bound or I/O bound. You can open the .html file in any browser. A sample image of a performance report is shown below for your reference.
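The hardware check in the script can be tried out without submitting a job. The sketch below repeats the same arithmetic with explicit numbers; check_layout is a hypothetical helper, the figure of 128 cores per node is an assumption made for illustration only, and the MPI rank count is derived from ranks per node rather than read from SLURM_NTASKS.

```shell
#!/bin/bash
# Hypothetical helper mirroring the arithmetic of the job script.
# Arguments: number of nodes, cores per node, MPI ranks per node.
check_layout() {
  local nodes=$1 cpus_on_node=$2 ranks_per_node=$3
  local corecnt=$(( cpus_on_node * nodes ))
  local mpicnt=$(( ranks_per_node * nodes ))
  local threadspermpi=$(( cpus_on_node / ranks_per_node ))  # integer division
  local threadcnt=$(( mpicnt * threadspermpi ))
  if [ "$threadcnt" -ne "$corecnt" ]; then
    echo "mismatch: $threadcnt threads for $corecnt cores"
    return 1
  fi
  echo "ok: $mpicnt ranks x $threadspermpi threads = $corecnt cores"
}

# 8 nodes, 2 ranks per node, and an assumed 128 cores per node.
check_layout 8 128 2   # prints "ok: 16 ranks x 64 threads = 1024 cores"
```

With these assumed numbers, 3 ranks per node would be rejected: 128 is not divisible by 3, so the integer division leaves cores unused and the thread count (1008) no longer matches the core count (1024).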