Allow for compiler+accelerator specific MPI overrides #231
Example of the output:
# Set things up
ocaisa@~/EESSI/software-layer-scripts(additional_rpath_fallbacks)$ export EESSI_ACCELERATOR_TARGET_OVERRIDE=accel/nvidia/cc86
ocaisa@~/EESSI/software-layer-scripts(additional_rpath_fallbacks)$ module load EESSI/2025.06
Module for EESSI/2025.06 loaded successfully
{EESSI/2025.06} ocaisa@~/EESSI/software-layer-scripts(additional_rpath_fallbacks)$ echo $MODULEPATH
/cvmfs/software.eessi.io/host_injections/2025.06/software/linux/aarch64/neoverse_n1/accel/nvidia/cc80/modules/all:/cvmfs/software.eessi.io/host_injections/2025.06/software/linux/aarch64/neoverse_n1/modules/all:/cvmfs/software.eessi.io/versions/2025.06/software/linux/aarch64/neoverse_n1/accel/nvidia/cc80/modules/all:/cvmfs/software.eessi.io/versions/2025.06/software/linux/aarch64/neoverse_n1/modules/all:/cvmfs/software.eessi.io/init/modules
{EESSI/2025.06} ocaisa@~/EESSI/software-layer-scripts(additional_rpath_fallbacks)$ module load EESSI-extend
-- Using /tmp/$USER as a temporary working directory for installations, you can override this by setting the environment variable WORKING_DIR and reloading the module (e.g., /dev/shm is a common option)
Configuring for use of EESSI_USER_INSTALL under /home/ocaisa/eessi
-- To create installations for EESSI, you _must_ have write permissions to /home/ocaisa/eessi/versions/2025.06/software/linux/aarch64/neoverse_n1
-- You may wish to configure a sources directory for EasyBuild (for example, via setting the environment variable EASYBUILD_SOURCEPATH) to allow you to reuse existing sources for packages.
# Pretend to want to do a build
{EESSI/2025.06} ocaisa@~/EESSI/software-layer-scripts(additional_rpath_fallbacks)$ eb OSU-Micro-Benchmarks-7.5.1-gompi-2025b-CUDA-12.9.1.eb --stop prepare --rebuild --hooks=./eb_hooks.py
== Temporary log file in case of crash /tmp/eb-uflhewm6/easybuild-0ha7tv9j.log
== found valid index for /cvmfs/software.eessi.io/versions/2025.06/software/linux/aarch64/neoverse_n1/software/EasyBuild/5.3.0/easybuild/easyconfigs, so using it...
== Running parse hook for OSU-Micro-Benchmarks-7.5.1-gompi-2025b-CUDA-12.9.1.eb...
== found valid index for /cvmfs/software.eessi.io/versions/2025.06/software/linux/aarch64/neoverse_n1/software/EasyBuild/5.3.0/easybuild/easyconfigs, so using it...
== Running parse hook for gompi-2025b.eb...
...
== Running parse hook for lfbf-2025b.eb...
== processing EasyBuild easyconfig
/cvmfs/software.eessi.io/versions/2025.06/software/linux/aarch64/neoverse_n1/software/EasyBuild/5.3.0/easybuild/easyconfigs/o/OSU-Micro-Benchmarks/OSU-Micro-Benchmarks-7.5.1-gompi-2025b-CUDA-12.9.1.eb
== building and installing OSU-Micro-Benchmarks/7.5.1-gompi-2025b-CUDA-12.9.1...
>> installation prefix: /home/ocaisa/eessi/versions/2025.06/software/linux/aarch64/neoverse_n1/software/OSU-Micro-Benchmarks/7.5.1-gompi-2025b-CUDA-12.9.1
== fetching files and verifying checksums...
== Running pre-fetch hook...
>> sources:
>> /tmp/ocaisa/easybuild/sources/o/OSU-Micro-Benchmarks/osu-micro-benchmarks-7.5.1.tar.gz [SHA256: 160d0d5e3c3cb022520ecb247e9875bb0973b1d3cadccd6c17624f8407c52e22]
== ... (took < 1 sec)
== creating build dir, resetting environment...
>> build dir: /tmp/ocaisa/easybuild/build/OSUMicroBenchmarks/7.5.1/gompi-2025b-CUDA-12.9.1
== Running post-ready hook...
WARNING: Deprecated functionality, will no longer work in EasyBuild v6.0: Easyconfig parameter 'parallel' is deprecated, use 'max_parallel' or the parallel property instead.; see
https://docs.easybuild.io/deprecated-functionality/ for more information
== ... (took < 1 sec)
== unpacking...
>> running shell command:
tar xzf /tmp/ocaisa/easybuild/sources/o/OSU-Micro-Benchmarks/osu-micro-benchmarks-7.5.1.tar.gz
[started at: 2026-05-14 16:03:37]
[working dir: /tmp/ocaisa/easybuild/build/OSUMicroBenchmarks/7.5.1/gompi-2025b-CUDA-12.9.1]
[output and state saved to /tmp/eb-uflhewm6/run-shell-cmd-output/tar-gfx7xw93]
>> command completed: exit 0, ran in < 1s
== ... (took < 1 sec)
== patching...
== ... (took < 1 sec)
== preparing...
== Running pre-prepare hook...
== Updated rpath_override_dirs (to allow overriding MPI family OpenMPI):
/cvmfs/software.eessi.io/host_injections/2025.06/software/linux/aarch64/neoverse_n1/rpath_overrides/OpenMPI/system-CUDA-12.9.1/lib:/cvmfs/software.eessi.io/host_injections/2025.06/software/linux/aarch64/neoverse_n1/rpath_overrides/OpenMPI/system-CUDA-12.9.1/lib64:/cvmfs/software.eessi.io/host_injections/2025.06/software/linux/aarch64/neoverse_n1/rpath_overrides/OpenMPI/system/lib:/cvmfs/software.eessi.io/host_injections/2025.06/software/linux/aarch64/neoverse_n1/rpath_overrides/OpenMPI/system/lib64
>> loading toolchain module: gompi/2025b
== ... (took < 1 sec)
...

Increased the complexity a bit, but it might be necessary:

No feedback to date; I'm waiting on someone to review it.

@TopRichard can you review it?

Discussed in support meeting: @TopRichard said he already tested this for CUDA and that it works. We agreed he'll add a review here, including the steps he took to test it. I can then try to mimic that for ROCm and validate that it also works there.

I have tested this locally, integrating the changes introduced in the PR into

For clarity: this is currently blocked by #228 for the ROCm side of things.
    if dep[0] in top_level_accelerator_packages:
        # Store the dependency as a property for later potential use
        # (e.g., accelerator-specific MPI RPATH overrides)
        ec.eessi_gpu_dependency = dep
Just as a reminder: this will need to be done for the ROCm side of things after #228 gets merged as well (you'll need to merge main into this feature branch, resolve any potential conflicts because they both touch this same part of the code, then add some ec.eessi_gpu_dependency = ... to the ROCm side of things).
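The idea of recording the accelerator dependency for both the CUDA and the ROCm side can be sketched roughly as follows. This is only an illustration under stated assumptions: `TOP_LEVEL_ACCELERATOR_PACKAGES`, `record_gpu_dependency`, and the `SimpleNamespace` stand-in for the easyconfig object are hypothetical, and the real hook in `eb_hooks.py` may be structured quite differently.

```python
from types import SimpleNamespace

# Hypothetical list of accelerator package names; the actual hook defines
# its own top_level_accelerator_packages and the names may differ.
TOP_LEVEL_ACCELERATOR_PACKAGES = ("CUDA", "ROCm")


def record_gpu_dependency(ec, dependencies):
    """Store the first accelerator dependency on ec for later use.

    ec is any object that accepts new attributes (a stand-in for the
    easyconfig instance); dependencies is a list of tuples whose first
    element is the dependency name.
    """
    ec.eessi_gpu_dependency = None
    for dep in dependencies:
        if dep[0] in TOP_LEVEL_ACCELERATOR_PACKAGES:
            # Keep the whole tuple so name and version stay together
            ec.eessi_gpu_dependency = dep
            break
    return ec.eessi_gpu_dependency


if __name__ == "__main__":
    # Illustrative dependency list, not taken from a real easyconfig
    fake_ec = SimpleNamespace()
    print(record_gpu_dependency(fake_ec, [("SomeLib", "1.0"), ("CUDA", "12.9.1")]))
```

Handling both accelerators through one list keeps the CUDA and ROCm code paths identical, which is one way to avoid the two sides drifting apart after a merge.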
#228 is merged.
casparvl left a comment:
I'm getting:
== Updated rpath_override_dirs (to allow overriding MPI family OpenMPI):
/cvmfs/software.eessi.io/host_injections/2025.06/software/linux/x86_64/amd/zen2/rpath_overrides/OpenMPI/system-ROCm-ROCM-6.4.1/lib:/cvmfs/software.eessi.io/host_injections/2025.06/software/linux/x86_64/amd/zen2/rpath_overrides/OpenMPI/system-ROCm-ROCM-6.4.1/lib64:/cvmfs/software.eessi.io/host_injections/2025.06/software/linux/x86_64/amd/zen2/rpath_overrides/OpenMPI/system-ROCm/lib:/cvmfs/software.eessi.io/host_injections/2025.06/software/linux/x86_64/amd/zen2/rpath_overrides/OpenMPI/system-ROCm/lib64:/cvmfs/software.eessi.io/host_injections/2025.06/software/linux/x86_64/amd/zen2/rpath_overrides/OpenMPI/system/lib:/cvmfs/software.eessi.io/host_injections/2025.06/software/linux/x86_64/amd/zen2/rpath_overrides/OpenMPI/system/lib64
I have the feeling that the system-ROCm-ROCM-6.4.1 should really be system-GCC-ROCM-6.4.1.
Other than that, it seems that the most specific path (i.e. including the ROCm version) comes first, which is good.
For CUDA, I do see:
/cvmfs/software.eessi.io/host_injections/2025.06/software/linux/x86_64/amd/zen2/rpath_overrides/OpenMPI/system-GCC-CUDA-12.9.1/lib:/cvmfs/software.eessi.io/host_injections/2025.06/software/linux/x86_64/amd/zen2/rpath_overrides/OpenMPI/system-GCC-CUDA-12.9.1/lib64:/cvmfs/software.eessi.io/host_injections/2025.06/software/linux/x86_64/amd/zen2/rpath_overrides/OpenMPI/system-GCC/lib:/cvmfs/software.eessi.io/host_injections/2025.06/software/linux/x86_64/amd/zen2/rpath_overrides/OpenMPI/system-GCC/lib64:/cvmfs/software.eessi.io/host_injections/2025.06/software/linux/x86_64/amd/zen2/rpath_overrides/OpenMPI/system/lib:/cvmfs/software.eessi.io/host_injections/2025.06/software/linux/x86_64/amd/zen2/rpath_overrides/OpenMPI/system/lib64
Which has the system-GCC-CUDA-<cudaver> as expected.
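The ordering seen in the CUDA output (compiler plus accelerator first, then compiler only, then the plain `system` fallback, each with `lib` and `lib64`) can be reproduced with a small sketch. The function name `build_rpath_override_dirs` and its parameters are hypothetical; this is not the code from the PR, only a way to make the expected ordering concrete.

```python
import os


def build_rpath_override_dirs(base_dir, mpi_family, compiler=None, accelerator=None):
    """Return candidate override directories, most specific variant first.

    base_dir is the architecture-specific host_injections prefix;
    compiler and accelerator are optional name components such as
    "GCC" and "CUDA-12.9.1".
    """
    variants = []
    if compiler and accelerator:
        variants.append(f"system-{compiler}-{accelerator}")
    if compiler:
        variants.append(f"system-{compiler}")
    elif accelerator:
        # No compiler component known: fall back to accelerator only
        variants.append(f"system-{accelerator}")
    variants.append("system")

    dirs = []
    for variant in variants:
        for libdir in ("lib", "lib64"):
            dirs.append(
                os.path.join(base_dir, "rpath_overrides", mpi_family, variant, libdir)
            )
    return dirs


if __name__ == "__main__":
    base = "/cvmfs/software.eessi.io/host_injections/2025.06/software/linux/x86_64/amd/zen2"
    # Joined with ":" this mirrors the shape of the logged rpath_override_dirs value
    print(":".join(build_rpath_override_dirs(base, "OpenMPI", "GCC", "CUDA-12.9.1")))
```

Putting the most specific directory first matters because the first matching library on the search path wins, so a generic `system` override would otherwise shadow a CUDA- or ROCm-aware build.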
Ah, maybe this does make sense, because

I guess what I'm wondering about is... what should this string refer to? I guess But what does
Alternative to #230 where we focus only on the potential need for CUDA/ROCm variants.
This also opens the door to other types of variants (but the options here would be multiplicative so I haven't included that until we hit a need for it).
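To illustrate why further variant types would be multiplicative, here is a rough sketch that enumerates every combination of optional name components. It is not the scheme used in this PR (which only adds a short fallback chain); `all_variants` and the third component `"extra"` are hypothetical.

```python
from itertools import product


def all_variants(components):
    """List every directory name formed by keeping or dropping each component.

    components is an ordered list of optional name parts, for example
    ["GCC", "CUDA-12.9.1"]. Each part is either kept or dropped, so the
    number of names doubles with every additional component.
    """
    names = []
    for mask in product((True, False), repeat=len(components)):
        kept = [part for part, keep in zip(components, mask) if keep]
        names.append("-".join(["system"] + kept))
    return names


if __name__ == "__main__":
    print(len(all_variants(["GCC", "CUDA-12.9.1"])))           # two components
    print(len(all_variants(["GCC", "CUDA-12.9.1", "extra"])))  # three components
```

Each of these names would also need `lib` and `lib64` entries, so the search path grows quickly, which is a reasonable argument for deferring extra variant types until there is a concrete need.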