GPU GEOS : Mismatched template definition usage in coreComponents/physicsSolvers #3496

Open
drmichaeltcvx opened this issue Dec 17, 2024 · 6 comments
Labels
type: bug Something isn't working type: new A new issue has been created and requires attention

Comments

@drmichaeltcvx

Describe the bug
Building GEOS (latest "develop" branch, with TPLs from "master") with CUDA runs into conflicting definitions for certain templates.

To Reproduce
Steps to reproduce the behavior:

  1. Build the TPLs and then GEOS with GCC 13.2.0 and CUDA 11.8, 12.5, or 12.6.
  2. See the device-link error below.

Platform:

  • Machine: AMD node; CPU: AMD EPYC 7V12; GPU: NVIDIA A100-SXM4-80GB
  • Compiler: host GCC 13.2.0; device CUDA 11.8, 12.5, or 12.6
  • GEOS version: v1.1.0, commit 93f0252

Additional context
Output from make all:

[ 86%] Building CUDA object coreComponents/physicsSolvers/CMakeFiles/physicsSolvers.dir/wavePropagation/sem/acoustic/secondOrderEqn/anisotropic/AcousticVTIWaveEquationSEM.cpp.o
cd /dev/shm/mtml/src/GEOS/GEOS/build-GPU-Hypre-GCC-CUDA_12.6-ompi_hpcx-OMP-relwithdebinfo/coreComponents/physicsSolvers && /vend/nvidia/cuda/v12.6/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/chv/az_ussc_p/x86_64-rhel3/util/hpcx/hpcx-v2.20-gcc-mlnx_ofed-redhat8-cuda12-x86_64/ompi/bin/mpic++ -DCAMP_HAVE_CUDA -DDIY_NO_THREADS -DFMT_HEADER_ONLY=1 -DH5_BUILT_AS_DYNAMIC_LIB -DMPICH_SKIP_MPICXX -DMPI_NO_CPPBIND -DOMPI_SKIP_MPICXX -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE -D_LARGEFILE64_SOURCE -D_LARGEFILE_SOURCE -D_MPICC_H -D_POSIX_C_SOURCE=200809L -Dkiss_fft_scalar=double -DphysicsSolvers_EXPORTS --options-file CMakeFiles/physicsSolvers.dir/includes_CUDA.rsp -restrict -arch sm_80 --expt-extended-lambda --expt-relaxed-constexpr -Werror cross-execution-space-call,reorder,deprecated-declarations   -g -lineinfo    -restrict -arch sm_80 --expt-extended-lambda --expt-relaxed-constexpr -Werror cross-execution-space-call,reorder,deprecated-declarations  -ftz=true -O3 -DNDEBUG -Xcompiler -DNDEBUG -Xcompiler -Ofast -ftz=true   -std=c++17 "--generate-code=arch=compute_80,code=[compute_80,sm_80]" -Xcompiler=-fPIC -Xcompiler=-fopenmp -Xcompiler=-pthread -Xcompiler -pthread -MD -MT coreComponents/physicsSolvers/CMakeFiles/physicsSolvers.dir/wavePropagation/sem/acoustic/secondOrderEqn/anisotropic/AcousticVTIWaveEquationSEM.cpp.o -MF CMakeFiles/physicsSolvers.dir/wavePropagation/sem/acoustic/secondOrderEqn/anisotropic/AcousticVTIWaveEquationSEM.cpp.o.d -x cu -rdc=true -c /dev/shm/mtml/src/GEOS/GEOS/src/coreComponents/physicsSolvers/wavePropagation/sem/acoustic/secondOrderEqn/anisotropic/AcousticVTIWaveEquationSEM.cpp -o CMakeFiles/physicsSolvers.dir/wavePropagation/sem/acoustic/secondOrderEqn/anisotropic/AcousticVTIWaveEquationSEM.cpp.o
[ 86%] Building CUDA object coreComponents/physicsSolvers/CMakeFiles/physicsSolvers.dir/wavePropagation/sem/acoustoelastic/secondOrderEqn/isotropic/AcousticElasticWaveEquationSEM.cpp.o
cd /dev/shm/mtml/src/GEOS/GEOS/build-GPU-Hypre-GCC-CUDA_12.6-ompi_hpcx-OMP-relwithdebinfo/coreComponents/physicsSolvers && /vend/nvidia/cuda/v12.6/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/chv/az_ussc_p/x86_64-rhel3/util/hpcx/hpcx-v2.20-gcc-mlnx_ofed-redhat8-cuda12-x86_64/ompi/bin/mpic++ -DCAMP_HAVE_CUDA -DDIY_NO_THREADS -DFMT_HEADER_ONLY=1 -DH5_BUILT_AS_DYNAMIC_LIB -DMPICH_SKIP_MPICXX -DMPI_NO_CPPBIND -DOMPI_SKIP_MPICXX -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE -D_LARGEFILE64_SOURCE -D_LARGEFILE_SOURCE -D_MPICC_H -D_POSIX_C_SOURCE=200809L -Dkiss_fft_scalar=double -DphysicsSolvers_EXPORTS --options-file CMakeFiles/physicsSolvers.dir/includes_CUDA.rsp -restrict -arch sm_80 --expt-extended-lambda --expt-relaxed-constexpr -Werror cross-execution-space-call,reorder,deprecated-declarations   -g -lineinfo    -restrict -arch sm_80 --expt-extended-lambda --expt-relaxed-constexpr -Werror cross-execution-space-call,reorder,deprecated-declarations  -ftz=true -O3 -DNDEBUG -Xcompiler -DNDEBUG -Xcompiler -Ofast -ftz=true   -std=c++17 "--generate-code=arch=compute_80,code=[compute_80,sm_80]" -Xcompiler=-fPIC -Xcompiler=-fopenmp -Xcompiler=-pthread -Xcompiler -pthread -MD -MT coreComponents/physicsSolvers/CMakeFiles/physicsSolvers.dir/wavePropagation/sem/acoustoelastic/secondOrderEqn/isotropic/AcousticElasticWaveEquationSEM.cpp.o -MF CMakeFiles/physicsSolvers.dir/wavePropagation/sem/acoustoelastic/secondOrderEqn/isotropic/AcousticElasticWaveEquationSEM.cpp.o.d -x cu -rdc=true -c /dev/shm/mtml/src/GEOS/GEOS/src/coreComponents/physicsSolvers/wavePropagation/sem/acoustoelastic/secondOrderEqn/isotropic/AcousticElasticWaveEquationSEM.cpp -o CMakeFiles/physicsSolvers.dir/wavePropagation/sem/acoustoelastic/secondOrderEqn/isotropic/AcousticElasticWaveEquationSEM.cpp.o

[ 86%] Linking CUDA device code CMakeFiles/physicsSolvers.dir/cmake_device_link.o
cd /dev/shm/mtml/src/GEOS/GEOS/build-GPU-Hypre-GCC-CUDA_12.6-ompi_hpcx-OMP-relwithdebinfo/coreComponents/physicsSolvers && /data/saet/mtml/software/x86_64/cmake-3.28.3-linux-x86_64/bin/cmake -E cmake_link_script CMakeFiles/physicsSolvers.dir/dlink.txt --verbose=1
/vend/nvidia/cuda/v12.6/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/chv/az_ussc_p/x86_64-rhel3/util/hpcx/hpcx-v2.20-gcc-mlnx_ofed-redhat8-cuda12-x86_64/ompi/bin/mpic++   -restrict -arch sm_80 --expt-extended-lambda --expt-relaxed-constexpr -Werror cross-execution-space-call,reorder,deprecated-declarations   -g -lineinfo    -restrict -arch sm_80 --expt-extended-lambda --expt-relaxed-constexpr -Werror cross-execution-space-call,reorder,deprecated-declarations  -ftz=true -O3 -DNDEBUG -Xcompiler -DNDEBUG -Xcompiler -Ofast -ftz=true   "--generate-code=arch=compute_80,code=[compute_80,sm_80]" -Xcompiler=-fopenmp -Xcompiler=-L/vend/nvidia/cuda/v12.6/lib64 -Xlinker=-rpath -Xlinker=/chv/az_ussc_p/x86_64-rhel3/util/hpcx/hpcx-v2.20-gcc-mlnx_ofed-redhat8-cuda12-x86_64/ompi/lib -Xlinker=--enable-new-dtags -Xcompiler=-pthread -Xcompiler=-fPIC -Wno-deprecated-gpu-targets -shared -dlink --options-file CMakeFiles/physicsSolvers.dir/deviceObjects1.rsp --options-file CMakeFiles/physicsSolvers.dir/deviceObjects2.rsp -o CMakeFiles/physicsSolvers.dir/cmake_device_link.o --options-file CMakeFiles/physicsSolvers.dir/deviceLinkLibs.rsp
nvlink error   : Size doesn't match for '_ZN4geos13finiteElement17FiniteElementBaseC1ERKS1_$824' in 'CMakeFiles/physicsSolvers.dir/__/__/generatedSrc/coreComponents/physicsSolvers/multiphysics/poromechanicsKernels/PoromechanicsKernels_CellElementSubRegion_PorousSolid-DelftEgg-_H1_Hexahedron_Lagrange1_GaussLegendre2.cpp.o', first specified in 'CMakeFiles/physicsSolvers.dir/__/__/generatedSrc/coreComponents/physicsSolvers/multiphysics/poromechanicsKernels/PoromechanicsKernels_CellElementSubRegion_PorousSolid-DruckerPragerExtended-_H1_Hexahedron_Lagrange1_GaussLegendre2.cpp.o' (target: sm_80)
nvlink fatal   : merge_elf failed (target: sm_80)
make[2]: *** [coreComponents/physicsSolvers/CMakeFiles/physicsSolvers.dir/build.make:13103: coreComponents/physicsSolvers/CMakeFiles/physicsSolvers.dir/cmake_device_link.o] Error 1
make[2]: Target 'coreComponents/physicsSolvers/CMakeFiles/physicsSolvers.dir/build' not remade because of errors.
make[2]: Leaving directory '/dev/shm/mtml/src/GEOS/GEOS/build-GPU-Hypre-GCC-CUDA_12.6-ompi_hpcx-OMP-relwithdebinfo'
make[1]: *** [CMakeFiles/Makefile2:8964: coreComponents/physicsSolvers/CMakeFiles/physicsSolvers.dir/all] Error 2
make[1]: Target 'all' not remade because of errors.
make[1]: Leaving directory '/dev/shm/mtml/src/GEOS/GEOS/build-GPU-Hypre-GCC-CUDA_12.6-ompi_hpcx-OMP-relwithdebinfo'
make: *** [Makefile:146: all] Error 2
[Make-build-2024-12-17-103544.log](https://github.com/user-attachments/files/18171272/Make-build-2024-12-17-103544.log)
[PoromechanicsKernels_CellElementSubRegion_PorousSolid-DruckerPragerExtended-_H1_Hexahedron_Lagrange1_GaussLegendre2.cpp.txt](https://github.com/user-attachments/files/18171302/PoromechanicsKernels_CellElementSubRegion_PorousSolid-DruckerPragerExtended-_H1_Hexahedron_Lagrange1_GaussLegendre2.cpp.txt)
[ThermoPoromechanicsKernels_CellElementSubRegion_PorousSolid-DelftEgg-_H1_Hexahedron_Lagrange1_GaussLegendre2.cpp.txt](https://github.com/user-attachments/files/18171303/ThermoPoromechanicsKernels_CellElementSubRegion_PorousSolid-DelftEgg-_H1_Hexahedron_Lagrange1_GaussLegendre2.cpp.txt)
@drmichaeltcvx drmichaeltcvx added type: bug Something isn't working type: new A new issue has been created and requires attention labels Dec 17, 2024
@drmichaeltcvx
Author

When we force-set CALC_FEM_SHAPE_IN_KERNEL, we get the following messages during compilation:

[ 42%] Building CUDA object coreComponents/finiteElement/unitTests/CMakeFiles/testFiniteElementBase.dir/testFiniteElementBase.cpp.o
cd /data/saet/mtml/src/GEOS_miket/GEOS/GEOS/build-GPU-OPTO3-Hypre-GCC-CUDA-MPI-OMP-relwithdebinfo/coreComponents/finiteElement/unitTests && /vend/nvidia/cuda/v12.6/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/chv/az_ussc_p/x86_64-rhel3/util/hpcx/hpcx-v2.21-gcc-doca_ofed-redhat8-cuda12-x86_64/ompi/bin/mpic++ -DCAMP_HAVE_CUDA -DFMT_HEADER_ONLY=1 -DGTEST_HAS_DEATH_TEST=1 -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE -D_LARGEFILE64_SOURCE -D_LARGEFILE_SOURCE -D_POSIX_C_SOURCE=200809L -DtestFiniteElementBase_EXPORTS --options-file CMakeFiles/testFiniteElementBase.dir/includes_CUDA.rsp -restrict -arch sm_80 --expt-extended-lambda --expt-relaxed-constexpr -Werror cross-execution-space-call,reorder,deprecated-declarations -DGEOS_USE_DEVICE   -g -lineinfo  -O3 -ftz=true -Xcompiler -O3 -Xcompiler -fno-fast-math -Xcompiler -mdaz-ftz -DNDEBUG   -std=c++17 "--generate-code=arch=compute_80,code=[compute_80,sm_80]" -Xcompiler=-fPIE -Xcompiler=-fopenmp -Xcompiler=-pthread -MD -MT coreComponents/finiteElement/unitTests/CMakeFiles/testFiniteElementBase.dir/testFiniteElementBase.cpp.o -MF CMakeFiles/testFiniteElementBase.dir/testFiniteElementBase.cpp.o.d -x cu -rdc=true -c /data/saet/mtml/src/GEOS_miket/GEOS/GEOS/src/coreComponents/finiteElement/unitTests/testFiniteElementBase.cpp -o CMakeFiles/testFiniteElementBase.dir/testFiniteElementBase.cpp.o
In file included from /data/saet/mtml/src/GEOS_miket/GEOS/GEOS/src/coreComponents/common/DataTypes.hpp:27,
                 from /data/saet/mtml/src/GEOS_miket/GEOS/GEOS/src/coreComponents/finiteElement/elementFormulations/FiniteElementBase.hpp:29,
                 from /data/saet/mtml/src/GEOS_miket/GEOS/GEOS/src/coreComponents/finiteElement/unitTests/testFiniteElementBase.cpp:17:
/data/saet/mtml/src/GEOS_miket/GEOS/GEOS/build-GPU-OPTO3-Hypre-GCC-CUDA-MPI-OMP-relwithdebinfo/include/common/GeosxConfig.hpp:199: warning: "GEOS_USE_DEVICE" redefined
  199 | #define GEOS_USE_DEVICE
      | 
<command-line>: note: this is the location of the previous definition
In file included from /data/saet/mtml/src/GEOS_miket/GEOS/GEOS/src/coreComponents/common/DataTypes.hpp:27,
                 from /data/saet/mtml/src/GEOS_miket/GEOS/GEOS/src/coreComponents/finiteElement/elementFormulations/FiniteElementBase.hpp:29,
                 from /data/saet/mtml/src/GEOS_miket/GEOS/GEOS/src/coreComponents/finiteElement/unitTests/testFiniteElementBase.cpp:17:
/data/saet/mtml/src/GEOS_miket/GEOS/GEOS/build-GPU-OPTO3-Hypre-GCC-CUDA-MPI-OMP-relwithdebinfo/include/common/GeosxConfig.hpp:199: warning: "GEOS_USE_DEVICE" redefined
  199 | #define GEOS_USE_DEVICE
      | 
<command-line>: note: this is the location of the previous definition
/data/saet/mtml/src/GEOS_miket/GEOS/GEOS/src/coreComponents/finiteElement/unitTests/testFiniteElementBase.cpp(116): error: "parallelDevicePolicy" is ambiguous
    forAll< parallelDevicePolicy<> >( 1, [ feBase, gradNDimsView, detJDimsView ]( int const i )
            ^

/data/saet/mtml/src/GEOS_miket/GEOS/GEOS/src/coreComponents/finiteElement/unitTests/testFiniteElementBase.cpp(116): error: expected an expression
    forAll< parallelDevicePolicy<> >( 1, [ feBase, gradNDimsView, detJDimsView ]( int const i )
                                 ^

/data/saet/mtml/src/GEOS_miket/GEOS/GEOS/src/coreComponents/finiteElement/unitTests/testFiniteElementBase.cpp(116): error: expected an identifier
    forAll< parallelDevicePolicy<> >( 1, [ feBase, gradNDimsView, detJDimsView ]( int const i )
                                   ^

/data/saet/mtml/src/GEOS_miket/GEOS/GEOS/src/coreComponents/finiteElement/unitTests/testFiniteElementBase.cpp(116): error: expected a type specifier
    forAll< parallelDevicePolicy<> >( 1, [ feBase, gradNDimsView, detJDimsView ]( int const i )
                                      ^

/data/saet/mtml/src/GEOS_miket/GEOS/GEOS/src/coreComponents/finiteElement/unitTests/testFiniteElementBase.cpp(116): error: expected a type specifier
    forAll< parallelDevicePolicy<> >( 1, [ feBase, gradNDimsView, detJDimsView ]( int const i )
                                         ^

/data/saet/mtml/src/GEOS_miket/GEOS/GEOS/src/coreComponents/finiteElement/unitTests/testFiniteElementBase.cpp(134): warning #12-D: parsing restarts here after previous syntax error
    } );
      ^

Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"

5 errors detected in the compilation of "/data/saet/mtml/src/GEOS_miket/GEOS/GEOS/src/coreComponents/finiteElement/unitTests/testFiniteElementBase.cpp".
make[2]: *** [coreComponents/finiteElement/unitTests/CMakeFiles/testFiniteElementBase.dir/build.make:77: coreComponents/finiteElement/unitTests/CMakeFiles/testFiniteElementBase.dir/testFiniteElementBase.cpp.o] Error 2
make[2]: Target 'coreComponents/finiteElement/unitTests/CMakeFiles/testFiniteElementBase.dir/build' not remade because of errors.
make[2]: Leaving directory '/data/saet/mtml/src/GEOS_miket/GEOS/GEOS/build-GPU-OPTO3-Hypre-GCC-CUDA-MPI-OMP-relwithdebinfo'
make[1]: *** [CMakeFiles/Makefile2:7925: coreComponents/finiteElement/unitTests/CMakeFiles/testFiniteElementBase.dir/all] Error 2

@drmichaeltcvx
Author

I tried the branch 'testing/cusini/deactivate-some-kernels' that Matteo prepared (#3516).

Unfortunately, the FiniteElementBase base-class mismatch error has now moved to another pair of derived classes:

/vend/nvidia/cuda/v12.6/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/chv/az_ussc_p/x86_64-rhel3/util/hpcx/hpcx-v2.21-gcc-doca_ofed-redhat8-cuda12-x86_64/ompi/bin/mpic++   -restrict -arch sm_80 --expt-extended-lambda --expt-relaxed-constexpr -Werror cross-execution-space-call,reorder,deprecated-declarations  -g -lineinfo  -O3 -ftz=true -Xcompiler -O3 -Xcompiler -fno-fast-math -Xcompiler -mdaz-ftz -DNDEBUG   "--generate-code=arch=compute_80,code=[compute_80,sm_80]" -Xcompiler=-fopenmp -Xcompiler=-L/vend/nvidia/cuda/v12.6/lib64 -Xlinker=-rpath -Xlinker=/chv/az_ussc_p/x86_64-rhel3/util/hpcx/hpcx-v2.21-gcc-doca_ofed-redhat8-cuda12-x86_64/ompi/lib -Xlinker=--enable-new-dtags -Xcompiler=-pthread -Xcompiler=-fPIC -Wno-deprecated-gpu-targets -shared -dlink --options-file CMakeFiles/physicsSolvers.dir/deviceObjects1.rsp --options-file CMakeFiles/physicsSolvers.dir/deviceObjects2.rsp -o CMakeFiles/physicsSolvers.dir/cmake_device_link.o --options-file CMakeFiles/physicsSolvers.dir/deviceLinkLibs.rsp
nvlink error   : Size doesn't match for '_ZN4geos13finiteElement17FiniteElementBaseC1ERKS1_$933' in 
'CMakeFiles/physicsSolvers.dir/__/__/generatedSrc/coreComponents/physicsSolvers/multiphysics/poromechanicsKernels/ThermoPoromechanicsKernels_CellElementSubRegion_PorousSolid-Damage-ElasticIsotropic--_H1_Hexahedron_Lagrange1_GaussLegendre2.cpp.o', first specified in
'CMakeFiles/physicsSolvers.dir/__/__/generatedSrc/coreComponents/physicsSolvers/multiphysics/poromechanicsKernels/PoromechanicsKernels_CellElementSubRegion_PorousSolid-DamageSpectral-ElasticIsotropic--_H1_Hexahedron_Lagrange1_GaussLegendre2.cpp.o' (target: sm_80)
nvlink fatal   : merge_elf failed (target: sm_80)

make[2]: *** [coreComponents/physicsSolvers/CMakeFiles/physicsSolvers.dir/build.make:12168: coreComponents/physicsSolvers/CMakeFiles/physicsSolvers.dir/cmake_device_link.o] Error 1
make[2]: Target 'coreComponents/physicsSolvers/CMakeFiles/physicsSolvers.dir/build' not remade because of errors.
make[2]: Leaving directory '/data/saet/mtml/src/GEOS_miket/GEOS/GEOS/build-GPU-Hypre-GCC-CUDA_12.6-ompi_hpcx-OMP-relwithdebinfo'
make[1]: *** [CMakeFiles/Makefile2:8964: coreComponents/physicsSolvers/CMakeFiles/physicsSolvers.dir/all] Error 2
make[1]: Target 'all' not remade because of errors.
make[1]: Leaving directory '/data/saet/mtml/src/GEOS_miket/GEOS/GEOS/build-GPU-Hypre-GCC-CUDA_12.6-ompi_hpcx-OMP-relwithdebinfo'
make: *** [Makefile:146: all] Error 2

@drmichaeltcvx
Author

drmichaeltcvx commented Jan 21, 2025

Here are excerpts from the failing build with the testing/cusini/deactivate-some-kernels branch:


[ 61%] Building CUDA object coreComponents/physicsSolvers/CMakeFiles/physicsSolvers.dir/__/__/generatedSrc/coreComponents/physicsSolvers/multiphysics/poromechanicsKernels/PoromechanicsKernels_CellElementSubRegion_PorousSolid-DamageSpectral-ElasticIsotropic--_H1_Hexahedron_Lagrange1_GaussLegendre2.cpp.o
cd /data/saet/mtml/src/GEOS_miket/GEOS/GEOS/build-GPU-Hypre-GCC-CUDA_12.6-ompi_hpcx-OMP-relwithdebinfo/coreComponents/physicsSolvers && /vend/nvidia/cuda/v12.6/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/chv/az_ussc_p/x86_64-rhel3/util/hpcx/hpcx-v2.21-gcc-doca_ofed-redhat8-cuda12-x86_64/ompi/bin/mpic++ -DCAMP_HAVE_CUDA -DDIY_NO_THREADS -DFMT_HEADER_ONLY=1 -DH5_BUILT_AS_DYNAMIC_LIB -DMPICH_SKIP_MPICXX -DMPI_NO_CPPBIND -DOMPI_SKIP_MPICXX -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE -D_LARGEFILE64_SOURCE -D_LARGEFILE_SOURCE -D_MPICC_H -D_POSIX_C_SOURCE=200809L -Dkiss_fft_scalar=double -DphysicsSolvers_EXPORTS --options-file CMakeFiles/physicsSolvers.dir/includes_CUDA.rsp -restrict -arch sm_80 --expt-extended-lambda --expt-relaxed-constexpr -Werror cross-execution-space-call,reorder,deprecated-declarations  -g -lineinfo  -O3 -ftz=true -Xcompiler -O3 -Xcompiler -fno-fast-math -Xcompiler -mdaz-ftz -DNDEBUG   -std=c++17 "--generate-code=arch=compute_80,code=[compute_80,sm_80]" -Xcompiler=-fPIC -Xcompiler=-fopenmp -Xcompiler=-pthread -Xcompiler -pthread -MD -MT coreComponents/physicsSolvers/CMakeFiles/physicsSolvers.dir/__/__/generatedSrc/coreComponents/physicsSolvers/multiphysics/poromechanicsKernels/PoromechanicsKernels_CellElementSubRegion_PorousSolid-DamageSpectral-ElasticIsotropic--_H1_Hexahedron_Lagrange1_GaussLegendre2.cpp.o -MF CMakeFiles/physicsSolvers.dir/__/__/generatedSrc/coreComponents/physicsSolvers/multiphysics/poromechanicsKernels/PoromechanicsKernels_CellElementSubRegion_PorousSolid-DamageSpectral-ElasticIsotropic--_H1_Hexahedron_Lagrange1_GaussLegendre2.cpp.o.d -x cu -rdc=true -c /data/saet/mtml/src/GEOS_miket/GEOS/GEOS/build-GPU-Hypre-GCC-CUDA_12.6-ompi_hpcx-OMP-relwithdebinfo/generatedSrc/coreComponents/physicsSolvers/multiphysics/poromechanicsKernels/PoromechanicsKernels_CellElementSubRegion_PorousSolid-DamageSpectral-ElasticIsotropic--_H1_Hexahedron_Lagrange1_GaussLegendre2.cpp -o 
CMakeFiles/physicsSolvers.dir/__/__/generatedSrc/coreComponents/physicsSolvers/multiphysics/poromechanicsKernels/PoromechanicsKernels_CellElementSubRegion_PorousSolid-DamageSpectral-ElasticIsotropic--_H1_Hexahedron_Lagrange1_GaussLegendre2.cpp.o

[ 63%] Building CUDA object coreComponents/physicsSolvers/CMakeFiles/physicsSolvers.dir/__/__/generatedSrc/coreComponents/physicsSolvers/multiphysics/poromechanicsKernels/PoromechanicsKernels_CellElementSubRegion_PorousSolid-Damage-ElasticIsotropic--_H1_Hexahedron_Lagrange1_GaussLegendre2.cpp.o
cd /data/saet/mtml/src/GEOS_miket/GEOS/GEOS/build-GPU-Hypre-GCC-CUDA_12.6-ompi_hpcx-OMP-relwithdebinfo/coreComponents/physicsSolvers && /vend/nvidia/cuda/v12.6/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/chv/az_ussc_p/x86_64-rhel3/util/hpcx/hpcx-v2.21-gcc-doca_ofed-redhat8-cuda12-x86_64/ompi/bin/mpic++ -DCAMP_HAVE_CUDA -DDIY_NO_THREADS -DFMT_HEADER_ONLY=1 -DH5_BUILT_AS_DYNAMIC_LIB -DMPICH_SKIP_MPICXX -DMPI_NO_CPPBIND -DOMPI_SKIP_MPICXX -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE -D_LARGEFILE64_SOURCE -D_LARGEFILE_SOURCE -D_MPICC_H -D_POSIX_C_SOURCE=200809L -Dkiss_fft_scalar=double -DphysicsSolvers_EXPORTS --options-file CMakeFiles/physicsSolvers.dir/includes_CUDA.rsp -restrict -arch sm_80 --expt-extended-lambda --expt-relaxed-constexpr -Werror cross-execution-space-call,reorder,deprecated-declarations  -g -lineinfo  -O3 -ftz=true -Xcompiler -O3 -Xcompiler -fno-fast-math -Xcompiler -mdaz-ftz -DNDEBUG   -std=c++17 "--generate-code=arch=compute_80,code=[compute_80,sm_80]" -Xcompiler=-fPIC -Xcompiler=-fopenmp -Xcompiler=-pthread -Xcompiler -pthread -MD -MT coreComponents/physicsSolvers/CMakeFiles/physicsSolvers.dir/__/__/generatedSrc/coreComponents/physicsSolvers/multiphysics/poromechanicsKernels/PoromechanicsKernels_CellElementSubRegion_PorousSolid-Damage-ElasticIsotropic--_H1_Hexahedron_Lagrange1_GaussLegendre2.cpp.o -MF CMakeFiles/physicsSolvers.dir/__/__/generatedSrc/coreComponents/physicsSolvers/multiphysics/poromechanicsKernels/PoromechanicsKernels_CellElementSubRegion_PorousSolid-Damage-ElasticIsotropic--_H1_Hexahedron_Lagrange1_GaussLegendre2.cpp.o.d -x cu -rdc=true -c /data/saet/mtml/src/GEOS_miket/GEOS/GEOS/build-GPU-Hypre-GCC-CUDA_12.6-ompi_hpcx-OMP-relwithdebinfo/generatedSrc/coreComponents/physicsSolvers/multiphysics/poromechanicsKernels/PoromechanicsKernels_CellElementSubRegion_PorousSolid-Damage-ElasticIsotropic--_H1_Hexahedron_Lagrange1_GaussLegendre2.cpp -o 
CMakeFiles/physicsSolvers.dir/__/__/generatedSrc/coreComponents/physicsSolvers/multiphysics/poromechanicsKernels/PoromechanicsKernels_CellElementSubRegion_PorousSolid-Damage-ElasticIsotropic--_H1_Hexahedron_Lagrange1_GaussLegendre2.cpp.o

[ 93%] Linking CUDA device code CMakeFiles/physicsSolvers.dir/cmake_device_link.o
cd /data/saet/mtml/src/GEOS_miket/GEOS/GEOS/build-GPU-Hypre-GCC-CUDA_12.6-ompi_hpcx-OMP-relwithdebinfo/coreComponents/physicsSolvers && /data/saet/mtml/software/x86_64/cmake-3.28.3-linux-x86_64/bin/cmake -E cmake_link_script CMakeFiles/physicsSolvers.dir/dlink.txt --verbose=1

/vend/nvidia/cuda/v12.6/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/chv/az_ussc_p/x86_64-rhel3/util/hpcx/hpcx-v2.21-gcc-doca_ofed-redhat8-cuda12-x86_64/ompi/bin/mpic++   -restrict -arch sm_80 --expt-extended-lambda --expt-relaxed-constexpr -Werror cross-execution-space-call,reorder,deprecated-declarations  -g -lineinfo  -O3 -ftz=true -Xcompiler -O3 -Xcompiler -fno-fast-math -Xcompiler -mdaz-ftz -DNDEBUG   "--generate-code=arch=compute_80,code=[compute_80,sm_80]" -Xcompiler=-fopenmp -Xcompiler=-L/vend/nvidia/cuda/v12.6/lib64 -Xlinker=-rpath -Xlinker=/chv/az_ussc_p/x86_64-rhel3/util/hpcx/hpcx-v2.21-gcc-doca_ofed-redhat8-cuda12-x86_64/ompi/lib -Xlinker=--enable-new-dtags -Xcompiler=-pthread -Xcompiler=-fPIC -Wno-deprecated-gpu-targets -shared -dlink --options-file CMakeFiles/physicsSolvers.dir/deviceObjects1.rsp --options-file CMakeFiles/physicsSolvers.dir/deviceObjects2.rsp -o CMakeFiles/physicsSolvers.dir/cmake_device_link.o --options-file CMakeFiles/physicsSolvers.dir/deviceLinkLibs.rsp

nvlink error   : Size doesn't match for '_ZN4geos13finiteElement17FiniteElementBaseC1ERKS1_$933' in 
'CMakeFiles/physicsSolvers.dir/__/__/generatedSrc/coreComponents/physicsSolvers/multiphysics/poromechanicsKernels/ThermoPoromechanicsKernels_CellElementSubRegion_PorousSolid-Damage-ElasticIsotropic--_H1_Hexahedron_Lagrange1_GaussLegendre2.cpp.o', first specified in 
'CMakeFiles/physicsSolvers.dir/__/__/generatedSrc/coreComponents/physicsSolvers/multiphysics/poromechanicsKernels/PoromechanicsKernels_CellElementSubRegion_PorousSolid-DamageSpectral-ElasticIsotropic--_H1_Hexahedron_Lagrange1_GaussLegendre2.cpp.o' (target: sm_80)
nvlink fatal   : merge_elf failed (target: sm_80)

make[2]: *** [coreComponents/physicsSolvers/CMakeFiles/physicsSolvers.dir/build.make:12168: coreComponents/physicsSolvers/CMakeFiles/physicsSolvers.dir/cmake_device_link.o] Error 1
make[2]: Target 'coreComponents/physicsSolvers/CMakeFiles/physicsSolvers.dir/build' not remade because of errors.
make[2]: Leaving directory '/data/saet/mtml/src/GEOS_miket/GEOS/GEOS/build-GPU-Hypre-GCC-CUDA_12.6-ompi_hpcx-OMP-relwithdebinfo'
make[1]: *** [CMakeFiles/Makefile2:8964: coreComponents/physicsSolvers/CMakeFiles/physicsSolvers.dir/all] Error 2
make[1]: Target 'all' not remade because of errors.
make[1]: Leaving directory '/data/saet/mtml/src/GEOS_miket/GEOS/GEOS/build-GPU-Hypre-GCC-CUDA_12.6-ompi_hpcx-OMP-relwithdebinfo'
make: *** [Makefile:146: all] Error 2

Could the problems be caused by a '__device__' definition that is missing from one of these two files but not the other?
GEOS\src\coreComponents\physicsSolvers\multiphysics\poromechanicsKernels\MultiphasePoromechanics_impl.hpp
GEOS\src\coreComponents\physicsSolvers\multiphysics\poromechanicsKernels\SinglePhasePoromechanics_impl.hpp

@rrsettgast
Member

@drmichaeltcvx I find it hard to believe that this depends on the hardware (Intel vs. AMD); I would expect the cause to be somewhere in the software stack. The Dockerfiles that generate our TPL environment for CI are here:
https://github.com/GEOS-DEV/thirdPartyLibs/tree/master/docker

Look on Docker Hub for the base image that most closely matches your Linux distribution and version. I don't know whether you have mentioned which Linux distribution you are on. Once you have a suitable base image with a hopefully equivalent software stack, adding the image should involve copying one of the Dockerfiles, replacing the base image, and modifying the installed packages to mimic your software stack.

Here are some examples of base images:
https://hub.docker.com/_/ubuntu
https://hub.docker.com/r/rockylinux/rockylinux
https://hub.docker.com/_/fedora

@drmichaeltcvx
Author

drmichaeltcvx commented Jan 22, 2025 via email

@drmichaeltcvx
Author

Here are the configure and build logs for a failed GPU build. Let's go through these first to see whether they contain anything that points to where the problem starts. The TPLs build fine for GPUs on our software stack.

CMake command

cmake -DCMAKE_BUILD_TYPE=RelWithDebInfo -DCMAKE_INSTALL_PREFIX=/data/saet/mtml/src/GEOS_miket/GEOS/GEOS/build-GPU-Hypre-GCC-CUDA_12.6-ompi_hpcx-OMP-relwithdebinfo -DENABLE_YAPF=OFF -DGEOSX_DIR=/data/saet/mtml/src/GEOS_miket/GEOS/GEOS/build-GPU-Hypre-GCC-CUDA_12.6-ompi_hpcx-OMP-relwithdebinfo -DGEOSX_TPL_DIR=/data/saet/mtml/software/x86_64/RHEL8/GEOSTPL/1.1.0-miket__GPU-build-fix-2025-01-14/install-GPU-OPTO3-Hypre-GCC-CUDA-MPI-OMP-relwithdebinfo -C/data/saet/mtml/src/GEOS_mtml/GEOS/host-configs/CVX/GPU-Hypre-GCC-CUDA_12.6-ompi_hpcx-OMP.cmake /data/saet/mtml/src/GEOS_miket/GEOS/GEOS/src

  • Configure log: Config-2025-01-17-134202.log
  • Build log: Make-build-2025-01-17-134202.log

Please comment.
