NVIDIA Grace S1 M1 C72
- Processor: Nvidia Grace (Grace-Hopper Superchip)
- Base frequency: 2.2 GHz
- Number of sockets: 1
- Number of memory domains per socket: 1
- Number of cores per socket: 72
- Number of HWThreads per core: 1
- MachineState output: json
+----------+-------------------------------------------------------------------------------------+
| Compiler | Arm C/C++/Fortran |
|----------|-------------------------------------------------------------------------------------|
| Version | Arm C/C++/Fortran Compiler version 23.10 (build number 32) (based on LLVM 17.0.0) |
+----------+-------------------------------------------------------------------------------------+
Optimizing flags: -Ofast -march=armv9-a+sve2 -mprefer-vector-width=128 -mcpu=neoverse-v2 -std=c99 -Xpreprocessor -fopenmp
All results are in GB/s
Summary results:
+----------------------------------------------+
| Single core | 42.63 (Init) |
| Memory domain | 361.57 (Init with 12 cores) |
| Socket | 361.57 (Init with 12 cores) |
| Node | 361.57 (Init with 12 cores) |
+----------------------------------------------+
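The summary entries appear to be the best value across all kernels: the single-core figure matches the highest 1-thread result, and the memory-domain figure matches the highest value anywhere in the scaling table. The Python sketch below reproduces that selection for two rows copied from the scaling table on this page. The selection rule is an assumption inferred from the numbers, not something the benchmark output states, and `best_kernel` is a hypothetical helper.

```python
# Kernel names and two rows (1 and 12 threads) copied from the scaling table.
KERNELS = ["Init", "Sum", "Copy", "Update", "Triad", "Daxpy", "STriad", "SDaxpy"]
ROWS = {
    1: [42.63, 20.46, 27.56, 39.71, 26.73, 33.08, 25.49, 30.00],
    12: [361.57, 204.26, 255.05, 230.08, 251.63, 248.23, 247.57, 249.86],
}


def best_kernel(values):
    """Return (kernel name, bandwidth in GB/s) for the fastest kernel in a row.

    Assumption: the summary simply reports the maximum over all kernels.
    """
    idx = max(range(len(values)), key=lambda i: values[i])
    return KERNELS[idx], values[idx]


single_core = best_kernel(ROWS[1])
twelve_cores = best_kernel(ROWS[12])
print("Single core:", single_core)
print("12 cores:   ", twelve_cores)
```

For both rows the selected kernel is Init, which agrees with the labels in the summary table.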
Results for scaling within a memory domain:
#nt Init Sum Copy Update Triad Daxpy STriad SDaxpy
1 42.63 20.46 27.56 39.71 26.73 33.08 25.49 30.00
2 84.95 40.46 54.04 76.06 52.45 64.70 50.72 59.05
3 126.70 59.87 79.69 108.48 78.24 94.58 75.42 85.83
4 168.08 78.68 104.57 137.25 102.27 121.91 99.20 112.27
5 189.31 96.76 128.29 162.93 125.98 146.99 122.46 136.47
6 225.75 114.24 151.88 185.92 148.36 169.89 143.89 156.90
7 261.54 130.82 175.41 201.19 168.91 190.48 162.39 178.55
8 295.41 147.00 199.43 209.89 188.38 208.83 183.05 196.32
9 328.39 162.21 223.53 214.58 206.41 221.52 200.57 213.06
10 354.77 176.99 238.63 220.04 223.70 231.63 218.28 227.61
11 360.72 190.58 245.94 224.42 237.86 239.26 232.09 238.91
12 361.57 204.26 255.05 230.08 251.63 248.23 247.57 249.86
13 361.20 216.56 260.86 235.42 261.63 256.67 259.58 259.11
14 360.75 228.29 265.76 240.50 270.35 264.98 268.98 268.51
15 360.37 238.81 272.73 247.33 278.61 273.13 279.85 277.78
16 359.72 248.37 277.87 253.18 284.44 280.95 285.83 285.94
17 359.30 258.63 282.28 259.07 291.59 288.70 294.06 293.77
18 358.93 267.94 288.17 264.77 299.25 294.62 301.33 300.10
19 358.85 277.37 291.41 270.54 304.88 299.10 306.79 305.31
20 359.05 286.38 299.06 274.74 311.17 302.14 315.65 309.12
21 359.43 293.95 303.20 276.31 316.37 302.61 319.32 309.11
22 359.58 301.72 309.38 277.94 322.26 303.90 325.57 310.65
23 359.36 308.50 312.08 280.01 324.76 304.51 326.99 311.22
24 359.36 315.41 317.11 281.78 329.94 304.86 334.52 313.22
25 358.59 321.07 321.97 282.69 335.17 306.85 338.63 313.21
26 357.80 325.98 321.43 282.74 334.62 304.87 336.96 310.91
27 353.69 330.83 325.83 282.96 338.23 304.15 340.14 311.88
28 352.57 334.17 330.91 282.68 339.49 305.04 341.51 311.53
29 347.34 338.38 332.02 284.12 340.50 306.25 341.72 312.16
30 348.75 342.28 336.19 284.96 343.24 305.85 345.23 313.01
31 343.42 344.99 336.42 283.97 343.82 305.46 345.73 311.47
32 348.93 348.04 338.71 285.50 344.02 305.61 345.53 312.27
33 351.08 349.64 337.77 284.35 342.60 304.38 343.14 310.33
34 348.32 351.81 339.97 285.16 343.48 304.78 345.12 312.21
35 349.54 353.80 337.83 285.82 343.73 305.78 345.40 312.14
36 348.71 354.74 340.72 284.87 341.56 303.85 341.44 311.11
37 353.63 356.37 342.87 285.70 344.25 304.91 343.24 311.97
38 348.10 357.39 343.09 286.03 343.31 305.23 342.99 311.96
39 348.93 357.81 340.95 286.18 341.31 304.68 327.47 310.51
40 351.21 356.71 343.75 286.73 342.43 305.69 341.05 312.24
41 351.48 354.69 342.35 286.77 339.80 304.82 340.46 311.09
42 354.05 351.81 340.35 285.40 335.84 303.47 321.11 309.27
43 350.62 346.86 341.54 285.48 333.31 303.24 313.63 309.41
44 352.32 348.13 341.77 285.62 334.66 303.72 331.54 309.73
45 354.22 347.48 341.54 286.64 333.76 303.83 306.95 309.58
46 351.74 348.25 339.01 285.92 327.60 303.44 310.46 309.68
47 353.96 348.47 340.32 286.86 336.85 304.99 323.72 310.10
48 355.80 347.78 342.08 285.81 338.93 303.82 327.34 309.69
49 353.54 350.15 340.04 286.48 317.33 303.58 312.32 308.50
50 353.69 345.41 341.94 286.86 337.67 303.63 323.17 309.79
51 355.55 344.87 340.32 285.74 307.12 302.29 311.10 308.29
52 356.69 344.21 339.72 286.29 314.81 303.28 311.66 308.69
53 354.89 346.73 340.79 286.85 316.87 303.65 310.04 308.76
54 354.37 347.38 340.60 286.61 309.13 303.21 311.90 308.11
55 356.72 345.48 339.86 286.36 323.39 302.55 315.61 307.74
56 356.50 345.48 337.35 285.18 320.53 301.58 315.99 306.50
57 356.74 341.46 339.08 286.13 326.64 302.11 327.16 307.49
58 355.52 342.27 338.37 287.34 321.19 303.44 314.23 307.89
59 357.26 344.09 338.45 286.15 316.11 301.82 313.27 307.33
60 357.15 346.23 338.87 286.54 320.08 303.26 321.50 307.89
61 355.97 341.99 337.84 286.92 320.21 302.25 325.77 307.12
62 358.04 339.52 336.57 286.36 321.89 301.91 325.08 306.49
63 357.61 340.96 336.89 286.00 324.88 301.77 325.32 306.28
64 355.76 342.39 336.66 286.26 321.92 302.80 323.02 306.90
65 356.22 339.95 335.18 286.03 325.28 301.57 325.94 306.61
66 356.26 338.29 334.49 286.11 322.83 301.18 327.32 306.33
67 356.56 336.68 334.26 286.68 321.11 301.93 325.89 306.59
68 358.39 332.86 333.13 286.74 326.49 300.76 333.20 305.84
69 357.43 337.02 334.05 286.84 325.00 302.04 331.39 306.33
70 356.02 336.22 332.92 285.79 326.52 301.05 330.46 305.40
71 358.08 336.72 332.83 285.75 329.33 300.28 331.18 305.26
72 358.44 335.73 333.09 285.36 329.23 300.52 332.26 305.07
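One way to read the table above is as parallel efficiency: the measured bandwidth at `nt` threads divided by `nt` times the single-thread bandwidth of the same kernel. The Python sketch below parses a four-row excerpt copied from the table and computes that ratio. The function names `parse_table` and `efficiency` are hypothetical, and the efficiency metric is an illustration rather than something the benchmark itself reports.

```python
# Header and a four-row excerpt copied verbatim from the scaling table (GB/s).
HEADER = "#nt Init Sum Copy Update Triad Daxpy STriad SDaxpy"
EXCERPT = """\
1 42.63 20.46 27.56 39.71 26.73 33.08 25.49 30.00
4 168.08 78.68 104.57 137.25 102.27 121.91 99.20 112.27
12 361.57 204.26 255.05 230.08 251.63 248.23 247.57 249.86
72 358.44 335.73 333.09 285.36 329.23 300.52 332.26 305.07
"""


def parse_table(header, body):
    """Map thread count -> {kernel name: bandwidth in GB/s}."""
    names = header.split()[1:]  # drop the "#nt" column label
    table = {}
    for line in body.splitlines():
        fields = line.split()
        if not fields:
            continue
        table[int(fields[0])] = dict(zip(names, map(float, fields[1:])))
    return table


def efficiency(table, kernel, nt):
    """Bandwidth at nt threads relative to perfect scaling from one thread."""
    return table[nt][kernel] / (nt * table[1][kernel])


table = parse_table(HEADER, EXCERPT)
for nt in sorted(table):
    print(f"nt={nt:2d}  Init {efficiency(table, 'Init', nt):.2f}"
          f"  Sum {efficiency(table, 'Sum', nt):.2f}")
```

An efficiency close to 1 means the kernel still scales almost linearly with the thread count; a value well below 1 indicates that the memory interface, not the cores, limits the bandwidth.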
Results for scaling across memory domains. Each table lists the number of memory domains used (nm) in its rows and the number of cores used per memory domain in its columns.
Init:
#nm 1
1 42.63
2 84.95
3 126.70
4 168.08
5 189.31
6 225.75
7 261.54
8 295.41
9 328.39
10 354.77
11 360.72
12 361.57
13 361.20
14 360.75
15 360.37
16 359.72
17 359.30
18 358.93
19 358.85
20 359.05
21 359.43
22 359.58
23 359.36
24 359.36
25 358.59
26 357.80
27 353.69
28 352.57
29 347.34
30 348.75
31 343.42
32 348.93
33 351.08
34 348.32
35 349.54
36 348.71
37 353.63
38 348.10
39 348.93
40 351.21
41 351.48
42 354.05
43 350.62
44 352.32
45 354.22
46 351.74
47 353.96
48 355.80
49 353.54
50 353.69
51 355.55
52 356.69
53 354.89
54 354.37
55 356.72
56 356.50
57 356.74
58 355.52
59 357.26
60 357.15
61 355.97
62 358.04
63 357.61
64 355.76
65 356.22
66 356.26
67 356.56
68 358.39
69 357.43
70 356.02
71 358.08
72 358.44
Sum:
#nm 1
1 20.46
2 40.46
3 59.87
4 78.68
5 96.76
6 114.24
7 130.82
8 147.00
9 162.21
10 176.99
11 190.58
12 204.26
13 216.56
14 228.29
15 238.81
16 248.37
17 258.63
18 267.94
19 277.37
20 286.38
21 293.95
22 301.72
23 308.50
24 315.41
25 321.07
26 325.98
27 330.83
28 334.17
29 338.38
30 342.28
31 344.99
32 348.04
33 349.64
34 351.81
35 353.80
36 354.74
37 356.37
38 357.39
39 357.81
40 356.71
41 354.69
42 351.81
43 346.86
44 348.13
45 347.48
46 348.25
47 348.47
48 347.78
49 350.15
50 345.41
51 344.87
52 344.21
53 346.73
54 347.38
55 345.48
56 345.48
57 341.46
58 342.27
59 344.09
60 346.23
61 341.99
62 339.52
63 340.96
64 342.39
65 339.95
66 338.29
67 336.68
68 332.86
69 337.02
70 336.22
71 336.72
72 335.73
Copy:
#nm 1
1 27.56
2 54.04
3 79.69
4 104.57
5 128.29
6 151.88
7 175.41
8 199.43
9 223.53
10 238.63
11 245.94
12 255.05
13 260.86
14 265.76
15 272.73
16 277.87
17 282.28
18 288.17
19 291.41
20 299.06
21 303.20
22 309.38
23 312.08
24 317.11
25 321.97
26 321.43
27 325.83
28 330.91
29 332.02
30 336.19
31 336.42
32 338.71
33 337.77
34 339.97
35 337.83
36 340.72
37 342.87
38 343.09
39 340.95
40 343.75
41 342.35
42 340.35
43 341.54
44 341.77
45 341.54
46 339.01
47 340.32
48 342.08
49 340.04
50 341.94
51 340.32
52 339.72
53 340.79
54 340.60
55 339.86
56 337.35
57 339.08
58 338.37
59 338.45
60 338.87
61 337.84
62 336.57
63 336.89
64 336.66
65 335.18
66 334.49
67 334.26
68 333.13
69 334.05
70 332.92
71 332.83
72 333.09
Update:
#nm 1
1 39.71
2 76.06
3 108.48
4 137.25
5 162.93
6 185.92
7 201.19
8 209.89
9 214.58
10 220.04
11 224.42
12 230.08
13 235.42
14 240.50
15 247.33
16 253.18
17 259.07
18 264.77
19 270.54
20 274.74
21 276.31
22 277.94
23 280.01
24 281.78
25 282.69
26 282.74
27 282.96
28 282.68
29 284.12
30 284.96
31 283.97
32 285.50
33 284.35
34 285.16
35 285.82
36 284.87
37 285.70
38 286.03
39 286.18
40 286.73
41 286.77
42 285.40
43 285.48
44 285.62
45 286.64
46 285.92
47 286.86
48 285.81
49 286.48
50 286.86
51 285.74
52 286.29
53 286.85
54 286.61
55 286.36
56 285.18
57 286.13
58 287.34
59 286.15
60 286.54
61 286.92
62 286.36
63 286.00
64 286.26
65 286.03
66 286.11
67 286.68
68 286.74
69 286.84
70 285.79
71 285.75
72 285.36
Triad:
#nm 1
1 26.73
2 52.45
3 78.24
4 102.27
5 125.98
6 148.36
7 168.91
8 188.38
9 206.41
10 223.70
11 237.86
12 251.63
13 261.63
14 270.35
15 278.61
16 284.44
17 291.59
18 299.25
19 304.88
20 311.17
21 316.37
22 322.26
23 324.76
24 329.94
25 335.17
26 334.62
27 338.23
28 339.49
29 340.50
30 343.24
31 343.82
32 344.02
33 342.60
34 343.48
35 343.73
36 341.56
37 344.25
38 343.31
39 341.31
40 342.43
41 339.80
42 335.84
43 333.31
44 334.66
45 333.76
46 327.60
47 336.85
48 338.93
49 317.33
50 337.67
51 307.12
52 314.81
53 316.87
54 309.13
55 323.39
56 320.53
57 326.64
58 321.19
59 316.11
60 320.08
61 320.21
62 321.89
63 324.88
64 321.92
65 325.28
66 322.83
67 321.11
68 326.49
69 325.00
70 326.52
71 329.33
72 329.23
Memory bandwidth scaling within one memory domain:
The following plots illustrate the performance scaling across multiple memory domains using different numbers of cores per memory domain.
Memory bandwidth scaling across memory domains for Init:
Memory bandwidth scaling across memory domains for Sum:
Memory bandwidth scaling across memory domains for Copy:
Memory bandwidth scaling across memory domains for Triad: