Fujitsu FX1000 S1 M4 C48
- Processor: Fujitsu A64FX FX1000
- Base frequency: 2.2 GHz
- Number of sockets: 1
- Number of memory domains per socket: 4
- Number of cores per socket: 48
- Number of HWThreads per core: 1
- MachineState output: NA
+----------+-----------------------------+
| Compiler | fcc (FCC)                   |
|----------|-----------------------------|
| Version  | fcc (FCC) 4.4.0a 20210127   |
+----------+-----------------------------+
Optimizing flags: -Kfast -Kocl -Koptmsg=2 -Nlst=t -Kzfill -Kprefetch_line=6 -Kopenmp
Please note that runs were performed with huge pages (2MB).
All results are in GB/s
Summary results:
+-------------------------------------------------------------+
| Single core   | 92.72 (SDaxpy)                              |
| Memory domain | 230.03 (Sum with 8 cores)                   |
| Socket        | 870.95 (Sum with 8 cores per memory domain) |
| Node          | 870.95 (Sum with 8 cores per memory domain) |
+-------------------------------------------------------------+
Results for scaling within a memory domain:
#nt Init Sum Copy Update Triad Daxpy STriad SDaxpy
1 13.24 57.29 26.15 80.14 38.22 89.81 49.41 92.72
2 26.51 117.00 52.39 132.17 76.15 159.29 99.31 180.45
3 39.79 168.38 78.55 171.95 114.22 203.63 148.67 214.14
4 53.11 203.77 104.60 198.39 152.39 213.95 196.94 215.06
5 66.48 218.87 130.04 208.79 190.06 213.73 208.89 215.36
6 80.00 226.30 155.77 213.38 207.44 213.69 212.09 214.96
7 85.95 222.49 164.64 208.16 207.38 212.42 211.67 214.28
8 107.11 230.03 202.29 214.65 211.58 212.13 212.99 214.17
9 110.59 228.35 203.24 210.91 205.07 211.34 209.35 213.26
10 133.32 228.69 205.94 212.66 210.84 211.85 212.49 213.68
11 131.18 227.63 208.70 212.26 210.38 211.58 212.24 213.37
12 133.24 227.19 208.84 209.98 210.47 211.67 211.93 213.24
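One way to read the table above is to ask at which thread count a kernel saturates the memory domain. The following is a minimal sketch, not part of the benchmark itself: the function name `saturation_point` and the 95% threshold are assumptions chosen for illustration, and the values are copied from the Sum and Triad columns.

```python
# Bandwidth in GB/s for 1..12 threads within one memory domain,
# copied from the Sum and Triad columns of the table above.
SUM = [57.29, 117.00, 168.38, 203.77, 218.87, 226.30,
       222.49, 230.03, 228.35, 228.69, 227.63, 227.19]
TRIAD = [38.22, 76.15, 114.22, 152.39, 190.06, 207.44,
         207.38, 211.58, 205.07, 210.84, 210.38, 210.47]


def saturation_point(bandwidths, fraction=0.95):
    """Return the smallest thread count that reaches `fraction` of the peak.

    `fraction` is an arbitrary illustrative threshold, not a value
    defined by the benchmark.
    """
    threshold = fraction * max(bandwidths)
    for threads, bw in enumerate(bandwidths, start=1):
        if bw >= threshold:
            return threads
    raise ValueError("no thread count reaches the threshold")


print("Sum saturates at", saturation_point(SUM), "threads")
print("Triad saturates at", saturation_point(TRIAD), "threads")
```

With this threshold, Sum reaches the plateau at 5 threads and Triad at 6. A stricter or looser threshold would shift these numbers.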
Results for scaling across memory domains. Each row gives the number of cores used per memory domain; the columns give the number of memory domains used (nm).
Init:
#nm 1 2 3 4
1 13.24 26.49 39.73 52.96
2 26.51 53.01 79.49 105.94
3 39.79 79.53 110.35 158.88
4 53.11 106.18 159.09 211.97
5 66.48 132.81 199.04 265.06
6 80.00 159.83 219.82 319.11
7 85.95 171.35 256.64 341.38
8 107.11 213.79 320.11 425.84
9 110.59 220.76 330.20 437.92
10 133.32 265.29 396.50 525.45
11 131.18 259.50 383.31 519.40
12 133.24 264.95 393.73 527.38
Sum:
#nm 1 2 3 4
1 57.29 114.50 170.48 226.70
2 117.00 231.74 344.43 453.07
3 168.38 333.02 418.70 646.18
4 203.77 400.74 592.29 779.26
5 218.87 430.32 633.73 828.78
6 226.30 443.33 627.61 865.72
7 222.49 435.47 645.35 847.42
8 230.03 451.53 665.44 870.95
9 228.35 449.76 665.32 864.43
10 228.69 448.83 661.70 864.71
11 227.63 446.43 659.02 859.17
12 227.19 445.79 655.94 864.56
Copy:
#nm 1 2 3 4
1 26.15 52.40 78.60 104.78
2 52.39 104.74 157.08 209.39
3 78.55 157.00 215.78 313.31
4 104.60 209.10 312.30 416.26
5 130.04 259.82 389.30 518.41
6 155.77 311.05 423.25 619.02
7 164.64 329.27 492.46 655.47
8 202.29 402.55 599.54 797.99
9 203.24 403.31 601.76 796.74
10 205.94 411.32 616.34 817.48
11 208.70 415.97 622.32 827.97
12 208.84 416.56 615.35 817.13
Update:
#nm 1 2 3 4
1 80.14 160.16 240.20 319.72
2 132.17 263.95 395.49 526.63
3 171.95 343.19 432.03 683.28
4 198.39 396.26 592.45 790.80
5 208.79 416.82 623.73 829.48
6 213.38 425.51 598.87 850.88
7 208.16 414.01 623.04 827.46
8 214.65 427.37 639.56 849.91
9 210.91 422.03 633.41 837.44
10 212.66 423.60 635.51 845.11
11 212.26 422.42 632.77 840.07
12 209.98 419.61 630.04 839.04
Triad:
#nm 1 2 3 4
1 38.22 76.68 114.99 153.31
2 76.15 152.17 228.30 304.08
3 114.22 228.18 304.35 455.71
4 152.39 304.20 455.11 606.28
5 190.06 378.81 567.17 753.18
6 207.44 412.90 592.12 820.72
7 207.38 412.86 616.03 812.76
8 211.58 420.05 613.30 813.03
9 205.07 408.38 613.02 810.70
10 210.84 418.52 626.45 831.34
11 210.38 418.01 624.82 825.94
12 210.47 417.65 621.49 824.72
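The tables above can also be summarized as a scaling efficiency across memory domains: the measured bandwidth with nm domains divided by nm times the one-domain bandwidth. The sketch below illustrates this for Sum with 8 cores per memory domain (row 8 of the Sum table). The helper name `domain_efficiency` is hypothetical and not part of the benchmark.

```python
# Sum bandwidth in GB/s with 8 cores per memory domain, for 1..4 memory
# domains (row 8 of the Sum table above).
SUM_8_CORES = [230.03, 451.53, 665.44, 870.95]


def domain_efficiency(bandwidths):
    """Return bw(nm) / (nm * bw(1)) for nm = 1, 2, ... memory domains."""
    base = bandwidths[0]
    return [bw / (nm * base) for nm, bw in enumerate(bandwidths, start=1)]


for nm, eff in enumerate(domain_efficiency(SUM_8_CORES), start=1):
    print(f"{nm} memory domain(s): efficiency {eff:.3f}")
```

For this row the efficiency stays close to 1, at roughly 0.95 with all four domains in use, so bandwidth scales nearly linearly with the number of memory domains.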
Memory bandwidth scaling within one memory domain:
The following plots illustrate the performance scaling over multiple memory domains using different numbers of cores per memory domain.
Memory bandwidth scaling across memory domains for Init:
Memory bandwidth scaling across memory domains for Sum:
Memory bandwidth scaling across memory domains for Copy:
Memory bandwidth scaling across memory domains for Triad: