Performance Results gtbench

The following numbers report performance measurements taken on Piz Daint at CSCS. Both the CPU and GPU versions of gtbench are tested with various parameter configurations. The parameter space is described in the following table.

| Parameter name | Description |
| --- | --- |
| local size | local domain size per compute node (for example, 256 means 256 x 256 x 60 grid points) |
| total size | total domain size per gtbench run (for example, 1024 means 1024 x 1024 x 60 grid points) |
| gridtools backend | GridTools compute backend; possible values are x86, mc and cuda |
| comm | communication backend; possible values are simple, gcl and ghex |
| var | low-level transport library; the only possible value is mpi |
| nodes | number of compute nodes |
| tasks per node | MPI tasks per compute node |
| domain threads | number of threads per MPI task (number of sub-domains per task) |
| openmp threads | number of OpenMP threads per sub-domain |
| columns/s/node | computed columns per second and node (higher is better) |
| wall clock time | median time reported for the gtbench simulation in seconds (lower is better) |

Hence, the total domain (total size) is decomposed into nodes x tasks per node x domain threads sub-domains. All runs use single-precision floating-point numbers.
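
As a rough illustration of the decomposition rule above, the following Python sketch computes the number of sub-domains for a run, together with the total size implied by the pattern visible in the tables (total size = local size x sqrt(nodes)). The function names are hypothetical and not part of gtbench. The square node layout is an assumption read off the tabulated runs, which only use 1, 4, 16 and 64 nodes.

```python
import math


def num_subdomains(nodes: int, tasks_per_node: int, domain_threads: int) -> int:
    """Number of sub-domains the total domain is split into."""
    return nodes * tasks_per_node * domain_threads


def total_size(local_size: int, nodes: int) -> int:
    """Total horizontal extent for a given local size and node count.

    Assumes a square sqrt(nodes) x sqrt(nodes) node layout. This matches
    the tabulated runs but is an assumption of this sketch.
    """
    side = math.isqrt(nodes)
    if side * side != nodes:
        raise ValueError("this sketch only handles square node counts")
    return local_size * side


# Example: 64 nodes, 2 MPI tasks per node, 18 domain threads per task
print(num_subdomains(64, 2, 18))  # 2304
print(total_size(128, 64))        # 1024
```

Because the local size is held fixed while the node count grows, each block of four rows in the tables below can be read as a weak-scaling series.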

Results on Piz Daint multicore partition

compute node: Cray XC40 (Two Intel® Xeon® E5-2695 v4 @ 2.10GHz (2 x 18 cores, 64/128 GB RAM))

local size total size gridtools backend comm var nodes tasks per node domain threads openmp threads columns/s/node wall clock time [s]
128 128 mc gcl mpi 1 1 1 36 10989.2 1.49092
128 256 mc gcl mpi 4 1 1 36 10488.9 1.56203
128 512 mc gcl mpi 16 1 1 36 9620.0 1.70312
128 1024 mc gcl mpi 64 1 1 36 9487.7 1.72686
128 128 mc gcl mpi 1 2 1 18 12103.2 1.35369
128 256 mc gcl mpi 4 2 1 18 11655.9 1.40564
128 512 mc gcl mpi 16 2 1 18 10665.1 1.53623
128 1024 mc gcl mpi 64 2 1 18 10538.6 1.55466
128 128 mc gcl mpi 1 36 1 1 21438.9 0.764219
128 256 mc gcl mpi 4 36 1 1 19649.1 0.833829
128 512 mc gcl mpi 16 36 1 1 17355.5 0.944024
128 1024 mc gcl mpi 64 36 1 1 17527.0 0.934781
128 128 mc ghex mpi 1 1 1 36 11030.0 1.4854
128 256 mc ghex mpi 4 1 1 36 11380.2 1.43969
128 512 mc ghex mpi 16 1 1 36 10315.6 1.58828
128 1024 mc ghex mpi 64 1 1 36 10015.6 1.63585
128 128 mc ghex mpi 1 1 36 1 11012.5 1.48776
128 256 mc ghex mpi 4 1 36 1 10041.5 1.63163
128 512 mc ghex mpi 16 1 36 1 10341.6 1.58428
128 1024 mc ghex mpi 64 1 36 1 9982.7 1.64123
128 128 mc ghex mpi 1 2 1 18 12340.0 1.32772
128 256 mc ghex mpi 4 2 1 18 11984.5 1.3671
128 512 mc ghex mpi 16 2 1 18 10645.1 1.53912
128 1024 mc ghex mpi 64 2 1 18 10847.5 1.5104
128 128 mc ghex mpi 1 2 18 1 13203.0 1.24093
128 256 mc ghex mpi 4 2 18 1 12746.2 1.28541
128 512 mc ghex mpi 16 2 18 1 12503.8 1.31032
128 1024 mc ghex mpi 64 2 18 1 12317.6 1.33013
128 128 mc ghex mpi 1 36 1 1 22064.6 0.742548
128 256 mc ghex mpi 4 36 1 1 20173.8 0.812143
128 512 mc ghex mpi 16 36 1 1 18029.1 0.908752
128 1024 mc ghex mpi 64 36 1 1 18319.4 0.894351
128 128 mc simple mpi 1 1 1 36 12149.1 1.34858
128 256 mc simple mpi 4 1 1 36 10825.3 1.51349
128 512 mc simple mpi 16 1 1 36 10100.5 1.6221
128 1024 mc simple mpi 64 1 1 36 8477.4 1.93267
128 128 mc simple mpi 1 2 1 18 12842.6 1.27575
128 256 mc simple mpi 4 2 1 18 11867.1 1.38063
128 512 mc simple mpi 16 2 1 18 9448.4 1.73405
128 1024 mc simple mpi 64 2 1 18 9376.2 1.7474
128 128 mc simple mpi 1 36 1 1 21783.2 0.75214
128 256 mc simple mpi 4 36 1 1 21118.5 0.775814
128 512 mc simple mpi 16 36 1 1 16408.7 0.998496
128 1024 mc simple mpi 64 36 1 1 15907.8 1.02993
128 128 x86 gcl mpi 1 1 1 36 11465.8 1.42894
128 256 x86 gcl mpi 4 1 1 36 11378.9 1.43986
128 512 x86 gcl mpi 16 1 1 36 10799.9 1.51706
128 1024 x86 gcl mpi 64 1 1 36 9654.0 1.69711
128 128 x86 gcl mpi 1 2 1 18 11302.3 1.44962
128 256 x86 gcl mpi 4 2 1 18 11027.4 1.48575
128 512 x86 gcl mpi 16 2 1 18 10799.1 1.51717
128 1024 x86 gcl mpi 64 2 1 18 10222.0 1.60282
128 128 x86 gcl mpi 1 36 1 1 21410.5 0.765233
128 256 x86 gcl mpi 4 36 1 1 21133.9 0.775246
128 512 x86 gcl mpi 16 36 1 1 19997.1 0.819317
128 1024 x86 gcl mpi 64 36 1 1 18640.9 0.878923
128 128 x86 ghex mpi 1 1 1 36 11338.8 1.44495
128 256 x86 ghex mpi 4 1 1 36 10921.8 1.50012
128 512 x86 ghex mpi 16 1 1 36 9521.8 1.7207
128 1024 x86 ghex mpi 64 1 1 36 9830.4 1.66666
128 128 x86 ghex mpi 1 1 36 1 12236.1 1.33899
128 256 x86 ghex mpi 4 1 36 1 11244.7 1.45704
128 512 x86 ghex mpi 16 1 36 1 11541.8 1.41954
128 1024 x86 ghex mpi 64 1 36 1 11427.7 1.43371
128 128 x86 ghex mpi 1 2 1 18 11399.7 1.43723
128 256 x86 ghex mpi 4 2 1 18 11096.9 1.47645
128 512 x86 ghex mpi 16 2 1 18 10264.9 1.59611
128 1024 x86 ghex mpi 64 2 1 18 10151.0 1.61403
128 128 x86 ghex mpi 1 2 18 1 13369.5 1.22548
128 256 x86 ghex mpi 4 2 18 1 12887.7 1.27129
128 512 x86 ghex mpi 16 2 18 1 12378.8 1.32355
128 1024 x86 ghex mpi 64 2 18 1 12389.5 1.32241
128 128 x86 ghex mpi 1 36 1 1 21931.6 0.747051
128 256 x86 ghex mpi 4 36 1 1 21028.5 0.779132
128 512 x86 ghex mpi 16 36 1 1 19067.1 0.859279
128 1024 x86 ghex mpi 64 36 1 1 19129.1 0.8565
128 128 x86 simple mpi 1 1 1 36 11550.1 1.41852
128 256 x86 simple mpi 4 1 1 36 10437.6 1.5697
128 512 x86 simple mpi 16 1 1 36 10107.1 1.62103
128 1024 x86 simple mpi 64 1 1 36 8300.1 1.97395
128 128 x86 simple mpi 1 2 1 18 11709.9 1.39915
128 256 x86 simple mpi 4 2 1 18 10943.8 1.49711
128 512 x86 simple mpi 16 2 1 18 10235.6 1.60069
128 1024 x86 simple mpi 64 2 1 18 8992.9 1.82188
128 128 x86 simple mpi 1 36 1 1 21789.6 0.751919
128 256 x86 simple mpi 4 36 1 1 19960.8 0.820808
128 512 x86 simple mpi 16 36 1 1 16245.9 1.0085
128 1024 x86 simple mpi 64 36 1 1 17247.8 0.94992
256 256 mc gcl mpi 1 1 1 36 11344.1 5.77712
256 512 mc gcl mpi 4 1 1 36 10899.3 6.01285
256 1024 mc gcl mpi 16 1 1 36 10384.4 6.31099
256 2048 mc gcl mpi 64 1 1 36 10255.0 6.39062
256 256 mc gcl mpi 1 2 1 18 12983.0 5.04784
256 512 mc gcl mpi 4 2 1 18 12679.1 5.16881
256 1024 mc gcl mpi 16 2 1 18 12216.5 5.36455
256 2048 mc gcl mpi 64 2 1 18 12054.4 5.43671
256 256 mc gcl mpi 1 36 1 1 22935.4 2.85742
256 512 mc gcl mpi 4 36 1 1 22119.3 2.96285
256 1024 mc gcl mpi 16 36 1 1 22283.6 2.941
256 2048 mc gcl mpi 64 36 1 1 21525.3 3.0446
256 256 mc ghex mpi 1 1 1 36 11352.7 5.77275
256 512 mc ghex mpi 4 1 1 36 11233.2 5.83413
256 1024 mc ghex mpi 16 1 1 36 10695.1 6.12769
256 2048 mc ghex mpi 64 1 1 36 10692.3 6.12928
256 256 mc ghex mpi 1 1 36 1 13009.9 5.0374
256 512 mc ghex mpi 4 1 36 1 12881.1 5.08777
256 1024 mc ghex mpi 16 1 36 1 12584.3 5.20779
256 2048 mc ghex mpi 64 1 36 1 12543.4 5.22473
256 256 mc ghex mpi 1 2 1 18 13483.8 4.86034
256 512 mc ghex mpi 4 2 1 18 12588.3 5.20609
256 1024 mc ghex mpi 16 2 1 18 12777.8 5.12891
256 2048 mc ghex mpi 64 2 1 18 12373.6 5.29642
256 256 mc ghex mpi 1 2 18 1 14758.1 4.44068
256 512 mc ghex mpi 4 2 18 1 14467.8 4.52977
256 1024 mc ghex mpi 16 2 18 1 14345.9 4.56829
256 2048 mc ghex mpi 64 2 18 1 14233.2 4.60444
256 256 mc ghex mpi 1 36 1 1 22419.7 2.92314
256 512 mc ghex mpi 4 36 1 1 22114.6 2.96348
256 1024 mc ghex mpi 16 36 1 1 22579.7 2.90243
256 2048 mc ghex mpi 64 36 1 1 21707.0 3.01912
256 256 mc simple mpi 1 1 1 36 11926.5 5.49498
256 512 mc simple mpi 4 1 1 36 11596.1 5.65154
256 1024 mc simple mpi 16 1 1 36 10200.7 6.42465
256 2048 mc simple mpi 64 1 1 36 9772.3 6.7063
256 256 mc simple mpi 1 2 1 18 13795.8 4.75042
256 512 mc simple mpi 4 2 1 18 13328.2 4.9171
256 1024 mc simple mpi 16 2 1 18 12837.2 5.10516
256 2048 mc simple mpi 64 2 1 18 11840.0 5.53513
256 256 mc simple mpi 1 36 1 1 23021.3 2.84675
256 512 mc simple mpi 4 36 1 1 22637.6 2.89501
256 1024 mc simple mpi 16 36 1 1 20822.2 3.14741
256 2048 mc simple mpi 64 36 1 1 20923.0 3.13225
256 256 x86 gcl mpi 1 1 1 36 12398.5 5.28582
256 512 x86 gcl mpi 4 1 1 36 12742.7 5.14303
256 1024 x86 gcl mpi 16 1 1 36 12423.0 5.27538
256 2048 x86 gcl mpi 64 1 1 36 12013.0 5.45541
256 256 x86 gcl mpi 1 2 1 18 13101.1 5.00234
256 512 x86 gcl mpi 4 2 1 18 13014.7 5.03556
256 1024 x86 gcl mpi 16 2 1 18 12440.3 5.26807
256 2048 x86 gcl mpi 64 2 1 18 12510.8 5.23835
256 256 x86 gcl mpi 1 36 1 1 22843.9 2.86887
256 512 x86 gcl mpi 4 36 1 1 22411.0 2.92427
256 1024 x86 gcl mpi 16 36 1 1 21868.0 2.99689
256 2048 x86 gcl mpi 64 36 1 1 20987.5 3.12263
256 256 x86 ghex mpi 1 1 1 36 12817.8 5.11291
256 512 x86 ghex mpi 4 1 1 36 12372.6 5.29689
256 1024 x86 ghex mpi 16 1 1 36 11400.5 5.74852
256 2048 x86 ghex mpi 64 1 1 36 11603.8 5.64778
256 256 x86 ghex mpi 1 1 36 1 13662.4 4.79682
256 512 x86 ghex mpi 4 1 36 1 13371.2 4.9013
256 1024 x86 ghex mpi 16 1 36 1 13580.7 4.82568
256 2048 x86 ghex mpi 64 1 36 1 13542.0 4.83947
256 256 x86 ghex mpi 1 2 1 18 12939.9 5.06466
256 512 x86 ghex mpi 4 2 1 18 12854.5 5.0983
256 1024 x86 ghex mpi 16 2 1 18 12315.6 5.32137
256 2048 x86 ghex mpi 64 2 1 18 12459.4 5.25996
256 256 x86 ghex mpi 1 2 18 1 14387.9 4.55493
256 512 x86 ghex mpi 4 2 18 1 14149.6 4.63165
256 1024 x86 ghex mpi 16 2 18 1 14123.6 4.64017
256 2048 x86 ghex mpi 64 2 18 1 14046.6 4.66561
256 256 x86 ghex mpi 1 36 1 1 23475.6 2.79166
256 512 x86 ghex mpi 4 36 1 1 23148.0 2.83118
256 1024 x86 ghex mpi 16 36 1 1 22039.1 2.97363
256 2048 x86 ghex mpi 64 36 1 1 21702.8 3.01969
256 256 x86 simple mpi 1 1 1 36 13014.8 5.03551
256 512 x86 simple mpi 4 1 1 36 12489.3 5.24739
256 1024 x86 simple mpi 16 1 1 36 11273.8 5.81314
256 2048 x86 simple mpi 64 1 1 36 11059.8 5.92558
256 256 x86 simple mpi 1 2 1 18 13157.0 4.98108
256 512 x86 simple mpi 4 2 1 18 12837.4 5.10509
256 1024 x86 simple mpi 16 2 1 18 12538.8 5.22668
256 2048 x86 simple mpi 64 2 1 18 11531.2 5.68339
256 256 x86 simple mpi 1 36 1 1 23387.1 2.80223
256 512 x86 simple mpi 4 36 1 1 23304.1 2.81221
256 1024 x86 simple mpi 16 36 1 1 17745.9 3.69302
256 2048 x86 simple mpi 64 36 1 1 21055.8 3.11249
512 512 mc gcl mpi 1 1 1 36 12920.3 20.2892
512 1024 mc gcl mpi 4 1 1 36 12706.4 20.6308
512 2048 mc gcl mpi 16 1 1 36 12554.0 20.8814
512 4096 mc gcl mpi 64 1 1 36 12460.8 21.0374
512 512 mc gcl mpi 1 2 1 18 14068.8 18.6331
512 1024 mc gcl mpi 4 2 1 18 14083.3 18.6138
512 2048 mc gcl mpi 16 2 1 18 13814.3 18.9763
512 4096 mc gcl mpi 64 2 1 18 13813.5 18.9773
512 512 mc gcl mpi 1 36 1 1 22404.3 11.7006
512 1024 mc gcl mpi 4 36 1 1 22067.4 11.8793
512 2048 mc gcl mpi 16 36 1 1 22724.7 11.5357
512 4096 mc gcl mpi 64 36 1 1 20785.3 12.612
512 512 mc ghex mpi 1 1 1 36 12976.6 20.2012
512 1024 mc ghex mpi 4 1 1 36 12688.7 20.6597
512 2048 mc ghex mpi 16 1 1 36 12544.4 20.8974
512 4096 mc ghex mpi 64 1 1 36 12290.9 21.3284
512 512 mc ghex mpi 1 1 36 1 13706.3 19.1258
512 1024 mc ghex mpi 4 1 36 1 13300.2 19.7098
512 2048 mc ghex mpi 16 1 36 1 13396.3 19.5684
512 4096 mc ghex mpi 64 1 36 1 13311.0 19.6937
512 512 mc ghex mpi 1 2 1 18 14282.7 18.354
512 1024 mc ghex mpi 4 2 1 18 14258.3 18.3854
512 2048 mc ghex mpi 16 2 1 18 13493.8 19.4271
512 4096 mc ghex mpi 64 2 1 18 13493.2 19.4279
512 512 mc ghex mpi 1 2 18 1 15033.6 17.4372
512 1024 mc ghex mpi 4 2 18 1 15087.4 17.375
512 2048 mc ghex mpi 16 2 18 1 14987.8 17.4905
512 4096 mc ghex mpi 64 2 18 1 14863.1 17.6372
512 512 mc ghex mpi 1 36 1 1 23257.0 11.2716
512 1024 mc ghex mpi 4 36 1 1 23175.2 11.3114
512 2048 mc ghex mpi 16 36 1 1 23225.3 11.287
512 4096 mc ghex mpi 64 36 1 1 20901.4 12.5419
512 512 mc simple mpi 1 1 1 36 13145.4 19.9418
512 1024 mc simple mpi 4 1 1 36 13163.9 19.9139
512 2048 mc simple mpi 16 1 1 36 12506.2 20.9611
512 4096 mc simple mpi 64 1 1 36 12504.1 20.9647
512 512 mc simple mpi 1 2 1 18 14480.9 18.1028
512 1024 mc simple mpi 4 2 1 18 14229.1 18.4231
512 2048 mc simple mpi 16 2 1 18 14072.7 18.6278
512 4096 mc simple mpi 64 2 1 18 13788.5 19.0117
512 512 mc simple mpi 1 36 1 1 22910.7 11.442
512 1024 mc simple mpi 4 36 1 1 23009.5 11.3929
512 2048 mc simple mpi 16 36 1 1 22949.7 11.4225
512 4096 mc simple mpi 64 36 1 1 21995.6 11.918
512 512 x86 gcl mpi 1 1 1 36 13559.4 19.333
512 1024 x86 gcl mpi 4 1 1 36 13503.4 19.4132
512 2048 x86 gcl mpi 16 1 1 36 12625.9 20.7623
512 4096 x86 gcl mpi 64 1 1 36 12868.4 20.3712
512 512 x86 gcl mpi 1 2 1 18 13667.5 19.1801
512 1024 x86 gcl mpi 4 2 1 18 13464.0 19.47
512 2048 x86 gcl mpi 16 2 1 18 13446.5 19.4953
512 4096 x86 gcl mpi 64 2 1 18 13280.3 19.7393
512 512 x86 gcl mpi 1 36 1 1 23624.0 11.0965
512 1024 x86 gcl mpi 4 36 1 1 23473.1 11.1679
512 2048 x86 gcl mpi 16 36 1 1 21997.3 11.9171
512 4096 x86 gcl mpi 64 36 1 1 22661.9 11.5676
512 512 x86 ghex mpi 1 1 1 36 13413.2 19.5437
512 1024 x86 ghex mpi 4 1 1 36 13397.9 19.5661
512 2048 x86 ghex mpi 16 1 1 36 13340.8 19.6499
512 4096 x86 ghex mpi 64 1 1 36 13025.6 20.1254
512 512 x86 ghex mpi 1 1 36 1 14356.7 18.2594
512 1024 x86 ghex mpi 4 1 36 1 14349.7 18.2683
512 2048 x86 ghex mpi 16 1 36 1 14385.1 18.2233
512 4096 x86 ghex mpi 64 1 36 1 14353.3 18.2637
512 512 x86 ghex mpi 1 2 1 18 13594.9 19.2825
512 1024 x86 ghex mpi 4 2 1 18 13574.8 19.3111
512 2048 x86 ghex mpi 16 2 1 18 13337.6 19.6546
512 4096 x86 ghex mpi 64 2 1 18 13280.1 19.7396
512 512 x86 ghex mpi 1 2 18 1 14561.6 18.0024
512 1024 x86 ghex mpi 4 2 18 1 14582.8 17.9763
512 2048 x86 ghex mpi 16 2 18 1 14482.4 18.1008
512 4096 x86 ghex mpi 64 2 18 1 14460.5 18.1282
512 512 x86 ghex mpi 1 36 1 1 23734.1 11.045
512 1024 x86 ghex mpi 4 36 1 1 23653.9 11.0825
512 2048 x86 ghex mpi 16 36 1 1 22945.6 11.4246
512 4096 x86 ghex mpi 64 36 1 1 22830.9 11.482
512 512 x86 simple mpi 1 1 1 36 13549.6 19.347
512 1024 x86 simple mpi 4 1 1 36 13397.1 19.5673
512 2048 x86 simple mpi 16 1 1 36 13230.5 19.8136
512 4096 x86 simple mpi 64 1 1 36 12699.4 20.6422
512 512 x86 simple mpi 1 2 1 18 13696.0 19.1402
512 1024 x86 simple mpi 4 2 1 18 13615.5 19.2534
512 2048 x86 simple mpi 16 2 1 18 13428.9 19.5209
512 4096 x86 simple mpi 64 2 1 18 12826.5 20.4377
512 512 x86 simple mpi 1 36 1 1 23742.7 11.041
512 1024 x86 simple mpi 4 36 1 1 23092.1 11.3521
512 2048 x86 simple mpi 16 36 1 1 17671.6 14.8342
512 4096 x86 simple mpi 64 36 1 1 20144.4 13.0133
1024 1024 mc gcl mpi 1 1 1 36 13115.6 79.9486
1024 2048 mc gcl mpi 4 1 1 36 13073.2 80.2081
1024 4096 mc gcl mpi 16 1 1 36 13046.9 80.3694
1024 8192 mc gcl mpi 64 1 1 36 12966.8 80.866
1024 1024 mc gcl mpi 1 2 1 18 15218.0 68.9038
1024 2048 mc gcl mpi 4 2 1 18 15284.3 68.6049
1024 4096 mc gcl mpi 16 2 1 18 15155.0 69.1901
1024 8192 mc gcl mpi 64 2 1 18 15147.3 69.2255
1024 1024 mc gcl mpi 1 36 1 1 23806.5 44.0457
1024 2048 mc gcl mpi 4 36 1 1 23596.2 44.4384
1024 4096 mc gcl mpi 16 36 1 1 23636.8 44.3619
1024 8192 mc gcl mpi 64 36 1 1 23250.9 45.0983
1024 1024 mc ghex mpi 1 1 1 36 13159.2 79.684
1024 2048 mc ghex mpi 4 1 1 36 13150.2 79.7388
1024 4096 mc ghex mpi 16 1 1 36 13111.4 79.974
1024 8192 mc ghex mpi 64 1 1 36 12718.2 82.4467
1024 1024 mc ghex mpi 1 1 36 1 14259.2 73.5368
1024 2048 mc ghex mpi 4 1 36 1 14242.5 73.6233
1024 4096 mc ghex mpi 16 1 36 1 14247.1 73.5994
1024 8192 mc ghex mpi 64 1 36 1 14198.3 73.8524
1024 1024 mc ghex mpi 1 2 1 18 15271.1 68.6639
1024 2048 mc ghex mpi 4 2 1 18 15218.4 68.9021
1024 4096 mc ghex mpi 16 2 1 18 14665.9 71.4974
1024 8192 mc ghex mpi 64 2 1 18 14624.7 71.699
1024 1024 mc ghex mpi 1 2 18 1 15322.0 68.4362
1024 2048 mc ghex mpi 4 2 18 1 15299.5 68.5365
1024 4096 mc ghex mpi 16 2 18 1 15247.0 68.7726
1024 8192 mc ghex mpi 64 2 18 1 15202.8 68.9725
1024 1024 mc ghex mpi 1 36 1 1 23954.2 43.7743
1024 2048 mc ghex mpi 4 36 1 1 23914.5 43.8468
1024 4096 mc ghex mpi 16 36 1 1 23488.6 44.642
1024 8192 mc ghex mpi 64 36 1 1 23078.4 45.4355
1024 1024 mc simple mpi 1 1 1 36 13274.7 78.9905
1024 2048 mc simple mpi 4 1 1 36 13126.3 79.8836
1024 4096 mc simple mpi 16 1 1 36 13122.4 79.9069
1024 8192 mc simple mpi 64 1 1 36 12981.8 80.773
1024 1024 mc simple mpi 1 2 1 18 15306.2 68.5064
1024 2048 mc simple mpi 4 2 1 18 15232.3 68.8389
1024 4096 mc simple mpi 16 2 1 18 15151.7 69.2052
1024 8192 mc simple mpi 64 2 1 18 14743.2 71.1226
1024 1024 mc simple mpi 1 36 1 1 23760.1 44.1317
1024 2048 mc simple mpi 4 36 1 1 23745.4 44.1592
1024 4096 mc simple mpi 16 36 1 1 23228.4 45.1419
1024 8192 mc simple mpi 64 36 1 1 23440.6 44.7332
1024 1024 x86 gcl mpi 1 1 1 36 13704.4 76.5136
1024 2048 x86 gcl mpi 4 1 1 36 13721.1 76.4208
1024 4096 x86 gcl mpi 16 1 1 36 13660.7 76.7587
1024 8192 x86 gcl mpi 64 1 1 36 13668.4 76.7152
1024 1024 x86 gcl mpi 1 2 1 18 13794.3 76.0152
1024 2048 x86 gcl mpi 4 2 1 18 13765.4 76.1749
1024 4096 x86 gcl mpi 16 2 1 18 13743.6 76.2957
1024 8192 x86 gcl mpi 64 2 1 18 13735.0 76.3432
1024 1024 x86 gcl mpi 1 36 1 1 24009.7 43.673
1024 2048 x86 gcl mpi 4 36 1 1 23950.4 43.7812
1024 4096 x86 gcl mpi 16 36 1 1 22438.9 46.7302
1024 8192 x86 gcl mpi 64 36 1 1 23033.8 45.5235
1024 1024 x86 ghex mpi 1 1 1 36 13644.3 76.851
1024 2048 x86 ghex mpi 4 1 1 36 13655.0 76.7909
1024 4096 x86 ghex mpi 16 1 1 36 13611.1 77.038
1024 8192 x86 ghex mpi 64 1 1 36 13615.5 77.0136
1024 1024 x86 ghex mpi 1 1 36 1 14618.0 71.7317
1024 2048 x86 ghex mpi 4 1 36 1 14610.5 71.7688
1024 4096 x86 ghex mpi 16 1 36 1 14604.4 71.7983
1024 8192 x86 ghex mpi 64 1 36 1 14602.3 71.8087
1024 1024 x86 ghex mpi 1 2 1 18 13713.0 76.4661
1024 2048 x86 ghex mpi 4 2 1 18 13703.9 76.5165
1024 4096 x86 ghex mpi 16 2 1 18 13608.8 77.0516
1024 8192 x86 ghex mpi 64 2 1 18 13588.9 77.1643
1024 1024 x86 ghex mpi 1 2 18 1 14729.4 71.1894
1024 2048 x86 ghex mpi 4 2 18 1 14665.4 71.5
1024 4096 x86 ghex mpi 16 2 18 1 14675.3 71.4517
1024 8192 x86 ghex mpi 64 2 18 1 14692.7 71.3673
1024 1024 x86 ghex mpi 1 36 1 1 23868.5 43.9314
1024 2048 x86 ghex mpi 4 36 1 1 23884.9 43.9013
1024 4096 x86 ghex mpi 16 36 1 1 23742.6 44.1642
1024 8192 x86 ghex mpi 64 36 1 1 23261.7 45.0775
1024 1024 x86 simple mpi 1 1 1 36 13704.9 76.5111
1024 2048 x86 simple mpi 4 1 1 36 13649.7 76.8208
1024 4096 x86 simple mpi 16 1 1 36 13633.9 76.9098
1024 8192 x86 simple mpi 64 1 1 36 13360.9 78.4811
1024 1024 x86 simple mpi 1 2 1 18 13762.6 76.19
1024 2048 x86 simple mpi 4 2 1 18 13789.1 76.0439
1024 4096 x86 simple mpi 16 2 1 18 13561.1 77.3224
1024 8192 x86 simple mpi 64 2 1 18 13597.5 77.1151
1024 1024 x86 simple mpi 1 36 1 1 24072.6 43.5589
1024 2048 x86 simple mpi 4 36 1 1 23463.6 44.6895
1024 4096 x86 simple mpi 16 36 1 1 18140.3 57.8036
1024 8192 x86 simple mpi 64 36 1 1 20670.8 50.7273
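
To compare configurations, it can help to load the rows into a small structure and relate the multi-node throughput to the single-node value. The sketch below parses one whitespace-separated row and computes a ratio that is called weak-scaling efficiency here; that definition (per-node throughput at N nodes divided by per-node throughput at 1 node) is a choice made for this example, not a quantity reported by gtbench. The `Row` type and function names are hypothetical.

```python
from typing import NamedTuple


class Row(NamedTuple):
    local_size: int
    total_size: int
    backend: str
    comm: str
    var: str
    nodes: int
    tasks_per_node: int
    domain_threads: int
    openmp_threads: int
    columns_per_s_per_node: float
    wall_clock_s: float


def parse_row(line: str) -> Row:
    """Parse one whitespace-separated result row with 11 fields."""
    f = line.split()
    if len(f) != 11:
        raise ValueError(f"expected 11 fields, got {len(f)}")
    return Row(
        int(f[0]), int(f[1]), f[2], f[3], f[4],
        int(f[5]), int(f[6]), int(f[7]), int(f[8]),
        float(f[9]), float(f[10]),
    )


def weak_scaling_efficiency(base: Row, scaled: Row) -> float:
    """Per-node throughput of `scaled` relative to the single-node `base`."""
    return scaled.columns_per_s_per_node / base.columns_per_s_per_node


# Two rows copied from the multicore table (mc, gcl, 1 task, 36 OpenMP threads)
base = parse_row("128 128 mc gcl mpi 1 1 1 36 10989.2 1.49092")
scaled = parse_row("128 1024 mc gcl mpi 64 1 1 36 9487.7 1.72686")
print(f"{weak_scaling_efficiency(base, scaled):.3f}")  # 0.863
```

A value close to 1 means the per-node throughput is largely preserved when going from 1 to 64 nodes at fixed local size.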

Results on Piz Daint GPU partition

compute node: Cray XC50 (Intel® Xeon® E5-2690 v3 @ 2.60GHz (12 cores, 64GB RAM) and NVIDIA® Tesla® P100 16GB)

local size total size gridtools backend comm var nodes tasks per node domain threads openmp threads columns/s/node wall clock time [s]
256 256 cuda gcl mpi 1 1 1 1 117615.0 0.55721
256 512 cuda gcl mpi 4 1 1 1 81335.3 0.805752
256 1024 cuda gcl mpi 16 1 1 1 84595.6 0.774699
256 2048 cuda gcl mpi 64 1 1 1 83838.3 0.781695
256 256 cuda gcl mpi 1 12 1 1 76534.8 0.85629
256 512 cuda gcl mpi 4 12 1 1 76853.3 0.852743
256 1024 cuda gcl mpi 16 12 1 1 81631.3 0.802832
256 2048 cuda gcl mpi 64 12 1 1 83060.9 0.789012
256 256 cuda ghex mpi 1 1 1 1 123736.0 0.529642
256 512 cuda ghex mpi 4 1 1 1 85689.0 0.764812
256 1024 cuda ghex mpi 16 1 1 1 103015.6 0.636174
256 2048 cuda ghex mpi 64 1 1 1 83689.7 0.783084
256 256 cuda ghex mpi 1 1 12 1 46596.7 1.40645
256 512 cuda ghex mpi 4 1 12 1 45308.5 1.44644
256 1024 cuda ghex mpi 16 1 12 1 44690.4 1.46644
256 2048 cuda ghex mpi 64 1 12 1 44509.7 1.4724
256 256 cuda ghex mpi 1 12 1 1 112822.0 0.580879
256 512 cuda ghex mpi 4 12 1 1 118372.8 0.553641
256 1024 cuda ghex mpi 16 12 1 1 109095.0 0.600723
256 2048 cuda ghex mpi 64 12 1 1 95842.2 0.68379
512 512 cuda gcl mpi 1 1 1 1 154926.0 1.69206
512 1024 cuda gcl mpi 4 1 1 1 132726.8 1.97507
512 2048 cuda gcl mpi 16 1 1 1 130343.8 2.01117
512 4096 cuda gcl mpi 64 1 1 1 129091.9 2.03068
512 512 cuda gcl mpi 1 12 1 1 127848.0 2.05043
512 1024 cuda gcl mpi 4 12 1 1 134527.8 1.94862
512 2048 cuda gcl mpi 16 12 1 1 133615.6 1.96192
512 4096 cuda gcl mpi 64 12 1 1 134917.5 1.943
512 512 cuda ghex mpi 1 1 1 1 158033.0 1.6588
512 1024 cuda ghex mpi 4 1 1 1 142825.3 1.83542
512 2048 cuda ghex mpi 16 1 1 1 142910.6 1.83432
512 4096 cuda ghex mpi 64 1 1 1 131272.0 1.99695
512 512 cuda ghex mpi 1 1 12 1 115709.0 2.26555
512 1024 cuda ghex mpi 4 1 12 1 109672.0 2.39026
512 2048 cuda ghex mpi 16 1 12 1 108063.8 2.42582
512 4096 cuda ghex mpi 64 1 12 1 107109.5 2.44744
512 512 cuda ghex mpi 1 12 1 1 146554.0 1.78871
512 1024 cuda ghex mpi 4 12 1 1 159569.0 1.64283
512 2048 cuda ghex mpi 16 12 1 1 161224.4 1.62596
512 4096 cuda ghex mpi 64 12 1 1 159960.9 1.63879
1024 1024 cuda gcl mpi 1 1 1 1 162790.0 6.44128
1024 2048 cuda gcl mpi 4 1 1 1 150927.0 6.94757
1024 4096 cuda gcl mpi 16 1 1 1 150072.5 6.98714
1024 8192 cuda gcl mpi 64 1 1 1 149380.5 7.0195
1024 1024 cuda gcl mpi 1 12 1 1 146360.0 7.16438
1024 2048 cuda gcl mpi 4 12 1 1 149540.3 7.01199
1024 4096 cuda gcl mpi 16 12 1 1 148485.0 7.06184
1024 8192 cuda gcl mpi 64 12 1 1 149910.8 6.99467
1024 1024 cuda ghex mpi 1 1 1 1 163778.0 6.4024
1024 2048 cuda ghex mpi 4 1 1 1 157444.8 6.65996
1024 4096 cuda ghex mpi 16 1 1 1 155720.0 6.73373
1024 8192 cuda ghex mpi 64 1 1 1 150950.6 6.94648
1024 1024 cuda ghex mpi 1 1 12 1 151286.0 6.9311
1024 2048 cuda ghex mpi 4 1 12 1 148672.3 7.05294
1024 4096 cuda ghex mpi 16 1 12 1 147590.0 7.10465
1024 8192 cuda ghex mpi 64 1 12 1 147375.2 7.11501
1024 1024 cuda ghex mpi 1 12 1 1 159082.0 6.5914
1024 2048 cuda ghex mpi 4 12 1 1 163376.3 6.41817
1024 4096 cuda ghex mpi 16 12 1 1 163605.0 6.40918
1024 8192 cuda ghex mpi 64 12 1 1 163831.3 6.40035
2048 2048 cuda gcl mpi 1 1 1 1 154193.0 27.2017
2048 4096 cuda gcl mpi 4 1 1 1 149692.0 28.0195
2048 8192 cuda gcl mpi 16 1 1 1 148322.5 28.2783
2048 16384 cuda gcl mpi 64 1 1 1 147388.0 28.4576
2048 2048 cuda gcl mpi 1 12 1 1 157813.0 26.5777
2048 4096 cuda gcl mpi 4 12 1 1 156117.0 26.8664
2048 8192 cuda gcl mpi 16 12 1 1 152270.0 27.5452
2048 16384 cuda gcl mpi 64 12 1 1 145672.8 28.7926
2048 2048 cuda ghex mpi 1 1 1 1 153631.0 27.3011
2048 4096 cuda ghex mpi 4 1 1 1 149734.5 28.0116
2048 8192 cuda ghex mpi 16 1 1 1 148567.5 28.2317
2048 16384 cuda ghex mpi 64 1 1 1 146674.7 28.596
2048 2048 cuda ghex mpi 1 1 12 1 159277.0 26.3334
2048 4096 cuda ghex mpi 4 1 12 1 158508.0 26.4612
2048 8192 cuda ghex mpi 16 1 12 1 158518.1 26.4595
2048 16384 cuda ghex mpi 64 1 12 1 158168.8 26.5179
2048 2048 cuda ghex mpi 1 12 1 1 160260.0 26.1718
2048 4096 cuda ghex mpi 4 12 1 1 159821.5 26.2437
2048 8192 cuda ghex mpi 16 12 1 1 160239.4 26.1752
2048 16384 cuda ghex mpi 64 12 1 1 159829.7 26.2422
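
The two reported metrics appear to be linked: in the rows checked, columns/s/node is numerically close to local size squared divided by wall clock time. This relation is inferred from the tabulated numbers and is not stated by the source, so treat the sketch below as a consistency check under that assumption rather than as the definition used by gtbench. The function names are hypothetical.

```python
def columns_per_s_per_node(local_size: int, wall_clock_s: float) -> float:
    """Horizontal columns per node divided by wall clock time (assumed relation)."""
    return local_size * local_size / wall_clock_s


def relative_difference(estimate: float, reported: float) -> float:
    """Relative deviation of the estimate from the reported value."""
    return abs(estimate - reported) / reported


# (local size, wall clock time [s], reported columns/s/node), copied from the tables
samples = [
    (256, 0.55721, 117615.0),    # cuda, gcl, 1 node
    (1024, 79.9486, 13115.6),    # mc, gcl, 1 node
    (2048, 26.2422, 159829.7),   # cuda, ghex, 64 nodes
]

for local_size, seconds, reported in samples:
    estimate = columns_per_s_per_node(local_size, seconds)
    print(f"{local_size:5d}  estimate={estimate:.1f}  reported={reported:.1f}  "
          f"rel.diff={relative_difference(estimate, reported):.1e}")
```

If the relation holds, either column can be recovered from the other, which is useful when only wall clock times are available for a new run.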