Add benchmark comparison script #1467
Conversation
Codecov Report: All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

@@            Coverage Diff            @@
##           develop    #1467    +/-   ##
==========================================
- Coverage    90.98%   89.31%   -1.68%
==========================================
  Files          688      688
  Lines        56120    56335     +215
==========================================
- Hits         51063    50314     -749
- Misses        5057     6021     +964

☔ View full report in Codecov by Sentry.
some quick first comments
benchmark/tools/compare.py (outdated)

```python
)
outliers[benchmark_name] = outlier[: min(len(outlier), args.outlier_count)]

if args.output == "json":
```
why is the json output not handled by pandas?
I have two output styles in mind: the JSON output can handle arbitrary nesting (for further detailed analysis), while the tabular style (for posting summaries on GitHub) flattens everything. I want to preserve the nested structure for the JSON output, which is hard to represent in pandas, since there is no `orient` option that matches this structure well.
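A minimal sketch of the two styles being discussed, assuming a nested per-benchmark result dict (benchmark names and fields are illustrative, not taken from compare.py): the nested dict is dumped verbatim as JSON, while pandas flattens it into one row per benchmark for the tabular path.

```python
import json
import pandas as pd

# Illustrative nested results; the real benchmark JSON layout may differ.
nested = {
    "spmv/csr": {"runtime": {"before": 1.2e-3, "after": 1.1e-3, "ratio": 0.92}},
    "solver/cg": {"runtime": {"before": 4.5e-2, "after": 4.7e-2, "ratio": 1.04}},
}

# JSON output: keep the nesting untouched.
print(json.dumps(nested, indent=2))

# Tabular output: flatten into one row per benchmark. Reconstructing the
# nested dict from this flat frame is what no DataFrame `orient` covers well.
flat = pd.json_normalize(
    [{"benchmark": name, **values} for name, values in nested.items()]
)
print(flat.to_markdown(index=False))  # requires the `tabulate` package
```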
Some quick f-string suggestions. F-strings are great and help to get rid of most str.format() calls.
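For illustration, this is the kind of rewrite meant here (the message and values are made up):

```python
name, ratio = "spmv/csr", 0.92

# before: str.format()
print("benchmark {} changed runtime by a factor of {:.2f}".format(name, ratio))

# after: f-string
print(f"benchmark {name} changed runtime by a factor of {ratio:.2f}")
```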
My biggest issue is storing the keys to the benchmarks as tuples. This throws away too much information that is then still implicitly relied on; see the sketch below.
Also, the output for the different formats should be unified, or perhaps it is best to only output JSON.
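One possible way to keep that information explicit, sketched with assumed field names (suite/format/problem are placeholders, not the script's actual key components): a typing.NamedTuple stays hashable, so it can still be used as a dict key, but every component keeps its name.

```python
from typing import NamedTuple

# Plain tuple key: the meaning of each position is implicit.
key_as_tuple = ("spmv", "csr", "matrix_A")

class BenchmarkKey(NamedTuple):
    suite: str    # placeholder field names, chosen for illustration only
    format: str
    problem: str

key = BenchmarkKey(suite="spmv", format="csr", problem="matrix_A")
results = {key: {"runtime": 1.1e-3}}  # still usable as a dict key
print(results[key], key.suite)
```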
Some more minor things
benchmark/tools/compare.py (outdated)

```python
    default=1000,
    help="How many outliers should be reported per benchmark",
)
parser.add_argument("--output", choices=["json", "csv", "markdown"], default="json")
```
Maybe a single option --short that switches between markdown (on) and json (off) would be better suited. That would make it clearer that the output depends strongly on the chosen format.
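A small sketch of that interface, assuming the flag name and semantics proposed above:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--short",
    action="store_true",
    help="emit a flattened markdown summary instead of nested JSON",
)

args = parser.parse_args(["--short"])  # example invocation
output_format = "markdown" if args.short else "json"
print(output_format)  # -> markdown
```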
I would also like to support CSV for follow-up analysis.
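Assuming the tabular path already produces flat rows, CSV would come almost for free via pandas (the column names here are made up for illustration):

```python
import pandas as pd

rows = [
    {"benchmark": "spmv/csr", "runtime.before": 1.2e-3, "runtime.after": 1.1e-3},
    {"benchmark": "solver/cg", "runtime.before": 4.5e-2, "runtime.after": 4.7e-2},
]
pd.DataFrame(rows).to_csv("comparison.csv", index=False)
```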
LGTM. Are the tests run automatically or manually?
@MarcelKoch Currently only manually. I wasn't sure about the directory layout and common practices w.r.t. available packages, so I wanted to collect some feedback first.
I did not look into the details, but I feel it is more suitable for the ginkgo-data repo.
@yhmtsai I am building this for development purposes, modeled after what nvbench_compare.py provides for nvbench; I don't think this is specific to ginkgo-data.
It somehow post-processes the data generated by the benchmarks, right? Its purpose is like the scripts for aggregating the JSON or plotting the SpMV performance comparison in ginkgo-data.
An argument for putting this into …
LGTM!
- simplify things
- add type annotations
- fix outlier issues
- test everything

Co-authored-by: Gregor Olenik <[email protected]>
Co-authored-by: Marcel Koch <[email protected]>
Error: PR already merged!
Kudos, SonarCloud Quality Gate passed! 0 Bugs. No Coverage information. The version of Java (11.0.3) you have used to run this analysis is deprecated and we will stop accepting it soon. Please update to at least Java 17.
This script allows us to easily compare runtimes, storage and iteration counts between different benchmark runs on the same input.
TODO
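A rough sketch of the kind of comparison the script performs, assuming two benchmark result files keyed by benchmark name (file names, JSON layout, and the plain runtime ratio are assumptions for illustration, not the script's actual interface):

```python
import json

with open("baseline.json") as f_before, open("candidate.json") as f_after:
    baseline = json.load(f_before)
    candidate = json.load(f_after)

for name, before in baseline.items():
    after = candidate.get(name)
    if after is None:
        continue  # benchmark only present in the baseline run
    ratio = after["runtime"] / before["runtime"]
    print(f"{name}: runtime ratio {ratio:.2f}")
```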