
Add benchmark comparison script #1467

Merged: upsj merged 7 commits from benchmark_compare into develop on Nov 27, 2023
Conversation

@upsj upsj (Member) commented Nov 19, 2023

This script allows us to easily compare runtimes, storage and iteration counts between different benchmark runs on the same input.

TODO

  • allow listing outliers for larger benchmark runs
  • different output modes: csv, markdown table, json
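
To sketch the idea (an illustrative example only, not the actual compare.py; the benchmark JSON layout and field names used here are assumptions):

import json

def load_results(path):
    # Key each benchmark entry by the name of its input problem
    # (hypothetical layout; Ginkgo's real benchmark JSON may differ).
    with open(path) as f:
        return {entry["problem"]: entry for entry in json.load(f)}

def compare_runtimes(baseline_path, candidate_path):
    # Print the runtime ratio candidate/baseline for every common input.
    baseline = load_results(baseline_path)
    candidate = load_results(candidate_path)
    for name in sorted(baseline.keys() & candidate.keys()):
        ratio = candidate[name]["time"] / baseline[name]["time"]
        print(f"{name}: {ratio:.2f}x")

compare_runtimes("baseline.json", "candidate.json")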

@ginkgo-bot ginkgo-bot added the reg:benchmarking This is related to benchmarking. label Nov 19, 2023
@upsj upsj added the 1:ST:need-feedback The PR is somewhat ready but feedback on a blocking topic is required before a proper review. label Nov 19, 2023
codecov bot commented Nov 19, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (fa71990) 90.98% compared to head (8465172) 89.31%.

❗ Current head 8465172 differs from pull request most recent head 267c6f4. Consider uploading reports for the commit 267c6f4 to get more accurate results

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #1467      +/-   ##
===========================================
- Coverage    90.98%   89.31%   -1.68%     
===========================================
  Files          688      688              
  Lines        56120    56335     +215     
===========================================
- Hits         51063    50314     -749     
- Misses        5057     6021     +964     


@upsj upsj added 1:ST:ready-for-review This PR is ready for review and removed 1:ST:need-feedback The PR is somewhat ready but feedback on a blocking topic is required before a proper review. labels Nov 20, 2023
@upsj upsj marked this pull request as ready for review November 20, 2023 05:04
@upsj upsj requested a review from a team November 20, 2023 05:05
@upsj upsj self-assigned this Nov 20, 2023
@MarcelKoch MarcelKoch self-requested a review November 20, 2023 07:41
@MarcelKoch MarcelKoch (Member) left a comment

some quick first comments

benchmark/tools/compare.py
)
outliers[benchmark_name] = outlier[: min(len(outlier), args.outlier_count)]

if args.output == "json":
Member

why is the json output not handled by pandas?

@upsj upsj (Member, Author) commented Nov 20, 2023
I have two output styles in mind: the JSON output can handle arbitrary nesting (for further detailed analysis), while the tabular style (for posting summaries on GitHub) flattens everything. I want to preserve the nested structure for the JSON output, which is hard to represent in pandas, since there is no orient matching this structure well.
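
To make the distinction concrete, here is a rough sketch of the two styles (illustrative only, not the code in this PR; the nested example data is made up):

import json
import pandas as pd

# Hypothetical comparison result with nesting: benchmark -> case -> quantity.
comparison = {
    "spmv": {
        "A.mtx": {"time": {"before": 1.0e-3, "after": 0.8e-3, "ratio": 0.8}},
        "B.mtx": {"time": {"before": 2.0e-3, "after": 2.2e-3, "ratio": 1.1}},
    }
}

# The JSON output keeps the full nesting for later detailed analysis ...
print(json.dumps(comparison, indent=2))

# ... while the tabular outputs flatten it into one row per case.
rows = [
    {"benchmark": "spmv", "case": case, **values["time"]}
    for case, values in comparison["spmv"].items()
]
df = pd.DataFrame(rows)
print(df.to_csv(index=False))       # csv
print(df.to_markdown(index=False))  # markdown table (requires the tabulate package)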

@greole greole (Collaborator) left a comment
Some quick f-string suggestions. F-strings are great and help to get rid of most str.format() calls.
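
For example (an illustrative snippet, not code from this PR):

value = 1.2345
print("ratio: {:.2f}".format(value))  # str.format style
print(f"ratio: {value:.2f}")          # equivalent f-string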

@MarcelKoch MarcelKoch (Member) left a comment
My biggest issue is that the keys to the benchmarks are stored as tuples. This throws away too much information that is then still relied on implicitly.
Also, the output for the different formats should be unified, or perhaps it is best to only output JSON.
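
To illustrate the concern about tuple keys (a hypothetical sketch, not the code under review): a plain tuple relies on every consumer remembering what each position means, while a named key keeps that information explicit.

from collections import namedtuple

# Positional tuple key: the meaning of each slot is implicit.
results = {("spmv", "A.mtx", "time"): 1.0e-3}

# Named key carrying the same data, but self-describing.
BenchmarkKey = namedtuple("BenchmarkKey", ["benchmark", "case", "quantity"])
results = {BenchmarkKey("spmv", "A.mtx", "time"): 1.0e-3}
print(next(iter(results)).case)  # -> "A.mtx"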

@greole greole (Collaborator) left a comment
Some more minor things

benchmark/tools/compare.py
default=1000,
help="How many outliers should be reported per benchmark",
)
parser.add_argument("--output", choices=["json", "csv", "markdown"], default="json")
Member

Maybe just a single option --short that switches between markdown (on) and json (off) would be better suited. That would clarify that the output depends more strongly on the format.
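
A sketch of that alternative (assumed argparse usage, not the code in this PR):

import argparse

parser = argparse.ArgumentParser()
# Single boolean flag instead of --output with json/csv/markdown choices.
parser.add_argument(
    "--short",
    action="store_true",
    help="print a flat markdown summary instead of the full nested json output",
)
args = parser.parse_args()
output_format = "markdown" if args.short else "json"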

@upsj upsj (Member, Author)
I would also like to support CSV for follow-up analysis.

@upsj upsj requested review from MarcelKoch and greole November 20, 2023 22:25
@MarcelKoch MarcelKoch (Member) left a comment
LGTM. Are the tests run automatically or manually?

@upsj upsj (Member, Author) commented Nov 21, 2023

@MarcelKoch Currently only manually; I wasn't sure about the directory layout and common practices w.r.t. available packages, so I wanted to collect some feedback first.

@yhmtsai yhmtsai (Member) commented Nov 21, 2023

I have not looked into the details, but I feel it is more suitable for the ginkgo-data repo.

@upsj upsj (Member, Author) commented Nov 21, 2023

@yhmtsai I am building this for development purposes, modeled after what nvbench_compare.py provides for nvbench; I don't think this is specific to ginkgo-data.

@yhmtsai yhmtsai (Member) commented Nov 21, 2023

It post-processes the data generated by the benchmarks, right? Its purpose is similar to the scripts in ginkgo-data for aggregating the JSON results or plotting the SpMV performance comparison.

@upsj upsj (Member, Author) commented Nov 22, 2023

An argument for putting this into ginkgo directly is that it is likely to be used in conjunction with the benchmarks, while it is unlikely that you will need to check out ginkgo-data when optimizing something, because it is more reliable to run the baseline manually than to rely on archived values.

@greole greole (Collaborator) left a comment
LGTM!

@upsj upsj force-pushed the benchmark_compare branch from 6f4ac2a to 47d3086 Compare November 27, 2023 13:49
@upsj upsj added 1:ST:ready-to-merge This PR is ready to merge. and removed 1:ST:ready-for-review This PR is ready for review labels Nov 27, 2023
@upsj upsj force-pushed the benchmark_compare branch 2 times, most recently from da5c309 to 6efc350 Compare November 27, 2023 13:52
@upsj upsj force-pushed the benchmark_compare branch from 6efc350 to c26f69f Compare November 27, 2023 14:05
@upsj upsj merged commit e087f87 into develop Nov 27, 2023
8 of 13 checks passed
@upsj upsj deleted the benchmark_compare branch November 27, 2023 14:12
@ginkgo-bot ginkgo-bot commented
Error: PR already merged!

Kudos, SonarCloud Quality Gate passed!

  • Bugs: 0 (rating A)
  • Vulnerabilities: 0 (rating A)
  • Security Hotspots: 0 (rating A)
  • Code Smells: 0 (rating A)
  • No Coverage information
  • No Duplication information

Warning: the version of Java (11.0.3) used to run this analysis is deprecated, and we will stop accepting it soon. Please update to at least Java 17.

Labels: 1:ST:ready-to-merge (This PR is ready to merge.), reg:benchmarking (This is related to benchmarking.)
5 participants