Add benchmark comparison script #1467
Conversation
Codecov Report: All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

@@            Coverage Diff            @@
##           develop    #1467    +/-   ##
==========================================
- Coverage    90.98%   89.31%   -1.68%
==========================================
  Files          688      688
  Lines        56120    56335     +215
==========================================
- Hits         51063    50314     -749
- Misses        5057     6021     +964

☔ View full report in Codecov by Sentry.
some quick first comments
benchmark/tools/compare.py (outdated)

```python
)
outliers[benchmark_name] = outlier[: min(len(outlier), args.outlier_count)]

if args.output == "json":
```
why is the json output not handled by pandas?
I have two output styles in mind: the JSON output can handle arbitrary nesting (for further detailed analysis), while the tabular style (for posting summaries on GitHub) flattens everything. I want to preserve the nested structure for the JSON output, which is hard to represent in pandas, since there is no `orient` option that matches this structure well.
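A minimal sketch of the two styles being discussed, assuming a nested per-benchmark result dict (benchmark names and fields are illustrative, not taken from compare.py): the nested dict is dumped verbatim as JSON, while pandas flattens it into one row per benchmark for the tabular path.

```python
import json
import pandas as pd

# Illustrative nested results; the real benchmark JSON layout may differ.
nested = {
    "spmv/csr": {"runtime": {"before": 1.2e-3, "after": 1.1e-3, "ratio": 0.92}},
    "solver/cg": {"runtime": {"before": 4.5e-2, "after": 4.7e-2, "ratio": 1.04}},
}

# JSON output: keep the nesting untouched.
print(json.dumps(nested, indent=2))

# Tabular output: flatten into one row per benchmark. Reconstructing the
# nested dict from this flat frame is what no DataFrame `orient` covers well.
flat = pd.json_normalize(
    [{"benchmark": name, **values} for name, values in nested.items()]
)
print(flat.to_markdown(index=False))  # requires the `tabulate` package
```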
Some quick f-string suggestions. F-strings are great and help to get rid of most str.format() calls.
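For illustration, this is the kind of rewrite meant here (the message and values are made up):

```python
name, ratio = "spmv/csr", 0.92

# before: str.format()
print("benchmark {} changed runtime by a factor of {:.2f}".format(name, ratio))

# after: f-string
print(f"benchmark {name} changed runtime by a factor of {ratio:.2f}")
```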
My biggest issue is storing the keys to the benchmarks as tuples. This throws away too much information that is then still implicitly relied on; see the sketch below.
Also, the output for the different formats should be unified, or perhaps it is best to only output JSON.
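One possible way to keep that information explicit, sketched with assumed field names (suite/format/problem are placeholders, not the script's actual key components): a typing.NamedTuple stays hashable, so it can still be used as a dict key, but every component keeps its name.

```python
from typing import NamedTuple

# Plain tuple key: the meaning of each position is implicit.
key_as_tuple = ("spmv", "csr", "matrix_A")

class BenchmarkKey(NamedTuple):
    suite: str    # placeholder field names, chosen for illustration only
    format: str
    problem: str

key = BenchmarkKey(suite="spmv", format="csr", problem="matrix_A")
results = {key: {"runtime": 1.1e-3}}  # still usable as a dict key
print(results[key], key.suite)
```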
Some more minor things
benchmark/tools/compare.py (outdated)

```python
    default=1000,
    help="How many outliers should be reported per benchmark",
)
parser.add_argument("--output", choices=["json", "csv", "markdown"], default="json")
```
Maybe a single option --short that switches between markdown (on) and json (off) would be better suited. That would make it clearer that the output depends strongly on the chosen format.
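A small sketch of that interface, assuming the flag name and semantics proposed above:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--short",
    action="store_true",
    help="emit a flattened markdown summary instead of nested JSON",
)

args = parser.parse_args(["--short"])  # example invocation
output_format = "markdown" if args.short else "json"
print(output_format)  # -> markdown
```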
I would also like to support CSV for follow-up analysis.
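Assuming the tabular path already produces flat rows, CSV would come almost for free via pandas (the column names here are made up for illustration):

```python
import pandas as pd

rows = [
    {"benchmark": "spmv/csr", "runtime.before": 1.2e-3, "runtime.after": 1.1e-3},
    {"benchmark": "solver/cg", "runtime.before": 4.5e-2, "runtime.after": 4.7e-2},
]
pd.DataFrame(rows).to_csv("comparison.csv", index=False)
```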
LGTM. Are the tests run automatically or manually?
@MarcelKoch Currently only manually. I wasn't sure about the directory layout and common practices w.r.t. available packages, so I wanted to collect some feedback first.
I did not look into the details, but I feel it is more suitable for the ginkgo-data repo.
@yhmtsai I am building this for development purposes, modeled after what nvbench_compare.py provides for nvbench; I don't think this is specific to ginkgo-data.
It somehow post-processes the data generated by the benchmarks, right? Its purpose is like the scripts for aggregating the JSON or plotting the SpMV performance comparison in ginkgo-data.
An argument for putting this into …
LGTM!
- simplify things
- add type annotations
- fix outlier issues
- test everything

Co-authored-by: Gregor Olenik <[email protected]>
Co-authored-by: Marcel Koch <[email protected]>
Error: PR already merged!
Kudos, SonarCloud Quality Gate passed! 0 Bugs. No Coverage information. The version of Java (11.0.3) you have used to run this analysis is deprecated and we will stop accepting it soon. Please update to at least Java 17.
This script allows us to easily compare runtimes, storage and iteration counts between different benchmark runs on the same input.
TODO
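A rough sketch of the kind of comparison the script performs, assuming two benchmark result files keyed by benchmark name (file names, JSON layout, and the plain runtime ratio are assumptions for illustration, not the script's actual interface):

```python
import json

with open("baseline.json") as f_before, open("candidate.json") as f_after:
    baseline = json.load(f_before)
    candidate = json.load(f_after)

for name, before in baseline.items():
    after = candidate.get(name)
    if after is None:
        continue  # benchmark only present in the baseline run
    ratio = after["runtime"] / before["runtime"]
    print(f"{name}: runtime ratio {ratio:.2f}")
```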