Improvements to DynamicPPLBenchmarks #346

Draft
wants to merge 20 commits into
base: master

torfjelde
Member

Produces results like those shown here: #309 (comment)

@torfjelde torfjelde marked this pull request as draft December 3, 2021 00:43
@yebai
Member

yebai commented Dec 16, 2021

This might be helpful for running benchmarks via CI - https://github.com/tkf/BenchmarkCI.jl

@yebai
Member

yebai commented Aug 29, 2022

@torfjelde should we improve this PR by incorporating TuringBenchmarks? Alternatively, we can move all benchmarking code here into TuringBenchmarks. I am happy with either option, but ideally these benchmarking utilities should live in only one place to minimise confusion.

Also, https://github.com/TuringLang/TuringExamples contains some very old benchmarking code.

cc @xukai92 @devmotion

@coveralls

coveralls commented Feb 2, 2023

Pull Request Test Coverage Report for Build 5458519079

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage remained the same at 76.408%

Totals Coverage Status
Change from base Build 5358326778: 0.0%
Covered Lines: 1927
Relevant Lines: 2522

💛 - Coveralls

@codecov

codecov bot commented Feb 2, 2023

Codecov Report

Patch and project coverage have no change.

Comparison is base (e6dd4ef) 76.40% compared to head (c867ae8) 76.40%.

Additional details and impacted files
@@           Coverage Diff           @@
##           master     #346   +/-   ##
=======================================
  Coverage   76.40%   76.40%           
=======================================
  Files          21       21           
  Lines        2522     2522           
=======================================
  Hits         1927     1927           
  Misses        595      595           


yebai and others added 3 commits February 2, 2023 22:39
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
@yebai
Member

yebai commented Sep 16, 2024

We could implement a setup similar to EnzymeAD/Reactant.jl#105 (comment)

@shravanngoswamii
Member

I will look into this soon!

@torfjelde
Member Author

I think there are a few different things we need to address:

  • How to set up the benchmarks for a given Model. This is already taken care of in TuringBenchmarking.jl; if anything is missing, we should just contribute to that, since this is also useful for end-users.
  • How do we track and compare benchmarks across versions?
  • How do we present the information? Do we use Weave docs like in this PR or do we just present stuff in a table?
  • Which models should we benchmark?
  • Should the benchmarking be part of the CI? If so, how should this be triggered? How do we get compute for this (we can't just use a standard GH action for this but will need our "own" server to run this on)?

IMO, the CI stuff is not really that crucial. The most important things are a) choosing a suite of models that answers all the questions we want, e.g. how do the changes we make affect different implementations of a model, how is scaling wrt. number of parameters affected, how are compilation times affected, etc., and b) deciding on the output format for all of this.

@torfjelde
Member Author

How do we present the information? Do we use Weave docs like in this PR or do we just present stuff in a table?

Some further notes on this. IMO we're mainly interested in a few different "experiments". We don't want to be testing every model out there, and so there are things we want to "answer" with our benchmarks.

As a result, I'm leaning more towards a Weave approach with each notebook answering a distinct question, e.g. "how does the model scale with number of observations", and producing outputs that can be compared across versions somehow. That is, I think the overall approach taken in this PR is "correct", but we need to make it much nicer + update how the benchmarks are performed.

But then the question is: what are the "questions" we want to answer? Here are a few I can think of:

  1. How does performance vary across implementations, going from "everything uses for-loops" to "everything is vectorized"?
  2. How do both runtime performance and compilation times scale wrt. number of parameters and observations?
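
A minimal sketch of what such an experiment could look like, assuming BenchmarkTools.jl is available. The two log-density functions are hypothetical stand-ins for a "for-loop" and a "vectorized" implementation of the same model; they are not DynamicPPL models, and a real notebook would presumably build its suite through TuringBenchmarking.jl instead.

```julia
using BenchmarkTools

# Hypothetical stand-ins for two implementations of the same model:
# the log-density of i.i.d. Normal(m, 1) observations.
normal_logpdf(x, m) = -0.5 * (x - m)^2 - 0.5 * log(2π)

# "Everything uses for-loops" variant.
function logdensity_loop(m, xs)
    lp = 0.0
    for x in xs
        lp += normal_logpdf(x, m)
    end
    return lp
end

# "Everything is vectorized" variant.
logdensity_vectorized(m, xs) = sum(normal_logpdf.(xs, m))

# Time both implementations for a growing number of observations.
# `@belapsed` returns the minimum observed time in seconds.
function scaling_experiment(sizes)
    rows = NamedTuple[]
    for n in sizes
        xs = randn(n)
        t_loop = @belapsed logdensity_loop(0.5, $xs) seconds = 0.5
        t_vec = @belapsed logdensity_vectorized(0.5, $xs) seconds = 0.5
        push!(rows, (n = n, loop_s = t_loop, vectorized_s = t_vec))
    end
    return rows
end

results = scaling_experiment([10, 100, 1_000])
foreach(println, results)
```

This only covers runtime. Compilation time would need to be measured on the first call in a fresh session, which this sketch does not attempt.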

@shravanngoswamii
Member

shravanngoswamii commented Oct 19, 2024

How do we track and compare benchmarks across versions?

We can store the HTML output of benchmarks.md for each version in gh-pages and serve it at /benchmarks.
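
Independently of how the pages are served, the raw results could also be saved per version and compared programmatically. A rough sketch, assuming BenchmarkTools.jl; the `workload` function, the `"evaluate"` key, and the file name are hypothetical, and in practice the baseline would come from a run against a previous release rather than the same session.

```julia
using BenchmarkTools

# Hypothetical workload standing in for evaluating a model.
workload(xs) = sum(abs2, xs)

xs = randn(1_000)
suite = BenchmarkGroup()
suite["evaluate"] = @benchmarkable workload($xs)

# Run the suite and persist the results (the file name must end in .json).
baseline_path = joinpath(mktempdir(), "baseline.json")
baseline = run(suite; seconds = 0.5)
BenchmarkTools.save(baseline_path, baseline)

# Later (e.g. on a newer version): run again and load the stored baseline.
current = run(suite; seconds = 0.5)
loaded = BenchmarkTools.load(baseline_path)[1]

# Compare median estimates; each entry is classified as
# :improvement, :regression or :invariant.
judgement = judge(median(current), median(loaded))
println(judgement)
```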

How do we present the information? Do we use Weave docs like in this PR or do we just present stuff in a table?

The Weave approach looks fine, as each notebook could address a specific question!

Should the benchmarking be part of the CI? If so, how should this be triggered? How do we get compute for this (we can't just use a standard GH action for this but will need our "own" server to run this on)?

It took a lot of time to run benchmarks from this PR locally, so I guess GH action is not preferred for this!

Let me know what to do next, I will proceed as you say!

@shravanngoswamii
Member

Might want to look at https://github.com/JasonPekos/TuringPosteriorDB.jl.

I have looked into this. There are many models, so we must figure out which ones to benchmark.

@yebai
Member

yebai commented Dec 16, 2024

@shravanngoswamii can you run all models in https://github.com/JasonPekos/TuringPosteriorDB.jl and provide an output like: https://nsiccha.github.io/StanBlocks.jl/performance.html#visualization?

Let's create a qmd notebook for this benchmarking that is easy to run on CI and local machines.

EDIT: a first step is to

  • clean up this PR
  • set up CI to run the jmd scripts and push output to the gh-pages branch
  • merge this PR as "unit benchmarking" for DynamicPPL models

After this is done, start a new PR and work on adding TuringPosteriorDB as an additional set of benchmarks.

@shravanngoswamii
Member

shravanngoswamii commented Dec 25, 2024

set up CI to run the jmd scripts and push output to the gh-pages branch

I don't think we can run this in GHA because it takes too much time, so how are we expecting to run individual models? And can you give me a rough idea of what we are expecting from the DynamicPPL benchmarking PR as of now?

Can we pick some particular models that can run on GH Action? And if we are going with JMD Weave approach, then I guess we will use many JMD scripts in future...

What parameters should be kept in benchmarks, and do you have any particular format in which we should display benchmark results?

I am working on this kind of stuff for the first time, so I guess I am taking too much time to understand even simple things! Really sorry for it!

@yebai
Member

yebai commented Jan 6, 2025

Ideally, we cherry-pick a suitable set of benchmarks that could run on GitHub CI. Let's consider replacing jmd files with Julia scripts. We could use PrettyTables.jl to produce readable GitHub comments.

More expensive benchmarks could be transferred into a separate script which we can run on private machines if necessary.
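
As a rough illustration of the kind of output a Julia script could post as a GitHub comment, the sketch below formats results as a Markdown table using only Base Julia. The model names and numbers are placeholder values, not measured results, and the `markdown_table` helper is hypothetical; PrettyTables.jl could replace the manual formatting.

```julia
# Placeholder rows (not real measurements): model name, median time, allocations.
results = [
    (model = "demo_loop", time_ms = 1.234, allocs = 12),
    (model = "demo_vectorized", time_ms = 0.456, allocs = 3),
]

# Hypothetical helper: render the rows as a GitHub-flavoured Markdown table.
function markdown_table(rows)
    header = "| Model | Median time (ms) | Allocations |"
    separator = "|:------|-----------------:|------------:|"
    body = ["| $(r.model) | $(round(r.time_ms; digits = 2)) | $(r.allocs) |" for r in rows]
    return join([header, separator, body...], "\n")
end

println(markdown_table(results))
```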

6 participants