Adds bioconda recipe conversion integration test #15

Closed

Conversation

schuylermartin45
Collaborator

@schuylermartin45 schuylermartin45 commented Mar 27, 2024

  • All bioconda recipes (as of 2024-03-27) have been copied into this project for the purposes of integration testing the conversion work with rattler-build
  • This will be a great way to test both the recipe conversion AND determine if the other integration tests can scale to a few thousand recipes
  • Hopefully the bioconda folks won't mind me doing this...

UPDATE: Half of the bioconda recipes are currently not being evaluated. I have raised #16 to figure out why. For now, the files are included but the tests are disabled.

- This should allow each independent test to control their passing thresholds
- This will allow us to more flexibly control passing states over time as the
  project improves
- Broke up the bioconda test into 4 parts, giving the 16 cores of parallelism
  I desire.
- This may be a short-term hack to speed up the integration tests
- From experience, GitHub seems to be struggling to maintain all the logging
  information we are dumping to the console.
- To mitigate this, we suppress dumping all the failed recipe file names to
  the console. This should dramatically shorten the amount of text being
  buffered.
- This is unfortunate for the time being, but it should also make it easier to
  navigate the output of `convert` and `rattler-bulk-build`
- Tracking timeouts now use the `subshell.run()` timeout parameter instead of
  attempting to use some UNIX signal solution from StackOverflow
- Exit codes are now stored as ints; there is no way to predict what
  rattler-build may return to us (timeout and exit-code handling are sketched
  below this list)
- Tweaks to minimum test passing metrics
- I can't figure out why these integration tests cause so many issues for the
  GitHub runner, so it'll be a story for another time
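
A rough illustration of the timeout and exit-code notes above: a minimal sketch built on the standard library's `subprocess.run()` timeout parameter. The exact `rattler-build` invocation, the 120-second timeout, and the `-1` sentinel are placeholder assumptions, not the project's actual values.

```python
import subprocess

TIMEOUT_SECONDS = 120  # placeholder; the real timeout lives in the test tooling


def run_rattler_build(recipe_file: str) -> int:
    """Runs rattler-build on a single recipe and returns its raw exit code."""
    try:
        result = subprocess.run(
            ["rattler-build", "build", "--recipe", recipe_file],
            capture_output=True,
            timeout=TIMEOUT_SECONDS,  # replaces the old UNIX-signal approach
        )
    except subprocess.TimeoutExpired:
        # Treat a hung build as a failure with a sentinel exit code.
        return -1
    # Exit codes stay plain ints; rattler-build may return anything.
    return result.returncode
```
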
@schuylermartin45 schuylermartin45 requested a review from a team March 29, 2024 16:50
@schuylermartin45 schuylermartin45 marked this pull request as ready for review March 29, 2024 16:51
@schuylermartin45 schuylermartin45 requested a review from jezdez March 29, 2024 16:51
schuylermartin45 added a commit that referenced this pull request Apr 1, 2024
Work is based on the branch used for #15
- Adds integration test case for conda-forge
- Adds new `scripts` directory for developer helper scripts
- Adds `randomly_select_recipes.py` utility that allows developers to randomly
  select `n` recipes from a GitHub organization hosting publicly accessible
  feedstock repositories
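
The `randomly_select_recipes.py` helper described above presumably does something along these lines. This is only a hedged sketch against the public GitHub REST API; the `-feedstock` naming filter, the `main` branch, the `recipe/meta.yaml` layout, and the pagination handling are assumptions, not the script's actual implementation.

```python
import random

import requests


def randomly_select_recipes(org: str, n: int, token: str | None = None) -> list[str]:
    """Randomly picks `n` feedstock repos from a GitHub org and returns
    candidate raw URLs for their recipe files."""
    headers = {"Authorization": f"Bearer {token}"} if token else {}
    repos: list[str] = []
    page = 1
    while True:
        resp = requests.get(
            f"https://api.github.com/orgs/{org}/repos",
            params={"per_page": 100, "page": page},
            headers=headers,
            timeout=30,
        )
        resp.raise_for_status()
        batch = resp.json()
        if not batch:
            break
        repos.extend(r["name"] for r in batch if r["name"].endswith("-feedstock"))
        page += 1
    selected = random.sample(repos, min(n, len(repos)))
    return [
        f"https://raw.githubusercontent.com/{org}/{repo}/main/recipe/meta.yaml"
        for repo in selected
    ]
```
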
@baszalmstra

baszalmstra commented Apr 2, 2024

Instead of adding all these files to this repository (making it huge), maybe you could fetch them dynamically (e.g., in GitHub Actions) or use a git submodule?

@schuylermartin45
Collaborator Author

I want the integration tests to be snapshots and stable. In principle, the tests should fail only if there is an issue with the conversion work or rattler-build. The success metrics per test are expected to increase over time as compatibility improves.

Fetching individual files would mean a ton of HTTP requests per run at this scale, AND I'd have to pin each commit hash I want to fetch to ensure the files do not change between tests. That doesn't sound very manageable.

Submodules have the same fetch issue, don't they? Not to mention I'd have to fetch each whole feedstock repo, not just the one file I'm interested in. In my testing, a few thousand recipes only take a few dozen megabytes of disk space, whereas a few thousand feedstock repositories would take up potentially gigabytes of disk, and that is effectively unbounded.

@schuylermartin45
Collaborator Author

I've also considered putting the integration test files in a separate repo to check out in the workflow, but that effectively has the same network fetch issue, albeit one that is a bit easier to manage.

@schuylermartin45
Collaborator Author

On top of all this, I generally try to avoid the network in my automated testing. It adds a layer of uncertainty and another point of failure outside of the domain of the work being tested.

@baszalmstra

I understand where you are coming from. 👍

I do feel like this adds a lot of bloat to the repository. I would personally use a submodule, which in my eyes is no worse in terms of network access than fetching the git repository itself. But I do understand that you want to have more control over the files you include.

But disregard my comment, having these integration tests is awesome. :)

@schuylermartin45
Collaborator Author

No, I'm glad you brought it up; I have had some of these thoughts/conversations with others internally.

It sucks to have thousands of files in a repo, but I haven't seen (and can't think of) an alternative that manages the pros/cons better than this.

@jezdez
Member

jezdez commented Apr 3, 2024

@schuylermartin45 This is a no-go; let's use a git submodule for this instead of copying over those files.

@jezdez jezdez closed this Apr 3, 2024
@schuylermartin45
Collaborator Author

Let's talk early next week. I have some other ideas that I think are best discussed over the phone.

@schuylermartin45
Collaborator Author

I'm going to be closing this PR and its sister draft PR. We'll be moving the integration test files into their own separate repo to check out/cache in the CI workflow. That way it's one potential network call instead of having to manage thousands, with no need to mess with submodules.

In the meantime, I will clean up the branch from the draft PR and resubmit it as a new PR with the new scripts, but no test files.

@schuylermartin45 schuylermartin45 deleted the smartin_bioconda_recipes_integration_test branch April 8, 2024 17:45
schuylermartin45 added a commit that referenced this pull request Apr 8, 2024
Work is based on the branch used for #15
- Adds integration test case for conda-forge
- Adds new `scripts` directory for developer helper scripts
- Adds `randomly_select_recipes.py` utility that allows developers to randomly
  select `n` recipes from a GitHub organization hosting publicly accessible
  feedstock repositories
schuylermartin45 added a commit that referenced this pull request Apr 11, 2024
Work is based on the branch used for #15
- Adds integration test case for conda-forge
- Adds new `scripts` directory for developer helper scripts
- Adds `randomly_select_recipes.py` utility that allows developers to randomly
  select `n` recipes from a GitHub organization hosting publicly accessible
  feedstock repositories
schuylermartin45 added a commit that referenced this pull request Apr 12, 2024
* Integration test build matrix is now parameterized

- This should allow each independent test to control their passing thresholds
- This will allow us to more flexibly control passing states over time as the
  project improves

* Disables fail-fast flag for testing matrix

* Sets a lower passing threshold for bioconda tests to start

* Truncates logs when scripts run from a GitHub Workflow

- From experience, GitHub seems to be struggling to maintain all the logging
  information we are dumping to the console.
- To mitigate this, we suppress dumping all the failed recipe file names to
  the console. This should dramatically shorten the amount of text being
  buffered.
- This is unfortunate for the time being, but it should also make it easier to
  navigate the output of `convert` and `rattler-bulk-build` (the truncation
  check is sketched after this commit message)

* Experimental timeout mechanism

* Improves timeout mechanism

- Tracking timeouts now use the `subshell.run()` timeout parameter instead of
  attempting to use some UNIX signal solution from StackOverflow

* Reduced timeout

* Removes `ExitCode` enum

- Exit codes are now stored as ints; there is no way to predict what
  rattler-build may return to us
- Tweaks to minimum test passing metrics

* Adds timeout, disables bioconda_03 and 04

- I can't figure out why these integration tests cause so many issues for the
  GitHub runner, so it'll be a story for another time

* Fixes disabling tests

* Fixes minor typo

* Starts work on conda-forge integration test

Work is based on the branch used for #15
- Adds integration test case for conda-forge
- Adds new `scripts` directory for developer helper scripts
- Adds `randomly_select_recipes.py` utility that allows developers to randomly
  select `n` recipes from a GitHub organization hosting publicly accessible
  feedstock repositories

* Fixes issue with parsing raw bytes from the GET request

* Bumps CI minimum scores

* Test data now pulls from `conda-recipe-manager-test-data`

- Integration tests now pull data from the test data repo using the sparse
  option in the checkout action.

* Fixes typos
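
The log-truncation change described in the commit message above (suppressing the per-recipe failure list when running inside a GitHub workflow) could look roughly like the check below. Detecting the workflow via the `GITHUB_ACTIONS` environment variable is a documented GitHub Actions convention, but the helper names and summary format here are assumptions rather than the project's actual code.

```python
import os


def running_in_github_workflow() -> bool:
    """GitHub Actions sets GITHUB_ACTIONS=true for every workflow run."""
    return os.environ.get("GITHUB_ACTIONS") == "true"


def report_failures(failed_recipes: list[str]) -> None:
    """Prints a short summary in CI and the full failure list locally."""
    print(f"{len(failed_recipes)} recipe(s) failed conversion")
    if running_in_github_workflow():
        # Dumping thousands of file names seems to overwhelm the runner's log
        # buffering, so only the count is emitted in CI.
        return
    for recipe in failed_recipes:
        print(f"  FAILED: {recipe}")
```
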