Adds bioconda recipe conversion integration test #15

Closed

Conversation

schuylermartin45
Collaborator

@schuylermartin45 schuylermartin45 commented Mar 27, 2024

  • All bioconda recipes (as of 2024-03-27) have been copied into this project for the purposes of integration testing the conversion work with rattler-build
  • This will be a great way to test both the recipe conversion AND determine if the other integration tests can scale to a few thousand recipes
  • Hopefully the bioconda folks won't mind me doing this...

UPDATE: Half of the bioconda recipes are currently not being evaluated. I have raised #16 to figure out why. For now, the files are included but the tests are disabled.

- This should allow each independent test to control their passing thresholds
- This will allow us to more flexibly control passing states over time as the
  project improves
- Broke up the bioconda test into 4 parts, giving the 16 cores of parallelism
  I desire.
- This may be a short-term hack to speed up the integration tests
- From experience, GitHub seems to be struggling to maintain all the logging
  information we are dumping to the console.
- To mitigate this, we suppress dumping all the failed recipe file names to
  the console. This should dramatically shorten the amount of text being
  buffered.
- This is unfortunate for the time being, but it should also make it easier to
  navigate the output of `convert` and `rattler-bulk-build`
- Tracking timeouts now use the `subshell.run()` timeout parameter instead of
  attempting to use some UNIX signal solution from StackOverflow
- Exit codes are now stored as ints; there is no way to predict what
  rattler-build may return to us (timeout and exit-code handling are sketched
  below this list)
- Tweaks to minimum test passing metrics
- I can't figure out why these integration tests cause so many issues for the
  GitHub runner, so it'll be a story for another time
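
A rough illustration of the timeout and exit-code notes above: a minimal sketch built on the standard library's `subprocess.run()` timeout parameter. The exact `rattler-build` invocation, the 120-second timeout, and the `-1` sentinel are placeholder assumptions, not the project's actual values.

```python
import subprocess

TIMEOUT_SECONDS = 120  # placeholder; the real timeout lives in the test tooling


def run_rattler_build(recipe_file: str) -> int:
    """Runs rattler-build on a single recipe and returns its raw exit code."""
    try:
        result = subprocess.run(
            ["rattler-build", "build", "--recipe", recipe_file],
            capture_output=True,
            timeout=TIMEOUT_SECONDS,  # replaces the old UNIX-signal approach
        )
    except subprocess.TimeoutExpired:
        # Treat a hung build as a failure with a sentinel exit code.
        return -1
    # Exit codes stay plain ints; rattler-build may return anything.
    return result.returncode
```
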
@schuylermartin45 schuylermartin45 requested a review from a team March 29, 2024 16:50
@schuylermartin45 schuylermartin45 marked this pull request as ready for review March 29, 2024 16:51
@schuylermartin45 schuylermartin45 requested a review from jezdez March 29, 2024 16:51
schuylermartin45 added a commit that referenced this pull request Apr 1, 2024
Work is based on the branch used for #15
- Adds integration test case for conda-forge
- Adds new `scripts` directory for developer helper scripts
- Adds `randomly_select_recipes.py` utility that allows developers to randomly
  select `n` recipes from a GitHub organization hosting publicly accessible
  feedstock repositories
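
The `randomly_select_recipes.py` helper described above presumably does something along these lines. This is only a hedged sketch against the public GitHub REST API; the `-feedstock` naming filter, the `main` branch, the `recipe/meta.yaml` layout, and the pagination handling are assumptions, not the script's actual implementation.

```python
import random

import requests


def randomly_select_recipes(org: str, n: int, token: str | None = None) -> list[str]:
    """Randomly picks `n` feedstock repos from a GitHub org and returns
    candidate raw URLs for their recipe files."""
    headers = {"Authorization": f"Bearer {token}"} if token else {}
    repos: list[str] = []
    page = 1
    while True:
        resp = requests.get(
            f"https://api.github.com/orgs/{org}/repos",
            params={"per_page": 100, "page": page},
            headers=headers,
            timeout=30,
        )
        resp.raise_for_status()
        batch = resp.json()
        if not batch:
            break
        repos.extend(r["name"] for r in batch if r["name"].endswith("-feedstock"))
        page += 1
    selected = random.sample(repos, min(n, len(repos)))
    return [
        f"https://raw.githubusercontent.com/{org}/{repo}/main/recipe/meta.yaml"
        for repo in selected
    ]
```
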
@baszalmstra

baszalmstra commented Apr 2, 2024

Instead of adding all these files to this repository (making it huge), maybe you could fetch them dynamically (e.g., in GitHub Actions) or use a git submodule?

@schuylermartin45
Collaborator Author

I want the integration tests to be snapshots and stable. In principle, the tests should fail only if there is an issue with the conversion work or rattler-build. The success metrics per test are expected to increase over time as compatibility improves.

Fetching individual files would mean a ton of HTTP requests per run at this scale, AND I'd have to pin each commit hash I want to fetch to ensure the files do not change between tests. That doesn't sound very manageable.

Submodules have the same fetch issue, don't they? Not to mention I'd have to fetch each whole feedstock repo, not just the one file I'm interested in. In my testing, a few thousand recipes only take a few dozen megabytes of disk space, whereas a few thousand feedstock repositories would take up potentially gigabytes of disk, and that is effectively unbounded.

@schuylermartin45
Collaborator Author

I've also considered putting the integration test files in a separate repo to check out in the workflow, but that effectively has the same network fetch issue, albeit one that is a bit easier to manage.

@schuylermartin45
Collaborator Author

On top of all this, I generally try to avoid the network in my automated testing. It adds a layer of uncertainty and another point of failure outside of the domain of the work being tested.

@baszalmstra

I understand where you are coming from. 👍

I do feel like this adds a lot of bloat to the repository. I would personally use a submodule, which in my eyes is no worse in terms of network access than fetching the git repository itself. But I do understand that you want to have more control over the files you include.

But disregard my comment, having these integration tests is awesome. :)

@schuylermartin45
Collaborator Author

No, I'm glad you brought it up; I have had some of these thoughts/conversations with others internally.

It sucks to have thousands of files in a repo, but I haven't seen (and can't think of) an alternative that manages the pros/cons better than this.

@jezdez
Member

jezdez commented Apr 3, 2024

@schuylermartin45 This is a no-go; let's use a git submodule for this instead of copying over those files.

@jezdez jezdez closed this Apr 3, 2024
@schuylermartin45
Collaborator Author

Let's talk early next week. I have some other ideas that I think are best discussed over the phone.

@schuylermartin45
Collaborator Author

I'm going to be closing this PR and its sister draft PR. We'll be moving the integration test files into their own separate repo to check out/cache in the CI workflow. That way it's one potential network call instead of having to manage thousands, with no need to mess with submodules.

In the meantime, I will clean up the branch from the draft PR and resubmit it as a new PR with the new scripts, but no test files.

@schuylermartin45 schuylermartin45 deleted the smartin_bioconda_recipes_integration_test branch April 8, 2024 17:45
schuylermartin45 added a commit that referenced this pull request Apr 8, 2024
Work is based on the branch used for #15
- Adds integration test case for conda-forge
- Adds new `scripts` directory for developer helper scripts
- Adds `randomly_select_recipes.py` utility that allows developers to randomly
  select `n` recipes from a GitHub organization hosting publicly accessible
  feedstock repositories
schuylermartin45 added a commit that referenced this pull request Apr 11, 2024
Work is based on the branch used for #15
- Adds integration test case for conda-forge
- Adds new `scripts` directory for developer helper scripts
- Adds `randomly_select_recipes.py` utility that allows developers to randomly
  select `n` recipes from a GitHub organization hosting publicly accessible
  feedstock repositories
schuylermartin45 added a commit that referenced this pull request Apr 12, 2024
* Integration test build matrix is now parameterized

- This should allow each independent test to control their passing thresholds
- This will allow us to more flexibly control passing states over time as the
  project improves

* Disables fail-fast flag for testing matrix

* Sets a lower passing threshold for bioconda tests to start

* Truncates logs when scripts run from a GitHub Workflow

- From experience, GitHub seems to be struggling to maintain all the logging
  information we are dumping to the console.
- To mitigate this, we suppress dumping all the failed recipe file names to
  the console. This should dramatically shorten the amount of text being
  buffered.
- This is unfortunate for the time being, but it should also make it easier to
  navigate the output of `convert` and `rattler-bulk-build` (the truncation
  check is sketched after this commit message)

* Experimental timeout mechanism

* Improves timeout mechanism

- Tracking timeouts now use the `subshell.run()` timeout parameter instead of
  attempting to use some UNIX signal solution from StackOverflow

* Reduced timeout

* Removes `ExitCode` enum

- Exit codes are now stored as ints; there is no way to predict what
  rattler-build may return to us
- Tweaks to minimum test passing metrics

* Adds timeout, disables bioconda_03 and 04

- I can't figure out why these integration tests cause so many issues for the
  GitHub runner, so it'll be a story for another time

* Fixes disabling tests

* Fixes minor typo

* Starts work on conda-forge integration test

Work is based on the branch used for #15
- Adds integration test case for conda-forge
- Adds new `scripts` directory for developer helper scripts
- Adds `randomly_select_recipes.py` utility that allows developers to randomly
  select `n` recipes from a GitHub organization hosting publicly accessible
  feedstock repositories

* Fixes issue with parsing raw bytes from the GET request

* Bumps CI minimum scores

* Test data now pulls from `conda-recipe-manager-test-data`

- Integration tests now pull data from the test data repo using the sparse
  option in the checkout action.

* Fixes typos
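
The log-truncation change described in the commit message above (suppressing the per-recipe failure list when running inside a GitHub workflow) could look roughly like the check below. Detecting the workflow via the `GITHUB_ACTIONS` environment variable is a documented GitHub Actions convention, but the helper names and summary format here are assumptions rather than the project's actual code.

```python
import os


def running_in_github_workflow() -> bool:
    """GitHub Actions sets GITHUB_ACTIONS=true for every workflow run."""
    return os.environ.get("GITHUB_ACTIONS") == "true"


def report_failures(failed_recipes: list[str]) -> None:
    """Prints a short summary in CI and the full failure list locally."""
    print(f"{len(failed_recipes)} recipe(s) failed conversion")
    if running_in_github_workflow():
        # Dumping thousands of file names seems to overwhelm the runner's log
        # buffering, so only the count is emitted in CI.
        return
    for recipe in failed_recipes:
        print(f"  FAILED: {recipe}")
```
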