Adds bioconda recipe conversion integration test #15
Conversation
- All bioconda recipes (as of 2024-03-27) have been copied into this project for the purposes of integration testing the conversion work with rattler-build
- This will be a great way to test both the recipe conversion AND determine if the other integration tests can scale to a few thousand recipes
- Hopefully the bioconda folks won't mind me doing this...
- This should allow each independent test to control its passing thresholds
- This will allow us to more flexibly control passing states over time as the project improves
- Broke up the bioconda test into 4 parts, giving me the 16 cores of parallelism I desire
- This may be a short-term hack to speed up the integration tests
- From experience, GitHub seems to be struggling to maintain all the logging information we are dumping to the console.
- To mitigate this, we suppress dumping all the failed recipe file names to the console. This should dramatically shorten the amount of text being buffered.
- This is unfortunate for the time being, but it should also make it easier to navigate the output of `convert` and `rattler-bulk-build`
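The truncation described above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name and the 50-line cap are hypothetical, though `GITHUB_ACTIONS=true` is genuinely the environment variable GitHub sets on its hosted runners.

```python
import os

def summarize_failures(failed_recipes: list[str], limit: int = 50) -> str:
    """Return a failure report, truncated when running in CI.

    Dumping thousands of recipe file names appears to overwhelm the
    GitHub log buffer, so inside a workflow we only report the count.
    """
    # GitHub sets GITHUB_ACTIONS=true on its hosted runners.
    if os.environ.get("GITHUB_ACTIONS") == "true":
        return f"{len(failed_recipes)} recipes failed (names suppressed in CI logs)"
    return "\n".join(failed_recipes[:limit])
```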
- Tracking timeouts now uses the `subshell.run()` timeout parameter instead of attempting to use some UNIX signal solution from StackOverflow
- Exit codes are now stored as ints; there is no way to predict what rattler-build may return to us
- Tweaks to minimum test passing metrics
- I can't figure out why these integration tests cause so many issues for the GitHub runner, so it'll be a story for another time
Work is based on the branch used for #15
- Adds integration test case for conda-forge
- Adds new `scripts` directory for developer helper scripts
- Adds `randomly_select_recipes.py` utility that allows developers to randomly select `n` recipes from a GitHub organization hosting publicly accessible feedstock repositories
Instead of adding all these files to this repository (making it huge), maybe you could fetch them dynamically (e.g., in GitHub Actions) or use a git submodule?
I want the integration tests to be snapshots and stable. In principle, the tests should only fail if and only if there is an issue with the conversion work or rattler-build. The success metrics per test are expected to increase over time as compatibility improves. Fetching individual files would mean a ton of HTTP requests per run at this scale, AND I'd have to pin each commit hash I'd want to fetch to ensure the files do not change between tests. That doesn't sound very manageable at this scale. Submodules have the same fetch issue, don't they? Not to mention I'd have to fetch the whole feedstock repo at once, not just the one file I am interested in. In my testing, a few thousand recipes take only a few dozen megabytes of disk space, whereas a few thousand feedstock repositories take up potentially gigabytes of disk, and it's potentially unbounded.
I've also considered putting the integration test files in a separate repo to check out in the workflow, but that effectively has the same network fetch issue, albeit a bit easier to manage.
On top of all this, I generally try to avoid the network in my automated testing. It adds a layer of uncertainty and another point of failure outside of the domain of the work being tested.
I understand where you are coming from. 👍 I do feel like this adds a lot of bloat to the repository. I would personally use a submodule, which in my eyes is just as harmful as the git repository itself in terms of network access. But I do understand that you want to have more control over the files you include. Disregard my comment; having these integration tests is awesome. :)
No, I'm glad you brought it up; I have had some of these thoughts/conversations with others internally. It sucks to have thousands of files in a repo, but I can't think of an alternative that manages the pros/cons better than this.
@schuylermartin45 This is a no-go, let's use a git submodule for this instead of copying over those files. |
Let's talk early next week. I have some other ideas that I think are best discussed over the phone.
I'm going to be closing this PR and its sister draft PR. We'll be moving the integration test files into their own separate repo to check out/cache in the CI workflow. That'll be 1 potential network call instead of having to manage thousands, with no need to mess with submodules. In the meantime, I will be cleaning up the branch from the draft PR and resubmitting it as a new PR with the new scripts, but no test files.
* Integration test build matrix is now parameterized
  - This should allow each independent test to control its passing thresholds
  - This will allow us to more flexibly control passing states over time as the project improves
* Disables fail-fast flag for testing matrix
* Sets a lower passing threshold for bioconda tests to start
* Truncates logs when scripts run from a GitHub Workflow
  - From experience, GitHub seems to be struggling to maintain all the logging information we are dumping to the console.
  - To mitigate this, we suppress dumping all the failed recipe file names to the console. This should dramatically shorten the amount of text being buffered.
  - This is unfortunate for the time being, but it should also make it easier to navigate the output of `convert` and `rattler-bulk-build`
* Experimental timeout mechanism
* Improves timeout mechanism
  - Tracking timeouts now use the `subshell.run()` timeout parameter instead of attempting to use some UNIX signal solution from StackOverflow
* Reduced timeout
* Removes `ExitCode` enum
  - Exit codes are now stored as ints; there is no way to predict what rattler-build may return to us
  - Tweaks to minimum test passing metrics
* Adds timeout, disables bioconda_03 and 04
  - I can't figure out why these integration tests cause so many issues for the GitHub runner, so it'll be a story for another time
* Fixes disabling tests
* Fixes minor typo
* Starts work on conda-forge integration test
  - Work is based on the branch used for #15
  - Adds integration test case for conda-forge
  - Adds new `scripts` directory for developer helper scripts
  - Adds `randomly_select_recipes.py` utility that allows developers to randomly select `n` recipes from a GitHub organization hosting publicly accessible feedstock repositories
* Fixes issue with parsing raw bytes from the GET request
* Bumps CI minimum scores
* Test data now pulls from `conda-recipe-manager-test-data`
  - Integration tests now pull data from the test data repo using the sparse option in the checkout action.
* Fixes typos
UPDATE: Half of the bioconda recipes are currently not being evaluated. I have raised #16 to figure out why. For now, the files are included but the tests are disabled.