Problems with ert=2.21.0 when running large ensembles #367

Closed
edubarrosTNO opened this issue Mar 20, 2021 · 6 comments
Labels: bug, help wanted


@edubarrosTNO
Contributor

edubarrosTNO commented Mar 20, 2021

I was running a couple of tests last Friday (for PRs #217 and #365) and had difficulty completing runs of the FlowNet experiments in the Norne example (with 500 realizations and 10 ES-MDA iterations). At first I thought the failures were related to problems in the PRs, but then I reran the same case with the new release on PyPI (flownet==0.5.2) and it failed again: the ERT process stopped printing to the screen at iter-0 and timed out after some more iterations/simulations had run in the background. I then investigated whether this is an issue with ERT and noticed that all these FlowNet branches and release versions have ert==2.21.0 in common, while I know that I can run the same case with the previous version of FlowNet, which uses ert==2.20.1. As a final test, I ran the same case locally using the CI config, and everything ran successfully (with 2 realizations and 2 ES-MDA iterations), including proper logging/printing of info to the screen after the completion of iter-0.

So my hypothesis is that this new release of ERT might be behaving strangely for larger ensembles (around 500 realizations or more). Can anyone else test this to confirm the behavior?

edubarrosTNO added the bug, enhancement, and help wanted labels and removed the enhancement label on Mar 20, 2021
@wouterjdb
Collaborator

Have you manually installed the previous release of ert and run the same simulation?

@edubarrosTNO
Contributor Author

> Have you manually installed the previous release of ert and run the same simulation?

Yes, I did that: I used the latest release, flownet==0.5.2, installed ert==2.20.1 manually, and the same FlowNet experiment then ran. We should check why ert==2.21.0 is not behaving properly and report it.
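For anyone checking their own environment, here is a minimal sketch of a guard that flags the affected release. It assumes the package is installed under the distribution name `ert` (as the `ert==2.21.0` pins in this thread suggest); the helper names are made up for illustration and are not part of FlowNet or ERT.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# Release reported in this issue as stalling on the 500-realization case.
AFFECTED_VERSIONS = {"2.21.0"}


def is_affected(ert_version: str) -> bool:
    """Return True if the version string is one reported as problematic here."""
    return ert_version.strip() in AFFECTED_VERSIONS


def installed_version(dist_name: str = "ert") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("ert is not installed in this environment")
    elif is_affected(found):
        print(f"ert=={found} is the release discussed in this issue")
    else:
        print(f"ert=={found} is not the release reported in this issue")
```

The check only compares version strings; it does not attempt to detect the stall itself.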

The problem now is that the experiment run with flownet==0.5.2 does not reproduce the results of the earlier experiment run with flownet==0.5.0. More simulations fail with 0.5.2, so the history matching (HM) is interrupted because the required percentage of successful realizations is not met, whereas with 0.5.0 the requirement was met. This is a separate issue from the ERT version, though; maybe more has changed in FlowNet between the releases.
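To make the interruption condition concrete, the sketch below checks whether an iteration meets a minimum share of successful realizations. The function name, the 0-to-1 threshold convention, and the numbers in the usage lines are assumptions for illustration only; they are not FlowNet's or ERT's actual implementation, and they are not results from these experiments.

```python
def meets_success_requirement(n_success: int, n_total: int, min_fraction: float) -> bool:
    """Return True if the share of successful realizations reaches min_fraction.

    min_fraction is a hypothetical threshold between 0 and 1.
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if not 0 <= n_success <= n_total:
        raise ValueError("n_success must lie between 0 and n_total")
    if not 0.0 <= min_fraction <= 1.0:
        raise ValueError("min_fraction must lie between 0 and 1")
    return n_success / n_total >= min_fraction


# Made-up numbers: a 500-realization ensemble with an assumed 80% threshold.
print(meets_success_requirement(420, 500, 0.8))  # True: 84% succeeded
print(meets_success_requirement(380, 500, 0.8))  # False: 76% succeeded
```

With a check of this kind, a modest increase in failed simulations between two releases is enough to flip a run from continuing to being interrupted.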

@wouterjdb
Collaborator

Are you running with the modified Norne model now? Have you ruled out any changes caused by that?

@edubarrosTNO
Contributor Author

Yes, I used the same version of the Norne model in both experiments. The first experiment, with flownet==0.5.0, was run two weeks ago, so it used the version that preceded what is now in the master branch of flownet-testdata (which was updated last Thursday).

@edubarrosTNO
Contributor Author

edubarrosTNO commented Mar 24, 2021

When repeating my tests with a fresh installation of flownet==0.5.0 and flownet==0.5.2, I noticed that there is a new version of ERT, ert==2.21.1. I checked this new release version (https://github.com/equinor/ert/releases/tag/2.21.1) and found out that there was a bug which is supposedly fixed now:

ert==2.21.1
Bugfix:
Don't assume singular snapshot in CLI. Fixes a problem where ERT would crash on iteration 1 if a realization failed in iteration 0.

I will re-run my tests now and see if the behavior described in this issue is fixed.

@wouterjdb
Collaborator

Problem solved. We can close this issue now.
