-
Notifications
You must be signed in to change notification settings - Fork 110
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make corrXY work with multiple GEN_KWs #9426
base: main
Are you sure you want to change the base?
Conversation
b04af00
to
a147425
Compare
1c392a1
to
c44eac7
Compare
c44eac7
to
124777c
Compare
There are some failing tests though. |
124777c
to
1e656a4
Compare
d29f1c3
to
359eb5a
Compare
359eb5a
to
161940d
Compare
7eaa3e4
to
766c012
Compare
CodSpeed Performance ReportMerging #9426 will degrade performances by 11.04%Comparing Summary
Benchmarks breakdown
|
766c012
to
29ab039
Compare
29ab039
to
5ad84b8
Compare
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #9426 +/- ##
==========================================
+ Coverage 82.69% 91.73% +9.03%
==========================================
Files 426 426
Lines 26559 26566 +7
==========================================
+ Hits 21964 24371 +2407
+ Misses 4595 2195 -2400
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
# Used as labels for observations in cross-correlation matrix. | ||
# Say we have two observation groups "FOPR" and "WOPR" where "FOPR" has | ||
# 2 responses and "WOPR" has 3. | ||
# In this case we create a list [FOPR_0, FOPR_1, WOPR_0, WOPR_1, WOPR_2] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be possible to use the index for the observations here instead? I guess it is ok for summary observations, though perhaps a bit confusing as this number will not align with report_step
, but for GEN_OBS
which already have an int index it would probably not align with that, and could be confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure. Do I remember correctly that the index is a time-stamp or date for summary observations? We could perhaps still use it but perhaps a bit messy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, was thinking a bit more about this, and should we consider doing it like we do for the snapshot in the gui showing the scaling factors when doing auto scaling? Those events are automatically written to csv by the client, so will be available to the user.
Something like:
ert/src/ert/analysis/_es_update.py
Line 239 in 7127528
AnalysisDataEvent( |
ert/src/ert/analysis/_es_update.py
Line 252 in 7127528
indexes[obs_group_mask], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you suggesting we use the second part of the string as an index?
Yes, or perhaps send the whole thing as an event instead of saving it in storage. Then everyone gets the benefit right away.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I follow the event thing. Would we then still be able to analyse the data from a Jupyter Notebook?
file_path = os.path.join(self.mount_point, "corr_XY.parquet") | ||
if os.path.exists(file_path): | ||
existing_df = pl.read_parquet(file_path) | ||
df = pl.concat([existing_df, new_df]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might be a bit confusing if we just merge with existing data?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you find confusing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there is something on disk, we merge with that? Couldnt that potentially be done with a different threshold, and so no longer valid? I might be misunderstanding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Introduction
A while back we introduced a call-back to the
iterative_ensemble_smoother
to allow fetching the cross-correlation matrix between parameters and responses.This matrix was saved into ert's internal storage as
corr_XY.nc
and theload_cross_correlations
API end-point was created.The idea was to provide users with the raw data and let them analyse and visualise it as they saw fit.
This feature was not clearly communicated to our users and was therefore not used at all.
Also, it turns out that the implementation is buggy in that it overwrites existing data such that only the
GEN_KW
that was specified last in ert's config is available.Works for the poly-case, but not for configs with multiple
GEN_KW
's.Approach
Save cross correlations to a
parquet
file with the following schema:The
obs_name
is created by appending a suffix toobs_group
. The suffix is a counter that depends on the number of observations within each group.Data is stored in "tidy format" to make it easy for users to query the data however they want.
Testing using the poly-case (see
Plot_correlations.ipynb
in the poly_example test folder)Note that the poly-case has just one
GEN_KW
.Testing using the heat equation (see
Plot_correlations.ipynb
in the heat_equation test folder)To test with multiple
GEN_KW
s, I've added twoGEN_KW
s to the heat-equation, and created a Jupyter Notebook that plots correlations for all iterations. See diff for all plots. Here's an example of one of the parameters at iteration 0:Testing using Drogon
The FMU-research team has done testing using Drogon.
Here's an example I created using Drogon:
git rebase -i main --exec 'pytest tests/ert/unit_tests -n logical -m "not integration_test"'
)When applicable