Make corrXY work with multiple GEN_KWs #9426

dafeda · 2024-12-03T11:42:25Z

Introduction

A while back we introduced a call-back to the iterative_ensemble_smoother to allow fetching the cross-correlation matrix between parameters and responses.
This matrix was saved into ert's internal storage as corr_XY.nc and the load_cross_correlations API end-point was created.
The idea was to provide users with the raw data and let them analyse and visualise it as they saw fit.
This feature was not clearly communicated to our users and was therefore not used at all.
Also, it turns out that the implementation is buggy in that it overwrites existing data such that only the GEN_KW that was specified last in ert's config is available.
Works for the poly-case, but not for configs with multiple GEN_KW's.

Approach

Save cross correlations to a parquet file with the following schema:

The obs_name is created by appending a suffix to obs_group. The suffix is a counter that depends on the number of observations within each group.
Data is stored in "tidy format" to make it easy for users to query the data however they want.

Testing using the poly-case (see `Plot_correlations.ipynb` in the poly_example test folder)

Note that the poly-case has just one GEN_KW.

Testing using the heat equation (see `Plot_correlations.ipynb` in the heat_equation test folder)

To test with multiple GEN_KWs, I've added two GEN_KWs to the heat-equation, and created a Jupyter Notebook that plots correlations for all iterations. See diff for all plots. Here's an example of one of the parameters at iteration 0:

Testing using Drogon

The FMU-research team has done testing using Drogon.

Here's an example I created using Drogon:

PR title captures the intent of the changes, and is fitting for release notes.
Added appropriate release note label
Commit history is consistent and clean, in line with the contribution guidelines.
Make sure unit tests pass locally after every commit (git rebase -i main --exec 'pytest tests/ert/unit_tests -n logical -m "not integration_test"')

When applicable

When there are user facing changes: Updated documentation
New behavior or changes to existing untested code: Ensured that unit tests are added (See Ground Rules).
Large PR: Prepare changes in small commits for more convenient review
Bug fix: Add regression test for the bug
Bug fix: Create Backport PR to latest release

xjules · 2024-12-12T08:16:43Z

There are some failing tests though.

src/ert/analysis/_es_update.py

codspeed-hq · 2025-01-03T10:46:55Z

CodSpeed Performance Report

Merging #9426 will degrade performances by 11.04%

_{Comparing dafeda:heat-equation-gen-kws (5ad84b8) with main (a6d2dba)}

Summary

❌ 1 regressions
✅ 23 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

	Benchmark	`main`	`dafeda:heat-equation-gen-kws`	Change
❌	`test_direct_dark_performance_with_storage[gen_x: 20, sum_x: 20 reals: 10-gen_data_with_obs-get_record_observations]`	2.3 ms	2.6 ms	-11.04%

codecov-commenter · 2025-01-07T08:45:43Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 91.73%. Comparing base (d82fc01) to head (5ad84b8).

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #9426      +/-   ##
==========================================
+ Coverage   82.69%   91.73%   +9.03%     
==========================================
  Files         426      426              
  Lines       26559    26566       +7     
==========================================
+ Hits        21964    24371    +2407     
+ Misses       4595     2195    -2400

Flag	Coverage Δ
cli-tests	`39.72% <100.00%> (?)`
everest-models-test	`34.14% <21.42%> (-0.01%)`	⬇️
gui-tests	`71.86% <85.71%> (?)`
integration-test	`38.03% <21.42%> (-0.01%)`	⬇️
performance-tests	`51.63% <28.57%> (-0.01%)`	⬇️
test	`39.40% <21.42%> (-0.01%)`	⬇️
unit-tests	`74.18% <28.57%> (-0.03%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

oyvindeide · 2025-01-07T14:05:04Z

src/ert/analysis/_es_update.py

+    # Used as labels for observations in cross-correlation matrix.
+    # Say we have two observation groups "FOPR" and "WOPR" where "FOPR" has
+    # 2 responses and "WOPR" has 3.
+    # In this case we create a list [FOPR_0, FOPR_1, WOPR_0, WOPR_1, WOPR_2]


Would it be possible to use the index for the observations here instead? I guess it is ok for summary observations, though perhaps a bit confusing as this number will not align with report_step, but for GEN_OBS which already have an int index it would probably not align with that, and could be confusing.

Not sure. Do I remember correctly that the index is a time-stamp or date for summary observations? We could perhaps still use it but perhaps a bit messy.

Yes, was thinking a bit more about this, and should we consider doing it like we do for the snapshot in the gui showing the scaling factors when doing auto scaling? Those events are automatically written to csv by the client, so will be available to the user.

Something like:

ert/src/ert/analysis/_es_update.py

Line 239 in 7127528

AnalysisDataEvent(

, which alread uses the indices:

ert/src/ert/analysis/_es_update.py

Line 252 in 7127528

indexes[obs_group_mask],

As far as I know, these are the two types of indexes we have:

Are you suggesting we use the second part of the string as an index?

Are you suggesting we use the second part of the string as an index?

Yes, or perhaps send the whole thing as an event instead of saving it in storage. Then everyone gets the benefit right away.

Not sure I follow the event thing. Would we then still be able to analyse the data from a Jupyter Notebook?

oyvindeide · 2025-01-07T14:06:16Z

src/ert/storage/local_ensemble.py

+        file_path = os.path.join(self.mount_point, "corr_XY.parquet")
+        if os.path.exists(file_path):
+            existing_df = pl.read_parquet(file_path)
+            df = pl.concat([existing_df, new_df])


This might be a bit confusing if we just merge with existing data?

What do you find confusing?

If there is something on disk, we merge with that? Couldnt that potentially be done with a different threshold, and so no longer valid? I might be misunderstanding.

Good point. self.mount_point points to a specific ensemble. Can we run the same ensemble with different thresholds?

dafeda changed the title ~~Commitin~~ Make corrXY work with multiple GEN_KWs Dec 3, 2024

dafeda self-assigned this Dec 3, 2024

dafeda force-pushed the heat-equation-gen-kws branch 13 times, most recently from b04af00 to a147425 Compare December 5, 2024 08:10

dafeda requested a review from oyvindeide December 6, 2024 09:27

dafeda force-pushed the heat-equation-gen-kws branch 4 times, most recently from 1c392a1 to c44eac7 Compare December 11, 2024 07:52

dafeda marked this pull request as ready for review December 11, 2024 07:52

dafeda force-pushed the heat-equation-gen-kws branch from c44eac7 to 124777c Compare December 11, 2024 08:15

dafeda force-pushed the heat-equation-gen-kws branch from 124777c to 1e656a4 Compare December 12, 2024 08:29

eivindjahren added the christmas-review label Dec 13, 2024

dafeda force-pushed the heat-equation-gen-kws branch 2 times, most recently from d29f1c3 to 359eb5a Compare December 13, 2024 12:30

xjules reviewed Dec 13, 2024

View reviewed changes

src/ert/analysis/_es_update.py Show resolved Hide resolved

dafeda force-pushed the heat-equation-gen-kws branch from 359eb5a to 161940d Compare December 13, 2024 13:53

dafeda mentioned this pull request Jan 3, 2025

Return all correlation pairs equinor/iterative_ensemble_smoother#222

Merged

dafeda force-pushed the heat-equation-gen-kws branch 3 times, most recently from 7eaa3e4 to 766c012 Compare January 3, 2025 10:23

dafeda added release-notes:bug-fix Automatically categorise as bug fix in release notes and removed christmas-review labels Jan 3, 2025

dafeda force-pushed the heat-equation-gen-kws branch from 766c012 to 29ab039 Compare January 3, 2025 10:53

Return cross correlations for multiple GEN_KWs

5ad84b8

dafeda force-pushed the heat-equation-gen-kws branch from 29ab039 to 5ad84b8 Compare January 7, 2025 08:27

oyvindeide reviewed Jan 7, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make corrXY work with multiple GEN_KWs #9426

Make corrXY work with multiple GEN_KWs #9426

dafeda commented Dec 3, 2024 •

edited

Loading

xjules commented Dec 12, 2024

codspeed-hq bot commented Jan 3, 2025 •

edited

Loading

codecov-commenter commented Jan 7, 2025

oyvindeide Jan 7, 2025

dafeda Jan 7, 2025

oyvindeide Jan 8, 2025

dafeda Jan 8, 2025

oyvindeide Jan 8, 2025

dafeda Jan 8, 2025

oyvindeide Jan 7, 2025

dafeda Jan 7, 2025

oyvindeide Jan 8, 2025

dafeda Jan 8, 2025

Make corrXY work with multiple GEN_KWs #9426

Are you sure you want to change the base?

Make corrXY work with multiple GEN_KWs #9426

Conversation

dafeda commented Dec 3, 2024 • edited Loading

Introduction

Approach

Testing using the poly-case (see Plot_correlations.ipynb in the poly_example test folder)

Testing using the heat equation (see Plot_correlations.ipynb in the heat_equation test folder)

Testing using Drogon

When applicable

xjules commented Dec 12, 2024

codspeed-hq bot commented Jan 3, 2025 • edited Loading

Merging #9426 will degrade performances by 11.04%

Summary

Benchmarks breakdown

codecov-commenter commented Jan 7, 2025

Codecov Report

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dafeda commented Dec 3, 2024 •

edited

Loading

Testing using the poly-case (see `Plot_correlations.ipynb` in the poly_example test folder)

Testing using the heat equation (see `Plot_correlations.ipynb` in the heat_equation test folder)

codspeed-hq bot commented Jan 3, 2025 •

edited

Loading