Clean up identify_variables / identify_mutable_parameters; deprecate SimpleExpressionVisitor #3436

Open: wants to merge 13 commits into base: main

jsiirola
Member

Fixes # .

Summary/Motivation:

This cleans up identify_variables to simplify the implementation, improve efficiency, and improve the robustness of the named expression cache:

  • Reduce redundancy in the implementation
  • Avoid allocating data structures for caching named_expressions unless the cache is defined and a named expression is actually encountered
  • Improve the cache robustness, including robustly handling cache invalidation if the named expressions have changed.

In addition:

  • identify_mutable_parameters is moved to build on identify_variables (by deriving from the IdentifyVariableVisitor)
  • identify_components is switched to use the StreamBasedExpressionVisitor

This allows us to (finally) deprecate the SimpleExpressionVisitor and remove it from the documentation.

Changes proposed in this PR:

  • (see above)

Legal Acknowledgement

By contributing to this software project, I have read the contribution guide and agree to the following terms and conditions for my contribution:

  1. I agree my contributions are submitted under the BSD license.
  2. I represent I am authorized to make the contributions and grant the license. If my employer has rights to intellectual property that includes these contributions, I represent that I have received permission to make contributions and grant the required license on behalf of that employer.

@jsiirola jsiirola changed the title from "Clean up identify_variables / identify_mutable_parameters; deprecate SImpleExpressionVisitor" to "Clean up identify_variables / identify_mutable_parameters; deprecate SimpleExpressionVisitor" on Nov 26, 2024
Contributor

@mrmundt mrmundt left a comment

Tiniest of changes

Co-authored-by: Miranda Mundt <[email protected]>
Contributor

@Robbybp Robbybp left a comment

Performance comparison on my motivating example for the previous identify_variables rewrite:

Main

Identifier                                       ncalls   cumtime   percall      %
----------------------------------------------------------------------------------
root                                                  1     2.905     2.905  100.0
     -----------------------------------------------------------------------------
     full model post-solve                            1     0.484     0.484   16.7
     solve-scc                                        1     2.421     2.421   83.3
                          --------------------------------------------------------
                          igraph                      1     0.359     0.359   14.8
                          scc-subsolver             546     1.610     0.003   66.5
                          vars-from-components      546     0.187     0.000    7.7
                          other                     n/a     0.264       n/a   10.9
                          ========================================================
     other                                          n/a     0.000       n/a    0.0
     =============================================================================
==================================================================================

This branch

Identifier                                       ncalls   cumtime   percall      %
----------------------------------------------------------------------------------
root                                                  1     2.960     2.960  100.0
     -----------------------------------------------------------------------------
     full model post-solve                            1     0.482     0.482   16.3
     solve-scc                                        1     2.478     2.478   83.7
                          --------------------------------------------------------
                          igraph                      1     0.358     0.358   14.4
                          scc-subsolver             546     1.536     0.003   62.0
                          vars-from-components      546     0.326     0.001   13.1
                          other                     n/a     0.259       n/a   10.4
                          ========================================================
     other                                          n/a     0.000       n/a    0.0
     =============================================================================
==================================================================================

Less than a factor of 2 overhead (see the vars-from-components category) and still much better than the 2s this was taking before exploiting named expressions. Thanks for fixing this.
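
As a quick check on the two profiles above, the vars-from-components timings can be compared directly. This is only arithmetic on the reported cumtime values; the variable names are made up for illustration.

```python
# cumtime (seconds) for vars-from-components, copied from the two profiles
main_s, branch_s = 0.187, 0.326
ncalls = 546

ratio = branch_s / main_s            # relative overhead of this branch
extra_s = branch_s - main_s          # absolute difference in this category
extra_per_call_s = extra_s / ncalls  # difference spread over the 546 calls
root_delta_s = 2.960 - 2.905         # difference in total (root) runtime

print(f"ratio={ratio:.2f}, extra={extra_s:.3f}s, "
      f"per call={extra_per_call_s:.6f}s, root delta={root_delta_s:.3f}s")
```

The ratio comes out a little above 1.7, consistent with "less than a factor of 2", and the absolute difference in this category is about 0.14 s.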

# The following attributes will be added by initializeWalker:
# self._objs: the list of found objects
# self._seen: set(self._objs)
# self._exprs: list of (e, e.expr) for any (nested) named expressions
Contributor

Is there any reason we need to store e.expr in addition to e?

Contributor

I see that we need to store the immutable expression object in case e changes what expression it contains.
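
A minimal sketch of why the (e, e.expr) pair matters, using toy stand-ins rather than the real Pyomo classes: NamedExpr, cache_entry, and is_valid are hypothetical names invented for this illustration. The idea is that a cached result is only trusted if every named expression still holds the same expression object it held when the entry was built.

```python
class NamedExpr:
    """Hypothetical stand-in for a named expression: a mutable container
    whose .expr attribute can be re-pointed at a different expression."""

    def __init__(self, expr):
        self.expr = expr


def cache_entry(variables, named_exprs):
    # Record the found variables plus (e, e.expr) for every named
    # expression that contributed to the result.
    return (tuple(variables), [(e, e.expr) for e in named_exprs])


def is_valid(entry):
    # The entry is stale if any named expression now holds a different
    # expression object than the one recorded when the entry was built.
    _, exprs = entry
    return all(e.expr is recorded for e, recorded in exprs)


inner = NamedExpr(("x", "+", "y"))
entry = cache_entry(["x", "y"], [inner])
valid_before = is_valid(entry)

inner.expr = ("x", "*", "z")  # the named expression is reassigned
valid_after = is_valid(entry)

print(valid_before, valid_after)
```

Storing only e would not be enough here: after the reassignment, e is still the same object, so there would be nothing to compare against.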

Comment on lines +1470 to +1471
if self._cache is None:
    return True, None
Contributor

To check my understanding: This means that if named_expression_cache was not provided in __init__, we won't exploit repeated named expressions within this expression?

Member Author

Correct. My assumption is that the same named expression rarely appears twice in a single expression. So, if you don't provide a cache, then there isn't a big win for defining a cache -- there probably won't be a cache hit (and there is the overhead of creating the cache and throwing it away).

Contributor

That's probably a good assumption in general, but I know that for some IDAES models (at least the autothermal reformer), there is a significant benefit to exploiting named expressions within a single constraint. (Of course the user can always do this by explicitly passing the named expression cache.)
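
The trade-off discussed in this thread can be sketched with a toy walker. Everything below (Var, NamedExpr, find_vars, the stats counter) is a hypothetical illustration and not the Pyomo implementation; it only shows that a repeated named expression is walked again when no cache is passed, and walked once when the caller supplies one.

```python
class Var:
    """Toy variable (hypothetical; not a Pyomo class)."""

    def __init__(self, name):
        self.name = name


class NamedExpr:
    """Toy named expression wrapping a tuple of child nodes."""

    def __init__(self, expr):
        self.expr = expr


def find_vars(node, cache=None, stats=None):
    """Collect Var objects in first-seen order.

    cache: optional dict mapping NamedExpr -> (expr, vars); None disables reuse.
    stats: dict counting how often a named expression body is walked.
    """
    if stats is None:
        stats = {"descents": 0}
    found, seen = [], set()

    def add(v):
        if v not in seen:
            seen.add(v)
            found.append(v)

    def walk(n):
        if isinstance(n, Var):
            add(n)
        elif isinstance(n, NamedExpr):
            hit = cache.get(n) if cache is not None else None
            if hit is not None and hit[0] is n.expr:
                sub = hit[1]            # cache hit: reuse the earlier result
            else:
                stats["descents"] += 1  # miss (or no cache): walk the body
                sub = find_vars(n.expr, cache, stats)
                if cache is not None:
                    cache[n] = (n.expr, sub)
            for v in sub:
                add(v)
        else:
            for child in n:             # plain node: a tuple of children
                walk(child)

    walk(node)
    return found


x, y = Var("x"), Var("y")
e = NamedExpr((x, y))
expr = (e, x, e)  # the same named expression appears twice

no_cache = {"descents": 0}
vars_a = find_vars(expr, cache=None, stats=no_cache)

with_cache = {"descents": 0}
vars_b = find_vars(expr, cache={}, stats=with_cache)

print(no_cache["descents"], with_cache["descents"])
```

Both calls return the same variables; only the number of times the named expression body is walked differs.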

Comment on lines +1508 to +1518
    v = identify_variables.visitor
    save = v._include_fixed, v._cache
    try:
        v._include_fixed = include_fixed
        v._cache = named_expression_cache
        yield from v.walk_expression(expr)
    finally:
        v._include_fixed, v._cache = save


identify_variables.visitor = IdentifyVariableVisitor()
Contributor

Why use a global visitor instead of a new one every time identify_variables is called? Is there a significant overhead to __init__?

Member Author

There is a bit of overhead in object creation and disposal. In other cases, I have seen performance benefits by keeping the visitor around between calls. I did not profile the impact here, so maybe this is a red herring?

Contributor

I have seen this in AMPLRepnVisitor, so I'm not too surprised, but it hasn't shown up in any of my profiles.
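
The shared-visitor pattern under discussion can be sketched in isolation. ToyVisitor and find_names below are hypothetical stand-ins, not Pyomo code; the sketch only mirrors the shape of the quoted snippet: one long-lived visitor stored as a function attribute, with per-call options saved and restored around the walk.

```python
class ToyVisitor:
    """Hypothetical visitor that keeps per-call options as attributes."""

    def __init__(self):
        self._include_fixed = False
        self._cache = None

    def walk(self, items):
        # items: iterable of (name, is_fixed) pairs
        for name, fixed in items:
            if self._include_fixed or not fixed:
                yield name


def find_names(items, include_fixed=True, cache=None):
    v = find_names.visitor                 # reuse the single shared visitor
    save = v._include_fixed, v._cache      # remember the previous options
    try:
        v._include_fixed = include_fixed
        v._cache = cache
        yield from v.walk(items)
    finally:
        v._include_fixed, v._cache = save  # always restore them


# Created once at import time instead of once per call
find_names.visitor = ToyVisitor()

items = [("x", False), ("y", True)]
all_names = list(find_names(items, include_fixed=True))
free_names = list(find_names(items, include_fixed=False))
state_after = (find_names.visitor._include_fixed, find_names.visitor._cache)

print(all_names, free_names, state_after)
```

The try/finally means the previous options are put back once the generator is exhausted or closed, including when an exception propagates out of the walk.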

@jsiirola
Member Author

@Robbybp: I am surprised that the overhead went up. Can you share your test with me (offline is fine)? I wonder if we are measuring something else (like the GC)?

@Robbybp
Contributor

Robbybp commented Dec 17, 2024

@jsiirola Run this script: https://github.com/Robbybp/surrogate-vs-implicit/blob/main/svi/auto_thermal_reformer/fullspace_flowsheet.py

I insert the timing calls into incidence_analysis.scc_solver and util.subsystems manually when I profile this, so this won't give you the detailed profile, but you should still see the runtime jump.
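
One way to check whether garbage collection is part of what is being measured is to time the same callable with the collector enabled and disabled. The helper below is a generic sketch using only the standard library; time_call and workload are hypothetical names, and the workload is a placeholder rather than the flowsheet script linked above.

```python
import gc
import time


def time_call(fn, repeat=5, disable_gc=False):
    """Return the best wall-clock time (seconds) over `repeat` calls to fn."""
    was_enabled = gc.isenabled()
    if disable_gc:
        gc.disable()
    try:
        best = float("inf")
        for _ in range(repeat):
            start = time.perf_counter()
            fn()
            best = min(best, time.perf_counter() - start)
        return best
    finally:
        if was_enabled:
            gc.enable()  # put the collector back the way we found it


def workload():
    # Placeholder workload that allocates many short-lived containers
    return [[i] for i in range(50_000)]


gc_state_before = gc.isenabled()
t_gc = time_call(workload)
t_nogc = time_call(workload, disable_gc=True)

print(f"gc on: {t_gc:.4f}s, gc off: {t_nogc:.4f}s")
```

If the two timings differ noticeably for the real test case, part of the reported overhead may come from the collector rather than from the visitor itself.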

@blnicho
Member

blnicho commented Jan 14, 2025

We are waiting to merge this until we double-check the performance degradation @Robbybp noted.

@Robbybp
Contributor

Robbybp commented Jan 15, 2025

FWIW, the evidence I presented for a performance degradation is pretty flimsy (~0.15s diff), so I wouldn't be opposed to just merging this as-is, given the bug fixes.
