-
Notifications
You must be signed in to change notification settings - Fork 526
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Clean up identify_variables
/ identify_mutable_parameters
; deprecate SimpleExpressionVisitor
#3436
base: main
Are you sure you want to change the base?
Conversation
identify_variables
/ identify_mutable_parameters
; deprecate SImpleExpressionVisitor
identify_variables
/ identify_mutable_parameters
; deprecate SimpleExpressionVisitor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tiniest of changes
Co-authored-by: Miranda Mundt <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Performance comparison on my motivating example for the previous identify_variables
rewrite:
Main
Identifier ncalls cumtime percall %
----------------------------------------------------------------------------------
root 1 2.905 2.905 100.0
-----------------------------------------------------------------------------
full model post-solve 1 0.484 0.484 16.7
solve-scc 1 2.421 2.421 83.3
--------------------------------------------------------
igraph 1 0.359 0.359 14.8
scc-subsolver 546 1.610 0.003 66.5
vars-from-components 546 0.187 0.000 7.7
other n/a 0.264 n/a 10.9
========================================================
other n/a 0.000 n/a 0.0
=============================================================================
==================================================================================
This branch
Identifier ncalls cumtime percall %
----------------------------------------------------------------------------------
root 1 2.960 2.960 100.0
-----------------------------------------------------------------------------
full model post-solve 1 0.482 0.482 16.3
solve-scc 1 2.478 2.478 83.7
--------------------------------------------------------
igraph 1 0.358 0.358 14.4
scc-subsolver 546 1.536 0.003 62.0
vars-from-components 546 0.326 0.001 13.1
other n/a 0.259 n/a 10.4
========================================================
other n/a 0.000 n/a 0.0
=============================================================================
==================================================================================
Less than a factor of 2 overhead (see the vars-from-components
category) and still much better than the 2s this was taking before exploiting named expressions. Thanks for fixing this.
# The following attributes will be added by initializeWalker: | ||
# self._objs: the list of found objects | ||
# self._seen: set(self._objs) | ||
# self._exprs: list of (e, e.expr) for any (nested) named expressions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any reason we need to store e.expr
in addition to e
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that we need to store the immutable expression object in case e
changes what expression it contains.
if self._cache is None: | ||
return True, None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To check my understanding: This means that if named_expression_cache
was not provided in __init__
, we won't exploit repeated named expressions within this expression?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct. My assumption is that the same named expression rarely appears twice in a single expression. So, if you don't provide a cache, then there isn't a big win for defining a cache -- there probably won't be a cache hit (and there is the overhead of creating the cache and throwing it away).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's probably a good assumption in general, but I know that for some IDAES models (at least the autothermal reformer), there is a significant benefit to exploiting named expressions within a single constraint. (Of course the user can always do this by explicitly passing the named expression cache.)
v = identify_variables.visitor | ||
save = v._include_fixed, v._cache | ||
try: | ||
v._include_fixed = include_fixed | ||
v._cache = named_expression_cache | ||
yield from v.walk_expression(expr) | ||
finally: | ||
v._include_fixed, v._cache = save | ||
|
||
|
||
identify_variables.visitor = IdentifyVariableVisitor() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why use a global visitor instead of a new one every time identify_variables
is called? Is there a significant overhead to __init__
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a bit of overhead in object creation and disposal. In other cases, I have seen performance benefits by keeping the visitor around between calls. I did not profile the impact here, so maybe this is a red herring?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have seen this in AMPLRepnVisitor
, so I'm not too surprised, but it hasn't shown up in any of my profiles.
@Robbybp: I am surprised that the overhead went up. can you share your test with me (off-line is fine)? I wonder if we are measuring something else (like the GC)? |
@jsiirola Run this script: https://github.com/Robbybp/surrogate-vs-implicit/blob/main/svi/auto_thermal_reformer/fullspace_flowsheet.py I insert the timing calls into |
We are waiting to merge this until we double check the performance degradation @Robbybp noted |
FWIW, the evidence I presented for a performance degradation is pretty flimsy (~0.15s diff), so I wouldn't be opposed to just merging this as-is, given the bug fixes. |
Fixes # .
Summary/Motivation:
This cleans up the implementation of
identify_variables
to simplify the implementation, improve efficiency, and improve robustness of the named expression cache:In addition:
identify_mutables_parameters
is moved to build onidentify_variables
(by deriving from the IdentifyVariablesVisitor`)identify_components
to use theStreamBasedExpressionVisitor
This allows us to (finally) deprecate the
SimpleExpressionVisitor
and remove it from the documentationChanges proposed in this PR:
Legal Acknowledgement
By contributing to this software project, I have read the contribution guide and agree to the following terms and conditions for my contribution: