The gym interface for the dynamic system can benefit from the following improvements:
- Standardize the state-space model gym interface. For example, the gym observation should be `[x, d, y]`. We do not distinguish observable and unobservable states in this design; a user can add an `ObservationWrapper` to build more realistic observations (see the sketch after this list).
- User-specific environments, such as an RC model based on the linear state-space model, should use `ObservationWrapper`/`RewardWrapper` to inherit from the basic linear state-space model, and store additional latent dynamic states, such as interior/exterior wall temperatures, in `info`.
- Add a `TimeLimit` wrapper for the environment to distinguish truncated episodes from terminated episodes.
- Test and compare two gym designs: soft termination and hard termination. In soft termination, we allow the system state to go beyond its limits, such as the zone temperature upper/lower bounds. In hard termination, we terminate the episode as soon as a system state exceeds its bounds. Both modes appear in the sketch below.
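A minimal sketch of how these items could fit together, using the `gymnasium` API. All names here (`LinearStateSpaceEnv`, `PartialObservation`, `hard_termination`, the matrix shapes) are hypothetical illustrations for discussion, not the actual interface on the branch:

```python
import numpy as np
import gymnasium as gym
from gymnasium.wrappers import TimeLimit


class LinearStateSpaceEnv(gym.Env):
    """x_{t+1} = A x_t + B u_t + E d_t,  y_t = C x_t (D assumed zero for brevity)."""

    def __init__(self, A, B, C, E, disturbances, x_low, x_high, hard_termination=False):
        self.A, self.B, self.C, self.E = A, B, C, E
        self.disturbances = disturbances          # known disturbance trajectory d_t
        self.x_low, self.x_high = x_low, x_high   # e.g. zone temperature bounds
        self.hard_termination = hard_termination
        nx, nu = A.shape[0], B.shape[1]
        nd, ny = E.shape[1], C.shape[0]
        # The observation is the full [x, d, y]; nothing is hidden at this level.
        self.observation_space = gym.spaces.Box(-np.inf, np.inf, shape=(nx + nd + ny,), dtype=np.float64)
        self.action_space = gym.spaces.Box(-1.0, 1.0, shape=(nu,), dtype=np.float64)

    def _obs(self):
        d = self.disturbances[min(self.t, len(self.disturbances) - 1)]
        return np.concatenate([self.x, d, self.C @ self.x])

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.t = 0
        self.x = np.zeros(self.A.shape[0])
        return self._obs(), {}

    def step(self, u):
        d = self.disturbances[self.t]
        self.x = self.A @ self.x + self.B @ u + self.E @ d
        self.t += 1
        out_of_bounds = bool(np.any(self.x < self.x_low) or np.any(self.x > self.x_high))
        reward = -1.0 if out_of_bounds else 0.0   # placeholder penalty on violations
        # Soft termination keeps running past the bounds; hard termination ends
        # the episode the first time a bound is crossed.
        terminated = self.hard_termination and out_of_bounds
        return self._obs(), reward, terminated, False, {}


class PartialObservation(gym.ObservationWrapper):
    """More realistic observation: expose only [d, y] and hide the latent x."""

    def __init__(self, env, nx):
        super().__init__(env)
        self.nx = nx
        low = env.observation_space.low[nx:]
        high = env.observation_space.high[nx:]
        self.observation_space = gym.spaces.Box(low, high, dtype=np.float64)

    def observation(self, obs):
        return obs[self.nx:]                      # drop the latent state block


# TimeLimit makes a time-out report `truncated=True`, so a bound violation
# (`terminated=True`) stays distinguishable from running out of steps:
# env = TimeLimit(PartialObservation(LinearStateSpaceEnv(...), nx=4), max_episode_steps=96)
```

With this layering, the RC model would subclass or wrap `LinearStateSpaceEnv` and push its latent wall temperatures into `info` rather than the observation.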
This may raise a bigger problem:
If we use a differentiable simulator such as a state-space model as the gym env, and use that same model for end-to-end learning, the above approach is doable. But the same differentiable state-space model cannot support end-to-end learning when interacting with a real environment. For example, if we use an RC model in DRL for planning, we need to know the initial states, such as the interior wall temperature, at each planning step; these are not observable and thus not in the replay buffer. It seems we have to integrate a state estimator into the planning (a sketch follows).
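One way to make this concrete is to run a state estimator alongside the agent: before each planning step, a standard Kalman filter reconstructs the latent model states (e.g. wall temperatures) from the logged inputs and measurements, and the estimate seeds the differentiable model's rollout. A minimal sketch, assuming the same A, B, C, E matrices as above; `plan_step` and `model.rollout` are hypothetical placeholders:

```python
import numpy as np


class KalmanStateEstimator:
    """Kalman filter for x_{t+1} = A x + B u + E d + w,  y = C x + v."""

    def __init__(self, A, B, C, E, Q, R, x0, P0):
        self.A, self.B, self.C, self.E = A, B, C, E
        self.Q, self.R = Q, R        # process / measurement noise covariances
        self.x, self.P = x0, P0      # current state estimate and its covariance

    def update(self, u, d, y):
        # Predict with the known model and the applied inputs.
        x_pred = self.A @ self.x + self.B @ u + self.E @ d
        P_pred = self.A @ self.P @ self.A.T + self.Q
        # Correct with the new measurement y.
        S = self.C @ P_pred @ self.C.T + self.R
        K = P_pred @ self.C.T @ np.linalg.inv(S)
        self.x = x_pred + K @ (y - self.C @ x_pred)
        self.P = (np.eye(len(self.x)) - K @ self.C) @ P_pred
        return self.x                # full state, incl. unobservable wall temps


def plan_step(model, estimator, u_prev, d_prev, y_obs, horizon):
    """Seed the planner with the estimated (unobservable) initial state."""
    x0 = estimator.update(u_prev, d_prev, y_obs)
    return model.rollout(x0, horizon)  # hypothetical differentiable rollout
```

The estimate (or the filter's whole belief) could also be written to `info` and stored in the replay buffer, which is roughly what the next item points at.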
Reconstructing model states from the DRL env states to support planning might be a new contribution.
Use branch `improve/gym-interface-design`.