The gym interface for the dynamic system can benefit from the following improvements:
- Standardize the state-space model gym interface. For example, the gym observation should be `[x, d, y]`. We do not distinguish observable and unobservable states in this design; a user can add an `ObservationWrapper` to build more realistic observations (see the sketch after this list).
- User-specific environments, such as an RC model based on the linear state-space model, should use `ObservationWrapper`/`RewardWrapper` to inherit from the basic linear state-space model, and store additional latent dynamic states, such as interior/exterior wall temperatures, in `info`.
- Add a `TimeLimit` wrapper for the environment to distinguish truncated episodes from terminated episodes.
- Test and compare two gym designs: soft termination and hard termination. In soft termination, we allow the system state to go beyond its limits, such as the zone temperature upper/lower bounds. In hard termination, we terminate the episode as soon as a system state exceeds its bounds. Both modes appear in the sketch below.
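A minimal sketch of how these items could fit together, using the `gymnasium` API. All names here (`LinearStateSpaceEnv`, `PartialObservation`, `hard_termination`, the matrix shapes) are hypothetical illustrations for discussion, not the actual interface on the branch:

```python
import numpy as np
import gymnasium as gym
from gymnasium.wrappers import TimeLimit


class LinearStateSpaceEnv(gym.Env):
    """x_{t+1} = A x_t + B u_t + E d_t,  y_t = C x_t (D assumed zero for brevity)."""

    def __init__(self, A, B, C, E, disturbances, x_low, x_high, hard_termination=False):
        self.A, self.B, self.C, self.E = A, B, C, E
        self.disturbances = disturbances          # known disturbance trajectory d_t
        self.x_low, self.x_high = x_low, x_high   # e.g. zone temperature bounds
        self.hard_termination = hard_termination
        nx, nu = A.shape[0], B.shape[1]
        nd, ny = E.shape[1], C.shape[0]
        # The observation is the full [x, d, y]; nothing is hidden at this level.
        self.observation_space = gym.spaces.Box(-np.inf, np.inf, shape=(nx + nd + ny,), dtype=np.float64)
        self.action_space = gym.spaces.Box(-1.0, 1.0, shape=(nu,), dtype=np.float64)

    def _obs(self):
        d = self.disturbances[min(self.t, len(self.disturbances) - 1)]
        return np.concatenate([self.x, d, self.C @ self.x])

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.t = 0
        self.x = np.zeros(self.A.shape[0])
        return self._obs(), {}

    def step(self, u):
        d = self.disturbances[self.t]
        self.x = self.A @ self.x + self.B @ u + self.E @ d
        self.t += 1
        out_of_bounds = bool(np.any(self.x < self.x_low) or np.any(self.x > self.x_high))
        reward = -1.0 if out_of_bounds else 0.0   # placeholder penalty on violations
        # Soft termination keeps running past the bounds; hard termination ends
        # the episode the first time a bound is crossed.
        terminated = self.hard_termination and out_of_bounds
        return self._obs(), reward, terminated, False, {}


class PartialObservation(gym.ObservationWrapper):
    """More realistic observation: expose only [d, y] and hide the latent x."""

    def __init__(self, env, nx):
        super().__init__(env)
        self.nx = nx
        low = env.observation_space.low[nx:]
        high = env.observation_space.high[nx:]
        self.observation_space = gym.spaces.Box(low, high, dtype=np.float64)

    def observation(self, obs):
        return obs[self.nx:]                      # drop the latent state block


# TimeLimit makes a time-out report `truncated=True`, so a bound violation
# (`terminated=True`) stays distinguishable from running out of steps:
# env = TimeLimit(PartialObservation(LinearStateSpaceEnv(...), nx=4), max_episode_steps=96)
```

With this layering, the RC model would subclass or wrap `LinearStateSpaceEnv` and push its latent wall temperatures into `info` rather than the observation.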
This may raise a bigger problem:
If we use a differentiable simulator such as a state-space model as the gym env, and use that same model for end-to-end learning, the above approach is doable. But the same differentiable state-space model cannot support end-to-end learning when interacting with a real environment. For example, if we use an RC model in DRL for planning, we need to know the initial states, such as the interior wall temperature, at each planning step; these are not observable and thus not in the replay buffer. It seems we have to integrate a state estimator into the planning (a sketch follows).
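One way to make this concrete is to run a state estimator alongside the agent: before each planning step, a standard Kalman filter reconstructs the latent model states (e.g. wall temperatures) from the logged inputs and measurements, and the estimate seeds the differentiable model's rollout. A minimal sketch, assuming the same A, B, C, E matrices as above; `plan_step` and `model.rollout` are hypothetical placeholders:

```python
import numpy as np


class KalmanStateEstimator:
    """Kalman filter for x_{t+1} = A x + B u + E d + w,  y = C x + v."""

    def __init__(self, A, B, C, E, Q, R, x0, P0):
        self.A, self.B, self.C, self.E = A, B, C, E
        self.Q, self.R = Q, R        # process / measurement noise covariances
        self.x, self.P = x0, P0      # current state estimate and its covariance

    def update(self, u, d, y):
        # Predict with the known model and the applied inputs.
        x_pred = self.A @ self.x + self.B @ u + self.E @ d
        P_pred = self.A @ self.P @ self.A.T + self.Q
        # Correct with the new measurement y.
        S = self.C @ P_pred @ self.C.T + self.R
        K = P_pred @ self.C.T @ np.linalg.inv(S)
        self.x = x_pred + K @ (y - self.C @ x_pred)
        self.P = (np.eye(len(self.x)) - K @ self.C) @ P_pred
        return self.x                # full state, incl. unobservable wall temps


def plan_step(model, estimator, u_prev, d_prev, y_obs, horizon):
    """Seed the planner with the estimated (unobservable) initial state."""
    x0 = estimator.update(u_prev, d_prev, y_obs)
    return model.rollout(x0, horizon)  # hypothetical differentiable rollout
```

The estimate (or the filter's whole belief) could also be written to `info` and stored in the replay buffer, which is roughly what the next item points at.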
Reconstructing model states from the DRL env states to support planning might be a new contribution.
Use branch `improve/gym-interface-design`.