Gym interface design for state space model #14

Open
1 of 4 tasks
YangyangFu opened this issue Apr 3, 2023 · 1 comment

Comments

@YangyangFu
Owner

YangyangFu commented Apr 3, 2023

The gym interface for the dynamic system can benefit from the following improvements:

  • Standardize the state-space model gym interface. For example, the gym observation should be [x, d, y]. This design does not distinguish observable states from unobservable states. Users can add an ObservationWrapper to design more realistic observations.
  • User-specific environments, such as an RC model based on the linear state-space model, should use ObservationWrapper/RewardWrapper to inherit from the basic linear state-space model, and store additional latent dynamic states, such as interior/exterior wall temperatures, in info.
  • Add a TimeLimit wrapper for the environment to distinguish truncated episodes from terminated episodes.
  • Test and compare two gym designs: soft termination and hard termination. In soft termination, we allow the system states to go beyond their limits, such as the zone temperature upper/lower bounds. In hard termination, we terminate the episode once a system state exceeds its bounds.
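
To make the points above concrete, here is a minimal sketch of what such an interface could look like. It is not the code in this repository: `LinearSSEnv`, `rollout`, `make_env`, the reward, and all numbers are hypothetical, and the class only mimics the Gymnasium `reset`/`step` signatures in plain Python so that it runs without dependencies. The `max_steps` argument stands in for what a `TimeLimit` wrapper would do in a real Gymnasium environment.

```python
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(M: Matrix, v: Vector) -> Vector:
    """Matrix-vector product on plain lists."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


class LinearSSEnv:
    """Hypothetical gym-style env for x[k+1] = A x + Bu u + Bd d, y = C x."""

    def __init__(self, A, Bu, Bd, C, x0, x_low, x_high, d_seq,
                 hard_termination=True, max_steps=24):
        self.A, self.Bu, self.Bd, self.C = A, Bu, Bd, C
        self.x0, self.x_low, self.x_high = x0, x_low, x_high
        self.d_seq = d_seq                  # disturbance trajectory, cycled
        self.hard_termination = hard_termination
        self.max_steps = max_steps          # stands in for a TimeLimit wrapper
        self.x, self.k = list(x0), 0

    def _d(self) -> Vector:
        return self.d_seq[self.k % len(self.d_seq)]

    def _obs(self) -> Vector:
        # Observation is the concatenation [x, d, y].
        return self.x + self._d() + matvec(self.C, self.x)

    def reset(self) -> Tuple[Vector, Dict]:
        self.x, self.k = list(self.x0), 0
        return self._obs(), {}

    def step(self, u: Vector):
        ax = matvec(self.A, self.x)
        bu = matvec(self.Bu, u)
        bd = matvec(self.Bd, self._d())
        self.x = [a + b + c for a, b, c in zip(ax, bu, bd)]
        self.k += 1
        # Distance outside [x_low, x_high]; zero while inside the bounds.
        excess = [max(lo - x, 0.0) + max(x - hi, 0.0)
                  for x, lo, hi in zip(self.x, self.x_low, self.x_high)]
        violated = any(e > 0.0 for e in excess)
        # Hard termination ends the episode on a violation; soft only penalizes.
        terminated = self.hard_termination and violated
        truncated = (not terminated) and self.k >= self.max_steps
        reward = -sum(excess)               # placeholder penalty, not from the issue
        return self._obs(), reward, terminated, truncated, {"violated": violated}


def rollout(env: LinearSSEnv, u: Vector):
    """Apply a constant action until the episode terminates or is truncated."""
    env.reset()
    steps, terminated, truncated = 0, False, False
    while not (terminated or truncated):
        _, _, terminated, truncated, _ = env.step(u)
        steps += 1
    return steps, terminated, truncated


def make_env(hard: bool) -> LinearSSEnv:
    # Scalar zone-temperature toy model; all numbers are illustrative.
    return LinearSSEnv(A=[[0.9]], Bu=[[0.5]], Bd=[[0.1]], C=[[1.0]],
                       x0=[20.0], x_low=[18.0], x_high=[26.0],
                       d_seq=[[30.0]], hard_termination=hard, max_steps=6)
```

Calling `rollout(make_env(True), [2.0])` ends with `terminated=True` as soon as the temperature leaves its bounds, while `rollout(make_env(False), [2.0])` runs to the step limit and ends with `truncated=True`. Keeping the two flags separate is what lets a learner treat a bound violation differently from a time cutoff.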

This may raise a bigger problem:

  • If we use a differentiable simulator such as a state-space model as the gym env, and use that same model for end-to-end learning, the above approach is doable. However, the same differentiable state-space model cannot support end-to-end learning when interacting with a real environment. For example, if we use an RC model in DRL for planning, we need to know the initial states, such as the interior wall temperature, at each planning step. These states are not observable and thus are not in the replay buffer. It seems we have to integrate a state estimator into the planning.
  • Reconstructing model states from DRL env states to support planning might be a new contribution.
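
As an illustration of what "integrating a state estimator" could mean, the sketch below uses a Luenberger observer to reconstruct an unmeasured wall temperature from the measured zone temperature. This is only one possible estimator, chosen here for brevity; the issue does not prescribe it. The two-state RC model, the matrices `A`, `B`, `C`, the gain `L`, and all function names are hypothetical.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(M: Matrix, v: Vector) -> Vector:
    """Matrix-vector product on plain lists."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


# Hypothetical 2-state RC model: x = [zone temperature, wall temperature].
A = [[0.8, 0.15], [0.1, 0.85]]
B = [[0.05], [0.0]]
C = [[1.0, 0.0]]        # only the zone temperature is measured
# Observer gain chosen so that A - L C has eigenvalues 0.3 and 0.4.
L = [0.95, 1.75]


def plant_step(x: Vector, u: Vector) -> Vector:
    """One step of x[k+1] = A x + B u."""
    return [a + b for a, b in zip(matvec(A, x), matvec(B, u))]


def observer_step(x_hat: Vector, u: Vector, y: float) -> Vector:
    """x_hat[k+1] = A x_hat + B u + L (y - C x_hat)."""
    innovation = y - matvec(C, x_hat)[0]
    prediction = plant_step(x_hat, u)
    return [p + gain * innovation for p, gain in zip(prediction, L)]


def estimate_wall_temperature(n_steps: int = 30) -> Tuple[Vector, Vector]:
    x = [21.0, 15.0]        # true state; the wall temperature is hidden
    x_hat = [21.0, 21.0]    # initial guess: wall assumed equal to zone
    for _ in range(n_steps):
        u = [10.0]                      # illustrative constant heating input
        y = matvec(C, x)[0]             # measurement available to the agent
        x_hat = observer_step(x_hat, u, y)
        x = plant_step(x, u)
    return x, x_hat
```

In a planning loop, an estimate such as `x_hat` would be what initializes the model at each planning step, since the replay buffer only holds the measured quantities.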

Use branch improve/gym-interface-design.

@YangyangFu YangyangFu self-assigned this Apr 3, 2023
@YangyangFu YangyangFu added the enhancement New feature or request label Apr 3, 2023
@YangyangFu
Owner Author

The gym interface was designed and merged in pull request #27.

However, some work remains to meet all of these design requirements.
