The learning experience at the unoccupied time #94

Open
terrancelu92 opened this issue Dec 19, 2021 · 1 comment
Labels
question Further information is requested

Comments

@terrancelu92
Collaborator

I wonder whether learning performance would improve if the experience collected during unoccupied time were discarded rather than added to the buffer. The reward during unoccupied time is zero no matter which action is taken, while the reward during occupied time is negative. Will the agent treat the zero reward at unoccupied time as a relatively positive reward, adding noise to the learning process? How would this affect off-policy and on-policy methods, respectively? Did you consider this in your single-zone model? @shichao2023
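For concreteness, the filtering idea in the question could look roughly like the sketch below. It is only an illustration: the `Transition` layout, the `is_occupied` schedule (08:00 to 18:00), and the `OccupiedOnlyBuffer` class are all hypothetical names, not taken from this project's code.

```python
import random
from collections import deque
from typing import Deque, List, NamedTuple, Sequence


class Transition(NamedTuple):
    state: Sequence[float]
    action: int
    reward: float
    next_state: Sequence[float]
    done: bool


def is_occupied(hour: float, start: float = 8.0, end: float = 18.0) -> bool:
    """Hypothetical occupancy schedule: occupied for start <= hour < end."""
    return start <= hour < end


class OccupiedOnlyBuffer:
    """Replay buffer that silently drops transitions from unoccupied hours."""

    def __init__(self, capacity: int) -> None:
        self._data: Deque[Transition] = deque(maxlen=capacity)

    def add(self, transition: Transition, hour: float) -> bool:
        # Return False (and store nothing) when the step was unoccupied.
        if not is_occupied(hour):
            return False
        self._data.append(transition)
        return True

    def sample(self, batch_size: int) -> List[Transition]:
        # Uniform sampling without replacement, capped at the buffer size.
        return random.sample(list(self._data), min(batch_size, len(self._data)))

    def __len__(self) -> int:
        return len(self._data)
```

Note that the last occupied transition of each day would still carry a `next_state` that falls in unoccupied time, so its treatment (for example, marking it as terminal) has to be decided separately.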

@terrancelu92 terrancelu92 added the question Further information is requested label Dec 19, 2021
@shichao2023
Collaborator

If you want to exclude unoccupied time from training, I suggest setting the length of each epoch to cover occupied time only. (We may need further discussion on this after the holiday. Basically, it would be better if we could set the training environment to a different date at the beginning of each epoch.) The behavior during unoccupied time can also be viewed as part of the system dynamics. Since time is included in the system state, the agent should be able to learn that behavior. We kept that part only for convenience. If you want to discard it, you need to decide what the system state is at the start of the next day. If you set it manually, that is the same as setting the length of each training epoch to occupied time, as I mentioned above.
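A minimal sketch of the "epoch covers occupied time only, starting on a different date each time" idea might look like the following. Everything here is an assumption for illustration: `ToyZoneEnv` is a stand-in with a made-up `reset(day, hour)` signature and toy dynamics, the 08:00 to 18:00 occupied window is hypothetical, and the manually chosen initial temperature plays the role of "the system state the next day".

```python
import random
from typing import Callable, List, Tuple

OCC_START, OCC_END = 8, 18  # hypothetical occupied hours
STEP_HOURS = 1              # hypothetical control interval

State = Tuple[int, float]   # (hour of day, zone temperature)


class ToyZoneEnv:
    """Stand-in for the real environment; not the project's model."""

    def reset(self, day: int, hour: int) -> State:
        # The initial state is set manually here, one per simulated day.
        self.day, self.hour = day, hour
        self.temp = 22.0 + (day % 5)
        return (self.hour, self.temp)

    def step(self, action: float) -> Tuple[State, float, bool]:
        self.temp += action
        self.hour += STEP_HOURS
        reward = -abs(self.temp - 22.0)   # non-positive during occupancy
        done = self.hour >= OCC_END       # episode ends when occupancy ends
        return (self.hour, self.temp), reward, done


def run_occupied_episode(
    env: ToyZoneEnv,
    policy: Callable[[State], float],
    rng: random.Random,
) -> List[tuple]:
    """Run one episode on a random day, covering occupied hours only."""
    state = env.reset(day=rng.randrange(365), hour=OCC_START)
    transitions = []
    done = False
    while not done:
        action = policy(state)
        next_state, reward, done = env.step(action)
        transitions.append((state, action, reward, next_state, done))
        state = next_state
    return transitions
```

With this layout no unoccupied transition is ever generated, so nothing needs to be filtered out of the buffer afterwards.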

2 participants