I wonder whether learning performance would improve if the experience collected during unoccupied time were discarded rather than added to the buffer. During unoccupied time the reward is zero regardless of the action taken, while during occupied time it is negative. Could the agent treat the zero reward at unoccupied time as a comparatively positive reward, which would add noise to the learning process? How would this affect off-policy and on-policy methods, respectively? Did you consider this in your single-zone model? @shichao2023
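The filtering idea described above can be sketched as a replay buffer that simply skips transitions collected during unoccupied hours. This is a minimal illustration, not code from this repository; the class name, the fixed `occupied_hours` schedule, and the `hour_of_day` argument are all hypothetical.

```python
import random
from collections import deque


class OccupancyFilteredBuffer:
    """Replay buffer that drops transitions collected during unoccupied
    hours (hypothetical sketch; the occupancy schedule is an assumption)."""

    def __init__(self, capacity=10000, occupied_hours=(8, 18)):
        self.buffer = deque(maxlen=capacity)
        self.occupied_hours = occupied_hours  # assumed fixed daily schedule

    def is_occupied(self, hour_of_day):
        start, end = self.occupied_hours
        return start <= hour_of_day < end

    def add(self, state, action, reward, next_state, done, hour_of_day):
        # Skip unoccupied-time experience: its reward is always zero,
        # which the agent could read as relatively "good" feedback
        # compared to the negative occupied-time rewards.
        if not self.is_occupied(hour_of_day):
            return
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))
```

Note that a filter like this only changes what an off-policy learner replays; an on-policy learner would instead need the episode itself restricted to occupied hours, since it cannot simply drop transitions from its rollout.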
If you want to discard training during unoccupied time, I suggest setting the length of each epoch to the occupied period only. (We may need further discussion on this after the holiday. Ideally, the training environment would start each epoch on a different date.) Also, the behavior during unoccupied time can be viewed as part of the system dynamics: since we include time as part of the system state, the agent should be able to learn that behavior. We didn't discard that part, mainly for convenience. If you do discard it, you need to decide what the system state is at the start of the next day; setting it manually is equivalent to the "set the length of each training epoch to occupied time" approach mentioned above.
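The suggestion above (confine each episode to the occupied window, and rotate the start date across epochs) can be sketched with an environment wrapper. Everything here is illustrative: the wrapper and environment names, the `start_time` keyword, the hourly time step, and the fixed occupancy window are assumptions, not the repository's actual API.

```python
class DummyZoneEnv:
    """Minimal stand-in environment, included only to make the sketch
    self-contained; the real simulation env is assumed to expose a
    similar reset/step interface."""

    def reset(self, start_time=0):
        self.time = start_time  # simulation clock in hours
        return [self.time]

    def step(self, action):
        self.time += 1
        return [self.time], 0.0, False, {}


class OccupiedHoursEpisode:
    """Wrapper that starts each episode at the beginning of the occupied
    window on a per-epoch date, and terminates it when the window closes,
    so no unoccupied-time transitions are ever generated."""

    def __init__(self, env, occupied_hours=(8, 18), start_dates=None):
        self.env = env
        self.start_hour, self.end_hour = occupied_hours
        self.start_dates = start_dates or [0]  # day offsets, one per epoch
        self.epoch = 0

    def reset(self):
        day = self.start_dates[self.epoch % len(self.start_dates)]
        self.epoch += 1
        # Jump the simulation clock to the occupied start of the chosen day;
        # this sidesteps the "what is the state the next morning?" question
        # by letting the simulator produce it.
        self.t = day * 24 + self.start_hour
        return self.env.reset(start_time=self.t)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.t += 1
        # Close the episode at the end of the occupied window.
        if self.t % 24 >= self.end_hour:
            done = True
        return obs, reward, done, info
```

The open question the comment raises remains: when the next episode resets to a fresh date, the overnight state (e.g. zone temperature drift) is decided by the simulator's reset, which is exactly the manual-reset equivalence described above.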