I wonder whether learning performance would improve if the experience collected during unoccupied time were discarded rather than added to the buffer. During unoccupied time the reward is zero regardless of the action taken, while during occupied time it is negative. Could the agent treat the zero reward at unoccupied time as a comparatively positive reward, which would add noise to the learning process? How would this affect off-policy and on-policy methods, respectively? Did you consider this in your single-zone model? @shichao2023
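The filtering idea described above can be sketched as a replay buffer that simply skips transitions collected during unoccupied hours. This is a minimal illustration, not code from this repository; the class name, the fixed `occupied_hours` schedule, and the `hour_of_day` argument are all hypothetical.

```python
import random
from collections import deque


class OccupancyFilteredBuffer:
    """Replay buffer that drops transitions collected during unoccupied
    hours (hypothetical sketch; the occupancy schedule is an assumption)."""

    def __init__(self, capacity=10000, occupied_hours=(8, 18)):
        self.buffer = deque(maxlen=capacity)
        self.occupied_hours = occupied_hours  # assumed fixed daily schedule

    def is_occupied(self, hour_of_day):
        start, end = self.occupied_hours
        return start <= hour_of_day < end

    def add(self, state, action, reward, next_state, done, hour_of_day):
        # Skip unoccupied-time experience: its reward is always zero,
        # which the agent could read as relatively "good" feedback
        # compared to the negative occupied-time rewards.
        if not self.is_occupied(hour_of_day):
            return
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))
```

Note that a filter like this only changes what an off-policy learner replays; an on-policy learner would instead need the episode itself restricted to occupied hours, since it cannot simply drop transitions from its rollout.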
If you want to discard training during unoccupied time, I suggest setting the length of each epoch to the occupied period only. (We may need further discussion on this after the holiday. Ideally, the training environment would start each epoch on a different date.) Also, the behavior during unoccupied time can be viewed as part of the system dynamics: since we include time as part of the system state, the agent should be able to learn that behavior. We didn't discard that part, mainly for convenience. If you do discard it, you need to decide what the system state is at the start of the next day; setting it manually is equivalent to the "set the length of each training epoch to occupied time" approach mentioned above.
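The suggestion above (confine each episode to the occupied window, and rotate the start date across epochs) can be sketched with an environment wrapper. Everything here is illustrative: the wrapper and environment names, the `start_time` keyword, the hourly time step, and the fixed occupancy window are assumptions, not the repository's actual API.

```python
class DummyZoneEnv:
    """Minimal stand-in environment, included only to make the sketch
    self-contained; the real simulation env is assumed to expose a
    similar reset/step interface."""

    def reset(self, start_time=0):
        self.time = start_time  # simulation clock in hours
        return [self.time]

    def step(self, action):
        self.time += 1
        return [self.time], 0.0, False, {}


class OccupiedHoursEpisode:
    """Wrapper that starts each episode at the beginning of the occupied
    window on a per-epoch date, and terminates it when the window closes,
    so no unoccupied-time transitions are ever generated."""

    def __init__(self, env, occupied_hours=(8, 18), start_dates=None):
        self.env = env
        self.start_hour, self.end_hour = occupied_hours
        self.start_dates = start_dates or [0]  # day offsets, one per epoch
        self.epoch = 0

    def reset(self):
        day = self.start_dates[self.epoch % len(self.start_dates)]
        self.epoch += 1
        # Jump the simulation clock to the occupied start of the chosen day;
        # this sidesteps the "what is the state the next morning?" question
        # by letting the simulator produce it.
        self.t = day * 24 + self.start_hour
        return self.env.reset(start_time=self.t)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.t += 1
        # Close the episode at the end of the occupied window.
        if self.t % 24 >= self.end_hour:
            done = True
        return obs, reward, done, info
```

The open question the comment raises remains: when the next episode resets to a fresh date, the overnight state (e.g. zone temperature drift) is decided by the simulator's reset, which is exactly the manual-reset equivalence described above.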