I noticed, while trying to run the MORL-Baselines implementation of EUPG on environments other than fishwood, that there seems to be an inconsistency in the observation shape of fishwood compared to the other MO-Gymnasium environments, and another in the action function of EUPG compared to the other MORL-Baselines algorithms. Sorry if I am missing something that should be obvious!
In short:
- EUPG seems to work only on fishwood;
- the other algorithms seem not to work on fishwood.
The cause seems to be that the observations generated by the fishwood environment are 0-dimensional arrays, whereas following the convention used for other environments they should be 1-dimensional arrays of length 1. To compensate for this, EUPG increases the dimensionality of received observations by one before sending them to the action function, making it work for fishwood but not for any other environment.
We can see the discrepancy in the dimension of the observations with a quick check like the one below.
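A minimal sketch of such a check (assuming the standard Gymnasium `reset()` API returning `(obs, info)`):

```python
import mo_gymnasium as mo_gym
import numpy as np

# Compare the dimensionality of the first observation of each environment.
for env_id in ["fishwood-v0", "deep-sea-treasure-v0"]:
    env = mo_gym.make(env_id)
    obs, _ = env.reset()
    # np.asarray normalizes scalars and arrays alike before inspecting the shape.
    print(env_id, np.asarray(obs).shape)
```

Per the behavior described above, this should print a 0-dimensional shape `()` for fishwood-v0 and a 1-dimensional shape for deep-sea-treasure-v0.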
Similarly, the following test adapted from the EUPG and MOQL examples
```python
import mo_gymnasium as mo_gym
import numpy as np
import torch as th
from mo_gymnasium.utils import MORecordEpisodeStatistics
from morl_baselines.single_policy.esr.eupg import EUPG
from morl_baselines.single_policy.ser.mo_q_learning import MOQLearning

if __name__ == "__main__":

    def scalarization(reward: np.ndarray, w=None):
        reward = th.tensor(reward) if not isinstance(reward, th.Tensor) else reward
        # Handle the case when reward is a single tensor of shape (2, )
        if reward.dim() == 1 and reward.size(0) == 2:
            return min(reward[0], reward[1] // 2).item()
        # Handle the case when reward is a tensor of shape (200, 2)
        elif reward.dim() == 2 and reward.size(1) == 2:
            return th.min(reward[:, 0], reward[:, 1] // 2)

    envs = [mo_gym.make("fishwood-v0"), mo_gym.make("deep-sea-treasure-v0")]
    for env in envs:
        agents = [
            EUPG(env, scalarization=scalarization, log=False),
            MOQLearning(env, scalarization=scalarization, log=False),
        ]
        for agent in agents:
            try:
                agent.train(total_timesteps=1, start_time=0)
                print(f"{agent.__class__.__name__} succeeded on {env.unwrapped.spec.id}")
            except Exception as e:
                print(f"{agent.__class__.__name__} failed on {env.unwrapped.spec.id}")
```
returns:
```
EUPG succeeded on fishwood-v0
MOQLearning failed on fishwood-v0
/home/timon/miniconda3/envs/issue_morl_baslines/lib/python3.11/site-packages/morl_baselines/single_policy/esr/eupg.py:295: UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor. (Triggered internally at ../torch/csrc/utils/tensor_new.cpp:278.)
  action = self.__choose_action(th.Tensor([obs]).to(self.device), accrued_reward_tensor)
EUPG failed on deep-sea-treasure-v0
MOQLearning succeeded on deep-sea-treasure-v0
```
Following what I said before, I think a change needs to be made in the source files of both fishwood and EUPG. I have duplicated this issue in MORL-Baselines, and if both sides give me the green light I can test the issue further, confirm the cause, and create a PR with a fix.
Thanks for these observations. Indeed, we never tried EUPG in other domains, so we didn't catch that. In theory, it shouldn't be a problem for FishWood to have a spaces.Discrete observation space, but it indeed makes sense to change it to Box to make it more consistent with the other tabular environments.
Feel free to open a PR in MO-Gymnasium and MORL-Baselines! :)
Following what I said before, here are the changes I propose on both sides:
In MO-Gymnasium
The following line:
https://github.com/Farama-Foundation/MO-Gymnasium/blob/7087d48280ff715dac46a531702903b9aa71f986/mo_gymnasium/envs/fishwood/fishwood.py#L58
should be changed to:
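A hypothetical sketch of the replacement, following the maintainer's suggestion above to switch from spaces.Discrete to spaces.Box (the exact bounds and dtype here are assumptions):

```python
# Hypothetical: a Box observation space whose observations are
# 1-dimensional arrays of length 1, like the other tabular environments.
self.observation_space = spaces.Box(low=0, high=1, shape=(1,), dtype=np.int32)
```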
In MORL-Baselines
The following line (morl_baselines/single_policy/esr/eupg.py, line 295 at commit 7c2d96a):
should be changed to:
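A plausible sketch of the change, assuming `__choose_action` can take the observation with its original dimensionality, as the other algorithms do:

```python
# Before (line 295, from the warning above): wrapping obs in a list
# adds an extra dimension to the observation.
action = self.__choose_action(th.Tensor([obs]).to(self.device), accrued_reward_tensor)

# After (hypothetical): pass obs as-is, mimicking MOQLearning, CAPQL, etc.
action = self.__choose_action(th.Tensor(obs).to(self.device), accrued_reward_tensor)
```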
in order to mimic what is done in other algorithms where the observation is given to the action function as is (e.g. MO-Q-learning, CAPQL...).
I also noticed that `self.experiment_name` is not set in MOQLearning; I can include that in the PR if need be.