
EUPG inconsistently expands the observation #118

Closed

timondesch opened this issue Sep 6, 2024 · 2 comments
@timondesch
Contributor

I noticed, while trying to run the MORL-Baselines implementation of EUPG on environments other than fishwood, that there seems to be an inconsistency in the observation shape of fishwood compared to the other MO-Gymnasium environments, and another in the action function of EUPG compared to the other MORL-Baselines algorithms. Sorry if I am missing something that should be obvious!

In short:

  • EUPG seems to work only on fishwood
  • other algorithms seem not to work on fishwood

The cause seems to be that the observations generated by the fishwood environment are 0-dimensional arrays, whereas, following the convention used by the other environments, they should be 1-dimensional arrays of length 1. To compensate, EUPG increases the dimensionality of received observations by one before sending them to the action function, which makes it work for fishwood but not for any other environment.

We can see the discrepancy in the dimension of the observations with

import mo_gymnasium as mo_gym

print(mo_gym.make("fishwood-v0").observation_space.shape)
print(mo_gym.make("deep-sea-treasure-v0").observation_space.shape)
print(mo_gym.make("resource-gathering-v0").observation_space.shape)

which returns:

()
(2,)
(4,)
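
To illustrate the difference, here is a minimal standalone sketch using hand-built arrays rather than actual environment output (the values are illustrative only):

import numpy as np
import torch as th

# 0-dimensional observation, as fishwood currently returns (illustrative value)
obs_scalar = np.array(0, dtype=np.int32)
# 1-dimensional observation, as e.g. deep-sea-treasure returns (illustrative value)
obs_vector = np.array([0.0, 0.0], dtype=np.float32)

# What EUPG currently does: wrap the observation in a list before building the tensor
print(th.Tensor([obs_scalar]).shape)  # torch.Size([1])    -> the 1-d input EUPG expects
print(th.Tensor([obs_vector]).shape)  # torch.Size([1, 2]) -> an unwanted extra dimension
# Passing the observation as-is, like the other algorithms do
print(th.Tensor(obs_vector).shape)    # torch.Size([2])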

Similarly, the following test, adapted from the EUPG and MOQL examples,

import mo_gymnasium as mo_gym
import numpy as np
import torch as th
from mo_gymnasium.utils import MORecordEpisodeStatistics

from morl_baselines.single_policy.esr.eupg import EUPG
from morl_baselines.single_policy.ser.mo_q_learning import MOQLearning


if __name__ == "__main__":

    def scalarization(reward: np.ndarray, w=None):
        reward = th.tensor(reward) if not isinstance(reward, th.Tensor) else reward
        # Handle the case when reward is a single tensor of shape (2, )
        if reward.dim() == 1 and reward.size(0) == 2:
            return min(reward[0], reward[1] // 2).item()

        # Handle the case when reward is a tensor of shape (200, 2)
        elif reward.dim() == 2 and reward.size(1) == 2:
            return th.min(reward[:, 0], reward[:, 1] // 2)

    envs = [mo_gym.make("fishwood-v0"), mo_gym.make("deep-sea-treasure-v0")]

    for env in envs:
        agents = [
            EUPG(env, scalarization=scalarization, log=False),
            MOQLearning(env, scalarization=scalarization, log=False),
        ]
        for agent in agents:
            try:
                agent.train(total_timesteps=1, start_time=0)
                print(f"{agent.__class__.__name__} succeeded on {env.unwrapped.spec.id}")
            except Exception as e:
                print(f"{agent.__class__.__name__} failed on {env.unwrapped.spec.id}")

returns:

EUPG succeeded on fishwood-v0
MOQLearning failed on fishwood-v0
/home/timon/miniconda3/envs/issue_morl_baslines/lib/python3.11/site-packages/morl_baselines/single_policy/esr/eupg.py:295: UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor. (Triggered internally at ../torch/csrc/utils/tensor_new.cpp:278.)
  action = self.__choose_action(th.Tensor([obs]).to(self.device), accrued_reward_tensor)
EUPG failed on deep-sea-treasure-v0
MOQLearning succeeded on deep-sea-treasure-v0

Following what I said before, I think a change needs to be made in the source files of both fishwood and EUPG. I have duplicated this issue in MORL-Baselines, and if both sides give me the green light I can test the issue further, confirm the cause and create a PR with a fix.

In MO-Gymnasium

The following line:
https://github.com/Farama-Foundation/MO-Gymnasium/blob/7087d48280ff715dac46a531702903b9aa71f986/mo_gymnasium/envs/fishwood/fishwood.py#L58
should be changed to:

self.observation_space = spaces.Box(low=0, high=1, shape=(1,), dtype=np.int32)

In MORL-Baselines

The following line:

action = self.__choose_action(th.Tensor([obs]).to(self.device), accrued_reward_tensor)

should be changed to:

action = self.__choose_action(th.Tensor(obs).to(self.device), accrued_reward_tensor)

in order to mimic what is done in the other algorithms, where the observation is given to the action function as-is (e.g. MO-Q-learning, CAPQL, ...).
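
As a rough sanity check (assuming fishwood returns length-1 arrays after the change proposed above), the as-is call would then produce correctly shaped inputs for both environments:

import numpy as np
import torch as th

# Assumed observation after the proposed fishwood fix: a 1-d array of length 1
obs_fishwood_fixed = np.array([0], dtype=np.int32)
# Observation from deep-sea-treasure, unchanged
obs_dst = np.array([0.0, 0.0], dtype=np.float32)

# The proposed EUPG line passes the observation as-is, like the other algorithms
print(th.Tensor(obs_fishwood_fixed).shape)  # torch.Size([1])
print(th.Tensor(obs_dst).shape)             # torch.Size([2])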

I also noticed that self.experiment_name is not set in MOQLearning; I can include that in the PR if needed.

@LucasAlegre
Owner

Hi @timondesch,

Thanks for these observations. Indeed, we never tried EUPG in other domains, so we didn't catch that. In theory, it shouldn't be a problem for FishWood to have a spaces.Discrete observation space, but I guess it does make sense to change it to Box to make it more consistent with the other tabular environments.
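
(For context, a quick illustration of the shape difference between the two options, not from the original discussion: a Discrete space samples scalars, while the proposed Box samples length-1 arrays.)

from gymnasium import spaces
import numpy as np

# Discrete observations are scalars (0-dimensional)
print(spaces.Discrete(2).sample())                                      # e.g. 1
# The proposed Box observations are 1-dimensional arrays of length 1
print(spaces.Box(low=0, high=1, shape=(1,), dtype=np.int32).sample())   # e.g. [1]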

Feel free to open a PR in MO-Gymnasium and MORL-Baselines! :)

@LucasAlegre
Owner

Fixed in #119
