
EUPG inconsistently expands the observation #118

Closed

timondesch opened this issue Sep 6, 2024 · 2 comments
@timondesch
Contributor

I noticed, while trying to run the MORL-Baselines implementation of EUPG on environments other than fishwood, that there seems to be an inconsistency in the observation shape of fishwood compared to the other MO-Gymnasium environments, and another in the action function of EUPG compared to the other MORL-Baselines algorithms. Sorry if I am missing something that should be obvious!

In short:

  • EUPG seems to work only on fishwood
  • other algorithms seem not to work on fishwood

The cause seems to be that the observations generated by the fishwood environment are 0-dimensional arrays, whereas, following the convention used by the other environments, they should be 1-dimensional arrays of length 1. To compensate, EUPG increases the dimensionality of received observations by one before sending them to the action function, which makes it work for fishwood but not for any other environment.

We can see the discrepancy in the dimension of the observations with

import mo_gymnasium as mo_gym

print(mo_gym.make("fishwood-v0").observation_space.shape)
print(mo_gym.make("deep-sea-treasure-v0").observation_space.shape)
print(mo_gym.make("resource-gathering-v0").observation_space.shape)

which returns:

()
(2,)
(4,)
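
To illustrate the difference, here is a minimal standalone sketch using hand-built arrays rather than actual environment output (the values are illustrative only):

import numpy as np
import torch as th

# 0-dimensional observation, as fishwood currently returns (illustrative value)
obs_scalar = np.array(0, dtype=np.int32)
# 1-dimensional observation, as e.g. deep-sea-treasure returns (illustrative value)
obs_vector = np.array([0.0, 0.0], dtype=np.float32)

# What EUPG currently does: wrap the observation in a list before building the tensor
print(th.Tensor([obs_scalar]).shape)  # torch.Size([1])    -> the 1-d input EUPG expects
print(th.Tensor([obs_vector]).shape)  # torch.Size([1, 2]) -> an unwanted extra dimension
# Passing the observation as-is, like the other algorithms do
print(th.Tensor(obs_vector).shape)    # torch.Size([2])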

Similarly, the following test, adapted from the EUPG and MOQL examples,

import mo_gymnasium as mo_gym
import numpy as np
import torch as th
from mo_gymnasium.utils import MORecordEpisodeStatistics

from morl_baselines.single_policy.esr.eupg import EUPG
from morl_baselines.single_policy.ser.mo_q_learning import MOQLearning


if __name__ == "__main__":

    def scalarization(reward: np.ndarray, w=None):
        reward = th.tensor(reward) if not isinstance(reward, th.Tensor) else reward
        # Handle the case when reward is a single tensor of shape (2, )
        if reward.dim() == 1 and reward.size(0) == 2:
            return min(reward[0], reward[1] // 2).item()

        # Handle the case when reward is a tensor of shape (200, 2)
        elif reward.dim() == 2 and reward.size(1) == 2:
            return th.min(reward[:, 0], reward[:, 1] // 2)

    envs = [mo_gym.make("fishwood-v0"), mo_gym.make("deep-sea-treasure-v0")]

    for env in envs:
        agents = [
            EUPG(env, scalarization=scalarization, log=False),
            MOQLearning(env, scalarization=scalarization, log=False),
        ]
        for agent in agents:
            try:
                agent.train(total_timesteps=1, start_time=0)
                print(f"{agent.__class__.__name__} succeeded on {env.unwrapped.spec.id}")
            except Exception as e:
                print(f"{agent.__class__.__name__} failed on {env.unwrapped.spec.id}")

returns:

EUPG succeeded on fishwood-v0
MOQLearning failed on fishwood-v0
/home/timon/miniconda3/envs/issue_morl_baslines/lib/python3.11/site-packages/morl_baselines/single_policy/esr/eupg.py:295: UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor. (Triggered internally at ../torch/csrc/utils/tensor_new.cpp:278.)
  action = self.__choose_action(th.Tensor([obs]).to(self.device), accrued_reward_tensor)
EUPG failed on deep-sea-treasure-v0
MOQLearning succeeded on deep-sea-treasure-v0

Following what I said before, I think a change needs to be made in the source files of both fishwood and EUPG. I have duplicated this issue in MORL-Baselines, and if both sides give me the green light I can test the issue further, confirm the cause and create a PR with a fix.

In MO-Gymnasium

The following line:
https://github.com/Farama-Foundation/MO-Gymnasium/blob/7087d48280ff715dac46a531702903b9aa71f986/mo_gymnasium/envs/fishwood/fishwood.py#L58
should be changed to:

self.observation_space = spaces.Box(low=0, high=1, shape=(1,), dtype=np.int32)

In MORL-Baselines

The following line:

action = self.__choose_action(th.Tensor([obs]).to(self.device), accrued_reward_tensor)

should be changed to:

action = self.__choose_action(th.Tensor(obs).to(self.device), accrued_reward_tensor)

in order to mimic what is done in the other algorithms, where the observation is given to the action function as-is (e.g. MO-Q-learning, CAPQL, ...).
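
As a rough sanity check (assuming fishwood returns length-1 arrays after the change proposed above), the as-is call would then produce correctly shaped inputs for both environments:

import numpy as np
import torch as th

# Assumed observation after the proposed fishwood fix: a 1-d array of length 1
obs_fishwood_fixed = np.array([0], dtype=np.int32)
# Observation from deep-sea-treasure, unchanged
obs_dst = np.array([0.0, 0.0], dtype=np.float32)

# The proposed EUPG line passes the observation as-is, like the other algorithms
print(th.Tensor(obs_fishwood_fixed).shape)  # torch.Size([1])
print(th.Tensor(obs_dst).shape)             # torch.Size([2])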

I also noticed that self.experiment_name is not set in MOQLearning; I can include that in the PR if needed.

@LucasAlegre
Owner

Hi @timondesch,

Thanks for these observations. Indeed, we never tried EUPG in other domains, so we didn't catch that. In theory, it shouldn't be a problem for FishWood to have a spaces.Discrete observation space, but I guess it does make sense to change it to Box to make it more consistent with the other tabular environments.
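
(For context, a quick illustration of the shape difference between the two options, not from the original discussion: a Discrete space samples scalars, while the proposed Box samples length-1 arrays.)

from gymnasium import spaces
import numpy as np

# Discrete observations are scalars (0-dimensional)
print(spaces.Discrete(2).sample())                                      # e.g. 1
# The proposed Box observations are 1-dimensional arrays of length 1
print(spaces.Box(low=0, high=1, shape=(1,), dtype=np.int32).sample())   # e.g. [1]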

Feel free to open a PR in MO-Gymnasium and MORL-Baselines! :)

@LucasAlegre
Owner

Fixed in #119
