Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generalization of the observations for GPI-LS #81

Closed
wants to merge 12 commits into from

Conversation

AdrienBolling
Copy link

Here is my modification of the library to accept all types of Observations
More things are left up to the person using the library, especially through the implementation of the generic Observation class (that could admitedly be moved to MO-Gym alongside the associated wrapper)
This has been tester with the Minecart file to see if it was still compatible and it seems to be as it ran without issues.

The principle is just that all observations are just considered of type Observation and are handled as array of objects to keep as much of the previous code as possible. The conversion to tensor is left to the discretion of the user, and made in the forward function of the q_net

This PR also allows people to use custom q_net with their observations (Observations that are not a wrapper of a np vector will require a custom q_net by default)

The basic Observation class I made should be pretty compatible with a good number of observations, although not optimally.

Please don't take into account the changes made to enveloppe, these were old tweaks and mluch less transferable.

I think this implementation could be deployed in the entire library without too much hassle

Please excuse me for the mess of a PR this is, it is the first contribution I'm doing of the sort of my life

I have also not really touched the ProbabilisticEnsemble part. I think a reasonnable option would be to leave the choice for the user to provide a custom model for the env simulation if he wishes to use a "complex" observation.

(This modification mainly stems from my need to use your library with DGL Heterographs observations)

Modification of the pyproject to allow installation of both the original and modified packages
Modification of the pyproject to allow installation of both the original and modified packages
# Conflicts:
#	morl_baselines/multi_policy/envelope/envelope.py
Buffers have also been modified

Creation of a wrapper for the observations as well as a Generic type to encapsulate them
@AdrienBolling
Copy link
Author

I just noticed that I required a merge to your main branch. I'm not experienced with git except from a personnal use stadnpoint, so I don't know if you'll be able to redirect it into another branch or if I need to redo it towards another branch myself ?

@LucasAlegre
Copy link
Owner

Hi @AdrienBolling, I am not sure I understand the motivation for this PR. GPI-LS already supports vector and image observations. In case another type of observation would be used, this could be done by a wrapper converting to vector, right?

@AdrienBolling
Copy link
Author

In my case : I am working with graph based observations
(More precisely heterogeneous dynamic graphs)

So I have several issues with just using a flattened array :

  • my observations don't have a fixed size (dynamic), which is handled by DGL but not your implementation of a replay buffer
  • It is a tedious work in terms of number of operations to convert everything to a flattened array then convert it back to a graph, and that is not only the issue with graphs, but even for image based
  • Which leads me to the next point, the networks that can be useful or interesting are not all just fully connected MLP, and with just a flatten wrapper, you are forced to unflatten in your forward function, for no good reason actually since the "array" structure of the flattened observation is not used, it is just always treated as an object
    This in turn leads to a very high number of technically unnecessary conversion operations, which scales a lot
  • To use a more simple and less specific case than graphs, in a previous approach, I used a tuple of numpy matrixes as my observation, these matrixes being of different shapes (which doesn't seem so unlikely for an environment), flattening then unflattening in the forward function of my q_net, to treat each matrix with a separate sub-network was a costly task which slowed down things a lot

Buffers have also been modified

Creation of a wrapper for the observations as well as a Generic type to encapsulate them
@ffelten
Copy link
Collaborator

ffelten commented Feb 7, 2024

Hi Adrien 🙂,

I will close this since we do not want to overcomplexify the codebase for now. We believe there are more advantages in staying close to SB3/cleanRL (which only support Gymnasium's spaces) while MORL is still not so well known.

In case you need any more help on this, you can still message us on Discord :-).

Cheers,

@ffelten ffelten closed this Feb 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants