-
-
Notifications
You must be signed in to change notification settings - Fork 96
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change XML saving for maze to fix infrequent crash #206
base: main
Are you sure you want to change the base?
Conversation
Can you please elaborate on why the crashing was happening prior to this change? |
Honestly it's a little unclear to me, the crash threw an XML parsing error giving a line and reason, but obviously the XML is supposed to be working. My guess is either the XML tool we are using to write to the XML file does not flush the buffers correctly internally, and since here we use the open() context manager it flushes as needed, or it's because there is a collision since I'm using a lot of parallelism and this is using timestamping. With these changes the directories are unique regardless of time, and also the file is flushed fully. |
@Kallinteris-Andreas is there anything else I should do to get this fix in? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In principle, those changes are an improvement over the hacky solution with timestamping XML files I had come up with.
To elaborate a bit, the likely cause of the previous version crashing was because of possible timestamp aliasing, which this PR fixes by using /tmp/
files
Other than the testing issue, it is ready for merge
tests/envs/maze/test_point_maze.py
Outdated
@@ -39,3 +39,23 @@ def test_goal_cell(): | |||
obs = env.reset(options={"goal_cell": [2, 1]}, seed=42)[0] | |||
desired_goal = np.array([-0.36302198, -0.53056078]) | |||
np.testing.assert_almost_equal(desired_goal, obs["desired_goal"], decimal=4) | |||
|
|||
|
|||
def test_multiprocessing_not_crash(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Problem with this kind of nondeterministic testing, is that I can verify it working or not
I can not get it to fail with the version of maze
If it can not prove that a bug has been fixed, it should not be added
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let me see if I can make a test that fails it, I think the actual problem is that when using Tianshou RL it recreates the env a bunch of times, so what I should be doing to test is creating a buttload of envs.
@mishmish66 any progress on finding the test? Note: I am willing to merge without a test, since it is hard to assert concurrency behavior |
No, I couldn't replicate the errors in a test which is weird. I was seeing this on a node in a cluster so I guess testing it locally is slightly different, but in any case I guess it isn't a very significant issue if it only happens on strangely configured cluster nodes. |
Just remove |
@mishmish66 rebase and remove the test, and I can merge |
I'll do that! Now that the semester is going again it may take me a second to get to it. Thanks! |
@mishmish66 please remove tests and rebase |
@Kallinteris-Andreas I rebased and fixed a warning that was thrown from the env on cleanup that was causing the tests to fail. I also added the change to maze-v3 envs, let me know if that is the wrong thing to do and I'll go remove that! Thanks for your patience! |
@@ -308,3 +316,7 @@ def compute_truncated( | |||
|
|||
def update_target_site_pos(self, pos): | |||
raise NotImplementedError | |||
|
|||
def __del__(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this actually required? does it now cleanup in /tmp
without it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for late response,
please run pre-commit
Description
I changed the way that temporary xmls are stored for the maze envs. I have used this change for training and it improves stability, without it my training would crash sometimes, usually after about 1m env steps. With this change it seems to be stable.
Type of change
Please delete options that are not relevant.
Checklist:
pre-commit
checks withpre-commit run --all-files
(seeCONTRIBUTING.md
instructions to set it up)