RFGPS #62

wmontgomery4 · 2016-10-26T00:21:14Z

Here's the commit for RFGPS. It's pretty large, but here are the main takeaways:

Restructure algorithm.py and algorithm_mdgps.py to allow for fitting dynamics/policies based on cluster indices rather than conditions (I couldn't quite move this all to algorithm_mdgps.py, unfortunately)
Change CostState to only take a single data_type, rather than a dictionary, and also change the cost to be wp*(A.dot(x) - target). No one seems to really be using cost_state, and this is a more effective use of it in my opinion, but I'd be happy to change this back (although it will require some other changes since we need the A.dot(x) stuff to make it easy to get the difference between the end effector and target).
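
To make the proposed cost term concrete, here is a minimal pure-Python sketch of the weighted residual wp*(A.dot(x) - target). The state layout, the function names, and the numbers are illustrative assumptions and are not taken from the commit, which would operate on NumPy arrays instead of lists.

```python
def linear_residual(A, x, target):
    """Return A.dot(x) - target, with A given as a list of rows."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) - t
            for row, t in zip(A, target)]


def weighted_residual(wp, A, x, target):
    """Return wp * (A.dot(x) - target), elementwise."""
    return [w * r for w, r in zip(wp, linear_residual(A, x, target))]


# Hypothetical 2-D state layout: [ee_x, ee_y, target_x, target_y].
x = [1.0, 2.0, 0.5, 2.5]
# A maps the state to (end effector - target), so 'target' can be zero.
A = [[1.0, 0.0, -1.0, 0.0],
     [0.0, 1.0, 0.0, -1.0]]
target = [0.0, 0.0]
wp = [1.0, 2.0]

residual = weighted_residual(wp, A, x, target)
```

With A set to the identity this reduces to a plain weighted state error, which is why an identity default keeps the simple case unchanged.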

The one remaining issue is how to initialize the EM. Right now it starts with k-means, but I have some code which starts it from random points instead (commented out, near the end of algorithm_mdgps.py in the _cluster_traj_em method).
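
For readers weighing the two options, the random-point initialization could look roughly like the sketch below. It assumes each trajectory has already been flattened to a feature vector; the function name and the toy data are hypothetical, and this is not the code in _cluster_traj_em.

```python
import random


def init_clusters_from_random_points(points, num_clusters, rng):
    """Pick num_clusters distinct data points as centers and assign every
    point to its nearest center (squared Euclidean distance)."""
    center_ids = rng.sample(range(len(points)), num_clusters)
    centers = [points[i] for i in center_ids]

    def sq_dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    cluster_idx = [
        min(range(num_clusters), key=lambda k: sq_dist(p, centers[k]))
        for p in points
    ]
    return cluster_idx, center_ids


# Toy data: two well-separated groups of flattened "trajectories".
points = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9]]
cluster_idx, center_ids = init_clusters_from_random_points(
    points, num_clusters=2, rng=random.Random(0))
```

Passing the random generator in explicitly keeps the initialization reproducible, which helps when comparing it against the k-means start.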

cbfinn

Took a fairly quick pass, mostly looking at style/clarity, not for technical correctness.
@anuragajay Please also look over the code, especially the technical parts.

cbfinn · 2016-10-26T05:18:18Z

python/gps/agent/config.py

@@ -69,7 +69,11 @@
    'image_width': 640,
    'image_height': 480,
    'image_channels': 3,
-    'meta_include': []
+    'meta_include': [],


Comments for these options would be nice, since they aren't self-explanatory.

cbfinn · 2016-10-26T05:19:52Z

python/gps/agent/mjc/agent_mjc.py

@@ -99,6 +94,44 @@ def _setup_world(self, filename):
                                       cam_pos[0], cam_pos[1], cam_pos[2],
                                       cam_pos[3], cam_pos[4], cam_pos[5])

+    def reset_initial_x0(self, condition):


Would a better name for this be "sample_initial_x0"?

cbfinn · 2016-10-26T05:20:14Z

python/gps/agent/mjc/agent_mjc.py

+        Reset initial x0 randomly based on sampling_range_x0
+        and prohibited_ranges_x0
+        Args:
+            condition: Which condition setup to run.


Update this comment.

cbfinn · 2016-10-26T05:21:40Z

python/gps/agent/mjc/agent_mjc.py

+        Args:
+            condition: Which condition setup to run.
+        """
+        tmp_body_pos_offset = self._hyperparams['pos_body_offset'][condition][:]


nit: we are now inconsistent between pos_body and bodypos. Maybe we should change pos_body_offset to bodypos_offset?

cbfinn · 2016-10-26T05:22:12Z

python/gps/agent/mjc/agent_mjc.py

+        self._hyperparams['pos_body_offset'][condition] = sample_params(
+                self._hyperparams['sampling_range_bodypos'],
+                self._hyperparams['prohibited_ranges_bodypos'])
+        # TODO: handle the i/j stuff as above


What does this TODO mean?
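
For context on the sample_params call in this hunk: its implementation is not shown in the thread, so the following is only a guess at the behaviour the docstring describes, written as rejection sampling. The argument formats (a (low, high) pair for the sampling range, a list of (low, high) boxes for the prohibited ranges), the extra rng and max_tries parameters, and the name sample_params_sketch are all assumptions.

```python
import random


def sample_params_sketch(sampling_range, prohibited_ranges, rng, max_tries=1000):
    """Rejection-sample a point uniformly from the box sampling_range,
    skipping any point that falls inside a prohibited box."""
    low, high = sampling_range

    def inside(point, box):
        box_low, box_high = box
        return all(l <= p <= h for p, l, h in zip(point, box_low, box_high))

    for _ in range(max_tries):
        point = [rng.uniform(l, h) for l, h in zip(low, high)]
        if not any(inside(point, box) for box in prohibited_ranges):
            return point
    raise RuntimeError('no valid sample found within max_tries')


sampling_range = ([-1.0, -1.0], [1.0, 1.0])
prohibited_ranges = [([-0.5, -0.5], [0.5, 0.5])]  # keep away from the center
sample = sample_params_sketch(sampling_range, prohibited_ranges, random.Random(0))
```

The max_tries guard is there so that a prohibited region covering the whole sampling box fails loudly instead of looping forever.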

cbfinn · 2016-10-26T05:37:38Z

python/gps/algorithm/algorithm_mdgps.py

@@ -231,3 +287,135 @@ def compute_costs(self, m, eta):
            fcv[t, :] = (cv[t, :] + PKLv[t, :] * eta) / (eta + multiplier)

        return fCm, fcv
+
+    def _cluster_samples(self, mode=None):


Maybe put the clustering in a different file? This one is getting quite long.

cbfinn · 2016-10-26T05:37:53Z

python/gps/algorithm/algorithm_mdgps.py

+        """
+        # Initialize starting with random data points as centers.
+        K = self._hyperparams['num_clusters']
+#        self.cluster_idx = np.random.choice(K, size=self.M)


Remove commented-out code.

cbfinn · 2016-10-26T05:38:32Z

python/gps/algorithm/config.py

@@ -22,6 +22,8 @@
    'cost': None,  # A list of Cost objects for each condition.
    # Whether or not to sample with neural net policy (only for badmm/mdgps).
    'sample_on_policy': False,
+    # Number of clusters to use if clustering samples.
+    'num_clusters': 0,


Set default to None rather than 0 to be more explicit? Also, say in the comment that this is used in RF-GPS (unless it can also be used for traj opt?)
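
A small sketch of what the None default buys: "clustering disabled" becomes distinguishable from an invalid count of zero. The dictionary name ALG_DEFAULTS and the helper resolve_num_clusters are hypothetical and only illustrate the suggestion.

```python
ALG_DEFAULTS = {
    # Number of clusters for RF-GPS sample clustering; None disables it.
    'num_clusters': None,
}


def resolve_num_clusters(hyperparams):
    """Return the cluster count, or None when clustering is disabled."""
    config = dict(ALG_DEFAULTS)
    config.update(hyperparams)
    num_clusters = config['num_clusters']
    if num_clusters is not None and num_clusters < 1:
        raise ValueError('num_clusters must be a positive integer or None')
    return num_clusters
```

Callers can then branch on `resolve_num_clusters(hyperparams) is None` rather than relying on 0 being falsy.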

cbfinn · 2016-10-26T05:39:05Z

python/gps/algorithm/config.py

    'step_rule': 'laplace',
+    # How to cluster samples, 'random', 'kmeans', 'traj_em'.


Mention RF-GPS here.

cbfinn · 2016-10-26T05:41:39Z

python/gps/algorithm/cost/config.py

 COST_STATE = {
-    'ramp_option': RAMP_CONSTANT,  # How target cost ramps over time.
+    'wp': None,  # State weights - Defaults to ones.
+    'A': None, # A matrix - Defaults to identity.


Explain what the A matrix is in the comment.

anuragajay · 2016-10-26T17:37:05Z

@cbfinn The code looks correct to me. Should I also make the other changes you suggested?

wmontgomery4 · 2016-10-26T17:38:20Z

Don't worry, I'll handle those other changes Anurag. Thanks for the comments Chelsea!

wmontgomery4 · 2016-10-28T16:21:36Z

@cbfinn Now that the PIGPS stuff is merged, it's going to be kind of hard to merge this stuff in a way that isn't overly complicated. I think it would make sense to refactor the current codebase first, in a way that allows

wmontgomery4 · 2016-10-28T16:39:18Z

Whoops, accidentally hit 'close and comment'. I was going to send this in the email thread instead, but I'll just say it here: I think it would make sense to focus on refactoring the current codebase before merging this, so that the rfgps code can be written more succinctly. I've got a few ideas about how that should go:

Move as much as possible to the base Algorithm class, or to algorithm_utils, so that each algorithm can be written nearly as tersely as the PI2 algorithms are right now. I think that pretty much everything but the __init__, iteration and _compute_costs functions could be moved out (and even the _compute_costs functions could be rewritten to just take some library calls). In general, I want to move away from using functions which take in the algorithm object, and instead take in the necessary parts of the algorithm, as this should make it easier to include RFGPS, and will make for better parallelism as well.
Keep MDGPS/PIGPS separate, and probably make RFGPS a separate algorithm as well. I think the idea of merging the MDGPS/PIGPS is nice, but there are enough differences that it will be kind of challenging to maintain that structure (the traj_opt module isn't totally replaceable, since it affects other things like whether or not we need to fit dynamics). Also, if we abstract things out of those files, writing an individual algorithm file should be pretty short.
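
As a rough illustration of the first point, a helper can take only the arrays and scalars it needs instead of the whole algorithm object. The sketch below reuses the fcv update visible in the compute_costs hunk earlier in this thread; the function name is hypothetical, and plain lists stand in for NumPy arrays.

```python
def blend_cost_vector(cv, pkl_v, eta, multiplier):
    """Combine the cost and KL-penalty linear terms per time step:
    fcv[t] = (cv[t] + pkl_v[t] * eta) / (eta + multiplier)."""
    return [[(c + p * eta) / (eta + multiplier) for c, p in zip(c_row, p_row)]
            for c_row, p_row in zip(cv, pkl_v)]


# The caller pulls what it needs off the algorithm object, so the helper
# itself never sees the algorithm and is easy to test or run in parallel.
cv = [[1.0, 2.0]]      # one time step, two dimensions
pkl_v = [[3.0, 4.0]]
fcv = blend_cost_vector(cv, pkl_v, eta=2.0, multiplier=0.0)
```

Because the helper has no hidden dependency on algorithm state, the same call works whether the inputs come from a condition or from a cluster.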

I've got my quals exam in ~10 days, so I'm going to put this stuff at a lower priority right now, and make a branch off this for the RWR stuff. In the meantime this branch can serve as the public implementation of rfgps. Maybe we should have a refactoring meeting sometime soon to talk about this more?

EdsterG · 2016-10-28T23:15:16Z

I don't think we should change CostState to only take a single data_type. I found the dictionary of data_types to be very useful. It may be best to make a new cost function, CostStateWeighted?

wmontgomery4 · 2016-10-28T23:21:36Z

That's a fair point, and I agree that changing the API at this point is unnecessarily confusing. The main reason I wanted to do it was to add the A parameter, as this allows you to get arbitrary linear functions of the state (e.g. the difference between the end effector and target). I think that cost_linwp does something similar on one of the other branches though, so I'll try to just make it look like that.
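
One way the A parameter could coexist with the dictionary API, sketched under assumptions: each data-type entry keeps its own wp and target_state, and A becomes an optional per-entry key. The key names follow the options discussed in this thread, but the function, the data-type names, and the numbers are hypothetical; this is neither CostStateWeighted nor cost_linwp.

```python
def state_cost_residuals(state_by_type, data_types):
    """For each data type, return wp * (A.dot(x) - target_state).
    'A' is optional and defaults to the identity, which keeps existing
    dictionary-style configs working unchanged."""
    residuals = {}
    for name, config in data_types.items():
        x = state_by_type[name]
        A = config.get('A')
        projected = x if A is None else [
            sum(a * xi for a, xi in zip(row, x)) for row in A]
        residuals[name] = [
            w * (p - t)
            for w, p, t in zip(config['wp'], projected, config['target_state'])
        ]
    return residuals


data_types = {
    'joint_angles': {'wp': [1.0, 1.0], 'target_state': [0.0, 0.5]},
    'ee_and_target': {'wp': [2.0], 'target_state': [0.0],
                      'A': [[1.0, -1.0]]},  # end effector minus target
}
state_by_type = {'joint_angles': [0.25, 0.5], 'ee_and_target': [1.0, 0.75]}
residuals = state_cost_residuals(state_by_type, data_types)
```

Entries without an 'A' key behave exactly like a plain weighted state error, so existing configs would not need to change.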

cbfinn · 2016-10-29T06:41:47Z

Having a refactoring meeting sounds like a good idea. I'm pretty busy with ICLR right now (deadline in <7 days), so let's figure it out after your quals.

I agree with your two main points!

add RFGPS

f7b482d

cbfinn requested changes Oct 26, 2016

wmontgomery4 closed this Oct 28, 2016

wmontgomery4 reopened this Oct 28, 2016

added ICRA experiments

ab6d7ce
