From d28f77eba9fce515ef2062ed08756a331dce45b9 Mon Sep 17 00:00:00 2001
From: Daniel Lawson
Date: Tue, 20 Sep 2022 15:11:26 -0400
Subject: [PATCH] Update README.md

---
 README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 96bf113..a1ff01c 100644
--- a/README.md
+++ b/README.md
@@ -30,8 +30,8 @@ Alternatively, the environment can be setup using Docker with the attached Docke
 There are some small implementation detail differences and other assumed implementation detials that may be different.
-Originally, the policy is parameterized as:
-$$\pi_\theta(a_t|s_{-K,t}, g_{-K,t}) = N(\mu_\theta(s_{-K,t}, g_{-K,t}), \Sigma_{\theta}(s_{-K,t}, g_{-K,t}))$$
+ However, we use $$\mathbf{tanh}(N(\mu_\theta(s_{-K,t}, g_{-K,t}), \Sigma_{\theta}(s_{-K,t}, g_{-K,t})))$$