From a553a68dbe85df21f3d5749cbc309f4c5c26d0a8 Mon Sep 17 00:00:00 2001
From: Harshit Sikchi
Date: Fri, 29 Mar 2019 15:57:22 +0530
Subject: [PATCH 1/3] Update README.md

---
 README.md | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/README.md b/README.md
index 1957316..6478383 100644
--- a/README.md
+++ b/README.md
@@ -5,6 +5,7 @@
 * [HCOPE](#hcope)
 * [Safe Exploration](#safe_exploration)
 * [Off Policy Evaluation](#importance_sampling)
+ * [Solving side effects](#side_effects)
 
 
 
@@ -117,3 +118,11 @@ Comparision of different importance sampling estimators:
 Image is taken from phD thesis of P.Thomas:
 Links: https://people.cs.umass.edu/~pthomas/papers/Thomas2015c.pdf
 
+
+
+## Penalizing side effects using relative reachability
+
+* Added a simple example for calculating side effects as given towards the end of paper
+
+
+ Paper: Penalizing side effects using stepwise relative reachability - Krakovna et al.

From 90347be808124147c8191522292e65fdadd86adb Mon Sep 17 00:00:00 2001
From: Harshit Sikchi
Date: Sat, 30 Mar 2019 13:09:27 +0530
Subject: [PATCH 2/3] Update README.md

---
 README.md | 24 ++++++++++++++++++++----
 1 file changed, 20 insertions(+), 4 deletions(-)

diff --git a/README.md b/README.md
index 6478383..75aed1d 100644
--- a/README.md
+++ b/README.md
@@ -9,7 +9,7 @@
 
 
 
-
+***
 
 ## HCOPE (High-Confidence Off-Policy Evaluation.)
 
@@ -55,6 +55,8 @@ method. Also, a graph of distribution of Importance sampling ratio is created wh
 Output format:
 ![Output](https://github.com/hari-sikchi/safeRL/blob/master/results/Result.png)
 
+***
+
 ## Safe exploration in continuous action spaces.
 
 
@@ -99,7 +101,7 @@ This enables agent to learn while following the safety constraints.
 ![Action Correction](https://github.com/hari-sikchi/safeRL/blob/master/results/safety_optimization.png)
 
 
-
+***
 
 ## Importance Sampling
 
@@ -119,10 +121,24 @@ Comparision of different importance sampling estimators:
 Links: https://people.cs.umass.edu/~pthomas/papers/Thomas2015c.pdf
 
 
-
-## Penalizing side effects using relative reachability
+
+
+
+***
+
+## Side Effects
+### Penalizing side effects using relative reachability
+
+Code - https://github.com/hari-sikchi/safeRL/tree/safe_recovery/side_effects
+
 
 * Added a simple example for calculating side effects as given towards the end of paper
+![Environment](https://github.com/hari-sikchi/safeRL/blob/safe_recovery/side_effects/env.png)
+
+The relative reachability measure
+![Equation relative reachability](https://github.com/hari-sikchi/safeRL/blob/safe_recovery/side_effects/rr.png)
+
+
 
 
  Paper: Penalizing side effects using stepwise relative reachability - Krakovna et al.

From 35388b86b4a645ec7870421404d114d70821a333 Mon Sep 17 00:00:00 2001
From: Harshit Sikchi
Date: Sat, 30 Mar 2019 13:10:49 +0530
Subject: [PATCH 3/3] Update README.md

---
 README.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 75aed1d..03598c3 100644
--- a/README.md
+++ b/README.md
@@ -1,5 +1,6 @@
 # Safe Reinforcement Learning Algorithms
-=================
+
+***
 
 
 * [HCOPE](#hcope)