Merge branch 'master' of https://github.com/hari-sikchi/safeRL
hari-sikchi committed Mar 31, 2019
2 parents 26b9cdd + 35388b8 commit 9c2db47
Showing 1 changed file with 29 additions and 3 deletions.
32 changes: 29 additions & 3 deletions README.md
@@ -1,14 +1,16 @@
# Safe Reinforcement Learning Algorithms
=================

***

<!--ts-->
* [HCOPE](#hcope)
* [Safe Exploration](#safe_exploration)
* [Off Policy Evaluation](#importance_sampling)
* [Solving side effects](#side_effects)
<!--te-->



***

<a name="hcope"></a>
## HCOPE (High-Confidence Off-Policy Evaluation)
@@ -54,6 +56,8 @@ method. Also, a graph of distribution of Importance sampling ratio is created wh
Output format:
![Output](https://github.com/hari-sikchi/safeRL/blob/master/results/Result.png)
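Below is a minimal sketch of the idea behind HCOPE. It assumes that rewards are normalized so every return is non-negative, and that each trajectory records the evaluation-policy probability, behavior-policy probability, and reward of each action taken. The function names and trajectory format are hypothetical, not this repository's API. For simplicity the sketch applies Hoeffding's inequality to importance-weighted returns truncated at a threshold `c`. This bound is looser than the concentration inequality used in the HCOPE paper. It is still a valid lower bound, because truncating non-negative values can only lower their mean.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical format: one trajectory is a sequence of (pi_e(a|s), pi_b(a|s), reward).
Trajectory = Sequence[Tuple[float, float, float]]


def is_return(traj: Trajectory, gamma: float = 1.0) -> float:
    """Per-trajectory importance-weighted return: (product of ratios) * discounted return."""
    ratio, ret = 1.0, 0.0
    for t, (p_e, p_b, r) in enumerate(traj):
        ratio *= p_e / p_b
        ret += gamma ** t * r
    return ratio * ret


def hcope_lower_bound(
    trajs: List[Trajectory], c: float, delta: float, gamma: float = 1.0
) -> float:
    """(1 - delta)-confidence lower bound on the evaluation policy's expected return.

    Assumes every return is non-negative. The IS estimates are truncated at c, so
    they lie in [0, c], and a one-sided Hoeffding bound is applied to their mean.
    Truncation only lowers the mean, so the bound also holds for the untruncated mean.
    """
    n = len(trajs)
    truncated = [min(is_return(tr, gamma), c) for tr in trajs]
    mean = sum(truncated) / n
    return mean - c * math.sqrt(math.log(1.0 / delta) / (2.0 * n))


# Usage with made-up one-step data, repeated to 100 trajectories.
trajs = [[(0.9, 0.5, 1.0)], [(0.1, 0.5, 0.0)]] * 50
print(hcope_lower_bound(trajs, c=2.0, delta=0.05))
```

A larger `c` truncates less of the estimate but widens the confidence term, so the choice of `c` matters in practice.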

***

<a name="safe_exploration"></a>
## Safe exploration in continuous action spaces

@@ -98,7 +102,7 @@ This enables the agent to learn while following the safety constraints.
![Action Correction](https://github.com/hari-sikchi/safeRL/blob/master/results/safety_optimization.png)
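The action correction can be sketched as follows. The sketch assumes the safety-layer formulation of Dalal et al. (2018), "Safe Exploration in Continuous Action Spaces". Each constraint is linearized as `c_i(s') ≈ c_i(s) + g_i(s)·a`, and at most one constraint is assumed active at a time, which yields a closed-form projection. In that formulation the sensitivities `g_i(s)` come from a learned model. Here they are passed in as plain lists, and the function name is hypothetical.

```python
from typing import List, Sequence


def _dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(u, v))


def safety_layer_correction(
    action: Sequence[float],
    costs: Sequence[float],
    grads: Sequence[Sequence[float]],
    limits: Sequence[float],
) -> List[float]:
    """Minimally change `action` so that c_i + g_i . a <= C_i holds.

    action: proposed action a from the policy
    costs:  current constraint values c_i(s)
    grads:  linear sensitivities g_i(s) of each constraint to the action
    limits: constraint bounds C_i
    """
    lambdas = []
    for g, c, limit in zip(grads, costs, limits):
        norm_sq = _dot(g, g)
        # Closed-form Lagrange multiplier; zero when the constraint is already satisfied.
        lam = max(0.0, (_dot(g, action) + c - limit) / norm_sq) if norm_sq > 0 else 0.0
        lambdas.append(lam)
    # Single-active-constraint assumption: correct only along the largest multiplier.
    k = max(range(len(lambdas)), key=lambdas.__getitem__)
    return [a - lambdas[k] * gk for a, gk in zip(action, grads[k])]


# A proposed action that would violate c + g.a <= 0.5 is pulled back to the boundary.
print(safety_layer_correction([1.0, 0.0], [0.0], [[1.0, 0.0]], [0.5]))
```

When several constraints are violated at once, only the one with the largest multiplier is corrected. The closed form is exact only under the single-active-constraint assumption.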



***

<a name="importance_sampling"></a>
## Importance Sampling
@@ -117,3 +121,25 @@ Comparison of different importance sampling estimators:
Image taken from the PhD thesis of P. Thomas:
Link: https://people.cs.umass.edu/~pthomas/papers/Thomas2015c.pdf
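For reference, here is a minimal sketch of three basic off-policy estimators: ordinary importance sampling (IS), weighted importance sampling (WIS), and per-decision importance sampling (PDIS). It assumes each trajectory stores the evaluation-policy probability, behavior-policy probability, and reward at every step. The helper names are hypothetical and are not taken from this repository.

```python
from typing import List, Sequence, Tuple

# Hypothetical format: one trajectory is a sequence of (pi_e(a|s), pi_b(a|s), reward).
Trajectory = Sequence[Tuple[float, float, float]]


def _ratio_and_return(traj: Trajectory, gamma: float) -> Tuple[float, float]:
    """Full-trajectory importance ratio and discounted return."""
    ratio, ret = 1.0, 0.0
    for t, (p_e, p_b, r) in enumerate(traj):
        ratio *= p_e / p_b
        ret += gamma ** t * r
    return ratio, ret


def ordinary_is(trajs: List[Trajectory], gamma: float = 1.0) -> float:
    """Unbiased: mean of rho_i * G_i over trajectories."""
    pairs = [_ratio_and_return(tr, gamma) for tr in trajs]
    return sum(w * g for w, g in pairs) / len(pairs)


def weighted_is(trajs: List[Trajectory], gamma: float = 1.0) -> float:
    """Biased but consistent: sum(rho_i * G_i) / sum(rho_i)."""
    pairs = [_ratio_and_return(tr, gamma) for tr in trajs]
    return sum(w * g for w, g in pairs) / sum(w for w, _ in pairs)


def per_decision_is(trajs: List[Trajectory], gamma: float = 1.0) -> float:
    """Unbiased: each reward r_t is weighted only by the ratios up to time t."""
    total = 0.0
    for traj in trajs:
        ratio = 1.0
        for t, (p_e, p_b, r) in enumerate(traj):
            ratio *= p_e / p_b
            total += gamma ** t * ratio * r
    return total / len(trajs)
```

Per-decision IS differs from ordinary IS only in that later ratios no longer scale earlier rewards. For the single trajectory `[(0.5, 0.25, 1.0), (0.25, 0.5, 2.0)]`, ordinary IS gives 1.0 × 3.0 = 3.0, while per-decision IS gives 2.0 × 1.0 + 1.0 × 2.0 = 4.0.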


***

<a name="side_effects"></a>

## Side Effects
### Penalizing side effects using relative reachability

Code - https://github.com/hari-sikchi/safeRL/tree/safe_recovery/side_effects


* Added a simple example of calculating side effects, following the example given towards the end of the paper.
![Environment](https://github.com/hari-sikchi/safeRL/blob/safe_recovery/side_effects/env.png)

The relative reachability measure:
![Equation relative reachability](https://github.com/hari-sikchi/safeRL/blob/safe_recovery/side_effects/rr.png)
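The measure can be sketched for a small deterministic environment as follows. The sketch assumes the definition from Krakovna et al.: `d_RR(s_t; s'_t) = (1/|S|) Σ_s max(R(s'_t; s) − R(s_t; s), 0)`, where `s'_t` is the baseline state. It takes `R(x; y) = γ_r^N(x, y)`, where `N(x, y)` is the minimum number of steps from `x` to `y`; setting `γ_r = 1` gives undiscounted reachability. The graph format, the function names, and the two-state "vase" example are hypothetical and are not the example shipped in `side_effects`. Choosing the baseline state (for example, by stepwise inaction) is left to the caller.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, List

# Hypothetical format: deterministic environment as state -> list of successor states.
Graph = Dict[Hashable, List[Hashable]]


def shortest_steps(graph: Graph, start: Hashable) -> Dict[Hashable, int]:
    """BFS: minimum number of steps from `start` to every reachable state."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        x = queue.popleft()
        for y in graph.get(x, []):
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist


def reachability(graph: Graph, x: Hashable, gamma_r: float) -> Dict[Hashable, float]:
    """R(x; y) = gamma_r ** N(x, y); states missing from the result have R = 0."""
    return {y: gamma_r ** n for y, n in shortest_steps(graph, x).items()}


def relative_reachability(
    graph: Graph,
    states: Iterable[Hashable],
    current: Hashable,
    baseline: Hashable,
    gamma_r: float = 1.0,
) -> float:
    """d_RR(current; baseline) = (1/|S|) * sum_s max(R(baseline; s) - R(current; s), 0)."""
    states = list(states)
    r_cur = reachability(graph, current, gamma_r)
    r_base = reachability(graph, baseline, gamma_r)
    total = sum(max(r_base.get(s, 0.0) - r_cur.get(s, 0.0), 0.0) for s in states)
    return total / len(states)


# Breaking the vase is irreversible: "broken" is absorbing.
graph = {"intact": ["intact", "broken"], "broken": ["broken"]}
print(relative_reachability(graph, ["intact", "broken"], current="broken", baseline="intact"))
```

Moving to the absorbing `broken` state makes `intact` unreachable, so the penalty is (1 + 0) / 2 = 0.5. Staying in `intact` gives 0. Only reductions in reachability are penalized, because of the `max(·, 0)` truncation.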




Paper: Penalizing side effects using stepwise relative reachability - Krakovna et al.
