soft-Q-learning derivation

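The paper Reinforcement Learning with Deep Energy-Based Policies adds an entropy term to the policy objective, defines the soft Q-function (soft Q) and the soft value function (soft V), gives the soft Bellman equation and a policy improvement theorem, and proves that iterating the soft Bellman equation converges to soft Q.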
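For orientation, the main objects can be summarized as follows. This is a sketch in the paper's notation as commonly stated, with temperature $\alpha$ and discount $\gamma$; the symbols and ordering used in proof.pdf may differ, and the temperature can be absorbed into the reward scale (equivalent to setting $\alpha = 1$).

```latex
% Maximum-entropy objective
\pi^*_{\mathrm{MaxEnt}} = \arg\max_{\pi} \sum_t
  \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}
  \left[ r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \right]

% Soft value function and the induced optimal policy
V^*_{\mathrm{soft}}(s_t) = \alpha \log \int_{\mathcal{A}}
  \exp\!\left( \tfrac{1}{\alpha} Q^*_{\mathrm{soft}}(s_t, a') \right) da'
\qquad
\pi^*_{\mathrm{MaxEnt}}(a_t \mid s_t) =
  \exp\!\left( \tfrac{1}{\alpha}
  \big( Q^*_{\mathrm{soft}}(s_t, a_t) - V^*_{\mathrm{soft}}(s_t) \big) \right)

% Soft Bellman equation
Q^*_{\mathrm{soft}}(s_t, a_t) = r_t +
  \gamma \, \mathbb{E}_{s_{t+1} \sim p_s}
  \left[ V^*_{\mathrm{soft}}(s_{t+1}) \right]

% Policy improvement: for any policy \pi, define
\tilde{\pi}(\cdot \mid s) \propto
  \exp\!\left( \tfrac{1}{\alpha} Q^{\pi}_{\mathrm{soft}}(s, \cdot) \right)
\quad \Longrightarrow \quad
Q^{\tilde{\pi}}_{\mathrm{soft}}(s, a) \ge Q^{\pi}_{\mathrm{soft}}(s, a)
\quad \forall s, a
```

The convergence result concerns the fixed-point iteration that alternates the last two assignments: update soft Q from the soft Bellman equation, then recompute soft V as the log-integral of the exponentiated soft Q.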

proof.pdf works through the derivations of the results above.
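The convergence claim can be checked numerically on a small problem. The following is a minimal sketch, not code from the paper or from proof.pdf: it assumes a finite MDP, so the integral over actions becomes a sum, and it uses a randomly generated toy MDP. The names `soft_value`, `soft_q_iteration`, `P`, and `R` are hypothetical and chosen only for illustration.

```python
import numpy as np

def soft_value(q, alpha):
    """Soft V(s) = alpha * log sum_a exp(Q(s, a) / alpha), computed stably."""
    z = q / alpha
    m = z.max(axis=1, keepdims=True)  # subtract the row max to avoid overflow
    return alpha * (m[:, 0] + np.log(np.exp(z - m).sum(axis=1)))

def soft_q_iteration(P, R, gamma=0.9, alpha=1.0, tol=1e-10, max_iters=10_000):
    """Iterate the soft Bellman backup on a tabular MDP (assumed setting).

    P: array of shape (S, A, S) with transition probabilities.
    R: array of shape (S, A) with rewards.
    """
    q = np.zeros_like(R, dtype=float)
    for _ in range(max_iters):
        v = soft_value(q, alpha)
        q_new = R + gamma * P @ v  # soft Bellman backup
        converged = np.abs(q_new - q).max() < tol
        q = q_new
        if converged:
            break
    v = soft_value(q, alpha)
    # Energy-based policy: pi(a|s) = exp((Q(s, a) - V(s)) / alpha)
    policy = np.exp((q - v[:, None]) / alpha)
    return q, v, policy

# Toy MDP with random dynamics and rewards (illustrative data only).
rng = np.random.default_rng(0)
S, A = 4, 3
P = rng.random((S, A, S))
P /= P.sum(axis=2, keepdims=True)  # normalize rows into distributions
R = rng.random((S, A))

q, v, policy = soft_q_iteration(P, R)
print("max soft Bellman residual:", np.abs(q - (R + 0.9 * P @ v)).max())
print("policy rows sum to:", policy.sum(axis=1))
```

In this tabular setting the backup is a contraction with modulus `gamma`, so the residual shrinks geometrically, and the resulting policy rows are normalized by construction because soft V is the log-partition of soft Q.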

References

Paper Notes: Soft Q-learning (论文笔记之Soft Q-learning)

Git-123-Hub/soft-Q-learning-proof

Folders and files

Latest commit

History

Repository files navigation

soft-Q-learning推导

参考

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Packages