Commit 3ee2b42 (1 parent: 8d69320), showing 1 changed file with 16 additions and 0 deletions.
@@ -0,0 +1,16 @@
---
title: "Off-Policy Risk-Sensitive Reinforcement Learning-Based Constrained Robust Optimal Control"
collection: publications
permalink: /publication/2022off
authorlist: 'Cong Li, Qingchen Liu, <b>Zhehua Zhou</b>, Martin Buss, Fangzhou Liu'
excerpt: 'This article proposes an off-policy risk-sensitive reinforcement learning (RL)-based control framework to jointly optimize the task performance and constraint satisfaction in a disturbed environment.'
date: 2022-10-25
venue: 'IEEE Transactions on Systems, Man, and Cybernetics: Systems'
# slidesurl: 'http://academicpages.github.io/files/slides1.pdf'
paperurl: 'https://ieeexplore.ieee.org/abstract/document/9927372'
# citation: 'Your Name, You. (2009). "Paper Title Number 1." <i>Journal 1</i>. 1(1).'
pubtype: 'journal'
highlight: 'no'
---
This article proposes an off-policy risk-sensitive reinforcement learning (RL)-based control framework to jointly optimize the task performance and constraint satisfaction in a disturbed environment. The risk-aware value function, constructed using the pseudo control and risk-sensitive input and state penalty terms, is introduced to convert the original constrained robust stabilization problem into an equivalent unconstrained optimal control problem. Then, an off-policy RL algorithm is developed to learn the approximate solution to the risk-aware value function. During the learning process, the associated approximate optimal control policy is able to satisfy both input and state constraints under disturbances. By replaying experience data to the off-policy weight update law of the critic neural network, the weight convergence is guaranteed. Moreover, online and offline algorithms are developed to serve as principled ways to record informative experience data to achieve a sufficient excitation required for the weight convergence. The proofs of system stability and weight convergence are provided. Simulation results reveal the validity of the proposed control framework.
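
The abstract mentions replaying recorded experience data in the critic's weight update so that the sufficient excitation needed for weight convergence is met. The sketch below is only a rough illustration of that general idea, not the paper's actual update law: it runs a gradient-style update of a linear-in-the-weights critic over a stored stack of regressor/target pairs. The function name `critic_replay_update`, the feature dimension, and the synthetic data are all hypothetical.

```python
# Illustrative sketch of an experience-replay critic update (assumed form, not the paper's law).
import numpy as np

def critic_replay_update(W, replay_phi, replay_targets, lr=0.05):
    """One gradient step on the averaged residual error over the replayed samples.

    W              : (n,)  current critic weight estimate
    replay_phi     : (N, n) recorded regressor (feature) vectors
    replay_targets : (N,)  recorded target values for each regressor
    """
    residuals = replay_phi @ W - replay_targets            # error on every stored sample
    grad = replay_phi.T @ residuals / len(replay_targets)  # averaged gradient over the replay stack
    return W - lr * grad

# Usage: replay a small synthetic buffer until the weights settle.
rng = np.random.default_rng(0)
true_W = np.array([1.0, -2.0, 0.5])
phi = rng.normal(size=(20, 3))      # full-column-rank regressor stack, i.e. sufficient excitation
targets = phi @ true_W
W = np.zeros(3)
for _ in range(500):
    W = critic_replay_update(W, phi, targets)
print(np.round(W, 3))               # approaches true_W because the replayed data is exciting enough
```

The point of the replay stack in this toy setting is that convergence hinges on the recorded regressors spanning the feature space; the paper's online and offline recording algorithms address how to collect such informative data in the actual control problem.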