Talk Title: Model-Free Finite Horizon H-Infinity Control: An Off-Policy Approach with Double Minimax Q-learning
Speaker: Wen Yu (余文)
Time: August 28, 2025, 09:30-10:30
Venue: Room 228, Weixue Building, Lianhua Street Campus
Speaker Biography:
Wen Yu (余文) is a member of the Mexican Academy of Sciences, a full professor at the National Polytechnic Institute of Mexico, and is listed among the world's top 2% most highly cited scientists. He received his B.S. degree in Automation from Tsinghua University in 1990, and his M.S. and Ph.D. degrees in Automatic Control from Northeastern University in 1992 and 1995, respectively. From 1995 to 1996, he was a lecturer in the Department of Automatic Control at Northeastern University. Since 1996, he has been with the National Polytechnic Institute of Mexico. From 2002 to 2003, he held a research position at the Mexican Petroleum Institute. From 2006 to 2007, he was a senior visiting research fellow at Queen's University Belfast, UK, and from 2009 to 2010 he was a visiting associate professor at the University of California, Santa Cruz, USA. Since 2006, he has also been a visiting professor at Northeastern University.
He has published more than 500 academic papers, including over 200 journal papers, and has authored 8 monographs. He has supervised 38 Ph.D. theses and 40 master's theses. According to Google Scholar, his work has been cited more than 12,000 times, with an h-index of 52. He served as General Chair of the IEEE flagship conference SSCI 2023, and has served as an Associate Editor for journals including IEEE Transactions on Cybernetics, IEEE Transactions on Neural Networks and Learning Systems, Neurocomputing, Scientific Reports, and Intelligence & Robotics.
Abstract:
Finite horizon H-infinity control is essential for robust system design, particularly when guaranteed system performance is required over a specific time interval. Despite offering practical benefits over their infinite horizon counterparts, model-based finite horizon frameworks present complexities, notably the time-varying nature of the Difference Riccati Equation (DRE), which significantly complicates solutions for systems with unknown dynamics. This paper proposes a novel model-free method by leveraging off-policy reinforcement learning (RL), known for its superior data efficiency and flexibility compared to the on-policy methods prevalent in the model-free H-infinity control literature. Recognizing the unique challenges of off-policy RL within the inherent minimax optimization problem of H-infinity control, we propose the Neural Network-based Double Minimax Q-learning (NN-DMQ) algorithm. This algorithm is specifically designed to handle the adversarial interaction between the controller and the worst-case disturbance, while also mitigating the bias introduced by Q-value overestimation, which can destabilize learning. A key theoretical contribution of this work is a rigorous convergence proof of the proposed Double Minimax Q-learning (DMQ) algorithm, which provides strong guarantees for the algorithm's stability and its capability to learn the optimal finite horizon robust control and worst-case disturbance policies. Extensive experiments were performed to verify the effectiveness and robustness of our approach, illustrating its applicability to challenging real-world control problems with unknown dynamics.
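The talk's NN-DMQ algorithm uses neural-network function approximation; purely as an illustration of the underlying double minimax Q-learning idea it builds on (two Q estimates, one selecting the saddle-point action pair and the other evaluating it, trained on exploratory off-policy data over a finite horizon), the following minimal tabular Python sketch may help. The toy dynamics, stage cost, and all parameters below are assumptions made for illustration, not the method presented in the talk.

```python
import numpy as np

# Minimal tabular sketch of double minimax Q-learning for a finite-horizon
# zero-sum setting: the controller u minimizes, the disturbance w maximizes.
# Everything here (environment, cost, sizes) is an illustrative assumption.

rng = np.random.default_rng(0)
N, nS, nU, nW = 10, 5, 3, 3        # horizon, #states, #control actions, #disturbance actions
alpha, gamma_att = 0.1, 2.0        # learning rate, H-infinity attenuation level

# Two independent Q tables per time step to mitigate estimation bias.
QA = np.zeros((N + 1, nS, nU, nW))
QB = np.zeros((N + 1, nS, nU, nW))

def stage_cost(s, u, w):
    # Illustrative surrogate of the game cost ||z||^2 - gamma^2 ||w||^2.
    return (s - 2) ** 2 + u ** 2 - gamma_att ** 2 * w ** 2

def step(s, u, w):
    # Toy "unknown" dynamics: the learner only sees sampled transitions.
    return int(np.clip(s + u - w + rng.integers(-1, 2), 0, nS - 1))

for episode in range(2000):
    s = rng.integers(nS)
    for k in range(N):
        u, w = rng.integers(nU), rng.integers(nW)   # off-policy (exploratory) data
        c, s_next = stage_cost(s, u, w), step(s, u, w)
        if rng.random() < 0.5:                      # update one table, evaluate with the other
            Qsel, Qeval = QA, QB
        else:
            Qsel, Qeval = QB, QA
        # Select the minimax (saddle-point) action pair with Qsel, evaluate it with Qeval.
        u_star = np.argmin(Qsel[k + 1, s_next].max(axis=1))
        w_star = np.argmax(Qsel[k + 1, s_next, u_star])
        target = c + Qeval[k + 1, s_next, u_star, w_star]
        Qsel[k, s, u, w] += alpha * (target - Qsel[k, s, u, w])
        s = s_next

# Greedy minimizing control policy per time step and state, from the averaged tables.
Q = 0.5 * (QA + QB)
policy_u = Q.max(axis=3).argmin(axis=2)   # shape (N+1, nS)
```

The two-table structure is the standard double Q-learning device carried over to the minimax setting: decoupling action selection from action evaluation reduces the bias that a single maximizing/minimizing estimate would accumulate.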
All faculty and students are welcome to attend!
School of Information Science and Engineering
August 25, 2025