Seminar Announcement

School of Information Science and Engineering Seminar Announcement: Model-Free Finite Horizon H-Infinity Control: An Off-Policy Approach with Double Minimax Q-learning

Source: School of Information Science and Engineering, 2025-08-25 12:14

Title: Model-Free Finite Horizon H-Infinity Control: An Off-Policy Approach with Double Minimax Q-learning

Speaker: Wen Yu (余文)

Time: August 28, 2025, 09:30-10:30

Venue: Conference Room 228, Weixue Building, Lianhua Street Campus

About the Speaker:

Wen Yu (余文) is a Member of the Mexican Academy of Sciences, a full professor at the National Polytechnic Institute of Mexico, and is listed among the world's top 2% most highly cited scientists. He received the B.S. degree in Automation from Tsinghua University in 1990, and the M.S. and Ph.D. degrees in Automatic Control from Northeastern University in 1992 and 1995, respectively. From 1995 to 1996 he was a lecturer in the Department of Automatic Control at Northeastern University, and he has been with the National Polytechnic Institute of Mexico since 1996. From 2002 to 2003 he held a research position at the Mexican Petroleum Institute. From 2006 to 2007 he was a Senior Visiting Research Fellow at Queen's University Belfast, UK, and from 2009 to 2010 he was a Visiting Associate Professor at the University of California, Santa Cruz. Since 2006 he has also been a visiting professor at Northeastern University.

He has published more than 500 academic papers, including over 200 journal papers, and has authored 8 monographs. He has supervised 38 Ph.D. dissertations and 40 master's theses. According to Google Scholar, his work has been cited more than 12,000 times, with an h-index of 52. He served as General Chair of the IEEE flagship annual conference SSCI 2023, and has served as an Associate Editor of IEEE Transactions on Cybernetics, IEEE Transactions on Neural Networks and Learning Systems, Neurocomputing, Scientific Reports, and Intelligence & Robotics.

Abstract:

Finite horizon H-infinity control is essential for robust system design, particularly when guaranteed system performance is required over a specific time interval. Although it offers practical benefits over its infinite horizon counterpart, existing model-based frameworks present complexities, notably the time-varying nature of the Difference Riccati Equation (DRE), which significantly complicates solutions for systems with unknown dynamics. This paper proposes a novel model-free method by leveraging off-policy reinforcement learning (RL), known for its superior data efficiency and flexibility compared to the traditional on-policy methods prevalent in the model-free H-infinity control literature. Recognizing the unique challenges of off-policy RL within the inherent minimax optimization problem of H-infinity control, we propose the Neural Network-based Double Minimax Q-learning (NN-DMQ) algorithm. This algorithm is specifically designed to handle the adversarial interaction between the controller and the worst-case disturbance, while also mitigating the bias introduced by Q-value overestimation, which can destabilize learning. A key theoretical contribution of this work is a rigorous convergence proof of the proposed Double Minimax Q-learning (DMQ) algorithm. This proof provides strong guarantees for the algorithm's stability and its capability to learn the optimal finite-horizon robust control and worst-case disturbance policies. Extensive experiments were performed to verify the effectiveness and robustness of our approach, demonstrating its applicability to challenging real-world control problems with unknown dynamics.
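For readers unfamiliar with the double minimax idea mentioned above, the following is a minimal, illustrative sketch of a tabular double minimax Q-learning update for a finite-horizon zero-sum game between a controller and an adversarial disturbance. It is not the speaker's NN-DMQ algorithm: the discretized state, control, and disturbance spaces, the toy random dynamics and cost, and all hyper-parameters below are illustrative assumptions only.

```python
import numpy as np

# Sketch of tabular double minimax Q-learning on a finite horizon.
# Controller u tries to minimize the accumulated cost; disturbance w tries to
# maximize it. Everything below (sizes, dynamics, costs) is an assumed toy setup.

rng = np.random.default_rng(0)

n_states, n_u, n_w, horizon = 5, 3, 3, 10   # assumed discretized problem sizes
alpha = 0.1                                  # learning rate (assumed)

# Two time-indexed Q-tables, as in double Q-learning, to curb estimation bias.
QA = np.zeros((horizon + 1, n_states, n_u, n_w))
QB = np.zeros((horizon + 1, n_states, n_u, n_w))

# Toy random transition and stage-cost tables standing in for unknown dynamics.
P = rng.integers(0, n_states, size=(n_states, n_u, n_w))
C = rng.normal(size=(n_states, n_u, n_w))

def minimax_pair(Q_t, s):
    """Control action that minimizes the worst-case (max over w) cost at state s."""
    worst = Q_t[s].max(axis=1)        # worst-case value of each control action
    u = int(worst.argmin())           # controller: minimize the worst case
    w = int(Q_t[s, u].argmax())       # disturbance: maximize against that action
    return u, w

for episode in range(2000):
    s = int(rng.integers(n_states))
    for t in range(horizon):
        # Off-policy behaviour: epsilon-greedy exploration (assumed).
        if rng.random() < 0.2:
            u, w = int(rng.integers(n_u)), int(rng.integers(n_w))
        else:
            u, w = minimax_pair(QA[t] + QB[t], s)
        s_next, cost = int(P[s, u, w]), float(C[s, u, w])

        # Double minimax update: one table selects the minimax pair at s_next,
        # the other evaluates it, decoupling selection from evaluation.
        if rng.random() < 0.5:
            u_n, w_n = minimax_pair(QA[t + 1], s_next)
            target = cost + QB[t + 1, s_next, u_n, w_n]
            QA[t, s, u, w] += alpha * (target - QA[t, s, u, w])
        else:
            u_n, w_n = minimax_pair(QB[t + 1], s_next)
            target = cost + QA[t + 1, s_next, u_n, w_n]
            QB[t, s, u, w] += alpha * (target - QB[t, s, u, w])
        s = s_next
```

The point the sketch illustrates is the separation of action selection from value evaluation across two Q-tables in a minimax (controller versus worst-case disturbance) setting, which is the mechanism the abstract associates with mitigating Q-value estimation bias.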

All faculty members and students are welcome to attend!


School of Information Science and Engineering

August 25, 2025

(Editor: Li Han)