Study on long-term stochastic optimal operation of cascade reservoirs by deep reinforcement learning

doi:10.11660/slfdxb.20231103

Journal of Hydroelectric Engineering, 2023, Vol. 42, Issue (11): 21-32.

Published: 2023-11-25 Online: 2023-11-25

Abstract

Abstract: Compared with single-reservoir operation, cascade reservoir operation has a state space that grows exponentially with the number of reservoirs. As a result, table-based reinforcement learning methods suffer from the curse of dimensionality when applied to the long-term stochastic optimal operation of cascade reservoirs. To address this problem, this paper adopts the deep Q-network (DQN) algorithm from deep reinforcement learning. First, the joint distribution function of the stochastic inflows of the cascade reservoirs is derived using Copula functions. Then, following the temporal difference idea, a main neural network and a target neural network are constructed to approximate the state-action values of the current state and the next state, respectively, and an ε-greedy exploration-exploitation strategy is used to obtain the optimal operation policy. Finally, the main parameters are tuned step by step to secure the operation benefits. Comparative case studies show that, relative to Q-learning and its improved variants, the DQN algorithm raises the objective value of optimal operation, converges faster, and effectively overcomes the curse of dimensionality in the stochastic optimal operation of cascade reservoirs.
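The abstract does not state which Copula family is fitted. As a minimal sketch, assuming a bivariate Gaussian copula with empirical marginals for an upstream and a downstream inflow series, the joint-inflow step could look like the following. All function names and the inflow data are hypothetical; the copula parameter is obtained from Kendall's τ through ρ = sin(πτ/2), a relation that holds for the Gaussian copula.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, maps latent z to a uniform u


def kendall_tau(x, y):
    """Kendall's tau-a (tied pairs count as neither concordant nor discordant)."""
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


def gaussian_copula_rho(tau):
    """Gaussian-copula correlation implied by Kendall's tau: rho = sin(pi * tau / 2)."""
    return math.sin(math.pi * tau / 2.0)


def empirical_quantile(sorted_sample, u):
    """Inverse empirical CDF, interpolating linearly between order statistics."""
    pos = u * (len(sorted_sample) - 1)
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(sorted_sample) - 1)
    frac = pos - lo
    return sorted_sample[lo] + frac * (sorted_sample[hi] - sorted_sample[lo])


def sample_joint_inflows(upper_hist, lower_hist, rho, n_samples, seed=0):
    """Hypothetical helper: draw (upstream, downstream) inflow pairs from a
    Gaussian copula whose marginals are the empirical historical distributions."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    rng = random.Random(seed)
    upper_sorted, lower_sorted = sorted(upper_hist), sorted(lower_hist)
    pairs = []
    for _ in range(n_samples):
        # Correlated standard normals via the 2x2 Cholesky factor
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        # Uniforms (the copula), then each reservoir's own marginal
        u1, u2 = STD_NORMAL.cdf(z1), STD_NORMAL.cdf(z2)
        pairs.append((empirical_quantile(upper_sorted, u1),
                      empirical_quantile(lower_sorted, u2)))
    return pairs


# Hypothetical monthly inflow series (m^3/s) for two reservoirs in a cascade
upper = [120, 95, 310, 450, 380, 210, 160, 140, 130, 260, 400, 180]
lower = [140, 120, 290, 520, 360, 250, 150, 175, 130, 240, 470, 200]
rho = gaussian_copula_rho(kendall_tau(upper, lower))
scenarios = sample_joint_inflows(upper, lower, rho, n_samples=500, seed=1)
```

Scenarios drawn this way keep each reservoir's historical marginal while reproducing the rank correlation between the two inflows. If tail dependence matters (for example, joint floods), a family such as Gumbel or Clayton would be a more appropriate assumption than the Gaussian copula.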

Keywords: stochastic optimal operation of cascade reservoirs, deep reinforcement learning, deep Q-network algorithm, temporal difference, exploration-exploitation strategy
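In a DQN, the temporal difference idea pairs a main network, which estimates Q(s, a) for the current state, with a separate target network that supplies the target y = r + γ·max_a' Q_target(s', a'). The ε-greedy strategy then trades off exploration against exploitation. The sketch below is a minimal NumPy illustration under stated assumptions, not the authors' implementation. The one-hidden-layer architecture, the 3-dimensional state (for example, normalized storages plus a period index), the 5 discrete release decisions, all names, and all hyperparameters are hypothetical. Experience replay and the reservoir simulator are omitted, and random transitions stand in for simulated ones.

```python
import numpy as np


class QNetwork:
    """Hypothetical one-hidden-layer MLP: state vector -> one Q-value per action."""

    def __init__(self, n_state, n_hidden, n_action, rng):
        self.W1 = rng.normal(0.0, 0.5, size=(n_state, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.5, size=(n_hidden, n_action))
        self.b2 = np.zeros(n_action)

    def forward(self, s):
        h = np.maximum(0.0, s @ self.W1 + self.b1)  # ReLU hidden layer
        return h @ self.W2 + self.b2, h

    def q_values(self, s):
        return self.forward(s)[0]

    def copy_from(self, other):
        """Hard update: overwrite this network's weights with copies of another's."""
        for name in ("W1", "b1", "W2", "b2"):
            setattr(self, name, getattr(other, name).copy())


def epsilon_greedy(q_net, state, epsilon, rng):
    """Explore with probability epsilon, otherwise take the greedy action."""
    if rng.random() < epsilon:
        return int(rng.integers(q_net.b2.shape[0]))
    return int(np.argmax(q_net.q_values(state[None, :])[0]))


def dqn_update(main, target, batch, gamma, lr):
    """One gradient step on the mean squared TD error; returns the pre-step loss."""
    s, a, r, s_next, done = batch
    # TD target from the frozen target network: y = r + gamma * max_a' Q_target(s', a')
    y = r + gamma * (1.0 - done) * target.q_values(s_next).max(axis=1)
    q_all, h = main.forward(s)
    rows = np.arange(len(a))
    td_error = q_all[rows, a] - y  # only the actions actually taken are regressed
    loss = float(np.mean(td_error ** 2))
    # Manual backpropagation through the main network only
    grad_q = np.zeros_like(q_all)
    grad_q[rows, a] = 2.0 * td_error / len(a)
    grad_h = grad_q @ main.W2.T
    grad_h[h <= 0.0] = 0.0  # ReLU derivative
    main.W2 -= lr * (h.T @ grad_q)
    main.b2 -= lr * grad_q.sum(axis=0)
    main.W1 -= lr * (s.T @ grad_h)
    main.b1 -= lr * grad_h.sum(axis=0)
    return loss


# Hypothetical usage on a fixed batch of random transitions
rng = np.random.default_rng(0)
main = QNetwork(3, 16, 5, rng)
target = QNetwork(3, 16, 5, rng)
target.copy_from(main)  # start synchronized
batch = (rng.random((32, 3)), rng.integers(0, 5, size=32), rng.random(32),
         rng.random((32, 3)), np.zeros(32))
losses = []
for step in range(300):
    losses.append(dqn_update(main, target, batch, gamma=0.9, lr=0.01))
    if (step + 1) % 100 == 0:
        target.copy_from(main)  # periodic hard sync of the target network
action = epsilon_greedy(main, rng.random(3), epsilon=0.1, rng=rng)
```

Starting with a large ε and decaying it shifts the agent from exploration toward exploitation. The ε schedule, sync interval, learning rate, and γ are the kind of parameters that could be tuned step by step, but the values used here are arbitrary.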

LI Wenwu, ZHOU Jiani, PEI Benlin, ZHANG Yifan. Study on long-term stochastic optimal operation of cascade reservoirs by deep reinforcement learning[J]. Journal of Hydroelectric Engineering, 2023, 42(11): 21-32.
