水力发电学报 (Journal of Hydroelectric Engineering)

水力发电学报, 2021, Vol. 40, Issue 3: 124-133. doi: 10.11660/slfdxb.20210312


Anomaly identification method for long-term monitoring data based on density clustering

  

  • Publication date: 2021-03-25 Release date: 2021-03-25

Density-based detection of clustering outliers in long-term monitoring data

  • Online: 2021-03-25 Published: 2021-03-25

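Abstract (Chinese version): Outlier identification in long-term monitoring data of hydraulic structures faces three problems: distribution assumptions are hard to satisfy, the number of outlier points is restricted, and outliers are hard to quantify effectively. To address these problems, a density-clustering local anomaly identification method based on an improved local outlier factor algorithm is proposed. The method divides a long-term monitoring data set into extreme clusters, outlier clusters, and normal clusters, and assigns an anomaly possibility within each cluster in a different way. This yields a time-series plot of anomaly possibility that jointly considers the independent variables and the effect quantities, and achieves outlier identification and quantitative analysis for long-term monitoring data of hydraulic structures. The core algorithm uses no distribution assumption in advance. It improves the definition of reachability distance in the local outlier factor algorithm and enlarges the difference between high and low outlier factors, so that outliers are easier to distinguish from other data points. Using long-term monitoring data from an actual water transfer project, and considering that both the number and the locations of outliers in the measured data set are unknown, a credibility value is computed from the anomaly possibility and used as the weight of a regression analysis model. The model's predictions improve considerably compared with the unweighted model, which verifies the effectiveness of the proposed method.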

Key words (Chinese version): hydraulic structure, anomaly identification, local outlier factor, outlier, long-term monitoring data

Abstract: A density-based clustering outlier detection algorithm using improved local outlier factors is presented for analysing long-term monitoring data of hydraulic structures. It targets three problems in outlier identification: distribution assumptions are difficult to meet, the number of outliers that can be handled is limited, and outliers are difficult to quantify effectively. The method divides the long-term data set into extreme clusters, outlier clusters, and normal clusters, and assigns an anomaly possibility in a different way within each cluster. The result is a time-series plot of anomaly possibility that accounts for both the independent variables and the effect quantities, which enables identification and quantitative analysis of outliers in long-term data sets of hydraulic structures. The core algorithm requires no prior distribution assumption. It improves the definition of reachability distance in the local outlier factor algorithm and widens the difference between high and low outlier factors, so that outliers are easier to distinguish from other data points. The method is applied to long-term monitoring data from a water transfer project, in which the number and locations of outliers are unknown. A credibility value is calculated from the anomaly possibility and used as the weight of the regression model; the predictions improve considerably compared with the unweighted model, which verifies the effectiveness of the proposed method.

Key words: hydraulic structure, anomaly recognition, local outlier factor, outlier, long-term monitoring data
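
The abstract does not give the improved reachability distance, the clustering rule, or the formula that turns anomaly possibility into credibility, so none of those can be reproduced here. As a rough illustration of the general idea only, the sketch below computes the standard local outlier factor (not the paper's improved variant) on synthetic (independent variable, effect quantity) pairs and then uses a hypothetical weight, 1 / max(1, LOF), in a weighted straight-line fit. The function names, the weight mapping, the choice k = 3, and the data are all assumptions made for illustration.

```python
import math

def k_nearest(points, i, k):
    """Return the k nearest neighbours of points[i] as sorted (distance, index) pairs."""
    pairs = sorted((math.dist(points[i], p), j) for j, p in enumerate(points) if j != i)
    return pairs[:k]

def lof_scores(points, k):
    """Standard LOF: mean density of a point's neighbours divided by its own density."""
    neigh = [k_nearest(points, i, k) for i in range(len(points))]
    k_dist = [nb[-1][0] for nb in neigh]  # distance to the k-th nearest neighbour
    lrd = []  # local reachability density of each point
    for nb in neigh:
        # Standard reachability distance: max(k-distance of the neighbour, actual distance).
        reach = [max(k_dist[j], d) for d, j in nb]
        lrd.append(len(reach) / max(sum(reach), 1e-12))  # guard against duplicate points
    return [sum(lrd[j] for _, j in nb) / (len(nb) * lrd[i]) for i, nb in enumerate(neigh)]

def credibility_weights(scores):
    """Hypothetical mapping (an assumption): weight 1 for LOF <= 1, smaller as LOF grows."""
    return [1.0 / max(1.0, s) for s in scores]

def weighted_line_fit(xs, ys, ws):
    """Weighted least-squares fit of y = slope * x + intercept."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

# Synthetic data (an assumption): effect y = 2x + 1 with one corrupted reading at x = 5.
xs = [float(x) for x in range(10)]
ys = [2.0 * x + 1.0 for x in xs]
ys[5] = 30.0

scores = lof_scores(list(zip(xs, ys)), k=3)
weights = credibility_weights(scores)
print("LOF scores:", [round(s, 2) for s in scores])
print("unweighted fit (slope, intercept):", weighted_line_fit(xs, ys, [1.0] * len(xs)))
print("weighted fit (slope, intercept):  ", weighted_line_fit(xs, ys, weights))
```

In this toy case the corrupted reading receives the largest LOF and therefore the smallest weight, which pulls the weighted fit closer to the underlying line than the unweighted fit. Real monitoring data would likely need the variables scaled before distances are computed, which this sketch does not do.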
