Journal of Hydroelectric Engineering (水力发电学报)

Journal of Hydroelectric Engineering, 2021, Vol. 40, Issue (6): 139-151. doi: 10.11660/slfdxb.20210613


Intelligent data mining approach of text entity knowledge from construction documents of concrete dams

Online: 2021-06-25   Published: 2021-06-25

Abstract: Construction information for concrete dams is mostly recorded as document text. This text is large in volume, widely distributed, and intricately interrelated, so manual processing struggles to extract its knowledge accurately and efficiently or to untangle the complex relationships among construction records. In natural language processing, named entities are the carriers of textual knowledge, and accurate, fast entity recognition is an essential prerequisite for mining construction knowledge. This paper proposes an intelligent method for recognizing and mining knowledge in concrete dam construction documents that combines deep learning with association rule techniques. The method defines entity types for concrete dam construction. It couples a bi-directional long short-term memory network (Bi-LSTM) with a conditional random field (CRF) to build a named entity recognition model, which produces a set of construction entity knowledge. Building on this, the method predefines the relationships between entities, based on how construction text is typically phrased and on the entity types, and fixes the permissible combinations of construction entities. Together these form an entity association rule extraction technique. Guided by this technique, an improved Apriori algorithm computes frequent itemsets and derives strong association rules between entities. Applied to the weekly construction supervision reports of an actual concrete dam, the method achieved a named entity recognition precision of 86.42%, confirming its accuracy. Analyzing the association rules between entities with the improved Apriori algorithm demonstrated the advantages of the improvement. The method helps make knowledge analysis of concrete dam construction documents more intelligent and more fine-grained.

Key words: concrete dam, construction document, named entity, intelligent recognition, deep learning, knowledge mining
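In a Bi-LSTM-CRF tagger, the Bi-LSTM scores every candidate tag for each character, and the CRF layer adds tag-to-tag transition scores and then selects the best whole tag sequence. This page does not give the paper's tag scheme, entity types, or network settings. The following is therefore only a minimal sketch of the CRF decoding step under stated assumptions. It uses a BIO scheme with one hypothetical entity type `PART`, and hand-picked numbers stand in for trained Bi-LSTM emission scores and CRF transition scores. It also computes entity-level precision, the recognition metric the abstract reports. Here precision is taken as exact span-and-type matches over predicted entities, which is an assumed matching rule.

```python
from typing import List, Sequence, Set, Tuple

# Hypothetical BIO tag set with a single illustrative entity type.
TAGS = ["O", "B-PART", "I-PART"]


def viterbi(emissions: Sequence[Sequence[float]],
            transitions: Sequence[Sequence[float]]) -> List[int]:
    """Highest-scoring tag path for a linear-chain CRF.

    emissions[t][j]: score of tag j at position t (stand-in for Bi-LSTM output).
    transitions[i][j]: score for moving from tag i to tag j.
    """
    n_tags = len(transitions)
    score = list(emissions[0])          # best path score ending in each tag
    backptrs: List[List[int]] = []      # best previous tag, per position and tag
    for t in range(1, len(emissions)):
        new_score, ptr = [], []
        for j in range(n_tags):
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            ptr.append(best_i)
        score = new_score
        backptrs.append(ptr)
    best = max(range(n_tags), key=lambda j: score[j])
    path = [best]
    for ptr in reversed(backptrs):      # follow back-pointers to the start
        best = ptr[best]
        path.append(best)
    return path[::-1]


def spans(tags: Sequence[str]) -> Set[Tuple[int, int, str]]:
    """Group BIO tags into (start, end_exclusive, type) entity spans."""
    found: Set[Tuple[int, int, str]] = set()
    start, etype = None, None
    for k, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a trailing entity
        if start is not None and not (tag.startswith("I-") and tag[2:] == etype):
            found.add((start, k, etype))
            start = None
        if tag.startswith("B-"):
            start, etype = k, tag[2:]
    return found


def precision(pred: Set[Tuple[int, int, str]], gold: Set[Tuple[int, int, str]]) -> float:
    """Entity-level precision: share of predicted entities exactly matching gold."""
    return len(pred & gold) / len(pred) if pred else 0.0


# Illustrative scores for a 4-character sentence (not trained values).
EMISSIONS = [[0, 2, 0], [0, 0, 1.5], [1, 0, 0], [0.5, 0, 1]]
TRANSITIONS = [[0, 0, -10],   # from O: O -> I-PART is heavily penalized
               [0, -1, 1],    # from B-PART
               [0, -1, 0]]    # from I-PART

if __name__ == "__main__":
    tags = [TAGS[i] for i in viterbi(EMISSIONS, TRANSITIONS)]
    print(tags)          # ['B-PART', 'I-PART', 'O', 'O']
    print(spans(tags))   # {(0, 2, 'PART')}
```

A greedy per-position argmax over these emissions would tag the last character `I-PART` right after an `O`, which is an invalid BIO sequence. The strongly negative O→I-PART transition score lets Viterbi avoid it. Training the transition and emission scores (the CRF log-likelihood and the Bi-LSTM itself) is not shown.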

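The abstract says only that the Apriori algorithm was improved under the guidance of predefined entity relations and entity combination forms; it does not describe the modification itself. The following minimal sketch shows one plausible reading, which is not the authors' algorithm. It runs standard Apriori with a hypothetical entity-type constraint pushed into candidate pruning. It then keeps rules A → B whose confidence, support(A ∪ B) / support(A), meets a threshold. The `TYPE:text` item encoding, the `ALLOWED_TYPES` combinations, the sample transactions (one entity set per report sentence), and both thresholds are all assumptions made for illustration.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Sequence, Set, Tuple

Itemset = FrozenSet[str]

# Hypothetical predefined entity-type combinations allowed in one rule;
# the paper's actual relation definitions are not given in the abstract.
ALLOWED_TYPES = [{"PART", "PROC", "ISSUE"}, {"PART", "EQUIP"}]


def allowed(items: Itemset) -> bool:
    """One entity per type, with all types drawn from one allowed combination.

    Items are encoded as "TYPE:text". The check is anti-monotone (every subset of
    an allowed itemset is allowed), so it can prune candidates safely.
    """
    types = [item.split(":", 1)[0] for item in items]
    return len(set(types)) == len(types) and any(set(types) <= c for c in ALLOWED_TYPES)


def apriori(transactions: Sequence[Set[str]], min_support: float) -> Dict[Itemset, float]:
    """Level-wise frequent itemset mining with the type constraint pushed in."""
    n = len(transactions)

    def support(c: Itemset) -> float:
        return sum(1 for t in transactions if c <= t) / n

    singles = {frozenset([i]) for t in transactions for i in t}
    scored = {c: support(c) for c in singles if allowed(c)}
    level = {c: s for c, s in scored.items() if s >= min_support}
    frequent = dict(level)
    k = 2
    while level:
        prev = list(level)
        # Join step: unions of two frequent (k-1)-itemsets that have size k.
        cands = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: type constraint, then every (k-1)-subset must be frequent.
        cands = {c for c in cands if allowed(c)
                 and all(frozenset(s) in level for s in combinations(c, k - 1))}
        scored = {c: support(c) for c in cands}
        level = {c: s for c, s in scored.items() if s >= min_support}
        frequent.update(level)
        k += 1
    return frequent


def rules(frequent: Dict[Itemset, float],
          min_conf: float) -> List[Tuple[Itemset, Itemset, float, float]]:
    """Strong rules (lhs, rhs, support, confidence) from frequent itemsets."""
    out = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                conf = supp / frequent[lhs]   # lhs is frequent by downward closure
                if conf >= min_conf:
                    out.append((lhs, itemset - lhs, supp, conf))
    return out


# Hypothetical entity sets, one per report sentence.
TRANSACTIONS = [
    {"PART:block 5", "PROC:pouring", "ISSUE:cracking"},
    {"PART:block 5", "PROC:pouring"},
    {"PART:block 5", "ISSUE:cracking"},
    {"PART:block 7", "PROC:pouring"},
    {"PART:block 5", "PROC:pouring", "ISSUE:cracking", "EQUIP:crane"},
]

if __name__ == "__main__":
    for lhs, rhs, supp, conf in rules(apriori(TRANSACTIONS, 0.4), 0.7):
        print(sorted(lhs), "->", sorted(rhs), f"support={supp:.2f} confidence={conf:.2f}")
```

With these thresholds the sketch keeps five strong rules. One of them is {PROC:pouring, ISSUE:cracking} → {PART:block 5}, with confidence 1.00. By contrast, {PROC:pouring} → {ISSUE:cracking} comes from a frequent itemset but fails the 0.7 confidence threshold. Pushing an anti-monotone constraint into the prune step discards disallowed entity combinations before their support is ever counted.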