Scene Text Detection Algorithm Based on Attention Mechanism and Multiscale Features

Affiliation: School of Electrical and Information Engineering, Wuhan Institute of Technology
CLC number: TP391.4
Fund project: Jiangxi Provincial Training Program for Academic and Technical Leaders of Major Disciplines, Leading Talent Project
Abstract:

When the propagation path of localization information is too long and the semantic information of features at different scales cannot be fully mined, it becomes difficult to use low-level localization information to predict text bounding boxes. To address this problem, a scene text detection method based on an attention mechanism and multi-scale features, CM-STD, is proposed. The feature extraction module is improved by building a feature extractor with an embedded criss-cross attention mechanism, which extracts multi-scale features and captures context-aware information. A Path Aggregation Network (PANet) is introduced as the feature fusion module to fuse feature maps of different scales and supply multi-scale contextual semantic information, so that the segmentation network produces finer boundary segmentation results; the loss function is also reconstructed in the prediction stage. To validate the method, experiments are conducted on three public datasets, ICDAR2015, CTW1500 and Total-Text, where the F-measure reaches 87.4%, 82.3% and 83.4%, respectively. CM-STD outperforms the classical EAST method and is comparable to the current LOMO method.
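The criss-cross attention named in the abstract can be illustrated with a minimal sketch. It follows the commonly described form of criss-cross attention, in which each position attends only to the positions in its own row and column and the result is added back to the input through a residual weight. It is not the authors' implementation: the function name `criss_cross_attention`, the `gamma` parameter, and the use of the raw feature vectors as queries, keys and values (instead of learned 1x1 convolution projections) are assumptions made to keep the example short and dependency-free.

```python
import math


def criss_cross_attention(x, gamma=1.0):
    """Illustrative criss-cross attention on a feature map.

    x is a nested list of shape [H][W][C]. Queries, keys and values are
    the raw feature vectors (assumption: learned projections omitted).
    gamma is a hypothetical residual weight.
    """
    h, w, c = len(x), len(x[0]), len(x[0][0])
    out = [[None] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            # Criss-cross path: the whole row i plus column j,
            # with the position (i, j) itself counted only once.
            path = [(i, jj) for jj in range(w)]
            path += [(ii, j) for ii in range(h) if ii != i]

            q = x[i][j]
            # Dot-product similarity between the query and each key on the path.
            scores = [sum(a * b for a, b in zip(q, x[p][r])) for p, r in path]

            # Numerically stable softmax over the H + W - 1 path positions.
            m = max(scores)
            exps = [math.exp(s - m) for s in scores]
            z = sum(exps)
            weights = [e / z for e in exps]

            # Weighted sum of the values on the path, channel by channel.
            agg = [
                sum(wt * x[p][r][ch] for wt, (p, r) in zip(weights, path))
                for ch in range(c)
            ]
            # Residual connection back to the input feature.
            out[i][j] = [gamma * a + xv for a, xv in zip(agg, q)]
    return out


if __name__ == "__main__":
    feature_map = [[[0.1 * (i + j), 0.2] for j in range(4)] for i in range(3)]
    result = criss_cross_attention(feature_map, gamma=0.5)
    print(len(result), len(result[0]), len(result[0][0]))
```

Each position aggregates context from H + W - 1 positions rather than all H x W, which is why this form of attention is cheaper than full self-attention; applying it twice in sequence lets information from any position reach any other.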


History

Received: 2025-07-18
Revised: 2025-08-25
Accepted: 2025-08-27
