Floating-Point Optimization Architecture of a Neural Network Training Processor

2023, 31(6): 176-182
Zhang Libo, Li Changwei, Qi Wei, Wang Gang, Qi Lufeng
China Green Development Investment Group Co., Ltd.
Abstract: To address the low efficiency of weight-gradient computation in neural network training accelerators, this paper designs a floating-point arithmetic optimization architecture for a high-performance convolutional neural network (CNN) training processor. Building on an analysis of the basic principles of CNN training architectures, training optimization architectures using 32-bit, 24-bit, 16-bit, and mixed-precision floating-point formats are proposed, in order to identify the floating-point format best suited to low-power, small-footprint edge devices. A field-programmable gate array (FPGA) implementation verifies that the accelerator engine supports both inference and training on the MNIST handwritten-digit dataset; a hybrid convolution 24-bit floating-point scheme, combining a 24-bit custom floating-point format with the 16-bit brain floating-point (bfloat16) format, achieves an accuracy above 93%. The optimized mixed-precision accelerator, implemented in a TSMC 55 nm process, consumes 8.51 μJ of energy per training image.
Key words: convolutional neural network; floating-point arithmetic; accelerator; weight gradient; processor
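For readers unfamiliar with reduced-precision training formats, the following Python sketch shows how such formats can be emulated in software by truncating the mantissa of IEEE-754 float32 values. It is illustrative only: bfloat16 is a standard 1-sign/8-exponent/7-mantissa layout, but this abstract does not specify the bit layout of the paper's 24-bit custom format, so the 1/8/15 split used below (and the helper name quantize_float32) are assumptions, not the authors' design.

    import numpy as np

    def quantize_float32(x, mantissa_bits):
        """Emulate a reduced-precision float format by zeroing the
        low-order mantissa bits of float32 values (the sign bit and
        8-bit exponent are kept, as in bfloat16-style formats).
        Uses simple truncation rather than round-to-nearest."""
        drop = 23 - mantissa_bits              # float32 carries 23 mantissa bits
        mask = np.uint32(0xFFFF_FFFF & ~((1 << drop) - 1))
        bits = np.asarray(x, dtype=np.float32).view(np.uint32)
        return (bits & mask).view(np.float32)

    # bfloat16 keeps 7 mantissa bits; the 1s/8e/15m split for the 24-bit
    # custom format is an assumption -- the abstract does not specify it.
    w = np.array([0.123456789, -3.14159265], dtype=np.float32)
    print(quantize_float32(w, 7))    # bfloat16-style values
    print(quantize_float32(w, 15))   # hypothetical 24-bit (1s/8e/15m) values

Truncation is used here because it is the simplest rounding mode to realize in hardware (no carry propagation), which is one plausible reason low-cost training accelerators favor mantissa truncation over round-to-nearest.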

Received: 2022-10-17
Funding: Science and Technology Project of China Green Development Investment Group Co., Ltd. (Project No. CGDG529000220008; Project Title: Research on a Data Governance System under Multi-Industry Integration)