Target Speech Extraction Algorithm Based on Embedded Attention Mechanism


2023,31(10):174-181

Guo Zhikai (郭志楷), Yang Mingkun (杨明堃), Jiang Guofeng (蒋国峰), Tao Qi (陶祁), Liu Huanhuan (刘欢欢), Ma Hongqiang (马红强)

Aviation Maintenance NCO School, Air Force Engineering University (空军工程大学航空机务士官学校)

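Abstract: To address the problem of speaker speech extraction, this paper proposes a monaural speaker speech extraction method that embeds an attention mechanism in a deep neural network trained with multi-task learning. The algorithm unifies speech separation and speech extraction in a single framework: a speaker attention mechanism is embedded in a spectral-mapping separation model, and this attention mechanism, which takes in speaker auxiliary information, produces time-varying attention weights. These weights are used to separate out the internal embedding vector of the target speaker. An extraction model then applies nonlinear processing to this embedding vector to estimate the mask for the target speaker, from which the target speaker's speech is extracted. Speech extraction experiments were carried out on the TIMIT dataset. The results verify the feasibility and effectiveness of the proposed algorithm and show a clear advantage in speaker speech extraction performance.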

Keywords: deep neural network; monaural speaker speech extraction; multi-task learning; embedded attention mechanism

Target Speech Extraction Algorithm Based on Embedded Attention Mechanism

Guo Zhikai

Abstract: To address the problem of speaker speech extraction, a monaural speaker speech extraction method based on an embedded attention mechanism and deep neural network multi-task learning is proposed. The algorithm unifies speech separation and speech extraction in a single framework. It embeds a speaker attention mechanism in the spectrum-mapping separation network, obtains time-varying attention weights from the attention mechanism, which incorporates speaker auxiliary information, and uses these weights to separate the internal embedding vector of the target speaker. An extraction model then applies nonlinear processing to that embedding vector to estimate the mask corresponding to the target speaker, and the mask is used to extract the target speaker's speech. Speech extraction experiments are carried out on the TIMIT dataset. The experimental results verify the feasibility and effectiveness of the proposed algorithm and show that it has a clear advantage in speaker speech extraction performance.

Keywords: deep neural network; monaural speaker speech extraction; multi-task learning; embedded attention mechanism
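
The pipeline summarized in the abstract (attention weights from speaker auxiliary information, a weighted target embedding, a nonlinear extraction step, and a mask applied to the mixture) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The abstract does not specify the scoring function, layer types, or dimensions, so the scaled dot-product scoring, the single tanh/sigmoid extraction layer, the random untrained parameters, and every name below (`extract_target_speech`, `source_embeddings`, `speaker_vec`, `W_out`, `b_out`) are assumptions made only for illustration.

```python
import numpy as np

def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def extract_target_speech(mixture_mag, source_embeddings, speaker_vec, W_out, b_out):
    """Hypothetical sketch of attention-based target speech extraction.

    mixture_mag:       (T, F) magnitude spectrogram of the mixture
    source_embeddings: (S, T, D) internal embeddings from a separation model,
                       one stream per candidate source (assumed layout)
    speaker_vec:       (D,) auxiliary embedding of the target speaker
    W_out, b_out:      (D, F) and (F,) parameters of an assumed extraction layer
    """
    d = speaker_vec.shape[0]
    # Assumed scoring: scaled dot product between each frame embedding and
    # the speaker vector, giving one score per source and per frame.
    scores = source_embeddings @ speaker_vec / np.sqrt(d)        # (S, T)
    # Time-varying attention weights: normalized over sources at each frame.
    weights = softmax(scores, axis=0)                            # (S, T)
    # Attention-weighted sum selects the target speaker's embedding.
    target_embedding = (weights[..., None] * source_embeddings).sum(axis=0)  # (T, D)
    # Assumed extraction model: one nonlinear layer producing a mask in [0, 1].
    mask = sigmoid(np.tanh(target_embedding) @ W_out + b_out)    # (T, F)
    # The mask is applied to the mixture to estimate the target spectrogram.
    return mask * mixture_mag, weights, mask

# Usage with random stand-in data (untrained parameters, illustrative sizes).
rng = np.random.default_rng(0)
S, T, D, F = 2, 100, 16, 129
mixture = np.abs(rng.standard_normal((T, F)))
embeddings = rng.standard_normal((S, T, D))
speaker = rng.standard_normal(D)
W = 0.1 * rng.standard_normal((D, F))
b = np.zeros(F)

estimate, attn, mask = extract_target_speech(mixture, embeddings, speaker, W, b)
print(estimate.shape)  # same shape as the mixture spectrogram
```

Normalizing the scores over the source axis is one way to make the weights vary per frame while still summing to one across candidate sources. A trained system would learn the embeddings and extraction parameters jointly rather than draw them at random.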

Received: 2023-04-24

