A Lightweight Segmentation and Editing Method Based on 3D Gaussian Splatting
DOI:
CSTR:
Author:
Affiliation:

College of Information Engineering, Zhejiang University of Technology

Author biography:

Corresponding author:

CLC number:

TP391

Fund project:

National Natural Science Foundation of China (62373329); Joint Fund of the Zhejiang Provincial Natural Science Foundation and Baima Lake Laboratory, Major Program of the Zhejiang Provincial Natural Science Foundation (LBMHD24F030002)


Efficient Segmentation and Editing Based on 3D Gaussian Splatting

    Abstract:

    Semantic understanding of 3D scenes is one of the core challenges in computer vision; its goal is the accurate recognition and segmentation of 3D spatial structures. As application scenarios such as autonomous driving and robot navigation continue to evolve, the task faces increasingly stringent requirements. The recent introduction of 3D Gaussian Splatting (3D-GS) improves reconstruction efficiency by several orders of magnitude while matching the rendering quality of baseline work. However, semantic decoupling under the 3D-GS paradigm remains insufficiently explored. Because 3D data far exceed 2D datasets in complexity and storage requirements, high-quality annotated 3D datasets are scarce, and directly training neural networks to understand 3D semantics is often difficult. To address these challenges, this paper proposes a method that encodes low-dimensional semantic information from 2D semantic priors: prior knowledge is extracted with a pre-trained 2D semantic segmentation network, and a low-dimensional semantic field is trained based on the idea of differentiable volume rendering. A dynamic threshold yields a coarse segmentation of the semantic field, after which a statistical filtering algorithm removes noisy points and recalibrates the result. A decoupled semantic-field binding scheme enables independent control of parameters. Extensive experiments verify that the method reaches the level of previous baseline methods after only a few seconds of optimization and can be seamlessly integrated into downstream tasks such as scene editing.

    Abstract:

    3D scene semantic understanding is a fundamental challenge in computer vision, aiming to accurately recognize and segment three-dimensional spatial structures. With the advancement of applications such as autonomous driving and robotic navigation, this task faces increasingly stringent requirements. The recent introduction of 3D Gaussian Splatting has significantly improved reconstruction efficiency by several orders of magnitude while maintaining rendering quality comparable to baseline methods. However, semantic decoupling under the 3DGS framework remains insufficiently explored. Given the high complexity and limited availability of annotated 3D datasets, directly training networks for 3D semantic understanding is challenging. To address this, we propose a method that leverages 2D semantic priors to encode low-dimensional semantic fields, extracting prior knowledge via a pre-trained 2D segmentation network and training with differentiable volume rendering. A dynamic thresholding strategy enables coarse semantic segmentation, followed by statistical denoising and recalibration. Furthermore, a decoupled semantic binding approach is introduced for independent parameter control. Extensive experiments show that our method achieves baseline performance within seconds of optimization and can be seamlessly integrated into downstream tasks such as scene editing.
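The dynamic-threshold coarse segmentation followed by statistical denoising described in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the paper's implementation: the per-Gaussian semantic feature vectors, the query feature, the percentile-based threshold rule, and the k-nearest-neighbour outlier test are all assumptions chosen for the example.

```python
import numpy as np

def coarse_select(features, query, percentile=90.0):
    """Coarse segmentation: keep Gaussians whose semantic feature is
    similar to the query feature, with a data-driven (dynamic) threshold.
    features: (N, d) per-Gaussian semantic embeddings; query: (d,)."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    sim = f @ q                              # cosine similarity per Gaussian
    tau = np.percentile(sim, percentile)     # dynamic threshold from the data
    return sim >= tau                        # boolean mask over Gaussians

def statistical_filter(xyz, mask, k=8, std_ratio=2.0):
    """Statistical denoising of the coarse mask: drop a selected Gaussian
    if its mean distance to its k nearest selected neighbours is more than
    std_ratio standard deviations above the average of that statistic.
    xyz: (N, 3) Gaussian centres; mask: (N,) boolean coarse selection."""
    pts = xyz[mask]
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    d.sort(axis=1)
    mean_knn = d[:, 1:k + 1].mean(axis=1)    # skip column 0 (self-distance)
    keep = mean_knn <= mean_knn.mean() + std_ratio * mean_knn.std()
    refined = np.zeros_like(mask)
    refined[np.flatnonzero(mask)[keep]] = True
    return refined
```

In a 3D-GS scene, `features` would be the low-dimensional semantic embeddings distilled from the 2D segmentation network and `xyz` the Gaussian centres; the refined mask can then drive downstream editing of the selected Gaussians' parameters.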

History
  • Received: 2025-03-26
  • Revised: 2025-04-29
  • Accepted: 2025-04-30
  • Published online:
  • Publication date: