Please use this identifier to cite or link to this item:
http://hdl.handle.net/10397/119530
| Title: | Merge3D : efficient 3D multimodal LLMs via joint 2D-3D token merging |
| Authors: | Pan, T; Yang, X; Wang, X |
| Issue Date: | 2026 |
| Source: | The IEEE/CVF Conference on Computer Vision and Pattern Recognition 2026, June 3 - June 7, 2026, Colorado Convention Center, p. 31066-31077 |
| Abstract: | Multimodal Large Language Models (MLLMs) incorporating 3D geometry demonstrate significant power in 3D scene understanding. Their primary bottleneck, however, is the substantial computational burden of processing long, multi-view visual token sequences. To address this challenge, we propose Merge3D, a geometry-aware token merging framework that integrates both 3D geometry and 2D semantic information. Conventional 2D compression methods, which rely solely on semantic signals, prove inadequate for 3D tasks, as they tend to discard spatially critical tokens and damage grounding performance. Merge3D bridges the modalities with a Semantic–Geometric Token Merger (SemGeo Merger): 2D attention is used to select semantically salient dominant tokens, while a hybrid 2D+3D similarity assigns and aggregates contextual tokens from spatially coherent 3D neighborhoods. This preserves 3D structural priors and inter-frame correspondences under aggressive compression. Merge3D achieves up to 70% visual token reduction and up to approximately 3x inference speedup, while retaining strong performance on 3D grounding, captioning, and spatial reasoning benchmarks such as Scan2Cap, CV-Bench, and BLINK. |
| Description: | The following paper: Tianbo Pan, Xingyi Yang, Xinchao Wang; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026, pp. 31066-31077, is available at https://openaccess.thecvf.com/content/CVPR2026/html/Pan_Merge3D_Efficient_3D_Multimodal_LLMs_via_Joint_2D-3D_Token_Merging_CVPR_2026_paper.html |
| Appears in Collections: | Conference Paper |
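The merging step described in the abstract (attention-selected dominant tokens, with the remaining contextual tokens assigned by a hybrid 2D+3D similarity and then aggregated) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `merge_tokens`, the weighting `alpha`, the Gaussian distance kernel with bandwidth `sigma`, and mean aggregation are all assumptions made here for illustration, since the record does not specify them.

```python
import numpy as np

def merge_tokens(feats, attn, xyz, keep, alpha=0.5, sigma=1.0):
    """Hypothetical sketch of semantic-geometric token merging.

    feats: (N, D) visual token features
    attn:  (N,)   per-token 2D attention scores
    xyz:   (N, 3) 3D position associated with each token
    keep:  number of dominant tokens to retain
    """
    n = feats.shape[0]
    # 1) Dominant tokens: the `keep` highest-attention tokens (kept in index order).
    dom = np.sort(np.argsort(-attn, kind="stable")[:keep])
    is_dom = np.zeros(n, dtype=bool)
    is_dom[dom] = True
    ctx = np.flatnonzero(~is_dom)  # remaining contextual tokens

    # 2) Hybrid similarity (assumed form): cosine similarity of features
    #    blended with a Gaussian kernel on 3D distance.
    unit = feats / (np.linalg.norm(feats, axis=1, keepdims=True) + 1e-8)
    sem = unit[ctx] @ unit[dom].T                                   # (n_ctx, keep)
    d2 = ((xyz[ctx][:, None, :] - xyz[dom][None, :, :]) ** 2).sum(-1)
    geo = np.exp(-d2 / (2.0 * sigma ** 2))                          # (n_ctx, keep)
    sim = alpha * sem + (1.0 - alpha) * geo

    # 3) Assign each contextual token to its most similar dominant token.
    assign = sim.argmax(axis=1)

    # 4) Aggregate (assumed: plain mean over the dominant token and its members).
    merged = feats[dom].astype(float).copy()
    counts = np.ones(keep)
    np.add.at(merged, assign, feats[ctx])
    np.add.at(counts, assign, 1.0)
    return merged / counts[:, None], dom, assign
```

Under these assumptions, calling `merge_tokens` with `keep` set to 30% of the input length corresponds to the 70% token reduction quoted in the abstract; the actual selection and aggregation rules used by Merge3D may differ.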
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.


