Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/109489
DC Field | Value | Language
dc.contributor | Department of Computing | -
dc.creator | He, C | en_US
dc.creator | Li, R | en_US
dc.creator | Li, S | en_US
dc.creator | Zhang, L | en_US
dc.date.accessioned | 2024-11-01T08:04:36Z | -
dc.date.available | 2024-11-01T08:04:36Z | -
dc.identifier.isbn | 978-1-6654-6946-3 | en_US
dc.identifier.uri | http://hdl.handle.net/10397/109489 | -
dc.language.iso | en | en_US
dc.publisher | Institute of Electrical and Electronics Engineers | en_US
dc.rights | © 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. | en_US
dc.rights | The following publication C. He, R. Li, S. Li and L. Zhang, "Voxel Set Transformer: A Set-to-Set Approach to 3D Object Detection from Point Clouds," 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 2022, pp. 8407-8417 is available at https://doi.org/10.1109/CVPR52688.2022.00823. | en_US
dc.title | Voxel set transformer : a set-to-set approach to 3D object detection from point clouds | en_US
dc.type | Conference Paper | en_US
dc.identifier.spage | 8407 | en_US
dc.identifier.epage | 8417 | en_US
dc.identifier.doi | 10.1109/CVPR52688.2022.00823 | en_US
dcterms.abstract | Transformers have demonstrated promising performance on many 2D vision tasks. However, computing self-attention on large-scale point cloud data is cumbersome, because a point cloud is a long sequence of points unevenly distributed in 3D space. To address this, existing methods usually compute self-attention locally by grouping points into fixed-size clusters, or perform convolutional self-attention on a discretized representation. The former results in stochastic point dropout, while the latter typically has a narrow attention field. In this paper, we propose a novel voxel-based architecture, the Voxel Set Transformer (VoxSeT), which detects 3D objects from point clouds by means of set-to-set translation. VoxSeT is built upon a voxel-based set attention (VSA) module, which reduces the self-attention in each voxel to two cross-attentions and models features in a hidden space induced by a group of latent codes. With the VSA module, VoxSeT can handle voxelized point clusters of widely varying sizes and process them in parallel with linear complexity. VoxSeT combines the high performance of transformers with the efficiency of voxel-based models, making it a good alternative to convolutional and point-based backbones, and it reports competitive results on the KITTI and Waymo detection benchmarks. The source code is available at https://github.com/skyhehe123/VoxSeT. | -
dcterms.accessRights | open access | en_US
dcterms.bibliographicCitation | 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition : New Orleans, Louisiana, 19-24 June 2022, p. 8407-8417 | en_US
dcterms.issued | 2022 | -
dc.identifier.scopus | 2-s2.0-85137827073 | -
dc.relation.ispartofbook | 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition : New Orleans, Louisiana, 19-24 June 2022 | en_US
dc.relation.conference | Conference on Computer Vision and Pattern Recognition [CVPR] | -
dc.description.validate | 202411 bcch | -
dc.description.oa | Accepted Manuscript | en_US
dc.identifier.FolderNumber | OA_Others | -
dc.description.fundingSource | Self-funded | en_US
dc.description.pubStatus | Published | en_US
dc.description.oaCategory | Green (AAM) | en_US
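The VSA module described in the abstract above replaces quadratic self-attention within each voxel by two cross-attentions routed through a small set of learned latent codes, in the spirit of induced set attention. The sketch below illustrates that idea in PyTorch; it is a minimal sketch, not the authors' implementation: the class and parameter names (VSASketch, num_latents) and the padded (B, N, C) tensor layout are illustrative assumptions, and the official code at https://github.com/skyhehe123/VoxSeT instead uses scatter operations to handle voxels of genuinely arbitrary size.

```python
# Minimal sketch of induced set attention over per-voxel point sets,
# assuming a standard PyTorch environment. Hypothetical names throughout;
# not the VoxSeT API.
import torch
import torch.nn as nn

class VSASketch(nn.Module):
    """Approximates self-attention over N points with two cross-attentions
    through K learned latent codes: O(N*K) cost instead of O(N^2)."""
    def __init__(self, dim: int, num_latents: int = 8, num_heads: int = 4):
        super().__init__()
        # K latent codes that induce a fixed-size hidden space
        self.latents = nn.Parameter(torch.randn(num_latents, dim))
        # cross-attention 1: latents attend to the points (gather)
        self.enc = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # cross-attention 2: points attend to the latents (broadcast back)
        self.dec = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, N, C) point features, one row per (padded) voxel;
        # N may vary across voxels since no fixed-size grouping is needed.
        B = x.size(0)
        lat = self.latents.unsqueeze(0).expand(B, -1, -1)   # (B, K, C)
        hidden, _ = self.enc(lat, x, x)       # latents summarize the points
        out, _ = self.dec(x, hidden, hidden)  # points read from the summary
        return out

# usage: 3 voxels, up to 100 points each, 64-dim features
vsa = VSASketch(dim=64)
points = torch.randn(3, 100, 64)
print(vsa(points).shape)  # torch.Size([3, 100, 64])
```

Because each of the two cross-attentions touches every point only once, the cost grows linearly in the number of points rather than quadratically, which is the linear-complexity property the abstract refers to.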
Appears in Collections: Conference Paper
Files in This Item:
File | Description | Size | Format
He_Voxel_Set_Transformer.pdf | Pre-Published version | 2.33 MB | Adobe PDF
Open Access Information
Status: open access
File Version: Final Accepted Manuscript

Page views: 45 (as of Apr 14, 2025)
Downloads: 31 (as of Apr 14, 2025)
Scopus™ citations: 168 (as of May 29, 2025)

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.