MeT: A graph transformer for semantic segmentation of 3D meshes

Vecchio G.; Palazzo S.; Spampinato C.
2023-01-01

Abstract

Polygonal meshes have become the standard for discretely approximating 3D shapes, thanks to their efficiency and high flexibility in capturing non-uniform shapes. This non-uniformity, however, leads to irregularity in the mesh structure, making tasks like segmentation of 3D meshes particularly challenging. Semantic segmentation of 3D meshes has typically been addressed through CNN-based approaches, achieving good accuracy. Recently, transformers have gained momentum in both NLP and computer vision, achieving performance at least on par with CNN models and supporting the long-sought goal of architecture universality. Following this trend, we propose a transformer-based method for semantic segmentation of 3D meshes, motivated by a better modeling of the graph structure of meshes through global attention mechanisms. To address the limitations of standard transformer architectures in modeling the relative positions of non-sequential data, such as 3D meshes, and in capturing local context, we perform positional encoding by means of the Laplacian eigenvectors of the adjacency matrix, replacing the traditional sinusoidal positional encodings, and we introduce clustering-based features into the self-attention and cross-attention operators. Experimental results, carried out on three sets of the Shape COSEG dataset (Wang et al., 2012), on the human segmentation dataset proposed in Maron et al. (2017), and on the ShapeNet benchmark (Chang et al., 2015), show that the proposed approach yields state-of-the-art performance on semantic segmentation of 3D meshes.
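A minimal sketch of the positional-encoding idea described in the abstract: eigenvectors of the graph Laplacian replace sinusoidal positional encodings. This is not the authors' implementation; the face-adjacency construction, the use of the symmetric normalized Laplacian, the number of eigenvectors k, and the function names (face_adjacency, laplacian_positional_encoding) are illustrative assumptions.

```python
# Sketch: Laplacian positional encodings for mesh faces (illustrative, not the paper's code).
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh


def face_adjacency(faces: np.ndarray) -> sp.csr_matrix:
    """Adjacency over mesh faces: two triangles are adjacent if they share an edge."""
    edge_to_faces = {}
    for f_idx, (a, b, c) in enumerate(faces):
        for u, v in ((a, b), (b, c), (c, a)):
            edge_to_faces.setdefault(frozenset((int(u), int(v))), []).append(f_idx)
    rows, cols = [], []
    for shared in edge_to_faces.values():
        for i in shared:
            for j in shared:
                if i != j:
                    rows.append(i)
                    cols.append(j)
    n = len(faces)
    adj = sp.csr_matrix((np.ones(len(rows)), (rows, cols)), shape=(n, n))
    adj.data[:] = 1.0  # binarize in case duplicate entries were summed
    return adj


def laplacian_positional_encoding(adj: sp.csr_matrix, k: int = 16) -> np.ndarray:
    """Eigenvectors of the symmetric normalized Laplacian for the k smallest
    non-trivial eigenvalues: one k-dimensional positional code per face."""
    n = adj.shape[0]
    deg = np.asarray(adj.sum(axis=1)).ravel()
    d_inv_sqrt = sp.diags(1.0 / np.sqrt(np.maximum(deg, 1e-12)))
    lap = sp.eye(n) - d_inv_sqrt @ adj @ d_inv_sqrt
    # k+1 smallest eigenpairs; shift-invert (sigma=0) is often preferred on large meshes.
    vals, vecs = eigsh(lap, k=k + 1, which="SM")
    order = np.argsort(vals)
    return vecs[:, order[1 : k + 1]]  # drop the trivial constant eigenvector


# Example: a small triangle strip of six faces; each face gets a 2-D positional code.
faces = np.array([[0, 1, 2], [1, 3, 2], [2, 3, 4], [3, 5, 4], [4, 5, 6], [5, 7, 6]])
pe = laplacian_positional_encoding(face_adjacency(faces), k=2)
print(pe.shape)  # (6, 2)
```

Since Laplacian eigenvectors are defined only up to sign, graph transformers that use this kind of encoding commonly randomize the eigenvector signs during training; the resulting per-face codes would then be added to, or concatenated with, the face features fed to the attention layers.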
Keywords

3D mesh segmentation
3D meshes
Graph neural networks
Segmentation
Transformers
Files in this record:

MeT A graph transformer for semantic segmentation of 3D meshes.pdf
Access: open access
Type: Publisher's version (PDF)
License: Creative Commons
Size: 2.05 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11769/638210
Citations
  • PMC: not available
  • Scopus: 5
  • Web of Science: 3