The spectral clustering algorithm is a technique based on the properties of the pairwise similarity matrix coming from a suitable kernel function. It is a useful approach for high-dimensional data since the units are clustered in feature space with a reduced number of dimensions. In this paper, we consider a two-step modelbased approach within the spectral clustering framework. Based on simulated data, first, we discuss criteria for selecting the number of clusters and analyzing the robustness of the model-based approach concerning the choice of the proximity parameters of the kernel functions. Finally, we consider applications of the spectral methods to cluster five real textual datasets and, in this framework, a new kernel function is also proposed. The approach is illustrated on the ground of a large numerical study based on both simulated and real datasets.

A mixture model approach to spectral clustering and application to textual data

Di Nuzzo Cinzia
Primo
Methodology
;
Ingrassia Salvatore
Secondo
Methodology
2022-01-01

Abstract

The spectral clustering algorithm is a technique based on the properties of the pairwise similarity matrix coming from a suitable kernel function. It is a useful approach for high-dimensional data since the units are clustered in feature space with a reduced number of dimensions. In this paper, we consider a two-step modelbased approach within the spectral clustering framework. Based on simulated data, first, we discuss criteria for selecting the number of clusters and analyzing the robustness of the model-based approach concerning the choice of the proximity parameters of the kernel functions. Finally, we consider applications of the spectral methods to cluster five real textual datasets and, in this framework, a new kernel function is also proposed. The approach is illustrated on the ground of a large numerical study based on both simulated and real datasets.
2022
Spectral clustering
Gaussian mixture models
Document classification
File in questo prodotto:
File Dimensione Formato  
s10260-022-00635-4.pdf

solo gestori archivio

Tipologia: Versione Editoriale (PDF)
Licenza: NON PUBBLICO - Accesso privato/ristretto
Dimensione 1.46 MB
Formato Adobe PDF
1.46 MB Adobe PDF   Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.11769/544981
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 4
  • ???jsp.display-item.citation.isi??? 3
social impact