The Application of Clustering on Principal Components for Nutritional Epidemiology: A Workflow to Derive Dietary Patterns

Maugeri, Andrea;Barchitta, Martina;Favara, Giuliana;La Mastra, Claudia;La Rosa, Maria Clara;Magnano San Lio, Roberta;Agodi, Antonella
2022-01-01

Abstract

Over recent decades, various multivariate techniques have been applied to multidimensional dietary datasets to identify meaningful patterns that reflect the dietary habits of populations. Among them, principal component analysis (PCA) and cluster analysis are the two most widely used, applied either separately or in parallel. Here, we propose a workflow that combines PCA, hierarchical clustering, and a K-means algorithm in a novel approach to dietary pattern derivation. Because the workflow involves subjective decisions that might affect the final clustering solution, we also provide alternatives suited to different types of dietary data. As a worked example, we used the dietary data of 855 women from Catania, Italy. Our approach, defined as clustering on principal components, could help leverage the strengths of each method and obtain a better cluster solution. In fact, it seemed to disentangle dietary data better than simple clustering algorithms. However, before choosing among the proposed alternatives, we suggest considering the nature of the dietary data and the main research questions.
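The three steps named in the abstract (PCA, hierarchical clustering, K-means) can be chained as in the minimal sketch below. It is an illustration under stated assumptions, not the authors' implementation: the data are synthetic, the function names (`pca_scores`, `ward_labels`, `kmeans_consolidate`) are hypothetical, Ward linkage is assumed for the hierarchical step, and the choices of two components and three clusters are arbitrary. These are the kind of subjective decisions the abstract says can affect the final solution.

```python
import numpy as np

def pca_scores(X, n_components):
    """Standardize each column, then return scores on the leading principal components."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    U, S, _ = np.linalg.svd(Z, full_matrices=False)
    return U[:, :n_components] * S[:n_components]

def ward_labels(scores, k):
    """Naive Ward agglomeration (an assumed linkage): repeatedly merge the pair of
    clusters whose fusion least increases the within-cluster sum of squares."""
    clusters = [[i] for i in range(len(scores))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pa, pb = scores[clusters[a]], scores[clusters[b]]
                gap = pa.mean(axis=0) - pb.mean(axis=0)
                weight = len(pa) * len(pb) / (len(pa) + len(pb))
                cost = weight * (gap @ gap)
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        clusters[a] += clusters.pop(b)
    labels = np.empty(len(scores), dtype=int)
    for c, members in enumerate(clusters):
        labels[members] = c
    return labels

def kmeans_consolidate(scores, labels, n_iter=10):
    """K-means iterations started from the hierarchical partition.
    Assumes no cluster becomes empty during reassignment."""
    k = labels.max() + 1
    for _ in range(n_iter):
        centers = np.array([scores[labels == c].mean(axis=0) for c in range(k)])
        dist = ((scores[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels

# Synthetic stand-in for an intake matrix (rows: participants, columns: food groups).
# These values are invented for illustration and are NOT the Catania data.
rng = np.random.default_rng(0)
group_means = np.array([[0, 0, 0, 0, 0, 0],
                        [5, 5, 0, 0, 5, 0],
                        [0, 0, 5, 5, 0, 5]], dtype=float)
X = np.vstack([rng.normal(m, 1.0, size=(20, 6)) for m in group_means])

scores = pca_scores(X, n_components=2)       # step 1: reduce to principal components
initial = ward_labels(scores, k=3)           # step 2: hierarchical partition
final = kmeans_consolidate(scores, initial)  # step 3: K-means refinement
print(np.bincount(final))                    # cluster sizes
```

Starting K-means from the hierarchical partition, rather than from random centers, makes the final solution reproducible, and clustering on a few components discards the low-variance directions before distances are computed. The pairwise Ward loop here is written for readability and scales poorly, so a dataset of several hundred participants would call for a library implementation.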
diet
dietary dataset
dietary factors
nutritional epidemiology
Open access

Type: Publisher's version (PDF)
License: Creative Commons
Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11769/548662