The Application of Clustering on Principal Components for Nutritional Epidemiology: A Workflow to Derive Dietary Patterns

Maugeri, Andrea; Barchitta, Martina; Favara, Giuliana; La Mastra, Claudia; La Rosa, Maria Clara; Magnano San Lio, Roberta; Agodi, Antonella

doi:10.3390/nu15010195

In recent decades, various multivariate techniques have been applied to multidimensional dietary datasets to identify meaningful patterns reflecting the dietary habits of populations. Among them, principal component analysis (PCA) and cluster analysis are the two most widely used, applied either separately or in parallel. Here, we propose a workflow that combines PCA, hierarchical clustering, and a K-means algorithm in a novel approach for dietary pattern derivation. Since the workflow involves certain subjective decisions that might affect the final clustering solution, we also provide some alternatives depending on the dietary data used. As an example, we used the dietary data of 855 women from Catania, Italy. Our approach, defined as clustering on principal components, could be useful to leverage the strengths of each method and to obtain a better cluster solution. In fact, it seemed to disentangle dietary data better than simple clustering algorithms. However, before choosing among the proposed alternatives, researchers should consider the nature of the dietary data and the main research questions.
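
The three-step idea described above (PCA, then hierarchical clustering, then K-means) can be illustrated with a minimal sketch. It is not the authors' implementation: the Ward linkage, the number of retained components, the number of clusters, and the function name `cluster_on_principal_components` are all assumptions chosen for illustration, and the input is synthetic data rather than the Catania cohort.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.cluster.vq import kmeans2


def cluster_on_principal_components(X, n_components=3, n_clusters=2):
    """Hypothetical helper: PCA scores -> Ward tree -> K-means refinement."""
    # Standardize each food group so that scale differences do not dominate.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

    # PCA via singular value decomposition; keep the first components only.
    U, S, _ = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]

    # Hierarchical clustering on the component scores.
    # Ward linkage is an assumption; the abstract does not name a linkage.
    tree = linkage(scores, method="ward")
    initial = fcluster(tree, t=n_clusters, criterion="maxclust")

    # Use the centroids of the hierarchical partition to initialize K-means,
    # which then reassigns subjects to the nearest centroid.
    centroids = np.vstack(
        [scores[initial == c].mean(axis=0) for c in np.unique(initial)]
    )
    _, labels = kmeans2(scores, centroids, minit="matrix")
    return scores, initial, labels


# Synthetic intake data (NOT real dietary data): 60 subjects, 6 food groups,
# drawn from two groups with different mean intakes.
rng = np.random.default_rng(0)
group_a = rng.normal(loc=[5, 5, 1, 1, 3, 3], scale=0.5, size=(30, 6))
group_b = rng.normal(loc=[1, 1, 5, 5, 3, 3], scale=0.5, size=(30, 6))
X = np.vstack([group_a, group_b])

scores, initial, labels = cluster_on_principal_components(X)
print(np.bincount(labels))  # cluster sizes after K-means refinement
```

In this sketch, clustering the component scores instead of the raw intakes discards low-variance directions before the partition is built, and the K-means step can move subjects that the hierarchical tree assigned early and could not revisit. Both `n_components` and `n_clusters` are subjective choices of the kind the abstract mentions.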