
Exploring Vision Transformers and Convolutional Neural Networks for the Thermal Image Classification of Volcanic Activity

Giuseppe Nunnari
2025-01-01

Abstract

This paper addresses the classification of images depicting the eruptive activity of Mount Etna that were captured by a network of ground-based thermal cameras. This study aimed to evaluate the performance of Vision Transformers (ViTs), such as the Swin Transformer, compared with Convolutional Neural Networks (CNNs), including AlexNet and ShuffleNet. A dataset of 3000 images, evenly distributed across six classes, was utilized for training and testing. The results indicate that for this specific application, the performance advantage of Vision Transformers over CNNs was marginal, likely due to the nature of the classification task. While the Transformer-based models, like the Swin Transformer, demonstrated slightly improved accuracy for certain complex classes, the CNN-based models, such as AlexNet and ShuffleNet, exhibited superior computational efficiency, particularly in terms of classification speed. These findings highlight the suitability of CNNs for real-time volcanic activity monitoring. Additionally, this paper provides a comprehensive review of the various CNN and Vision Transformer architectures, offering insights into their strengths and limitations in the context of volcanic activity classification.
Keywords: Vision Transformer; Swin Transformer; AlexNet; image classification; Etna activity; deep learning; real-time monitoring; thermal imaging; Convolutional Neural Networks (CNNs)
Files in this record:
There are no files associated with this record.

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11769/663709