Exploring Vision Transformers and Convolution Neural Networks for the Thermal Image Classification of Volcanic Activity
Giuseppe Nunnari
2025-01-01
Abstract
This paper addresses the classification of images of the eruptive activity of Mount Etna captured by a network of ground-based thermal cameras. The study aimed to evaluate the performance of Vision Transformers (ViTs), such as the Swin Transformer, against Convolutional Neural Networks (CNNs), including AlexNet and ShuffleNet. A dataset of 3000 images, evenly distributed across six classes, was used for training and testing. The results indicate that, for this specific application, the performance advantage of Vision Transformers over CNNs was marginal, likely due to the nature of the classification task. While Transformer-based models such as the Swin Transformer demonstrated slightly improved accuracy for certain complex classes, CNN-based models such as AlexNet and ShuffleNet exhibited superior computational efficiency, particularly in terms of classification speed. These findings highlight the suitability of CNNs for real-time volcanic activity monitoring. Additionally, this paper provides a comprehensive review of various CNN and Vision Transformer architectures, offering insights into their strengths and limitations in the context of volcanic activity classification.