Exploring Vision Transformers and Convolution Neural Networks for the Thermal Image Classification of Volcanic Activity

Giuseppe Nunnari

2025-01-01

Abstract

This paper addresses the classification of images of the eruptive activity of Mount Etna captured by a network of ground-based thermal cameras. The study evaluates the performance of Vision Transformers (ViTs), such as the Swin Transformer, against Convolutional Neural Networks (CNNs), including AlexNet and ShuffleNet. A dataset of 3000 images, evenly distributed across six classes, was used for training and testing. The results indicate that, for this specific application, the performance advantage of Vision Transformers over CNNs was marginal, likely due to the nature of the classification task. While Transformer-based models such as the Swin Transformer demonstrated slightly better accuracy for certain complex classes, CNN-based models such as AlexNet and ShuffleNet exhibited superior computational efficiency, particularly in classification speed. These findings highlight the suitability of CNNs for real-time volcanic activity monitoring. Additionally, the paper provides a comprehensive review of CNN and Vision Transformer architectures, offering insights into their strengths and limitations in the context of volcanic activity classification.
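The dataset layout described in the abstract (3000 images, six classes, evenly distributed) implies 500 images per class. The sketch below illustrates how such a balanced set could be split per class into training and test subsets. The 80/20 ratio, the integer class labels, and the helper name `stratified_split` are assumptions made for illustration; the abstract does not state the split that was actually used.

```python
import random
from collections import Counter

NUM_CLASSES = 6          # stated in the abstract
IMAGES_PER_CLASS = 500   # 3000 images evenly distributed across six classes
TEST_FRACTION = 0.2      # assumption: the abstract does not give the split ratio


def stratified_split(labels, test_fraction, seed=0):
    """Split sample indices so each class keeps the same train/test ratio."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    train, test = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return train, test


# Hypothetical integer labels 0..5 standing in for the six activity classes.
labels = [c for c in range(NUM_CLASSES) for _ in range(IMAGES_PER_CLASS)]
train_idx, test_idx = stratified_split(labels, TEST_FRACTION)

train_counts = Counter(labels[i] for i in train_idx)
test_counts = Counter(labels[i] for i in test_idx)
print(len(train_idx), len(test_idx))
```

Splitting per class keeps the test set as balanced as the full dataset, so overall accuracy is not dominated by any single class.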

Year: 2025

Keywords: Vision Transformer; Swin Transformer; AlexNet; image classification; Etna activity; deep learning; real-time monitoring; thermal imaging; Convolutional Neural Networks (CNNs)

Appears in types: 1.1 Journal article

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11769/663709
