A transformer-based approach for source code classification for heterogeneous device mapping

Marco Siino
2025-01-01

Abstract

The optimization of code allocation for heterogeneous architectures, such as Central Processing Units (CPUs) and Graphics Processing Units (GPUs), remains challenging due to the limitations of traditional compiler heuristics and existing machine learning approaches. This paper presents a systematic evaluation of Large Language Models (LLMs) for classifying source code execution targets in heterogeneous device mapping. We fine-tune and compare six models: Distilled Bidirectional Encoder Representations from Transformers (DistilBERT), Code Bidirectional Encoder Representations from Transformers (CodeBERT), Code Bidirectional Encoder Representations from Transformers with RoBERTa (Robustly Optimized BERT Pretraining Approach) architecture (CodeBERTa), CodeT5, jTrans, and Deep Learning Low Level Virtual Machine (DeepLLVM), trained on Open Computing Language (OpenCL) kernels. Results show that general-purpose LLMs achieve up to 92.8% accuracy, matching or surpassing code-specific models, and outperform the previous state of the art (DeepLLVM) by up to 5%. Our findings indicate that LLMs pre-trained on general text are not necessarily inferior to code-specialized models, with tokenizer design and pre-training objectives impacting performance more than domain specialization. These results demonstrate the effectiveness of Transformer-based LLMs as a state-of-the-art approach for source code classification in heterogeneous computing contexts.
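To make the described setup concrete, the sketch below shows one plausible way to fine-tune a Transformer encoder as a CPU/GPU device-mapping classifier for OpenCL kernel source, using the Hugging Face Transformers library. This is not the paper's exact pipeline: the microsoft/codebert-base checkpoint, the two toy kernels, and all hyperparameters (learning rate, epoch count, maximum sequence length) are illustrative assumptions standing in for the study's actual corpus and configuration.

    # Minimal sketch (assumptions noted above), not the paper's exact pipeline:
    # fine-tune a CodeBERT-style encoder to map OpenCL kernel source to a
    # CPU (label 0) or GPU (label 1) execution target.
    import torch
    from torch.utils.data import DataLoader, Dataset
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    MODEL_NAME = "microsoft/codebert-base"  # assumed public checkpoint
    LABELS = {0: "CPU", 1: "GPU"}

    # Toy stand-in for the OpenCL device-mapping corpus: (kernel source, label).
    TRAIN_SAMPLES = [
        ("__kernel void add(__global float* a, __global float* b) {"
         " int i = get_global_id(0); a[i] += b[i]; }", 1),
        ("__kernel void scalar(__global float* a) { a[0] = a[0] * 2.0f; }", 0),
    ]

    class KernelDataset(Dataset):
        """Tokenizes OpenCL kernel strings for sequence classification."""
        def __init__(self, samples, tokenizer, max_length=512):
            self.samples = samples
            self.tokenizer = tokenizer
            self.max_length = max_length

        def __len__(self):
            return len(self.samples)

        def __getitem__(self, idx):
            source, label = self.samples[idx]
            enc = self.tokenizer(source, truncation=True, padding="max_length",
                                 max_length=self.max_length, return_tensors="pt")
            item = {k: v.squeeze(0) for k, v in enc.items()}
            item["labels"] = torch.tensor(label)
            return item

    def main():
        tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
        model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME,
                                                                   num_labels=2)
        loader = DataLoader(KernelDataset(TRAIN_SAMPLES, tokenizer),
                            batch_size=2, shuffle=True)
        optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

        model.train()
        for epoch in range(3):  # epoch count is illustrative, not the paper's setting
            for batch in loader:
                optimizer.zero_grad()
                outputs = model(**batch)  # loss is computed from the "labels" key
                outputs.loss.backward()
                optimizer.step()
            print(f"epoch {epoch}: loss={outputs.loss.item():.4f}")

        # Inference: predict the execution target for a kernel.
        model.eval()
        enc = tokenizer(TRAIN_SAMPLES[0][0], truncation=True, return_tensors="pt")
        with torch.no_grad():
            pred = model(**enc).logits.argmax(dim=-1).item()
        print("predicted target:", LABELS[pred])

    if __name__ == "__main__":
        main()

The same skeleton applies to the other encoder-style models compared in the paper by swapping the checkpoint name; sequence-to-sequence models such as CodeT5 would instead require a classification head or a generation-based labeling scheme.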

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11769/689933