Memory-Aware DNN Algorithm-Hardware Mapping via Integer Linear Programming

Russo, Enrico;Palesi, Maurizio;Ascia, Giuseppe;Patti, Davide;Monteleone, Salvatore;Catania, Vincenzo

2023-01-01

Abstract

Mapping a deep neural network (DNN) layer onto a domain-specific accelerator involves an intractably large number of choices regarding loop factorization, loop ordering, and spatial unrolling. Finding the mapping that is optimal in terms of latency and energy efficiency is difficult because of the vast number of candidates that would have to be evaluated exhaustively. Many techniques have recently been proposed for fast and efficient mapping space exploration; some adopt a black-box optimization approach, while others make assumptions about the underlying accelerator memory hierarchy or require time-consuming model retraining. We propose an integer linear programming (ILP) approach and formulate a mathematical model, named LEMON, that takes into account the number of accesses to each buffer, the energy costs, and the buffer bandwidths of the accelerator, and that is flexible enough to work with different memory hierarchies. LEMON achieves up to an 83% reduction in energy-delay product relative to another ILP-based approach (CoSA) and 27% relative to a genetic-algorithm approach (GAMMA).
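To give a rough sense of why the mapping space grows so quickly, the following minimal Python sketch counts loop factorizations and loop orderings for a toy layer. It is not the LEMON formulation: the function names, the dimension names, and the loop bounds are hypothetical, spatial unrolling is ignored, and orderings that differ only in loops of bound 1 are counted separately, so the result is a coarse estimate rather than an exact size.

```python
from math import factorial, prod


def ordered_factorizations(n: int, levels: int) -> int:
    """Count ordered tuples of `levels` positive integers whose product is n.

    Each tuple is one way to tile a loop of bound n across `levels`
    memory levels (hypothetical model, one factor per level).
    """
    if levels == 1:
        return 1
    return sum(
        ordered_factorizations(n // d, levels - 1)
        for d in range(1, n + 1)
        if n % d == 0
    )


def mapping_space_size(bounds: dict[str, int], levels: int) -> int:
    """Coarse count of tilings times loop orderings.

    Assumes every loop dimension is tiled independently and that each
    memory level may use any permutation of the loop dimensions.
    """
    tilings = prod(ordered_factorizations(b, levels) for b in bounds.values())
    orderings = factorial(len(bounds)) ** levels
    return tilings * orderings


if __name__ == "__main__":
    # Hypothetical convolution layer bounds, chosen only for illustration.
    layer = {"K": 64, "C": 64, "P": 56, "Q": 56, "R": 3, "S": 3}
    print(mapping_space_size(layer, levels=3))
```

Even for this small toy layer and three memory levels, the count is far too large to enumerate, which is the motivation for search strategies such as the ILP model described in the abstract.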

Year: 2023

Keywords: accelerator; cnn; dataflow; deep neural networks; dnn; domain-specific; dsa; mapper; mapping; npu; optimization; scheduling

Appears in publication types: 4.1 Contribution in conference proceedings

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11769/650010
