Memory-Aware DNN Algorithm-Hardware Mapping via Integer Linear Programming
Russo, Enrico;Palesi, Maurizio;Ascia, Giuseppe;Patti, Davide;Catania, Vincenzo
2023-01-01
Abstract
Mapping a deep neural network (DNN) layer onto a domain-specific accelerator involves an intractable number of choices regarding loop factorization, ordering, and spatial unrolling. Finding the mapping with the best latency and energy efficiency is difficult because an exhaustive search would have to evaluate a vast number of candidates. Many techniques have recently been proposed for fast and efficient mapping space exploration; some adopt a black-box optimization approach, while others make assumptions about the underlying accelerator memory hierarchy or require time-consuming model retraining. We propose an integer linear programming (ILP) approach and formulate a mathematical model, named LEMON, that takes into account the number of accesses to each buffer, the energy costs, and the buffer bandwidths in the accelerator, and that is flexible enough to work with different memory hierarchies. LEMON achieves up to 83% energy-delay product reduction compared with another ILP-based approach (CoSA) and 27% compared with a genetic algorithm approach (GAMMA).
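To give a feel for why the mapping space is intractable, the following minimal sketch counts only the loop-factorization choices, that is, the ways of splitting each loop bound into one tiling factor per memory level. It is not part of LEMON and does not reproduce its ILP model; the function names, the loop bounds, and the three-level hierarchy are assumptions chosen for illustration.

```python
from math import prod


def factorizations(n, levels):
    """Return every ordered tuple of `levels` positive integers whose product is n.

    Each tuple is one way of tiling a loop of bound n across `levels`
    memory levels (one tiling factor per level).
    """
    if levels == 1:
        return [(n,)]
    result = []
    for d in range(1, n + 1):
        if n % d == 0:
            for rest in factorizations(n // d, levels - 1):
                result.append((d,) + rest)
    return result


def tiling_count(bounds, levels):
    """Count the joint factorization choices for several independent loops."""
    return prod(len(factorizations(b, levels)) for b in bounds)


if __name__ == "__main__":
    # Hypothetical loop bounds for one layer and a hypothetical 3-level hierarchy.
    bounds = [64, 64, 56, 56, 3, 3]
    print(tiling_count(bounds, levels=3))
```

The count covers factorization alone. Loop ordering at each level and the choice of which loops to unroll spatially multiply it further, which is what motivates a formulation that an ILP solver can explore without enumerating candidates.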