A multilevel approach to accelerate the training for Transformers

Abstract

In this article, we investigate the potential of multilevel approaches to accelerate the training of transformer architectures. Using an ordinary differential equation (ODE) interpretation of these architectures, we propose a suitable way of varying the discretization of these ODE Transformers in order to accelerate training. We validate our approach experimentally by comparing it with the standard training procedure.
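As a rough illustration of the ODE view mentioned above (a standard reading in the neural-ODE literature; the notation below is illustrative, not taken from the paper): a residual update in a transformer block,

x_{k+1} = x_k + h \, F(x_k, \theta_k), \qquad k = 0, \dots, L - 1,

can be read as one forward-Euler step, with step size h = T/L, of the continuous dynamics \dot{x}(t) = F(x(t), \theta(t)) on [0, T]. Under this reading, varying the discretization amounts to varying the depth L: a multilevel scheme can begin training on a coarse discretization (large h, few layers) and progressively refine it toward the target depth.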

Publication
Submitted to GRETSI
Maël Chaumette
Ph.D. Student

My research interests include machine learning and optimization.