Geometric Mechanics and Control Seminar
Deep neural networks as structure preserving optimal control problems
Speaker: Brynjulf Owren (Norwegian University of Science and Technology, Norway)
Date: Friday, 26 February 2021 - 15:30
Place: Online - zoom.us/j/93006808687?pwd=THpXbzhXTGJJY29KeXQxRTUvaGN1QT09 (ID: 930 0680 8687; Access code: 659820)
A deep neural network model consists of a large number of layers, each with a number of parameters, or controls, associated with it. In supervised learning, these parameters are optimised to match training data in the best possible way. The data are propagated through the layers by nonlinear transformations, and in an important subclass of models (ResNet) the transformation can be seen as the numerical flow of some continuous vector field. Ruthotto and Haber (2017), as well as Cheng et al., have experimented with using different types of vector fields to improve the deep learning model. In particular, it is of interest that the trained model has good long-time behaviour and is stable in the deep limit, i.e. when the number of layers tends to infinity. The models presented in the literature have certain built-in structural properties; they can, for instance, be gradient flows or Hamiltonian vector fields. A difficulty, however, is that the models are not autonomous, and it is therefore less clear what their flows actually preserve. Starting from such ResNet vector fields, we shall discuss their properties and derive some new nonlinear stability bounds.

The long-time behaviour of these neural ODE flows is important in generalisation mode, i.e. after the model has been trained. But structure-preserving numerical schemes are also important in the training algorithm itself. In deep learning models, the use of gradient flows for optimisation is prevalent, and there exist a number of different algorithms that can be used; some of them can be interpreted as approximations of the flow of certain vector fields with dissipation, such as conformal Hamiltonian systems. If time permits, we will also briefly discuss these algorithms, and in particular the need for and efficiency of regularisation.

Joint work with: Martin Benning, Elena Celledoni, Matthias Ehrhardt, Christian Etmann, Robert McLachlan, Carola-Bibiane Schönlieb and Ferdia Sherry.
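The two viewpoints in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration, not the speaker's construction: the tanh activation, the step size h, and the heavy-ball parameters below are all illustrative choices. The first pair of functions shows a ResNet layer as an explicit Euler step of an ODE x' = f(x), so the forward pass approximates a flow map; the last function shows gradient descent with momentum as a simple splitting scheme for a conformal Hamiltonian system x' = v, v' = -gamma*v - grad f(x).

```python
import numpy as np

def resnet_layer(x, W, b, h=0.1):
    """One ResNet layer read as an explicit Euler step x_{k+1} = x_k + h*f(x_k),
    with the (illustrative) vector field f(x) = tanh(Wx + b)."""
    return x + h * np.tanh(W @ x + b)

def forward(x, params, h=0.1):
    """Propagate data through all layers; the composition of Euler steps
    approximates the flow of the underlying continuous vector field."""
    for W, b in params:
        x = resnet_layer(x, W, b, h)
    return x

def heavy_ball(grad_f, x, steps=200, h=0.1, gamma=1.0):
    """Momentum ('heavy ball') optimisation as a splitting discretisation of the
    dissipative system x' = v, v' = -gamma*v - grad f(x) (conformal Hamiltonian)."""
    v = np.zeros_like(x)
    for _ in range(steps):
        v = (1.0 - h * gamma) * v - h * grad_f(x)  # damped momentum update
        x = x + h * v                              # position update
    return x

rng = np.random.default_rng(0)
d, L = 4, 8
params = [(rng.standard_normal((d, d)) / np.sqrt(d), np.zeros(d)) for _ in range(L)]
x0 = rng.standard_normal(d)
xL = forward(x0, params)                       # deep limit: L -> infinity, h -> 0

x_min = heavy_ball(lambda x: x, np.ones(d))    # minimise f(x) = |x|^2 / 2
```

Setting h = 0 in `forward` makes every layer the identity, which makes the "numerical flow" reading concrete; likewise, letting h -> 0 in `heavy_ball` recovers the continuous damped dynamics whose structure the talk's optimisation algorithms aim to preserve.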