**Data Science and Machine Learning**

**Organizer:**

M. Bauer

**Date and Place:**

Remote; Friday 1:25PM-2:15PM

**Fall 2020:**

**Talk 1**(09/04/2020)

**Speaker:**Lingjiong Zhu (Florida State University)

**Title:**Retargeting SGD with Momentum under Heavy-Tailed Gradient Noise

**Abstract:**Stochastic gradient descent with momentum (SGDm) is one of the most popular optimization algorithms in deep learning. While there is a rich theory of SGDm for convex problems, the theory is considerably less developed in the context of deep learning, where the problem is non-convex and the gradient noise might exhibit heavy-tailed behavior, as empirically observed in recent studies. In this study, we consider a continuous-time variant of SGDm, known as the underdamped Langevin dynamics (ULD), and investigate its asymptotic properties under heavy-tailed perturbations. Supported by recent studies from statistical physics, we argue both theoretically and empirically that the heavy tails of such perturbations can result in a bias even when the step-size is small, in the sense that the optima of the stationary distribution of the dynamics might not match the optima of the cost function to be optimized. As a remedy, we develop a novel framework, which we call fractional ULD (FULD), and prove that FULD targets the so-called Gibbs distribution, whose optima exactly match the optima of the original cost. We support our theory with experiments conducted on a synthetic model and neural networks. This is based on joint work with Umut Simsekli, Yee Whye Teh and Mert Gurbuzbalaban.

**Talk 2**(09/11/2020)

**Speaker:**Emmanuel Hartman (Florida State University)

**Title:**A supervised deep learning approach for the computation of elastic SRV distances

**Abstract:**The square root velocity (SRV) transform allows one to define a computable distance between spatial curves that is invariant to shape-preserving actions such as translations, rotations, and reparameterizations. Computing the SRV distance usually requires searching for an optimal reparameterization to match the curves. Instead, we introduce a supervised deep learning framework for the direct computation of SRV distances. We will discuss several experiments that demonstrate the effectiveness of this framework both in terms of computational speed and accuracy.
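As an illustration of the quantity being approximated, the sketch below computes the SRV transform q = c'/sqrt(|c'|) of a discretized curve and the plain L2 distance between two SRV representations. It is not the deep learning framework from the talk: it assumes uniformly sampled curves with the same number of points, uses finite differences, and skips the search over rotations and reparameterizations, so it only gives an upper bound on the elastic distance. The function names are hypothetical.

```python
import numpy as np

def srv_transform(curve):
    """SRV transform of a curve sampled uniformly on [0, 1], given as an (n, d) array."""
    curve = np.asarray(curve, dtype=float)
    dt = 1.0 / (curve.shape[0] - 1)
    velocity = np.diff(curve, axis=0) / dt            # finite-difference c'(t), shape (n-1, d)
    speed = np.linalg.norm(velocity, axis=1)
    # Guard against division by zero where the curve is stationary.
    return velocity / np.sqrt(np.maximum(speed, 1e-12))[:, None]

def srv_l2_distance(curve_a, curve_b):
    """L2 distance between SRV transforms, without optimizing over rotations or reparameterizations."""
    q_a, q_b = srv_transform(curve_a), srv_transform(curve_b)
    if q_a.shape != q_b.shape:
        raise ValueError("curves must have the same number of samples and dimension")
    dt = 1.0 / q_a.shape[0]
    return float(np.sqrt(np.sum((q_a - q_b) ** 2) * dt))

# Two straight segments in the plane, of length 1 and length 4.
t = np.linspace(0.0, 1.0, 101)[:, None]
short_segment = t * np.array([1.0, 0.0])
long_segment = t * np.array([4.0, 0.0])
print(srv_l2_distance(short_segment, long_segment))
print(srv_l2_distance(short_segment, short_segment + np.array([3.0, -2.0])))
```

Because the transform depends only on the derivative of the curve, translating a curve leaves the distance unchanged, which is what the second call checks.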

**Talk 3**(09/25/2020)

**Speaker:**Amanpreet Singh (University of Utah)

**Title:**Calculating the Wasserstein metric and density estimation by solving the Monge-Ampere equation using Deep Learning

**Abstract:**Physics Informed Neural Networks (PINNs) have shown promise in solving Partial Differential Equations (PDEs) given only a random set of points in the domain of interest. We focus on solving Monge's formulation of the optimal transport problem to find a transport map between two probability measures. We reformulate the Monge-Ampere equation in Brenier's sense using KL-Divergence so as to work with random samples from an arbitrary distribution. Once we have the transport map, computing the Wasserstein metric becomes trivial. We present a few examples by deforming a unit Gaussian to various distributions and computing the Wasserstein metric between them.
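To make the last step concrete: assuming the quadratic cost (the setting of Brenier's theorem), an optimal map T from the source measure gives W2^2 = E|T(x) - x|^2, which can be estimated by averaging over samples. The sketch below is not the PINN from the talk. It uses a 1D example where the optimal map is known in closed form, T(x) = m + s x from N(0, 1) to N(m, s^2) with s > 0, and compares the estimate with the exact value sqrt(m^2 + (s - 1)^2). The helper name `w2_from_map` is hypothetical.

```python
import numpy as np

def w2_from_map(samples, transport_map):
    """Monte Carlo estimate of the 2-Wasserstein distance from source samples and an optimal map."""
    x = np.asarray(samples, dtype=float)
    if x.ndim == 1:
        x = x[:, None]                                  # treat 1D input as (n, 1)
    y = np.asarray(transport_map(x), dtype=float)
    squared_displacement = np.sum((y - x) ** 2, axis=1)
    return float(np.sqrt(squared_displacement.mean()))

rng = np.random.default_rng(0)
x = rng.standard_normal(200_000)                        # samples from the unit Gaussian
m, s = 1.0, 2.0                                         # target N(m, s^2), illustrative values
estimate = w2_from_map(x, lambda z: m + s * z)
exact = np.sqrt(m ** 2 + (s - 1.0) ** 2)
print(estimate, exact)
```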

**Talk 4**(10/02/2020)

**Speaker:**Martins Bruveris (Onfido)

**Title:**Face recognition systems: how to train them and how biased are they?

**Class 5**(10/09/2020)

**Speaker:**Bei Wang (University of Utah)

**Title:**TopoAct: Visually Exploring the Shape of Activations in Deep Learning

**Abstract:**Deep neural networks such as GoogLeNet and ResNet have achieved impressive performance in tasks like image classification. To understand how such performance is achieved, we can probe a trained deep neural network by studying neuron activations, that is, combinations of neuron firings, at any layer of the network in response to a particular input. With a large set of input images, we aim to obtain a global view of what neurons detect by studying their activations. We ask the following questions: What is the shape of the space of activations? That is, what is the organizational principle behind neuron activations, and how are the activations related within a layer and across layers? Applying tools from topological data analysis, we present TopoAct, a visual exploration system used to study topological summaries of activation vectors for a single layer as well as the evolution of such summaries across multiple layers. We present visual exploration scenarios using TopoAct that provide valuable insights towards learned representations of an image classifier.

**Class 6**(10/16/2020)

**Speaker:**Xiaoyu Wang (FSU)

**Title:**Non-Convex Optimization via Non-Reversible Stochastic Gradient Langevin Dynamics

**Abstract:**Stochastic Gradient Langevin Dynamics (SGLD) is a powerful algorithm for optimizing a non-convex objective, where a controlled and properly scaled Gaussian noise is added to the stochastic gradients to steer the iterates towards a global minimum. SGLD is based on the overdamped Langevin diffusion which is reversible in time. By adding an anti-symmetric matrix to the drift term of the overdamped Langevin diffusion, one gets a non-reversible diffusion that converges to the same stationary distribution with a faster convergence rate. In this paper, we study the Non-reversible Stochastic Gradient Langevin Dynamics (NSGLD) which is based on discretization of the non-reversible Langevin diffusion. We provide finite-time performance bounds for the global convergence of NSGLD for solving stochastic non-convex optimization problems. Our results lead to non-asymptotic guarantees for both population and empirical risk minimization problems. Numerical experiments for Bayesian independent component analysis and neural network models show that NSGLD can outperform SGLD with proper choices of the anti-symmetric matrix.
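The update described in the abstract can be sketched as follows, assuming the common form x_{k+1} = x_k - step * (I + J) grad f(x_k) + sqrt(2 * step / beta) * xi_k, where J is anti-symmetric and xi_k is standard Gaussian noise. This is an illustration, not the authors' implementation: the function name, step size, and inverse temperature `beta` are assumptions, and the usage below passes an exact gradient where the algorithm would use a stochastic one.

```python
import numpy as np

def nsgld(grad, x0, J, step=1e-2, beta=1e3, n_iters=2000, seed=0):
    """Non-reversible Langevin iteration; J = 0 recovers the usual (reversible) update."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    J = np.asarray(J, dtype=float)
    if not np.allclose(J, -J.T):
        raise ValueError("J must be anti-symmetric")
    drift_matrix = np.eye(x.size) + J                   # anti-symmetric perturbation of the drift
    noise_scale = np.sqrt(2.0 * step / beta)            # Gaussian noise scaled by step and temperature
    for _ in range(n_iters):
        noise = rng.standard_normal(x.size)
        x = x - step * (drift_matrix @ grad(x)) + noise_scale * noise
    return x

# Toy objective f(x) = 0.5 * ||x||^2 with gradient x and global minimum at the origin.
J = np.array([[0.0, 1.0], [-1.0, 0.0]])
x_final = nsgld(lambda x: x, x0=[2.0, -1.0], J=J)
print(x_final)
```

With a large `beta` the injected noise is small, so on this toy problem the iterates settle close to the origin.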

**Class 7**(10/23/2020)

**Speaker:**John Abascal (FSU)

**Title:**The Double Descent Phenomenon

**Abstract:**Machine learning has become one of the most popular research fields in mathematics, computer science, and statistics in the past few years. Oftentimes, we focus so much on its numerous, amazing applications to the sciences that we forget about the math that makes it tick. Because computational power is so widely available today, machine learning researchers have been able to architect massive models with millions of parameters. Some of these models that allow for a variable number of parameters have exhibited a peculiar behavior called "double descent" as the number of parameters is increased while training on a fixed dataset. Understanding the theory that causes double descent may lead to "free" performance gains for machine learning practitioners.

**Class 8**(10/30/2020)

**Speaker:**Tom Needham (Florida State University)

**Title:**Gromov-Wasserstein Distance and Network Analysis

**Abstract:**I'll introduce Gromov-Wasserstein (GW) distance, a convex relaxation of Gromov-Hausdorff distance which gives a way to compare probability distributions on different metric spaces. Originally introduced independently by K. T. Sturm and F. Memoli around a decade ago, GW distance has found recent popularity in the machine learning community, where one frequently wants to compare distributions on a priori incomparable spaces such as networks. I'll discuss theoretical and computational aspects of GW distance, with a specific application to network partitioning; i.e., the unsupervised learning task of discovering communities in a network.

**Class 9**(11/13/2020)

**Speaker:**Patrick Eastham (Florida State University)

**Title:**Introduction to Neural ODEs

**Abstract:**

**Class 10**(11/20/2020)

**Speaker:**Said Ouala (UBL, Brest, France)

**Title:**Constrained Neural Embedding of Partially Observed Systems

**Abstract:**The learning of data-driven representations of dynamical systems arises as a relevant alternative to model-driven strategies for several applications, including system identification, forecasting, reconstruction, and control. However, when considering observation data issued from complex fields as encountered in ocean and climate science, data-driven representations should be considered with care to account for the proper specifications of the underlying dynamics. In this work, we consider the identification of an Ordinary Differential Equation (ODE) from a set of partial observations. We do not rely explicitly on classical geometrical reconstruction techniques such as Takens's delay embedding; rather, we aim to identify an augmented space of higher dimension than the observations, where the dynamics can be fully described by an ODE. Regarding the learning of the ODE, we show that a classical short-term forecast criterion does not guarantee that the model satisfies elementary conservation constraints, which are of key importance and relate to the boundedness of the learnt models. We show that enforcing these constraints in the identification scheme of the ODE improves the generalisation performance, as we guarantee the existence of a monotonically attracting trapping region of the reconstructed limit cycle. We report experiments on linear, non-linear and chaotic dynamics, which illustrate the relevance of the proposed framework compared to state-of-the-art approaches.

**Past Talks (Spring 2020):**

**Class 1**(01/17/2020)

**Speaker:**Arash Fahim (Florida State University)

**Title:**Stochastic Gradient Descent

**Class 2**(01/31/2020)

**Speaker:**Sathyanarayanan Chandramouli and Samuel Dent (Florida State University)

**Title:**Deep Forward Networks

**Class 3**(02/07/2020)

**Speaker:**Sathyanarayanan Chandramouli and Samuel Dent (Florida State University)

**Title:**Deep Forward Networks

**Class 4**(02/14/2020)

**Speaker:**Tyler Foster (Florida State University)

**Title:**GANs

**Class 5**(02/21/2020)

**Speaker:**Emmanuel Hartman (Florida State University)

**Title:**StyleGAN--www.thispersondoesnotexist.com

**Class 6**(03/06/2020)

**Speaker:**Joshua Kimrey (Florida State University)

**Title:**Regularization

**Class 7**(03/27/2020)

**Speaker:**Patrick Eastham (FSU)

**Title:**ML and Fluid Mechanics

**Guest Lecture**(04/03/2020)

**Speaker:**Vasileios Maroulas (University of Tennessee)

**Title:**TBA

**Class 9**(04/10/2020)

**Speaker:**John Abascal (Florida State University)

**Title:**The Double Descent Phenomenon

**Past Talks (Fall 2019):**

**Class 1**(09/06/2019)

**Speaker:**Aseel Farhat (Florida State University)

**Title:**SVD-Decomposition and organizational meeting

**Class 2**(09/13/2019)

**Speaker:**Washington Mio (Florida State University)

**Title:**PCA

**Class 3**(09/20/2019)

**Speaker:**Tyler Foster (Florida State University)

**Title:**Regression

**Class 4**(09/27/2019)

**Special Colloquium by Mert Gurbuzbalaban**

**Title:**TBA

**Class 5**(10/04/2019)

**Speaker:**Haibin Hang (Florida State University)

**Title:**Clustering and Classification I

**Class 6**(10/11/2019)

**Speaker:**Osman Okutan (Florida State University)

**Title:**Clustering and Classification II

**Class 7**(11/01/2019)

**Speaker:**Tom Needham

**Title:**Clustering and Classification using Python

**Class 8**(11/08/2019)

**Speaker:**Alex Vlasiuk

**Title:**Fourier Transform, Sparsity and Compressed Sensing

**Class 9**(11/15/2019)

**Speaker:**Patrick Eastham and Joshua Kimrey

**Title:**Machine Learning Basics I

**Class 10**(11/22/2019)

**Speaker:**Zhe Su (FSU)

**Title:**Machine Learning Basics II