UC San Diego
Title: Two Mathematical Lessons of Deep Learning
Date: Friday, January 15, 2021
Place and Time: Zoom, 3:35-4:25 pm
Abstract. Recent empirical successes of deep learning have exposed significant gaps in our fundamental understanding of learning and optimization mechanisms. Modern best practices for model selection are in direct contradiction to the methodologies suggested by classical analyses. Similarly, the efficiency of the SGD-based local methods used to train modern models appears to be at odds with standard intuitions in optimization. First, I will present the evidence, empirical and mathematical, that necessitates revisiting classical notions such as over-fitting. I will then discuss the emerging understanding of generalization and, in particular, the "double descent" risk curve, which extends the classical U-shaped generalization curve beyond the point of interpolation. Second, I will discuss why the landscapes of over-parameterized neural networks are essentially never convex, even locally. Yet they satisfy the local Polyak-Lojasiewicz condition, which allows SGD-type methods to converge to a global minimum. A key piece of the puzzle remains: how do these lessons come together to form a complete mathematical picture of modern ML?
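
For reference, the Polyak-Lojasiewicz (PL) condition named in the abstract is usually stated as below. The notation is assumed here for illustration and is not taken from the abstract: f is the training loss, f^* its minimum value, \mu > 0 the PL constant, and \beta a smoothness (gradient Lipschitz) constant. The "local" version asks for the inequality only on a ball around the initialization rather than for all w.

```latex
% PL condition with constant \mu > 0 (standard textbook form; notation assumed)
\frac{1}{2}\,\lVert \nabla f(w) \rVert^{2} \;\ge\; \mu\,\bigl(f(w) - f^{*}\bigr)

% If f is \beta-smooth and the condition holds for all w, gradient descent
% with step size 1/\beta, w_{t+1} = w_t - \tfrac{1}{\beta}\nabla f(w_t), satisfies
f(w_{t}) - f^{*} \;\le\; \Bigl(1 - \frac{\mu}{\beta}\Bigr)^{t}\,\bigl(f(w_{0}) - f^{*}\bigr)
```

Note that the inequality does not require convexity: it only says the gradient cannot be small unless the loss is already close to its minimum, which is consistent with the abstract's point that a non-convex landscape can still be benign for SGD-type methods.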