Department of Mathematics

College of Arts and Sciences

Mathematics Colloquium


Wenjing Liao
Georgia Tech

Title: Exploiting Low-Dimensional Data Structures and Estimating Scaling Laws for Transformer Neural Networks
Date: Friday, October 11, 2024
Place and Time: Love 101, 3:05-3:55 pm

Abstract. When training deep neural networks, a model’s generalization error is often observed to follow a power scaling law in the model size and the data size. Perhaps the best-known example of such scaling laws is for transformer-based large language models (LLMs), where networks with billions of parameters are trained on trillions of tokens of text. A central theoretical question about LLMs is why these transformer scaling laws exist. To answer this question, we exploit low-dimensional structures in language datasets by estimating their intrinsic dimension, and establish statistical estimation and mathematical approximation theories for transformers that predict the scaling laws. By leveraging low-dimensional data structures, we can explain transformer scaling laws in a way that respects the data geometry. Furthermore, we test our theory against empirical observations by training LLMs on natural language datasets, and find strong agreement between the observed empirical scaling laws and our theoretical predictions.
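
For readers unfamiliar with the form such laws take, one commonly used parametric form (e.g., the parametrization popularized for LLMs by Hoffmann et al.; the talk's exact form may differ) writes the test loss L as a function of model size N and data size D:

    L(N, D) \approx E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}

where E is an irreducible loss and the exponents \alpha, \beta govern how quickly error decays as the model or dataset grows. The theory described in the abstract ties such exponents to the geometry, in particular the intrinsic dimension, of the data.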
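The abstract does not specify which intrinsic dimension estimator is used, but the two-nearest-neighbor maximum-likelihood estimator of Facco et al. (2017) is a standard choice; the sketch below is a minimal, hypothetical illustration of the idea, not the speaker's method.

    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def two_nn_intrinsic_dimension(X):
        """Two-NN MLE estimate of intrinsic dimension (Facco et al., 2017).

        For data on a d-dimensional manifold, the ratio mu = r2/r1 of each
        point's second- to first-nearest-neighbor distance approximately
        follows a Pareto distribution with shape d, giving the MLE
        d = n / sum(log mu).
        """
        # Query 3 neighbors: the point itself plus its two nearest neighbors.
        dists, _ = NearestNeighbors(n_neighbors=3).fit(X).kneighbors(X)
        r1, r2 = dists[:, 1], dists[:, 2]
        mu = r2 / r1
        # Guard against duplicate points (r1 == 0), which make mu infinite.
        mu = mu[np.isfinite(mu) & (mu > 1.0)]
        return len(mu) / np.sum(np.log(mu))

    # Example: points on a 2-D plane linearly embedded in 10-D ambient space.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((2000, 2)) @ rng.standard_normal((2, 10))
    print(two_nn_intrinsic_dimension(X))  # close to 2, not 10

The example illustrates the point of the abstract: although the ambient dimension is 10, the estimator recovers the much smaller intrinsic dimension (2), and it is this smaller dimension that the talk's theory uses to predict scaling exponents.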