Colloquium Series: Sitan Chen

Date: Monday, February 12, 2024, 12:00pm to 1:00pm

Location: Science Center 316

The next event in the Statistics Department Colloquium Series is scheduled for Monday, February 12, from 12:00 to 1:00pm (ET). It will be an in-person presentation in Science Center Rm. 316, and lunch will be provided to guests following the talk. This week's speaker is Sitan Chen of the Computer Science Department at Harvard.

Provably learning a multi-head attention layer

Abstract: Despite the widespread empirical success of transformers, little is known about their learnability from a computational perspective. In practice these models are trained with SGD on a certain next-token prediction objective, but in theory it remains a mystery even to prove that such functions can be learned efficiently at all. In this work, we give the first nontrivial provable algorithms and computational lower bounds for this problem. Our results apply in a realizable setting where one is given random sequence-to-sequence pairs that are generated by some unknown multi-head attention layer. Our algorithm, which is centered around using examples to sculpt a convex body containing the unknown parameters, is a significant departure from existing provable algorithms for learning multi-layer perceptrons, which predominantly exploit fine-grained algebraic and rotation invariance properties of the input distribution.