Learnability


Understanding Multi-Head Attention Layers in Transformers

Sitan Chen from Harvard presents joint work with Yuanzhi Li exploring the provable learnability of a multi-head attention layer in transformers. The talk delves into the architecture of transformers, highlighting the gap between practical success and theoretical understanding. Preliminaries, prior w

38 slides