Understanding Multi-Head Attention Layers in Transformers
Sitan Chen from Harvard presents joint work with Yuanzhi Li on the provable learnability of a multi-head attention layer in transformers. The talk covers the transformer architecture, highlighting the gap between its practical success and theoretical understanding, along with preliminaries and prior work…
Understanding Language Properties and Definitions
Language is defined by unique properties such as rapid fading, interchangeability, feedback, semanticity, arbitrariness, discreteness, displacement, productivity, cultural transmission, duality, prevarication, reflexiveness, and learnability. These properties help distinguish human language from…