MLs
Decoder
Flash Attention
GQA
KV Cache
Least Squares Method
RoPE
SGD
SwiGLU
Train