#topic/ml

12 notes

- BPE
- Diffusion Models
- Embeddings
- Flash Attention
- KV Cache
- Mixture of Experts
- ML MOC
- RLHF
- RoPE
- Speculative Decoding
- Tokenization
- Transformers