84 articles in total
2024
Prompt compression
A discussion of residual structures
RNNS ARE NOT TRANSFORMERS (YET)
BitNet b1.58
Fuyu
Sora
DLinear-Are Transformers Effective for Time Series Forecasting
Depth Anything-Unleashing the Power of Large-Scale Unlabeled Data
Mamba---Linear-Time Sequence Modeling with Selective State Spaces
On Embeddings for Numerical Features in Tabular Deep Learning