paper · arXiv · Trust 82 · Primary · Published 7d ago · Live · 4d ago

MultiHashFormer: Hash-based Generative Language Models

Language models (LMs) represent tokens using embedding matrices that scale linearly with the vocabulary size. To constrain the parameter footprint, prior work proposes hashing many tokens into a single vector within encoder-only models. While this offers parameter efficiency, many-to-one collisions prevent its use in causal LMs. In this paper, we propose MultiHashFormer, a new framework that allows hash-based autoregression. Each token is represented as a unique hash signature, a short sequence of discrete hash IDs, generated by multiple independent hash functions. A Hash Encoder compresses th
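As a rough illustration of the hash-signature idea described in the abstract, the sketch below maps each token ID to a short tuple of hash IDs produced by several independently salted hash functions. Everything here is an assumption made for illustration, not the paper's method: the function names `hash_signature` and `collision_stats`, the choice of salted BLAKE2b as the hash family, and the values `num_hashes=3` and `num_buckets=64`. In particular, this sketch does not guarantee that signatures are unique; it only shows that combining several hashes leaves far fewer collisions than a single hash does.

```python
import hashlib


def hash_signature(token_id: int, num_hashes: int = 3, num_buckets: int = 64) -> tuple:
    """Return a tuple of `num_hashes` hash IDs, each in [0, num_buckets).

    Hypothetical helper: each hash function is BLAKE2b with a different salt,
    standing in for the "multiple independent hash functions" of the abstract.
    """
    payload = token_id.to_bytes(8, "little")
    signature = []
    for seed in range(num_hashes):
        # A distinct 16-byte salt per seed gives an independent-looking hash.
        salt = seed.to_bytes(16, "little")
        digest = hashlib.blake2b(payload, digest_size=8, salt=salt).digest()
        signature.append(int.from_bytes(digest, "little") % num_buckets)
    return tuple(signature)


def collision_stats(vocab_size: int, num_hashes: int = 3, num_buckets: int = 64) -> dict:
    """Compare how many distinct codes one hash vs. a full signature yields."""
    signatures = [hash_signature(t, num_hashes, num_buckets) for t in range(vocab_size)]
    return {
        # Distinct values of the first hash only (many-to-one when vocab > buckets).
        "distinct_single_hash_ids": len({sig[0] for sig in signatures}),
        # Distinct full signatures across the vocabulary.
        "distinct_signatures": len(set(signatures)),
        # Rows a conventional embedding matrix would need.
        "standard_embedding_rows": vocab_size,
        # Rows needed if each hash function owns its own small table (an assumption).
        "hashed_embedding_rows": num_hashes * num_buckets,
    }


if __name__ == "__main__":
    print(hash_signature(42))
    print(collision_stats(vocab_size=1000))
```

With 1000 tokens and 64 buckets, a single hash function must map many tokens to the same ID, whereas the three-hash signature draws from a space of 64³ combinations, so most tokens end up with distinct signatures while the tables behind them stay small.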

Topics

cs.CL