Reddit r/MachineLearning · Published yesterday

Has anyone tried this approach with Fast Byte Latent Transformers? [R]

Paper referred to: https://arxiv.org/pdf/2412.09871v1

Has anyone replaced the transformer in the entropy model here with a Mamba model? What changes would that require? I'm new to ML and this is a genuine question: Mamba is increasingly popular and saves compute, since it scales as O(n) in sequence length where self-attention scales as O(n²).

Thanks in advance!

submitted by /u/S
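To make the question concrete, here is a minimal sketch of entropy-based patching as I understand it from the paper: a small byte-level model predicts the next byte, and a new patch starts wherever the entropy of that prediction exceeds a global threshold. Everything named here (`NextByteModel`, `patch_starts`, `toy_model`, the threshold value) is hypothetical and only for illustration, not the authors' code. The point is that the patching step only consumes next-byte probabilities, so in principle any causal byte-level model (a transformer, or a Mamba-style model) could sit behind the same interface.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical interface: any causal byte-level model that maps a prefix
# to a probability distribution over the 256 possible next bytes.
NextByteModel = Callable[[bytes], Sequence[float]]


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a next-byte distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def patch_starts(data: bytes, model: NextByteModel, threshold: float) -> List[int]:
    """Return the indices where a new patch begins.

    A boundary is placed before byte i when the model's entropy for
    predicting byte i (given data[:i]) is above the threshold.
    """
    starts = [0] if data else []
    for i in range(1, len(data)):
        if entropy(model(data[:i])) > threshold:
            starts.append(i)
    return starts


def toy_model(prefix: bytes) -> Sequence[float]:
    """Stand-in for a trained model (assumption, not a real entropy model):
    maximally uncertain right after a space, fully confident otherwise."""
    if prefix.endswith(b" "):
        return [1.0 / 256] * 256
    probs = [0.0] * 256
    probs[ord("a")] = 1.0
    return probs


if __name__ == "__main__":
    print(patch_starts(b"hi there you", toy_model, threshold=1.0))
```

This toy loop calls the model once per prefix for clarity; a real model would produce all next-byte distributions in a single causal pass, and that pass is where a linear-time sequence model would be expected to help.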