Topic cluster · 8 items
efficiency
model
Nano-Refuse-0.4B
A tiny safety classifier for fast content filtering.
paper
Speculative decoding with draft models
Accelerating generation by drafting tokens with a small model.
news
Breakthrough in long-context efficiency announced
A new attention scheme cuts memory use for very long inputs.
repo
quant-kit
Post-training quantization tools for transformers.
glossary_term
Quantization
Shrinking a model by storing its weights at lower precision.
paper
Quantization at 1.58 bits
Ternary-weight models that retain most of full-precision quality.
model
Whisper-Lite
A compact speech-to-text model for on-device use.
tool
QuantBench
A one-click quantization and benchmarking tool.