arXiv paper · Published 3d ago

Contextual Slate GLM Bandits with Limited Adaptivity

We investigate the contextual slate bandit problem with generalized linear rewards under limited adaptivity. At each round, the learner is presented with $N$ sets of items, where each item is represented by a $d$-dimensional feature vector. The learner then constructs a slate by selecting one item per set; the resulting slate yields a scalar reward sampled from a Generalized Linear Model (GLM). We propose algorithms under two limited-adaptivity settings: (a) Batched and (b) Rarely-Switching. For the batched setting, we introduce B-SlateGLinCB, which partitions the time horizon into $\mathcal{O}(\ldots)$ …
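To make the slate reward model concrete, here is a minimal sketch of one round's environment. It rests on several assumptions that the abstract does not state: the slate feature is the concatenation of the $N$ chosen item vectors (so $\theta^* \in \mathbb{R}^{Nd}$), the GLM link is the logistic function, and rewards are Bernoulli. All function names are hypothetical. This is not B-SlateGLinCB or any algorithm from the paper; it only illustrates the reward model and why choosing a slate need not be exponential in $N$.

```python
import itertools
import math
import random
from typing import List, Sequence

Vector = List[float]


def sigmoid(z: float) -> float:
    """Logistic link mu(z) = 1 / (1 + e^{-z}); an assumed GLM link, computed stably."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def slate_feature(item_sets: Sequence[Sequence[Vector]], choice: Sequence[int]) -> Vector:
    """Concatenate the chosen d-dim item vector from each of the N sets (assumed encoding)."""
    feat: Vector = []
    for items, idx in zip(item_sets, choice):
        feat.extend(items[idx])
    return feat


def expected_reward(theta: Vector, feat: Vector) -> float:
    """Mean reward mu(<theta, x_slate>) under the assumed logistic GLM."""
    return sigmoid(sum(t * x for t, x in zip(theta, feat)))


def sample_reward(theta: Vector, feat: Vector, rng: random.Random) -> int:
    """Draw a Bernoulli reward whose mean is mu(<theta, x_slate>)."""
    return 1 if rng.random() < expected_reward(theta, feat) else 0


def best_slate(item_sets: Sequence[Sequence[Vector]], theta: Vector) -> List[int]:
    """Under concatenation, <theta, x> = sum_n <theta_n, x_n> and mu is increasing,
    so the best slate maximizes each set's block score independently: O(N*K) work."""
    d = len(item_sets[0][0])
    choice: List[int] = []
    for n, items in enumerate(item_sets):
        block = theta[n * d:(n + 1) * d]
        scores = [sum(t * x for t, x in zip(block, item)) for item in items]
        choice.append(max(range(len(items)), key=scores.__getitem__))
    return choice


def best_slate_bruteforce(item_sets: Sequence[Sequence[Vector]], theta: Vector) -> List[int]:
    """Enumerate all K^N slates; only useful as a correctness check on tiny inputs."""
    ranges = [range(len(items)) for items in item_sets]
    best = max(itertools.product(*ranges),
               key=lambda c: expected_reward(theta, slate_feature(item_sets, c)))
    return list(best)


if __name__ == "__main__":
    rng = random.Random(0)
    item_sets = [
        [[1.0, 0.0], [0.0, 1.0]],   # set 1: two items, d = 2
        [[0.5, 0.5], [-1.0, 0.0]],  # set 2: two items, d = 2
    ]
    theta = [0.2, 0.8, 1.0, 0.0]    # hypothetical true parameter in R^{N*d}
    choice = best_slate(item_sets, theta)
    feat = slate_feature(item_sets, choice)
    print(choice, expected_reward(theta, feat), sample_reward(theta, feat, rng))
```

Here `best_slate` returns `[1, 0]`: per set, it keeps the item whose block inner product is largest. The brute-force helper enumerates $K^N$ slates and is included only to check the decomposition. If the slate encoding or link differs from these assumptions (for example, with interaction terms between items), the per-set argmax no longer applies.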

Topics

cs.LG