Authors: Haruka Kiyohara, Masahiro Nomura, Yuta Saito
Published on: February 03, 2024
Impact Score: 8.07
Arxiv code: arXiv:2402.02171
Summary
- What is new: The Latent IPS (LIPS) estimator for off-policy evaluation (OPE) in slate contextual bandits, which reduces variance without relying on a linear reward function.
- Why this is important: The Inverse Propensity Scoring (IPS) estimator has high variance in slate contextual bandits because the slate action space is very large.
- What the research proposes: LIPS defines importance weights in a low-dimensional slate abstraction space and optimizes the abstraction to minimize both bias and variance, without assuming linearity in the reward function.
- Results: LIPS significantly outperforms existing estimators, especially in environments with non-linear rewards and large slate spaces.
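To make the variance argument concrete, here is a minimal toy sketch in Python. It is not the authors' implementation: the paper optimizes the slate abstraction, whereas this sketch fixes a hand-picked one. Everything here is an assumption made for illustration: the context-free setting, the uniform logging policy, the random target policy, the abstraction `phi` (the number of times item 0 appears in the slate), the quadratic reward, and all function names. Because the reward in this toy depends on the slate only through `phi`, both estimators are unbiased by construction, and the weights defined over abstractions can only have a smaller or equal second moment than the weights defined over full slates.

```python
import itertools
import numpy as np


def enumerate_slates(n_slots, n_items):
    """All slates as an array of shape (n_items ** n_slots, n_slots)."""
    return np.array(list(itertools.product(range(n_items), repeat=n_slots)))


def abstraction_probs(slate_probs, phi, n_abstractions):
    """Marginalise a distribution over slates onto the abstraction space."""
    return np.bincount(phi, weights=slate_probs, minlength=n_abstractions)


def build_problem(n_slots=3, n_items=4, seed=0):
    """Hypothetical toy problem (assumption, not taken from the paper)."""
    rng = np.random.default_rng(seed)
    slates = enumerate_slates(n_slots, n_items)
    n_slates = len(slates)
    pi_0 = np.full(n_slates, 1.0 / n_slates)   # uniform logging policy
    pi_e = rng.dirichlet(np.ones(n_slates))    # arbitrary target policy
    phi = (slates == 0).sum(axis=1)            # hand-picked abstraction per slate
    q = (phi / n_slots) ** 2                   # non-linear expected reward per slate
    return pi_0, pi_e, phi, q


def importance_weights(pi_0, pi_e, phi):
    """Per-slate weights for plain IPS and for the abstraction-based variant."""
    n_abs = phi.max() + 1
    w_slate = pi_e / pi_0
    ratio = abstraction_probs(pi_e, phi, n_abs) / abstraction_probs(pi_0, phi, n_abs)
    w_latent = ratio[phi]                      # look up each slate's abstraction weight
    return w_slate, w_latent


def estimate(n_samples=5000, seed=1):
    """Draw logged data from pi_0 and compute both estimates of the target value."""
    pi_0, pi_e, phi, q = build_problem()
    w_slate, w_latent = importance_weights(pi_0, pi_e, phi)
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(pi_0), size=n_samples, p=pi_0)       # logged slates
    rewards = q[idx] + rng.normal(scale=0.1, size=n_samples)  # noisy rewards
    return {
        "true_value": float(pi_e @ q),
        "ips": float(np.mean(w_slate[idx] * rewards)),
        "latent_ips": float(np.mean(w_latent[idx] * rewards)),
    }


if __name__ == "__main__":
    print(estimate())
```

With only 64 slates the gap between the two estimators is modest. The slate space grows exponentially with the number of slots while this abstraction grows linearly, which is the regime the summary describes as problematic for IPS.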
Technical Details
Technological frameworks used: Not specified
Models used: Latent IPS (LIPS) estimator
Data used: Not specified
Potential Impact
Recommender systems, search engines, marketing, medical applications