Authors: Sándor Tóth, Stephen Wilson, Alexia Tsoukara, Enric Moreu, Anton Masalovich, Lars Roemheld
Published on: March 18, 2024
Impact Score: 7.4
arXiv ID: arXiv:2403.11593
Summary
- What is new: A robust multi-modal product matching system that combines pretrained image and text encoders with contrastive learning, significantly improving on single-modality systems and on large pretrained models such as CLIP.
- Why this is important: Online marketplaces must reliably recognize when different listings represent the same product, a task made harder by large datasets, data shifts, and new product domains.
- What the research proposes: A straightforward projection of the outputs of pretrained image and text encoders, trained with contrastive learning and combined with a human-in-the-loop process for higher precision.
- Results: Outperforms previous methods, including single-modality systems and CLIP, delivering state-of-the-art results while remaining cost-effective.
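The core idea above, projecting pretrained encoder outputs into a shared space and training that projection contrastively, can be sketched in plain Python. This is a minimal illustration under our own assumptions, not the authors' code: it assumes the frozen image and text embeddings are already computed, that each listing's two embeddings are concatenated before a single linear projection, and that a training pair is two listings of the same product, with the other listings in the batch acting as negatives. The names `project` and `info_nce`, the toy weights, and the temperature of 0.07 are hypothetical choices.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def project(image_emb: Vector, text_emb: Vector, weights: Matrix) -> Vector:
    """Concatenate frozen image and text embeddings, apply a linear
    projection, and L2-normalise (assumed fusion scheme, for illustration)."""
    x = image_emb + text_emb  # list concatenation, not element-wise addition
    z = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    norm = math.sqrt(sum(v * v for v in z)) or 1.0
    return [v / norm for v in z]


def info_nce(anchors: List[Vector], positives: List[Vector],
             temperature: float = 0.07) -> float:
    """Symmetric InfoNCE loss: anchors[i] should match positives[i];
    every other row in the batch serves as an in-batch negative."""
    n = len(anchors)
    # Cosine similarities (inputs are unit vectors) scaled by temperature.
    logits = [[sum(a * p for a, p in zip(anchors[i], positives[j])) / temperature
               for j in range(n)] for i in range(n)]

    def cross_entropy(rows: Matrix) -> float:
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)  # subtract the max for a numerically stable log-sum-exp
            log_sum = m + math.log(sum(math.exp(v - m) for v in row))
            total += log_sum - row[i]  # the correct match sits on the diagonal
        return total / len(rows)

    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy(logits) + cross_entropy(transposed))


# Toy usage: two products, each seen as two listings (anchor and positive).
W = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]  # hypothetical 2x4 projection
anchors = [project([1.0, 0.0], [1.0, 0.0], W), project([0.0, 1.0], [0.0, 1.0], W)]
positives = [project([0.9, 0.1], [1.0, 0.0], W), project([0.1, 0.9], [0.0, 1.0], W)]
aligned = info_nce(anchors, positives)
shuffled = info_nce(anchors, positives[::-1])  # pairs deliberately mismatched
```

The loss is low when matching listings are the nearest pairs and high when they are mismatched. In practice the projection weights would be updated by gradient descent on this loss with an autodiff framework; the sketch only computes the loss value.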
Technical Details
Technological frameworks used: Not specified
Models used: Pretrained image and text encoders with a projection trained via contrastive learning
Data used: Large datasets from online marketplaces and e-commerce platforms
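At matching time, the human-in-the-loop step mentioned in the summary could work roughly as follows. This is a hedged sketch under our own assumptions, not the paper's procedure: a new listing's projected embedding is compared with existing catalogue products by cosine similarity, confident matches are accepted automatically, an uncertain band is sent to a human reviewer, and low scores are treated as a new product. The function name `route_match` and the thresholds 0.9 and 0.7 are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def route_match(query: List[float], catalogue: Dict[str, List[float]],
                accept_at: float = 0.9,
                review_at: float = 0.7) -> Tuple[str, str, float]:
    """Return (decision, best_product_id, score).

    Thresholds are hypothetical: scores >= accept_at are matched
    automatically, scores in [review_at, accept_at) go to a human
    reviewer, and anything lower is treated as a new product.
    """
    if not catalogue:
        return "new_product", "", 0.0
    best_id, best_score = max(
        ((pid, cosine(query, emb)) for pid, emb in catalogue.items()),
        key=lambda t: t[1],
    )
    if best_score >= accept_at:
        return "auto_match", best_id, best_score
    if best_score >= review_at:
        return "human_review", best_id, best_score
    return "new_product", best_id, best_score


# Toy usage with 2-d embeddings standing in for projected listings.
catalogue = {"p1": [1.0, 0.0], "p2": [0.0, 1.0]}
decision, product_id, score = route_match([1.0, 0.05], catalogue)
```

Raising `accept_at` sends more listings to reviewers and increases precision, which is the trade-off a human-in-the-loop setup makes explicit. A production system would replace the linear scan with an approximate nearest-neighbour index.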
Potential Impact
Online marketplaces and e-commerce platforms.
Want to implement this idea in a business?
We have generated a startup concept here: UniMatch.