19 March 2024

End-to-end multi-modal product matching in fashion e-commerce

Written by Startup Idea

Authors: Sándor Tóth, Stephen Wilson, Alexia Tsoukara, Enric Moreu, Anton Masalovich, Lars Roemheld

Published on: March 18, 2024

Impact Score: 7.4

arXiv code: arXiv:2403.11593

Summary

What is new: A robust multi-modal product matching system that combines pretrained image and text encoders with contrastive learning, significantly improving on single-modality systems and on large pretrained models such as CLIP.
Why this is important: Online marketplaces must accurately recognize different listings of the same product, a task complicated by large datasets, data shifts, and new product domains.
What the research proposes: A simple projection of pretrained image and text encoders, trained through contrastive learning, combined with a human-in-the-loop process for higher precision.
Results: The system outperforms previous methods and achieves state-of-the-art results while remaining cost-effective.
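The summary gives no architectural details, so the following is only a minimal sketch of what "a projection of pretrained image and text encoders" could look like, assuming the frozen encoders have already produced fixed-size embeddings. The class name `MultiModalProjection`, the dimensions (512, 384, 128), and the choice to fuse modalities by summing unit vectors are hypothetical illustrations, not the authors' design.

```python
import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


class MultiModalProjection:
    """Hypothetical projection head: one linear map per modality, then fusion.

    The image and text encoders are assumed to be frozen and already applied;
    only the two projection matrices would be trained (for example with a
    contrastive loss). Random initialization stands in for learned weights here.
    """

    def __init__(self, image_dim: int, text_dim: int, out_dim: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.w_image = rng.normal(0.0, image_dim ** -0.5, size=(image_dim, out_dim))
        self.w_text = rng.normal(0.0, text_dim ** -0.5, size=(text_dim, out_dim))

    def __call__(self, image_emb: np.ndarray, text_emb: np.ndarray) -> np.ndarray:
        # Project each modality into the shared space and normalize it.
        img = l2_normalize(image_emb @ self.w_image)
        txt = l2_normalize(text_emb @ self.w_text)
        # Fuse by summing the two unit vectors (one simple option among several),
        # then renormalize so every product embedding has unit length.
        return l2_normalize(img + txt)


# Stand-ins for frozen encoder outputs (dimensions are illustrative only).
rng = np.random.default_rng(1)
image_emb = rng.normal(size=(4, 512))  # e.g. 4 product images
text_emb = rng.normal(size=(4, 384))   # the matching 4 product titles/descriptions
projection = MultiModalProjection(image_dim=512, text_dim=384, out_dim=128)
product_emb = projection(image_emb, text_emb)
print(product_emb.shape)  # (4, 128)
```

Summing normalized vectors weights both modalities equally; concatenation or a learned gate are common alternatives, and the summary does not say which fusion the authors used.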

Technological frameworks used: Not specified

Models used: Pretrained image and text encoders, with projections trained through contrastive learning
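Contrastive learning can be illustrated with a symmetric InfoNCE loss, a common objective for training such projections; whether the paper uses this exact loss or this temperature (0.07) is an assumption. In this sketch, row *i* of `anchors` and row *i* of `positives` stand for two representations of the same product (hypothetical random data), and every other row in the batch serves as a negative.

```python
import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def log_softmax(logits: np.ndarray) -> np.ndarray:
    """Row-wise log-softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def info_nce(anchors: np.ndarray, positives: np.ndarray, temperature: float = 0.07) -> float:
    """Symmetric InfoNCE loss over a batch of (anchor, positive) pairs.

    Row i of `anchors` and row i of `positives` are assumed to describe the same
    product; all other rows in the batch act as negatives. The temperature is a
    common default, not a value taken from the paper.
    """
    a = l2_normalize(anchors)
    p = l2_normalize(positives)
    logits = (a @ p.T) / temperature  # (batch, batch) scaled cosine similarities
    diag = np.arange(len(a))
    # Cross-entropy with the true pair on the diagonal as target, in both directions.
    loss_a_to_p = -log_softmax(logits)[diag, diag].mean()
    loss_p_to_a = -log_softmax(logits.T)[diag, diag].mean()
    return float((loss_a_to_p + loss_p_to_a) / 2)


# Hypothetical data: near-duplicate pairs should score a much lower loss than unrelated pairs.
rng = np.random.default_rng(0)
anchors = rng.normal(size=(8, 64))
matched = anchors + 0.05 * rng.normal(size=(8, 64))
unrelated = rng.normal(size=(8, 64))
loss_matched = info_nce(anchors, matched)
loss_unrelated = info_nce(anchors, unrelated)
print(f"matched pairs: {loss_matched:.4f}, unrelated pairs: {loss_unrelated:.4f}")
```

Minimizing this loss pulls the embeddings of the same product together and pushes different products apart, which is what makes nearest-neighbor matching on the resulting vectors meaningful.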

Data used: Large datasets from online marketplaces and e-commerce platforms

Online marketplaces and e-commerce platforms.
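In a marketplace setting, the learned embeddings would be used to find the closest catalog product for each incoming listing. The following hypothetical sketch shows one way a human-in-the-loop step could be wired in: confident matches are accepted automatically, borderline ones are routed to a reviewer, and the rest are treated as new products. The function name `route_candidate_matches` and the thresholds 0.9 and 0.7 are assumptions; the summary does not state how the authors select pairs for human review.

```python
import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def route_candidate_matches(query_emb, catalog_emb, accept_at=0.9, review_at=0.7):
    """Pair each query listing with its most similar catalog product and route it.

    Returns one (decision, catalog_index, score) tuple per query, where decision is
    'match' (auto-accept), 'review' (send to a human) or 'new' (no plausible match).
    The thresholds are hypothetical and would need tuning on labelled data.
    """
    sims = l2_normalize(query_emb) @ l2_normalize(catalog_emb).T  # cosine similarities
    best = sims.argmax(axis=1)
    best_score = sims[np.arange(len(best)), best]
    decisions = []
    for idx, score in zip(best, best_score):
        if score >= accept_at:
            decisions.append(("match", int(idx), float(score)))
        elif score >= review_at:
            decisions.append(("review", int(idx), float(score)))
        else:
            decisions.append(("new", None, float(score)))
    return decisions


# Toy embeddings: each catalog product points along one axis.
catalog = np.eye(4)
queries = np.array([
    [1.0, 0.05, 0.0, 0.0],  # almost identical to product 0
    [0.0, 1.0, 0.8, 0.0],   # closest to product 1, but ambiguous
    [0.5, 0.5, 0.5, 0.5],   # equally far from every product
])
decisions = route_candidate_matches(queries, catalog)
for decision in decisions:
    print(decision)
```

With these toy vectors the three queries are routed to `match` (product 0), `review` (product 1) and `new`, respectively. Raising `accept_at` trades more human review for higher precision on automatic matches.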

We have generated a startup concept here: UniMatch.