TY - JOUR
T1 - VETE: improving visual embeddings through text descriptions for eCommerce search engines
AU - Martínez, Guillermo
AU - Saavedra, Jose M.
AU - Murrugara-Llerena, Nils
N1 - Publisher Copyright:
© 2023, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
PY - 2023/11
Y1 - 2023/11
AB - A search engine is a critical component in the success of eCommerce. Searching for a particular product can be frustrating when users want specific product features that cannot be easily represented by a simple text search or catalog filter. Owing to advances in artificial intelligence and deep learning, content-based visual search engines are now included in eCommerce search bars. A visual search is instantaneous (just take a picture and search) and fully expressive of image details. However, visual search in eCommerce still suffers from a large semantic gap. Traditionally, visual search models are trained in a supervised manner with large collections of images that do not represent the semantics of a target eCommerce catalog well. Therefore, we propose VETE (Visual Embedding modulated by TExt) to boost visual embeddings in eCommerce by leveraging textual information of products in the target catalog with real eCommerce data. Our proposal improves the baseline visual space for global and fine-grained categories in real-world eCommerce data. We achieved an average improvement of 3.48% for catalog-like queries and 3.70% for noisy ones.
KW - Content-based image retrieval
KW - Self-supervised representation learning
KW - Visual and text embeddings
UR - http://www.scopus.com/inward/record.url?scp=85151253168&partnerID=8YFLogxK
U2 - 10.1007/s11042-023-14595-8
DO - 10.1007/s11042-023-14595-8
M3 - Article
AN - SCOPUS:85151253168
SN - 1380-7501
VL - 82
SP - 41343
EP - 41379
JO - Multimedia Tools and Applications
JF - Multimedia Tools and Applications
IS - 26
ER -