Authors: Rahul Wadbude, Vivek Gupta, Piyush Rai, Nagarajan Natarajan, Harish Karnick, Prateek Jain
ArXiv: 1709.05976
Abstract URL: http://arxiv.org/abs/1709.05976v3
We present a novel and scalable label embedding framework for large-scale
multi-label learning, called ExMLDS (Extreme Multi-Label Learning using
Distributional Semantics). Our approach draws inspiration from ideas rooted in
distributional semantics, specifically the Skip-Gram Negative Sampling (SGNS)
approach widely used to learn word embeddings for natural language processing
tasks; learning such embeddings can be reduced to factorizing a shifted
pointwise mutual information (PMI) matrix. Our approach is novel in that it
highlights interesting
connections between label embedding methods used for multi-label learning and
paragraph/document embedding methods commonly used for learning representations
of text data. The framework can also be easily extended to incorporate
auxiliary information such as label-label correlations; this is especially
crucial when many labels are missing from the training data. We
demonstrate the effectiveness of our approach through an extensive set of
experiments on a variety of benchmark datasets, and show that the proposed
learning methods perform favorably compared to several baselines and
state-of-the-art methods for large-scale multi-label learning. To facilitate
end-to-end learning, we develop a joint learning algorithm that learns both the
embeddings and a regression model that predicts these embeddings from input
features, using efficient gradient-based methods.
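The reduction the abstract refers to builds on the well-known equivalence
between SGNS and factorization of a shifted positive PMI (SPPMI) matrix (Levy
and Goldberg, 2014). Below is a minimal sketch of that idea applied to label
co-occurrences; the matrix construction, the embedding dimension `dim`, and
the negative-sampling constant `k` are illustrative assumptions, not the
paper's exact formulation.

```python
# Sketch: label embeddings via SGNS-style matrix factorization.
# Levy & Goldberg (2014) showed SGNS implicitly factorizes a shifted
# PMI matrix; here that idea is applied to label co-occurrences.
import numpy as np
from scipy.sparse.linalg import svds

def label_embeddings(Y, dim=64, k=5):
    """Y: (n_samples, n_labels) binary label matrix (dense ndarray)."""
    C = (Y.T @ Y).astype(float)      # label-label co-occurrence counts
    total = C.sum()
    row = C.sum(axis=1, keepdims=True)
    col = C.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(C * total / (row * col))
    pmi[~np.isfinite(pmi)] = 0.0     # zero out log(0) and 0/0 entries
    sppmi = np.maximum(pmi - np.log(k), 0.0)   # shifted positive PMI
    U, S, _ = svds(sppmi, k=dim)     # truncated SVD of the SPPMI matrix
    return U * np.sqrt(S)            # one dim-dimensional vector per label
```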
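The joint learning algorithm mentioned at the end of the abstract couples an
embedding objective with a regressor trained by gradient descent. The
following is a hedged sketch under simplifying assumptions: a squared-loss
factorization term on an SPPMI matrix `M` (e.g., the one built above), a
linear regressor `W`, averaging of positive-label embeddings as the regression
target, and plain full-batch gradient descent. The paper's actual objective
and optimizer may differ.

```python
# Sketch: jointly learn label embeddings V and a linear map W from
# features to embedding space by minimizing
#   ||M - V V^T||_F^2 + lam * ||X W - P V||_F^2,
# where P averages each instance's positive-label embeddings.
import numpy as np

def joint_train(X, Y, M, dim=64, lam=1.0, lr=1e-3, iters=200):
    """X: (n, D) features; Y: (n, L) binary labels; M: (L, L) SPPMI."""
    rng = np.random.default_rng(0)
    n, D = X.shape
    L = Y.shape[1]
    V = 0.01 * rng.standard_normal((L, dim))   # label embeddings
    W = np.zeros((D, dim))                     # feature-to-embedding map
    deg = np.maximum(Y.sum(axis=1, keepdims=True), 1)
    P = Y / deg                                # row-normalized labels
    for _ in range(iters):
        G = V @ V.T - M                        # factorization residual
        R = X @ W - P @ V                      # regression residual
        grad_V = 4 * G @ V - 2 * lam * P.T @ R # M symmetric, so 4GV
        grad_W = 2 * lam * X.T @ R
        V -= lr * grad_V
        W -= lr * grad_W
    return V, W

# Prediction: embed a test point, then rank labels by inner product,
# e.g. scores = (x_test @ W) @ V.T
```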