Authors: Zhao Yan, Duyu Tang, Nan Duan, Junwei Bao, Yuanhua Lv, Ming Zhou, Zhoujun Li
ArXiv: 1706.02427
Abstract URL: http://arxiv.org/abs/1706.02427v1
Understanding the connections between unstructured text and semi-structured
tables is an important yet neglected problem in natural language processing. In
this work, we focus on content-based table retrieval. Given a query, the task
is to find the most relevant table from a collection of tables. Further
progress in this area requires powerful models of semantic
matching and richer training and evaluation resources. To address this, we
present a ranking-based approach and implement both carefully designed
features and neural network architectures to measure the relevance between a
query and the content of a table. Furthermore, we release an open-domain
dataset that includes 21,113 web queries for 273,816 tables. We conduct
comprehensive experiments on both real-world and synthetic datasets. Results
verify the effectiveness of our approach and highlight the challenges of this
task.
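
To make the task setup concrete, the following is a minimal sketch of content-based table retrieval as a ranking problem. It is not the authors' model: the word-overlap score merely stands in for the designed features and neural matchers the abstract mentions, and the table layout (`caption`, `headers`, `rows`) and all function names are assumptions chosen for illustration.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def table_tokens(table):
    """Flatten a table's caption, headers, and cells into one token list.

    The dict layout used here is a hypothetical representation.
    """
    parts = [table.get("caption", "")]
    parts.extend(table.get("headers", []))
    for row in table.get("rows", []):
        parts.extend(row)
    return tokenize(" ".join(str(p) for p in parts))


def overlap_score(query, table):
    """Fraction of query tokens that appear anywhere in the table content."""
    query_tokens = tokenize(query)
    if not query_tokens:
        return 0.0
    content = set(table_tokens(table))
    return sum(1 for tok in query_tokens if tok in content) / len(query_tokens)


def rank_tables(query, tables):
    """Return (table_index, score) pairs sorted from most to least relevant."""
    scored = [(overlap_score(query, t), i) for i, t in enumerate(tables)]
    # Sort by descending score; break ties by original position.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(i, s) for s, i in scored]


# Toy collection, invented purely for illustration.
tables = [
    {
        "caption": "Olympic host cities",
        "headers": ["Year", "City"],
        "rows": [["2008", "Beijing"], ["2012", "London"]],
    },
    {
        "caption": "Largest lakes",
        "headers": ["Lake", "Area"],
        "rows": [["Superior", "82100"]],
    },
]

ranking = rank_tables("olympic city 2012", tables)
best_index, best_score = ranking[0]
```

In a fuller system, a score like this would be only one signal among many, combined by a learned ranker rather than used on its own.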