Authors: Suleyman Cetintas, Luo Si, Yan Ping Xin, Dake Zhang, Joo Young Park, Ron Tzur
ArXiv: 1411.5732
Abstract URL: http://arxiv.org/abs/1411.5732v1
Estimating the difficulty level of math word problems is an important task
for many educational applications. Identification of relevant and irrelevant
sentences in math word problems is an important step for calculating the
difficulty levels of such problems. This paper addresses a novel application of
text categorization to identify two types of sentences in mathematical word
problems, namely relevant and irrelevant sentences. A novel joint probabilistic
classification model is proposed to estimate the joint probability of
classification decisions for all sentences of a math word problem by utilizing
the sentence text, the correlation among all sentences, and the correlation
between the question sentence and the other sentences. The proposed model is
compared with i) an SVM classifier that makes independent classification
decisions for individual sentences using only the sentence text and ii) a
novel SVM classifier that considers the correlation between the question
sentence and other sentences along with the sentence text. An extensive set of
experiments demonstrates the effectiveness of both the joint probabilistic
classification model and the novel SVM classifier that utilizes the correlation
between the question sentence and other sentences for identifying relevant and
irrelevant sentences. Furthermore, empirical results and analysis show
that i) it is highly beneficial not to remove stopwords and ii) utilizing
part-of-speech tagging does not yield a significant improvement, although it
has been shown to be effective for the related task of math word problem type
classification.
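
To make the idea of "correlation between the question sentence and other sentences" concrete, the sketch below computes one possible feature: the token overlap between each sentence and the question sentence, with stopwords kept. It is only an illustration under stated assumptions, not the feature set or model used in the paper. The function names, the Jaccard measure, the rule that the last sentence is the question, and the sample problem are all hypothetical.

```python
import re


def tokenize(sentence):
    """Lowercase and split into word/number tokens; stopwords are deliberately kept."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def question_overlap(sentence, question):
    """Jaccard overlap between the token sets of a sentence and the question.

    Assumption: Jaccard overlap stands in for the question/sentence
    correlation; the paper may define this correlation differently.
    """
    s, q = set(tokenize(sentence)), set(tokenize(question))
    if not s or not q:
        return 0.0
    return len(s & q) / len(s | q)


def sentence_features(problem_sentences):
    """Build one feature dict per non-question sentence.

    Assumption: the last sentence of the problem is the question sentence.
    """
    question = problem_sentences[-1]
    return [
        {"tokens": tokenize(sent), "question_overlap": question_overlap(sent, question)}
        for sent in problem_sentences[:-1]
    ]


# Hypothetical word problem, invented for illustration only.
problem = [
    "Tom has 5 apples.",
    "His sister likes red bicycles.",
    "Tom buys 3 more apples.",
    "How many apples does Tom have now?",
]
features = sentence_features(problem)
for sent, feat in zip(problem, features):
    print(f"{feat['question_overlap']:.2f}  {sent}")
```

In this toy case the sentence about bicycles shares no tokens with the question, so its overlap is zero, while the two sentences about apples score higher. Features of this kind could be appended to the sentence-text features of a per-sentence classifier. A joint model as described in the abstract would additionally score all sentence labels of a problem together.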