Authors: Arun S. Maiya, Dale Visser, Andrew Wan
ArXiv: 1505.01072
Abstract URL: http://arxiv.org/abs/1505.01072v1
We present an approach to extract measured information from text (e.g., a
1370 degrees C melting point, a BMI greater than 29.9 kg/m^2). Such
extractions are critically important across a wide range of domains,
especially those involving search and exploration of scientific and technical
documents. We first propose a rule-based entity extractor to mine measured
quantities (i.e., a numeric value paired with a measurement unit), which
supports a vast and comprehensive set of both common and obscure measurement
units. Our method is highly robust and can correctly recover valid measured
quantities even when significant errors are introduced through the process of
converting document formats like PDF to plain text. Next, we describe an
approach to extracting the properties being measured (e.g., the property "pixel
pitch" in the phrase "a pixel pitch as high as 352 µm"). Finally, we
present MQSearch: the realization of a search engine with full support for
measured information.
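
To make the notion of a measured quantity (a numeric value paired with a
measurement unit) concrete, the following is a minimal Python sketch of a
rule-based matcher. It is an illustration only and not the extractor described
in the paper: the unit list UNITS, the MeasuredQuantity type, and the
extract_quantities function are all hypothetical names introduced here, and the
sketch makes no attempt at the broad unit coverage or the robustness to
PDF-to-text conversion errors that the abstract describes.

```python
import re
from typing import List, NamedTuple

# Hypothetical, deliberately tiny unit list (an assumption for this sketch);
# the paper describes support for a far larger set of units.
UNITS = ["kg/m^2", "degrees C", "µm", "mm", "kg", "m"]


class MeasuredQuantity(NamedTuple):
    value: float
    unit: str


# Try longer units first so that "kg/m^2" is preferred over "kg".
_UNIT_ALTERNATION = "|".join(
    re.escape(u) for u in sorted(UNITS, key=len, reverse=True)
)

# A number, optional whitespace, then a known unit. The lookbehind avoids
# starting inside a word or a decimal; the lookahead rejects units that are
# really the start of a longer word (e.g. the "m" in "2 makes").
_PATTERN = re.compile(
    r"(?<![\w.])(-?\d+(?:\.\d+)?)\s*(" + _UNIT_ALTERNATION + r")(?![A-Za-z])"
)


def extract_quantities(text: str) -> List[MeasuredQuantity]:
    """Return every (value, unit) pair matched by the simple rule above."""
    return [
        MeasuredQuantity(float(m.group(1)), m.group(2))
        for m in _PATTERN.finditer(text)
    ]


if __name__ == "__main__":
    for phrase in (
        "a 1370 degrees C melting point",
        "a BMI greater than 29.9 kg/m^2",
        "a pixel pitch as high as 352 µm",
    ):
        print(phrase, "->", extract_quantities(phrase))
```

Ordering the unit alternatives from longest to shortest is the one design
choice worth noting: without it, a short unit such as "kg" would match first
and truncate "kg/m^2".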