WebSets: Extracting Sets of Entities from the Web Using Unsupervised Information Extraction

lib:d464196e05b0e42b (v1.0.0)

Authors: Bhavana Dalvi, William W. Cohen, Jamie Callan
ArXiv: 1307.0261

Abstract URL: http://arxiv.org/abs/1307.0261v1

We describe an open-domain information extraction method for extracting concept-instance pairs from an HTML corpus. Most earlier approaches to this problem combine clusters of distributionally similar terms with concept-instance pairs obtained from Hearst patterns. In contrast, our method relies on a novel approach for clustering terms found in HTML tables, and then assigns concept names to these clusters using Hearst patterns. The method can be applied efficiently to a large corpus, and experimental results on several datasets show that it can accurately extract large numbers of concept-instance pairs.
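The abstract outlines two steps: cluster terms that appear together in HTML tables, then name each cluster using concept-instance pairs mined with Hearst patterns. The toy sketch below illustrates that two-step idea only, under stated assumptions. Table columns are given as plain lists of cell strings. Clustering is a greedy Jaccard-overlap merge, which is a simple stand-in and not the clustering algorithm used in WebSets. Only the single Hearst pattern "X such as A, B and C" is matched. All function names and the example data are hypothetical.

```python
import re
from collections import Counter


def jaccard(a, b):
    """Jaccard overlap of two term sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_columns(columns, threshold=0.3):
    """Greedy stand-in clustering (assumption, not the WebSets algorithm):
    a column joins the first cluster whose terms overlap it by at least
    `threshold` (Jaccard); otherwise it starts a new cluster."""
    clusters = []
    for column in columns:
        terms = {cell.strip().lower() for cell in column if cell.strip()}
        for cluster in clusters:
            if jaccard(cluster, terms) >= threshold:
                cluster |= terms  # in-place union updates the stored set
                break
        else:
            clusters.append(terms)
    return clusters


# Only one Hearst pattern is handled here: "<concept> such as A, B and C".
SUCH_AS = re.compile(r"\b(\w+) such as ([^.;]+)")
LIST_SEP = re.compile(r",\s*|\s+and\s+|\s+or\s+")


def hearst_pairs(sentences):
    """Extract (concept, instance) pairs from 'X such as A, B and C'."""
    pairs = []
    for sentence in sentences:
        for match in SUCH_AS.finditer(sentence.lower()):
            concept = match.group(1)
            for item in LIST_SEP.split(match.group(2)):
                if item.strip():
                    pairs.append((concept, item.strip()))
    return pairs


def label_cluster(cluster, pairs):
    """Name a cluster by the concept whose Hearst instances hit it most often."""
    votes = Counter(concept for concept, instance in pairs if instance in cluster)
    return votes.most_common(1)[0][0] if votes else None


def label_table_clusters(columns, sentences, threshold=0.3):
    """Cluster table columns, then attach a concept name to each cluster."""
    pairs = hearst_pairs(sentences)
    return [(label_cluster(c, pairs), sorted(c))
            for c in cluster_columns(columns, threshold)]


if __name__ == "__main__":
    columns = [
        ["Pittsburgh", "Boston", "Seattle"],
        ["Boston", "Seattle", "Denver"],
        ["Python", "Java", "Haskell"],
    ]
    sentences = [
        "We visited cities such as Pittsburgh, Boston and Denver.",
        "Popular languages such as Python, Java or Haskell.",
    ]
    for concept, terms in label_table_clusters(columns, sentences):
        print(concept, terms)
```

In this toy run the two city columns share two of four distinct terms (Jaccard 0.5), so they merge into one cluster labeled "cities", while the language column stays separate and is labeled "languages". A real system would need far more patterns, noise handling, and a clustering method suited to web-scale tables.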
