Authors: Riddhiman Dasgupta, Balaji Ganesan, Aswin Kannan, Berthold Reinwald, Arun Kumar
ArXiv: 1811.09368
Abstract URL: http://arxiv.org/abs/1811.09368v1
Entity Type Classification can be defined as the task of assigning category
labels to entity mentions in documents. While neural networks have recently
improved the classification of general entity mentions, pattern matching and
other systems continue to be used for classifying personal data entities (e.g.
classifying an organization as a media company or a government institution for
GDPR and HIPAA compliance). We propose a neural model to expand the class of
personal data entities that can be classified at a fine-grained level, using
the output of existing pattern matching systems as additional contextual
features. We introduce new resources: a personal data entities hierarchy with
134 types, and two datasets from the Wikipedia pages of elected representatives
and Enron emails. We hope these resources will aid research in the area of
personal data discovery. To that effect, we provide baseline results on
these datasets and compare our method with state-of-the-art models on the
OntoNotes dataset.
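
The idea of feeding pattern-matcher output to a neural classifier as extra
contextual features can be illustrated with a minimal sketch. Everything below
is an assumption made for illustration: the three regular expressions, the
names PATTERNS, pattern_features and build_input, and the choice of simple
concatenation are hypothetical and are not taken from the paper, which does not
specify these details in its abstract.

```python
import re
from typing import List

# Hypothetical pattern matchers standing in for an existing rule-based system.
# The paper's actual patterns are not given in the abstract.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "gov_keyword": re.compile(r"\b(senate|ministry|parliament)\b", re.IGNORECASE),
}


def pattern_features(context: str) -> List[int]:
    """Return one binary indicator per pattern, in sorted pattern-name order.

    An indicator is 1 if the pattern matches anywhere in the context.
    """
    return [1 if PATTERNS[name].search(context) else 0 for name in sorted(PATTERNS)]


def build_input(mention_vec: List[float], context: str) -> List[float]:
    """Concatenate a mention representation with the pattern indicators.

    mention_vec is assumed to come from some upstream encoder; the
    concatenated vector would then be passed to a type classifier.
    """
    return list(mention_vec) + [float(f) for f in pattern_features(context)]


if __name__ == "__main__":
    sentence = "Contact the Senate office at jane@example.org"
    # Feature order is email, gov_keyword, phone.
    print(pattern_features(sentence))
    print(build_input([0.5, -0.2], sentence))
```

Concatenation is only one way to combine the two signals; it is used here
because it keeps the rule-based output visible to the classifier without
changing how the mention itself is encoded.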