Ellipsis in Chinese AMR Corpus

lib:0f0412fa6f197c2c (v1.0.0)

Authors: Yihuan Liu, Bin Li, Peiyi Yan, Li Song, Weiguang Qu
Where published: WS 2019 8
Abstract URL: https://www.aclweb.org/anthology/W19-3310/

Ellipsis is very common in language, and natural language processing needs to restore the elided elements of a sentence. However, only a few corpora annotate ellipsis, which hinders its automatic detection and recovery. This paper introduces the annotation of ellipsis in Chinese sentences using Abstract Meaning Representation (AMR), a novel graph-based representation with a good mechanism for restoring elided elements manually. We annotate 5,000 sentences selected from the Chinese TreeBank (CTB). We find that 54.98% of the sentences contain ellipses. 92% of the ellipses are restored by copying the antecedents' concepts, and 12.9% of them are newly added concepts. In addition, we find that the elided element is a word or phrase in most cases, but sometimes it is only the head or a part of a phrase, which makes automatic recovery of ellipsis rather hard.
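As a rough illustration of how an AMR graph can restore an elided element by reusing an antecedent's concept, the sketch below encodes the standard English AMR textbook example "The boy wants to go" as triples. It is not taken from the paper or its Chinese corpus; the triple layout and the helper name `reentrant_variables` are assumptions made only for this example.

```python
from collections import Counter

# Hypothetical triple encoding of the AMR for "The boy wants to go":
#   (w / want-01 :ARG0 (b / boy) :ARG1 (g / go-01 :ARG0 b))
instances = {"w": "want-01", "b": "boy", "g": "go-01"}
edges = [
    ("w", ":ARG0", "b"),
    ("w", ":ARG1", "g"),
    ("g", ":ARG0", "b"),  # the unexpressed subject of "go" reuses variable b
]


def reentrant_variables(edges):
    """Return the variables that are the target of more than one edge."""
    counts = Counter(target for _, _, target in edges)
    return sorted(var for var, n in counts.items() if n > 1)


# Variables that were reused, together with their concepts.
restored = {var: instances[var] for var in reentrant_variables(edges)}
print(restored)
```

Here the reused variable `b` is what marks the subject of "go" as recovered from its antecedent "boy"; a concept with no antecedent in the sentence would instead appear as a new entry in `instances`.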

