Authors: Yihuan Liu, Bin Li, Peiyi Yan, Li Song, Weiguang Qu
Where published:
WS 2019 8
Abstract URL: https://www.aclweb.org/anthology/W19-3310/
Ellipsis is very common in language, and natural language processing needs to restore the elided elements of a sentence. However, only a few corpora annotate ellipsis, which holds back its automatic detection and recovery. This paper introduces the annotation of ellipsis in Chinese sentences using Abstract Meaning Representation (AMR), a novel graph-based representation that provides a good mechanism for restoring elided elements manually. We annotate 5,000 sentences selected from the Chinese TreeBank (CTB). We find that 54.98% of the sentences contain ellipses; 92% of the ellipses are restored by copying the antecedents' concepts, and 12.9% of them are newly added concepts. In addition, we find that the elided element is a word or phrase in most cases, but sometimes it is only the head of a phrase or part of a phrase, which makes automatic recovery of ellipsis rather hard.