Check the preview of 2nd version of this platform being developed by the open MLCommons taskforce on automation and reproducibility as a free, open-source and technology-agnostic on-prem platform.

Ellipsis in Chinese AMR Corpus

lib:0f0412fa6f197c2c (v1.0.0)

Authors: Yihuan Liu,Bin Li,Peiyi Yan,Li Song,Weiguang Qu
Where published: WS 2019 8
Document:  PDF  DOI 
Abstract URL: https://www.aclweb.org/anthology/W19-3310/


Ellipsis is very common in language. It{'}s necessary for natural language processing to restore the elided elements in a sentence. However, there{'}s only a few corpora annotating the ellipsis, which draws back the automatic detection and recovery of the ellipsis. This paper introduces the annotation of ellipsis in Chinese sentences, using a novel graph-based representation Abstract Meaning Representation (AMR), which has a good mechanism to restore the elided elements manually. We annotate 5,000 sentences selected from Chinese TreeBank (CTB). We find that 54.98{\%} of sentences have ellipses. 92{\%} of the ellipses are restored by copying the antecedents{'} concepts. and 12.9{\%} of them are the new added concepts. In addition, we find that the elided element is a word or phrase in most cases, but sometimes only the head of a phrase or parts of a phrase, which is rather hard for the automatic recovery of ellipsis.

Relevant initiatives  

Related knowledge about this paper Reproduced results (crowd-benchmarking and competitions) Artifact and reproducibility checklists Common formats for research projects and shared artifacts Reproducibility initiatives

Comments  

Please log in to add your comments!
If you notice any inapropriate content that should not be here, please report us as soon as possible and we will try to remove it within 48 hours!