Check the preview of 2nd version of this platform being developed by the open MLCommons taskforce on automation and reproducibility as a free, open-source and technology-agnostic on-prem platform.

A Persian Treebank with Stanford Typed Dependencies

lib:52c4d759d6af59ff (v1.0.0)

Authors: Mojgan Seraji,Carina Jahani,Be{\'a}ta Megyesi,Joakim Nivre
Where published: LREC 2014 5
Document:  PDF  DOI 
Abstract URL: https://www.aclweb.org/anthology/L14-1326/


We present the Uppsala Persian Dependency Treebank (UPDT) with a syntactic annotation scheme based on Stanford Typed Dependencies. The treebank consists of 6,000 sentences and 151,671 tokens with an average sentence length of 25 words. The data is from different genres, including newspaper articles and fiction, as well as technical descriptions and texts about culture and art, taken from the open source Uppsala Persian Corpus (UPC). The syntactic annotation scheme is extended for Persian to include all syntactic relations that could not be covered by the primary scheme developed for English. In addition, we present open source tools for automatic analysis of Persian containing a text normalizer, a sentence segmenter and tokenizer, a part-of-speech tagger, and a parser. The treebank and the parser have been developed simultaneously in a bootstrapping procedure. The result of a parsing experiment shows an overall labeled attachment score of 82.05{\%} and an unlabeled attachment score of 85.29{\%}. The treebank is freely available as an open source resource.

Relevant initiatives  

Related knowledge about this paper Reproduced results (crowd-benchmarking and competitions) Artifact and reproducibility checklists Common formats for research projects and shared artifacts Reproducibility initiatives

Comments  

Please log in to add your comments!
If you notice any inapropriate content that should not be here, please report us as soon as possible and we will try to remove it within 48 hours!