Walid Elhedda, Maroua Mehri, Mohamed Ali Mahjoub
Abstract URL: https://arxiv.org/abs/1908.09007v1
The systems currently used by the Tunisian national archives for the automatic transcription of archival documents are hindered by the limited performance of optical character recognition (OCR) tools. Indeed, using a classical OCR system to transcribe and index ancient Arabic documents is not straightforward because of the idiosyncrasies of this category of documents, such as noise and degradation. Applying an enhancement or denoising technique therefore remains an essential preprocessing step to ease archival document image analysis. State-of-the-art methods for enhancing and denoising degraded document images are mainly based on filtering. The filtering techniques most commonly applied to color images in the literature can be categorized into four approaches: scalar, marginal, vector, and hybrid. To provide comprehensive guidelines on the strengths and weaknesses of these filtering approaches, this article presents a thorough comparative study. Numerical experiments are carried out on color archival document images to quantify the performance of each assessed filtering approach.
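To make the distinction between two of these categories concrete, the sketch below contrasts a marginal filter with a vector filter. It is only an illustration under stated assumptions: the abstract does not say which filters were evaluated, so the median filter, the 3x3 window, the Euclidean distance, and the function names (`marginal_median`, `vector_median`, `_windows`) are all choices made here, not the authors' method. The marginal version filters each color channel independently, while the vector version treats each pixel as a single color vector and selects the window pixel with the smallest total distance to the others.

```python
import numpy as np

def _windows(img, k=3):
    """Collect every k x k neighbourhood: result has shape (H, W, k*k, C)."""
    pad = k // 2
    # Replicate border pixels so every pixel has a full window.
    padded = np.pad(img, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    h, w, _ = img.shape
    shifts = [padded[dy:dy + h, dx:dx + w] for dy in range(k) for dx in range(k)]
    return np.stack(shifts, axis=2)

def marginal_median(img, k=3):
    """Marginal approach (assumed definition): median of each channel separately."""
    win = _windows(img, k).astype(np.float64)
    return np.median(win, axis=2)

def vector_median(img, k=3):
    """Vector approach (assumed definition): pick the window pixel whose summed
    Euclidean distance to all other window pixels is smallest."""
    win = _windows(img, k).astype(np.float64)              # (H, W, N, C)
    diff = win[:, :, :, None, :] - win[:, :, None, :, :]   # (H, W, N, N, C)
    dist = np.sqrt((diff ** 2).sum(axis=-1))               # pairwise distances
    cost = dist.sum(axis=-1)                               # (H, W, N)
    best = cost.argmin(axis=-1)                            # index of the winner
    picked = np.take_along_axis(win, best[:, :, None, None], axis=2)
    return picked[:, :, 0, :]

# Hypothetical test image: a uniform "paper" color with one impulse-noise pixel.
page = np.full((5, 5, 3), (200, 180, 150), dtype=np.uint8)
page[2, 2] = (0, 255, 0)

clean_marginal = marginal_median(page)
clean_vector = vector_median(page)
```

On this toy input both filters remove the isolated noisy pixel. The two differ on windows that mix several distinct colors: a marginal filter can output a color that appears nowhere in the window, because it combines channel values from different pixels, whereas a vector filter always returns one of the original pixels at a higher computational cost.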