We are very excited to join forces with MLCommons and OctoML.ai! Contact Grigori Fursin for more details!

CloudScan - A configuration-free invoice analysis system using recurrent neural networks

lib:8e642dcb972a512b (v1.0.0)

Vote to reproduce this paper and share portable workflows   1 
Authors: Rasmus Berg Palm,Ole Winther,Florian Laws
ArXiv: 1708.07403
Document:  PDF  DOI 
Artifact development version: GitHub
Abstract URL: http://arxiv.org/abs/1708.07403v1


We present CloudScan; an invoice analysis system that requires zero configuration or upfront annotation. In contrast to previous work, CloudScan does not rely on templates of invoice layout, instead it learns a single global model of invoices that naturally generalizes to unseen invoice layouts. The model is trained using data automatically extracted from end-user provided feedback. This automatic training data extraction removes the requirement for users to annotate the data precisely. We describe a recurrent neural network model that can capture long range context and compare it to a baseline logistic regression model corresponding to the current CloudScan production system. We train and evaluate the system on 8 important fields using a dataset of 326,471 invoices. The recurrent neural network and baseline model achieve 0.891 and 0.887 average F1 scores respectively on seen invoice layouts. For the harder task of unseen invoice layouts, the recurrent neural network model outperforms the baseline with 0.840 average F1 compared to 0.788.

Relevant initiatives  

Related knowledge about this paper Reproduced results (crowd-benchmarking and competitions) Artifact and reproducibility checklists Common formats for research projects and shared artifacts Reproducibility initiatives

Comments  

Please log in to add your comments!
If you notice any inapropriate content that should not be here, please report us as soon as possible and we will try to remove it within 48 hours!