Check the preview of 2nd version of this platform being developed by the open MLCommons taskforce on automation and reproducibility as a free, open-source and technology-agnostic on-prem platform.

Task-Driven Data Verification via Gradient Descent

lib:4523588d4ee393c7 (v1.0.0)

Authors: Siavash Golkar,Kyunghyun Cho
ArXiv: 1905.05843
Document:  PDF  DOI 
Abstract URL: https://arxiv.org/abs/1905.05843v1


We introduce a novel algorithm for the detection of possible sample corruption such as mislabeled samples in a training dataset given a small clean validation set. We use a set of inclusion variables which determine whether or not any element of the noisy training set should be included in the training of a network. We compute these inclusion variables by optimizing the performance of the network on the clean validation set via "gradient descent on gradient descent" based learning. The inclusion variables as well as the network trained in such a way form the basis of our methods, which we call Corruption Detection via Gradient Descent (CDGD). This algorithm can be applied to any supervised machine learning task and is not limited to classification problems. We provide a quantitative comparison of these methods on synthetic and real world datasets.

Relevant initiatives  

Related knowledge about this paper Reproduced results (crowd-benchmarking and competitions) Artifact and reproducibility checklists Common formats for research projects and shared artifacts Reproducibility initiatives

Comments  

Please log in to add your comments!
If you notice any inapropriate content that should not be here, please report us as soon as possible and we will try to remove it within 48 hours!