
Read classification using semi-supervised deep learning

lib:15beabc043ddad1a (v1.0.0)

Authors: Tomislav Šebrek, Jan Tomljanović, Josip Krapac, Mile Šikić
ArXiv: 1904.10353
Abstract URL: http://arxiv.org/abs/1904.10353v1


In this paper, we propose a semi-supervised deep learning method for detecting specific types of reads that impede de novo genome assembly. Instead of working directly with the sequenced reads, we analyze their coverage graphs converted to 1D signals. We observed that each relevant class of reads exhibits characteristic signal patterns. We chose a semi-supervised approach because manually labeling the data is slow and tedious, so our goal was to support the assembly process with as little labeled data as possible. We tested two models for learning patterns in the coverage graphs: M1+M2 and semi-GAN. We evaluated each model on a manually labeled dataset comprising reads from multiple reference genomes, measuring performance as a function of the number of labeled examples used during training. In addition, we embedded our detection into the assembly process, which improved the quality of the resulting assemblies.
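The abstract does not spell out how a read's coverage graph becomes a 1D signal. Below is a minimal sketch under stated assumptions: coverage is counted from overlap intervals (assumed to come from an all-vs-all overlapper) in the read's own coordinates, and the signal is then resampled to a fixed length and scaled to [0, 1] so it can be fed to a model. The function names, the fixed length, and the normalization step are hypothetical illustrations, not the authors' pipeline.

```python
from typing import List, Tuple


def coverage_signal(read_length: int, overlaps: List[Tuple[int, int]]) -> List[int]:
    """Count, for every base of a read, how many other reads overlap it.

    `overlaps` holds half-open intervals [start, end) in the read's own
    coordinates (assumed to come from an all-vs-all overlapper).
    """
    diff = [0] * (read_length + 1)  # difference array: +1 at start, -1 at end
    for start, end in overlaps:
        start, end = max(0, start), min(read_length, end)  # clamp to the read
        if start < end:
            diff[start] += 1
            diff[end] -= 1
    signal, running = [], 0
    for i in range(read_length):
        running += diff[i]  # prefix sum yields per-base coverage
        signal.append(running)
    return signal


def resample(signal: List[float], length: int) -> List[float]:
    """Linearly interpolate `signal` onto `length` evenly spaced points."""
    n = len(signal)
    if n == 0:
        return [0.0] * length
    if n == 1 or length == 1:
        return [float(signal[0])] * length
    out = []
    for i in range(length):
        x = i * (n - 1) / (length - 1)  # fractional position in the original signal
        lo = int(x)
        hi = min(lo + 1, n - 1)
        frac = x - lo
        out.append(signal[lo] * (1 - frac) + signal[hi] * frac)
    return out


def model_input(read_length: int, overlaps: List[Tuple[int, int]], length: int = 1024) -> List[float]:
    """Hypothetical preprocessing: coverage -> fixed length -> scaled to [0, 1]."""
    resampled = resample(coverage_signal(read_length, overlaps), length)
    peak = max(resampled)
    return [v / peak for v in resampled] if peak > 0 else resampled


if __name__ == "__main__":
    # Toy read of 10 bases covered by three overlapping reads.
    print(coverage_signal(10, [(0, 6), (2, 10), (4, 8)]))  # [1, 1, 2, 2, 3, 3, 2, 2, 1, 1]
```

The difference array keeps coverage counting linear in the read length plus the number of overlaps. Resampling to a fixed length is one common way to give every read the same input size for a neural network; the paper may handle variable read lengths differently, and what the class-specific patterns look like is described there, not here.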

