Open library

Check the preview of 2nd version of this platform being developed by the open MLCommons taskforce on automation and reproducibility as a free, open-source and technology-agnostic on-prem platform.

Soft edit distance for differentiable comparison of symbolic sequences

lib:f9adbcaa3b7d4906 (v1.0.0)

Authors: Evgenii Ofitserov,Vasily Tsvetkov,Vadim Nazarov
ArXiv: 1904.12562
Document: PDF DOI

Abstract URL: http://arxiv.org/abs/1904.12562v1

Edit distance, also known as Levenshtein distance, is an essential way to compare two strings that proved to be particularly useful in the analysis of genetic sequences and natural language processing. However, edit distance is a discrete function that is known to be hard to optimize. This fact hampers the use of this metric in Machine Learning. Even as simple algorithm as K-means fails to cluster a set of sequences using edit distance if they are of variable length and abundance. In this paper we propose a novel metric - soft edit distance (SED), which is a smooth approximation of edit distance. It is differentiable and therefore it is possible to optimize it with gradient methods. Similar to original edit distance, SED as well as its derivatives can be calculated with recurrent formulas at polynomial time. We prove usefulness of the proposed metric on synthetic datasets and clustering of biological sequences.

Relevant initiatives

Related knowledge about this paper

Search on this portal

Reproduced results (crowd-benchmarking and competitions)

Artifact and reproducibility checklists

Common formats for research projects and shared artifacts

Collective Knowledge (organizing research projects based on FAIR principles)

Reproducibility initiatives

Comments

Please log in to add your comments!

If you notice any inapropriate content that should not be here, please report us as soon as possible and we will try to remove it within 48 hours!

Soft edit distance for differentiable comparison of symbolic sequences

Relevant initiatives Hide

Comments Hide

Relevant initiatives

Comments