Check the preview of 2nd version of this platform being developed by the open MLCommons taskforce on automation and reproducibility as a free, open-source and technology-agnostic on-prem platform.

Medical Word Embeddings for Spanish: Development and Evaluation

lib:150d48ad4d7f894a (v1.0.0)

Authors: Felipe Soares,Marta Villegas,Aitor Gonzalez-Agirre,Martin Krallinger,Jordi Armengol-Estap{\'e}
Where published: WS 2019 6
Document:  PDF  DOI 
Abstract URL: https://www.aclweb.org/anthology/W19-1916/


Word embeddings are representations of words in a dense vector space. Although they are not recent phenomena in Natural Language Processing (NLP), they have gained momentum after the recent developments of neural methods and Word2Vec. Regarding their applications in medical and clinical NLP, they are invaluable resources when training in-domain named entity recognition systems, classifiers or taggers, for instance. Thus, the development of tailored word embeddings for medical NLP is of great interest. However, we identified a gap in the literature which we aim to fill in this paper: the availability of embeddings for medical NLP in Spanish, as well as a standardized form of intrinsic evaluation. Since most work has been done for English, some established datasets for intrinsic evaluation are already available. In this paper, we show the steps we employed to adapt such datasets for the first time to Spanish, of particular relevance due to the considerable volume of EHRs in this language, as well as the creation of in-domain medical word embeddings for the Spanish using the state-of-the-art FastText model. We performed intrinsic evaluation with our adapted datasets, as well as extrinsic evaluation with a named entity recognition systems using a baseline embedding of general-domain. Both experiments proved that our embeddings are suitable for use in medical NLP in the Spanish language, and are more accurate than general-domain ones.

Relevant initiatives  

Related knowledge about this paper Reproduced results (crowd-benchmarking and competitions) Artifact and reproducibility checklists Common formats for research projects and shared artifacts Reproducibility initiatives

Comments  

Please log in to add your comments!
If you notice any inapropriate content that should not be here, please report us as soon as possible and we will try to remove it within 48 hours!