Authors: Halid Ziya Yerebakan, Fitsum Reda, Yiqiang Zhan, Yoshihisa Shinagawa
ArXiv: 1601.05472
Abstract URL: http://arxiv.org/abs/1601.05472v1
This paper presents a new Bayesian non-parametric model by extending the
usage of Hierarchical Dirichlet Allocation to extract tree-structured word
clusters from text data. The inference algorithm of the model collects words in
a cluster if they share a similar distribution over documents. In our
experiments, we observed meaningful hierarchical structures on the NIPS corpus and
radiology reports collected from public repositories.