cjas_a_1435633_sm1544.mp4 (6.57 MB)

COWORDS: a probabilistic model for multiple word clouds

Posted on 2018-02-16, 09:16. Authored by Luís G. Silva e Silva and Renato M. Assunção.

Word clouds are one of the most popular statistical tools for the visual analysis of text documents because they give users a quick and intuitive understanding of the content. Despite their popularity for visualizing single documents, word clouds are poorly suited to comparing different text documents. Generating a word cloud independently for each document typically places the same word in widely different positions from one cloud to the next, which makes it very difficult to compare two or more clouds. This paper introduces COWORDS, a new stochastic algorithm that creates multiple word clouds, one for each document. Words shared by multiple documents are placed in the same position in all clouds. Similar documents produce similar and compact clouds, making it easier to compare and interpret several word clouds simultaneously. The algorithm is based on a probability distribution in which the most probable configurations are those with a desirable visual aspect, such as a low value for the total distance between the words in all clouds. The algorithm's output is a set of word clouds drawn at random from this probability distribution, using a Markov chain Monte Carlo simulation method. We present several examples that illustrate the performance and visual results that can be obtained by our algorithm.
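The sampling idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes a Gibbs-type distribution proportional to exp(-beta * D), where D is the total pairwise distance between words summed over all clouds, and it samples layouts with a plain Metropolis step on a square grid. The names `total_distance` and `sample_layout`, the grid, and the parameters `beta` and `steps` are hypothetical, and the sketch ignores word sizes, fonts and overlap, which an actual word cloud layout would need to handle.

```python
import math
import random
from itertools import combinations


def total_distance(positions, documents):
    """Sum of pairwise Euclidean distances between words, added over all clouds."""
    return sum(
        math.dist(positions[a], positions[b])
        for words in documents
        for a, b in combinations(sorted(set(words)), 2)
    )


def sample_layout(documents, grid=8, beta=2.0, steps=5000, seed=0):
    """Metropolis sampler (illustrative assumption, not the COWORDS algorithm itself).

    Each word gets ONE grid cell that is shared by every cloud it appears in.
    Layouts with a small total distance are the most probable ones.
    """
    rng = random.Random(seed)
    vocab = sorted(set().union(*documents))
    cells = [(x, y) for x in range(grid) for y in range(grid)]
    if len(vocab) > len(cells):
        raise ValueError("grid too small for the vocabulary")

    # Random initial layout: distinct cells, one per word.
    positions = dict(zip(vocab, rng.sample(cells, len(vocab))))
    energy = total_distance(positions, documents)

    for _ in range(steps):
        word = rng.choice(vocab)
        target = rng.choice(cells)
        old = positions[word]
        if target == old:
            continue
        # Symmetric proposal: move to an empty cell, or swap with its occupant.
        occupant = next((w for w, p in positions.items() if p == target), None)
        positions[word] = target
        if occupant is not None:
            positions[occupant] = old

        new_energy = total_distance(positions, documents)
        delta = new_energy - energy
        # Metropolis rule: always accept improvements, sometimes accept worse layouts.
        if delta <= 0 or rng.random() < math.exp(-beta * delta):
            energy = new_energy
        else:
            # Rejected: restore the previous layout.
            positions[word] = old
            if occupant is not None:
                positions[occupant] = target

    return positions, energy


if __name__ == "__main__":
    docs = [{"data", "model", "cloud"}, {"model", "cloud", "chain"}]
    layout, energy = sample_layout(docs, grid=4, steps=500)
    for doc in docs:
        # Each cloud is the shared layout restricted to that document's words.
        print({w: layout[w] for w in sorted(doc)})
```

Because every word has a single position in the shared layout, restricting that layout to one document's vocabulary yields that document's cloud, so shared words coincide across clouds by construction. A larger `beta` concentrates the samples on more compact layouts.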
