Semantic Similarity Measurement Using Historical Google Search Patterns
Computing the semantic similarity between terms (or short text
expressions) that have the same meaning but are not
lexicographically similar is an important challenge in the information
integration field. The problem is that techniques for textual semantic
similarity measurement often fail to deal with words not covered by
synonym dictionaries. In this paper, we try to solve this problem by
determining the semantic similarity for terms using the knowledge
inherent in the search history logs from the Google search engine. To do
this, we have designed and evaluated four algorithmic methods for
measuring the semantic similarity between terms using their associated
historical search patterns. These algorithmic methods are: a) frequent
co-occurrence of terms in search patterns, b) computation of the
relationship between search patterns, c) outlier coincidence on search
patterns, and d) forecasting comparisons. We have shown experimentally
that some of these methods correlate well with human judgment
on general-purpose benchmark datasets, and significantly
outperform existing methods on datasets containing terms
that do not usually appear in dictionaries.
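
As an illustration of how methods b) and c) might be approached, the following minimal sketch compares two search-volume time series. It is not the implementation evaluated in the paper: the use of Pearson correlation for the "relationship between search patterns", the z-score threshold for detecting outliers, the Jaccard overlap for measuring their coincidence, and all function names are assumptions made for this example, and the input series are expected to be equal-length lists of per-period search volumes.

```python
from math import sqrt
from statistics import mean, pstdev


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant series carries no pattern to correlate
    return cov / sqrt(vx * vy)


def outlier_periods(xs, z=1.5):
    """Indices of periods whose volume deviates more than z std devs from the mean.

    The threshold z is an assumed parameter, not a value from the paper.
    """
    m, s = mean(xs), pstdev(xs)
    if s == 0:
        return set()
    return {i for i, x in enumerate(xs) if abs(x - m) > z * s}


def outlier_coincidence(xs, ys, z=1.5):
    """Jaccard overlap of the outlier periods of two series (0.0 to 1.0)."""
    a, b = outlier_periods(xs, z), outlier_periods(ys, z)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)
```

Under these assumptions, two terms whose search volumes rise and fall together yield a correlation close to 1, and two terms whose unusual peaks fall in the same periods yield a coincidence score close to 1. Either score could then be compared against human similarity judgments.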