figshare
Browse
file.pdf (227.78 kB)

Efficient Similarity Search In Sequence Databases

Download (227.78 kB)
journal contribution
posted on 2006-01-01, 00:00 authored by Rakesh Agrawal, Christos Faloutsos, Arun Swami
We propose an indexing method for time sequences for processing similarity queries. We use the Discrete Fourier Transform (DFT) to map time sequences to the frequency domain, the crucial observation being that, for most sequences of practical interest, only the first few frequencies are strong. Another important observation is Parseval's theorem, which specifies that the Fourier transform preserves the Euclidean distance in the time or frequency domain. Having thus mapped sequences to a lower-dimensionality space by using only the first few Fourier coefficients, we use R * -trees to index the sequences and efficiently answer similarity queries. We provide experimental results which show that our method is superior to search based on sequential scanning. Our experiments show that a few coefficients (1–3) are adequate to provide good performance. The performance gain of our method increases with the number and length of sequences.

History

Date

2006-01-01

Usage metrics

    Exports

    RefWorks
    BibTeX
    Ref. manager
    Endnote
    DataCite
    NLM
    DC