k-Word matches: an alignment-free sequence comparison method

The number of k-word matches, that is, the number of words of length k shared between two sequences, also known as the D2 statistic, is used as a statistic for alignment-free sequence comparison. It has two advantages over alignment-based methods for nucleotide and amino-acid sequence comparison: first, it does not assume that homologous segments are contiguous; second, it is computationally extremely fast, with runtime proportional to the size of the sequence under scrutiny. We summarise our results to date on determining the distributional properties of the D2 statistic for a range of biologically relevant parameters, and outline the directions in which the research will proceed. PRIB 2008 proceedings found at: http://dx.doi.org/10.1007/978-3-540-88436-1
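
As an illustration of the statistic described above, the following is a minimal Python sketch. It assumes the common convention that matches are counted over all pairs of positions, so a word occurring m times in one sequence and n times in the other contributes m × n matches. The function names kword_counts and d2 are hypothetical and chosen for this example only; they are not taken from the paper.

```python
from collections import Counter


def kword_counts(seq: str, k: int) -> Counter:
    """Count every overlapping word of length k in seq (hypothetical helper)."""
    # range() is empty when seq is shorter than k, giving an empty Counter.
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(seq_a: str, seq_b: str, k: int) -> int:
    """Number of k-word matches between seq_a and seq_b.

    Assumption: matches are counted over all position pairs, so the result
    is the sum over words w of count_a(w) * count_b(w).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    counts_a = kword_counts(seq_a, k)
    counts_b = kword_counts(seq_b, k)
    # Iterate over the smaller table; a Counter returns 0 for absent words.
    if len(counts_a) > len(counts_b):
        counts_a, counts_b = counts_b, counts_a
    return sum(n * counts_b[word] for word, n in counts_a.items())


if __name__ == "__main__":
    print(d2("ACGTACGT", "CGTA", 2))
```

For fixed k, each sequence is scanned once and the word counts are combined through hash lookups, which is one way to obtain a running time that grows roughly linearly with sequence length. No alignment or ordering of the matched words is used.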