figshare
Browse
Camera_ready_submission_SUBSEQ_DEXA_2019_KTISTAKIS_PUGLISI_FVIGER.pdf (753.18 kB)

Succinct BWT-based Sequence prediction

Download (753.18 kB)
conference contribution
posted on 2020-05-18, 15:27 authored by R Ktistakis, P Fournier-Viger, S Puglisi, R Raman
Sequences of symbols can be used to represent data in many domains such as text documents, activity logs, customer transactions and website click-streams. Sequence prediction is a popular task, which consists of predicting the next symbol of a sequence, given a set of training sequences. Although numerous prediction models have been proposed, many have a low accuracy because they are lossy models (they discard information from training sequences to build the model), while lossless models are often more accurate but typically consume a large amount of memory. This paper addresses these issues by proposing a novel sequence prediction model named SuBSeq that is lossless and utilizes the succinct Wavelet Tree data structure and the Burrows-Wheeler Transform to compactly store and efficiently access training sequences for prediction. An experimental evaluation shows that SuBSeq has a very low memory consumption and excellent accuracy when compared to eight state-of-the-art predictors on seven real datasets.

History

Citation

Ktistakis R., Fournier-Viger P., Puglisi S.J., Raman R. (2019) Succinct BWT-Based Sequence Prediction. In: Hartmann S., Küng J., Chakravarthy S., Anderst-Kotsis G., Tjoa A., Khalil I. (eds) Database and Expert Systems Applications. DEXA 2019. Lecture Notes in Computer Science, vol 11707. Springer, Cham

Author affiliation

/Organisation/COLLEGE OF SCIENCE AND ENGINEERING/Department of Informatics

Source

30th International Conference on Database and Expert Systems Applications - DEXA 2019

Version

  • AM (Accepted Manuscript)

Published in

Lecture Notes in Computer Science

Volume

11707

Pagination

91-101

Publisher

Springer Verlag

issn

0302-9743

Acceptance date

2019-06-23

Copyright date

2019

Available date

2019-08-06

Publisher version

https://link.springer.com/chapter/10.1007/978-3-030-27618-8_7

Book series

Lecture Notes in Computer Science

Spatial coverage

Linz, Austria

Temporal coverage: start date

2019-08-26

Temporal coverage: end date

2019-08-29

Language

en

Usage metrics

    University of Leicester Publications

    Categories

    Keywords

    Exports

    RefWorks
    BibTeX
    Ref. manager
    Endnote
    DataCite
    NLM
    DC