Chapter title | Succinct BWT-Based Sequence Prediction |
---|---|
Chapter number | 7 |
Book title | Database and Expert Systems Applications |
Published in | figshare, August 2019 |
DOI | 10.1007/978-3-030-27618-8_7 |
Book ISBNs | 978-3-030-27617-1, 978-3-030-27618-8 |
Authors | Rafael Ktistakis, Philippe Fournier-Viger, Simon J. Puglisi, Rajeev Raman |
Abstract | Sequences of symbols can be used to represent data in many domains such as text documents, activity logs, customer transactions and website click-streams. Sequence prediction is a popular task that consists of predicting the next symbol of a sequence, given a set of training sequences. Although numerous prediction models have been proposed, many have low accuracy because they are lossy models (they discard information from training sequences to build the model), while lossless models are often more accurate but typically consume a large amount of memory. This paper addresses these issues by proposing a novel sequence prediction model named SuBSeq that is lossless and utilizes the succinct Wavelet Tree data structure and the Burrows-Wheeler Transform to compactly store and efficiently access training sequences for prediction. An experimental evaluation shows that SuBSeq has very low memory consumption and excellent accuracy when compared to eight state-of-the-art predictors on seven real datasets. |
X Demographics
Geographical breakdown
Country | Count | As % |
---|---|---|
Ukraine | 1 | 33% |
Unknown | 2 | 67% |
Demographic breakdown
Type | Count | As % |
---|---|---|
Members of the public | 3 | 100% |
Mendeley readers
Geographical breakdown
Country | Count | As % |
---|---|---|
Unknown | 8 | 100% |
Demographic breakdown
Readers by professional status | Count | As % |
---|---|---|
Professor | 2 | 25% |
Student > Ph. D. Student | 1 | 13% |
Student > Doctoral Student | 1 | 13% |
Student > Master | 1 | 13% |
Unknown | 3 | 38% |
Readers by discipline | Count | As % |
---|---|---|
Computer Science | 2 | 25% |
Biochemistry, Genetics and Molecular Biology | 1 | 13% |
Environmental Science | 1 | 13% |
Chemistry | 1 | 13% |
Materials Science | 1 | 13% |
Other | 0 | 0% |
Unknown | 2 | 25% |