STUFFED

2015-09-07T12:52:16Z (GMT) by Robert Lyon
<p>Stuffed Framework - Useful for testing algorithms on unlabelled data streams.</p> <p>Overview</p> <p>Stuffed is a wrapper for WEKA and MOA classification algorithms that enables testing and evaluation on unlabelled data streams. This is (or was, when I last checked) hard to achieve with MOA. Stuffed makes it possible by using custom sampling methods to sample large data sets so that they can contain:</p> <p>- Varied levels of class balance in both test and training sets.</p> <p>- Varied levels of labelling in the test data streams.</p> <p>Each sampling produces metadata that allows stream classifier predictions to be evaluated on unlabelled data. For instance, if a data item in the stream is unlabelled (?), typical evaluation mechanisms would not evaluate classifier performance on it. However, since Stuffed keeps the metadata at hand, it is possible to evaluate the label assigned by a classifier to each unlabelled instance.</p> <p>Stuffed is designed to work only on binary classification problems. It can be used to gather statistics on classifier performance, is easily extensible, and can be used with other tools such as MATLAB.</p> <p>So far Stuffed has been used to perform experiments for two papers:</p> <p>R. J. Lyon, J. M. Brooke, J. D. Knowles, B. W. Stappers. A Study on Classification in Imbalanced and Partially-Labelled Data Streams, in International Conference on Systems, Man, and Cybernetics (SMC), pages 1506-1511, 2013, IEEE.</p> <p>R. J. Lyon, J. M. Brooke, J. D. Knowles, B. W. Stappers. Hellinger Distance Trees for Imbalanced Streams, in 22nd International Conference on Pattern Recognition, pages 1969-1974, Stockholm, Sweden, 2014, IEEE.</p> <p>If you use Stuffed, please cite it using the entries below.</p> <p>Use</p> <p>The framework is designed to work directly with both the MOA stream test framework and WEKA. It is a wrapper API and is therefore not meant to be executed as an application.
Rather, you incorporate it directly into your own code projects, where it can be extended, refined and improved.</p> <p>The code comes with examples of how it can be executed, which speak for themselves. Also check the user manual for more information.</p> <p>Citing our work</p> <p>Please use the following citations if you make use of this framework:</p> <p>@inproceedings{Lyon:2014:jk, author = {{Lyon}, R.~J. and {Knowles}, J.~D. and {Brooke}, J.~M. and {Stappers}, B.~W.}, title = {{Hellinger Distance Trees for Imbalanced Streams}}, booktitle = {22nd IEEE International Conference on Pattern Recognition}, series = {ICPR '14}, year = {2014}, month = {August}, pages = {1969-1974}, location = {Stockholm, Sweden}, publisher = {IEEE} }</p> <p>@inproceedings{Lyon:2013:jk, author = {{Lyon}, R.~J. and {Knowles}, J.~D. and {Brooke}, J.~M. and {Stappers}, B.~W.}, title = {{A Study on Classification in Imbalanced and Partially-Labelled Data Streams}}, booktitle = {International Conference on Systems, Man, and Cybernetics}, series = {SMC '13}, year = {2013}, month = {October}, pages = {1506-1511}, location = {Manchester, United Kingdom}, publisher = {IEEE} }</p> <p>Acknowledgements</p> <p>This work was supported by grant EP/I028099/1 for the University of Manchester Centre for Doctoral Training in Computer Science, from the UK Engineering and Physical Sciences Research Council (EPSRC).</p>
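<p>Illustrative sketch</p> <p>The evaluation idea described in the overview, where sampling metadata retains the true label of each instance that appears unlabelled (?) in the stream, can be illustrated with a minimal Java sketch. It does not use Stuffed, WEKA or MOA. The class, method and field names (UnlabelledEvalSketch, Item, predict, evaluateUnlabelled, sampleStream, sampleTrueLabels), the toy data and the threshold classifier are all hypothetical stand-ins chosen for illustration, not the framework's actual API.</p>

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: none of these names come from Stuffed, WEKA or MOA. */
final class UnlabelledEvalSketch {

    /** One stream item: an id, a single feature, and the label visible to the learner. */
    static final class Item {
        final int id;
        final double feature;
        final Boolean visibleLabel; // null plays the role of "?" (unlabelled)

        Item(int id, double feature, Boolean visibleLabel) {
            this.id = id;
            this.feature = feature;
            this.visibleLabel = visibleLabel;
        }
    }

    /** Toy stand-in for a binary stream classifier: positive above a fixed threshold. */
    static boolean predict(Item item) {
        return item.feature > 0.5;
    }

    /**
     * Scores predictions on unlabelled items only, using the metadata map
     * (item id -> true label) recorded at sampling time.
     * Returns the counts {TP, FP, TN, FN}.
     */
    static int[] evaluateUnlabelled(List<Item> stream, Map<Integer, Boolean> trueLabels) {
        int[] counts = new int[4];
        for (Item item : stream) {
            if (item.visibleLabel != null) {
                continue; // labelled items are not scored in this sketch
            }
            Boolean truth = trueLabels.get(item.id);
            if (truth == null) {
                continue; // no metadata for this item, so it cannot be scored
            }
            boolean predicted = predict(item);
            if (predicted && truth) {
                counts[0]++; // true positive
            } else if (predicted) {
                counts[1]++; // false positive
            } else if (!truth) {
                counts[2]++; // true negative
            } else {
                counts[3]++; // false negative
            }
        }
        return counts;
    }

    /** Toy stream: one labelled item followed by five unlabelled ones. */
    static List<Item> sampleStream() {
        List<Item> stream = new ArrayList<>();
        stream.add(new Item(0, 0.9, Boolean.TRUE));
        stream.add(new Item(1, 0.8, null));
        stream.add(new Item(2, 0.7, null));
        stream.add(new Item(3, 0.2, null));
        stream.add(new Item(4, 0.1, null));
        stream.add(new Item(5, 0.3, null));
        return stream;
    }

    /** Toy metadata: the true label of every item, kept outside the stream. */
    static Map<Integer, Boolean> sampleTrueLabels() {
        Map<Integer, Boolean> labels = new HashMap<>();
        labels.put(0, true);
        labels.put(1, true);
        labels.put(2, false);
        labels.put(3, false);
        labels.put(4, true);
        labels.put(5, false);
        return labels;
    }

    public static void main(String[] args) {
        int[] c = evaluateUnlabelled(sampleStream(), sampleTrueLabels());
        System.out.println("TP=" + c[0] + " FP=" + c[1] + " TN=" + c[2] + " FN=" + c[3]);
    }
}
```

<p>Keeping the true labels in a separate map, rather than on the stream items themselves, mirrors the separation described above: the classifier only ever sees the possibly missing label, while the evaluator can still score every prediction.</p>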