Entropy in volunteer evolutionary computing experiments

dataset

posted on 2016-04-03, 08:27 authored by Juan J. Merelo

An evolutionary algorithm is a search method that tries to keep a balance between two forces: exploration and exploitation. Exploration keeps the search process alive by looking for solutions in regions it has not yet visited, while exploitation refines the best solutions found so far in order to reach the global optimum.
This balance is maintained through diversity, which is represented here by entropy: when entropy is high, exploration can proceed at the same time as exploitation breaks through.
Moreover, in volunteer computing experiments, where people visiting a web page contribute to an evolutionary algorithm, it is difficult to find a model of performance and, even more so, to find the minimum amount of intervention needed to make the whole experiment run apace and finish early. It is obvious that more users will make the simulation faster, but the global number of participants (represented by distinct IPs) has proved not to be a predictor of total time since, for instance, these users could operate sequentially rather than at the same time. The maximum number of IPs in a single minute is not a good predictor either.
It is clear, however, that diversity is always correlated with performance: good diversity means, in general, that the algorithm will finish soon.
So in these figures we measured "compression" diversity for two different sequences, and we did so for every evolutionary algorithm run that finished successfully: first, the string of IPs that were contributing every minute, and second, the sequence of cache sizes (that is, the number of elements kept in the pool, capped at 32) for every experiment. These strings were compressed, and the compression ratio was used as a measure of entropy.
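As an illustration of the idea, a compression-based entropy estimate could be sketched as follows. This is a minimal sketch under stated assumptions, not the code used for the figures: the choice of zlib as the compressor, the space-separated text encoding, the function name `compression_ratio` and the two toy sequences are all hypothetical. The actual scripts are in the repository mentioned in this description.

```python
import random
import zlib


def compression_ratio(symbols):
    """Return compressed size divided by raw size for a sequence of symbols.

    Assumption: the sequence is serialized as space-separated text and
    compressed with zlib at maximum level. A higher ratio means the sequence
    is harder to compress, which is read here as higher entropy.
    """
    raw = " ".join(str(s) for s in symbols).encode("utf-8")
    if not raw:
        return 0.0
    return len(zlib.compress(raw, 9)) / len(raw)


# Hypothetical per-minute cache sizes, capped at 32 as in the description.
rng = random.Random(0)  # fixed seed so the toy data is reproducible
constant_cache = [32] * 200                            # pool always full
varying_cache = [rng.randrange(33) for _ in range(200)]  # sizes 0..32

low = compression_ratio(constant_cache)
high = compression_ratio(varying_cache)
print(f"constant sequence ratio: {low:.3f}")
print(f"varying sequence ratio:  {high:.3f}")
```

A sequence that never changes compresses to almost nothing, so its ratio is close to zero, while a sequence that keeps changing yields a larger ratio.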
The figures show a good correlation that had not been measured so far; it is better for IPs per minute and worse for cache sizes. However, the latter is the measure on which experiment design can have a bigger influence, so it will probably be used in the future for improving the evolutionary algorithms.
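A correlation of this kind can be checked with a few lines of code. The sketch below computes a plain Pearson coefficient between two lists of equal length, for example one entropy value and one running time per experiment. This is an assumption: the description does not state which correlation measure or which performance variable was used, and the function name `pearson` is hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and the two deviation norms.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (norm_x * norm_y)
```

Called as `pearson(entropies, times)`, with one entry per finished experiment in each list, a value close to -1 would indicate that higher entropy goes together with shorter runs.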
All data and sources can be downloaded from http://github.com/JJ/splash-volunteer (data branch)

Funding

TIN2014-56494-C4-3-P (Spanish Ministry of Economy and Competitiveness)

Keywords

volunteer computing; evolutionary computation; distributed computing; Applied Computer Science; Artificial Life; Coding and Information Theory; Computer Software; Distributed Computing

Licence

CC BY 4.0