ZPICS.tar.gz (59.35 MB)

Example comparisons for random partition algorithms

Version 2 2013-02-05, 16:04

Version 1 2012-10-05, 02:48

dataset

posted on 2012-10-05, 02:48 authored by Kenneth Locey

Topic: generating uniform random samples from the set of all integer partitions for a given total (N) and a number of parts (S)

Problem: existing random integer partitioning functions can take a long time to generate a single partition for a given N (regardless of S). If one is interested in generating random integer partitions of N having S parts, then one must waste time generating random partitions of N and rejecting those whose number of parts does not equal S. Partitions of N with exactly S parts can have a very low probability of being randomly generated (e.g. p < 10^-10).
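To give a feel for how wasteful rejection can be, the sketch below counts partitions exactly and reports the chance that a uniform random partition of N happens to have S parts. It is only an illustration: the function names are hypothetical, and it uses the standard recurrence p(n, k) = p(n-1, k-1) + p(n-k, k) for the number of partitions of n into exactly k parts, not any code from this dataset.

```python
from fractions import Fraction

def partition_table(n):
    """table[i][k] = number of partitions of i into exactly k parts."""
    table = [[0] * (n + 1) for _ in range(n + 1)]
    table[0][0] = 1  # the empty partition
    for i in range(1, n + 1):
        for k in range(1, i + 1):
            # Either some part equals 1 (remove it), or every part is >= 2
            # (subtract 1 from each of the k parts).
            table[i][k] = table[i - 1][k - 1] + table[i - k][k]
    return table

def acceptance_probability(n, s):
    """Chance that a uniform random partition of n has exactly s parts."""
    if s > n:
        return Fraction(0)
    row = partition_table(n)[n]
    return Fraction(row[s], sum(row))

# Hypothetical example values for N and S.
print(float(acceptance_probability(10, 4)))
print(float(acceptance_probability(200, 3)))
```

The expected number of draws before a rejection sampler accepts one partition is the reciprocal of this probability, so small values translate directly into long waits.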

Solution: generate a single random partition of N and randomly manipulate it until its length (i.e. number of parts) equals S. General method for manipulating the partition: randomly choose 2 summands, remove them from the partition, conjugate the partition, append the sum of the two removed summands, and repeat until the partition has S parts. Why? Because randomly perturbing a partition is much faster than randomly generating a new one, and this advantage grows as the time needed to generate a random partition of N increases.
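The manipulation step described above can be sketched as follows. This is a literal reading of that description, not the code used to produce the dataset. The function names, the `max_steps` guard, and the handling of a partition with fewer than two summands (it is simply conjugated) are all assumptions added so the sketch runs. The starting partition is passed in; in practice it would come from a uniform random partition generator such as the one in Sage. Nothing here demonstrates that the output is uniform.

```python
import random

def conjugate(partition):
    """Return the conjugate partition: part i counts the summands >= i."""
    if not partition:
        return []
    return [sum(1 for part in partition if part >= i)
            for i in range(1, max(partition) + 1)]

def perturb_to_length(partition, s, rng, max_steps=100_000):
    """Perturb a partition until it has s parts; None if max_steps runs out."""
    current = sorted(partition, reverse=True)
    for _ in range(max_steps):
        if len(current) == s:
            return current
        if len(current) < 2:
            # Assumption: with fewer than 2 summands nothing can be removed,
            # so only the conjugation step is applied.
            current = conjugate(current)
            continue
        # Randomly choose 2 summands and remove them.
        i, j = rng.sample(range(len(current)), 2)
        removed_sum = current[i] + current[j]
        rest = [p for k, p in enumerate(current) if k not in (i, j)]
        # Conjugate what is left, then append the sum of the removed pair.
        current = sorted(conjugate(rest) + [removed_sum], reverse=True)
    return None  # e.g. s > N can never be reached

# Hypothetical usage: N = 10, target S = 4, fixed starting partition.
result = perturb_to_length([5, 3, 2], 4, random.Random(1))
print(result)
```

Both conjugation and the remove-and-append step preserve the total, so every intermediate partition still sums to N. After one step the number of parts equals the largest remaining summand plus one, which is why the length can jump around quickly.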

Lingering problem: I can’t mathematically prove why this works to generate uniform random partitions of N having S parts. Not because it's potentially difficult, but because I'm an ecologist and not a mathematician.

What follows:

486 visual comparisons of 500 random samples generated from the new function derived by myself, Kenneth J. Locey, (red curves) against 500 random samples generated using the random partition function found in the Sage mathematical environment (black curves).

The kernel density curves (red and black) show the distribution of evenness across the sampled partitions. Here, evenness is estimated using Evar, a transform of the variance of log summand values. Evar is standardized to take values between 0.0 (no evenness) and 1.0 (perfect evenness).
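For readers who want to compute the statistic themselves, here is a minimal sketch of Evar. The description above does not give the formula, so this assumes the commonly used Smith and Wilson form, Evar = 1 - (2/pi) * arctan(variance of ln(summands)), with the variance taken over the number of parts. The function name is hypothetical.

```python
import math

def evar(partition):
    """Evenness of a partition, assuming the Smith and Wilson Evar form."""
    logs = [math.log(part) for part in partition]
    mean_log = sum(logs) / len(logs)
    # Population variance of the log summand values.
    var_log = sum((v - mean_log) ** 2 for v in logs) / len(logs)
    # arctan maps [0, inf) onto [0, pi/2), so the result lies in (0, 1].
    return 1.0 - (2.0 / math.pi) * math.atan(var_log)

print(evar([3, 3, 3]))     # all summands equal: perfect evenness
print(evar([7, 1, 1, 1]))  # one dominant summand: lower evenness
```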

N is the total and S is the number of parts. The same close agreement was also found using other statistical characteristics (e.g. median summand, relative size of the largest summand). These results show that the statistical properties of the two samples of randomly generated partitions (from my algorithm and from the Sage mathematical environment) are in close agreement.

There appears to be no systematic bias despite the sensitivity of Evar. So far, only algorithms that, by definition, return uniform random samples have shown such close agreement.