Generating Sorted Lists of Random Numbers

The empirical testing of a program often calls for generating a set of random numbers and then immediately sorting them. In this paper we consider the problem of accomplishing that process in a single step: generating a sorted list of random numbers (specifically, reals chosen uniformly from [0,1]). The method we describe generates the randoms in linear time, is perfectly random (if it can call a perfectly random generator for a single uniform), and can be described in just three lines of Algol or Pascal code. If the numbers are not required to be generated all at once (but are rather to be used one at a time), then the method can be implemented as a subroutine that produces the "next" number and requires only constant storage.


Introduction
The first step of many computer algorithms is to sort the input data. When testing these programs to determine run times empirically, one usually generates N random numbers and then sorts them. The efficiency of sorting algorithms is well known (see Knuth [1973]), however, so it is often not necessary to test the sorting procedure empirically in a particular program. In this application (as well as many others), it is desirable to generate a sorted list of random numbers as quickly as possible. In this paper we will study the problem of generating a sorted list of N reals drawn uniformly from [0,1], and we will investigate a new method for doing so that has significant advantages over all previous approaches. We will begin by discussing previous work in Section 2. In Section 3 we will study some important probabilistic lemmas, and then show in Section 4 how these can be used to make efficient programs. A discussion of this approach is offered in Section 5.

Previous Work
Before presenting our new algorithms for generating sorted lists of random numbers, we will mention, for completeness and for comparison, the best previously known method for the task. Although the method seems to be well known among statisticians, the present authors have been unable to find a description of its computational aspects in the statistical literature.
The algorithm is based on the following lemma. It is clear that this method is a very efficient way of generating sorted lists of numbers chosen uniformly on [0,1]. Its one computational disadvantage, however, is that it is inherently a two-pass algorithm: the first pass places the numbers into the array and the second normalizes them. We will now turn our attention to a new, single-pass algorithm.
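A minimal sketch of the two-pass method follows, assuming it is the standard exponential-spacings construction: if E_1, ..., E_{N+1} are independent exponential variates with partial sums S_i = E_1 + ... + E_i, then S_1/S_{N+1}, ..., S_N/S_{N+1} are distributed as N sorted uniforms on [0,1]. This is our reading, not the authors' listing, although its work per random (an addition, a logarithm, and a multiplication) matches the operation count given for Program 1. RAND here is a hypothetical wrapper around Free Pascal's nonstandard Random, and N = 10 is an arbitrary size.

```pascal
program TwoPassSorted;
{ Two-pass sketch (assumed construction): partial sums of exponential
  variates, normalized by the (N+1)st sum, are N sorted uniforms. }
const
  N = 10;   { illustrative size }
var
  X: array[1..N + 1] of real;
  I: integer;
  Scale: real;

{ Hypothetical stand-in for RAND, built on Free Pascal's Random, which
  lies in [0,1); 1 - Random lies in (0,1], so ln(RAND) is defined. }
function RAND: real;
begin
  RAND := 1.0 - Random
end;

begin
  Randomize;
  { Pass 1: X[I] holds the sum of I exponential variates, each -ln(RAND) }
  X[1] := -ln(RAND);
  for I := 2 to N + 1 do
    X[I] := X[I - 1] - ln(RAND);
  { Pass 2: normalize by the last sum; X[1..N] are now sorted uniforms }
  Scale := 1.0 / X[N + 1];
  for I := 1 to N do
    X[I] := X[I] * Scale;
  for I := 1 to N do
    writeln(I:3, X[I]:10:5);
  { Self-check: the output must be nondecreasing }
  for I := 2 to N do
    if X[I] < X[I - 1] then
      Halt(1)
end.
```

The two loops over X are the two passes; the final loop only checks that the output is nondecreasing and halts with exit status 1 otherwise.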

Programs
In this section we will see how the basic probabilistic facts discussed in the last section can be used to make programs for generating sorted lists of randoms. In all these programs we will assume that we have a subroutine RAND that returns a random number drawn uniformly from [0,1]. All the programs that we will describe produce correct output in the sense that if RAND satisfies the probabilistic definition of U[0,1], then the output of our program will satisfy the probabilistic definition of a sorted list of N reals drawn uniformly from [0,1].

Footnote: Since this paper is intended primarily for non-statisticians, we have attempted to minimize statistical notation in the presentation of this lemma, at the expense of conciseness. A more general form of this well-known result is presented more formally as Theorem 2.7 of David [1970].
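The single-pass idea can be sketched as follows, assuming the standard facts that the largest of I independent uniforms on [0,1] is distributed as RAND^(1/I), and that once the largest value M is fixed, the remaining values are independent uniforms on [0,M]. The array is therefore filled from the top down, each value obtained by scaling the previous one by RAND^(1/I). The names CurMax and X, the size N = 10, and the RAND wrapper are illustrative assumptions for Free Pascal; we do not claim this is the authors' Program 2 or Program 3.

```pascal
program SinglePassSorted;
{ Single-pass sketch: the largest of I uniforms is distributed as
  RAND^(1/I), so each value is the previous one times RAND^(1/I). }
const
  N = 10;   { illustrative size }
var
  X: array[1..N] of real;
  I: integer;
  CurMax: real;

{ Hypothetical stand-in for RAND: 1 - Random lies in (0,1] }
function RAND: real;
begin
  RAND := 1.0 - Random
end;

begin
  Randomize;
  CurMax := 1.0;
  for I := N downto 1 do
  begin
    { Pascal has no power operator: RAND^(1/I) = exp(ln(RAND) / I) }
    CurMax := CurMax * exp(ln(RAND) / I);
    X[I] := CurMax
  end;
  for I := 1 to N do
    writeln(I:3, X[I]:10:5);
  { Self-check: X[1..N] must be nondecreasing }
  for I := 2 to N do
    if X[I] < X[I - 1] then
      Halt(1)
end.
```

The loop body is the whole method. Because each factor exp(ln(RAND)/I) is at most 1, the values are produced in decreasing order and need no sorting afterwards.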
6 March 1979    Sorted Randoms

This number (the number of event classes) may range from 1 to n!/(n-k)!, depending on the number and pattern of equalities among the y_i. Note that with perfect arithmetic this procedure will produce exactly the same output as Program 2 (assuming the use of the same procedure RAND); this program, however, is numerically more robust than its predecessor.
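Under the assumption, not confirmed by the surviving text, that the two programs differ only in whether the running maximum is kept directly or as its logarithm, the sketch below drives both recurrences with the same uniforms and reports the largest difference between them. In exact arithmetic the two agree, so in floating point they should differ only by rounding. The names, the size N = 1000, and the tolerance 1e-9 are our choices.

```pascal
program CompareForms;
{ Sketch: a direct-product recurrence and a log-space recurrence,
  fed identical uniforms, should agree up to rounding error. }
const
  N = 1000;
var
  U, Direct, LogForm: array[1..N] of real;
  I: integer;
  CurMax, LnCurMax, MaxDiff: real;
begin
  Randomize;
  for I := 1 to N do
    U[I] := 1.0 - Random;   { shared uniforms in (0,1] }
  { Direct form: multiply the running maximum by U[I]^(1/I) }
  CurMax := 1.0;
  for I := N downto 1 do
  begin
    CurMax := CurMax * exp(ln(U[I]) / I);
    Direct[I] := CurMax
  end;
  { Log form: accumulate ln of the running maximum, exponentiate per value }
  LnCurMax := 0.0;
  for I := N downto 1 do
  begin
    LnCurMax := LnCurMax + ln(U[I]) / I;
    LogForm[I] := exp(LnCurMax)
  end;
  { Largest absolute disagreement between the two forms }
  MaxDiff := 0.0;
  for I := 1 to N do
    if abs(Direct[I] - LogForm[I]) > MaxDiff then
      MaxDiff := abs(Direct[I] - LogForm[I]);
  writeln('largest difference: ', MaxDiff);
  if MaxDiff > 1e-9 then
    Halt(1)
end.
```

One plausible reason to prefer the log form is that it replaces a long chain of multiplications by a chain of additions and exponentiates only when a value is produced; this sketch checks agreement, not robustness.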
In many applications the variables are not all needed at one time, but rather can be used "on the fly". If this is indeed the case, then using the N array words of X is very wasteful of storage. We would prefer to have an algorithm that can generate the "next" value. We will now describe such an algorithm as two Pascal routines, InitSorted and NextSorted.

Although it is clear that the method of Program 3 is superior to a generate-and-sort solution in almost all applications, it is more difficult to compare Program 3 with Program 1. Program 1 is faster than Program 3 (Program 1 uses an addition, a logarithm, a multiplication, and three array accesses for each random; Program 3 uses an additional exponentiation, but only one array access), but Program 3 is shorter to code. The primary advantage of the method of Section 4 …

Footnote: The Pascal compiler used in these tests does not produce very efficient code; the authors suspect that the speed of the programs could be substantially increased by careful hand-coding. This is unnecessary in most applications, however, since the use of this method is usually enough to remove the process of generating sorted randoms from the time bottleneck of the program.

    (* … if InitSorted is called again before N calls have been made to NextSorted, the current sequence of randoms will be lost and a new sequence will begin with the next call to NextSorted. *)
    (* Note: If an ascending sequence of random numbers is desired, the final assignment statement of NextSorted should be altered to read "NextSorted := 1 - exp(OLGLnCurMax)". *)

    (* Test on-line generation of sorted randoms by generating TestSize random numbers in descending order and writing them to output. *)
    writeln;
    writeln('Commencing test of on-line generation');
    InitSorted(TestSize);
    for J := 1 to TestSize do
      writeln(J, NextSorted:18:5)
    end.
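The on-line routines can be sketched as a complete Free Pascal program, assuming they keep the running maximum in log space (the note on ascending output assigns NextSorted := 1 - exp(...), and the garbled identifier there suggests a variable named LnCurMax). The counter CurPos and the RAND wrapper are our hypothetical additions; InitSorted, NextSorted, TestSize, and the test loop follow the fragment above. Only two variables of state are kept, so storage is constant regardless of N.

```pascal
program OnLineSorted;
{ On-line sketch in constant storage; CurPos and RAND are hypothetical. }
const
  TestSize = 10;
var
  LnCurMax: real;   { ln of the most recently returned value }
  CurPos: integer;  { number of values still to come in this sequence }
  J: integer;
  Prev, Cur: real;

{ Hypothetical stand-in for RAND: 1 - Random lies in (0,1] }
function RAND: real;
begin
  RAND := 1.0 - Random
end;

procedure InitSorted(N: integer);
begin
  LnCurMax := 0.0;   { ln 1 = 0: every value will lie in [0,1] }
  CurPos := N
end;

function NextSorted: real;
begin
  { Next value = previous value * RAND^(1/CurPos), kept in log space }
  LnCurMax := LnCurMax + ln(RAND) / CurPos;
  CurPos := CurPos - 1;
  NextSorted := exp(LnCurMax)
end;

begin
  Randomize;
  writeln('Commencing test of on-line generation');
  InitSorted(TestSize);
  Prev := 1.0;
  for J := 1 to TestSize do
  begin
    Cur := NextSorted;
    writeln(J:3, Cur:18:5);
    { Self-check: values must arrive in descending order }
    if Cur > Prev then
      Halt(1);
    Prev := Cur
  end
end.
```

As in the text, calling InitSorted again simply starts a new sequence. This sketch does not guard against calling NextSorted more than N times after InitSorted(N): CurPos would reach 0 and the division would fail. Replacing the last assignment with NextSorted := 1 - exp(LnCurMax) gives ascending output, as the note describes.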