Sample online data

Problem statement

Write a function that takes a stream of elements stream, number of elements k to select from the stream and returns a sequence of k elements from the stream that are randonly chosen, such that any combination of the elements is equally likely possible (eq. uniform distribution).

EXAMPLE:

  Input: stream = [1,2,3,4,5,6,7,8,9], k = 4

  Output: [9,2,4,1]

Solution

Here is the actual code of the algorithm that implements a solution to randomly choose k elements from a stream, such that any combination of elements is equally likely possible:

func sample(stream: Stream, k: Int) -> Sequence {
    // Prepare state of the algorithm
    var seed = 0, seq = Sequence(repeating: 0, count: k)
    // Query stream for the next element (if any)
    while let next = stream.next() {
        // Select position of the element in `0...seed`
        let x = k > seed ? seed : Int.random(in: 0...seed)
        // Generator yields a value between `0` and `k - 1` is
        // a signal to replace an existing element in the sequence 
        if k > x { seq[x] = next }
        // Extend random generator's seed range by 1
        // to account for the current element
        seed += 1
    }
    // Here is the answer
    return seq
}

Following is an example that the code performs as advertised and establish a baseline for any improvement work in the future:

Sample online data test assertions

Bits & pieces

This coding challenge uses a clever technique to randomly pick items from a stream without knowing in advance how many elements there are in the stream.

Lets get started!

The first statement prepares state of the algorithm and initializes seq variable to store elements of the output sequence and seed variable used to seed the random number generator and to track count of elements read from the stream so far (since we couldn’t get estimate of the elements in the stream):

// Prepare state of the algorithm
var seed = 0, seq = Sequence(repeating: 0, count: k)

Following is the main loop that reads elements from the stream and processes each element according to the original constraint, eq. where all combinations of elements is equally likely possible:

// Query stream for the next element (if any)
while let next = stream.next() {
    ... inner block of code ...
}

This statement either seeds the random number generator and requests a number from the given range or uses its sequential index which indicates a desired (sometimes random) position of the current element in the output sequence:

// Select position of the element in `0...seed`
let x = k > seed ? seed : Int.random(in: 0...seed)

Next statement checks whether the position yielded by the random generator falls between 0 and (k - 1) range and if that’s the case the algorithm will use the current element from the stream to replace an existing item in the output sequence:

// Generator yields a value between `0` and `k - 1` is
// a signal to replace an existing element in the sequence 
if k > x { seq[x] = next }

This statement, the last within while loop adds 1 to the number elements processed so far, since the algorithm uses this data to seed the random number generator and calculate uniform distribution:

// Extend random generator's seed range by 1
// to account for the current element
seed += 1

As soon as the main loop is over, the function is ready to return the sequence of k elements randomly picked from the stream:

// Here is the answer
return seq

That’s all folks! 🍺

Here is a Swift playground for this article at Github: Sample online data.