Compute a random subset

Problem statement

Write a function that takes a positive numbers n and k, eq. upper bound, size and returns an array, eq. a subset of randomly chosen numbers in 0...n of size k.

All subsets should be equally likely and, in addition, all permutations of numbers of the array should be equally likely.

EXAMPLE:

  Input: n = 9, k = 5

  Output: [1, 4, 5, 0, 6]

Solution

Here is the actual code of the algorithm that implements a solution to randomly choose a subset of k numbers in 0...n, such that any combination is equally likely possible:

func subset(n: Int, k: Int) -> [Int] {
    // Prepare state of the algorithm
    var map: [Int:Int] = [:]
    // Process each element of the subset
    for ix in 0...k - 1 {
        // Select next random element in {ix...n}
        let rx = Int.random(in: ix...n)
        // Query for existing keys in the map
        let ixv = map[ix], rxv = map[rx]
        // Insert or update `ix ~> rxv/rx` mapping pair
        map[ix] = rxv ?? rx
        // Insert or update `rx ~> ixv/ix` mapping pair
        map[rx] = ixv ?? ix
    }
    // Prepare an array to copy the resulting subset
    var seq = [Int](repeating: 0, count: k)
    // Query & copy each element of the subset in the map
    for i in 0...k - 1 { seq[i] = map[i]! }
    // Here is the answer
    return seq
}

Following is an example that the code performs as advertised and establish a baseline for any improvement work in the future:

Compute a random subset test assertions

Bits & pieces

This coding challenge introduces reader to a very ineresting technique that strikes a remarkable balance between time and space complexity characteristics and effectively solves the problem at hand.

Lets get started!

The first statement instantiates map variable used as a dictionary to keep randomly chosen numbers and as a data structure to resolve conflicts with duplicates:

// Prepare state of the algorithm
var map: [Int:Int] = [:]

Next statement is the main loop that is executed k - 1 times and randomly picks numbers in the specified range:

// Process each element of the subset
for ix in 0...k - 1 {
    ... inner block of code ...
}

Following statement, the first in the loop, is the instruction that randomly picks numbers from the given range using sliding window approach to minimize chances of chosing the same number multiple times:

// Select next random element in {ix...n}
let rx = Int.random(in: ix...n)

This statement queries the dictionary to find out whether the specified keys have been already discovered and added to the mapping, knowing about the existing keys and their values is an important part of the algorithm’s strategy used to solve conflicts with duplicates:

// Query for existing keys in the map
let ixv = map[ix], rxv = map[rx]

Next statement either creates a new or updates an existing mapping between ix and rxv/rx values. In case our random numbers generator has yielded a number that already exists, the code will use the number to fill the current element’s position (eq. ix) in the dictionary:

// Insert or update `ix ~> rxv/rx` mapping pair
map[ix] = rxv ?? rx

This statement as well creates or updates an existing mapping in the dictionary between rx and ixv/ix values. As the statement above this one uses the same conflict resolution strategy, eq. use an existing key’s value to fill the randomly generated number’s position in the dictionary:

// Insert or update `rx ~> ixv/ix` mapping pair
map[rx] = ixv ?? ix

As soon as the main loop is over, it is now time to copy all the numbers generated to an array that is returned to the calling code and this statement initializes seq variable used as an output array:

// Prepare an array to copy the resulting subset
var seq = [Int](repeating: 0, count: k)

This statement enumerates every element in 0...k - 1 range and uses the index as a key to copy the value from the dictionary into the output array:

// Query & copy each element of the subset in the map
for i in 0...k - 1 { seq[i] = map[i]! }

Finally, once all the numbers are copied the function is ready to return a k-sized subset of randomly selected numbers in 0...n range:

// Here is the answer
return seq

That’s all folks! 🍹

Here is a Swift playground for this article at Github: Compute a random subset.