Sample offline data

Problem statement

Write a function that takes an array of unique numbers s as input and a size k and returns a subset of the given size of the array elements. All subsets should be equally likely.

Return the result in input array itself and substitute remaining elements with 0.

EXAMPLE:

  Input:  s = [1,2,3,4,5], k = 3

  Output: s = [3,2,4,0,0]

Solution

Here is the actual code of the algorithm that implements a solution to take a random sample from the given array of unique elements:

func sample(s: inout [Int], k: Int) {
    // Prepare state of the algorithm
    let end = s.count - 1
    // Iterate thru the range
    for lhs in 0...end {
        // Clear out remaining elements with 0
        if lhs >= k { s[lhs] = 0; continue }
        // Pick the next exchange using random generator
        let rhs = Int.random(in: lhs...end)
        // Exchange elements
        swap(&s, lhs, rhs)
    }
}

Following is the traditional section with assertions to validate the code performs as advertised and establish a baseline for any improvement work in the future:

Sample offline data test assertions

Bits & pieces

This coding challenge utilizes random number generator to simulate sampling process within the specified boundaries that are adjusted with every iteration.

Lets get started!

The first statement is a simple shorthand for the array’s length, since this value is being used in a few places giving it a human-friendly name improves readability:

// Prepare state of the algorithm
let end = s.count - 1

This statement is the main loop used to iterate thru the array and process all elements according to the order of their apperance in the array:

// Iterate thru the range
for lhs in 0...end {
    ... inner block of code ...
}

This step clears out value of the current element with 0 when the algorithm is done processing the requested number of elements to sample:

Clear out remaining elements from the sequence

Here is the code that clears out the current element and immediately skips to the next iteration:

// Clear out remaining elements with 0
if lhs >= k { s[lhs] = 0; continue }

Following step uses random number generator to pick an element from the remaining sequence into rhs variable to use for the exchange operation that comes up next:

Pick a random element within lhs...end range

Here is the code for that step:

// Pick the next exchange using random generator
let rhs = Int.random(in: lhs...end)

And the last step of the algorithm actually exchanges elements in the current and the randomly chosen position as yet another sample for the resulting sequence:

Exchange chosen items to simulate sampling process

Here is the code for the last step of the algorithm:

// Exchange elements
swap(&s, lhs, rhs)

As soon as the main loop is over s argument that is marked as inout has all the values to represent the requested sample of the original sequence.

That’s all folks! 🌼

Here is a Swift playground for this article at Github: Sample offline data.