A couple of weeks ago, I blogged about my participation in the inaugural Facebook Hacker Cup. Well, I didn't get any further, and exited spectacularly in round 2.
To say it was tricky is an understatement. The level of difficulty was a huge leap up from the previous round. Of the 1,673 programmers eligible to compete in the round, just five got all three problems solved, a further 22 solved two of the problems, and 109 people got just one right… leaving over 90% of entrants with none right, including me!
I started by tackling the problem which seemed easiest at first glance. This was "Scott's New Trick". I don't know whether it really was the easiest, as I spent the whole three hours working on it. I casually glanced at the evolving scoreboard a few times as the contest progressed, and was confident that if I managed to get just one right, I'd be in the top 300, thus winning a commemorative T-shirt. Obviously, I had no realistic expectation of being in the top 25, thus earning an all expenses paid trip to Palo Alto, California for the finals.
The problem statement for Scott's New Trick is as follows:
Little Scott recently learned how to perform arithmetic operations modulo some prime number P. As a training set he picked two sequences a of length N and b of length M, generated in the following way:
ai=(ai-2 * A3 + ai-1*A4 + A5) mod P, for i=3...N
bj=(bj-2 * B3 + bj-1 * B4 + B5) mod P, for j=3...M
Now he wants to find the number of pairs (i, j), where 1 ≤ i ≤ N and 1 ≤ j ≤ M, such that (ai * bj) mod P < L, for given number L. He asked you to do the same to help him check his answers.
The first line of input file consists of a single number T, the number of test cases. Each test consists of three lines. The first line of a test case contains two integers: prime number P and positive integer L. The second line consists of six non-negative integers N, A1, A2, A3, A4, A5. Likewise, the third line contains six non-negative integers M, B1, B2, B3, B4, B5.
Output T lines, with the answer to each test case on a single line.
T = 20
2 ≤ P < 250,000
P is prime
1 ≤ L ≤ P
2 ≤ N, M ≤ 10,000,000
0 ≤ A1, A2, A3, A4, A5, B1, B2, B3, B4, B5 < P
The given example input file was:
5 3 1 4 0 2 2 2 2 2 1 2 1 0 0 3 1 5 2 0 0 1 1 5 1 1 2 0 0 3 3 5 0 0 1 2 2 3 2 1 1 1 1 5 1 5 2 0 4 0 4 3 2 1 2 4 4 5 4 2 2 1 3 1 4 5 1 0 2 3 3
The corresponding output should be:
6 10 15 3 9
It was quite straight-forward to get a solution that worked on the sample input and produced the expected output. My naive approach was to read the input file, and build up the values of the sequences a and b. Then, I used two nested for loops to consider every pair of values in the sequence, checking whether the the remainder of the product of the two values when divided by P was less than L.
I got this working in maybe 40 minutes. I think I would have been a lot faster if it weren't for my unfamiliarity with C++ (yes I know, a bad workman blames his tools).
Next, I wrote a quick utility to generate some test data using the range of numbers specified in the problem statement. This resulted in an input file with more realistic numbers (i.e. 20 tests with the lengths of each series in the millions). At this point, I realised I had a problem.
I devised an optimisation strategy and set to work. The (unsuccessful) approach I took was based upon the assumption that if we get any two values in a sequence repeated, then the rest of the sequence is going to be a short repeating sequence, thus drastically reducing the number of distinct values in the sequence.
So, as I was constructing the sequence, I was keeping a list of distinct values, and the number of times each appeared. Then, instead of looping over the entire sequence, I could instead loop over the two sets of distinct values, and add the product of the number of times each value appeared in its sequence to the result when the condition was satisfied.
This did have benefits on my own generated test data, and a single test ran in around 20 seconds (rather than me giving up waiting which is what I did before my optimisation).
I knew that this was going to be close when multiplied up by 20 tests (as only six minutes is allowed once the real input file is downloaded), but I was running out of time. I set my program running and waited. And waited. The time expired. I waited some more. After 15 minutes, I terminated execution and went to bed. It was after midnight by this time.
So, what went wrong. Well, Facebook could have been clever and chosen values which didn't result in repeated sequences, thus thwarting the assumption made in my optimisation. Alternatively, my assumption could have been completely wrong, and the chance of a repeated sequence might have been quite low, meaning that some of the Facebook sequences would take a long time to run.
I've downloaded some of the successful competitors solutions and taken a quick look at them. They are exceedingly complex when compared to my simple attempt, and if that's the level of skill required to progress from this round, then I'm not ashamed to have gone out at this stage.
I wish the best of luck to the entrants that have made it to the on site round at FB headquarters in Palo Alto. I'll be back next year.