[personal profile] elfs

I got an email today from one of the super secret stealth start-ups, the one I really wanted, and they told me that they had chosen to go with another candidate. Bummer.

At one recent interview, an engineer asked me to “give him the sum of the two highest digits in an integer array. Any language.” I thought for a moment, then wrote:

array.sort()
return array[-1] + array[-2]

He said that was fine, but since that's O(n log n), could I do it in O(n)? He wanted a single linear scan instead.
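For reference, the linear-time version he was fishing for is the usual single-pass scan that tracks the top two values as it goes. Something like this (a sketch after the fact, not what I wrote on the whiteboard):

```python
def sum_two_largest(array):
    """Sum of the two largest values in one pass: O(n) time, O(1) space."""
    if len(array) < 2:
        raise ValueError("need at least two elements")
    # Seed the running maxima from the first two elements.
    max1, max2 = max(array[0], array[1]), min(array[0], array[1])
    for x in array[2:]:
        if x > max1:
            max1, max2 = x, max1   # demote the old max
        elif x > max2:
            max2 = x               # new runner-up
    return max1 + max2
```

Note the `elif` branch: skipping it is the classic bug, since a value can beat the runner-up without beating the max.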

After answering the question to his satisfaction, we got into a bit of a religious discussion about the merits of concentrating on performance during initial composition versus concentrating on correctness. I'm much more a "get it working, then get it right, then profile the working program for bottlenecks and work them out" kind of developer. It is possible to create a systemically slow design that requires a rewrite, but that's far less likely than having I/O or CPU bottlenecks in small stretches of code that can be fixed with targeted optimization.

He later told me that, although I didn’t get the job, he voted to hire me. He liked my answers and appreciated that I was willing to stand my ground and discuss the relative merits of different approaches.

This entry was automatically cross-posted from Elf's technical journal, ElfSternberg.com

Date: 2009-06-19 06:18 am (UTC)
From: [identity profile] nbarnes.livejournal.com
I 110% agree with you. Premature optimization is the devil.

Date: 2009-06-19 07:15 am (UTC)
From: [identity profile] duskwuff.livejournal.com
Moreover, the dynamic programming solution that gets you O(n) performance will probably be slower in Python in all practical situations, as list.sort is native, whereas the alternate solution involves an expensive interpreted loop.
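That claim is easy to check empirically. A rough benchmark sketch (the helper names are mine; absolute numbers will vary by machine and Python version):

```python
import random
import timeit

data = [random.randint(0, 10**6) for _ in range(100000)]

def via_sort(a):
    # O(n log n) in theory, but the sort runs in native code.
    return sum(sorted(a)[-2:])

def via_scan(a):
    # O(n) in theory, but every iteration runs in the interpreter.
    m1 = m2 = float('-inf')
    for x in a:
        if x > m1:
            m1, m2 = x, m1
        elif x > m2:
            m2 = x
    return m1 + m2

print(timeit.timeit(lambda: via_sort(data), number=20))
print(timeit.timeit(lambda: via_scan(data), number=20))
```

On CPython the sort version can hold its own well past the point where big-O says it shouldn't, precisely because `sorted` is native while the scan is interpreted.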

Date: 2009-06-19 01:59 pm (UTC)
From: [identity profile] shockwave77598.livejournal.com
He wants O(n), eh? Boring, but okay. You iterate through the array, and if you find array(i) is greater than your max1, you copy max1 into max2 and then array(i) into max1. When finished, you simply add max2 and max1 together.

Date: 2009-06-19 03:57 pm (UTC)
From: [identity profile] shaterri.livejournal.com
Congratulations on proving Elf's point. :-) As written, this is buggy: consider its behavior on the array 1, 2, 4, 6, 5. You can run an additional comparison on max2, of course, which keeps it O(n) but with a larger constant; you're doing approximately 2n comparisons. There are ways of doing it with approximately n+log(n) comparisons (and more broadly/more importantly, getting the top k with approximately n+k*log(n) comparisons for any fixed k), but those get to be a little more complicated. Go Knuth!
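In code, the failure being described looks like this (a sketch; the function name is mine). The scheme as literally stated only ever compares against max1, so a value that beats the runner-up but not the max is silently dropped:

```python
def buggy_sum(array):
    # The scheme as described: compare against max1 only.
    max1, max2 = array[0], float('-inf')
    for x in array[1:]:
        if x > max1:
            max2 = max1   # demote the old max...
            max1 = x
        # ...but never check x against max2.
    return max1 + max2
```

On `[1, 2, 4, 6, 5]` this returns 10 (6 + 4): the trailing 5 never beats 6, so it is never considered for second place, and the correct answer of 11 is lost.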

Date: 2009-06-19 04:08 pm (UTC)
From: [identity profile] shockwave77598.livejournal.com
Ah, I see your point. I was automatically assuming the array was presorted, probably from his Sort() stuck in my head :)

Date: 2009-06-19 05:54 pm (UTC)
From: [identity profile] wolfwings.livejournal.com
Yeah, for an arbitrary unsorted array, I believe it's actually 'fastest' in O terms to compare against the smallest stored side-value, then compare against the highest stored side-value to determine which store-logic to follow, so (statistically) you'll end up with a worst-case of 2n comparisons but average around n+log(n) in that instance. Another of those 'subtle gotchas' in optimization.

Date: 2009-06-19 06:34 pm (UTC)
From: [identity profile] shaterri.livejournal.com
Well, now we get into the question of what's being measured. :-) At the cost of a little extra space, you can get away with only doing n+log(n) comparisons even in the worst-case. The easiest way to see this: imagine setting up a 'tournament' using your array, where rather than starting with 0 as best, then comparing best to 1, best to 2, best to 3, etc, you compare 0 and 1, 2 and 3, ..., then compare (0/1 winner) vs. (2/3 winner), etc. This is still just n comparisons (technically, n-1; the easy way of proving it: every element but the ultimate winner loses exactly once), and once you have winner you can find 'second-best' with just log(n) additional comparisons: at some point the 'winner' must have beat 'second place' (because there's no one else that second could lose to), so you can just check the log(n) competitors that the winner beat to find the next-best.
The real problem is that the obvious schemes for doing that take O(n) additional space; I suspect you might not be able to do it in n+log(n) comparisons and constant additional space, but I can't say for sure either way offhand.
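A sketch of that tournament in Python (names are mine). As noted, it spends O(n) additional space on the lists of beaten opponents:

```python
def two_largest_tournament(a):
    """Find (winner, runner-up) via a knockout bracket.

    The winner costs n-1 comparisons; the runner-up is then the best of
    the ~log2(n) values the winner beat directly.
    """
    assert len(a) >= 2
    # Each entrant carries (value, list of values it has beaten).
    players = [(x, []) for x in a]
    while len(players) > 1:
        nxt = []
        for i in range(0, len(players) - 1, 2):
            (v1, b1), (v2, b2) = players[i], players[i + 1]
            if v1 >= v2:
                b1.append(v2)
                nxt.append((v1, b1))
            else:
                b2.append(v1)
                nxt.append((v2, b2))
        if len(players) % 2:          # odd player out gets a bye
            nxt.append(players[-1])
        players = nxt
    winner, beaten = players[0]
    # Second place must have lost directly to the winner.
    return winner, max(beaten)
```

The key observation is in the return: whoever came second can only have been eliminated by the eventual winner, so only the winner's direct opponents need a second look.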

Date: 2009-06-19 06:41 pm (UTC)
From: [identity profile] wolfwings.livejournal.com
True, I was admittedly making an assumption as well, that space was at a premium so only 'k * constant' added space would be allowed, where the constant is the size of an item being compared. I.E. Two extra entries, no more space.

Date: 2009-06-20 10:18 pm (UTC)
From: (Anonymous)
Naturally, it's possible to solve the problem *with sort()* in O(n) time. :-)

(Also, this uses O(constant) additional space.)

I hope this is slightly amusing...

- -
def find_maxes(a):
    """Returns second-greatest and greatest members of a."""
    if len(a) < 2:
        raise ValueError('insufficiently long array')
    if len(a) == 2:
        return sorted(a)
    maxes = [a[2]] + sorted(a[:2])
           # After any sort call,
           #   holds second-greatest and greatest value
           #   found so far,
           #   *preceded* by current candidate value.
    for x in a[2:]:
        maxes[0] = x
        maxes.sort()   # sorting 3 elements: O(1) per iteration
    return maxes[1:3]

def find_maxes_sum(a):
    return sum(find_maxes(a))

Date: 2009-06-19 02:09 pm (UTC)
From: [identity profile] shockwave77598.livejournal.com
I usually look at what the program is supposed to do before deciding on my direction. Sometimes the slower approach is fine for small datasets, but when the final application dataset is godawful large (say, every phone call origin and destination in the US), it becomes too slow to function. And such huge datasets don't even start to fit in cache, so no help there.

I've only just started playing with multithreading. Do you have any pointers on that?

And for some reason...

Date: 2009-06-19 05:49 pm (UTC)
From: [identity profile] wolfwings.livejournal.com
...when I read, "Okay, do it in O(n)," I immediately want to hand back assembly.

But I agree, get it right, see where the actual instead of theoretical bottlenecks are, and break those open.

Date: 2009-06-20 12:22 am (UTC)
From: [identity profile] ashley-y.livejournal.com
It cannot be done at all, in fact, because the two highest elements of an integer array do not in general exist.

I think your code will throw some kind of exception (given array=[]). But throwing an exception is not the same thing as returning a sum.

Anyway, this is the kind of response I give, because this is the kind of response that makes me recommend hiring when I'm asking the questions. Programmers simply cannot be too meticulous.

Date: 2009-06-20 04:42 am (UTC)
From: [identity profile] shaterri.livejournal.com
I have to heartily disagree here. If you say 'This won't work for arrays with less than two elements, but...', that's being meticulous. If you're saying 'It can't be done, they don't in general exist', you're being a dick. I would gladly give bonus points for people pointing out that it fails on a zero-element array, but under the circumstances I think the intent (testing your algorithmic skills) is pretty obvious.

Date: 2009-06-20 05:14 am (UTC)
From: [identity profile] ashley-y.livejournal.com
It depends on the nature of the position, I guess. In my line of work (specifying network protocols, writing code to validate implementations of said protocols), the fact that the proposed function is improperly specified (per Elf's account) is easily the most significant thing about it, and I would expect an interviewee to notice it right away. Algorithms are easy, and over-relied upon for interviews.

I will tolerate arbitrary amounts of pedantry from candidates, in fact I love it, because it exposes intelligence. If someone is pedantic but not very bright, it's easy to find the limits of their intelligence in a Socratic manner. I particularly like someone correctly pointing out issues in something I've told them, as that also shows confidence. It's a necessary skill for work, after all.

Date: 2009-06-20 05:55 am (UTC)
From: [identity profile] elfs.livejournal.com
Well, I did point out that it would toss an array index exception (in the first example) if there were less than two elements. And I agree; there are problems with the nominal solution as proposed.

Date: 2009-06-22 03:40 am (UTC)
From: [identity profile] ewhac.livejournal.com
Okay, you snagged me with this one.

Rather than performing comparisons to find the two largest integers in the array, consider instead finding the largest sum, and then "auto-pull" the two largest integers.

Something along these lines (sorry, I think in C):

extern int	array[];
extern int	alen;

int
sum_two_largest(void)
{
	register int	cur, test, sum;
	int		max1, max2;
	int		i;

	if (alen < 2)
		return (-1);	/*  Array too short.  */
	max1 = array[0];
	max2 = array[1];
	sum = max1 + max2;
	for (i = 2;  i < alen;  ++i) {
		cur = array[i];
		/*  Best sum using cur: pair it with the larger max.  */
		test = cur + (max1 > max2 ? max1 : max2);
		if (test > sum) {
			sum = test;
			/*
			 * Replace the smaller value.
			 */
			if (max1 < max2)
				max1 = cur;
			else
				max2 = cur;
		}
	}
	return (max1 + max2);
}

Note: I have not actually tested the code above. And given the number of additions, subtractions, and comparisons it performs, it's not at all clear whether it's superior to any other approach. But it does fetch each array element only once.

The Computing World Isn't What It Once Was

Date: 2009-06-23 08:41 am (UTC)
From: (Anonymous)
I agree with your answer. I learned to program starting in the early '70s, when computers were slow and expensive, and memory was measured in kilobytes. It was this attitude that led to the two-digit years of the Y2K problem.

Since then I've come around: programs now have megabytes of space available, and throwing faster hardware at a problem is often cheaper than rewriting it, because the chances are the hardware in question is already idling more than computing, and memory limits don't really exist for most programs. If it's too slow, that can be fixed after it's running.

The one thing I would have done differently with your solution would have been to add a comment line above it:

//compute and return the sum of the two largest array values
...

Even on a 2-line solution that is the way I write my code - and it often catches the examiner by surprise.

--DB_Story

Re: The Computing World Isn't What It Once Was

Date: 2009-06-26 06:25 pm (UTC)
From: [identity profile] elfs.livejournal.com
I would never have written that comment. What the code does is obvious. If you write a function to do what it does, you name the function: def get_sum_of_highest_two_values(array)
