# Computer puzzles in job interviews

^{31}/Jan 2006

Yesterday, I went to a job interview. I was asked some difficult problems and ways to solve them. One of them was: We have a list of a million phone numbers on the standard input and we have a reduced memory pc which we want to use it to sort them and use to check later if any number given is in that list or not.

The solution offered was to use an array of bits and use the number as an index, so if b[1234 ] == 0 would mean that the phone number 1234 wasn’t on the input list and 1 would mean that we had it.

I have been thinking about the solution they provided me. I’ve been thinking for a while and would like to extend the rationale behind. Ok, so, Let’s imagine we have 1 million phone numbers. If we choose to use the bit array approach, that means that, given the English phone number system, we have numbers like these:

01273 121212

That is, eleven numbers. I guess, the initial 0 is always present, so we can omit it right now. So that leaves us 10 numbers to deal with. Supposing that there are no numbers starting with a 0 (after the initial 0), that leaves us with the possibility of having number in the range:

1000000000 to 9999999999 (=10 000 000 000, 10 thousand million numbers )

So, there can be a total of 10,000,000,000 - 1,000,000,000 numbers, which is then, 9,000,000,000.

If you want an array of bits to use for storing if certain number is on the list or not, that means that you need an array of 9,000,000,000 elements, or, 9,000,000,000 bits. That is 1,125,000,000 bytes, or, 1,072Mb or a little bit more than 1 Gb, right?

Ok, that’s a lot of memory. And, the thing is that you would need to have all that array even if it was the best case (that in which all the numbers would be the same, one single number repeated one million time). So the memory consumption would be fixed.

On the other hand, if we used a n-ary tree to hold all the information as something like this:

```
0 0 0
1 1 1
2 2 2
3 3 3
4 /4--4
5/ 5 5
6 6 6
7 7 7
8 8 8
9 9 9
```

would mean that the number 544 (in a 3-number scheme) would exist.

For example, if we used a C struct like this one:

```
typedef struct Node Node;
struct Node {
Node *array[10]; /* */
};
```

In which each position of the array of pointers would represent a number itself and a pointer to the next one.

```
+----+ +----+ +----+
|null| |null| |null|
+----+ +----+ +----+
|null| |null| |null|
+----+ +----+ +----+
|null| |null| |null|
+----+ +----+ +----+
|null| |null| |null|
+----+ +----+ +----+
| ------ \ |null|
+----+ +----+\ +----+
|null| |null| \ |null|
+----+ +----+ \ +----+
|null| |null| \ XXX |
+----+ +----+ +----+
|null| |null| |null|
+----+ +----+ +----+
|null| |null| |null|
+----+ +----+ +----+
| \ |null| |null|
+----+\ +----+ +----+
\
\ +----+ +----+
\ |null| |null|
\ +----+ +----+
\ null| |null|
+----+ +----+
|null| / XXX |
+----+ / +----+
|null| / |null|
+----+/ +----+
| / |null|
+----+ +----+
|null| |null|
+----+ +----+
|null| |null|
+----+ +----+
|null| |null|
+----+ +----+
|null| |null|
+----+ +----+
| \ |null|
+----+\ +----+
\
\
\ +----+
\ |null|
\+----+
XXX |
+----+
|null|
+----+
|null|
+----+
|null|
+----+
|null|
+----+
|null|
+----+
|null|
+----+
|null|
+----+
|null|
+----+
```

So that would give us numbers 446 and 912 and 992 in a 3 digit scheme. We would set up all the pointers to null. If any pointer at any position is different of null, that means that is a valid number and it points to the following number (the next array of pointers), so we use the pointer to both show the number and point to the next structure. The only exception is for the last node (last digit), that we could make it point to a invalid address (represented here as XXX).

So, I don’t know if I make myself clear enough, but using this method, the thing is that in the worst case, we would have 1,000,000 ways to go from the root to the end of each branch, meaning that we
would need 1,000,000 * 10 (the number of digits each phone number has) nodes to hold all that information. That is 10,000,000 nodes. In this structure, thats about 40 bytes, which gives us the amount of 400,000,000 bytes, that’s is : 381 Mb approximately, which is ^{1}⁄_{3} of the other approach. But, that’s the worst case scenario. In the average case scenario, many numbers would be repeated and the total size would be reduced, actually. And on the best case scenario (one number repeated one million times) we would only need 10 nodes.

On doing the search, using this n-ary tree, it is O(10) at most if we need to check the whole number, but as soon we found a null on any digit’s position, we can stop there, so it would be < O(10) which is a good number anyway.

the thing is, Am I right, or not?

Ok, so, Let’s imagine we have 1 million phone numbers. If we choose to use the bit array approach, that means that, given the English phone number system, we have numbers like these:

01273 121212

That is, eleven numbers. I guess, the initial 0 is always present, so we can omit it right now. So that leaves us 10 numbers to deal with. Supposing that there are no numbers starting with a 0 (after the initial 0), that leaves us with the possibility of having number in the range:

1000000000 to 9999999999 (=10 000 000 000, 10 thousand million numbers )

So, there can be a total of 10,000,000,000 – 1,000,000,000 numbers, which is then, 9,000,000,000.

If you want an array of bits to use for storing if certain number is on the list or not, that means that you need an array of 9,000,000,000 elements, or, 9,000,000,000 bits. That is 1,125,000,000 bytes, or, 1,072Mb or a little bit more than 1 Gb, right?

Ok, that’s a lot of memory. And, the thing is that you would need to have all that array even if it was the best case (that in which all the numbers would be the same, one single number repeated one million time). So the memory consumption would be fixed.

On the other hand, if we used a n-ary tree to hold all the information as something like this:

```
0 0 0
1 1 1
2 2 2
3 3 3
4 /4--4
5/ 5 5
6 6 6
7 7 7
8 8 8
9 9 9
```

would mean that the number 544 (in a 3-number scheme) would exist.

For example, if we used a C struct like this one:

```
typedef struct Node Node;
struct Node {
Node *array[10]; /* */
};
```

In which each position of the array of pointers would represent a number itself and a pointer to the next one.

```
+----+ +----+ +----+
|null| |null| |null|
+----+ +----+ +----+
|null| |null| |null|
+----+ +----+ +----+
|null| |null| |null|
+----+ +----+ +----+
|null| |null| |null|
+----+ +----+ +----+
| ------ \ |null|
+----+ +----+\ +----+
|null| |null| \ |null|
+----+ +----+ \ +----+
|null| |null| \ XXX |
+----+ +----+ +----+
|null| |null| |null|
+----+ +----+ +----+
|null| |null| |null|
+----+ +----+ +----+
| \ |null| |null|
+----+\ +----+ +----+
\
\ +----+ +----+
\ |null| |null|
\ +----+ +----+
\ null| |null|
+----+ +----+
|null| / XXX |
+----+ / +----+
|null| / |null|
+----+/ +----+
| / |null|
+----+ +----+
|null| |null|
+----+ +----+
|null| |null|
+----+ +----+
|null| |null|
+----+ +----+
|null| |null|
+----+ +----+
| \ |null|
+----+\ +----+
\
\
\ +----+
\ |null|
\+----+
XXX |
+----+
|null|
+----+
|null|
+----+
|null|
+----+
|null|
+----+
|null|
+----+
|null|
+----+
|null|
+----+
|null|
+----+
```

So that would give us numbers 446 and 912 and 992 in a 3 digit scheme. We would set up all the pointers to null. If any pointer at any position is different of null, that means that is a valid number and it points to the following number (the next array of pointers), so we use the pointer to both show the number and point to the next structure. The only exception is for the last node (last digit), that we could make it point to a invalid address (represented here as XXX).

So, I don’t know if I make myself clear enough, but using this method,
the thing is that in the worst case, we would have 1,000,000
ways to go from the root to the end of each branch, meaning that we
would need 1,000,000 * 10 (the number of digits each phone number has)
nodes to hold all that information. That is 10,000,000 nodes. In this
structure, thats about 40 bytes, which gives us the amount of
400,000,000 bytes, that’s is : 381 Mb approximately, which is ^{1}⁄_{3} of the
other approach. But, that’s the worst case scenario. In the average case
scenario, many numbers would be repeated and the total size would be
reduced, actually. And on the best case scenario (one number repeated
one million times) we would only need 10 nodes.

On doing the search, using this n-ary tree, it is O(10) at most if we need to check the whole number, but as soon we found a null on any digit’s position, we can stop there, so it would be < O(10) which is a good number anyway.

the thing is, Am I right, or not?