## More on O(log(log(n))) searching

#### « previous entry | next entry »

** 30th Jul 2009 | 15:51**

Thanks to everyone for the informative comments on my previous post.

I found a bug in my simulation code which meant that many of the tests were performed on arrays that were half-full of zeroes (oops). This distorted the stats a little, and more for larger N because I tested fewer different arrays for large N. This bug caused the increase in s.d. for large N: the corrected code has the same s.d. for all N. Another effect was to increase the maximum number of iterations required to find an entry. After the fix the maximum number of iterations goes down to log(log(N))+12 for the pure secant method, and (log(N)+14)/2 for Junio's mixed secant/bisection method. Altogether much more well-behaved.

from:gareth_reesdate:30th Jul 2009 19:17 (UTC)Permalink

Does your simulation bear this out? (The relative costs of division and memory lookup have changed since Knuth was writing §6.2.1, so it might not be true any more.)

## Reply | Thread

from:fanfdate:30th Jul 2009 23:19 (UTC)Permalink

## Reply | Parent | Thread

from:fanfdate:31st Jul 2009 11:30 (UTC)Permalink

on average. If you look at the maximum number of memory accesses the crossover is around 2^13.In git, there's a 256-entry meta-index which reduces the number of probes in a bisection search by 8, but is probably less useful than a single iteration of the secant method when N > 2^16. Each index entry is 24 bytes, so 168 of them fit in a page. This implies that git won't see a clear benefit from the secant method until you get packs of 350 million objects, which is not really feasible when pack files are limited to 4GB :-) But this does explain why it isn't a win when large repositories currently have one or two million objects.

## Reply | Parent | Thread

from:nonameyetdate:30th Jul 2009 21:34 (UTC)Permalink

## Reply | Thread

from:fanfdate:30th Jul 2009 22:54 (UTC)Permalink

## Reply | Parent | Thread