=pod

I<2013-07-05> - One of my favourite data structures in C is the ordered vector
(or array, whatever you call them).  Incredibly simple to implement, very low
memory overhead, and can provide O(log n) lookup with a simple binary search.
However, ordered vectors have one very weak point: insertion and deletion of
items is O(n). For small n that doesn't really matter, but if the number of
items in the list can grow a bit, you may run into performance issues. If
you're not careful, this could even turn your ordered vector into an attack
vector (apologies for the terrible pun).

My goal with this benchmark is to get a feeling on how, exactly, insertion
performance behaves with an ordered vector. What values of n are "small"? And
how much worse does insertion performance get compared to more complex data
structures?

For comparison, I chose the B-tree and hash table implementations from
L<klib|https://github.com/attractivechaos/klib> (from commit fff70758, to be
precise). My goal wasn't to benchmark the performance of different
implementations, so I simply chose two implementations that I suspect are among
the fastest. The vector implementation in the benchmarks is my own creation:
L<vec.h|http://g.blicky.net/globster.git/tree/src/util/vec.h?id=2c11d2a> from
the L<Globster|http://dev.yorhel.nl/globster> code base.

B<Source code:> L<ins-bench.c|http://p.blicky.net/r746e>


=head2 Best case & worst case

For a start, I decided to benchmark the best and worst case performance of
inserting elements into a vector. The best case happens when inserting all
items at the end of the vector, the worst case when inserting them in front.
The B-tree and hash table benchmarks provided for comparison have all items
inserted in order.

I'm cheating here with the vector implementation, because all elements are
inserted in the list without first finding out the position with a binary
search. Actual performance will be thus be a bit worse, depending on whether
the final application needs that binary search or whether it can assume its
input to be already sorted.

L<[img graph insbench-bench-thumb.png ]|http://dev.yorhel.nl/img/insbench-bench.png>

Gnuplot script: (The awk(ward) part can likely be done natively in gnuplot as
well, but I was too lazy to figure out how)

  set terminal png size 1000, 1500
  set output "bench.png"
  set logscale xy
  set xlabel "number of items"
  set ylabel "average time per insert (ms)"
  set grid mxtics xtics mytics ytics
  plot "< awk '{print $1, $2/$1*1000}' bench-vec" title 'vector, worst case',\
       "< awk '{print $1, $2/$1*1000}' bench-best" title 'vector, best case',\
       "< awk '{print $1, $2/$1*1000}' bench-hash" title 'khash',\
       "< awk '{print $1, $2/$1*1000}' bench-btree" title 'kbtree'


=head2 Average case

For the second benchmark I inserted values created with C<rand()>, which should
be a more accurate simulation of some real-world applications. This time I'm
not cheating with the vector implementation, a binary search is performed in
order to insert the items in the correct location.

L<[img graph insbench-rand-thumb.png ]|http://dev.yorhel.nl/img/insbench-rand.png>

  set terminal png size 1000, 1500
  set output "bench-rand.png"
  set logscale xy
  set xlabel "number of items"
  set ylabel "average time per insert (ms)"
  set grid mxtics xtics mytics ytics
  plot "< awk '{print $1, $2/$1*1000}' rand-vec" title 'vector',\
       "< awk '{print $1, $2/$1*1000}' rand-hash" title 'khash',\
       "< awk '{print $1, $2/$1*1000}' rand-btree" title 'kbtree'


=head2 Benchmarking setup

All benchmarks were performed on a 3 GHz Core Duo E8400 with a 6 MiB cache.
Compiled with the Gentoo-provided gcc 4.6.3 at -O3, linked against glibc 2.15,
and run on a Linux 3.8.13-gentoo kernel. Boring details, but somehow good to
document.
