How much slower is a indirect reference to an object?

If you stored a list of words and wanted to know all the words with a letter ‘a’. Which is data structure is faster?

  1. Dictionary<char,List<string>>


    Hashtable of letters, to list of addresses to words w that letter.

  2. Dictionary<char,List<int>> and string[]

    Hashtable of letters to a integer representing an array index. The index pointing to strings w that character.

  3. List<string>[a-z]

    Theoretically this should be fastest. No hash function needed to compute the array index containing the addresses of the strings. (Pretty much same as 1)

Below are results of looping thru the above lookup methods, which is a simplified version of the candidates of how I’ve implemented an character in word index to a list. The letters are selected randomly.

Lookup Method Total time(ms) Avg lookup time(ms) Tot time, not lookup(ms) Tot time, just lookup(ms)
Dictionary–>indices–>array–>word 7398 0.093646 735 6663
Dictionary–>List—>word 8057 0.101987 656 7401
Array–>list–>word 8533 0.108013 735 7798

The results don’t really make intuitive sense in terms of which comes out faster, but there isn’t that much of a difference either.

But the choice of an index is already been made for me. Because the index needs to be persisted to disk. Which methods are ruled out automatically as a result of that requirement? The hint is the phrase “memory addresses of the words”. There are no memory addresses on disk. To perist to disk and maintain offset reference to the word, we store the array index, and when it’s loaded back into memory, the order is kept. But at least you can see, there isn’t that much penalty with another lookup. The .NET Dictionary<,> type is not serializable, so it has to be rebuilt on load from disk. A version of 2 w/o the dictionary, is best for my purpose if I need to store to disk. 3 if there is no master list of words, which there is. 3 is there simply for what should be the fastest lookup, but surprisingly isn’t.

What if we used a tree as indexer

The above methods use an analogy of a hashtable. Which takes the data, and turns it into a number for an index. The Dictionary<,> is .NET’s implementation of one. What if the index was built using a B-Tree like implementation?

public class ArrayTree 
    {
        static public int CharComparer(char a, char b)
        {
            if (a < b)
                return -1;
            else if (a > b)
                return 1;
            else
                return 0;
        }

        public ArrayTree()
        {
            var count = depth2elements(Depth);
            _keys = new TKey[count];
            _payload = new List[count];
            _nexttree = new ArrayTree[count*2];
        }

        public int Depth = 9;
        static int depth2elements(int depth)
        {
            int count = 0;
            int mult=1;
            for(int i=0; i[] _payload = null;
        ArrayTree[] _nexttree = null;

        public Comparison Comparer;

        public void Add(TKey key, TPayLoad item)
        {
            int index = 0;
            int depth = 0;
            var keys = _keys;
            var data = _payload;
            if (data[0] == null)
            {
                keys[0] = key;
                data[0] = new List() { item };
                return;
            }

            List list = null;
            index = findKey(key);
            if (index < 0)
            {
                index = -index;
                keys[index] = key;
                data[index] = new List() { item };

            }
            else
            {
                list = data[index];
                list.Add(item);
            }
            
        }

        public List Get(TKey key)
        {
            var index = findKey(key);
            if (index >= 0)
                return _payload[index];
            else
                return null;
        }

        int findKey(TKey key)
        {
            int index = 0;
            int depth = 0;
            var keys = _keys;
            var payloads = _payload;

            while (index>=0)
            {
                this.Hops++;

                if (payloads[index] == null)
                    return -index; //tree ended

                var currentkey = keys[index];
                if (currentkey.Equals(key))
                {
                    return index;
                }
                else
                {
                    var branch = Comparer(currentkey, key);
                    if (branch < 0)
                    {
                        index = index * 2 + 1;
                    }
                    else if (branch > 0)
                    {
                        index = index * 2 + 2;
                    }
                    if (index > keys.Length)
                        index = -index;
                    else 
                        depth++;
                }
            }

            return index;
        }

        public int Hops = 0;
        public void HopReset()
        {
            Hops = 0;
        }

        public override string ToString()
        {
            StringBuilder sb = new StringBuilder();
            this.ToString(this, 0, 0, sb);
            return sb.ToString();
        }

        void ToString(ArrayTree tree, int element, int indent, StringBuilder sb)
        {
            var data = tree._keys[element].ToString();
            sb.Append(new string(' ', indent));
            sb.AppendLine(data);

            int left = element * 2 + 1;
            int right = element * 2 + 2;

            indent+=2;
            if (tree._payload[left] != null)
                ToString(tree, left, indent, sb);
            if (tree._payload[right] != null)
                ToString(tree, right, indent, sb);
        }
    }

So why am I not using this structure? In order to be effective, the tree has to be balanced, so that the worst case is log2(number of indexed values) compares for matching value.

Lookup Method Total time(ms) Avg lookup time(ms) Tot time, not lookup(ms) Compares#
Statistical Tree–>list–>word 9127 0.115532 688 1970919
Balanced Tree–>list–>word 8619 0.109101 688 1876716

The timing seems comparable to Dictionary, though I’ve personally encountered much slower performance using different implementations. Also, you’ll noticed the Statistical tree performed worse in Compare counts to Balanced tree. Unbalanced is not always bad, if it meets your statistically expected searches. The above search is using random selected character.

This is what a slightly unbalanced tree probably looks like, when you build your tree to make sure that the most likely letters encounter in a English dictionary, will be traversed first, while maintaining traversal order. (More)
  • l
    • s
      • u
        • x
          • y
            • z
          • w
            • v
        • t
      • o
        • r
          • p
            • q
        • n
          • m
    • e
      • h
        • i
          • k
            • j
        • f
          • g
      • b
        • d
          • c
        • a
a…99
b…21
c…50
d…44
e…151
f…24
g…36
h…40
i…89
j…4
k…13
l…62
m…37
n…89
o…71
p…33
q…5
r…96
s…89
t…97
u…55
v…22
w…22
x…6
y…23
z…1

What if we tried to search for each character in a list of English words? Because certain letters occur more often, they get searched for more. It stands to reason, that if we put them higher on the tree, the search will terminate faster. That is how the statistical tree was meant to be built. And that is the results we see below.

Lookup Method Total time(ms) Avg lookup time(ms) Tot time, not lookup(ms) Compares#
Statistical Tree–>list–>word 13022 0.164835 688 1683870

Balanced Tree–>list–>word 13291 0.168241 688 1909940

Leave a Reply

Your email address will not be published. Required fields are marked *