How much slower is a indirect reference to an object?

If you stored a list of words and wanted to know all the words with a letter ‘a’. Which is data structure is faster?

Dictionary<char,List<string>>

Hashtable of letters, to list of addresses to words w that letter.
Dictionary<char,List<int>> and string[]

Hashtable of letters to a integer representing an array index. The index pointing to strings w that character.
List<string>[a-z]

Theoretically this should be fastest. No hash function needed to compute the array index containing the addresses of the strings. (Pretty much same as 1)

Below are results of looping thru the above lookup methods, which is a simplified version of the candidates of how I’ve implemented an character in word index to a list. The letters are selected randomly.

Lookup Method	Total time(ms)	Avg lookup time(ms)	Tot time, not lookup(ms)	Tot time, just lookup(ms)
Dictionary–>indices–>array–>word	7398	0.093646	735	6663
Dictionary–>List—>word	8057	0.101987	656	7401
Array–>list–>word	8533	0.108013	735	7798

The results don’t really make intuitive sense in terms of which comes out faster, but there isn’t that much of a difference either.

But the choice of an index is already been made for me. Because the index needs to be persisted to disk. Which methods are ruled out automatically as a result of that requirement? The hint is the phrase “memory addresses of the words”. There are no memory addresses on disk. To perist to disk and maintain offset reference to the word, we store the array index, and when it’s loaded back into memory, the order is kept. But at least you can see, there isn’t that much penalty with another lookup. The .NET Dictionary<,> type is not serializable, so it has to be rebuilt on load from disk. A version of 2 w/o the dictionary, is best for my purpose if I need to store to disk. 3 if there is no master list of words, which there is. 3 is there simply for what should be the fastest lookup, but surprisingly isn’t.

What if we used a tree as indexer

The above methods use an analogy of a hashtable. Which takes the data, and turns it into a number for an index. The Dictionary<,> is .NET’s implementation of one. What if the index was built using a B-Tree like implementation?

public class ArrayTree 
    {
        static public int CharComparer(char a, char b)
        {
            if (a < b)
                return -1;
            else if (a > b)
                return 1;
            else
                return 0;
        }

        public ArrayTree()
        {
            var count = depth2elements(Depth);
            _keys = new TKey[count];
            _payload = new List[count];
            _nexttree = new ArrayTree[count*2];
        }

        public int Depth = 9;
        static int depth2elements(int depth)
        {
            int count = 0;
            int mult=1;
            for(int i=0; i[] _payload = null;
        ArrayTree[] _nexttree = null;

        public Comparison Comparer;

        public void Add(TKey key, TPayLoad item)
        {
            int index = 0;
            int depth = 0;
            var keys = _keys;
            var data = _payload;
            if (data[0] == null)
            {
                keys[0] = key;
                data[0] = new List() { item };
                return;
            }

            List list = null;
            index = findKey(key);
            if (index < 0)
            {
                index = -index;
                keys[index] = key;
                data[index] = new List() { item };

            }
            else
            {
                list = data[index];
                list.Add(item);
            }
            
        }

        public List Get(TKey key)
        {
            var index = findKey(key);
            if (index >= 0)
                return _payload[index];
            else
                return null;
        }

        int findKey(TKey key)
        {
            int index = 0;
            int depth = 0;
            var keys = _keys;
            var payloads = _payload;

            while (index>=0)
            {
                this.Hops++;

                if (payloads[index] == null)
                    return -index; //tree ended

                var currentkey = keys[index];
                if (currentkey.Equals(key))
                {
                    return index;
                }
                else
                {
                    var branch = Comparer(currentkey, key);
                    if (branch < 0)
                    {
                        index = index * 2 + 1;
                    }
                    else if (branch > 0)
                    {
                        index = index * 2 + 2;
                    }
                    if (index > keys.Length)
                        index = -index;
                    else 
                        depth++;
                }
            }

            return index;
        }

        public int Hops = 0;
        public void HopReset()
        {
            Hops = 0;
        }

        public override string ToString()
        {
            StringBuilder sb = new StringBuilder();
            this.ToString(this, 0, 0, sb);
            return sb.ToString();
        }

        void ToString(ArrayTree tree, int element, int indent, StringBuilder sb)
        {
            var data = tree._keys[element].ToString();
            sb.Append(new string(' ', indent));
            sb.AppendLine(data);

            int left = element * 2 + 1;
            int right = element * 2 + 2;

            indent+=2;
            if (tree._payload[left] != null)
                ToString(tree, left, indent, sb);
            if (tree._payload[right] != null)
                ToString(tree, right, indent, sb);
        }
    }

So why am I not using this structure? In order to be effective, the tree has to be balanced, so that the worst case is log₂(number of indexed values) compares for matching value.

Lookup Method	Total time(ms)	Avg lookup time(ms)	Tot time, not lookup(ms)	Compares#
Statistical Tree–>list–>word	9127	0.115532	688	1970919
Balanced Tree–>list–>word	8619	0.109101	688	1876716

The timing seems comparable to Dictionary, though I’ve personally encountered much slower performance using different implementations. Also, you’ll noticed the Statistical tree performed worse in Compare counts to Balanced tree. Unbalanced is not always bad, if it meets your statistically expected searches. The above search is using random selected character.

This is what a slightly unbalanced tree probably looks like, when you build your tree to make sure that the most likely letters encounter in a English dictionary, will be traversed first, while maintaining traversal order. (More)

l
- s
  - u
    - x
      - y
        
        z
      - w
        
        v
    - t
  - o
    - r
      - p
        
        q
    - n
      - m
- e
  - h
    - i
      - k
        
        j
    - f
      - g
  - b
    - d
      - c
    - a

a…99

b…21

c…50

d…44

e…151

f…24

g…36

h…40

i…89

j…4

k…13

l…62

m…37

n…89

o…71

p…33

q…5

r…96

s…89

t…97

u…55

v…22

w…22

x…6

y…23

z…1

What if we tried to search for each character in a list of English words? Because certain letters occur more often, they get searched for more. It stands to reason, that if we put them higher on the tree, the search will terminate faster. That is how the statistical tree was meant to be built. And that is the results we see below.

Lookup Method	Total time(ms)	Avg lookup time(ms)	Tot time, not lookup(ms)	Compares#
Statistical Tree–>list–>word	13022	0.164835	688	1683870
Balanced Tree–>list–>word	13291	0.168241	688	1909940

TictAwf.com

Blog

How much slower is a indirect reference to an object?

What if we used a tree as indexer

Leave a Reply Cancel reply