Caching deferred executed LINQ query

By remyblok on Tuesday 6 March 2012 22:17 - Comments (5)
Category: .Net / C#, Views: 5.929

When using LINQ extensively you'll eventually encounter the benefits and the cruelties of deferred execution. When not used correctly it can have a serious performance impact. In this blog I will give a method that combines the benefits of deferred execution with the safety of direct execution.

Lets take a closer look at deferred execution. It means that a LINQ query is not executed when you instantiate it, but is executed when you iterate over the produced IEnumerable. Under the hood the LINQ query creates a wrapper object that holds all information you defined in your query. It also implements IEnumerable so it can be used in foreach-loops and other LINQ expressions.

When you loop over the wrapper IEnumerable the actual LINQ logic is executed and then the code, e.g. where, order, grouping etc, you have defined in your query is applied. It also means that the query is executed every time you iterate over the wrapper. If you are ordering a collection of a few thousand items this can cause a serious performance penalty, for example.

There are methods to 'disable' deferred execution. Each IEnumerable has the ToList and ToArray extension-methods. These methods cause that the query is executed and produces a static list. After that the query won't be executed again. Yet, this can have some performance impact as well. When we execute the query and never use the result, the query is executed for nothing. This might be contradictory, but will become clear in the following example.

The following codes defines two LINQ queries. The first query is used in the loop, the second query is used as lookup by use of the FirstOrDefault method.

C#:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
var appointments = from a in dataStore
                   where a.StartDate < DateTime.Now
                   select a;

var appointmentsFromCom = from ac in comObject
                          where ac.StartDate < DateTime.Now
                          select ac;

foreach (var appointment in appointments)
{
    if (appointment.Organizer == me)
    {
        // This takes 10 seconds
        comAppointment = appointmentsFromCom.FirstOrDefault(co => co.ID == appointment.ID);
        //....
    }
}



On first look the code looks OK. But when this piece is run the time it takes to iterate over the appointmentsFromCom is 10 seconds!

Due to deferred execution for each appointment where I am the organizer it will take 10 second. With only 6 appointments it will take up to 60 seconds to process, and it could be much more. The performance would not be acceptable.

Let's introduce ToList to alleviate the performance penalty caused by deffered execution:

C#:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
var appointments = from a in dataStore
                   where a.StartDate < DateTime.Now
                   select a;

var appointmentsFromCom = from ac in comObject
                          where ac.StartDate < DateTime.Now
                          select ac;
// This takes 10 seconds
appointmentsFromCom = appointmentsFromCom .ToList();

foreach (var appointment in appointments)
{
    if (appointment.Organizer == me)
    {
        comAppointment = appointmentsFromCom.FirstOrDefault(co => co.ID == appointment.ID);
        //....
    }
}


What happens now is that the query gets execute immediately when the ToList-methods is called. So the call to COM is done only once and always takes about 10 seconds. The FirstOrDefault is executed on the in-memory list and has no performance impact.

This sounds as the solution, but what if the appointments do not contain any items where I'm the organizer? It will always take 10 seconds to load, but it won't do anything with the data. With deferred execution enabled it would take no time as the query was never executed. The performance impact is much smaller than in the first example, but still significant for code that does not produce anything.

As a solution for this problem I created the Cache-method. This methods combines deferred execution and the static list to make sure the best performance is achieved.

C#:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
var appointments = from a in dataStore
                   where a.StartDate < DateTime.Now
                   select a;

var appointmentsFromCom = from ac in comObject
                          where ac.StartDate < DateTime.Now
                          select ac;
appointmentsFromCom = appointmentsFromCom .Cache();

foreach (var appointment in appointments)
{
    if (appointment.Organizer == me)
    {
        // This takes 10 seconds the first time it is used.
        comAppointment = appointmentsFromCom.FirstOrDefault(co => co.ID == appointment.ID);
        //....
    }
}


What does the Cache function do? It deferred executed the ToList method! It again returns a wrapper. When you iterate (in this case use FirstOrDefault) it executes a ToList and stores the result in the wrapper. Every subsequent time the wrapper object is used it just return the List it created and holds in memory.

This way we only have the 10 seconds hit once, but only if we're sure that the data is actually used. The best of both worlds.

Finally the code inside the Cache-method that makes the magic happen:

C#:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
public static class EnumerableExtensions
{
    /// <summary>
    /// Caches the specified source but used deferred execution.
    /// The first time the source is enumerated a cache of the items is created.
    /// Subsequent calls to enumerate the source will use the cache.
    /// </summary>
    /// <typeparam name="TSource">The type of the source.</typeparam>
    /// <param name="source">The source.</param>
    /// <returns></returns>
    public static IEnumerable<TSource> Cache<TSource>(this IEnumerable<TSource> source)
    {
        return new CacheIterator<TSource>(source);
    }

    /// <summary>
    /// 
    /// </summary>
    /// <typeparam name="TSource">The type of the source.</typeparam>
    private class CacheIterator<TSource> : IEnumerable<TSource>
    {
        private IEnumerable<TSource> _source;
        private int _initialThreadId;
        private List<TSource> _cache;

        /// <summary>
        /// Initializes a new instance of the <see cref="CacheIterator&lt;TSource&gt;"/> class.
        /// </summary>
        /// <param name="source">The source.</param>
        [DebuggerHidden]
        public CacheIterator(IEnumerable<TSource> source)
        {
            _source = source;
            _initialThreadId = Thread.CurrentThread.ManagedThreadId;
        }

        #region IEnumerable<T> Members

        /// <summary>
        /// Gets the enumerator.
        /// </summary>
        /// <returns></returns>
        [DebuggerHidden]
        public IEnumerator<TSource> GetEnumerator()
        {
            if (Thread.CurrentThread.ManagedThreadId == _initialThreadId)
            {
                if (_cache == null)
                    _cache = _source.ToList();

                return _cache.GetEnumerator();
            }
            else
            {
                var enumerator = new CacheIterator<TSource>(_source);
                return enumerator.GetEnumerator();
            }
        }

        #endregion

        #region IEnumerable Members

        /// <summary>
        /// Returns an enumerator that iterates through a collection.
        /// </summary>
        /// <returns>
        /// An <see cref="T:System.Collections.IEnumerator"/> object that can be used to iterate through the collection.
        /// </returns>
        [DebuggerHidden]
        System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
        {
            return GetEnumerator();
        }

        #endregion
    }
}

Volgende: Start all Command Prompts as Visual Studio Developer Command prompt 10-'14 Start all Command Prompts as Visual Studio Developer Command prompt
Volgende: WPF Combo box with empty item using .Net 4 dynamic objects 10-'11 WPF Combo box with empty item using .Net 4 dynamic objects

Comments



By Tweakers user ViperNL, Wednesday 7 March 2012 09:55

thisBlog = thisBlog.Replace("is execute ", "is executed ");

ofzo. :p

By Tweakers user remyblok, Wednesday 7 March 2012 10:19

ViperNL wrote on Wednesday 07 March 2012 @ 09:55:
thisBlog = thisBlog.Replace("is execute ", "is executed ");

ofzo. :p
Thanks! fixxed.

By Tweakers user masterpoi, Thursday 8 March 2012 21:21

For those who know the RX framework, the Reactive extensions have a mathematical dual for IEnumerable:

http://bartdesmet.net/blo...tive-getting-started.aspx

IIRC the MemoizeAll operator does the same thing as your Cache extension method

In either case these blog posts from Bart De Smet about the RX framework are an nice read if you are interested in well-though-over mathematical sound operators for both reactive and interactive LINQ queries

By Tweakers user remyblok, Thursday 8 March 2012 22:49

I was familiar with RX, but not with the MemorizeAll operator. After some Googeling I found that these extensions where taken out of RX and put in a sub project called IX: Interactive Extensions. The NuGet packages can be found here: http://nuget.org/packages/Ix_Experimental-Main.

I've installed the package, unfortunately there is no longer a MemorizeAll operator. There is, however, a Memorize operation. This operation does not do the same as my Cache operator. From the documentation:
Creates a buffer with a view over the source sequence, causing each enumerator to obtain access to all of the sequence's elements without causing multiple enumerations over the source.
The package does contain a Defer-method that should do the trick:
Creates an enumerable sequence based on an enumerable factory function.
You need to use it like this:

C#:
1
2
3
4
var sourceAppointmentsFromCom = from ac in comObject
                                where ac.StartDate < DateTime.Now
                                select ac;
appointmentsFromCom = EnumerableEx.Defer(() => sourceAppointmentsFromCom.ToList());

[Comment edited on Thursday 8 March 2012 22:51]


Comments are closed