Using Parallel LINQ

The LINQ team at Microsoft is working on an extension to Language Integrated Query (LINQ) called
Parallel LINQ (PLINQ). PLINQ makes it easy to spread a query across multiple
threads and put the extra processing power of multiple cores to work. This article
shows you how, and offers guidelines for when multithreading your LINQ
queries is likely to pay off.

To Multithread or Not to Multithread?

Some operations, like file I/O, are inherently sequential. Others
can benefit from multithreading. Good candidates for
multithreading and Parallel LINQ are computationally expensive
algorithms, for example queries over collections that hold very large data
sets. Parallel LINQ (available by searching online for "parallel LINQ"
or "PLINQ") is a downloadable extension from Microsoft, currently in beta, that
will ultimately let you write your LINQ queries as usual and run them on
multiple threads. The beauty, as you will see, is that most of the multithreading
plumbing is handled for you by this extension to LINQ.

When you use PLINQ, keep a few things in mind:

  • Avoid adding ordering clauses.
  • Parallelize outer loops but not inner loops, unless the outer loop's
    range is very small (i from 0 to 3) and the inner loop's range is very large
    (j from 0 to 1,000,000).

  • Have reasonable expectations. Amdahl's Law suggests that the maximum
    achievable parallel speedup is limited by the amount of sequential code
    remaining. Everyone wants order-of-magnitude performance increases, but
    parallelism alone is unlikely to yield these sorts of returns. (Check
    out Amdahl's Law on Wikipedia.org for details and mathematics.)
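
Amdahl's Law from the list above can be made concrete with a little arithmetic. The sketch below assumes the usual statement of the law, speedup = 1 / ((1 - p) + p / n), where p is the fraction of the work that can run in parallel and n is the number of cores. The AmdahlDemo class and the 90% figure are illustrative choices, not part of the Parallel LINQ extensions.

```csharp
using System;

static class AmdahlDemo
{
    // Theoretical speedup when a fraction of the work (0..1) is spread over
    // the given number of cores and the rest stays sequential.
    public static double Speedup(double parallelFraction, int cores)
    {
        return 1.0 / ((1.0 - parallelFraction) + parallelFraction / cores);
    }

    static void Main()
    {
        // Hypothetical workload: 90% of the running time can be parallelized.
        foreach (int cores in new[] { 2, 4, 8, 1024 })
        {
            Console.WriteLine("{0,5} cores: {1:F2}x", cores, Speedup(0.9, cores));
        }
    }
}
```

Even with 90% of the work parallelized, the speedup approaches but never reaches 10x, because the remaining 10% still runs sequentially no matter how many cores are added.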

Multithreading with PLINQ

Using multithreading with LINQ is simple. Once you have the Parallel LINQ
extensions, call AsParallel on your enumerable collection.

Since the Parallel LINQ extensions are in beta, you need to take a few extra steps to prepare:

  1. You can find the Parallel LINQ extensions on the Microsoft
    download pages, or use your favorite search engine to find the latest
    release.
  2. Install the Parallel LINQ extensions. By default, the December 2007 Community Tech Preview (CTP) installs here:
    	C:\Program Files\Microsoft Parallel Extensions Dec07 CTP

    This folder contains a version of System.Threading.dll containing the LINQ parallel extensions.

  3. To access the extensions, add a reference to the System.Threading.dll
    file contained in the Parallel LINQ folder. Use the Project > Add
    Reference option in Visual Studio and browse to the folder from step 2
    that contains the installed System.Threading.dll.

Now you're ready.

To use Parallel LINQ, write your LINQ queries as before: include a from clause, a select, and whatever else you need. (Remember the caveat against ordering, though.) The from clause contains a range variable, the keyword in, and the source containing the data to query. For example:

	from num in numbers
	

This clause defines a range variable named num, and the source collection is numbers. To run the query in parallel, add a call to AsParallel on the source collection numbers:

	numbers.AsParallel()
	

Listing 1 demonstrates the AsParallel method in context.

Listing 1: Using Parallel LINQ to select the even numbers from a collection of integers.

	using System;
	using System.Collections.Generic;
	using System.Linq;
	using System.Text;
	using System.Threading;
	using System.Diagnostics;
	namespace PLinqDemo
	{
	    class Program
	    {
	        static void Main(string[] args)
	        {
	            // Fill an array with random integers.
	            int[] largeArray = new int[10000];
	            Random rnd = new Random(DateTime.Now.Millisecond);
	            for (int i = 0; i < largeArray.Length; i++)
	            {
	                largeArray[i] = rnd.Next(10000);
	            }
	            Stopwatch watch = new Stopwatch();
	            watch.Start();
	            // Select the even numbers in parallel and record which thread handled each one.
	            var results = from num in largeArray.AsParallel()
	                          where num % 2 == 0
	                          select new
	                          {
	                              Number = num,
	                              ThreadID = Thread.CurrentThread.ManagedThreadId
	                          };
	            // LINQ queries are deferred, so force execution before stopping the
	            // stopwatch; otherwise only the query construction is timed.
	            var materialized = results.ToList();
	            watch.Stop();
	            Console.WriteLine(watch.Elapsed.ToString());
	            Console.ReadLine();
	        }
	    }
	}
	

At the time this article was written, PLINQ supported parallel
queries over collections of objects and XML sources. (It is worth
noting that on modern PCs, filtering 10,000 integers is not enough work
to yield a performance gain; hence, the Stopwatch may indicate that the code above actually runs faster without the call to AsParallel.)
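
To see how workload size changes the picture, one option is to time the same query with and without AsParallel over work that is expensive per element. The sketch below is an illustration only: the IsPrime helper and the range size are hypothetical choices, and it is written against the PLINQ API as released in .NET Framework 4 (in System.Linq), which may differ from the CTP described here.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

static class TimingDemo
{
    // Deliberately naive primality test so that each element costs real CPU time.
    public static bool IsPrime(int n)
    {
        if (n < 2) return false;
        for (int d = 2; (long)d * d <= n; d++)
        {
            if (n % d == 0) return false;
        }
        return true;
    }

    // Counts the primes in 1..max on a single thread.
    public static int CountSequential(int max)
    {
        return Enumerable.Range(1, max).Count(n => IsPrime(n));
    }

    // Same query, with the source wrapped by AsParallel.
    public static int CountParallel(int max)
    {
        return Enumerable.Range(1, max).AsParallel().Count(n => IsPrime(n));
    }

    static void Main()
    {
        const int max = 2000000; // hypothetical size; adjust for your machine

        Stopwatch watch = Stopwatch.StartNew();
        int sequential = CountSequential(max);
        Console.WriteLine("Sequential: {0} primes in {1}", sequential, watch.Elapsed);

        watch.Restart();
        int parallel = CountParallel(max);
        Console.WriteLine("Parallel:   {0} primes in {1}", parallel, watch.Elapsed);
    }
}
```

Both methods return the same count; only the elapsed times should differ, and by how much depends on the number of cores and the cost of each element.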

AsParallel is overloaded; the overloads accept combinations of an integer argument and a ParallelQueryOptions enumeration value. The integer argument is the degree of parallelism, that is, the number of threads to use. ParallelQueryOptions has two values, None and PreserveOrdering. PreserveOrdering keeps the results in the order of the elements in the source collection.
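
For comparison, the PLINQ API as released in .NET Framework 4 exposes the same two settings as chained methods rather than AsParallel arguments: WithDegreeOfParallelism and AsOrdered. The sketch below uses that released API, not the CTP overloads described above; the OptionsDemo class and the sample data are illustrative.

```csharp
using System;
using System.Linq;

static class OptionsDemo
{
    // Filters the even numbers in parallel while keeping them in source order.
    public static int[] EvenNumbersInOrder(int[] source)
    {
        return source
            .AsParallel()
            .AsOrdered()                  // preserve the order of the source elements
            .WithDegreeOfParallelism(2)   // use at most two concurrent tasks
            .Where(n => n % 2 == 0)
            .ToArray();
    }

    static void Main()
    {
        int[] numbers = Enumerable.Range(1, 20).ToArray();
        Console.WriteLine(string.Join(", ", EvenNumbersInOrder(numbers)));
    }
}
```

Preserving order adds coordination work, which is consistent with the earlier advice to avoid ordering unless the results really need it.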

Handling Concurrent Exceptions

One of the challenges of using threads is concurrent exceptions, that
is, how to handle an exception thrown on one or more additional threads.
Parallel LINQ aggregates all exceptions into an instance of System.Threading.AggregateException, which is marshaled to the calling thread.

The original exceptions are available through the InnerExceptions property, and an AggregateException is created even if only one background thread throws an exception.
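
A minimal sketch of this pattern follows. It assumes the released .NET Framework 4 API, where AggregateException lives in the System namespace rather than in System.Threading as in the CTP; the CollectErrors helper and the sample input are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class ExceptionDemo
{
    // Runs a parallel query that fails on some inputs and returns a description
    // of each exception collected from the worker threads.
    public static List<string> CollectErrors(string[] inputs)
    {
        var messages = new List<string>();
        try
        {
            // The query is deferred; ToArray forces execution, so any
            // exception surfaces on the calling thread at this point.
            int[] parsed = inputs.AsParallel().Select(s => int.Parse(s)).ToArray();
            Console.WriteLine("Parsed {0} values", parsed.Length);
        }
        catch (AggregateException ae)
        {
            // Each original exception is available through InnerExceptions.
            foreach (Exception inner in ae.InnerExceptions)
            {
                messages.Add(inner.GetType().Name + ": " + inner.Message);
            }
        }
        return messages;
    }

    static void Main()
    {
        foreach (string message in CollectErrors(new[] { "1", "two", "3", "four" }))
        {
            Console.WriteLine(message);
        }
    }
}
```

The number of inner exceptions may vary from run to run, since other threads can be stopped once a failure has been observed, so code that handles them should not assume a fixed count.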

Processing Items with Parallel.ForEach

The Parallel Extensions also introduce iteration control methods like Parallel.For and Parallel.ForEach. ForEach accepts an enumerable collection and a generic Action delegate, and performs the action for each element, potentially on multiple threads. The Action can be satisfied with a Lambda expression:

	Parallel.ForEach(results, r=>Console.WriteLine(r));
	

In this fragment, results is the result of the LINQ query in Listing 1, passed as the input argument. The argument r=>Console.WriteLine(r) writes each instance of the anonymous type (the type created in the select clause of Listing 1) to the console.
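
Because the action may run on several threads at once, any shared state it touches needs to be thread-safe. The sketch below assumes the released .NET Framework 4 API, where the Parallel class lives in System.Threading.Tasks (the CTP shipped it in System.Threading); the SumOfSquares helper is illustrative.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

static class ForEachDemo
{
    // Adds up the squares of the values, letting Parallel.ForEach
    // distribute the iterations across threads.
    public static long SumOfSquares(int[] values)
    {
        long total = 0;
        Parallel.ForEach(values, v =>
        {
            // Interlocked.Add makes the update to the shared total atomic.
            Interlocked.Add(ref total, (long)v * v);
        });
        return total;
    }

    static void Main()
    {
        int[] values = Enumerable.Range(1, 10).ToArray();
        Console.WriteLine(SumOfSquares(values)); // prints 385
    }
}
```

A plain total += v * v inside the action would be a race condition; an atomic update (or a lock) keeps the result deterministic regardless of how the iterations are scheduled.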

Lambda expressions are condensed anonymous delegates. The left side of the =>
(the "goes to" operator) represents the function header and input
arguments, and the right side represents the function body and
statements. The Lambda expression above is simply read (or understood) as
"Given a value r, write that value to the console."

Summary

When the Parallel LINQ extensions are released, it will be easy to use
multithreading with LINQ. The key is to use parallelism for large,
computationally expensive workloads and for data that can be processed
independently. Handle background exceptions by catching
System.Threading.AggregateException and examining its InnerExceptions
property for the actual exceptions. And, finally, Amdahl's Law reminds
us to have reasonable expectations about the performance gains achievable
with multithreading.

For additional reading on LINQ and Lambda expressions, see the book LINQ Unleashed for C# (Sams, August 2008, ISBN 978-0-672-32983-8).

 

Original content for this article came from InformIT.
