
Parallel async coroutines


Use a parallel asynchronous coroutine when two or more asynchronous tasks should run simultaneously and you want to be notified when all of them have completed ... or if any one of them fails.


We've focused on batching Asynchronous Serial Tasks because it's a common pain point that cries out for some relief.

Having slain that dragon, you discover another one in the same cave. You have several asynchronous tasks that must finish collectively but they are mutually independent and you don't care which finishes first. You suspect that your application might run faster if you fired them off together and let the caller work on something else until all of the parallel tasks are done.

You suspect ... but you do not know. The watchword for performance tuning is "measure". Measure before and after. Measure for different runtime environments. We have seen plenty of multi-threaded code that ran slower than the original single-threaded approach. While it is often the case that a batch of IO tasks, such as fetching entities from a remote database, completes significantly faster when run in parallel, we urge you to test that assumption in your application.
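
Measuring doesn't have to be elaborate: capture a timestamp before you launch the batch and compute the elapsed time when the completion callback fires. Here is a minimal sketch, assuming a LaunchBatchAsync method of your own that accepts a completion callback (both names are hypothetical, not DevForce APIs):

C#
using System;
using System.Diagnostics;

public class BatchTimer {

  // Time an asynchronous batch: start the clock, launch the work,
  // stop the clock inside the completion callback.
  public void TimeBatch(Action<Action> launchBatchAsync) {
    var stopwatch = Stopwatch.StartNew();
    launchBatchAsync(() => {
      stopwatch.Stop();
      Console.WriteLine("Batch completed in {0} ms", stopwatch.ElapsedMilliseconds);
    });
  }
}

Run it once against your serial implementation and once against the parallel version, in an environment that resembles production, before deciding which to keep.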

A parallel scenario: loading lists at launch

Many applications rely on lots of small lists:  lists of states or provinces, lists of status codes, lists of suppliers, etc. Most are very stable (the provinces) and might be worth enshrining in the code. However, you have your good reasons to read them in from the database when the application launches.

Your application will not function properly until all of the lists are loaded. You don't care about load order; you just want to hold back the user until they've all arrived.

You've noticed that waiting for the server to deliver these lists one by one is painfully slow. The size of each list is not the issue - they are all tiny. It's the number of server trips that is killing responsiveness. Most of the overhead is in managing the conversation with the server, not in the database query or the transmission of data. Your client is idle while you wait for each list result. Maybe you can speed things up by firing off a bunch of requests at once. You could be right about that.

Your first thought is to pull down everything you need in a single query. Sometimes you can do that ... see "Retrieving Many Types in a Single Query" below. More often you can't. There is probably no relationship - neither inheritance nor navigation - among the entities you want to retrieve. You have to issue a separate query for each list.

Coroutine.StartParallel()

The DevForce Coroutine class is a good solution to this problem. It operates much like serial asynchronous batching except you'll call Coroutine.StartParallel(coroutine) instead of Coroutine.Start(coroutine).

The two kinds of Coroutine are quite similar. An individual serial asynchronous task looks like an individual parallel asynchronous task. A batch of asynchronous serial tasks looks much like a batch of asynchronous parallel tasks. A producer of asynchronous tasks (such as the coroutine iterator) yields the same kind of INotifyCompleted objects.

The critical difference - and it is critical - is that parallel tasks cannot share their data with each other. They must be completely independent of each other - you can't use the result of one as the input to another - and you should not care whether one finishes before the other.

This difference aside, batches of asynchronous serial and parallel tasks are structurally the same.

In DevForce you define a batch of tasks the same way: with a co-routine that returns an IEnumerable<INotifyCompleted>. Here's an example:

C#
// Coroutine parallel task iterator
private IEnumerable<INotifyCompleted> LoadListEntitiesCoroutine() {

  yield return Manager.Employees.ExecuteAsync(); // only 7 in Northwind
  yield return Manager.Regions.ExecuteAsync();
  yield return Manager.Territories.ExecuteAsync();
  yield return Manager.Suppliers.ExecuteAsync();

  // not in Northwind but you get the idea
  yield return Manager.StatesProvinces.ExecuteAsync();
  yield return Manager.Colors.ExecuteAsync();

}

Each line launches an asynchronous query and yields the query's coordinating "Operation" object.

Once you've defined a batch of tasks, you need some collaborating component to run them for you, to decide when the batch is done, and to redirect failures to your error handling code. We've been calling this the DevForce Coroutine "Consumer".

We know what happens if we run this coroutine iterator with the Serial Coroutine Consumer by calling Coroutine.Start(LoadListEntitiesCoroutine): the first query runs, waits, returns; then the next one runs, waits, returns; and so forth.

That's not what we want. We want "start the first, and the second, and the third ... and the fifth; now wait until they finish; then tell me about it."

And that's what you get when you call Coroutine.StartParallel(LoadListEntitiesCoroutine). The DevForce Parallel Coroutine Consumer immediately walks through the entire iterator, launching all of the asynchronous tasks at once. The iterator yields INotifyCompleted objects (just as in the serial Coroutine example). With each yield, the Parallel Consumer identifies an individual asynchronous task to watch. It builds a list of these yielded values as they arrive (see the Notifications collection below).
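
Conceptually, the Parallel Consumer's job boils down to: enumerate every yielded task up front, watch each one for completion, and report once when the last of them finishes (or collect the failures along the way). The sketch below illustrates that idea against a hypothetical ICompletionSource interface defined here for the example; it is not the DevForce implementation.

C#
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a yielded asynchronous task (not the DevForce INotifyCompleted).
public interface ICompletionSource {
  // The callback receives null on success or the exception on failure.
  void OnCompleted(Action<Exception> callback);
}

// Simplified illustration of a parallel consumer: launch everything, count down completions.
public class ParallelConsumerSketch {

  public void Run(Func<IEnumerable<ICompletionSource>> producer,
                  Action<IList<ICompletionSource>, IList<Exception>> batchCompleted) {

    // Enumerating the iterator eagerly launches every task and
    // records the yielded values in order (the "Notifications" idea).
    var notifications = producer().ToList();
    var errors = new List<Exception>();
    var remaining = notifications.Count;

    if (remaining == 0) {
      batchCompleted(notifications, errors);
      return;
    }

    foreach (var task in notifications) {
      task.OnCompleted(error => {
        if (error != null) { errors.Add(error); }
        // Callbacks are assumed to arrive on a single (UI) thread,
        // so a plain decrement is enough for this sketch.
        if (--remaining == 0) { batchCompleted(notifications, errors); }
      });
    }
  }
}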

Happy side-effects of a parallel coroutine

We may not care about the result of our parallel tasks. We might be content with their side-effects. In this example, we're filling the cache with entities that we'll use later. That may be all we need to do.

Imagine that we subsequently query for Orders. We want to know which "employee" was the sales rep who sold that order. The expression anOrder.Employee will find the related employee in cache. You won't have to make an asynchronous trip to the server to get the employee ... because it is waiting for you in cache.
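
For example (a sketch, assuming you already hold an Order in a variable named anOrder):

C#
// The related Employee arrived with the parallel batch, so this navigation
// is satisfied from cache - no asynchronous server trip is needed.
var salesRep = anOrder.Employee;
var repName = salesRep.FirstName + " " + salesRep.LastName;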

This approach is popular ... for good reason. It's difficult to write a stable, responsive UI that could fly off to the server any minute for missing data. Development is much easier if you can set up with everything you need - pay the asynchronous management price up front - and then operate locally for as long as possible.

If you adopt this approach, it is a good idea to disable the "lazy load" feature by setting the EntityManager's DefaultQueryStrategy to CacheOnly.

  Manager.DefaultQueryStrategy = QueryStrategy.CacheOnly;

You can still query the database for specific information at any time.  Just remember to specify the "go to server" QueryStrategy when you need to make that trip. For example, when we want to get fresh information about a particular customer, we could write:

C#
  Manager.Customers
    .Where(c => c.CustomerID == someID)
    .With(QueryStrategy.DatabaseThenCache) // go to database first, then check with the cache
    .ExecuteAsync(yourCallback);

  // Do other stuff

The phrase that matters is:

    .With(QueryStrategy.DatabaseThenCache)

Results of a parallel coroutine

On the other hand, you might want to do something with the parallel task results as soon as they have all arrived.

A good place to do that is in your parallel task caller ... more precisely, in a callback method or Completed event handler that you established in your parallel task caller.

C#
// Coroutine caller
private void LoadListEntities() {

  var coop = Coroutine.StartParallel(LoadListEntitiesCoroutine);

  coop.Completed += (sender, args) => {
    if (args.CompletedSuccessfully) {
      FillLists(args.Notifications);
    } else {
      HandleError(args);
    }
  };
}

private void FillLists(IList<INotifyCompleted> notifications) {

  // Although the order of finish is indeterminate,
  // the order of notifications matches the Coroutine yield order.
  // Simplistic approach for demo purposes.

  EmployeeList = MakeList<Employee>(notifications[0]);
  RegionList = MakeList<Region>(notifications[1]);
  TerritoryList = MakeList<Territory>(notifications[2]);
  SupplierList = MakeList<Supplier>(notifications[3]);
  StatesProvinceList = MakeList<StateProvince>(notifications[4]);
  ColorList = MakeList<Color>(notifications[5]);
}

private IList<T> MakeList<T>(INotifyCompleted completedArg) {
  var op = completedArg as EntityQueryOperation<T>;
  return (null == op) ? new List<T>() : new List<T>(op.Results);
}

The Notifications collection

The highlight of the code example is the Notifications collection, the vehicle for processing task results.

The DevForce Coroutine Consumer adds each value yielded by your coroutine Producer (your iterator) to this collection in the order received. You access the Notifications collection from either the CoroutineOperation or the CoroutineCompletedEventArgs.

Such tracking of yielded INotifyCompleted values works for the Serial Asynchronous Coroutine as well. Only the timing differs. In the serial case, the yielded values appear in the Notifications collection over time. They'll all get there eventually if the serial Coroutine succeeds. In the parallel case, they are available immediately.
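
That means you can look at the yielded operations on the CoroutineOperation as soon as StartParallel returns, even before any query has completed. A small sketch (assuming the Notifications property on the CoroutineOperation is a list like the one handed to the Completed event handler):

C#
var coop = Coroutine.StartParallel(LoadListEntitiesCoroutine);

// The parallel consumer has already walked the whole iterator,
// so all six yielded operations are present - though none may have finished yet.
var launchedCount = coop.Notifications.Count;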

Retrieving Many Types in a Single Query

Parallel queries are ideal when you have to make a lot of small queries simultaneously. But each query is its own trip to the server, even when you parallelize them. If you restructured your database a little bit, you might be able to reduce many of these trips to a single trip.

We're making a digression into "Polymorphic Queries". The intent is the same - to retrieve a variety of entities at the same time. That's why we're discussing it here. But the technique has nothing to do with parallel queries.

Maybe you could combine certain kinds of entity types into a single physical table. Then you could define a common base entity class for that table, model the specific entity classes as sub-types of the base entity class, and issue a single DevForce "polymorphic query" defined for the base entity. This will pull down all of the derived entities at once.

We've seen this approach used to good effect with Code entities. Codes often share the same structure: {Id, CodeName, Description}. You can often store hundreds of different codes in a single Code table {Id, CodeType, CodeName, Description} and discriminate among them by the "CodeType" field. You define an abstract base type, "Code", and use Entity Framework Table-per-Hierarchy (TPH) to model the distinct code entity types as derivatives of the Code entity.
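
A sketch of what that model might look like (the derived class names are illustrative, and the DevForce base classes a generated model would normally include are omitted):

C#
// Abstract base entity mapped to the single Code table {Id, CodeType, CodeName, Description}.
// CodeType is the TPH discriminator, so Entity Framework manages it
// rather than exposing it as an entity property.
public abstract class Code {
  public int Id { get; set; }
  public string CodeName { get; set; }
  public string Description { get; set; }
}

// Each distinct kind of code becomes a lightweight subtype.
public class ColorCode : Code { }
public class ShippingCode : Code { }
public class StatusCode : Code { }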

Start your DevForce application, query for all "Codes", ... and all of the distinct code-like entities arrive in cache in a single query.
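
In code, the single trip and the subsequent in-memory separation might look like this sketch. It assumes your model exposes a Manager.Codes query for the base type, that the callback overload shown earlier receives the query's operation with a Results collection, and that list properties like ColorList and StatusList exist; those names are ours, not DevForce's.

C#
// One server trip brings down every Code subtype at once.
Manager.Codes.ExecuteAsync(op => {
  // Everything is now in cache; separate the subtypes in memory.
  ColorList  = op.Results.OfType<ColorCode>().ToList();
  StatusList = op.Results.OfType<StatusCode>().ToList();
});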

This can be extremely effective ... if you're comfortable with it. Not everyone is; some folks want the database foreign key constraint to prevent accidentally assigning a Color code to an Order where a Shipping code belongs. You have to decide if this is a significant risk for your application.
