
Serial async challenge

Last modified on August 15, 2012 17:22

This topic describes the difficulty of writing a sequence of asynchronous operations, each of which must wait for the prior operation to complete. Coroutines are the recommended remedy.


What's the big deal?

In "desktop" .NET (console, Windows Forms, ASP.NET, and WPF applications) you can write a sequence of instructions that includes calls to the server. The instruction following a server call waits (it is "blocked") until the server returns. Consider this schematic example.

C#
public void GetCustomerOrders() {

  var custs = new List<Customer>();
  GetCustomers(custs); // First server call for customers

  var someCusts = SelectCustomersOfInterest(custs); // Local filtering

  var orders = new List<Order>();

  GetOrdersOfSelectedCustomers(someCusts, orders); // Second server call for orders

  var someOrders = FilterCustomerOrders(orders); // local filtering

  CustomerOrders = someOrders; // set a property for display
}

The SelectCustomersOfInterest method waits until GetCustomers() returns, having populated the "custs" list with the queried customers. Then it runs and produces the subset, someCusts. Next, GetOrdersOfSelectedCustomers runs while FilterCustomerOrders waits. When the orders arrive, FilterCustomerOrders filters them and the method sets the CustomerOrders property with the filtered orders.

It may seem a bit contrived but this kind of thing happens frequently in user applications. You want to get some data from the server, process it a little, get some more data, process that a little, and then display the results on screen.
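A runnable miniature of that blocking flow may help make the ordering concrete. Everything in it is hypothetical: in-memory lists stand in for the server, `Thread.Sleep` stands in for network latency, and the two filters are arbitrary. Because each "server call" holds the calling thread until its data is in hand, every following line can safely use the result.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

public static class BlockingSketch
{
    public class Customer { public int CustomerID; public string Region; }
    public class Order { public int OrderID; public int CustomerID; public decimal Freight; }

    // In-memory data standing in for the server (hypothetical).
    static readonly List<Customer> ServerCustomers = new List<Customer> {
        new Customer { CustomerID = 1, Region = "WA" },
        new Customer { CustomerID = 2, Region = "OR" },
        new Customer { CustomerID = 3, Region = "WA" } };
    static readonly List<Order> ServerOrders = new List<Order> {
        new Order { OrderID = 10, CustomerID = 1, Freight = 5m },
        new Order { OrderID = 11, CustomerID = 2, Freight = 50m },
        new Order { OrderID = 12, CustomerID = 3, Freight = 80m } };

    public static List<Order> CustomerOrders { get; private set; } = new List<Order>();

    // Blocking "server call": the calling thread waits for the whole duration.
    static void GetCustomers(List<Customer> custs)
    {
        Thread.Sleep(50);                      // simulated latency
        custs.AddRange(ServerCustomers);
    }

    static void GetOrdersOfSelectedCustomers(List<Customer> someCusts, List<Order> orders)
    {
        Thread.Sleep(50);                      // simulated latency
        var ids = someCusts.Select(c => c.CustomerID).ToList();
        orders.AddRange(ServerOrders.Where(o => ids.Contains(o.CustomerID)));
    }

    public static void GetCustomerOrders()
    {
        var custs = new List<Customer>();
        GetCustomers(custs);                                          // blocks until data arrives
        var someCusts = custs.Where(c => c.Region == "WA").ToList();  // local filtering
        var orders = new List<Order>();
        GetOrdersOfSelectedCustomers(someCusts, orders);              // blocks again
        CustomerOrders = orders.Where(o => o.Freight > 10m).ToList(); // local filtering
    }
}
```

Calling `BlockingSketch.GetCustomerOrders()` leaves only order 12 in `CustomerOrders`: customers 1 and 3 are in "WA", and of their orders only order 12 has freight above 10.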

You can't write code like this in Silverlight, where every trip to the server must be asynchronous. If you try to write methods such as these, you soon discover that GetCustomers returns immediately with an empty list of customers. SelectCustomersOfInterest runs immediately; who knows what it does with an empty customer argument. GetOrdersOfSelectedCustomers runs and instantly returns nothing. FilterCustomerOrders returns nothing. The screen shows nothing.

Some time later, the server comes back with the initial batch of queried customers but, by that time, the GetCustomerOrders method has delivered an empty result and your user is confused. The retrieved customers are never processed. You don't get the selected orders. It's a disaster.
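The same failure can be reproduced in ordinary .NET. In this sketch, the hypothetical `GetCustomersAsyncStyle` starts background work (`Task.Delay` stands in for the server round trip) and returns at once, so a caller that reads the list immediately finds it empty; the customers land only later. The final `Wait` exists only so the demo can observe that later moment; it is exactly the kind of blocking the surrounding text says Silverlight rules out.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class FireAndForgetSketch
{
    // Hypothetical async-style call: it starts the "server" work and returns at once.
    public static Task GetCustomersAsyncStyle(List<string> custs)
    {
        // Task.Delay stands in for the round trip; the customers arrive about 200 ms later.
        return Task.Delay(200).ContinueWith(_ =>
        {
            lock (custs) { custs.AddRange(new[] { "Alfreds", "Bon app'", "Chop-suey" }); }
        });
    }

    public static (int rightAfterCall, int later) Run()
    {
        var custs = new List<string>();
        Task pending = GetCustomersAsyncStyle(custs); // returns immediately

        int rightAfterCall;
        lock (custs) { rightAfterCall = custs.Count; } // nothing has arrived yet

        pending.Wait();                                // demo-only: observe the late arrival
        int later;
        lock (custs) { later = custs.Count; }
        return (rightAfterCall, later);
    }
}
```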

That's why you've been learning about the DevForce asynchronous operations for querying, saving, and invoking server methods. You've learned, for example, that you could write the GetCustomers method something like this:

C#
private EntityQueryOperation<Customer> GetCustomers(List<Customer> customerList) {

 // query for all customers
 var queryOperation = Manager.Customers.ExecuteAsync();

 // upon success, pour the customers into the supplied customer list
 queryOperation.Completed += (s, args) => {
       if (args.HasError) { HandleTrouble(args); return; } // no Results to use after a failure
       customerList.AddRange(args.Results);
     };

 // return the query operation so caller can wait
 return queryOperation;
}

Plug that back into the original example and GetCustomerOrders starts to look like:

C#
public void GetCustomerOrders() {

  var custs = new List<Customer>();
  var custOperation = GetCustomers(custs); // Get customers into list

  List<Customer> someCusts;
  custOperation.Completed += (s, args) => {
         someCusts = SelectCustomersOfInterest(custs);
      };

 // we're not done yet

}

The good news is that SelectCustomersOfInterest won't start filtering the "custs" list until it actually has some customers in it. Although GetCustomerOrders still returns immediately - you can't escape the fundamental asynchronous nature of the process - at least you've got this part of the sequence behaving properly.
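The callback arrangement can be mimicked without DevForce. `CallbackSketch.QueryOperation<T>` is a hypothetical stand-in for `EntityQueryOperation<T>`, not the real type: it raises a `Completed` event whose arguments carry results or an error, and `Complete` is called by hand to play the server's answer. The filtering step lives inside the handler, so it cannot run before the data exists, while `GetCustomerOrders` itself still returns immediately.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CallbackSketch
{
    // Hypothetical stand-in for an async operation object such as EntityQueryOperation<T>.
    public class QueryOperation<T>
    {
        public event EventHandler<CompletedArgs<T>> Completed;

        // Whatever performs the async work calls this when the "server" answers.
        public void Complete(IEnumerable<T> results, Exception error = null) =>
            Completed?.Invoke(this, new CompletedArgs<T>(results, error));
    }

    public class CompletedArgs<T> : EventArgs
    {
        public CompletedArgs(IEnumerable<T> results, Exception error)
        { Results = results; Error = error; }
        public IEnumerable<T> Results { get; }
        public Exception Error { get; }
        public bool HasError => Error != null;
    }

    public static List<string> SomeCusts; // stays null until the query completes

    // Starts the (simulated) query and wires the next step to its Completed event.
    public static QueryOperation<string> GetCustomerOrders()
    {
        var custs = new List<string>();
        var custOperation = new QueryOperation<string>();
        custOperation.Completed += (s, args) =>
        {
            if (args.HasError) { return; }   // error handling elided in this sketch
            custs.AddRange(args.Results);
            SomeCusts = custs.Where(c => c.StartsWith("A")).ToList(); // local filtering
        };
        return custOperation;                // returns before any data exists
    }
}
```

Until someone calls `Complete` on the returned operation, `SomeCusts` stays null; after `op.Complete(new[] { "Alfreds", "Bon app'", "Around the Horn" })` it holds the two names starting with "A".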

The bad news is that you've written a lot of tricky code and you are only halfway home. You still have to get the orders for "someCusts", wait, and filter them before displaying them.

Imagine we've written a GetOrdersOfSelectedCustomers method in the same fashion as GetCustomers:

C#
private EntityQueryOperation<Order> GetOrdersOfSelectedCustomers(
            List<Customer> customerList, List<Order> orderList) {

 // get the ids of the selected customers
 var ids = customerList.Select(c => c.CustomerID).ToList();

 // query for orders of customers who have ids in the selected id list
 var queryOperation = Manager.Orders
                         .Where(o => ids.Contains(o.CustomerID))
                         .ExecuteAsync();

 // upon success, pour those orders into the supplied orderList
 queryOperation.Completed += (s, args) => {
       if (args.HasError) { HandleTrouble(args); return; }
       orderList.AddRange(args.Results);
      };

 // return the query operation so caller can wait
 return queryOperation;
}

Let's rewrite GetCustomerOrders to use it:

C#
public void GetCustomerOrders() {

  var custs = new List<Customer>();
  var custOperation = GetCustomers(custs); // Get customers into list

  custOperation.Completed += (s, args) => {

         var someCusts = SelectCustomersOfInterest(custs);
         var orders = new List<Order>();
         var orderOperation = GetOrdersOfSelectedCustomers(someCusts, orders);

         // distinct parameter names avoid clashing with the outer lambda's s and args
         orderOperation.Completed += (s2, args2) => {

              var someOrders = FilterCustomerOrders(orders);
              CustomerOrders = someOrders; // set property for display
          };
      };

}

We've got logic distributed over several helper methods and two levels of indentation in our master method. Notice the "Staircase Effect" and imagine if we had to introduce yet another asynchronous operation.

When is it done?

How does the caller of GetCustomerOrders know when all operations have completed successfully? In other words, how does the caller know when the CustomerOrders property has been set?

Answer: it can't know. Our GetCustomerOrders method has no mechanism for telling the caller when all operations are complete.

Could you change the signature to return something that the caller could use? Think about what that would be. Could you return the first operation object (custOperation)?

You could but it wouldn't solve the problem. The caller could listen for when the customer query completed. But that is only the first step in the sequence. The caller wants to know when the last operation completes.

Could you change the signature to return "orderOperation" which is the last operation object?

No ... you cannot. The "orderOperation" object isn't defined until after you've retrieved customers. GetCustomerOrders must return something immediately; there is no way to stall until we've defined the orderOperation object.

What if one of the operations throws an exception?

The GetCustomers and GetOrdersOfSelectedCustomers methods intercept errors via a "HandleTrouble" method. Where did that method come from? Probably from the caller. How did it get into our helper methods? Looks like our query methods are tightly coupled to the caller. That's not going to work long term. To maintain these methods properly and even consider reusing them, we have to disentangle them from the class that happens to be calling them right now.

Face it. The query methods shouldn't know what to do about errors. They shouldn't have specific knowledge about the caller. We want our async methods to propagate errors back to the caller (who should know what to do with them) in some general, de-coupled way.
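One decoupled shape, sketched with hypothetical names: the helper captures any failure into an outcome object instead of calling `HandleTrouble`, and hands that outcome to whatever callback the caller supplied. The server call is simulated synchronously by a `fetch` delegate to keep the example short; the point is only which side decides what an error means.

```csharp
using System;
using System.Collections.Generic;

public static class DecoupledErrorsSketch
{
    // Hypothetical outcome object: carries either results or an error.
    public class Outcome<T>
    {
        public List<T> Results { get; } = new List<T>();
        public Exception Error { get; set; }
        public bool HasError => Error != null;
    }

    // The helper knows nothing about its caller: no HandleTrouble, no UI.
    // 'fetch' simulates the server call; 'completed' carries the outcome back.
    public static void GetCustomers(Func<IEnumerable<string>> fetch,
                                    Action<Outcome<string>> completed)
    {
        var outcome = new Outcome<string>();
        try { outcome.Results.AddRange(fetch()); }
        catch (Exception ex) { outcome.Error = ex; }  // captured, not handled here
        completed(outcome);                           // the caller decides what to do
    }
}
```

The caller's own handler interprets the outcome, for example `o => report = o.HasError ? "error: " + o.Error.Message : "loaded " + o.Results.Count`.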

Bring on the Coroutine

We're about to describe the DevForce Coroutine in prose and code. Check out the Serial Coroutine Flow Diagram topic for a more visual rendition of the same subject.

Here is the same example we considered above, written this time with the DevForce Coroutine.

C#
// Coroutine Host Method
public CoroutineOperation GetCustomerOrders() {
 return Coroutine.Start(CustomerOrdersLoader);
}

// Coroutine Iterator
private IEnumerable<INotifyCompleted> CustomerOrdersLoader() {

  var custs = new List<Customer>();

  // Query for every customer in the database
  var custOperation = Manager.Customers.ExecuteAsync();

  yield return custOperation; // SUSPEND

  custs.AddRange(custOperation.Results);

  // Reduce to an "interesting" subset of all customers by
  // filtering in some undisclosed fashion
  var someCusts = SelectCustomersOfInterest(custs);

  // Extract the ids of the selected customers
  var ids = someCusts.Select(c => c.CustomerID).ToList();

  // Query for any order with a customer whose id is in that id list
  var orderOperation = Manager.Orders
                          .Where(o => ids.Contains(o.CustomerID))
                          .ExecuteAsync();

  yield return orderOperation; // SUSPEND

  var orders = new List<Order>(orderOperation.Results);
  var someOrders = FilterCustomerOrders(orders);
  CustomerOrders = someOrders; // set property for display

}

First, note that the Coroutine.Start method returns a CoroutineOperation object. This is a derivative of the DevForce BaseOperation class ... just like the value returned by every other DevForce asynchronous method.

Yes ... the Coroutine is itself an asynchronous operation. It is special in that it bundles a collection of many asynchronous operations (aka "tasks") into a single package. Life is much easier for the Coroutine caller, which need only wait for a single Coroutine completion event rather than cope with the completions of many individual tasks.

We can't see the caller in this example ... the "thing" that calls GetCustomerOrders. But we should expect this caller to add an event handler to the CoroutineOperation's Completed event ... a handler that will take care of business when the Coroutine reports "all tasks completed."

Let's turn our attention to what we can see - the Coroutine.Start and the CustomerOrdersLoader that is passed to it as a parameter.

The "Co" in "Coroutine" implies at least two cooperating actors. The two actors are (1) the "Producer" and (2) the "Consumer".

The "Producer" is your code that performs one task after another. Some of the tasks are synchronous, some of them asynchronous.

CustomerOrdersLoader is the Producer in this code sample. As you can see, CustomerOrdersLoader does a little synchronous work (e.g., variable initialization) for a while. Then it hits an asynchronous method, at which point it "yields" to its partner, the "Consumer".

What does the Producer yield? It yields the result of the asynchronous method. That result happens to be a special kind of "Coroutine coordination object" with information about the asynchronous task currently underway.

A Coroutine coordination object must implement the DevForce INotifyCompleted interface. Every DevForce asynchronous method returns an object that implements INotifyCompleted. Therefore, every DevForce asynchronous method returns an object that can both participate in Coroutine processing and report the status and outcome of an async operation.

To whom does the Producer yield? It yields to the DevForce Coroutine class. Actually, it yields to a hidden DevForce "Consumer" that receives the "coordination object". While you don't actually see the Consumer ... or interact with it directly ... it's there, the invisible partner to your Producer co-routine.

The Consumer takes the Producer's "coordination object" and immediately injects its own callback method. This is the method that the "coordination object" will call when the asynchronous operation completes.

Then the Consumer method lets go.

At this point, both the Consumer and the Producer are suspended. Technically, they aren't running at all. They are methods which, seemingly, have done their work and finished.

But both the DevForce Consumer and your coroutine Producer are poised to resume work when the asynchronous operation completes.

When that async operation completes, it raises a Completed event on its operation result object. We just saw that this "operation result object" is also a Coroutine coordination object. In that capacity, it also invokes the callback method that the Consumer injected.

That injected callback method effectively revives the Consumer. The Consumer examines the outcome of the asynchronous operation. If the operation completed successfully, the Consumer calls upon the Producer, asking the Producer for the next "coordination object".

The Producer method, instead of starting over, picks up where it left off. In our code sample, it picks up just after the first "yield return" where it adds the results of the customer query to the "custs" list.

The Producer continues executing until it comes to the next asynchronous operation, the query for Orders. Then it yields a second time, returning a different "coordination object" to the Consumer, the operation object returned by the Orders query.

Again, the Consumer injects its callback into the coordination object and both Consumer and Producer are suspended again ...

Until the Orders query completes.

And when it completes, the coordination object calls the Consumer, the Consumer sees that everything is still ok and calls the Producer. The Producer resumes where it left off, filters the Orders, and sets the caller's CustomerOrders property.

This time there is no more work to do. There are no more asynchronous operations and no more "yield return" statements. So (implicitly) the Producer tells the Consumer, "I'm done."

The Consumer, realizing that there are no more tasks, wraps up and raises the Completed event on its CoroutineOperation object.

Do you remember that object? That was the object returned by GetCustomerOrders. As we noted above, the GetCustomerOrders caller added its own handler to the CoroutineOperation. That handler kicks in and does whatever needs doing now that all of the asynchronous tasks in the Producer co-routine have finished.
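The Producer/Consumer dance can be sketched in a few dozen lines of plain C#. This is not DevForce's implementation: `ICoordinationSketch`, `ManualTaskSketch`, and `CoroutineSketch` are hypothetical names, tasks are completed by hand instead of by a server, and stopping at the first failed task is a policy assumed for the sketch. The `Advance` method is the Consumer's whole job: resume the Producer with `MoveNext`, take the coordination object it yields, inject a callback, and return.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical coordination contract playing the role the text gives INotifyCompleted.
// It is NOT DevForce's interface or signature.
public interface ICoordinationSketch
{
    // Register a callback; it receives null on success or the failure's exception.
    void WhenCompleted(Action<Exception> callback);
}

// An async "task" completed by hand, so the sketch needs no server.
public class ManualTaskSketch : ICoordinationSketch
{
    private Action<Exception> _callback;
    public void WhenCompleted(Action<Exception> callback) { _callback = callback; }
    public void Finish(Exception error = null) { _callback?.Invoke(error); }
}

// Minimal Consumer. It is itself a coordination object, so coroutines could nest.
public class CoroutineSketch : ICoordinationSketch
{
    private readonly IEnumerator<ICoordinationSketch> _producer;
    private readonly List<Action<Exception>> _handlers = new List<Action<Exception>>();
    public bool IsCompleted { get; private set; }
    public Exception Error { get; private set; }

    private CoroutineSketch(IEnumerable<ICoordinationSketch> producer)
    {
        _producer = producer.GetEnumerator();
    }

    public static CoroutineSketch Start(Func<IEnumerable<ICoordinationSketch>> producer)
    {
        var coroutine = new CoroutineSketch(producer());
        coroutine.Advance();                 // run the Producer up to its first yield
        return coroutine;
    }

    public void WhenCompleted(Action<Exception> handler)
    {
        if (IsCompleted) handler(Error);     // already finished: report at once
        else _handlers.Add(handler);
    }

    private void Advance()
    {
        bool hasNext;
        try { hasNext = _producer.MoveNext(); }      // resume the Producer where it left off
        catch (Exception ex) { Finish(ex); return; } // synchronous Producer code threw
        if (!hasNext) { Finish(null); return; }      // no more yields: the Producer is done
        // Inject our callback into the coordination object, then let go.
        _producer.Current.WhenCompleted(error =>
        {
            if (error != null) Finish(error);        // assumed policy: stop at first failure
            else Advance();                          // success: ask for the next task
        });
    }

    private void Finish(Exception error)
    {
        IsCompleted = true;
        Error = error;
        _producer.Dispose();
        foreach (var handler in _handlers) handler(error);
    }
}

public static class CoroutineSketchDemo
{
    // Producer: linear code that yields each async task it must wait for.
    public static IEnumerable<ICoordinationSketch> Loader(
        ManualTaskSketch custQuery, ManualTaskSketch orderQuery, List<string> log)
    {
        log.Add("query customers");
        yield return custQuery;                      // SUSPEND
        log.Add("filter customers, query orders");
        yield return orderQuery;                     // SUSPEND
        log.Add("filter orders, set CustomerOrders");
    }
}
```

Driving it by hand shows the suspensions: right after `CoroutineSketch.Start(() => CoroutineSketchDemo.Loader(cust, ord, log))` only the first log entry exists; `cust.Finish()` resumes the Producer up to the orders query, and `ord.Finish()` runs the rest and fires every handler registered through `WhenCompleted`.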

An Iterator Tutorial

The co-routine dance works well because we write the task logic in a linear fashion, much as we write a fully synchronous procedural method.

Visual Basic developers working in VB 10 (Visual Studio 2010) or earlier cannot code in this iterator style because those versions of VB.NET do not support iterators or the Yield statement; iterators arrived in VB 11 with Visual Studio 2012. Call the Coroutine with a function list instead; this approach uses iterators under the hood while shielding the developer from the yield statements. This alternative technique is also useful in many C# scenarios.
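A function list can be bridged to the iterator style with a small adapter that is itself an iterator, so the `yield` statements live in one reusable place. This is a sketch of the idea, not the DevForce overload: each hypothetical step function starts one async task and returns it, `Task` stands in for a DevForce operation object, and the blocking demo consumer exists only to show the ordering.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class FunctionListSketch
{
    // Adapter: turns a list of "start the next task" functions into an iterator.
    // Each function runs only when the consumer asks for the next task.
    public static IEnumerable<Task> FromFunctions(IEnumerable<Func<Task>> steps)
    {
        foreach (var step in steps)
        {
            yield return step();
        }
    }

    // Demo-only consumer: waits for each yielded task before requesting the next one.
    // (A real Silverlight consumer would use completion callbacks instead of Wait.)
    public static void RunSeriallyForDemo(IEnumerable<Task> tasks)
    {
        foreach (var task in tasks)
        {
            task.Wait();
        }
    }
}
```

Because `FromFunctions` calls a step only when asked, the second step cannot start until the consumer has finished waiting on the first.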

The kicker is that "yield return" statement which suspends the Producer while the asynchronous operation works in background. The "yield return" syntax is the tell-tale sign of an "Iterator".

The iterator is the key to the async coroutine pattern. You've undoubtedly used an iterator but you may not have known it and may never have written one.

Iterators are not a DevForce invention. They've been around in the literature for a long time. Iterators were added to C# in version 2.0, which shipped with .NET 2.0. They're a mechanism for generating values on demand. Here's an example:

C#
public IEnumerable<int> IntegerIterator(int start, int end){
   for (int i = start; i <= end; i++) {
       yield return i;
    }
}

You usually consume it with a foreach statement like so:

C#
 foreach (int n in this.IntegerIterator(1, 10)){
      System.Console.Write(n + " ");
  }
 // Output: 1 2 3 4 5 6 7 8 9 10

A C# foreach takes an IEnumerable and "iterates over it", by which we mean that it gets an enumerator from the IEnumerable and repeatedly asks it for the next value until there are no more values to give. The IEnumerable could be a collection of values. Or, as in this case, it could be a producer of values.
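Spelled out by hand, a foreach amounts to the loop below (the compiler's actual expansion adds a few details, but the protocol is the same). `MoveNext` is the "give me the next value" request and returns `false` when the iterator has nothing left; `Current` holds the value just yielded. A Coroutine's Consumer relies on this same protocol, except that it waits for an asynchronous task to finish between one `MoveNext` and the next.

```csharp
using System;
using System.Collections.Generic;

public static class ManualForeachSketch
{
    public static IEnumerable<int> IntegerIterator(int start, int end)
    {
        for (int i = start; i <= end; i++)
        {
            yield return i;
        }
    }

    // Roughly what the compiler does for: foreach (int n in IntegerIterator(start, end)) { ... }
    public static List<int> CollectByHand(int start, int end)
    {
        var seen = new List<int>();
        using (IEnumerator<int> e = IntegerIterator(start, end).GetEnumerator())
        {
            while (e.MoveNext())      // "give me the next value"; false means nothing left
            {
                seen.Add(e.Current);  // the value the iterator just yielded
            }
        }
        return seen;
    }
}
```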

Iterators can produce values in all kinds of ways. Here's one way that is similar to what you've seen in a coroutine:

C#
public IEnumerable<string> StringIterator(){
   yield return "My dog";
   yield return "has";
   yield return "fleas";
}

We can consume it with a foreach just as we did the IntegerIterator:

C#
 foreach (string str in this.StringIterator()){
      System.Console.WriteLine(str);
  }
 // Output: My dog
 //         has
 //         fleas

The foreach asks our StringIterator for a value four times: three requests produce a value, one per yield, and the fourth finds nothing left. The StringIterator "remembers" where it last yielded. The first call yields "My dog" ... and StringIterator positions itself so that the next time it is called, it yields "has". And the next time it yields "fleas". And the next time it reports "I've got nothing else to give you".

The iterator is suspended after each yield. By "suspended" I mean that the iterator stops. I do not mean that the thread is suspended or that the iterator is "waiting". The iterator is "done" in much the way that any other method you call is done when it hits a return statement.

But there is a crucial difference. Any other method would start from its beginning. When you call the iterator, it resumes where it left off. The iterator maintains a snapshot of its current state and a bookmark of the execution step where it yielded. When called again, it goes to that bookmark, revives the saved state, and continues from there.
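A logging iterator makes the suspend-and-resume behavior visible. In this sketch, nothing in the body runs when `StringIterator()` is called; the first `MoveNext` runs it up to the first `yield return`, and each later `MoveNext` resumes just after the previous one. The `Log` list is an illustrative addition, not part of the original example.

```csharp
using System;
using System.Collections.Generic;

public static class ResumeSketch
{
    public static readonly List<string> Log = new List<string>();

    public static IEnumerable<string> StringIterator()
    {
        Log.Add("started");                  // runs on the first MoveNext, not on the call
        yield return "My dog";
        Log.Add("resumed after 'My dog'");
        yield return "has";
        Log.Add("resumed after 'has'");
        yield return "fleas";
        Log.Add("resumed after 'fleas', finishing");
    }
}
```

Stepping through with `GetEnumerator()` and `MoveNext()` adds exactly one log entry per request: four requests, four entries, with the last `MoveNext` returning `false`.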

Mixing regular and yielding statements

The iterator may be composed with a combination of regular and "yield return" statements. We can rewrite the previous iterator as follows; it produces exactly the same output.

C#
public IEnumerable<string> StringIterator(){

    var myDog = "My dog";
   yield return myDog;

    var has = "has";
   yield return has;

    var ignored = "you'll never see this";
    var pulgas = "fleas";
   yield return pulgas;
}

After each call, the iterator evaluates the next series of statements until it encounters a "yield return". This iterator returns three strings as it did before, but this time each string takes two or three statements to produce.

The ability to interleave regular and yielding statements is critical to the asynchronous Coroutine pattern. In real world applications, our iterators need to mix synchronous statements with asynchronous statements that yield until the async operation returns.

Quitting an iterator early

Your iterator logic may require you to terminate an iterator early, ignoring the remaining statements. The pertinent syntax is "yield break" as shown here:

C#
public IEnumerable<string> StringIterator(){

   yield return "My dog";
   yield return "has";

   yield break; // Ignore everything below this point

   yield return "fleas"; // never executed
}

 // Output: My dog
 //         has
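In a Producer, `yield break` is most useful behind a condition. The hypothetical sketch below yields strings in place of real coordination objects: it always takes the first step, then quits early when the customer filter left nothing worth querying orders for.

```csharp
using System;
using System.Collections.Generic;

public static class YieldBreakSketch
{
    // Hypothetical producer: stops early when there is nothing worth querying for.
    public static IEnumerable<string> Steps(List<string> selectedCustomers)
    {
        yield return "query customers";
        if (selectedCustomers.Count == 0)
        {
            yield break;   // no customers of interest: skip the orders query entirely
        }
        yield return "query orders for " + selectedCustomers.Count + " customers";
    }
}
```

With an empty selection the sequence ends after one step; with two selected customers it yields a second step, "query orders for 2 customers".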
Created by DevForce on October 13, 2010 20:57

This wiki is licensed under a Creative Commons 2.0 license. Copyright © 2015 IdeaBlade