Apache Camel: Processors Should NEVER Be Stateful

If you have experience with Apache Camel, this one might sound a little obvious. But it has come up a few times recently, so it’s worth mentioning. As an example, say you have a route that iterates over paged data and does something with it, so you need to keep track of the current page. You might write something like the following:

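import java.util.ArrayList;
import java.util.List;

import org.apache.camel.Exchange;
import org.apache.camel.Processor;

public class FooProcessor implements Processor {
  // Instance field: shared by every exchange that passes through this processor.
  private int page = 0;

  @Override
  public void process(Exchange exchange) throws Exception {
    List<String> results = new ArrayList<>();

    // Start from the first page on every run.
    page = 0;
    while (page < 5) {
      // Run *anything* involving pagination -- SQL, REST, etc.
      String result = ...;
      results.add(result);
      page++;
    }

    exchange.getIn().setBody(results);
  }
}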

Spot the issue? At first glance, this looks like it’d work, and it will if the route never executes multiple times at once. But…

In Camel, a Processor is effectively a singleton: the route holds one instance, and every exchange flowing through the route hits that same object. So if the route is triggered again before the first run finishes, the second run resets “page” to 0 in the middle of the original run. Even worse, both runs then increment the same “page”, so neither one retrieves the entire result set. You wind up with something like:

  1. Processor run #1, page = 0
  2. Processor run #1, page = 1
  3. Processor run #2, page = 0
  4. Processor run #1, page = 1
  5. Processor run #2, page = 2
  6. Processor run #1, page = 3
  7. Processor run #2, page = 4

In the end, run #1 contains pages 0, 1, 1, and 3, while run #2 has 0, 2, and 4. In high-throughput applications, or wherever a Processor takes a while to execute, this spells disaster for the end results.
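The interleaving above can be replayed deterministically without Camel or real threads. The following is only a sketch: SharedPageTrace, Run, and replay are hypothetical names, the shared field stands in for the processor's “page”, and it assumes each run resets the counter when it starts and that each loop iteration happens as one uninterrupted step (real threads can interleave even more finely).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a stateful processor; not part of Camel.
final class SharedPageTrace {
    // Plays the role of the instance field: one counter shared by all runs.
    private int page = 0;

    /** One in-flight run: it owns its results but uses the shared counter. */
    final class Run {
        final List<Integer> pages = new ArrayList<>();
        private boolean started = false;

        /** Performs one loop iteration; returns false once the loop would exit. */
        boolean step() {
            if (!started) {
                page = 0; // assumption: every run resets the shared counter on entry
                started = true;
            }
            if (page >= 5) {
                return false;
            }
            pages.add(page); // "fetch" whichever page the shared counter points at
            page++;
            return true;
        }
    }

    /** Replays a schedule such as {1, 1, 2, 1, 2, 1, 2}; each entry names the run that steps next. */
    static List<List<Integer>> replay(int[] schedule) {
        SharedPageTrace shared = new SharedPageTrace();
        Run run1 = shared.new Run();
        Run run2 = shared.new Run();
        for (int who : schedule) {
            (who == 1 ? run1 : run2).step();
        }
        return List.of(run1.pages, run2.pages);
    }
}
```

Calling SharedPageTrace.replay(new int[] {1, 1, 2, 1, 2, 1, 2}) follows the seven steps listed above and yields [0, 1, 1, 3] for run #1 and [0, 2, 4] for run #2.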

The example is admittedly ridiculous and is easily fixed by declaring “page” as a local variable inside the “process” method rather than as a field on the class. However, the point is that any mutable state held within a Processor, Bean, etc. must be avoided!
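As a rough sketch of that fix, the paging loop can keep its counter in a local variable so that nothing is shared between runs. StatelessPaging, fetchAllPages, and the fetchPage callback are hypothetical names used only for illustration, and the callback stands in for whatever SQL or REST lookup the route really performs. The sketch deliberately avoids Camel types so it compiles on its own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;

// Hypothetical helper; a Processor would call it from process().
final class StatelessPaging {
    static final int PAGE_COUNT = 5;

    /**
     * Fetches every page using only local state, so concurrent callers
     * cannot disturb each other's position in the result set.
     */
    static List<String> fetchAllPages(IntFunction<String> fetchPage) {
        List<String> results = new ArrayList<>();
        // "page" lives on the calling thread's stack, not on a shared object.
        for (int page = 0; page < PAGE_COUNT; page++) {
            results.add(fetchPage.apply(page));
        }
        return results;
    }
}
```

Inside process, the body could then be set with something like exchange.getIn().setBody(StatelessPaging.fetchAllPages(page -> ...)), where the lambda is the real lookup. Because each call has its own counter, every run sees pages 0 through 4 exactly once, no matter how many exchanges are in flight.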