Tuesday, February 5, 2013

How the World's Fastest ESB was Made

A couple of years ago, we at WSO2 implemented a new HTTP transport for WSO2 ESB. The requirements for this new transport can be summarized as follows:
  1. Ultra-fast, low latency mediation of HTTP requests.
  2. Supporting a very large number of inbound (client-ESB) and outbound (ESB-server) connections concurrently (we were looking at several thousand concurrent connections).
  3. Automatic throttling and graceful performance degradation in the presence of slow or faulty clients and servers.
The default non-blocking HTTP (NHTTP) transport from Apache Synapse, which we were also using in WSO2 ESB, supported the above requirements to a certain extent, but we wanted to do better. The default transport was very generic, designed to offer reasonable performance in all the integration scenarios the ESB could potentially participate in. However, HTTP load balancing, HTTP URL routing (URL rewriting) and HTTP header-based routing are some of the most widely used integration patterns in the industry, and to support these use cases well, we needed a specialized transport.
The old NHTTP transport was based on a dual buffer model. Incoming message content was placed in a SharedInputBuffer and the outgoing message content was placed in a SharedOutputBuffer. Apache Axiom, Apache Axis2 and the Synapse mediation engine sit between the two buffers, reading from the input buffer and writing to the output buffer. This architecture is illustrated in the following diagram.
The key advantage of this architecture is that it enables the ESB (mediators) to intercept all messages and manipulate them in any way necessary. The main downside is that every message has to go through the Axiom layer, which is not really necessary in cases like HTTP load balancing and HTTP header-based routing. Also, the overhead of moving data from one buffer to another was not always justifiable in this model. So when we started working on the new HTTP transport, we wanted to get rid of these limitations. We knew that this might result in a not-so-generic HTTP transport, but we were willing to pay that price at the time.
So after some very interesting brainstorming sessions, an exciting week-long hackathon, and several months of testing, bug-fixing and refactoring, we came up with what's known today as the HTTP pass-through transport. This transport is based on a single buffer model and completely bypasses the Axiom layer. The resulting architecture is illustrated below.
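The difference between the two models can be illustrated with a minimal Java sketch. It is not the Synapse code: `DualBufferRelay`, `PassThroughRelay` and the `mediate` function are hypothetical names, and the real transports work with non-blocking channels rather than whole in-memory payloads. The sketch only shows where the extra copying happens in the dual buffer model and why the single buffer model avoids it.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.function.UnaryOperator;

// Hypothetical illustration, not the Synapse implementation.
// Dual buffer model: content is copied out of an input buffer, handed to the
// mediation layer (Axiom/Axis2/Synapse in the real ESB) and then copied into a
// separate output buffer, so every byte is moved at least twice.
final class DualBufferRelay {
    static ByteBuffer relay(ByteBuffer input, UnaryOperator<byte[]> mediate) {
        byte[] content = new byte[input.remaining()];
        input.get(content);                       // copy out of the input buffer
        byte[] mediated = mediate.apply(content); // mediation may rewrite the content
        ByteBuffer output = ByteBuffer.allocate(mediated.length);
        output.put(mediated);                     // copy into the output buffer
        output.flip();                            // switch the output buffer to read mode
        return output;
    }
}

// Single buffer model: the buffer that received the request is handed to the
// outbound side as-is; the payload is never parsed or copied.
final class PassThroughRelay {
    static ByteBuffer relay(ByteBuffer input) {
        return input;
    }
}

final class RelayDemo {
    public static void main(String[] args) {
        ByteBuffer request = ByteBuffer.wrap("<order/>".getBytes(StandardCharsets.UTF_8));
        ByteBuffer copied = DualBufferRelay.relay(request.duplicate(), b -> b);
        ByteBuffer forwarded = PassThroughRelay.relay(request);
        System.out.println(copied == request);    // false: a second buffer was filled
        System.out.println(forwarded == request); // true: the same buffer is forwarded
    }
}
```

In the pass-through case the transport only needs the HTTP headers to make routing or load-balancing decisions, so the payload bytes can be forwarded to the back end exactly as they arrived.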
The HTTP pass-through transport was first released in June 2011 along with WSO2 ESB 4.0. Back then it was disabled by default, and the user had to enable it by uncommenting a few entries in the axis2.xml file. The performance numbers we were seeing with the new transport were simply remarkable. WSO2 also published some of these benchmarking results in a March 2012 article. However, at this point two main limitations of the new transport were starting to give us headaches:
  1. Configuration overhead (users had to explicitly enable the transport depending on their target use cases)
  2. No support for integration scenarios that require HTTP content manipulation (because Axiom was bypassed, any mediator attempting to access the message payload would not get anything useful to work with)
In addition to these technical issues, there were other process-related issues that we had to deal with. For instance, maintaining two separate HTTP transports was twice the work for the developers and testers. We found that because the pass-through transport was not the default, it often lagged behind the default NHTTP transport in terms of features and stability. So after a few brainstorming sessions, we decided to try to make the pass-through transport the default HTTP transport in Apache Synapse/WSO2 ESB. But this required making the content manipulation use cases (content-aware use cases) work with the new transport, which meant bringing Axiom back into the picture, the very thing we wanted to avoid in our initial implementation. So in order to balance our performance and heterogeneous integration requirements, we came up with the idea of “on-demand message parsing in the mediation engine”.
In this new model, each mediator instance belongs to one of two classes.
  1. Content-unaware mediators – Mediators that never access the message content in any way (e.g. the drop mediator)
  2. Content-aware mediators – Mediators that always access the message content (e.g. the XSLT mediator)
We also identified a third class known as conditionally content-aware mediators. These mediators can be either content-aware or content-unaware depending on their exact instance configuration. For example, a simple log mediator instance configured as <log/> is content-unaware. However, a log mediator configured as <log level="full"/> would be content-aware since it's expected to log the message payload. Similarly, a simple property mediator instance such as <property name="foo" value="bar"/> is content-unaware, but <property name="foo" expression="/some/xpath"/> could be content-aware depending on what the XPath expression does. In order to capture this content-awareness characteristic of mediator instances at runtime, we introduced a new method (isContentAware) to the top-level Mediator interface of Synapse. The default implementation in the AbstractMediator class returns true so as to maintain backward compatibility.
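A rough sketch of this contract in Java follows. It assumes a heavily simplified mediator interface: the real Synapse `Mediator` works on a `MessageContext` and has many more methods, and the `Map` of properties, the `Level` enum and the constructors below are illustrative stand-ins. What it tries to capture is the backward-compatible default in `AbstractMediator` and how conditionally content-aware mediators derive the flag from their configuration.

```java
import java.util.Map;

// Hypothetical, trimmed-down mediator contract: only the isContentAware()
// hook described in the post is modeled. The properties map stands in for
// the real MessageContext.
interface Mediator {
    boolean mediate(Map<String, String> properties);

    boolean isContentAware();
}

abstract class AbstractMediator implements Mediator {
    // Safe default for backward compatibility: an unknown mediator is assumed
    // to need the payload, so the engine will parse the message for it.
    @Override
    public boolean isContentAware() {
        return true;
    }
}

// <drop/>: never touches the payload.
final class DropMediator extends AbstractMediator {
    @Override
    public boolean mediate(Map<String, String> properties) {
        return false; // stop mediation
    }

    @Override
    public boolean isContentAware() {
        return false;
    }
}

// <log/> vs <log level="full"/>: content-aware only when the payload is logged.
final class LogMediator extends AbstractMediator {
    enum Level { SIMPLE, FULL } // only the two levels mentioned in the post

    private final Level level;

    LogMediator(Level level) {
        this.level = level;
    }

    @Override
    public boolean mediate(Map<String, String> properties) {
        return true; // actual logging omitted in this sketch
    }

    @Override
    public boolean isContentAware() {
        return level == Level.FULL;
    }
}

// <property name="foo" value="bar"/> vs <property name="foo" expression="..."/>.
// Simplification: any XPath expression is treated as content-aware, although a
// smarter check could inspect what the expression actually reads.
final class PropertyMediator extends AbstractMediator {
    private final String name;
    private final String value;      // literal value, or null
    private final String expression; // XPath expression, or null

    PropertyMediator(String name, String value, String expression) {
        this.name = name;
        this.value = value;
        this.expression = expression;
    }

    @Override
    public boolean mediate(Map<String, String> properties) {
        if (value != null) {
            properties.put(name, value); // XPath evaluation omitted in this sketch
        }
        return true;
    }

    @Override
    public boolean isContentAware() {
        return expression != null;
    }
}
```

Returning true by default errs on the side of correctness: a third-party mediator that does not override `isContentAware` still gets a parsed message, at the cost of some performance.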
With this change in place, we modified the mediation engine to check the content-awareness property of each mediator at runtime before submitting a message to it. List mediators such as the SequenceMediator run the check recursively on their child mediators to obtain the final value. Assuming that messages are always received through the pass-through HTTP transport, the mediation engine invokes a special message parsing routine whenever a mediator is detected to be content-aware. It is in this special routine that we bring Axiom into the picture. Therefore, if none of the mediators in a given flow or service is content-aware, the pass-through transport works as it usually does without ever engaging Axiom. But whenever a content-aware mediator is involved, we bring Axiom in. This way we can reap the performance benefits of the pass-through transport while supporting all the integration scenarios of the ESB. Since we engage Axiom on demand, each scenario pays the cost of parsing only where it actually needs it. For instance, a simple pass-through proxy always works without any Axiom interaction. An XSLT proxy that transforms requests engages Axiom only in the request flow; the response flow operates without parsing the messages.
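The recursive check and the on-demand parsing step might look roughly like this. `Message`, `MediationEngine.invoke`, `HeaderRouteMediator` and `PayloadMediator` are hypothetical, and `parse()` simply decodes bytes into a string where the real engine would build an Axiom tree. As in the post, the sketch assumes the message arrives unparsed from the pass-through transport.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical message: raw bytes stay untouched until a content-aware
// mediator needs them. parse() stands in for building an Axiom tree.
final class Message {
    private final byte[] raw;
    private String parsed;  // null until parse() is called
    private int parseCount; // lets callers observe how often parsing happened

    Message(byte[] raw) { this.raw = raw; }

    boolean isParsed() { return parsed != null; }

    int parseCount() { return parseCount; }

    void parse() {
        if (parsed == null) {
            parsed = new String(raw, StandardCharsets.UTF_8);
            parseCount++;
        }
    }

    String payload() {
        if (parsed == null) {
            throw new IllegalStateException("payload requested before parsing");
        }
        return parsed;
    }
}

interface Mediator {
    boolean mediate(Message msg);

    boolean isContentAware();
}

// Engine-side check made before a message is submitted to a mediator.
final class MediationEngine {
    static boolean invoke(Mediator mediator, Message msg) {
        if (mediator.isContentAware() && !msg.isParsed()) {
            msg.parse(); // on-demand parsing: the payload is built only now
        }
        return mediator.mediate(msg);
    }
}

// A list mediator is content-aware if any child is; nested sequences make
// the check recursive.
final class SequenceMediator implements Mediator {
    private final List<Mediator> children = new ArrayList<>();

    SequenceMediator add(Mediator child) {
        children.add(child);
        return this;
    }

    @Override
    public boolean isContentAware() {
        for (Mediator child : children) {
            if (child.isContentAware()) {
                return true;
            }
        }
        return false;
    }

    @Override
    public boolean mediate(Message msg) {
        for (Mediator child : children) {
            if (!MediationEngine.invoke(child, msg)) {
                return false; // a child stopped the flow
            }
        }
        return true;
    }
}

// Looks only at transport headers (e.g. header-based routing).
final class HeaderRouteMediator implements Mediator {
    public boolean mediate(Message msg) { return true; }

    public boolean isContentAware() { return false; }
}

// Needs the payload (e.g. an XSLT transformation).
final class PayloadMediator implements Mediator {
    String seen;

    public boolean mediate(Message msg) {
        seen = msg.payload();
        return true;
    }

    public boolean isContentAware() { return true; }
}
```

A flow made only of content-unaware mediators never calls `parse()`, while a nested sequence containing one content-aware child causes exactly one parse before the sequence runs.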
Another tricky problem we encountered was dealing with message parsing itself. For instance, how do we parse a message and then send it out when the underlying pass-through transport provides only one buffer? Ideally we need two buffers: one to read the incoming message from and one to write the outgoing message to. Also, the fact that the Axis2 message builder framework can only handle streams posed a few problems. The buffer we maintained in the pass-through transport was a Java NIO ByteBuffer instance, so we needed to adapt the buffer into a stream implementation whenever the mediation engine engages Axiom. We solved the first problem by having our message builder routine create a second output buffer whenever Axiom is brought into the picture. The outgoing messages are serialized into this second buffer, and the pass-through transport was modified to pick the outgoing content from the second buffer when it's available. Writing an InputStream implementation that can wrap a ByteBuffer instance solved the second problem.
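Both fixes can be sketched in a few lines of Java. `ByteBufferInputStream` adapts a `ByteBuffer` to `java.io.InputStream`; `ContentPipe` and `BuilderDemo.build` are hypothetical stand-ins for the transport's buffer holder and an Axis2 message builder, not real Synapse or Axis2 classes. The sketch assumes the buffer is already in read mode (flipped) and that the whole payload fits in it.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Objects;

// Adapts a ByteBuffer (assumed to be in read mode) to the InputStream API
// that stream-based message builders expect.
final class ByteBufferInputStream extends InputStream {
    private final ByteBuffer buffer;

    ByteBufferInputStream(ByteBuffer buffer) {
        this.buffer = buffer;
    }

    @Override
    public int read() {
        // Mask to return an unsigned value in 0..255, as the contract requires.
        return buffer.hasRemaining() ? (buffer.get() & 0xFF) : -1;
    }

    @Override
    public int read(byte[] dst, int off, int len) {
        Objects.checkFromIndexSize(off, len, dst.length);
        if (len == 0) {
            return 0;
        }
        if (!buffer.hasRemaining()) {
            return -1; // end of stream
        }
        int n = Math.min(len, buffer.remaining());
        buffer.get(dst, off, n);
        return n;
    }

    @Override
    public int available() {
        return buffer.remaining();
    }
}

// Hypothetical holder for one message's content: the original buffer from the
// transport plus an optional second buffer with the serialized result.
final class ContentPipe {
    private final ByteBuffer original;
    private ByteBuffer serialized; // stays null unless the message was built

    ContentPipe(ByteBuffer original) {
        this.original = original;
    }

    // Reads through a duplicate so the original buffer's position is untouched.
    InputStream openInput() {
        return new ByteBufferInputStream(original.duplicate());
    }

    void setSerializedContent(byte[] bytes) {
        serialized = ByteBuffer.wrap(bytes);
    }

    // What the sender writes out: the second buffer if present, else the original.
    ByteBuffer outgoing() {
        return serialized != null ? serialized : original;
    }
}

final class BuilderDemo {
    // Stand-in for a stream-based message builder: it just drains the stream.
    static String build(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4];
        int n;
        while ((n = in.read(chunk, 0, chunk.length)) != -1) {
            out.write(chunk, 0, n);
        }
        return out.toString(StandardCharsets.UTF_8.name());
    }

    public static void main(String[] args) throws IOException {
        ContentPipe pipe = new ContentPipe(ByteBuffer.wrap("<order/>".getBytes(StandardCharsets.UTF_8)));
        String message = build(pipe.openInput());
        pipe.setSerializedContent(message.replace("order", "po").getBytes(StandardCharsets.UTF_8));
        System.out.println(StandardCharsets.UTF_8.decode(pipe.outgoing())); // prints <po/>
    }
}
```

Reading through a `duplicate()` of the buffer is a design choice in this sketch: it leaves the original buffer's position untouched, so if nothing is serialized into the second buffer, the sender can still forward the original bytes as-is.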
One last problem that needed to be solved was handling security. In Synapse/WSO2 ESB, security is handled by Apache Rampart, which runs as an Axis2 module that intercepts messages before they hit the mediation engine. So on-demand parsing in the mediation engine doesn't work in this scenario; we need to parse the messages before Rampart intercepts them. We solved this issue by introducing a new smart handler to the Axis2 handler chain, which intercepts every message and performs an early parse if security is engaged on the flow. The same solution can be extended to support other modules that need to parse the message payload in the Axis2 handler chain.
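The early-parse idea can be sketched with a toy handler chain. This is not the Axis2 `Handler` API: `Message`, `Handler`, `HandlerChain`, `EarlyParseHandler` and `SecurityHandler` are hypothetical, and the security step only checks that the message has been parsed. Passing the condition in as a `Predicate` is one way the same handler could be reused for other modules that need the payload.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical message; parse() marks where Axiom would be engaged in the ESB.
final class Message {
    final boolean securityEngaged;
    private boolean parsed;

    Message(boolean securityEngaged) { this.securityEngaged = securityEngaged; }

    boolean isParsed() { return parsed; }

    void parse() { parsed = true; }
}

interface Handler {
    void invoke(Message msg);
}

// Placed ahead of modules that need the payload; parses early only when the
// supplied condition says the flow requires it.
final class EarlyParseHandler implements Handler {
    private final Predicate<Message> needsPayload;

    EarlyParseHandler(Predicate<Message> needsPayload) { this.needsPayload = needsPayload; }

    @Override
    public void invoke(Message msg) {
        if (needsPayload.test(msg) && !msg.isParsed()) {
            msg.parse();
        }
    }
}

// Stand-in for a security module (Rampart in the ESB) that needs the full envelope.
final class SecurityHandler implements Handler {
    @Override
    public void invoke(Message msg) {
        if (msg.securityEngaged && !msg.isParsed()) {
            throw new IllegalStateException("security processing needs a parsed message");
        }
        // signature verification, decryption, etc. would happen here
    }
}

final class HandlerChain {
    private final List<Handler> handlers = new ArrayList<>();

    HandlerChain add(Handler handler) {
        handlers.add(handler);
        return this;
    }

    void invoke(Message msg) {
        for (Handler handler : handlers) {
            handler.invoke(msg);
        }
    }
}
```

With a chain such as `new HandlerChain().add(new EarlyParseHandler(m -> m.securityEngaged)).add(new SecurityHandler())`, secured flows are parsed before the security step, while unsecured flows pass through unparsed and can still benefit from on-demand parsing later in the mediation engine.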
I decided to write this post because the WSO2 folks have just released WSO2 ESB 4.6, and this release is based on the new model I've described here. The pass-through transport is what users now get by default. The WSO2 team has also published some performance figures that clearly indicate what the new design is capable of. It turns out that the latest release of WSO2 ESB outperforms all the major open source ESBs by a significant margin. This release also comes with a new XSLT mediator (Fast XSLT) that operates on top of the pass-through model of the underlying transport, and a new streaming XPath implementation based on ANTLR.
The next step of this effort would be to get these improvements integrated into the Apache Synapse code base. This work is already underway and you can monitor its progress through SYNAPSE-913 and SYNAPSE-920.



Great post Hiranya.. clearly an amazing article..

Samisa said...

Hiranya, this is an excellent post. This is like a gospel on the design of the pass-through transport, and I would recommend it to anyone who wants to understand it in detail. Not only that; this blog is also a good case study on software design in general, because you clearly articulate both the high-level design and the challenges faced and how they were overcome in the design process.

Charitha said...

As always, this is yet another great post by Hiranya. This again demonstrates your ability to explain complex matters in simple terms which can be understood by average people like me :) I listened to many conversations about the new PT transport, but this is the best explanation I came across.

Unknown said...

Finally found a great explanation about the new Transport.

Asanka Abeysinghe said...

As my other colleagues said, this is an awesome post and very informative. BTW, I'm hijacking a few diagrams and some content from this for my WSO2CON-2013 talk :).

Harsha's Blog said...

It is very clear; I hope to see more of your blogs.

Harsha's Blog said...

This blog was very helpful for us. There is no other blog like this for ESB Pass Through.

Unknown said...

Great post. Thanks so much for writing this blog post.

Unknown said...

Great post!!! Thank you so much for writing this blog post.

shammi Dhananjaya jayasinghe said...

As everyone said, thank you very, very much for posting this excellent article. It gave me a clear overview of not only pass-through but also the message flow in NHTTP. I think that with this background knowledge of PT, anyone with Carbon knowledge who faces problems when using PT will be able to dig into the source and fix it.