[Article] Design for Failure - Integration Error Handling Part 1

  • By CHATHURA KULASINGHE
  • 13 Mar, 2016

Introduction

Integration error handling is a vital concern for any organization today. Yet the concepts and general practices behind it are not widely discussed, and the relevant content that does exist is buried in web pages describing the features or capabilities of particular integration platform vendors or products. This article is the first iteration of a series intended to identify and address the challenges in this space.

This article focuses on common integration styles and the integration errors relevant to them, using a few selected scenarios as examples. Based on these scenarios, each integration error handling case is modeled using WSO2’s integration platform components. The article therefore also gives some insight into a few important concepts of the WSO2 integration platform, which will be useful when dealing with similar scenarios.


The scope of study

In the context of systems integration, there are a few well-known integration styles such as

  • Remote procedure invocation
  • Database sharing
  • File transfer
  • (and especially) Messaging

Modern integration platforms facilitate such different styles of systems integration and are equipped with combinations of common messaging system capabilities to support messaging-style integration scenarios. These capabilities include

  • Message channeling
  • Pipes and filters
  • Message routing
  • Message translation
  • Message endpoints

In other words, such messaging systems can be found within most well-known integration platforms. Depending on the integration styles and messaging systems in use, many kinds of errors and failures can occur. Unlike a custom application built exclusively for one specific point-to-point integration case, an integration platform does not have a narrow scope. Hence, integration error handling techniques differ somewhat from normal domain specific language (DSL) based exception handling mechanisms. Such error events are also difficult to avoid in any particular integration scenario. Compared to an error in a custom application, an integration error is harder for integration engineers to trace to the error-prone component, because many parties or systems are being integrated; even a perfectly performing entity can throw errors because of the communication and content published by some other connecting party or system.

These errors may depend on many factors such as

  • The selected Integration-Style
  • Internally utilized Messaging-Systems (conceptual)
  • The Messaging-Channels modeled within the flow
  • Integration-Patterns

Different combinations of the above mentioned factors may create room for different errors and failures, which makes generalization of integration errors much more difficult. Therefore, identifying and modeling patterns for handling common integration errors can help with taking necessary precautions and minimizing failures.

To start with integration error handling patterns, let’s narrow the scope to messaging-style integration.


Integration-platform, connections and broken-channels

A very basic integration scenario involves at least two existing systems and some kind of a connector/bridge/bus component, or simply, an integration platform that bridges the communication between/among these systems, performing any required transformation or translation.

Integration is all about communication. At a very high level, from the point of view of one particular connecting system (System A), there are two possible failure scenarios in which a broken communication channel prevents it from receiving a proper response.

  1. One of the connecting systems becomes unresponsive/unavailable


This can happen because of program-level errors in one of the systems or because of hardware/network failures. However, a good integration platform should be capable of handling such situations so that no message goes unnoticed and the sender system receives a proper response. In this case, the platform should either retry, deliver the message and complete the task, or publish the interrupted message to some other channel that seeks human assistance to fix the problem. Therefore, this is something that needs to be, and can easily be, handled within the integration platform.

  2. The integration-platform becomes unresponsive/unavailable


Similar to the above mentioned situation, there is a possibility of the integration platform becoming unresponsive/unavailable due to hardware/network level failures or some errors that may occur while performing a task (mediation flow errors).

Generally, the integration platform components are clustered and distributed in order to maintain high-availability. If the inbound communication channel breaks due to a network failure, the external system (System A) will not be able to invoke the functions (services/APIs etc) exposed through the integration platform. Hence, no failures would go unnoticed in that case.

Errors may occur within the integration platform while performing a mediation or while trying to communicate with a back-end system (subject to a particular mediation flow as discussed earlier). However, if a message reaches the integration platform through the inbound communication channel, it’s the integration platform’s responsibility to make sure that the flow does not break inside and an appropriate response is always delivered to the sender (System A).

Therefore, first you may consider how to maintain a healthy communication channel between the Sender system and the Integration platform.


A healthy channel - with the sender

In an integration scenario similar to the above, the response that the sender may receive can either be a response with some data or just an acknowledgement.

In a case (scenario 1) where the sender-system expects a response that contains business data, if something fails, the integration platform should respond to the sender with a proper error message and the sender system has to catch such errors and handle those accordingly.

However, in scenarios (scenario 2) where one-way communication is acceptable, the integration platform should be able to deliver an acknowledgement (such as a 202 Accepted status response in HTTP communication) to the sender, and handle the message afterwards, taking care of all the possible errors and failures.

To start with integration error handling, this iteration of the article is limited to the errors that can disrupt communication between the sender-system and the integration-platform, and to how they can be handled using WSO2’s integration platform capabilities. It covers the first scenario mentioned above, focusing on the error handling basics that one needs to be aware of.
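
As a rough sketch of scenario 2 in WSO2 ESB (Synapse) configuration, an inSequence could ask for a 202 Accepted acknowledgement and pass the message on without waiting for a reply. FORCE_SC_ACCEPTED and OUT_ONLY are existing ESB properties; the proxy name and backend address are hypothetical placeholders, and this is only one possible way to model the scenario.

```xml
<!-- Sketch only: proxy name and backend URI are hypothetical placeholders -->
<proxy xmlns="http://ws.apache.org/ns/synapse"
       name="OneWayAckProxy"
       transports="https,http"
       startOnLoad="true">
   <target>
      <inSequence>
         <!-- Ask the HTTP transport to answer the sender with 202 Accepted -->
         <property name="FORCE_SC_ACCEPTED" value="true" scope="axis2"/>
         <!-- Mark the exchange as one-way: no response is expected from the backend -->
         <property name="OUT_ONLY" value="true"/>
         <send>
            <endpoint>
               <address uri="http://xxxx:9000/services/SomeOneWayService"/>
            </endpoint>
         </send>
      </inSequence>
   </target>
</proxy>
```

Note that the acknowledgement here only tells the sender that the message was accepted; any failure that happens afterwards still has to be handled inside the platform.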


Scenario 1 - Sender expects a response with data

A response is generally expected by a sender-system in data retrieval cases. For example, if an HRM service exposes its employee information through SOAP or REST services, a getEmployeeByID operation or a GET request on /employees/{id} would be such a case. When something fails, the flow is as follows:

  1. Sender system sends a request to the integration platform
  2. Integration platform receives the message
  3. Integration platform starts processing
  4. An error occurs
  5. Integration platform responds to the sender with an error message
  6. Sender system reads the error and handles accordingly

In such cases, the sender expects a response with business information in order to perform some action with this information afterwards. If the required information is not delivered in the response, the sender system/application has to handle that accordingly.

The role of the integration platform here is to perform any additional mediation that is required and to route the request and the response to the correct end, while handling errors and sending a meaningful error message to the end from which the request originated.

Now let’s see how this could be implemented in the real world utilizing WSO2’s integration platform capabilities.


Mediation flow explained - WSO2 context

To understand a basic mediation flow in the WSO2 integration platform, it is necessary to have a basic understanding of some key terminology such as mediators and sequences. Let’s take a conveyor belt in a fresh milk factory as an example to describe this in brief.

Figure 1 below depicts a few actions performed sequentially in order to prepare a bottle of fresh milk that is ready to be sold in the market. If the empty glass bottle is considered as the incoming original message, the pieces of equipment fixed above the conveyor belt can be considered as mediators. Each action performed within a particular mediation flow is represented by a mediator. Based on the tasks commonly executed in such mediation scenarios, the WSO2 integration platform offers a set of pre-built mediators.

Figure 1

If this main conveyor belt and the sequentially arranged pieces of equipment are considered as a unit, that unit can be compared to a mediation sequence. The WSO2 integration platform allows such reusable units of actions to be developed, and these are referred to as sequences.
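
To make the analogy concrete, a reusable sequence might be sketched as below. The sequence name and the bottleStatus property are made up for illustration; only the log and property mediators themselves are standard pre-built mediators.

```xml
<!-- Sketch only: the sequence name and property values are hypothetical -->
<sequence xmlns="http://ws.apache.org/ns/synapse" name="milk-bottling-sequence">
   <!-- Mediator 1: record that the message has entered this sequence -->
   <log level="custom">
      <property name="Located" value="milk-bottling-sequence"/>
   </log>
   <!-- Mediator 2: attach a value to the message context for later steps -->
   <property name="bottleStatus" value="filled"/>
   <!-- Mediator 3: log the complete message -->
   <log level="full"/>
</sequence>
```

Each child element is one mediator on the belt, and the mediators run in the order in which they are written.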


The fault sequence

If a fault/error is detected within a particular sequence, the message currently being processed is immediately handed over by the mediation engine to another sequence called a fault sequence. In general, a particular mediation flow has three main sequences:

  1. InSequence
  2. OutSequence
  3. FaultSequence

InSequence handles the incoming requests, and the OutSequence is designed to handle the response. Reusable sequences that contain some mediation logic can be called from within these main sequences. It is also possible to compose a custom fault sequence for each reusable sequence. This way, each segment of mediation code can have its own error handling logic that is executed separately.
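
As a small sketch of what a fault sequence can see, the fragment below logs the error properties that Synapse places on the message context when a fault occurs (ERROR_CODE, ERROR_MESSAGE and ERROR_DETAIL). The property names on the left-hand side are arbitrary labels chosen for this illustration.

```xml
<!-- Sketch only: a fragment that would sit inside a proxy's <target> element -->
<faultSequence>
   <log level="custom">
      <property name="Located" value="faultSequence"/>
      <!-- Error information set by the mediation engine for the failed message -->
      <property name="ErrorCode" expression="get-property('ERROR_CODE')"/>
      <property name="ErrorMessage" expression="get-property('ERROR_MESSAGE')"/>
      <property name="ErrorDetail" expression="get-property('ERROR_DETAIL')"/>
   </log>
</faultSequence>
```

A fault sequence that only logs does not send anything back, so in a request-response scenario it would still need to build and send an error message to the sender.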

Figure 2

Now we have a fair knowledge of the basic composition of a mediation flow, especially of how and where to catch and handle errors at a very basic level.


Examples: A response with a proper error message

The example mediation below is based on a sample provided in the official WSO2 ESB documentation. In this example, we try to send a message to a non-existent backend service endpoint, which results in an error. This error is caught in the default faultSequence, and a meaningful error message is sent to the Sender system. WSO2 ESB is equipped with a mediator called the Fault mediator, which makes it easy to generate a fault message. It is used in the faultSequence of the mediation flow to build the relevant error message (see the <makefault> element in the code).


Example 1

<proxy xmlns="http://ws.apache.org/ns/synapse"
       name="FaultsTesterProxy"
       transports="https,http"
       statistics="disable"
       trace="disable"
       startOnLoad="true">
   <target>
      <inSequence>
         <log level="custom">
            <property name="Located" value="inSequence"/>
         </log>
         <log level="full"/>
         <send>
            <endpoint>
               <address uri="http://xxxx:9000/services/NonExistingService"/>
            </endpoint>
         </send>
      </inSequence>
      <outSequence>
         <log level="full"/>
         <send/>
      </outSequence>
      <faultSequence>
         <makefault version="soap11">
            <code xmlns:soap11Env="http://schemas.xmlsoap.org/soap/envelope/"
                  value="soap11Env:Server"/>
            <reason value="General Mediation Error"/>
            <detail>Some error occurred while mediating the message</detail>
         </makefault>
         <send/>
      </faultSequence>
   </target>
   <description/>
</proxy>

If you look at the child nodes of the <makefault> element, you will notice that the error reason is given as General Mediation Error, rather than as Connection Error or a detailed description of an error that occurred while connecting to the backend service endpoint. This was done on purpose: the default fault sequence catches any and all errors within the mediation flow, so it makes more sense to return a general message that fits every possible error scenario.
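
For orientation, a SOAP 1.1 fault carrying the values from Example 1 would look roughly like the following. The structure (faultcode, faultstring, detail) follows the SOAP 1.1 specification; the namespace prefix and the exact serialization produced by the ESB may differ, so treat this as an illustration rather than captured output.

```xml
<!-- Illustrative only: not captured ESB output -->
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
   <soapenv:Body>
      <soapenv:Fault>
         <!-- From the <code> element of the fault mediator -->
         <faultcode>soapenv:Server</faultcode>
         <!-- From the <reason> element -->
         <faultstring>General Mediation Error</faultstring>
         <!-- From the <detail> element -->
         <detail>Some error occurred while mediating the message</detail>
      </soapenv:Fault>
   </soapenv:Body>
</soapenv:Envelope>
```

The sender system can branch on faultcode and faultstring to decide how to handle the failure.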

The next example is an enhanced version where the Connection Error related to backend service endpoint has been caught separately and treated accordingly by returning a more specific message to the Sender system.


Example 2

See the code segment below and compare it with the previous sample. Here, instead of using the send mediator directly, we refer to another sequence called sequence-x.

<proxy xmlns="http://ws.apache.org/ns/synapse"
       name="FaultsTesterProxyEnhanced"
       transports="https,http"
       statistics="disable"
       trace="disable"
       startOnLoad="true">
   <target>
      <inSequence>
         <log level="custom">
            <property name="Located" value="inSequence"/>
         </log>
         <log level="full"/>
         <sequence key="sequence-x"/>
      </inSequence>
      <outSequence>
         <log level="full"/>
         <send/>
      </outSequence>
      <faultSequence>
         <makefault version="soap11">
            <code xmlns:soap11Env="http://schemas.xmlsoap.org/soap/envelope/"
                  value="soap11Env:Server"/>
            <reason value="General Mediation Error"/>
            <detail>Some error occurred while mediating the message</detail>
         </makefault>
         <send/>
      </faultSequence>
   </target>
   <description/>
</proxy>

Observe the code inside the previously mentioned sequence-x. This sequence contains the sending (backend communication) part of the mediation flow. Notice the onError attribute, which refers to another sequence to be executed if the sending activity fails.

<sequence name="sequence-x" onError="sequence-x-error" xmlns="http://ws.apache.org/ns/synapse">
    <send>
        <endpoint>
            <address uri="http://xxxx:9000/services/NonExistingService"/>
        </endpoint>
    </send>
</sequence>

Now a fault mediator has been placed inside the sequence-x-error sequence. This time, however, the error reason and the detail carry more specific information, because we know that this custom fault sequence called sequence-x-error is executed only when something goes wrong with the backend service endpoint/connection.

<sequence name="sequence-x-error" xmlns="http://ws.apache.org/ns/synapse">
    <makefault version="soap11">
        <code value="soap11Env:Server" xmlns:soap11Env="http://schemas.xmlsoap.org/soap/envelope/"/>
        <reason value="Connection Error"/>
        <role/>
        <detail>Error occurred while connecting to the backend service.</detail>
    </makefault>
    <send/>
</sequence>

Following the same technique, all the possible errors could be handled precisely within the WSO2 integration platform and a meaningful error message can be delivered to the Sender system so it could handle each error accordingly.
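
One way to extend the technique, sketched here under assumptions, is to branch on the ERROR_CODE property inside a custom fault sequence. The sequence name is hypothetical, and the numeric codes (101503 for a failed connection, 101504 for a connection timeout) are assumed transport error codes that should be verified against the documentation of the ESB version in use.

```xml
<!-- Sketch only: sequence name is hypothetical; verify the error codes for your ESB version -->
<sequence xmlns="http://ws.apache.org/ns/synapse" name="sequence-y-error">
   <switch source="get-property('ERROR_CODE')">
      <!-- Assumed code for "connection failed" -->
      <case regex="101503">
         <makefault version="soap11">
            <code xmlns:soap11Env="http://schemas.xmlsoap.org/soap/envelope/"
                  value="soap11Env:Server"/>
            <reason value="Connection Error"/>
            <detail>Error occurred while connecting to the backend service.</detail>
         </makefault>
      </case>
      <!-- Assumed code for "connection timed out" -->
      <case regex="101504">
         <makefault version="soap11">
            <code xmlns:soap11Env="http://schemas.xmlsoap.org/soap/envelope/"
                  value="soap11Env:Server"/>
            <reason value="Connection Timeout"/>
            <detail>The backend service did not respond in time.</detail>
         </makefault>
      </case>
      <!-- Anything else falls back to the general message used in Example 1 -->
      <default>
         <makefault version="soap11">
            <code xmlns:soap11Env="http://schemas.xmlsoap.org/soap/envelope/"
                  value="soap11Env:Server"/>
            <reason value="General Mediation Error"/>
            <detail>Some error occurred while mediating the message</detail>
         </makefault>
      </default>
   </switch>
   <!-- Return whichever fault was built to the sender -->
   <send/>
</sequence>
```

Keeping the general message as the default case means an unexpected error never reaches the sender without a response.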


Conclusion

In this article we briefly discussed integration styles and patterns, the qualities of modern integration platforms, messaging-style integration, and the possible integration errors at a high level. Taking the WSO2 integration platform as an example, we presented two examples with the relevant mediation logic. They showed how to handle integration errors that may occur in communication between the sender-system and the integration-platform in a scenario where the sender-system expects a response with data. In addition, the basics of designing and modeling mediation flows were discussed in order to make integration error handling simpler to approach, with a view to using that knowledge in future iterations of this article series.

The objective of this article series is to approach and address integration error handling in a more methodical manner. As the very first step, we therefore started from the sender-system’s end and the errors that may be experienced from the sender-system’s point of view. In the next iteration of this article series, we will discuss the second scenario, which is relevant to cases where one-way communication is acceptable. Once we have made sure that a healthy communication channel is properly established between the sender-system and the integration platform, we will be able to discuss the rest of the possible error scenarios and how to handle them within the integration platform.

Update: The second part of this article is available as Design for Failure - Integration Error Handling Part 2

About Author

  • CHATHURA KULASINGHE
  • Lead - Solutions Engineer
  • WSO2 UK Limited