Reliable Messaging for (IoT) Device Management

  • By Geeth Munasinghe
  • 24 Oct, 2017


Delivering mission-critical messages plays an important role in any device management framework. These messages can carry vital configuration changes and operation instructions. Many of these messages only need to be delivered once. Getting feedback once the message is delivered also plays a significant role as it’s the only way a device management framework knows that the message has reached the device.

WSO2’s IoT architecture supports integration with popular message delivery frameworks, such as APNS (Apple Push Notification Service), FCM (Firebase Cloud Messaging from Google), WNS (Windows Push Notification Services), and MQTT. It also has extension points for integrating with other such services.

This article discusses some of the important concepts in reliable messaging for device management and specific implementation patterns of WSO2 IoT Server with the above-mentioned message delivery services.

What is Device Management?

Devices come in many forms. They range from the computers we use in our day-to-day work, mobile phones, tablets, factory machinery, and home appliances to relatively small pieces of hardware that allow bigger machines to be developed. Regardless of physical appearance or size, device management is a common need for all of these devices.

Device management is the process of managing a device and its operations and functions. It covers the maintenance of physical and/or virtual devices and includes various administrative tasks and processes that keep devices up to date and compliant with protocols.

Some typical examples of device management include:

  1. Installing drivers/firmware and applications to the devices
  2. Configuring the devices to adhere to company protocols and policies
  3. Implementing security measures to prevent compromising situations
  4. Monitoring and analyzing device data to get real-time insights

For devices to comply with rules and regulations, they must receive this information in a reliable manner. This is the requirement that reliable and smart messaging (or operation management) addresses.

Operation Management

What is an Operation?

Device management involves adhering to enterprise policies, security protocols, and upgrading/updating applications/firmware. In order to achieve this, the device must receive the relevant information in a readable format or in its machine language. This relevant information can be sent as a payload to the device in many different ways. Moreover, this information may contain some type of command that instructs the device on how to behave or some type of configuration that makes the device adhere to enterprise policies and security protocols. This could even contain locations of files or system/applications upgrade commands with the required details.

Significance of Operation Management

A sound operation management capability should have the following properties:

  1. Fail-safe - Ability to deliver the message irrespective of failures that could be caused by external factors, such as network and system crashes
  2. Security - This applies to message initiation, delivery and persistence; the entire flow has to be secured
  3. Integrity - Ensuring only intended parties receive the messages
  4. Adaptability - Ability to send any type of message regardless of size or content to any device over any transport
  5. Nonrepeating - Not sending the same message recurrently
  6. Sequential executions - The sequence of operations added to the server is the same as the execution order in the device
  7. Traceability - Recording who initiated the operation, when it was completed, what the response was, and which device state changes resulted
  8. Scalability - As the number of devices increases, system performance should not degrade and delivery should still complete. The system should scale both horizontally and vertically

Anatomy of the Operation

The Operation consists of two parts:

  1. Operation identifier
  2. Operation payload

Operation Identifier

The operation identifier is a unique string that distinguishes one operation from another. This identifier depends on the device type and is interpreted by the operating system that runs on the device. Sometimes, the operation identifier is itself the command to execute on the device, such as “RING” on Android devices. When a device receives this operation identifier, it starts ringing. This type of operation does not carry an operation payload. In the WSO2 IoT Platform, the operation identifier is called the operation code.

Operation Payload

The operation payload is used when configuration or setting changes are required on the device. The payload is always associated with an operation identifier (operation code), which uniquely identifies what the payload is for. For example, to send a Wi-Fi configuration to the device, we set the SSID and the password in the payload and set the operation code to Wi-Fi. When the device receives the operation, it interprets the payload as a Wi-Fi configuration to apply.
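The two-part structure described above can be sketched as a small data type. This is an illustration only, not the WSO2 data model: the class name, the "WIFI" code spelling, and the payload keys "ssid" and "password" are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class Operation:
    """An operation: a mandatory code plus an optional payload."""

    code: str                                 # operation identifier (operation code)
    payload: Optional[Dict[str, Any]] = None  # None for operations such as RING

    def has_payload(self) -> bool:
        return self.payload is not None


# A command that is fully described by its code.
ring = Operation(code="RING")

# A configuration change: the code tells the device what the payload is for.
# The code "WIFI" and the payload keys are hypothetical names.
wifi = Operation(code="WIFI", payload={"ssid": "office-net", "password": "s3cret"})
```

Keeping the payload optional lets the same type represent both kinds of operation, so generic code can store and forward operations without inspecting their content.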

Operation Types

Conceptually, operations can be classified into three types. The classification is based on the syntax of the operations. These are not absolute rules, but they can be referenced for ease of understanding and implementation. The operation payload must be able to carry any type of content.

  1. Command Operation
  2. Profile Operation
  3. Policy Operation

Command Operation

As mentioned above, a command operation consists only of an operation identifier (operation code). These operations execute simple functions on the device that do not require a payload. For instance, REBOOT is a command operation in Android: it carries no payload, and the device reboots as soon as it receives it.

Profile Operation

The profile operation is composed of a single operation identifier (operation code) and a payload. These operations execute comparatively complex functions that require a payload. For instance, VPN is a profile operation in iOS that requires a payload with specific parameters.

Policy Operation

The policy operation is used to send a policy to a device. This type of operation is used when the device has to conform to the enterprise’s guidelines and protocols. Its operation identifier is always “POLICY_BUNDLE”, and the payload contains the instructions that need to be enforced on the device.

A policy, in simple terms, is a collection of operations with a rule set. The collection includes different operations with payloads, most of which are profile operations. For example, it could include Wi-Fi and VPN configurations, device restrictions, and encryption settings. The payload does not contain operations that make no state change, such as RING or REBOOT.

The rule set defines which devices should have this policy. Rules can have different criteria, such as device type, device ownership (COPE, BYOD, CPE), device groups, users, and user roles. These rules govern how the policy is evaluated when a policy has to be applied to a device. When calculating the effective policy for a device, device metadata and generic information are evaluated against the policy rules. If the policy rules match the device metadata, that policy is chosen as the effective policy for the device. The effective policy is then added as a policy operation.
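The rule evaluation described above can be sketched as follows. This is a simplified illustration, not WSO2's policy engine: the `Policy` type, the exact-match rule semantics, and the use of list order as priority are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Policy:
    name: str
    rules: Dict[str, str]                       # criterion -> required value
    operations: List[dict] = field(default_factory=list)


def matches(policy: Policy, device: Dict[str, str]) -> bool:
    """A policy matches when every rule agrees with the device metadata."""
    return all(device.get(key) == value for key, value in policy.rules.items())


def effective_policy(policies: List[Policy], device: Dict[str, str]) -> Optional[Policy]:
    """Return the first matching policy (treating list order as priority is an assumption)."""
    for policy in policies:
        if matches(policy, device):
            return policy
    return None


def to_policy_operation(policy: Policy) -> dict:
    """Wrap the policy's operations in a single POLICY_BUNDLE operation."""
    return {"code": "POLICY_BUNDLE", "payload": policy.operations}


policies = [
    Policy("byod-android", {"type": "android", "ownership": "BYOD"},
           [{"code": "WIFI", "payload": {"ssid": "guest-net"}}]),
    Policy("cope-any", {"ownership": "COPE"},
           [{"code": "VPN", "payload": {"server": "vpn.example.com"}}]),
]
device = {"type": "android", "ownership": "COPE", "user": "alice"}

chosen = effective_policy(policies, device)   # the "cope-any" policy
operation = to_policy_operation(chosen)
```

The device metadata may carry more fields than any rule mentions; only the criteria named in a policy's rules take part in the match.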

Operation States

When an operation is added to the system, it goes through what can be called the operation life cycle.

Figure 1

As shown in Figure 1, the operation starts in the “PENDING” state. When the operation is received by the device and executed successfully, it becomes “COMPLETED”. If the device obtains the operation but the execution is unsuccessful, it is marked as “ERROR”. The “CANCELLED” state is not yet available in WSO2 IoT Server but will be added soon. This state is useful because when a device is forcefully removed from the system or becomes inactive due to some type of failure (network failure, device breakdown, etc.), all “PENDING” operations for that device can be marked “CANCELLED”.

The “IN PROGRESS” state is used when the device has obtained the operation but has not finished executing it. For instance, consider an operation for installing an application. Even though the device has received the operation, it cannot be marked as COMPLETED until the application is installed on the device, and the download from the remote server may take hours. Therefore, while the application is downloading, the operation is marked as IN PROGRESS to indicate that it is still being executed.

The “REPEATED” state is assigned when an operation is added while another operation of the same type is still pending. This state applies only to command operations. The older operation is marked as REPEATED. For instance, if a RING (command operation) is added while another RING operation is pending for a particular device, the older operation changes to the REPEATED state. The reason for this state is that executing such an operation repeatedly does not alter the state of the device, so delivering every copy would only waste resources.
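The REPEATED rule can be sketched in a few lines. This is an illustrative model under stated assumptions, not the server's implementation: the `QueuedOperation` class and `add_operation` function are hypothetical names, and a command operation is recognised here simply by its missing payload.

```python
from enum import Enum
from typing import List, Optional


class Status(Enum):
    PENDING = "PENDING"
    IN_PROGRESS = "IN_PROGRESS"
    COMPLETED = "COMPLETED"
    ERROR = "ERROR"
    REPEATED = "REPEATED"


class QueuedOperation:
    def __init__(self, code: str, payload: Optional[dict] = None) -> None:
        self.code = code
        self.payload = payload
        self.status = Status.PENDING   # every operation starts as PENDING


def add_operation(queue: List[QueuedOperation], new_op: QueuedOperation) -> None:
    """Append new_op; an older pending command with the same code becomes REPEATED."""
    if new_op.payload is None:  # only command operations are de-duplicated
        for op in queue:
            if (op.code == new_op.code and op.payload is None
                    and op.status is Status.PENDING):
                op.status = Status.REPEATED
    queue.append(new_op)


queue: List[QueuedOperation] = []
add_operation(queue, QueuedOperation("RING"))
add_operation(queue, QueuedOperation("RING"))                      # older RING -> REPEATED
add_operation(queue, QueuedOperation("WIFI", {"ssid": "office"}))  # has a payload, never REPEATED
statuses = [op.status for op in queue]
```

Only the newest RING remains pending, so the device rings once no matter how many times the command was queued before it connected.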

How Operations are Added

Operations are added through the exposed APIs. As each device type has different types of operations, the most common device types are given concrete implementations; for example, Android has its own APIs, and so does iOS.

There is a base operation management component that controls storing operations, selecting the appropriate delivery mechanism, scheduling delivery time, and so on. It is written in a generic way, so it understands the syntax of an operation but not its semantic meaning. The syntax is the structure of the operation, while the semantics are supplied by the respective device type plugins and APIs. In other words, the operation management core knows that an operation must include an operation code (operation identifier) and an operation payload, which is absent for command operations.

As the operation code changes from operation to operation, so does the operation payload depending on the device type. For instance, the operation to switch off the camera on an Android device is different from that of iOS, and the VPN payload for Android is different from that for iOS. In these scenarios, the semantics of the operation are given by the respective device type plugins and APIs, such as Android and iOS. As the operation management core does not understand the payload, the respective device type APIs and plugins translate the operation into a device-readable format. For Android, this is just converting the operation payload to a JSON string, but for iOS it is more involved because the payload has to be converted to a property list (plist).
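The division of labour between a generic core and per-device-type translators can be sketched as below. This is a minimal illustration using Python's standard `json` and `plistlib` modules. The registry, the function names, and the `code`/`payload` wrapper keys are assumptions for the example, and the plist produced here does not follow Apple's actual MDM command schema.

```python
import json
import plistlib
from typing import Callable, Dict, Union

Encoded = Union[str, bytes]


def to_android(code: str, payload: dict) -> str:
    """Android plugin: serialise the operation as a JSON string."""
    return json.dumps({"code": code, "payload": payload}, sort_keys=True)


def to_ios(code: str, payload: dict) -> bytes:
    """iOS plugin: serialise the operation as an XML property list."""
    return plistlib.dumps({"code": code, "payload": payload})


# The core only knows that a translator exists for each device type.
TRANSLATORS: Dict[str, Callable[[str, dict], Encoded]] = {
    "android": to_android,
    "ios": to_ios,
}


def translate(device_type: str, code: str, payload: dict) -> Encoded:
    """Generic core: pick the plugin by device type without reading the payload."""
    try:
        translator = TRANSLATORS[device_type]
    except KeyError:
        raise ValueError(f"no plugin registered for device type {device_type!r}")
    return translator(code, payload)


vpn = {"server": "vpn.example.com", "type": "IKEv2"}
android_message = translate("android", "VPN", vpn)   # JSON text
ios_message = translate("ios", "VPN", vpn)           # plist bytes
```

Adding support for a new device type then means registering one more translator, with no change to the core.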

When an operation is added, either from a UI or through the respective API, it creates an entry in the database. This entry is platform independent and is therefore generally stored as a JSON string.

How Does a Device Receive an Operation?

A device always receives an operation over the network unless the device is virtual. For the device to obtain the operation, an agent should run on the device that can translate content sent from the server. This agent could be the firmware that runs on the device hardware or a software agent that runs on the firmware.

As explained above, when the operation is added, it is saved in the database of the server for reliability, and this operation should be transported to the devices using a reliable and smart technique.

A device can receive operations reliably through the following three mechanisms:

  1. Device polling (Polling Mechanism)
  2. Device notification (Notification Mechanism)
  3. Device wake up (Wake up Mechanism)

Polling Mechanism

Figure 2

As shown in Figure 2, the device polls the server at a configured interval. When an operation is added to the system, it is in the pending state. When the device polls, it sends a request to retrieve the pending operations from the system. This request is a reverse HTTP call, meaning that the request itself carries a payload with the results of previously executed operations. In other words, the responses to previously retrieved operations travel with the next request for pending operations.
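The piggybacking of results on the next poll can be sketched with an in-memory stand-in for the server endpoint. This is a simplified model, not the WSO2 REST API: the class and method names are hypothetical, no HTTP is involved, and marking an operation IN_PROGRESS as soon as it is handed out is an assumption made to avoid delivering it twice.

```python
from typing import Dict, List, Optional


class OperationServer:
    """In-memory stand-in for the server's pending-operations endpoint."""

    def __init__(self) -> None:
        self.operations: Dict[int, dict] = {}
        self._next_id = 1

    def add(self, code: str, payload: Optional[dict] = None) -> int:
        op_id = self._next_id
        self._next_id += 1
        self.operations[op_id] = {"id": op_id, "code": code,
                                  "payload": payload, "status": "PENDING"}
        return op_id

    def poll(self, previous_results: List[dict]) -> List[dict]:
        """Record results from the last cycle, then hand out pending operations."""
        for result in previous_results:
            self.operations[result["id"]]["status"] = result["status"]
        batch = [op for op in self.operations.values() if op["status"] == "PENDING"]
        for op in batch:
            op["status"] = "IN_PROGRESS"   # assumption: delivered, awaiting a result
        return [dict(op) for op in batch]


class PollingDevice:
    def __init__(self, server: OperationServer) -> None:
        self.server = server
        self.unreported: List[dict] = []   # results waiting for the next poll

    def poll_once(self) -> List[dict]:
        operations = self.server.poll(self.unreported)
        # "Execute" each operation and keep the result for the next request.
        self.unreported = [{"id": op["id"], "status": "COMPLETED"} for op in operations]
        return operations


server = OperationServer()
device = PollingDevice(server)
server.add("RING")
first_batch = device.poll_once()    # receives RING, reports nothing yet
server.add("REBOOT")
second_batch = device.poll_once()   # reports RING as COMPLETED, receives REBOOT
```

The result of an operation reaches the server one polling interval after the operation was delivered, which is the cost of folding the response into the next request.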

Notification Mechanism

Figure 3

As illustrated in Figure 3, the operations are delivered through a push mechanism from the IoT server to the device. In this scenario, we use a message broker that connects to the devices through a secure channel. When an operation is added to the IoT server, the server sends the entire operation to the message broker; the message broker, in turn, sends the operation to the device. When the device completes the execution of the operation, it sends the response back to the message broker, which then pushes the response to the IoT server.
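The round trip through the broker can be sketched with a tiny in-memory publish/subscribe broker. This is not an MQTT client and does not reflect WSO2's topic layout: the `Broker` class, the topic names `devices/<id>/operations` and `devices/<id>/responses`, and the synchronous delivery are all assumptions made for the example.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List, Optional


class Broker:
    """Minimal synchronous publish/subscribe broker."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> None:
        for handler in list(self._subscribers[topic]):
            handler(message)


class PushServer:
    def __init__(self, broker: Broker, device_id: str) -> None:
        self.broker = broker
        self.device_id = device_id
        self.status: Dict[int, str] = {}
        broker.subscribe(f"devices/{device_id}/responses", self._on_response)

    def add_operation(self, op_id: int, code: str, payload: Optional[dict] = None) -> None:
        self.status[op_id] = "PENDING"
        # The whole operation travels through the broker to the device.
        self.broker.publish(f"devices/{self.device_id}/operations",
                            {"id": op_id, "code": code, "payload": payload})

    def _on_response(self, message: dict) -> None:
        self.status[message["id"]] = message["status"]


class PushDevice:
    def __init__(self, broker: Broker, device_id: str) -> None:
        self.broker = broker
        self.device_id = device_id
        self.executed: List[str] = []
        broker.subscribe(f"devices/{device_id}/operations", self._on_operation)

    def _on_operation(self, operation: dict) -> None:
        self.executed.append(operation["code"])   # "execute" the operation
        self.broker.publish(f"devices/{self.device_id}/responses",
                            {"id": operation["id"], "status": "COMPLETED"})


broker = Broker()
server = PushServer(broker, "device-1")
device = PushDevice(broker, "device-1")
server.add_operation(1, "RING")
```

Because this toy broker delivers synchronously, the operation is already COMPLETED when `add_operation` returns; with a real broker the response would arrive asynchronously.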

Wake-up Mechanism

Figure 4

Due to constraints of some message brokers, a wake-up notification mechanism is needed. It combines the two previous mechanisms, polling and notification. Some message brokers, such as APNS (Apple Push Notification Service) and FCM (Firebase Cloud Messaging), limit the message size. As a result, the IoT server cannot rely on these message brokers to carry the operation itself to the device.

Therefore, when an operation is added to the IoT server, the message broker is invoked to notify the device about a pending operation. This is a wake-up call to the device, and as soon as it is received, the device sends a request to retrieve the pending operations. This request from the device is also a reverse HTTP call, as it sends the responses of the previous operations in the same call.
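The wake-up flow can be sketched as follows: the push channel carries only a tiny signal, and the operation itself is fetched by the device. This is a simplified in-memory model. The class names, the `{"type": "WAKE_UP"}` message, and the `INSTALL_APPLICATION` operation code are hypothetical, and the push service is replaced by a direct function call.

```python
import json
from typing import Callable, Dict, List


class WakeUpServer:
    def __init__(self, push: Callable[[str, bytes], None]) -> None:
        self._push = push                        # stand-in for APNS/FCM/WNS
        self._pending: Dict[str, List[dict]] = {}
        self.results: List[dict] = []
        self.pushed_sizes: List[int] = []

    def add_operation(self, device_id: str, operation: dict) -> None:
        self._pending.setdefault(device_id, []).append(operation)
        # Only a small wake-up signal goes through the push service.
        wake_up = json.dumps({"type": "WAKE_UP"}).encode("utf-8")
        self.pushed_sizes.append(len(wake_up))
        self._push(device_id, wake_up)

    def fetch_pending(self, device_id: str, previous_results: List[dict]) -> List[dict]:
        """Called by the device: accept earlier results, return pending operations."""
        self.results.extend(previous_results)
        return self._pending.pop(device_id, [])


class WakeUpDevice:
    def __init__(self, device_id: str) -> None:
        self.device_id = device_id
        self.server = None                       # set after the server is created
        self.executed: List[str] = []
        self.unreported: List[dict] = []

    def on_wake_up(self, message: bytes) -> None:
        operations = self.server.fetch_pending(self.device_id, self.unreported)
        self.executed.extend(op["code"] for op in operations)
        self.unreported = [{"id": op["id"], "status": "COMPLETED"} for op in operations]


device = WakeUpDevice("device-1")
server = WakeUpServer(push=lambda device_id, message: device.on_wake_up(message))
device.server = server

big_operation = {"id": 1, "code": "INSTALL_APPLICATION",
                 "payload": {"blob": "x" * 10_000}}
server.add_operation("device-1", big_operation)
server.add_operation("device-1", {"id": 2, "code": "RING", "payload": None})
```

The pushed message stays the same size regardless of how large the operation payload is, which is what makes size-limited push services usable here.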

Notification Methods

As explained above, both the notification and wake-up mechanisms use push notification methods to exchange messages between the server and the device. For the notification mechanism, the WSO2 IoT Platform includes a message broker based on the MQTT protocol, which carries communication in both directions between the server and the device.

The other notification mechanisms supported by WSO2 IoT Server are APNS, GCM/FCM, WNS, and HTTP. These are used in the wake-up mechanism to prompt the device to retrieve pending operations. WSO2 IoT Server provides extension points for adding new push notification mechanisms if required, which makes it a good fit for an enterprise that needs to write its own custom device type and custom push notification mechanism.

Operation Responses

When the device completes an operation, it may generate a response. When the device requests the next pending operations, it sends the response of the previous operation in the request payload. For instance, if the executed operation is DEVICE_INFO, the device is obligated to send its information, both real-time and static, with the response.

Analytics with Operation Responses

Responses to operations are the most important input for understanding the system state, as they provide vital information about the device’s location, status, and firmware/application status, among others. WSO2 IoT Server provides capabilities to analyze this information in both real time and batch mode. This helps to summarize the data and draw conclusions from it. Moreover, real-time analytics lets the system react to compromising situations within milliseconds.

Implementing Reliable Messaging for Android Powered Devices

Management of Android-powered devices is implemented with the above concepts in mind. It supports operation management with both polling and wake-up mechanisms for mobile device management. For Android-powered IoT devices, the WSO2 IoT Platform supports all notification mechanisms for reliable communication between the server and devices.

In Android, we have implemented a custom agent that handles device enrollment and communications. This agent is responsible for executing operations on the device, monitoring it, and updating firmware and applications.

Polling with Android

The polling mechanism is the one mostly used for Android mobile devices with WSO2 IoT Server. By default, the Android agent is configured to poll WSO2 IoT Server every minute. The server URL and other related configurations are provided when the device is enrolled.

The Android device keeps polling at a secured REST endpoint to obtain the pending operations. At every request, the device sends the results of the last operation executions. Therefore, this request is a reverse HTTP call.

Wake-up with Android

Although the Android agent is configured for polling by default, this can be changed to the wake-up mechanism at enrollment time. With this configuration, server-to-device communication happens through a broker.

Figure 5

When an operation is added against the Android device, it’s recorded in the database. Simultaneously, a notification is sent to the Firebase Cloud Messaging (FCM) server to alert the device, and the device then calls a predefined, secured REST endpoint to retrieve the pending operations. This is also a reverse HTTP call that carries the execution results of the previous operations in its payload.

Notification with Android

Beyond a standard enterprise mobility management (EMM) solution for Android mobile devices, WSO2 IoT Server also has all the components in place to use Android mobile devices as IoT sensors, which lets users evaluate its IoT offerings from a single point. This particular Android agent turns the Android mobile device into a collection of sensors, effectively making the device act as a thermometer, barometer, speedometer, etc. All of these events are sent to the analytics server through the broker for real-time and smart analytics.

Figure 6

As shown in Figure 6, when an operation is added through the admin APIs, the operation management core stores it in the database. At the same time, the message broker is alerted to send the operation to the device. When the device receives the operation, it executes it and sends the result back to the message broker. The message broker then sends the response back to the IoT server. This is the standard operation flow.

However, because this device can also sense and collect data from its sensors, all the events it generates are sent to an analytics server through the message broker as well. The analytics server can process, summarize, and analyze data in real time and batch mode; these events therefore go through the analytics process to identify patterns for decision making, fraud detection, and anomaly detection, as well as to generate reports.

Implementing Reliable Messaging for iOS Devices

Wake-up with iOS

WSO2 IoT Server supports iOS device management with a wake-up mechanism. As shown in Figure 7, when an operation is added through the admin APIs, it is stored in the database. Simultaneously, the server sends a wake-up message to the Apple Push Notification Service (APNS), which in turn delivers it to the iOS devices. As soon as the device receives the wake-up notification, it establishes a connection to the IoT server to retrieve the pending operations.

Figure 7

iOS devices have a built-in agent for most operations, integrated into the operating system by design. This built-in agent works differently from the Android agent, which we built on top of the Android system. Whereas the Android agent can retrieve all the operations in a single call, the iOS agent makes a separate call for every pending operation queued against the device. If there are 10 pending operations, it takes 11 requests to complete the entire process. The first 10 requests retrieve the operations one by one. When no operations remain, the device requires a 400 response code to stop its request flow. That is how Apple designed the iOS agent; hence the 11th request receives a 400 response code from WSO2 IoT Server.
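The request count described above can be checked with a small simulation. This is an illustrative model only: `IosEndpoint` and `drain` are hypothetical names, the status codes are plain integers rather than real HTTP responses, and 200 for a delivered operation is an assumption.

```python
from collections import deque
from typing import Deque, Iterable, List, Optional, Tuple


class IosEndpoint:
    """Server side: hands out one pending operation per request."""

    def __init__(self, operations: Iterable[str]) -> None:
        self._queue: Deque[str] = deque(operations)

    def next_operation(self) -> Tuple[int, Optional[str]]:
        if self._queue:
            return 200, self._queue.popleft()
        return 400, None   # signals the device to stop asking


def drain(endpoint: IosEndpoint) -> Tuple[int, List[str]]:
    """Device side: keep requesting until the server answers 400."""
    requests = 0
    received: List[str] = []
    while True:
        status, operation = endpoint.next_operation()
        requests += 1
        if status == 400:
            return requests, received
        received.append(operation)


endpoint = IosEndpoint(f"OPERATION_{i}" for i in range(1, 11))   # 10 pending operations
request_count, received = drain(endpoint)
```

With 10 pending operations, `request_count` is 11: ten requests that each return one operation, plus the final request that receives the 400 response.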

Notification with iOS

Although the built-in iOS agent handles all major functionalities, there are a few operations that it cannot perform. Retrieving the device location is one of them. Therefore, WSO2 IoT Server provides a custom iOS agent for such operations. This custom agent supports the DEVICE_LOCATION, NOTIFICATION, and RING operations.

As shown in Figure 8, when the operation is added through the admin APIs, a record is stored in the database, and simultaneously the operation is sent to the APNS server. The APNS server then delivers it to the iOS devices. The custom agent installed on the device receives the delivery and executes the operation. The results of the execution are sent to WSO2 IoT Server through exposed APIs related to each operation.

Figure 8

None of the aforementioned operations are added to the pending operations list, because that list is retrieved only by the built-in agent. These operations take a different path in their life cycles: as soon as they are added to the server and delivered to the APNS service, they are marked as completed. The operation response may nevertheless take some time to arrive, as there is no guarantee of when the APNS server will deliver these operations to the device.
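The two iOS paths can be contrasted in a short routing sketch. This is a simplification under stated assumptions: `route_ios_operation` is a hypothetical function, the push service is represented by a plain callable, and the `"WAKE_UP"` marker stands in for the wake-up notification sent for built-in agent operations.

```python
from typing import Callable, List

# Operations handled by the custom iOS agent, as listed in the text.
CUSTOM_AGENT_CODES = {"DEVICE_LOCATION", "NOTIFICATION", "RING"}


def route_ios_operation(code: str, pending: List[str],
                        push: Callable[[str], None]) -> str:
    """Return the status the operation has right after it is added."""
    if code in CUSTOM_AGENT_CODES:
        push(code)            # the operation itself goes to the push service
        return "COMPLETED"    # marked completed on hand-off, before any response
    pending.append(code)      # built-in agent: queue it, then wake the device
    push("WAKE_UP")
    return "PENDING"


pending: List[str] = []
pushed: List[str] = []
ring_status = route_ios_operation("RING", pending, pushed.append)
vpn_status = route_ios_operation("VPN", pending, pushed.append)
```

Here RING never enters the pending list, while VPN waits there until the built-in agent retrieves it.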

Furthermore, the aforementioned operations use JSON as the payload message type, whereas the other operations, supported by the built-in agent, use proprietary message types (application/x-apple-aspen-mdm, application/x-apple-aspen-mdm-checkin). In addition, the built-in agent uses a mutual SSL-based authentication and authorization mechanism, while the custom agent uses OAuth for the same purpose.

Implementing Reliable Messaging for Windows Devices

WSO2 IoT Server supports Windows device management as a core functionality, in a way somewhat similar to how iOS devices are managed. Windows devices come with a built-in software agent that governs the complete management capabilities. This agent is an OMA DM (Open Mobile Alliance Device Management) client that uses SyncML (Synchronization Markup Language) as the communication protocol between devices and the server. Unlike iOS, Windows devices do not have a custom-built WSO2 agent at the moment.

Windows devices are supported through both the polling and wake-up mechanisms.

Polling with Windows

Polling with Windows works the same way as the Android polling mechanism. When an operation is added, a record of it goes to the database. At enrollment, the device receives a configuration that governs the polling interval. The device then polls at the configured interval and receives the operation. The next polling request includes the execution results of this operation, whether it succeeded or produced an error.

Wake-up with Windows

Windows also supports a wake-up notification mechanism that sends a message to the device to retrieve the pending operations.

Figure 9

As shown in Figure 9, a user adds an operation through the admin APIs and it is recorded in the database against the device. Simultaneously, the WNS notification adaptor is invoked to send a wake-up call to the device. When WNS (Windows Push Notification Services) receives the message, it relays it to the device. As soon as the device receives the wake-up call, it starts calling the pre-configured endpoint to retrieve pending operations, and every request carries the results of the previously executed operations in its payload.


The WSO2 IoT platform is a comprehensive open source offering that covers device management with a smart and reliable messaging implementation; it enables the management of devices and applications with enterprise-grade capabilities. WSO2 IoT Server integrates with well-known message delivery frameworks, such as APNS, FCM, WNS, and MQTT, securely and with fail-safe mechanisms in place. It also has many extension points for plugging in new delivery protocols and implementations.