2015/05/08
8 May, 2015

[Article] Fast Topic Matching Algorithm Implementation For WSO2 Message Broker

  • WSO2 Team
  • Content Writer - WSO2
Archived Content
This article is provided for historical perspective only, and may not reflect current conditions. Please refer to relevant product page for more up-to-date product information and resources.

Contents

  1. Introduction
  2. Description of the current topic matching algorithm used
  3. Introduction to the inverted bit map algorithm
  4. Optimized topic routing algorithm
  5. Important implementation details
  6. Conclusion

Introduction

Topic matching refers to the process of finding out the subscriptions to which a particular message should be sent. According to the Advanced Message Queueing Protocol (AMQP) specification, topic matching is defined as “topic exchange type”.

According to the AMQP specification topic exchange type works as follows;

  1. A message queue binds to the exchange using a routing pattern - P
  2. A publisher sends the exchange a message with the routing key - R
  3. The message is passed to the message queue if R matches P

The AMQP specification puts the following restrictions on the routing key and routing pattern;

  1. The routing key used for a topic exchange must consist of zero or more words delimited by dots. Each word may contain the letters A-Z and a-z and digits 0-9.
  2. The routing pattern follows the same rules as the routing key with the addition that * matches a single word and # matches zero or more words.

For MQTT subscriptions, two wildcard characters are supported;

  1. A '#' character represents a complete subtree of the hierarchy and thus must be the last character in a subscription topic string, such as SENSOR/#. This will match any topic starting with SENSOR/, such as SENSOR/1/TEMP and SENSOR/2/HUMIDITY.
  2. A '+' character represents a single level of the hierarchy and is used between delimiters. For example, SENSOR/+/TEMP will match SENSOR/1/TEMP and SENSOR/2/TEMP.

The WSO2 Message Broker currently supports AMQP and MQTT specified wildcards for each protocol;

  1. Topic only (E.g. WSO2.Products)
  2. Immediate child of the topic (E.g. for AMQP: WSO2.Products.*)
  3. Topic and its children (E.g. for AMQP: WSO2.Products.#)

Description of the current topic matching algorithm

Currently topic matching is implemented by using the conventional topic matching method. Following are the details regarding the implementation of the topic matching in WSO2 Message Broker;

  1. A new subscription is created by subscribing to a topic.
  2. The new subscription is added to the list of subscriptions which contains the same routing key.
  3. This list is attached to a hash map where the key of the hash map is the routing key.
  4. Whenever a message comes with a specific routing pattern the WSO2 Message Broker iterates through each key in the hash map.
  5. For each key in the hash map which is a routing key, it is matched against the routing pattern of the message. The matching is done as a string matching.
  6. The matching subscriptions are fully found after going through the hashmap of the subscriptions fully.

If there are n subscriptions with different routing keys, the time complexity to perform the topic matching will be O(n). 10000 subscriptions with different routing keys will have a time complexity of O(10000). This is a bottleneck for performance.

In this algorithm the searching efficiency of the hash map cannot be used since the AMQP allows wildcard characters in the routing key but not in the routing pattern. Since the wildcard characters can represent any words, we cannot predict the possible keys initially. Because of this, the iteration is done throughout the hash map in order to find the matching subscriptions for the particular routing.

This model is acceptable for a system with a few number of subscriptions. But in a real world scenario, systems will normally have thousands of subscriptions and topics. Iterating through all these subscriptions imposes a bad impact on the performance of the message broker.

In order to overcome this bottleneck the inverted bitmap algorithm can be used. The following section describes the inverted bit map algorithm.

Introduction to the inverted bitmap algorithm

The following assumptions are made for the inverted bitmap algorithm;

  1. The data is rarely changed
  2. Searches are more frequent than the changes in the data
  3. Searches are finite

Inverted bitmap Algorithm have two major processes;

  1. Indexing process
  2. Matching process

Indexing Process

This process includes the mapping of possible match requests with the subscriptions. The subscriptions will initially be numbered from zero to (no. of subscriptions - 1). In the indexing process, if a match request matches the subscription's routing key, we will put 1 else we will put 0. This is a time consuming process that builds a bitmap. Whenever a new topic or subscription is added the bitmap needs to be updated accordingly. But with the assumption that the searches are more frequent than the changes, the time spent on the indexing process is acceptable.

Matching Process

Here is the point where the actual topic matching takes place. Whenever a message with a specific routing pattern is received, the respective row in which the routing pattern belongs to is analyzed and the matching subscription that the message needs to be sent to can be found according to the bits. Bits can be found by doing bitwise AND and OR operations.

For example, let us assume that we have the following subscriptions with the mentioned routing keys in the AMQP protocol;

  • WSO2
  • WSO2.*
  • WSO2.Products

If we number the subscriptions starting from 0 it will be as follows;

  • WSO2 - 0
  • WSO2.* - 1
  • WSO2.Products - 2

If we create the bitmap table for the above subscriptions, it will look like the table below;

0 1 2
WSO2 1 0 0
WSO2.Products 0 1 1

Whenever a new message comes with a routing pattern we can easily find the matching subscriptions where the message should be sent by taking the bit entries for the specified pattern.

For example, if a message is sent to the topic WSO2.Products, by using this algorithm, we can get the entry related to it. In the above table we can see the entry for WSO2.Products is 0 1 1. This means that subscriber one and subscriber two are subscribed to the topic, so the message will be delivered to both subscribers.

Optimized topic routing algorithm

The approach that is illustrated above works fast, but consumes a lot of memory. In this section, we are going to use a method where the memory consumption is efficiently utilized.

The basic idea is to break the subscriptions into constituent parts. Let’s look at the example below which handles AMQP wildcard subscriptions.

Subscriptions - WSO2.MB, WSO2.MB.*

Subscriptions Number of constituent parts
WSO2.MB 2
WSO2.MB.* 3

In this approach we are going to have tables for each constituent part.

No. of tables = No. of constituent parts of the longest subscription.

Now let us look at the algorithm using a real life example.

Subscriptions - trade, forex.eur, forex.*

The inverted bitmap for the first constituent is as follows;

trade forex.eur forex.*
null 0 0 0
trade 1 0 0
forex 0 1 1
other 0 0 0

The inverted bit map for the second constituent is as follows;

trade forex.eur forex.*
null 1 0 0
eur 0 1 1
other 0 0 1

The subscriptions are indicated by the columns of the tables. The rows of the tables indicate the constituent parts. If the subscription has a constituent which is in the row, then there will be an entry of 1. Otherwise the entry will be 0.

Here we need to think about two important constituent parts which are mentioned below;

  1. null
  2. other

null
This means that subscription have no corresponding constituent. For example, the trade subscription in the second constituent table has 1 for the entry null because it does not have a second constituent in its subscription.

other
This is used to limit the number of rows in the constituent table. Suppose there is a message called “forex.pound”, but the pound is not included as a constituent in the second constituent table. So it will be considered as 'other'

Suppose a message has 'forex.eur' topic. The following table will illustrate how to get the subscriptions;

trade forex.eur forex.*
1st constituent = forex 0 1 1
2nd constituent = eur 0 1 1
AND 0 1 1

Here the bits are extracted from the constituent tables. An AND operation needs to be performed in order to find the subscriptions. In this case there are two subscribers; ‘forex.eur’ and ‘forex.*’.

Now we will look at another example where the message contains the topic ‘forex.pound’;

trade forex.eur forex.*
1st constituent = forex 0 1 1
2nd constituent = pound 0 0 1
AND 0 0 1

In this case, we will have only one subscriber, 'forex.*'.

Important implementation details

Since this algorithm is related to the bitmap, we have used the BitSet class of the Java standard libraries to represent a single row of the bitmap. We have selected the BitSet for this purpose because of the following reasons;

  1. It has the bitwise logical operation functions
  2. It is more space efficient than a boolean array
  3. Operations are time efficient
  4. We have used the map as the high level data structure for a single bitmap. That means if we take the BitMap table, it is a hash map. The key for the hash map is the constituent part of the routing key and the values for the hash map will be a BitArray class (it is a user defined class, which handles the operations of the BitSet). Each bitmap table related to the separate constituent parts will be placed in the array list.

    Indexing details will be saved in a separate hash map. Keys of the indexing hash map are routing keys and the values of the indexing hash map are the integer numbers. This facilitates the addition and deletion of subscriptions. Subscriptions are saved in the hash map and the keys for that hash map are the indexes. The values for the hash map is an array list of the subscriptions that have the same routing key.

    Dealing with wildcard characters is the tricky part of this algorithm. In order to maintain an efficient time and space complexity, we have an entry for ‘immediate children’ and ‘topic and children’ in each bit map table so that whenever a matching is performing we can easily do the necessary bitwise AND operations on wildcard characters and produce the results.

    In the implementation, we maintain the list of subscriptions as a java List implementation with proper indexes to represent each subscription.

    When removing a subscription, we use this index to easily find it by using the map. The maps are then updated accordingly.

    Since the algorithm is slightly complicated, lets go through an example in order to get a better understanding of it.

    Let us assume there aren’t any subscriptions;

    1. The first subscription comes with the routing key WSO2.Products.
    2. To add this we will first search in the map localSubscriptionMapping (key - routing key, value - index of the routing key) whether there is previous subscription with the same routing key.
    3. The mapping needs to be created since this is the first subscription with the routing key WSO2.Products. Here mapping means assigning an index for the routing key of the subscription (indexing).
      • First we will look into the deletedLocals array list which contains the vacant positions of the deleted subscriptions. When all the subscriptions belong to a particular routing key is deleted, we do not delete the bit position of that subscription but rather we put that index into the deletedLocals array list after setting bits position to zero.

        For example, if there are indexes 0, 1 and 2. Let’s assume the following are the routing keys for the indexes 0 - WSO2, 1 - WSO2.* and 2 - WSO2.#. If all the subscriptions with the routing key WSO2.* are deleted, we don’t move the WSO2.# to the first position, rather we change all the bits related to index 1 to index 0 and put index 1 into the deletedLocals variable for later use).
      • If there aren’t any vacant positions we assign the index as the value of the localSubscriptionCount variable and increment it. After that we divide the routing key into constituent parts.

        For example, if the routing key is WSO2.Products. the constituent parts are 1st constituent - WSO2 and 2nd constituent - Products.
      • We will loop through all constituent parts since we have a separate bitmap table for each of them. If the subscription has n constituents and there are only (n - 1) bitmap tables, we need to create a bitmap table for the nth constituent. Since each bitmap is expected to have some default keys (null and other) we will add them during the creation of the bitmap table.
        • For the null key, we should set all the previous subscription indexes to 1.
          For example if we have WSO2 and WSO2.Products and then add the “WSO2.Products.MB” subscription, since the first two subscriptions have a maximum of two constituent parts and the WSO2.Products.MB has three constituent parts we need to create a bitmap table for the third constituent part. Since those two subscriptions do not have any constituent parts, null should be set true to all the previous subscriptions).
        • The key ‘multi level wildcard’ or ‘topic and children wildcard’ is a bit tricky since it is a wildcard character which represents zero or more words. So if a subscription comes with the constituent part for ‘topic and children’, all the bits related to that subscription will be set to 1. Another important point to note here is that if a column in a bitmap table is updated, the rows of the subscription which contains ‘topic and children wildcard’ should also be updated.
      • Each constituent part is added to the specific tables and the index is updated accordingly.
      • After looping through each constituent part, if the number of bitmap tables is greater than the number of constituents in the the routing key, the index position which belongs to null key will be set to true. If it has the ‘topic and children wildcard’ as the final constituent, all the indexes will be set to true. Finally the LocalSubscription will be added into the localSubscriptions map (key - index, value - map of the subscriptions which has the same routing key).

        For example,. if the index of the routing key WSO2.# is 1 and if there are three subscriptions with the routing key WSO2.#, those three subscriptions will be put into the map with key 1 (index). If those three subscriptions have the subscription IDs a, b and c, the entry on the map will be as follows;
        localSubscriptionMapping <WSO2.#, 1>
        	localSubscriptions <1,<a, Subscription1> <b,Subscription2> <c,Subscription3>>
        
    4. Whenever a message comes with the routing pattern, the matching happens as follows;
      • The routing pattern will be divided into constituent parts. It needs to loop until a matching constituent table exists for the looping constituent. If there are more constituents available than constituent tables, then stop the loop when all constituent tables are checked.
      • During each loop,
        • Get the bits related to the constituent part in a bit map table. If there are no entries for the constituent part get the bits related to the 'other' key. Do a bitwise OR operation with the entry related to the 'other' key.
      • After finishing the loop,
        • If the number of bitmap tables is greater than the number of constituent parts in the routing pattern do an OR operation with 'other' and nul' positions of the bitmap table at the position of the size of constituents of the routing pattern.
        • If the number of constituents in the routing pattern is greater than the number of bitmap tables do an OR operation with the ‘topic and children wildcard’ ('other' + null) position of the last bitmap table. If there are n bitmap tables, take the entry from the nth table.
      • With the final bitmap entry we can get the subscriptions to which the message should be sent.
    5. When deleting a subscriber with a subscription ID, the program will go through each key of the localSubscriptions map and if the subscription ID is found, it will be removed. The program will then check whether this was the last subscription with a particular routing key, then the localMapping map entry and localSubscriptions entry related to the routing key will be removed and all the positions in the bitmap related to the index of the routing key will be set to 0. The index will then be put into the array list deletedLocals for future use.

    WSO2 MB integration

    Within WSO2 Message Broker, the bitmap algorithm has been used to resolve wildcard subscriptions to topics. Although the Bitmap implementation is fully capable of handling all the subscriptions, when there are direct topics subscriptions without wildcards, keeping them in a java Map using the destination as the key will be much faster as the complexity will be O(1).

    WSO2 Message Broker checks if a wildcard character is present in the subscribed destination before feeding it into the bitmap implementation. If no wildcards are present then that subscription is kept in a simple map. If wildcards are present it will be fed into the bitmap implementation.

    In this way, when retrieving all the subscriptions for a specific destination, first, all the direct subscriptions will be retrieved from the Map giving the destination as the key. Then the same destination will be provided to the bitmap implementation to retrieve all the matching wildcard subscriptions. The results from the above two methods will be concatenated into one result set before returning it for processing.

    The current implementation supports both AMQP and MQTT wildcard subscription handling using bitmaps. This has been handled by calling the respective wildcard matching methods depending on the subscription type (AMQP or MQTT).

    Conclusion

    In summary, the optimized inverted bitmap algorithm for fast topic matching is memory efficient as well as fast enough to handle a large number of requests. During performance tests, the fast topic matching algorithm proved to be three times faster than the conventional algorithm when there were 1000 subscriptions.

    In WSO2 Message Broker, conventional selector mechanism will be a performance killer in the circumstances where the topics and subscriptions are in large quantities. In such scenarios to overcome the bottleneck of of the WSO2 Message Broker, fast topic matching is recommended.

    Authors

    Pranavan Theivendiram - Intern, WSO2
    Megala Uthayakumar - Intern, WSO2

 

About Author

  • WSO2 Team
  • Content Writer
  • WSO2