AI and Choreo
By Srinath Perera
- 28 Jun, 2021
AI plays a key role in the Choreo user experience, anticipating and assisting users to make their experience effortless. Let’s explore key AI use cases in Choreo and their implementations.
Choreo users often use APIs in their applications. While unlocking vast power, the use of APIs adds performance challenges. For example, using too many APIs or using an API within critical places in the program (e.g., within a loop) can significantly impact application performance.
To overcome this challenge, Choreo includes real-time performance feedback as part of the application authoring experience. As users write or compose the code, the lower-left part of the interface shows the estimated application performance, and users can also see throughput and latency curves if they need more information. These forecasts adapt as users change the application.
The following picture shows how performance feedback works.
The forecasts are based on two key ideas.
With computers, input/output (I/O) operations (that is, operations that involve the network or disk) are thousands to millions of times slower than CPU operations. Choreo applications running as a service, or operations triggered by events, include at least one I/O (network) operation. Furthermore, Choreo applications mostly perform data manipulation and rarely do heavy CPU computations. Consequently, we can get a sound estimate of application performance by analyzing the structure of I/O operations.
As shown in the above image, Choreo has information about API and network calls performed within the Choreo environment. When a user has written a new application, the coding environment sends the code to the performance analyzer, which extracts the structure of I/O operations from the code and estimates how those operations will perform using a machine learning model.
From historical observability data, we can calculate the latency distribution of each I/O operation. Using this data, we model the performance characteristics of different I/O operations with gradient boosting trees combined with the Universal Scalability Law (USL). By combining these per-operation models according to theoretical performance models, we can accurately estimate the performance of complete applications. The machine learning model combines those distributions using a queuing theory model and XGBoost.
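To make the USL part concrete, here is a minimal sketch of the standard Universal Scalability Law formula, with Little's law used to turn throughput into latency. The coefficient values are hypothetical and chosen only for illustration; in practice they would be fitted from observed data, and this is not Choreo's actual model.

```python
import math


def usl_throughput(n, lam, sigma, kappa):
    """Universal Scalability Law: throughput at concurrency n.

    lam   -- throughput of a single client (requests per second)
    sigma -- contention (serialization) coefficient
    kappa -- coherency (crosstalk) coefficient
    """
    return lam * n / (1.0 + sigma * (n - 1) + kappa * n * (n - 1))


def usl_latency(n, lam, sigma, kappa):
    """Mean latency from Little's law (N = X * R), ignoring think time."""
    return n / usl_throughput(n, lam, sigma, kappa)


def usl_peak_concurrency(sigma, kappa):
    """Concurrency at which USL throughput peaks."""
    return math.sqrt((1.0 - sigma) / kappa)


# Hypothetical coefficients, for illustration only.
LAM, SIGMA, KAPPA = 100.0, 0.05, 0.001

for users in (1, 10, 31, 100):
    x = usl_throughput(users, LAM, SIGMA, KAPPA)
    r = usl_latency(users, LAM, SIGMA, KAPPA)
    print(f"users={users:3d}  throughput={x:7.1f} req/s  latency={r * 1000:6.1f} ms")
```

With these coefficients, throughput rises up to roughly 31 concurrent users and then falls, which is the kind of throughput and latency curve the forecast can display.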
The output of the model is a statistical distribution, but for simplicity, we report the average value. Users can look into more details if they are interested.
Sometimes, the same API call or other I/O operation may have different performance characteristics depending on the inputs it receives (e.g., search operations), which adds variation to the results of the performance forecast. The statistical distribution of the results captures such behavior.
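The idea of combining per-operation latency distributions and then reporting an average can be sketched with a small Monte Carlo simulation. This is a deliberately simplified stand-in for the queuing theory model described above: it assumes the I/O operations run one after another and are independent, and the operation names and latency samples are made up for illustration.

```python
import random
import statistics


def sample_request_latency(op_samples, rng):
    """Simulate one request: draw one historical latency per I/O operation and add them."""
    return sum(rng.choice(samples) for samples in op_samples)


def forecast_latency(op_samples, runs=10_000, seed=0):
    """Estimate the end-to-end latency distribution of sequential I/O operations."""
    rng = random.Random(seed)
    totals = sorted(sample_request_latency(op_samples, rng) for _ in range(runs))
    return {
        "mean": statistics.fmean(totals),
        "p50": totals[len(totals) // 2],
        "p95": totals[int(0.95 * len(totals))],
    }


# Hypothetical historical latencies in milliseconds for two I/O operations.
HISTORY = {
    "http_get": [20, 22, 25, 30, 120],  # occasional slow response, e.g. a heavy search
    "db_query": [5, 6, 7, 8, 40],
}

print(forecast_latency(list(HISTORY.values())))
```

The mean is the single number a user would see by default, while the percentiles show the spread caused by input-dependent operations.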
Choreo developers use the low-code editor to visually implement new applications rapidly. This involves joining several connectors (e.g., a Google Sheet, an HTTP client, etc.) and statements (e.g., if, while) to create new user experiences, instead of writing code.
While the visual composition is faster, most visual elements need configurations and expressions. Furthermore, developers need to discover correct visual elements (e.g., connectors and statements) while composing their application.
Appropriate suggestions given within the right context can significantly ease the developer experience while reducing the time taken for building applications. Choreo employs AI to provide such suggestions to the user.
These suggestions take two forms: expression suggestions given while configuring low-code elements, and suggestions for which low-code elements could be used next in the composition based on the code. This can significantly reduce the time and effort spent browsing through all the connectors and low-code statements.
To achieve this, we use long short-term memory (LSTM) based deep learning models to learn application use cases developed using the low-code editor. We employ a deep neural network followed by LSTM to learn how the user adds each visual element to compose new experiences. So, when the model sees a similar pattern, it can suggest the next visual element based on historical patterns of the low-code developers. The LSTM layer can capture and encode the common sequences of visual components, whereas the deep neural network stores the encoded patterns to perform future predictions. We use the applications developed by the low-code developers to train our suggestion models. We periodically update our models so they can evolve to handle new use cases.
As shown in the diagram, the low-code suggestion service sends suggestions to users by using the current model. Then, the low-code editor sends feedback from the users, which includes the actual sequence of visual elements selected by the user. We store this data in a feedback database, which we use to update our LSTM model. This way, our model continuously evolves and identifies new user patterns.
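The suggest-then-learn-from-feedback loop can be illustrated with a much simpler model than an LSTM. The sketch below swaps in a plain frequency count of which element follows which, purely to show the data flow; the class name and element names are hypothetical, and it does not capture the longer sequences an LSTM can learn.

```python
from collections import Counter, defaultdict


class NextElementSuggester:
    """Frequency-based stand-in for the LSTM suggestion model (illustration only)."""

    def __init__(self):
        # For each element, count which elements users placed right after it.
        self._next_counts = defaultdict(Counter)

    def learn(self, sequence):
        """Update the counts from one sequence of visual elements."""
        for current, following in zip(sequence, sequence[1:]):
            self._next_counts[current][following] += 1

    def suggest(self, current, k=3):
        """Return up to k elements most often placed after `current`."""
        counts = self._next_counts.get(current, Counter())
        return [element for element, _ in counts.most_common(k)]


suggester = NextElementSuggester()

# Initial training data: applications built by low-code developers.
suggester.learn(["HTTP client", "if", "Google Sheet"])
suggester.learn(["HTTP client", "if", "log"])
suggester.learn(["HTTP client", "while", "Google Sheet"])
print(suggester.suggest("HTTP client"))

# Feedback: sequences users actually built, fed back to update the model.
suggester.learn(["HTTP client", "while", "log"])
suggester.learn(["HTTP client", "while", "log"])
print(suggester.suggest("HTTP client"))
```

After the feedback is applied, the top suggestion changes, which mirrors how periodic retraining lets the real model follow new usage patterns.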
Choreo apps often connect multiple APIs and make them work together. Each API carries different data types, and even the same information is often represented using different data types. For example, an HR system and a payroll system would often represent an employee with two different data types.
Hence, while programming multiple APIs to work together or implementing other integration scenarios, we often need to map data types to each other. Integration use cases often have complex data types, which have tens and sometimes hundreds of attributes. Manually mapping such data types is tedious work.
Choreo’s AI-based automatic data mapping makes this common use case easier.
With the Choreo editor, when you select two data types (schemas) to be mapped, the platform automatically maps the data types and lets the user review and edit the mapping as needed, significantly reducing complexity.
Each data type is composed of many simple attributes. For example, as shown in the image, an employee data type may be composed of attributes such as fullname, emphNo, name, gender, home address, etc. Here, data mapping maps the attributes of the person data type to those of the employee data type.
Choreo data mapping uses a semi-supervised learning approach. The following figure shows the pipeline.
In summary, the data mapping algorithm can learn from master data and apply the mapping to a schema it has never seen before. Instead of trying to learn from individual mappings, automatic data mapping learns the inherent data types in the domain (such as name, age, dollar value, address, city, yes/no, etc.) and the behavior of each, and uses that information to infer the best mappings. Hence, automatic data mapping works out of the box for each Choreo user, without needing mapping data for each schema.
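The following sketch shows why inferring inherent data types lets a mapper handle schemas it has never seen. Hand-written rules stand in for the learned model here, and the schemas, attribute names, and sample values are hypothetical, so treat it as an illustration of the idea rather than Choreo's algorithm.

```python
import re


def infer_type(values):
    """Guess the inherent data type of an attribute from sample values.

    Hand-written rules stand in for a learned classifier (assumption).
    """
    vals = [str(v).strip() for v in values]
    if not vals:
        return "unknown"
    if all(v.lower() in {"yes", "no"} for v in vals):
        return "yes/no"
    if all(re.fullmatch(r"\$\d+(\.\d{2})?", v) for v in vals):
        return "dollar value"
    if all(v.isdigit() and 0 < int(v) < 120 for v in vals):
        return "age"
    if all(re.fullmatch(r"[A-Z][a-z]+( [A-Z][a-z]+)+", v) for v in vals):
        return "name"
    return "unknown"


def map_schemas(source, target):
    """Map each source attribute to the first unused target attribute of the same type."""
    target_types = {attr: infer_type(vals) for attr, vals in target.items()}
    mapping, used = {}, set()
    for s_attr, s_vals in source.items():
        s_type = infer_type(s_vals)
        if s_type == "unknown":
            continue
        for t_attr, t_type in target_types.items():
            if t_attr not in used and t_type == s_type:
                mapping[s_attr] = t_attr
                used.add(t_attr)
                break
    return mapping


# Hypothetical schemas with sample values; attribute names differ on purpose.
person = {"fullname": ["Ann Lee", "Raj Patel"], "age": ["34", "51"], "active": ["yes", "no"]}
employee = {"employed": ["no", "yes"], "name": ["Maria Gomez"], "years": ["29"], "salary": ["$5000.00"]}

print(map_schemas(person, employee))
```

No attribute names are shared between the two schemas, yet the mapping is recovered because it relies on what the data looks like rather than on previously seen mappings.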
When a user has deployed an application in Choreo, it is imperative that those services are highly available and work as expected. To achieve this goal, service developers would like to know when a service has deviated from its expected behavior.
They can check the logs and other telemetry from time to time. However, this is neither practical nor fast enough to ensure a quick response. Setting up alerts is an option, yet it leads to many false positives.
In Choreo, we assist the owners of deployed services by detecting performance anomalies that cause the system to deviate from its expected behavior and alerting the owners about them.
Performance anomalies come in many types, and several can occur in a service deployed in Choreo. Currently, Choreo focuses on detecting the most common types of anomalies, including latency spikes, spontaneous user surges, backend failures leading to response failures, and response delays due to slow backend applications. Furthermore, since Choreo services are deployed in cloud infrastructure, anomalies such as CPU hogs, memory leaks, and network delays could occur at the infrastructure level. For these infrastructure-level anomalies, Choreo’s AI-based Anomaly Detector raises alerts to the relevant development teams, and the anomalies are resolved internally.
When an anomaly occurs, Choreo alerts service owners through email as well as through a notification in the Choreo console. When users click on the notification, they can visualise the anomaly in Choreo’s Observability portal and further debug its cause using the Diagnostics view.
Let’s dig deeper to understand the mechanics of anomaly detection in Choreo.
Choreo’s Anomaly Detector is a real-time multivariate time-series anomaly detector that monitors many time series, covering both application and system metrics. Application metrics describe the attributes of an application that users can directly experience, such as throughput and latency. System metrics, on the other hand, are collected at the infrastructure level and describe the application's use of physical resources.
The Anomaly Detector includes a service called “Metric Aggregator”, which connects to the observability sub-systems and receives the metric data as events. The Anomaly Detector separates the metric events into a different event stream for each service, generates features based on the data, and evaluates them against the anomaly detection ML models.
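As a rough sketch of the per-service stream separation, the class below routes each metric event to a fixed-size window for its service and hands back the window once it is full. The class name, event fields, and window size are assumptions made for illustration; the real feature generation is not described in enough detail to reproduce here.

```python
from collections import defaultdict, deque


class MetricStreamRouter:
    """Keeps one sliding window of recent metric events per service (illustration only)."""

    def __init__(self, window_size):
        self._window_size = window_size
        self._windows = defaultdict(lambda: deque(maxlen=window_size))

    def on_event(self, event):
        """Add an event to its service's stream.

        Returns the current window as a list of (latency_ms, throughput) tuples
        once enough events have arrived, otherwise None.
        """
        window = self._windows[event["service"]]
        window.append((event["latency_ms"], event["throughput"]))
        if len(window) == self._window_size:
            return list(window)
        return None


router = MetricStreamRouter(window_size=2)
events = [
    {"service": "orders", "latency_ms": 110, "throughput": 40},
    {"service": "billing", "latency_ms": 80, "throughput": 12},
    {"service": "orders", "latency_ms": 450, "throughput": 38},
]
for event in events:
    window = router.on_event(event)
    if window is not None:
        print(event["service"], "window ready:", window)
```

Each full window would then be turned into features and scored by the anomaly detection model for that stream.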
Anomaly detection uses a self-supervised multivariate model based on the USAD algorithm (Audibert et al., 2020), trained on Choreo’s historical data. The model includes an auto-encoder with adversarial training. The auto-encoder maps the time-series features to a lower-dimensional latent space and reconstructs the original input from the latent vector. Additionally, the adversarial training ensures that Decoder 2 learns to discriminate between the original input W and the reconstructed input AE1(W).
The following figure shows the training procedure.
The model takes as input a time window of multivariate features, W. The model is composed of two auto-encoders with a shared encoder. Training takes place in two phases. In the first phase, both auto-encoders learn to reconstruct the original signal W. Then, in the second phase, Decoder 1 still learns to recreate W from the encoded latent vector z, while Decoder 2 tries to differentiate between the latent vector z of the original input W and the latent vector z’ of the reconstructed signal AE1(W). This two-phase training is shown in the diagram above. In the second phase, Decoder 1 acts as the generator and Decoder 2 acts as the discriminator, as in a GAN. This adversarial training helps the overall model better learn the latent space distribution of Choreo's normal data.
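For reference, the cited USAD paper expresses the two phases as a pair of combined training objectives. The notation below follows that paper rather than this article: AE_1 and AE_2 are the two auto-encoders (the shared encoder paired with Decoder 1 and Decoder 2 respectively), and n is the training epoch. Treat it as a summary to check against the paper, not as a statement of Choreo's exact implementation.

```latex
\begin{align}
\mathcal{L}_{AE_1} &= \frac{1}{n}\,\lVert W - AE_1(W)\rVert_2
                    + \Bigl(1 - \frac{1}{n}\Bigr)\,\lVert W - AE_2(AE_1(W))\rVert_2 \\
\mathcal{L}_{AE_2} &= \frac{1}{n}\,\lVert W - AE_2(W)\rVert_2
                    - \Bigl(1 - \frac{1}{n}\Bigr)\,\lVert W - AE_2(AE_1(W))\rVert_2
\end{align}
```

Early in training (small n) the plain reconstruction terms dominate, and as n grows the adversarial terms take over: AE_1 tries to minimize the error of AE_2 on its reconstructions, while AE_2 tries to maximize it.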
By training on the historical non-anomalous data in a self-supervised manner, the model learns to map the normal data to a reduced latent space and to reconstruct the normal data from that latent space. In deployment, if the model encounters an anomalous point (or window) that significantly deviates from normal behavior, the model will not be able to reconstruct it accurately. By monitoring the reconstruction error of the model, we can identify such anomalies. Once identified, we alert the user or dev team accordingly.
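The reconstruction-error check can be sketched as follows. The scoring function uses a weighted sum of two reconstruction errors in the style of the USAD anomaly score, with mean squared error as the distance. The two "auto-encoders" are toy stand-ins that simply clip values to a range regarded as normal, the windows are univariate for brevity, and all numbers are hypothetical.

```python
def reconstruction_error(window, reconstructed):
    """Mean squared error between a window and its reconstruction."""
    return sum((x - y) ** 2 for x, y in zip(window, reconstructed)) / len(window)


def anomaly_score(window, ae1, ae2, alpha=0.5, beta=0.5):
    """Weighted sum of the errors of AE1(W) and AE2(AE1(W)) against W."""
    first_pass = ae1(window)
    second_pass = ae2(first_pass)
    return (alpha * reconstruction_error(window, first_pass)
            + beta * reconstruction_error(window, second_pass))


def fit_threshold(normal_windows, ae1, ae2, quantile=0.99):
    """Pick a threshold from the scores the model gives to normal data."""
    scores = sorted(anomaly_score(w, ae1, ae2) for w in normal_windows)
    return scores[min(len(scores) - 1, int(quantile * len(scores)))]


def make_clipping_reconstructor(low, high):
    """Toy stand-in for a trained auto-encoder: it can only reproduce values in [low, high]."""
    def reconstruct(window):
        return [min(max(x, low), high) for x in window]
    return reconstruct


# Toy models and hypothetical latency windows in milliseconds.
ae1 = ae2 = make_clipping_reconstructor(92.0, 108.0)
normal_windows = [[95.0, 100.0, 105.0], [90.0, 100.0, 110.0], [98.0, 102.0, 99.0]]
threshold = fit_threshold(normal_windows, ae1, ae2)

for window in ([96.0, 101.0, 104.0], [100.0, 105.0, 400.0]):
    score = anomaly_score(window, ae1, ae2)
    print(window, round(score, 2), "ANOMALY" if score > threshold else "ok")
```

The window containing the 400 ms spike cannot be reproduced by the toy model, so its score far exceeds the threshold learned from normal data, which is the signal that would trigger an alert.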
The following diagram explains the offline training process that we followed when training the anomaly detection models.
Furthermore, we are working on automatic AI-based root cause analysis for detected anomalies, which will make it easier for developers to find and fix problems in their services.
Audibert, Julien, et al. "USAD: UnSupervised Anomaly Detection on Multivariate Time Series." Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2020. Available at: https://www.researchgate.net/publication/343779877_USAD_UnSupervised_Anomaly_Detection_on_Multivariate_Time_Series