04 Dec, 2024

AI-Driven Phishing Detection in Choreo

Lahiru Ganegoda
Senior Technical Lead, WSO2

Introduction

One of the major challenges in detecting phishing is the limited visibility available at the Internet Service Provider (ISP) level. Traditional tools rely mostly on network-level information, so they often cannot recognize phishing sites in real time. Phishing schemes frequently use complex, changing tactics, such as rotating domains or imitating legitimate sites, which can go unnoticed without a detailed analysis of the site’s content. As a result, it is difficult to identify these threats accurately without manually reviewing a site’s structure, behavior, and intent. Choreo, WSO2’s internal developer platform as a service, has introduced a solution that uses custom-developed AI technologies to detect potentially malicious sites. This document outlines the methodology Choreo uses to identify such activity, providing a safer platform for its clients and other internet users.

The Problem: Hosting Phishing Sites

Phishing is a prevalent challenge across many online hosting platforms. It involves deceiving individuals into sharing sensitive information by mimicking legitimate entities, often through websites crafted to appear trustworthy. With the accessibility of free-tier hosting options, there is potential for misuse, as low entry barriers can sometimes be exploited for phishing activities.

Why Do We Need to Address This?

Protecting Users: Ensuring that everyone interacting with sites hosted on our platform can do so safely, without the risk of falling victim to phishing attempts, is our top priority.
Safeguarding Our Platform's Integrity: Maintaining a high standard of security is essential for building trust, ensuring our platform remains a reliable space for both developers and users.
Supporting Legitimate Development: By tackling these security challenges, we help foster a positive experience for genuine developers, ensuring their work isn’t compromised by malicious users.

To continue fostering a safe and secure environment, we are committed to strengthening our measures for identifying and preventing phishing activities across our platform.

The Role of AI in Phishing Detection

Traditional security tools often struggle to keep up with the evolving tactics used by phishing sites. Signature-based detection methods can be easily bypassed by slight modifications to phishing sites, and manual analysis is neither scalable nor feasible given the volume of content hosted on most hosting platforms.

This is where AI comes into play. By utilizing OpenAI's advanced models, we can automatically analyze the content and structure of websites hosted on our platform, identifying patterns and anomalies indicative of phishing.

Advantages of AI in Phishing Detection

Proactive Identification: AI can detect phishing sites based on content analysis, even before they are reported or flagged by other users.
Adaptability: AI models learn from vast datasets, enabling them to recognize and adapt to new phishing techniques that may not yet be cataloged by traditional security systems.
Efficiency: The AI framework operates in real time, scanning and analyzing websites as they are deployed, ensuring prompt detection and action.

Our AI-driven solution is designed to minimize false positives while maximizing the detection of genuinely malicious sites, ensuring that legitimate developers are not unduly impacted.

AI-Based Phishing Detection Methodology

Solution Overview

The AI-based phishing detection framework we've developed is designed to be highly efficient and scalable, ensuring it can handle the diverse and dynamic environments of modern hosting platforms.

Data Collection Layer

Data Feeding Engine: The solution will extract information from the logs and database and feed it into the analysis engine. Since the analysis engine operates at a consistent frequency, the data feeding process will run at short intervals, with some overlap, to ensure that all created, modified, and requested web apps are thoroughly analyzed.
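The overlapping intervals described above can be sketched as follows. This is a minimal illustration, not Choreo's implementation: the function names, the 10-minute interval, and the 2-minute overlap are all assumptions chosen for the example. Because overlap means the same web app can appear in two consecutive windows, the sketch also removes duplicates before analysis.

```python
from datetime import datetime, timedelta


def overlapping_windows(start, end, interval, overlap):
    """Yield (window_start, window_end) pairs covering [start, end].

    Each window begins `overlap` before the previous one ended, so an
    event logged near a boundary is picked up twice rather than missed.
    """
    if overlap >= interval:
        raise ValueError("overlap must be shorter than interval")
    window_start = start
    while window_start < end:
        window_end = min(window_start + interval, end)
        yield window_start, window_end
        if window_end >= end:
            break
        window_start = window_end - overlap


def dedupe(app_ids_per_window):
    """Flatten per-window app IDs, keeping the first occurrence of each."""
    seen = set()
    ordered = []
    for app_ids in app_ids_per_window:
        for app_id in app_ids:
            if app_id not in seen:
                seen.add(app_id)
                ordered.append(app_id)
    return ordered


# Hypothetical schedule: 10-minute windows with a 2-minute overlap.
start = datetime(2024, 12, 4, 0, 0)
end = start + timedelta(minutes=30)
windows = list(
    overlapping_windows(start, end, timedelta(minutes=10), timedelta(minutes=2))
)
```

A larger overlap lowers the chance of missing an app at a window boundary, at the cost of re-reading more log entries on each run.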

AI Analysis Engine

Information Gathering: This layer scrapes and crawls all websites identified by the data feeding layer, collecting OCR (Optical Character Recognition) text, image elements, and HTML content for further analysis. Relying solely on OpenAI to extract all the required information may not be sufficient, so additional technologies are needed. For instance, a custom web crawler that gathers image elements, OCR text, and HTML content from the target sites can uncover more detailed insights at a granular level, tailored to specific use cases. The enriched data is then fed into the model, enabling it to generate more accurate and relevant results.
Model Training: The AI engine is trained using OpenAI's language models, with a focus on distinguishing between legitimate sites and phishing attempts. The training data includes a wide range of phishing and non-phishing sites to ensure robustness.
Pattern Recognition: The engine analyzes the collected data, looking for phishing indicators such as suspicious URLs, misleading content, brand abuse and anomalous patterns.
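As a rough illustration of the HTML part of information gathering, the sketch below pulls a few phishing-related signals out of a page using only the Python standard library. It covers HTML only (no OCR or image analysis), and the class name, the indicator labels, and the brand list are hypothetical; the actual features Choreo extracts are not described in this article.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class PageSignals(HTMLParser):
    """Collect a few simple signals from a page's HTML."""

    def __init__(self, page_url):
        super().__init__()
        self.page_host = urlparse(page_url).hostname or ""
        self.password_fields = 0
        self.external_form_actions = []
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "input" and (attrs.get("type") or "").lower() == "password":
            self.password_fields += 1
        elif tag == "form":
            # A form that posts to a different host is a common phishing trait.
            action_host = urlparse(attrs.get("action") or "").hostname
            if action_host and action_host != self.page_host:
                self.external_form_actions.append(action_host)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data.strip()


def find_indicators(page_url, html, known_brands):
    """Return a list of hypothetical indicator labels found in the page."""
    parser = PageSignals(page_url)
    parser.feed(html)
    parser.close()
    indicators = []
    if parser.password_fields:
        indicators.append("password_field")
    if parser.external_form_actions:
        indicators.append("external_form_action")
    title = parser.title.lower()
    for brand in known_brands:
        # Brand named in the title but absent from the hostname.
        if brand.lower() in title and brand.lower() not in parser.page_host:
            indicators.append("brand_in_title")
            break
    return indicators
```

In a fuller pipeline, signals like these would be passed to the model together with the OCR text and image elements rather than being used as a verdict on their own.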

Decision-Making Layer

Risk Scoring: Each site is assigned a risk score based on the AI's analysis. Sites with high-risk scores are flagged for further inspection or immediate action.
Automated Notification: Depending on the nature of the detected threat, the system can automatically notify the WSO2 Security Operations team for further analysis and initiate the suspension or takedown of the sites, blocking access to them.
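The scoring and action steps can be pictured with a small sketch. In the described system the risk score comes from the AI's analysis; here a weighted sum of indicator labels stands in for it so the example stays self-contained. The weights, thresholds, and action names are assumptions made for illustration only.

```python
# Hypothetical weights on a 0-100 scale; unknown indicators contribute 0.
WEIGHTS = {
    "password_field": 30,
    "external_form_action": 40,
    "brand_in_title": 30,
}


def risk_score(indicators, weights=WEIGHTS):
    """Sum the weights of the distinct indicators, capped at 100."""
    return min(100, sum(weights.get(name, 0) for name in set(indicators)))


def decide(score, review_threshold=50, block_threshold=80):
    """Map a risk score to one of three hypothetical actions."""
    if score >= block_threshold:
        return "suspend_and_notify"
    if score >= review_threshold:
        return "notify_security_team"
    return "allow"
```

Keeping two thresholds separates sites that warrant a human look from those that justify immediate action, which is one way to limit the impact of false positives on legitimate developers.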

Feedback and Learning Loop

Continuous Improvement: The system is designed to learn from its decisions, incorporating feedback to refine its detection algorithms continuously. This ensures that the AI adapts to new phishing strategies over time.
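One simple way to picture a feedback loop is threshold tuning driven by analyst verdicts. The article does not say how feedback is incorporated, so the sketch below is purely an assumption: the `Review` record, the step size, and the tolerated false-positive rate are all hypothetical.

```python
from typing import NamedTuple


class Review(NamedTuple):
    flagged: bool             # did the system flag the site?
    confirmed_phishing: bool  # did an analyst confirm it as phishing?


def adjust_threshold(threshold, reviews, step=5, max_false_positive_rate=0.2):
    """Nudge a 0-100 flagging threshold based on reviewed cases.

    Missed phishing sites lower the threshold (flag more); too many
    false positives among flagged sites raise it (flag less).
    """
    flagged = [r for r in reviews if r.flagged]
    missed = [r for r in reviews if not r.flagged and r.confirmed_phishing]
    if missed:
        return max(0, threshold - step)
    if flagged:
        false_positives = sum(1 for r in flagged if not r.confirmed_phishing)
        if false_positives / len(flagged) > max_false_positive_rate:
            return min(100, threshold + step)
    return threshold
```

Missed phishing sites take priority over false positives in this sketch, reflecting a choice to favor user safety; a real system could weigh the two differently.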

Results and Impact

With the implementation of our AI-driven phishing detection framework, we have achieved a substantial reduction in phishing-related activities across the Choreo platform, significantly lowering the noise generated by these malicious sites. This proactive measure not only helps protect Choreo users from phishing scams but also maintains the platform’s integrity as a trusted environment for developers.

By reducing the prevalence of phishing, we create a cleaner, more reliable space where developers can focus on innovation without unnecessary distractions. This framework leverages advanced AI technology to ensure a safer, more streamlined experience for the entire development community.

Choreo is now WSO2 Developer Platform. We’ve rebranded to better reflect our mission, but the product you love hasn’t changed.
