
RAG Query

The query integration runs on every user request. It retrieves relevant chunks from the vector knowledge base populated during ingestion, combines them with the user's question, and calls the LLM to produce a grounded response.

This page covers building the query integration in WSO2 Integrator: wiring up retrieve, augment, and generate nodes, and testing the endpoint.

info

Complete RAG ingestion before starting this page. The query integration reads from the same Knowledge Base that ingestion writes to.


What the RAG query does

The integration uses four nodes: Retrieve, Augment User Query, Generate, and Return. They map to Steps 2, 3, 5, and 6 below; Step 4 adds the model provider that the Generate node calls.
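
The flow can be sketched end to end in Python. Every function here is a hypothetical stand-in for the corresponding node, not a WSO2 API; the actual integration is built visually and generates Ballerina, and the stubbed generate simply returns a canned answer.

```python
# Hypothetical stand-ins for the four nodes; not WSO2 APIs.
def retrieve(query):
    # Step 2: return chunks similar to the query (stubbed).
    return ["Invoices are due in 30 days."]

def augment(context, query):
    # Step 3: combine retrieved chunks and the question into one message.
    joined = "\n".join(context)
    return f"Answer using the context below.\n\nContext:\n{joined}\n\nQuestion: {query}"

def generate(message):
    # Step 5: call the LLM (stubbed to a canned answer here).
    return "Invoices are due in 30 days."

def handle_query(user_query):
    # The POST /query resource: retrieve, augment, generate, return.
    context = retrieve(user_query)
    message = augment(context, user_query)
    return generate(message)  # Step 6: returned to the caller
```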


Prerequisites
  • The ingestion integration from RAG ingestion has been run at least once so the Knowledge Base contains vectors.
  • The same Knowledge Base and Embedding Provider used during ingestion are available in this project.
  • A configured model provider. The default WSO2 provider works out of the box. Run Ballerina: Configure default WSO2 model provider if you haven't already.
  • An HTTP service with a POST /query resource and a userQuery string payload parameter. See Step 1 below.

Step 1: Create an HTTP service

  1. In the design view, select + Add Artifact.

    Artifacts panel showing integration types including HTTP Service under Integration as API.

  2. Under Integration as API, select HTTP Service.

    Create HTTP Service form with Service Contract and Service Base Path fields.

  3. Leave Design From Scratch selected, leave the base path as /, and select Create.

  4. In the HTTP Service editor, select + Add Resource. A method selection panel opens on the right.

    HTTP Service editor showing no resources and the Select HTTP Method to Add panel.

  5. Select POST from the method list.

    Method selection panel with POST highlighted.

  6. In the New Resource Configuration panel, set Resource Path to query.

  7. Select + Define Payload, add a parameter named userQuery of type string, then select Save.

    New Resource Configuration panel with POST method and query resource path filled in.


Step 2: Retrieve from the knowledge base

The Retrieve action queries the Knowledge Base for chunks most similar to the user's question.

  1. In the flow editor, click + to open the Add Node panel.

  2. Go to AI > RAG > Knowledge Base and select the Retrieve action.

    info

    If you don't have a Knowledge Base yet, create one first by following Knowledge Bases. Use the same Knowledge Base as ingestion. For the in-memory knowledge base, both ingestion and querying must be done in the same integration.

    Add Node panel showing AI > RAG > Knowledge Base with Retrieve action selected.

  3. Configure the node:

    • Knowledge Base (required): the same Knowledge Base created during ingestion, for example knowledgeBase.
    • Query (required): bind to the incoming user question, for example userQuery.
    • Top K (optional): number of chunks to return. Default is 10. Increase it if relevant content is being missed; use -1 to return all.
    • Filters (optional): metadata filters to restrict results. Useful for multi-tenant scenarios where users should only see their own documents.
    • Result variable: for example, context.
  4. Click Save.

    Retrieve action form showing Knowledge Base, Query, Top K, Filters, and Result variable fields.

The result is an array of ai:QueryMatch values. Each entry contains a chunk and its similarity score against the query.

info

Retrieve is the read-side counterpart to Ingest. It must point to the same Knowledge Base and the same Embedding Provider used during ingestion; pointing at a different one returns no useful results.

Flow editor showing the Retrieve node added after the HTTP service resource.
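
Under the hood, retrieval embeds the query and ranks stored chunks by vector similarity. A minimal sketch with toy three-dimensional embeddings (real embeddings come from the Embedding Provider and have hundreds of dimensions; the data below is invented):

```python
from math import sqrt

def cosine(a, b):
    # Cosine similarity between two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b)))

# Toy "knowledge base": (chunk, embedding) pairs.
kb = [
    ("Invoices are due in 30 days.", [0.9, 0.1, 0.0]),
    ("The office closes at 6 pm.", [0.1, 0.9, 0.0]),
    ("Refunds take 5 business days.", [0.8, 0.2, 0.1]),
]

def retrieve(query_embedding, top_k=2):
    # Rank chunks by similarity to the query, like the Retrieve node;
    # each result pairs a chunk with its score (cf. ai:QueryMatch).
    scored = [(chunk, cosine(query_embedding, emb)) for chunk, emb in kb]
    scored.sort(key=lambda m: m[1], reverse=True)
    return scored[:top_k]

matches = retrieve([0.85, 0.15, 0.05])
```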


Step 3: Augment the user query

The Augment User Query node combines the retrieved chunks with the original question into a single formatted ai:ChatUserMessage ready for the LLM.

  1. Click + after the Retrieve node.

  2. Go to AI > RAG > Augment Query.

  3. Configure the node:

    • Context (required): the retrieval results, for example context.
    • Query (required): the original user question, for example userQuery.
    • Result variable: for example, augmentedUserMsg.
  4. Click Save.

    Augment User Query form showing Context, Query, and Result variable fields.

This step handles prompt construction automatically. You do not need to manually interleave chunks and questions.

Flow editor showing the Augment User Query node added after the Retrieve node.
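
A plausible sketch of what augmentation produces. The node's actual prompt template is internal to the product; the layout below is an assumption for illustration only:

```python
def augment(context, user_query):
    # context: (chunk, score) pairs from retrieval, as in ai:QueryMatch.
    chunks = "\n".join(f"- {chunk}" for chunk, _score in context)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{chunks}\n\n"
        f"Question: {user_query}"
    )

msg = augment([("Invoices are due in 30 days.", 0.99)], "When are invoices due?")
```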


Step 4: Add a model provider

  1. Click + after the Augment node.
  2. Go to AI > Model Provider.
  3. Select a model provider, for example Default Model Provider (WSO2), and set the name to defaultModel.
  4. Click Save.

Flow editor showing the model provider node added after the Augment User Query node.


Step 5: Generate the response

The Generate action calls the LLM with the augmented message and returns the model's answer.

  1. Click + after the model provider node.

  2. Select the defaultModel variable and choose the Generate action.

    Model provider node with the Generate action selected.

  3. Configure the node:

    • Prompt (required): the augmented message content, for example check augmentedUserMsg.content.ensureType().
    • Expected type (optional): set to string for plain-text responses. Use a record type to get a structured response.
    • Result variable: for example, response.
  4. Click Save.

    Generate action form showing Prompt, Expected type, and Result variable fields.

    Flow editor showing the Generate node added after the model provider node.
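
Setting Expected type to a record asks the model for structured output instead of plain text. The effect, sketched in Python with a simulated model reply (no real LLM call; the Answer shape and field names are invented for illustration):

```python
import json
from dataclasses import dataclass

@dataclass
class Answer:
    text: str
    sources: list

# Simulated raw model reply. With Expected type set to string you get
# the text as-is; with a record type, the node parses the reply into
# that shape instead.
raw = '{"text": "Invoices are due in 30 days.", "sources": ["billing.md"]}'
structured = Answer(**json.loads(raw))
```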


Step 6: Return the response

  1. Click + after the Generate node.
  2. Select Return.
  3. Set the expression to response.
  4. Click Save.

Complete RAG query integration with HTTP service, Retrieve, Augment User Query, Generate, and Return nodes.


Running and testing

Click Run at the top right. Once the integration starts, test the endpoint:

curl -X POST http://localhost:9090/query \
-H "Content-Type: application/json" \
-d '"<your question>"'

The response will be grounded in the documents you ingested.


Tuning retrieval quality

  • Top K (Retrieve node): controls how many chunks are passed to the LLM. Too few and relevant content is missed; too many and the model gets noisy context. Start at 5–10.
  • Filters (Retrieve node): restrict results by metadata. Use a source or tenantId field to isolate results per user or document set.
  • Chunker (Knowledge Base, ingestion): affects chunk boundaries and size. Switch from ai:AUTO to a structure-aware chunker (Markdown, HTML) if retrieval quality is poor. Re-ingest after changing.
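
The Filters idea, sketched as a metadata pre-filter over a toy result set. Field names such as tenantId follow the table above; the chunks are invented and similarity ranking is omitted:

```python
kb = [
    {"chunk": "Acme invoice terms", "meta": {"tenantId": "acme"}},
    {"chunk": "Globex invoice terms", "meta": {"tenantId": "globex"}},
]

def retrieve_filtered(filters):
    # Keep only chunks whose metadata matches every filter key, as the
    # Filters field does before similarity ranking.
    return [r["chunk"] for r in kb
            if all(r["meta"].get(k) == v for k, v in filters.items())]

acme_only = retrieve_filtered({"tenantId": "acme"})
```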

What's next

  • RAG ingestion — populate the knowledge base the query integration reads from.
  • Knowledge Bases — retrieve, delete-by-filter, and tuning reference.
  • Embedding Providers — available providers and dimension requirements.
  • Chunkers — controlling how documents are split for better retrieval.