
RAG Retrieval

In retrieval-augmented generation (RAG), the retrieval step searches a vector store for the chunks most relevant to a user query. Use it after ingestion to find supporting chunks for a response.

Prerequisites
  • You have already ingested files into a vector store. See RAG Ingestion.
  • You have API credentials for the vector store and the embedding provider.

To retrieve chunks that have already been ingested, open your organization from the Organization dropdown in the console header. In the left navigation menu, click RAG, then select Retrieval.

Step 1: Initialize the vector store

  1. Select Pinecone as the vector database.
  2. Enter your Pinecone API key in API Key.

     Info: To create a key, see the Pinecone API key documentation.

  3. Enter the Collection name to retrieve from.
  4. Click Next.

Step 2: Configure the embedding model

  1. Select text-embedding-ada-002 from the OpenAI provider list.
  2. Enter your OpenAI API key in Embedding model API key.

     Info: To create a key, see the OpenAI embeddings documentation.

  3. Click Next.
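
The values entered in Steps 1 and 2 can be thought of as a single retrieval configuration. The following is a minimal sketch of that idea, not the platform's actual data model: the `RetrievalConfig` class, its field names, and the `missing_fields` helper are all hypothetical and exist only to show which values the form expects before retrieval can run.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class RetrievalConfig:
    """Hypothetical container for the values entered in Steps 1 and 2."""

    vector_db: str          # for example, "Pinecone"
    vector_db_api_key: str  # key entered in API Key
    collection: str         # collection to retrieve from
    embedding_model: str    # for example, "text-embedding-ada-002"
    embedding_api_key: str  # key entered in Embedding model API key

    def missing_fields(self) -> list[str]:
        """Return the names of fields left blank, in declaration order."""
        return [name for name, value in vars(self).items() if not value.strip()]


config = RetrievalConfig(
    vector_db="Pinecone",
    vector_db_api_key="<pinecone-api-key>",  # placeholder, not a real key
    collection="",                           # left blank on purpose
    embedding_model="text-embedding-ada-002",
    embedding_api_key="<openai-api-key>",    # placeholder, not a real key
)
print(config.missing_fields())
```

In this sketch, an empty collection name is reported as missing, which mirrors the form requiring every field before you click Next.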

Step 3: Query and retrieve chunks

  1. Enter a query that matches the content of ingested files.
  2. Review Maximum chunks to retrieve and Minimum similarity threshold, and adjust them if needed.

     Info:
       • Maximum chunks to retrieve sets how many matching chunks are returned.
       • Minimum similarity threshold is a value between 0 and 1 (for example, 0.7). Chunks that score below it are filtered out.

  3. Click Retrieve. The results show the matching chunks and their similarity scores.
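
To illustrate how the two settings interact, here is a minimal sketch in plain Python. It assumes cosine similarity as the scoring metric and treats the threshold as inclusive. Neither detail is stated on this page, and the vector store may score differently. The function names, parameter names, and default values (`retrieve`, `max_chunks`, `min_similarity`) are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, chunks, max_chunks=5, min_similarity=0.7):
    """Return up to max_chunks (text, score) pairs scoring at least min_similarity.

    chunks is a list of (text, embedding) pairs. Results are sorted by
    descending score, so the threshold filters first and the cap trims last.
    """
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in chunks]
    kept = [(text, score) for text, score in scored if score >= min_similarity]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:max_chunks]


# Toy 2-dimensional embeddings; real embeddings have many more dimensions.
chunks = [
    ("chunk A", [1.0, 0.0]),
    ("chunk B", [0.0, 1.0]),
    ("chunk C", [1.0, 1.0]),
]
for text, score in retrieve([1.0, 0.0], chunks, max_chunks=2, min_similarity=0.7):
    print(f"{text}: {score:.3f}")
```

Raising the threshold narrows the result set even when the maximum is large, while lowering the maximum trims results that already passed the threshold.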
Info: WSO2 Cloud - Integration Platform's retrieval process can apply reranking models to return the most contextually relevant chunks.

[Figure: Retrieve relevant chunks from the vector store]

Step 4: Enable reranking (optional)

Reranking reorders retrieved chunks by contextual relevance, improving the quality of results passed to your LLM. WSO2 Cloud supports reranking with Cohere.

To enable reranking:

  1. In the retrieval form, turn on the Reranking option.
  2. Enter your Cohere API key.

For more information, see the Cohere documentation.
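
Conceptually, reranking takes the chunks that retrieval already returned and reorders them with a second relevance score. The sketch below shows only that reordering step. It does not call Cohere: the `term_overlap` scorer is a toy stand-in for a real reranking model, and both function names are hypothetical.

```python
from typing import Callable, List


def rerank(query: str, chunks: List[str], score: Callable[[str, str], float]) -> List[str]:
    """Return chunks ordered by descending score(query, chunk).

    Chunks with equal scores keep their original relative order.
    """
    return sorted(chunks, key=lambda chunk: score(query, chunk), reverse=True)


def term_overlap(query: str, chunk: str) -> float:
    """Toy stand-in scorer: fraction of query words that appear in the chunk."""
    query_words = set(query.lower().split())
    chunk_words = set(chunk.lower().split())
    return len(query_words & chunk_words) / len(query_words) if query_words else 0.0


retrieved = [
    "billing cycles are monthly",
    "reset your password from settings",
    "password rules",
]
for chunk in rerank("reset password", retrieved, term_overlap):
    print(chunk)
```

A real reranking model scores each query and chunk pair using their full text rather than word overlap, which is why it can surface chunks that vector similarity alone ranks lower.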

What's next