Data Loaders

A Data Loader reads documents from a source and returns them as ai:Document values, ready to be chunked, embedded, and indexed by a Knowledge Base. It is the entry point of the RAG ingestion pipeline.

Available actions

Every data loader exposes one action.

Action	What it does	Required parameters
Load	Reads the configured source and returns the documents.	None.

load returns a single ai:Document when exactly one document is resolved, and an ai:Document[] otherwise.

Where to find data loaders

In the flow editor, open the Add Node panel and go to AI > RAG > Data Loader, then click + Add Data Loader. The Data Loaders picker lists the available types.

Implementations overview

Data Loader	Module	Reads from	Result type
Text Data Loader	`ballerina/ai`	Files on the local file system.	`ai:TextDataLoader`
Microsoft SharePoint Text Data Loader	`ballerinax/ai.microsoft.sharepoint`	SharePoint document libraries and site pages, via the Microsoft Graph API.	`sharepoint:TextDataLoader`

Text Data Loader

Reads files from the local file system and wraps their content as ai:Document values. It loads supported file types as ai:TextDocument values.

Create form

Field	Required	Description
Paths	Yes	One or more paths to the files to load.
Data Loader Name	Yes	The variable name for the loader instance.
Result Type	Yes	The variable type, set to `ai:TextDataLoader`.

For an end-to-end example of wiring this loader into an ingestion pipeline, see RAG ingestion — add a text data loader.

Microsoft SharePoint Text Data Loader

Retrieves documents from SharePoint document libraries and site pages and returns them as text, accessed through the Microsoft Graph API. A single loader instance can read from multiple sites and libraries, individual files, entire folders (optionally recursively), and modern site pages.

Each file is returned as an ai:TextDocument based on its MIME type / extension:

Inherently textual files (e.g. txt, md, html, json, csv, xml) are decoded directly.
pdf files have their text extracted.
Other files that cannot be represented as text (e.g. images, audio, archives) are skipped with a logged warning. Naming such a file explicitly as a path is an error.

Create form

Field	Required	Description
SharePoint Connection Configurations	Yes	The authentication and service configuration shared by all sources. See Connection configurations.
Data Sources	Yes	One or more SharePoint sources to load documents from. At least one is required. See Data sources.
Data Loader Name	Yes	The variable name for the loader instance.
Result Type	Yes	The variable type, set to `sharepoint:TextDataLoader`.

Connection configurations

The connection configuration is shared by every source.

Field	Type	Default	Description
auth	`OAuth2ClientCredentialsGrantConfig \| OAuth2RefreshTokenGrantConfig \| http:BearerTokenConfig`	—	Authentication configuration for the Microsoft Graph API.
serviceUrl	`string`	`https://graph.microsoft.com/v1.0`	The base URL of the Microsoft Graph service.

Plus the Standard HTTP advanced configurations, which tune the underlying HTTP client and are forwarded to the Graph sites and pages clients.

Data sources

Data Sources is an array of Source records, each describing one SharePoint site to read from.

`Source`

Field	Type	Default	Description
siteId	`string`	—	The Microsoft Graph site id. Accepts the composite id (`{hostname},{spsite-guid},{spweb-guid}`) or the path form (`{hostname}:/sites/{site-name}`).
libraries	`Library[]`	`[{}]`	Document libraries to read from, each with its own paths and options. The default loads the whole of the site's default document library; `[]` loads no document-library content.
pages	`string[]?`	`()`	Site pages to load as text, matched by name, title, or id. Use `["*"]` for all pages; `()` for none.

`Library`

Field	Type	Default	Description
name	`string`	`Documents`	Display name of the document library, as shown in SharePoint. The default `Documents` is the English name; localized tenants use a translated name (e.g. `Dokumente`, `Documentos`). Use `"*"` for every library on the site.
paths	`string[]`	`["/"]`	File and/or folder paths relative to this library's root (e.g. `/Reports`). The default loads the entire library; `[]` loads nothing from it.
recursive	`boolean`	`false`	Whether folder paths are traversed into sub-folders.
includeExtensions	`string[]?`	`()`	Case-insensitive extension allowlist applied to folder contents (e.g. `["pdf"]`); a leading dot is optional. `()` loads all types. A file listed explicitly in `paths` is always loaded, even if its extension is not in the list.

What's next

RAG ingestion - Wire a data loader into an ingestion pipeline.
Knowledge Bases - Combine a chunker, embedding provider, and vector store to ingest the loaded documents.
Chunkers - Control how loaded documents are split before embedding.

Available actions​

Where to find data loaders​

Implementations overview​

Text Data Loader​

Create form​

Microsoft SharePoint Text Data Loader​

Create form​

Connection configurations​

Data sources​

Source​

Library​

What's next​

Available actions

Where to find data loaders

Implementations overview

Text Data Loader

Create form

Microsoft SharePoint Text Data Loader

Create form

Connection configurations

Data sources

`Source`

`Library`

What's next