Data Loaders

A Data Loader reads documents from a source and returns them as ai:Document values, ready to be chunked, embedded, and indexed by a Knowledge Base. It is the entry point of the RAG ingestion pipeline.

Available actions

Every data loader exposes one action.

| Action | What it does | Required parameters |
| --- | --- | --- |
| Load | Reads the configured source and returns the documents. | None. |

load returns a single ai:Document when exactly one document is resolved, and an ai:Document[] otherwise.
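Because the result can be either a single document or an array, downstream code usually has to handle both shapes. The following is a minimal, language-neutral sketch in Python of that normalization step. The `Document` class and the `as_list` helper are hypothetical stand-ins introduced for illustration; they are not part of the `ai` module.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Document:
    """Hypothetical stand-in for ai:Document; only the content is modelled."""
    content: str


def as_list(result: Union[Document, List[Document]]) -> List[Document]:
    """Wrap a single document in a list so callers can always iterate."""
    return result if isinstance(result, list) else [result]


# Example inputs covering both result shapes described above.
single = Document("leave policy")
many = [Document("a"), Document("b")]

for doc in as_list(single) + as_list(many):
    # Each item is a Document regardless of which shape load produced.
    _ = doc.content
```

Normalizing once, right after loading, keeps the rest of the pipeline free of shape checks.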

Where to find data loaders

In the flow editor, open the Add Node panel and go to AI > RAG > Data Loader, then click + Add Data Loader. The Data Loaders picker lists the available types.

Figure: The Data Loaders picker, listing Text Data Loader (a data loader that loads supported file types as text documents) and Microsoft SharePoint Text Data Loader (a data loader that retrieves documents from SharePoint document libraries as text).

Implementations overview

| Data Loader | Module | Reads from | Result type |
| --- | --- | --- | --- |
| Text Data Loader | ballerina/ai | Files on the local file system. | ai:TextDataLoader |
| Microsoft SharePoint Text Data Loader | ballerinax/ai.microsoft.sharepoint | SharePoint document libraries and site pages, via the Microsoft Graph API. | sharepoint:TextDataLoader |

Text Data Loader

Reads files from the local file system and returns their content as ai:Document values. Supported file types are loaded as ai:TextDocument values.

Create form

Figure: The ai Data Loader create form, titled 'Initializes the data loader with the given paths', showing Paths (the paths to the files to load), Data Loader Name (default aiTextdataloader), and Result Type (ai).

| Field | Required | Description |
| --- | --- | --- |
| Paths | Yes | One or more paths to the files to load. |
| Data Loader Name | Yes | The variable name for the loader instance. |
| Result Type | Yes | The variable type, set to ai:TextDataLoader. |

For an end-to-end example of wiring this loader into an ingestion pipeline, see RAG ingestion: add a text data loader.

Microsoft SharePoint Text Data Loader

Retrieves documents from SharePoint document libraries and site pages through the Microsoft Graph API and returns them as text. A single loader instance can read from multiple sites and libraries, and can load individual files, entire folders (optionally recursively), and modern site pages.

Each file is handled according to its MIME type or extension, and files that can be represented as text are returned as ai:TextDocument values:

  • Inherently textual files (e.g. txt, md, html, json, csv, xml) are decoded directly.
  • pdf files have their text extracted.
  • Other files that cannot be represented as text (e.g. images, audio, archives) are skipped with a logged warning. Naming such a file explicitly as a path is an error.
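The three rules above can be modelled as a small decision function. This is a rough Python sketch and not the loader's implementation: it looks only at the file extension (the loader also considers MIME type), the `TEXTUAL` set contains just the examples named above, and the `handling` function and its `explicit` flag are hypothetical names introduced here.

```python
from pathlib import PurePosixPath

# Assumption: only the example extensions listed above; the real set may be larger.
TEXTUAL = {"txt", "md", "html", "json", "csv", "xml"}


def handling(path: str, explicit: bool) -> str:
    """Return how a file would be treated: 'decode', 'extract', or 'skip'.

    `explicit` is True when the file itself was named as a path, and False
    when it was discovered while listing a folder.
    """
    ext = PurePosixPath(path).suffix.lstrip(".").lower()
    if ext in TEXTUAL:
        return "decode"      # inherently textual: decoded directly
    if ext == "pdf":
        return "extract"     # text is extracted from the PDF
    if explicit:
        # Naming a non-text file explicitly is treated as an error.
        raise ValueError(f"{path} cannot be represented as text")
    return "skip"            # found in a folder: skipped (with a warning)
```

The distinction between a file found in a folder and a file named explicitly is the important part: the first is tolerated, the second is reported as a mistake in the configuration.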

Create form

Figure: The ai.microsoft.sharepoint Data Loader create form, titled 'Initializes the SharePoint data loader', showing SharePoint Connection Configurations (a Record), Data Sources (an Array), Data Loader Name (default sharepointTextdataloaderResult), and Result Type (sharepoint).

| Field | Required | Description |
| --- | --- | --- |
| SharePoint Connection Configurations | Yes | The authentication and service configuration shared by all sources. See Connection configurations. |
| Data Sources | Yes | One or more SharePoint sources to load documents from. At least one is required. See Data sources. |
| Data Loader Name | Yes | The variable name for the loader instance. |
| Result Type | Yes | The variable type, set to sharepoint:TextDataLoader. |

Connection configurations

The connection configuration is shared by every source.

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| auth | OAuth2ClientCredentialsGrantConfig \| OAuth2RefreshTokenGrantConfig \| http:BearerTokenConfig | | Authentication configuration for the Microsoft Graph API. |
| serviceUrl | string | https://graph.microsoft.com/v1.0 | The base URL of the Microsoft Graph service. |

The connection configuration also accepts the Standard HTTP advanced configurations, which tune the underlying HTTP client and are forwarded to the Graph sites and pages clients.

Data sources

Data Sources is an array of Source records, each describing one SharePoint site to read from.

Source

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| siteId | string | | The Microsoft Graph site id. Accepts the composite id ({hostname},{spsite-guid},{spweb-guid}) or the path form ({hostname}:/sites/{site-name}). |
| libraries | Library[] | [{}] | Document libraries to read from, each with its own paths and options. The default loads the whole of the site's default document library; [] loads no document-library content. |
| pages | string[]? | () | Site pages to load as text, matched by name, title, or id. Use ["*"] for all pages; () for none. |
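The two accepted siteId shapes are easy to confuse, so it can help to check a value before putting it in a Source. The Python sketch below recognizes the two forms described for siteId. The regular expressions are an assumption about what those placeholders look like (standard hyphenated GUIDs, a single-segment site name), the `site_id_form` helper is hypothetical, and the sample values in the usage lines are made up.

```python
import re

# Assumption: GUIDs use the usual 8-4-4-4-12 hexadecimal layout.
GUID = r"[0-9a-fA-F]{8}-(?:[0-9a-fA-F]{4}-){3}[0-9a-fA-F]{12}"

# {hostname},{spsite-guid},{spweb-guid}
COMPOSITE = re.compile(rf"^[^,:/\s]+,{GUID},{GUID}$")

# {hostname}:/sites/{site-name}
PATH_FORM = re.compile(r"^[^,:/\s]+:/sites/[^/\s]+$")


def site_id_form(site_id: str) -> str:
    """Classify a siteId as 'composite' or 'path', or raise ValueError."""
    if COMPOSITE.match(site_id):
        return "composite"
    if PATH_FORM.match(site_id):
        return "path"
    raise ValueError(f"unrecognized siteId form: {site_id!r}")


# Made-up sample values for illustration only.
composite_id = (
    "contoso.sharepoint.com,"
    "2c712604-1370-44e7-a1f5-426573fda80a,"
    "2d2244c3-251a-49ea-93a8-39da1c8f3f4c"
)
path_id = "contoso.sharepoint.com:/sites/engineering"
```

A check like this only validates the shape of the string; whether the site exists and is accessible is still decided by the Microsoft Graph API at load time.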

Library

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| name | string | Documents | Display name of the document library, as shown in SharePoint. The default Documents is the English name; localized tenants use a translated name (e.g. Dokumente, Documentos). Use "*" for every library on the site. |
| paths | string[] | ["/"] | File and/or folder paths relative to this library's root (e.g. /Reports). The default loads the entire library; [] loads nothing from it. |
| recursive | boolean | false | Whether folder paths are traversed into sub-folders. |
| includeExtensions | string[]? | () | Case-insensitive extension allowlist applied to folder contents (e.g. ["pdf"]); a leading dot is optional. () loads all types. A file listed explicitly in paths is always loaded, even if its extension is not in the list. |
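The includeExtensions rules interact in a way that is easier to see in code. The following Python sketch models them under stated assumptions: `None` stands in for `()`, the `should_load` helper and its `explicit` flag are hypothetical names, and an empty list (a case the description above does not cover) is assumed to match no folder contents.

```python
from pathlib import PurePosixPath
from typing import List, Optional


def should_load(path: str, explicit: bool,
                include_extensions: Optional[List[str]]) -> bool:
    """Decide whether the allowlist lets a file through.

    `explicit` is True when the file is named directly in paths, and False
    when it was found while listing a folder.
    """
    if explicit:
        return True                      # explicit files bypass the allowlist
    if include_extensions is None:
        return True                      # () means every type is allowed
    # Compare without the leading dot and without regard to case.
    allowed = {e.lstrip(".").lower() for e in include_extensions}
    ext = PurePosixPath(path).suffix.lstrip(".").lower()
    return ext in allowed
```

Note that this covers only the allowlist. Whether a file that passes the filter can actually be represented as text is a separate check.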

What's next

  • RAG ingestion - Wire a data loader into an ingestion pipeline.
  • Knowledge Bases - Combine a chunker, embedding provider, and vector store to ingest the loaded documents.
  • Chunkers - Control how loaded documents are split before embedding.