Document Loaders
Document loaders fetch data into BroxiAI from various document sources including structured file systems, wikis, and unstructured documents.
Confluence
The Confluence component integrates with the Confluence wiki collaboration platform to load and process documents. It utilizes the ConfluenceLoader from LangChain to fetch content from a specified Confluence space.
Inputs
url (Site URL): The base URL of the Confluence space (e.g., https://company.atlassian.net/wiki)
username (Username): Atlassian user email (e.g., email@example.com)
space_key (Space Key): The key of the Confluence space to access
cloud (Use Cloud?): Whether to use Confluence Cloud (default: true)
content_format (Content Format): The content format to retrieve (default: STORAGE)
max_pages (Max Pages): Maximum number of pages to retrieve (default: 1000)
Outputs
data (Data): List of Data objects containing the loaded Confluence documents
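As a rough illustration of how these inputs could map onto LangChain's ConfluenceLoader, the sketch below collects them into keyword arguments. It assumes a recent langchain_community release in which space_key, content_format, and max_pages are constructor arguments. The api_key parameter and the ConfluenceInputs, loader_kwargs, and load_pages names are hypothetical and not part of the component; an API token is assumed because Confluence Cloud normally requires one alongside the username.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class ConfluenceInputs:
    """Hypothetical container mirroring the component inputs listed above."""
    url: str
    username: str
    space_key: str
    cloud: bool = True
    content_format: str = "STORAGE"
    max_pages: int = 1000


def loader_kwargs(inputs: ConfluenceInputs, api_key: str) -> dict[str, Any]:
    """Build keyword arguments for ConfluenceLoader.

    api_key is an assumption: it is not listed among the inputs above.
    """
    if not inputs.url.startswith(("http://", "https://")):
        raise ValueError("url must include the scheme, e.g. https://")
    if inputs.max_pages < 1:
        raise ValueError("max_pages must be at least 1")
    return {
        "url": inputs.url.rstrip("/"),  # drop a trailing slash, if any
        "username": inputs.username,
        "api_key": api_key,
        "space_key": inputs.space_key,
        "cloud": inputs.cloud,
        "max_pages": inputs.max_pages,
    }


def load_pages(inputs: ConfluenceInputs, api_key: str) -> list:
    """Fetch pages from the space as LangChain Document objects."""
    # Imported lazily so the helpers above work without LangChain installed.
    from langchain_community.document_loaders import ConfluenceLoader
    from langchain_community.document_loaders.confluence import ContentFormat

    loader = ConfluenceLoader(
        # Look up the enum member by name, e.g. "STORAGE".
        content_format=ContentFormat[inputs.content_format],
        **loader_kwargs(inputs, api_key),
    )
    return loader.load()
```

Calling load_pages requires network access and valid credentials; loader_kwargs can be checked offline. How the component itself converts the returned documents into Data objects is not shown here.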
GitLoader
The GitLoader component uses the GitLoader from LangChain to fetch and load documents from a specified Git repository.
Inputs
repo_path (Repository Path): The local path to the Git repository
clone_url (Clone URL): The URL to clone the Git repository from (optional)
branch (Branch): The branch to load files from (default: 'main')
file_filter (File Filter): Patterns to filter files (e.g., '*.py' to include only .py files, '!*.py' to exclude .py files)
content_filter (Content Filter): A regex pattern to filter files based on their content
Outputs
data (Data): List of Data objects containing the loaded Git repository documents
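LangChain's GitLoader accepts a file_filter callable that receives a file path and returns True to keep the file. The sketch below shows one way the file_filter and content_filter inputs could be combined into such a callable. It is an illustration under assumptions, not the component's actual implementation: the build_file_filter name, the comma-separated pattern list, and matching globs against the file name only are all choices made for this example.

```python
import re
from fnmatch import fnmatchcase
from pathlib import Path
from typing import Callable, Optional


def build_file_filter(
    patterns: str = "",
    content_pattern: Optional[str] = None,
) -> Callable[[str], bool]:
    """Return a predicate usable as GitLoader's file_filter argument.

    patterns: comma-separated globs (assumed format); a leading '!'
    marks an exclusion, e.g. "*.py, !test_*.py".
    content_pattern: optional regex that the file content must match.
    """
    globs = [p.strip() for p in patterns.split(",") if p.strip()]
    includes = [g for g in globs if not g.startswith("!")]
    excludes = [g[1:] for g in globs if g.startswith("!")]
    content_re = re.compile(content_pattern) if content_pattern else None

    def keep(file_path: str) -> bool:
        name = Path(file_path).name
        # Exclusions win over inclusions.
        if any(fnmatchcase(name, g) for g in excludes):
            return False
        # With no include globs, every non-excluded file passes.
        if includes and not any(fnmatchcase(name, g) for g in includes):
            return False
        if content_re is None:
            return True
        try:
            text = Path(file_path).read_text(encoding="utf-8", errors="ignore")
        except OSError:
            return False  # unreadable files are skipped
        return content_re.search(text) is not None

    return keep
```

The resulting predicate could then be passed along with the other inputs, for example GitLoader(repo_path=..., clone_url=..., branch="main", file_filter=build_file_filter("*.py")). The name check runs before the content check so that files are only read when their name already qualifies.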
Unstructured
This component uses the Unstructured.io Serverless API to load and parse files into a list of structured Data objects.
Inputs
api_key (API Key): Unstructured.io Serverless API key
api_url (Unstructured.io API URL): Optional URL for the Unstructured API
chunking_strategy (Chunking Strategy): Strategy for chunking the document (options: "", "basic", "by_title", "by_page", "by_similarity")
unstructured_args (Additional Arguments): Optional dictionary of additional arguments for the Unstructured.io API
Outputs
data (Data): List of Data objects containing the parsed content from the input file
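To make the relationship between these inputs concrete, the sketch below merges them into a single argument dictionary and validates the chunking strategy against the options listed above. It assumes the UnstructuredLoader class from the langchain_unstructured package is the underlying loader, which this page does not state. The build_partition_args and parse_file names are hypothetical, and the rule that explicit inputs override keys in unstructured_args is a design choice made for the example.

```python
from typing import Any, Optional

# Options taken from the chunking_strategy input; "" means no chunking.
CHUNKING_STRATEGIES = ("", "basic", "by_title", "by_page", "by_similarity")


def build_partition_args(
    api_key: str,
    api_url: Optional[str] = None,
    chunking_strategy: str = "",
    unstructured_args: Optional[dict] = None,
) -> dict[str, Any]:
    """Merge the component inputs into one keyword-argument dictionary."""
    if not api_key:
        raise ValueError("api_key is required")
    if chunking_strategy not in CHUNKING_STRATEGIES:
        raise ValueError(f"unknown chunking strategy: {chunking_strategy!r}")

    # Start from the free-form arguments so explicit inputs take precedence.
    args: dict[str, Any] = dict(unstructured_args or {})
    args["api_key"] = api_key
    args["partition_via_api"] = True
    if api_url:
        args["url"] = api_url
    if chunking_strategy:
        args["chunking_strategy"] = chunking_strategy
    return args


def parse_file(file_path: str, **inputs: Any) -> list:
    """Send one file to the API and return LangChain Document objects."""
    # Imported lazily; assumes the langchain_unstructured package is installed.
    from langchain_unstructured import UnstructuredLoader

    loader = UnstructuredLoader(file_path, **build_partition_args(**inputs))
    return loader.load()
```

For example, build_partition_args("my-key", chunking_strategy="by_title", unstructured_args={"strategy": "hi_res"}) yields a dictionary containing both the chunking strategy and the extra strategy argument, with no url key because api_url was left unset.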
Usage Notes
Document Parsing: Automatically converts various document formats into structured data
Repository Integration: Direct access to Git repositories for code and documentation
Wiki Integration: Seamless integration with Confluence wikis
Flexible Input: Support for various file formats and content sources
Chunking Support: Advanced document chunking strategies for better processing
Enterprise Ready: Support for enterprise platforms with proper authentication