
Data


Last updated 12 days ago

Data components load data from a source into your flow.

They may perform some processing or type checking, like converting raw HTML data into text, or ensuring your loaded file is of an acceptable type.

Use a data component in a flow

The URL data component loads content from a list of URLs.

In the component's URLs field, enter a comma-separated list of URLs you want to load. Alternatively, connect a component that outputs the Message type, such as the Chat Input component, to supply the URLs at run time.
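As a rough illustration of what a comma-separated URLs field implies, the sketch below splits such a string into individual URLs. The function name `parse_url_list` is hypothetical and is not part of the component; the component's own parsing may differ.

```python
def parse_url_list(raw: str) -> list[str]:
    """Split a comma-separated string into trimmed, non-empty URLs."""
    return [part.strip() for part in raw.split(",") if part.strip()]


# Stray spaces and a trailing comma are tolerated.
urls = parse_url_list("https://example.com, https://example.org/docs ,")
print(urls)
```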

To output a Data type, in the Output Format dropdown, select Raw HTML. To output a Message type, in the Output Format dropdown, select Text. This option applies postprocessing with the data_to_text helper function.

In this example of a document ingestion pipeline, the URL component outputs raw HTML to a text splitter, which splits the raw content into chunks for a vector database to ingest.

URL component in a data ingestion pipeline

The API Request component makes HTTP requests using URLs or cURL commands.

  1. To use this component in a flow, connect the Data output to a component that accepts the input. For example, connect the API Request component to a Chat Output component.

  2. In the API Request component's URLs field, enter the endpoint for your request. This example uses https://dummy-json.mock.beeceptor.com/posts, which is a list of technology blog posts.

  3. In the Method field, enter the type of request. This example uses GET to retrieve a list of blog posts. The component also supports POST, PATCH, PUT, and DELETE.

  4. Optionally, enable the Use cURL button to create a field for pasting curl requests. The equivalent call in this example is curl -v https://dummy-json.mock.beeceptor.com/posts.

  5. Click Playground, and then click Run Flow. Your request returns a list of blog posts in the result field.
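For readers who want to see the same request outside the visual editor, here is a minimal sketch using only the Python standard library. It builds the request object without sending it; `build_request` is a hypothetical helper and does not reflect how the component is implemented internally.

```python
import urllib.request

ALLOWED_METHODS = {"GET", "POST", "PATCH", "PUT", "DELETE"}


def build_request(url: str, method: str = "GET") -> urllib.request.Request:
    """Build (but do not send) an HTTP request for the given URL and method."""
    method = method.upper()
    if method not in ALLOWED_METHODS:
        raise ValueError(f"Unsupported method: {method}")
    return urllib.request.Request(
        url, method=method, headers={"Accept": "application/json"}
    )


request = build_request("https://dummy-json.mock.beeceptor.com/posts")
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request, timeout=10)
```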

The API Request component retrieved a list of JSON objects in the result field. For this example, you will use the Lambda Filter to extract the desired data nested within the result field.

  1. Connect a Lambda Filter to the API Request component, and a language model to the Lambda Filter. This example connects a Groq model component.

  2. In the Groq model component, add your Groq API key.

  3. To filter the data, in the Lambda Filter component, in the Instructions field, use natural language to describe how the data should be filtered. For this example, enter:

I want to explode the result column out into a Data object

Avoid punctuation in the Instructions field, as it can cause errors.

  4. To run the flow, in the Lambda Filter component, click the run button.

  5. To inspect the filtered data, in the Lambda Filter component, click the inspect output button. The result is a structured DataFrame.

| userId | id | title | body | link | comment_count |
|---|----|-------|------|------|---------------|
| 1 | 1 | Introduction to Artificial Intelligence | Learn the basics of AI ...| https://example.com/article1 | 8 |
| 2 | 2 | Web Development with React | Build modern web applications ...| https://example.com/article2 | 12 |
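To make the "explode the result column" instruction concrete, the sketch below does the equivalent transformation in plain Python. The Lambda Filter itself delegates this to a language model; the `explode_result` function and the abbreviated sample response are illustrative assumptions.

```python
from typing import Any


def explode_result(response: dict[str, Any]) -> list[dict[str, Any]]:
    """Turn the list nested under 'result' into one flat row per item."""
    items = response.get("result", [])
    return [dict(item) for item in items if isinstance(item, dict)]


# Abbreviated sample shaped like the table above (values are illustrative).
response = {
    "result": [
        {"userId": 1, "id": 1, "title": "Introduction to Artificial Intelligence", "comment_count": 8},
        {"userId": 2, "id": 2, "title": "Web Development with React", "comment_count": 12},
    ]
}

rows = explode_result(response)
for row in rows:
    print(row["id"], row["title"], row["comment_count"])
```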

| Name | Display Name | Info |
|---|---|---|
| urls | URLs | Enter one or more URLs, separated by commas. |
| curl | cURL | Paste a curl command to populate the dictionary fields for headers and body. |
| method | Method | The HTTP method to use. |
| use_curl | Use cURL | Enable cURL mode to populate fields from a cURL command. |
| query_params | Query Parameters | The query parameters to append to the URL. |
| body | Body | The body to send with the request as a dictionary (for POST, PATCH, PUT). |
| headers | Headers | The headers to send with the request as a dictionary. |
| timeout | Timeout | The timeout to use for the request. |
| follow_redirects | Follow Redirects | Whether to follow HTTP redirects. |
| save_to_file | Save to File | Save the API response to a temporary file. |
| include_httpx_metadata | Include HTTPx Metadata | Include properties such as headers, status_code, response_headers, and redirection_history in the output. |

| Name | Display Name | Info |
|---|---|---|
| data | Data | The result of the API requests. Returns a Data object containing source URL and results. |
| dataframe | DataFrame | Converts the API response data into a tabular DataFrame format. |
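The Query Parameters input appends parameters to the URL. A minimal sketch of that idea with the standard library follows; `add_query_params` is a hypothetical helper, and the `userId` parameter is only an example value, not something the mock endpoint is documented to support.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_query_params(url: str, params: dict[str, str]) -> str:
    """Append params to a URL, keeping any query string it already has."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.extend(params.items())
    return urlunsplit(parts._replace(query=urlencode(query)))


print(add_query_params("https://dummy-json.mock.beeceptor.com/posts", {"userId": "1"}))
```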

The Directory component recursively loads files from a directory, with options for file types, depth, and concurrency.

| Input | Type | Description |
|---|---|---|
| path | MessageTextInput | Path to the directory to load files from |
| types | MessageTextInput | File types to load (leave empty to load all types) |
| depth | IntInput | Depth to search for files |
| max_concurrency | IntInput | Maximum concurrency for loading files |
| load_hidden | BoolInput | If true, hidden files are loaded |
| recursive | BoolInput | If true, the search is recursive |
| silent_errors | BoolInput | If true, errors do not raise an exception |
| use_multithreading | BoolInput | If true, multithreading is used |

| Output | Type | Description |
|---|---|---|
| data | List[Data] | Loaded file data from the directory |
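The sketch below shows one way the types, depth, load_hidden, and recursive options could interact when selecting files. It is an assumption-laden illustration: `find_files` is hypothetical, and "depth" is interpreted here as the number of subdirectory levels below the starting directory, which may not match the component's exact semantics.

```python
import tempfile
from pathlib import Path


def find_files(root, types=None, max_depth=1, load_hidden=False, recursive=True):
    """Return files under root, filtered by extension, depth, and hidden status."""
    root = Path(root)
    wanted = {t.lower().lstrip(".") for t in (types or [])}
    pattern = "**/*" if recursive else "*"
    found = []
    for candidate in sorted(root.glob(pattern)):
        if not candidate.is_file():
            continue
        parts = candidate.relative_to(root).parts
        if len(parts) - 1 > max_depth:  # too many directory levels down
            continue
        if not load_hidden and any(p.startswith(".") for p in parts):
            continue
        if wanted and candidate.suffix.lower().lstrip(".") not in wanted:
            continue
        found.append(candidate)
    return found


# Build a throwaway directory tree to try the filter on.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "sub" / "deep").mkdir(parents=True)
    for name in ("a.txt", ".secret.txt", "sub/b.md", "sub/deep/c.txt"):
        (base / name).write_text("demo")
    names = [p.relative_to(base).as_posix() for p in find_files(base, types=["txt"])]
    print(names)
```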

The File component loads and parses files of various supported formats and converts the content into a Data object. It supports multiple file types and provides options for parallel processing and error handling.

To load a document, follow these steps:

  1. Click the Select files button.

The loaded file name appears in the component.

| Name | Display Name | Info |
|---|---|---|
| path | Files | Path to file(s) to load. Supports individual files or bundled archives. |
| file_path | Server File Path | Data object with a file_path property pointing to the server file or a Message object with a path to the file. Supersedes 'Path' but supports the same file types. |
| separator | Separator | Specify the separator to use between multiple outputs in Message format. |
| silent_errors | Silent Errors | If true, errors do not raise an exception. |
| delete_server_file_after_processing | Delete Server File After Processing | If true, the Server File Path is deleted after processing. |
| ignore_unsupported_extensions | Ignore Unsupported Extensions | If true, files with unsupported extensions are not processed. |
| ignore_unspecified_files | Ignore Unspecified Files | If true, Data with no file_path property is ignored. |
| use_multithreading | [Deprecated] Use Multithreading | Set 'Processing Concurrency' greater than 1 to enable multithreading. This option is deprecated. |
| concurrency_multithreading | Processing Concurrency | When multiple files are being processed, the number of files to process concurrently. Default is 1. Values greater than 1 enable parallel processing for 2 or more files. |
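To illustrate how a Processing Concurrency value greater than 1 and the Silent Errors flag might combine, here is a small sketch built on `concurrent.futures`. The `process_files` function and its `load_one` callback are hypothetical stand-ins for the component's internal loader.

```python
from concurrent.futures import ThreadPoolExecutor


def process_files(paths, load_one, concurrency=1, silent_errors=False):
    """Load each path with load_one, in parallel when concurrency > 1."""

    def safe_load(path):
        try:
            return load_one(path)
        except Exception:
            if silent_errors:
                return None  # swallow the error and skip this file
            raise

    if concurrency <= 1 or len(paths) < 2:
        results = [safe_load(p) for p in paths]
    else:
        with ThreadPoolExecutor(max_workers=concurrency) as pool:
            results = list(pool.map(safe_load, paths))  # preserves input order
    return [r for r in results if r is not None]


print(process_files(["a.txt", "b.txt", "c.txt"], str.upper, concurrency=2))
```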

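| Name | Display Name | Info |
|---|---|---|
| data | Data | Parsed content of the file as a Data object. |
| dataframe | DataFrame | File content as a DataFrame object. |
| message | Message | File content as a Message object. |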

Text files:

  • .txt - Text files

  • .md, .mdx - Markdown files

  • .csv - CSV files

  • .json - JSON files

  • .yaml, .yml - YAML files

  • .xml - XML files

  • .html, .htm - HTML files

  • .pdf - PDF files

  • .docx - Word documents

  • .py - Python files

  • .sh - Shell scripts

  • .sql - SQL files

  • .js - JavaScript files

  • .ts, .tsx - TypeScript files

Archive formats (for bundling multiple files):

  • .zip - ZIP archives

  • .tar - TAR archives

  • .tgz - Gzipped TAR archives

  • .bz2 - Bzip2 compressed files

  • .gz - Gzip compressed files
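The extension lists above, combined with the Ignore Unsupported Extensions option, suggest a check along the following lines. This is only a sketch: `select_supported` is a hypothetical name, and raising `ValueError` when the option is off is an assumption about the behavior.

```python
from pathlib import Path

TEXT_EXTENSIONS = {
    ".txt", ".md", ".mdx", ".csv", ".json", ".yaml", ".yml", ".xml",
    ".html", ".htm", ".pdf", ".docx", ".py", ".sh", ".sql", ".js", ".ts", ".tsx",
}
ARCHIVE_EXTENSIONS = {".zip", ".tar", ".tgz", ".bz2", ".gz"}
SUPPORTED = TEXT_EXTENSIONS | ARCHIVE_EXTENSIONS


def select_supported(paths, ignore_unsupported_extensions=True):
    """Keep paths with a supported extension; optionally fail on the rest."""
    supported, unsupported = [], []
    for path in paths:
        bucket = supported if Path(path).suffix.lower() in SUPPORTED else unsupported
        bucket.append(path)
    if unsupported and not ignore_unsupported_extensions:
        raise ValueError(f"Unsupported file types: {unsupported}")
    return supported


print(select_supported(["notes.md", "photo.png", "bundle.zip"]))
```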

The Gmail Loader component loads emails from Gmail using provided credentials and filters.

| Input | Type | Description |
|---|---|---|
| json_string | SecretStrInput | JSON string containing OAuth 2.0 access token information for service account access |
| label_ids | MessageTextInput | Comma-separated list of label IDs to filter emails |
| max_results | MessageTextInput | Maximum number of emails to load |

| Output | Type | Description |
|---|---|---|
| data | Data | Loaded email data |

The Google Drive Loader component loads documents from Google Drive using provided credentials and a single document ID.

| Input | Type | Description |
|---|---|---|
| json_string | SecretStrInput | JSON string containing OAuth 2.0 access token information for service account access |
| document_id | MessageTextInput | Single Google Drive document ID |

| Output | Type | Description |
|---|---|---|
| docs | Data | Loaded document data |

The Google Drive Search component searches Google Drive files using provided credentials and query parameters.

| Input | Type | Description |
|---|---|---|
| token_string | SecretStrInput | JSON string containing OAuth 2.0 access token information for service account access |
| query_item | DropdownInput | The field to query |
| valid_operator | DropdownInput | Operator to use in the query |
| search_term | MessageTextInput | The value to search for in the specified query item |
| query_string | MessageTextInput | The query string used for searching (can be edited manually) |

| Output | Type | Description |
|---|---|---|
| doc_urls | List[str] | URLs of the found documents |
| doc_ids | List[str] | IDs of the found documents |
| doc_titles | List[str] | Titles of the found documents |
| Data | Data | Document titles and URLs in a structured format |
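The query_item, valid_operator, and search_term inputs plausibly combine into the query_string. The sketch below assumes the Google Drive search syntax of `field operator 'value'`, with single quotes in the value escaped by a backslash; `build_drive_query` is a hypothetical helper, and the component may build its query differently.

```python
def build_drive_query(query_item: str, valid_operator: str, search_term: str) -> str:
    """Compose a Drive-style query string such as: name contains 'budget'."""
    escaped = search_term.replace("\\", "\\\\").replace("'", "\\'")
    return f"{query_item} {valid_operator} '{escaped}'"


print(build_drive_query("name", "contains", "budget"))
print(build_drive_query("name", "=", "Bob's notes"))
```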

The SQL Query component executes SQL queries on a specified database.

| Name | Display Name | Info |
|---|---|---|
| query | Query | The SQL query to execute. |
| database_url | Database URL | The URL of the database. |
| include_columns | Include Columns | Include columns in the result. |
| passthrough | Passthrough | If an error occurs, return the query instead of raising an exception. |
| add_error | Add Error | Add the error to the result. |

| Name | Display Name | Info |
|---|---|---|
| result | Result | The result of the SQL query execution. |
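The interplay of include_columns, passthrough, and add_error can be sketched with the standard library's `sqlite3` module. This is an illustration under assumptions: `run_query` is hypothetical, it takes an open connection rather than a database URL, and the exact result format of the real component is not specified here.

```python
import sqlite3


def run_query(conn, query, include_columns=True, passthrough=False, add_error=False):
    """Run a query; on failure, optionally return the query text instead of raising."""
    try:
        cursor = conn.execute(query)
        rows = cursor.fetchall()
        if include_columns and cursor.description:
            names = [column[0] for column in cursor.description]
            return [dict(zip(names, row)) for row in rows]
        return rows
    except sqlite3.Error as exc:
        if not passthrough:
            raise
        return f"{query}\nError: {exc}" if add_error else query


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER, title TEXT)")
conn.execute("INSERT INTO posts VALUES (1, 'Hello')")

print(run_query(conn, "SELECT id, title FROM posts"))
print(run_query(conn, "SELECT * FROM missing", passthrough=True, add_error=True))
```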

The URL component fetches content from one or more URLs, processes the content, and returns it in various formats. It supports output in plain text, raw HTML, or JSON, with options for cleaning and separating multiple outputs.

  1. To use this component in a flow, connect the DataFrame output to a component that accepts the input. For example, connect the URL component to a Chat Output component.

  2. In the URL component's URLs field, enter the URL for your request.

  3. Optionally, in the Max Depth field, enter how many pages away from the initial URL you want to crawl. Select 1 to crawl only the page specified in the URLs field. Select 2 to crawl all pages linked from that page. The component crawls by link traversal, not by URL path depth.

  4. Click Playground, and then click Run Flow. The text contents of the URL are returned to the Playground as a structured DataFrame.

  5. In the URL component, change the output port to Message, and then run the flow again. The text contents of the URL are returned as unstructured raw text, which you can extract patterns from with the Regex Extractor tool.

  6. Connect the URL component to a Regex Extractor and Chat Output.

  7. In the Regex Extractor tool, enter a pattern to extract text from the URL component's raw output. This example extracts the first paragraph from the "In the news" section of https://en.wikipedia.org/wiki/Main_Page.

In the news\s*\n(.*?)(?=\n\n)

Result:

Peruvian writer and Nobel Prize in Literature laureate Mario Vargas Llosa (pictured) dies at the age of 89.
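The same pattern can be tried locally with Python's `re` module. The sample text below is made up for illustration and stands in for the raw text the URL component would return.

```python
import re

# Same pattern as above: capture the line after "In the news", up to a blank line.
PATTERN = re.compile(r"In the news\s*\n(.*?)(?=\n\n)")

sample = "Welcome\n\nIn the news\nFirst headline of the day.\n\nSecond headline.\n\n"

match = PATTERN.search(sample)
first_paragraph = match.group(1) if match else None
print(first_paragraph)
```

Because `.` does not match newlines by default, the capture stays on a single line and stops at the first blank line that follows it.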

| Name | Display Name | Info |
|---|---|---|
| urls | URLs | Enter one or more URLs. URLs are automatically validated and cleaned. |
| format | Output Format | Output Format. Use Text to extract text from the HTML, Raw HTML for the raw HTML content, or JSON to extract JSON from the HTML. |
| separator | Separator | Specify the separator to use between multiple outputs. Default for Text is \n. Default for Raw HTML is \n<!-- Separator -->. |
| clean_extra_whitespace | Clean Extra Whitespace | Whether to clean excessive blank lines in the text output. Only applies to Text format. |
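As a sketch of how the Separator and Clean Extra Whitespace options might shape the final output, the snippet below joins several page texts using the default separators stated above. `join_outputs` is hypothetical, and collapsing runs of three or more newlines into one blank line is an assumed interpretation of "excessive blank lines".

```python
import re

DEFAULT_SEPARATORS = {"Text": "\n", "Raw HTML": "\n<!-- Separator -->"}


def join_outputs(pages, output_format="Text", separator=None, clean_extra_whitespace=True):
    """Join per-URL outputs with a separator, tidying blank lines for Text output."""
    if separator is None:
        separator = DEFAULT_SEPARATORS[output_format]
    if output_format == "Text" and clean_extra_whitespace:
        pages = [re.sub(r"\n{3,}", "\n\n", page).strip() for page in pages]
    return separator.join(pages)


print(repr(join_outputs(["a\n\n\n\nb", "c"])))
print(repr(join_outputs(["<p>a</p>", "<p>b</p>"], output_format="Raw HTML")))
```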

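| Name | Display Name | Info |
|---|---|---|
| data | Data | List of Data objects containing fetched content and metadata. |
| text | Text | Fetched content as formatted text, with applied separators and cleaning. |
| dataframe | DataFrame | Content formatted as a DataFrame object. |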
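The URL component's Max Depth setting counts link hops rather than URL path segments. A breadth-first traversal over an in-memory link graph illustrates the difference; the `crawl` function and the tiny `links` graph are made up for the example, and no network access is involved.

```python
from collections import deque


def crawl(start, links, max_depth=1):
    """Return pages reachable from start within max_depth levels (1 = start only)."""
    seen = {start}
    order = [start]
    queue = deque([(start, 1)])
    while queue:
        page, depth = queue.popleft()
        if depth >= max_depth:
            continue  # do not follow links from pages at the depth limit
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                order.append(target)
                queue.append((target, depth + 1))
    return order


# Hypothetical site: "/" links to "/a" and "/b"; "/a" links to "/c".
links = {"/": ["/a", "/b"], "/a": ["/c"], "/b": ["/"]}
print(crawl("/", links, max_depth=1))
print(crawl("/", links, max_depth=2))
```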

The Webhook component defines a webhook trigger that runs a flow when it receives an HTTP POST request.

If the input is not valid JSON, the component wraps it in a payload object so that it can be processed and still trigger the flow. The component does not require an API key.
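The fallback for non-JSON input can be pictured as follows. This is a sketch only: `parse_webhook_body` is a hypothetical name, and the `payload` key is an assumption based on the description above, not a confirmed field name.

```python
import json


def parse_webhook_body(body: str) -> dict:
    """Parse a request body as JSON, wrapping anything else in a payload object."""
    if not body.strip():
        return {}  # no input: empty result
    try:
        parsed = json.loads(body)
    except json.JSONDecodeError:
        return {"payload": body}
    return parsed if isinstance(parsed, dict) else {"payload": parsed}


print(parse_webhook_body('{"any": "data"}'))
print(parse_webhook_body("plain text, not JSON"))
```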

When a Webhook component is added to the workspace, a new Webhook cURL tab becomes available in the API pane that contains an HTTP POST request for triggering the webhook component. For example:

curl -X POST \
  "https://use.broxi.ai/api/v1/webhook/YOUR_FLOW_ID" \
  -H 'Content-Type: application/json' \
  -d '{"any": "data"}'
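If you would rather trigger the webhook from a script, the following sketch builds the equivalent POST request with the Python standard library. `build_webhook_request` is a hypothetical helper, `YOUR_FLOW_ID` remains a placeholder, and the request is constructed but not sent.

```python
import json
import urllib.request


def build_webhook_request(flow_id: str, payload: dict,
                          base_url: str = "https://use.broxi.ai") -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for a flow's webhook endpoint."""
    return urllib.request.Request(
        f"{base_url}/api/v1/webhook/{flow_id}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


webhook_request = build_webhook_request("YOUR_FLOW_ID", {"any": "data"})
print(webhook_request.get_method(), webhook_request.full_url)
# To actually send it: urllib.request.urlopen(webhook_request, timeout=10)
```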

To test the webhook component:

  1. Add a Webhook component to the flow.

  2. Connect the Webhook component's Data output to the Data input of a Parser component.

  3. Connect the Parser component's Parsed Text output to the Text input of a Chat Output component.

  4. In the Parser component, under Mode, select Stringify. This mode passes the webhook's data as a string for the Chat Output component to print.

  5. To send a POST request, copy the code from the Webhook cURL tab in the API pane and paste it into a terminal.

  6. Send the POST request.

  7. Open the Playground. Your JSON data is posted to the Chat Output component, which indicates that the webhook component is correctly triggering the flow.

| Name | Display Name | Description |
|---|---|---|
| data | Payload | Receives a payload from external systems through HTTP POST requests. |
| curl | cURL | The cURL command template for making requests to this webhook. |
| endpoint | Endpoint | The endpoint URL where this webhook receives requests. |

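| Name | Display Name | Description |
|---|---|---|
| output_data | Data | Outputs processed data from the webhook input, and returns an empty Data object if no input is provided. If the input is not valid JSON, the component wraps it in a payload object. |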

API Request

API request into a chat output component

Filter API request data

Inputs

Outputs

Directory

Inputs

Outputs

File

Select a local file or a file loaded with File management, and then click Select file.

Inputs

Outputs

Parsed content of the file as a Data object.

File content as a DataFrame object.

File content as a Message object.

Supported File Types

Gmail Loader

Inputs

Outputs

Google Drive Loader

Inputs

Outputs

Google Drive Search

Inputs

Outputs

SQL Query

Inputs

Outputs

URL

URL request into a chat output component
Regex extractor connected to url component

Inputs

Outputs

List of Data objects containing fetched content and metadata.

Content formatted as a DataFrame object.

Webhook

Connect the Webhook component's Data output to the Data input of a Parser component.

Connect the Parser component's Parsed Text output to the Text input of a Chat Output component.

Inputs

Outputs

Outputs processed data from the webhook input, and returns an empty Data object if no input is provided. If the input is not valid JSON, the component wraps it in a payload object.