Text Processing
Text processing components handle splitting, parsing, and formatting of text content for various workflow needs.
Split Text
This component splits text into chunks based on specified criteria. It's ideal for chunking data to be tokenized and embedded into vector databases.
The Split Text component outputs Chunks or DataFrame. The Chunks output returns a list of individual text chunks. The DataFrame output returns a structured data format, with additional text and metadata columns applied.
To use this component in a flow, connect a component that outputs Data or DataFrame to the Split Text component's Data port. This example uses the URL component, which fetches JSON placeholder data.
In the Split Text component, define your data splitting parameters.
This example splits incoming JSON data at the separator },, so each chunk contains one JSON object.
The order of precedence is Separator, then Chunk Size, and then Chunk Overlap. If any segment after separator splitting is longer than chunk_size, it is split again to fit within chunk_size.
After chunk_size, Chunk Overlap is applied between chunks to maintain context.
Connect a Chat Output component to the Split Text component's DataFrame output to view its output.
Click Playground, and then click Run Flow. The output contains a table of JSON objects split at },.
{
"userId": 1,
"id": 1,
"title": "Introduction to Artificial Intelligence",
"body": "Learn the basics of Artificial Intelligence and its applications in various industries.",
"link": "https://example.com/article1",
"comment_count": 8
},
{
"userId": 2,
"id": 2,
"title": "Web Development with React",
"body": "Build modern web applications using React.js and explore its powerful features.",
"link": "https://example.com/article2",
"comment_count": 12
},
Clear the Separator field, and then run the flow again. Instead of JSON objects, the output contains 50-character lines of text with 10 characters of overlap.
First chunk: "title": "Introduction to Artificial Intelligence""
Second chunk: "elligence", "body": "Learn the basics of Artif"
Third chunk: "s of Artificial Intelligence and its applications"
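The order of precedence described above (Separator, then Chunk Size, then Chunk Overlap) can be illustrated with a minimal Python sketch. This is not the component's implementation: the function name split_text is hypothetical, the sketch drops the separator from the output, and it does not merge short segments, so real output from the component may differ.

```python
def split_text(text, separator="\n", chunk_size=1000, chunk_overlap=200):
    """Illustrative splitter: separator first, then chunk_size, then overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    # Step 1: split on the separator; an empty separator leaves one segment.
    segments = text.split(separator) if separator else [text]

    chunks = []
    step = chunk_size - chunk_overlap  # how far each window advances
    for segment in segments:
        segment = segment.strip()
        if not segment:
            continue
        if len(segment) <= chunk_size:
            chunks.append(segment)
            continue
        # Steps 2 and 3: an oversized segment is cut into windows of
        # chunk_size characters that share chunk_overlap characters.
        for start in range(0, len(segment), step):
            chunks.append(segment[start:start + chunk_size])
            if start + chunk_size >= len(segment):
                break
    return chunks


# Splitting at "}," keeps one JSON-like object per chunk (assumed sample data).
sample = '{"id": 1},{"id": 2},{"id": 3}'
objects = split_text(sample, separator="},", chunk_size=50, chunk_overlap=10)

# With no separator, only chunk_size and chunk_overlap shape the chunks.
windows = split_text("abcdefghij", separator="", chunk_size=4, chunk_overlap=1)
```

In the second call, each window starts three characters after the previous one, so neighboring chunks share one character of context.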
Inputs
chunk_overlap (Chunk Overlap): The number of characters to overlap between chunks. Default: 200.
chunk_size (Chunk Size): The maximum number of characters in each chunk. Default: 1000.
separator (Separator): The character to split on. Default: newline.
text_key (Text Key): The key to use for the text column (advanced). Default: text.
Outputs
Parser
This component formats DataFrame or Data objects into text using templates, with an option to convert inputs directly to strings using stringify.
To use this component, create variables for values in the template the same way you would in a Prompt component. For DataFrames, use column names, for example Name: {Name}. For Data objects, use {text}.
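As a rough illustration of how curly-bracket variables behave, the following Python sketch fills a template once per row and joins the results with a separator. It uses Python's built-in str.format as a stand-in. The render_rows helper and the sample rows are hypothetical, and the component's own template engine may behave differently.

```python
def render_rows(rows, template, sep="\n"):
    """Fill the template once per row, then join the results with sep."""
    return sep.join(template.format(**row) for row in rows)


# DataFrame-like input: each row is a dict keyed by column name (assumed data).
rows = [
    {"Name": "Ada", "Role": "Engineer"},
    {"Name": "Grace", "Role": "Admiral"},
]
table_text = render_rows(rows, "Name: {Name}")

# Data-like input: a single object whose content lives under the "text" key.
data = {"text": "Hello from a Data object"}
data_text = "{text}".format(**data)
```

Variable names are case-sensitive in this sketch, so {Name} matches a column called Name but not one called name.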
To use the Parser component with a Structured Output component, do the following:
Connect a Structured Output component's DataFrame output to the Parser component's DataFrame input.
Connect the File component to the Structured Output component's Message input.
Connect the OpenAI model component's Language Model output to the Structured Output component's Language Model input.
The flow looks like this:

In the Structured Output component, click Open Table. This opens a pane for structuring your table. The table contains the columns Name, Description, Type, and Multiple.
Create a table that maps to the data you're loading from the File loader. For example, to create a table for employees, you might have the rows id, name, and email, all of type string.
In the Template field of the Parser component, enter a template for parsing the Structured Output component's DataFrame output into structured text. Create variables for values in the template the same way you would in a Prompt component. For example, to present a table of employees in Markdown:
# Employee Profile
## Personal Information
- **Name:** {name}
- **ID:** {id}
- **Email:** {email}
To run the flow, in the Parser component, click .
To view your parsed text, in the Parser component, click .
Optionally, connect a Chat Output component, and open the Playground to see the output.
For an additional example of using the Parser component to format a DataFrame from a Structured Output component, see the Market Research template flow.
Inputs
mode (Mode): Tab selection between "Parser" and "Stringify" modes. "Stringify" converts input to a string instead of using a template.
pattern (Template): Template for formatting using variables in curly brackets. For DataFrames, use column names, such as Name: {Name}. For Data objects, use {text}.
input_data (Data or DataFrame): The input to parse - accepts either a DataFrame or Data object.
sep (Separator): String used to separate rows/items. Default: newline.
clean_data (Clean Data): When stringify is enabled, cleans data by removing empty rows and lines.
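To make the difference between the two modes concrete, here is a hedged Python sketch of what a Stringify-style conversion with cleaning might look like. The stringify_rows helper, the "key: value" line layout, and the sample rows are all assumptions for illustration. The component's actual string format is not specified here.

```python
def stringify_rows(rows, clean=True):
    """Convert rows to plain text without a template (assumed layout)."""
    if clean:
        # Drop rows in which every value is None or blank.
        rows = [
            row for row in rows
            if any(str(v).strip() for v in row.values() if v is not None)
        ]
    # One "key: value" line per row; this layout is an assumption.
    text = "\n".join(
        ", ".join(f"{key}: {value}" for key, value in row.items())
        for row in rows
    )
    if clean:
        # Remove blank lines, including ones embedded in multi-line values.
        text = "\n".join(line for line in text.splitlines() if line.strip())
    return text


rows = [
    {"id": "1", "note": "first\n\nsecond"},
    {"id": "", "note": None},  # empty row, removed when clean=True
]
cleaned = stringify_rows(rows)
```

With clean=False, the empty row and the blank line inside the note would both be kept in the output.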
Outputs
Usage Notes
Intelligent Chunking: Split text into optimal chunks for vector storage and processing
Template Formatting: Convert structured data into readable text using custom templates
Context Preservation: Maintain context between chunks with overlap settings
Flexible Output: Generate both individual chunks and structured DataFrames
Variable Support: Use template variables for dynamic content formatting
Multiple Modes: Choose between template-based parsing and simple string conversion