# Text Processing

Text processing components handle splitting, parsing, and formatting of text content for various workflow needs.

## Split Text

This component splits text into chunks based on specified criteria. It's ideal for chunking data to be tokenized and embedded into vector databases.

The **Split Text** component has two outputs, **Chunks** and **DataFrame**. The **Chunks** output returns a list of individual text chunks. The **DataFrame** output returns the chunks as a table with `text` and `metadata` columns.

1. To use this component in a flow, connect a component that outputs [Data or DataFrame](https://guidenai.gitbook.io/broxi/data-transformation#data-to-dataframe) to the **Split Text** component's **Data** port. This example uses the **URL** component, which fetches JSON placeholder data.
2. In the **Split Text** component, define your data splitting parameters.

This example splits incoming JSON data at the separator `},`, so each chunk contains one JSON object.

The order of precedence is **Separator**, then **Chunk Size**, and then **Chunk Overlap**. If any segment after separator splitting is longer than `chunk_size`, it is split again to fit within `chunk_size`.

After the `chunk_size` split, **Chunk Overlap** is applied between adjacent chunks so that context carries over from one chunk to the next.

3. Connect a **Chat Output** component to the **Split Text** component's **DataFrame** output to view its output.
4. Click **Playground**, and then click **Run Flow**. The output contains a table of JSON objects split at `},`.

```json
{
  "userId": 1,
  "id": 1,
  "title": "Introduction to Artificial Intelligence",
  "body": "Learn the basics of Artificial Intelligence and its applications in various industries.",
  "link": "https://example.com/article1",
  "comment_count": 8
},
{
  "userId": 2,
  "id": 2,
  "title": "Web Development with React",
  "body": "Build modern web applications using React.js and explore its powerful features.",
  "link": "https://example.com/article2",
  "comment_count": 12
},
```

5. Clear the **Separator** field, and then run the flow again. Instead of JSON objects, the output contains chunks of up to 50 characters with 10 characters of overlap. This assumes **Chunk Size** is set to `50` and **Chunk Overlap** is set to `10`.

> First chunk: `"title": "Introduction to Artificial Intelligence"`\
> Second chunk: `elligence", "body": "Learn the basics of Artif`\
> Third chunk: `s of Artificial Intelligence and its applications`
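
The order of precedence described above can be sketched in plain Python. This is a minimal illustration, not the component's actual implementation: the function name `split_text` is hypothetical, the defaults mirror the values listed in the inputs table, and the sketch drops the separator from each chunk, which the real component may not do.

```python
def split_text(text, separator="\n", chunk_size=1000, chunk_overlap=200):
    """Illustrative splitter: separator first, then chunk size, then overlap."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")

    # 1. Separator: an empty separator means "do not pre-split".
    segments = text.split(separator) if separator else [text]

    # Each new window starts this many characters after the previous one.
    step = chunk_size - chunk_overlap
    chunks = []
    for segment in segments:
        segment = segment.strip()
        if not segment:
            continue
        # Segments that already fit are kept whole.
        if len(segment) <= chunk_size:
            chunks.append(segment)
            continue
        # 2. Chunk size and 3. overlap: slide a fixed-size window over the
        # oversized segment, re-reading the last `chunk_overlap` characters.
        start = 0
        while True:
            chunks.append(segment[start:start + chunk_size])
            if start + chunk_size >= len(segment):
                break
            start += step
    return chunks


# Split at "}," so each chunk holds one object (hypothetical sample data).
sample = '{"id": 1, "title": "A"},{"id": 2, "title": "B"}'
print(split_text(sample, separator="},"))

# No separator: fixed windows of 50 characters that overlap by 10.
print(split_text("abcdefghij" * 12, separator="", chunk_size=50, chunk_overlap=10))
```

In the second call, the last 10 characters of each chunk are repeated at the start of the next one, which is the effect **Chunk Overlap** is meant to have.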

### Inputs

| Name           | Display Name    | Info                                                                                                                                                                                        |
| -------------- | --------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| data\_inputs   | Input Documents | The data to split. The component accepts [Data](https://docs.langflow.org/concepts-objects#data-object) or [DataFrame](https://docs.langflow.org/concepts-objects#dataframe-object) objects. |
| chunk\_overlap | Chunk Overlap   | The number of characters to overlap between chunks. Default: `200`.                                                                                                                         |
| chunk\_size    | Chunk Size      | The maximum number of characters in each chunk. Default: `1000`.                                                                                                                            |
| separator      | Separator       | The character or string to split on. Default: newline.                                                                                                                                      |
| text\_key      | Text Key        | The key to use for the text column (advanced). Default: `text`.                                                                                                                             |

### Outputs

| Name      | Display Name | Info                                                                                                                          |
| --------- | ------------ | ----------------------------------------------------------------------------------------------------------------------------- |
| chunks    | Chunks       | List of split text chunks as [Data](https://github.com/GuidenAI/broxi_document/blob/main/components/data.md) objects.         |
| dataframe | DataFrame    | The split text chunks as a [DataFrame](https://guidenai.gitbook.io/broxi/data-transformation#dataframe-operations) object.    |

## Parser

This component formats `DataFrame` or `Data` objects into text using templates, with an option to convert inputs directly to strings using `stringify`.

To use this component, create variables for values in the `template` the same way you would in a Prompt component. For `DataFrames`, use column names, for example `Name: {Name}`. For `Data` objects, use `{text}`.

To use the **Parser** component with a **Structured Output** component, do the following:

1. Connect a **Structured Output** component's **DataFrame** output to the **Parser** component's **DataFrame** input.
2. Connect the **File** component to the **Structured Output** component's **Message** input.
3. Connect the **OpenAI** model component's **Language Model** output to the **Structured Output** component's **Language Model** input.

The flow looks like this:

![A parser component connected to OpenAI and structured output](https://2739525811-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FGUcjvdv7MzVy7GYqImRT%2Fuploads%2FVbIjvLjKeGVa134WgQBe%2FUntitled%20diagram%20_%20Mermaid%20Chart-2025-09-07-172410.png?alt=media\&token=6a0d71ab-b9f6-40ee-a861-cc7183c1d9fa)

4. In the **Structured Output** component, click **Open Table**. This opens a pane for structuring your table. The table contains the columns **Name**, **Description**, **Type**, and **Multiple**.
5. Create a table that maps to the data you're loading from the **File** component. For example, to create a table for employees, you might have the rows `id`, `name`, and `email`, all of type `string`.
6. In the **Template** field of the **Parser** component, enter a template for parsing the **Structured Output** component's DataFrame output into structured text. For example, to present a table of employees in Markdown:

```markdown
# Employee Profile
## Personal Information
- **Name:** {name}
- **ID:** {id}
- **Email:** {email}
```

7. To run the flow, in the **Parser** component, click the run icon.
8. To view your parsed text, in the **Parser** component, click the output inspection icon.
9. Optionally, connect a **Chat Output** component, and open the **Playground** to see the output.

For an additional example of using the **Parser** component to format a DataFrame from a **Structured Output** component, see the **Market Research** template flow.
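
As a rough illustration of how template parsing works, the following Python sketch fills a template once per row and joins the results with a separator. It is not the component's implementation: `parse_rows` is a hypothetical helper, the rows are plain dictionaries standing in for DataFrame rows or a `Data` object, and the employee values are made up.

```python
def parse_rows(rows, template, sep="\n"):
    """Fill `template` once per row and join the results with `sep`."""
    # Each {placeholder} in the template is looked up as a key in the row.
    return sep.join(template.format(**row) for row in rows)


# DataFrame-like input: placeholders are column names (hypothetical sample rows).
employees = [
    {"id": "1", "name": "Ada", "email": "ada@example.com"},
    {"id": "2", "name": "Lin", "email": "lin@example.com"},
]
profile_template = "- **Name:** {name}\n- **ID:** {id}\n- **Email:** {email}"
print(parse_rows(employees, profile_template, sep="\n\n"))

# Data-like input: a single record whose content is under the `text` key.
print(parse_rows([{"text": "Hello from a Data object"}], "{text}"))
```

A placeholder that does not match any key raises a `KeyError` in this sketch, so template variable names need to match the column names exactly, including case.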

### Inputs

| Name        | Display Name      | Info                                                                                                                                                 |
| ----------- | ----------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------- |
| mode        | Mode              | Tab selection between "Parser" and "Stringify" modes. "Stringify" converts input to a string instead of using a template.                            |
| pattern     | Template          | Template for formatting using variables in curly brackets. For DataFrames, use column names, such as `Name: {Name}`. For Data objects, use `{text}`. |
| input\_data | Data or DataFrame | The input to parse. Accepts either a DataFrame or a Data object.                                                                                     |
| sep         | Separator         | String used to separate rows/items. Default: newline.                                                                                                |
| clean\_data | Clean Data        | When stringify is enabled, cleans data by removing empty rows and lines.                                                                             |
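
The interaction between **Stringify** mode and **Clean Data** can be approximated as follows. This is a sketch under stated assumptions: the `stringify` function and its `key: value` output format are hypothetical, and the component's real string representation may differ. It only illustrates the idea of dropping empty rows and blank lines before converting the input to a string.

```python
def stringify(rows, clean_data=True):
    """Convert rows to one string, optionally dropping empty rows and blank lines."""
    if clean_data:
        # An "empty row" here is one where every value is None or whitespace.
        rows = [
            row for row in rows
            if any(str(value).strip() for value in row.values() if value is not None)
        ]

    # Assumed output format: one "key: value" line per row.
    lines = [", ".join(f"{key}: {value}" for key, value in row.items()) for row in rows]
    text = "\n".join(lines)

    if clean_data:
        # Remove blank lines, including ones embedded inside multi-line values.
        text = "\n".join(line for line in text.splitlines() if line.strip())
    return text


# Hypothetical sample rows; the second row is empty.
rows = [
    {"name": "Ada", "email": "ada@example.com"},
    {"name": "", "email": None},
    {"name": "Lin", "email": "lin@example.com"},
]
print(stringify(rows))
print(stringify(rows, clean_data=False))
```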

### Outputs

| Name         | Display Name | Info                                                                                                           |
| ------------ | ------------ | -------------------------------------------------------------------------------------------------------------- |
| parsed\_text | Parsed Text  | The resulting formatted text as a [Message](https://docs.langflow.org/concepts-objects#message-object) object. |

## Usage Notes

* **Intelligent Chunking**: Split text into optimal chunks for vector storage and processing
* **Template Formatting**: Convert structured data into readable text using custom templates
* **Context Preservation**: Maintain context between chunks with overlap settings
* **Flexible Output**: Generate both individual chunks and structured DataFrames
* **Variable Support**: Use template variables for dynamic content formatting
* **Multiple Modes**: Choose between template-based parsing and simple string conversion
