Batch Processing

Batch processing components handle bulk operations on datasets, enabling efficient processing of multiple data items through language models.

Batch Run

The Batch Run component runs a language model over each row of a DataFrame text column and returns a new DataFrame with the original text and an LLM response.

The response contains the following columns:

  • text_input: The original text from the input DataFrame.

  • model_response: The model's response for each input.

  • batch_index: The processing order, with a 0-based index.

  • metadata (optional): Additional information about the processing.

These columns, when connected to a Parser component, can be used as variables within curly braces.

To use the Batch Run component with a Parser component, do the following:

  1. Connect a Model component to the Batch Run component's Language model port.

  2. Connect a component that outputs DataFrame, like File component, to the Batch Run component's DataFrame input.

  3. Connect the Batch Run component's Batch Results output to a Parser component's DataFrame input. The flow looks like this:

A batch run component connected to OpenAI and a Parser
  1. In the Column Name field of the Batch Run component, enter a column name based on the data you're loading from the File loader. For example, to process a column of name, enter name.

  2. Optionally, in the System Message field of the Batch Run component, enter a System Message to instruct the connected LLM on how to process your file. For example, Create a business card for each name.

  3. In the Template field of the Parser component, enter a template for using the Batch Run component's new DataFrame columns. To use all three columns from the Batch Run component, include them like this: record_number: {batch_index}, name: {text_input}, summary: {model_response}

  4. To run the flow, in the Parser component, click .

  5. To view your created DataFrame, in the Parser component, click .

  6. Optionally, connect a Chat Output component, and open the Playground to see the output.

Inputs

Name
Display Name
Type
Info

model

Language Model

HandleInput

Connect the 'Language Model' output from your LLM component here. Required.

system_message

System Message

MultilineInput

Multi-line system instruction for all rows in the DataFrame.

df

DataFrame

DataFrameInput

The DataFrame whose column is treated as text messages, as specified by 'column_name'. Required.

column_name

Column Name

MessageTextInput

The name of the DataFrame column to treat as text messages. Default='text'. Required.

enable_metadata

Enable Metadata

BoolInput

If True, add metadata to the output DataFrame.

Outputs

Name
Display Name
Method
Info

batch_results

Batch Results

run_batch

A DataFrame with columns: 'text_input', 'model_response', 'batch_index', and optional 'metadata' containing processing information.

Usage Notes

  • Bulk Processing: Efficiently process multiple data items in a single operation

  • Consistent Results: Apply the same LLM processing to all rows in a dataset

  • Structured Output: Maintain original data alongside LLM responses

  • Template Integration: Works seamlessly with Parser components for formatting

  • Metadata Support: Optional metadata tracking for debugging and analysis

  • Sequential Processing: Maintains processing order with batch index tracking

Last updated