Data Filtering

Data filtering components select or transform data by key, by comparing values against a filter value, or with filter functions generated by an LLM from natural language instructions.

Filter Data

This component filters a Data object based on a list of keys.

Inputs

| Name | Display Name | Info |
|---|---|---|
| data | Data | Data object to filter. |
| filter_criteria | Filter Criteria | List of keys to filter by. |

Outputs

| Name | Display Name | Info |
|---|---|---|
| filtered_data | Filtered Data | A new Data object containing only the key-value pairs that match the filter criteria. |
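The key-based filtering described above can be sketched in Python. This is a minimal illustration, not the component's source: it assumes the Data object's contents can be treated as a plain dict, and the function name filter_data is hypothetical.

```python
from typing import Any


def filter_data(data: dict[str, Any], filter_criteria: list[str]) -> dict[str, Any]:
    """Hypothetical stand-in for Filter Data: keep only the listed keys.

    A plain dict plays the role of the Data object's contents.
    """
    wanted = set(filter_criteria)
    # Build a new dict so the input is left unchanged; requested keys that
    # are absent from the input are simply skipped.
    return {key: value for key, value in data.items() if key in wanted}


record = {"name": "Ada", "email": "ada@example.com", "role": "admin"}
print(filter_data(record, ["name", "email"]))
# → {'name': 'Ada', 'email': 'ada@example.com'}
```

Returning a new dict mirrors the component's description of producing "a new Data object" rather than modifying the input.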

Filter Values

The Filter Values component filters a list of data items by comparing the value stored under a specified key with a filter value, using a chosen comparison operator.

Inputs

| Name | Display Name | Info |
|---|---|---|
| input_data | Input data | The list of data items to filter. |
| filter_key | Filter Key | The key to filter on, for example, 'route'. |
| filter_value | Filter Value | The value to filter by, for example, 'CMIP'. |
| operator | Comparison Operator | The operator used to compare each item's value with the filter value. |

Outputs

| Name | Display Name | Info |
|---|---|---|
| filtered_data | Filtered data | The resulting list of filtered data items. |
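A minimal Python sketch of value-based filtering, assuming each data item is a plain dict. The function filter_values and the operator names in OPERATORS are hypothetical and may not match the operators the component actually offers.

```python
from typing import Any, Callable

# Hypothetical operator names; the component's actual operator list may differ.
OPERATORS: dict[str, Callable[[str, str], bool]] = {
    "equals": lambda item_value, target: item_value == target,
    "not equals": lambda item_value, target: item_value != target,
    "contains": lambda item_value, target: target in item_value,
    "starts with": lambda item_value, target: item_value.startswith(target),
    "ends with": lambda item_value, target: item_value.endswith(target),
}


def filter_values(
    input_data: list[dict[str, Any]],
    filter_key: str,
    filter_value: str,
    operator: str = "equals",
) -> list[dict[str, Any]]:
    """Keep items whose value under filter_key satisfies operator against filter_value."""
    compare = OPERATORS[operator]  # raises KeyError for an unknown operator name
    return [
        item
        for item in input_data
        # Items without the key are dropped; values are compared as strings.
        if filter_key in item and compare(str(item[filter_key]), filter_value)
    ]


items = [
    {"route": "CMIP", "id": 1},
    {"route": "CMIP6", "id": 2},
    {"route": "other", "id": 3},
]
print(filter_values(items, "route", "CMIP"))
# → [{'route': 'CMIP', 'id': 1}]
print(filter_values(items, "route", "CMIP", "starts with"))
# → [{'route': 'CMIP', 'id': 1}, {'route': 'CMIP6', 'id': 2}]
```

Keeping the operators in a lookup table makes the comparison a single, swappable function, which is one simple way to expose a "Comparison Operator" choice.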

Lambda Filter

This component uses an LLM to generate a Lambda function for filtering or transforming structured data.

To use the Lambda Filter component, connect it to a Language Model component. The component uses the connected model to generate a function from the natural language instructions in the Instructions field.

For example, a flow can get JSON data from the https://jsonplaceholder.typicode.com/users API endpoint and set the Instructions field of the Lambda Filter component to the task "extract emails". The connected LLM creates a filter from these instructions, and the filter extracts a list of email addresses from the JSON data.
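The generated filter is ordinary Python code. The sketch below assumes the model returns a single lambda expression as text; the response string, the compile_lambda helper, and the sample users are all hypothetical, and the component's actual prompt and parsing may differ.

```python
from typing import Any, Callable

# Hypothetical: text a connected LLM might return for the instruction "extract emails".
# The real response depends on the model and on the component's prompt.
llm_response = "lambda data: [user['email'] for user in data]"


def compile_lambda(source: str) -> Callable[[Any], Any]:
    """Hypothetical helper: turn a lambda expression string into a callable."""
    source = source.strip()
    if not source.startswith("lambda"):
        raise ValueError("expected a lambda expression")
    # An empty __builtins__ limits what the expression can call,
    # but it is NOT a security sandbox for untrusted code.
    func = eval(source, {"__builtins__": {}})
    if not callable(func):
        raise ValueError("expression did not evaluate to a callable")
    return func


# Made-up records shaped like a list of users with an email field.
users = [
    {"id": 1, "name": "Alice", "email": "alice@example.com"},
    {"id": 2, "name": "Bob", "email": "bob@example.com"},
]

extract_emails = compile_lambda(llm_response)
print(extract_emails(users))
# → ['alice@example.com', 'bob@example.com']
```

Because the function comes from model output, evaluating it executes code you did not write; review or constrain it before running it on real data.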

Inputs

| Name | Display Name | Info |
|---|---|---|
| data | Data | The structured data to filter or transform using a Lambda function. |
| llm | Language Model | The connection port for a Language Model component. |
| filter_instruction | Instructions | Natural language instructions for how to filter or transform the data using a Lambda function, such as "Filter the data to only include items where the 'status' is 'active'". |
| sample_size | Sample Size | For large datasets, the number of characters to sample from both the head and the tail of the dataset. |
| max_size | Max Size | The number of characters above which the data is considered "large", which triggers sampling by the sample_size value. |
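One way the head-and-tail sampling controlled by sample_size and max_size could work is sketched below. It assumes the data is serialized to JSON text before its length is measured; the describe_for_llm name and the "\n...\n" separator are hypothetical, and the component may sample differently.

```python
import json
from typing import Any


def describe_for_llm(data: Any, max_size: int, sample_size: int) -> str:
    """Hypothetical sketch: serialize data and, if it is large, keep only head and tail.

    Text longer than max_size characters is reduced to its first and last
    sample_size characters; shorter text is returned whole.
    """
    text = json.dumps(data)
    if len(text) <= max_size:
        return text
    head = text[:sample_size]
    # Slice from an explicit start index so sample_size == 0 yields "" rather than the whole text.
    tail = text[len(text) - sample_size:]
    return f"{head}\n...\n{tail}"


records = [{"id": i, "status": "active"} for i in range(1000)]
summary = describe_for_llm(records, max_size=1000, sample_size=100)
print(len(summary))  # → 205 (100 head + 5 separator + 100 tail characters)
```

Sampling both ends, rather than only the beginning, gives the model a view of the data's opening structure and of how it closes.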

Outputs

| Name | Display Name | Info |
|---|---|---|
| filtered_data | Filtered Data | The filtered or transformed Data object. |
| dataframe | DataFrame | The filtered data as a DataFrame. |

Usage Notes

  • Intelligent Filtering: Lambda Filter uses a connected LLM to turn complex filtering requirements into a filter function

  • Key-Based Filtering: Filter Data keeps only the keys you specify

  • Comparison Operations: Filter Values compares the value under a key with a filter value using a selectable comparison operator

  • Natural Language: Describe filtering logic in plain English in the Lambda Filter Instructions field

  • Multiple Outputs: Lambda Filter returns results as both a Data object and a DataFrame

  • Large Dataset Support: Lambda Filter automatically samples the head and tail of data larger than max_size, using sample_size characters from each end
