Open Source Databases

Open source vector database components provide access to self-hosted and community-driven vector storage solutions.

Chroma DB

The Chroma DB component creates a Chroma vector store with search capabilities.

By default, the database is ephemeral, which suits experimentation. To keep the data between runs, set a persist directory as described in the steps that follow.

  1. To use this component in a flow, connect it to a component that outputs Data or DataFrame. This example splits text from a URL component and computes embeddings with the connected OpenAI Embeddings component. Chroma DB computes embeddings by default, but you can connect your own embeddings model, as this example does.

  2. In the Chroma DB component, in the Collection field, enter a name for your embeddings collection.

  3. Optionally, to persist the Chroma database, in the Persist field, enter a directory to store the chroma.sqlite3 file. This example uses ./chroma-db to create a directory relative to where BroxiAI is running.

  4. To load data and embeddings into your Chroma database, run the Chroma DB component.

Tip: When loading duplicate documents, enable the Allow Duplicates option in Chroma DB if you want to store multiple copies of the same content, or disable it to automatically deduplicate your data.

  5. To view the split data, inspect the output of the Split Text component.

  6. To query your loaded data, open the Playground and query your database. Your input is converted to a vector and compared to the stored vectors in a vector similarity search.

For more information, see the Chroma documentation.
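The Playground query described above relies on vector similarity search. The following is a minimal sketch of that idea in plain Python, assuming cosine similarity as the metric and using toy three-dimensional vectors. The function names and sample data are hypothetical and are not part of the Chroma DB component or the Chroma API.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_search(query_vector, stored, number_of_results=10):
    """Return the top (text, score) pairs, best match first."""
    scored = [(text, cosine_similarity(query_vector, vec)) for text, vec in stored]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:number_of_results]


# Toy embeddings; a real embedding model produces far more dimensions.
stored_vectors = [
    ("cats are mammals", [0.9, 0.1, 0.0]),
    ("dogs are mammals", [0.8, 0.2, 0.1]),
    ("stocks fell today", [0.0, 0.1, 0.9]),
]

results = similarity_search([1.0, 0.0, 0.0], stored_vectors, number_of_results=2)
for text, score in results:
    print(f"{score:.3f}  {text}")
```

In a real flow, the query vector comes from the same embedding model that produced the stored vectors. Mixing models makes the scores meaningless.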

Inputs

| Name | Type | Description |
|---|---|---|
| collection_name | String | The name of the Chroma collection. Default: "BroxiAI". |
| persist_directory | String | The directory to persist the Chroma database. |
| search_query | String | The query to search for in the vector store. |
| ingest_data | Data | The data to ingest into the vector store (list of Data objects). |
| embedding | Embeddings | The embedding function to use for the vector store. |
| chroma_server_cors_allow_origins | String | CORS allow origins for the Chroma server. |
| chroma_server_host | String | Host for the Chroma server. |
| chroma_server_http_port | Integer | HTTP port for the Chroma server. |
| chroma_server_grpc_port | Integer | gRPC port for the Chroma server. |
| chroma_server_ssl_enabled | Boolean | Enable SSL for the Chroma server. |
| allow_duplicates | Boolean | Allow duplicate documents in the vector store. |
| search_type | String | Type of search to perform: "Similarity" or "MMR". |
| number_of_results | Integer | Number of results to return from the search. Default: 10. |
| limit | Integer | Limit the number of records to compare when Allow Duplicates is False. |
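To illustrate how allow_duplicates and limit could interact, here is a minimal sketch of deduplication before ingestion. It assumes documents are compared by a hash of their text and that limit caps how many stored records are checked. The component's actual comparison method is not documented here, so treat the function names and the hashing approach as hypothetical.

```python
import hashlib


def content_key(text):
    """Stable fingerprint of a document's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def filter_new_documents(incoming, stored, allow_duplicates=False, limit=None):
    """Return the incoming documents that should be written to the store."""
    if allow_duplicates:
        return list(incoming)
    # Only the first `limit` stored records are compared (None means all).
    compared = stored if limit is None else stored[:limit]
    seen = {content_key(text) for text in compared}
    fresh = []
    for text in incoming:
        key = content_key(text)
        if key not in seen:
            seen.add(key)  # also drops repeats inside the incoming batch
            fresh.append(text)
    return fresh


stored_docs = ["alpha", "beta"]
incoming_docs = ["beta", "gamma", "gamma"]

print(filter_new_documents(incoming_docs, stored_docs))
print(filter_new_documents(incoming_docs, stored_docs, allow_duplicates=True))
print(filter_new_documents(incoming_docs, stored_docs, limit=1))
```

A small limit makes ingestion faster but can let duplicates through, as the last call shows: "beta" is stored again because only the first stored record was compared.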

Outputs

| Name | Type | Description |
|---|---|---|
| vector_store | Chroma | Chroma vector store instance. |
| search_results | List[Data] | Results of similarity search. |
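The search_type input offers "Similarity" or "MMR" (maximal marginal relevance). The sketch below shows the general MMR idea in plain Python: each pick trades relevance to the query against similarity to results already chosen. The function names, the lambda_mult weighting parameter, and the toy vectors are assumptions for illustration, not the component's implementation.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr(query, candidates, k=2, lambda_mult=0.5):
    """Pick k candidate indices, balancing relevance and diversity."""
    selected = []
    remaining = list(range(len(candidates)))

    def score(i):
        relevance = cosine(query, candidates[i])
        # Penalize closeness to anything already selected.
        redundancy = max(
            (cosine(candidates[i], candidates[j]) for j in selected),
            default=0.0,
        )
        return lambda_mult * relevance - (1 - lambda_mult) * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


vectors = [
    [1.0, 0.0],  # index 0
    [1.0, 0.1],  # index 1: nearly the same direction as index 0
    [0.0, 1.0],  # index 2: a different direction
]
query = [0.8, 0.6]

print(mmr(query, vectors, k=2, lambda_mult=1.0))  # relevance only
print(mmr(query, vectors, k=2, lambda_mult=0.5))  # relevance and diversity
```

With lambda_mult set to 1.0 the ranking reduces to plain similarity and returns the two near-identical vectors. At 0.5 the second pick switches to the less similar but more distinct vector.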

Local DB

The Local DB component is BroxiAI's enhanced version of Chroma DB.

The component adds a user-friendly interface with two modes (Ingest and Retrieve), automatic collection management, and built-in persistence in BroxiAI's cache directory.

Use Ingest mode to store data in a collection, and Retrieve mode to query your existing Chroma DB collections.

For more information, see the Chroma documentation.

Inputs

| Name | Type | Description |
|---|---|---|
| collection_name | String | The name of the Chroma collection. Default: "BroxiAI". |
| persist_directory | String | Custom base directory to save the vector store. Collections are stored under {directory}/vector_stores/{collection_name}. If not specified, your system's cache folder is used. |
| existing_collections | String | Select a previously created collection to search through its stored data. |
| embedding | Embeddings | The embedding function to use for the vector store. |
| allow_duplicates | Boolean | If false, documents that are already in the vector store are not added. |
| search_type | String | Type of search to perform: "Similarity" or "MMR". |
| ingest_data | Data/DataFrame | Data to store. It is embedded and indexed for semantic search. |
| search_query | String | Enter text to search for similar content in the selected collection. |
| number_of_results | Integer | Number of results to return. Default: 10. |
| limit | Integer | Limit the number of records to compare when Allow Duplicates is False. |
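The persist_directory layout described above can be sketched with pathlib. This is an illustration only: the function name is hypothetical, and the fallback path (~/.cache/broxiai) is a placeholder, because the actual cache folder that BroxiAI uses is not specified here.

```python
from pathlib import Path


def resolve_collection_path(collection_name, persist_directory=None, cache_dir=None):
    """Build {directory}/vector_stores/{collection_name} for a collection."""
    if persist_directory:
        base = Path(persist_directory)
    elif cache_dir:
        base = Path(cache_dir)
    else:
        # Placeholder fallback, not BroxiAI's real cache location.
        base = Path.home() / ".cache" / "broxiai"
    return base / "vector_stores" / collection_name


print(resolve_collection_path("BroxiAI", persist_directory="./data"))
print(resolve_collection_path("BroxiAI"))
```

Keeping each collection in its own subdirectory means two collections can share one base directory without overwriting each other's files.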

Outputs

| Name | Type | Description |
|---|---|---|
| vector_store | Chroma | A local Chroma vector store instance configured with the specified parameters. |
| search_results | List[Data] | Results of similarity search. |

FAISS

This component creates a FAISS Vector Store with search capabilities. For more information, see the FAISS documentation.

Inputs

| Name | Type | Description |
|---|---|---|
| index_name | String | The name of the FAISS index. Default: "broxiai_index". |
| persist_directory | String | Path to save the FAISS index, relative to where BroxiAI is running. |
| search_query | String | The query to search for in the vector store. |
| ingest_data | Data | The data to ingest into the vector store (list of Data objects or documents). |
| allow_dangerous_deserialization | Boolean | Set to True to allow loading pickle files from untrusted sources. Default: True (advanced). |
| embedding | Embeddings | The embedding function to use for the vector store. |
| number_of_results | Integer | Number of results to return from the search. Default: 4 (advanced). |

Outputs

| Name | Type | Description |
|---|---|---|
| vector_store | FAISS | A FAISS vector store instance configured with the specified parameters. |
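The allow_dangerous_deserialization input exists because pickle files can run code when they are loaded. The sketch below demonstrates that general property of Python's pickle module with a harmless stand-in: the Payload class is hypothetical, and a call to eval on simple arithmetic takes the place of whatever an attacker would actually run. It does not use FAISS or the component itself.

```python
import pickle


class Payload:
    # pickle calls __reduce__ to learn how to rebuild an object. Whatever
    # callable it returns is invoked with the given arguments on load.
    def __reduce__(self):
        return (eval, ("6 * 7",))


blob = pickle.dumps(Payload())

# Loading the bytes runs eval("6 * 7") instead of rebuilding a Payload.
restored = pickle.loads(blob)
print(type(restored).__name__, restored)  # prints "int 42"
```

Because loading alone is enough to trigger the call, only load saved indexes from directories you control, and consider setting this option to False when the index files come from anywhere else.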

Usage Notes

  • Cost-Effective: No licensing costs, ideal for development and small-scale deployments

  • Self-Hosted: Full control over your data and infrastructure

  • Community Support: Active open source communities with extensive documentation

  • Flexibility: Can be customized and extended based on specific requirements
