DataStax Solutions

DataStax vector database components provide integration with Astra DB, Cassandra, and HCD (Hyper-Converged Database) for scalable vector storage and retrieval.

Astra DB Vector Store

This component implements a Vector Store using Astra DB with search capabilities.

For more information, see the DataStax documentation.

Use a vector store component in a flow

This example uses the Astra DB vector store component. Other vector store components may have different parameters and authentication requirements, but the document ingestion workflow is the same: a document is loaded from the local machine and split into chunks, the vector store generates embeddings for the chunks with the connected embedding model component, and the embeddings are stored in the connected Astra DB database.

The stored vectors can then be retrieved for workloads such as Retrieval-Augmented Generation (RAG).
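
The ingestion steps described above (load, chunk, embed, store) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the component's implementation: `chunk_text`, `toy_embed`, and `InMemoryStore` are hypothetical stand-ins for the file loader and splitter, the connected embedding model, and the Astra DB collection.

```python
import hashlib
from dataclasses import dataclass, field


def chunk_text(text: str, chunk_size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into fixed-size character chunks that overlap slightly."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


def toy_embed(text: str, dims: int = 8) -> list[float]:
    """Deterministic stand-in for an embedding model (not semantically meaningful)."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [byte / 255.0 for byte in digest[:dims]]


@dataclass
class InMemoryStore:
    """Hypothetical stand-in for a vector collection."""
    rows: list[dict] = field(default_factory=list)

    def add(self, chunks: list[str], source: str) -> int:
        # Embed each chunk and keep it together with its text and metadata.
        for index, chunk in enumerate(chunks):
            self.rows.append({
                "content": chunk,
                "vector": toy_embed(chunk),
                "metadata": {"source": source, "chunk": index},
            })
        return len(chunks)


# Pretend this text was loaded from a local file.
document = "Astra DB stores vectors. " * 8
store = InMemoryStore()
stored = store.add(chunk_text(document), source="notes.txt")
print(f"stored {stored} chunks, {len(store.rows[0]['vector'])} dimensions each")
```

In a real flow, the chunking strategy and the embedding dimensions are determined by the connected splitter and model components rather than by constants like these.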

Inputs

| Name | Display Name | Info |
| --- | --- | --- |
| token | Astra DB Application Token | The authentication token for accessing Astra DB. |
| environment | Environment | The environment for the Astra DB API Endpoint. For example, dev or prod. |
| database_name | Database | The database name for the Astra DB instance. |
| api_endpoint | Astra DB API Endpoint | The API endpoint for the Astra DB instance. This supersedes the database selection. |
| collection_name | Collection | The name of the collection within Astra DB where the vectors are stored. |
| keyspace | Keyspace | An optional keyspace within Astra DB to use for the collection. |
| embedding_choice | Embedding Model or Astra Vectorize | Choose an embedding model or use Astra Vectorize. |
| embedding_model | Embedding Model | Specify the embedding model. Not required for Astra Vectorize collections. |
| number_of_results | Number of Search Results | The number of search results to return (default: 4). |
| search_type | Search Type | The search type to use. The options are Similarity, Similarity with score threshold, and MMR (Maximal Marginal Relevance). |
| search_score_threshold | Search Score Threshold | The minimum similarity score threshold for search results when using the Similarity with score threshold option. |
| advanced_search_filter | Search Metadata Filter | An optional dictionary of filters to apply to the search query. |
| autodetect_collection | Autodetect Collection | A boolean flag to determine whether to autodetect the collection. |
| content_field | Content Field | A field to use as the text content field for the vector store. |
| deletion_field | Deletion Based On Field | When provided, documents in the target collection with metadata field values matching the input metadata field value are deleted before new data is loaded. |
| ignore_invalid_documents | Ignore Invalid Documents | A boolean flag to determine whether to ignore invalid documents at runtime. |
| astradb_vectorstore_kwargs | AstraDBVectorStore Parameters | An optional dictionary of additional parameters for the AstraDBVectorStore. |
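
To make the advanced_search_filter and deletion_field inputs more concrete, the following sketch models both behaviors on plain Python lists. It is an illustration only: `matches` and `delete_then_load` are hypothetical helpers, the documents are invented, and the equality-only filter is an assumption (the actual filter syntax is defined by Astra DB, not by this code).

```python
def matches(metadata: dict, search_filter: dict) -> bool:
    """Return True when every filter key equals the document's metadata value."""
    return all(metadata.get(key) == value for key, value in search_filter.items())


def delete_then_load(collection: list[dict], incoming: list[dict], deletion_field: str) -> list[dict]:
    """Drop stored documents whose deletion_field value appears in the incoming
    batch, then append the incoming batch."""
    incoming_values = {
        doc["metadata"][deletion_field]
        for doc in incoming
        if deletion_field in doc["metadata"]
    }
    kept = [
        doc for doc in collection
        if doc["metadata"].get(deletion_field) not in incoming_values
    ]
    return kept + incoming


collection = [
    {"content": "report v1", "metadata": {"source": "report.pdf"}},
    {"content": "faq", "metadata": {"source": "faq.md"}},
]
incoming = [{"content": "report v2", "metadata": {"source": "report.pdf"}}]

# Re-ingesting report.pdf replaces the old chunk instead of duplicating it.
updated = delete_then_load(collection, incoming, deletion_field="source")
filtered = [d["content"] for d in updated if matches(d["metadata"], {"source": "report.pdf"})]
print([d["content"] for d in updated], filtered)
```

Deleting on a stable field such as a source file name is one way to keep repeated ingestion runs from accumulating stale copies of the same document.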

Outputs

| Name | Display Name | Info |
| --- | --- | --- |
| vector_store | Vector Store | Astra DB vector store instance configured with the specified parameters. |
| search_results | Search Results | The results of the similarity search as a list of Data objects. |
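
The three search_type options differ in how candidates are selected. The sketch below shows the general idea of each on toy two-dimensional vectors: plain top-k similarity, similarity with a minimum score, and Maximal Marginal Relevance, which trades relevance against redundancy. The function names, the `lambda_mult` parameter, and the sample vectors are assumptions for illustration; the component delegates the actual search to the vector store.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similarity(query, docs, k=4):
    """Top-k documents by cosine similarity to the query."""
    ranked = sorted(docs, key=lambda name: cosine(query, docs[name]), reverse=True)
    return ranked[:k]


def similarity_with_threshold(query, docs, k=4, threshold=0.0):
    """Top-k documents, keeping only those that reach the score threshold."""
    return [n for n in similarity(query, docs, k) if cosine(query, docs[n]) >= threshold]


def mmr(query, docs, k=4, lambda_mult=0.5):
    """Greedy Maximal Marginal Relevance: reward relevance, penalize redundancy."""
    selected, candidates = [], list(docs)
    while candidates and len(selected) < k:
        def score(name):
            relevance = cosine(query, docs[name])
            redundancy = max((cosine(docs[name], docs[s]) for s in selected), default=0.0)
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


docs = {"a": [1.0, 0.0], "a_dup": [0.99, 0.05], "b": [0.6, 0.8], "c": [0.0, 1.0]}
query = [1.0, 0.1]

print(similarity(query, docs, k=2))
print(similarity_with_threshold(query, docs, k=4, threshold=0.5))
print(mmr(query, docs, k=2))
```

With these sample vectors, plain similarity returns the two near-duplicate documents, while MMR keeps only one of them and fills the second slot with a less similar document.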

Generate embeddings

The Astra DB Vector Store component offers two methods for generating embeddings.

  1. Embedding Model: Use your own embedding model by connecting an Embeddings component.

  2. Astra Vectorize: Use Astra DB's built-in embedding generation service. When creating a new collection, choose the embedding provider and model, including NVIDIA's NV-Embed-QA model hosted by DataStax.
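
The practical difference between the two methods is where the vector is computed. The sketch below builds the kind of document each method would send, assuming the Astra DB Data API convention of a `$vector` field for client-computed vectors and a `$vectorize` field for text that the service embeds. `build_document` and the `content` field name are hypothetical; the component itself goes through AstraDBVectorStore rather than assembling documents by hand.

```python
from typing import Callable, Optional


def build_document(
    text: str,
    embed: Optional[Callable[[str], list[float]]] = None,
) -> dict:
    """Build a document for either embedding method.

    - embed given: "Embedding Model" path, the vector is computed client-side.
    - embed omitted: "Astra Vectorize" path, the raw text is sent and the
      service generates the embedding.
    """
    if embed is not None:
        return {"content": text, "$vector": embed(text)}
    return {"content": text, "$vectorize": text}


def fake_embed(text: str) -> list[float]:
    """Placeholder for a connected Embeddings component."""
    return [float(len(text)), 0.0, 1.0]


print(build_document("hello", embed=fake_embed))
print(build_document("hello"))
```

Because a Vectorize collection embeds text on the server, no embedding model needs to be connected to the component in that case.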

The Astra DB component includes hybrid search, which is enabled by default.

The component fields related to hybrid search are Search Query, Lexical Terms, and Reranker.

  • Search Query finds results by vector similarity.

  • Lexical Terms is a comma-separated string of keywords, such as "features, data, attributes, characteristics".

  • Reranker is the re-ranker model used in the hybrid search. The re-ranker model is nvidia/llama-3.2-nv.reranker.

Hybrid search performs a vector similarity search and a lexical search, compares the results of both searches, and then returns the most relevant results overall.
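
The combination step can be illustrated with a small sketch. Note the substitution: the component uses a re-ranker model to merge the two result sets, whereas this example uses reciprocal rank fusion, a simple rank-based merge, purely to show how a vector ranking and a lexical ranking can be combined. The document texts, the precomputed vector ranking, and both function names are invented for illustration.

```python
def lexical_rank(lexical_terms: str, docs: dict[str, str]) -> list[str]:
    """Rank documents by how many of the comma-separated keywords they contain."""
    terms = [t.strip().lower() for t in lexical_terms.split(",") if t.strip()]
    hits = {name: sum(term in text.lower() for term in terms) for name, text in docs.items()}
    matching = [name for name in docs if hits[name] > 0]
    return sorted(matching, key=lambda name: hits[name], reverse=True)


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several rankings: each list contributes 1 / (k + position) per document."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for position, name in enumerate(ranking, start=1):
            scores[name] = scores.get(name, 0.0) + 1.0 / (k + position)
    return sorted(scores, key=scores.get, reverse=True)


docs = {
    "d1": "Vector features and data attributes",
    "d2": "Billing and invoices",
    "d3": "Data characteristics of embeddings",
}

# Assume the vector similarity search already produced this ordering.
vector_ranking = ["d3", "d2", "d1"]
lexical_ranking = lexical_rank("features, data, attributes, characteristics", docs)

fused = reciprocal_rank_fusion([vector_ranking, lexical_ranking])
print(lexical_ranking, fused)
```

A document that ranks well in both searches rises to the top of the fused list, and a document found by only one search can still appear further down.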

Cassandra

This component creates a Cassandra Vector Store with search capabilities. For more information, see the Cassandra documentation.

Inputs

| Name | Type | Description |
| --- | --- | --- |
| database_ref | String | Contact points for the database or Astra DB database ID |
| username | String | Username for the database (leave empty for Astra DB) |
| token | SecretString | User password for the database or Astra DB token |
| keyspace | String | Table keyspace or Astra DB namespace |
| table_name | String | Name of the table or Astra DB collection |
| ttl_seconds | Integer | Time-to-live for added texts |
| batch_size | Integer | Number of records to process in a single batch |
| setup_mode | String | Configuration mode for setting up the Cassandra table |
| cluster_kwargs | Dict | Additional keyword arguments for the Cassandra cluster |
| search_query | String | Query for similarity search |
| ingest_data | Data | Data to be ingested into the vector store |
| embedding | Embeddings | Embedding function to use |
| number_of_results | Integer | Number of results to return in search |
| search_type | String | Type of search to perform |
| search_score_threshold | Float | Minimum similarity score for search results |
| search_filter | Dict | Metadata filters for search query |
| body_search | String | Document textual search terms |
| enable_body_search | Boolean | Flag to enable body search |

Outputs

| Name | Type | Description |
| --- | --- | --- |
| vector_store | Cassandra | A Cassandra vector store instance configured with the specified parameters. |
| search_results | List[Data] | The results of the similarity search as a list of Data objects. |
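
Two of the Cassandra inputs benefit from a concrete illustration: database_ref accepts either contact points or an Astra DB database ID, and batch_size controls how many records are written at a time. The sketch below shows one plausible way to handle both. Treating a UUID-shaped value as a database ID and a comma-separated value as contact points is an assumption made for this example, and `resolve_database_ref` and `batched` are hypothetical helpers rather than part of the component's API.

```python
from uuid import UUID


def resolve_database_ref(database_ref: str):
    """Classify database_ref as an Astra DB ID or a list of contact points.

    Assumption: a UUID-shaped value is a database ID; anything else is a
    comma-separated list of Cassandra contact points.
    """
    try:
        UUID(database_ref)
        return "astra", database_ref
    except ValueError:
        points = [p.strip() for p in database_ref.split(",") if p.strip()]
        return "cassandra", points


def batched(items: list, batch_size: int):
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


print(resolve_database_ref("0f8fad5b-d9cb-469f-a165-70867728950e"))
print(resolve_database_ref("10.0.0.1, 10.0.0.2"))

records = ["chunk-1", "chunk-2", "chunk-3", "chunk-4", "chunk-5"]
for batch in batched(records, batch_size=2):
    print(batch)
```

Smaller batches reduce the size of each write request at the cost of more round trips, so the right batch_size depends on record size and cluster capacity.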

Usage Notes

  • Enterprise Scale: DataStax solutions provide enterprise-grade scalability and reliability

  • Cloud Native: Astra DB is a fully managed cloud service

  • Hybrid Deployment: Cassandra can be deployed on-premises or in hybrid environments

  • Advanced Features: Support for hybrid search, graph capabilities, and multi-modal data
