DataStax Solutions
DataStax vector database components provide integration with Astra DB, Cassandra, and HCD (Hyper-Converged Database) for scalable vector storage and retrieval.
Astra DB Vector Store
This component implements a Vector Store using Astra DB with search capabilities.
For more information, see the DataStax documentation.
Use a vector store component in a flow
This example uses the Astra DB vector store component. Your vector store component's parameters and authentication may be different, but the document ingestion workflow is the same. A document is loaded from a local machine and chunked. The Astra DB vector store generates embeddings with the connected embedding model component and stores them in the connected Astra DB database.
This vector data can then be retrieved for workloads like Retrieval-Augmented Generation (RAG).
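The load, chunk, embed, and store steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the component's implementation: chunk_text, toy_embed, and InMemoryVectorStore are hypothetical names, the letter-frequency "embedding" stands in for a real embedding model, and a Python list stands in for the Astra DB collection.

```python
import math
from dataclasses import dataclass, field


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into fixed-size character windows that overlap by `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        end = start + chunk_size
        chunks.append(text[start:end])
        if end >= len(text):
            break
        start = end - overlap  # step back so neighboring chunks share some context
    return chunks


def toy_embed(text: str) -> list[float]:
    """Hypothetical stand-in for an embedding model: a unit-length letter-frequency vector."""
    counts = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1.0
    norm = math.sqrt(sum(c * c for c in counts)) or 1.0
    return [c / norm for c in counts]


@dataclass
class InMemoryVectorStore:
    """Hypothetical stand-in for the Astra DB collection: a list of text/vector/metadata records."""
    records: list[dict] = field(default_factory=list)

    def add_texts(self, texts: list[str], metadata: dict) -> None:
        for text in texts:
            self.records.append({"text": text, "vector": toy_embed(text), "metadata": dict(metadata)})


# Load -> chunk -> embed -> store, mirroring the flow described above.
document = "Astra DB stores vectors. " * 20  # 500 characters of sample text
store = InMemoryVectorStore()
store.add_texts(chunk_text(document, chunk_size=200, overlap=20), {"source": "local-file.txt"})
print(len(store.records))  # 3 chunks
```

Overlapping chunks are a common choice so that a sentence cut at a chunk boundary still appears whole in at least one chunk; the sizes here are arbitrary.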
Inputs
token (Astra DB Application Token): The authentication token for accessing Astra DB.
environment (Environment): The environment for the Astra DB API Endpoint. For example, dev or prod.
database_name (Database): The database name for the Astra DB instance.
api_endpoint (Astra DB API Endpoint): The API endpoint for the Astra DB instance. This supersedes the database selection.
collection_name (Collection): The name of the collection within Astra DB where the vectors are stored.
keyspace (Keyspace): An optional keyspace within Astra DB to use for the collection.
embedding_choice (Embedding Model or Astra Vectorize): Choose an embedding model or use Astra vectorize.
embedding_model (Embedding Model): Specify the embedding model. Not required for Astra vectorize collections.
number_of_results (Number of Search Results): The number of search results to return (default: 4).
search_type (Search Type): The search type to use. The options are Similarity, Similarity with score threshold, and MMR (Max Marginal Relevance).
search_score_threshold (Search Score Threshold): The minimum similarity score threshold for search results when using the Similarity with score threshold option.
advanced_search_filter (Search Metadata Filter): An optional dictionary of filters to apply to the search query.
autodetect_collection (Autodetect Collection): A boolean flag to determine whether to autodetect the collection.
content_field (Content Field): A field to use as the text content field for the vector store.
deletion_field (Deletion Based On Field): When provided, documents in the target collection with metadata field values matching the input metadata field value are deleted before new data is loaded.
ignore_invalid_documents (Ignore Invalid Documents): A boolean flag to determine whether to ignore invalid documents at runtime.
astradb_vectorstore_kwargs (AstraDBVectorStore Parameters): An optional dictionary of additional parameters for the AstraDBVectorStore.
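The three search types can be compared on toy data. In this sketch, scores are plain cosine similarity (an assumption; the real component's score scale may differ), and similarity_search, similarity_with_threshold, and mmr_search are hypothetical helpers, not the component's API. The lambda_mult parameter weights relevance against redundancy in MMR.

```python
import math

# Hypothetical helpers for illustration only; not the component's or any library's API.


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_search(query, docs, k=4):
    """Similarity: the k documents closest to the query, best first."""
    return sorted(docs, key=lambda d: cosine(query, d["vector"]), reverse=True)[:k]


def similarity_with_threshold(query, docs, k=4, score_threshold=0.8):
    """Similarity with score threshold: top k, minus anything scoring below the threshold."""
    return [d for d in similarity_search(query, docs, k) if cosine(query, d["vector"]) >= score_threshold]


def mmr_search(query, docs, k=4, lambda_mult=0.5):
    """MMR: greedily pick documents relevant to the query but unlike those already picked."""
    selected, candidates = [], list(docs)

    def mmr_score(doc):
        relevance = cosine(query, doc["vector"])
        redundancy = max((cosine(doc["vector"], s["vector"]) for s in selected), default=0.0)
        return lambda_mult * relevance - (1 - lambda_mult) * redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected


docs = [
    {"id": "a", "vector": [1.0, 0.0]},
    {"id": "a-dup", "vector": [1.0, 0.0]},  # duplicate content of "a"
    {"id": "b", "vector": [0.0, 1.0]},
    {"id": "c", "vector": [-1.0, 0.0]},
]
query = [0.8, 0.6]
print([d["id"] for d in similarity_search(query, docs, k=2)])  # ['a', 'a-dup']
print([d["id"] for d in similarity_with_threshold(query, docs, score_threshold=0.7)])  # ['a', 'a-dup']
print([d["id"] for d in mmr_search(query, docs, k=2)])  # ['a', 'b']
```

With a duplicate pair in the data, Similarity returns both copies, while MMR replaces the duplicate with a less redundant document.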
Outputs
vector_store (Vector Store): Astra DB vector store instance configured with the specified parameters.
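The Deletion Based On Field and Search Metadata Filter inputs both match documents on metadata values. The sketch below assumes simple equality matching on flat metadata dictionaries (Astra DB filters can express more than this), and ingest_with_deletion, matches_filter, and search_with_filter are hypothetical names.

```python
from typing import Optional

# Hypothetical helpers; equality-only matching is an assumption made for illustration.


def matches_filter(metadata: dict, metadata_filter: dict) -> bool:
    """Every key in the filter must have the same value in the document's metadata."""
    return all(metadata.get(key) == value for key, value in metadata_filter.items())


def ingest_with_deletion(collection: list, new_docs: list, deletion_field: Optional[str]) -> list:
    """Drop existing docs whose deletion_field value matches an incoming doc's value, then append."""
    if deletion_field:
        incoming = {d["metadata"][deletion_field] for d in new_docs if deletion_field in d["metadata"]}
        collection = [d for d in collection if d["metadata"].get(deletion_field) not in incoming]
    return collection + new_docs


def search_with_filter(collection: list, metadata_filter: dict) -> list:
    """Keep only documents whose metadata satisfies the filter."""
    return [d for d in collection if matches_filter(d["metadata"], metadata_filter)]


collection = [
    {"text": "report, draft 1", "metadata": {"source": "report.pdf", "version": 1}},
    {"text": "meeting notes", "metadata": {"source": "notes.txt", "version": 1}},
]
new_docs = [{"text": "report, draft 2", "metadata": {"source": "report.pdf", "version": 2}}]

updated = ingest_with_deletion(collection, new_docs, deletion_field="source")
print([d["text"] for d in updated])  # ['meeting notes', 'report, draft 2']
print([d["text"] for d in search_with_filter(updated, {"source": "report.pdf"})])  # ['report, draft 2']
```

Deleting by a field such as the source file name lets a re-ingested document replace its earlier chunks instead of accumulating duplicates.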
Generate embeddings
The Astra DB Vector Store component offers two methods for generating embeddings.
Embedding Model: Use your own embedding model by connecting an Embeddings component.
Astra Vectorize: Use Astra DB's built-in embedding generation service. When creating a new collection, choose the embeddings provider and models, including NVIDIA's NV-Embed-QA model hosted by DataStax.
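The two methods differ mainly in where the vector is computed. As a sketch under stated assumptions: Astra DB's Data API reserves a $vector field for a client-supplied embedding and a $vectorize field for text that the server embeds. The build_document and fake_embed helpers, and the content field name, are hypothetical.

```python
from typing import Callable, Optional


def build_document(text: str, metadata: dict, embed: Optional[Callable[[str], list]] = None) -> dict:
    """Hypothetical helper: shape one document for either embedding method."""
    doc = {"content": text, "metadata": dict(metadata)}  # "content" is a hypothetical field name
    if embed is None:
        # Astra Vectorize: send only the text; the server computes the embedding.
        doc["$vectorize"] = text
    else:
        # Embedding Model: compute the vector client-side and store it directly.
        doc["$vector"] = embed(text)
    return doc


def fake_embed(text: str) -> list:
    """Hypothetical embedding model: a fixed-size vector derived from the text length."""
    return [float(len(text)), 1.0, 0.0]


print(build_document("NV-Embed-QA demo", {"source": "a.txt"}))
print(build_document("NV-Embed-QA demo", {"source": "a.txt"}, embed=fake_embed))
```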
Hybrid search
The Astra DB component includes hybrid search, which is enabled by default.
The component fields related to hybrid search are Search Query, Lexical Terms, and Reranker.
Search Query finds results by vector similarity.
Lexical Terms is a comma-separated string of keywords, like "features, data, attributes, characteristics".
Reranker is the re-ranker model used in the hybrid search. The re-ranker model is nvidia/llama-3.2-nv.reranker.
Hybrid search performs a vector similarity search and a lexical search, compares the results of both searches, and then returns the most relevant results overall.
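The combine-and-rank step can be illustrated with a small in-memory sketch. The component itself reranks with the re-ranker model named above; this sketch swaps in reciprocal rank fusion (RRF), a common way to merge ranked lists, purely as a stand-in. The helper names, the toy documents, and the RRF constant k=60 are assumptions.

```python
import math
import re

# Hypothetical helpers; RRF stands in for the component's re-ranker model.


def parse_lexical_terms(raw: str) -> list:
    """Split the comma-separated Lexical Terms string into clean, lowercased keywords."""
    return [term.strip().lower() for term in raw.split(",") if term.strip()]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def vector_ranking(docs, query_vector):
    """Vector side: document ids ordered by cosine similarity to the query vector."""
    return [d["id"] for d in sorted(docs, key=lambda d: cosine(query_vector, d["vector"]), reverse=True)]


def lexical_ranking(docs, terms):
    """Lexical side: ids ordered by keyword hit count; documents with no hits are left out."""
    hits = {}
    for d in docs:
        words = re.findall(r"[a-z0-9]+", d["text"].lower())
        count = sum(words.count(term) for term in terms)
        if count:
            hits[d["id"]] = count
    return sorted(hits, key=hits.get, reverse=True)


def reciprocal_rank_fusion(rankings, k=60):
    """Merge rankings: each list adds 1 / (k + position) to a document's fused score."""
    scores = {}
    for ranking in rankings:
        for position, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + position)
    return sorted(scores, key=scores.get, reverse=True)


docs = [
    {"id": "d1", "text": "Feature engineering turns raw data into features", "vector": [0.9, 0.1]},
    {"id": "d2", "text": "Vector search compares embeddings", "vector": [1.0, 0.0]},
    {"id": "d3", "text": "Attributes and characteristics of data sets", "vector": [0.2, 0.9]},
]
terms = parse_lexical_terms("features, data, attributes, characteristics")
fused = reciprocal_rank_fusion([vector_ranking(docs, [1.0, 0.0]), lexical_ranking(docs, terms)])
print(fused)  # ['d3', 'd1', 'd2']
```

Here d2 is the best vector match but contains none of the keywords, so documents that score reasonably in both searches overtake it.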
Cassandra
This component creates a Cassandra Vector Store with search capabilities. For more information, see the Cassandra documentation.
Inputs
database_ref (String): Contact points for the database, or the Astra DB database ID.
username (String): Username for the database (leave empty for Astra DB).
token (SecretString): User password for the database, or the Astra DB token.
keyspace (String): Table keyspace, or Astra DB namespace.
table_name (String): Name of the table, or Astra DB collection.
ttl_seconds (Integer): Time-to-live for added texts, in seconds.
batch_size (Integer): Number of records to process in a single batch.
setup_mode (String): Configuration mode for setting up the Cassandra table.
cluster_kwargs (Dict): Additional keyword arguments for the Cassandra cluster.
search_query (String): Query for similarity search.
ingest_data (Data): Data to be ingested into the vector store.
embedding (Embeddings): Embedding function to use.
number_of_results (Integer): Number of results to return in search.
search_type (String): Type of search to perform.
search_score_threshold (Float): Minimum similarity score for search results.
search_filter (Dict): Metadata filters for the search query.
body_search (String): Document textual search terms.
enable_body_search (Boolean): Flag to enable body search.
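The batch_size and ttl_seconds inputs control how ingested texts are written. Cassandra enforces TTL on the server (CQL writes accept USING TTL with a number of seconds), so client code does not expire rows itself; this sketch only models the effect. The batched, build_rows, and is_live helpers are hypothetical.

```python
from typing import Iterator, Optional

# Hypothetical helpers that model batching and TTL expiry; Cassandra handles expiry server-side.


def batched(items: list, batch_size: int) -> Iterator[list]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def build_rows(texts: list, ttl_seconds: Optional[int], now: float) -> list:
    """Attach an expiry timestamp when ttl_seconds is set; None means the row never expires."""
    expires_at = now + ttl_seconds if ttl_seconds else None
    return [{"text": text, "expires_at": expires_at} for text in texts]


def is_live(row: dict, now: float) -> bool:
    """A row is visible until its expiry time passes."""
    return row["expires_at"] is None or now < row["expires_at"]


texts = [f"chunk-{i}" for i in range(7)]
print([len(batch) for batch in batched(texts, batch_size=3)])  # [3, 3, 1]

rows = build_rows(texts, ttl_seconds=3600, now=1_000.0)
print(is_live(rows[0], now=2_000.0), is_live(rows[0], now=5_000.0))  # True False
```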
Outputs
vector_store (Cassandra): A Cassandra vector store instance configured with the specified parameters.
search_results (List[Data]): The results of the similarity search as a list of Data objects.
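The search_filter, body_search, and enable_body_search inputs narrow the candidates before similarity ranking produces search_results. This is a sketch under explicit assumptions: metadata filters use equality matching, body search requires every term as a case-insensitive substring, and Record is a hypothetical stand-in for Langflow's Data objects. The real body search semantics may differ.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Record:
    """Hypothetical stand-in for a Langflow Data object: text plus metadata."""
    text: str
    metadata: dict = field(default_factory=dict)


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def search(rows, query_vector, number_of_results=4, search_filter=None,
           body_search="", enable_body_search=False):
    """Hypothetical helper: filter by metadata, optionally by body text, then rank by similarity."""
    search_filter = search_filter or {}
    hits = [r for r in rows if all(r["metadata"].get(k) == v for k, v in search_filter.items())]
    if enable_body_search and body_search:
        terms = body_search.lower().split()
        # Assumption: a row must contain every term (case-insensitive substring match).
        hits = [r for r in hits if all(t in r["text"].lower() for t in terms)]
    hits.sort(key=lambda r: cosine(query_vector, r["vector"]), reverse=True)
    return [Record(r["text"], dict(r["metadata"])) for r in hits[:number_of_results]]


rows = [
    {"text": "TTL expires old rows", "vector": [1.0, 0.0], "metadata": {"lang": "en"}},
    {"text": "Batch writes", "vector": [0.9, 0.1], "metadata": {"lang": "en"}},
    {"text": "TTL Leitfaden", "vector": [1.0, 0.0], "metadata": {"lang": "de"}},
]
print([r.text for r in search(rows, [1.0, 0.0], search_filter={"lang": "en"})])
print([r.text for r in search(rows, [1.0, 0.0], search_filter={"lang": "en"},
                              body_search="ttl", enable_body_search=True)])
```

In this sketch, body_search is ignored unless enable_body_search is true, mirroring the separate flag in the inputs table.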
Usage Notes
Enterprise Scale: DataStax solutions provide enterprise-grade scalability and reliability
Cloud Native: Astra DB is a fully managed cloud service
Hybrid Deployment: Cassandra can be deployed on-premises or in hybrid environments
Advanced Features: Support for hybrid search, graph capabilities, and multi-modal data