# Vector Store Configuration Guide
This guide covers how to configure vector store backends in ReMe using the default.yaml configuration file.
## 📋 Overview
ReMe provides multiple vector store backends for different use cases:
- **LocalVectorStore** (`backend=local`) - 📁 Simple file-based storage for development and small datasets
- **ChromaVectorStore** (`backend=chroma`) - 🔮 Embedded vector database for moderate scale
- **EsVectorStore** (`backend=elasticsearch`) - 🔍 Elasticsearch-based storage for production and large scale
- **QdrantVectorStore** (`backend=qdrant`) - 🎯 High-performance vector database with advanced filtering
- **MemoryVectorStore** (`backend=memory`) - ⚡ In-memory storage for ultra-fast access and testing
All vector stores implement the `BaseVectorStore` interface, providing a consistent API across implementations.
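The `BaseVectorStore` interface itself is not reproduced in this guide. As a rough illustration of the kind of contract a pluggable backend satisfies, here is a minimal sketch; the class and method names are assumptions for illustration, not ReMe's actual API:

```python
from abc import ABC, abstractmethod

# Illustrative sketch only -- ReMe's actual BaseVectorStore interface may
# differ. It shows the kind of contract every backend implements.
class SketchVectorStore(ABC):
    @abstractmethod
    def insert(self, ids, vectors, metadata):
        """Store vectors with their metadata."""

    @abstractmethod
    def search(self, query_vector, top_k=5):
        """Return the ids of the top_k most similar stored vectors."""

class InMemorySketch(SketchVectorStore):
    def __init__(self):
        self._data = {}

    def insert(self, ids, vectors, metadata):
        for i, v, m in zip(ids, vectors, metadata):
            self._data[i] = (v, m)

    def search(self, query_vector, top_k=5):
        def dot(a, b):
            return sum(x * y for x, y in zip(a, b))
        # Brute-force similarity ranking by dot product.
        scored = sorted(self._data.items(),
                        key=lambda kv: dot(kv[1][0], query_vector),
                        reverse=True)
        return [i for i, _ in scored[:top_k]]

store = InMemorySketch()
store.insert(["a", "b"], [[1.0, 0.0], [0.0, 1.0]], [{}, {}])
print(store.search([0.9, 0.1], top_k=1))  # -> ['a']
```

Because callers only depend on the interface, swapping `backend: local` for `backend: qdrant` in the YAML does not change application code.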
## 📊 Comparison Table

| Feature | LocalVectorStore | ChromaVectorStore | EsVectorStore | QdrantVectorStore | MemoryVectorStore |
|---|---|---|---|---|---|
| Storage | File (JSONL) | Embedded DB | Elasticsearch | Qdrant Server | In-Memory |
| Performance | Medium | Good | Excellent | Excellent | Ultra-Fast |
| Scalability | < 10K vectors | < 1M vectors | > 1M vectors | > 10M vectors | < 1M vectors |
| Persistence | ✅ Auto | ✅ Auto | ✅ Auto | ✅ Auto | ⚠️ Manual |
| Setup Complexity | 🟢 Simple | 🟡 Medium | 🔴 Complex | 🟡 Medium | 🟢 Simple |
| Dependencies | None | ChromaDB | Elasticsearch | Qdrant | None |
| Filtering | ✅ Basic | ✅ Metadata | ✅ Advanced | ✅ Advanced | ✅ Basic |
| Concurrency | ⚠️ Limited | ✅ Good | ✅ Excellent | ✅ Excellent | ⚠️ Single Process |
| Async Support | ❌ No | ❌ No | ❌ No | ✅ Native | ❌ No |
| Best For | Development | Local Apps | Production | Production/Cloud | Testing |
## ⚙️ Configuration in default.yaml

All vector stores are configured in the `vector_store` section of `reme_ai/config/default.yaml`. The configuration structure is:

```yaml
vector_store:
  default:
    backend: <backend_name>     # Required: local, chroma, elasticsearch, qdrant, or memory
    embedding_model: default    # Required: name of the embedding model configuration
    params:                     # Optional: backend-specific parameters
      # Backend-specific parameters go here
```

### Configuration Fields

- `backend` (required): The vector store backend to use. Valid values: `local`, `chroma`, `elasticsearch`, `qdrant`, `memory`
- `embedding_model` (required): The name of the embedding model configuration from the `embedding_model` section
- `params` (optional): A dictionary of backend-specific parameters passed to the vector store constructor
## 🗄️ Vector Store Backend Configurations
### 1. LocalVectorStore (`backend=local`)

A simple file-based vector store that saves data to local JSONL files.

#### 💡 When to Use

- Development and testing - No external dependencies required 🛠️
- Small datasets - Suitable for datasets with < 10,000 vectors 📊
- Single-user applications - Limited concurrent access support 👤
#### ⚙️ Configuration

```yaml
vector_store:
  default:
    backend: local
    embedding_model: default
    params:
      store_dir: "./local_vector_store"  # Directory to store JSONL files (default: "./local_vector_store")
      batch_size: 1024                   # Batch size for operations (default: 1024)
```

#### Configuration Parameters

- `store_dir` (optional): Directory path where workspace files are stored. Default: `"./local_vector_store"`
- `batch_size` (optional): Batch size for bulk operations. Default: `1024`
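LocalVectorStore's exact file schema is not documented here, but the JSONL idea (one JSON record per line, appended under `store_dir`) can be sketched as follows; the field names `id`, `vector`, and `metadata` are illustrative assumptions, not ReMe's actual schema:

```python
import json
from pathlib import Path

# Sketch of JSONL-style storage: each record is one JSON object per line.
# Field names here are illustrative, not ReMe's actual on-disk schema.
def append_records(path, records):
    with open(path, "a", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")

def load_records(path):
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

store_dir = Path("./local_vector_store")
store_dir.mkdir(parents=True, exist_ok=True)
path = store_dir / "workspace.jsonl"
append_records(path, [{"id": "n1", "vector": [0.1, 0.2], "metadata": {"tag": "demo"}}])
print(load_records(path)[0]["id"])  # -> n1
```

The append-only, human-readable layout is what makes this backend easy to inspect during development, and also why it is only practical for small datasets.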
### 2. ChromaVectorStore (`backend=chroma`)

An embedded vector database that provides persistent storage with advanced features.

#### 💡 When to Use

- Local development with persistence requirements 💾
- Medium-scale applications (10K - 1M vectors) 📈
- Applications requiring metadata filtering 🔍
#### ⚙️ Configuration

```yaml
vector_store:
  default:
    backend: chroma
    embedding_model: default
    params:
      store_dir: "./chroma_vector_store"  # Directory for the Chroma database (default: "./chroma_vector_store")
      batch_size: 1024                    # Batch size for operations (default: 1024)
```

#### Configuration Parameters

- `store_dir` (optional): Directory path where ChromaDB data is persisted. Default: `"./chroma_vector_store"`
- `batch_size` (optional): Batch size for bulk operations. Default: `1024`
### 3. EsVectorStore (`backend=elasticsearch`)

Production-grade vector search using Elasticsearch with advanced filtering and scaling capabilities.

#### 💡 When to Use

- Production environments requiring high availability 🏭
- Large-scale applications (1M+ vectors) 📈
- Complex filtering requirements on metadata 🎯
#### 🛠️ Setup Elasticsearch

Before using EsVectorStore, set up Elasticsearch:

##### Option 1: Docker Run

```bash
# Pull the Elasticsearch image
docker pull docker.elastic.co/elasticsearch/elasticsearch-wolfi:9.0.0

# Run the Elasticsearch container
docker run -p 9200:9200 \
  -e "discovery.type=single-node" \
  -e "xpack.security.enabled=false" \
  -e "xpack.license.self_generated.type=trial" \
  -e "http.host=0.0.0.0" \
  docker.elastic.co/elasticsearch/elasticsearch-wolfi:9.0.0
```

##### Environment Configuration

```bash
export FLOW_ES_HOSTS=http://localhost:9200
```
#### ⚙️ Configuration

```yaml
vector_store:
  default:
    backend: elasticsearch
    embedding_model: default
    params:
      hosts: "http://localhost:9200"  # Elasticsearch host(s), string or list (default: FLOW_ES_HOSTS env var or "http://localhost:9200")
      basic_auth: null                # Optional: ("username", "password") tuple for authentication
      batch_size: 1024                # Batch size for bulk operations (default: 1024)
```

#### Configuration Parameters

- `hosts` (optional): Elasticsearch host(s) as a string or list of strings. Defaults to the `FLOW_ES_HOSTS` environment variable or `"http://localhost:9200"` if not set
- `basic_auth` (optional): Tuple of `("username", "password")` for basic authentication. Default: `null` (no authentication)
- `batch_size` (optional): Batch size for bulk operations. Default: `1024`
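Because `hosts` accepts either a string or a list, a backend typically normalizes the value and falls back to the environment variable when nothing is configured. A sketch of that resolution order; the helper name and the comma-splitting behavior are assumptions for illustration, not ReMe's implementation:

```python
import os

# Hypothetical helper -- sketches how a string-or-list `hosts` value with a
# FLOW_ES_HOSTS fallback could be normalized. Comma-splitting of a single
# string is an assumption, not documented ReMe behavior.
def resolve_hosts(hosts=None):
    if hosts is None:
        hosts = os.environ.get("FLOW_ES_HOSTS", "http://localhost:9200")
    if isinstance(hosts, str):
        return [h.strip() for h in hosts.split(",") if h.strip()]
    return list(hosts)

print(resolve_hosts("http://es1:9200, http://es2:9200"))
# -> ['http://es1:9200', 'http://es2:9200']
print(resolve_hosts(["http://es1:9200"]))  # -> ['http://es1:9200']
```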
### 4. QdrantVectorStore (`backend=qdrant`)

A high-performance vector database designed for production workloads with native async support and advanced filtering.

#### 💡 When to Use

- Production environments requiring high performance and reliability 🏭
- Large-scale applications (10M+ vectors) with excellent horizontal scaling 📈
- Applications requiring native async operations for better concurrency ⚡
- Complex filtering and metadata queries on large datasets 🎯
- Cloud-native deployments with Qdrant Cloud support ☁️
#### 🛠️ Setup Qdrant

Before using QdrantVectorStore, set up Qdrant:

##### Option 1: Docker Run (Recommended for Development)

```bash
# Pull the latest Qdrant image
docker pull qdrant/qdrant

# Run the Qdrant container
docker run -p 6333:6333 -p 6334:6334 \
  -v $(pwd)/qdrant_storage:/qdrant/storage:z \
  qdrant/qdrant
```

##### Option 2: Qdrant Cloud

For production, you can use Qdrant Cloud for managed hosting.

##### Environment Configuration

```bash
# For local setup
export FLOW_QDRANT_HOST=localhost
export FLOW_QDRANT_PORT=6333

# For cloud setup (optional)
export FLOW_QDRANT_API_KEY=your-api-key
```
#### ⚙️ Configuration

##### Local Qdrant Instance

```yaml
vector_store:
  default:
    backend: qdrant
    embedding_model: default
    params:
      host: "localhost"   # Qdrant host (default: FLOW_QDRANT_HOST env var or "localhost")
      port: 6333          # Qdrant port (default: FLOW_QDRANT_PORT env var or 6333)
      batch_size: 1024    # Batch size for operations (default: 1024)
      distance: "COSINE"  # Distance metric: "COSINE", "EUCLIDEAN", or "DOT" (default: "COSINE")
```

##### Qdrant Cloud or Remote Server

```yaml
vector_store:
  default:
    backend: qdrant
    embedding_model: default
    params:
      url: "https://your-cluster.qdrant.io:6333"  # Qdrant server URL (if provided, host and port are ignored)
      api_key: "your-api-key"                     # API key for Qdrant Cloud authentication
      batch_size: 1024                            # Batch size for operations (default: 1024)
      distance: "COSINE"                          # Distance metric (default: "COSINE")
```

#### Configuration Parameters

- `url` (optional): Complete URL for connecting to Qdrant. If provided, `host` and `port` are ignored. Useful for Qdrant Cloud or custom deployments
- `host` (optional): Host address of the Qdrant server. Defaults to the `FLOW_QDRANT_HOST` environment variable or `"localhost"` if not set
- `port` (optional): Port number of the Qdrant server. Defaults to the `FLOW_QDRANT_PORT` environment variable or `6333` if not set
- `api_key` (optional): API key for authentication (required for Qdrant Cloud). Can also be set via the `FLOW_QDRANT_API_KEY` environment variable
- `distance` (optional): Distance metric for vector similarity. Valid values: `"COSINE"`, `"EUCLIDEAN"`, `"DOT"`. Default: `"COSINE"`
- `batch_size` (optional): Batch size for bulk operations. Default: `1024`
#### 🚀 Key Features

- Native Async Support - All operations have async equivalents for better concurrency
- Upsert Operations - Insert automatically updates existing nodes with the same ID
- Advanced Filtering - Support for term and range filters on metadata
- High Performance - Optimized for large-scale vector similarity search
- Horizontal Scaling - Supports clustering for distributed deployments
- Multiple Distance Metrics - Cosine, Euclidean, and Dot Product similarity
- Persistent Storage - Data is automatically persisted to disk
- Efficient Iteration - Scroll through large collections with pagination
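The three `distance` options correspond to standard similarity measures, which can be written in a few lines of plain Python to make the choice concrete:

```python
import math

# Pure-Python versions of the three metrics the `distance` option names.
def dot(a, b):
    """DOT: raw inner product; sensitive to vector magnitude."""
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    """COSINE: inner product normalized by both magnitudes (angle only)."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def euclidean(a, b):
    """EUCLIDEAN: straight-line distance (smaller = more similar)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

a, b = [1.0, 0.0], [1.0, 1.0]
print(round(cosine(a, b), 4))     # -> 0.7071
print(round(euclidean(a, b), 4))  # -> 1.0
print(dot(a, b))                  # -> 1.0
```

Cosine is the usual default for text embeddings because it ignores magnitude; dot product is appropriate when the embedding model already produces normalized or magnitude-meaningful vectors.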
### 5. MemoryVectorStore (`backend=memory`)

An ultra-fast in-memory vector store that keeps all data in RAM for maximum performance.

#### 💡 When to Use

- Testing and development - Fastest possible operations for unit tests 🧪
- Small to medium datasets that fit in memory (< 1M vectors) 💾
- Applications requiring ultra-low latency search operations ⚡
- Temporary workspaces that don't need persistence 🔄
#### ⚙️ Configuration

```yaml
vector_store:
  default:
    backend: memory
    embedding_model: default
    params:
      store_dir: "./memory_vector_store"  # Directory for backup/restore operations (default: "./memory_vector_store")
      batch_size: 1024                    # Batch size for operations (default: 1024)
```

#### Configuration Parameters

- `store_dir` (optional): Directory path for backup/restore operations. Default: `"./memory_vector_store"`
- `batch_size` (optional): Batch size for bulk operations. Default: `1024`
#### ⚡ Performance Benefits

- Zero I/O latency - All operations happen in RAM
- Instant search results - No disk or network overhead
- Perfect for testing - Fast setup and teardown
- Memory efficient - Only stores what you need

#### 🚨 Important Notes

- Data is volatile - Lost when the process ends unless explicitly saved
- Memory usage - The entire dataset must fit in available RAM
- No persistence - Use `dump_workspace()` to save to disk
- Single process - Not suitable for distributed applications
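The exact signature of `dump_workspace()` is not shown in this guide, but the save-then-restore pattern it implies can be sketched with stand-in functions (illustrative names, not ReMe's actual API):

```python
import json

# Illustrative save/restore pattern for a volatile in-memory store.
# `save_to_disk` and `load_from_disk` are hypothetical stand-ins for an
# explicit dump step like dump_workspace(); not ReMe's actual API.
def save_to_disk(store, path):
    with open(path, "w", encoding="utf-8") as f:
        json.dump(store, f)

def load_from_disk(path):
    with open(path, encoding="utf-8") as f:
        return json.load(f)

store = {"n1": [0.1, 0.2]}          # in-RAM data, lost if the process exits
save_to_disk(store, "memory_backup.json")
restored = load_from_disk("memory_backup.json")
print(restored == store)  # -> True
```

The point of the sketch: nothing reaches disk unless the dump step runs, which is exactly the trade-off the notes above describe.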
## 📝 Example Configurations

### Minimal Configuration (Memory Store)

```yaml
vector_store:
  default:
    backend: memory
    embedding_model: default
```

### Local File Storage

```yaml
vector_store:
  default:
    backend: local
    embedding_model: default
    params:
      store_dir: "./my_vector_store"
      batch_size: 2048
```

### Elasticsearch Production Setup

```yaml
vector_store:
  default:
    backend: elasticsearch
    embedding_model: default
    params:
      hosts: "http://elasticsearch.example.com:9200"
      basic_auth: ["username", "password"]
      batch_size: 2048
```

### Qdrant Cloud Setup

```yaml
vector_store:
  default:
    backend: qdrant
    embedding_model: default
    params:
      url: "https://your-cluster.qdrant.io:6333"
      api_key: "your-api-key-here"
      distance: "COSINE"
      batch_size: 1024
```
## 🔧 Environment Variables

Some vector store backends support environment variables for configuration:

Elasticsearch:

- `FLOW_ES_HOSTS` - Elasticsearch host(s)

Qdrant:

- `FLOW_QDRANT_HOST` - Qdrant host (default: `localhost`)
- `FLOW_QDRANT_PORT` - Qdrant port (default: `6333`)
- `FLOW_QDRANT_API_KEY` - Qdrant API key for authentication
Environment variables are used as fallbacks when parameters are not explicitly set in the YAML configuration.
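That fallback order (explicit YAML parameter first, then environment variable, then built-in default) can be sketched in a few lines; the `resolve` helper is illustrative, not ReMe's implementation:

```python
import os

# Sketch of the fallback order the guide describes: explicit params win,
# then environment variables, then built-in defaults. Helper name is
# illustrative, not ReMe's actual code.
def resolve(param_value, env_var, default):
    if param_value is not None:
        return param_value
    return os.environ.get(env_var, default)

os.environ.pop("FLOW_QDRANT_PORT", None)
print(resolve(None, "FLOW_QDRANT_PORT", 6333))   # -> 6333 (built-in default)
os.environ["FLOW_QDRANT_PORT"] = "7000"
print(resolve(None, "FLOW_QDRANT_PORT", 6333))   # -> 7000 (env var fallback)
print(resolve(6334, "FLOW_QDRANT_PORT", 6333))   # -> 6334 (explicit param wins)
```

Note that values read from the environment arrive as strings, so numeric parameters like the port may need casting.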
## 🧩 Integration with Embedding Models

All vector stores require an embedding model configuration. The `embedding_model` field in the vector store configuration references a model defined in the `embedding_model` section of `default.yaml`:

```yaml
embedding_model:
  default:
    backend: openai_compatible
    model_name: text-embedding-v4
    params:
      dimensions: 1024

vector_store:
  default:
    backend: memory
    embedding_model: default  # References the embedding_model.default configuration
```
The embedding model configuration provides the model name, backend, and parameters needed for generating vector embeddings.