Vector Store Configuration Guide

This guide covers how to configure vector store backends in ReMe using the default.yaml configuration file.

📋 Overview

ReMe provides multiple vector store backends for different use cases:

  • LocalVectorStore (backend=local) - 📁 Simple file-based storage for development and small datasets

  • ChromaVectorStore (backend=chroma) - 🔮 Embedded vector database for moderate scale

  • EsVectorStore (backend=elasticsearch) - 🔍 Elasticsearch-based storage for production and large scale

  • QdrantVectorStore (backend=qdrant) - 🎯 High-performance vector database with advanced filtering

  • MemoryVectorStore (backend=memory) - ⚡ In-memory storage for ultra-fast access and testing

All vector stores implement the BaseVectorStore interface, providing a consistent API across implementations.

📊 Comparison Table

| Feature | LocalVectorStore | ChromaVectorStore | EsVectorStore | QdrantVectorStore | MemoryVectorStore |
|---|---|---|---|---|---|
| Storage | File (JSONL) | Embedded DB | Elasticsearch | Qdrant Server | In-Memory |
| Performance | Medium | Good | Excellent | Excellent | Ultra-Fast |
| Scalability | < 10K vectors | < 1M vectors | > 1M vectors | > 10M vectors | < 1M vectors |
| Persistence | ✅ Auto | ✅ Auto | ✅ Auto | ✅ Auto | ⚠️ Manual |
| Setup Complexity | 🟢 Simple | 🟡 Medium | 🔴 Complex | 🟡 Medium | 🟢 Simple |
| Dependencies | None | ChromaDB | Elasticsearch | Qdrant | None |
| Filtering | ❌ Basic | ✅ Metadata | ✅ Advanced | ✅ Advanced | ❌ Basic |
| Concurrency | ❌ Limited | ✅ Good | ✅ Excellent | ✅ Excellent | ❌ Single Process |
| Async Support | ❌ No | ❌ No | ❌ No | ✅ Native | ❌ No |
| Best For | Development | Local Apps | Production | Production/Cloud | Testing |

⚙️ Configuration in default.yaml

All vector stores are configured in the vector_store section of reme_ai/config/default.yaml. The configuration structure is:

vector_store:
  default:
    backend: <backend_name>        # Required: local, chroma, elasticsearch, qdrant, or memory
    embedding_model: default        # Required: Name of the embedding model configuration
    params:                         # Optional: Backend-specific parameters
      # Backend-specific parameters go here

Configuration Fields

  • backend (required): The vector store backend to use. Valid values: local, chroma, elasticsearch, qdrant, memory

  • embedding_model (required): The name of the embedding model configuration from the embedding_model section

  • params (optional): A dictionary of backend-specific parameters that will be passed to the vector store constructor
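The field rules above can be checked in plain Python as a quick sanity check. The sketch below validates a vector_store section that has already been parsed into a dict, for example with the YAML parser of your choice. `validate_vector_store_config` and `VALID_BACKENDS` are hypothetical names that are not part of ReMe, and ReMe's own validation may behave differently.

```python
# Hypothetical validator for a parsed vector_store section (not part of ReMe).
VALID_BACKENDS = {"local", "chroma", "elasticsearch", "qdrant", "memory"}


def validate_vector_store_config(section: dict) -> list:
    """Return a list of problems; an empty list means the section looks valid."""
    problems = []
    for name, cfg in section.items():
        backend = cfg.get("backend")
        if backend not in VALID_BACKENDS:
            problems.append(f"{name}: unknown backend {backend!r}")
        if not cfg.get("embedding_model"):
            problems.append(f"{name}: missing embedding_model")
        # params is optional; a YAML `params:` with no value parses to None.
        params = cfg.get("params")
        if params is not None and not isinstance(params, dict):
            problems.append(f"{name}: params must be a mapping")
    return problems


# Mirrors the YAML structure shown above after it has been parsed.
section = {
    "default": {"backend": "memory", "embedding_model": "default"},
    "broken": {"backend": "faiss", "params": ["not", "a", "mapping"]},
}
for problem in validate_vector_store_config(section):
    print(problem)
# broken: unknown backend 'faiss'
# broken: missing embedding_model
# broken: params must be a mapping
```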

Vector Store Backend Configurations

1. LocalVectorStore (backend=local)

A simple file-based vector store that saves data to local JSONL files.

💡 When to Use

  • Development and testing - No external dependencies required 🛠️

  • Small datasets - Suitable for datasets with < 10,000 vectors 📊

  • Single-user applications - Limited concurrent access support 👤

⚙️ Configuration

vector_store:
  default:
    backend: local
    embedding_model: default
    params:
      store_dir: "./local_vector_store"  # Directory to store JSONL files (default: "./local_vector_store")
      batch_size: 1024                    # Batch size for operations (default: 1024)

Configuration Parameters

  • store_dir (optional): Directory path where workspace files are stored. Default: "./local_vector_store"

  • batch_size (optional): Batch size for bulk operations. Default: 1024
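The following minimal sketch shows what "file-based storage" means in practice: it writes a JSONL-backed workspace into store_dir and reads it back. The one-file-per-workspace layout (`<workspace_id>.jsonl`) and the node fields are assumptions made for illustration. They are not LocalVectorStore's documented on-disk format, and `dump_nodes` and `load_nodes` are hypothetical helpers.

```python
import json
import tempfile
from pathlib import Path

# Assumed layout: one "<workspace_id>.jsonl" file per workspace under store_dir,
# one JSON object (node) per line. Not LocalVectorStore's documented format.


def dump_nodes(store_dir: str, workspace_id: str, nodes: list) -> Path:
    path = Path(store_dir) / f"{workspace_id}.jsonl"
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as f:
        for node in nodes:
            f.write(json.dumps(node, ensure_ascii=False) + "\n")
    return path


def load_nodes(store_dir: str, workspace_id: str) -> list:
    path = Path(store_dir) / f"{workspace_id}.jsonl"
    if not path.exists():
        return []  # an unknown workspace is treated as empty
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


with tempfile.TemporaryDirectory() as tmp:
    demo = [
        {"id": f"n{i}", "content": f"note {i}", "vector": [float(i), 0.5], "metadata": {}}
        for i in range(3)
    ]
    dump_nodes(tmp, "demo_workspace", demo)
    restored = load_nodes(tmp, "demo_workspace")
print(restored == demo)  # True
```

Plain JSONL keeps the files human-readable and easy to append to. It also explains the table's "< 10K vectors" guidance, because a store like this has to read whole files into memory to search them.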

2. ChromaVectorStore (backend=chroma)

An embedded vector database that persists data to a local directory and supports metadata filtering.

💡 When to Use

  • Local development with persistence requirements 🏠

  • Medium-scale applications (10K - 1M vectors) 📈

  • Applications requiring metadata filtering 🔍

⚙️ Configuration

vector_store:
  default:
    backend: chroma
    embedding_model: default
    params:
      store_dir: "./chroma_vector_store"  # Directory for Chroma database (default: "./chroma_vector_store")
      batch_size: 1024                    # Batch size for operations (default: 1024)

Configuration Parameters

  • store_dir (optional): Directory path where ChromaDB data is persisted. Default: "./chroma_vector_store"

  • batch_size (optional): Batch size for bulk operations. Default: 1024
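Every backend in this guide accepts batch_size. One plausible reading, assumed here rather than taken from ReMe's source, is that bulk writes are split into consecutive groups of at most batch_size items. The hypothetical `iter_batches` helper below shows that split for 2,500 nodes at the default of 1024.

```python
from typing import Iterator, Sequence


def iter_batches(items: Sequence, batch_size: int = 1024) -> Iterator[Sequence]:
    """Yield consecutive slices holding at most batch_size items each."""
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


node_ids = [f"node-{i}" for i in range(2500)]
sizes = [len(batch) for batch in iter_batches(node_ids, batch_size=1024)]
print(sizes)  # [1024, 1024, 452]
```

A larger batch_size means fewer round trips but more memory per request, which is why the example configurations later in this guide raise it to 2048 for some backends.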

3. EsVectorStore (backend=elasticsearch)

Production-grade vector search using Elasticsearch with advanced filtering and scaling capabilities.

💡 When to Use

  • Production environments requiring high availability 🏭

  • Large-scale applications (1M+ vectors) 🚀

  • Complex filtering requirements on metadata 🎯

🛠️ Set Up Elasticsearch

Before using EsVectorStore, set up Elasticsearch:

Option 1: Docker Run

# Pull the Elasticsearch 9.0.0 image
docker pull docker.elastic.co/elasticsearch/elasticsearch-wolfi:9.0.0

# Run Elasticsearch container
docker run -p 9200:9200 \
  -e "discovery.type=single-node" \
  -e "xpack.security.enabled=false" \
  -e "xpack.license.self_generated.type=trial" \
  -e "http.host=0.0.0.0" \
  docker.elastic.co/elasticsearch/elasticsearch-wolfi:9.0.0

Environment Configuration
export FLOW_ES_HOSTS=http://localhost:9200

⚙️ Configuration

vector_store:
  default:
    backend: elasticsearch
    embedding_model: default
    params:
      hosts: "http://localhost:9200"     # Elasticsearch host(s) - can be string or list (default: from FLOW_ES_HOSTS env var or "http://localhost:9200")
      basic_auth: null                    # Optional: ["username", "password"] pair for authentication
      batch_size: 1024                    # Batch size for bulk operations (default: 1024)

Configuration Parameters

  • hosts (optional): Elasticsearch host(s) as a string or list of strings. Defaults to the FLOW_ES_HOSTS environment variable or "http://localhost:9200" if not set

  • basic_auth (optional): Username and password for basic authentication, given in YAML as a two-item list such as ["username", "password"]. Default: null (no authentication)

  • batch_size (optional): Batch size for bulk operations. Default: 1024
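The fallback order for hosts is the explicit value first, then FLOW_ES_HOSTS, then "http://localhost:9200". That order, together with the YAML list form of basic_auth, can be sketched as follows. `resolve_es_hosts` and `resolve_basic_auth` are hypothetical helpers written for this guide. Their return values are shaped like the hosts and basic_auth arguments that the official elasticsearch-py client accepts, but EsVectorStore's internals may differ.

```python
import os
from typing import Mapping, Optional, Sequence, Tuple, Union

DEFAULT_ES_HOSTS = "http://localhost:9200"


def resolve_es_hosts(hosts: Union[str, Sequence[str], None] = None,
                     env: Mapping[str, str] = os.environ) -> list:
    """Explicit value first, then FLOW_ES_HOSTS, then the documented default."""
    if hosts is None:
        hosts = env.get("FLOW_ES_HOSTS") or DEFAULT_ES_HOSTS
    # The guide allows a single string or a list; normalize to a list.
    return [hosts] if isinstance(hosts, str) else list(hosts)


def resolve_basic_auth(basic_auth: Optional[Sequence[str]]) -> Optional[Tuple[str, str]]:
    """YAML has no tuple type, so accept a two-item list and return a tuple."""
    if basic_auth is None:
        return None
    username, password = basic_auth  # raises ValueError unless exactly two items
    return (username, password)


print(resolve_es_hosts(env={}))                                    # ['http://localhost:9200']
print(resolve_es_hosts(env={"FLOW_ES_HOSTS": "http://es:9200"}))  # ['http://es:9200']
print(resolve_basic_auth(["username", "password"]))               # ('username', 'password')
```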

4. QdrantVectorStore (backend=qdrant)

A high-performance vector database designed for production workloads with native async support and advanced filtering.

💡 When to Use

  • Production environments requiring high performance and reliability 🏭

  • Large-scale applications (10M+ vectors) with excellent horizontal scaling 🚀

  • Applications requiring native async operations for better concurrency ⚡

  • Complex filtering and metadata queries on large datasets 🎯

  • Cloud-native deployments with Qdrant Cloud support ☁️

🛠️ Set Up Qdrant

Before using QdrantVectorStore, set up Qdrant:

Option 2: Qdrant Cloud

For production, you can use Qdrant Cloud for managed hosting.

Environment Configuration

# For local setup
export FLOW_QDRANT_HOST=localhost
export FLOW_QDRANT_PORT=6333

# For cloud setup (optional)
export FLOW_QDRANT_API_KEY=your-api-key

⚙️ Configuration

Local Qdrant Instance

vector_store:
  default:
    backend: qdrant
    embedding_model: default
    params:
      host: "localhost"                   # Qdrant host (default: from FLOW_QDRANT_HOST env var or "localhost")
      port: 6333                          # Qdrant port (default: from FLOW_QDRANT_PORT env var or 6333)
      batch_size: 1024                    # Batch size for operations (default: 1024)
      distance: "COSINE"                  # Distance metric: "COSINE", "EUCLIDEAN", or "DOT" (default: "COSINE")

Qdrant Cloud or Remote Server

vector_store:
  default:
    backend: qdrant
    embedding_model: default
    params:
      url: "https://your-cluster.qdrant.io:6333"  # Qdrant server URL (if provided, host and port are ignored)
      api_key: "your-api-key"                     # API key for Qdrant Cloud authentication
      batch_size: 1024                            # Batch size for operations (default: 1024)
      distance: "COSINE"                          # Distance metric (default: "COSINE")

Configuration Parameters

  • url (optional): Complete URL for connecting to Qdrant. If provided, host and port are ignored. Useful for Qdrant Cloud or custom deployments

  • host (optional): Host address of the Qdrant server. Defaults to the FLOW_QDRANT_HOST environment variable or "localhost" if not set

  • port (optional): Port number of the Qdrant server. Defaults to the FLOW_QDRANT_PORT environment variable or 6333 if not set

  • api_key (optional): API key for authentication (required for Qdrant Cloud). Can also be set via the FLOW_QDRANT_API_KEY environment variable

  • distance (optional): Distance metric for vector similarity. Valid values: "COSINE", "EUCLIDEAN", "DOT". Default: "COSINE"

  • batch_size (optional): Batch size for bulk operations. Default: 1024
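The precedence rules above fit in a small pure function: url overrides host and port, YAML values override the FLOW_QDRANT_* environment variables, and those override the defaults. `resolve_qdrant_connection` is a hypothetical helper, not QdrantVectorStore's code. Its output keys match keyword arguments of `qdrant_client.QdrantClient` (url, host, port, api_key), but treat that pairing as an illustration only.

```python
import os
from typing import Mapping, Optional


def resolve_qdrant_connection(url: Optional[str] = None,
                              host: Optional[str] = None,
                              port: Optional[int] = None,
                              api_key: Optional[str] = None,
                              env: Mapping[str, str] = os.environ) -> dict:
    """YAML value > FLOW_QDRANT_* environment variable > default; url beats host/port."""
    api_key = api_key or env.get("FLOW_QDRANT_API_KEY")
    if url:
        # host and port are ignored whenever url is provided.
        return {"url": url, "api_key": api_key}
    host = host or env.get("FLOW_QDRANT_HOST") or "localhost"
    port = int(port or env.get("FLOW_QDRANT_PORT") or 6333)
    return {"host": host, "port": port, "api_key": api_key}


print(resolve_qdrant_connection(env={}))
# {'host': 'localhost', 'port': 6333, 'api_key': None}
print(resolve_qdrant_connection(url="https://your-cluster.qdrant.io:6333", host="ignored",
                                env={"FLOW_QDRANT_API_KEY": "your-api-key"}))
# {'url': 'https://your-cluster.qdrant.io:6333', 'api_key': 'your-api-key'}
```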

🌟 Key Features

  • Native Async Support - All operations have async equivalents for better concurrency

  • Upsert Operations - Insert automatically updates existing nodes with the same ID

  • Advanced Filtering - Support for term and range filters on metadata

  • High Performance - Optimized for large-scale vector similarity search

  • Horizontal Scaling - Supports clustering for distributed deployments

  • Multiple Distance Metrics - Cosine, Euclidean, and Dot Product similarity

  • Persistent Storage - Data is automatically persisted to disk

  • Efficient Iteration - Scroll through large collections with pagination
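The three distance settings rank results differently. With "COSINE" and "DOT", a higher score means more similar. With "EUCLIDEAN", a smaller distance means closer. The plain-Python sketch below computes each metric for one query/document pair so the numbers can be compared side by side. Qdrant computes these server-side; the helper functions here are for illustration only.

```python
import math
from typing import Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product: higher means more similar (sensitive to vector length)."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine: dot product of the normalized vectors, in [-1, 1]; higher is more similar."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean: straight-line distance; lower means closer."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


query = [1.0, 0.0]
doc = [3.0, 4.0]
print(dot(query, doc))                 # 3.0
print(cosine_similarity(query, doc))   # 0.6
print(euclidean_distance(query, doc))  # sqrt(20), about 4.472
```

For embeddings that are already normalized to unit length, cosine and dot product produce the same ranking. That is one reason "COSINE" is a safe default.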

5. MemoryVectorStore (backend=memory)

An ultra-fast in-memory vector store that keeps all data in RAM for maximum performance.

💡 When to Use

  • Testing and development - Fastest possible operations for unit tests 🧪

  • Small to medium datasets that fit in memory (< 1M vectors) 💾

  • Applications requiring ultra-low latency search operations ⚡

  • Temporary workspaces that don't need persistence 🚀

⚙️ Configuration

vector_store:
  default:
    backend: memory
    embedding_model: default
    params:
      store_dir: "./memory_vector_store"  # Directory for backup/restore operations (default: "./memory_vector_store")
      batch_size: 1024                     # Batch size for operations (default: 1024)

Configuration Parameters

  • store_dir (optional): Directory path for backup/restore operations. Default: "./memory_vector_store"

  • batch_size (optional): Batch size for bulk operations. Default: 1024

⚡ Performance Benefits

  • Zero I/O latency - All operations happen in RAM

  • Fast search - No disk or network round trips on the query path

  • Perfect for testing - Fast setup and teardown

  • Memory efficient - Only stores what you need

🚨 Important Notes

  • Data is volatile - Lost when process ends unless explicitly saved

  • Memory usage - Entire dataset must fit in available RAM

  • No automatic persistence - Call dump_workspace() to save to disk

  • Single process - Not suitable for distributed applications
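A toy store makes the trade-offs above easy to see: everything lives in RAM, an insert with an existing ID overwrites the old entry, and nothing survives the process unless it is dumped. `ToyMemoryStore` is a hypothetical class written for this guide. Apart from reusing the dump_workspace() name, its methods, its brute-force cosine ranking, and its JSONL dump format are assumptions and do not describe MemoryVectorStore's API.

```python
import json
import math
import tempfile
from pathlib import Path


class ToyMemoryStore:
    """Hypothetical in-memory store; not ReMe's MemoryVectorStore."""

    def __init__(self, store_dir: str = "./memory_vector_store"):
        self.store_dir = Path(store_dir)
        self.workspaces = {}  # workspace_id -> {node_id: {"vector": [...], "content": str}}

    def insert(self, workspace_id, node_id, vector, content):
        # Reusing a node_id overwrites the old entry (upsert-style behaviour).
        self.workspaces.setdefault(workspace_id, {})[node_id] = {"vector": vector, "content": content}

    def search(self, workspace_id, query, top_k=3):
        def cosine(a, b):
            num = sum(x * y for x, y in zip(a, b))
            den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
            return num / den if den else 0.0

        nodes = self.workspaces.get(workspace_id, {})
        # Brute force: score every node held in RAM, with no index and no disk or network I/O.
        ranked = sorted(nodes, key=lambda nid: cosine(query, nodes[nid]["vector"]), reverse=True)
        return ranked[:top_k]

    def dump_workspace(self, workspace_id):
        # Without an explicit dump, everything is lost when the process exits.
        self.store_dir.mkdir(parents=True, exist_ok=True)
        path = self.store_dir / f"{workspace_id}.jsonl"
        with path.open("w", encoding="utf-8") as f:
            for node_id, node in self.workspaces.get(workspace_id, {}).items():
                f.write(json.dumps({"id": node_id, **node}) + "\n")
        return path

    def load_workspace(self, workspace_id):
        path = self.store_dir / f"{workspace_id}.jsonl"
        with path.open(encoding="utf-8") as f:
            for line in f:
                node = json.loads(line)
                self.insert(workspace_id, node["id"], node["vector"], node["content"])


with tempfile.TemporaryDirectory() as tmp:
    store = ToyMemoryStore(tmp)
    store.insert("ws", "a", [1.0, 0.0], "apples")
    store.insert("ws", "b", [0.0, 1.0], "bananas")
    store.insert("ws", "a", [0.9, 0.1], "apples, updated")  # same ID: overwrites
    top = store.search("ws", [1.0, 0.0], top_k=1)
    store.dump_workspace("ws")
    fresh = ToyMemoryStore(tmp)  # stands in for a new process
    fresh.load_workspace("ws")
print(top, len(fresh.workspaces["ws"]))  # ['a'] 2
```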

Example Configurations

Minimal Configuration (Memory Store)

vector_store:
  default:
    backend: memory
    embedding_model: default

Local File Storage

vector_store:
  default:
    backend: local
    embedding_model: default
    params:
      store_dir: "./my_vector_store"
      batch_size: 2048

Elasticsearch Production Setup

vector_store:
  default:
    backend: elasticsearch
    embedding_model: default
    params:
      hosts: "http://elasticsearch.example.com:9200"
      basic_auth: ["username", "password"]
      batch_size: 2048

Qdrant Cloud Setup

vector_store:
  default:
    backend: qdrant
    embedding_model: default
    params:
      url: "https://your-cluster.qdrant.io:6333"
      api_key: "your-api-key-here"
      distance: "COSINE"
      batch_size: 1024

🔄 Environment Variables

Some vector store backends support environment variables for configuration:

  • Elasticsearch: FLOW_ES_HOSTS - Elasticsearch host(s) (default: "http://localhost:9200")

  • Qdrant:

    • FLOW_QDRANT_HOST - Qdrant host (default: "localhost")

    • FLOW_QDRANT_PORT - Qdrant port (default: 6333)

    • FLOW_QDRANT_API_KEY - Qdrant API key for authentication

Environment variables are used as fallbacks when parameters are not explicitly set in the YAML configuration.

🧩 Integration with Embedding Models

All vector stores require an embedding model configuration. The embedding_model field in the vector store configuration references a model defined in the embedding_model section of default.yaml:

embedding_model:
  default:
    backend: openai_compatible
    model_name: text-embedding-v4
    params:
      dimensions: 1024

vector_store:
  default:
    backend: memory
    embedding_model: default  # References the embedding_model.default configuration

The embedding model configuration provides the model name, backend, and parameters needed for generating vector embeddings.
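The reference from vector_store.<name>.embedding_model to an entry in the embedding_model section can be followed in code. This is also a convenient place to check that vectors match the configured dimensions. The dicts below mirror the YAML above after parsing. `embedding_config_for` and `check_dimensions` are hypothetical helpers, and this guide does not say whether ReMe performs such a check itself.

```python
# Hypothetical helpers; the dicts mirror the YAML example above after parsing.
config = {
    "embedding_model": {
        "default": {
            "backend": "openai_compatible",
            "model_name": "text-embedding-v4",
            "params": {"dimensions": 1024},
        }
    },
    "vector_store": {
        "default": {"backend": "memory", "embedding_model": "default"},
    },
}


def embedding_config_for(config: dict, store_name: str = "default") -> dict:
    """Follow vector_store.<store_name>.embedding_model to its embedding_model entry."""
    ref = config["vector_store"][store_name]["embedding_model"]
    try:
        return config["embedding_model"][ref]
    except KeyError:
        raise KeyError(f"vector_store.{store_name} references unknown embedding_model {ref!r}") from None


def check_dimensions(vector: list, embedding_cfg: dict) -> bool:
    """True if the vector length matches params.dimensions (or none is configured)."""
    expected = embedding_cfg.get("params", {}).get("dimensions")
    return expected is None or len(vector) == expected


emb = embedding_config_for(config)
print(emb["model_name"])                    # text-embedding-v4
print(check_dimensions([0.0] * 1024, emb))  # True
print(check_dimensions([0.0] * 768, emb))   # False
```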