Migration Tool
The Vectra::Migration tool provides utilities for copying vectors between providers, enabling zero-downtime migrations and provider switching.
Overview
The migration tool uses existing Vectra features:
- Streaming to fetch all vectors from source
- Batch operations (
Batch.upsert_async) to efficiently write to target - Progress tracking for monitoring large migrations
Supported scenarios:
- Memory → Qdrant (testing to production)
- Pinecone → Qdrant (vendor migration)
- Qdrant → pgvector (self-hosted migration)
- Any provider → Any provider
Basic Usage
Simple Migration
require 'vectra'
# Setup clients
source_client = Vectra::Client.new(provider: :memory)
target_client = Vectra::Client.new(
provider: :qdrant,
host: "http://localhost:6333"
)
# Create migration tool
migration = Vectra::Migration.new(source_client, target_client)
# Migrate vectors
result = migration.migrate(
source_index: "source-index",
target_index: "target-index"
)
puts "Migrated #{result[:migrated_count]} vectors"
puts "Batches: #{result[:batches]}"
puts "Errors: #{result[:errors].size}" if result[:errors].any?
With Progress Tracking
result = migration.migrate(
source_index: "products",
target_index: "products",
on_progress: ->(stats) {
percentage = stats[:percentage]
migrated = stats[:migrated]
total = stats[:total]
batches = stats[:batches_processed]
total_batches = stats[:total_batches]
puts "[#{percentage}%] Migrated #{migrated}/#{total} vectors " \
"(batch #{batches}/#{total_batches})"
}
)
With Custom Batch Sizes
# Fetch 500 vectors at a time, upsert in chunks of 50
result = migration.migrate(
source_index: "large-index",
target_index: "large-index",
batch_size: 500, # Vectors per fetch batch
chunk_size: 50 # Vectors per upsert chunk
)
Namespace Migration
Migrate vectors between different namespaces:
result = migration.migrate(
source_index: "products",
target_index: "products",
source_namespace: "tenant-1",
target_namespace: "tenant-2"
)
Verification
After migration, verify that all vectors were copied:
verification = migration.verify(
source_index: "old-index",
target_index: "new-index"
)
if verification[:match]
puts "✅ Migration verified: #{verification[:source_count]} vectors"
else
puts "❌ Mismatch detected!"
puts " Source: #{verification[:source_count]} vectors"
puts " Target: #{verification[:target_count]} vectors"
end
Verification with namespaces:
verification = migration.verify(
source_index: "products",
target_index: "products",
source_namespace: "tenant-1",
target_namespace: "tenant-2"
)
Real-World Examples
Example 1: Testing to Production
# Development: Memory provider
dev_client = Vectra::Client.new(provider: :memory)
# Production: Qdrant
prod_client = Vectra::Client.new(
provider: :qdrant,
host: ENV['QDRANT_HOST'],
api_key: ENV['QDRANT_API_KEY']
)
migration = Vectra::Migration.new(dev_client, prod_client)
# Migrate test data to production
result = migration.migrate(
source_index: "test-products",
target_index: "products",
on_progress: ->(stats) {
Rails.logger.info("Migration progress: #{stats[:percentage]}%")
}
)
if result[:errors].any?
Rails.logger.error("Migration errors: #{result[:errors]}")
else
Rails.logger.info("✅ Migrated #{result[:migrated_count]} vectors")
end
Example 2: Provider Migration (Pinecone → Qdrant)
# Old provider: Pinecone
pinecone_client = Vectra::Client.new(
provider: :pinecone,
api_key: ENV['PINECONE_API_KEY'],
environment: 'us-west-4'
)
# New provider: Qdrant
qdrant_client = Vectra::Client.new(
provider: :qdrant,
host: ENV['QDRANT_HOST'],
api_key: ENV['QDRANT_API_KEY']
)
migration = Vectra::Migration.new(pinecone_client, qdrant_client)
# Migrate all indexes
indexes = pinecone_client.list_indexes.map { |idx| idx[:name] }
indexes.each do |index_name|
puts "Migrating #{index_name}..."
result = migration.migrate(
source_index: index_name,
target_index: index_name,
on_progress: ->(stats) {
print "\r#{stats[:percentage]}% (#{stats[:migrated]}/#{stats[:total]})"
}
)
puts "\n✅ #{index_name}: #{result[:migrated_count]} vectors"
# Verify
verification = migration.verify(
source_index: index_name,
target_index: index_name
)
unless verification[:match]
puts "⚠️ Warning: Count mismatch for #{index_name}"
end
end
Example 3: Zero-Downtime Migration
# Strategy: Dual-write during migration, then switch
# Source: Old provider
old_client = Vectra::Client.new(provider: :pinecone, ...)
# Target: New provider
new_client = Vectra::Client.new(provider: :qdrant, ...)
migration = Vectra::Migration.new(old_client, new_client)
# Step 1: Migrate existing data
puts "Migrating existing vectors..."
result = migration.migrate(
source_index: "products",
target_index: "products",
on_progress: ->(stats) {
puts "Progress: #{stats[:percentage]}%"
}
)
# Step 2: Verify
verification = migration.verify(
source_index: "products",
target_index: "products"
)
unless verification[:match]
raise "Migration verification failed!"
end
# Step 3: Enable dual-write (in your application code)
# - Write to both old and new providers
# - Read from old provider (or both for comparison)
# Step 4: After verification period, switch to new provider
# - Update application config to use new_client
# - Remove dual-write logic
Error Handling
The migration tool continues processing even if individual batches fail:
result = migration.migrate(
source_index: "products",
target_index: "products"
)
if result[:errors].any?
puts "⚠️ Migration completed with #{result[:errors].size} errors:"
result[:errors].each do |error|
puts " - #{error.class}: #{error.message}"
end
# Retry failed batches or handle manually
else
puts "✅ Migration completed successfully"
end
Performance Tips
- Batch Size: Larger
batch_sizereduces fetch overhead but uses more memory - Chunk Size: Smaller
chunk_sizeprovides better progress granularity - Concurrency: Migration uses
Batch.upsert_asyncwhich handles concurrency automatically - Progress Callbacks: Use lightweight callbacks to avoid slowing down migration
Recommended settings:
# For small indexes (< 10K vectors)
migration.migrate(
source_index: "small",
target_index: "small",
batch_size: 1000,
chunk_size: 100
)
# For large indexes (> 100K vectors)
migration.migrate(
source_index: "large",
target_index: "large",
batch_size: 5000, # Larger batches
chunk_size: 500 # Larger chunks
)
Limitations
- Provider-Specific Features: Some provider-specific features may not migrate (e.g., custom indexes, special metadata formats)
- Large Indexes: Very large indexes (> 1M vectors) may take significant time
- Memory Usage: Migration loads batches into memory; monitor memory usage for very large migrations
- Network: Migration performance depends on network speed between source and target
See Also
- API Methods - Complete API reference
- Switching Providers - Provider migration guide
- Batch Operations - Understanding batch processing
- Recipes & Patterns - Real-world examples