Runbook: Cache Issues

Alert: VectraLowCacheHitRatio
Severity: Warning
Threshold: Cache hit ratio <50% for 10 minutes

Symptoms

- Cache hit ratio below 50% on the Vectra dashboard
- Elevated query latency as more requests fall through to the backend
- Increased request load on the vector store

Quick Diagnosis

# Ruby console: inspect cache state
cache = Vectra::Cache.new  # or, better, the app's live cache instance
stats = cache.stats

puts "Size: #{stats[:size]} / #{stats[:max_size]}"
puts "TTL: #{stats[:ttl]} seconds"
puts "Keys: #{stats[:keys].count}"

# Prometheus: check the hit ratio
sum(vectra_cache_hits_total) /
(sum(vectra_cache_hits_total) + sum(vectra_cache_misses_total))
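The same ratio the PromQL query computes can be sanity-checked in a console. A minimal sketch; raw hit/miss counters are an assumption, since the stats hash shown above does not list them:

```ruby
# Hit ratio from raw counters (same formula as the PromQL query above).
def hit_ratio(hits, misses)
  total = hits + misses
  return 0.0 if total.zero?
  hits.to_f / total
end

puts hit_ratio(450, 550)  # 0.45 -- below the 0.5 alert threshold
```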

Investigation Steps

1. Check Cache Configuration

# Current config
puts Vectra.configuration.cache_enabled    # Should be true
puts Vectra.configuration.cache_ttl        # Default: 300
puts Vectra.configuration.cache_max_size   # Default: 1000

2. Analyze Access Patterns

# Check what's being cached
cache.stats[:keys].each do |key|
  parts = key.split(":")
  puts "Index: #{parts[0]}, Type: #{parts[1]}"
end

# Count by type
keys = cache.stats[:keys]
queries = keys.count { |k| k.include?(":q:") }
fetches = keys.count { |k| k.include?(":f:") }
puts "Query cache entries: #{queries}"
puts "Fetch cache entries: #{fetches}"

3. Check for Cache Thrashing

# If max_size is too small, cache thrashes
# Sign: entries being evicted immediately after creation
# Solution: Increase max_size

stats = cache.stats
if stats[:size] >= stats[:max_size] * 0.9
  puts "WARNING: Cache near capacity - consider increasing max_size"
end

4. Check TTL Appropriateness

# If TTL is too short, cache misses are high
# If TTL is too long, stale data is served

# Check data freshness requirements
# - Real-time data: TTL 30-60s
# - Semi-static data: TTL 300-600s
# - Static data: TTL 3600s+
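One way to make the freshness classes above executable is a small lookup helper; the helper and its symbol names are illustrative, not part of the Vectra API:

```ruby
# Map the freshness classes above to a TTL in seconds.
TTL_BY_FRESHNESS = {
  real_time:   60,    # real-time data: 30-60s
  semi_static: 300,   # semi-static data: 300-600s
  static:      3600   # static data: 3600s+
}.freeze

def ttl_for(freshness)
  TTL_BY_FRESHNESS.fetch(freshness)
end

puts ttl_for(:semi_static)  # 300
```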

Resolution Steps

Low Hit Ratio

Increase Cache Size

cache = Vectra::Cache.new(
  ttl: 300,
  max_size: 5000  # Increase from 1000
)
cached_client = Vectra::CachedClient.new(client, cache: cache)

Adjust TTL

# For high-churn data
cache = Vectra::Cache.new(ttl: 60)  # 1 minute

# For stable data  
cache = Vectra::Cache.new(ttl: 3600)  # 1 hour

Cache Warming

# Pre-populate the cache on startup
# (load_common_queries is app-specific, e.g. top-N queries from logs)
common_queries = load_common_queries
common_queries.each do |q|
  cached_client.query(
    index: q[:index],
    vector: q[:vector],
    top_k: q[:top_k]
  )
end

Stale Data

Reduce TTL

cache = Vectra::Cache.new(ttl: 60)  # Reduce from 300

Implement Cache Invalidation

# After upsert, invalidate affected cache
def upsert_with_invalidation(index:, vectors:)
  result = client.upsert(index: index, vectors: vectors)
  cached_client.invalidate_index(index)
  result
end
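If `invalidate_index` is not available, affected entries can be selected by key prefix instead. A sketch over a plain key list, assuming keys follow the `index:type:hash` format shown in step 2:

```ruby
# Select cache keys belonging to one index, assuming keys are
# formatted "index:type:hash" as in the access-pattern step above.
def keys_for_index(keys, index)
  keys.select { |k| k.start_with?("#{index}:") }
end

keys_for_index(["main:q:abc", "main:f:v1", "other:q:xyz"], "main")
# => ["main:q:abc", "main:f:v1"]
```

Each selected key would then be deleted from the cache before serving further reads.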

Use Cache-Aside Pattern

def get_vector(id)
  # Check cache first
  cached = cache.get("vector:#{id}")
  return cached unless cached.nil?

  # Fetch from source
  vector = client.fetch(index: "main", ids: [id])[id]
  
  # Cache with appropriate TTL
  cache.set("vector:#{id}", vector)
  vector
end

Cache Thrashing

Increase Max Size

# Rule of thumb: max_size = unique_queries_per_ttl * 1.5
# Example: 1000 unique queries per 5 min, max_size = 1500
cache = Vectra::Cache.new(
  ttl: 300,
  max_size: 1500
)

Implement Tiered Caching

# Hot cache: Small, short TTL
hot_cache = Vectra::Cache.new(ttl: 60, max_size: 100)

# Warm cache: Large, longer TTL
warm_cache = Vectra::Cache.new(ttl: 600, max_size: 5000)

# Check hot first, then warm
# Check hot first, then warm
# (assumes Vectra::Cache#fetch yields to the block on a miss)
def cached_query(key, **params)
  hot_cache.fetch(key) do
    warm_cache.fetch(key) do
      client.query(**params)
    end
  end
end
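The tier order can be exercised without the library using plain Hashes as stand-ins; promoting a warm hit into the hot tier is an assumption about the desired behavior, not documented Vectra behavior:

```ruby
# Two-tier lookup with promotion, using Hashes as cache stand-ins.
def tiered_fetch(hot, warm, key)
  return hot[key] if hot.key?(key)
  value = warm.key?(key) ? warm[key] : (warm[key] = yield)
  hot[key] = value  # promote into the hot tier
end

hot  = {}
warm = { "q1" => [0.1, 0.2] }
tiered_fetch(hot, warm, "q1") { raise "backend should not be hit" }
```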

Memory Issues

Monitor Memory Usage

# Estimate cache memory usage
# Approximate: 1KB per cached query result
estimated_mb = cache.stats[:size] / 1024.0  # entries * ~1KB => MB
puts "Estimated cache memory: #{estimated_mb.round(2)} MB"
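The 1KB-per-entry figure is a rough default; the footprint of a sample cached value can be spot-checked with Ruby's ObjectSpace. Note that `memsize_of` is shallow, so nested objects are not counted:

```ruby
require 'objspace'

# Shallow size of a sample cached value (a 128-dim float vector).
vector = Array.new(128) { rand }
puts ObjectSpace.memsize_of(vector)  # bytes held by the array itself
```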

Implement LRU Eviction

# Vectra::Cache already implements LRU
# If memory is still an issue, reduce max_size
cache = Vectra::Cache.new(max_size: 500)

Prevention

1. Right-size Cache

# Calculate based on query patterns
unique_queries_per_minute = 100
ttl_minutes = 5
buffer = 1.5

max_size = unique_queries_per_minute * ttl_minutes * buffer
# = 100 * 5 * 1.5 = 750
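The same rule of thumb as a reusable function (a hypothetical helper; TTL is taken in seconds to match the cache config):

```ruby
# max_size = unique queries per TTL window * safety buffer
def recommended_max_size(unique_per_minute, ttl_seconds, buffer: 1.5)
  (unique_per_minute * (ttl_seconds / 60.0) * buffer).ceil
end

recommended_max_size(100, 300)  # 100 * 5 * 1.5 = 750
```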

2. Monitor Cache Metrics

# Alert on low hit ratio
sum(rate(vectra_cache_hits_total[5m])) /
(sum(rate(vectra_cache_hits_total[5m])) + 
 sum(rate(vectra_cache_misses_total[5m]))) < 0.5

3. Implement Cache Warm-up

# In application boot
Rails.application.config.after_initialize do
  VectraCacheWarmer.perform_async
end

4. Use Cache Namespacing

# Separate caches for different use cases
search_cache = Vectra::Cache.new(ttl: 60)   # Fast invalidation
embed_cache = Vectra::Cache.new(ttl: 3600)  # Long-lived embeddings
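Routing by key prefix keeps the two caches from mixing. A minimal sketch; the `embed:` prefix convention is an assumption, not part of the Vectra key format:

```ruby
# Pick the cache for a key by its namespace prefix.
def cache_for(key, search_cache, embed_cache)
  key.start_with?("embed:") ? embed_cache : search_cache
end

cache_for("embed:doc42", :search, :embed)  # :embed
```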

Escalation

| Time    | Action                   |
|---------|--------------------------|
| 10 min  | Adjust TTL/max_size      |
| 30 min  | Implement cache warming  |
| 1 hour  | Review access patterns   |
| 2 hours | Consider Redis/Memcached |