Runbook: Cache Issues

Alert: VectraLowCacheHitRatio
Severity: Warning
Threshold: Cache hit ratio <50% for 10 minutes

Symptoms

- Cache hit ratio below 50% on the Vectra dashboard
- Elevated query latency as more requests fall through to the backend
- Increased request load on the vector store

Quick Diagnosis

# Ruby console: inspect cache state
cache = Vectra::Cache.new  # or, better, the app's live cache instance
stats = cache.stats

puts "Size: #{stats[:size]} / #{stats[:max_size]}"
puts "TTL: #{stats[:ttl]} seconds"
puts "Keys: #{stats[:keys].count}"

# Prometheus: check the hit ratio
sum(vectra_cache_hits_total) /
(sum(vectra_cache_hits_total) + sum(vectra_cache_misses_total))
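The same ratio the PromQL query computes can be sanity-checked in a console. A minimal sketch; raw hit/miss counters are an assumption, since the stats hash shown above does not list them:

```ruby
# Hit ratio from raw counters (same formula as the PromQL query above).
def hit_ratio(hits, misses)
  total = hits + misses
  return 0.0 if total.zero?
  hits.to_f / total
end

puts hit_ratio(450, 550)  # 0.45 -- below the 0.5 alert threshold
```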

Investigation Steps

1. Check Cache Configuration

# Current config
puts Vectra.configuration.cache_enabled    # Should be true
puts Vectra.configuration.cache_ttl        # Default: 300
puts Vectra.configuration.cache_max_size   # Default: 1000

2. Analyze Access Patterns

# Check what's being cached
cache.stats[:keys].each do |key|
  parts = key.split(":")
  puts "Index: #{parts[0]}, Type: #{parts[1]}"
end

# Count by type
keys = cache.stats[:keys]
queries = keys.count { |k| k.include?(":q:") }
fetches = keys.count { |k| k.include?(":f:") }
puts "Query cache entries: #{queries}"
puts "Fetch cache entries: #{fetches}"

3. Check for Cache Thrashing

# If max_size is too small, cache thrashes
# Sign: entries being evicted immediately after creation
# Solution: Increase max_size

stats = cache.stats
if stats[:size] >= stats[:max_size] * 0.9
  puts "WARNING: Cache near capacity - consider increasing max_size"
end

4. Check TTL Appropriateness

# If TTL is too short, cache misses are high
# If TTL is too long, stale data is served

# Check data freshness requirements
# - Real-time data: TTL 30-60s
# - Semi-static data: TTL 300-600s
# - Static data: TTL 3600s+
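One way to make the freshness classes above executable is a small lookup helper; the helper and its symbol names are illustrative, not part of the Vectra API:

```ruby
# Map the freshness classes above to a TTL in seconds.
TTL_BY_FRESHNESS = {
  real_time:   60,    # real-time data: 30-60s
  semi_static: 300,   # semi-static data: 300-600s
  static:      3600   # static data: 3600s+
}.freeze

def ttl_for(freshness)
  TTL_BY_FRESHNESS.fetch(freshness)
end

puts ttl_for(:semi_static)  # 300
```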

Resolution Steps

Low Hit Ratio

Increase Cache Size

cache = Vectra::Cache.new(
  ttl: 300,
  max_size: 5000  # Increase from 1000
)
cached_client = Vectra::CachedClient.new(client, cache: cache)

Adjust TTL

# For high-churn data
cache = Vectra::Cache.new(ttl: 60)  # 1 minute

# For stable data  
cache = Vectra::Cache.new(ttl: 3600)  # 1 hour

Cache Warming

# Pre-populate the cache on startup
# (load_common_queries is app-specific, e.g. top-N queries from logs)
common_queries = load_common_queries
common_queries.each do |q|
  cached_client.query(
    index: q[:index],
    vector: q[:vector],
    top_k: q[:top_k]
  )
end

Stale Data

Reduce TTL

cache = Vectra::Cache.new(ttl: 60)  # Reduce from 300

Implement Cache Invalidation

# After upsert, invalidate affected cache
def upsert_with_invalidation(index:, vectors:)
  result = client.upsert(index: index, vectors: vectors)
  cached_client.invalidate_index(index)
  result
end
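If `invalidate_index` is not available, affected entries can be selected by key prefix instead. A sketch over a plain key list, assuming keys follow the `index:type:hash` format shown in step 2:

```ruby
# Select cache keys belonging to one index, assuming keys are
# formatted "index:type:hash" as in the access-pattern step above.
def keys_for_index(keys, index)
  keys.select { |k| k.start_with?("#{index}:") }
end

keys_for_index(["main:q:abc", "main:f:v1", "other:q:xyz"], "main")
# => ["main:q:abc", "main:f:v1"]
```

Each selected key would then be deleted from the cache before serving further reads.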

Use Cache-Aside Pattern

def get_vector(id)
  # Check cache first
  cached = cache.get("vector:#{id}")
  return cached unless cached.nil?

  # Fetch from source
  vector = client.fetch(index: "main", ids: [id])[id]
  
  # Cache with appropriate TTL
  cache.set("vector:#{id}", vector)
  vector
end

Cache Thrashing

Increase Max Size

# Rule of thumb: max_size = unique_queries_per_ttl * 1.5
# Example: 1000 unique queries per 5 min, max_size = 1500
cache = Vectra::Cache.new(
  ttl: 300,
  max_size: 1500
)

Implement Tiered Caching

# Hot cache: Small, short TTL
hot_cache = Vectra::Cache.new(ttl: 60, max_size: 100)

# Warm cache: Large, longer TTL
warm_cache = Vectra::Cache.new(ttl: 600, max_size: 5000)

# Check hot first, then warm
# Check hot first, then warm
# (assumes Vectra::Cache#fetch yields to the block on a miss)
def cached_query(key, **params)
  hot_cache.fetch(key) do
    warm_cache.fetch(key) do
      client.query(**params)
    end
  end
end
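The tier order can be exercised without the library using plain Hashes as stand-ins; promoting a warm hit into the hot tier is an assumption about the desired behavior, not documented Vectra behavior:

```ruby
# Two-tier lookup with promotion, using Hashes as cache stand-ins.
def tiered_fetch(hot, warm, key)
  return hot[key] if hot.key?(key)
  value = warm.key?(key) ? warm[key] : (warm[key] = yield)
  hot[key] = value  # promote into the hot tier
end

hot  = {}
warm = { "q1" => [0.1, 0.2] }
tiered_fetch(hot, warm, "q1") { raise "backend should not be hit" }
```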

Memory Issues

Monitor Memory Usage

# Estimate cache memory usage
# Approximate: 1KB per cached query result
estimated_mb = cache.stats[:size] / 1024.0  # entries * ~1KB => MB
puts "Estimated cache memory: #{estimated_mb.round(2)} MB"
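The 1KB-per-entry figure is a rough default; the footprint of a sample cached value can be spot-checked with Ruby's ObjectSpace. Note that `memsize_of` is shallow, so nested objects are not counted:

```ruby
require 'objspace'

# Shallow size of a sample cached value (a 128-dim float vector).
vector = Array.new(128) { rand }
puts ObjectSpace.memsize_of(vector)  # bytes held by the array itself
```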

Implement LRU Eviction

# Vectra::Cache already implements LRU
# If memory is still an issue, reduce max_size
cache = Vectra::Cache.new(max_size: 500)

Prevention

1. Right-size Cache

# Calculate based on query patterns
unique_queries_per_minute = 100
ttl_minutes = 5
buffer = 1.5

max_size = unique_queries_per_minute * ttl_minutes * buffer
# = 100 * 5 * 1.5 = 750
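The same rule of thumb as a reusable function (a hypothetical helper; TTL is taken in seconds to match the cache config):

```ruby
# max_size = unique queries per TTL window * safety buffer
def recommended_max_size(unique_per_minute, ttl_seconds, buffer: 1.5)
  (unique_per_minute * (ttl_seconds / 60.0) * buffer).ceil
end

recommended_max_size(100, 300)  # 100 * 5 * 1.5 = 750
```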

2. Monitor Cache Metrics

# Alert on low hit ratio
sum(rate(vectra_cache_hits_total[5m])) /
(sum(rate(vectra_cache_hits_total[5m])) + 
 sum(rate(vectra_cache_misses_total[5m]))) < 0.5

3. Implement Cache Warm-up

# In application boot
Rails.application.config.after_initialize do
  VectraCacheWarmer.perform_async
end

4. Use Cache Namespacing

# Separate caches for different use cases
search_cache = Vectra::Cache.new(ttl: 60)   # Fast invalidation
embed_cache = Vectra::Cache.new(ttl: 3600)  # Long-lived embeddings
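Routing by key prefix keeps the two caches from mixing. A minimal sketch; the `embed:` prefix convention is an assumption, not part of the Vectra key format:

```ruby
# Pick the cache for a key by its namespace prefix.
def cache_for(key, search_cache, embed_cache)
  key.start_with?("embed:") ? embed_cache : search_cache
end

cache_for("embed:doc42", :search, :embed)  # :embed
```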

Escalation

| Time    | Action                   |
|---------|--------------------------|
| 10 min  | Adjust TTL/max_size      |
| 30 min  | Implement cache warming  |
| 1 hour  | Review access patterns   |
| 2 hours | Consider Redis/Memcached |