Saga Troubleshooting Guide

Common issues, solutions, and debugging techniques for Saga service discovery. This guide helps you diagnose and resolve problems quickly.

Quick Diagnosis

If you're experiencing issues, start with the health check:

curl http://localhost:8030/api/v1/health

Check the redis field - if it shows "disconnected", Redis connectivity is the issue.
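
For scripted checks, the redis field can be pulled out of the response body without extra tooling. The following is a minimal sketch, assuming the health response is JSON with a string field named redis as described above; the helper name redis_status is hypothetical and not part of Saga.

```shell
# Extract the value of the "redis" field from a health response body.
# Assumes the field is a plain JSON string such as "connected" or "disconnected".
redis_status() {
  printf '%s\n' "$1" |
    sed -n 's/.*"redis"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Usage (requires a running Saga instance):
#   body=$(curl -sS http://localhost:8030/api/v1/health)
#   [ "$(redis_status "$body")" = "connected" ] || echo "Redis connectivity is the issue"
```

The helper prints nothing when the field is missing, so an empty result is also worth treating as a failure.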

Redis Connection Issues

Problem: Cannot connect to Redis

Symptoms:

  • Health check shows "redis": "disconnected"
  • Service registration fails with Redis client not available
  • Error logs show connection errors

Step 1: Verify Redis is running

redis-cli ping
# Expected: PONG

Step 2: Check Redis URL

echo $REDIS_URL
# Expected: redis://localhost:6379

Step 3: Test Redis connection

redis-cli -u redis://localhost:6379 ping

Solution 1: Start Redis

# Using Docker (named "redis" so the docker logs/restart commands in this guide work)
docker run -d --name redis -p 6379:6379 redis:7-alpine

# Using Docker Compose
docker compose -f infra/compose.yml up -d redis

# Using local installation
redis-server

Solution 2: Fix Redis URL

# Set correct Redis URL
export REDIS_URL=redis://localhost:6379

# Or in the config file (note: > overwrites any existing config.toml)
mkdir -p ~/.saga
echo 'redis_url = "redis://localhost:6379"' > ~/.saga/config.toml

Solution 3: Check network connectivity

# Test TCP connection
telnet localhost 6379

# Or using nc
nc -zv localhost 6379

Solution 4: Verify Redis authentication

# If Redis requires password
redis-cli -a password ping

# Update REDIS_URL with password
export REDIS_URL=redis://:password@localhost:6379
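
The TCP checks in Solution 3 hard-code localhost, which is misleading when REDIS_URL points somewhere else or carries a password. A minimal sketch for deriving the host and port from the URL follows; the helper name redis_host_port is hypothetical, 6379 is assumed as the default port, and IPv6 literals are not handled.

```shell
# Print "host port" for a Redis URL, defaulting the port to 6379.
redis_host_port() {
  rest=${1#*://}       # drop the scheme (redis://)
  rest=${rest##*@}     # drop optional credentials (user:password@)
  rest=${rest%%/*}     # drop an optional /db suffix
  host=${rest%%:*}
  port=${rest##*:}
  if [ "$port" = "$rest" ]; then
    port=6379          # no explicit port in the URL
  fi
  printf '%s %s\n' "$host" "$port"
}

# Usage: probe whatever REDIS_URL points at instead of localhost
#   set -- $(redis_host_port "$REDIS_URL")
#   nc -zv "$1" "$2"
```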

Enable debug logging:

RUST_LOG=saga=debug cargo run

Look for these log messages:

  • Connecting to Redis... - Connection attempt
  • Connected to Redis successfully - Success
  • Redis connection failed - Failure
  • Redis client not available - Client initialization failed

Check Redis logs:

# Docker
docker logs redis

# Systemd
journalctl -u redis -n 50

Problem: Redis connection timeout

Symptoms:

  • Connection hangs indefinitely
  • Timeout errors in logs
  • Health check never completes

Check Redis configuration:

# Check if Redis is accepting connections
redis-cli CONFIG GET bind
# Should show: 0.0.0.0 or 127.0.0.1

# Check max connections
redis-cli CONFIG GET maxclients

Solution 1: Check firewall rules

# Allow Redis port (on untrusted networks, restrict the allowed source address instead of opening it to everyone)
sudo ufw allow 6379

# Or check iptables
sudo iptables -L -n | grep 6379

Solution 2: Verify Redis is listening

# Check listening ports
netstat -tlnp | grep 6379
# or
ss -tlnp | grep 6379

# Should show Redis listening on port 6379

Solution 3: Check Redis max connections

# Increase max connections if needed
redis-cli CONFIG SET maxclients 10000

# Or in redis.conf
maxclients 10000

Solution 4: Check network latency

# Test latency to Redis (samples continuously; press Ctrl+C to stop)
redis-cli --latency

# Should show < 1ms for localhost

Service Registration Issues

Problem: Service registration fails

Symptoms:

  • 400 Bad Request response
  • Error: Service name cannot be empty
  • Error: Service name cannot contain ':' or spaces

Check request format:

# Test registration
curl -X POST http://localhost:8030/api/v1/services/register \
-H "Content-Type: application/json" \
-d '{
"service_name": "my-service",
"service_url": "http://localhost:8000",
"capabilities": ["rest"]
}'

Common errors:

  • Empty service name
  • Service name contains : or spaces
  • Invalid service URL format
  • Missing required fields

Solution 1: Validate service name

# ✅ Valid names
"authentication"
"payment-service"
"api-gateway"

# ❌ Invalid names
"my service" # Contains space
"my:service" # Contains colon
"" # Empty

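Solution 2: Check request format

The // annotations below are explanatory only. JSON does not allow comments, so leave them out of the actual request body.

{
  "service_name": "my-service",            // Required, no spaces/colons
  "service_url": "http://localhost:8000",  // Required, valid URL
  "capabilities": ["rest"]                 // Optional, array of strings
}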

Solution 3: Verify service URL format

# ✅ Valid URLs
"http://localhost:8000"
"https://api.example.com"
"http://192.168.1.100:8080"

# ❌ Invalid URLs
"localhost:8000" # Missing protocol
"http://" # Incomplete
"not-a-url" # Invalid format

Enable request logging:

RUST_LOG=saga=debug cargo run

Look for validation errors:

  • Service name cannot be empty
  • Service name cannot contain ':' or spaces
  • Invalid service URL format
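
The validation errors listed above can be caught locally before a request is sent. The sketch below mirrors only the rules stated in this guide (non-empty name, no ':' or spaces, URL starting with http:// or https://); Saga's own validation may be stricter, and both function names are hypothetical.

```shell
# Mirror the documented service name rules. Prints the error and returns 1 on failure.
validate_service_name() {
  case $1 in
    '')          echo "Service name cannot be empty"; return 1 ;;
    *:* | *' '*) echo "Service name cannot contain ':' or spaces"; return 1 ;;
  esac
}

# Accept only http:// or https:// followed by at least one more character.
validate_service_url() {
  case $1 in
    http://?* | https://?*) return 0 ;;
    *) echo "Invalid service URL format"; return 1 ;;
  esac
}

# Usage:
#   validate_service_name "my-service" && validate_service_url "http://localhost:8000" \
#     && echo "safe to register"
```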

Test with curl:

# Verbose output
curl -v -X POST http://localhost:8030/api/v1/services/register \
-H "Content-Type: application/json" \
-d '{"service_name":"test","service_url":"http://localhost:8000"}'

Problem: Service registration expires quickly

Symptoms:

  • Services disappear after 60 seconds
  • Need to re-register frequently
  • Services not discoverable after short time

Check TTL configuration:

# Check current TTL
echo $REGISTRATION_TTL
# Default: 60 seconds

# Check heartbeat interval
echo $HEARTBEAT_INTERVAL
# Default: 30 seconds

Verify service is sending heartbeats:

# Check service logs for heartbeat messages
# Should see heartbeats every 30 seconds

Solution 1: Implement heartbeat

# Send heartbeat manually
curl -X POST http://localhost:8030/api/v1/services/my-service/heartbeat

# Or implement in your service (see Integration Examples)

Solution 2: Increase TTL

# Set longer TTL
export REGISTRATION_TTL=120 # 2 minutes

# Restart Saga
cargo run

Solution 3: Reduce heartbeat interval

# Send heartbeats more frequently
export HEARTBEAT_INTERVAL=20 # Every 20 seconds

# Restart Saga
cargo run
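
How Solutions 2 and 3 interact comes down to simple arithmetic. The sketch below assumes each heartbeat resets the registration TTL and that the registration expires once a full TTL passes without one; the function name is hypothetical.

```shell
# Print how many consecutive heartbeats can be lost before a registration expires.
# $1: TTL in seconds, $2: heartbeat interval in seconds.
missed_heartbeats_tolerated() {
  ttl=$1
  interval=$2
  # Largest n with (n + 1) * interval < ttl, using integer arithmetic.
  # A negative result means even an on-time heartbeat arrives too late.
  echo $(( ($ttl - 1) / $interval - 1 ))
}

# Usage:
#   missed_heartbeats_tolerated "${REGISTRATION_TTL:-60}" "${HEARTBEAT_INTERVAL:-30}"
```

With the defaults (TTL 60, interval 30) the result is 0, so a single dropped or late heartbeat is enough for the registration to lapse. Raising the TTL to 120 gives 2, and lowering the interval to 20 gives 1.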

Solution 4: Implement automatic heartbeat

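// In your service code
use std::time::Duration;

tokio::spawn(async move {
    let mut interval = tokio::time::interval(Duration::from_secs(30));
    loop {
        interval.tick().await;
        // Log and keep looping: `?` cannot be used here because the task does not return a Result,
        // and one failed heartbeat should not stop the later ones.
        if let Err(err) = saga_client.refresh_registration("my-service").await {
            eprintln!("heartbeat failed: {err:?}");
        }
    }
});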

Service Discovery Issues

Problem: Service not found

Symptoms:

  • 404 Not Found when querying service
  • Service was registered but not discoverable
  • Service appears in list but not individual query

Step 1: Verify service is registered

# List all services
curl http://localhost:8030/api/v1/services

# Check if your service appears in the list

Step 2: Check service name spelling

# Service names are case-sensitive
curl http://localhost:8030/api/v1/services/My-Service # Wrong
curl http://localhost:8030/api/v1/services/my-service # Correct

Step 3: Check TTL expiration

# Query Redis directly
redis-cli GET "service:my-service"

# If the reply is (nil), the registration has expired or was never created

Solution 1: Re-register service

curl -X POST http://localhost:8030/api/v1/services/register \
-H "Content-Type: application/json" \
-d '{
"service_name": "my-service",
"service_url": "http://localhost:8000",
"capabilities": ["rest"]
}'

Solution 2: Send heartbeat

# Refresh registration TTL
curl -X POST http://localhost:8030/api/v1/services/my-service/heartbeat

Solution 3: Check Redis directly

# List all service keys (SCAN iterates incrementally; KEYS can block Redis on a large keyspace)
redis-cli --scan --pattern "service:*"

# Get specific service
redis-cli GET "service:my-service"

# Check TTL
redis-cli TTL "service:my-service"
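
The TTL reply is easier to act on once its special values are spelled out. Redis returns -2 when the key does not exist and -1 when the key exists without an expiry; any other non-negative number is the remaining lifetime in seconds. A small sketch (the helper name explain_ttl is hypothetical):

```shell
# Translate a reply from `redis-cli TTL <key>` into a readable diagnosis.
explain_ttl() {
  case $1 in
    -2) echo "key does not exist (expired or never registered)" ;;
    -1) echo "key exists but has no expiry" ;;
    '' | *[!0-9]*) echo "unexpected TTL value: $1"; return 1 ;;
    *)  echo "expires in $1 seconds" ;;
  esac
}

# Usage:
#   explain_ttl "$(redis-cli TTL "service:my-service")"
```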

Solution 4: Verify cache refresh

# Check cache statistics
curl http://localhost:8030/api/v1/health | jq .cache

# Wait for cache refresh (30 seconds)
# Then try discovery again

Problem: Stale service information

Symptoms:

  • Service URL changed but discovery returns old URL
  • Cache shows outdated information
  • Service metadata not updating

Check cache refresh:

# Check last cache refresh time
curl http://localhost:8030/api/v1/health | jq .cache.last_refresh

# Compare with current time
date -u +"%Y-%m-%dT%H:%M:%SZ"

Check Redis directly:

# Get service from Redis (source of truth)
redis-cli GET "service:my-service"

# Compare with cache response
curl http://localhost:8030/api/v1/services/my-service

Solution 1: Wait for cache refresh

# Cache refreshes every 30 seconds
# Wait and try again
sleep 35
curl http://localhost:8030/api/v1/services/my-service

Solution 2: Re-register service

# Re-register with new URL
curl -X POST http://localhost:8030/api/v1/services/register \
-H "Content-Type: application/json" \
-d '{
"service_name": "my-service",
"service_url": "http://new-url:8000",
"capabilities": ["rest"]
}'

Solution 3: Restart Saga

# Restart clears cache
docker restart saga
# or
sudo systemctl restart saga

Solution 4: Reduce cache refresh interval

# Not currently configurable, but can be modified in code
# Default: 30 seconds

Performance Issues

Problem: Slow service discovery

Symptoms:

  • High latency on discovery requests (>100ms)
  • Slow API responses
  • Timeout errors

Check Redis performance:

# Test Redis latency
redis-cli --latency

# Should show < 1ms for localhost
# Higher values indicate network issues

Check cache performance:

# Get cache statistics
curl http://localhost:8030/api/v1/health | jq .cache

# Look for:
# - hit_ratio: Should be > 0.8
# - misses: High misses = slow performance

Check network latency:

# Test network to Redis
ping redis-host

# Or test connection
time redis-cli ping

Solution 1: Optimize Redis

# Use local Redis for development
export REDIS_URL=redis://localhost:6379

# For production, use Redis on same network
export REDIS_URL=redis://redis-cluster:6379

Solution 2: Improve cache hit ratio

# Check current hit ratio
curl http://localhost:8030/api/v1/health | jq .cache.hit_ratio

# If < 0.8, consider:
# - Reducing cache refresh interval
# - Increasing cache size
# - Optimizing service discovery patterns
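
The 0.8 threshold can also be checked from a script, for example in a monitoring job. This is a sketch that assumes hit_ratio is a number between 0 and 1, as the threshold above suggests; check_hit_ratio is a hypothetical helper, and awk is used because shell arithmetic is integer-only.

```shell
# Print "ok" and return 0 when the ratio is at least 0.8; otherwise report it and return 1.
check_hit_ratio() {
  awk -v r="$1" 'BEGIN {
    if (r + 0 >= 0.8) { print "ok"; exit 0 }
    print "low hit ratio: " r
    exit 1
  }'
}

# Usage:
#   check_hit_ratio "$(curl -s http://localhost:8030/api/v1/health | jq .cache.hit_ratio)"
```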

Solution 3: Monitor performance

# Enable performance logging
RUST_LOG=saga=debug cargo run

# Look for:
# - Request duration
# - Redis query times
# - Cache hit/miss rates

Solution 4: Scale Redis

# Use Redis Cluster for better performance
export REDIS_URL=redis-cluster://node1:6379,node2:6379,node3:6379

# Or use Redis Sentinel for HA
export REDIS_URL=redis-sentinel://sentinel1:26379/mymaster

Problem: High memory usage

Symptoms:

  • Saga process using excessive memory (>500MB)
  • Redis memory growing
  • System running out of memory

Check cache size:

# Get cache size
curl http://localhost:8030/api/v1/health | jq .cache.size

# Large cache = more memory usage

Check Redis memory:

# Get Redis memory info
redis-cli INFO memory

# Look for:
# - used_memory: Current memory usage
# - used_memory_peak: Peak memory usage

Check service registrations:

# Count registered services (SCAN iterates incrementally; KEYS can block Redis on a large keyspace)
redis-cli --scan --pattern "service:*" | wc -l

# More services = more memory

Solution 1: Clean up unused services

# List all services
curl http://localhost:8030/api/v1/services | jq '.services[].service_name'

# Unregister unused services
curl -X DELETE http://localhost:8030/api/v1/services/unused-service

Solution 2: Reduce TTL

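# Shorter TTL = faster expiration
export REGISTRATION_TTL=30 # 30 seconds instead of 60

# Keep the heartbeat interval below the TTL, or live services will expire between heartbeats
export HEARTBEAT_INTERVAL=10

# Services expire faster, freeing memory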

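Solution 3: Limit Redis memory

# Set Redis max memory
redis-cli CONFIG SET maxmemory 256mb
redis-cli CONFIG SET maxmemory-policy allkeys-lru

# Redis will evict least recently used keys once the limit is reached.
# Caution: eviction can remove live service registrations before their TTL expires.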

Solution 4: Restart services

# Restart Saga to clear cache
docker restart saga

# Restart Redis to clear memory (without persistence this drops every registration, so services must re-register)
docker restart redis

Configuration Issues

Problem: Configuration not loading

Symptoms:

  • Default values used instead of config
  • Environment variables ignored
  • Config file not read

Check config file location:

# Default location
ls -la ~/.saga/config.toml

# Should exist and be readable

Verify TOML syntax:

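# Validate TOML syntax (if toml-cli is installed)
toml-cli validate ~/.saga/config.toml

# Or parse it locally with Python 3.11+ (prints nothing on success, a traceback on a syntax error)
python3 -c 'import sys, tomllib; tomllib.load(open(sys.argv[1], "rb"))' ~/.saga/config.toml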

Check environment variables:

# List all Saga-related env vars
env | grep -E "(REDIS_URL|PORT|HOST|SAGA)"

# Should show your configuration

Solution 1: Fix config file location

# Create config directory
mkdir -p ~/.saga

# Create config file
cat > ~/.saga/config.toml << EOF
redis_url = "redis://localhost:6379"
host = "0.0.0.0"
port = 8030
EOF

Solution 2: Fix TOML syntax

# ✅ Valid TOML
redis_url = "redis://localhost:6379"
host = "0.0.0.0"
port = 8030

# ❌ Invalid TOML
redis_url: "redis://localhost:6379" # Wrong syntax
host = 0.0.0.0 # Missing quotes

Solution 3: Use environment variables

# Environment variables override config file
export REDIS_URL=redis://localhost:6379
export PORT=8030
export HOST=0.0.0.0

# Restart Saga
cargo run

Solution 4: Enable verbose logging

# See what configuration is loaded
RUST_LOG=saga=debug cargo run

# Look for:
# - Configuration loading messages
# - Environment variable usage
# - Config file parsing
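
When it is unclear which source a value came from, the precedence described in this section (environment variable, then config file, then default) can be traced by hand. The sketch below is an approximation: effective_setting is a hypothetical helper, it treats an empty environment variable as unset, and it reads only simple key = value lines rather than parsing TOML fully (a # inside a quoted value would be cut off). The fallback passed as the last argument is whatever you believe the default to be.

```shell
# Print the value that wins for one setting, and where it came from.
# $1: env var name, $2: config key, $3: config file path, $4: assumed default
effective_setting() {
  env_name=$1 key=$2 file=$3 default=$4
  eval "from_env=\${$env_name-}"
  if [ -n "$from_env" ]; then
    echo "$from_env (environment)"
    return 0
  fi
  from_file=
  if [ -r "$file" ]; then
    # Take the first matching line, then strip comments, trailing spaces, and quotes.
    from_file=$(sed -n "s/^[[:space:]]*$key[[:space:]]*=[[:space:]]*//p" "$file" |
      head -n 1 |
      sed 's/[[:space:]]*#.*$//; s/[[:space:]]*$//; s/^"//; s/"$//')
  fi
  if [ -n "$from_file" ]; then
    echo "$from_file (config file)"
  else
    echo "$default (default)"
  fi
}

# Usage:
#   effective_setting REDIS_URL redis_url ~/.saga/config.toml redis://localhost:6379
#   effective_setting PORT port ~/.saga/config.toml 8030
```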

Problem: Port already in use

Symptoms:

  • Address already in use error
  • Cannot bind to port 8030
  • Port conflict errors

Find process using port:

# Linux/macOS
lsof -i :8030

# Or using netstat
netstat -tlnp | grep 8030

# Or using ss
ss -tlnp | grep 8030

Check if Saga is already running:

# Check processes (pgrep avoids matching the grep command itself)
pgrep -fl saga

# Check Docker containers
docker ps | grep saga

Solution 1: Kill existing process

# Find PID
lsof -i :8030

# Ask the process to exit cleanly
kill <PID>

# Force-kill only if it is still running
kill -9 <PID>

# Or kill by name
pkill saga

Solution 2: Use different port

# Set different port
export PORT=8031

# Restart Saga
cargo run

Solution 3: Stop Docker container

# Stop Saga container
docker stop saga

# Or remove container
docker rm -f saga

Solution 4: Check systemd service

# Stop systemd service
sudo systemctl stop saga

# Or stop it and keep it from starting at boot
sudo systemctl disable --now saga

Logging and Debugging

Enable Debug Logging

Available levels:

# Error only
RUST_LOG=saga=error cargo run

# Warnings and errors
RUST_LOG=saga=warn cargo run

# Info (default)
RUST_LOG=saga=info cargo run

# Debug information
RUST_LOG=saga=debug cargo run

# Very verbose
RUST_LOG=saga=trace cargo run

Docker:

# Follow logs
docker logs saga -f

# Last 100 lines
docker logs saga --tail 100

# Last 10 minutes
docker logs saga --since 10m

Systemd:

# Follow logs
journalctl -u saga -f

# Last 100 lines
journalctl -u saga -n 100

# Last 10 minutes
journalctl -u saga --since "10 minutes ago"

Direct:

# Save to file
cargo run 2>&1 | tee saga.log

# Or redirect
cargo run > saga.log 2>&1

Common Log Messages

Redis Errors:

  • Redis client not available - Redis connection failed
  • Failed to connect to Redis - Connection error
  • Redis operation failed - Redis query error

Service Errors:

  • Service 'X' not found - Service not registered
  • Failed to register service - Registration error
  • Service name cannot be empty - Validation error

Cache Errors:

  • Cache refresh failed - Cache update error
  • Failed to update cache - Cache write error

Startup:

  • Starting Saga service... - Service starting
  • Connected to Redis successfully - Redis connected
  • Listening on 0.0.0.0:8030 - Server listening

Operations:

  • Registered service: X - Service registered
  • Discovered service: X - Service discovered
  • Cache refreshed - Cache updated

Health Check Failures

Problem: Health check returns unhealthy

Symptoms:

  • Health endpoint returns 503 Service Unavailable
  • "redis": "disconnected" in response
  • Health checks failing in orchestration

Check health endpoint:

curl http://localhost:8030/api/v1/health

# Look for:
# - status: "healthy" or "unhealthy"
# - redis: "connected" or "disconnected"

Check Redis connection:

# Test Redis directly
redis-cli ping

# Should return: PONG

Solution 1: Fix Redis connection

# Check Redis URL
echo $REDIS_URL

# Test connection
redis-cli -u "$REDIS_URL" ping

# Fix URL if needed
export REDIS_URL=redis://localhost:6379

Solution 2: Restart Saga

# Docker
docker restart saga

# Systemd
sudo systemctl restart saga

# Direct: stop the running process (Ctrl+C), then start it again
cargo run

Solution 3: Check Redis health

# Check Redis is running
docker ps | grep redis

# Check Redis logs
docker logs redis

# Restart Redis if needed
docker restart redis

Getting Help

If you encounter issues not covered here:

  1. Check health endpoint

    curl http://localhost:8030/api/v1/health

  2. Enable debug logging

    RUST_LOG=saga=debug cargo run

  3. Verify Redis connectivity

    redis-cli ping

  4. Test API endpoints

    curl http://localhost:8030/api/v1/services

  5. Review configuration

    env | grep -E "(REDIS_URL|PORT|HOST)"

Still Stuck?

If you've tried all troubleshooting steps and still have issues:

  1. Check the logs with RUST_LOG=saga=debug
  2. Verify Redis is working independently
  3. Test API endpoints directly with curl
  4. Review configuration values
  5. Check network connectivity

For additional support, review the rest of the Saga documentation or check the project repository for issues and discussions.