Saga Troubleshooting Guide
Common issues, solutions, and debugging techniques for Saga service discovery. This guide helps you diagnose and resolve problems quickly.
If you're experiencing issues, start with the health check:
curl http://localhost:8030/api/v1/health
Check the redis field - if it shows "disconnected", Redis connectivity is the issue.
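When the health check runs from a script, the redis field can be pulled out of the response without extra tooling. The following is a minimal sketch: the helper name redis_status is hypothetical, and it assumes a flat JSON body such as {"status":"healthy","redis":"connected"}, which may not match the full response Saga returns.

```shell
# Extract the value of the "redis" field from a health-check JSON body.
# Assumption: the field is a plain string in a flat JSON object.
redis_status() {
    printf '%s' "$1" | sed -n 's/.*"redis"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Sample body standing in for the real response
body='{"status":"unhealthy","redis":"disconnected"}'
if [ "$(redis_status "$body")" = "disconnected" ]; then
    echo "Redis connectivity is the issue"
fi
```

Against a running instance, the body would come from the health endpoint, for example redis_status "$(curl -s http://localhost:8030/api/v1/health)".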
Redis Connection Issues
Problem: Cannot connect to Redis
Symptoms:
- Health check shows "redis": "disconnected"
- Service registration fails with "Redis client not available"
- Error logs show connection errors
Step 1: Verify Redis is running
redis-cli ping
# Expected: PONG
Step 2: Check Redis URL
echo $REDIS_URL
# Expected: redis://localhost:6379
Step 3: Test Redis connection
redis-cli -u redis://localhost:6379 ping
Solution 1: Start Redis
# Using Docker
docker run -d -p 6379:6379 redis:7-alpine
# Using Docker Compose
docker compose -f infra/compose.yml up -d redis
# Using local installation
redis-server
Solution 2: Fix Redis URL
# Set correct Redis URL
export REDIS_URL=redis://localhost:6379
# Or in config file (note: > overwrites any existing ~/.saga/config.toml)
echo 'redis_url = "redis://localhost:6379"' > ~/.saga/config.toml
Solution 3: Check network connectivity
# Test TCP connection
telnet localhost 6379
# Or using nc
nc -zv localhost 6379
Solution 4: Verify Redis authentication
# If Redis requires password
redis-cli -a password ping
# Update REDIS_URL with password
export REDIS_URL=redis://:password@localhost:6379
Enable debug logging:
RUST_LOG=saga=debug cargo run
Look for these log messages:
- "Connecting to Redis...": connection attempt
- "Connected to Redis successfully": success
- "Redis connection failed": failure
- "Redis client not available": client initialization failed
Check Redis logs:
# Docker
docker logs redis
# Systemd
journalctl -u redis -n 50
Problem: Redis connection timeout
Symptoms:
- Connection hangs indefinitely
- Timeout errors in logs
- Health check never completes
Check Redis configuration:
# Check if Redis is accepting connections
redis-cli CONFIG GET bind
# Should show: 0.0.0.0 or 127.0.0.1
# Check max connections
redis-cli CONFIG GET maxclients
Solution 1: Check firewall rules
# Allow Redis port
sudo ufw allow 6379
# Or check iptables
sudo iptables -L -n | grep 6379
Solution 2: Verify Redis is listening
# Check listening ports
netstat -tlnp | grep 6379
# or
ss -tlnp | grep 6379
# Should show Redis listening on port 6379
Solution 3: Check Redis max connections
# Increase max connections if needed
redis-cli CONFIG SET maxclients 10000
# Or in redis.conf
maxclients 10000
Solution 4: Check network latency
# Test latency to Redis
redis-cli --latency
# Should show < 1ms for localhost
Service Registration Issues
Problem: Service registration fails
Symptoms:
- 400 Bad Request response
- Error: Service name cannot be empty
- Error: Service name cannot contain ':' or spaces
Check request format:
# Test registration
curl -X POST http://localhost:8030/api/v1/services/register \
-H "Content-Type: application/json" \
-d '{
"service_name": "my-service",
"service_url": "http://localhost:8000",
"capabilities": ["rest"]
}'
Common errors:
- Empty service name
- Service name contains ':' or spaces
- Invalid service URL format
- Missing required fields
Solution 1: Validate service name
# ✅ Valid names
"authentication"
"payment-service"
"api-gateway"
# ❌ Invalid names
"my service" # Contains space
"my:service" # Contains colon
"" # Empty
Solution 2: Check request format
{
"service_name": "my-service", // Required, no spaces/colons
"service_url": "http://localhost:8000", // Required, valid URL
"capabilities": ["rest"] // Optional, array of strings
}
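The naming rules can be checked on the client before calling the register endpoint. This is a minimal sketch: the function name is_valid_service_name is hypothetical, and it covers only the three rules listed in this guide (non-empty, no ':', no space character). Saga's own validation may be stricter.

```shell
# Return 0 if the name passes the three documented rules, 1 otherwise.
is_valid_service_name() {
    case "$1" in
        "")    return 1 ;;  # empty
        *:*)   return 1 ;;  # contains a colon
        *" "*) return 1 ;;  # contains a space
        *)     return 0 ;;
    esac
}

for name in "payment-service" "my service" "my:service" ""; do
    if is_valid_service_name "$name"; then
        echo "valid:   '$name'"
    else
        echo "invalid: '$name'"
    fi
done
```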
Solution 3: Verify service URL format
# ✅ Valid URLs
"http://localhost:8000"
"https://api.example.com"
"http://192.168.1.100:8080"
# ❌ Invalid URLs
"localhost:8000" # Missing protocol
"http://" # Incomplete
"not-a-url" # Invalid format
Enable request logging:
RUST_LOG=saga=debug cargo run
Look for validation errors:
- Service name cannot be empty
- Service name cannot contain ':'
- Invalid service URL format
Test with curl:
# Verbose output
curl -v -X POST http://localhost:8030/api/v1/services/register \
-H "Content-Type: application/json" \
-d '{"service_name":"test","service_url":"http://localhost:8000"}'
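A rough check on the service URL before sending the request can catch the invalid forms listed under Solution 3. The sketch below only tests for an http or https scheme followed by at least one character and no spaces. The name looks_like_service_url is hypothetical, and this is an approximation of the examples in this guide, not Saga's actual validator.

```shell
# Rough pre-check: scheme must be http or https, followed by a non-empty rest.
looks_like_service_url() {
    case "$1" in
        http://?*|https://?*) ;;   # scheme present and something follows it
        *) return 1 ;;
    esac
    case "$1" in
        *" "*) return 1 ;;         # spaces are never valid in a URL
    esac
    return 0
}

for url in "http://localhost:8000" "localhost:8000" "http://" "not-a-url"; do
    if looks_like_service_url "$url"; then
        echo "ok:       $url"
    else
        echo "rejected: $url"
    fi
done
```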
Problem: Service registration expires quickly
Symptoms:
- Services disappear after 60 seconds
- Need to re-register frequently
- Services not discoverable after short time
Check TTL configuration:
# Check current TTL
echo $REGISTRATION_TTL
# Default: 60 seconds
# Check heartbeat interval
echo $HEARTBEAT_INTERVAL
# Default: 30 seconds
Verify service is sending heartbeats:
# Check service logs for heartbeat messages
# Should see heartbeats every 30 seconds
Solution 1: Implement heartbeat
# Send heartbeat manually
curl -X POST http://localhost:8030/api/v1/services/my-service/heartbeat
# Or implement in your service (see Integration Examples)
Solution 2: Increase TTL
# Set longer TTL
export REGISTRATION_TTL=120 # 2 minutes
# Restart Saga
cargo run
Solution 3: Reduce heartbeat interval
# Send heartbeats more frequently
export HEARTBEAT_INTERVAL=20 # Every 20 seconds
# Restart Saga
cargo run
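How much slack a given TTL and heartbeat interval leave can be worked out with a little arithmetic. The sketch below assumes each heartbeat resets the TTL to its full value and ignores network delay. The function name missed_heartbeats_tolerated is hypothetical. A heartbeat sent k+1 intervals after the last successful one has to arrive strictly before the TTL elapses, so the largest safe k satisfies (k + 1) * interval < ttl.

```shell
# Print how many consecutive heartbeats can be missed before the
# registration expires, given TTL and heartbeat interval in seconds.
# Prints -1 when even an on-time heartbeat would arrive too late.
missed_heartbeats_tolerated() {
    ttl=$1
    interval=$2
    echo $(( (ttl - 1) / interval - 1 ))
}

missed_heartbeats_tolerated 60 30    # defaults: prints 0
missed_heartbeats_tolerated 120 30   # longer TTL: prints 2
missed_heartbeats_tolerated 60 20    # shorter interval: prints 1
```

With the defaults, a single lost heartbeat leaves the next one arriving just as the registration expires, which is why raising the TTL or shortening the interval helps.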
Solution 4: Implement automatic heartbeat
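// In your service code (assumes `use std::time::Duration;` is in scope)
tokio::spawn(async move {
    let mut interval = tokio::time::interval(Duration::from_secs(30));
    loop {
        interval.tick().await;
        // Log failures and keep looping; `?` cannot be used here because the task returns ()
        if let Err(e) = saga_client.refresh_registration("my-service").await {
            eprintln!("heartbeat failed: {e:?}");
        }
    }
});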
Service Discovery Issues
Problem: Service not found
Symptoms:
- 404 Not Found when querying service
- Service was registered but not discoverable
- Service appears in list but not individual query
Step 1: Verify service is registered
# List all services
curl http://localhost:8030/api/v1/services
# Check if your service appears in the list
Step 2: Check service name spelling
# Service names are case-sensitive
curl http://localhost:8030/api/v1/services/My-Service # Wrong
curl http://localhost:8030/api/v1/services/my-service # Correct
Step 3: Check TTL expiration
# Query Redis directly
redis-cli GET "service:my-service"
# If the reply is (nil), the key no longer exists: the registration has expired or was removed
Solution 1: Re-register service
curl -X POST http://localhost:8030/api/v1/services/register \
-H "Content-Type: application/json" \
-d '{
"service_name": "my-service",
"service_url": "http://localhost:8000",
"capabilities": ["rest"]
}'
Solution 2: Send heartbeat
# Refresh registration TTL
curl -X POST http://localhost:8030/api/v1/services/my-service/heartbeat
Solution 3: Check Redis directly
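# List all service keys (KEYS scans the whole keyspace and blocks Redis;
# on a busy instance prefer: redis-cli --scan --pattern "service:*")
redis-cli KEYS "service:*"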
# Get specific service
redis-cli GET "service:my-service"
# Check TTL
redis-cli TTL "service:my-service"
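The integer returned by the Redis TTL command has two special values: -2 means the key does not exist, and -1 means the key exists but has no expiry set. A small helper can turn the reply into a readable diagnosis. This is a sketch with a hypothetical function name, describe_ttl; the wording of the messages is illustrative.

```shell
# Translate a Redis TTL reply into a human-readable status.
describe_ttl() {
    case "$1" in
        -2) echo "expired or never registered" ;;  # key does not exist
        -1) echo "registered without expiry" ;;    # key has no TTL set
        *)  echo "expires in $1 seconds" ;;
    esac
}

describe_ttl -2   # prints: expired or never registered
describe_ttl 42   # prints: expires in 42 seconds
```

With a live Redis it could be called as describe_ttl "$(redis-cli TTL "service:my-service")", since redis-cli prints the bare integer when its output is captured rather than shown on a terminal.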
Solution 4: Verify cache refresh
# Check cache statistics
curl http://localhost:8030/api/v1/health | jq .cache
# Wait for cache refresh (30 seconds)
# Then try discovery again
Problem: Stale service information
Symptoms:
- Service URL changed but discovery returns old URL
- Cache shows outdated information
- Service metadata not updating
Check cache refresh:
# Check last cache refresh time
curl http://localhost:8030/api/v1/health | jq .cache.last_refresh
# Compare with current time
date -u +"%Y-%m-%dT%H:%M:%SZ"
Check Redis directly:
# Get service from Redis (source of truth)
redis-cli GET "service:my-service"
# Compare with cache response
curl http://localhost:8030/api/v1/services/my-service
Solution 1: Wait for cache refresh
# Cache refreshes every 30 seconds
# Wait and try again
sleep 35
curl http://localhost:8030/api/v1/services/my-service
Solution 2: Re-register service
# Re-register with new URL
curl -X POST http://localhost:8030/api/v1/services/register \
-H "Content-Type: application/json" \
-d '{
"service_name": "my-service",
"service_url": "http://new-url:8000",
"capabilities": ["rest"]
}'
Solution 3: Restart Saga
# Restart clears cache
docker restart saga
# or
sudo systemctl restart saga
Solution 4: Reduce cache refresh interval
# Not currently configurable, but can be modified in code
# Default: 30 seconds
Performance Issues
Problem: Slow service discovery
Symptoms:
- High latency on discovery requests (>100ms)
- Slow API responses
- Timeout errors
Check Redis performance:
# Test Redis latency
redis-cli --latency
# Should show < 1ms for localhost
# Higher values indicate network issues
Check cache performance:
# Get cache statistics
curl http://localhost:8030/api/v1/health | jq .cache
# Look for:
# - hit_ratio: Should be > 0.8
# - misses: High misses = slow performance
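If only raw counters are available, the hit ratio is hits divided by total lookups. The sketch below shows that calculation. The function name cache_hit_ratio is hypothetical, and a separate hits counter in the health response is an assumption, since this guide only names hit_ratio and misses.

```shell
# Print hits / (hits + misses) with two decimals,
# or "n/a" when there have been no lookups yet.
cache_hit_ratio() {
    awk -v h="$1" -v m="$2" 'BEGIN {
        total = h + m
        if (total == 0) { print "n/a"; exit }
        printf "%.2f\n", h / total
    }'
}

cache_hit_ratio 80 20   # prints 0.80, right at the threshold
cache_hit_ratio 45 55   # prints 0.45, well below it
```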
Check network latency:
# Test network to Redis
ping redis-host
# Or test connection
time redis-cli ping
Solution 1: Optimize Redis
# Use local Redis for development
export REDIS_URL=redis://localhost:6379
# For production, use Redis on same network
export REDIS_URL=redis://redis-cluster:6379
Solution 2: Improve cache hit ratio
# Check current hit ratio
curl http://localhost:8030/api/v1/health | jq .cache.hit_ratio
# If < 0.8, consider:
# - Reducing cache refresh interval
# - Increasing cache size
# - Optimizing service discovery patterns
Solution 3: Monitor performance
# Enable performance logging
RUST_LOG=saga=debug cargo run
# Look for:
# - Request duration
# - Redis query times
# - Cache hit/miss rates
Solution 4: Scale Redis
# Use Redis Cluster for better performance
export REDIS_URL=redis-cluster://node1:6379,node2:6379,node3:6379
# Or use Redis Sentinel for HA
export REDIS_URL=redis-sentinel://sentinel1:26379/mymaster
Problem: High memory usage
Symptoms:
- Saga process using excessive memory (>500MB)
- Redis memory growing
- System running out of memory
Check cache size:
# Get cache size
curl http://localhost:8030/api/v1/health | jq .cache.size
# Large cache = more memory usage
Check Redis memory:
# Get Redis memory info
redis-cli INFO memory
# Look for:
# - used_memory: Current memory usage
# - used_memory_peak: Peak memory usage
Check service registrations:
# Count registered services
redis-cli KEYS "service:*" | wc -l
# More services = more memory
Solution 1: Clean up unused services
# List all services
curl http://localhost:8030/api/v1/services | jq '.services[].service_name'
# Unregister unused services
curl -X DELETE http://localhost:8030/api/v1/services/unused-service
Solution 2: Reduce TTL
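# Shorter TTL = faster expiration
export REGISTRATION_TTL=30 # 30 seconds instead of 60
# Keep the heartbeat interval below the TTL, or live services expire between heartbeats
export HEARTBEAT_INTERVAL=10
# Services expire faster, freeing memory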
Solution 3: Monitor Redis memory
# Set Redis max memory
redis-cli CONFIG SET maxmemory 256mb
redis-cli CONFIG SET maxmemory-policy allkeys-lru
# Redis will evict least recently used keys, which can include live service registrations
Solution 4: Restart services
# Restart Saga to clear cache
docker restart saga
# Restart Redis to clear memory
docker restart redis
Configuration Issues
Problem: Configuration not loading
Symptoms:
- Default values used instead of config
- Environment variables ignored
- Config file not read
Check config file location:
# Default location
ls -la ~/.saga/config.toml
# Should exist and be readable
Verify TOML syntax:
# Validate TOML syntax
toml-cli validate ~/.saga/config.toml
# Or use online validator
Check environment variables:
# List all Saga-related env vars
env | grep -E "(REDIS_URL|PORT|HOST|SAGA)"
# Should show your configuration
Solution 1: Fix config file location
# Create config directory
mkdir -p ~/.saga
# Create config file
cat > ~/.saga/config.toml << EOF
redis_url = "redis://localhost:6379"
host = "0.0.0.0"
port = 8030
EOF
Solution 2: Fix TOML syntax
# ✅ Valid TOML
redis_url = "redis://localhost:6379"
host = "0.0.0.0"
port = 8030
# ❌ Invalid TOML
redis_url: "redis://localhost:6379" # Wrong syntax
host = 0.0.0.0 # Missing quotes
Solution 3: Use environment variables
# Environment variables override config file
export REDIS_URL=redis://localhost:6379
export PORT=8030
export HOST=0.0.0.0
# Restart Saga
cargo run
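The precedence described here (environment variable first, then config file, then built-in default) can be pictured with a small helper. This is only an illustration of the lookup order, not Saga's loader: the function name resolve_setting is hypothetical, and treating an empty environment variable as unset is an assumption.

```shell
# Resolve one setting: environment value, then config-file value, then default.
resolve_setting() {
    env_value=$1
    file_value=$2
    default_value=$3
    if [ -n "$env_value" ]; then
        echo "$env_value"
    elif [ -n "$file_value" ]; then
        echo "$file_value"
    else
        echo "$default_value"
    fi
}

resolve_setting "8031" "8030" "8030"   # env wins: prints 8031
resolve_setting ""     "9000" "8030"   # file wins: prints 9000
resolve_setting ""     ""     "8030"   # default: prints 8030
```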
Solution 4: Enable verbose logging
# See what configuration is loaded
RUST_LOG=saga=debug cargo run
# Look for:
# - Configuration loading messages
# - Environment variable usage
# - Config file parsing
Problem: Port already in use
Symptoms:
- "Address already in use" error
- Cannot bind to port 8030
- Port conflict errors
Find process using port:
# Linux/macOS
lsof -i :8030
# Or using netstat
netstat -tlnp | grep 8030
# Or using ss
ss -tlnp | grep 8030
Check if Saga is already running:
# Check processes
ps aux | grep saga
# Check Docker containers
docker ps | grep saga
Solution 1: Kill existing process
# Find PID
lsof -i :8030
# Kill process (sends SIGTERM; use kill -9 only if the process does not exit)
kill <PID>
# Or kill by name
pkill saga
Solution 2: Use different port
# Set different port
export PORT=8031
# Restart Saga
cargo run
Solution 3: Stop Docker container
# Stop Saga container
docker stop saga
# Or remove container
docker rm -f saga
Solution 4: Check systemd service
# Stop systemd service
sudo systemctl stop saga
# Or disable service
sudo systemctl disable saga
Logging and Debugging
Enable Debug Logging
Available levels:
# Error only
RUST_LOG=saga=error cargo run
# Warnings and errors
RUST_LOG=saga=warn cargo run
# Info (default)
RUST_LOG=saga=info cargo run
# Debug information
RUST_LOG=saga=debug cargo run
# Very verbose
RUST_LOG=saga=trace cargo run
Docker:
# Follow logs
docker logs saga -f
# Last 100 lines
docker logs saga --tail 100
# Last 10 minutes
docker logs saga --since 10m
Systemd:
# Follow logs
journalctl -u saga -f
# Last 100 lines
journalctl -u saga -n 100
# Last 10 minutes
journalctl -u saga --since "10 minutes ago"
Direct:
# Save to file
cargo run 2>&1 | tee saga.log
# Or redirect
cargo run > saga.log 2>&1
Common Log Messages
Redis Errors:
- "Redis client not available": Redis connection failed
- "Failed to connect to Redis": connection error
- "Redis operation failed": Redis query error
Service Errors:
- "Service 'X' not found": service not registered
- "Failed to register service": registration error
- "Service name cannot be empty": validation error
Cache Errors:
- "Cache refresh failed": cache update error
- "Failed to update cache": cache write error
Startup:
- "Starting Saga service...": service starting
- "Connected to Redis successfully": Redis connected
- "Listening on 0.0.0.0:8030": server listening
Operations:
- "Registered service: X": service registered
- "Discovered service: X": service discovered
- "Cache refreshed": cache updated
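When scanning a long log, it can help to bucket error lines by the categories above. The following sketch matches on the message fragments listed in this guide. The function name classify_log_line is hypothetical, and the real log format (timestamps, level prefixes) may differ, so the patterns match anywhere in the line.

```shell
# Map a log line to "redis", "service", "cache", or "other"
# based on the error message fragments documented in this guide.
classify_log_line() {
    case "$1" in
        *"Redis client not available"*|*"Failed to connect to Redis"*|*"Redis operation failed"*|*"Redis connection failed"*)
            echo "redis" ;;
        *"Service '"*"' not found"*|*"Failed to register service"*|*"Service name cannot be empty"*)
            echo "service" ;;
        *"Cache refresh failed"*|*"Failed to update cache"*)
            echo "cache" ;;
        *)
            echo "other" ;;
    esac
}

classify_log_line "ERROR Failed to connect to Redis"   # prints redis
classify_log_line "Service 'auth' not found"           # prints service
```

A saved log can be fed through it line by line, for example with while IFS= read -r line; do classify_log_line "$line"; done < saga.log.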
Health Check Failures
Problem: Health check returns unhealthy
Symptoms:
- Health endpoint returns 503 Service Unavailable
- "redis": "disconnected" in response
- Health checks failing in orchestration
Check health endpoint:
curl http://localhost:8030/api/v1/health
# Look for:
# - status: "healthy" or "unhealthy"
# - redis: "connected" or "disconnected"
Check Redis connection:
# Test Redis directly
redis-cli ping
# Should return: PONG
Solution 1: Fix Redis connection
# Check Redis URL
echo $REDIS_URL
# Test connection
redis-cli -u $REDIS_URL ping
# Fix URL if needed
export REDIS_URL=redis://localhost:6379
Solution 2: Restart Saga
# Docker
docker restart saga
# Systemd
sudo systemctl restart saga
# Direct
# Stop and restart cargo run
Solution 3: Check Redis health
# Check Redis is running
docker ps | grep redis
# Check Redis logs
docker logs redis
# Restart Redis if needed
docker restart redis
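After restarting Saga or Redis, the health endpoint may take a moment to report healthy, so a single check can fail spuriously. A small retry helper makes this explicit. This is a generic sketch: the function name wait_until is hypothetical, and the attempt count and delay are arbitrary values to tune.

```shell
# Run a probe command up to max_attempts times, sleeping delay seconds
# between failures. Returns 0 on the first success, 1 if all attempts fail.
wait_until() {
    max_attempts=$1
    delay=$2
    shift 2
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        if "$@"; then
            return 0
        fi
        attempt=$((attempt + 1))
        # Sleep only if another attempt will follow
        [ "$attempt" -le "$max_attempts" ] && sleep "$delay"
    done
    return 1
}

# Demonstration with a probe that always succeeds
wait_until 3 0 true && echo "probe succeeded"

# Against a running instance (curl -f exits non-zero on HTTP errors such as 503):
# wait_until 10 3 curl -fsS -o /dev/null http://localhost:8030/api/v1/health
```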
Getting Help
If you encounter issues not covered here:
- Check health endpoint
curl http://localhost:8030/api/v1/health
- Enable debug logging
RUST_LOG=saga=debug cargo run
- Verify Redis connectivity
redis-cli ping
- Test API endpoints
curl http://localhost:8030/api/v1/services
- Review configuration
env | grep -E "(REDIS_URL|PORT|HOST)"
- API Reference - Complete endpoint documentation
- Configuration - Configuration options
- Integration Examples - Code examples
- Architecture - System design details
If you've tried all troubleshooting steps and still have issues:
- Check the logs with RUST_LOG=saga=debug
- Verify Redis is working independently
- Test API endpoints directly with curl
- Review configuration values
- Check network connectivity
For additional support, review the documentation links above or check the project repository for issues and discussions.