Log Analysis Summary - Infinite Refresh Loop Fix
Problem Identified
Your logs showed a critical infinite refresh loop:
Keycloak session invalidated, clearing token to force re-authentication
Keycloak session invalidated, clearing token to force re-authentication
Keycloak session invalidated, clearing token to force re-authentication
... (repeating infinitely)
Root Cause
- Session Invalidated: User's Keycloak session became invalid (logged out elsewhere, expired, etc.)
- Multiple Widgets: All widgets/components making parallel API requests
- JWT Callback Triggered: Each request triggers NextAuth JWT callback
- Refresh Attempt: Each callback tries to refresh the expired token
- Refresh Fails: Refresh fails because session is invalid
- No Circuit Breaker: Next request sees expired token → tries refresh again → infinite loop
Impact
- Performance: Hundreds of refresh attempts per second
- Server Load: CPU/memory spike
- Keycloak Load: Potential DoS on Keycloak server
- User Experience: App appears broken
- Logs: Spam with error messages
Solution Implemented
Circuit Breaker Pattern
Added a 5-second cooldown after failed refresh attempts:
- Track Failures: Record timestamp when refresh fails
- Cooldown Period: Don't retry refresh for 5 seconds after failure
- Early Return: If in cooldown, return error immediately (no API call)
- Memory Management: Cleanup old entries to prevent memory leaks
Code Changes
File: app/api/auth/options.ts
Added:
- refreshCooldownMap to track the last failure per user
- REFRESH_COOLDOWN_MS = 5000 (5 seconds)
- cleanupRefreshCooldown() function to prevent memory leaks
- Cooldown check before refresh attempt
- Failure recording after failed refresh
How It Works:
// Before refresh attempt:
const lastFailure = refreshCooldownMap.get(userId);
const timeSinceFailure = lastFailure === undefined ? Infinity : Date.now() - lastFailure;
if (timeSinceFailure < REFRESH_COOLDOWN_MS) {
  // Skip refresh, return error immediately
  return errorToken;
}
// After failed refresh:
if (refreshedToken.error === "SessionNotActive") {
  refreshCooldownMap.set(userId, Date.now()); // Record failure
  return errorToken;
}
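The two fragments above can be combined into one small gate. The following is a minimal sketch, not the actual contents of app/api/auth/options.ts: `RefreshResult`, `isInCooldown`, `refreshWithCooldown`, the `doRefresh` callback, and the injected `now` clock are hypothetical names added for illustration, and the refresh call is kept synchronous to make the logic easy to follow (the real refresh call to Keycloak would be asynchronous).

```typescript
// Names from the summary: refreshCooldownMap, REFRESH_COOLDOWN_MS.
// Everything else here is a hypothetical illustration.
const REFRESH_COOLDOWN_MS = 5000;
const refreshCooldownMap = new Map<string, number>();

type RefreshResult = { accessToken?: string; error?: string };

// True if the user's last recorded failure is less than the cooldown ago.
function isInCooldown(userId: string, now: number): boolean {
  const lastFailure = refreshCooldownMap.get(userId);
  return lastFailure !== undefined && now - lastFailure < REFRESH_COOLDOWN_MS;
}

function refreshWithCooldown(
  userId: string,
  doRefresh: () => RefreshResult, // assumed stand-in for the Keycloak refresh call
  now: number = Date.now(),
): RefreshResult {
  // Early return: no refresh call is made while the cooldown is active.
  if (isInCooldown(userId, now)) {
    return { error: "SessionNotActive" };
  }
  const result = doRefresh();
  if (result.error === "SessionNotActive") {
    refreshCooldownMap.set(userId, now); // record the failure timestamp
  } else {
    refreshCooldownMap.delete(userId); // a successful refresh clears the entry
  }
  return result;
}
```

Passing the clock as a parameter is only a convenience for testing; production code would rely on the `Date.now()` default.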
Expected Behavior After Fix
Before Fix
Request 1 → Refresh attempt → Fail → Clear tokens
Request 2 → Refresh attempt → Fail → Clear tokens
Request 3 → Refresh attempt → Fail → Clear tokens
... (infinite loop)
After Fix
Request 1 → Refresh attempt → Fail → Record failure → Clear tokens
Request 2 → Check cooldown → Skip refresh → Return error immediately
Request 3 → Check cooldown → Skip refresh → Return error immediately
... (cooldown prevents refresh attempts)
After 5s → Next request can try refresh again (if session restored)
What You'll See in Logs
Good Signs:
- ✅ "Refresh cooldown active, skipping refresh attempt" (instead of infinite failures)
- ✅ Only 1-2 refresh attempts per user when session invalidates
- ✅ User redirected to sign-in page
- ✅ No refresh storm
Bad Signs (if still happening):
- ❌ Still seeing infinite "Keycloak session invalidated" messages
- ❌ Multiple refresh attempts within 5 seconds
- ❌ Cooldown not working
Testing the Fix
Test Scenario 1: Session Invalidation
- Log in to the app
- Log out from the Keycloak admin console (or let the session expire)
- Expected:
- 1-2 refresh attempts
- Then cooldown messages
- User redirected to sign-in
- NOT infinite loop
Test Scenario 2: Multiple Widgets
- Open app with all widgets loading
- Invalidate session
- Expected:
- All widgets respect cooldown
- No refresh storm
- Clean error handling
Test Scenario 3: Normal Operation
- Valid session
- Token expires naturally
- Expected:
- Refresh succeeds
- No cooldown triggered
- Normal operation continues
Monitoring
Metrics to Watch
- Refresh Attempts: Should be low (1-2 failed attempts per user after a session is invalidated)
- Cooldown Activations: Should only happen when session invalid
- Refresh Success Rate: Should be high for valid sessions
- Error Rate: Should drop significantly
Log Patterns
Healthy:
[DEBUG] Refresh cooldown active, skipping refresh attempt
[INFO] Keycloak session invalidated, setting cooldown
Unhealthy (if still happening):
Keycloak session invalidated, clearing token... (repeating)
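One rough way to tell the two patterns apart is to look for long consecutive runs of the invalidation message. The helper below is a hypothetical sketch: the message text comes from the logs above, while the function names and the default threshold of 3 are assumptions to tune for your own log volume.

```typescript
// Message text taken from the log excerpts above.
const INVALIDATED_MESSAGE = "Keycloak session invalidated";

// Length of the longest run of consecutive lines containing the message.
function longestInvalidationRun(lines: string[]): number {
  let longest = 0;
  let current = 0;
  for (const line of lines) {
    if (line.indexOf(INVALIDATED_MESSAGE) !== -1) {
      current += 1;
      if (current > longest) longest = current;
    } else {
      current = 0; // any other line breaks the run
    }
  }
  return longest;
}

// Hypothetical check: a run at or above the threshold suggests a refresh storm.
function looksLikeRefreshStorm(lines: string[], threshold: number = 3): boolean {
  return longestInvalidationRun(lines) >= threshold;
}
```

With the healthy pattern, a single invalidation line is followed by cooldown messages, so the run length stays at 1; the unhealthy pattern produces an unbroken run.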
Future Improvements
Short-term (Recommended)
- ✅ Done: In-memory circuit breaker
- ⚠️ Next: Migrate to Redis-based circuit breaker (for multi-instance)
- ⚠️ Next: Add client-side session guard to stop requests
Long-term
- ⚠️ Add metrics/monitoring
- ⚠️ Implement exponential backoff
- ⚠️ Add request cancellation on client-side
- ⚠️ Better error boundaries
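For the exponential backoff item, one possible shape is to double the cooldown on each consecutive failure instead of using a fixed 5 seconds. This is only a sketch of the idea: `backoffCooldownMs`, `BASE_COOLDOWN_MS`, and the 60-second cap are assumptions, not part of the current fix.

```typescript
const BASE_COOLDOWN_MS = 5000; // matches the current fixed cooldown
const MAX_COOLDOWN_MS = 60000; // assumed upper bound, tune as needed

// Cooldown to apply after the given number of consecutive refresh failures:
// 5s, 10s, 20s, 40s, then capped at MAX_COOLDOWN_MS.
function backoffCooldownMs(consecutiveFailures: number): number {
  if (consecutiveFailures <= 0) return 0;
  const delay = BASE_COOLDOWN_MS * Math.pow(2, consecutiveFailures - 1);
  return Math.min(delay, MAX_COOLDOWN_MS);
}
```

Using this would require tracking a failure count per user alongside the timestamp and resetting it after a successful refresh.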
Additional Notes
Why 5 Seconds?
- Too Short (< 2s): Still allows refresh storms
- Too Long (> 10s): Delays legitimate refresh attempts
- 5 Seconds: Good balance - prevents storms, allows quick recovery
Memory Considerations
- Map Size: Limited to 1000 entries (auto-cleanup)
- Memory Per Entry: ~50 bytes (userId + timestamp)
- Total Memory: ~50KB max
- Cleanup: Automatic (removes entries older than 50s)
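A cleanup routine matching these limits might look like the sketch below. The function name comes from the Code Changes section, but the body, the `MAX_ENTRIES` and `MAX_AGE_MS` constants, the parameters, and the eviction order are assumptions; the actual implementation in app/api/auth/options.ts may differ.

```typescript
const MAX_ENTRIES = 1000; // entry cap from the notes above
const MAX_AGE_MS = 50000; // 50 s maximum age from the notes above

function cleanupRefreshCooldown(
  map: Map<string, number>,
  now: number = Date.now(),
): void {
  // Pass 1: drop entries whose failure timestamp is older than MAX_AGE_MS.
  map.forEach((failedAt, userId) => {
    if (now - failedAt > MAX_AGE_MS) map.delete(userId);
  });
  // Pass 2: if still over the cap, evict in insertion order.
  // Map keeps insertion order, which approximates oldest-first here.
  const keys = Array.from(map.keys());
  for (const userId of keys) {
    if (map.size <= MAX_ENTRIES) break;
    map.delete(userId);
  }
}
```

Calling this before each write to the map keeps its size bounded without needing a background timer.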
Multi-Instance Deployment
Current: In-memory Map (per-instance)
- Works for single instance
- Each instance has its own cooldown
Future: Redis-based (shared across instances)
- Better for multi-instance
- Shared cooldown state
- See CRITICAL_ISSUE_ANALYSIS.md for Redis implementation
Summary
- ✅ Fixed: Infinite refresh loop with circuit breaker
- ✅ Impact: Prevents refresh storms, reduces server load
- ✅ Testing: Verify with session invalidation scenarios
- ⚠️ Next: Monitor logs, consider Redis migration for multi-instance
The fix is production-ready for single-instance deployments and should immediately stop the refresh loop you're seeing in your logs.